CN106384593A - Voice information conversion and information generation method and device - Google Patents

Voice information conversion and information generation method and device

Info

Publication number
CN106384593A
CN106384593A (application CN201610801720.7A; granted as CN106384593B)
Authority
CN
China
Prior art keywords
information
target voice
text conversion
described target
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610801720.7A
Other languages
Chinese (zh)
Other versions
CN106384593B (en)
Inventor
吴育強
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Software Co Ltd
Kingsoft Corp Ltd
Beijing Kingsoft Digital Entertainment Co Ltd
Beijing Jinshan Digital Entertainment Technology Co Ltd
Original Assignee
Beijing Kingsoft Software Co Ltd
Beijing Jinshan Digital Entertainment Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Software Co Ltd and Beijing Jinshan Digital Entertainment Technology Co Ltd
Priority to CN201610801720.7A (granted as CN106384593B)
Priority to CN201910311351.7A (published as CN110060687A)
Publication of CN106384593A
Application granted
Publication of CN106384593B
Legal status: Active


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/26 - Speech to text systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L 51/06 - Message adaptation to terminal or network requirements
    • H04L 51/066 - Format adaptation, e.g. format conversion or compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The embodiments of the invention disclose a voice information conversion method, an information generation method, and corresponding devices, relating to the field of computer technology and applied to an electronic device. The voice information conversion method comprises: receiving target voice information; and, when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client can display the text conversion information based on the display position of the target voice information. With the scheme provided by the embodiments of the invention, voice information can be converted into text.

Description

Voice information conversion method, information generation method, and device
Technical field
The present application relates to the field of computer technology, and in particular to a voice information conversion method, an information generation method, and corresponding devices.
Background technology
With the rapid development of computer technology, various communication clients running on user terminals have emerged, through which users can communicate with their friends.
When communicating with friends through such a communication client, a user can exchange text messages as well as voice messages, which is very convenient. In some situations, however, such as when the user is attending a meeting, when the surroundings are noisy, or when the user does not want others to overhear, it is inconvenient to listen to a received voice message. In such cases it would be desirable to convert the received voice information into text and display it to the user.
In view of the above, a voice information conversion method is needed to convert voice information into text information.
Summary of the invention
The embodiments of the present application disclose a voice information conversion method, an information generation method, and corresponding devices, so as to convert voice information into text information.
To achieve the above purpose, an embodiment of the present application discloses a voice information conversion method applied to an electronic device, the method comprising:
receiving target voice information;
when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
To achieve the above purpose, an embodiment of the present application discloses an information generation method, the method comprising:
receiving target voice information sent by a source client;
performing speech recognition on the target voice information according to a first-user speech library of the first user corresponding to the target voice information;
sending the recognition result to the source client;
receiving correction information for the recognition result sent by the source client;
updating the first-user speech library according to the correction information.
To achieve the above purpose, an embodiment of the present application discloses a voice information conversion device applied to an electronic device, the device comprising:
an information receiving module, configured to receive target voice information;
a speech recognition module, configured to, when an information conversion condition is met, perform speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
To achieve the above purpose, an embodiment of the present application discloses an information generation device, the device comprising:
an information receiving module, configured to receive target voice information sent by a source client;
a speech recognition module, configured to perform speech recognition on the target voice information according to a first-user speech library of the first user corresponding to the target voice information;
a result sending module, configured to send the recognition result to the source client;
a correction information receiving module, configured to receive correction information for the recognition result sent by the source client;
a speech library updating module, configured to update the first-user speech library according to the correction information.
As can be seen from the above, in the schemes provided by the embodiments of the present application, after target voice information is received and when the information conversion condition is met, speech recognition is performed on the target voice information to obtain text conversion information, so that a client can display the text conversion information based on the display position of the target voice information. Thus, with the schemes provided by the embodiments of the present application, voice information can be converted into text information.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Fig. 1 is a schematic flowchart of a first voice information conversion method provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of a second voice information conversion method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of a third voice information conversion method provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of a fourth voice information conversion method provided by an embodiment of the present application;
Fig. 5 is a schematic flowchart of a fifth voice information conversion method provided by an embodiment of the present application;
Fig. 6 is a schematic flowchart of a sixth voice information conversion method provided by an embodiment of the present application;
Fig. 7 is a schematic flowchart of an information generation method provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a voice information conversion device provided by an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an information generation device provided by an embodiment of the present application;
Fig. 10a is a schematic diagram of a first voice information conversion effect provided by an embodiment of the present application;
Fig. 10b is a schematic diagram of a second voice information conversion effect provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The concepts involved in the embodiments of the present application are first introduced below.
1. Speech library
A speech library stores correspondences between voice information and text information, and contains at least one such correspondence. The text information corresponding to a piece of voice information may be in a single language or in two languages; the present application does not limit this.
In addition, a speech library may be one that comes with the system, or one that is trained and refined during use.
For example, voice information pronounced "hai zi" may correspond to the Chinese word "孩子" (child), or to the English word "child"; the same pronunciation in the Sichuan dialect corresponds to "鞋子" (shoes); and so on.
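As an illustration only (not part of the disclosure), the correspondence structure described above can be sketched as a simple mapping from an acoustic key to candidate text renderings; the pinyin-style string keys and the `lookup` helper below are hypothetical simplifications, since a real library would be keyed on acoustic features rather than strings.

```python
# Minimal sketch of a speech library: acoustic key -> candidate text renderings.
# All entries and the string-valued keys are illustrative only.
speech_library = {
    "hai2 zi3": ["孩子", "child"],              # Mandarin pronunciation -> "child"
    "hai2 zi3 (Sichuan)": ["鞋子", "shoes"],    # same sounds in Sichuan dialect -> "shoes"
}

def lookup(library: dict, acoustic_key: str) -> list:
    """Return the candidate text renderings stored for an acoustic key."""
    return library.get(acoustic_key, [])

print(lookup(speech_library, "hai2 zi3"))       # ['孩子', 'child']
```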
2. Speech library set
A speech library set contains at least one speech library. These libraries can be divided into standard speech libraries, frequently-used speech libraries, category speech libraries, and so on.
A standard speech library can be understood as a library for a standard pronunciation, for example a library for Mandarin Chinese, a library for British English, or a library for American English.
A frequently-used speech library can be a library selected according to the user's own situation; it may be a library chosen by the client according to the geographical location of the current device, or a library chosen by the client according to the current language mode.
A category speech library is a library obtained according to a classification criterion.
For example, libraries may be divided by language: a Mandarin Chinese library, a Sichuan-dialect library, a Cantonese library, an English library, a German library, a French library, a Russian library, and so on.
They may also be divided by profession: a law library, a computer-science library, an economics library, and so on.
In addition, existing methods for converting speech into text usually only support speech recognition of Mandarin. However, as is well known, nearly everyone speaks with some dialect, has a unique pronunciation of certain words, or frequently mispronounces certain words; communication in different languages between users from different countries has also become more frequent, and such communication usually takes place over networks whose bandwidth and conditions are complex. As a result, existing speech-to-text methods cannot effectively recognize all speech and smoothly convert it into text, and cannot meet the needs of most users.
Starting from the observation that speech has personalized characteristics, that is, each user's accent, pronunciation, wording, speaking speed, and tone differ, a speech library can be established for each user; such a library may be called a user speech library, and the speech library set may contain the user speech library of each user.
Specifically, when establishing a speech library for a user, an initial library may first be set; during subsequent use, this initial library is continuously corrected according to the user's voice information, the recognition results, and the correction information, so as to obtain the user speech library of this user, so that each user's unique sounds can be mapped to the correct Chinese text or English words.
The initial library may be selected manually: for example, if the user usually speaks Mandarin, the initial library may be set to the Mandarin Chinese library; if the user usually speaks the Sichuan dialect, the initial library may be set to the Sichuan-dialect library. The initial library may also be a default speech library; the present application does not limit this.
Fig. 1 is a schematic flowchart of a first voice information conversion method provided by an embodiment of the present application. The method is applied to an electronic device.
Specifically, as the executing entity of this embodiment, the electronic device may be a server or a user terminal. Further, when the electronic device is a user terminal, various clients are usually installed on the user terminal, so in this case the executing entity of the scheme provided by this embodiment can also be understood as a client.
Specifically, the method includes:
S101: receiving target voice information.
S102: when an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
The target text conversion information includes text conversion information in at least one form: for example, it may contain only Chinese, or Chinese and English, or text in multiple languages such as Chinese, English, French, and German.
Specifically, when the information conversion condition is met, the speech recognition that produces the text conversion information may be performed directly after the target voice information is received, or only after an information conversion instruction for the target voice information is detected.
That is, receiving the target voice information may itself be regarded as meeting the information conversion condition, or the condition may be regarded as met only after an information conversion instruction is received.
In addition, the text obtained by performing speech recognition on the target voice information may contain consecutively repeated words or words that clearly do not conform to language conventions. Therefore, after the text is obtained, it may first be corrected according to a preset correction rule, and the text conversion information of the target voice information is then obtained from the corrected text. Of course, the text may also be used directly as the text conversion information of the target voice information; the above is only an example and the present application does not limit this.
Specifically, the preset correction rule may be: filtering out repeated characters, filtering out repeated words, replacing words that do not conform to language conventions with words that do, and so on. The present application does not limit the specific content of the preset correction rule.
For example, "I I like" is corrected to "I like";
"your kiddo is very cute", where "kiddo" stands in for a non-standard rendering of "child", is corrected to "your child is very cute".
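A minimal sketch of one possible preset correction rule, assuming it is limited to collapsing immediately repeated tokens (the disclosure leaves the concrete rules open):

```python
import re

def collapse_repeats(text: str) -> str:
    """Collapse immediately repeated words or characters, e.g. 'I I like' -> 'I like'.
    This is only one illustrative instance of the preset correction rule."""
    # Repeated whitespace-separated words (space-delimited languages).
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text)
    # Repeated single CJK characters (naive: this would also collapse legitimate
    # reduplications, so a real rule set would need an exception list).
    text = re.sub(r"([\u4e00-\u9fff])\1+", r"\1", text)
    return text

print(collapse_repeats("I I like hiking"))   # -> "I like hiking"
```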
The following examples distinguish different cases according to whether the executing entity is a server or a client.
Assume that a first client and a second client are communicating with each other.
Case one: the executing entity is a server.
The first client sends the target voice information to the server. After receiving the target voice information, the server sends it to the second client, and the second client displays it on its display interface.
In one case, the server starts performing speech recognition on the target voice information to obtain the text conversion information as soon as it receives the target voice information. When the server receives an information conversion instruction sent by the second client, it sends the text conversion information to the second client, and the second client displays the text conversion information based on the display position of the target voice information. Because the server starts speech recognition as soon as it receives the target voice information, it can respond to the second client quickly after receiving the information conversion instruction; the delay in sending the text conversion information to the second client is small and the user experience is good.
In another case, the server does not have to start speech recognition immediately after receiving the target voice information; instead, it performs speech recognition to obtain the text conversion information only after receiving the information conversion instruction sent by the second client. In this way the server performs speech recognition only when a client actually needs voice conversion, which effectively saves server resources.
Case two: the executing entity is a server.
The first client sends the target voice information directly to the second client, and at the same time the first client or the second client also sends the target voice information to the server. After receiving the target voice information, the second client displays it on its display interface.
In one case, the server starts performing speech recognition on the target voice information to obtain the text conversion information as soon as it receives the target voice information. When the server receives an information conversion instruction sent by the second client, it sends the text conversion information to the second client, and the second client displays the text conversion information based on the display position of the target voice information. Because the server starts speech recognition as soon as it receives the target voice information, it can respond to the second client quickly after receiving the information conversion instruction; the delay in sending the text conversion information is small and the user experience is good.
In another case, the server does not have to start speech recognition immediately after receiving the target voice information; instead, it performs speech recognition to obtain the text conversion information only after receiving the information conversion instruction sent by the second client. In this way the server performs speech recognition only when a client actually needs voice conversion, which effectively saves server resources.
Case three: the executing entity is a server.
The first client sends the target voice information to the second client, and the second client displays it on its display interface after receiving it.
When the second client needs voice information conversion, it sends an information conversion instruction carrying the target voice information to the server. After receiving the instruction, the server parses the target voice information out of the instruction, performs speech recognition to obtain the text conversion information, and sends the obtained text conversion information to the second client, which displays it.
Case four: the executing entity is a client.
The first client sends the target voice information to the second client, and the second client displays it on its display interface after receiving it. In this case the voice communication goes directly from the first client to the second client without passing through a server, so other devices cannot obtain the target voice information, which to some extent improves the security of the voice communication.
In one case, the second client starts performing speech recognition on the target voice information to obtain the text conversion information as soon as it receives the target voice information; after receiving an information conversion instruction, it directly displays the text conversion information based on the display position of the target voice information.
In another case, after receiving the target voice information, the second client only displays it on its interface and does not start speech recognition; only after receiving an information conversion instruction does it perform speech recognition, obtain the text conversion information of the target voice information, and then display the text conversion information based on the display position of the target voice information.
In cases one, two, and three above, the second client may send the information conversion instruction to the server after receiving a press-and-hold operation of the user on the target voice information, or after receiving a selection operation of the user on a voice-conversion option; the present application does not limit the condition that triggers the second client to send the information conversion instruction to the server.
In case four, the second client receiving a press-and-hold operation on the target voice information may be regarded as the second client having received an information conversion instruction, and the second client receiving a selection operation of the user on a voice-conversion option may also be regarded as the second client having received an information conversion instruction.
In addition, when the client displays the text conversion information based on the display position of the target voice information, the text conversion information may be displayed, in the page where the target voice information is shown, in a preset direction relative to the display position of the target voice information, for example below or above it. The distance between the text conversion information and the target voice information may be set according to the actual situation; the present application does not limit the specific display position of the text conversion information. For example, Fig. 10a and Fig. 10b are schematic diagrams in which the text conversion information is displayed, in the page showing the target voice information, below and above the display position of the target voice information, respectively.
As can be seen from the above, in the scheme provided by this embodiment, after target voice information is received and when the information conversion condition is met, speech recognition is performed on the target voice information to obtain text conversion information, so that a client can display the text conversion information based on the display position of the target voice information. Thus, with the scheme provided by this embodiment, voice information can be converted into text information.
In a specific implementation of the present application, referring to Fig. 2, a schematic flowchart of a second voice information conversion method is provided. Compared with the previous embodiment, in this embodiment, performing speech recognition on the target voice information to obtain the text conversion information when the information conversion condition is met, so that a client displays the text conversion information based on the display position of the target voice information (S102), includes:
S102A: when the information conversion condition is met, performing voice word segmentation on the target voice information to obtain the speech word segments contained in the target voice information.
Specifically, voice word segmentation may be performed on the target voice information according to amplitude information of the sound in the voice information, for example amplitude information indicating pauses; it may also be performed according to a fixed data length. The present application does not limit the specific manner of voice word segmentation.
From the above description, voice word segmentation of voice information can be simply understood as segmenting the voice information, each resulting segment corresponding to a small piece of voice data; that is, a speech word segment can be understood as a piece of voice information whose data length is smaller than that of the target voice information.
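A minimal sketch of the pause-based variant, assuming the audio is available as a list of per-frame amplitude values (the disclosure leaves the segmentation method open, and fixed-length splitting would also fit the text):

```python
def split_on_pauses(amplitudes, threshold=0.05, min_pause_frames=10):
    """Split per-frame amplitudes into (start, end) frame-index pairs, treating
    runs of low-amplitude frames as pauses between speech word segments."""
    segments, seg_start, silent_run = [], None, 0
    for i, amp in enumerate(amplitudes):
        if amp >= threshold:
            if seg_start is None:
                seg_start = i                      # a new segment begins
            silent_run = 0
        else:
            silent_run += 1
            if seg_start is not None and silent_run >= min_pause_frames:
                segments.append((seg_start, i - silent_run + 1))
                seg_start = None
    if seg_start is not None:                      # close a trailing segment
        segments.append((seg_start, len(amplitudes)))
    return segments
```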
S102B: for each speech word segment obtained, selecting a speech library from the speech library set according to a preset speech-library selection rule, and performing speech recognition on the speech word segment with the selected library to obtain the text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
The preset speech-library selection rule may be a rule specifying a selection order of speech libraries. When performing speech recognition, a library is first selected according to the selection order specified in the rule and used to recognize the speech word segment; if the recognition result shows a low recognition rate, the next library in the selection order is selected and the speech word segment is recognized again, until the recognition result shows a recognition rate that meets a preset requirement.
For example, assume the preset selection rule is: first select the user speech library, and select the standard speech library if the recognition result of the user speech library does not meet the preset requirement. Then, when recognizing a speech word segment, the user speech library corresponding to the current user is used first; if the recognition result shows a recognition rate higher than a preset threshold, the recognition result of the current user's speech library is taken as the recognition result of this speech word segment; if the recognition rate is not higher than the preset threshold, the standard speech library is used to recognize the speech word segment and its recognition result is taken as the recognition result of this segment.
Of course, the preset selection rule may also be: first select the standard speech library, and select the user speech library if the recognition result of the standard library does not meet the preset requirement.
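A sketch of such an ordered fallback, assuming a hypothetical `recognize(segment, library) -> (text, recognition_rate)` hook that is not defined in the disclosure:

```python
def recognize_with_fallback(segment, ordered_libraries, recognize, rate_threshold=0.8):
    """Try each speech library in the preset selection order and return the first
    result whose recognition rate meets the threshold; otherwise keep the best seen.
    `ordered_libraries` could be e.g. [user_library, standard_library]."""
    best_text, best_rate = "", 0.0
    for library in ordered_libraries:
        text, rate = recognize(segment, library)
        if rate >= rate_threshold:
            return text, rate
        if rate > best_rate:
            best_text, best_rate = text, rate
    return best_text, best_rate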
Specifically, the preset speech-library selection rule may also be a rule determined according to at least one of the following kinds of information (an illustrative combination is sketched after this list):
the category of the target group to which the first user who sent the target voice information belongs: for example, if the category of the group is law, the topics discussed in the group are more likely to be related to law, so a law-domain speech library is preferred; if the category of the group is IT, the topics discussed are more likely to be related to IT, so an IT speech library is preferred; and so on;
the language type of the name of the target group: for example, if the group name is in Chinese, the Mandarin Chinese library is preferred; if the group name is in English, the English library is preferred; and so on;
the user attribute information of the first user, which may include gender, age, and so on: for example, if the attribute information of the first user is gender: female, age: 5, then a female-voice library and a child-voice library are preferred;
the geographical location of the source client that sent the target voice information, which to some extent reflects the language used by the user: for example, if the source client is located in Beijing, the user is more likely to speak standard Mandarin, so the standard speech library can be preferred; if the source client is located in Britain, the user is more likely to speak English, so the English library can be preferred. The geographical location of the source client can be obtained from information such as the IP address of the source client, the mobile signal, or GPS information;
the speech library to which the first preset number of most frequently occurring speech word segments belong, among the speech word segments corresponding to the text information and/or voice information of the first user stored before the target voice information was received;
the speech library to which the second preset number of most frequently occurring speech word segments belong, among the speech word segments corresponding to the text information and/or voice information of the target group stored before the target voice information was received;
the speech-library selection order set by the user.
It should be noted that the above is only an example; in practical applications, the specific content of the preset speech-library selection rule is not limited to the above.
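Purely as an illustration of how a selection order could be assembled from the signals listed above (all names and the mapping below are hypothetical, not prescribed by the disclosure):

```python
def build_selection_order(group_category=None, group_name_language=None,
                          user_age=None, client_region=None):
    """Assemble an ordered list of speech-library names from context signals."""
    order = []
    if group_category:                       # e.g. "law" -> prefer the law-domain library
        order.append(f"{group_category}_library")
    if group_name_language == "en":          # English group name -> English library
        order.append("english_library")
    if user_age is not None and user_age <= 12:
        order.append("child_voice_library")
    if client_region == "Beijing":           # region hints at standard Mandarin
        order.append("mandarin_library")
    order.append("standard_library")         # always end with the standard library
    return order

print(build_selection_order(group_category="law", client_region="Beijing"))
```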
In one implementation of the present application, the preset speech-library selection rule determined in the above manner may also be combined with the user speech library for speech recognition. Specifically, a group usually contains multiple users, and the speech library set may contain the user speech library of each user. When performing speech recognition, the user speech library of the user who sent the voice information may be used first; if the recognition result cannot meet the requirement, the library determined according to the information of the group is selected for recognition; if the recognition result still cannot meet the requirement, the frequently-used speech library, the standard speech library, and so on may be selected in turn. For example, when recognizing voice information sent by user A in a law group, the user speech library of user A may be used first, and the law-domain speech library is selected if the recognition result cannot meet the requirement.
In addition, in this step, selecting a speech library from the speech library set and recognizing the speech word segment with the selected library is performed for a single speech word segment; the above process needs to be repeated for each speech word segment obtained.
As can be seen from the above, in the scheme provided by this embodiment, voice word segmentation is first performed on the target voice information, and a speech library is then selected for each speech word segment for recognition, which can improve the accuracy of speech recognition, especially when the target voice information contains voice information in different languages or voice information from different users.
It should be understood that even the accent of a single user may change: the user may sometimes communicate in Mandarin and sometimes in a dialect. In view of this, in a specific implementation of the present application, referring to Fig. 3, a schematic flowchart of a third voice information conversion method is provided. Compared with the previous embodiments, in this embodiment, performing speech recognition on the target voice information to obtain the text conversion information when the information conversion condition is met, so that a client displays the text conversion information based on the display position of the target voice information (S102), includes:
S102C: when the information conversion condition is met, obtaining the first target voice segment of the target voice information according to a preset voice-segment determination rule.
Specifically, the preset voice-segment determination rule may be: detecting the first pause in the target voice information, and determining the voice information from the start position of the target voice information to the detected pause position as the first target voice segment of the target voice information.
When detecting pauses in the target voice information, the amplitude information of the sound may be referred to, as well as other information; methods for detecting pauses in voice information belong to the prior art and are not described in detail here.
In addition, the preset voice-segment determination rule may also be: starting from the start position of the target voice information, selecting voice information of a preset length as the first target voice segment of the target voice information.
The preset length may be a fixed value set in advance, or a value determined according to the length of the target voice information.
S102D: performing speech recognition on the target voice segment with each speech library in the speech library set.
S102E: determining the speech library with the highest recognition rate as the target speech library.
S102F: performing speech recognition on the part of the target voice information other than the target voice segment with the target speech library, to obtain a first recognition result.
S102G: obtaining the text conversion information of the target voice information according to the first recognition result and a second recognition result, so that a client displays the text conversion information based on the display position of the target voice information.
The second recognition result is the result of performing speech recognition on the target voice segment with the target speech library.
Those skilled in the art will understand that determining the target voice segment may involve errors, and performing speech recognition on the target voice segment with each library in the speech library set may also involve errors, so the finally selected target speech library may not be the optimal one. In view of this, in an optional implementation of the present application, when obtaining the text conversion information of the target voice information according to the first and second recognition results, a first recognition rate for the target speech library may be obtained according to the first and second recognition results; it is then judged whether the first recognition rate is lower than a preset recognition-rate threshold, and if so, speech recognition is performed on the target voice information with a preset default speech library, and the text conversion information of the target voice information is obtained according to the recognition result for the preset default speech library.
Specifically, the preset default speech library may contain one speech library or multiple speech libraries, for example the standard speech library, the English library, the user speech library, and so on. The preset default speech library may also specify priority information for the libraries it contains: for example, the user speech library may have a higher priority than the standard library, and the standard library a higher priority than the English library. That is, the user speech library is selected first; if its recognition result is unsatisfactory, the standard library is used; if that result is also unsatisfactory, the English library is used; if this result meets the requirement, it is taken as the final recognition result, and otherwise the best recognition result among the target speech library, the standard library, and the English library is taken as the final recognition result. The above is only an example; in practical applications, users can set the priority of each library in the default speech library according to their own needs, which is not limited here, so each user can have a personalized library recognition order.
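The S102C-S102G flow, including the default-library fallback just described, can be sketched as follows, again assuming the hypothetical `recognize(segment, library) -> (text, rate)` hook introduced earlier:

```python
def convert_with_probe_segment(voice_segments, library_set, default_chain,
                               recognize, rate_threshold=0.8):
    """Pick the library scoring best on the first target voice segment, recognize
    the rest with it, and fall back to the preset default chain (e.g. user ->
    standard -> English library) when the overall rate stays below the threshold."""
    probe, rest = voice_segments[0], voice_segments[1:]
    # S102D/S102E: score every library on the probe segment, keep the best one.
    scored = [(lib,) + recognize(probe, lib) for lib in library_set]
    target_lib, probe_text, probe_rate = max(scored, key=lambda item: item[2])
    # S102F: recognize the remaining segments with the chosen target library.
    rest_results = [recognize(seg, target_lib) for seg in rest]
    rates = [probe_rate] + [rate for _, rate in rest_results]
    if sum(rates) / len(rates) < rate_threshold:          # fallback path
        for lib in default_chain:
            results = [recognize(seg, lib) for seg in voice_segments]
            if sum(r for _, r in results) / len(results) >= rate_threshold:
                return "".join(text for text, _ in results)
    return probe_text + "".join(text for text, _ in rest_results)
```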
As can be seen from the above, in the scheme provided by this embodiment, the speech library used for recognition is determined according to the first target voice segment of the target voice information rather than using a fixed library, which helps improve the accuracy of speech recognition.
The above voice information conversion method is further described below with a specific example.
Assume that a Chinese user from Sichuan and a Japanese user are having a voice conversation. The voice information sent by the Sichuan user is the Sichuan-accented utterance "the shoes are comfortable to wear, conform to physical engineering, very good" (in the Sichuan pronunciation, "shoes" sounds like the Mandarin word for "child"). After receiving this voice information, the Japanese user chooses voice conversion; the default language selected on the Japanese user's terminal is Japanese.
When "the shoes are comfortable to wear, conform to physical engineering" is recognized, the recognition rate is found to be high when the user speech library (Sichuan accent) is used, so this part of the speech is considered to be Chinese, and the Chinese recognition result "the shoes are comfortable to wear, conform to physical engineering" is obtained. Considering that the Japanese user may not understand Chinese, English and Japanese recognition results can also be provided: the Chinese result may first be converted into an English result and the English result then into a Japanese result, or the English and Japanese results may be obtained directly from the Chinese result.
When "very good" is recognized, the recognition rate is found to be high when the English library is used, so this part of the speech is considered to be English; the English recognition result "very good" is obtained directly, a Chinese recognition result is then obtained from the English result, and a Japanese recognition result is obtained from the English result as well.
Finally, the Japanese, English, and Chinese recognition results are provided to the Japanese user together.
In addition, when the voice information contains Japanese speech, the recognition rate will be high when the Japanese library is used, so it can be determined that this part of the speech is Japanese; the Japanese recognition result is obtained directly and provided to the Japanese user without conversion into other languages.
In a specific implementation of the present application, referring to Fig. 4, a schematic flowchart of a fourth voice information conversion method is provided. Compared with the previous embodiments, in this embodiment the voice information conversion method further includes:
S103: sending the text conversion information to the first client.
The first client is the client that sent the target voice information.
S104: receiving correction information for the text conversion information sent by the first client, and updating the text conversion information according to the correction information.
As described above, the executing entity of this embodiment may be a server or a client.
For a server, updating the text conversion information according to the correction information sent by the first client helps other clients that request information conversion obtain a correct conversion result. In addition, when the speech library set on the server contains a user speech library, the server may also update the user speech library according to the correction information, which helps improve the accuracy of subsequent speech recognition.
It is worth noting that, to improve the security of the users' communication information, the server may store the voice information, text information, and so on exchanged between the first client and the second client only for a period of time, and delete them after the set time is reached. As for the user speech library, it only stores the word-segment information determined from the user's voice information and does not store the user's complete voice information, so the security of the users' communication information can be well guaranteed.
For a client, obtaining the correction information from the first client allows the recognition result on the client to be corrected, so that the user sees an accurate recognition result, which helps improve the user experience. In addition, when the speech library set on the client also contains a user speech library, the client may also update the user speech library according to the correction information, which helps improve the accuracy of subsequent speech recognition.
In a specific implementation of the present application, when the electronic device is a server, the voice information conversion method may further include:
judging whether the text conversion information before the update has been sent to other clients; if so, sending a correction notification to the second client that received the text conversion information, and determining, according to the feedback of the second client on the correction notification, whether to send the updated text conversion information to the second client.
Specifically, if the second client feeds back that no update according to the correction information is needed, the server does not need to send the updated text conversion information to the second client; if the second client needs the update, the server can directly send the updated text conversion information to the second client, and the second client displays the updated text conversion information to the user.
As described above, speech recognition may be performed with a user speech library. Thus, when performing speech recognition directly on the target voice information, the first-user speech library corresponding to the first user may be used, where the first user is the user who sent the target voice information;
that is, when performing speech recognition on the target voice information, the first-user speech library is used to recognize the target voice information.
Since the user speech library corresponding to the first user is used for speech recognition and this library may contain errors, in order to ensure higher accuracy when the library is subsequently reused, the first-user speech library may also be further updated according to the correction information after the correction information is received.
As can be seen from the above, in the scheme provided by this embodiment, after the text conversion information is obtained, it is sent to the first client that sent the target voice information; the correction information sent by the first client is received and the text conversion information is updated accordingly, which makes the text conversion information of the target voice information more accurate.
In a specific implementation of the present application, referring to Fig. 5, a schematic flowchart of a fifth voice information conversion method is provided. Compared with the previous embodiments, in this embodiment, performing speech recognition on the target voice information to obtain the text conversion information when the information conversion condition is met, so that a client displays the text conversion information based on the display position of the target voice information, includes:
S102H: when the information conversion condition is met, obtaining the frequencies of the audio frames contained in the target voice information.
S102I: dividing the target voice information into at least one audio section according to the obtained frequencies.
It should be understood that the target voice information may contain voice information from multiple users, and the voice frequency of each user is usually different. When performing speech recognition on the target voice information, the target voice information may therefore first be divided into multiple audio sections according to the frequencies of the audio frames. Specifically, audio frames whose frequencies fall within one frequency range are divided into one audio section; one frequency range can be understood as corresponding to one user, and different frequency ranges correspond to different users.
More specifically, adjacent audio frames whose frequencies fall within one frequency range are divided into one audio section, and adjacent audio frames whose frequencies fall within another frequency range are divided into another audio section.
S102J: selecting, from the speech library set, a corresponding speech library for each frequency range based on the frequency ranges of the audio sections obtained by the division, and thereby determining the speech library corresponding to each audio section.
If a frequency range corresponds to multiple audio sections, a speech library may be selected from the speech library set for only one of these audio sections, and the selected library is then used as the library corresponding to these audio sections for speech recognition.
S102K: performing speech recognition on each audio section with the speech library corresponding to that audio section.
Specifically, for each audio section, if the recognition result obtained with its corresponding speech library cannot meet the requirement, other libraries may be further selected for recognition, for example the library determined from the group information, the frequently-used speech library, the standard speech library, and so on; the present application does not limit this.
S102L: obtaining the text conversion information of the target voice information according to the recognition results of the audio sections.
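A minimal sketch of the frequency-based division described in S102H-S102I, assuming a per-frame fundamental-frequency estimate is already available; the band width below is an arbitrary illustrative value:

```python
def split_by_frequency_band(frame_freqs, band_hz=60.0):
    """Group adjacent audio frames whose estimated frequency falls in the same
    band into one audio section, so each section roughly maps to one speaker.
    Returns a list of (start_frame, end_frame, band_id) tuples."""
    if not frame_freqs:
        return []
    sections, start = [], 0
    band = int(frame_freqs[0] // band_hz)
    for i, freq in enumerate(frame_freqs[1:], start=1):
        current_band = int(freq // band_hz)
        if current_band != band:              # frequency band (speaker) change
            sections.append((start, i, band))
            start, band = i, current_band
    sections.append((start, len(frame_freqs), band))
    return sections
```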
As can be seen from the above, in the scheme provided by this embodiment, the target voice information is divided into different audio sections according to frequency, and a speech library is then selected for each audio section for recognition, so the library can be determined according to the characteristics of the different audio sections and a better recognition result can be obtained.
In a specific implementation of the present application, referring to Fig. 6, a schematic flowchart of a sixth voice information conversion method is provided. Compared with the previous embodiments, in this embodiment the voice information conversion method further includes:
S105: receiving a meeting-minutes generation instruction.
S106: obtaining the text information used for generating the meeting minutes and the text conversion information corresponding to the voice information.
It should be noted that before the text conversion information corresponding to the voice information is obtained, speech recognition of that voice information may already have been completed, in which case the text conversion information obtained from that recognition can be used directly; if the speech recognition has not yet been performed, it can be performed first and the text conversion information is then obtained from the recognition result.
S107: generating the meeting minutes from the obtained text information and the obtained text conversion information according to a preset meeting-minutes format.
The preset meeting-minutes format may include information such as the meeting time, the meeting duration, the participants, the speakers, the meeting records, and the meeting keywords.
Specifically, the meeting time and the meeting duration may be determined from the earliest and latest sending times of the text information and voice information used for generating the meeting minutes.
The participants may be determined from information such as the users contained in the group.
The meeting keywords may be determined by extracting keywords from the obtained text information and text conversion information. Since many keywords may be extracted, some of them may be selected as the final meeting keywords according to rules such as ordering by number of occurrences; alternatively, the extracted keywords may be filtered according to a preset filtering rule, for example by filtering out function words, and the filtered keywords are determined as the final meeting keywords.
Specifically, generating the meeting minutes from the obtained text information and the obtained text conversion information according to the preset meeting-minutes format includes (see the sketch after this list):
determining the user names and IP addresses of the users who sent the obtained text information and the obtained text conversion information;
determining the speakers according to the determined user names and IP addresses and the frequencies of the audio frames in the voice information used for generating the meeting minutes;
generating the meeting minutes from the obtained text information, the obtained text conversion information, and the speakers according to the preset meeting-minutes format.
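A toy sketch of assembling minutes in the preset format from already-converted records; the record field names ("speaker", "timestamp", "text") and the stop-word list are assumptions, not part of the disclosure:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "is"}          # illustrative filter list only

def generate_minutes(records, keyword_count=5):
    """Build simple meeting minutes from {'speaker', 'timestamp', 'text'} records.
    Timestamps are assumed to be datetime objects so they can be subtracted."""
    times = [r["timestamp"] for r in records]
    words = [w for r in records for w in r["text"].split()
             if w.lower() not in STOP_WORDS]
    return {
        "meeting_time": min(times),
        "duration": max(times) - min(times),
        "participants": sorted({r["speaker"] for r in records}),
        "records": [f'{r["timestamp"]} {r["speaker"]}: {r["text"]}' for r in records],
        "keywords": [w for w, _ in Counter(words).most_common(keyword_count)],
    }
```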
The meeting minutes may be in plain-text form or in multimedia form, that is, they may contain voice information, pictures, videos, text, and other information.
Specifically, in one implementation of the present application, in addition to the meeting minutes, a voice backup and/or a text backup may also be generated together with the meeting minutes, so that staff can proofread them later.
In addition, correction information for the meeting minutes may be received, and the meeting minutes are updated according to the correction information.
As can be seen from the above, with the scheme provided by this embodiment, after the meeting ends, the meeting minutes are generated without manual editing by staff, which reduces the workload of the staff and improves work efficiency.
Fig. 7 is a schematic flowchart of an information generation method provided by an embodiment of the present application. The method includes:
S701: receiving target voice information sent by a source client.
S702: performing speech recognition on the target voice information according to the first-user speech library of the first user corresponding to the target voice information.
S703: sending the recognition result to the source client.
S704: receiving correction information for the recognition result sent by the source client.
S705: updating the first-user speech library according to the correction information.
Specifically, the initial library of the first-user speech library may be a default standard speech library.
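A toy round trip through S701-S705, with a dictionary standing in for the first-user speech library (real recognition would of course use acoustic models rather than string keys):

```python
class UserSpeechLibrary:
    """Per-user speech library that starts from an initial library and is refined
    with the corrections sent back by the source client (S701-S705, simplified)."""

    def __init__(self, initial_entries=None):
        self.entries = dict(initial_entries or {})      # acoustic key -> text

    def recognize(self, acoustic_key):
        # S702: look the key up in this user's library.
        return self.entries.get(acoustic_key, "<unknown>")

    def apply_correction(self, acoustic_key, corrected_text):
        # S704/S705: store the corrected mapping so later recognition improves.
        self.entries[acoustic_key] = corrected_text

library = UserSpeechLibrary({"hai2 zi3": "鞋子"})   # Sichuan-flavoured initial entry
result = library.recognize("hai2 zi3")              # -> '鞋子' (sent to the source client)
library.apply_correction("hai2 zi3", "孩子")         # source client sends a correction
assert library.recognize("hai2 zi3") == "孩子"
```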
As can be seen from the above, in the scheme provided by this embodiment, after the target voice information sent by the source client is received and speech recognition is performed, the recognition result is sent to the source client, which corrects it; the speech library is then updated according to the correction information sent by the source client. In this way a personalized speech library can be generated for the user, which helps obtain accurate recognition results when this user speech library is later used to recognize the user's voice information.
Corresponding to the above voice information conversion method, an embodiment of the present application provides a voice information conversion apparatus.
Fig. 8 is a schematic structural diagram of a voice information conversion apparatus provided by an embodiment of the present application. The apparatus is applied to an electronic device and includes:
an information receiving module 801, configured to receive target voice information;
a speech recognition module 802, configured to, in the case that an information conversion condition is met, perform speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
Specifically, the speech recognition module 802 can be configured to perform voice word segmentation on the target voice information to obtain the voice word segments contained in the target voice information, and, for each obtained voice word segment, to select a voice library from a voice library set according to a preset voice library selection rule and perform speech recognition on that voice word segment with the selected voice library to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
Specifically, the preset voice library selection rule can be a rule determined according to at least one of the following items of information:
the category of the target group to which the first user who sent the target voice information belongs;
the language type of the name of the target group;
the user attribute information of the first user;
the geographical location of the source client that sent the target voice information;
the voice libraries to which the first preset number of voice word segments that rank highest by number of occurrences belong, among the voice word segments corresponding to the text information and/or voice information of the first user stored before the target voice information was received;
the voice libraries to which the second preset number of voice word segments that rank highest by number of occurrences belong, among the voice word segments corresponding to the text information and/or voice information of the target group stored before the target voice information was received;
a voice library selection order set by the user. A sketch of such a selection rule is given below.
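The sketch below illustrates one way such a selection rule could combine a few of the listed signals; the priority order and the signal names are assumptions, since the application only requires that at least one of the items above be used.

```python
# Illustrative sketch of a preset voice-library selection rule combining a few
# of the signals listed above. The priority order and the context keys are
# assumptions made for illustration.
def select_voice_library(library_set, segment, context):
    """Pick a library for one voice word segment.

    `context` is assumed to carry: the user's explicit library order, the
    language type of the group name, and the library that covered the user's
    historically most frequent word segments."""
    # 1. Respect an explicit per-user selection order if one is configured.
    for name in context.get("user_library_order", []):
        if name in library_set:
            return library_set[name]
    # 2. Fall back to the library implied by the group name's language type.
    lang = context.get("group_name_language")
    if lang in library_set:
        return library_set[lang]
    # 3. Otherwise reuse the library of the user's frequent word segments.
    history = context.get("frequent_segment_library")
    if history in library_set:
        return library_set[history]
    return library_set["default"]  # assumed default library
```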
Specifically, the speech recognition module 802 can include:
a voice segment obtaining submodule, configured to obtain the first target voice segment of the target voice information according to a preset voice segment determination rule;
a first speech recognition submodule, configured to perform speech recognition on the target voice segment with each voice library in the voice library set in turn;
a voice library determination submodule, configured to determine the voice library with the highest recognition rate as the target voice library;
a second speech recognition submodule, configured to perform speech recognition with the target voice library on the part of the target voice information other than the target voice segment, to obtain a first recognition result;
a first information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the first recognition result and a second recognition result, where the second recognition result is the result of performing speech recognition on the target voice segment with the target voice library. A sketch of this two-stage procedure is given below.
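A minimal sketch of this two-stage procedure follows, assuming a hypothetical `recognize` helper that returns both the recognized text and a recognition rate, and a `split_first_segment` helper implementing the preset voice segment determination rule.

```python
# Minimal sketch of the two-stage recognition described above: probe every
# library on the first target voice segment, keep the best-scoring one, then
# recognize the remainder with that library. `recognize(segment, library)`
# is a hypothetical helper returning (text, recognition_rate).
def recognize_with_probed_library(voice_info, library_set, recognize, split_first_segment):
    first_segment, remainder = split_first_segment(voice_info)

    # Stage 1: try each library on the first segment and keep the best one.
    best_library, best_text, best_rate = None, "", -1.0
    for library in library_set:
        text, rate = recognize(first_segment, library)
        if rate > best_rate:
            best_library, best_text, best_rate = library, text, rate

    # Stage 2: recognize the rest of the voice information with that library.
    remainder_text, _ = recognize(remainder, best_library)

    # Second recognition result (segment) + first recognition result (remainder).
    return best_text + remainder_text, best_library, best_rate
```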
Specifically, the information obtaining submodule can include:
a recognition rate calculation unit, configured to obtain a first recognition rate for the target voice library according to the first recognition result and the second recognition result;
a recognition rate judging unit, configured to judge whether the first recognition rate is lower than a preset recognition rate threshold;
an information obtaining unit, configured to, in the case that the judgment result of the recognition rate judging unit is yes, perform speech recognition on the target voice information with a preset default voice library, and obtain the text conversion information of the target voice information according to the recognition result of the preset default voice library. A sketch of this fallback is given below.
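The sketch below illustrates the fallback step, assuming the combined text and the first recognition rate from the previous stage are already available; the threshold value is illustrative.

```python
# Sketch of the fallback step: if the recognition rate achieved with the probed
# target library falls below a preset threshold, redo recognition with a preset
# default library and use that result instead. The threshold is illustrative.
RECOGNITION_RATE_THRESHOLD = 0.6  # assumed preset threshold

def text_conversion_with_fallback(voice_info, combined_text, first_rate,
                                  default_library, recognize):
    """Return the text conversion, falling back to the default library when
    the first recognition rate is below the threshold."""
    if first_rate < RECOGNITION_RATE_THRESHOLD:
        fallback_text, _ = recognize(voice_info, default_library)
        return fallback_text
    return combined_text
```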
Specifically, the speech recognition module 802 can be configured to, after the target voice information is received, directly perform speech recognition on the target voice information to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information; or
it can be configured to monitor whether an information conversion instruction for the target voice information is received and, if so, to perform speech recognition on the target voice information to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
Specifically, the apparatus can also include:
a result sending module, configured to send the text conversion information to the first client, where the first client is the client that sent the target voice information;
a result updating module, configured to receive correction information for the text conversion information sent by the first client, and update the text conversion information according to the correction information.
Specifically, in the case that the electronic device is a server, the apparatus can also include:
a result judging module, configured to judge whether the text conversion information before the update has already been sent to other clients;
an information sending module, configured to, in the case that the judgment result of the result judging module is yes, send correction prompt information to a second client that has received the text conversion information, and determine, according to the second client's feedback on the correction prompt information, whether to send the updated text conversion information to the second client. A sketch of this correction-propagation step is given below.
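The following sketch illustrates this correction-propagation step; the messaging callbacks (`send_prompt`, `wants_update`, `send_update`) are placeholders, since the application specifies the decision logic rather than a messaging API.

```python
# Sketch of propagating a correction to clients that already received the old
# text conversion. Transport callbacks are placeholders for illustration.
def propagate_correction(old_text, new_text, recipients,
                         send_prompt, wants_update, send_update):
    """`recipients` are the clients that already received `old_text`.
    `send_prompt` notifies a client that a correction exists;
    `wants_update(client)` returns that client's feedback result;
    `send_update` delivers the updated text conversion information."""
    for client in recipients:
        send_prompt(client, old_text)      # correction prompt information
        if wants_update(client):           # feedback result from the client
            send_update(client, new_text)  # deliver the updated text conversion
```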
Specifically, the speech recognition module 802 can be configured to directly perform speech recognition on the target voice information with a first-user voice library corresponding to the first user to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information, where the first user is the user who sent the target voice information; or
the speech recognition module 802 can be configured to perform speech recognition on the target voice information with the first-user voice library to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information;
and the apparatus can also include:
a voice library updating module, configured to update the first-user voice library according to the correction information.
Specifically, the speech recognition module 802 can include:
a frame rate obtaining submodule, configured to, in the case that the information conversion condition is met, obtain the frequencies of the audio frames contained in the target voice information;
an audio section dividing submodule, configured to divide the target voice information into at least one audio section according to the obtained frequencies;
a voice library selection submodule, configured to select, based on the frequency range of each divided audio section, a corresponding voice library from the voice library set for each frequency range, and thereby determine the voice library corresponding to each audio section;
a third speech recognition submodule, configured to perform speech recognition on each audio section with the voice library corresponding to that audio section;
a second information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the recognition result of each audio section. A sketch of this frequency-based division is given below.
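The sketch below illustrates the frequency-based division, assuming the target voice information is available as a list of (frequency, audio frame) pairs; the frequency ranges and the range-to-library mapping are assumptions made for illustration.

```python
# Sketch of frequency-based division: split the audio frames wherever the frame
# frequency moves to a different range, pick a library per range, recognize each
# section separately, then concatenate the results. The ranges and the
# range-to-library mapping are illustrative assumptions.
FREQUENCY_RANGES = {          # assumed mapping: frequency range -> library name
    (0, 180): "male_voice_library",
    (180, 300): "female_voice_library",
}

def range_of(frequency):
    for (low, high) in FREQUENCY_RANGES:
        if low <= frequency < high:
            return (low, high)
    return None

def convert_by_audio_sections(frames, library_set, recognize):
    """`frames` is a list of (frequency, audio_frame) pairs;
    `recognize(section, library)` returns the recognized text."""
    sections, current, current_range = [], [], None
    for frequency, frame in frames:
        frame_range = range_of(frequency)
        if frame_range != current_range and current:
            sections.append((current_range, current))
            current = []
        current_range = frame_range
        current.append(frame)
    if current:
        sections.append((current_range, current))

    # Recognize each section with the library chosen for its frequency range.
    texts = []
    for frame_range, section in sections:
        library = library_set[FREQUENCY_RANGES.get(frame_range, "default")]
        texts.append(recognize(section, library))
    return "".join(texts)
```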
Specifically, the voice information conversion apparatus can also include:
an instruction receiving module, configured to receive a meeting summary generation instruction;
an information obtaining module, configured to obtain the text information used for generating the meeting summary and the text conversion information corresponding to the voice information;
a summary generating module, configured to generate the meeting summary from the obtained text information and the obtained text conversion information according to a preset meeting summary format.
Specifically, the summary generating module can include:
an information determination submodule, configured to determine the user names and IP addresses of the users who sent the obtained text information and the obtained text conversion information;
a speaker determination submodule, configured to determine the speakers according to the determined user names, the IP addresses, and the frequency of each audio frame in the voice information used for generating the meeting summary;
a summary generation submodule, configured to generate the meeting summary from the obtained text information, the obtained text conversion information and the speakers, according to the preset meeting summary format.
As can be seen from the above, in the solutions provided by the embodiments above, after the target voice information is received and the information conversion condition is met, speech recognition is performed on the target voice information to obtain the text conversion information, so that the client can display the text conversion information based on the display position of the target voice information. It can thus be seen that the solutions provided by the embodiments of the present application can convert voice information into text information.
Corresponding to the above information generation method, an embodiment of the present application further provides an information generation apparatus.
Fig. 9 is a schematic structural diagram of an information generation apparatus provided by an embodiment of the present application. The apparatus includes:
an information receiving module 901, configured to receive target voice information sent by a source client;
a speech recognition module 902, configured to perform speech recognition on the target voice information according to a first-user voice library of the first user corresponding to the target voice information;
a result sending module 903, configured to send the recognition result to the source client;
a correction information receiving module 904, configured to receive correction information for the recognition result sent by the source client;
a voice library updating module 905, configured to update the first-user voice library according to the correction information.
Specifically, the initial voice library of the first-user voice library is a preset standard voice library.
As can be seen from the above, in the solution provided by this embodiment, the target voice information sent by the source client is received and speech recognition is performed on it, the recognition result is sent back to the source client, the source client corrects the recognition result, and the voice library is then updated according to the correction information sent by the source client. In this way a personalized voice library can be built for the user, which helps to obtain accurate recognition results when that user voice library is later used to perform speech recognition on the user's voice information.
Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the description of the method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes that element.
Those of ordinary skill in the art will understand that all or part of the steps in the above method embodiments can be implemented by hardware instructed by a program, and the program can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present application and are not intended to limit the protection scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (29)

1. A voice information conversion method, applied to an electronic device, characterized in that the method includes:
receiving target voice information;
in the case that an information conversion condition is met, performing speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
2. The method according to claim 1, characterized in that performing speech recognition on the target voice information to obtain the text conversion information includes:
performing voice word segmentation on the target voice information to obtain the voice word segments contained in the target voice information;
for each obtained voice word segment, selecting a voice library from a voice library set according to a preset voice library selection rule, and performing speech recognition on that voice word segment with the selected voice library to obtain the text conversion information.
3. The method according to claim 2, characterized in that the preset voice library selection rule is a rule determined according to at least one of the following items of information:
the category of the target group to which the first user who sent the target voice information belongs;
the language type of the name of the target group;
the user attribute information of the first user;
the geographical location of the source client that sent the target voice information;
the voice libraries to which the first preset number of voice word segments that rank highest by number of occurrences belong, among the voice word segments corresponding to the text information and/or voice information of the first user stored before the target voice information was received;
the voice libraries to which the second preset number of voice word segments that rank highest by number of occurrences belong, among the voice word segments corresponding to the text information and/or voice information of the target group stored before the target voice information was received;
a voice library selection order set by the user.
4. The method according to claim 1, characterized in that performing speech recognition on the target voice information to obtain the text conversion information includes:
obtaining the first target voice segment of the target voice information according to a preset voice segment determination rule;
performing speech recognition on the target voice segment with each voice library in the voice library set in turn;
determining the voice library with the highest recognition rate as the target voice library;
performing speech recognition with the target voice library on the part of the target voice information other than the target voice segment, to obtain a first recognition result;
obtaining the text conversion information of the target voice information according to the first recognition result and a second recognition result, where the second recognition result is the result of performing speech recognition on the target voice segment with the target voice library.
5. The method according to claim 4, characterized in that obtaining the text conversion information of the target voice information according to the first recognition result and the second recognition result includes:
obtaining a first recognition rate for the target voice library according to the first recognition result and the second recognition result;
judging whether the first recognition rate is lower than a preset recognition rate threshold;
if so, performing speech recognition on the target voice information with a preset default voice library, and obtaining the text conversion information of the target voice information according to the recognition result of the preset default voice library.
6. The method according to claim 1, characterized in that, in the case that the information conversion condition is met, performing speech recognition on the target voice information to obtain the text conversion information includes:
after the target voice information is received, directly performing speech recognition on the target voice information to obtain the text conversion information; or
monitoring whether an information conversion instruction for the target voice information is received, and if so, performing speech recognition on the target voice information to obtain the text conversion information.
7. The method according to claim 6, characterized in that the method further includes:
sending the text conversion information to the first client, where the first client is the client that sent the target voice information;
receiving correction information for the text conversion information sent by the first client, and updating the text conversion information according to the correction information.
8. The method according to claim 7, characterized in that, in the case that the electronic device is a server, the method further includes:
judging whether the text conversion information before the update has already been sent to other clients;
if so, sending correction prompt information to a second client that has received the text conversion information, and determining, according to the second client's feedback on the correction prompt information, whether to send the updated text conversion information to the second client.
9. The method according to claim 7, characterized in that
directly performing speech recognition on the target voice information to obtain the text conversion information includes:
directly performing speech recognition on the target voice information with a first-user voice library corresponding to the first user to obtain the text conversion information, where the first user is the user who sent the target voice information;
performing speech recognition on the target voice information to obtain the text conversion information includes:
performing speech recognition on the target voice information with the first-user voice library to obtain the text conversion information;
and the method further includes:
updating the first-user voice library according to the correction information.
10. The method according to claim 1, characterized in that performing speech recognition on the target voice information to obtain the text conversion information includes:
obtaining the frequencies of the audio frames contained in the target voice information;
dividing the target voice information into at least one audio section according to the obtained frequencies;
selecting, based on the frequency range of each divided audio section, a corresponding voice library from the voice library set for each frequency range, and thereby determining the voice library corresponding to each audio section;
performing speech recognition on each audio section with the voice library corresponding to that audio section;
obtaining the text conversion information of the target voice information according to the recognition result of each audio section.
11. The method according to any one of claims 1-10, characterized in that the method further includes:
receiving a meeting summary generation instruction;
obtaining the text information used for generating the meeting summary and the text conversion information corresponding to the voice information;
generating the meeting summary from the obtained text information and the obtained text conversion information according to a preset meeting summary format.
12. The method according to claim 11, characterized in that generating the meeting summary from the obtained text information and the obtained text conversion information according to the preset meeting summary format includes:
determining the user names and IP addresses of the users who sent the obtained text information and the obtained text conversion information;
determining the speakers according to the determined user names, the IP addresses, and the frequency of each audio frame in the voice information used for generating the meeting summary;
generating the meeting summary from the obtained text information, the obtained text conversion information and the speakers, according to the preset meeting summary format.
13. The method according to any one of claims 1-10, characterized in that the client displaying the text conversion information based on the display position of the target voice information includes:
the client displaying the text conversion information, on the page where the target voice information is displayed, in a preset direction relative to the display position of the target voice information.
14. An information generation method, characterized in that the method includes:
receiving target voice information sent by a source client;
performing speech recognition on the target voice information according to a first-user voice library of the first user corresponding to the target voice information;
sending the recognition result to the source client;
receiving correction information for the recognition result sent by the source client;
updating the first-user voice library according to the correction information.
15. The method according to claim 14, characterized in that the initial voice library of the first-user voice library is a preset standard voice library.
16. A voice information conversion apparatus, applied to an electronic device, characterized in that the apparatus includes:
an information receiving module, configured to receive target voice information;
a speech recognition module, configured to, in the case that an information conversion condition is met, perform speech recognition on the target voice information to obtain text conversion information, so that a client displays the text conversion information based on the display position of the target voice information.
17. The apparatus according to claim 16, characterized in that
the speech recognition module is configured to perform voice word segmentation on the target voice information to obtain the voice word segments contained in the target voice information, and, for each obtained voice word segment, to select a voice library from a voice library set according to a preset voice library selection rule and perform speech recognition on that voice word segment with the selected voice library to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
18. The apparatus according to claim 17, characterized in that the preset voice library selection rule is a rule determined according to at least one of the following items of information:
the category of the target group to which the first user who sent the target voice information belongs;
the language type of the name of the target group;
the user attribute information of the first user;
the geographical location of the source client that sent the target voice information;
the voice libraries to which the first preset number of voice word segments that rank highest by number of occurrences belong, among the voice word segments corresponding to the text information and/or voice information of the first user stored before the target voice information was received;
the voice libraries to which the second preset number of voice word segments that rank highest by number of occurrences belong, among the voice word segments corresponding to the text information and/or voice information of the target group stored before the target voice information was received;
a voice library selection order set by the user.
19. The apparatus according to claim 16, characterized in that the speech recognition module includes:
a voice segment obtaining submodule, configured to obtain the first target voice segment of the target voice information according to a preset voice segment determination rule;
a first speech recognition submodule, configured to perform speech recognition on the target voice segment with each voice library in the voice library set in turn;
a voice library determination submodule, configured to determine the voice library with the highest recognition rate as the target voice library;
a second speech recognition submodule, configured to perform speech recognition with the target voice library on the part of the target voice information other than the target voice segment, to obtain a first recognition result;
a first information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the first recognition result and a second recognition result, where the second recognition result is the result of performing speech recognition on the target voice segment with the target voice library.
20. The apparatus according to claim 19, characterized in that the information obtaining submodule includes:
a recognition rate calculation unit, configured to obtain a first recognition rate for the target voice library according to the first recognition result and the second recognition result;
a recognition rate judging unit, configured to judge whether the first recognition rate is lower than a preset recognition rate threshold;
an information obtaining unit, configured to, in the case that the judgment result of the recognition rate judging unit is yes, perform speech recognition on the target voice information with a preset default voice library, and obtain the text conversion information of the target voice information according to the recognition result of the preset default voice library.
21. The apparatus according to claim 16, characterized in that
the speech recognition module is configured to, after the target voice information is received, directly perform speech recognition on the target voice information to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information;
or
it is configured to monitor whether an information conversion instruction for the target voice information is received, and if so, to perform speech recognition on the target voice information to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information.
22. The apparatus according to claim 21, characterized in that the apparatus further includes:
a result sending module, configured to send the text conversion information to the first client, where the first client is the client that sent the target voice information;
a result updating module, configured to receive correction information for the text conversion information sent by the first client, and update the text conversion information according to the correction information.
23. The apparatus according to claim 22, characterized in that, in the case that the electronic device is a server, the apparatus further includes:
a result judging module, configured to judge whether the text conversion information before the update has already been sent to other clients;
an information sending module, configured to, in the case that the judgment result of the result judging module is yes, send correction prompt information to a second client that has received the text conversion information, and determine, according to the second client's feedback on the correction prompt information, whether to send the updated text conversion information to the second client.
24. The apparatus according to claim 22, characterized in that
the speech recognition module is configured to directly perform speech recognition on the target voice information with a first-user voice library corresponding to the first user to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information, where the first user is the user who sent the target voice information; or
the speech recognition module is configured to perform speech recognition on the target voice information with the first-user voice library to obtain the text conversion information, so that the client displays the text conversion information based on the display position of the target voice information;
and the apparatus further includes:
a voice library updating module, configured to update the first-user voice library according to the correction information.
25. The apparatus according to claim 16, characterized in that the speech recognition module includes:
a frame rate obtaining submodule, configured to, in the case that the information conversion condition is met, obtain the frequencies of the audio frames contained in the target voice information;
an audio section dividing submodule, configured to divide the target voice information into at least one audio section according to the obtained frequencies;
a voice library selection submodule, configured to select, based on the frequency range of each divided audio section, a corresponding voice library from the voice library set for each frequency range, and thereby determine the voice library corresponding to each audio section;
a third speech recognition submodule, configured to perform speech recognition on each audio section with the voice library corresponding to that audio section;
a second information obtaining submodule, configured to obtain the text conversion information of the target voice information according to the recognition result of each audio section.
26. The apparatus according to any one of claims 16-25, characterized in that the apparatus further includes:
an instruction receiving module, configured to receive a meeting summary generation instruction;
an information obtaining module, configured to obtain the text information used for generating the meeting summary and the text conversion information corresponding to the voice information;
a summary generating module, configured to generate the meeting summary from the obtained text information and the obtained text conversion information according to a preset meeting summary format.
27. The apparatus according to claim 26, characterized in that the summary generating module includes:
an information determination submodule, configured to determine the user names and IP addresses of the users who sent the obtained text information and the obtained text conversion information;
a speaker determination submodule, configured to determine the speakers according to the determined user names, the IP addresses, and the frequency of each audio frame in the voice information used for generating the meeting summary;
a summary generation submodule, configured to generate the meeting summary from the obtained text information, the obtained text conversion information and the speakers, according to the preset meeting summary format.
28. An information generation apparatus, characterized in that the apparatus includes:
an information receiving module, configured to receive target voice information sent by a source client;
a speech recognition module, configured to perform speech recognition on the target voice information according to a first-user voice library of the first user corresponding to the target voice information;
a result sending module, configured to send the recognition result to the source client;
a correction information receiving module, configured to receive correction information for the recognition result sent by the source client;
a voice library updating module, configured to update the first-user voice library according to the correction information.
29. The apparatus according to claim 28, characterized in that the initial voice library of the first-user voice library is a preset standard voice library.
CN201610801720.7A 2016-09-05 2016-09-05 Voice information conversion and information generation method and device Active CN106384593B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610801720.7A CN106384593B (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device
CN201910311351.7A CN110060687A (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610801720.7A CN106384593B (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910311351.7A Division CN110060687A (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device

Publications (2)

Publication Number Publication Date
CN106384593A true CN106384593A (en) 2017-02-08
CN106384593B CN106384593B (en) 2019-11-01

Family

ID=57938973

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610801720.7A Active CN106384593B (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device
CN201910311351.7A Pending CN110060687A (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201910311351.7A Pending CN110060687A (en) 2016-09-05 2016-09-05 Voice information conversion and information generation method and device

Country Status (1)

Country Link
CN (2) CN106384593B (en)

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106991961A (en) * 2017-06-08 2017-07-28 无锡职业技术学院 A kind of artificial intelligence LED dot matrix display screens control device and its control method
CN107147564A (en) * 2017-05-09 2017-09-08 胡巨鹏 Real-time speech recognition error correction system and identification error correction method based on cloud server
CN107316637A (en) * 2017-05-31 2017-11-03 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN107342088A (en) * 2017-06-19 2017-11-10 联想(北京)有限公司 A kind of conversion method of acoustic information, device and equipment
CN107436748A (en) * 2017-07-13 2017-12-05 普联技术有限公司 Handle method, apparatus, terminal device and the computer-readable recording medium of third-party application message
CN107465834A (en) * 2017-09-07 2017-12-12 深圳支点电子智能科技有限公司 Mobile terminal and Related product with adaptive identifying ability
CN107704447A (en) * 2017-08-23 2018-02-16 海信集团有限公司 A kind of Chinese word cutting method, Chinese word segmentation device and terminal
CN107731229A (en) * 2017-09-29 2018-02-23 百度在线网络技术(北京)有限公司 Method and apparatus for identifying voice
CN107995101A (en) * 2017-11-30 2018-05-04 上海掌门科技有限公司 A kind of method and apparatus for being used to switching to speech message into text message
CN108428446A (en) * 2018-03-06 2018-08-21 北京百度网讯科技有限公司 Audio recognition method and device
CN108647267A (en) * 2018-04-28 2018-10-12 广东金贝贝智能机器人研究院有限公司 One kind being based on internet big data robot Internet of things system
CN108682420A (en) * 2018-05-14 2018-10-19 平安科技(深圳)有限公司 A kind of voice and video telephone accent recognition method and terminal device
CN108831473A (en) * 2018-03-30 2018-11-16 联想(北京)有限公司 A kind of audio-frequency processing method and device
CN109039509A (en) * 2018-07-16 2018-12-18 广州辉群智能科技有限公司 A kind of method and broadcasting equipment of voice control broadcasting equipment
CN109036406A (en) * 2018-08-01 2018-12-18 深圳创维-Rgb电子有限公司 A kind of processing method of voice messaging, device, equipment and storage medium
CN109036410A (en) * 2018-08-30 2018-12-18 Oppo广东移动通信有限公司 Audio recognition method, device, storage medium and terminal
CN109036424A (en) * 2018-08-30 2018-12-18 出门问问信息科技有限公司 Audio recognition method, device, electronic equipment and computer readable storage medium
CN109587429A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 Audio-frequency processing method and device
CN109600299A (en) * 2018-11-19 2019-04-09 维沃移动通信有限公司 A kind of message method and terminal
CN110392158A (en) * 2018-04-19 2019-10-29 成都野望数码科技有限公司 A kind of message treatment method, device and terminal device
WO2020024525A1 (en) * 2018-07-31 2020-02-06 珠海格力电器股份有限公司 Electrical appliance control method and apparatus, storage medium and electrical appliance
CN111277589A (en) * 2020-01-19 2020-06-12 腾讯云计算(北京)有限责任公司 Conference document generation method and device
CN111816183A (en) * 2020-07-15 2020-10-23 前海人寿保险股份有限公司 Voice recognition method, device and equipment based on audio and video recording and storage medium
CN112417095A (en) * 2020-11-17 2021-02-26 维沃软件技术有限公司 Voice message processing method and device
CN112672213A (en) * 2020-12-18 2021-04-16 努比亚技术有限公司 Video information processing method and device and computer readable storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111756930A (en) * 2020-06-28 2020-10-09 维沃移动通信有限公司 Communication control method, communication control device, electronic apparatus, and readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555320A (en) * 1992-11-27 1996-09-10 Kabushiki Kaisha Toshiba Pattern recognition system with improved recognition rate using nonlinear transformation
CN1737902A (en) * 2005-09-12 2006-02-22 周运南 Text-to-speech interchanging device
CN101377726A (en) * 2007-08-31 2009-03-04 西门子(中国)有限公司 Input method combining speech recognition with stroke recognition and terminal thereof
CN101453499A (en) * 2008-11-07 2009-06-10 康佳集团股份有限公司 Mobile phone syllable conversion device and method thereof
CN101923858A (en) * 2009-06-17 2010-12-22 劳英杰 Real-time and synchronous mutual translation voice terminal
CN202026434U (en) * 2011-04-29 2011-11-02 广东九联科技股份有限公司 Voice conversion STB (set top box)
CN102436812A (en) * 2011-11-01 2012-05-02 展讯通信(上海)有限公司 Conference recording device and conference recording method using same
CN103903621A (en) * 2012-12-26 2014-07-02 联想(北京)有限公司 Method for voice recognition and electronic equipment

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101923854B (en) * 2010-08-31 2012-03-28 中国科学院计算技术研究所 Interactive speech recognition system and method
CN102572372B (en) * 2011-12-28 2018-10-16 中兴通讯股份有限公司 The extracting method and device of meeting summary
CN102750949B (en) * 2012-07-16 2015-04-01 深圳市车音网科技有限公司 Voice recognition method and device
CN103680498A (en) * 2012-09-26 2014-03-26 华为技术有限公司 Speech recognition method and speech recognition equipment
CN103903611B (en) * 2012-12-24 2018-07-03 联想(北京)有限公司 A kind of recognition methods of voice messaging and equipment
CN103106900B (en) * 2013-02-28 2016-05-04 用友网络科技股份有限公司 Speech recognition equipment and audio recognition method
CN103489444A (en) * 2013-09-30 2014-01-01 乐视致新电子科技(天津)有限公司 Speech recognition method and device
CN104916283A (en) * 2015-06-11 2015-09-16 百度在线网络技术(北京)有限公司 Voice recognition method and device
CN105096940B (en) * 2015-06-30 2019-03-08 百度在线网络技术(北京)有限公司 Method and apparatus for carrying out speech recognition
CN105244026B (en) * 2015-08-24 2019-09-20 北京意匠文枢科技有限公司 A kind of method of speech processing and device
CN105913845A (en) * 2016-04-26 2016-08-31 惠州Tcl移动通信有限公司 Mobile terminal voice recognition and subtitle generation method and system and mobile terminal

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5555320A (en) * 1992-11-27 1996-09-10 Kabushiki Kaisha Toshiba Pattern recognition system with improved recognition rate using nonlinear transformation
CN1737902A (en) * 2005-09-12 2006-02-22 周运南 Text-to-speech interchanging device
CN101377726A (en) * 2007-08-31 2009-03-04 西门子(中国)有限公司 Input method combining speech recognition with stroke recognition and terminal thereof
CN101453499A (en) * 2008-11-07 2009-06-10 康佳集团股份有限公司 Mobile phone syllable conversion device and method thereof
CN101923858A (en) * 2009-06-17 2010-12-22 劳英杰 Real-time and synchronous mutual translation voice terminal
CN202026434U (en) * 2011-04-29 2011-11-02 广东九联科技股份有限公司 Voice conversion STB (set top box)
CN102436812A (en) * 2011-11-01 2012-05-02 展讯通信(上海)有限公司 Conference recording device and conference recording method using same
CN103903621A (en) * 2012-12-26 2014-07-02 联想(北京)有限公司 Method for voice recognition and electronic equipment

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107147564A (en) * 2017-05-09 2017-09-08 胡巨鹏 Real-time speech recognition error correction system and identification error correction method based on cloud server
CN107316637A (en) * 2017-05-31 2017-11-03 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN106991961A (en) * 2017-06-08 2017-07-28 无锡职业技术学院 A kind of artificial intelligence LED dot matrix display screens control device and its control method
CN107342088A (en) * 2017-06-19 2017-11-10 联想(北京)有限公司 A kind of conversion method of acoustic information, device and equipment
CN107436748A (en) * 2017-07-13 2017-12-05 普联技术有限公司 Handle method, apparatus, terminal device and the computer-readable recording medium of third-party application message
CN107436748B (en) * 2017-07-13 2020-06-30 普联技术有限公司 Method and device for processing third-party application message, terminal equipment and readable medium
CN107704447A (en) * 2017-08-23 2018-02-16 海信集团有限公司 A kind of Chinese word cutting method, Chinese word segmentation device and terminal
CN107465834A (en) * 2017-09-07 2017-12-12 深圳支点电子智能科技有限公司 Mobile terminal and Related product with adaptive identifying ability
CN107731229A (en) * 2017-09-29 2018-02-23 百度在线网络技术(北京)有限公司 Method and apparatus for identifying voice
US11011163B2 (en) 2017-09-29 2021-05-18 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for recognizing voice
CN109587429A (en) * 2017-09-29 2019-04-05 北京国双科技有限公司 Audio-frequency processing method and device
CN107995101A (en) * 2017-11-30 2018-05-04 上海掌门科技有限公司 A kind of method and apparatus for being used to switching to speech message into text message
CN108428446A (en) * 2018-03-06 2018-08-21 北京百度网讯科技有限公司 Audio recognition method and device
US10978047B2 (en) 2018-03-06 2021-04-13 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing speech
CN108428446B (en) * 2018-03-06 2020-12-25 北京百度网讯科技有限公司 Speech recognition method and device
CN108831473A (en) * 2018-03-30 2018-11-16 联想(北京)有限公司 A kind of audio-frequency processing method and device
CN108831473B (en) * 2018-03-30 2021-08-17 联想(北京)有限公司 Audio processing method and device
CN110392158A (en) * 2018-04-19 2019-10-29 成都野望数码科技有限公司 A kind of message treatment method, device and terminal device
CN108647267A (en) * 2018-04-28 2018-10-12 广东金贝贝智能机器人研究院有限公司 One kind being based on internet big data robot Internet of things system
CN108682420B (en) * 2018-05-14 2023-07-07 平安科技(深圳)有限公司 Audio and video call dialect recognition method and terminal equipment
WO2019218467A1 (en) * 2018-05-14 2019-11-21 平安科技(深圳)有限公司 Method and apparatus for dialect recognition in voice and video calls, terminal device, and medium
CN108682420A (en) * 2018-05-14 2018-10-19 平安科技(深圳)有限公司 A kind of voice and video telephone accent recognition method and terminal device
CN109039509A (en) * 2018-07-16 2018-12-18 广州辉群智能科技有限公司 A kind of method and broadcasting equipment of voice control broadcasting equipment
WO2020024525A1 (en) * 2018-07-31 2020-02-06 珠海格力电器股份有限公司 Electrical appliance control method and apparatus, storage medium and electrical appliance
CN109036406A (en) * 2018-08-01 2018-12-18 深圳创维-Rgb电子有限公司 A kind of processing method of voice messaging, device, equipment and storage medium
CN109036424A (en) * 2018-08-30 2018-12-18 出门问问信息科技有限公司 Audio recognition method, device, electronic equipment and computer readable storage medium
CN109036410A (en) * 2018-08-30 2018-12-18 Oppo广东移动通信有限公司 Audio recognition method, device, storage medium and terminal
CN109600299B (en) * 2018-11-19 2021-06-25 维沃移动通信有限公司 Message sending method and terminal
CN109600299A (en) * 2018-11-19 2019-04-09 维沃移动通信有限公司 A kind of message method and terminal
CN111277589A (en) * 2020-01-19 2020-06-12 腾讯云计算(北京)有限责任公司 Conference document generation method and device
CN111816183A (en) * 2020-07-15 2020-10-23 前海人寿保险股份有限公司 Voice recognition method, device and equipment based on audio and video recording and storage medium
CN112417095A (en) * 2020-11-17 2021-02-26 维沃软件技术有限公司 Voice message processing method and device
CN112672213A (en) * 2020-12-18 2021-04-16 努比亚技术有限公司 Video information processing method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN110060687A (en) 2019-07-26
CN106384593B (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN106384593A (en) Voice information conversion and information generation method and device
CN106331893B (en) Real-time caption presentation method and system
CN106933789B (en) Travel attack generation method and generation system
CN105895103B (en) Voice recognition method and device
CN106534548B (en) Voice error correction method and device
US8645121B2 (en) Language translation of visual and audio input
JP2016529603A (en) Online speech translation method and apparatus
CN108090400A (en) A kind of method and apparatus of image text identification
CN112399269B (en) Video segmentation method, device, equipment and storage medium
KR20220054587A (en) Speech Recognition Methods and Related Products
CN107102990A (en) The method and apparatus translated to voice
US20220199094A1 (en) Joint automatic speech recognition and speaker diarization
CN103167360A (en) Method for achieving multilingual subtitle translation
CN110517668B (en) Chinese and English mixed speech recognition system and method
CN103956167A (en) Visual sign language interpretation method and device based on Web
KR20200059993A (en) Apparatus and method for generating conti for webtoon
US20190213998A1 (en) Method and device for processing data visualization information
CN112541095B (en) Video title generation method and device, electronic equipment and storage medium
CN115223428A (en) Converting sign language
JP7107229B2 (en) Information processing device, information processing method, and program
CN112818680A (en) Corpus processing method and device, electronic equipment and computer-readable storage medium
KR20220130739A (en) speech recognition
KR101130276B1 (en) System and method for interpreting sign language
KR20190074508A (en) Method for crowdsourcing data of chat model for chatbot
CN113301382B (en) Video processing method, device, medium, and program product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant