CN105161104A - Voice processing method and device - Google Patents


Info

Publication number
CN105161104A
CN105161104A (application CN201510458935.9A)
Authority
CN
China
Prior art keywords
voice
word
language database
predetermined keyword
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510458935.9A
Other languages
Chinese (zh)
Inventor
欧光欣
任禾
陈大林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunzhisheng Information Technology Co Ltd
Original Assignee
Beijing Yunzhisheng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunzhisheng Information Technology Co Ltd filed Critical Beijing Yunzhisheng Information Technology Co Ltd
Priority to CN201510458935.9A priority Critical patent/CN105161104A/en
Publication of CN105161104A publication Critical patent/CN105161104A/en
Pending legal-status Critical Current


Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a voice processing method and device. The method includes: recording voice; determining a language database for converting the voice; converting the voice into text through the language database; and filling the text into a form matched with the voice. The technical scheme converts voice into text accurately, avoids the situation where voice content cannot be converted accurately because it contains technical terms, and fills the text content into the form automatically, so the user does not need to fill it in manually, which is highly convenient for the user.

Description

Speech processing method and device
Technical field
The present invention relates to the technical field of information processing, and in particular to a speech processing method and device.
Background art
At present, with the development of electronic technology, voice input has become increasingly popular. Voice input is an input mode in which the content of a person's speech is converted into text by speech recognition. However, voice input in many applications is still limited to voice-in/voice-out interaction and cannot convert speech into text well. In practice, the environments faced by speech recognition are very complex, and spoken content from many different fields must be processed, so recognition accuracy is difficult to bring to one hundred percent. This is especially true for special fields such as the medical field, the financial field, and the communications field, where technical terms occur with high probability in the voice input; when an electronic device converts the spoken content into text, it is difficult to convert such technical terms accurately.
Summary of the invention
Embodiments of the present invention provide a speech processing method and device. The technical scheme is as follows:
A speech processing method comprises the following steps:
recording voice;
determining a language database for converting the voice;
converting the voice into text through the language database;
filling the text into a form matched with the voice.
Some beneficial effects of the embodiments of the present invention include:
In the above technical scheme, the voice is converted into text through the language database corresponding to the recorded voice, and the text is filled into the form matched with the voice. This converts voice into text accurately, avoids the situation where voice content cannot be converted accurately because it contains technical terms, and fills the text content into the form automatically, so the user does not need to fill it in manually, which is highly convenient for the user.
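The four claimed steps above can be illustrated with a minimal sketch. All function names, the dictionary-based "language database", and the text stand-in for recorded audio below are illustrative assumptions; the patent does not prescribe any implementation:

```python
# Illustrative sketch of the claimed pipeline: record -> select database
# -> convert to text -> fill form. All names and data structures here are
# hypothetical simplifications of the patent's scheme.

def select_language_database(voice, databases, keyword_map):
    """Pick a domain database if the voice contains a predetermined keyword."""
    rough_text = voice  # stand-in: assume a rough general-purpose transcription
    for keyword, db_name in keyword_map.items():
        if keyword in rough_text:
            return databases[db_name]
    return databases["general"]

def convert_to_text(voice, database):
    """Stand-in for acoustic-model plus language-database decoding."""
    return " ".join(database.get(token, token) for token in voice.split())

def fill_form(text, form):
    """Fill the converted text into the matched form."""
    form["content"] = text
    return form

databases = {
    "general": {},
    "medical": {"asp": "aspirin"},   # hypothetical term expansion
}
keyword_map = {"allergy": "medical"}

voice = "allergy asp"                # stand-in for recorded audio
db = select_language_database(voice, databases, keyword_map)
form = fill_form(convert_to_text(voice, db), {"content": ""})
print(form["content"])               # allergy aspirin
```

The point of the sketch is only the control flow: the database is chosen first, and the same utterance decodes differently depending on which database was chosen.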
In one embodiment, determining the language database for converting the voice comprises:
acquiring voice recorded during a preset time period;
judging whether the voice of the preset time period contains a predetermined keyword;
when the voice of the preset time period contains the predetermined keyword, determining, according to a correspondence between predetermined keywords and language databases, the language database corresponding to the predetermined keyword as the language database for converting the voice.
In this embodiment, the language database for converting the voice is determined from the predetermined keyword contained in the voice recorded during the preset time period, making the conversion more targeted and personalized and thus converting the voice into text more accurately.
In one embodiment, determining the language database for converting the voice comprises:
identifying a phonetic feature of the voice;
determining the pronunciation source of the voice according to the phonetic feature;
determining, according to a correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of the voice as the language database for converting the voice.
In this embodiment, the language database for converting the voice is determined from the pronunciation source of the voice, making the conversion more targeted and personalized and thus converting the voice into text more accurately.
In one embodiment, converting the voice into text through the language database comprises:
determining acoustic information of the voice;
matching, from the language database, the text corresponding to the acoustic information;
converting the voice into the text corresponding to the acoustic information.
In this embodiment, the text corresponding to the acoustic information of the voice is matched, and the voice is then converted into that text, so the recorded voice can be matched accurately through the acoustic database and the language database and thus converted into text accurately.
In one embodiment, the method further comprises:
establishing an audio database for each pronunciation source according to the phonetic feature of that pronunciation source.
In this embodiment, by establishing an audio database for each pronunciation source, the terminal can determine the language database for converting the voice according to the audio database, making the conversion of voice into text more accurate and personalized.
In one embodiment, determining the acoustic information of the voice comprises:
determining the pronunciation source of the voice;
matching an audio database for the voice according to the audio database of each pronunciation source;
determining the acoustic information of the voice according to the matched audio database.
In this embodiment, the acoustic information of the voice is determined through the audio database matched for the voice, so the terminal can match the text corresponding to the acoustic information from the language database and finally convert the voice into text.
In one embodiment, the language database for converting the voice contains at least one of the following:
specific terms of the field to which the predetermined keyword belongs;
articles of the field to which the predetermined keyword belongs;
semantic association relations.
In one embodiment, matching the text corresponding to the acoustic information from the language database comprises:
searching the language database for the text corresponding to the acoustic information;
when the text corresponding to the acoustic information is a single character, matching a word and/or a sentence for the single character according to the specific terms and/or the semantic association relations;
determining the word and/or sentence matched for the single character as the text corresponding to the acoustic information.
In this embodiment, when the text corresponding to the acoustic information is a single character, a complete word or sentence can be matched for it according to the specific terms or semantic association relations in the language database, so specific terms are converted accurately and in a targeted way when voice is converted into text, making the conversion more accurate.
In one embodiment, the method further comprises:
increasing, according to a preset ratio, the weight in the language database of the specific terms and of the elements that satisfy the semantic association relations with the specific terms, the elements comprising characters, words, and/or articles;
and matching the text corresponding to the acoustic information from the language database comprises:
matching the text corresponding to the acoustic information from the language database according to the weight of each element in the language database.
In this embodiment, by increasing the weight in the language database of the specific terms and of the elements satisfying the semantic association relations with them, the conversion can proceed according to the weight of each element, so specific terms are matched accurately and the conversion is more accurate and personalized.
In one embodiment, before the voice is recorded, the method further comprises:
adding the specific terms of the field to which the predetermined keyword belongs into a general-purpose language database to obtain the language database corresponding to the predetermined keyword;
establishing the correspondence between the predetermined keyword and the language database.
In this embodiment, by establishing the correspondence between the predetermined keyword and the language database, the terminal can accurately match, according to the predetermined keyword contained in the voice, the language database for converting the voice, making the conversion more accurate and personalized.
In one embodiment, filling the text into the form matched with the voice comprises:
determining, according to a correspondence between predetermined keywords and forms, the form corresponding to the predetermined keyword, and filling the text into the form corresponding to the predetermined keyword;
or,
determining, according to a correspondence between pronunciation sources and forms, the form corresponding to the pronunciation source, and filling the text into the form corresponding to the pronunciation source.
In this embodiment, after the voice is converted into text, the terminal can automatically fill the text into the corresponding form, so the user does not need to fill it in manually, which is highly convenient for the user.
In one embodiment, the form comprises at least one column, each column corresponding to at least one keyword; and filling the text into the form matched with the voice comprises:
determining, according to the keyword corresponding to a column, the text content corresponding to that column in the text;
filling the text content corresponding to the column into that column.
In this embodiment, after the voice is converted into text, the terminal can fill the converted text content into the columns of the corresponding form, so the terminal fills in the form automatically, accurately, and in a targeted manner; the user does not need to fill it in manually, which is highly convenient.
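The column-wise filling described above can be sketched as follows. The column keywords and the simple keyword-scan rule used here are illustrative assumptions, not the patent's specified matching method:

```python
# Hypothetical sketch: each form column has keywords; the text fragment
# following a keyword is assigned to that column.

def fill_form_by_columns(text, column_keywords):
    """Map each column to the fragment of text introduced by its keyword."""
    form = {}
    for column, keywords in column_keywords.items():
        for kw in keywords:
            if kw in text:
                # take the text right after the keyword, up to the sentence end
                after = text.split(kw, 1)[1].strip()
                form[column] = after.split(".")[0].strip()
                break
    return form

column_keywords = {
    "symptom": ["symptom is", "complains of"],
    "diagnosis": ["diagnosis is"],
}
text = "The symptom is fever. The diagnosis is allergy."
print(fill_form_by_columns(text, column_keywords))
# {'symptom': 'fever', 'diagnosis': 'allergy'}
```

A production system would presumably use richer semantic matching than sentence-splitting, but the mapping from column keyword to text content is the same idea.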
A voice processing apparatus comprises:
a recording module, configured to record voice;
a determination module, configured to determine the language database for converting the voice;
a conversion module, configured to convert the voice into text through the language database;
a filling module, configured to fill the text into the form matched with the voice.
In one embodiment, the determination module comprises:
an acquisition submodule, configured to acquire voice recorded during a preset time period;
a judgment submodule, configured to judge whether the voice of the preset time period contains a predetermined keyword;
a first determination submodule, configured to, when the voice of the preset time period contains the predetermined keyword, determine, according to the correspondence between predetermined keywords and language databases, the language database corresponding to the predetermined keyword as the language database for converting the voice.
In one embodiment, the determination module comprises:
a recognition submodule, configured to identify the phonetic feature of the voice;
a second determination submodule, configured to determine the pronunciation source of the voice according to the phonetic feature;
a third determination submodule, configured to determine, according to the correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of the voice as the language database for converting the voice.
In one embodiment, the conversion module comprises:
a fourth determination submodule, configured to determine the acoustic information of the voice;
a first matching submodule, configured to match, from the language database, the text corresponding to the acoustic information;
a conversion submodule, configured to convert the voice into the text corresponding to the acoustic information.
In one embodiment, the device further comprises:
a first establishing module, configured to establish the audio database of each pronunciation source according to the phonetic feature of that pronunciation source.
In one embodiment, the determination module comprises:
a fifth determination submodule, configured to determine the pronunciation source of the voice;
a second matching submodule, configured to match an audio database for the voice according to the audio database of each pronunciation source;
a sixth determination submodule, configured to determine the acoustic information of the voice according to the matched audio database.
In one embodiment, the first matching submodule comprises:
a searching unit, configured to search the language database for the text corresponding to the acoustic information;
a first matching unit, configured to, when the text corresponding to the acoustic information is a single character, match a word and/or a sentence for the single character according to the specific terms and/or semantic association relations in the language database;
a determining unit, configured to determine the word and/or sentence matched for the single character as the text corresponding to the acoustic information.
In one embodiment, the device further comprises:
a weighting module, configured to increase, according to a preset ratio, the weight in the language database of the specific terms and of the elements satisfying the semantic association relations with the specific terms, the elements comprising characters, words, and/or articles;
and the first matching submodule comprises:
a second matching unit, configured to match the text corresponding to the acoustic information from the language database according to the weight of each element in the language database.
In one embodiment, the device further comprises:
an adding module, configured to, before the voice is recorded, add the specific terms of the field to which the predetermined keyword belongs into a general-purpose language database to obtain the language database corresponding to the predetermined keyword;
a second establishing module, configured to establish the correspondence between the predetermined keyword and the language database.
In one embodiment, the filling module comprises:
a seventh determination submodule, configured to determine, according to the correspondence between predetermined keywords and forms, the form corresponding to the predetermined keyword, and a first filling submodule, configured to fill the text into the form corresponding to the predetermined keyword;
or,
an eighth determination submodule, configured to determine, according to the correspondence between pronunciation sources and forms, the form corresponding to the pronunciation source, and a second filling submodule, configured to fill the text into the form corresponding to the pronunciation source.
In one embodiment, the filling module comprises:
a ninth determination submodule, configured to determine, according to the keyword corresponding to a column in the form, the text content corresponding to that column in the text, the form comprising at least one column and each column corresponding to at least one keyword;
a third filling submodule, configured to fill the text content corresponding to the column into that column.
Some beneficial effects of the embodiments of the present invention include:
The above apparatus converts the voice into text through the language database corresponding to the recorded voice and fills the text into the form matched with the voice. This converts voice into text accurately, avoids the situation where voice content cannot be converted accurately because it contains technical terms, and fills the text content into the form automatically, so the user does not need to fill it in manually, which is highly convenient for the user.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objects and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical scheme of the present invention is described in further detail below through the drawings and embodiments.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and form a part of the specification; together with the embodiments of the present invention they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a flowchart of a speech processing method in an embodiment of the present invention;
Fig. 2 is a flowchart of step S12 of a speech processing method in an embodiment of the present invention;
Fig. 3 is a flowchart of step S12 of a speech processing method in an embodiment of the present invention;
Fig. 4 is a flowchart of step S13 of a speech processing method in an embodiment of the present invention;
Fig. 5 is a flowchart of step S41 of a speech processing method in an embodiment of the present invention;
Fig. 6 is a flowchart of step S42 of a speech processing method in an embodiment of the present invention;
Fig. 7 is a flowchart of a speech processing method in an embodiment of the present invention;
Fig. 8 is a flowchart of a speech processing method in an embodiment of the present invention;
Fig. 9 is a flowchart of a speech processing method in an embodiment of the present invention;
Fig. 10 is a block diagram of a voice processing apparatus in an embodiment of the present invention;
Fig. 11 is a block diagram of the determination module of a voice processing apparatus in an embodiment of the present invention;
Fig. 12 is a block diagram of the determination module of a voice processing apparatus in an embodiment of the present invention;
Fig. 13 is a block diagram of the conversion module of a voice processing apparatus in an embodiment of the present invention;
Fig. 14 is a block diagram of the determination module of a voice processing apparatus in an embodiment of the present invention;
Fig. 15 is a block diagram of the first matching submodule of a voice processing apparatus in an embodiment of the present invention;
Fig. 16 is a block diagram of a voice processing apparatus in an embodiment of the present invention;
Fig. 17 is a block diagram of the filling module of a voice processing apparatus in an embodiment of the present invention;
Fig. 18 is a block diagram of the filling module of a voice processing apparatus in an embodiment of the present invention;
Fig. 19 is a block diagram of the filling module of a voice processing apparatus in an embodiment of the present invention.
Detailed description
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it.
Fig. 1 is a flowchart of a speech processing method in an embodiment of the present invention. As shown in Fig. 1, the speech processing method is applied in a terminal, where the terminal can be any device with speech processing capability, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant. The method comprises the following steps S11-S14:
Step S11: record voice.
Step S12: determine the language database for converting the voice.
Step S13: convert the voice into text through the language database.
Step S14: fill the text into the form matched with the voice.
With the technical scheme provided by the embodiments of the present invention, the voice is converted into text through the language database corresponding to the recorded voice, and the text is filled into the form matched with the voice. This converts voice into text accurately, avoids the situation where voice content cannot be converted accurately because it contains technical terms, and fills the text content into the form automatically, so the user does not need to fill it in manually, which is highly convenient for the user.
In the above method, the language database for converting the voice contains one or more of: specific terms of the field to which the predetermined keyword belongs, articles of that field, and semantic association relations. Different language databases can thus be configured for different fields. Configuring different language databases for different fields or industries is implemented in advance through the following steps: adding the specific terms of the field to which the predetermined keyword belongs into a general-purpose language database to obtain the language database corresponding to the predetermined keyword; and establishing the correspondence between the predetermined keyword and the language database. For example, for a medical system, specific medical terms such as "cold", "allergy", "fever", and "aspirin" can be added into the general-purpose language database to establish the language database of the medical system; for a banking system, specific banking terms such as "credit card", "deposit", and "financing" can be added into the general-purpose language database to establish the language database of the banking system. In this embodiment, by establishing the correspondence between the predetermined keyword and the language database, the terminal can accurately match, according to the predetermined keyword contained in the voice, the language database for converting the voice, making the conversion more accurate and personalized.
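The database construction described above can be sketched as follows. The set-based representation of a vocabulary and the keyword-to-database mapping are illustrative assumptions; a real language database would carry n-gram statistics rather than a bare term set:

```python
# Hypothetical sketch: a domain language database is the general-purpose
# vocabulary extended with the field's specific terms, and each predetermined
# keyword is mapped to the resulting database.

general_vocabulary = {"the", "patient", "has", "a", "card"}

def build_domain_database(general, specific_terms):
    """Extend the general-purpose vocabulary with domain-specific terms."""
    return general | set(specific_terms)

medical_terms = ["cold", "allergy", "fever", "aspirin"]
banking_terms = ["credit card", "deposit", "financing"]

databases = {
    "medical": build_domain_database(general_vocabulary, medical_terms),
    "banking": build_domain_database(general_vocabulary, banking_terms),
}

# correspondence between predetermined keywords and language databases
keyword_to_database = {"allergy": "medical", "deposit": "banking"}

print("aspirin" in databases["medical"])   # True
print("aspirin" in databases["banking"])   # False
```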
The above step S12 can be implemented in the following two different ways.
The first is to determine the language database according to the predetermined keyword contained in the recorded voice. As shown in Fig. 2, step S12 may be implemented as the following steps S21-S23:
Step S21: acquire the voice recorded during a preset time period.
Step S22: judge whether the voice of the preset time period contains a predetermined keyword. In this step, the voice of the preset time period can first be converted into text through the general-purpose language database, and it is then judged whether the resulting text contains a predetermined keyword.
Step S23: when the voice of the preset time period contains the predetermined keyword, determine, according to the correspondence between predetermined keywords and language databases, the language database corresponding to the predetermined keyword as the language database for converting the voice.
For example, the terminal acquires the first 2 minutes of voice and judges whether these 2 minutes of voice contain a predetermined keyword. Here, a predetermined keyword can be a word representative of a certain industry or field, and the language database corresponding to the predetermined keyword can be a specific language database pre-configured for that industry or field, containing not only the data in the general-purpose language database but also the predetermined keyword and data related to it. Taking the medical field as an example, if the first 2 minutes of voice acquired by the terminal contain the predetermined keyword "allergy", the language database corresponding to "allergy" can be judged to be the language database of the medical field; further, the language database corresponding to "allergy" can be determined to be the dermatology language database. When the first 2 minutes of voice contain no predetermined keyword, another 2 minutes of voice can be acquired, and so on, until a predetermined keyword that can determine the language database is found. Alternatively, the first sentence or the first few sentences of the voice can be acquired first, and the language database corresponding to the predetermined keyword is then determined according to the predetermined keyword contained in that first sentence or those first few sentences. In this embodiment, the language database for converting the voice is determined from the predetermined keyword contained in the voice recorded during the preset time period, making the conversion more targeted and personalized and thus converting the voice into text more accurately.
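The windowed keyword search in this example can be sketched as follows. The 2-minute windows are represented as a list of pre-transcribed text segments, which is an illustrative simplification of steps S21-S23:

```python
# Hypothetical sketch: scan successive preset-time-period segments until a
# predetermined keyword is found, then return the corresponding database name.

def find_database(segments, keyword_to_database):
    """segments: rough transcriptions of successive 2-minute windows."""
    for segment in segments:
        for keyword, db_name in keyword_to_database.items():
            if keyword in segment:
                return db_name
    return "general"  # fall back when no keyword is ever found

keyword_to_database = {"allergy": "dermatology", "deposit": "banking"}
segments = ["hello doctor", "I think I have an allergy to pollen"]
print(find_database(segments, keyword_to_database))  # dermatology
```

The fallback to a general database when no keyword appears is an assumption added for completeness; the patent only says the terminal keeps acquiring further windows.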
The other way is to determine the language database through the phonetic feature of the voice. As shown in Fig. 3, step S12 may also be implemented as the following steps S31-S33:
Step S31: identify the phonetic feature of the voice.
Step S32: determine the pronunciation source of the voice according to the phonetic feature.
Step S33: determine, according to the correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of the voice as the language database for converting the voice.
When the pronunciation source is a user, the phonetic feature can be the pronunciation habit of the user. Since every user has different pronunciation habits, each user has different phonetic features; when step S12 is performed in this way, a corresponding language database can be matched for each user in advance according to the user's pronunciation habits. A user's pronunciation habits include, for example, the accent and intonation carried in the user's pronunciation, and the speed of the user's pronunciation. In this embodiment, the language database for converting the voice is determined from the pronunciation source of the voice, making the conversion more targeted and personalized and thus converting the voice into text more accurately.
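Selecting the database by pronunciation source could be sketched as follows. The comparison of pronunciation habits is reduced here to a nearest-neighbour match over a single hypothetical numeric feature, which is purely illustrative:

```python
# Hypothetical sketch: identify the speaker (pronunciation source) whose
# stored phonetic feature is closest to that of the incoming voice, then
# use the language database pre-matched to that speaker.

def identify_source(feature, source_features):
    """Return the stored source whose feature value is nearest."""
    return min(source_features, key=lambda s: abs(source_features[s] - feature))

source_features = {"alice": 0.2, "bob": 0.8}      # stored pronunciation habits
source_to_database = {"alice": "medical", "bob": "banking"}

incoming = 0.25                                    # feature of recorded voice
source = identify_source(incoming, source_features)
print(source, source_to_database[source])          # alice medical
```

Real speaker identification would use multidimensional embeddings rather than one scalar, but the lookup structure (source, then source-to-database correspondence) mirrors steps S31-S33.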
In one embodiment, as shown in Fig. 4, step S13 may be implemented as the following steps S41-S43:
Step S41: determine the acoustic information of the voice.
Step S42: match, from the language database, the text corresponding to the acoustic information.
Step S43: convert the voice into the text corresponding to the acoustic information.
In this embodiment, an acoustic database and a language database can be used to convert the voice into text. The acoustic information of the voice is determined through the acoustic database, and the text corresponding to the acoustic information is then matched from the language database. For example, if the acoustic information is "α", the language database contains several texts corresponding to "α", such as characters pronounced "ah". Exactly which text is matched for the acoustic information "α" can be determined according to the specific language database determined in step S12. In this embodiment, the text corresponding to the acoustic information of the voice is matched, and the voice is then converted into that text, so the recorded voice can be matched accurately through the acoustic database and the language database and thus converted into text accurately.
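The disambiguation between candidates sharing the same acoustic information can be sketched as follows. The pinyin-like acoustic key and the candidate lists are illustrative assumptions standing in for the acoustic and language databases:

```python
# Hypothetical sketch: the acoustic database yields an acoustic key for the
# voice; the language database maps that key to candidate texts, and the
# domain database determined in step S12 decides which candidate wins.

acoustic_candidates = {"a": ["ah", "aspirin"]}    # homophone-like candidates

def pick_candidate(acoustic_key, candidates, domain_terms):
    """Prefer a candidate that is a specific term of the chosen domain."""
    for text in candidates.get(acoustic_key, []):
        if text in domain_terms:
            return text
    return candidates.get(acoustic_key, ["?"])[0]  # fall back to first candidate

medical_terms = {"aspirin", "allergy"}
print(pick_candidate("a", acoustic_candidates, medical_terms))  # aspirin
print(pick_candidate("a", acoustic_candidates, set()))          # ah
```

The same acoustic key resolves differently depending on which domain database was selected, which is the point of determining the language database before conversion.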
In one embodiment, the above method further comprises the following step: establishing the audio database of each pronunciation source according to the phonetic feature of that pronunciation source. When the pronunciation source is a user, each user corresponds to a respective audio database. Accordingly, as shown in Fig. 5, the above step S41 may be implemented as the following steps S51-S53:
Step S51: determine the pronunciation source of the voice.
Step S52: match an audio database for the voice according to the audio database of each pronunciation source.
Step S53: determine the acoustic information of the voice according to the matched audio database.
In this embodiment, the acoustic information of the voice is determined through the audio database matched for the voice, so the terminal can match the text corresponding to the acoustic information from the language database and finally convert the voice into text.
In one embodiment, as shown in Fig. 6, the above step S42 may be implemented as the following steps S61-S63:
Step S61: search the language database for the text corresponding to the acoustic information.
Step S62: when the text corresponding to the acoustic information is a single character, match a word and/or a sentence for the single character according to the specific terms and/or the semantic association relations.
Step S63: determine the word and/or sentence matched for the single character as the text corresponding to the acoustic information.
For example, suppose the language database determined by the terminal for converting the voice is the medical language database. When the terminal searches the language database for the text corresponding to the acoustic information "α" and matches only a single character, it can, according to the specific terms in the medical language database and the contextual semantic association relations, match the full term "aspirin" for that character; whereas when matching against the general-purpose language database, the acoustic information "α" would probably only match an interjection such as "ah". It can thus be seen that the terminal first determines the language database for converting the voice and then uses the determined language database to convert the voice into text, making the conversion more targeted and more accurate.
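The single-character completion in this example could be sketched as a prefix match against the domain's specific terms. The term list and the prefix rule are illustrative assumptions; the patent speaks more generally of specific terms and semantic association relations:

```python
# Hypothetical sketch: when decoding yields only a single character, try to
# complete it into a full specific term from the domain language database.

def complete_single_char(char, specific_terms):
    """Complete a lone character into a domain term that starts with it."""
    for term in specific_terms:
        if term.startswith(char):
            return term          # e.g. "a" completed to "aspirin"
    return char                  # no completion found: keep the character

medical_terms = ["aspirin", "allergy", "fever"]
print(complete_single_char("a", medical_terms))  # aspirin
print(complete_single_char("z", medical_terms))  # z
```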
In one embodiment, the method further comprises: raising, by a preset ratio, the weights in the language database of the particular terms and of the elements that satisfy a semantic association relation with the particular terms, where an element comprises one or more of a character, a word, and an article. Step S42 may then be implemented as: matching the text corresponding to the acoustic information from the language database according to the weight of each element in the language database. To make the conversion of domain-specific voice into text more accurate, a relatively high preset ratio, for example 1.5, is usually set. For instance, in the medical domain, the weights of particular terms such as "cold", "allergy", "fever", and "aspirin", and of the characters, words, or articles related to them (i.e., satisfying a semantic association relation with them), can be raised by a factor of 1.5 in the language database. When the terminal then matches the text corresponding to the acoustic information from the language database, it can match according to the weights of the elements, so that specific technical terms, words, and articles are matched preferentially, making the conversion more accurate and targeted.
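The weighting scheme above can be sketched in a few lines. The baseline weights are invented; the ratio 1.5 follows the text's own example, and matching by highest weight stands in for whatever scoring the terminal actually uses.

```python
# Sketch of the preset-ratio weighting: raise the weight of particular
# terms (and semantically related elements) by the ratio, then match the
# candidate with the highest weight. Baseline weights are assumptions.

PRESET_RATIO = 1.5

def boost_weights(weights, particular_terms):
    """Multiply the weight of each particular term by the preset ratio."""
    return {w: (v * PRESET_RATIO if w in particular_terms else v)
            for w, v in weights.items()}

def best_match(candidates, weights):
    """Among candidate texts for the acoustic information, pick the heaviest."""
    return max(candidates, key=lambda c: weights.get(c, 0.0))

# Before boosting, "ah" (1.0) would beat "aspirin" (0.8); after boosting,
# "aspirin" weighs 1.2 and wins.
weights = boost_weights({"ah": 1.0, "aspirin": 0.8}, {"aspirin"})
```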
The implementation of the speech processing method provided by the invention differs according to the implementation of step S14. As shown in Figure 7, when the terminal determines the language database used for conversion according to a predetermined keyword contained in the voice recorded within a preset time period, the method may be implemented as the following steps S71-S77:
Step S71: record the voice.
Step S72: obtain the voice recorded within the preset time period.
Step S73: judge whether the voice of the preset time period contains a predetermined keyword.
Step S74: when the voice of the preset time period contains a predetermined keyword, determine, according to the correspondence between predetermined keywords and language databases, the language database corresponding to the predetermined keyword as the language database used to convert the voice.
Step S75: convert the voice into text through that language database.
Step S76: determine, according to the correspondence between predetermined keywords and forms, the form corresponding to the predetermined keyword.
Step S77: fill the text into the form corresponding to the predetermined keyword.
Here, steps S76-S77 are one implementation of step S14. For example, if the predetermined keywords include medical terms such as "cold", "allergy", "fever", and "aspirin", the corresponding form may be a case-record form, which may be further refined into a surgical form, an internal-medicine form, a dermatology form, and so on. If the predetermined keywords include banking terms such as "credit card", "deposit", and "wealth management", the corresponding form may be the form filled in when handling the corresponding banking business. In this way the terminal can fill the text into the form automatically, without the user filling it in manually.
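The keyword-driven routing of steps S73-S77 can be sketched as a pair of lookup tables. The tables and default values here are invented; a real system would hold many more keywords per field.

```python
# Sketch of steps S73-S77: detect a predetermined keyword in the
# recognized text, then select the language database (S74) and the form
# (S76) associated with it. The keyword tables are illustrative.

KEYWORD_TO_DATABASE = {"fever": "medical", "deposit": "banking"}
KEYWORD_TO_FORM     = {"fever": "case form", "deposit": "banking business form"}

def find_keyword(recognized_text):
    """Step S73: first predetermined keyword present in the text, if any."""
    for kw in KEYWORD_TO_DATABASE:
        if kw in recognized_text:
            return kw
    return None

def route(recognized_text):
    """Steps S74 and S76: map the keyword to a database and a form."""
    kw = find_keyword(recognized_text)
    if kw is None:
        return ("general", "no form")   # fall back to a general-purpose database
    return (KEYWORD_TO_DATABASE[kw], KEYWORD_TO_FORM[kw])
```

For instance, `route("the patient has a fever")` selects the medical database and the case form, while text without any predetermined keyword falls back to the general-purpose database.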
As shown in Figure 8, when the terminal determines the language database used for conversion according to the phonetic features of the voice, the method may be implemented as the following steps S81-S87:
Step S81: record the voice.
Step S82: identify the phonetic features of the voice.
Step S83: determine the pronunciation source of the voice according to the phonetic features.
Step S84: determine, according to the correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of the voice as the language database used to convert the voice.
Step S85: convert the voice into text through the language database.
Step S86: determine, according to the correspondence between pronunciation sources and forms, the form corresponding to the pronunciation source.
Step S87: fill the text into the form corresponding to the pronunciation source.
Here, steps S86-S87 are another implementation of step S14. For example, when the pronunciation source is a surgeon in a hospital, the corresponding form is a surgical case form; when the pronunciation source is a dermatologist, the corresponding form is a dermatology case form; when the pronunciation source is a clerk in a bank, the corresponding form is the form for handling the bank's business. In this way the terminal can fill the text into the form automatically, without the user filling it in manually.
In one embodiment, the form comprises multiple columns, each column corresponding to at least one keyword. Step S14 may therefore be implemented as the following steps: determine, according to the keyword corresponding to a column, the text content corresponding to that column within the converted text; fill the text content corresponding to the column into the column. For example, a medical case form comprises columns such as symptoms, past medical history, medication, and allergies; the terminal can determine the text content corresponding to each column according to the column's keyword and then fill that content into the corresponding column.
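The column-filling step can be sketched as below. The column names and keywords are assumptions for illustration; a real case form would carry richer keyword sets and more careful sentence segmentation.

```python
# Sketch of the column-filling step: each form column has keywords, and
# a sentence of the converted text is assigned to every column whose
# keyword it contains. Columns and keywords are invented.

COLUMN_KEYWORDS = {
    "symptoms":   ["cough", "fever"],
    "medication": ["aspirin"],
}

def fill_form(sentences):
    """Distribute sentences of the converted text across the form's columns."""
    form = {column: [] for column in COLUMN_KEYWORDS}
    for sentence in sentences:
        for column, keywords in COLUMN_KEYWORDS.items():
            if any(kw in sentence for kw in keywords):
                form[column].append(sentence)
    return form
```

A sentence matching no column's keywords is simply left out of the form, which mirrors the idea that only relevant content is filled in.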
The method provided by the embodiment of the present invention can be used in different fields or industries, such as the medical field, the financial field, and the communications field. Below, the speech processing method is illustrated with the medical field. As shown in Figure 9, it comprises the following steps S91-S99:
Step S91: add the particular terms of the different departments, as predetermined keywords, to a general-purpose language database. In this step, articles related to the particular terms may also be added to the general-purpose language database.
Step S92: establish the language database corresponding to each department.
Step S93: record the voice while the doctor speaks.
Step S94: obtain the voice recorded within the preset time period.
Step S95: obtain a representative predetermined keyword from the voice of the preset time period.
Step S96: determine the department to which the predetermined keyword belongs, and determine the language database corresponding to that department.
Step S97: convert the recorded voice into text according to the language database corresponding to that department.
Step S98: determine and obtain the form corresponding to that department. For example, for internal medicine, obtain the internal-medicine case form; for the respiratory department, obtain the respiratory case form.
Step S99: fill the converted text into the form corresponding to that department.
In addition, an audio database containing the doctor's pronunciation characteristics can be deployed for each doctor, so that the terminal can determine a doctor's audio database from his or her pronunciation characteristics and then determine the language database corresponding to that doctor from the audio database. Since the same department may have multiple doctors, multiple audio databases may correspond to the same language database. Alternatively, the client installed on the terminal may be divided into categories in advance, each category corresponding to a respective language database and a respective form. The user first manually selects a category and then records voice under it; the voice is converted with the language database of that category, and the converted text content is filled into the form corresponding to that category. In this embodiment, when the doctor examines a patient, there is no need to fill in the case form manually: the terminal records the voice, converts it into text with the specific language database, and fills in the case form automatically, saving the doctor a great deal of time.
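The medical workflow of steps S91-S99 can be condensed into a short end-to-end sketch. The department names, keywords, and the trivial pass-through "conversion" are all invented here; actual speech recognition is outside the scope of the sketch.

```python
# End-to-end sketch of steps S91-S99: infer the department from a keyword
# in the doctor's (already recognized) speech, pick that department's
# case form, and fill it. Keywords and department names are assumptions.

DEPARTMENT_KEYWORDS = {"cough": "respiratory", "rash": "dermatology"}

def process(recognized_text):
    # Steps S95-S96: find the department via a representative keyword,
    # falling back to internal medicine when no keyword is present.
    department = next((dept for kw, dept in DEPARTMENT_KEYWORDS.items()
                       if kw in recognized_text), "internal medicine")
    form_name = department + " case form"          # step S98
    return {"form": form_name, "content": recognized_text}  # step S99
```

For example, speech mentioning a cough is routed to the respiratory case form, while speech with no department keyword lands in the internal-medicine case form.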
Corresponding to the above speech processing method, an embodiment of the present invention also provides a speech processing apparatus which, as shown in Figure 10, comprises:
a recording module 101, configured to record voice;
a determination module 102, configured to determine a language database for converting the voice;
a conversion module 103, configured to convert the voice into text through the language database;
a filling module 104, configured to fill the text into a form that matches the voice.
In one embodiment, as shown in Figure 11, the determination module 102 comprises:
an obtaining submodule 1021, configured to obtain the voice recorded within a preset time period;
a judging submodule 1022, configured to judge whether the voice of the preset time period contains a predetermined keyword;
a first determining submodule 1023, configured to, when the voice of the preset time period contains the predetermined keyword, determine, according to the correspondence between predetermined keywords and language databases, the language database corresponding to the predetermined keyword as the language database for converting the voice.
In one embodiment, as shown in Figure 12, the determination module 102 comprises:
a recognition submodule 1024, configured to identify the phonetic features of the voice;
a second determining submodule 1025, configured to determine the pronunciation source of the voice according to the phonetic features;
a third determining submodule 1026, configured to determine, according to the correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of the voice as the language database for converting the voice.
In one embodiment, as shown in Figure 13, the conversion module 103 comprises:
a fourth determining submodule 1031, configured to determine the acoustic information of the voice;
a first matching submodule 1032, configured to match the text corresponding to the acoustic information from the language database;
a conversion submodule 1033, configured to convert the voice into the text corresponding to the acoustic information.
In one embodiment, the apparatus further comprises: a first establishing module, configured to establish an audio database for each pronunciation source according to the phonetic features of that pronunciation source.
In one embodiment, as shown in Figure 14, the determination module 102 comprises:
a fifth determining submodule 1027, configured to determine the pronunciation source of the voice;
a second matching submodule 1028, configured to match an audio database to the voice according to the audio databases of the pronunciation sources;
a sixth determining submodule 1029, configured to determine the acoustic information of the voice according to the matched audio database.
In one embodiment, as shown in Figure 15, the first matching submodule 1032 comprises:
a lookup unit 10321, configured to look up the text corresponding to the acoustic information in the language database;
a first matching unit 10322, configured to, when the text corresponding to the acoustic information is a single character, match a word and/or sentence to the single character according to the particular terms and/or semantic association relations in the language database;
a determining unit 10323, configured to determine the word and/or sentence matched to the single character as the text corresponding to the acoustic information.
In one embodiment, the apparatus further comprises: a raising module, configured to raise, by a preset ratio, the weights in the language database of the particular terms and of the elements that satisfy a semantic association relation with the particular terms, an element comprising a character, a word, and/or an article;
and the first matching submodule 1032 comprises: a second matching unit, configured to match the text corresponding to the acoustic information from the language database according to the weight of each element in the language database.
In one embodiment, as shown in Figure 16, the apparatus further comprises:
an adding module 105, configured to, before the voice is recorded, add the particular terms of the field to which the predetermined keyword belongs to a general-purpose language database, to obtain the language database corresponding to the predetermined keyword;
a second establishing module 106, configured to establish the correspondence between the predetermined keyword and the language database.
In one embodiment, as shown in Figure 17, the filling module 104 comprises:
a seventh determining submodule 1041, configured to determine, according to the correspondence between predetermined keywords and forms, the form corresponding to the predetermined keyword;
a first filling submodule 1042, configured to fill the text into the form corresponding to the predetermined keyword.
In one embodiment, as shown in Figure 18, the filling module 104 comprises:
an eighth determining submodule 1043, configured to determine, according to the correspondence between pronunciation sources and forms, the form corresponding to the pronunciation source;
a second filling submodule 1044, configured to fill the text into the form corresponding to the pronunciation source.
In one embodiment, as shown in Figure 19, the filling module 104 comprises:
a ninth determining submodule 1045, configured to determine, according to the keyword corresponding to a column in the form, the text content corresponding to that column, the form comprising at least one column and each column corresponding to at least one keyword;
a third filling submodule 1046, configured to fill the text content corresponding to the column into the column.
With the apparatus provided by the embodiment of the present invention, the voice is converted into text through the language database corresponding to the recorded voice, and the text is filled into the form that matches the voice. This achieves accurate conversion of voice into text, avoids the situation in which voice content cannot be converted accurately because it contains technical terms, and fills the text content into the form automatically, so the user need not fill it in manually, which brings great convenience to the user.
Those skilled in the art will understand that embodiments of the invention may be provided as a method, a system, or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to encompass them as well.

Claims (23)

1. A speech processing method, characterized by comprising:
recording voice;
determining a language database for converting said voice;
converting said voice into text through said language database;
filling said text into a form that matches said voice.
2. The method according to claim 1, characterized in that determining the language database for converting said voice comprises:
obtaining the voice recorded within a preset time period;
judging whether the voice of said preset time period contains a predetermined keyword;
when the voice of said preset time period contains said predetermined keyword, determining, according to the correspondence between predetermined keywords and language databases, the language database corresponding to said predetermined keyword as the language database for converting said voice.
3. The method according to claim 1, characterized in that determining the language database for converting said voice comprises:
identifying the phonetic features of said voice;
determining the pronunciation source of said voice according to said phonetic features;
determining, according to the correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of said voice as the language database for converting said voice.
4. The method according to any one of claims 1-3, characterized in that converting said voice into text through said language database comprises:
determining the acoustic information of said voice;
matching the text corresponding to said acoustic information from said language database;
converting said voice into the text corresponding to said acoustic information.
5. The method according to claim 4, characterized in that the method further comprises:
establishing an audio database for each pronunciation source according to the phonetic features of said pronunciation source.
6. The method according to claim 5, characterized in that determining the acoustic information of said voice comprises:
determining the pronunciation source of said voice;
matching an audio database to said voice according to the audio databases of said pronunciation sources;
determining the acoustic information of said voice according to the matched audio database.
7. The method according to any one of claims 4-6, characterized in that the language database for converting said voice comprises at least one of the following:
particular terms of the field to which said predetermined keyword belongs;
articles of the field to which said predetermined keyword belongs;
semantic association relations.
8. The method according to claim 7, characterized in that matching the text corresponding to said acoustic information from said language database comprises:
looking up the text corresponding to said acoustic information in said language database;
when the text corresponding to said acoustic information is a single character, matching a word and/or sentence to said single character according to said particular terms and/or said semantic association relations;
determining the word and/or sentence matched to said single character as the text corresponding to said acoustic information.
9. The method according to claim 7, characterized in that
the method further comprises:
raising, by a preset ratio, the weights in said language database of said particular terms and of the elements that satisfy said semantic association relation with said particular terms, said elements comprising characters, words, and/or articles;
and matching the text corresponding to said acoustic information from said language database comprises:
matching the text corresponding to said acoustic information from said language database according to the weight of each element in said language database.
10. The method according to claim 7, characterized in that, before recording the voice, the method further comprises:
adding the particular terms of the field to which said predetermined keyword belongs to a general-purpose language database, to obtain the language database corresponding to said predetermined keyword;
establishing the correspondence between said predetermined keyword and said language database.
11. The method according to claim 2 or 3, characterized in that filling said text into the form that matches said voice comprises:
determining, according to the correspondence between predetermined keywords and forms, the form corresponding to said predetermined keyword, and filling said text into the form corresponding to said predetermined keyword;
or,
determining, according to the correspondence between pronunciation sources and forms, the form corresponding to said pronunciation source, and filling said text into the form corresponding to said pronunciation source.
12. The method according to claim 1, characterized in that said form comprises at least one column, each column corresponding to at least one keyword, and filling said text into the form that matches said voice comprises:
determining, according to the keyword corresponding to said column, the text content corresponding to said column in said text;
filling the text content corresponding to said column into said column.
13. A speech processing apparatus, characterized by comprising:
a recording module, configured to record voice;
a determination module, configured to determine a language database for converting said voice;
a conversion module, configured to convert said voice into text through said language database;
a filling module, configured to fill said text into a form that matches said voice.
14. The apparatus according to claim 13, characterized in that said determination module comprises:
an obtaining submodule, configured to obtain the voice recorded within a preset time period;
a judging submodule, configured to judge whether the voice of said preset time period contains a predetermined keyword;
a first determining submodule, configured to, when the voice of said preset time period contains said predetermined keyword, determine, according to the correspondence between predetermined keywords and language databases, the language database corresponding to said predetermined keyword as the language database for converting said voice.
15. The apparatus according to claim 13, characterized in that said determination module comprises:
a recognition submodule, configured to identify the phonetic features of said voice;
a second determining submodule, configured to determine the pronunciation source of said voice according to said phonetic features;
a third determining submodule, configured to determine, according to the correspondence between pronunciation sources and language databases, the language database corresponding to the pronunciation source of said voice as the language database for converting said voice.
16. The apparatus according to any one of claims 13-15, characterized in that said conversion module comprises:
a fourth determining submodule, configured to determine the acoustic information of said voice;
a first matching submodule, configured to match the text corresponding to said acoustic information from said language database;
a conversion submodule, configured to convert said voice into the text corresponding to said acoustic information.
17. The apparatus according to claim 16, characterized in that said apparatus further comprises:
a first establishing module, configured to establish an audio database for each pronunciation source according to the phonetic features of said pronunciation source.
18. The apparatus according to claim 17, characterized in that said determination module comprises:
a fifth determining submodule, configured to determine the pronunciation source of said voice;
a second matching submodule, configured to match an audio database to said voice according to the audio databases of said pronunciation sources;
a sixth determining submodule, configured to determine the acoustic information of said voice according to the matched audio database.
19. The apparatus according to any one of claims 16-18, characterized in that said first matching submodule comprises:
a lookup unit, configured to look up the text corresponding to said acoustic information in said language database;
a first matching unit, configured to, when the text corresponding to said acoustic information is a single character, match a word and/or sentence to said single character according to the particular terms and/or semantic association relations in said language database;
a determining unit, configured to determine the word and/or sentence matched to said single character as the text corresponding to said acoustic information.
20. The apparatus according to any one of claims 16-18, characterized in that
said apparatus further comprises:
a raising module, configured to raise, by a preset ratio, the weights in said language database of said particular terms and of the elements that satisfy said semantic association relation with said particular terms, said elements comprising characters, words, and/or articles;
and said first matching submodule comprises:
a second matching unit, configured to match the text corresponding to said acoustic information from said language database according to the weight of each element in said language database.
21. The apparatus according to any one of claims 16-18, characterized in that said apparatus further comprises:
an adding module, configured to, before the voice is recorded, add the particular terms of the field to which said predetermined keyword belongs to a general-purpose language database, to obtain the language database corresponding to said predetermined keyword;
a second establishing module, configured to establish the correspondence between said predetermined keyword and said language database.
22. The apparatus according to claim 14 or 15, characterized in that said filling module comprises:
a seventh determining submodule, configured to determine, according to the correspondence between predetermined keywords and forms, the form corresponding to said predetermined keyword, and a first filling submodule, configured to fill said text into the form corresponding to said predetermined keyword;
or,
an eighth determining submodule, configured to determine, according to the correspondence between pronunciation sources and forms, the form corresponding to said pronunciation source, and a second filling submodule, configured to fill said text into the form corresponding to said pronunciation source.
23. The apparatus according to claim 13, characterized in that said filling module comprises:
a ninth determining submodule, configured to determine, according to the keyword corresponding to a column in said form, the text content corresponding to said column in said text, said form comprising at least one column and each column corresponding to at least one keyword;
a third filling submodule, configured to fill the text content corresponding to said column into said column.
CN201510458935.9A 2015-07-31 2015-07-31 Voice processing method and device Pending CN105161104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510458935.9A CN105161104A (en) 2015-07-31 2015-07-31 Voice processing method and device


Publications (1)

Publication Number Publication Date
CN105161104A true CN105161104A (en) 2015-12-16

Family

ID=54801937


Country Status (1)

Country Link
CN (1) CN105161104A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105609104A (en) * 2016-01-22 2016-05-25 北京云知声信息技术有限公司 Information processing method and apparatus, and intelligent voice router controller
CN105786797A (en) * 2016-02-23 2016-07-20 北京云知声信息技术有限公司 Information processing method and device based on voice input
CN105957524A (en) * 2016-04-25 2016-09-21 北京云知声信息技术有限公司 Speech processing method and speech processing device
CN106328146A (en) * 2016-08-22 2017-01-11 广东小天才科技有限公司 Video subtitle generation method and apparatus
CN107332907A (en) * 2017-06-30 2017-11-07 江西博瑞彤芸科技有限公司 User profile method for customizing
CN108010523A (en) * 2016-11-02 2018-05-08 松下电器(美国)知识产权公司 Information processing method and recording medium
CN108717851A (en) * 2018-03-28 2018-10-30 深圳市三诺数字科技有限公司 A kind of audio recognition method and device
CN108847242A (en) * 2018-05-30 2018-11-20 Oppo广东移动通信有限公司 Control method of electronic device, device, storage medium and electronic equipment
CN108984510A (en) * 2018-09-12 2018-12-11 山西中电科新能源技术有限公司 By voice by the system of data input table
CN109360571A (en) * 2018-10-31 2019-02-19 深圳壹账通智能科技有限公司 Processing method and processing device, storage medium, the computer equipment of credit information
CN109389982A (en) * 2018-12-26 2019-02-26 江苏满运软件科技有限公司 Shipping Information audio recognition method, system, equipment and storage medium
CN109559744A (en) * 2018-12-12 2019-04-02 泰康保险集团股份有限公司 Processing method, device and the readable storage medium storing program for executing of voice data
CN109949008A (en) * 2019-03-20 2019-06-28 钱露露 A kind of Study of intelligent discussion brainstorming conference office system
CN110210014A (en) * 2019-05-31 2019-09-06 贵州精准医疗电子有限公司 Intelligent form system
CN111243598A (en) * 2020-01-10 2020-06-05 中国南方电网有限责任公司超高压输电公司 System and method for automatically filling shift switching information based on ASR (asynchronous receiver/transmitter) voice
CN112069950A (en) * 2020-08-25 2020-12-11 北京字节跳动网络技术有限公司 Method, system, electronic device and medium for extracting hotwords
CN112270922A (en) * 2020-10-20 2021-01-26 云南电网有限责任公司迪庆供电局 Automatic filling method and device for scheduling log
CN112836481A (en) * 2021-03-16 2021-05-25 上海适享文化传播有限公司 Form standardized output method based on voice to character conversion
CN113068058A (en) * 2021-03-19 2021-07-02 安徽宝信信息科技有限公司 Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1492354A (en) * 2000-06-02 2004-04-28 钧 顾 Multilingual information searching method and multilingual information search engine system
CN1512402A (en) * 2002-12-31 2004-07-14 程松林 Sound searching method and video and audio information searching system using said method
US20070156843A1 (en) * 2005-12-30 2007-07-05 Tandberg Telecom As Searchable multimedia stream
CN102867511A (en) * 2011-07-04 2013-01-09 余喆 Method and device for recognizing natural speech
CN104064182A (en) * 2014-06-24 2014-09-24 中国人民财产保险股份有限公司 A voice recognition system and method based on classification rules

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105609104A (en) * 2016-01-22 2016-05-25 北京云知声信息技术有限公司 Information processing method and apparatus, and intelligent voice router controller
CN105786797B (en) * 2016-02-23 2018-09-14 北京云知声信息技术有限公司 Information processing method and device based on voice input
CN105786797A (en) * 2016-02-23 2016-07-20 北京云知声信息技术有限公司 Information processing method and device based on voice input
WO2017143672A1 (en) * 2016-02-23 2017-08-31 北京云知声信息技术有限公司 Information processing method and device based on voice input
CN105957524A (en) * 2016-04-25 2016-09-21 北京云知声信息技术有限公司 Speech processing method and speech processing device
CN106328146A (en) * 2016-08-22 2017-01-11 广东小天才科技有限公司 Video subtitle generation method and apparatus
CN108010523A (en) * 2016-11-02 2018-05-08 松下电器(美国)知识产权公司 Information processing method and recording medium
CN107332907A (en) * 2017-06-30 2017-11-07 江西博瑞彤芸科技有限公司 User information customization method
CN108717851A (en) * 2018-03-28 2018-10-30 深圳市三诺数字科技有限公司 Speech recognition method and device
CN108847242B (en) * 2018-05-30 2021-05-25 Oppo广东移动通信有限公司 Electronic device control method, electronic device control device, storage medium and electronic device
CN108847242A (en) * 2018-05-30 2018-11-20 Oppo广东移动通信有限公司 Electronic device control method and device, storage medium, and electronic device
CN108984510A (en) * 2018-09-12 2018-12-11 山西中电科新能源技术有限公司 System for entering data into a table by voice
CN109360571A (en) * 2018-10-31 2019-02-19 深圳壹账通智能科技有限公司 Credit information processing method and device, storage medium, and computer device
CN109559744A (en) * 2018-12-12 2019-04-02 泰康保险集团股份有限公司 Voice data processing method and device, and readable storage medium
CN109559744B (en) * 2018-12-12 2022-07-08 泰康保险集团股份有限公司 Voice data processing method and device and readable storage medium
CN109389982A (en) * 2018-12-26 2019-02-26 江苏满运软件科技有限公司 Freight information speech recognition method, system, device, and storage medium
CN109949008A (en) * 2019-03-20 2019-06-28 钱露露 Intelligent brainstorming conference office system
CN110210014A (en) * 2019-05-31 2019-09-06 贵州精准医疗电子有限公司 Intelligent form system
CN110210014B (en) * 2019-05-31 2023-05-30 贵州精准医疗电子有限公司 Intelligent form system
CN111243598A (en) * 2020-01-10 2020-06-05 中国南方电网有限责任公司超高压输电公司 System and method for automatically filling in shift handover information based on ASR (automatic speech recognition)
CN112069950A (en) * 2020-08-25 2020-12-11 北京字节跳动网络技术有限公司 Method, system, electronic device and medium for extracting hotwords
CN112069950B (en) * 2020-08-25 2023-04-07 北京字节跳动网络技术有限公司 Method, system, electronic device and medium for extracting hotwords
CN112270922A (en) * 2020-10-20 2021-01-26 云南电网有限责任公司迪庆供电局 Automatic filling method and device for scheduling log
CN112270922B (en) * 2020-10-20 2022-08-02 云南电网有限责任公司迪庆供电局 Automatic filling method and device for scheduling log
CN112836481A (en) * 2021-03-16 2021-05-25 上海适享文化传播有限公司 Standardized form output method based on speech-to-text conversion
CN113068058A (en) * 2021-03-19 2021-07-02 安徽宝信信息科技有限公司 Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology

Similar Documents

Publication Publication Date Title
CN105161104A (en) Voice processing method and device
US10176804B2 (en) Analyzing textual data
CN105489221B (en) Speech recognition method and device
WO2019153522A1 (en) Intelligent interaction method, electronic device, and storage medium
US20180285595A1 (en) Virtual agent for the retrieval and analysis of information
Batliner et al. Whodunnit–searching for the most important feature types signalling emotion-related user states in speech
EP3032532A1 (en) Disambiguating heteronyms in speech synthesis
Nixon et al. Multi-level processing of phonetic variants in speech production and visual word processing: Evidence from Mandarin lexical tones
CN112182252A (en) Intelligent medication question-answering method and device based on medicine knowledge graph
US11853345B2 (en) Automated content generation and delivery
CN116796857A (en) LLM model training method, device, equipment and storage medium thereof
Neustein et al. Mobile speech and advanced natural language solutions
CN103903615B (en) Information processing method and electronic device
Al-Talabani et al. Emotion recognition from speech: tools and challenges
US9747891B1 (en) Name pronunciation recommendation
Mišković et al. Hybrid methodological approach to context-dependent speech recognition
Yue English spoken stress recognition based on natural language processing and endpoint detection algorithm
CN112233648B (en) Data processing method, device, equipment and storage medium combining RPA and AI
CN111782779B (en) Voice question-answering method, system, mobile terminal and storage medium
CN114047900A (en) Service processing method and device, electronic equipment and computer readable storage medium
Gharat et al. Natural language processing theory applications and difficulties
CN110767282A (en) Health record generation method and device and computer readable storage medium
Lengkong et al. The Implementation of Yandex Engine on Live Translator Application for Bahasa and English Using Block Programming MIT App Inventor Mobile Based
Park et al. A Survey of Conversational Agents and Their Applications for Self-Management of Chronic Conditions
US20230260533A1 (en) Automated segmentation of digital presentation data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151216