CN105957524A

CN105957524A - Speech processing method and speech processing device

Info

Publication number: CN105957524A
Application number: CN201610264283.XA
Authority: CN
Inventors: 李霄寒; 田伟
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Priority date: 2016-04-25
Filing date: 2016-04-25
Publication date: 2016-09-21
Anticipated expiration: 2036-04-25
Also published as: CN105957524B

Abstract

The invention discloses a speech processing method and a speech processing device. The method comprises: receiving speech information input by a user, the speech information including first speech information corresponding to a pre-operation entry and second speech information corresponding to entry content; recognizing the speech information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content; determining a first volume value corresponding to the first speech information and a second volume value corresponding to the second speech information; determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value; when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter, searching a candidate entry content database for target entry content matching the second text; and filling the target entry content into the corresponding entry content form. With this technical solution, the success rate and accuracy of semantic analysis are improved while the accuracy of speech processing is maintained, thereby improving the user experience.

Description

Speech processing method and device

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a speech processing method and device.

Background art

In speech processing, semantic understanding depends on the quality of speech recognition. If the recognition result is poor, semantic analysis suffers. For example, with the entry template of Fig. 1, the user wants to make an automatic selection by dictating "executing department radiology"; the selection control is shown in Fig. 2. From the perspective of the semantic analyzer, "executing department" is the pre-operation entry and "radiology" is the entry content. The semantic analyzer can distinguish the two by means of a dictionary template and then convert them into an execution command. However, if any single character in the sentence is recognized poorly, semantic analysis may fail. For example, the recognized text may drop one character of "executing", leaving a truncated form of "executing department radiology". The whole recognition process then fails, the operation corresponding to the user's speech input cannot be performed, and the user experience suffers.

Summary of the invention

Embodiments of the present invention provide a speech processing method and device that improve the success rate and accuracy of semantic analysis while maintaining the accuracy of speech processing, thereby improving the user experience.

According to a first aspect of the embodiments of the present invention, a speech processing method is provided, including:

receiving speech information input by a user, wherein the speech information includes first speech information corresponding to a pre-operation entry and second speech information corresponding to entry content;

recognizing the speech information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;

determining a first volume value corresponding to the first speech information and a second volume value corresponding to the second speech information;

determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;

when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than a preset energy index parameter, searching a candidate entry content database for target entry content matching the second text; and

filling the target entry content into the corresponding entry content form.

In this embodiment, after speech information containing first speech information corresponding to a pre-operation entry and second speech information corresponding to entry content is recognized, the volume values of the two pieces of speech information are determined, and energy index parameters are then determined from those volume values. When the energy index parameter of the second speech information, which corresponds to the entry content, is greater than that of the first speech information and also greater than the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content form. In this way, the user can fill in a form by speech input without manual selection. The user may also speak different words at different volumes, so that different energy index parameters are determined from the differences in volume, and whether to perform the form-filling operation is decided according to the energy index parameters. This avoids the situation in which the form cannot be filled in after a speech recognition error, improves the success rate and accuracy of semantic analysis while maintaining the accuracy of speech processing, and improves both the success rate of filling in forms by speech input and the user experience.

In one embodiment, determining the first energy index parameter corresponding to the first volume value and the second energy index parameter corresponding to the second volume value includes:

obtaining a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals are positively correlated with the energy index parameters;

determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and

determining, according to the correspondence between volume value intervals and energy index parameters, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.

In this embodiment, the correspondence between volume value intervals and energy index parameters may be preset, so that the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval are determined from this correspondence. Specifically, the volume value intervals may be positively correlated with the energy index parameters: the larger the values in a volume value interval, the larger the energy index parameter, and the smaller the values, the smaller the energy index parameter. The larger the energy index parameter, the more likely the text corresponding to the entry content is to be matched preferentially. Adding energy index parameters on top of existing speech recognition technology thus improves the success rate and accuracy of semantic analysis while maintaining the accuracy of speech processing, and also improves the success rate of filling in forms by speech input and the user experience.

In one embodiment, searching the candidate entry content database for the target entry content matching the second text includes:

calculating a first similarity between the second text and each candidate entry content in the candidate entry content database; and

determining the candidate entry content with the highest first similarity as the target entry content.

In this embodiment, the first similarity between the second text and each candidate entry content in the candidate entry content database is calculated, and the candidate entry content with the highest similarity is determined as the target entry content. This maintains the accuracy of speech recognition, avoids the problem that semantic analysis cannot be carried out because of an error in the speech recognition result, and improves the success rate and accuracy of semantic analysis.

In one embodiment, filling the target entry content into the corresponding entry content form includes:

determining a target operation entry corresponding to the target entry content;

calculating a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and

when the second similarity is greater than or equal to a preset similarity, filling the target entry content into the entry content form corresponding to the target operation entry.

In this embodiment, before the target entry content is filled into the corresponding entry content form, the target operation entry corresponding to the target entry content may first be determined, and the similarity between the target operation entry and the first text corresponding to the pre-operation entry may then be calculated. If the similarity between the two is greater than or equal to the preset similarity, the two match, that is, the entry the user intends to operate on is the target operation entry. This further ensures the accuracy of the semantic analysis result.

In one embodiment, before determining the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information, the method further includes:

calculating a confidence level of the speech information;

judging whether the confidence level is less than a preset confidence level; and

when the confidence level is less than the preset confidence level, performing the step of determining the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information.

In this embodiment, the confidence level of the speech information may be calculated first. If the confidence level is greater than or equal to the preset confidence level, the confidence is high and semantic analysis can succeed, so a speech analysis scheme from the existing related art may be used for semantic analysis. If the confidence level is less than the preset confidence level, the confidence is low and semantic analysis may fail; in that case, the scheme of the present invention, which performs semantic analysis according to differences in volume, may be used.

According to a second aspect of the embodiments of the present invention, a speech processing device is provided, including:

a receiving module, configured to receive speech information input by a user, wherein the speech information includes first speech information corresponding to a pre-operation entry and second speech information corresponding to entry content;

a recognition module, configured to recognize the speech information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;

a first determining module, configured to determine a first volume value corresponding to the first speech information and a second volume value corresponding to the second speech information;

a second determining module, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;

a lookup module, configured to search a candidate entry content database for target entry content matching the second text when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than a preset energy index parameter; and

a filling module, configured to fill the target entry content into the corresponding entry content form.

In one embodiment, the second determining module includes:

an obtaining submodule, configured to obtain a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals are positively correlated with the energy index parameters;

a first determining submodule, configured to determine a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and

a second determining submodule, configured to determine, according to the correspondence between volume value intervals and energy index parameters, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.

In one embodiment, the lookup module includes:

a first calculating submodule, configured to calculate a first similarity between the second text and each candidate entry content in the candidate entry content database; and

a content determining submodule, configured to determine the candidate entry content with the highest first similarity as the target entry content.

In one embodiment, the filling module includes:

an entry determining submodule, configured to determine a target operation entry corresponding to the target entry content;

a second calculating submodule, configured to calculate a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and

a filling submodule, configured to fill the target entry content into the entry content form corresponding to the target operation entry when the second similarity is greater than or equal to a preset similarity.

In one embodiment, the device further includes:

a calculating module, configured to calculate a confidence level of the speech information before the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information are determined;

a judging module, configured to judge whether the confidence level is less than a preset confidence level; and

a triggering module, configured to trigger, when the confidence level is less than the preset confidence level, the second determining module to determine the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present invention.

Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.

The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present invention and, together with the description, serve to explain the principles of the invention.

Fig. 1 is a schematic diagram of an entry template in the related art.

Fig. 2 is a schematic diagram of entry content options in the related art.

Fig. 3 is a flowchart of a speech processing method according to an exemplary embodiment.

Fig. 4 is a flowchart of step S304 in a speech processing method according to an exemplary embodiment.

Fig. 5 is a flowchart of step S305 in a speech processing method according to an exemplary embodiment.

Fig. 6 is a flowchart of step S306 in a speech processing method according to an exemplary embodiment.

Fig. 7 is a flowchart of another speech processing method according to an exemplary embodiment.

Fig. 8 is a block diagram of a speech processing device according to an exemplary embodiment.

Fig. 9 is a block diagram of the second determining module in a speech processing device according to an exemplary embodiment.

Fig. 10 is a block diagram of the lookup module in a speech processing device according to an exemplary embodiment.

Fig. 11 is a block diagram of the filling module in a speech processing device according to an exemplary embodiment.

Fig. 12 is a block diagram of another speech processing device according to an exemplary embodiment.

Detailed description of the embodiments

Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

Fig. 3 is a flowchart of a speech processing method according to an exemplary embodiment. The speech processing method is applied in a terminal device, which may be any device with a voice control function, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant. As shown in Fig. 3, the method includes steps S301 to S306:

In step S301, speech information input by a user is received, wherein the speech information includes first speech information corresponding to a pre-operation entry and second speech information corresponding to entry content;

Some forms contain both operation entries and entry contents. For example, a school report card includes name, gender, score, and so on, all of which are operation entries, while the specific values "Zhang San", "female", and "90 points" are the corresponding entry contents. As another example, the examination category, executing department, and execution time in Fig. 1 are operation entries, while the corresponding "plain film", "radiology", and "March 8, 2011" are entry contents. When the user wants to operate on an operation entry by voice, the user speaks that operation entry, and that operation entry is the pre-operation entry.

In step S302, the speech information is recognized to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;

In step S303, a first volume value corresponding to the first speech information and a second volume value corresponding to the second speech information are determined;

To facilitate semantic analysis, the user may speak different words at different volumes and thereby emphasize the key meaning. For example, when the user's utterance is recognized with a character missing from "executing department" but the three characters of "radiology" are spoken more loudly, this indicates that "radiology" is the focus of the semantic analysis.

In step S304, a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value are determined;

Different volume values may correspond to different energy index parameters, so that whether to perform form filling is determined according to the energy index parameters.

In step S305, when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than a preset energy index parameter, a candidate entry content database is searched for target entry content matching the second text;

In step S306, the target entry content is filled into the corresponding entry content form.
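As a non-limiting illustration, the condition tested in step S305 can be sketched as follows. The function name and the example threshold are hypothetical; the description does not specify a value for the preset energy index parameter.

```python
def should_match_entry_content(first_energy: int,
                               second_energy: int,
                               preset_energy: int) -> bool:
    """Return True when the lookup of step S305 should be performed.

    Both conditions must hold: the entry content was spoken with a higher
    energy index parameter than the pre-operation entry, and that parameter
    also exceeds the preset energy index parameter.
    """
    return second_energy > first_energy and second_energy > preset_energy


# Hypothetical values: preset threshold 2, pre-operation entry at 2,
# entry content at 3.
proceed = should_match_entry_content(first_energy=2, second_energy=3,
                                     preset_energy=2)
```

Both comparisons are strict, so entry content spoken at the same energy index parameter as the pre-operation entry does not trigger the lookup.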

In one embodiment, as shown in Fig. 4, step S304 includes steps S401 to S403:

In step S401, a correspondence between volume value intervals and energy index parameters is obtained, wherein the volume value intervals are positively correlated with the energy index parameters;

The correspondence between volume value intervals and energy index parameters may be preset, so that the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval are determined from this correspondence. Specifically, the volume value intervals may be positively correlated with the energy index parameters: the larger the values in a volume value interval, the larger the energy index parameter, and the smaller the values, the smaller the energy index parameter.

For example, the volume value is characterized by a decibel value. To improve the success rate of filling in forms by user speech, the energy index parameter may be set higher for higher speech decibel values. In this example, the correspondence between volume value intervals and energy index parameters is shown in Table 1.

Table 1

Decibel value	Energy index parameter
0～20	1
21～30	2
31～60	3
61～80	4

In step S402, a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs are determined;

In step S403, according to the correspondence between volume value intervals and energy index parameters, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval are determined.
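Steps S401 to S403 amount to a table lookup, which can be sketched as follows using the intervals of Table 1. The names are hypothetical, and rounding to whole decibels and returning `None` for values outside the table are assumptions, since the description does not address either case.

```python
from typing import Optional

# Table 1 as (lowest decibel value, highest decibel value, energy index parameter).
VOLUME_INTERVALS = [
    (0, 20, 1),
    (21, 30, 2),
    (31, 60, 3),
    (61, 80, 4),
]


def energy_index(volume_db: float) -> Optional[int]:
    """Map a volume value in decibels to its energy index parameter."""
    level = round(volume_db)  # assumption: intervals are defined on whole decibels
    for low, high, index in VOLUME_INTERVALS:
        if low <= level <= high:  # S402: find the interval the value belongs to
            return index          # S403: read off the corresponding parameter
    return None  # assumption: values outside Table 1 have no parameter


first_energy = energy_index(25)   # pre-operation entry spoken at 25 dB
second_energy = energy_index(65)  # entry content spoken at 65 dB
```

Because the intervals are ordered and the parameters increase with them, the positive correlation required in step S401 holds by construction.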

In this embodiment, the larger the energy index parameter, the more likely the text corresponding to the entry content is to be matched preferentially. Adding energy index parameters on top of existing speech recognition technology thus improves the success rate and accuracy of semantic analysis while maintaining the accuracy of speech processing, and also improves the success rate of filling in forms by speech input and the user experience.

In one embodiment, as shown in Fig. 5, step S305 may include steps S501 and S502:

In step S501, a first similarity between the second text and each candidate entry content in the candidate entry content database is calculated;

The candidate entry content database holds candidate entries. For a medical database, for example, the candidate entry contents may include executing departments, such as radiology, imaging, general medicine, gastroenterology, and endocrinology, and may also include doctors, such as Zhang San, Li Si, and Wang Er. For other kinds of database, such as a student performance database, the candidate entry contents may include subjects, such as politics, history, and geography.

In step S502, the candidate entry content with the highest first similarity is determined as the target entry content.
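Steps S501 and S502 can be sketched as follows. The description does not state how the first similarity is computed; this sketch assumes the ratio returned by `difflib.SequenceMatcher` from the Python standard library, and the candidate list and the misrecognized input are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: SequenceMatcher ratio in the range 0 to 1."""
    return SequenceMatcher(None, a, b).ratio()


def find_target_entry_content(second_text: str,
                              candidates: Iterable[str]) -> Tuple[str, float]:
    """Return the candidate with the highest first similarity and its score."""
    pool = list(candidates)
    if not pool:
        raise ValueError("candidate entry content database is empty")
    # S501: score every candidate; S502: keep the highest-scoring one.
    best = max(pool, key=lambda candidate: similarity(second_text, candidate))
    return best, similarity(second_text, best)


# Hypothetical medical database and a second text with one character dropped.
CANDIDATES = ["radiology", "imaging", "general medicine",
              "gastroenterology", "endocrinology"]
target, score = find_target_entry_content("radiolgy", CANDIDATES)
```

Choosing the closest candidate rather than requiring an exact match is what lets the method tolerate a recognition error in the entry content.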

In one embodiment, as shown in Fig. 6, step S306 includes steps S601 to S603:

In step S601, a target operation entry corresponding to the target entry content is determined;

In the candidate entry content database, each candidate entry content should be stored in correspondence with the operation entry to which it belongs; the target operation entry can therefore be determined from the target entry content.

In step S602, a second similarity between the target operation entry and the first text corresponding to the pre-operation entry is calculated;

In step S603, when the second similarity is greater than or equal to a preset similarity, the target entry content is filled into the entry content form corresponding to the target operation entry.
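Steps S601 to S603 can be sketched as follows. The mapping, the threshold of 0.6, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration; the description leaves the similarity measure and the preset similarity unspecified.

```python
from difflib import SequenceMatcher
from typing import Dict

# Hypothetical database: each entry content is stored with its operation entry.
CONTENT_TO_OPERATION: Dict[str, str] = {
    "radiology": "executing department",
    "plain film": "examination category",
}

PRESET_SIMILARITY = 0.6  # hypothetical threshold


def fill_form(form: Dict[str, str], target_content: str, first_text: str) -> bool:
    """Fill target_content into form if first_text matches its operation entry."""
    operation = CONTENT_TO_OPERATION.get(target_content)  # S601
    if operation is None:
        return False
    second_similarity = SequenceMatcher(None, operation, first_text).ratio()  # S602
    if second_similarity >= PRESET_SIMILARITY:  # S603
        form[operation] = target_content
        return True
    return False


form: Dict[str, str] = {}
# The first text was recognized with one character missing.
filled = fill_form(form, "radiology", "executing deparment")
```

The check guards against filling a field the user did not name: the recognized pre-operation entry only has to be close to the operation entry that owns the matched content, not identical to it.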

In one embodiment, as shown in Fig. 7, before the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information are determined, the method further includes steps S701 to S703:

In step S701, a confidence level of the speech information is calculated;

The confidence level takes a value in the range 0 to 1. Because the confidence level is used to assess the reliability of the speech recognition result, a higher confidence level indicates a more accurate recognition result.

In step S702, it is judged whether the confidence level is less than a preset confidence level; the preset confidence threshold takes a value in the range 0 to 1.

In step S703, when the confidence level is less than the preset confidence level, the step of determining the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information is performed.
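The branching of steps S701 to S703 can be sketched as follows. The function name, the returned labels, and the default threshold of 0.8 are hypothetical; the description only requires that the preset confidence level lie between 0 and 1.

```python
def choose_analysis_path(confidence: float, preset_confidence: float = 0.8) -> str:
    """Select the semantic analysis scheme from the recognition confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the range 0 to 1")
    if confidence < preset_confidence:
        # S703: low confidence, continue with the volume-based steps.
        return "volume_based"
    # Confidence is high enough for an existing speech analysis scheme.
    return "conventional"


path = choose_analysis_path(0.55)
```

A confidence level exactly equal to the preset value takes the conventional path, matching the "greater than or equal to" wording of the description.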

The following are device embodiments of the present invention, which may be used to perform the method embodiments of the present invention.

Fig. 8 is a block diagram of a speech processing device according to an exemplary embodiment. The device may be implemented as part or all of a terminal device by software, hardware, or a combination of both. As shown in Fig. 8, the speech processing device includes:

a receiving module 81, configured to receive speech information input by a user, wherein the speech information includes first speech information corresponding to a pre-operation entry and second speech information corresponding to entry content;

a recognition module 82, configured to recognize the speech information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;

a first determining module 83, configured to determine a first volume value corresponding to the first speech information and a second volume value corresponding to the second speech information;

a second determining module 84, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;

a lookup module 85, configured to search a candidate entry content database for target entry content matching the second text when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than a preset energy index parameter; and

a filling module 86, configured to fill the target entry content into the corresponding entry content form.

In one embodiment, as shown in Fig. 9, the second determining module 84 includes:

an obtaining submodule 91, configured to obtain a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals are positively correlated with the energy index parameters;

a first determining submodule 92, configured to determine a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and

a second determining submodule 93, configured to determine, according to the correspondence between volume value intervals and energy index parameters, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.

As shown in Fig. 10, in one embodiment, the lookup module 85 includes:

a first calculating submodule 101, configured to calculate a first similarity between the second text and each candidate entry content in the candidate entry content database; and

a content determining submodule 102, configured to determine the candidate entry content with the highest first similarity as the target entry content.

As shown in Fig. 11, in one embodiment, the filling module 86 includes:

an entry determining submodule 111, configured to determine a target operation entry corresponding to the target entry content;

a second calculating submodule 112, configured to calculate a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and

a filling submodule 113, configured to fill the target entry content into the entry content form corresponding to the target operation entry when the second similarity is greater than or equal to a preset similarity.

As shown in Fig. 12, in one embodiment, the device further includes:

a calculating module 121, configured to calculate a confidence level of the speech information before the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information are determined;

a judging module 122, configured to judge whether the confidence level is less than a preset confidence level; and

a triggering module 123, configured to trigger, when the confidence level is less than the preset confidence level, the second determining module to determine the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information.
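As a non-limiting illustration of how modules 81 to 86 might cooperate in software, the following sketch wires them together as injected callables. The class, its field names, the stub lambdas, and the threshold are all hypothetical. The filling module is simplified here to write under the recognized first text; the submodules of Fig. 11 and the modules of Fig. 12 are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class SpeechProcessingDevice:
    """Hypothetical composition of modules 82 to 86 as injected callables."""
    recognize: Callable[[bytes], Tuple[str, str]]            # recognition module 82
    measure_volumes: Callable[[bytes], Tuple[float, float]]  # first determining module 83
    to_energy_index: Callable[[float], int]                  # second determining module 84
    lookup: Callable[[str], Optional[str]]                   # lookup module 85
    fill: Callable[[str, str], None]                         # filling module 86
    preset_energy_index: int = 2                             # hypothetical threshold

    def process(self, speech: bytes) -> bool:
        """Handle received speech (receiving module 81); return True if a field was filled."""
        first_text, second_text = self.recognize(speech)
        first_volume, second_volume = self.measure_volumes(speech)
        first_energy = self.to_energy_index(first_volume)
        second_energy = self.to_energy_index(second_volume)
        if not (second_energy > first_energy
                and second_energy > self.preset_energy_index):
            return False
        target = self.lookup(second_text)
        if target is None:
            return False
        self.fill(first_text, target)
        return True


# Stub modules standing in for real recognition and measurement.
form: Dict[str, str] = {}
device = SpeechProcessingDevice(
    recognize=lambda speech: ("executing department", "radiology"),
    measure_volumes=lambda speech: (25.0, 65.0),
    to_energy_index=lambda db: 4 if db > 60 else 2,
    lookup=lambda text: text,
    fill=lambda entry, content: form.__setitem__(entry, content),
)
filled = device.process(b"")
```

Passing the modules in as callables mirrors the statement that the device may be realized in software, hardware, or a combination of both, since each callable can wrap either kind of implementation.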

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims

1. A speech processing method, characterized by comprising:

2. The method according to claim 1, characterized in that determining the first energy index parameter corresponding to the first volume value and the second energy index parameter corresponding to the second volume value comprises:

3. The method according to claim 1, characterized in that searching the to-be-selected item content database for the target item content matched with the second text comprises:

4. The method according to claim 1, characterized in that filling the target item content into the corresponding item content table comprises:

5. The method according to any one of claims 1 to 4, characterized in that, before determining the first volume value corresponding to the first speech information and the second volume value corresponding to the second speech information, the method further comprises:

calculating the confidence of the speech information;

6. A speech processing device, characterized by comprising:

7. The device according to claim 6, characterized in that the second determination module comprises:

8. The device according to claim 6, characterized in that the search module comprises:

9. The device according to claim 6, characterized in that the filling module comprises:

10. The device according to any one of claims 6 to 9, characterized in that the device further comprises: