CN104575503A

CN104575503A - Speech recognition method and device

Info

Publication number: CN104575503A
Application number: CN201510024583.6A
Authority: CN
Inventors: 何伟旭
Original assignee: Guangdong Midea Refrigeration Equipment Co Ltd
Current assignee: GD Midea Air Conditioning Equipment Co Ltd
Priority date: 2015-01-16
Filing date: 2015-01-16
Publication date: 2015-04-29
Anticipated expiration: 2035-01-16
Also published as: CN104575503B

Abstract

The invention discloses a speech recognition method comprising the following steps: when a speech signal is received and the current network state is normal, the received speech signal is sent to a preset recognition server for recognition, and the recognition result fed back by the recognition server is obtained; local recognition is performed on the received speech signal, and a recognition result is obtained; the recognition accuracy corresponding to each recognition result is calculated; and the recognition result fed back by the recognition server with the highest recognition accuracy is used as the recognition result of the speech signal. The invention further discloses a speech recognition device. The method and device preserve speech recognition accuracy while improving the flexibility of speech recognition.

Description

Speech recognition method and device

Technical field

The present invention relates to the field of voice control, and in particular to a speech recognition method and device.

Background

Traditional household appliances are operated through remote controls and panel buttons. Speech recognition technology allows machines to be controlled by voice, so controlling a terminal by voice has become a new mode of human-machine interaction. Terminal speech recognition works in two ways. The first is offline recognition: the recognition engine is embedded in the CPU of the local terminal, and the local CPU runs the recognition algorithm. The second is cloud recognition: the recognition engine runs on a cloud server, the terminal device connects to the cloud server over a network and sends it the received audio signal, and the cloud server returns the recognition result to the terminal; this is called online recognition. Offline recognition involves no network transmission and therefore responds quickly, but because the device CPU has limited processing power, the recognition engine's algorithm is relatively simple and recognition accuracy is only moderate. An online recognition engine runs on a cloud server and can use the server's greater processing power to run more complex recognition algorithms, which greatly improves recognition accuracy, but it depends on network quality, which is a significant limitation.

Summary of the invention

The main purpose of the present invention is to provide a speech recognition method that improves the flexibility of speech recognition while ensuring its accuracy.

To achieve the above purpose, the present invention provides a speech recognition method comprising the following steps:

When a speech signal is received and the current network state is normal, sending the received speech signal to a preset recognition server for recognition, and obtaining the recognition result fed back by the recognition server;

Performing local recognition on the received speech signal, and obtaining a recognition result;

Calculating the recognition accuracy corresponding to each recognition result;

Taking the recognition result fed back by the recognition server with the highest recognition accuracy as the recognition result of the speech signal.

Preferably, the speech recognition method further comprises:

When the current network state is abnormal, performing local recognition on the received speech signal, and obtaining the recognition result corresponding to the speech signal;

Taking that recognition result as the recognition result of the speech signal.

Preferably, the step of calculating the recognition accuracy corresponding to each recognition result comprises:

Determining, based on the recognition results, the recognition parameters corresponding to each server and to local recognition;

Retrieving the recognition accuracy, prestored in the terminal, corresponding to each server and to local recognition;

Computing a weighted average of the recognition parameters and the prestored recognition accuracy according to preset weights, to determine the recognition accuracy corresponding to each recognition result.

Preferably, the recognition parameters comprise the response time of the server and of the local terminal, the recognition result matching degree, and a performance parameter.

Preferably, after the step of taking the recognition result fed back by the recognition server with the highest recognition accuracy as the recognition result of the speech signal, the speech recognition method further comprises:

Increasing the recognition accuracy of the server or local terminal corresponding to that recognition result by a preset value.

In addition, to achieve the above purpose, the present invention also provides a speech recognition device comprising:

A remote recognition module, configured to, when a speech signal is received and the current network state is normal, send the received speech signal to a preset recognition server for recognition and obtain the recognition result fed back by the recognition server;

A local recognition module, configured to perform local recognition on the received speech signal and obtain a recognition result;

An accuracy calculation module, configured to calculate the recognition accuracy corresponding to each recognition result;

A processing module, configured to take the recognition result fed back by the recognition server with the highest recognition accuracy as the recognition result of the speech signal.

Preferably, the local recognition module is further configured to, when the current network state is abnormal, perform local recognition on the received speech signal and obtain the recognition result corresponding to the speech signal; the processing module is further configured to take that recognition result as the recognition result of the speech signal.

Preferably, the accuracy calculation module comprises:

A determining unit, configured to determine, based on the recognition results, the recognition parameters corresponding to each server and to local recognition;

A calling unit, configured to retrieve the recognition accuracy, prestored in the terminal, corresponding to each server and to local recognition;

The determining unit is further configured to compute a weighted average of the recognition parameters and the prestored recognition accuracy according to preset weights, to determine the recognition accuracy corresponding to each recognition result.

Preferably, the speech recognition device further comprises an adjusting module, configured to increase the recognition accuracy of the server or local terminal corresponding to the recognition result by a preset value.

With the speech recognition method and device provided by the present invention, when a speech signal is received and the current network state is normal, the received speech signal is sent to a preset recognition server for recognition, and the recognition result fed back by the recognition server is obtained; local recognition is performed on the received speech signal, and a recognition result is obtained; the recognition accuracy corresponding to each recognition result is calculated; and the recognition result fed back by the recognition server with the highest recognition accuracy is taken as the recognition result of the speech signal, which improves both the accuracy and the flexibility of speech signal recognition.

Brief description of the drawings

Fig. 1 is a flowchart of a preferred embodiment of the speech recognition method of the present invention;

Fig. 2 is a detailed flowchart of step S30 in Fig. 1;

Fig. 3 is a functional block diagram of a preferred embodiment of the speech recognition device of the present invention;

Fig. 4 is a detailed functional block diagram of the accuracy calculation module in Fig. 3.

The realization of the purpose, the functional features, and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.

Detailed description

It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.

The present invention provides a speech recognition method.

Refer to Fig. 1, which is a flowchart of a preferred embodiment of the speech recognition method of the present invention.

This embodiment provides a speech recognition method comprising the following steps:

Step S10: when a speech signal is received and the current network state is normal, send the received speech signal to a preset recognition server for recognition, and obtain the recognition result fed back by the recognition server;

In this embodiment, the current network state can be checked when a speech signal is received: the network state is judged normal if the terminal is connected to a network and can communicate normally. The recognition server recognizes the received speech signal as follows: it obtains the waveform corresponding to the received speech signal, determines the control code corresponding to that waveform, and uses that control code as the recognition result of the speech signal.
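The waveform-to-control-code lookup described above can be illustrated with a minimal sketch. The source does not say how a waveform is matched to a control code, so the cosine-similarity comparison, the `templates` table, and the function names below are all assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length sample sequences (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize_waveform(waveform, templates):
    """Return (control_code, similarity) for the best-matching reference waveform.

    `templates` is a hypothetical table mapping control codes to reference waveforms.
    """
    scored = {code: cosine_similarity(waveform, ref) for code, ref in templates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

The returned similarity could also serve as one possible form of the recognition result matching degree discussed later, although the source leaves that calculation to the prior art.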

Those skilled in the art will understand that, to improve the accuracy of speech signal recognition, the received speech signal can be denoised and filtered before the processed signal is sent to the server. The recognition servers may be servers from different vendors, or servers from the same vendor deployed at different locations.

Step S20: perform local recognition on the received speech signal, and obtain a recognition result;

In this embodiment, the local recognition process is similar to the server's recognition process and is not repeated here. Those skilled in the art will understand that steps S10 and S20 need not be performed in any particular order.
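Because steps S10 and S20 have no required order, one option is to run them concurrently. The sketch below assumes a thread pool; `recognize_remote` and `recognize_local` are hypothetical stand-ins, not functions defined by the source.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_remote(signal: bytes) -> str:
    # Hypothetical stand-in for step S10 (send to the preset recognition server).
    return "remote:" + signal.decode()

def recognize_local(signal: bytes) -> str:
    # Hypothetical stand-in for step S20 (on-device recognition).
    return "local:" + signal.decode()

def recognize_both(signal: bytes) -> dict:
    """Run S10 and S20 in parallel and return both results keyed by source."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        remote = pool.submit(recognize_remote, signal)
        local = pool.submit(recognize_local, signal)
        return {"server": remote.result(), "local": local.result()}
```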

Step S30: calculate the recognition accuracy corresponding to each recognition result;

In this embodiment, the detailed process of calculating the recognition accuracy corresponding to each recognition result is shown in Fig. 2. Step S30 comprises:

Step S31: determine, based on the recognition results, the recognition parameters corresponding to each server and to local recognition;

Step S33: retrieve the recognition accuracy, prestored in the terminal, corresponding to each server and to local recognition;

Step S34: compute a weighted average of the recognition parameters and the prestored recognition accuracy according to preset weights, to determine the recognition accuracy corresponding to each recognition result.
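The weighted average in step S34 might look like the following sketch. The source does not spell out how the prestored accuracy is combined with the recognition parameters, so treating it as one more weighted term, dividing by the total weight, and expressing every input as a score where higher is better (a response time would first have to be converted into such a score) are all assumptions.

```python
def combined_accuracy(params: dict, stored_accuracy: float, weights: dict) -> float:
    """Weighted average of recognition parameters and the prestored accuracy.

    `params` maps parameter names to scores; `weights` must contain a weight for
    every parameter name plus the key "stored_accuracy" (hypothetical naming).
    """
    values = dict(params, stored_accuracy=stored_accuracy)
    total_weight = sum(weights[name] for name in values)
    return sum(weights[name] * values[name] for name in values) / total_weight
```

For example, with weights 0.2, 0.4, 0.1, and 0.3 for response, match, performance, and stored accuracy, scores of 0.5, 1.0, 0.0, and 0.5 combine to 0.65.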

The recognition parameters comprise the response time of the server and of the local terminal, the recognition result matching degree, and a performance parameter; alternatively, they may comprise only the response time and the recognition result matching degree. Those skilled in the art will understand that the recognition parameters may also include other parameters beyond these three, and that features may be added to or removed from the recognition parameters according to the accuracy required of the final recognition result. The response time, recognition result matching degree, and performance parameter are calculated as follows:

The response time is calculated as follows. When the received speech signal is about to be sent to the recognition server, the current time is recorded as the first time point and the signal is sent. When the recognition result fed back by the recognition server is received, the current time is recorded as the second time point, and the difference between the first and second time points is taken as the server response time. The local response time is calculated in the same way: the first time point is recorded when the received speech signal is passed to the recognition module, the second time point is recorded when the recognition result is obtained, and the difference between the two is taken as the local response time.

The recognition result matching degree can be the degree of match between the waveform corresponding to the recognition result and the waveform corresponding to the received speech signal. Calculating the degree of match between waveforms is prior art and is not described here. The matching degree can be included in the recognition result fed back by the server or by the recognition module in the terminal.

The performance parameter can be calculated from the MIPS (Million Instructions Per Second, the average execution speed of single-word fixed-point instructions), memory size, and similar figures of the server or terminal, and can also be included in the recognition result.

The weights of the response time, recognition result matching degree, and performance parameter can be set by the developer as needed, and the weights of the parameters sum to 1.
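The two-time-point measurement described above can be sketched as a small wrapper. Using `time.perf_counter` as the clock and the name `timed_recognition` are assumptions; the source only requires recording a first and a second time point.

```python
import time

def timed_recognition(recognize, signal):
    """Call a recognizer and return (result, response_time_in_seconds)."""
    first = time.perf_counter()    # first time point: just before handing off the signal
    result = recognize(signal)
    second = time.perf_counter()   # second time point: recognition result received
    return result, second - first
```

The same wrapper applies to either path: pass the function that sends the signal to the server, or the local recognition function.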

Those skilled in the art will understand that, because a recognition server may fail, recognition parameters are determined only for recognition results that are received within a preset time interval.
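One way to honor the preset time interval is to keep only the results that arrive before a timeout. This is a sketch under assumptions: the thread-pool approach, the function name `collect_results`, and the two stub recognizers are illustrative and not taken from the source.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def collect_results(recognizers, signal, timeout_s):
    """Run every recognizer and keep only results that arrive within timeout_s seconds."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(recognizers)))
    futures = {pool.submit(fn, signal): name for name, fn in recognizers.items()}
    done, _late = wait(futures, timeout=timeout_s)
    pool.shutdown(wait=False)  # do not block on recognizers that are still running
    return {futures[f]: f.result() for f in done if f.exception() is None}

def slow_server(signal):
    # Hypothetical stub: a server that answers too late.
    time.sleep(1.0)
    return "POWER_OFF"

def fast_local(signal):
    # Hypothetical stub: on-device recognition that answers immediately.
    return "POWER_ON"
```

Calling `collect_results({"server": slow_server, "local": fast_local}, b"audio", 0.2)` keeps only the local result, so a failed or slow server simply drops out of the accuracy comparison.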

Step S40: take the recognition result fed back by the recognition server with the highest recognition accuracy as the recognition result of the speech signal.

The accuracies of the recognition results can be sorted, and the result with the highest accuracy is extracted as the final recognition result. Those skilled in the art will understand that, to improve the accuracy of speech recognition, the following step is also performed after step S40: increase the recognition accuracy of the server or local terminal corresponding to that recognition result by a preset value. This preset value is preferably 1.
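Selection and the follow-up adjustment can be sketched together. The dictionary layout and the name `select_and_update` are assumptions; the default increment of 1 follows the preferred preset value stated above.

```python
def select_and_update(scores: dict, results: dict, stored_accuracy: dict, preset: float = 1.0):
    """Pick the source with the highest computed accuracy and bump its stored accuracy.

    `scores`, `results`, and `stored_accuracy` are all keyed by source name
    (for example "server" or "local"); this keying is a hypothetical choice.
    """
    best = max(scores, key=scores.get)
    stored_accuracy[best] = stored_accuracy.get(best, 0.0) + preset
    return best, results[best]
```

Only the winning source's stored accuracy changes, so sources that win repeatedly accumulate a higher prior in later weighted averages.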

To improve the flexibility of speech signal recognition, the speech recognition method further comprises the following steps: when the current network state is abnormal, perform local recognition on the received speech signal and obtain the recognition result corresponding to the speech signal; take that recognition result as the recognition result of the speech signal. In this embodiment, when the network is abnormal, the stored recognition accuracy of the local terminal is left unchanged. A network abnormality means that the terminal is not connected to a network, or is connected but cannot communicate normally.
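The fallback to local recognition can be sketched as a single entry point. The function name, the callable parameters, and the `score(source, result)` signature are assumptions; the sketch only shows the branching and leaves the stored accuracies untouched on the abnormal-network path, as the embodiment describes.

```python
def recognize_speech(signal, network_ok, recognize_local, recognize_remote, score):
    """Return (source, result), using only local recognition when the network is abnormal."""
    local_result = recognize_local(signal)
    if not network_ok:
        # Abnormal network: the local result is final and no stored accuracy is updated.
        return "local", local_result
    candidates = {"local": local_result, "server": recognize_remote(signal)}
    best = max(candidates, key=lambda source: score(source, candidates[source]))
    return best, candidates[best]
```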

To improve the accuracy of speech recognition, the speech recognition method further comprises: obtaining the terminal's current network state and, based on a mapping between network states and prompt information, outputting the prompt information corresponding to the current network state. The prompt information is preferably a light signal, implemented with an indicator light.
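The mapping between network state and prompt can be as simple as a lookup table. The three states follow the normal and abnormal cases described in this embodiment; the enum, the table name, and the specific light behaviours are hypothetical.

```python
from enum import Enum

class NetworkState(Enum):
    NORMAL = "normal"
    NOT_CONNECTED = "not connected"
    NO_COMMUNICATION = "connected, cannot communicate"

# Hypothetical mapping from network state to indicator-light behaviour.
PROMPT_BY_STATE = {
    NetworkState.NORMAL: "steady green",
    NetworkState.NOT_CONNECTED: "steady red",
    NetworkState.NO_COMMUNICATION: "blinking red",
}

def prompt_for(state: NetworkState) -> str:
    """Return the prompt (light behaviour) that corresponds to a network state."""
    return PROMPT_BY_STATE[state]
```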

With the speech recognition method of this embodiment, when a speech signal is received and the current network state is normal, the received speech signal is sent to a preset recognition server for recognition, and the recognition result fed back by the recognition server is obtained; local recognition is performed on the received speech signal, and a recognition result is obtained; the recognition accuracy corresponding to each recognition result is calculated; and the recognition result fed back by the recognition server with the highest recognition accuracy is taken as the recognition result of the speech signal, which improves the accuracy of speech signal recognition.

The present invention further provides a speech recognition device.

Refer to Fig. 3, which is a functional block diagram of a preferred embodiment of the speech recognition device of the present invention.

It should be emphasized that, to those skilled in the art, the functional block diagram shown in Fig. 3 is only an example of a preferred embodiment, and new functional modules can easily be added around the functional modules of the speech recognition device shown in Fig. 3. The names of the functional modules are arbitrary labels, used only to help in understanding the program function blocks of the speech recognition device, and do not limit the technical solution of the present invention. The core of the technical solution lies in the functions that the named functional modules are to achieve.

This embodiment provides a speech recognition device comprising:

A remote recognition module 10, configured to, when a speech signal is received and the current network state is normal, send the received speech signal to a preset recognition server for recognition and obtain the recognition result fed back by the recognition server;

A local recognition module 20, configured to perform local recognition on the received speech signal and obtain a recognition result;

In this embodiment, the local recognition process is similar to the server's recognition process and is not repeated here.

An accuracy calculation module 30, configured to calculate the recognition accuracy corresponding to each recognition result;

In this embodiment, the detailed process of calculating the recognition accuracy corresponding to each recognition result is shown in Fig. 4. The accuracy calculation module 30 comprises:

A determining unit 31, configured to determine, based on the recognition results, the recognition parameters corresponding to each server and to local recognition;

A calling unit 32, configured to retrieve the recognition accuracy, prestored in the terminal, corresponding to each server and to local recognition;

The determining unit 31 is further configured to compute a weighted average of the recognition parameters and the prestored recognition accuracy according to preset weights, to determine the recognition accuracy corresponding to each recognition result.

A processing module 40, configured to take the recognition result fed back by the recognition server with the highest recognition accuracy as the recognition result of the speech signal.

The accuracies of the recognition results can be sorted, and the result with the highest accuracy is extracted as the final recognition result. Those skilled in the art will understand that, to improve the accuracy of speech recognition, the speech recognition device further comprises an adjusting module, configured to increase the recognition accuracy of the server or local terminal corresponding to the recognition result by a preset value. This preset value is preferably 1.

To improve the accuracy of speech recognition, the speech recognition device further comprises: an acquisition module, configured to obtain the terminal's current network state; and an output module, configured to output, based on a mapping between network states and prompt information, the prompt information corresponding to the current network state. The prompt information is preferably a light signal, implemented with an indicator light.

With the speech recognition device of this embodiment, when a speech signal is received and the current network state is normal, the received speech signal is sent to a preset recognition server for recognition, and the recognition result fed back by the recognition server is obtained; local recognition is performed on the received speech signal, and a recognition result is obtained; the recognition accuracy corresponding to each recognition result is calculated; and the recognition result fed back by the recognition server with the highest recognition accuracy is taken as the recognition result of the speech signal, which improves the accuracy of speech signal recognition.

It should be noted that, in this document, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that comprises it.

The serial numbers of the above embodiments are for description only and do not indicate the relative merit of the embodiments.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. This computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not limit the scope of its claims. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A speech recognition method, characterized in that the speech recognition method comprises the following steps:

2. The speech recognition method as claimed in claim 1, characterized in that the speech recognition method further comprises:

3. The speech recognition method as claimed in claim 1 or 2, characterized in that the step of calculating the recognition accuracy corresponding to each recognition result comprises:

4. The speech recognition method as claimed in claim 3, characterized in that the recognition parameters comprise the response time of the server and of the local terminal, the recognition result matching degree, and a performance parameter.

5. The speech recognition method as claimed in claim 4, characterized in that, after the step of taking the recognition result fed back by the recognition server with the highest recognition accuracy as the recognition result of the speech signal, the speech recognition method further comprises:

6. A speech recognition device, characterized in that the speech recognition device comprises:

7. The speech recognition device as claimed in claim 6, characterized in that the local recognition module is further configured to, when the current network state is abnormal, perform local recognition on the received speech signal and obtain the recognition result corresponding to the speech signal; the processing module is further configured to take that recognition result as the recognition result of the speech signal.

8. The speech recognition device as claimed in claim 6 or 7, characterized in that the accuracy calculation module comprises:

9. The speech recognition device as claimed in claim 8, characterized in that the recognition parameters comprise the response time of the server and of the local terminal, the recognition result matching degree, and a performance parameter.

10. The speech recognition device as claimed in claim 9, characterized in that the speech recognition device further comprises an adjusting module, configured to increase the recognition accuracy of the server or local terminal corresponding to the recognition result by a preset value.