CN109994101A

CN109994101A - Speech recognition method, terminal, server and computer-readable storage medium

Info

Publication number: CN109994101A
Application number: CN201810000871.1A
Authority: CN
Inventors: 刘江锋
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Co Ltd
Priority date: 2018-01-02
Filing date: 2018-01-02
Publication date: 2019-07-09

Abstract

The present invention provides a speech recognition method, a terminal, a server and a computer-readable storage medium. The speech recognition method includes: sending, to a server, an audio recognition request for recognizing the audio features in an audio file to be processed; and receiving the audio recognition result fed back by the server in response to the audio recognition request. The audio recognition result is obtained by the server calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed. By adopting a client-server architecture, the scheme places time-consuming steps such as loading and parsing the model on the server side. Once the model has been loaded and parsed, its data stays in memory and is reused until the service ends, which can greatly improve processing efficiency when Kaldi is used to provide a speech recognition cloud service.

Description

Speech recognition method, terminal, server and computer-readable storage medium

Technical field

The present invention relates to the technical field of speech data processing, and in particular to a speech recognition method, a terminal, a server and a computer-readable storage medium.

Background technique

Kaldi is a powerful speech recognition toolkit, developed and maintained mainly by Daniel Povey. It currently supports training and prediction for several kinds of speech recognition models, including GMM-HMM, SGMM-HMM and DNN-HMM. In DNN-HMM models, the neural network can also be customized through a configuration file, and network structures such as DNN, CNN, TDNN, LSTM and bidirectional LSTM are supported.

Existing Kaldi speech recognition schemes mainly support batch processing: the multiple wav files under a given folder are processed together in one pass, and the recognition results are output. The characteristics of this batch scenario are that the processing time is fixed, the files to be processed are known and prepared in advance, there are many files to process, and processing speed is not critical. The detailed flow is: read the wav audio files; extract features from the audio files and generate a feature list file; load the pre-trained model; traverse the feature list, calling the decoder on each feature in turn; release the model; output the recognition results.
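The batch flow above can be sketched roughly as follows. This is a minimal illustration, not Kaldi code: `run_batch`, `load_model` and `decode` are hypothetical names, and the model loader and decoder are passed in as plain callables so that the sketch does not assume any particular Kaldi API.

```python
from typing import Any, Callable, List


def run_batch(feature_list: List[str],
              load_model: Callable[[], Any],
              decode: Callable[[Any, str], str]) -> List[str]:
    """Batch-style flow: load once, decode every entry, then release."""
    model = load_model()              # expensive: read and parse the model
    try:
        # Traverse the feature list and decode each entry in turn.
        results = [decode(model, features) for features in feature_list]
    finally:
        del model                     # release the model when the batch ends
    return results
```

In this shape the model load is amortized over the whole batch, which is why the batch scenario tolerates a slow load.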

With the rise of artificial intelligence cloud services, more and more scenarios are triggered by web requests at unpredictable times, with the file to be processed not known in advance, only one file handled per request, and processing speed being critical. When Kaldi is used to provide a speech recognition cloud service, applying the traditional Kaldi batch scheme directly to such a scenario raises the following problem: every time a single audio file is recognized, the pre-trained model must first be loaded from hard disk and its data parsed before any follow-up work can begin. The model file is usually very large, up to several gigabytes, so loading and parsing it from hard disk is time-consuming and seriously degrades the processing efficiency and the concurrency of the cloud service.

As can be seen from the above, the prior art needs to improve the performance of Kaldi-based speech recognition cloud services in scenarios where the file to be processed is not known in advance, requests arrive at unpredictable times, only one file is handled per request, and processing speed is critical.

Summary of the invention

The purpose of the present invention is to provide a speech recognition method, a terminal, a server and a computer-readable storage medium that solve the prior-art problem of low processing efficiency when Kaldi is used to provide a speech recognition cloud service.

To solve the above technical problem, an embodiment of the present invention provides a speech recognition method, applied to a terminal, including:

sending, to a server, an audio recognition request for recognizing the audio features in an audio file to be processed;

receiving the audio recognition result fed back by the server in response to the audio recognition request;

wherein the audio recognition result is obtained by the server calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed.

Optionally, the step of sending, to the server, the audio recognition request for recognizing the audio features in the audio file to be processed includes:

obtaining the audio features of the audio file to be processed and forming an audio feature list file;

sending the audio recognition request to the server according to the audio feature list file;

wherein the audio recognition request carries the path of the audio feature list file.

Optionally, after the audio feature list file is formed, the speech recognition method further includes:

storing the audio feature list file.

An embodiment of the present invention also provides a speech recognition method, applied to a server, including:

receiving an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed;

calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed, and feeding the audio recognition result back to the terminal.

Optionally, the step of calling the stored speech recognition model to decode and recognize the audio features in the audio file to be processed includes:

obtaining, according to the audio recognition request, the path of the audio feature list file corresponding to the audio file to be processed, the audio feature list file including the audio features of the audio file to be processed;

calling the stored speech recognition model and, according to the path, decoding and recognizing the audio feature list file.

Optionally, before receiving the audio recognition request, sent by the terminal, for recognizing the audio features in the audio file to be processed, the speech recognition method further includes:

loading the speech recognition model, parsing the data of the speech recognition model, and storing it.

Optionally, after the audio recognition result is fed back to the terminal, the speech recognition method further includes:

monitoring whether an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed is received, while keeping the speech recognition model unreleased;

if such an audio recognition request is received, returning to the step of calling the stored speech recognition model to decode and recognize the audio features in the audio file to be processed and feeding the audio recognition result back to the terminal.

Optionally, the speech recognition model is stored in the memory of the server.

An embodiment of the present invention also provides a terminal, including a processor and a transceiver;

the processor is configured to send to a server, through the transceiver, an audio recognition request for recognizing the audio features in an audio file to be processed;

Optionally, the processor is specifically configured to:

send the audio recognition request to the server through the transceiver according to the audio feature list file;

Optionally, the processor is further configured to:

store the audio feature list file after it is formed.

An embodiment of the present invention also provides a server, including a processor and a transceiver;

the processor is configured to receive through the transceiver an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed;

Optionally, the processor is specifically configured to:

Optionally, the processor is further configured to:

before receiving the audio recognition request, sent by the terminal, for recognizing the audio features in the audio file to be processed, load the speech recognition model, parse the data of the speech recognition model, and store it.

Optionally, the processor is further configured to:

after the audio recognition result is fed back to the terminal, monitor whether an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed is received, while keeping the speech recognition model unreleased;

if such an audio recognition request is received, return to the operation of calling the stored speech recognition model to decode and recognize the audio features in the audio file to be processed and feeding the audio recognition result back to the terminal.

An embodiment of the present invention also provides a terminal, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when executing the program, the processor implements the terminal-side speech recognition method described above.

An embodiment of the present invention also provides a server, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when executing the program, the processor implements the server-side speech recognition method described above.

An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the terminal-side speech recognition method described above; or

when executed by a processor, the program implements the steps of the server-side speech recognition method described above.

The beneficial effects of the above technical solutions of the present invention are as follows:

In the above scheme, the speech recognition method sends, to a server, an audio recognition request for recognizing the audio features in an audio file to be processed, and receives the audio recognition result fed back by the server in response to that request. The audio recognition result is obtained by the server calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed. In this way, a client-server architecture places time-consuming steps such as loading and parsing the model on the server side. Once the model has been loaded and parsed, its data stays in memory and is reused until the service ends. The time-consuming part of the speech recognition flow therefore runs only once, at service startup; every subsequent recognition reuses the model already loaded in memory, so the model does not need to be reloaded and re-parsed for each audio file. This greatly improves processing efficiency when Kaldi is used to provide a speech recognition cloud service.

Detailed description of the invention

Fig. 1 is a first flow diagram of the speech recognition method of an embodiment of the present invention;

Fig. 2 is a second flow diagram of the speech recognition method of an embodiment of the present invention;

Fig. 3 is a flow diagram of a concrete application of the terminal-side speech recognition method of an embodiment of the present invention;

Fig. 4 is a flow diagram of a concrete application of the server-side speech recognition method of an embodiment of the present invention;

Fig. 5 is a structural diagram of the terminal of an embodiment of the present invention;

Fig. 6 is a structural diagram of the server of an embodiment of the present invention.

Specific embodiment

To make the technical problem to be solved, the technical solutions and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.

To address the prior-art problem of low processing efficiency when Kaldi is used to provide a speech recognition cloud service, the present invention provides a speech recognition method, applied to a terminal. As shown in Fig. 1, the method includes:

Step 11: send, to a server, an audio recognition request for recognizing the audio features in an audio file to be processed;

Step 12: receive the audio recognition result fed back by the server in response to the audio recognition request;

The speech recognition method provided by the embodiment of the present invention sends, to a server, an audio recognition request for recognizing the audio features in an audio file to be processed, and receives the audio recognition result fed back by the server in response to that request. The audio recognition result is obtained by the server calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed. In this way, a client-server architecture places time-consuming steps such as loading and parsing the model on the server side. Once the model has been loaded and parsed, its data stays in memory and is reused until the service ends. The time-consuming part of the speech recognition flow therefore runs only once, at service startup; every subsequent recognition reuses the model already loaded in memory, so the model does not need to be reloaded and re-parsed for each audio file. This greatly improves processing efficiency when Kaldi is used to provide a speech recognition cloud service.

Specifically, the step of sending, to the server, the audio recognition request for recognizing the audio features in the audio file to be processed includes: obtaining the audio features of the audio file to be processed and forming an audio feature list file; and sending the audio recognition request to the server according to the audio feature list file. The audio recognition request carries the path of the audio feature list file, so that the server can read the audio feature list file and carry out recognition.

Further, after the audio feature list file is formed, the speech recognition method further includes: storing the audio feature list file.
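As a rough illustration of a request that carries only the file path, the sketch below encodes and decodes such a message. The patent does not specify a wire format; the newline-terminated JSON encoding, the field name `feature_list_path` and both function names are assumptions made for this example. Because only a path is sent, the sketch also assumes that the terminal and the server can read the same file system.

```python
import json


def build_request(feature_list_path: str) -> bytes:
    """Encode a recognition request that carries only the file path."""
    message = {"feature_list_path": feature_list_path}  # hypothetical field name
    return json.dumps(message).encode("utf-8") + b"\n"


def parse_request(raw: bytes) -> str:
    """Server side: recover the feature list file path from a request."""
    return json.loads(raw.decode("utf-8"))["feature_list_path"]
```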

An embodiment of the present invention also provides a speech recognition method, applied to a server. As shown in Fig. 2, the method includes:

Step 21: receive an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed;

Step 22: call a stored speech recognition model to decode and recognize the audio features in the audio file to be processed, and feed the audio recognition result back to the terminal.

The speech recognition method provided by the embodiment of the present invention receives an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed, calls a stored speech recognition model to decode and recognize those audio features, and feeds the audio recognition result back to the terminal. In this way, a client-server architecture places time-consuming steps such as loading and parsing the model on the server side. Once the model has been loaded and parsed, its data stays in memory and is reused until the service ends. The time-consuming part of the speech recognition flow therefore runs only once, at service startup; every subsequent recognition reuses the model already loaded in memory, so the model does not need to be reloaded and re-parsed for each audio file. This greatly improves processing efficiency when Kaldi is used to provide a speech recognition cloud service.

Specifically, the step of calling the stored speech recognition model to decode and recognize the audio features in the audio file to be processed includes: obtaining, according to the audio recognition request, the path of the audio feature list file corresponding to the audio file to be processed, the audio feature list file including the audio features of the audio file to be processed; and calling the stored speech recognition model and, according to the path, decoding and recognizing the audio feature list file.

Further, before receiving the audio recognition request, sent by the terminal, for recognizing the audio features in the audio file to be processed, the speech recognition method further includes: loading the speech recognition model, parsing the data of the speech recognition model, and storing it.

In this way, recognition can begin promptly when a speech recognition request is received, improving processing speed and response speed. Preferably, the server loads the speech recognition model at startup, parses the data of the speech recognition model, and stores it.

Further, after the audio recognition result is fed back to the terminal, the speech recognition method further includes: monitoring whether an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed is received, while keeping the speech recognition model unreleased; and, if such an audio recognition request is received, returning to the step of calling the stored speech recognition model to decode and recognize the audio features in the audio file to be processed and feeding the audio recognition result back to the terminal.

In this way the speech recognition model stays loaded, so that when another speech recognition request arrives the server can respond as quickly as possible, improving processing speed. Preferably, the model is not released as long as speech recognition requests still need to be served. When the server needs to be restarted for maintenance, or when the code or the model is updated, the server should first stop responding to speech recognition requests and only then release the model.

Preferably, the speech recognition model is stored in the memory of the server. This can further improve processing speed and response speed.
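One way to picture a model that is loaded at startup, kept in memory across requests, and released only at shutdown is the small holder below. It is a sketch under assumptions: `ResidentModel` and its methods are hypothetical names, and `load_model` stands in for whatever routine actually loads and parses the model.

```python
import threading
from typing import Any, Callable, Optional


class ResidentModel:
    """Holds one loaded model in memory for the lifetime of the service."""

    def __init__(self, load_model: Callable[[], Any]) -> None:
        self._lock = threading.Lock()
        # Load and parse once, at startup (the preferred timing in the text).
        self._model: Optional[Any] = load_model()

    def get(self) -> Any:
        """Return the already-loaded model; never triggers a reload."""
        with self._lock:
            if self._model is None:
                raise RuntimeError("model has been released")
            return self._model

    def release(self) -> None:
        """Drop the model; call only after the server stops taking requests."""
        with self._lock:
            self._model = None
```

Making `get` fail after `release` mirrors the ordering described above: request handling is stopped first, and the model is released afterwards.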

The speech recognition method provided by the embodiment of the present invention is further explained below with reference to both the terminal side and the server side.

To address the above technical problem and to optimize performance in the cloud service scenario, the scheme provided by the embodiment of the present invention adopts a client-server architecture. The time-consuming steps of loading and parsing the model are placed on the server side; once the model has been loaded and parsed, its data stays in memory and is reused until the service ends. Feature extraction is placed on the client side. The benefit is that the time-consuming model loading happens only once, at service startup, and every subsequent recognition reuses the model already loaded in memory instead of reloading and re-parsing it for each audio file. The detailed flow is as follows:

The client first reads the wav audio file, extracts the audio features into a feature list and stores it in a file. It then sends a request to the server (the request carries the path of the feature list file generated in the previous step) and waits for the server's response. After receiving the server's processing result, the client can exit. As shown in Fig. 3, the flow includes:

Step 31: start;

Step 32: read the audio file (for example, a wav audio file);

Step 33: extract the audio features, generate the audio feature list file, and store it;

Step 34: according to the audio feature list file, send a speech recognition request (a request to process the audio feature list file) to the server; the request carries the path of the generated audio feature list file;

Step 35: receive the recognition result fed back by the server;

Step 36: end.
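Steps 32 to 35 can be sketched as a single client function. Everything named here is an assumption for illustration: `run_client`, the file name `feature_list.txt`, the `feature_list_path` and `result` keys, and the two callables. Feature extraction and the transport to the server are injected so that the sketch does not presume a specific Kaldi tool or network protocol.

```python
import os
from typing import Callable, Dict, List


def run_client(wav_path: str,
               extract_features: Callable[[str], List[str]],
               send_request: Callable[[Dict[str, str]], Dict[str, str]],
               work_dir: str) -> str:
    """Terminal-side flow: extract, store, request, receive."""
    # Steps 32-33: read the audio file, extract features, store the list.
    features = extract_features(wav_path)
    list_path = os.path.join(work_dir, "feature_list.txt")
    with open(list_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(features) + "\n")
    # Step 34: the request carries only the path of the stored file.
    reply = send_request({"feature_list_path": list_path})
    # Step 35: hand back the recognition result from the server.
    return reply["result"]
```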

The server is responsible for loading the model and parsing the model data, and keeps the result in memory. It then listens on a specific port. After receiving a request from the client, it obtains the audio feature list file path from the request, calls the decoder to process the audio feature list file, recognizes the decoded audio feature list file using the loaded speech recognition model, and sends the result to the client when processing is complete.

After the client request has been handled, the speech recognition model in the server's memory is not released and the server does not exit; it continues listening on the port and waits for the next client request. When the next request arrives, the server only needs to reuse the model data already held in memory, without repeating the time-consuming process of loading the model from hard disk and parsing it. As shown in Fig. 4, the flow includes:

Step 41: start;

Step 42: load the trained speech recognition model;

Step 43: listen on the port;

Step 44: determine whether a client (terminal) has sent an audio recognition request; if so, go to step 45; if not, go to step 47;

Step 45: obtain the audio feature list file path from the audio recognition request, and call the decoder to decode the audio feature list file;

Step 46: recognize the decoded audio feature list file using the loaded speech recognition model, and send the recognition result to the client;

Step 47: determine whether to terminate the server; if so, go to step 48; if not, return to step 44;

When the server needs to be restarted for maintenance, or when the code or the model is updated, the server can first stop responding to speech recognition requests (that is, terminate the server side) and then release the model.

Step 48: release the speech recognition model;

Step 49: end.
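The server loop in steps 42 to 48 might look like the following sketch built on Python's standard `socket` module. The newline-terminated JSON messages, the `feature_list_path` and `result` keys, and the names `serve`, `handle_connection`, `load_model`, `decode` and `should_stop` are all assumptions for this example; the patent does not prescribe a protocol, and `decode` stands in for the real decoder.

```python
import json
import socket
from typing import Any, Callable


def handle_connection(conn: socket.socket, model: Any,
                      decode: Callable[[Any, str], str]) -> None:
    """Steps 45-46: read one request, decode, send the result back."""
    with conn, conn.makefile("rwb") as stream:
        request = json.loads(stream.readline())
        text = decode(model, request["feature_list_path"])
        stream.write(json.dumps({"result": text}).encode("utf-8") + b"\n")
        stream.flush()


def serve(host: str, port: int,
          load_model: Callable[[], Any],
          decode: Callable[[Any, str], str],
          should_stop: Callable[[], bool]) -> None:
    model = load_model()                      # step 42: load once at startup
    with socket.create_server((host, port)) as listener:  # step 43
        listener.settimeout(1.0)              # wake up to check should_stop
        while not should_stop():              # step 47
            try:
                conn, _ = listener.accept()   # step 44: is there a request?
            except socket.timeout:
                continue
            handle_connection(conn, model, decode)  # model is reused, not reloaded
    del model                                 # step 48: release after the loop
```

The sketch handles one request at a time for brevity; a production service would likely hand each connection to a worker while sharing the same in-memory model.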

Note that, in actual use, the flow of the scheme provided by the embodiment of the present invention may be as follows. First start the server, i.e. the speech recognition server. Once startup is complete, the speech recognition server has loaded the speech recognition model from hard disk, parsed it, and saved the model data in memory. Then start the cloud service web server and wait for web requests to the cloud service. When a web request arrives, the cloud service web server obtains the audio file to be processed from the request, then calls the terminal, i.e. the speech recognition client, to extract the feature list from the audio file and send a request to the speech recognition server. The speech recognition server reuses the model data held in memory to process the feature list file, and then sends the recognition result to the speech recognition client. After the speech recognition client receives the recognition result from the speech recognition server, the result is returned to the browser through the cloud service web server.

As can be seen from the above, the scheme provided by the embodiment of the present invention is designed around the requirements and characteristics of the cloud service scenario: calls arrive at unpredictable times, the file to be processed is not known in advance, only one file is handled per request, and processing speed is critical. It optimizes the existing Kaldi batch flow by reorganizing it into a client-server architecture. The time-consuming loading of the model from hard disk and parsing of the model are placed on the server side, with the result kept in memory, so this process needs to run only once. Every subsequent speech recognition pass reuses the model data held in memory and no longer repeats the loading and parsing, until the service ends. Compared with the traditional flow that uses Kaldi for batch speech recognition, this flow improves recognition performance and better suits the cloud service scenario described above.
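The efficiency argument can be made concrete with a simple cost model. The two functions below are an illustration only: they assume a fixed load-and-parse time and a fixed decode time per request, ignore concurrency and network overhead, and the numbers in the usage note are invented for the example rather than measured.

```python
def reload_per_request_seconds(load_s: float, decode_s: float,
                               requests: int) -> float:
    """Batch scheme applied per request: every request reloads the model."""
    return requests * (load_s + decode_s)


def resident_model_seconds(load_s: float, decode_s: float,
                           requests: int) -> float:
    """Client-server scheme: load once at startup, then only decode."""
    return load_s + requests * decode_s
```

For instance, with a hypothetical 20 s load, 1 s decode and 100 requests, the first function gives 2100 s of total work and the second gives 120 s; the saving grows with the number of requests because the load cost is paid only once.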

In summary, the scheme provided by the embodiment of the present invention is based on a server-client approach: during speech processing, the speech recognition model is no longer released after it is called but stays in memory, and subsequent speech processing calls the model directly, which greatly improves processing efficiency.

An embodiment of the present invention also provides a terminal. As shown in Fig. 5, the terminal includes a processor 51 and a transceiver 52;

the processor 51 is configured to send to a server, through the transceiver 52, an audio recognition request for recognizing the audio features in an audio file to be processed;

The terminal provided by the embodiment of the present invention sends, to a server, an audio recognition request for recognizing the audio features in an audio file to be processed, and receives the audio recognition result fed back by the server in response to that request. The audio recognition result is obtained by the server calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed. In this way, a client-server architecture places time-consuming steps such as loading and parsing the model on the server side. Once the model has been loaded and parsed, its data stays in memory and is reused until the service ends. The time-consuming part of the speech recognition flow therefore runs only once, at service startup; every subsequent recognition reuses the model already loaded in memory, so the model does not need to be reloaded and re-parsed for each audio file. This greatly improves processing efficiency when Kaldi is used to provide a speech recognition cloud service.

Specifically, the processor is configured to: obtain the audio features of the audio file to be processed and form an audio feature list file; and send the audio recognition request to the server through the transceiver according to the audio feature list file; wherein the audio recognition request carries the path of the audio feature list file.

Further, the processor is also configured to: store the audio feature list file after it is formed.

The implementation embodiments of the terminal-side speech recognition method described above all apply to this terminal embodiment and can achieve the same technical effect.

An embodiment of the present invention also provides a server. As shown in Fig. 6, the server includes a processor 61 and a transceiver 62;

the processor 61 is configured to receive through the transceiver 62 an audio recognition request, sent by a terminal, for recognizing the audio features in an audio file to be processed;

The server provided in an embodiment of the present invention is by reception terminal transmission to the sound in audio file to be processed The audio identification request that frequency feature is identified；Stored speech recognition modeling is called, in the audio file to be processed Audio frequency characteristics be decoded identification, and audio recognition result is fed back into the terminal；It can be realized using client-server The time consuming process such as stress model, analytic modell analytical model are placed on server end, load and parsing by the framework at device end (client-server) After complete model, data can save in memory repeatedly multiplexing always, until time-consuming in end of service, such speech recognition process Process only need to carry out when servicing starting primary, subsequent each identification is all multiplexed the preservation loaded in memory Model does not need to identify that audio file all reloads model, analytic modell analytical model every time；It substantially increases and provides language using Kaldi Treatment effeciency under the scene of sound identification cloud service.

Specifically, the processor is specifically used for: being requested according to the audio identification, it is corresponding to obtain audio file to be processed Audio frequency characteristics listing file path；The audio frequency characteristics listing file includes that the audio of the audio file to be processed is special Sign；Stored speech recognition modeling is called, according to the path, identification is decoded to the audio frequency characteristics listing file.

Further, the processor is also used to: receiving the special to the audio in audio file to be processed of terminal transmission Before levying the audio identification request identified, the speech recognition modeling is loaded, and parse the number of the speech recognition modeling According to being stored.

Further, the processor is also used to: after audio recognition result is fed back to the terminal, monitoring is The no audio identification request that the audio frequency characteristics in audio file to be processed are identified for receiving terminal transmission, and keep institute Speech recognition modeling is stated not discharge；If receiving being identified to the audio frequency characteristics in audio file to be processed for terminal transmission Audio identification request then returns to the stored speech recognition modeling of calling, to the audio in the audio file to be processed Feature is decoded identification, and audio recognition result is fed back to the operation of the terminal.

Preferably, the speech recognition model is stored in the memory of the server.

The implementation embodiments of the server-side audio recognition method described above are all applicable to this server embodiment and achieve the same technical effect.

An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the terminal-side audio recognition method described above.

The implementation embodiments of the terminal-side audio recognition method described above are all applicable to this computer-readable storage medium embodiment and achieve the same technical effect.

An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the server-side audio recognition method described above.

The implementation embodiments of the server-side audio recognition method described above are all applicable to this computer-readable storage medium embodiment and achieve the same technical effect.

The above are preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims

1. An audio recognition method, applied to a terminal, characterized by comprising:

wherein the audio recognition result is obtained by the server calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed.

2. The audio recognition method according to claim 1, characterized in that the step of sending, to the server, the audio recognition request to recognize the audio features in the audio file to be processed comprises:

3. The audio recognition method according to claim 2, characterized in that, after the audio feature list file is formed, the audio recognition method further comprises:

storing the audio feature list file.

4. An audio recognition method, applied to a server, characterized by comprising:

calling a stored speech recognition model to decode and recognize the audio features in the audio file to be processed, and feeding the audio recognition result back to the terminal.

5. The audio recognition method according to claim 4, characterized in that the step of calling the stored speech recognition model to decode and recognize the audio features in the audio file to be processed comprises:

obtaining, according to the audio recognition request, the path of the audio feature list file corresponding to the audio file to be processed, wherein the audio feature list file contains the audio features of the audio file to be processed; and

calling the stored speech recognition model to decode and recognize the audio feature list file according to the path.

6. The audio recognition method according to claim 4, characterized in that, before receiving the audio recognition request sent by the terminal to recognize the audio features in the audio file to be processed, the audio recognition method further comprises:

7. The audio recognition method according to claim 4, characterized in that, after the audio recognition result is fed back to the terminal, the audio recognition method further comprises:

monitoring whether an audio recognition request to recognize the audio features in an audio file to be processed is received from a terminal, while keeping the speech recognition model unreleased; and

if an audio recognition request to recognize the audio features in an audio file to be processed is received from a terminal, returning to the step of calling the stored speech recognition model, decoding and recognizing the audio features in the audio file to be processed, and feeding the audio recognition result back to the terminal.

8. The audio recognition method according to any one of claims 4 to 7, characterized in that the speech recognition model is stored in the memory of the server.

9. A terminal, characterized by comprising: a processor and a transceiver;

the processor being configured to send to a server, through the transceiver, an audio recognition request to recognize the audio features in an audio file to be processed;

10. The terminal according to claim 9, characterized in that the processor is specifically configured to:

11. The terminal according to claim 10, characterized in that the processor is further configured to:

12. A server, characterized by comprising: a processor and a transceiver;

the processor being configured to receive, through the transceiver, an audio recognition request sent by a terminal to recognize the audio features in an audio file to be processed;

13. The server according to claim 12, characterized in that the processor is specifically configured to:

14. The server according to claim 12, characterized in that the processor is further configured to:

before receiving the audio recognition request sent by the terminal to recognize the audio features in the audio file to be processed, load the speech recognition model, parse the data of the speech recognition model, and store the parsed data.

15. The server according to claim 12, characterized in that the processor is further configured to:

after feeding the audio recognition result back to the terminal, monitor whether an audio recognition request to recognize the audio features in an audio file to be processed is received from a terminal, while keeping the speech recognition model unreleased; and

if an audio recognition request to recognize the audio features in an audio file to be processed is received from a terminal, return to the operation of calling the stored speech recognition model, decoding and recognizing the audio features in the audio file to be processed, and feeding the audio recognition result back to the terminal.

16. The server according to any one of claims 12 to 15, characterized in that the speech recognition model is stored in the memory of the server.

17. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the audio recognition method according to any one of claims 1 to 3.

18. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the audio recognition method according to any one of claims 4 to 8.

19. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the audio recognition method according to any one of claims 1 to 3; or

the program, when executed by a processor, implements the steps of the audio recognition method according to any one of claims 4 to 8.