CN107845384A - Speech recognition method - Google Patents
Speech recognition method
- Publication number
- CN107845384A (application CN201711033275.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- user terminal
- service server
- dictation
- sent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Embodiments of the present invention relate to a speech recognition method. The method includes: a user terminal acquires a voice signal input by a user and sends it to a first service server; the first service server parses the voice signal, generates dictation result information and sends it to the user terminal; the user terminal matches the dictation result information against application service information prestored in the user terminal and determines whether application service information matching the dictation result information exists; if it exists, the user terminal jumps to the corresponding application service according to the application service information, for the user to use; if it does not exist, the user terminal sends the dictation result information to a second service server; the second service server parses the dictation result information, generates parsing result information and sends it to the user terminal; the user terminal loads the parsing result information and displays it as a result display page. The present invention improves speech recognition speed and makes electronic devices simpler and more convenient to use.
Description
Technical field
The present invention relates to the field of communication technologies, and in particular to a speech recognition method.
Background art
Speech technology is the most natural and convenient human-machine interface. With the continuous development of computer technology, and in particular the rapid development of computer hardware, it has entered the field of view of researchers and has received widespread attention.
When an electronic product is used, voice input often replaces handwriting or keyboard input for convenience: after the device recognizes the voice, it outputs the text content of the speech or executes the corresponding operation instruction. At present, the voice assistants of the prior art convert speech into text through both server processing and local processing, which is slow and cannot meet the ever-higher requirements of users.
Summary of the invention
The object of the present invention is to address the above problem of the prior art by providing a speech recognition method that improves speech recognition speed and makes an electronic device simpler and more convenient to use.
To achieve the above object, the present invention provides a speech recognition method, the method comprising:
a user terminal acquires a voice signal input by a user, and sends the voice signal to a first service server;
the first service server parses the voice signal, generates dictation result information, and sends it to the user terminal;
the user terminal matches the dictation result information against application service information prestored in the user terminal, and determines whether application service information matching the dictation result information exists;
if it exists, the user terminal jumps to the corresponding application service according to the application service information, for the user to use;
if it does not exist, the user terminal sends the dictation result information to a second service server;
the second service server parses the dictation result information, generates parsing result information, and sends it to the user terminal; the parsing result information includes query content information, related application service information and download link information;
the user terminal loads the parsing result information and displays it as a result display page.
Preferably, the first service server parsing the voice signal, generating the dictation result information and sending it to the user terminal specifically includes:
the first service server receives the voice signal, and extracts characteristic information of the voice signal;
using a speech recognition database of the first service server, the first service server recognizes the characteristic information and generates the dictation result information;
the first service server sends the dictation result information to the user terminal.
Preferably, the second service server parsing the dictation result information, generating the parsing result information and sending it to the user terminal specifically includes:
the second service server receives and parses the dictation result information;
according to the dictation result information, the second service server matches query content information and related application service information in the second service server, and extracts download link information corresponding to the related application service information;
according to the query content information, the related application service information and the corresponding download link information, the second service server generates the parsing result information and sends it to the user terminal.
Further preferably, the second service server generating the parsing result information according to the query content information, the related application service information and the corresponding download link information specifically includes:
according to the degree of correlation between the dictation result information and the query content information, the second service server generates sorting indicator information and loads it into attribute information of the query content information;
according to the degree of correlation between the dictation result information and the related application service information, the second service server generates the sorting indicator information and loads it into attribute information of the related application service information and of the corresponding download link information;
the second service server generates the parsing result information according to the query content information, the related application service information and the corresponding download link information after the attribute information has been added.
Preferably, the user terminal loading the parsing result information and displaying it as a result display page specifically includes:
the user terminal receives the parsing result information;
the user terminal identifies the parsing result information, classifies the query content information, the related application service information and the download link information, and loads and displays them as the result display page.
The speech recognition method provided by the embodiments of the present invention solves the problem that a voice assistant must go through both server processing and local processing to convert speech into text: speech data that satisfies a predetermined condition is identified, and for that part of the data the voice interaction can be completed through local processing alone. This improves speech recognition speed and allows functions of interest to be accessed quickly through voice interaction, so that operating an electronic device becomes simpler and more convenient and the user's needs are met.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the speech recognition method provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is described in further detail below with reference to the drawings and embodiments.
The speech recognition method provided by the embodiments of the present invention improves speech recognition speed, makes operating an electronic device simpler and more convenient, and meets the user's needs.
Fig. 1 is a schematic flowchart of the speech recognition method provided by an embodiment of the present invention. The speech recognition method provided by the embodiment of the present invention is described below with reference to Fig. 1.
Step 101: the user terminal acquires the voice signal input by the user, and sends the voice signal to the first service server.
In a specific implementation, the user terminal is the electronic device used by the user, and the first service server is a speech-to-text server, i.e. a device that converts voice information into text information and responds to and processes requests from the user terminal. When the user interacts by voice, a microphone records the user's voice; the user terminal acquires the voice signal (an audio signal) collected by the microphone and sends it to the first service server, i.e. the speech-to-text server, so that the voice signal can be processed.
Step 102: the first service server parses the voice signal, generates the dictation result information, and sends it to the user terminal.
Specifically, the first service server receives the voice signal and extracts the characteristic information of the voice signal; using the speech recognition database of the first service server, the first service server recognizes the characteristic information and generates the dictation result information; the first service server then sends the dictation result information to the user terminal.
In a specific implementation, the first service server receives the voice signal sent by the user terminal, parses it, and extracts its characteristic information. The first service server recognizes the extracted features against the speech recognition database, thereby converting the voice signal (audio information) into the dictation result information (text information), and sends it to the user terminal.
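As an illustration only, the role of the first service server in steps 101 and 102 might look like the sketch below. The energy-based feature extraction and the in-memory lookup table standing in for the speech recognition database are assumptions made for clarity; the patent does not specify how features are extracted or how the database is organized.

```python
# Hypothetical sketch of the first service server (speech-to-text) in steps 101-102.
# The feature extraction and the in-memory "speech recognition database" are
# placeholders; a real deployment would use an actual ASR engine.
import array

def extract_features(audio_bytes: bytes, frame_size: int = 320) -> list[float]:
    """Extract simple per-frame energy values as the 'characteristic information'."""
    samples = array.array("h", audio_bytes[: len(audio_bytes) // 2 * 2])
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [sum(s * s for s in f) / max(len(f), 1) for f in frames]

# Stand-in for the speech recognition database of the first service server.
SPEECH_RECOGNITION_DB = {
    "pattern_open_map": "open the map application",
    "pattern_nearby_sights": "tourist attractions nearby",
}

def recognize(features: list[float]) -> str:
    """Map extracted features to dictation result text (placeholder lookup)."""
    key = "pattern_nearby_sights" if features and max(features) > 1e6 else "pattern_open_map"
    return SPEECH_RECOGNITION_DB[key]

def handle_voice_signal(audio_bytes: bytes) -> dict:
    """Step 102: parse the voice signal and return dictation result information."""
    features = extract_features(audio_bytes)
    return {"dictation_result": recognize(features)}
```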
Step 103: the user terminal matches the dictation result information against the application service information prestored in the user terminal, and determines whether application service information matching the dictation result information exists.
If application service information matching the dictation result information exists, the method proceeds to step 104; otherwise, it proceeds to step 105.
In a specific implementation, the user terminal receives the dictation result information sent by the first service server and parses it; at the same time, it obtains the prestored application service information and matches the dictation result information against that application service information, to determine whether application service information matching the dictation result information exists. If it exists, no further server processing is needed; if it does not exist, further server processing is required.
When application service information matching the dictation result information exists:
Step 104: the user terminal jumps to the corresponding application service according to the application service information, for the user to use.
If application service information matching the dictation result information exists, the application service the user is interested in is already present locally in the user terminal, so only local processing is needed: the user terminal jumps directly to the corresponding application service according to the matched application service information, for the user to use, and the current voice interaction ends.
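The local matching of steps 103 and 104 can be pictured with the following minimal sketch. The prestored table, the keyword-based match and the `jump_to` helper are hypothetical stand-ins; the patent does not define the format of the application service information.

```python
# Hypothetical sketch of steps 103-104: match the dictation result against the
# application service information prestored in the user terminal and, on a hit,
# jump directly to that application service (local processing only).
PRESTORED_APP_SERVICES = {
    "weather": "app://weather/home",
    "music": "app://music/player",
    "health consultation": "app://health/consult",
}

def jump_to(service_uri: str) -> None:
    # Placeholder for the terminal's page/app navigation.
    print(f"jumping to {service_uri}")

def handle_dictation_result(dictation_result: str) -> bool:
    """Return True if the interaction was completed locally (step 104)."""
    text = dictation_result.lower()
    for keyword, service_uri in PRESTORED_APP_SERVICES.items():
        if keyword in text:          # matching application service information exists
            jump_to(service_uri)     # step 104: jump, end this voice interaction
            return True
    return False                     # step 105 will send the text to the second server
```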
When no application service information matching the dictation result information exists:
Step 105: the user terminal sends the dictation result information to the second service server.
The second service server referred to here is a semantic parsing server. If no application service information matching the dictation result information exists, the content the user is interested in is not present locally in the user terminal, and a query through the semantic parsing server is needed. The user terminal therefore first sends the dictation result information to the second service server, and processing continues as follows.
Step 106: the second service server parses the dictation result information, generates the parsing result information, and sends it to the user terminal.
The parsing result information includes the query content information, the related application service information and the download link information. Specifically, the second service server receives and parses the dictation result information; according to the dictation result information, the second service server matches the query content information and the related application service information in the second service server, and extracts the download link information corresponding to the related application service information. Preferably, according to the degree of correlation between the dictation result information and the query content information, the second service server generates sorting indicator information and loads it into the attribute information of the query content information; according to the degree of correlation between the dictation result information and the related application service information, the second service server generates the sorting indicator information and loads it into the attribute information of the related application service information and of the corresponding download link information; the second service server then generates the parsing result information according to the query content information, the related application service information and the corresponding download link information after the attribute information has been added.
To make browsing easier for the user, the multiple pieces of query content information and related application service information found by the query need to be sorted. According to the degree of correlation between the dictation result information and the query content information, the second service server generates the sorting indicator information and adds it to the attribute information of the query content information; when the results are displayed, the query content information can be sorted from high to low correlation according to this attribute information. Similarly, the second service server can sort the related application service information and its corresponding download link information in the same way, so that the user can find the content of interest more quickly.
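A minimal sketch of how the second service server might attach sorting indicator information to the attribute information and order results by correlation is shown below. The word-overlap correlation measure is an assumption made for illustration, since the patent does not specify how the degree of correlation is computed.

```python
# Hypothetical sketch of step 106: attach a sorting indicator to each result's
# attribute information according to its degree of correlation with the
# dictation result, then sort from high to low correlation.
def correlation(dictation_result: str, text: str) -> float:
    """Toy degree-of-correlation: fraction of shared words (illustrative only)."""
    a, b = set(dictation_result.lower().split()), set(text.lower().split())
    return len(a & b) / max(len(a), 1)

def build_parsing_result(dictation_result, query_contents, related_app_services):
    """query_contents: list of str; related_app_services: list of (name, download_link)."""
    contents = [
        {"content": c, "attributes": {"sort_indicator": correlation(dictation_result, c)}}
        for c in query_contents
    ]
    apps = [
        {"app_service": name, "download_link": link,
         "attributes": {"sort_indicator": correlation(dictation_result, name)}}
        for name, link in related_app_services
    ]
    contents.sort(key=lambda x: x["attributes"]["sort_indicator"], reverse=True)
    apps.sort(key=lambda x: x["attributes"]["sort_indicator"], reverse=True)
    return {"query_content": contents, "related_app_services": apps}
```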
Step 107: the user terminal loads the parsing result information and displays it as a result display page.
Specifically, the user terminal receives the parsing result information; the user terminal identifies the parsing result information, classifies the query content information, the related application service information and the download link information, and loads and displays them as the result display page.
In a specific implementation, the user terminal receives the sorted parsing result information and displays the query content information, the related application service information and the download link information in separate categories. The query content information is information related to the voice signal input by the user; for example, if the user wants to look up nearby tourist attractions, the query content information will be the introductions of nearby attractions. The related application service information is information about the web version of the application service the user wants to query, and the download link information is the download link of that application service.
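The classified display of step 107 could then look roughly like the sketch below, which consumes the structure produced by the previous sketch. The section names and plain-text rendering are illustrative only; the patent does not define the layout of the result display page.

```python
# Hypothetical sketch of step 107: the user terminal classifies the parsing
# result information and renders it as a simple text "result display page".
def render_result_page(parsing_result: dict) -> str:
    lines = ["== Query content =="]
    lines += [item["content"] for item in parsing_result["query_content"]]
    lines.append("== Related application services ==")
    lines += [
        f'{item["app_service"]} (download: {item["download_link"]})'
        for item in parsing_result["related_app_services"]
    ]
    return "\n".join(lines)
```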
To better understand the above process, a specific example is given.
The user terminal also provides certain value-added services, such as a health consultation service; the first service server is a speech-to-text server, and the second service server is a semantic parsing server.
The user opens the user terminal and long-presses the microphone icon on the application's main screen to activate the voice input function. The application's microphone is then in recording state and records the user's voice; when recording is complete, the speech is converted into text using speech recognition technology. The user terminal first performs matching on the converted text. If the text relates to opening an application, the terminal determines whether the application the user wants is installed on the electronic device; if so, it opens the application and ends the current voice interaction. If not, it determines whether the user terminal provides a related specific service; if so, it opens the specified page for the user to use. If not, the recognized content is handed over to the semantic parsing server, which returns its processing result to the user terminal; the user terminal then displays content or jumps to a page according to the type of the returned result, and ends the current voice interaction. After the voice interaction ends, the user can use the corresponding function or obtain the related information according to the returned result.
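Putting the example together, the terminal-side decision flow might be sketched as follows. The helpers `speech_to_text`, `installed_apps`, `local_services`, `semantic_parse` and `render` are hypothetical stand-ins for the speech-to-text server, the locally installed applications, the terminal's value-added services and the semantic parsing server.

```python
# Hypothetical end-to-end sketch of the example above: local matching first,
# then fall back to the semantic parsing server (second service server).
def voice_interaction(audio_bytes, speech_to_text, installed_apps,
                      local_services, semantic_parse, render):
    text = speech_to_text(audio_bytes)            # steps 101-102
    if text in installed_apps:                    # matching application installed
        installed_apps[text]()                    # open it, end the interaction
        return
    if text in local_services:                    # related value-added service
        local_services[text]()                    # open the specified page
        return
    parsing_result = semantic_parse(text)         # steps 105-106
    render(parsing_result)                        # step 107: result display page
```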
The speech recognition method provided by the embodiments of the present invention solves the problem that a voice assistant must go through both server processing and local processing to convert speech into text: part of the speech data can complete the voice interaction through local processing alone. This improves speech recognition speed and allows functions of interest to be accessed quickly through voice interaction, so that operating an electronic device becomes simpler and more convenient and the user's needs are met.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The above embodiments further describe the objects, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is only an embodiment of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (5)
1. A speech recognition method, characterized in that the method comprises:
a user terminal acquires a voice signal input by a user, and sends the voice signal to a first service server;
the first service server parses the voice signal, generates dictation result information and sends it to the user terminal;
the user terminal matches the dictation result information against application service information prestored in the user terminal, and determines whether application service information matching the dictation result information exists;
if it exists, the user terminal jumps to the corresponding application service according to the application service information, for the user to use;
if it does not exist, the user terminal sends the dictation result information to a second service server;
the second service server parses the dictation result information, generates parsing result information and sends it to the user terminal, wherein the parsing result information comprises query content information, related application service information and download link information;
the user terminal loads the parsing result information and displays it as a result display page.
2. The speech recognition method according to claim 1, characterized in that the first service server parsing the voice signal, generating the dictation result information and sending it to the user terminal specifically comprises:
the first service server receives the voice signal, and extracts characteristic information of the voice signal;
using a speech recognition database of the first service server, the first service server recognizes the characteristic information and generates the dictation result information;
the first service server sends the dictation result information to the user terminal.
3. The speech recognition method according to claim 1, characterized in that the second service server parsing the dictation result information, generating the parsing result information and sending it to the user terminal specifically comprises:
the second service server receives and parses the dictation result information;
according to the dictation result information, the second service server matches the query content information and the related application service information in the second service server, and extracts the download link information corresponding to the related application service information;
according to the query content information, the related application service information and the corresponding download link information, the second service server generates the parsing result information and sends it to the user terminal.
4. The speech recognition method according to claim 3, characterized in that the second service server generating the parsing result information according to the query content information, the related application service information and the corresponding download link information specifically comprises:
according to a degree of correlation between the dictation result information and the query content information, the second service server generates sorting indicator information and loads it into attribute information of the query content information;
according to a degree of correlation between the dictation result information and the related application service information, the second service server generates the sorting indicator information and loads it into attribute information of the related application service information and of the corresponding download link information;
the second service server generates the parsing result information according to the query content information, the related application service information and the corresponding download link information after the attribute information has been added.
5. The speech recognition method according to claim 1, characterized in that the user terminal loading the parsing result information and displaying it as a result display page specifically comprises:
the user terminal receives the parsing result information;
the user terminal identifies the parsing result information, classifies the query content information, the related application service information and the download link information, and loads and displays them as the result display page.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711033275.5A CN107845384A (en) | 2017-10-30 | 2017-10-30 | A kind of audio recognition method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711033275.5A CN107845384A (en) | 2017-10-30 | 2017-10-30 | A kind of audio recognition method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107845384A true CN107845384A (en) | 2018-03-27 |
Family
ID=61680953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711033275.5A Pending CN107845384A (en) | 2017-10-30 | 2017-10-30 | A kind of audio recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107845384A (en) |
- 2017-10-30: Application CN201711033275.5A filed in China; published as CN107845384A (status: Pending)
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120179463A1 (en) * | 2011-01-07 | 2012-07-12 | Nuance Communications, Inc. | Configurable speech recognition system using multiple recognizers |
CN103177104A (en) * | 2013-03-26 | 2013-06-26 | 北京小米科技有限责任公司 | Searching method and device of application program |
CN104462262A (en) * | 2014-11-21 | 2015-03-25 | 北京奇虎科技有限公司 | Method and device for achieving voice search and browser client side |
CN106101789A (en) * | 2016-07-06 | 2016-11-09 | 深圳Tcl数字技术有限公司 | The voice interactive method of terminal and device |
CN106446265A (en) * | 2016-10-18 | 2017-02-22 | 江西博瑞彤芸科技有限公司 | Question inquiry display method for intelligent terminal |
CN107153965A (en) * | 2017-04-05 | 2017-09-12 | 芜湖恒天易开软件科技股份有限公司 | A kind of intelligent customer service solution of multiple terminals |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109545214A (en) * | 2018-12-26 | 2019-03-29 | 苏州思必驰信息科技有限公司 | Message distributing method and device based on voice interactive system |
CN109918591A (en) * | 2019-03-01 | 2019-06-21 | 北京猎户星空科技有限公司 | Using adding method, device, electronic equipment and storage medium |
CN110058916A (en) * | 2019-04-23 | 2019-07-26 | 深圳创维数字技术有限公司 | A kind of phonetic function jump method, device, equipment and computer storage medium |
CN112786022A (en) * | 2019-11-11 | 2021-05-11 | 青岛海信移动通信技术股份有限公司 | Terminal, first voice server, second voice server and voice recognition method |
CN112786022B (en) * | 2019-11-11 | 2023-04-07 | 青岛海信移动通信技术股份有限公司 | Terminal, first voice server, second voice server and voice recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180327 |