CN109994110A - Speech recognition method and apparatus based on artificial intelligence, and computer device

Info

Publication number: CN109994110A
Application number: CN201811488395.9A
Authority: CN
Inventors: 胡宏伟; 邹芳; 罗小涛
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-12-06
Filing date: 2018-12-06
Publication date: 2019-07-09

Abstract

This application relates to the field of artificial intelligence, and in particular to a speech recognition method and apparatus based on artificial intelligence, and a computer device. The method includes: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is IE or because its version is too old to support calling the microphone via HTML5.

Description

Speech recognition method and apparatus based on artificial intelligence, and computer device

Technical field

This application relates to the field of artificial intelligence, and in particular to a speech recognition method and apparatus based on artificial intelligence, and a computer device.

Background

At present, when a browser on a computer needs to perform speech recognition, the usual approach is to call the microphone through the browser's HTML5 interface to obtain a voice stream, as the HTML5 interface of iFlytek speech recognition does. This approach has several problems. For example, most in-house office systems are used in IE, and employees need to perform speech recognition inside those systems, but IE itself does not support calling the microphone, so an HTML5 interface of this kind cannot provide speech recognition in IE. Some intranet computers run other browsers, but in old versions (for example, an outdated version of Chrome) that likewise do not support calling the microphone via HTML5. In both cases the browser cannot call the microphone through its HTML5 interface to obtain a voice stream.

Summary of the application

In view of the shortcomings of the prior art, this application proposes a speech recognition method and apparatus based on artificial intelligence, a computer device, and a storage medium. They are intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is IE or because its version is too old to support calling the microphone via HTML5.

The technical solution proposed by this application is as follows:

A speech recognition method based on artificial intelligence, the method comprising:

receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called;

in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user;

performing text recognition on the voice stream to obtain text information; and

sending the text information to the browser.

Further, the step of performing text recognition on the voice stream to obtain text information comprises:

sending the voice stream to a speech recognition engine; and

obtaining the text information recognized by the speech recognition engine;

or

recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, wherein the C++ SDK is a software development kit written in the C++ language.

Further, the step of calling the microphone comprises:

reading the user account information of the account currently logged in to the computer operating system;

determining, according to the user account information, whether the current user has permission to call the microphone; and

if the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.

Further, after the step of sending the text information to the browser, the method comprises:

receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request that first text information be broadcast as speech;

in response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;

sending the first text information to a text-to-speech engine;

obtaining the voice information produced by the text-to-speech engine;

calling an audio unit according to the voice broadcast request instruction; and

broadcasting the voice information through the audio unit.

Further, after the step of sending the text information to the browser, the method comprises:

receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to control the starting and stopping of speech recognition; and

in response to the control instruction, starting or stopping speech recognition.

Further, the step of calling the microphone comprises:

identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system; and

if the current operating system is a Windows operating system, calling the microphone through a native Windows interface.

Further, the speech recognition method is implemented in the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.

This application also provides a speech recognition apparatus based on artificial intelligence, the apparatus comprising:

a receiving module, configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called;

a calling module, configured to call the microphone in response to the speech recognition request instruction;

an acquisition module, configured to obtain, through the microphone, the voice stream input by the user;

an obtaining module, configured to perform text recognition on the voice stream to obtain text information; and

a sending module, configured to send the text information to the browser.

This application also provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of any of the methods described above when executing the computer program.

This application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of any of the methods described above when executed by a processor.

The beneficial effects of the above technical solution are as follows. According to the request instruction from the browser, the voice plug-in calls the microphone, obtains a voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is IE or because its version is too old to support calling the microphone via HTML5.

Brief description of the drawings

Fig. 1 is a flow chart of the speech recognition method based on artificial intelligence provided by an embodiment of this application;

Fig. 2 is a functional block diagram of the speech recognition apparatus based on artificial intelligence provided by an embodiment of this application;

Fig. 3 is a schematic structural block diagram of the computer device provided by an embodiment of this application.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.

As shown in Fig. 1, an embodiment of this application proposes a speech recognition method based on artificial intelligence, the method comprising the following steps:

Step S101: receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called.

When the browser needs to perform speech recognition, it sends a speech recognition request instruction via HTML5, and the speech recognition request instruction sent by the browser is received.

In this embodiment, the speech recognition request instruction may be entered by the user in the browser, or it may be triggered and sent by the browser on its own initiative: when the browser recognizes that speech recognition is currently needed, it triggers the sending of the speech recognition request instruction.

Step S102: in response to the speech recognition request instruction, call the microphone and obtain, through the microphone, the voice stream input by the user.

After the request instruction from the browser is received, the microphone is called according to the speech recognition request instruction. The microphone is not called directly by the browser's HTML5 code. Instead, a voice plug-in is used: the HTML5 code controls a local program (that is, the plug-in), and the local program calls the microphone. Once the microphone has been called successfully, the user speaks into the microphone, and the voice stream is obtained through the microphone.

Specifically, the step of calling the microphone comprises:

Because most operating systems used for office work are Windows, the microphone is called through a native Windows interface after the speech recognition request instruction is received.
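The operating-system check that precedes the native call can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names and the backend labels `"windows-native"` and `"unsupported"` are hypothetical, and the sketch only selects a backend without touching any audio API.

```python
import platform
from typing import Optional


def is_windows(system_name: Optional[str] = None) -> bool:
    """Return True when the operating system is Windows.

    `system_name` can be passed in for testing; by default the name
    reported by the standard library is used.
    """
    name = platform.system() if system_name is None else system_name
    return name == "Windows"


def choose_microphone_backend(system_name: Optional[str] = None) -> str:
    """Pick a (hypothetical) backend label for calling the microphone."""
    # Only the Windows case is described in the text; anything else is
    # reported as unsupported in this sketch.
    return "windows-native" if is_windows(system_name) else "unsupported"
```

Passing the system name as a parameter keeps the decision testable on any machine.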

In this embodiment, the step of calling the microphone comprises:

After the speech recognition request instruction is received, it must be determined whether the current user has permission to call the microphone. The user account information of the account currently logged in to the computer operating system is read, and whether the current user has permission to call the microphone is determined from that information. If the current user has permission, the speech recognition request instruction is allowed and the microphone is called according to it. If the current user does not have permission, the speech recognition request instruction is refused and the microphone is not called.

In this embodiment, the permission of each user account to call the microphone is configured in advance. Once configured, reading the user account information is enough to determine whether that user has permission to call the microphone.
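The permission check described above might look like the following sketch. The permission table, the account names, and the choice to refuse accounts that are not listed are all assumptions made for illustration; the text only says that permissions are configured in advance per account.

```python
import getpass
from typing import Mapping, Optional

# Hypothetical preset configuration: account name -> may call the microphone.
MICROPHONE_PERMISSIONS = {"alice": True, "bob": False}


def may_call_microphone(
    permissions: Mapping[str, bool], account: Optional[str] = None
) -> bool:
    """Decide whether the given (or currently logged-in) account may call the microphone."""
    if account is None:
        # Read the account currently logged in to the operating system.
        account = getpass.getuser()
    # Assumption: accounts missing from the configuration are refused.
    return permissions.get(account, False)


def handle_speech_recognition_request(
    permissions: Mapping[str, bool], account: Optional[str] = None
) -> str:
    """Allow or refuse the request; the actual microphone call is omitted."""
    if may_call_microphone(permissions, account):
        return "allowed"
    return "refused"
```

For example, `handle_speech_recognition_request(MICROPHONE_PERMISSIONS, "bob")` refuses the request because the hypothetical account `bob` is configured without permission.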

Step S103: perform text recognition on the voice stream to obtain text information.

After the voice stream is obtained through the microphone, it is recognized and converted into text information, yielding the text information recognized from the voice stream.

In this embodiment, step S103 comprises:

sending the voice stream to a speech recognition engine; and

obtaining the text information recognized by the speech recognition engine.

Through interaction with the speech recognition engine, the voice stream is sent to the engine, which recognizes it and produces text information. The engine then returns the text information, so that the text information recognized by the speech recognition engine is obtained.
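The interaction with the engine can be pictured as a small interface: the plug-in hands over the voice stream and receives text back. The sketch below assumes a hypothetical `SpeechRecognitionEngine` protocol with a single `recognize` method. The `FakeEngine` class is a stand-in that only counts bytes and performs no real recognition; the text does not specify the engine's actual API.

```python
from typing import Iterable, Protocol


class SpeechRecognitionEngine(Protocol):
    """Hypothetical interface of a speech recognition engine."""

    def recognize(self, voice_stream: Iterable[bytes]) -> str:
        ...


class FakeEngine:
    """Stand-in engine for illustration: reports how much audio it received."""

    def recognize(self, voice_stream: Iterable[bytes]) -> str:
        total = sum(len(chunk) for chunk in voice_stream)
        return f"<{total} bytes of audio>"


def recognize_text(engine: SpeechRecognitionEngine, voice_stream: Iterable[bytes]) -> str:
    """Send the voice stream to the engine and return the text it feeds back."""
    return engine.recognize(voice_stream)
```

Any object with a matching `recognize` method fits this interface, so a remote engine client and a local SDK wrapper could be swapped without changing the caller.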

In some embodiments, step S103 comprises:

The C++ SDK is a software development kit written in the C++ language. The speech recognition C++ SDK is integrated using the C# language, so that the plug-in itself has speech recognition capability. After the voice stream is obtained, it is recognized directly, and the text information corresponding to the voice stream is obtained.

Step S105: send the text information to the browser.

After the text information is obtained, it is sent to the browser. On receiving the text information, the browser displays it on its web page, thereby achieving speech recognition in the browser.
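Taken together, steps S101 to S105 form a short pipeline inside the local program. The sketch below shows one way to arrange them, with the microphone, the recognizer, and the channel back to the browser passed in as callables. All three parameter names are hypothetical, and the transport between plug-in and browser is left abstract because the text does not describe it.

```python
from typing import Callable, Iterable


def handle_speech_recognition_request(
    call_microphone: Callable[[], Iterable[bytes]],
    recognize: Callable[[Iterable[bytes]], str],
    send_to_browser: Callable[[str], None],
) -> str:
    """Run the steps that follow receipt of a speech recognition request (S101)."""
    # S102: call the microphone and obtain the voice stream input by the user.
    voice_stream = call_microphone()
    # S103: perform text recognition on the voice stream.
    text_information = recognize(voice_stream)
    # S105: send the text information to the browser.
    send_to_browser(text_information)
    return text_information
```

In a test, the three callables can be replaced with simple stand-ins, for instance a lambda that returns a list of byte chunks and a list's `append` method as the sender.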

In this embodiment, the voice plug-in is installed silently on the computer by pushing an installation package. This installation method is simple and convenient and does not interrupt the user's normal work. Once installation is complete, the microphone can readily be called.

In this embodiment, after step S105, the method comprises:

The user sends a control instruction in the browser through an ActiveX control. The control instruction is used to control the starting and stopping of speech recognition and is either a start control instruction or a stop control instruction. The control instruction sent by the browser through the ActiveX control is received, and speech recognition is started or stopped according to it. Specifically, if the control instruction is a start control instruction, speech recognition is started; if it is a stop control instruction, speech recognition is stopped.
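The start and stop handling amounts to a two-state controller. The following sketch models only that state change; the `ControlInstruction` enum and the `RecognitionController` class are hypothetical names, and how the ActiveX control delivers the instruction is not modeled.

```python
from enum import Enum


class ControlInstruction(Enum):
    """The two kinds of control instruction named in the text."""

    START = "start"
    STOP = "stop"


class RecognitionController:
    """Tracks whether speech recognition is currently running."""

    def __init__(self) -> None:
        self.running = False

    def execute(self, instruction: ControlInstruction) -> bool:
        """Apply a control instruction and return the resulting running state."""
        if instruction is ControlInstruction.START:
            self.running = True
        elif instruction is ControlInstruction.STOP:
            self.running = False
        return self.running
```

Repeating the same instruction is harmless in this sketch: starting an already running session leaves it running, and stopping a stopped one leaves it stopped.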

In this embodiment, after step S105, the method comprises:

sending the first text information to a text-to-speech engine;

calling an audio unit according to the voice broadcast request instruction; and

broadcasting the voice information through the audio unit.

When the browser needs to perform a voice broadcast, it sends a voice broadcast request instruction, which is received. According to the voice broadcast request instruction, the first text information to be broadcast is obtained from the browser. The first text information must be converted into voice information before it can be broadcast. To that end, through interaction with the text-to-speech engine, the first text information is sent to the engine, which converts it into voice information and returns the result. After the voice information produced by the text-to-speech engine is obtained, the audio unit is called according to the voice broadcast request instruction, and the voice information is broadcast through the audio unit, thereby achieving voice broadcast in the browser.
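The broadcast flow runs in the opposite direction from recognition: text comes from the browser, is converted to audio, and is played locally. A minimal sketch follows, with the browser access, the text-to-speech engine, and the audio unit represented by hypothetical callables, since none of their real interfaces are given in the text.

```python
from typing import Callable


def handle_voice_broadcast_request(
    get_text_from_browser: Callable[[], str],
    synthesize: Callable[[str], bytes],
    play_audio: Callable[[bytes], None],
) -> bytes:
    """Run the steps that follow receipt of a voice broadcast request."""
    # Obtain the first text information to be broadcast from the browser.
    first_text = get_text_from_browser()
    # Have the text-to-speech engine convert it into voice information.
    voice_information = synthesize(first_text)
    # Call the audio unit and broadcast the voice information.
    play_audio(voice_information)
    return voice_information
```

A permission check on the audio unit could be inserted before `play_audio` in the same way as for the microphone.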

In this embodiment, the step of calling the audio unit according to the voice broadcast request instruction comprises:

determining, according to the user account information, whether the current user has permission to call the audio unit; and

if the current user has permission to call the audio unit, calling the audio unit according to the voice broadcast request instruction.

After the voice broadcast request instruction is received, it must be determined whether the current user has permission to call the audio unit. The user account information of the account currently logged in to the computer operating system is read, and whether the current user has permission to call the audio unit is determined from that information. If the current user has permission, the voice broadcast request instruction is allowed and the audio unit is called according to it. If the current user does not have permission, the voice broadcast request instruction is refused and the audio unit is not called.

In summary, according to the request instruction from the browser, the voice plug-in calls the microphone, obtains a voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is IE or because its version is too old to support calling the microphone via HTML5.

As shown in Fig. 2, an embodiment of this application proposes a speech recognition apparatus based on artificial intelligence, implemented as a voice plug-in. The apparatus 1 includes a receiving module 11, a calling module 12, an acquisition module 13, an obtaining module 14, and a sending module 15.

The receiving module 11 is configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called.

The calling module 12 is configured to call the microphone in response to the speech recognition request instruction.

After the request instruction from the browser is received, the microphone is called according to the speech recognition request instruction. The microphone is not called directly by the browser's HTML5 code. Instead, a voice plug-in is used: the HTML5 code controls a local program (that is, the plug-in), and the local program calls the microphone.

Specifically, the calling module 12 includes:

a first identification submodule, configured to identify, according to the speech recognition request instruction, whether the current operating system is a Windows operating system; and

a first calling submodule, configured to call the microphone through a native Windows interface if the current operating system is a Windows operating system.

In this embodiment, the calling module 12 includes:

a first reading module, configured to read the user account information of the account currently logged in to the computer operating system;

a first judgment module, configured to determine, according to the user account information, whether the current user has permission to call the microphone; and

a first calling module, configured to call the microphone according to the speech recognition request instruction if the current user has permission to call the microphone.

The acquisition module 13 is configured to obtain the voice stream through the microphone.

Once the microphone has been called successfully, the user speaks into the microphone, and the voice stream is obtained through the microphone.

The obtaining module 14 is configured to perform text recognition on the voice stream to obtain text information.

In this embodiment, the obtaining module 14 includes:

a first sending submodule, configured to send the voice stream to a speech recognition engine; and

a first acquisition submodule, configured to obtain the text information recognized by the speech recognition engine.

In some embodiments, the obtaining module 14 includes:

a first obtaining module, configured to recognize the voice stream through an integrated speech recognition C++ SDK to obtain the text information, wherein the C++ SDK is a software development kit written in the C++ language.

The sending module 15 is configured to send the text information to the browser.

In this embodiment, the apparatus 1 includes:

a first receiving module, configured to receive a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to control the starting and stopping of speech recognition; and

a first execution module, configured to start or stop speech recognition in response to the control instruction.

The user sends a control instruction in the browser through an ActiveX control. The control instruction is used to control the starting and stopping of speech recognition and is either a start control instruction or a stop control instruction. The control instruction sent by the browser through the ActiveX control is received, and speech recognition is started or stopped according to it. Specifically, if the control instruction is a start control instruction, speech recognition is started; if it is a stop control instruction, speech recognition is stopped.

In this embodiment, the apparatus 1 includes:

a second receiving module, configured to receive a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request that first text information be broadcast as speech;

a second obtaining module, configured to obtain, in response to the voice broadcast request instruction, the first text information to be broadcast from the browser;

a second sending module, configured to send the first text information to a text-to-speech engine;

a third obtaining module, configured to obtain the voice information produced by the text-to-speech engine;

a second calling module, configured to call an audio unit according to the voice broadcast request instruction; and

a first broadcasting module, configured to broadcast the voice information through the audio unit.

When the browser needs to perform a voice broadcast, it sends a voice broadcast request instruction, which is received. According to the voice broadcast request instruction, the first text information to be broadcast is obtained from the browser. The first text information must be converted into voice information before it can be broadcast. To that end, through interaction with the text-to-speech engine, the first text information is sent to the engine, which converts it into voice information and returns the result. After the voice information produced by the text-to-speech engine is obtained, the audio unit is called according to the voice broadcast request instruction, and the voice information is broadcast through the audio unit, thereby achieving voice broadcast in the browser.

In this embodiment, the second calling module includes:

a first reading submodule, configured to read the user account information of the account currently logged in to the computer operating system;

a first judgment submodule, configured to determine, according to the user account information, whether the current user has permission to call the audio unit; and

a first calling submodule, configured to call the audio unit according to the voice broadcast request instruction if the current user has permission to call the audio unit.

As shown in Fig. 3, an embodiment of this application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the models used by the speech recognition method based on artificial intelligence. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a speech recognition method based on artificial intelligence.

The processor executes the steps of the speech recognition method based on artificial intelligence described above: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser.

In one embodiment, the step of performing text recognition on the voice stream to obtain text information comprises:

sending the voice stream to a speech recognition engine; and

obtaining the text information recognized by the speech recognition engine;

or

In one embodiment, the step of calling the microphone comprises:

In one embodiment, after the step of sending the text information to the browser, the method comprises:

sending the first text information to a text-to-speech engine;

calling an audio unit according to the voice broadcast request instruction; and

broadcasting the voice information through the audio unit.

In one embodiment, the speech recognition method is implemented in the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.

Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied.

In the computer device of this embodiment, the voice plug-in calls the microphone according to the request instruction from the browser, obtains a voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is IE or because its version is too old to support calling the microphone via HTML5.

An embodiment of this application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a speech recognition method based on artificial intelligence, specifically: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser.

sending the voice stream to a speech recognition engine; and

obtaining the text information recognized by the speech recognition engine;

or

sending the first text information to a text-to-speech engine;

calling an audio unit according to the voice broadcast request instruction; and

broadcasting the voice information through the audio unit.

With the storage medium of this embodiment, the voice plug-in calls the microphone according to the request instruction from the browser, obtains a voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is IE or because its version is too old to support calling the microphone via HTML5.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in this application and its embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The above are only preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.

Claims

1. A speech recognition method based on artificial intelligence, characterized in that the method comprises:

receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called;

in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user;

sending the text information to the browser.

2. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of performing text recognition on the voice stream to obtain text information comprises:

sending the voice stream to a speech recognition engine; and

obtaining the text information recognized by the speech recognition engine;

or

recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, wherein the C++ SDK is a software development kit written in the C++ language.

3. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of calling the microphone comprises:

4. The speech recognition method based on artificial intelligence according to claim 1, characterized in that, after the step of sending the text information to the browser, the method comprises:

receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request that first text information be broadcast as speech;

sending the first text information to a text-to-speech engine;

calling an audio unit according to the voice broadcast request instruction; and

broadcasting the voice information through the audio unit.

5. The speech recognition method based on artificial intelligence according to claim 1, characterized in that, after the step of sending the text information to the browser, the method comprises:

receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to control the starting and stopping of speech recognition;

6. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of calling the microphone comprises:

7. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the speech recognition method is implemented in the form of a voice plug-in, and the voice plug-in is installed silently on a computer by pushing an installation package.

8. A speech recognition apparatus based on artificial intelligence, characterized in that the apparatus comprises:

a receiving module, configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that a microphone be called;

a sending module, configured to send the text information to the browser.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.