CN109994110A - Audio recognition method, device based on artificial intelligence, computer equipment - Google Patents

Audio recognition method, device based on artificial intelligence, computer equipment

Info

Publication number
CN109994110A
Authority
CN
China
Prior art keywords
browser
speech recognition
voice
microphone
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811488395.9A
Other languages
Chinese (zh)
Inventor
胡宏伟
邹芳
罗小涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811488395.9A
Publication of CN109994110A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F9/4451 User profiles; Roaming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

This application relates to the field of artificial intelligence, and in particular to a speech recognition method and device based on artificial intelligence, and computer equipment. The method includes: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser. The voice stream is thereby obtained without calling the microphone through the browser's HTML5 interface. This addresses the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, either because the browser is Internet Explorer (IE) or because its version is too old to support calling the microphone via HTML5.

Description

Audio recognition method, device based on artificial intelligence, computer equipment
Technical field
This application relates to the field of artificial intelligence, and in particular to a speech recognition method and device based on artificial intelligence, and computer equipment.
Background art
At present, when a browser on a desktop computer needs to perform speech recognition, the usual approach is to call the microphone through the browser's HTML5 interface to obtain a voice stream, as in the HTML5 interface for iFLYTEK speech recognition. This approach has problems, however. For example, the browser used by most in-house employee office systems is IE, and employees need to perform speech recognition inside the office system, but IE itself does not support microphone calls, so the HTML5 interface cannot provide speech recognition in IE. Some company intranet computers can use other browsers, but the installed versions are old (for example, an outdated Chrome version) and likewise do not support calling the microphone via HTML5. In both cases the browser cannot call the microphone through its HTML5 interface to obtain a voice stream.
Summary of the application
In view of the shortcomings of the prior art, this application proposes a speech recognition method and device based on artificial intelligence, computer equipment, and a storage medium. It aims to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, either because the browser is IE or because its version is too old to support calling the microphone via HTML5.
The technical solution proposed by this application is:
A speech recognition method based on artificial intelligence, the method comprising:
Receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone;
In response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user;
Performing text recognition on the voice stream to obtain text information;
Sending the text information to the browser.
Further, the step of performing text recognition on the voice stream to obtain text information comprises:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine;
Or
Recognizing the voice stream by means of an integrated speech recognition C++ SDK to obtain the recognized text information, wherein the C++ SDK is a software development kit written in the C++ language.
Further, the step of calling the microphone comprises:
Reading the account information of the user currently logged in to the computer operating system;
Judging, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
Further, after the step of sending the text information to the browser, the method comprises:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
In response to the voice broadcast request instruction, obtaining from the browser the first text information to be broadcast;
Sending the first text information to a text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling an audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
Further, after the step of sending the text information to the browser, the method comprises:
Receiving a control instruction sent by the user through an ActiveX control in the browser, the control instruction being used to start or stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
Further, the step of calling the microphone comprises:
Identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
If the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
Further, the speech recognition method is implemented in the form of a voice plug-in, and the voice plug-in is installed by pushing an installation package that installs it silently on the computer.
The application also provides a speech recognition device based on artificial intelligence, the device comprising:
A receiving module, configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone;
A calling module, configured to call the microphone in response to the speech recognition request instruction;
An acquisition module, configured to obtain, through the microphone, the voice stream input by the user;
An obtaining module, configured to perform text recognition on the voice stream to obtain text information;
A sending module, configured to send the text information to the browser.
The application also provides computer equipment, including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of any of the methods described above when executing the computer program.
The application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of any of the methods described above when executed by a processor.
With the above technical solution, the beneficial effect of the application is as follows: according to the request instruction from the browser, the voice plug-in calls the microphone, obtains the voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thereby obtained without calling the microphone through the browser's HTML5 interface, which addresses the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface, either because the browser is IE or because its version is too old to support calling the microphone via HTML5.
Brief description of the drawings
Fig. 1 is a flowchart of the speech recognition method based on artificial intelligence provided by an embodiment of the present application;
Fig. 2 is a functional block diagram of the speech recognition device based on artificial intelligence provided by an embodiment of the present application;
Fig. 3 is a schematic structural block diagram of the computer equipment provided by an embodiment of the present application.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.
As shown in Fig. 1, an embodiment of the present application proposes a speech recognition method based on artificial intelligence, the method comprising the following steps:
Step S101: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone.
When the browser needs to perform speech recognition, it sends a speech recognition request instruction via HTML5, and the instruction sent by the browser is received.
In this embodiment, the speech recognition request instruction may be entered by the user in the browser, or may be triggered and sent by the browser itself: when the browser determines that speech recognition is currently needed, it triggers the sending of the speech recognition request instruction.
Step S102: in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user.
After the browser's request instruction is received, the microphone is called according to the speech recognition request instruction. The browser's HTML5 is not used to call the microphone directly; instead, a voice plug-in is used: HTML5 controls a local program (the plug-in), and the local program calls the microphone. Once the microphone has been called successfully, the user speaks into it and the voice stream is obtained through the microphone.
Specifically, the step of calling the microphone comprises:
Identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
If the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
Because most operating systems used for office work are Windows, the microphone is called through the Windows native interface after the speech recognition request instruction is received.
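The operating system check described above can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation: the function names and the backend labels "windows-native" and "unsupported" are hypothetical, and the actual Windows native capture call is deliberately left out.

```python
import platform


def is_windows(system_name=None):
    """Return True when the operating system name identifies Windows.

    When no name is given, the name of the running system is used.
    """
    name = platform.system() if system_name is None else system_name
    return name.lower() == "windows"


def choose_microphone_backend(system_name=None):
    """Pick a capture backend label (hypothetical placeholder labels)."""
    if is_windows(system_name):
        # The patent text calls the microphone through a Windows native
        # interface here; the concrete API is not named in the source.
        return "windows-native"
    return "unsupported"
```

Passing the system name as a parameter keeps the decision testable on any machine; in real use the argument would be omitted.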
In this embodiment, the step of calling the microphone comprises:
Reading the account information of the user currently logged in to the computer operating system;
Judging, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
After the speech recognition request instruction is received, it must be determined whether the current user has permission to call the microphone. The account information of the user currently logged in to the computer operating system is read, and whether the current user has permission is judged from it. If the current user has permission, the speech recognition request instruction is allowed and the microphone is called accordingly. If the current user does not have permission, the instruction is refused and the microphone is not called.
In this embodiment, the microphone permission of each user account is configured in advance; once configured, reading the user account information is enough to determine whether that user has permission to call the microphone.
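The permission check above amounts to a lookup in a preconfigured table keyed by account name. The sketch below assumes such a table; the account names, the table itself, and the rule that unknown accounts are denied are illustrative assumptions not stated in the source.

```python
import getpass

# Hypothetical preset configuration: account name -> may call the microphone.
MICROPHONE_PERMISSIONS = {"alice": True, "bob": False}


def current_account():
    """Return the name of the user logged in to the operating system."""
    return getpass.getuser()


def may_call_microphone(account, permissions=MICROPHONE_PERMISSIONS):
    """Look up the preset permission; unknown accounts are denied (assumption)."""
    return permissions.get(account, False)


def handle_recognition_request(account, open_microphone,
                               permissions=MICROPHONE_PERMISSIONS):
    """Call the microphone only for permitted accounts, otherwise refuse."""
    if not may_call_microphone(account, permissions):
        return None  # request refused; the microphone is never touched
    return open_microphone()
```

In use, `handle_recognition_request(current_account(), open_microphone)` would be called with whatever function actually opens the capture device.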
Step S103: performing text recognition on the voice stream to obtain text information.
After the voice stream is obtained through the microphone, it is recognized and converted into text information, yielding the text information recognized from the voice stream.
In this embodiment, step S103 comprises:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine.
Through interaction with the speech recognition engine, the voice stream is sent to the engine, which recognizes it and obtains text information; the engine then returns the text information, so that the text information recognized by the speech recognition engine is obtained.
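The exchange with the speech recognition engine can be pictured as a small interface: the plug-in hands over the audio and receives text back. The sketch below is an assumption-laden illustration; `RecognitionEngine`, `EchoEngine`, and `recognize_stream` are hypothetical names, and the stand-in engine merely decodes bytes as UTF-8 instead of performing real recognition.

```python
from typing import Iterable, Protocol


class RecognitionEngine(Protocol):
    """Anything that turns audio bytes into text (hypothetical interface)."""

    def recognize(self, audio: bytes) -> str: ...


class EchoEngine:
    """Stand-in engine for illustration: decodes the bytes as UTF-8 text."""

    def recognize(self, audio: bytes) -> str:
        return audio.decode("utf-8")


def recognize_stream(chunks: Iterable[bytes], engine: RecognitionEngine) -> str:
    """Join the chunks of the voice stream, send them to the engine, return its text."""
    audio = b"".join(chunks)
    return engine.recognize(audio)
```

Keeping the engine behind an interface means the same calling code could serve either variant in the text: a remote recognition engine or an SDK integrated into the plug-in.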
In some embodiments, step S103 comprises:
Recognizing the voice stream by means of an integrated speech recognition C++ SDK to obtain the recognized text information, wherein the C++ SDK is a software development kit written in the C++ language.
The C++ SDK is a software development kit written in C++. By integrating the speech recognition C++ SDK using the C# language, the plug-in itself gains speech recognition capability: once the voice stream is obtained, it is recognized directly, and the corresponding text information is obtained.
Step S105: sending the text information to the browser.
After the text information is obtained, it is sent to the browser, which displays it on its web page upon receipt, thereby realizing speech recognition for the browser.
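Taken together, the steps above form a short pipeline inside the local plug-in. The following sketch wires them up with injected callables so that it runs without audio hardware; `VoicePlugin` and its three fields are hypothetical names, and how the plug-in actually communicates with the browser is not specified in the source.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class VoicePlugin:
    """Local program that serves the browser's recognition requests (sketch)."""

    open_microphone: Callable[[], Iterable[bytes]]  # yields voice stream chunks
    recognize: Callable[[bytes], str]               # voice stream -> text
    send_to_browser: Callable[[str], None]          # delivers text to the page

    def handle_recognition_request(self) -> str:
        # Call the microphone and collect the user's voice stream.
        audio = b"".join(self.open_microphone())
        # Perform text recognition on the voice stream.
        text = self.recognize(audio)
        # Send the text information back to the browser for display.
        self.send_to_browser(text)
        return text
```

A usage example with stand-in callables: `VoicePlugin(lambda: [b"ni ", b"hao"], lambda a: a.decode("ascii"), print).handle_recognition_request()`.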
In this embodiment, the voice plug-in is installed by pushing an installation package that installs it silently on the computer. This installation method is simple and convenient and does not disturb the user's normal work; once installation is complete, the microphone can readily be called.
In this embodiment, after step S105, the method comprises:
Receiving a control instruction sent by the user through an ActiveX control in the browser, the control instruction being used to start or stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
The user sends a control instruction through an ActiveX control in the browser. The control instruction is used to start or stop speech recognition and is either a start control instruction or a stop control instruction. After the control instruction sent by the browser through the ActiveX control is received, speech recognition is started or stopped accordingly: if it is a start control instruction, speech recognition is started; if it is a stop control instruction, speech recognition is stopped.
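The start and stop handling described above reduces to a two-way dispatch on the control instruction. The sketch below models only that dispatch; the string values "start" and "stop" and the class name are assumptions, and the ActiveX transport that carries the instruction is not modeled.

```python
class RecognitionController:
    """Tracks whether speech recognition is running (illustrative sketch)."""

    def __init__(self):
        self.running = False

    def handle(self, instruction: str) -> bool:
        """Apply a control instruction and return the resulting running state."""
        if instruction == "start":
            self.running = True
        elif instruction == "stop":
            self.running = False
        else:
            # The source names only start and stop instructions.
            raise ValueError(f"unknown control instruction: {instruction!r}")
        return self.running
```

Rejecting anything other than the two named instructions is a design choice of this sketch, made so that a malformed instruction cannot silently change the recognition state.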
In this embodiment, after step S105, the method comprises:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
In response to the voice broadcast request instruction, obtaining from the browser the first text information to be broadcast;
Sending the first text information to a text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling an audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
When the browser needs to perform voice broadcast, it sends a voice broadcast request instruction, which is received. According to the instruction, the first text information to be broadcast is obtained from the browser. The first text information must be converted into voice information before it can be broadcast. To this end, through interaction with the text-to-speech engine, the first text information is sent to the engine, which converts it into voice information and returns that voice information. After the voice information is obtained, the audio unit is called according to the voice broadcast request instruction and the voice information is broadcast through the audio unit, thereby realizing voice broadcast for the browser.
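The voice broadcast flow mirrors the recognition flow in the opposite direction: text comes from the browser, a text-to-speech engine converts it, and the audio unit plays the result. A minimal sketch follows, with all three collaborators passed in as hypothetical callables; no real text-to-speech engine or audio API is used.

```python
from typing import Callable


def handle_broadcast_request(
    fetch_text: Callable[[], str],       # obtains the first text information
    synthesize: Callable[[str], bytes],  # stand-in for the text-to-speech engine
    play: Callable[[bytes], None],       # stand-in for the audio unit
) -> bytes:
    """Fetch the text to broadcast, convert it to voice, and play it."""
    text = fetch_text()
    voice = synthesize(text)
    play(voice)
    return voice
```

For example, `handle_broadcast_request(lambda: "hello", lambda t: t.encode("utf-8"), print)` exercises the flow with stand-ins in place of the browser, the engine, and the audio unit.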
In this embodiment, the step of calling the audio unit according to the voice broadcast request instruction comprises:
Reading the account information of the user currently logged in to the computer operating system;
Judging, according to the user account information, whether the current user has permission to call the audio unit;
If the current user has permission to call the audio unit, calling the audio unit according to the voice broadcast request instruction.
After the voice broadcast request instruction is received, it must be determined whether the current user has permission to call the audio unit. The account information of the user currently logged in to the computer operating system is read, and whether the current user has permission is judged from it. If the current user has permission, the voice broadcast request instruction is allowed and the audio unit is called accordingly. If the current user does not have permission, the instruction is refused and the audio unit is not called.
In summary, according to the request instruction from the browser, the voice plug-in calls the microphone, obtains the voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thereby obtained without calling the microphone through the browser's HTML5 interface, which addresses the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface, either because the browser is IE or because its version is too old to support calling the microphone via HTML5.
As shown in Fig. 2, an embodiment of the present application proposes a speech recognition device based on artificial intelligence, implemented as a voice plug-in. Device 1 includes a receiving module 11, a calling module 12, an acquisition module 13, an obtaining module 14, and a sending module 15.
The receiving module 11 is configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone.
When the browser needs to perform speech recognition, it sends a speech recognition request instruction via HTML5, and the instruction sent by the browser is received.
In this embodiment, the speech recognition request instruction may be entered by the user in the browser, or may be triggered and sent by the browser itself: when the browser determines that speech recognition is currently needed, it triggers the sending of the speech recognition request instruction.
The calling module 12 is configured to call the microphone in response to the speech recognition request instruction.
After the browser's request instruction is received, the microphone is called according to the speech recognition request instruction. The browser's HTML5 is not used to call the microphone directly; instead, a voice plug-in is used: HTML5 controls a local program (the plug-in), and the local program calls the microphone.
Specifically, the calling module 12 includes:
A first sub-identification module, configured to identify, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
A first sub-calling module, configured to call the microphone through a Windows native interface if the current operating system is a Windows operating system.
Because most operating systems used for office work are Windows, the microphone is called through the Windows native interface after the speech recognition request instruction is received.
In this embodiment, the calling module 12 includes:
A first reading module, configured to read the account information of the user currently logged in to the computer operating system;
A first judging module, configured to judge, according to the user account information, whether the current user has permission to call the microphone;
A first calling module, configured to call the microphone according to the speech recognition request instruction if the current user has permission to call the microphone.
After the speech recognition request instruction is received, it must be determined whether the current user has permission to call the microphone. The account information of the user currently logged in to the computer operating system is read, and whether the current user has permission is judged from it. If the current user has permission, the speech recognition request instruction is allowed and the microphone is called accordingly. If the current user does not have permission, the instruction is refused and the microphone is not called.
In this embodiment, the microphone permission of each user account is configured in advance; once configured, reading the user account information is enough to determine whether that user has permission to call the microphone.
The acquisition module 13 is configured to obtain the voice stream through the microphone.
Once the microphone has been called successfully, the user speaks into it and the voice stream is obtained through the microphone.
The obtaining module 14 is configured to perform text recognition on the voice stream to obtain text information.
After the voice stream is obtained through the microphone, it is recognized and converted into text information, yielding the text information recognized from the voice stream.
In this embodiment, the obtaining module 14 includes:
A first sub-sending module, configured to send the voice stream to a speech recognition engine;
A first sub-obtaining module, configured to obtain the text information recognized by the speech recognition engine.
Through interaction with the speech recognition engine, the voice stream is sent to the engine, which recognizes it and obtains text information; the engine then returns the text information, so that the text information recognized by the speech recognition engine is obtained.
In some embodiments, the obtaining module 14 includes:
A first obtaining module, configured to recognize the voice stream by means of an integrated speech recognition C++ SDK to obtain the recognized text information, wherein the C++ SDK is a software development kit written in the C++ language.
The C++ SDK is a software development kit written in C++. By integrating the speech recognition C++ SDK using the C# language, the plug-in itself gains speech recognition capability: once the voice stream is obtained, it is recognized directly, and the corresponding text information is obtained.
The sending module 15 is configured to send the text information to the browser.
After the text information is obtained, it is sent to the browser, which displays it on its web page upon receipt, thereby realizing speech recognition for the browser.
In this embodiment, the voice plug-in is installed by pushing an installation package that installs it silently on the computer. This installation method is simple and convenient and does not disturb the user's normal work; once installation is complete, the microphone can readily be called.
In this embodiment, device 1 includes:
A first receiving module, configured to receive a control instruction sent by the user through an ActiveX control in the browser, the control instruction being used to start or stop speech recognition;
A first execution module, configured to start or stop speech recognition in response to the control instruction.
The user sends a control instruction through an ActiveX control in the browser. The control instruction is used to start or stop speech recognition and is either a start control instruction or a stop control instruction. After the control instruction sent by the browser through the ActiveX control is received, speech recognition is started or stopped accordingly: if it is a start control instruction, speech recognition is started; if it is a stop control instruction, speech recognition is stopped.
In this embodiment, device 1 includes:
A second receiving module, configured to receive a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
A second obtaining module, configured to obtain from the browser, in response to the voice broadcast request instruction, the first text information to be broadcast;
A second sending module, configured to send the first text information to a text-to-speech engine;
A third obtaining module, configured to obtain the voice information produced by the text-to-speech engine;
A second calling module, configured to call an audio unit according to the voice broadcast request instruction;
A first broadcasting module, configured to broadcast the voice information through the audio unit.
When the browser needs to perform voice broadcast, it sends a voice broadcast request instruction, which is received. According to the instruction, the first text information to be broadcast is obtained from the browser. The first text information must be converted into voice information before it can be broadcast. To this end, through interaction with the text-to-speech engine, the first text information is sent to the engine, which converts it into voice information and returns that voice information. After the voice information is obtained, the audio unit is called according to the voice broadcast request instruction and the voice information is broadcast through the audio unit, thereby realizing voice broadcast for the browser.
In this embodiment, the second calling module includes:
A first sub-reading module, configured to read the account information of the user currently logged in to the computer operating system;
A first sub-judging module, configured to judge, according to the user account information, whether the current user has permission to call the audio unit;
A first sub-calling module, configured to call the audio unit according to the voice broadcast request instruction if the current user has permission to call the audio unit.
After the voice broadcast request instruction is received, it must be determined whether the current user has permission to call the audio unit. The account information of the user currently logged in to the computer operating system is read, and whether the current user has permission is judged from it. If the current user has permission, the voice broadcast request instruction is allowed and the audio unit is called accordingly. If the current user does not have permission, the instruction is refused and the audio unit is not called.
In conclusion request instruction of the voice plug-in unit according to browser, calls microphone, obtains voice by microphone Stream, then voice flow is subjected to identification and obtains text information, text information is sent to browser later, to be implemented without logical The Html5 interface of browser is crossed to call microphone to obtain voice flow, it is intended to speech recognition will be carried out by solving existing browser, by In browser be IE browser or browser version it is too low do not support Html5 to call microphone, cause browser cannot The problem of calling microphone to obtain voice flow by its Html5 interface.
As shown in Fig. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the models used by the speech recognition method based on artificial intelligence. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a speech recognition method based on artificial intelligence.
The processor executes the steps of the above speech recognition method based on artificial intelligence: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, a voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser.
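The four steps above can be sketched as a single handler. This is an illustrative sketch only; `handle_recognition_request`, `open_microphone`, `recognize` and `send_to_browser` are hypothetical names, and the voice stream is assumed to arrive as an iterable of byte chunks.

```python
from typing import Callable, Iterable


def handle_recognition_request(
    open_microphone: Callable[[], Iterable[bytes]],
    recognize: Callable[[bytes], str],
    send_to_browser: Callable[[str], None],
) -> str:
    """Handle one speech recognition request sent by the browser."""
    chunks = open_microphone()           # call the microphone
    voice_stream = b"".join(chunks)      # voice stream input by the user
    text = recognize(voice_stream)       # text recognition on the voice stream
    send_to_browser(text)                # return the text information
    return text


# Hypothetical stand-ins for the microphone, recognizer and browser channel.
sent = []
text = handle_recognition_request(
    open_microphone=lambda: [b"he", b"llo"],
    recognize=lambda stream: stream.decode("utf-8").upper(),
    send_to_browser=sent.append,
)
```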
In one embodiment, the step of performing text recognition on the voice stream to obtain text information includes:
sending the voice stream to a speech recognition engine;
obtaining the text information recognized by the speech recognition engine;
or
recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in the C++ language.
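The two alternatives above (an external speech recognition engine, or an integrated SDK) can be sketched as a simple dispatch. The names `recognize_voice_stream`, `remote_engine` and `local_sdk` are hypothetical, and preferring the engine when both are configured is an assumption made for this sketch; the application only states that either path may be used.

```python
from typing import Callable, Optional

Recognizer = Callable[[bytes], str]


def recognize_voice_stream(
    voice_stream: bytes,
    remote_engine: Optional[Recognizer] = None,
    local_sdk: Optional[Recognizer] = None,
) -> str:
    """Recognize a voice stream using whichever back end is configured."""
    if remote_engine is not None:
        # Path 1: send the stream to a speech recognition engine.
        return remote_engine(voice_stream)
    if local_sdk is not None:
        # Path 2: recognize through an integrated SDK (a C++ SDK in the text).
        return local_sdk(voice_stream)
    raise RuntimeError("no recognition back end configured")


# Hypothetical back ends that only tag their output for demonstration.
via_engine = recognize_voice_stream(b"abc", remote_engine=lambda s: "engine:" + s.decode())
via_sdk = recognize_voice_stream(b"abc", local_sdk=lambda s: "sdk:" + s.decode())
```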
In one embodiment, the step of calling the microphone includes:
reading the account information of the user currently logged in to the computer operating system;
determining, according to the user account information, whether the current user has permission to call the microphone;
if the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
In one embodiment, after the step of sending the text information to the browser, the method includes:
receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
in response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
sending the first text information to a text-to-speech engine;
obtaining the voice information produced by the text-to-speech engine;
calling the audio unit according to the voice broadcast request instruction;
broadcasting the voice information through the audio unit.
In one embodiment, after the step of sending the text information to the browser, the method includes:
receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to control the starting and stopping of speech recognition;
in response to the control instruction, starting or stopping speech recognition.
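The start and stop control can be sketched as a small state holder. The class name `RecognitionController` and the string values `"start"` and `"stop"` are assumptions for illustration; the application does not define the format of the control instruction, and the ActiveX transport is not modelled here.

```python
class RecognitionController:
    """Tracks whether speech recognition is currently running."""

    def __init__(self) -> None:
        self.running = False

    def handle(self, instruction: str) -> bool:
        """Apply one control instruction and return the new running state."""
        if instruction == "start":
            self.running = True
        elif instruction == "stop":
            self.running = False
        else:
            raise ValueError(f"unknown control instruction: {instruction!r}")
        return self.running


controller = RecognitionController()
after_start = controller.handle("start")
after_stop = controller.handle("stop")
```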
In one embodiment, the step of calling the microphone includes:
identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
if the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
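The operating system check above can be sketched with the Python standard library, where `platform.system()` returns `"Windows"` on Windows. The back-end labels `"windows-native"` and `"unsupported"` are hypothetical, and the behaviour on other systems is an assumption, since the application only describes the Windows case.

```python
import platform
from typing import Optional


def is_windows(system_name: Optional[str] = None) -> bool:
    """Return True when the current (or given) operating system is Windows."""
    name = platform.system() if system_name is None else system_name
    return name == "Windows"


def choose_microphone_backend(system_name: Optional[str] = None) -> str:
    """Pick how the microphone would be called for this operating system."""
    if is_windows(system_name):
        return "windows-native"   # call the microphone via a native interface
    return "unsupported"          # assumption: other systems are not handled


backend_on_windows = choose_microphone_backend("Windows")
backend_on_linux = choose_microphone_backend("Linux")
```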
In one embodiment, the above speech recognition method takes the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.
Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution is applied.
In the computer device of the embodiment of the present application, the voice plug-in calls the microphone according to the request instruction of the browser, obtains a voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is an IE browser or its version is too old to support calling the microphone through HTML5.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a speech recognition method based on artificial intelligence, specifically: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, a voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser.
In one embodiment, the step of performing text recognition on the voice stream to obtain text information includes:
sending the voice stream to a speech recognition engine;
obtaining the text information recognized by the speech recognition engine;
or
recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in the C++ language.
In one embodiment, the step of calling the microphone includes:
reading the account information of the user currently logged in to the computer operating system;
determining, according to the user account information, whether the current user has permission to call the microphone;
if the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
In one embodiment, after the step of sending the text information to the browser, the method includes:
receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
in response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
sending the first text information to a text-to-speech engine;
obtaining the voice information produced by the text-to-speech engine;
calling the audio unit according to the voice broadcast request instruction;
broadcasting the voice information through the audio unit.
In one embodiment, after the step of sending the text information to the browser, the method includes:
receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to control the starting and stopping of speech recognition;
in response to the control instruction, starting or stopping speech recognition.
In one embodiment, the step of calling the microphone includes:
identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
if the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
In one embodiment, the above speech recognition method takes the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.
In the storage medium of the embodiment of the present application, the voice plug-in calls the microphone according to the request instruction of the browser, obtains a voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface. This is intended to solve the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is an IE browser or its version is too old to support calling the microphone through HTML5.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or another medium used in the present application and its embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principle of the present application shall fall within its scope of protection.

Claims (10)

1. A speech recognition method based on artificial intelligence, characterized in that the method comprises:
receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone;
in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, a voice stream input by the user;
performing text recognition on the voice stream to obtain text information;
sending the text information to the browser.
2. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of performing text recognition on the voice stream to obtain text information comprises:
sending the voice stream to a speech recognition engine;
obtaining the text information recognized by the speech recognition engine;
or
recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in the C++ language.
3. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of calling the microphone comprises:
reading the account information of the user currently logged in to the computer operating system;
determining, according to the user account information, whether the current user has permission to call the microphone;
if the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
4. The speech recognition method based on artificial intelligence according to claim 1, characterized in that, after the step of sending the text information to the browser, the method comprises:
receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
in response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
sending the first text information to a text-to-speech engine;
obtaining the voice information produced by the text-to-speech engine;
calling the audio unit according to the voice broadcast request instruction;
broadcasting the voice information through the audio unit.
5. The speech recognition method based on artificial intelligence according to claim 1, characterized in that, after the step of sending the text information to the browser, the method comprises:
receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to control the starting and stopping of speech recognition;
in response to the control instruction, starting or stopping speech recognition.
6. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of calling the microphone comprises:
identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
if the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
7. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the speech recognition method takes the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.
8. A speech recognition device based on artificial intelligence, characterized in that the device comprises:
a receiving module, configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request a call to the microphone;
a calling module, configured to call the microphone in response to the speech recognition request instruction;
an acquisition module, configured to obtain, through the microphone, a voice stream input by the user;
an obtaining module, configured to perform text recognition on the voice stream to obtain text information;
a sending module, configured to send the text information to the browser.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201811488395.9A 2018-12-06 2018-12-06 Audio recognition method, device based on artificial intelligence, computer equipment Pending CN109994110A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811488395.9A CN109994110A (en) 2018-12-06 2018-12-06 Audio recognition method, device based on artificial intelligence, computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811488395.9A CN109994110A (en) 2018-12-06 2018-12-06 Audio recognition method, device based on artificial intelligence, computer equipment

Publications (1)

Publication Number Publication Date
CN109994110A true CN109994110A (en) 2019-07-09

Family

ID=67128688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811488395.9A Pending CN109994110A (en) 2018-12-06 2018-12-06 Audio recognition method, device based on artificial intelligence, computer equipment

Country Status (1)

Country Link
CN (1) CN109994110A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700779A (en) * 2020-12-29 2021-04-23 南方电网深圳数字电网研究院有限公司 Voice interaction method, system, browser and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102520792A (en) * 2011-11-30 2012-06-27 江苏奇异点网络有限公司 Voice-type interaction method for network browser
CN102968992A (en) * 2012-11-26 2013-03-13 北京奇虎科技有限公司 Voice identification processing method for internet explorer and internet explorer
CN102981738A (en) * 2012-10-31 2013-03-20 北京百度网讯科技有限公司 Method and system and browser for carrying out interaction with webpage through microphone
JP2016102899A (en) * 2014-11-28 2016-06-02 日本電信電話株式会社 Voice recognition device, voice recognition method, and voice recognition program
CN106373574A (en) * 2016-08-31 2017-02-01 乐视控股(北京)有限公司 Speech recognition processing method and device
CN107943405A (en) * 2016-10-13 2018-04-20 广州市动景计算机科技有限公司 Sound broadcasting device, method, browser and user terminal



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination