CN109994110A - Speech recognition method, device, and computer equipment based on artificial intelligence - Google Patents
- Publication number
- CN109994110A (application number CN201811488395.9A)
- Authority
- CN
- China
- Prior art keywords
- browser
- speech recognition
- voice
- microphone
- text information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
- G06F9/4451—User profiles; Roaming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Acoustics & Sound (AREA)
- Human Computer Interaction (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Telephonic Communication Services (AREA)
Abstract
This application relates to the field of artificial intelligence, and in particular to a speech recognition method, device, and computer equipment based on artificial intelligence. The method includes: receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that the microphone be called; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser. The microphone is thus called to obtain the voice stream without going through the browser's Html5 interface. The application aims to solve the problem that, when an existing browser needs to perform speech recognition, it cannot call the microphone through its Html5 interface to obtain the voice stream because the browser is IE or its version is too old to support calling the microphone from Html5.
Description
Technical field
This application relates to the field of artificial intelligence, and in particular to a speech recognition method, device, and computer equipment based on artificial intelligence.
Background
At present, a common way for a browser on a computer to perform speech recognition is to call the microphone through the browser's Html5 interface to obtain a voice stream, for example the Html5 interface of iFlytek speech recognition. However, calling the microphone through the browser's Html5 interface causes problems. For example, the browser used by most in-house employee office systems is IE, and employees need to perform speech recognition inside the office system, but IE itself does not support calling the microphone, so this Html5 interface cannot provide speech recognition in IE. Some company intranet computers can use other browsers, but their versions tend to be conservative (for example, an outdated version of Chrome), and they do not support calling the microphone from Html5 either. Consequently, when the microphone is to be called through the browser's Html5 interface to obtain a voice stream, the browser cannot do so if it is IE or if its version is too old to support calling the microphone from Html5.
Summary
In view of the shortcomings of the prior art, this application proposes a speech recognition method, device, computer equipment, and storage medium based on artificial intelligence, intended to solve the problem that, when an existing browser needs to perform speech recognition, it cannot call the microphone through its Html5 interface to obtain a voice stream because the browser is IE or its version is too old to support calling the microphone from Html5.
The technical solution proposed by this application is as follows:
A speech recognition method based on artificial intelligence, the method comprising:
Receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that the microphone be called;
In response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user;
Performing text recognition on the voice stream to obtain text information;
Sending the text information to the browser.
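The four steps above can be sketched as a single handler. This is a minimal illustration under stated assumptions, not the patent's implementation: the callable types `ReadStream`, `Recognizer`, and `BrowserSink`, the `SpeechRecognitionRequest` class, and the function `handle_speech_recognition` are hypothetical names introduced here, and the microphone, recognizer, and browser channel are injected so the flow can be exercised without real hardware.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical collaborator types; the patent does not specify these interfaces.
ReadStream = Callable[[], Iterable[bytes]]   # an opened microphone: yields voice-stream chunks
Recognizer = Callable[[bytes], str]          # voice stream -> text information
BrowserSink = Callable[[str], None]          # delivers text information back to the browser


@dataclass
class SpeechRecognitionRequest:
    """Request instruction from the browser asking the plug-in to call the microphone."""
    requested_by: str = "user"  # entered by the user, or "browser" if triggered automatically


def handle_speech_recognition(
    request: SpeechRecognitionRequest,
    open_microphone: Callable[[], ReadStream],
    recognize: Recognizer,
    send_to_browser: BrowserSink,
) -> str:
    # Step 1 (receiving) is represented by `request` having been delivered to this function.
    # Step 2: in response to the request, call the microphone and collect the voice stream.
    read_stream = open_microphone()
    voice_stream = b"".join(read_stream())
    # Step 3: perform text recognition on the voice stream.
    text = recognize(voice_stream)
    # Step 4: send the text information to the browser.
    send_to_browser(text)
    return text
```

For example, passing `lambda: (lambda: [b"hello ", b"world"])` as the microphone, a toy recognizer that decodes the bytes as UTF-8, and `sent.append` as the browser channel returns `"hello world"` and leaves that string in `sent`. Injecting the collaborators keeps the request-handling order separate from any particular capture API or recognition engine.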
Further, the step of performing text recognition on the voice stream to obtain text information comprises:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine;
Or
Recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in C++.
Further, the step of calling the microphone comprises:
Reading the account information of the user currently logged in to the computer's operating system;
Determining, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
Further, after the step of sending the text information to the browser, the method comprises:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request that first text information be broadcast by voice;
In response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
Sending the first text information to a text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling an audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
Further, after the step of sending the text information to the browser, the method comprises:
Receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to start or stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
Further, the step of calling the microphone comprises:
Identifying, according to the speech recognition request instruction, whether the current operating system is Windows;
If the current operating system is Windows, calling the microphone through the Windows native interface.
Further, the speech recognition method is implemented in the form of a voice plug-in, which is installed silently on the computer by pushing an installation package.
This application also provides a speech recognition device based on artificial intelligence, the device comprising:
A receiving module, configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that the microphone be called;
A calling module, configured to call the microphone in response to the speech recognition request instruction;
An acquisition module, configured to acquire, through the microphone, the voice stream input by the user;
An obtaining module, configured to perform text recognition on the voice stream to obtain text information;
A sending module, configured to send the text information to the browser.
This application also provides a computer equipment including a memory and a processor, where the memory stores a computer program and the processor implements the steps of any of the methods described above when executing the computer program.
This application also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the methods described above.
According to the above technical solution, the beneficial effects of this application are as follows: according to the request instruction from the browser, the voice plug-in calls the microphone, obtains the voice stream through it, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The microphone is thus called to obtain the voice stream without going through the browser's Html5 interface, which is intended to solve the problem that an existing browser that needs speech recognition cannot call the microphone through its Html5 interface to obtain a voice stream because it is IE or its version is too old to support calling the microphone from Html5.
Description of the Drawings
Fig. 1 is a flowchart of the speech recognition method based on artificial intelligence provided by an embodiment of this application;
Fig. 2 is a functional module diagram of the speech recognition device based on artificial intelligence provided by an embodiment of this application;
Fig. 3 is a schematic structural block diagram of the computer equipment provided by an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
As shown in Fig. 1, an embodiment of this application provides a speech recognition method based on artificial intelligence, the method comprising the following steps:
Step S101: receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that the microphone be called.
When the browser needs to perform speech recognition, it sends a speech recognition request instruction through Html5, and the speech recognition request instruction sent by the browser is received.
In this embodiment, the speech recognition request instruction may be entered by the user in the browser, or may be triggered and sent by the browser itself: when the browser recognizes that speech recognition is currently needed, it sends the speech recognition request instruction.
Step S102: in response to the speech recognition request instruction, call the microphone and obtain, through the microphone, the voice stream input by the user.
After the request instruction from the browser is received, the microphone is called in response to and according to the speech recognition request instruction. Instead of calling the microphone directly through the browser's Html5 language, a voice plug-in is used: Html5 controls a local program (the plug-in), and the local program calls the microphone. After the microphone is successfully called, the user speaks into it, and the voice stream is obtained through the microphone.
Specifically, the step of calling the microphone comprises:
Identifying, according to the speech recognition request instruction, whether the current operating system is Windows;
If the current operating system is Windows, calling the microphone through the Windows native interface.
Because most operating systems used in offices are Windows, the microphone is called through the Windows native interface after the speech recognition request instruction is received.
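The operating-system check can be sketched with the standard library, which reports `"Windows"` from `platform.system()` on Windows. This is a hedged sketch: the function `select_microphone_backend` and the exception class are hypothetical, and the patent does not name the Windows native interface it uses, so the actual capture call is left as a commented placeholder (waveIn and WASAPI are given only as examples of real Windows audio APIs a plug-in might use).

```python
import platform
from typing import Optional


class UnsupportedPlatformError(RuntimeError):
    """Raised when the native microphone path is requested outside Windows."""


def select_microphone_backend(system_name: Optional[str] = None) -> str:
    # Identify the current operating system; platform.system() returns "Windows" on Windows.
    name = system_name if system_name is not None else platform.system()
    if name != "Windows":
        raise UnsupportedPlatformError(f"Windows native interface unavailable on {name!r}")
    # Placeholder: a real plug-in would open a capture device here through a Windows
    # audio API (for example waveIn or WASAPI); this sketch only reports the chosen path.
    return "windows-native"
```

The optional `system_name` parameter exists only so the decision can be tested on any machine; in normal use it is omitted and the running system is queried.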
In this embodiment, the step of calling the microphone comprises:
Reading the account information of the user currently logged in to the computer's operating system;
Determining, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
After the speech recognition request instruction is received, the plug-in determines whether the current user has permission to call the microphone by reading the account information of the user currently logged in to the operating system. If the current user has permission, the speech recognition request instruction is allowed and the microphone is called accordingly. If not, the speech recognition request instruction is refused and the microphone is not called.
In this embodiment, the permission of each user account to call the microphone is configured in advance; once configured, whether a user has permission to call the microphone can be determined by reading that user's account information.
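A preset per-account permission table and the allow/refuse decision might look like the sketch below. Assumptions: the permission store is an in-memory dictionary (the patent does not say where the configuration lives), the account names and the `may_call` / `handle_device_request` helpers are hypothetical, and `getpass.getuser()` stands in for "reading the account currently logged in to the operating system". The same check applies unchanged to the audio unit used for voice broadcast.

```python
import getpass
from typing import Dict, FrozenSet, Optional

# Hypothetical preset configuration: the devices each operating-system account may call.
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "alice": frozenset({"microphone", "audio_unit"}),
    "bob": frozenset({"audio_unit"}),
}


def logged_in_account() -> str:
    # Read the account currently logged in to the operating system.
    return getpass.getuser()


def may_call(device: str, account: Optional[str] = None) -> bool:
    account = account if account is not None else logged_in_account()
    # Accounts missing from the configuration get no permissions.
    return device in PERMISSIONS.get(account, frozenset())


def handle_device_request(device: str, account: Optional[str] = None) -> str:
    # Allow the request instruction only if the account holds the permission; otherwise refuse it.
    return f"calling {device}" if may_call(device, account) else "refused"
```

Defaulting unknown accounts to an empty permission set makes the check deny-by-default, which matches the text's rule that a request without permission is refused.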
Step S103: perform text recognition on the voice stream to obtain text information.
After the voice stream is obtained through the microphone, it is recognized and converted into text information, yielding the text information for the voice stream.
In this embodiment, step S103 comprises:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine.
By interacting with the speech recognition engine, the voice stream is sent to the engine, which recognizes it, produces text information, and feeds that text information back; the text information recognized by the speech recognition engine is thereby obtained.
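The send-then-collect interaction with a speech recognition engine can be sketched as a chunked session. The `RecognitionEngine` protocol with `feed` and `finish` methods is a hypothetical interface invented for illustration (the patent names no engine or API), and `EchoEngine` is a toy stand-in that "recognizes" audio by decoding it as UTF-8 so the flow can be run without a real engine.

```python
from typing import Iterable, List, Protocol


class RecognitionEngine(Protocol):
    """Hypothetical engine session: accepts audio chunks, then returns the recognized text."""

    def feed(self, chunk: bytes) -> None: ...

    def finish(self) -> str: ...


def recognize_via_engine(engine: RecognitionEngine, voice_stream: Iterable[bytes]) -> str:
    # Send the voice stream to the engine chunk by chunk.
    for chunk in voice_stream:
        engine.feed(chunk)
    # Obtain the text information the engine feeds back after recognition.
    return engine.finish()


class EchoEngine:
    """Toy engine for illustration only: 'recognizes' audio by decoding it as UTF-8."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def feed(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def finish(self) -> str:
        return b"".join(self._chunks).decode("utf-8")
```

Streaming chunks rather than one buffer lets a real engine start recognizing while the user is still speaking; whether the engine used in the patent works this way is not stated.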
In some embodiments, step S103 comprises:
Recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in C++.
The C++ SDK is a software development kit written in C++. By integrating the speech recognition C++ SDK using the C# language, the plug-in itself gains speech recognition capability: once the voice stream is obtained, it is recognized directly and the corresponding text information is obtained.
Step S105: send the text information to the browser.
After the text information is obtained, it is sent to the browser, which displays it on its web page, thereby realizing speech recognition for the browser.
In this embodiment, the voice plug-in is installed silently on the computer by pushing an installation package. This installation method is simple and convenient and does not disturb the user's normal work; once installation is complete, the microphone can readily be called.
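The patent does not say what format the pushed installation package uses. Assuming, for illustration only, that it is a Windows Installer (MSI) package, a silent install can be expressed with standard `msiexec` switches: `/i` installs, `/qn` suppresses all UI, `/norestart` avoids a forced reboot, and `/l*v` writes a verbose log. The package file name and the helper functions below are hypothetical.

```python
import subprocess
from typing import List


def silent_install_command(package_path: str, log_path: str = "") -> List[str]:
    # /i installs the package, /qn shows no user interface, /norestart suppresses reboots.
    cmd = ["msiexec", "/i", package_path, "/qn", "/norestart"]
    if log_path:
        cmd += ["/l*v", log_path]  # verbose log, helpful when the package is pushed to many PCs
    return cmd


def push_install(package_path: str, dry_run: bool = True) -> List[str]:
    cmd = silent_install_command(package_path)
    if not dry_run:
        # Only meaningful on Windows, where msiexec is available.
        subprocess.run(cmd, check=True)
    return cmd
```

With `dry_run=True` (the default) the function only returns the command, for example `["msiexec", "/i", "voice_plugin.msi", "/qn", "/norestart"]`, so a distribution tool could log or forward it before execution.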
In this embodiment, after step S105, the method comprises:
Receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to start or stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
The user sends a control instruction in the browser through an ActiveX control. The control instruction is used to start or stop speech recognition and is either a start control instruction or a stop control instruction. After the control instruction sent by the browser through the ActiveX control is received, speech recognition is started or stopped accordingly: if it is a start control instruction, speech recognition is started; if it is a stop control instruction, speech recognition is stopped.
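On the plug-in side, start and stop control instructions reduce to a small state machine. This sketch models only that side; the ActiveX control that sends the instruction is out of scope, and the `Control` enum, the `RecognitionController` class, and the string values `"start"` / `"stop"` are hypothetical. Ignoring a repeated start or stop is a design choice of this sketch, not something the patent specifies.

```python
from enum import Enum
from typing import List


class Control(Enum):
    START = "start"
    STOP = "stop"


class RecognitionController:
    """Applies start/stop control instructions to the speech recognition session."""

    def __init__(self) -> None:
        self.running = False
        self.events: List[str] = []

    def handle(self, instruction: str) -> bool:
        control = Control(instruction)  # ValueError for anything other than "start"/"stop"
        if control is Control.START and not self.running:
            self.running = True
            self.events.append("recognition started")
        elif control is Control.STOP and self.running:
            self.running = False
            self.events.append("recognition stopped")
        # Repeated start or stop instructions are ignored (a design choice of this sketch).
        return self.running
```

Sending `"start"`, `"start"`, `"stop"` leaves `events` as `["recognition started", "recognition stopped"]`, since the second start has no effect.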
In this embodiment, after step S105, the method comprises:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request that first text information be broadcast by voice;
In response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
Sending the first text information to a text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling an audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
When the browser needs to perform a voice broadcast, it sends a voice broadcast request instruction, which is received. In response, the first text information to be broadcast is obtained from the browser. The first text information must be converted into voice information before it can be broadcast, so the plug-in interacts with a text-to-speech engine: the first text information is sent to the engine, which converts it into voice information and feeds the voice information back. After the voice information produced by the text-to-speech engine is obtained, the audio unit is called according to the voice broadcast request instruction and the voice information is broadcast through it, thereby realizing voice broadcast for the browser.
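The voice broadcast flow described above can be sketched with injected callables for the three collaborators. The type aliases and `handle_voice_broadcast` are hypothetical names; the patent does not specify a text-to-speech engine or audio API, so both are passed in as plain functions. Returning early on empty text is an assumption of this sketch.

```python
from typing import Callable

TextSource = Callable[[], str]         # fetches the first text information from the browser
TextToSpeech = Callable[[str], bytes]  # text-to-speech engine: text -> voice information (audio)
AudioUnit = Callable[[bytes], None]    # plays audio through the audio unit


def handle_voice_broadcast(
    get_text_from_browser: TextSource,
    synthesize: TextToSpeech,
    play: AudioUnit,
) -> int:
    text = get_text_from_browser()  # obtain the text to be broadcast
    if not text:
        return 0                    # nothing to broadcast (a choice of this sketch)
    audio = synthesize(text)        # convert the text into voice information
    play(audio)                     # broadcast it through the audio unit
    return len(audio)
```

The return value (number of audio bytes played) is only a convenience for callers and tests; a real plug-in would more likely report success or failure back to the browser.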
In this embodiment, the step of calling the audio unit according to the voice broadcast request instruction comprises:
Reading the account information of the user currently logged in to the computer's operating system;
Determining, according to the user account information, whether the current user has permission to call the audio unit;
If the current user has permission to call the audio unit, calling the audio unit according to the voice broadcast request instruction.
After the voice broadcast request instruction is received, the plug-in determines whether the current user has permission to call the audio unit by reading the account information of the user currently logged in to the operating system. If the current user has permission, the voice broadcast request instruction is allowed and the audio unit is called accordingly. If not, the voice broadcast request instruction is refused and the audio unit is not called.
In conclusion request instruction of the voice plug-in unit according to browser, calls microphone, obtains voice by microphone
Stream, then voice flow is subjected to identification and obtains text information, text information is sent to browser later, to be implemented without logical
The Html5 interface of browser is crossed to call microphone to obtain voice flow, it is intended to speech recognition will be carried out by solving existing browser, by
In browser be IE browser or browser version it is too low do not support Html5 to call microphone, cause browser cannot
The problem of calling microphone to obtain voice flow by its Html5 interface.
As shown in Fig. 2, an embodiment of this application provides a speech recognition device based on artificial intelligence in the form of a voice plug-in. The device 1 includes a receiving module 11, a calling module 12, an acquisition module 13, an obtaining module 14, and a sending module 15.
Receiving module 11 is configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request that the microphone be called.
When the browser needs to perform speech recognition, it sends a speech recognition request instruction through Html5, and the speech recognition request instruction sent by the browser is received.
In this embodiment, the speech recognition request instruction may be entered by the user in the browser, or may be triggered and sent by the browser itself: when the browser recognizes that speech recognition is currently needed, it sends the speech recognition request instruction.
Calling module 12 is configured to call the microphone in response to the speech recognition request instruction.
After the request instruction from the browser is received, the microphone is called in response to and according to the speech recognition request instruction. Instead of calling the microphone directly through the browser's Html5 language, a voice plug-in is used: Html5 controls a local program (the plug-in), and the local program calls the microphone.
Specifically, calling module 12 includes:
A first sub-identification module, configured to identify, according to the speech recognition request instruction, whether the current operating system is Windows;
A first sub-calling module, configured to call the microphone through the Windows native interface if the current operating system is Windows.
Because most operating systems used in offices are Windows, the microphone is called through the Windows native interface after the speech recognition request instruction is received.
In this embodiment, calling module 12 includes:
A first reading module, configured to read the account information of the user currently logged in to the computer's operating system;
A first judgment module, configured to determine, according to the user account information, whether the current user has permission to call the microphone;
A first calling module, configured to call the microphone according to the speech recognition request instruction if the current user has permission to call the microphone.
After the speech recognition request instruction is received, the device determines whether the current user has permission to call the microphone by reading the account information of the user currently logged in to the operating system. If the current user has permission, the speech recognition request instruction is allowed and the microphone is called accordingly. If not, the speech recognition request instruction is refused and the microphone is not called.
In this embodiment, the permission of each user account to call the microphone is configured in advance; once configured, whether a user has permission to call the microphone can be determined by reading that user's account information.
Acquisition module 13 is configured to acquire the voice stream through the microphone.
After the microphone is successfully called, the user speaks into it, and the voice stream is obtained through the microphone.
Obtaining module 14 is configured to perform text recognition on the voice stream to obtain text information.
After the voice stream is obtained through the microphone, it is recognized and converted into text information, yielding the text information for the voice stream.
In this embodiment, obtaining module 14 includes:
A first sub-sending module, configured to send the voice stream to a speech recognition engine;
A first sub-acquisition module, configured to obtain the text information recognized by the speech recognition engine.
By interacting with the speech recognition engine, the voice stream is sent to the engine, which recognizes it, produces text information, and feeds that text information back; the text information recognized by the speech recognition engine is thereby obtained.
In some embodiments, obtaining module 14 includes:
A first obtaining module, configured to recognize the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in C++.
The C++ SDK is a software development kit written in C++. By integrating the speech recognition C++ SDK using the C# language, the plug-in itself gains speech recognition capability: once the voice stream is obtained, it is recognized directly and the corresponding text information is obtained.
Sending module 15 is configured to send the text information to the browser.
After the text information is obtained, it is sent to the browser, which displays it on its web page, thereby realizing speech recognition for the browser.
In this embodiment, the voice plug-in is installed silently on the computer by pushing an installation package. This installation method is simple and convenient and does not disturb the user's normal work; once installation is complete, the microphone can readily be called.
In this embodiment, device 1 includes:
A first receiving module, configured to receive a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to start or stop speech recognition;
A first execution module, configured to start or stop speech recognition in response to the control instruction.
The user sends a control instruction in the browser through an ActiveX control. The control instruction is used to start or stop speech recognition and is either a start control instruction or a stop control instruction. After the control instruction sent by the browser through the ActiveX control is received, speech recognition is started or stopped accordingly: if it is a start control instruction, speech recognition is started; if it is a stop control instruction, speech recognition is stopped.
In this embodiment, device 1 includes:
A second receiving module, configured to receive a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request that first text information be broadcast by voice;
A second obtaining module, configured to obtain the first text information to be broadcast from the browser in response to the voice broadcast request instruction;
A second sending module, configured to send the first text information to the text-to-speech engine;
A third obtaining module, configured to obtain the voice information produced by the text-to-speech engine;
A second calling module, configured to call an audio unit according to the voice broadcast request instruction;
A first broadcasting module, configured to broadcast the voice information through the audio unit.
When the browser needs to perform a voice broadcast, it sends a voice broadcast request instruction, which is received. In response, the first text information to be broadcast is obtained from the browser. The first text information must be converted into voice information before it can be broadcast, so the device interacts with a text-to-speech engine: the first text information is sent to the engine, which converts it into voice information and feeds the voice information back. After the voice information produced by the text-to-speech engine is obtained, the audio unit is called according to the voice broadcast request instruction and the voice information is broadcast through it, thereby realizing voice broadcast for the browser.
In this embodiment, the second calling module includes:
First reading submodule, configured to read the account information of the user currently logged in to the computer operating system;
First judging submodule, configured to judge, according to the user account information, whether the current user has permission to call the audio unit;
First calling submodule, configured to call the audio unit according to the voice broadcast request instruction if the current user has permission to call the audio unit.
After the voice broadcast request instruction is received, the plug-in must judge whether the current user has permission to call the audio unit. It reads the account information of the user currently logged in to the computer operating system and judges, according to that account information, whether the current user has this permission. If the current user has permission to call the audio unit, the voice broadcast request instruction is allowed and the audio unit is called according to it. If the current user does not have permission, the voice broadcast request instruction is rejected and the audio unit is not called.
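A permission gate of the kind described above might look like the following Python sketch. The in-memory `permissions` table mapping account names to permitted devices, and the `PermissionDenied` exception, are assumptions: the description says only that the logged-in account is read and checked. In a real plug-in the account name might come from `getpass.getuser()` and the check from an operating-system query.

```python
from typing import Callable, Dict, Set, TypeVar

T = TypeVar("T")


class PermissionDenied(Exception):
    """The logged-in user may not call the requested device (hypothetical)."""


def call_if_permitted(
    current_user: str,
    permissions: Dict[str, Set[str]],
    device: str,
    call_device: Callable[[], T],
) -> T:
    """Call `device` only if `current_user` is allowed to.

    `permissions` (assumed) maps an account name to the devices it may call.
    In practice `current_user` might come from getpass.getuser(); the patent
    does not specify where either piece of data comes from.
    """
    if device not in permissions.get(current_user, set()):
        # Reject the request instruction; the device is never called.
        raise PermissionDenied(f"{current_user!r} may not call {device!r}")
    return call_device()
```

The same gate also fits the microphone check described later by passing `device="microphone"`.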
In conclusion request instruction of the voice plug-in unit according to browser, calls microphone, obtains voice by microphone
Stream, then voice flow is subjected to identification and obtains text information, text information is sent to browser later, to be implemented without logical
The Html5 interface of browser is crossed to call microphone to obtain voice flow, it is intended to speech recognition will be carried out by solving existing browser, by
In browser be IE browser or browser version it is too low do not support Html5 to call microphone, cause browser cannot
The problem of calling microphone to obtain voice flow by its Html5 interface.
As shown in Fig. 3, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the models used by the artificial-intelligence-based speech recognition method. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements an artificial-intelligence-based speech recognition method.
The processor executes the steps of the above artificial-intelligence-based speech recognition method: receiving a speech recognition request instruction sent by the user through the browser, the speech recognition request instruction being used to request calling of the microphone; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser.
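The four steps above can be sketched as a single handler. This is a minimal Python sketch, not the patented code: `open_microphone`, `recognize`, and `send_to_browser` are hypothetical stand-ins for the microphone, the recognizer, and the channel back to the browser, none of which the description names.

```python
from typing import Callable, Iterable, List


def handle_speech_recognition_request(
    open_microphone: Callable[[], Iterable[bytes]],
    recognize: Callable[[bytes], str],
    send_to_browser: Callable[[str], None],
) -> str:
    """Run the claimed steps once a speech recognition request has arrived.

    All three parameters are hypothetical: `open_microphone` yields audio
    chunks of the user's voice stream, `recognize` performs text recognition,
    and `send_to_browser` hands the text back to the page.
    """
    # Call the microphone and collect the voice stream input by the user.
    voice_stream = b"".join(open_microphone())
    # Perform text recognition on the voice stream to obtain text information.
    text = recognize(voice_stream)
    # Send the text information to the browser.
    send_to_browser(text)
    return text


# Usage with fake audio chunks and a toy "recognizer".
delivered: List[str] = []
result = handle_speech_recognition_request(
    lambda: [b"hello ", b"world"],
    lambda stream: stream.decode("ascii"),
    delivered.append,
)
```

Buffering the whole stream before recognition keeps the sketch short; a real plug-in might instead feed chunks to the recognizer incrementally.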
In one embodiment, the step of performing text recognition on the voice stream to obtain text information includes:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine;
Or
Recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in the C++ language.
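The two alternatives above (a separate speech recognition engine, or an integrated C++ SDK) can share one interface. This Python sketch assumes a `Recognizer` protocol and two hypothetical backends; `send` stands in for whatever transport reaches the engine and `sdk_recognize` for a binding to the SDK, neither of which the description names.

```python
from typing import Callable, Protocol


class Recognizer(Protocol):
    """Anything that turns a voice stream into text information."""

    def recognize(self, voice_stream: bytes) -> str: ...


class EngineRecognizer:
    """Option 1: send the voice stream to a speech recognition engine.

    `send` is a hypothetical transport (for example an HTTP or RPC call) that
    returns the engine's text result; the patent does not name the engine.
    """

    def __init__(self, send: Callable[[bytes], str]) -> None:
        self._send = send

    def recognize(self, voice_stream: bytes) -> str:
        return self._send(voice_stream)


class SdkRecognizer:
    """Option 2: recognize in-process through an integrated C++ SDK.

    `sdk_recognize` stands in for a binding to the SDK (for example one built
    with ctypes); its name and signature are assumptions.
    """

    def __init__(self, sdk_recognize: Callable[[bytes], str]) -> None:
        self._sdk_recognize = sdk_recognize

    def recognize(self, voice_stream: bytes) -> str:
        return self._sdk_recognize(voice_stream)


def pick_recognizer(
    use_local_sdk: bool,
    send: Callable[[bytes], str],
    sdk_recognize: Callable[[bytes], str],
) -> Recognizer:
    # Both branches implement the same step: voice stream in, text out.
    if use_local_sdk:
        return SdkRecognizer(sdk_recognize)
    return EngineRecognizer(send)
```

Because callers depend only on `recognize`, the choice between the two options can be made at configuration time.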
In one embodiment, the step of calling the microphone includes:
Reading the account information of the user currently logged in to the computer operating system;
Judging, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
In one embodiment, after the step of sending the text information to the browser, the method includes:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
In response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
Sending the first text information to the text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling the audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
In one embodiment, after the step of sending the text information to the browser, the method includes:
Receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to start and stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
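Handling the start/stop control instruction could be as simple as a small state machine. In this Python sketch the instruction strings `"start"` and `"stop"` and the `on_start`/`on_stop` hooks are assumptions; the description says only that an ActiveX control in the browser sends an instruction that starts or stops recognition.

```python
from typing import Callable


class RecognitionController:
    """Start/stop switch for speech recognition (hypothetical).

    The instruction strings "start" and "stop" and the two hooks are
    assumptions; the patent only says the ActiveX control sends an
    instruction that starts or stops recognition.
    """

    def __init__(self, on_start: Callable[[], None], on_stop: Callable[[], None]) -> None:
        self._on_start = on_start
        self._on_stop = on_stop
        self.running = False

    def handle(self, instruction: str) -> None:
        if instruction == "start":
            if not self.running:  # ignore a repeated start
                self._on_start()
                self.running = True
        elif instruction == "stop":
            if self.running:  # ignore a stop while already idle
                self._on_stop()
                self.running = False
        else:
            raise ValueError(f"unknown control instruction: {instruction!r}")
```

Ignoring redundant instructions means a double click on a start button does not open the microphone twice.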
In one embodiment, the step of calling the microphone includes:
Identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
If the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
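The operating-system check above can be sketched with the standard `platform.system()` call, which returns `"Windows"` on Windows. The `open_windows_native` callable is a hypothetical wrapper around whichever native capture interface the plug-in uses (the description does not name one), and raising an error on other systems is an assumption, since the embodiment only defines the Windows case.

```python
import platform
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_microphone(
    open_windows_native: Callable[[], T],
    system_name: Optional[str] = None,
) -> T:
    """Call the microphone through a Windows native interface, Windows only.

    `open_windows_native` is a hypothetical wrapper around the native capture
    API; `system_name` lets callers (and tests) override platform detection.
    """
    # platform.system() returns e.g. "Windows", "Linux", or "Darwin".
    name = platform.system() if system_name is None else system_name
    if name != "Windows":
        # Assumed behaviour: the embodiment defines access only for Windows.
        raise RuntimeError(f"unsupported operating system: {name!r}")
    return open_windows_native()
```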
In one embodiment, the speech recognition method takes the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.
Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which that solution is applied.
In the computer device of this embodiment, the voice plug-in calls the microphone according to the request instruction of the browser, obtains the voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface, which addresses the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is Internet Explorer (IE) or a version too old to support calling the microphone via HTML5.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements an artificial-intelligence-based speech recognition method, specifically: receiving a speech recognition request instruction sent by the user through the browser, the speech recognition request instruction being used to request calling of the microphone; in response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, the voice stream input by the user; performing text recognition on the voice stream to obtain text information; and sending the text information to the browser.
In one embodiment, the step of performing text recognition on the voice stream to obtain text information includes:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine;
Or
Recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, where the C++ SDK is a software development kit written in the C++ language.
In one embodiment, the step of calling the microphone includes:
Reading the account information of the user currently logged in to the computer operating system;
Judging, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
In one embodiment, after the step of sending the text information to the browser, the method includes:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
In response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
Sending the first text information to the text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling the audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
In one embodiment, after the step of sending the text information to the browser, the method includes:
Receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to start and stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
In one embodiment, the step of calling the microphone includes:
Identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
If the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
In one embodiment, the speech recognition method takes the form of a voice plug-in, and the voice plug-in is installed silently on the computer by pushing an installation package.
With the storage medium of this embodiment, the voice plug-in calls the microphone according to the request instruction of the browser, obtains the voice stream through the microphone, recognizes the voice stream to obtain text information, and then sends the text information to the browser. The voice stream is thus obtained without calling the microphone through the browser's HTML5 interface, which addresses the problem that an existing browser that needs to perform speech recognition cannot call the microphone through its HTML5 interface to obtain a voice stream, because the browser is Internet Explorer (IE) or a version too old to support calling the microphone via HTML5.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The foregoing describes merely preferred embodiments of the present application and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within its scope of protection.
Claims (10)
1. A speech recognition method based on artificial intelligence, characterized in that the method comprises:
Receiving a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request calling of a microphone;
In response to the speech recognition request instruction, calling the microphone and obtaining, through the microphone, a voice stream input by the user;
Performing text recognition on the voice stream to obtain text information;
Sending the text information to the browser.
2. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of performing text recognition on the voice stream to obtain text information comprises:
Sending the voice stream to a speech recognition engine;
Obtaining the text information recognized by the speech recognition engine;
Or
Recognizing the voice stream through an integrated speech recognition C++ SDK to obtain the text information, wherein the C++ SDK is a software development kit written in the C++ language.
3. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of calling the microphone comprises:
Reading the account information of the user currently logged in to the computer operating system;
Judging, according to the user account information, whether the current user has permission to call the microphone;
If the current user has permission to call the microphone, calling the microphone according to the speech recognition request instruction.
4. The speech recognition method based on artificial intelligence according to claim 1, characterized in that, after the step of sending the text information to the browser, the method comprises:
Receiving a voice broadcast request instruction sent by the user through the browser, the voice broadcast request instruction being used to request voice broadcast of first text information;
In response to the voice broadcast request instruction, obtaining the first text information to be broadcast from the browser;
Sending the first text information to a text-to-speech engine;
Obtaining the voice information produced by the text-to-speech engine;
Calling an audio unit according to the voice broadcast request instruction;
Broadcasting the voice information through the audio unit.
5. The speech recognition method based on artificial intelligence according to claim 1, characterized in that, after the step of sending the text information to the browser, the method comprises:
Receiving a control instruction sent by the user in the browser through an ActiveX control, the control instruction being used to start and stop speech recognition;
In response to the control instruction, starting or stopping speech recognition.
6. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the step of calling the microphone comprises:
Identifying, according to the speech recognition request instruction, whether the current operating system is a Windows operating system;
If the current operating system is a Windows operating system, calling the microphone through a Windows native interface.
7. The speech recognition method based on artificial intelligence according to claim 1, characterized in that the speech recognition method takes the form of a voice plug-in, and the voice plug-in is installed silently on a computer by pushing an installation package.
8. A speech recognition apparatus based on artificial intelligence, characterized in that the apparatus comprises:
A receiving module, configured to receive a speech recognition request instruction sent by a user through a browser, the speech recognition request instruction being used to request calling of a microphone;
A calling module, configured to call the microphone in response to the speech recognition request instruction;
An acquisition module, configured to obtain, through the microphone, a voice stream input by the user;
An obtaining module, configured to perform text recognition on the voice stream to obtain text information;
A sending module, configured to send the text information to the browser.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811488395.9A CN109994110A (en) | 2018-12-06 | 2018-12-06 | Audio recognition method, device based on artificial intelligence, computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109994110A (en) | 2019-07-09 |
Family ID: 67128688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811488395.9A Pending CN109994110A (en) | 2018-12-06 | 2018-12-06 | Audio recognition method, device based on artificial intelligence, computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109994110A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112700779A (en) * | 2020-12-29 | 2021-04-23 | 南方电网深圳数字电网研究院有限公司 | Voice interaction method, system, browser and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102520792A (en) * | 2011-11-30 | 2012-06-27 | 江苏奇异点网络有限公司 | Voice-type interaction method for network browser |
CN102968992A (en) * | 2012-11-26 | 2013-03-13 | 北京奇虎科技有限公司 | Voice identification processing method for internet explorer and internet explorer |
CN102981738A (en) * | 2012-10-31 | 2013-03-20 | 北京百度网讯科技有限公司 | Method and system and browser for carrying out interaction with webpage through microphone |
JP2016102899A (en) * | 2014-11-28 | 2016-06-02 | 日本電信電話株式会社 | Voice recognition device, voice recognition method, and voice recognition program |
CN106373574A (en) * | 2016-08-31 | 2017-02-01 | 乐视控股(北京)有限公司 | Speech recognition processing method and device |
CN107943405A (en) * | 2016-10-13 | 2018-04-20 | 广州市动景计算机科技有限公司 | Sound broadcasting device, method, browser and user terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022116825A1 (en) | Artificial intelligence-based audio processing method and apparatus, electronic device, computer readable storage medium, and computer program product | |
US10388277B1 (en) | Allocation of local and remote resources for speech processing | |
CN109636317B (en) | Service control method, device, system and storage medium | |
JP2020533628A (en) | Selective voice-activated memory for voice capture devices | |
US20150149560A1 (en) | System and method for relaying messages | |
CN112201222B (en) | Voice interaction method, device, equipment and storage medium based on voice call | |
CN107018228B (en) | Voice control system, voice processing method and terminal equipment | |
CN109995861B (en) | Relay communication method and system for vehicle-mounted system application and vehicle-mounted peripheral device | |
CN111599358A (en) | Voice interaction method and electronic equipment | |
CN108650419A (en) | Telephone interpretation system based on smart mobile phone | |
CN106373566A (en) | Data transmission control method and device | |
DE102012219020A1 (en) | ARCHITECTURE FOR MOBILE TUNING PLATFORM | |
CN109994110A (en) | Audio recognition method, device based on artificial intelligence, computer equipment | |
CN113159483A (en) | Task scheduling method and device based on RPA and AI, robot and medium | |
JP6689953B2 (en) | Interpreter service system, interpreter service method, and interpreter service program | |
CN112017663A (en) | Voice generalization method and device and computer storage medium | |
CN111814494A (en) | Language translation method and device and computer equipment | |
CN106686245B (en) | Working mode adjusting method and device | |
CN110459209A (en) | Audio recognition method, device, equipment and storage medium | |
CN103176998A (en) | Read auxiliary system based on voice recognition | |
CN113411503B (en) | Cloud mobile phone camera preview method and device, computer equipment and storage medium | |
CN116312472A (en) | Method and device for designing robot speaking group, computer equipment and storage medium | |
CN114390144A (en) | Intelligent processing method, device and control system for voice incoming call | |
CN112003991A (en) | Outbound method and related equipment | |
KR101944187B1 (en) | voice services providing method with visually impaired |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||