CN109686373A - Speech recognition method and system - Google Patents
- Publication number
- CN109686373A (application CN201910145754.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/221—Announcement of recognition results
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The present invention provides a speech recognition method and system. The method is applicable to a speech recognition system that includes a voice acquisition device, a server, a display device, and a central processing unit. In the method, the voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit. The central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result. The server feeds the speech recognition result back to the central processing unit. The central processing unit sends the speech recognition result to the display device, which displays it. In the scheme provided by the invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain the speech recognition result, and the central processing unit sends the speech recognition result to the display device for display, which improves the user experience and ease of use.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and system.
Background
With the development of science and technology, voice interaction is gradually becoming common in daily life. Voice interaction is an interaction mode based on voice input: the user obtains a feedback result simply by speaking.
A relatively common form of voice interaction at present collects voice data through a microphone and obtains a feedback result after a processor handles the voice data. However, existing voice interaction technology only plays the speech recognition result through a power amplifier. In a noisy environment the user cannot hear the speech recognition result clearly and therefore does not know the final speech recognition result. The user experience is poor and the system is inconvenient to use.
Summary of the invention
In view of this, the embodiments of the present invention provide a speech recognition method and system to solve the problems of existing voice interaction technology, such as poor user experience and inconvenient use.
To achieve the above object, the embodiments of the present invention provide the following technical solutions:
A first aspect of the embodiments of the present invention discloses a speech recognition system, the system comprising: a voice acquisition device, a server, a display device, and a central processing unit;
The voice acquisition device is connected to the central processing unit and is configured to convert collected analog voice data into digital voice data and to transmit the digital voice data to the central processing unit;
The central processing unit is connected to the server and to the display device, and is configured to transmit the digital voice data to the server, receive the speech recognition result fed back by the server, and send the speech recognition result to the display device;
The server is configured to receive the digital voice data transmitted by the central processing unit, process the digital voice data to obtain the speech recognition result, and feed the speech recognition result back to the central processing unit;
The display device is configured to receive and display the speech recognition result.
Preferably, the system further comprises an image acquisition device;
The image acquisition device is connected to the central processing unit and is configured to acquire image data and transmit the image data to the central processing unit;
Correspondingly, the central processing unit is further configured to transmit the image data to the server, receive the image processing result fed back by the server, and send the image processing result to the display device;
The server is further configured to process the image data and feed the resulting image processing result back to the central processing unit;
The display device is further configured to display the image processing result.
Preferably, the image acquisition device includes a camera.
Preferably, the system further comprises a playback device;
The central processing unit is connected to the playback device, and the central processing unit is further configured to transmit the speech recognition result to the playback device;
The playback device is configured to receive and play the speech recognition result transmitted by the central processing unit.
Preferably, the playback device includes an audio power amplifier.
Preferably, the system further comprises an LED array;
The central processing unit is connected to the LED array, and the central processing unit is further configured to identify the direction from which the voice acquisition device collects the analog voice data;
The LED array is configured to indicate the direction from which the voice acquisition device collects the analog voice data.
Preferably, the voice acquisition device includes a microphone.
Preferably, the display device includes a liquid crystal display.
A second aspect of the embodiments of the present invention discloses a speech recognition method applicable to the speech recognition system disclosed in the first aspect of the embodiments of the present invention, the speech recognition system including a voice acquisition device, a server, a display device, and a central processing unit, the method comprising:
The voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit;
The central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result;
The server feeds the speech recognition result back to the central processing unit;
The central processing unit sends the speech recognition result to the display device, so that the display device displays the speech recognition result.
Preferably, the method further comprises:
The central processing unit sends the speech recognition result to a playback device, so that the playback device plays the speech recognition result.
The embodiments of the present invention described above provide a speech recognition method and system. The method is applicable to a speech recognition system that includes a voice acquisition device, a server, a display device, and a central processing unit. In the method, the voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit. The central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result. The server feeds the speech recognition result back to the central processing unit. The central processing unit sends the speech recognition result to the display device, which displays it. In this scheme, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain the speech recognition result, and the central processing unit sends the speech recognition result to the display device for display. In a noisy environment, showing the speech recognition result on the display device improves the user experience and ease of use.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of a speech recognition method provided by an embodiment of the present invention;
Fig. 2 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;
Fig. 4 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;
Fig. 5 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;
Fig. 6 is an architecture diagram of a speech recognition system provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
In this application, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
As described in the background, existing voice interaction technology only plays the speech recognition result through a power amplifier. In a noisy environment the user cannot hear the speech recognition result clearly and therefore does not know the final speech recognition result. The user experience is poor and the system is inconvenient to use.
Therefore, the embodiments of the present invention provide a speech recognition method and system in which the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device for display, so as to solve the problems of poor user experience and inconvenient use in existing voice interaction technology.
Referring to Fig. 1, a flowchart of a speech recognition method provided by an embodiment of the present invention is shown. The method is applicable to a speech recognition system that includes a voice acquisition device, a server, a display device, and a central processing unit, and comprises the following steps:
Step S101: the voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit.
In a specific implementation of step S101, the voice acquisition device includes, but is not limited to, a microphone. The voice acquisition device converts the collected analog voice data into the digital voice data and sends the digital voice data to the central processing unit through an Inter-IC Sound (I2S) interface or a Pulse Density Modulation (PDM) interface.
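The analog-to-digital step described above can be illustrated with a minimal sketch. It assumes 16-bit signed PCM at a 16 kHz sample rate, neither of which is specified in the source; the function name `quantize_to_pcm16` and the test tone are hypothetical, and the code models only the quantization, not the I2S or PDM transport.

```python
import math
import struct

def quantize_to_pcm16(analog_samples):
    """Convert analog amplitudes in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    frames = bytearray()
    for x in analog_samples:
        clipped = max(-1.0, min(1.0, x))  # guard against out-of-range input
        frames += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(frames)

# Hypothetical input: 10 ms of a 440 Hz tone sampled at an assumed 16 kHz
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(160)]
pcm = quantize_to_pcm16(tone)  # 160 samples, 2 bytes each
```

The resulting byte string stands in for the digital voice data that the voice acquisition device would hand to the central processing unit.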
Preferably, in a specific implementation of step S101, the central processing unit identifies the direction from which the voice acquisition device collects the analog voice data, and indicates that direction through an LED array.
Preferably, in a specific implementation of step S101, an image acquisition device acquires image data and transmits the image data to the central processing unit. The image acquisition device includes, but is not limited to, a camera.
Step S102: the central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result.
In a specific implementation of step S102, the central processing unit sends the digital voice data to the server over a wireless network for processing to obtain the speech recognition result.
Preferably, the central processing unit also sends the image data to the server over the wireless network for processing to obtain an image processing result.
Step S103: the server feeds the speech recognition result back to the central processing unit.
In a specific implementation of step S103, the server feeds the speech recognition result back to the central processing unit over the wireless network.
Preferably, the server also feeds the image processing result back to the central processing unit over the wireless network.
Step S104: the central processing unit sends the speech recognition result to the display device, so that the display device displays the speech recognition result.
In a specific implementation of step S104, the display device displays the speech recognition result as text and/or images. To better illustrate how the display device shows the speech recognition result, processes A1-A4 give an example:
A1. Assume the analog voice data collected by the voice acquisition device is "How is the weather in city A tomorrow?". The analog voice data is converted into digital voice data and sent to the central processing unit.
A2. The central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result containing weather forecast information.
A3. The server feeds the speech recognition result containing the weather forecast information back to the central processing unit, and the central processing unit sends that speech recognition result to the display device.
A4. The display device shows information such as "Weather in city A tomorrow: sunny" in text form.
Optionally, assume the central processing unit is also connected to an LED array. While process A1 is carried out, the central processing unit controls the state of the LEDs in the LED array to indicate the direction from which the voice acquisition device collected the analog voice data.
For example, if the analog voice data was collected from due east, the on/off states of the LEDs in the LED array indicate that the direction of the collected analog voice data is due east.
Optionally, assume the central processing unit is also connected to a playback device. While process A3 is carried out, the central processing unit also sends the speech recognition result to the playback device, so that the playback device plays the voice message "Weather in city A tomorrow: sunny".
It should be noted that the content of A1-A4 above is for illustration only.
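The flow of steps S101-S104, as walked through in A1-A4, can be sketched as a small in-memory simulation. All class and function names here (`Server`, `Display`, `CentralProcessor`, `acquire`) are hypothetical, the "server" is a simple lookup table rather than a real recognizer, and the wireless link is replaced by a direct method call.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    """Stand-in for the remote recognizer; the answer table is invented for the demo."""
    answers: dict

    def recognize(self, digital_voice: bytes) -> str:
        return self.answers.get(digital_voice, "not recognized")

@dataclass
class Display:
    shown: list = field(default_factory=list)

    def show(self, text: str) -> None:
        self.shown.append(text)

@dataclass
class CentralProcessor:
    server: Server
    display: Display

    def handle(self, digital_voice: bytes) -> str:
        result = self.server.recognize(digital_voice)  # S102 and S103
        self.display.show(result)                      # S104
        return result

def acquire(utterance: str) -> bytes:
    """S101 stand-in: 'digitize' an utterance (here simply UTF-8 encoding)."""
    return utterance.encode("utf-8")

query = "How is the weather in city A tomorrow?"
server = Server({acquire(query): "Weather in city A tomorrow: sunny"})
display = Display()
cpu = CentralProcessor(server, display)
cpu.handle(acquire(query))
```

Routing everything through `CentralProcessor.handle` mirrors the description, in which the central processing unit is the only component that talks to both the server and the display device.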
Preferably, the central processing unit also sends the speech recognition result to a playback device, so that the playback device plays the speech recognition result. The playback device includes, but is not limited to, an audio power amplifier; it may be any device capable of playing sound, selected by the technician according to the actual situation. The playback device receives the speech recognition result from the central processing unit in the form of a digital signal, converts the speech recognition result from a digital signal into an analog signal, and plays it. While the speech recognition result is being played, the played result is captured back through an I2S interface or an analog-to-digital converter, and the captured data is fed back to the central processing unit.
To better illustrate the image acquisition device mentioned above, processes B1-B4 give an example:
B1. Assume the image data acquired by the image acquisition device shows the shape of product B, and the image data containing the shape of product B is sent to the central processing unit.
B2. The central processing unit sends the image data to the server for processing to obtain an image processing result containing information about product B.
B3. The server feeds the image processing result containing the product B information back to the central processing unit, and the central processing unit sends that image processing result to the display device.
B4. The display device shows the image processing result containing the product B information, for example a sample picture, name, price, and purpose of product B.
It should be noted that the content of processes B1-B4 above is for illustration only.
In the embodiments of the present invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device for display. In a noisy environment, showing the speech recognition result on the display device improves the user experience and ease of use.
Corresponding to the speech recognition method provided by the above embodiments of the present invention, Fig. 2 shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention. The system comprises: a voice acquisition device 201, a server 202, a display device 203, and a central processing unit 204.
The voice acquisition device 201 is connected to the central processing unit 204 and is configured to convert collected analog voice data into digital voice data and to transmit the digital voice data to the central processing unit 204.
In a specific implementation, the voice acquisition device 201 converts the collected analog voice data into the digital voice data and sends the digital voice data to the central processing unit 204 through an I2S interface or a PDM interface. It should be noted that the voice acquisition device 201 includes, but is not limited to, a microphone; it may be any device capable of collecting voice, selected by the technician according to the actual situation.
The central processing unit 204 is connected to the server 202 and to the display device 203, and is configured to transmit the digital voice data to the server 202, receive the speech recognition result fed back by the server 202, and send the speech recognition result to the display device 203.
In a specific implementation, the central processing unit 204 sends the digital voice data to the server 202 over a wireless network for processing to obtain the speech recognition result. It should be noted that the central processing unit 204 may be any chip capable of processing data.
The server 202 is configured to receive the digital voice data transmitted by the central processing unit 204, process the digital voice data to obtain the speech recognition result, and feed the speech recognition result back to the central processing unit 204.
The display device 203 is configured to receive and display the speech recognition result.
In a specific implementation, the display device 203 displays the speech recognition result as text and/or images. It should be noted that the display device 203 includes, but is not limited to, a liquid crystal display; it may be any device with a display function, selected by the technician according to the actual situation.
In the embodiments of the present invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device for display. In a noisy environment, showing the speech recognition result on the display device improves the user experience and ease of use.
Preferably, building on Fig. 2, Fig. 3 shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention. The system further comprises an image acquisition device 205.
The image acquisition device 205 is connected to the central processing unit 204 and is configured to acquire image data and transmit the image data to the central processing unit 204.
Correspondingly, the central processing unit 204 is further configured to transmit the image data to the server 202, receive the image processing result fed back by the server 202, and send the image processing result to the display device 203.
The server 202 is further configured to process the image data and feed the resulting image processing result back to the central processing unit 204.
The display device 203 is further configured to display the image processing result.
It should be noted that the image acquisition device 205 includes, but is not limited to, a camera; it may be any device with an image acquisition function, selected by the technician according to the actual situation.
In the embodiments of the present invention, the image acquisition device acquires image data and sends it to the central processing unit, the central processing unit sends the image data to the server for processing to obtain an image processing result, and the central processing unit sends the image processing result to the display device for display, which improves the user experience and ease of use.
Preferably, building on Fig. 2, Fig. 4 shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention. The system further comprises a playback device 206.
The central processing unit 204 is connected to the playback device 206, and the central processing unit 204 is further configured to transmit the speech recognition result to the playback device 206;
The playback device 206 is configured to receive and play the speech recognition result transmitted by the central processing unit 204.
In a specific implementation, the playback device 206 receives the speech recognition result from the central processing unit 204 in the form of a digital signal, converts the speech recognition result from a digital signal into an analog signal, and plays it. While the speech recognition result is being played, the played result is captured back through an I2S interface or an analog-to-digital converter, and the captured data is fed back to the central processing unit 204.
It should be noted that the playback device 206 includes, but is not limited to, an audio power amplifier; it may be any device capable of playing sound, selected by the technician according to the actual situation.
In the embodiments of the present invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device and the playback device, which display and play it. In a noisy environment, showing the speech recognition result on the display device improves the user experience and ease of use.
Preferably, building on Fig. 2, Fig. 5 shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention. The system further comprises an LED array 207.
The central processing unit 204 is connected to the LED array 207, and the central processing unit 204 is further configured to identify the direction from which the voice acquisition device 201 collects the analog voice data.
The LED array 207 is configured to indicate the direction from which the voice acquisition device 201 collects the analog voice data. The LED array 207 includes, but is not limited to, an LED array composed of 12 full-color LEDs and an LED driver chip. The central processing unit 204 configures and lights the LEDs by controlling the LED driver chip, thereby indicating the direction.
It should be noted that the central processing unit 204 in Figs. 2 to 5 is powered by a Universal Serial Bus (USB) supply or a 5 V supply, and DC-DC chips generate the different voltages that power each device. For example, the DC-DC chips generate voltages such as +3.3 V, +1.8 V, and +1.35 V to power the devices connected to the central processing unit 204.
In a specific implementation, Fig. 6 shows an architecture diagram of a speech recognition system 300 provided by an embodiment of the present invention. In Fig. 6, the speech recognition system 300 includes a voice acquisition device 301, a server 302, a display device 303, a central processing unit 304, an image acquisition device 305, a playback device 306, and an LED array 307.
The central processing unit 304 can communicate with a communication module 308.
The central processing unit 304 further includes an interaction module 309, which exchanges information with the communication module 308. The interaction module 309 may be a Secure Digital Input and Output (SDIO) interface or a Universal Asynchronous Receiver/Transmitter (UART). The SDIO or UART is connected to the communication module 308, which may be a WiFi chip or a Bluetooth chip. The central processing unit 304 exchanges information with the communication module 308 over SDIO or UART, and through the communication module 308 exchanges information with the server 302 or other devices.
The central processing unit 304 further includes at least one memory module for storing data, such as voice data, video data and image data. As shown in Fig. 6, a memory module 310a can exchange information with a memory module 310b outside the central processing unit 304. The memory module 310a and the memory module 310b may be EMMC (Embedded Multi Media Card) cards or storage chips of other forms. The central processing unit 304 can also be connected with a memory 312 through a double data rate controller 311 (Double Data Rate controller, DDR), and store data in the memory 312.
The central processing unit 304 further includes a USB interface 313a, through which it is connected with an external USB interface 313b, where the USB interface 313b can support the OTG (On-The-Go) protocol.
The central processing unit 304 may also include an analog-to-digital converter 314 (Analog-to-Digital Converter, ADC), which is connected with a key module 320. By operating the key module 320, a control instruction can be sent to the analog-to-digital converter 314 as an analog signal. The analog-to-digital converter 314 converts the input analog signal into a digital signal and then sends the digital signal to other modules in the central processing unit 304. It should be noted that the key module 320 may be a module containing 4 keys or a module containing 6 keys; the number and functions of the keys are not limited to these and can be selected according to actual needs.
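As a non-limiting illustration of how a digital code from the analog-to-digital converter 314 might be mapped to a key, the following is a minimal sketch. It assumes a resistor-ladder style key module in which each key produces a distinct voltage level; the 10-bit resolution, the nominal codes, the tolerance and the names `KEY_LEVELS` and `decode_key` are hypothetical and are not taken from this embodiment.

```python
from typing import Optional

# All values below are illustrative assumptions, not values from the embodiment.
ADC_BITS = 10                      # assumed ADC resolution
ADC_MAX = (1 << ADC_BITS) - 1      # largest valid ADC code (1023)
TOLERANCE = 40                     # assumed allowed deviation from a nominal code

# Assumed nominal ADC codes for a 4-key module (one voltage level per key).
KEY_LEVELS = {"KEY1": 100, "KEY2": 300, "KEY3": 500, "KEY4": 700}


def decode_key(adc_code: int) -> Optional[str]:
    """Return the key whose nominal level is within TOLERANCE of adc_code, else None."""
    if not 0 <= adc_code <= ADC_MAX:
        raise ValueError(f"ADC code {adc_code} outside 0..{ADC_MAX}")
    for name, level in KEY_LEVELS.items():
        if abs(adc_code - level) <= TOLERANCE:
            return name
    return None  # no key pressed, or the reading falls between levels
```

Supporting a 6-key module in this sketch would only require adding two more entries to `KEY_LEVELS`, provided the levels remain farther apart than twice the tolerance.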
The central processing unit 304 can also be connected with the display device 303, where the display device 303 may be a liquid crystal display (Liquid Crystal Display, LCD), an LCD with a touch function, or an organic light-emitting diode (Organic Light-Emitting Diode, OLED) display. The display device 303 may be a liquid crystal display with a mobile industry processor interface (Mobile Industry Processor Interface, MIPI). The central processing unit 304 can be connected through interface 315a with an LCD having a MIPI interface or with a touch display having an I2C interface. Interface 315a may be a MIPI interface, a general-purpose input/output (General Purpose Input Output, GPIO) interface, or an I2C interface. The central processing unit 304 may also include interface 315b, which may be a GPIO or I2C interface and is used to send the speech recognition result to the LCD display, which displays the speech recognition result.
The central processing unit 304 may also include interface 315c for connecting with the LED array 307. Interface 315c may be an I2C interface, through which instructions can be sent to the LED array 307. The LED driver in the LED array 307 drives the LED array to indicate the direction of the source of the voice data acquired by the microphones.
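As a non-limiting illustration of how a source direction might be turned into an instruction for the LED array, the following is a minimal sketch. It assumes that the LEDs are evenly spaced on a ring, that LED 0 sits at 0 degrees, and that an estimated source angle is already available; the LED count of 12 and the function names are hypothetical and are not specified in this embodiment.

```python
from typing import List


def led_index_for_angle(angle_deg: float, num_leds: int = 12) -> int:
    """Map a source angle in degrees to the index of the nearest LED on a ring.

    Assumes LED 0 is at 0 degrees and indices increase with the angle.
    """
    if num_leds <= 0:
        raise ValueError("num_leds must be positive")
    step = 360.0 / num_leds
    normalized = angle_deg % 360.0            # fold any angle into [0, 360)
    # Adding 0.5 before truncating rounds to the nearest LED; the final
    # modulo wraps angles just below 360 degrees back to LED 0.
    return int(normalized / step + 0.5) % num_leds


def led_frame(angle_deg: float, num_leds: int = 12) -> List[int]:
    """Build an on/off state list with only the LED nearest the source lit."""
    lit = led_index_for_angle(angle_deg, num_leds)
    return [1 if i == lit else 0 for i in range(num_leds)]
```

The resulting state list would then be written to the LED driver over interface 315c; the register layout of such a driver is device specific and is not sketched here.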
The central processing unit 304 may also include interface 315d, through which it is connected with the image collecting device 305. The image collecting device 305 may be a camera with a MIPI interface, and interface 315d may be a MIPI interface. The central processing unit 304 can receive the video data acquired by the camera with the MIPI interface.
The central processing unit 304 can also be connected with the voice acquisition device 301 through interface 315e, where interface 315e may be a PDM interface or an I2S interface. The voice acquisition device 301 may be a circuit board integrating a microphone array. The circuit board may include 6 microphones fabricated using a micro-electro-mechanical system (Micro-Electro-Mechanical System, MEMS) process; these microphones form the microphone array, and voice data is acquired by the microphone array.
The central processing unit 304 may also include a power module 316, which is used to manage the power supply of the central processing unit 304. The power module 316 is connected with a voltage stabilizing module 317, which may be a BUCK regulator or a low dropout regulator (low dropout regulator, LDO). The voltage stabilizing module 317 is connected to a power supply, and the power supply powers the central processing unit 304 through the voltage stabilizing module 317. The power supply may be a 5V/3A power supply.
The speech recognition system 300 may also include a conversion module 318, which may include an ADC and a digital-to-analog converter (Digital-to-Analog Converter, DAC). The central processing unit 304 may also include interface 315f and interface 315g; interface 315f may be an I2S interface, and interface 315g may be a PDM interface. The central processing unit 304 can send the speech recognition result to the conversion module 318 through the I2S interface or the PDM interface. The conversion module 318 converts the speech recognition result from a digital signal into an analog signal through the DAC and sends it to the playing device 306 through an audio power amplifier 319 (Audio Power Amplifier), and the playing device 306 amplifies and plays the speech recognition result. The played speech recognition result is converted back into a digital signal by the ADC in the conversion module 318 and captured back into the central processing unit 304 through the I2S or PDM interface. It should be noted that the playing device 306 may be a loudspeaker, or a loudspeaker with Bluetooth or wireless connection capability.
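As a non-limiting illustration of the two conversions involved in playback and capture, the following is a minimal sketch of an idealized digital-to-analog step followed by an idealized analog-to-digital step. The 16-bit sample width, the 1.0 V full-scale value and the function names are assumptions made for illustration only; a real conversion module would add noise, filtering and gain that are not modeled here.

```python
PCM_BITS = 16                        # assumed sample width
FULL_SCALE_V = 1.0                   # assumed full-scale output voltage
HALF_RANGE = 1 << (PCM_BITS - 1)     # 32768 codes on each side of zero


def pcm_to_voltage(sample: int) -> float:
    """Idealized digital-to-analog step: signed PCM code to an output voltage."""
    if not -HALF_RANGE <= sample < HALF_RANGE:
        raise ValueError("sample outside signed 16-bit range")
    return sample / HALF_RANGE * FULL_SCALE_V


def voltage_to_pcm(voltage: float) -> int:
    """Idealized analog-to-digital step: voltage to a signed PCM code, with clipping."""
    code = int(round(voltage / FULL_SCALE_V * HALF_RANGE))
    return max(-HALF_RANGE, min(HALF_RANGE - 1, code))
```

In this idealized model a sample that is played and captured again returns unchanged, and voltages beyond full scale are clipped to the largest representable code.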
It should be noted that the black arrows in Fig. 6 indicate the direction of information exchange in the speech recognition system 300: a double arrow indicates that information can be transferred in both directions, and a single arrow indicates that information can be transferred in one direction.
It should be noted that the types of the aforementioned interfaces, such as interfaces 315a, 315b, 315c, 315d, 315e, 315f and 315g, can be selected according to actual needs and are not limited to the examples listed above.
It should be noted that the central processing unit 304 in Fig. 6 above can be powered by a universal serial bus (Universal Serial Bus, USB) power supply or a 5V power supply, and different voltages are generated by a DC-DC chip to power each device. For example, voltages such as +3.3V, +1.8V and +1.35V are generated by the DC-DC chip to power each device connected to the central processing unit 304.
It should be noted that the architecture diagram of the speech recognition system in Fig. 6 above is only an example. A specific speech recognition system architecture includes, but is not limited to, the content shown in Fig. 6, and other architectures are not enumerated one by one here.
In order to realize the above speech recognition system 300, a processing chip with a suitable architecture can be selected as the central processing unit 304, such as an Intel chip or an AMD chip, or a chip structure can be redesigned according to actual needs.
In conclusion, the embodiments of the present invention provide a speech recognition method and system. The method is applicable to a speech recognition system that includes a voice acquisition device, a server, a display device and a central processing unit. The method is as follows: the voice acquisition device converts the collected analog voice data into digital voice data and sends it to the central processing unit. The central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result. The server feeds the speech recognition result back to the central processing unit. The central processing unit sends the speech recognition result to the display device, causing the display device to display the speech recognition result. In this solution, voice data is acquired by the voice acquisition device and sent to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain the speech recognition result, and the central processing unit sends the speech recognition result to the display device for display. In a noisy environment, displaying the speech recognition result on the display device can improve the user experience and ease of use.
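As a non-limiting illustration of the data flow summarized above, the following is a minimal sketch in which each device is modeled as a small object or function. The 16-bit quantization, the class and function names, and the placeholder recognizer `fake_server` are all hypothetical; in particular, no real recognition is performed and no network transport is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


def digitize(analog: Sequence[float], bits: int = 16) -> List[int]:
    """Voice acquisition device: quantize analog samples in [-1.0, 1.0] to signed integers."""
    full_scale = (1 << (bits - 1)) - 1
    # Samples outside [-1.0, 1.0] are clipped before quantization.
    return [int(round(max(-1.0, min(1.0, x)) * full_scale)) for x in analog]


@dataclass
class Display:
    """Display device: records every result it has been asked to show."""
    shown: List[str] = field(default_factory=list)

    def show(self, text: str) -> None:
        self.shown.append(text)


@dataclass
class CentralProcessor:
    """Central processing unit: forwards data to the server and results to the display."""
    recognize: Callable[[List[int]], str]   # stands in for the round trip to the server
    display: Display

    def handle(self, pcm: List[int]) -> str:
        result = self.recognize(pcm)         # server processes the digital voice data
        self.display.show(result)            # result is sent on to the display device
        return result


def fake_server(pcm: List[int]) -> str:
    """Placeholder for the server; it only reports how many samples it received."""
    return f"recognized {len(pcm)} samples"
```

For example, `CentralProcessor(fake_server, Display()).handle(digitize([0.0, 1.0]))` passes two quantized samples through the whole chain and leaves the returned text in the display's `shown` list.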
All the embodiments in this specification are described in a progressive manner. For the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and for relevant details reference may be made to the description of the method embodiments. The system embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. In order to clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A speech recognition system, characterized in that the system comprises: a voice acquisition device, a server, a display device and a central processing unit;
The voice acquisition device is connected with the central processing unit, and is used for converting collected analog voice data into digital voice data and transmitting the digital voice data to the central processing unit;
The central processing unit is connected with the server and the display device respectively, and is used for transmitting the digital voice data to the server, receiving the speech recognition result fed back by the server, and sending the speech recognition result to the display device;
The server is used for receiving the digital voice data sent by the central processing unit, processing the digital voice data to obtain the speech recognition result, and feeding the speech recognition result back to the central processing unit;
The display device is used for receiving and displaying the speech recognition result.
2. The system according to claim 1, characterized in that the system further comprises: an image acquisition device;
The image acquisition device is connected with the central processing unit, and is used for acquiring image data and transmitting the image data to the central processing unit;
Correspondingly, the central processing unit is also used for transmitting the image data to the server, receiving the image processing result fed back by the server, and sending the image processing result to the display device;
The server is also used for processing the image data and feeding the obtained image processing result back to the central processing unit;
The display device is also used for displaying the image processing result.
3. The system according to claim 2, characterized in that the image acquisition device includes a camera.
4. The system according to claim 1, characterized in that the system further comprises: a playing device;
The central processing unit is connected with the playing device, and the central processing unit is also used for transmitting the speech recognition result to the playing device;
The playing device is used for receiving and playing the speech recognition result sent by the central processing unit.
5. The system according to claim 3, characterized in that the playing device includes an audio power amplifier.
6. The system according to any one of claims 1-5, characterized in that the system further comprises: an LED array;
The central processing unit is connected with the LED array, and the central processing unit is also used for identifying the direction from which the voice acquisition device acquires the analog voice data;
The LED array is used for indicating the direction from which the voice acquisition device acquires the analog voice data.
7. The system according to any one of claims 1-5, characterized in that the voice acquisition device includes a microphone.
8. The system according to any one of claims 1-5, characterized in that the display device includes a liquid crystal display.
9. A speech recognition method, characterized in that it is applicable to a speech recognition system, the speech recognition system comprising a voice acquisition device, a server, a display device and a central processing unit, the method comprising:
The voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit;
The central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result;
The server feeds the speech recognition result back to the central processing unit;
The central processing unit sends the speech recognition result to the display device, causing the display device to display the speech recognition result.
10. The method according to claim 9, characterized by further comprising:
The central processing unit sends the speech recognition result to a playing device, causing the playing device to play the speech recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910145754.9A CN109686373A (en) | 2019-02-27 | 2019-02-27 | A kind of audio recognition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910145754.9A CN109686373A (en) | 2019-02-27 | 2019-02-27 | A kind of audio recognition method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109686373A true CN109686373A (en) | 2019-04-26 |
Family
ID=66197052
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910145754.9A Pending CN109686373A (en) | 2019-02-27 | 2019-02-27 | A kind of audio recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109686373A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110418118A (en) * | 2019-09-05 | 2019-11-05 | 深圳伯图康卓智能科技有限公司 | A kind of video interactive speech recognition intelligent temperature control system |
CN111857646A (en) * | 2020-08-05 | 2020-10-30 | 上海茂声智能科技有限公司 | System for quickly realizing voice interaction function |
CN112306355A (en) * | 2019-07-24 | 2021-02-02 | 北京迪文科技有限公司 | Display device and method supporting voice recognition |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20100008152A (en) * | 2008-07-15 | 2010-01-25 | 계영정보통신(주) | Speech recognition system for security |
CN102890931A (en) * | 2012-09-25 | 2013-01-23 | 四川长虹电器股份有限公司 | Method for increasing voice recognition rate |
CN104700836A (en) * | 2013-12-10 | 2015-06-10 | 阿里巴巴集团控股有限公司 | Voice recognition method and voice recognition system |
CN108305627A (en) * | 2018-03-30 | 2018-07-20 | 合肥惠科金扬科技有限公司 | A kind of intelligent display and system |
CN108447479A (en) * | 2018-02-02 | 2018-08-24 | 上海大学 | The robot voice control system of noisy work condition environment |
CN208315196U (en) * | 2018-03-30 | 2019-01-01 | 合肥惠科金扬科技有限公司 | A kind of intelligent display and system |
CN109120993A (en) * | 2018-09-30 | 2019-01-01 | Tcl通力电子(惠州)有限公司 | Audio recognition method, intelligent terminal, speech recognition system and readable storage medium storing program for executing |
CN209691386U (en) * | 2019-02-27 | 2019-11-26 | 北京声智科技有限公司 | A kind of speech recognition system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109686373A (en) | A kind of audio recognition method and system | |
CN209691386U (en) | A kind of speech recognition system | |
CN209310172U (en) | Air purifier with voice control function | |
CN201470124U (en) | Voice and motion combined multimode interaction electronic toy | |
CN104380373A (en) | Systems and methods for name pronunciation | |
CN108319443A (en) | A kind of audio-frequency inputting method, mobile terminal and audio playing apparatus | |
CN206879039U (en) | Support the wireless microphone and intelligent terminal karaoke OK system of usb audio | |
CN105678662A (en) | Auxiliary system for realizing encouraging education and method thereof on the basis of internet | |
CN207473563U (en) | A kind of speech processes mouse | |
CN106303829B (en) | Double nip head circuit and its control method | |
CN211509180U (en) | Multifunctional audio and video processing equipment | |
CN106169164A (en) | Multimedia teaching classroom management system | |
CN203352751U (en) | Multifunctional microphone | |
CN110335610A (en) | The control method and display of multimedia translation | |
CN204046718U (en) | Based on the video conference tape deck of technology of Internet of things | |
CN107390657A (en) | Multimedia teaching classroom management system | |
CN109471609A (en) | K sings projection integration machine and its karaoke method | |
CN208691520U (en) | WIFI speaker | |
CN105632540A (en) | Somatosensory detection and sound control based intelligent music play system | |
CN207867746U (en) | A kind of replaceable multipurpose wall chart point diagram song learning device of intelligence | |
CN108332320A (en) | A kind of multifunctional air purifying all-in-one machine and application method based on artificial intelligence | |
CN209055911U (en) | Intelligent gesture interactive system | |
CN207382446U (en) | A kind of VR tele-conferencing systems for rest room | |
CN110366017A (en) | A kind of smart television voice cam device and intelligent TV set | |
CN108564844A (en) | A kind of classroom computer teaching system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||