CN109686373A

CN109686373A - Speech recognition method and system

Info

Publication number: CN109686373A
Application number: CN201910145754.9A
Authority: CN
Inventors: 李鹏; 陈孝良; 常乐; 冯大航; 苏少炜
Original assignee: BEIJING WISDOM TECHNOLOGY Co Ltd
Current assignee: BEIJING WISDOM TECHNOLOGY Co Ltd; Beijing SoundAI Technology Co Ltd
Priority date: 2019-02-27
Filing date: 2019-02-27
Publication date: 2019-04-26

Abstract

The present invention provides a speech recognition method and system. The method is applicable to a speech recognition system comprising a voice acquisition device, a server, a display device and a central processing unit. In the method, the voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit; the central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result; the server feeds the speech recognition result back to the central processing unit; and the central processing unit sends the speech recognition result to the display device, which displays it. In the solution provided by the invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device for display, which improves the user experience and ease of use.

Description

Speech recognition method and system

Technical field

The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and system.

Background art

With the development of science and technology, voice interaction is gradually becoming widespread in daily life. Voice interaction is an interaction mode based on voice input: a user can obtain a feedback result simply by speaking.

A relatively common voice interaction mode at present collects voice data through a microphone, and a processor processes the voice data to obtain a feedback result. However, existing voice interaction technology only plays the speech recognition result through a power amplifier. In a noisy environment the user may be unable to hear the speech recognition result and therefore does not know what the final speech recognition result is, so the user experience is poor and the technology is inconvenient to use.

Summary of the invention

In view of this, embodiments of the present invention provide a speech recognition method and system to solve problems of existing voice interaction technology such as poor user experience and inconvenience of use.

To achieve the above object, embodiments of the present invention provide the following technical solutions:

A first aspect of the embodiments of the present invention discloses a speech recognition system, comprising: a voice acquisition device, a server, a display device and a central processing unit;

the voice acquisition device is connected to the central processing unit and is configured to convert collected analog voice data into digital voice data and transmit the digital voice data to the central processing unit;

the central processing unit is connected to the server and to the display device, and is configured to transmit the digital voice data to the server, receive the speech recognition result fed back by the server, and send the speech recognition result to the display device;

the server is configured to receive the digital voice data transmitted by the central processing unit, process the digital voice data to obtain the speech recognition result, and feed the speech recognition result back to the central processing unit;

the display device is configured to receive and display the speech recognition result.

Preferably, the system further comprises an image acquisition device;

the image acquisition device is connected to the central processing unit and is configured to acquire image data and transmit the image data to the central processing unit;

correspondingly, the central processing unit is further configured to transmit the image data to the server, receive the image processing result fed back by the server, and send the image processing result to the display device;

the server is further configured to process the image data and feed the resulting image processing result back to the central processing unit;

the display device is further configured to display the image processing result.
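As a rough illustration of the image branch, the central processing unit can be modelled as a router that forwards each payload (voice or image) to the matching server-side processing and passes the returned result on to the display device. The following is a minimal sketch under that assumption only; the `Payload` type, the "audio"/"image" tags and the stub handlers are hypothetical and are not the patent's actual protocol.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Payload:
    kind: str    # hypothetical tag: "audio" or "image"
    data: bytes  # raw digital voice data or image data


def route(payload: Payload, handlers: Dict[str, Callable[[bytes], str]]) -> str:
    """Forward a payload to the server-side handler for its kind and return
    the processing result that the CPU would pass to the display device."""
    try:
        handler = handlers[payload.kind]
    except KeyError:
        raise ValueError(f"unsupported payload kind: {payload.kind}") from None
    return handler(payload.data)


# Hypothetical stand-ins for the server's speech and image processing.
handlers = {
    "audio": lambda data: f"speech result ({len(data)} bytes)",
    "image": lambda data: f"image result ({len(data)} bytes)",
}

shown = route(Payload("image", b"\x00" * 1024), handlers)
print(shown)  # image result (1024 bytes)
```

Keeping the handler table separate from the forwarding function lets the same path serve both the voice branch and the image branch.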

Preferably, the image acquisition device comprises a camera.

Preferably, the system further comprises a playing device;

the central processing unit is connected to the playing device and is further configured to transmit the speech recognition result to the playing device;

the playing device is configured to receive and play the speech recognition result transmitted by the central processing unit.

Preferably, the playing device comprises an audio power amplifier.

Preferably, the system further comprises an LED array;

the central processing unit is connected to the LED array and is further configured to identify the direction from which the voice acquisition device collects the analog voice data;

the LED array is configured to indicate the direction from which the voice acquisition device collects the analog voice data.

Preferably, the voice acquisition device comprises a microphone.

Preferably, the display device comprises a liquid crystal display.

A second aspect of the embodiments of the present invention discloses a speech recognition method, applicable to the speech recognition system disclosed in the first aspect, the speech recognition system comprising a voice acquisition device, a server, a display device and a central processing unit, the method comprising:

the voice acquisition device converts collected analog voice data into digital voice data and sends the digital voice data to the central processing unit;

the central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result;

the server feeds the speech recognition result back to the central processing unit;

the central processing unit sends the speech recognition result to the display device, so that the display device displays the speech recognition result.
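The four method steps can be modelled as a simple synchronous pipeline. In this minimal sketch, assumed for illustration only, `simple_adc` stands in for the voice acquisition device's analog-to-digital conversion (16-bit PCM is an assumption; the description does not fix a sample format), `fake_server` is a hypothetical stand-in for the server's recognition service, and a Python list plays the role of the display device.

```python
import struct
from typing import Callable, List


def simple_adc(samples: List[float]) -> bytes:
    """Quantize samples in [-1.0, 1.0] to little-endian signed 16-bit PCM
    (stands in for the voice acquisition device's conversion)."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range input
        out += struct.pack("<h", round(s * 32767))
    return bytes(out)


def recognize_and_display(analog: List[float],
                          server: Callable[[bytes], str],
                          display: List[str]) -> str:
    digital = simple_adc(analog)  # analog voice data -> digital voice data
    result = server(digital)      # CPU sends data to server; server replies
    display.append(result)        # CPU forwards the result to the display
    return result


def fake_server(pcm: bytes) -> str:
    """Hypothetical server stub; a real system would call a remote ASR service."""
    return f"recognized {len(pcm) // 2} samples"


screen: List[str] = []
recognize_and_display([0.0, 0.5, -1.2], fake_server, screen)
print(screen)  # ['recognized 3 samples']
```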

Preferably, the method further comprises:

the central processing unit sends the speech recognition result to a playing device, so that the playing device plays the speech recognition result.

In the speech recognition method and system provided by the above embodiments of the present invention, the method is applicable to a speech recognition system comprising a voice acquisition device, a server, a display device and a central processing unit. The voice acquisition device converts collected analog voice data into digital voice data and sends it to the central processing unit; the central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result; the server feeds the speech recognition result back to the central processing unit; and the central processing unit sends the speech recognition result to the display device, which displays it. Because the speech recognition result is shown on the display device, the user can still learn the result in a noisy environment, which improves the user experience and ease of use.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a speech recognition method provided by an embodiment of the present invention;

Fig. 2 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;

Fig. 3 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;

Fig. 4 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;

Fig. 5 is a structural block diagram of a speech recognition system provided by an embodiment of the present invention;

Fig. 6 is an architecture diagram of a speech recognition system provided by an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

In this application, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

As described in the background, existing voice interaction technology only plays the speech recognition result through a power amplifier. In a noisy environment the user may be unable to hear the result and therefore does not know the final speech recognition result, resulting in poor user experience and inconvenience of use.

Therefore, embodiments of the present invention provide a speech recognition method and system in which a voice acquisition device collects voice data and sends it to a central processing unit, the central processing unit sends the voice data to a server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to a display device that displays it, thereby solving the problems of poor user experience and inconvenience of use of existing voice interaction technology.

Referring to Fig. 1, which shows a flowchart of a speech recognition method provided by an embodiment of the present invention, the method is applicable to a speech recognition system comprising a voice acquisition device, a server, a display device and a central processing unit, and comprises the following steps:

Step S101: the voice acquisition device converts collected analog voice data into digital voice data and sends the digital voice data to the central processing unit.

In implementing step S101, the voice acquisition device, which includes but is not limited to a microphone, converts the collected analog voice data into digital voice data and sends the digital voice data to the central processing unit through an Inter-IC Sound (I2S) interface or a pulse density modulation (PDM) interface.
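A PDM microphone outputs a 1-bit stream at a high clock rate whose density of ones tracks the signal amplitude, so the receiving side has to low-pass filter and decimate it to obtain PCM samples. The sketch below shows only the simplest case, a boxcar (first-order CIC-style) average over fixed blocks; production decimators typically use multi-stage CIC and FIR filters, and the decimation factor of 64 is an assumed example value, not one given in the description.

```python
from typing import List


def pdm_to_pcm(bits: List[int], decimation: int = 64) -> List[float]:
    """Convert a 1-bit PDM stream to PCM by averaging each block of
    `decimation` bits. Each output sample is the block's pulse density
    mapped from [0, 1] to [-1.0, 1.0]; a trailing partial block is dropped."""
    if decimation <= 0:
        raise ValueError("decimation must be positive")
    out = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        density = sum(block) / decimation  # fraction of ones in the block
        out.append(2.0 * density - 1.0)    # map to the range -1.0..1.0
    return out


print(pdm_to_pcm([1, 0] * 32 + [1] * 64))  # [0.0, 1.0]
```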

Preferably, in implementing step S101, the central processing unit identifies the direction from which the voice acquisition device collects the analog voice data, and this direction is indicated by an LED array.
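The description does not say how the central processing unit identifies the sound direction. One common technique, shown here purely as an assumption, estimates the time difference of arrival (TDOA) between two microphones by cross-correlation and converts it to an angle with θ = arcsin(c·τ/d). The sketch uses brute-force correlation, a speed of sound of 343 m/s, and a hypothetical 16 kHz sample rate and 0.1 m microphone spacing.

```python
import math
import random
from typing import List


def estimate_lag(ref: List[float], sig: List[float], max_lag: int) -> int:
    """Return the lag (in samples) by which `sig` trails `ref`,
    chosen to maximize their cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(ref)):
            m = n + lag
            if 0 <= m < len(sig):
                score += ref[n] * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lag_to_angle(lag: int, fs: float, spacing_m: float, c: float = 343.0) -> float:
    """Convert a lag into an arrival angle in degrees from broadside
    for a two-microphone pair with the given spacing."""
    x = c * (lag / fs) / spacing_m
    x = max(-1.0, min(1.0, x))  # clamp to the valid arcsin domain
    return math.degrees(math.asin(x))


random.seed(1)
ref = [random.gauss(0.0, 1.0) for _ in range(200)]
sig = [0.0, 0.0, 0.0] + ref[:-3]  # second microphone hears the sound 3 samples later
lag = estimate_lag(ref, sig, max_lag=8)
print(lag)  # 3
```

With the assumed geometry, a 3-sample lag corresponds to roughly 40 degrees off broadside; a real array would combine several microphone pairs to resolve a full 360-degree direction.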

Preferably, in implementing step S101, an image acquisition device acquires image data and transmits the image data to the central processing unit. The image acquisition device includes but is not limited to a camera.

Step S102: the central processing unit sends the digital voice data to the server for processing to obtain a speech recognition result.

In implementing step S102, the central processing unit sends the digital voice data to the server through a wireless network for processing to obtain the speech recognition result.
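The description leaves the upload format unspecified. A common approach is to split the digital voice data into fixed-size chunks and tag each with a sequence number so the server can restore their order. The sketch below assumes a hypothetical 6-byte header (uint32 sequence number, uint16 payload length) and 3200-byte chunks, which correspond to 100 ms of 16 kHz, 16-bit mono audio; neither value comes from the description.

```python
import struct
from typing import Dict, Iterator, List

# Hypothetical frame header: sequence number (uint32), payload length (uint16).
HEADER = struct.Struct(">IH")


def frame_chunks(pcm: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Split PCM bytes into framed chunks ready to send over the network."""
    if not 0 < chunk_size <= 0xFFFF:
        raise ValueError("chunk_size must fit in the uint16 length field")
    for seq, start in enumerate(range(0, len(pcm), chunk_size)):
        payload = pcm[start:start + chunk_size]
        yield HEADER.pack(seq, len(payload)) + payload


def reassemble(frames: List[bytes]) -> bytes:
    """Rebuild the original byte stream, tolerating out-of-order frames."""
    parts: Dict[int, bytes] = {}
    for frame in frames:
        seq, length = HEADER.unpack_from(frame)
        parts[seq] = frame[HEADER.size:HEADER.size + length]
    return b"".join(parts[i] for i in sorted(parts))


pcm = bytes(range(256)) * 30  # 7680 bytes of dummy audio
frames = list(frame_chunks(pcm))
print(len(frames))  # 3
```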

Preferably, the central processing unit also sends the image data to the server through the wireless network for processing to obtain an image processing result.

Step S103: the server feeds the speech recognition result back to the central processing unit.

In implementing step S103, the server feeds the speech recognition result back to the central processing unit through the wireless network.

Preferably, the server also feeds the image processing result back to the central processing unit through the wireless network.

Step S104: the central processing unit sends the speech recognition result to the display device, so that the display device displays the speech recognition result.

In implementing step S104, the display device displays the speech recognition result as text and/or images. To better illustrate how the display device displays the speech recognition result, an example is given in processes A1-A4:

A1. Assume that the analog voice data collected by the voice acquisition device is "What will the weather be like in city A tomorrow?". The analog voice data is converted into digital voice data and sent to the central processing unit.

A2. The central processing unit sends the digital voice data to the server for processing, obtaining a speech recognition result containing weather forecast information.

A3. The server feeds the speech recognition result containing the weather forecast information back to the central processing unit, and the central processing unit sends it to the display device.

A4. The display device displays information such as "Weather in city A tomorrow: sunny" as text.

Optionally, assuming that the central processing unit is also connected to an LED array, then during process A1 the central processing unit indicates the direction of the analog voice data collected by the voice acquisition device by controlling the states of the LEDs in the LED array.

For example, if the analog voice data collected by the voice acquisition device comes from due east, the on/off states of the LEDs in the LED array indicate that the voice comes from due east.

Optionally, assuming that the central processing unit is also connected to a playing device, then during process A3 the central processing unit also sends the speech recognition result to the playing device, so that the playing device plays the voice message "Weather in city A tomorrow: sunny".

It should be noted that the content of processes A1-A4 above is for illustration only.

Preferably, the central processing unit also sends the speech recognition result to a playing device, so that the playing device plays the speech recognition result. The playing device includes but is not limited to an audio power amplifier, and may be any device capable of playing sound, selected by a technician according to the actual situation. The playing device receives the speech recognition result from the central processing unit as a digital signal, converts it into an analog signal, and plays it. While the speech recognition result is being played, the played signal is captured back (looped back) through an I2S interface or an analog-to-digital converter, and the captured data is fed back to the central processing unit.
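The description does not say what the central processing unit does with the looped-back playback signal. In voice devices such a loopback is commonly used as the reference for acoustic echo cancellation, so that the device's own playback can be removed from the microphone signal; the sketch below assumes that use and shows a basic normalized LMS (NLMS) canceller. The 8-tap filter, step size of 0.5 and the simulated echo path (2-sample delay, gain 0.5) are all hypothetical.

```python
import random
from typing import List


def nlms_echo_cancel(mic: List[float], ref: List[float], taps: int = 8,
                     mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Subtract an adaptive estimate of the loudspeaker echo from `mic`,
    using the looped-back playback signal `ref` as the reference.
    Returns the residual (echo-cancelled) signal."""
    w = [0.0] * taps    # adaptive estimate of the echo path
    buf = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))  # predicted echo
        e = d - y_hat                                   # residual after cancellation
        norm = eps + sum(b * b for b in buf)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out


random.seed(0)
ref = [random.gauss(0.0, 1.0) for _ in range(3000)]
echo = [0.0, 0.0] + [0.5 * r for r in ref[:-2]]  # simulated echo at the microphone
residual = nlms_echo_cancel(echo, ref)
```

Once the filter has converged, the residual is close to zero because the simulated echo path lies within the filter length; real echo paths are longer and noisier, so practical cancellers use many more taps.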

To better illustrate the image acquisition device described above, an example is given in processes B1-B4:

B1. Assume that the image data acquired by the image acquisition device is the shape of product B; the image data containing the shape of product B is sent to the central processing unit.

B2. The central processing unit sends the image data to the server for processing, obtaining an image processing result containing information about product B.

B3. The server feeds the image processing result containing information about product B back to the central processing unit, and the central processing unit sends it to the display device.

B4. The display device displays the image processing result containing information about product B, for example a sample picture, name, price and purpose of product B.

It should be noted that the content of processes B1-B4 above is for illustration only.

In this embodiment of the present invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device for display. In a noisy environment, displaying the speech recognition result on the display device improves the user experience and ease of use.

Corresponding to the speech recognition method provided by the above embodiment, Fig. 2 shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention. The system comprises: a voice acquisition device 201, a server 202, a display device 203 and a central processing unit 204.

The voice acquisition device 201 is connected to the central processing unit 204 and is configured to convert collected analog voice data into digital voice data and transmit the digital voice data to the central processing unit 204.

In a specific implementation, the voice acquisition device 201 converts the collected analog voice data into digital voice data and sends it to the central processing unit 204 through an I2S interface or a PDM interface. It should be noted that the voice acquisition device 201 includes but is not limited to a microphone, and may be any device capable of collecting voice, selected by a technician according to the actual situation.
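I2S carries two audio channels time-multiplexed on one data line, with the word-select signal marking left and right samples; a receiver commonly stores the stream in memory as interleaved frames (L0, R0, L1, R1, ...). The sketch below assumes 16-bit samples stored little-endian, which is typical of such buffers but depends on the actual DMA and codec configuration, and is not specified in the description.

```python
import struct
from typing import List, Tuple


def interleave_i2s(left: List[int], right: List[int]) -> bytes:
    """Pack 16-bit left/right samples as interleaved little-endian frames."""
    if len(left) != len(right):
        raise ValueError("channels must have equal length")
    return b"".join(struct.pack("<hh", l, r) for l, r in zip(left, right))


def deinterleave_i2s(data: bytes) -> Tuple[List[int], List[int]]:
    """Split an interleaved buffer back into left and right channels."""
    left, right = [], []
    for l, r in struct.iter_unpack("<hh", data):
        left.append(l)
        right.append(r)
    return left, right


left = [0, 1000, -1000, 32767]
right = [-32768, 5, -5, 0]
data = interleave_i2s(left, right)
print(len(data))  # 16
```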

The central processing unit 204 is connected to the server 202 and to the display device 203, and is configured to transmit the digital voice data to the server 202, receive the speech recognition result fed back by the server 202, and send the speech recognition result to the display device 203.

In a specific implementation, the central processing unit 204 sends the digital voice data to the server 202 through a wireless network for processing to obtain the speech recognition result. It should be noted that the central processing unit 204 may be any chip capable of processing data.

The server 202 is configured to receive the digital voice data transmitted by the central processing unit 204, process it to obtain the speech recognition result, and feed the speech recognition result back to the central processing unit 204.

The display device 203 is configured to receive and display the speech recognition result.

In a specific implementation, the display device 203 displays the speech recognition result as text and/or images. It should be noted that the display device 203 includes but is not limited to a liquid crystal display, and may be any device with a display function, selected by a technician according to the actual situation.

Preferably, referring to Fig. 3 in conjunction with Fig. 2, which shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention, the system further comprises an image acquisition device 205.

The image acquisition device 205 is connected to the central processing unit 204 and is configured to acquire image data and transmit the image data to the central processing unit 204.

Correspondingly, the central processing unit 204 is further configured to transmit the image data to the server 202, receive the image processing result fed back by the server 202, and send the image processing result to the display device 203.

The server 202 is further configured to process the image data and feed the resulting image processing result back to the central processing unit 204.

The display device 203 is further configured to display the image processing result.

It should be noted that the image acquisition device 205 includes but is not limited to a camera, and may be any device with an image acquisition function, selected by a technician according to the actual situation.

In this embodiment of the present invention, the image acquisition device acquires image data and sends it to the central processing unit, the central processing unit sends the image data to the server for processing to obtain an image processing result, and the central processing unit sends the image processing result to the display device for display, improving the user experience and ease of use.

Preferably, referring to Fig. 4 in conjunction with Fig. 2, which shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention, the system further comprises a playing device 206.

The central processing unit 204 is connected to the playing device 206 and is further configured to transmit the speech recognition result to the playing device 206;

The playing device 206 is configured to receive and play the speech recognition result transmitted by the central processing unit 204.

In a specific implementation, the playing device 206 receives the speech recognition result sent by the central processing unit 204 as a digital signal, converts it into an analog signal, and plays it. While the speech recognition result is being played, the played signal is captured back through an I2S interface or an analog-to-digital converter, and the captured data is fed back to the central processing unit 204.

It should be noted that the playing device 206 includes but is not limited to an audio power amplifier, and may be any device capable of playing sound, selected by a technician according to the actual situation.

In this embodiment of the present invention, the voice acquisition device collects voice data and sends it to the central processing unit, the central processing unit sends the voice data to the server for processing to obtain a speech recognition result, and the central processing unit sends the speech recognition result to the display device and the playing device, which display and play it respectively. In a noisy environment, displaying the speech recognition result on the display device improves the user experience and ease of use.

Preferably, referring to Fig. 5 in conjunction with Fig. 2, which shows a structural block diagram of a speech recognition system provided by an embodiment of the present invention, the system further comprises an LED array 207.

The central processing unit 204 is connected to the LED array 207 and is further configured to identify the direction from which the voice acquisition device 201 collects the analog voice data.

The LED array 207 is configured to indicate the direction from which the voice acquisition device 201 collects the analog voice data. The LED array 207 includes but is not limited to an array consisting of 12 full-color LEDs and an LED driver chip; the central processing unit 204 controls the LED driver chip to select and light the LEDs, thereby indicating the direction.
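With 12 LEDs arranged in a ring, each LED covers a 30-degree sector, so an estimated source direction can be mapped to the nearest LED. The sketch below assumes a ring layout with LED 0 at 0 degrees and indices increasing with angle; the actual layout and which LED faces "due east" depend on how the board is mounted, and the description does not specify them.

```python
import math
from typing import List


def led_for_direction(angle_deg: float, num_leds: int = 12) -> int:
    """Map a direction in degrees (0 = position of LED 0, increasing in
    the ring's numbering direction) to the index of the nearest LED."""
    step = 360.0 / num_leds
    return int(math.floor((angle_deg % 360.0 + step / 2) / step)) % num_leds


def led_states(angle_deg: float, num_leds: int = 12) -> List[bool]:
    """Return on/off states with only the LED nearest the source lit."""
    lit = led_for_direction(angle_deg, num_leds)
    return [i == lit for i in range(num_leds)]


print(led_for_direction(90.0))  # 3
```

Using floor of the half-step-shifted angle, rather than Python's `round`, avoids round-half-to-even surprises exactly on sector boundaries.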

It should be noted that the central processing unit 204 in Figs. 2 to 5 is powered via Universal Serial Bus (USB) or a 5 V supply, and a DC-DC chip generates different voltages, such as +3.3 V, +1.8 V and +1.35 V, to power each device connected to the central processing unit 204.
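When sizing the input supply for these rails, the input current can be estimated from the total output power and the converter efficiency: I_in = Σ(V_k · I_k) / (η · V_in). The sketch below applies this to the +3.3 V, +1.8 V and +1.35 V rails named above; the per-rail load currents and the 85% efficiency are hypothetical values chosen for illustration, and the result is compared against the 5 V/3 A supply mentioned for Fig. 6.

```python
from typing import Iterable, Tuple


def input_current(rails: Iterable[Tuple[float, float]], v_in: float = 5.0,
                  efficiency: float = 0.85) -> float:
    """Estimate the current drawn from the input supply, given
    (voltage, load current) pairs for each DC-DC output rail and an
    assumed overall conversion efficiency."""
    p_out = sum(v * i for v, i in rails)  # total output power in watts
    return p_out / (efficiency * v_in)


# Hypothetical loads: (rail voltage in V, load current in A).
rails = [(3.3, 0.5), (1.8, 0.3), (1.35, 0.4)]
i_in = input_current(rails)
print(round(i_in, 3))  # 0.642
```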

In a specific implementation, referring to Fig. 6, which shows an architecture diagram of a speech recognition system 300 provided by an embodiment of the present invention, the speech recognition system 300 comprises a voice acquisition device 301, a server 302, a display device 303, a central processing unit 304, an image acquisition device 305, a playing device 306 and an LED array 307.

The central processing unit 304 can communicate with a communication module 308.

The central processing unit 304 further includes an interaction module 309 for exchanging information with the communication module 308. The interaction module 309 may be a Secure Digital Input and Output (SDIO) interface or a Universal Asynchronous Receiver/Transmitter (UART) connected to the communication module 308, which may be a WiFi chip or a Bluetooth chip. The central processing unit 304 exchanges information with the communication module 308 through SDIO or UART, and exchanges information with the server 302 or other devices through the communication module 308.

The central processing unit 304 further includes at least one memory module for storing data such as voice data, video data and image data. As shown in Fig. 6, memory module 310a can exchange information with memory module 310b outside the central processing unit 304. Memory modules 310a and 310b may be embedded MultiMediaCard (eMMC) storage or other forms of storage chip. The central processing unit 304 can also be connected to memory 312 through a double data rate (DDR) controller 311 and store data in memory 312.

The central processing unit 304 further includes a USB interface 313a connected to an external USB interface 313b, where USB interface 313b may support the OTG (On-The-Go) protocol.

The central processing unit 304 may also include an analog-to-digital converter (ADC) 314 connected to a key module 320. When the key module 320 is operated, a control instruction is sent to the ADC 314 as an analog signal; the ADC 314 converts this analog input into a digital signal and then sends it to other modules in the central processing unit 304. It should be noted that the key module 320 may contain 4 keys or 6 keys; the number and functions of the keys are not limited thereto and may be selected according to actual needs.
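Reading several keys through a single ADC input is often done with a resistor ladder, where each key pulls the input to a distinct voltage. The description does not state the key circuit, so the sketch below assumes that arrangement, a 12-bit ADC, four hypothetical nominal codes (0, 1024, 2048, 3072) and an idle reading near full scale when no key is pressed.

```python
from typing import List, Optional


def decode_key(adc_code: int, key_codes: List[int], tolerance: int = 100) -> Optional[int]:
    """Return the index of the key whose nominal ADC code is closest to
    `adc_code`, or None if no key is within `tolerance` (no key pressed)."""
    best = min(range(len(key_codes)), key=lambda k: abs(adc_code - key_codes[k]))
    return best if abs(adc_code - key_codes[best]) <= tolerance else None


codes = [0, 1024, 2048, 3072]  # hypothetical nominal readings for 4 keys
print(decode_key(1000, codes))  # 1
print(decode_key(4095, codes))  # None (idle level, no key pressed)
```

The tolerance band absorbs resistor and reference-voltage variation; in practice the raw readings would also be debounced over a few samples.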

The central processing unit 304 may also be connected to the display device 303, which may be a liquid crystal display (LCD), an LCD with touch function, or an organic light-emitting diode (OLED) display. The display device 303 may be an LCD with a Mobile Industry Processor Interface (MIPI). The central processing unit 304 may be connected through interface 315a to an LCD with a MIPI interface or to a touch display with an I2C interface; interface 315a may be a MIPI interface, a General Purpose Input Output (GPIO) interface or an I2C interface. The central processing unit 304 may also include interface 315b, which may be a GPIO or I2C interface, for sending the speech recognition result to the LCD, which displays the speech recognition result.

The central processing unit 304 may also include interface 315c for connecting to the LED array 307. Interface 315c may be an I2C interface through which instructions are sent to the LED array 307; the LED driver in the LED array 307 drives the LEDs to indicate the source direction of the voice data acquired by the microphones.

The central processing unit 304 may also include interface 315d, through which it is connected to the image acquisition device 305. The image acquisition device 305 may be a camera with a MIPI interface, and interface 315d may be a MIPI interface. The central processing unit 304 can receive video data captured by the MIPI camera.

The central processing unit 304 may also be connected to the voice acquisition device 301 through interface 315e, which may be a PDM interface or an I2S interface. The voice acquisition device 301 may be a circuit board integrated with a microphone array; the circuit board may include six microphones fabricated using Micro-Electro-Mechanical System (MEMS) technology, which form the microphone array through which voice data is acquired.
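If the six microphones are arranged as a uniform circle (an assumption; the description does not give the array geometry), a far-field sound from direction θ reaches microphone i, at angular position φ_i = 2πi/6 and radius r, with a delay of -r·cos(θ - φ_i)/c relative to the array centre. These per-microphone delays are the basis of both direction finding and delay-and-sum beamforming. The 35 mm radius below is a hypothetical value.

```python
import math
from typing import List


def circular_array_delays(num_mics: int, radius_m: float, source_deg: float,
                          c: float = 343.0) -> List[float]:
    """Far-field arrival delay (seconds) at each microphone of a uniform
    circular array, relative to the array centre. Negative values mean
    the wavefront reaches that microphone before the centre."""
    theta = math.radians(source_deg)
    delays = []
    for i in range(num_mics):
        phi = 2 * math.pi * i / num_mics  # angular position of microphone i
        delays.append(-radius_m * math.cos(theta - phi) / c)
    return delays


d = circular_array_delays(6, 0.035, 0.0)
print(d.index(min(d)))  # 0: the microphone facing the source hears it first
```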

The central processing unit 304 may also include a power module 316 for managing the power supply of the central processing unit 304. The power module 316 is connected to a voltage regulator module 317, which may be a buck regulator or a low dropout regulator (LDO). The voltage regulator module 317 is connected to a power supply, which powers the central processing unit 304 through the voltage regulator module 317 and the power module 316. The power supply may be a 5 V/3 A supply.

The speech recognition system 300 may also include a conversion module 318, which may include an ADC and a digital-to-analog converter (DAC). The central processing unit 304 may also include interface 315f and interface 315g, where interface 315f may be an I2S interface and interface 315g may be a PDM interface. The central processing unit 304 can send the speech recognition result to the conversion module 318 through the I2S or PDM interface; the DAC in the conversion module 318 converts the speech recognition result from a digital signal into an analog signal, which is sent through an audio power amplifier 319 to the playing device 306. The playing device 306 amplifies and plays the speech recognition result, and the ADC in the conversion module 318 converts the played speech recognition result back into a digital signal, which is captured back into the central processing unit 304 through the I2S or PDM interface. It should be noted that the playing device 306 may be a loudspeaker, or a loudspeaker with Bluetooth or wireless connectivity.
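The playback-and-loopback path through the conversion module is a digital-to-analog-to-digital round trip. The sketch below models ideal (noise-free, perfectly linear) converters to show that, in the ideal case, the captured codes match the codes that were played; the 16-bit resolution and 3.3 V reference are assumed values, and real converters add noise and the amplifier/loudspeaker path changes the signal.

```python
from typing import List


def dac(codes: List[int], bits: int = 16, v_ref: float = 3.3) -> List[float]:
    """Ideal DAC: map signed codes to voltages centred on v_ref / 2."""
    full_scale = 2 ** (bits - 1)
    return [v_ref / 2 * (1 + c / full_scale) for c in codes]


def adc(volts: List[float], bits: int = 16, v_ref: float = 3.3) -> List[int]:
    """Ideal ADC: map voltages back to signed codes, clipping to range."""
    full_scale = 2 ** (bits - 1)
    out = []
    for v in volts:
        c = round((2 * v / v_ref - 1) * full_scale)
        out.append(max(-full_scale, min(full_scale - 1, c)))
    return out


played = [-32768, -1, 0, 12345, 32767]
captured = adc(dac(played))
print(captured == played)  # True
```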

It should be noted that the black arrows in Fig. 6 indicate the direction of information exchange in the speech recognition system 300: a double arrow indicates that information can be transferred bidirectionally, and a single arrow indicates that information can be transferred unidirectionally.

It should be noted that the types of the aforementioned interfaces, such as interfaces 315a, 315b, 315c, 315d, 315e, 315f, and 315g, can be selected according to actual needs and are not limited to the examples listed.

It should be noted that the central processing unit 304 in Fig. 6 can be powered by a Universal Serial Bus (USB) supply or a 5 V supply, and DC-DC chips generate different voltages to power each device; for example, DC-DC chips generate voltages such as +3.3 V, +1.8 V, and +1.35 V to power the devices connected to the central processing unit 304.
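A quick power budget shows whether the +3.3 V, +1.8 V and +1.35 V rails fit within the 5 V/3 A supply mentioned earlier. The per-rail currents and the single 85 % conversion efficiency below are hypothetical; the patent gives only the rail voltages. The input current is the sum of V × I over the rails, divided by the efficiency and by the input voltage.

```python
from typing import Iterable, Tuple

# Hypothetical per-rail loads (name, volts, amps); the patent gives only the voltages.
RAILS = [
    ("+3.3V", 3.3, 0.8),
    ("+1.8V", 1.8, 0.5),
    ("+1.35V", 1.35, 1.2),
]


def supply_current_a(rails: Iterable[Tuple[str, float, float]],
                     v_in: float = 5.0, efficiency: float = 0.85) -> float:
    """Current drawn from the input supply by DC-DC converters feeding `rails`.

    Assumes one common conversion efficiency for every rail.
    """
    p_out = sum(volts * amps for _, volts, amps in rails)
    return p_out / efficiency / v_in


if __name__ == "__main__":
    i_in = supply_current_a(RAILS)
    print(f"input current: {i_in:.2f} A (supply rating 3 A)")
```

With these assumed loads the rails draw about 5.16 W, or roughly 1.2 A from the 5 V input, leaving headroom under the 3 A rating.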

It should be noted that the architecture diagram of the speech recognition system in Fig. 6 is only an example; a specific speech recognition system architecture includes, but is not limited to, the content shown in Fig. 6, and the variants are not enumerated one by one here.

To implement the above speech recognition system 300, a processing chip with a suitable architecture, such as an Intel or AMD chip, can be chosen as the central processing unit 304, or a chip structure can be redesigned according to actual needs.

In conclusion the embodiment of the present invention provides a kind of audio recognition method and system, this method is suitable for speech recognition System, speech recognition system include voice acquisition device, server, display device and central processing unit.This method are as follows: voice is adopted Collected speech simulation data are converted to toned digital data by acquisition means, and are sent to central processing unit.Central processing unit Toned digital data is sent to server to handle, obtains speech recognition result.Server feeds back speech recognition result To central processing unit.Central processing unit is by speech recognition result to display device.Display device is set to show speech recognition result.? In the present solution, acquiring voice data concurrency by voice acquisition device gives central processing unit, central processing unit is by voice data It is sent to server to be handled to obtain speech recognition result, speech recognition result is sent to display device and shown by central processing unit Show speech recognition result, in the case where noise is noisy, shows that speech recognition result can be improved user and make using display device With experiencing and improve property easy to use.

All embodiments in this specification are described progressively; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

Those skilled in the art will further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.

The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A speech recognition system, characterized in that the system comprises: a voice acquisition device, a server, a display device, and a central processing unit;

the voice acquisition device is connected to the central processing unit and is configured to convert collected analog speech data into digital speech data and transmit the digital speech data to the central processing unit;

the central processing unit is connected to the server and to the display device, and is configured to transmit the digital speech data to the server, receive the speech recognition result fed back by the server, and send the speech recognition result to the display device;

the server is configured to receive the digital speech data transmitted by the central processing unit, process the digital speech data to obtain the speech recognition result, and feed the speech recognition result back to the central processing unit;

2. The system according to claim 1, characterized in that the system further comprises: an image acquisition device;

the image acquisition device is connected to the central processing unit and is configured to acquire image data and transmit the image data to the central processing unit;

correspondingly, the central processing unit is further configured to transmit the image data to the server, receive the image processing result fed back by the server, and send the image processing result to the display device;

the server is further configured to process the image data and feed the obtained image processing result back to the central processing unit;

the display device is further configured to display the image processing result.

3. The system according to claim 2, characterized in that the image acquisition device comprises a camera.

4. The system according to claim 1, characterized in that the system further comprises: a playing device;

the central processing unit is connected to the playing device, and the central processing unit is further configured to transmit the speech recognition result to the playing device;

5. The system according to claim 4, characterized in that the playing device comprises an audio power amplifier.

6. The system according to any one of claims 1-5, characterized in that the system further comprises: an LED array;

the central processing unit is connected to the LED array, and the central processing unit is further configured to identify the direction from which the voice acquisition device acquires the analog speech data;

7. The system according to any one of claims 1-5, characterized in that the voice acquisition device comprises a microphone.

8. The system according to any one of claims 1-5, characterized in that the display device comprises a liquid crystal display.

9. A speech recognition method, characterized in that the method is applicable to a speech recognition system comprising a voice acquisition device, a server, a display device, and a central processing unit, the method comprising:

converting, by the voice acquisition device, collected analog speech data into digital speech data, and sending the digital speech data to the central processing unit;

sending, by the central processing unit, the digital speech data to the server for processing to obtain a speech recognition result;

sending, by the central processing unit, the speech recognition result to the display device, so that the display device displays the speech recognition result.

10. The method according to claim 9, characterized by further comprising:

sending, by the central processing unit, the speech recognition result to a playing device, so that the playing device plays the speech recognition result.