CN108597513A - Far-field voice recognition system and method for a television based on 5.8GHz wireless transmission - Google Patents

Publication number
CN108597513A
Authority
CN
China
Prior art keywords
voice
television
module
data
signal
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810425853.8A
Other languages
Chinese (zh)
Inventor
鲁勇
赵新科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Skyworth RGB Electronics Co Ltd
Original Assignee
Shenzhen Skyworth RGB Electronics Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/42221 Transmission circuitry, e.g. infrared [IR] or radio frequency [RF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • H04N21/42206 User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
    • H04N21/42222 Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention discloses a far-field voice recognition system and method for a television based on 5.8GHz wireless transmission. In the system, a microphone array module collects mixed-field voice data; a DSP processing module filters out the sound data emitted by the television through a preset algorithm to obtain the voice instruction; 5.8GHz radio frequency modules modulate and demodulate the voice instruction and output the demodulated voice instruction to the television-end SOC system; and the television-end SOC system compares the voice instruction against a backend voice database to generate an operation instruction, and then controls the television APP to complete the corresponding voice operation. The invention transmits audio data using a wireless technology with a 5.8GHz-band carrier. Compared with the prior-art wireless technology using a 2.4GHz-band carrier, it offers a wider transmission bandwidth, stronger anti-interference capability and no additional transmission delay, so that human-computer interaction is improved and the needs of far-field television voice recognition and data transmission are better met.

Description

Far-field voice recognition system and method based on 5.8GHz wireless transmission television
Technical Field
The invention relates to the technical field of televisions, in particular to a far-field voice recognition system and method for a television based on 5.8GHz wireless transmission.
Background
With the development of smart television technology, man-machine interaction modes are increasingly diversified. Voice recognition technology is now widely applied in televisions, and it allows a user to operate the television more conveniently and quickly.
At present, most smart televisions have a voice recognition function, but the user must be near the television, or even interact with it at close range, for an instruction to be recognized correctly, and the recognition rate falls as the distance increases. Wireless transmission is an ideal solution to this problem, because it removes the constraint of a data cable and therefore the limit on the user's voice interaction distance. Many wireless transmission modes exist, such as ZigBee, Bluetooth and WiFi, but each has limitations. The most important bottleneck is that the 2.4GHz band used by these devices is increasingly crowded and subject to increasing interference. A smart television in particular integrates several 2.4GHz modules of its own, such as WiFi and Bluetooth, which easily interfere with the voice signal and degrade the voice recognition result. More importantly, most 2.4GHz wireless transmission technologies adopt frequency hopping to mitigate mutual interference, and the frequency hopping protocol introduces additional transmission delay for wireless data. This makes man-machine interaction less than ideal, which is a fatal defect for voice recognition. Because of these problems, current far-field voice recognition systems for smart televisions are limited and cannot meet the requirements of far-field voice recognition for televisions.
Disclosure of Invention
The invention mainly aims to provide a far-field voice recognition system for a television based on 5.8GHz wireless transmission, aiming at reducing extra transmission time delay of wireless data so as to enable human-computer interaction to be more ideal and further meet the requirement of far-field voice recognition of the current television.
In order to achieve the above object, the present invention provides a far-field speech recognition system for a television based on 5.8GHz wireless transmission, which includes a mobile speech end and a television end, wherein the mobile speech end includes a microphone array module, an ADC module, a DSP processing module, and a first 5.8GHz radio frequency module; the television end comprises a second 5.8GHz radio frequency module and a television end SOC system; wherein,
the microphone array module is used for collecting mixed field voice data, and the mixed field voice data comprises voice instructions sent by a user and sound data sent by a television end; the ADC module is used for performing analog-to-digital conversion on the mixed field voice data; the DSP processing module is used for acquiring mixed field voice data subjected to analog-to-digital conversion by the ADC module and voice data sent by the television terminal through the first 5.8GHz radio frequency module, filtering the voice data of the television terminal through a preset algorithm to obtain a voice instruction sent by the user, and outputting the voice instruction to the first 5.8GHz radio frequency module; the first 5.8GHz radio frequency module is used for acquiring sound data sent by the television terminal and modulating the voice instruction output by the DSP processing module into a voice instruction with a frequency band of 5.8 GHz; the second 5.8GHz radio frequency module is used for receiving the voice command with the frequency band of 5.8GHz, demodulating the voice command with the frequency band of 5.8GHz and sending the demodulated voice command to the television SOC system; and the television terminal SOC system is used for comparing the received voice instruction with the cloud voice database to generate an operation instruction, and then controlling the APP of the television terminal to complete the corresponding voice instruction operation.
Preferably, a first signal end of the ADC module is connected to a signal output end of the microphone array module, and a second signal end of the ADC module is connected to a first signal end of the DSP processing module.
Preferably, the DSP processing module includes a voice processing chip, and the voice processing chip includes a plurality of logic processing cores for processing different channel audio data, an SRAM unit for data caching, and a FLASH unit for voice data caching.
Preferably, the first 5.8GHz radio frequency module includes a first communication unit for performing data communication with the DSP processing module and a first modem unit for modulating and demodulating voice commands issued by the user or voice data issued by the television terminal, where a first signal terminal of the first communication unit is connected to a second signal terminal of the DSP processing module, and a second signal terminal of the first communication unit is connected to a signal terminal of the first modem unit.
Preferably, the second 5.8GHz radio frequency module includes a second communication unit for performing data communication with the television terminal SOC system and a second modem unit for modulating and demodulating voice commands sent by the user or voice data sent by the television terminal, the first signal terminal of the second communication unit is connected to the signal terminal of the second modem unit, and the second signal terminal of the second communication unit is connected to the signal terminal of the television terminal SOC system.
In order to achieve the above object, the present invention further provides a far-field speech recognition method based on 5.8GHz wireless transmission television, the method comprising:
Step S100, collecting mixed field voice data and carrying out analog-to-digital conversion on the mixed field voice data, wherein the mixed field voice data comprises voice instructions sent by a user and sound data sent by a television end;
Step S200, acquiring mixed field voice data after analog-to-digital conversion and voice data sent by the television end, and filtering the voice data sent by the television end according to a preset algorithm to obtain a voice instruction sent by a user;
Step S300, modulating the voice command into a voice command with a frequency band of 5.8GHz and outputting the voice command;
Step S400, receiving the voice command with the frequency band of 5.8GHz and demodulating the received voice command with the frequency band of 5.8GHz;
Step S500, comparing the received voice instruction with the cloud voice database to generate an operation instruction, and further controlling the APP of the television end to execute corresponding voice instruction operation.
Preferably, step S200 includes: step S210, acquiring the mixed field voice data after the analog-to-digital conversion and the sound data sent by the television end, and caching the sound data sent by the television end; step S220, obtaining a voice instruction sent by the user according to voice data sent by the television terminal through a preset algorithm; and step S230, performing signal pre-emphasis and windowing pre-processing on the voice command and outputting the voice command.
Preferably, step S300 includes: step S310, receiving the voice command after the pre-emphasis and the windowing pretreatment, and carrying out frequency conversion on the received voice command after the pre-emphasis and the windowing pretreatment into an intermediate frequency signal; step S320, low-pass filtering the intermediate frequency signal and frequency converting the intermediate frequency signal into a high-frequency carrier signal with a frequency band of 5.8 GHz; and S330, filtering and amplifying the high-frequency carrier signal with the frequency band of 5.8GHz, and transmitting the signal through an antenna.
Preferably, step S400 includes: step S410, receiving the high-frequency carrier signal with the frequency band of 5.8GHz and carrying out low-noise amplification processing on the received high-frequency carrier signal with the frequency band of 5.8 GHz; step S420, mixing the high-frequency carrier signals after the low-noise amplification treatment to obtain intermediate-frequency signals; and step S430, filtering and frequency converting the intermediate frequency signal to obtain and output a voice instruction sent by the user.
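The mixing in step S420 can be illustrated numerically. The following is a minimal sketch, not the circuit of the invention: it assumes ideal cosine signals and uses scaled-down, hypothetical frequencies (5800 Hz and 5700 Hz stand in for the carrier and the local oscillator) so that the arithmetic is easy to follow. It shows that multiplying the received carrier by a local oscillator yields a difference-frequency term, which serves as the intermediate frequency, and a sum-frequency term, which the subsequent filtering would remove.

```python
import math


def mix(signal_hz, lo_hz, t):
    """Ideal mixer: product of the received carrier and the local oscillator."""
    return math.cos(2 * math.pi * signal_hz * t) * math.cos(2 * math.pi * lo_hz * t)


def sum_and_difference(signal_hz, lo_hz, t):
    """The same product written as difference-frequency plus sum-frequency terms."""
    difference = math.cos(2 * math.pi * (signal_hz - lo_hz) * t)  # intermediate frequency
    total = math.cos(2 * math.pi * (signal_hz + lo_hz) * t)       # rejected by filtering
    return 0.5 * (difference + total)


if __name__ == "__main__":
    # Hypothetical scaled-down values: 5800 Hz carrier, 5700 Hz local oscillator.
    for k in range(5):
        t = k * 1e-4
        print(round(mix(5800.0, 5700.0, t), 6), round(sum_and_difference(5800.0, 5700.0, t), 6))
```

The two columns agree at every sample, which is the product-to-sum identity that mixing relies on.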
Preferably, step S500 includes: step S510, receiving a voice command sent by the user, and performing voice analysis and measure analysis on the received voice command sent by the user; step S520, carrying out data search and comparison on the voice command subjected to voice analysis and measure analysis and the cloud voice database, and further making recognition judgment; and step S530, controlling the APP of the television end to execute corresponding voice instruction operation.
In the technical scheme of the invention, the far-field voice recognition system for a television based on 5.8GHz wireless transmission is composed of a microphone array module, an ADC (analog-to-digital converter) module, a DSP (digital signal processor) processing module, a first 5.8GHz radio frequency module, a second 5.8GHz radio frequency module and a television-end SOC (system on chip) system. When the system works, the microphone array module collects multidirectional mixed-field voice data, and the ADC module performs analog-to-digital conversion on the mixed-field voice data and sends the digital voice signal to the DSP processing module. The DSP processing module receives the digital voice data and, through the first 5.8GHz radio frequency module, the sound data sent by the television end, and stores the sound data sent by the television end locally. Through a preset DSP algorithm it distinguishes the television speaker sound, the command voice issued by the user and the environmental noise, extracts their features and eliminates the unwanted components, thereby filtering out the sound data sent by the television end. The first 5.8GHz radio frequency module and the second 5.8GHz radio frequency module modulate and demodulate the multi-channel voice instructions or the sound data sent by the television end, and at the same time carry out data communication with the DSP processing module and the television-end SOC system. The second 5.8GHz radio frequency module sends the processed voice instruction to the television end. After the television-end SOC system receives the voice instruction issued by the user, it compares the voice instruction with the cloud voice database, recognizes the voice instruction, and then controls the APP of the television end to complete the operation corresponding to the voice instruction.
The voice data is transmitted by adopting a wireless technology of a carrier wave with a 5.8GHz frequency band, the wireless technology has wider transmission bandwidth compared with the wireless technology of a carrier wave with a 2.4GHz frequency band, and a 5.8GHz system adopts a direct sequence spread spectrum technology, so that more channels, higher frequency, stronger anti-jamming capability, no extra transmission time delay and more ideal man-machine interaction are realized, and the requirements of television remote voice recognition and data transmission can be better met.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 is a functional block diagram of an embodiment of a far-field speech recognition system for a 5.8GHz wireless transmission television according to the invention;
FIG. 2 is a schematic diagram of a data structure of a 5.8GHz audio data transmission protocol stack in a far-field speech recognition system based on a 5.8GHz wireless transmission television of the present invention;
FIG. 3 is a schematic structural diagram of the first 5.8GHz rf module in a far-field speech recognition system of a television based on 5.8GHz wireless transmission;
FIG. 4 is a schematic structural diagram of the first modem unit in the far-field speech recognition system for tv based on 5.8GHz wireless transmission according to the present invention;
FIG. 5 is a schematic flow chart illustrating a far-field speech recognition method for a television based on 5.8GHz wireless transmission according to a first embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating a far-field speech recognition method for a television based on 5.8GHz wireless transmission according to a second embodiment of the present invention;
FIG. 7 is a schematic flow chart illustrating a far-field speech recognition method for a television based on 5.8GHz wireless transmission according to a third embodiment of the present invention;
FIG. 8 is a schematic flow chart illustrating a fourth embodiment of a far-field speech recognition method for a television based on 5.8GHz wireless transmission according to the present invention;
FIG. 9 is a schematic flow chart of a fifth embodiment of the far-field speech recognition method for a television based on 5.8GHz wireless transmission according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the descriptions relating to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
The invention provides a far-field voice recognition system of a television based on 5.8GHz wireless transmission, and aims to solve the problem that the current 2.4GHz carrier communication cannot meet the requirement of the current far-field voice recognition of the television.
As shown in fig. 1, fig. 1 is a schematic diagram of functional modules of an embodiment of a far-field speech recognition system of a television based on 5.8GHz wireless transmission according to the present invention, where the far-field speech recognition system of the television based on 5.8GHz wireless transmission according to the present invention includes a mobile speech end 100 and a television end 200, and the mobile speech end 100 includes a microphone array module 110, an ADC module 120, a DSP processing module 130, and a first 5.8GHz radio frequency module 140; the tv end 200 includes a second 5.8GHz rf module 210 and a tv end SOC system 220.
It should be noted that the invention is applicable to all televisions having voice recognition technology, without limitation. The system further includes a first power module 150 and a second power module 230, each composed of a voltage transformation unit, a rectification unit, a filtering unit and a DC/DC chip unit. The first power module 150 provides a DC voltage for each module in the mobile voice end 100, and the second power module 230 provides a DC voltage for each module in the television end 200.
Specifically, the microphone array module 110 is configured to collect mixed-field voice data, where the mixed-field voice data includes a voice instruction sent by a user and sound data sent by a television.
The microphone array module 110 is an array formed by arranging a group of omnidirectional microphones at different spatial positions according to a certain shape rule, and is a device for spatially sampling a spatially propagated sound signal, and the acquired signal includes information such as a spatial position thereof. The array may be divided into a near-field model and a far-field model according to the distance between the sound source and the microphone array module 110. According to the topology structure of the microphone array module 110, the microphone array module can be divided into a linear array, a planar array, a volume array, and the like.
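The near-field and far-field division can be made concrete with a commonly used rule of thumb, which the original text does not state and which is given here only as an assumption: a source is treated as far-field when its distance exceeds 2L²/λ, where L is the array aperture and λ is the wavelength of the sound. The function names and the 343 m/s speed of sound below are illustrative choices.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air near room temperature


def far_field_distance(aperture_m, frequency_hz):
    """Distance beyond which the rule of thumb 2 * L^2 / wavelength treats a source as far-field."""
    wavelength_m = SPEED_OF_SOUND_M_S / frequency_hz
    return 2.0 * aperture_m ** 2 / wavelength_m


def is_far_field(source_distance_m, aperture_m, frequency_hz):
    """True when the source lies beyond the far-field boundary for this array and frequency."""
    return source_distance_m > far_field_distance(aperture_m, frequency_hz)


if __name__ == "__main__":
    # Hypothetical 10 cm array and a 3430 Hz tone (wavelength of 10 cm).
    print(far_field_distance(0.10, 3430.0))
    print(is_far_field(3.0, 0.10, 3430.0))
```

Under this rule, a user speaking from several metres away is in the far field of a small array, so the arriving wavefront can be approximated as planar.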
It can be understood that, in reality, the voice signal is an analog signal. Because analog voice signals have poor interference immunity in transmission and attenuate easily, which hinders long-distance transmission, this embodiment uses a digital microphone array module 110 to acquire the original voice signal. When a user issues a voice instruction, the television may be playing, so the mixed-field voice data acquired by the microphone array module 110 includes both the sound data emitted by the television's own speaker and the voice instruction issued by the user.
And the ADC module 120 is configured to perform analog-to-digital conversion on the mixed-field voice data.
It should be noted that the voice instruction issued by the user is an analog signal. Analog voice signals have poor interference immunity in transmission and attenuate easily, which hinders long-distance transmission, and the DSP processing module 130 processes digital signals internally. The ADC module 120 is therefore required to convert the analog voice data and output a digital voice signal to the DSP processing module 130. In this embodiment, the first signal end of the ADC module 120 is connected to the signal output end of the microphone array module 110, and the second signal end of the ADC module 120 is connected to the first signal end of the DSP processing module 130. The ADC module 120 supports multi-channel data conversion and I2S bus interface communication. The I2S bus is an interface for transferring digital audio signals, such as stereo audio, between devices in a system, and it satisfies the requirements of communication with the DSP processing module 130 and the microphone array module 110.
The DSP processing module 130 is configured to obtain mixed-field voice data after analog-to-digital conversion by the ADC module 120, obtain sound data sent by the television terminal through the first 5.8GHz radio frequency module 140, filter the sound data of the television terminal through a preset algorithm to obtain a voice instruction sent by a user, and output the voice instruction to the first 5.8GHz radio frequency module 140.
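The preset algorithm is not specified in this document. One common way to remove a known reference signal (here, the television's own sound) from a microphone signal is an adaptive echo canceller. The following is a minimal sketch of a normalized least-mean-squares (NLMS) filter, offered as an assumed stand-in rather than the algorithm of the invention; the function name, tap count and step size are hypothetical.

```python
def nlms_cancel(mixed, reference, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mixed`.

    mixed:     microphone samples (user speech plus television sound)
    reference: television sound samples received over the radio link
    Returns the residual, which approximates the user's speech.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for d, x in zip(mixed, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # what remains after removing the estimated television sound
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual


if __name__ == "__main__":
    # Toy check: the microphone hears only a delayed, scaled copy of the reference.
    ref = [((n * 7919) % 101) / 101.0 - 0.5 for n in range(500)]
    mic = [0.6 * ref[n] + (0.3 * ref[n - 1] if n else 0.0) for n in range(500)]
    out = nlms_cancel(mic, ref)
    print(max(abs(e) for e in out[-50:]))
```

An adaptive filter is used in this sketch because the acoustic path from the television speaker to the microphones is unknown and changes with the room, so fixed subtraction of the reference would not remove it.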
It can be understood that, when a user issues a voice instruction, the television may be in a normal playing state. The DSP processing module 130 obtains the digital mixed-field voice data through the ADC module 120 and, at the same time, receives the sound data sent by the television end 200 through the first 5.8GHz radio frequency module 140. The DSP processing module 130 stores the sound data sent by the television end in its integrated Flash, and then filters that sound data out of the mixed-field voice data through a DSP algorithm, so as to obtain the voice instruction issued by the user alone. The voice instruction is then preprocessed by signal pre-emphasis, windowing and the like. Pre-emphasis means that a pre-emphasis network is applied before the voice instruction is passed on, boosting the high-frequency components of the input signal of the first 5.8GHz radio frequency module 140 and thereby effectively improving the output signal-to-noise ratio. Windowing is needed because the 5.8GHz radio frequency module can only process signals of finite length: the voice signal x(t) is truncated to the sampling time T, and the resulting finite-length signal x_T(t) is processed further. In this way the voice instruction is transmitted to the television end 200 completely and in a timely manner.
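The pre-emphasis and windowing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: it uses the common first-order pre-emphasis filter y[n] = x[n] - a·x[n-1] with a = 0.97, a Hamming window, and a frame length and hop of 400 and 160 samples, none of which are given in the original text.

```python
import math


def pre_emphasize(samples, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1], boosting high frequencies."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def hamming(length):
    """Hamming window coefficients for a frame of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_and_window(samples, frame_len=400, hop=160):
    """Cut the signal into finite-length frames and taper each one with the window."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(frame_len)])
    return frames


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
    frames = frame_and_window(pre_emphasize(tone))
    print(len(frames), len(frames[0]))
```

Tapering each frame with a window reduces the spectral leakage that abrupt truncation would otherwise introduce.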
The first 5.8GHz rf module 140 is configured to obtain sound data sent by the television terminal and modulate the voice instruction output by the DSP processing module 130 into a voice instruction with a frequency band of 5.8GHz, and the second 5.8GHz rf module 210 is configured to receive the voice instruction with the frequency band of 5.8GHz, demodulate the voice instruction with the frequency band of 5.8GHz, and send the demodulated voice instruction to the television terminal SOC system 220.
It should be noted that 5.8GHz and 2.4GHz are two different frequency bands. The 2.4GHz carrier band accommodates approximately 11 channels, and most current television devices use it, so the 2.4GHz band is overcrowded. Few devices use the 5.8GHz carrier band; compared with the 2.4GHz carrier band it accommodates several times as many channels, operates at a higher frequency and has stronger anti-interference capability. Meanwhile, 5.8GHz wireless transmission adopts OFDM (orthogonal frequency division multiplexing) with point-to-point and point-to-multipoint networking, and a single sector reaches up to 54Mbps. It therefore has a wider transmission bandwidth than the 2.4GHz carrier and can better meet the requirements of far-field television voice data transmission.
When the system works, the first 5.8GHz radio frequency module 140 obtains the sound data output by the television end 200 and sends it to the DSP processing module 130 for data processing. Meanwhile, the first 5.8GHz radio frequency module 140 receives the multi-channel voice instruction output by the DSP processing module 130, modulates it into a voice instruction with a frequency band of 5.8GHz, and sends the modulated voice instruction to the second 5.8GHz radio frequency module 210 through the antenna. The second 5.8GHz radio frequency module 210 receives the voice instruction with the frequency band of 5.8GHz and demodulates the 5.8GHz voice carrier signal to obtain the voice instruction issued by the user, which is sent to the television-end SOC system 220 through the data bus for processing. In addition, the 5.8GHz radio frequency modules must at the same time complete data communication with the DSP processing module 130 and the television-end SOC system 220. A dedicated protocol stack is adopted in the first and second 5.8GHz radio frequency modules; it includes functions such as multiple error correction and packet-loss retransmission, and thereby realizes point-to-point, multi-channel, low-delay and strongly interference-resistant transmission of voice data.
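The packet-loss retransmission function can be pictured with a bounded retry loop. This is a minimal sketch under assumptions: the document does not describe the acknowledgement mechanism or the retry limit, so the `transmit` callback (which returns True when a frame is acknowledged) and the default of three retransmissions are hypothetical.

```python
def send_with_retransmission(transmit, frame, max_retries=3):
    """Send `frame`, retransmitting until acknowledged or the retry budget is spent.

    transmit(frame, retransmission_count) is a hypothetical callback that returns
    True when the receiver acknowledged the frame.
    Returns the number of retransmissions that were needed.
    """
    for retransmission_count in range(max_retries + 1):
        if transmit(frame, retransmission_count):
            return retransmission_count
    raise TimeoutError("no acknowledgement after %d retransmissions" % max_retries)


if __name__ == "__main__":
    # Simulated link that loses the first two transmissions.
    outcomes = iter([False, False, True])
    used = send_with_retransmission(lambda frame, count: next(outcomes), b"voice")
    print(used)
```

Bounding the number of retransmissions keeps the worst-case delay predictable, which matters for interactive voice control.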
As shown in fig. 2, fig. 2 is a schematic diagram of a data structure of a 5.8GHz audio data transmission protocol stack in a far-field voice recognition system of a 5.8GHz wireless transmission television, and in order to ensure reliability and stability of voice data wireless transmission, a voice data packet of a 5.8GHz radio frequency module should conform to a uniform protocol standard, that is, a data frame structure definition. The specific protocol frame of the invention mainly comprises: synchronous frame, frame length, address, mixed data frame, check frame and end character.
The address frame mainly comprises a source address and a channel ID: the source address is the self-set address of the transceiver module, and the channel ID is the node number of the different channels of the transceiver module. The mixed data frame mainly comprises a state frame and a data load. The state frame comprises a ready state, which marks whether the transceiver module is ready; a transmission state, which marks the current data transmission state of the transceiver module; and a retransmission count, which marks the number of times the data has been retransmitted after a wireless transceiving failure. The data load comprises a data type and voice data, where the data type marks the kind of voice data carried: sound data sent by the television terminal, mixed-field voice data, or a processed voice instruction. The 5.8GHz audio data transmission adopts a dedicated point-to-point and point-to-multipoint transmission protocol. Compared with other wireless transmission modes based on a TCP/IP protocol stack, it avoids the transmission delay caused by the complex routing and addressing process of TCP/IP, greatly reduces the transmission delay of voice data, and improves the response speed of the system.
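By way of non-limiting illustration only, the frame layout described above might be packed and parsed as in the following Python sketch. The description names the fields but not their sizes or encodings, so the field widths, the sync and end byte values, the data-type codes and the 8-bit sum used as the check frame are all assumptions made for this example.

```python
import struct

# All field widths and constant values below are assumptions for illustration;
# the description names the fields but does not fix their sizes or values.
SYNC = b"\xAA\x55"   # assumed 2-byte synchronous frame
END = b"\x0D"        # assumed 1-byte end character

# Assumed codes for the three data types named in the description.
TYPE_TV_SOUND, TYPE_MIXED_FIELD, TYPE_VOICE_INSTRUCTION = 0, 1, 2


def checksum(data: bytes) -> int:
    """Assumed check frame: 8-bit sum of the frame-length and body bytes."""
    return sum(data) & 0xFF


def build_frame(source_addr, channel_id, ready, tx_state, retries,
                data_type, voice_data: bytes) -> bytes:
    # Address frame: source address (2 bytes) + channel ID (1 byte).
    address = struct.pack(">HB", source_addr, channel_id)
    # State frame: ready state, transmission state, retransmission count.
    state = struct.pack(">BBB", ready, tx_state, retries)
    # Data load: data type followed by the voice data itself.
    load = struct.pack(">B", data_type) + voice_data
    body = address + state + load
    length = struct.pack(">H", len(body))
    return SYNC + length + body + bytes([checksum(length + body)]) + END


def parse_frame(frame: bytes) -> dict:
    # 13 bytes is the smallest frame this layout allows (empty voice data).
    if len(frame) < 13 or frame[:2] != SYNC or frame[-1:] != END:
        raise ValueError("bad sync frame, end character or length")
    (length,) = struct.unpack(">H", frame[2:4])
    if len(frame) != length + 6:
        raise ValueError("frame length field does not match frame size")
    if checksum(frame[2:4 + length]) != frame[-2]:
        raise ValueError("check frame mismatch")
    body = frame[4:4 + length]
    source_addr, channel_id = struct.unpack(">HB", body[:3])
    ready, tx_state, retries = struct.unpack(">BBB", body[3:6])
    return {
        "source_addr": source_addr, "channel_id": channel_id,
        "ready": ready, "tx_state": tx_state, "retries": retries,
        "data_type": body[6], "voice_data": body[7:],
    }
```

A receiver built along these lines would discard any frame for which `parse_frame` raises, which is the point at which a retransmission could be requested.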
And the television terminal SOC system 220 is used for comparing the received voice instruction with the cloud voice database, generating an operation instruction and further controlling the APP of the television terminal to complete the corresponding voice operation instruction.
The television terminal SOC system 220 receives the voice data demodulated by the second 5.8GHz RF module 210 through the I2S data bus. In addition, the television terminal SOC system 220 transmits the audio data played by the television body speaker to the second 5.8GHz RF module 210 through the I2S bus, and the second 5.8GHz RF module 210 modulates this audio data and sends it to the mobile voice terminal 100 for mixed-field voice noise filtering. After receiving the processed voice instruction, the television terminal SOC system 220 connects to the cloud voice database 300 to perform data identification and comparison, generates an operation instruction, and then controls the APP of the television terminal to complete the corresponding voice operation, such as page turning or channel changing.
The technical scheme of the invention adopts a microphone array module 110, an ADC module 120, a DSP processing module 130, a first 5.8GHz radio frequency module 140, a second 5.8GHz radio frequency module 210 and a television SOC system 220 to form a television far-field voice recognition system based on 5.8GHz wireless transmission. When the system works, the microphone array module 110 collects multi-azimuth mixed-field voice data, and the ADC module 120 performs analog-to-digital conversion on the mixed-field voice data and sends the digital voice signals to the DSP processing module 130. The DSP processing module 130 receives the digital voice data, receives the sound data sent by the television through the first 5.8GHz radio frequency module 140, and stores that sound data locally. Through a preset DSP algorithm it distinguishes the television loudspeaker sound, the command voice uttered by the user and the environmental noise, extracts their characteristics and eliminates the unwanted components, thereby filtering out the sound data sent by the television. The first 5.8GHz radio frequency module 140 and the second 5.8GHz radio frequency module 210 modulate and demodulate the multi-channel voice command or the sound data sent by the television end, and at the same time complete data communication with the DSP processing module 130 and the television end SOC system 220, respectively. The second 5.8GHz radio frequency module 210 sends the processed voice command to the television end 200. After receiving the voice command sent by the user, the television end SOC system 220 compares it with the cloud voice database 300, identifies the voice command, and then controls the APP of the television end to complete the operation corresponding to the voice command.
The voice data is transmitted using wireless technology with a carrier in the 5.8GHz frequency band, which has a wider transmission bandwidth than wireless technology with a carrier in the 2.4GHz frequency band. The 5.8GHz system adopts direct sequence spread spectrum technology, so the system has more channels, a higher frequency, stronger anti-jamming capability, no extra transmission delay and more ideal man-machine interaction, and can well meet the requirements of remote voice recognition and data transmission of a television.
In this embodiment, the DSP processing module 130 includes a voice processing chip, and the voice processing chip includes a plurality of logic processing cores for processing audio data of different channels, an SRAM unit for data caching, and a FLASH unit for voice data caching.
It should be noted that, in this embodiment, the DSP processing module 130 adopts an XMOS XUF216 voice processing chip, which mainly performs the related algorithm processing on voice data. The chip integrates 16 real-time logic processing cores, each of which can independently process audio data of a different channel in parallel, together with a 512KB SRAM unit for data caching and a 2MB Flash unit for local storage of voice data. The DSP processing module 130 receives the sound data sent by the television end through the first 5.8GHz radio frequency module 140 and stores it in the Flash unit. When processing the voice signal, the voice processing chip calls up the data in the Flash unit and, through the DSP algorithm pre-stored in the chip, filters the sound data sent by the television end out of the mixed-field voice data, thereby obtaining the single voice command sent by the user. The voice processing chip then sends the voice command to the first 5.8GHz rf module 140 through the I2S data bus.
In addition, the DSP processing module 130 simultaneously supports multiple SPI, I2C and I2S interfaces for data communication with other modules, and as technology develops, the number and capacity of the real-time logic processing core, the SRAM unit and the FLASH unit integrated in the DSP processing module 130 will change accordingly, and therefore, no limitation is imposed on the specific number and capacity of the real-time logic processing core, the SRAM unit and the FLASH unit.
As shown in fig. 3, fig. 3 is a schematic structural diagram of the first 5.8GHz rf module in the far-field voice recognition system based on 5.8GHz wireless transmission television, it should be noted that the first 5.8GHz rf module 140 and the second 5.8GHz rf module 210 have the same structure, and in this embodiment, the 5.8GHz rf module includes a communication unit having the functions of operating control and data communication with the DSP processing module 130 or the television SOC system 220, and a modulation and demodulation unit having the functions of modulating and demodulating 5.8GHz carrier wave for voice signals.
It should be noted that, when the 5.8GHz rf module is the first 5.8GHz rf module 140, the first signal end of the first communication unit 141 is connected to the second signal end of the DSP processing module 130, the second signal end of the first communication unit 141 is connected to the signal end of the first modem unit 142, when the DSP processing module 130 outputs the processed voice command to the first modem unit 142 through the first communication unit 141, the first modem unit 142 digitally modulates the voice command and sends the voice command to the television 200 through an antenna, the first modem unit 142 receives the television voice signal output by the television 200 through the antenna, and the television voice signal is output to the DSP processing module 130 through the first communication unit 141 after demodulation, so that the DSP processing module 130 finishes filtering the voice data sent by the television in the mixed-field voice, thereby obtaining a single voice command sent by the user.
When the 5.8GHz rf module is the second 5.8GHz rf module 210, the first signal end of the second communication unit 212 is connected to the signal end of the second modem unit 211, the second signal end of the second communication unit 212 is connected to the signal end of the tv-end SOC system 220, when the second modem unit 211 receives voice data transmitted from the mobile voice terminal 100 through the antenna, the second modem unit 211 detects and demodulates the voice data and outputs the voice data to the tv-side SOC system 220 through the second communication unit 212, so that the SOC system 220 of the TV end performs data comparison by connecting with the cloud voice database 300 to generate an operation instruction, and further controls the APP of the TV end to complete voice operation, when the television-side SOC system 220 outputs the sound data emitted from the television side through the second communication unit 212, the second modem unit 211 modulates the sound data sent by the tv end and outputs the modulated sound data to the mobile audio end 100 through the antenna for mixed-field audio filtering.
As shown in fig. 4, fig. 4 is a schematic structural diagram of the first modem unit 141 in the television far-field voice recognition system based on 5.8GHz wireless transmission. It should be noted that the first modem unit 141 and the second modem unit 211 have the same structure. In this embodiment, the modem unit includes an intermediate frequency processing unit U1, a radio frequency processing unit U2, a frequency synthesizer H1, a plurality of filters and a plurality of amplifiers; the intermediate frequency processing unit U1 and the radio frequency processing unit U2 mainly comprise an IF transceiver and an RF transceiver, respectively.
When the first modem unit 141 or the second modem unit 211 serves as a transmitting circuit, the voice command output by the DSP processing module 130 or the sound data output by the television SOC system 220 is fed to the TXI end or the TXQ end. The intermediate frequency processing unit U1 converts the signal into a fixed 380MHz intermediate frequency signal, which is filtered by the low pass filter C1 and then frequency-converted by the radio frequency processing unit U2 into the required 5.8GHz high frequency signal. The high frequency signal is filtered and power-amplified and then sent by the antenna to the television terminal or the mobile voice terminal.
When the first modem unit 141 or the second modem unit 211 serves as a receiving circuit, the voice signal received by the antenna undergoes low-noise amplification and is then mixed by the radio frequency processing unit U2, which converts it into a fixed 380MHz intermediate frequency signal. The fixed intermediate frequency signal is filtered by the low pass filter C1 and frequency-converted by the intermediate frequency processing unit U1, and the resulting voice signal is output to the television terminal SOC system 220 or the DSP processing module 130. In this way the first modem unit 141 or the second modem unit 211 realizes a bidirectional transceiving function.
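By way of non-limiting illustration only, the two-stage frequency conversion described above can be checked with simple arithmetic. The 380MHz intermediate frequency and the 5.8GHz carrier come from the description; the local oscillator frequency is not stated, so the low-side value computed here is an assumption, as is the treatment of 5.8GHz as a single nominal carrier frequency. The function names are hypothetical.

```python
IF_HZ = 380_000_000      # fixed intermediate frequency stated in the description
RF_HZ = 5_800_000_000    # nominal 5.8GHz carrier, treated as a single frequency


def local_oscillator(rf_hz: int, if_hz: int, low_side: bool = True) -> int:
    """LO frequency that places the IF at rf_hz (low-side injection is assumed)."""
    return rf_hz - if_hz if low_side else rf_hz + if_hz


def mixer_products(f1_hz: int, f2_hz: int) -> tuple:
    """An ideal mixer produces the difference and the sum of its two inputs."""
    return (abs(f1_hz - f2_hz), f1_hz + f2_hz)


lo_hz = local_oscillator(RF_HZ, IF_HZ)
# Transmit path: IF mixed with LO; the sum product is the wanted carrier,
# and the difference product is removed by the filtering stage.
tx_products = mixer_products(lo_hz, IF_HZ)
# Receive path: carrier mixed with LO; the difference product is the wanted IF.
rx_products = mixer_products(RF_HZ, lo_hz)
```

Under these assumptions the transmit path selects the sum product and the receive path selects the difference product, which is why a filter follows each mixing stage.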
Referring to fig. 1 and fig. 4 together, the working principle of the far-field speech recognition system of the present embodiment based on 5.8GHz wireless transmission television is specifically described as follows:
in this embodiment, the television far-field speech recognition system based on 5.8GHz wireless transmission includes a mobile speech end 100 and a television end 200, where the mobile speech end 100 includes a microphone array module 110, an ADC module 120, a DSP processing module 130, and a first 5.8GHz radio frequency module 140; the tv end 200 includes a second 5.8GHz rf module 210 and a tv end SOC system 220.
When the system works, the microphone array module 110 collects multi-directional mixed field voice data, and the ADC module 120 performs analog-to-digital conversion on the mixed field voice data and transmits the digital voice signal to the DSP processing module 130. The DSP processing module 130 receives the digital voice data, receives the sound data transmitted from the television terminal through the first 5.8GHz rf module 140, stores that sound data locally, and filters it out through a preset DSP algorithm. The first 5.8GHz rf module 140 and the second 5.8GHz rf module 210 modulate and demodulate the multi-channel voice instruction or the sound data transmitted from the television terminal, and at the same time complete data communication with the DSP processing module 130 and the television terminal SOC system 220, respectively. The second 5.8GHz rf module 210 transmits the processed voice instruction to the television terminal 200. A dedicated protocol stack is employed between the first 5.8GHz rf module 140 and the second 5.8GHz rf module 210; it includes functions such as multiple error correction and packet-loss retransmission, and thereby realizes point-to-point, multi-channel, low-delay and strongly interference-resistant transmission of voice data.
After receiving the voice instruction sent by the user, the television side SOC system 220 compares the voice instruction with the cloud voice database 300, identifies the voice instruction, and then controls the APP of the television side to complete the operation corresponding to the voice instruction. The voice data is transmitted using wireless technology with a carrier in the 5.8GHz frequency band, which has a wider transmission bandwidth than wireless technology with a carrier in the 2.4GHz frequency band. The 5.8GHz system adopts direct sequence spread spectrum technology, so it provides more channels, a higher frequency and stronger anti-jamming capability, and can well meet the requirements of remote voice recognition and data transmission of a television.
Correspondingly, the present invention further provides a far-field speech recognition method for a television based on 5.8GHz wireless transmission, as shown in fig. 5, where fig. 5 is a flowchart of an embodiment of the far-field speech recognition method for a television based on 5.8GHz wireless transmission, and in this embodiment, the far-field speech recognition method for a television based on 5.8GHz wireless transmission includes:
step S100, mixed field voice data are collected and subjected to analog-to-digital conversion, and the mixed field voice data comprise voice instructions sent by a user and sound data sent by a television end.
It should be noted that, in this embodiment, the microphone array module 110 is used to collect mixed-field voice data, and the mixed-field voice data is subjected to analog-to-digital conversion by the ADC module 120 and then output to the DSP processing module 130, where the mixed-field voice data includes sound data sent by a television and a voice instruction sent by a user.
Step S200, acquiring the mixed field voice data after the analog-to-digital conversion and the sound data sent by the television end, and filtering out the sound data sent by the television end according to a preset algorithm to obtain the voice instruction sent by the user.
It should be noted that the DSP processing module 130 obtains, through the first 5.8GHz rf module 140, the sound data output by the television end 200 and stores it in the Flash integrated in the DSP processing module 130. Through a DSP algorithm, the DSP processing module 130 filters the sound data emitted by the television end out of the mixed-field speech to obtain the single voice instruction emitted by the user. The voice instruction then undergoes signal pre-emphasis, windowing and similar processing through the DSP algorithm and is output to the first 5.8GHz rf module 140 through the I2S data bus.
Step S300, modulating the voice command into a voice command with a frequency band of 5.8GHz and outputting the voice command.
It should be noted that, during operation, the first 5.8GHz rf module 140 receives and modulates the multi-channel voice signal output by the DSP processing module 130, a frequency band of the modulated voice carrier signal is 5.8GHz, the voice carrier signal is sent to the second 5.8GHz rf module 210 through the antenna, and meanwhile, the first 5.8GHz rf module 140 obtains the voice data output by the television end 200 and sends the voice data to the DSP processing module 130 for data processing.
Step S400, receiving the voice command with the frequency band of 5.8GHz and demodulating the received voice command with the frequency band of 5.8GHz.
It should be noted that, the second 5.8GHz rf module 210 receives the voice carrier signal with the frequency band of 5.8GHz through the antenna, and the second 5.8GHz rf module 210 completes receiving and demodulating the voice carrier signal with the frequency band of 5.8GHz and sends the signal to the television SOC system 220 for processing through the data bus.
A dedicated protocol stack is adopted in the first 5.8GHz radio frequency module and the second 5.8GHz radio frequency module. The dedicated protocol stack includes functions such as multiple error correction and packet-loss retransmission, so that point-to-point, multi-channel, low-delay and strongly interference-resistant transmission of voice data is realized.
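By way of non-limiting illustration only, the packet-loss retransmission function of such a protocol stack might follow a simple acknowledge-and-retry pattern, sketched below in Python. The description does not disclose the actual retransmission scheme, so the retry limit, the acknowledgement mechanism and the `transmit` callback are assumptions made for this example.

```python
def send_with_retransmission(frame: bytes, transmit, max_retries: int = 3) -> int:
    """Send a frame and retransmit it until it is acknowledged.

    `transmit(frame, attempt)` is a hypothetical callback that sends the frame
    over the radio link and returns True when an acknowledgement arrives.
    Returns the number of retransmissions used, which a sender could record
    as the retransmission count of the next frame it builds.
    """
    for attempt in range(max_retries + 1):
        if transmit(frame, attempt):
            return attempt
    raise TimeoutError("no acknowledgement after %d retransmissions" % max_retries)


def make_lossy_link(drops: int):
    """Test double (not part of the described system): loses the first `drops` sends."""
    def transmit(frame: bytes, attempt: int) -> bool:
        return attempt >= drops
    return transmit
```

Bounding the number of retries keeps the worst-case latency predictable, which matters for a voice link intended to be low-delay.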
Step S500, comparing the voice command with the cloud voice database to generate an operation command, and further controlling the APP of the television end to complete the corresponding voice operation.
The television terminal SOC system 220 receives the voice data demodulated by the second 5.8GHz radio frequency module 210 through the I2S data bus, the television terminal SOC system 220 receives the voice data processed by the DSP processing module 130, performs data comparison by connecting the cloud voice database 300, generates an operation instruction, and then controls the APP of the television terminal to complete corresponding voice operations, such as page turning, channel changing, and the like.
In addition, the SOC system 220 of the television terminal further transmits the audio data sent by the television body to the second 5.8GHz rf module 210 through the I2S bus, and then modulates and sends the audio data to the mobile voice terminal 100 for mixed-field voice noise filtering.
As shown in fig. 6, fig. 6 is a flowchart illustrating a far-field speech recognition method for a television based on 5.8GHz wireless transmission according to a second embodiment of the present invention, and step S200 includes:
Step S210, acquiring the mixed field voice data after the analog-to-digital conversion and the sound data sent by the television end, and caching the sound data sent by the television end.
It should be noted that the mixed-field voice data is output to the DSP processing module 130 after analog-to-digital conversion by the ADC chip. The DSP processing module 130 adopts the XMOS XUF216 voice processing chip, which receives the sound data sent by the television terminal through the 5.8GHz radio frequency module and stores it in the Flash integrated in the chip, for use in the subsequent operation of filtering out the sound data sent by the television terminal.
Step S220, filtering out the sound data sent by the television according to a preset algorithm to obtain the voice command sent by the user.
It should be noted that 16 real-time logic processing cores are integrated inside the XMOS XUF216 voice processing chip, and each DSP core can independently process audio data of a different channel in parallel. The XMOS XUF216 voice processing chip filters the sound data of the television speaker out of the mixed-field voice through a DSP algorithm, so as to obtain the single voice instruction sent by the user.
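By way of non-limiting illustration only, one common way to remove a known reference signal from a microphone signal is a normalized least-mean-squares (NLMS) adaptive filter, sketched below in Python. The description does not disclose which preset DSP algorithm is used, so NLMS is a stand-in technique chosen for this example, and the filter length, step size and function name are assumptions.

```python
def nlms_cancel(mic, ref, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively filtered copy of `ref` from `mic`.

    mic: mixed-field samples (user speech plus television sound).
    ref: television sound samples received over the radio link.
    Returns the residual, which approximates the signal without television sound.
    NLMS is an assumed stand-in for the undisclosed preset DSP algorithm.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate        # what remains after removing the estimate
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

In this sketch the filter learns the path from the television speaker to the microphone, so the residual tends toward the user's voice once the weights have converged.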
Step S230, performing signal pre-emphasis and windowing pre-processing on the voice command and outputting the voice command.
It should be noted that, to ensure that the voice command is transmitted to the television end 200 completely and with correct timing, signal pre-emphasis and windowing must be performed on the voice command before transmission. Pre-emphasis means applying a pre-emphasis network before the voice command is fed in, to boost the high-frequency components of the input signal of the first 5.8GHz rf module 140 and thereby effectively improve the output signal-to-noise ratio. In signal processing, because the 5.8GHz rf module can only process signals of finite length, the original signal x(t) is truncated to the sampling time T, that is, made finite, to give x_T(t), which is then processed further; this truncation is windowing.
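By way of non-limiting illustration only, pre-emphasis and windowing are often implemented as a first-order high-pass difference followed by multiplication of each finite-length frame by a window function, as sketched below in Python. The description does not specify the pre-emphasis network or the window shape, so the coefficient 0.97, the Hamming window, and the frame length and hop parameters are assumptions made for this example.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming(length):
    """Hamming window of the given length (an assumed choice of window)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(samples, frame_len, hop):
    """Cut the signal into finite-length frames and apply the window to each."""
    window = hamming(frame_len)
    return [[s * w for s, w in zip(samples[start:start + frame_len], window)]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

A tapered window such as this one reduces the spectral leakage that plain truncation would introduce at the frame edges.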
As shown in fig. 7, fig. 7 is a flowchart illustrating a far-field speech recognition method for a television based on 5.8GHz wireless transmission according to a third embodiment of the present invention, and step S300 includes:
step S310, receiving the voice command after the pre-emphasis and the windowing pretreatment, and carrying out frequency conversion on the received voice command after the pre-emphasis and the windowing pretreatment into an intermediate frequency signal;
step S320, low-pass filtering the intermediate frequency signal and frequency converting the intermediate frequency signal into a high-frequency carrier signal with a frequency band of 5.8GHz;
step S330, amplifying the high-frequency carrier signal with the frequency band of 5.8GHz and transmitting the signal through an antenna.
In this embodiment, the first modem unit 141 and the second modem unit 211 have the same structure. The first modem unit 141 includes an intermediate frequency processing unit U1, a radio frequency processing unit U2, a frequency synthesizer H1, a plurality of filters and a plurality of amplifiers; the intermediate frequency processing unit U1 and the radio frequency processing unit U2 mainly comprise an IF transceiver and an RF transceiver, respectively. When the first modem unit 141 is used as a transmitting circuit, the voice command output by the DSP processing module 130 is fed to the TXI end or the TXQ end and converted by the intermediate frequency processing unit U1 into a fixed 380MHz intermediate frequency signal. The fixed intermediate frequency signal is filtered by the low pass filter C1 and frequency-converted by the radio frequency processing unit U2 into the required 5.8GHz high frequency signal, which is filtered, power-amplified and then transmitted to the television end by the antenna.
As shown in fig. 8, fig. 8 is a schematic flowchart of a fourth embodiment of the far-field speech recognition method for a wireless transmission television based on 5.8GHz in the present invention, and step S400 includes:
step S410, receiving the high-frequency carrier signal with the frequency band of 5.8GHz and carrying out low-noise amplification processing on the received high-frequency carrier signal with the frequency band of 5.8GHz;
step S420, mixing the high-frequency carrier signals after the low-noise amplification treatment to obtain intermediate-frequency signals;
step S430, filtering and frequency converting the intermediate frequency signal to obtain and output a voice instruction sent by the user.
When the second modem unit 211 is used as a receiving circuit and the antenna receives a high-frequency carrier signal in the 5.8GHz frequency band, the high-frequency carrier signal undergoes low-noise amplification and is then mixed by the radio frequency processing unit U2, which converts the received voice signal into a fixed 380MHz intermediate-frequency signal. The fixed intermediate-frequency signal is filtered by the low pass filter C1 and frequency-converted by the intermediate-frequency processing unit U1, and the resulting voice signal is output to the television SOC system 220.
As shown in fig. 9, fig. 9 is a schematic flowchart of a fifth embodiment of the far-field speech recognition method for a wireless transmission television based on 5.8GHz in the present invention, and step S500 specifically includes:
step S510, receiving a voice command sent by the user, and performing voice analysis and measure analysis on the received voice command sent by the user.
It can be understood that the speech signal is a random signal with high redundancy. In speech signal processing (speech recognition, speech synthesis and speech compression), the redundancy of the signal can be effectively reduced only through feature extraction, which obtains the parameters characterizing the speech signal by analyzing it; therefore, after a voice instruction is received, voice analysis must be performed on it. Measure estimation, that is, pattern matching and model training, is the core of speech recognition. Model training aims to obtain, according to a certain criterion, model parameters that represent the essential characteristics of a large number of known patterns; pattern matching aims to find, according to a certain criterion, the best match between an unknown pattern and a model in the model base, expressed as the measure between the parameters and the template.
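By way of non-limiting illustration only, pattern matching against a model base can be sketched as a nearest-template search under a distance measure, as in the following Python example. The description does not specify the features, the measure or the matching criterion, so the Euclidean distance, the fixed-length feature vectors, the optional rejection threshold and the template names are assumptions made for this example.

```python
import math


def measure(features, template):
    """Assumed measure: Euclidean distance between equal-length feature vectors."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(features, template)))


def best_match(features, templates, threshold=None):
    """Return (name, distance) of the closest template in the model base.

    `templates` maps a hypothetical command name to its feature vector.
    If `threshold` is given and even the best distance exceeds it, the name
    is None, meaning that no template matches well enough.
    """
    name, dist = min(((n, measure(features, t)) for n, t in templates.items()),
                     key=lambda pair: pair[1])
    if threshold is not None and dist > threshold:
        return None, dist
    return name, dist
```

Practical recognizers typically compare sequences of feature frames rather than single vectors, but the principle of choosing the model with the smallest measure is the same.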
Step S520, data searching and comparing are carried out on the voice command after voice analysis and measure analysis and the cloud voice database, and then recognition judgment is made.
It should be noted that the cloud voice database 300 stores the control instructions corresponding to the voice instructions. After measure analysis, the voice instruction is compared with the control instructions in the cloud voice database 300 and is identified and matched according to its characteristics. As the television is upgraded, the cloud voice database 300 can also be upgraded to meet the requirements of the user.
Step S530, controlling the APP of the television end to execute the corresponding voice instruction operation.
When the voice command is successfully matched with the control command in the cloud voice database 300, the SOC system 220 of the television outputs a corresponding control signal to the APP of the television to perform corresponding operations, such as page turning, channel changing, and the like.
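By way of non-limiting illustration only, the final lookup and dispatch step might resemble the following Python sketch. The contents of the cloud voice database and the interface to the television APP are not disclosed, so the command table, the operation codes and the `app_execute` callback are hypothetical.

```python
# Hypothetical stand-in for the control instructions held in the cloud voice database.
COMMAND_TABLE = {
    "next page": "PAGE_DOWN",
    "change channel": "CHANNEL_UP",
}


def lookup_operation(recognized_text, table=COMMAND_TABLE):
    """Return the control instruction for a recognized command, or None."""
    return table.get(recognized_text.strip().lower())


def handle_voice_command(recognized_text, app_execute, table=COMMAND_TABLE):
    """Dispatch a matched command to the APP; report whether a match was found.

    `app_execute` is a hypothetical callback standing in for the television APP.
    """
    operation = lookup_operation(recognized_text, table)
    if operation is None:
        return False
    app_execute(operation)
    return True
```

Keeping the table separate from the dispatch logic reflects the statement above that the database can be upgraded independently of the television.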
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A television far-field voice recognition system based on 5.8GHz wireless transmission is characterized by comprising a mobile voice end and a television end, wherein the mobile voice end comprises a microphone array module, an ADC (analog-to-digital converter) module, a DSP (digital signal processor) processing module and a first 5.8GHz radio frequency module; the television end comprises a second 5.8GHz radio frequency module and a television end SOC system; wherein,
the microphone array module is used for collecting mixed field voice data, and the mixed field voice data comprises voice instructions sent by a user and sound data sent by a television end;
the ADC module is used for performing analog-to-digital conversion on the mixed field voice data;
the DSP processing module is used for acquiring mixed field voice data subjected to analog-to-digital conversion by the ADC module and voice data sent by the television terminal through the first 5.8GHz radio frequency module, filtering the voice data of the television terminal through a preset algorithm to obtain a voice instruction sent by the user, and outputting the voice instruction to the first 5.8GHz radio frequency module;
the first 5.8GHz radio frequency module is used for acquiring sound data sent by the television terminal and modulating the voice instruction output by the DSP processing module into a voice instruction with a frequency band of 5.8 GHz;
the second 5.8GHz radio frequency module is used for receiving the voice command with the frequency band of 5.8GHz, demodulating the voice command with the frequency band of 5.8GHz and sending the demodulated voice command to the television SOC system;
and the television terminal SOC system is used for comparing the received voice instruction with the cloud voice database to generate an operation instruction, and then controlling the APP of the television terminal to complete the corresponding voice instruction operation.
2. The 5.8GHz wireless transmission based television far-field speech recognition system of claim 1, wherein a first signal terminal of the ADC module is connected with a signal output terminal of the microphone array module, and a second signal terminal of the ADC module is connected with a first signal terminal of the DSP processing module.
3. The far-field speech recognition system for television based on 5.8GHz wireless transmission according to claim 1, wherein the DSP processing module comprises a speech processing chip including a plurality of logic processing cores for processing different channel audio data, an SRAM unit for data caching, and a FLASH unit for voice data caching.
4. The far-field speech recognition system for television based on 5.8GHz wireless transmission according to claim 1, wherein the first 5.8GHz radio frequency module comprises a first communication unit for data communication with the DSP processing module and a first modem unit for modulating and demodulating voice commands issued by the user or voice data issued by the television terminal, a first signal terminal of the first communication unit is connected with a second signal terminal of the DSP processing module, and a second signal terminal of the first communication unit is connected with a signal terminal of the first modem unit.
5. The far-field speech recognition system for television based on 5.8GHz wireless transmission according to any one of claims 1 to 4, wherein the second 5.8GHz radio frequency module comprises a second communication unit for performing data communication with the television terminal SOC system and a second modulation and demodulation unit for modulating and demodulating voice commands issued by the user or voice data issued by the television terminal, a first signal terminal of the second communication unit is connected with a signal terminal of the second modulation and demodulation unit, and a second signal terminal of the second communication unit is connected with a signal terminal of the television terminal SOC system.
6. A far-field voice recognition method based on 5.8GHz wireless transmission television is characterized by comprising the following steps:
step S100, collecting mixed field voice data and carrying out analog-to-digital conversion on the mixed field voice data, wherein the mixed field voice data comprises voice instructions sent by a user and sound data sent by a television end;
step S200, acquiring mixed field voice data after analog-to-digital conversion and voice data sent by the television end, and filtering the voice data sent by the television end according to a preset algorithm to obtain a voice instruction sent by a user;
step S300, modulating the voice command into a voice command with a frequency band of 5.8GHz and outputting the voice command;
step S400, receiving the voice command with the frequency band of 5.8GHz and demodulating the received voice command with the frequency band of 5.8 GHz;
and S500, comparing the received voice instruction with the cloud voice database to generate an operation instruction, and further controlling the APP of the television end to execute corresponding voice instruction operation.
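Claim 6 leaves the "preset algorithm" of step S200 unspecified. As an illustration only, the following sketch assumes a normalized least-mean-squares (NLMS) adaptive filter, a common way to remove a known reference signal from a microphone mix. The function name `nlms_cancel`, the parameters `taps`, `mu`, and `eps`, and the synthetic two-tap echo path are all hypothetical and do not come from the patent.

```python
import math


def nlms_cancel(mixed, reference, taps=4, mu=0.5, eps=1e-8):
    """Remove an adaptively filtered copy of `reference` from `mixed`.

    mixed     -- digitized microphone samples (user speech plus TV sound)
    reference -- TV sound samples obtained directly from the television end
    Returns the residual, which approximates the user's speech.
    NLMS is an assumed stand-in for the patent's unspecified preset algorithm.
    """
    weights = [0.0] * taps   # estimated acoustic path from TV speaker to microphone
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for d, x in zip(mixed, reference):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate                        # mic sample minus estimated TV sound
        norm = sum(h * h for h in history) + eps  # eps avoids division by zero
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual


# Synthetic demo: the "room" is a hypothetical two-tap echo path.
tv = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n) for n in range(2000)]
echo = [0.6 * tv[n] + (0.3 * tv[n - 1] if n else 0.0) for n in range(2000)]
cleaned = nlms_cancel(echo, tv)   # no user speech present, so the residual should decay
```

In this demo the microphone signal contains only the television sound, so the residual shrinking toward zero indicates that the filter has learned the echo path. With user speech added to `echo`, the residual would instead approximate that speech.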
7. The far-field speech recognition method for television based on 5.8GHz wireless transmission according to claim 6, wherein step S200 comprises:
step S210, acquiring the mixed field voice data after the analog-to-digital conversion and the sound data emitted by the television end, and caching the sound data emitted by the television end;
step S220, filtering out the sound data emitted by the television end according to a preset algorithm to obtain the voice command issued by the user;
and step S230, performing signal pre-emphasis and windowing pre-processing on the voice command and outputting it.
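Step S230 names pre-emphasis and windowing without giving parameters. A minimal sketch of both operations follows, assuming a first-order pre-emphasis filter with coefficient 0.97, a Hamming window, and 400-sample frames with a 160-sample hop (25 ms and 10 ms at a 16 kHz sample rate). These values are conventional defaults in speech processing, not figures taken from the patent.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming(length):
    """Hamming window coefficients for a frame of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_and_window(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames and taper each with a Hamming window."""
    window = hamming(frame_len)
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + i] * window[i] for i in range(frame_len)])
    return frames


# Usage: pre-emphasize a buffer, then cut it into windowed frames.
speech = [math.sin(0.05 * n) for n in range(1000)]
frames = frame_and_window(pre_emphasis(speech))
```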
8. The far-field speech recognition method for television based on 5.8GHz wireless transmission according to claim 7, wherein step S300 comprises:
step S310, receiving the voice command after pre-emphasis and windowing pre-processing, and converting it into an intermediate-frequency signal;
step S320, low-pass filtering the intermediate-frequency signal and up-converting it into a high-frequency carrier signal in the 5.8GHz frequency band;
and step S330, filtering and amplifying the high-frequency carrier signal in the 5.8GHz frequency band, and transmitting it through an antenna.
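The two frequency conversions in steps S310 and S320 can be illustrated with ideal mixers at scaled-down frequencies. The sketch below is a simulation under stated assumptions: the 1 kHz sample rate, the 5 Hz test tone, the 50 Hz intermediate frequency, and the 250 Hz "carrier" are placeholders chosen so the arithmetic is easy to follow, and the helper names `mix`, `tone_amplitude`, and `second_lo_for` are hypothetical. The filtering and amplification stages of the claim are omitted.

```python
import cmath
import math


def mix(signal, lo_hz, sample_rate):
    """Multiply a signal by a cosine local oscillator (an ideal mixer)."""
    return [s * math.cos(2 * math.pi * lo_hz * n / sample_rate)
            for n, s in enumerate(signal)]


def tone_amplitude(signal, freq_hz, sample_rate):
    """Amplitude of the component at freq_hz, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


def second_lo_for(target_hz, if_hz):
    """Local oscillator frequency that places the upper mixing product at target_hz."""
    return target_hz - if_hz


# Scaled-down demo; all frequencies here are illustrative placeholders.
FS = 1000
baseband = [math.cos(2 * math.pi * 5 * n / FS) for n in range(FS)]  # 5 Hz test tone
if_signal = mix(baseband, 50, FS)                        # products at 45 Hz and 55 Hz
rf_signal = mix(if_signal, second_lo_for(250, 50), FS)   # products around 150 Hz and 250 Hz
```

Each mixing stage produces sum and difference frequencies, which is why a real transmitter follows the mixer with a filter that keeps only the wanted band. The same arithmetic applies at full scale: `second_lo_for(5.8e9, 70e6)` gives the oscillator frequency for a hypothetical 70 MHz intermediate frequency.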
9. The far-field speech recognition method for television based on 5.8GHz wireless transmission according to claim 8, wherein step S400 comprises:
step S410, receiving the high-frequency carrier signal in the 5.8GHz frequency band and performing low-noise amplification on it;
step S420, mixing the high-frequency carrier signal after low-noise amplification to obtain an intermediate-frequency signal;
and step S430, filtering and frequency-converting the intermediate-frequency signal to obtain and output the voice command issued by the user.
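The receive chain of steps S410 to S430 amplifies, mixes, and filters. The sketch below is a simplified illustration that collapses the claimed intermediate-frequency stage into a single coherent mixing step followed by a moving-average low-pass filter. It assumes a double-sideband signal, a local oscillator perfectly synchronized with the carrier, and scaled-down frequencies; the function name `downconvert` and its parameters are hypothetical.

```python
import math


def downconvert(rf, carrier_hz, sample_rate, lna_gain=1.0, avg_len=5):
    """Amplify, mix with a synchronized local oscillator, then low-pass filter.

    The factor 2 in the mixer compensates for the 1/2 that appears when a
    cosine carrier is multiplied by itself. The moving average acts as a
    crude low-pass filter that suppresses the product at twice the carrier.
    """
    amplified = [lna_gain * s for s in rf]
    mixed = [2.0 * s * math.cos(2 * math.pi * carrier_hz * n / sample_rate)
             for n, s in enumerate(amplified)]
    return [sum(mixed[n:n + avg_len]) / avg_len
            for n in range(len(mixed) - avg_len + 1)]


# Scaled-down demo; the frequencies are illustrative placeholders.
FS = 1000
CARRIER = 200
message = [0.5 * math.cos(2 * math.pi * 2 * n / FS) for n in range(FS)]
rf = [m * math.cos(2 * math.pi * CARRIER * n / FS) for n, m in enumerate(message)]
# avg_len=5 spans exactly two periods of the 400 Hz double-frequency product.
recovered = downconvert(rf, CARRIER, FS)
```

Because the moving average is centred, `recovered[k]` lines up with `message[k + 2]`. A practical receiver would use a proper filter design and carrier recovery instead of these idealizations.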
10. The far-field speech recognition method for television based on 5.8GHz wireless transmission according to any one of claims 6 to 9, wherein step S500 comprises:
step S510, receiving the voice command issued by the user, and performing voice analysis and measure analysis on it;
step S520, searching and comparing the analysed voice command against the cloud voice database, and thereby making a recognition judgment;
and step S530, controlling the APP of the television end to execute the corresponding voice command operation.
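The search-and-compare of step S520 and the dispatch of step S530 can be sketched as a lookup against a table of known phrases. The example below assumes that speech has already been converted to text upstream, and it stands in a small local dictionary for the cloud voice database. The table contents, the names `COMMAND_TABLE` and `resolve_command`, and the 0.8 similarity cutoff are hypothetical; fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Hypothetical stand-in for the cloud voice database:
# recognized phrase -> (target app, operation instruction)
COMMAND_TABLE = {
    "volume up": ("audio", "VOLUME_UP"),
    "volume down": ("audio", "VOLUME_DOWN"),
    "open video app": ("launcher", "OPEN_VIDEO"),
}


def resolve_command(transcript, table=COMMAND_TABLE, cutoff=0.8):
    """Return the (app, operation) pair for the closest known phrase, or None."""
    phrase = " ".join(transcript.lower().split())   # normalize case and spacing
    match = difflib.get_close_matches(phrase, list(table), n=1, cutoff=cutoff)
    return table[match[0]] if match else None


# Usage: a slightly misrecognized phrase still resolves to an operation.
operation = resolve_command("Volume  upp")
```

Returning `None` for an unmatched phrase lets the caller decide whether to ignore the utterance or prompt the user again.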
CN201810425853.8A 2018-05-04 2018-05-04 Far-field speech recognition system and method for television based on 5.8GHz wireless transmission Pending CN108597513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810425853.8A CN108597513A (en) 2018-05-04 2018-05-04 Far-field speech recognition system and method for television based on 5.8GHz wireless transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810425853.8A CN108597513A (en) 2018-05-04 2018-05-04 Far-field speech recognition system and method for television based on 5.8GHz wireless transmission

Publications (1)

Publication Number Publication Date
CN108597513A (en) 2018-09-28

Family

ID=63619900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810425853.8A Pending CN108597513A (en) 2018-05-04 2018-05-04 Far-field speech recognition system and method for television based on 5.8GHz wireless transmission

Country Status (1)

Country Link
CN (1) CN108597513A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503975A (en) * 2019-08-02 2019-11-26 广州长嘉电子有限公司 Smart television speech enhancement control method and system based on multi-microphone noise reduction
CN111261152A (en) * 2018-12-03 2020-06-09 西安易朴通讯技术有限公司 Intelligent interaction system
CN111970568A (en) * 2020-08-31 2020-11-20 上海松鼠课堂人工智能科技有限公司 Method and system for interactive video playing

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203457266U (en) * 2013-08-15 2014-02-26 安徽科大讯飞信息科技股份有限公司 Voice acquisition device and television system with voice acquisition function
CN103607611A (en) * 2013-11-18 2014-02-26 四川长虹电器股份有限公司 Voice control method and system of intelligent television
CN205759653U (en) * 2016-05-17 2016-12-07 康佳集团股份有限公司 A kind of toy system based on Yun Zhi control
CN206353839U (en) * 2016-12-13 2017-07-25 深圳创维-Rgb电子有限公司 A kind of TV speech control system
US20180033438A1 (en) * 2016-07-26 2018-02-01 Samsung Electronics Co., Ltd. Electronic device and method of operating the same
CN107680594A (en) * 2017-10-18 2018-02-09 宁波翼动通讯科技有限公司 A kind of distributed intelligence voice collecting identifying system and its collection and recognition method
CN207200736U (en) * 2017-07-11 2018-04-06 西安花生科技有限公司 Domestic electric appliances controller based on Voice command

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203457266U (en) * 2013-08-15 2014-02-26 安徽科大讯飞信息科技股份有限公司 Voice acquisition device and television system with voice acquisition function
CN103607611A (en) * 2013-11-18 2014-02-26 四川长虹电器股份有限公司 Voice control method and system of intelligent television
CN205759653U (en) * 2016-05-17 2016-12-07 康佳集团股份有限公司 A kind of toy system based on Yun Zhi control
US20180033438A1 (en) * 2016-07-26 2018-02-01 Samsung Electronics Co., Ltd. Electronic device and method of operating the same
CN206353839U (en) * 2016-12-13 2017-07-25 深圳创维-Rgb电子有限公司 A kind of TV speech control system
CN207200736U (en) * 2017-07-11 2018-04-06 西安花生科技有限公司 Domestic electric appliances controller based on Voice command
CN107680594A (en) * 2017-10-18 2018-02-09 宁波翼动通讯科技有限公司 A kind of distributed intelligence voice collecting identifying system and its collection and recognition method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111261152A (en) * 2018-12-03 2020-06-09 西安易朴通讯技术有限公司 Intelligent interaction system
CN110503975A (en) * 2019-08-02 2019-11-26 广州长嘉电子有限公司 Smart television speech enhancement control method and system based on multi-microphone noise reduction
CN110503975B (en) * 2019-08-02 2022-02-01 广州长嘉电子有限公司 Smart television voice enhancement control method and system based on multi-microphone noise reduction
CN111970568A (en) * 2020-08-31 2020-11-20 上海松鼠课堂人工智能科技有限公司 Method and system for interactive video playing
CN111970568B (en) * 2020-08-31 2021-07-16 上海松鼠课堂人工智能科技有限公司 Method and system for interactive video playing

Similar Documents

Publication Publication Date Title
CN109166578B (en) Mobile terminal, voice control method and related product
CN108597513A (en) Far-field speech recognition system and method for television based on 5.8GHz wireless transmission
CN203608193U (en) Hand-held short-wave radio station based on radio frequency digitalization
WO2012006857A1 (en) Integrated talkback chip and integrated talkback system including the chip
US10153801B2 (en) Systems and methods for acoustic communication in a mobile device
CN106385645B (en) Wireless microphone system
CN215222249U (en) Multimode intelligent interphone
CN105978645A (en) Device and method for avoiding signal interference
CN106603669A (en) Control method and system for distributed type main equipment and auxiliary equipment
RU85055U1 (en) DIGITAL COMMUNICATION ON-BOARD COMPLEX
CN203180922U (en) Wireless earphone system based on near field electromagnetic induction
CN101917206B (en) Integrated talkback chip and integrated talkback system with dual tone multiple frequency coding and decoding function
CN210807309U (en) Portable wireless CPE device with voice interaction function
CN211089640U (en) Extensible intelligent communication module based on GSM/LTE Cat1
CN103067323B (en) A kind of intermediate frequency demodulation device being applied to intercom
CN208461978U (en) A kind of intercom and talkback system
CN203643599U (en) Secondary radar high- and intermediate-frequency digital receiver
CN213366133U (en) One-key type intelligent voice interaction microphone system
CN215912118U (en) Automatic test system of integral type thing networking perception equipment
CN106658246B (en) Wireless microphone system
CN105450285B (en) A kind of indirect communication device of ultrashort wave Vehicle mounted station
CN107071644A (en) A kind of radio microphone function implementation method, mobile terminal and readable storage medium storing program for executing
CN207690495U (en) A kind of high sensitivity speech recognition system
CN209233816U (en) Wireless communication module
CN207473826U (en) A kind of remote controler for supporting more radio frequency protocols

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180928