WO2021004067A1 - Display device - Google Patents

Info

Publication number
WO2021004067A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
voice
circuit
sound
display device
Prior art date
Application number
PCT/CN2020/075958
Other languages
French (fr)
Chinese (zh)
Inventor
李本友
于云涛
Original Assignee
海信视像科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201910620438.2A external-priority patent/CN110349582B/en
Priority claimed from CN201910619184.2A external-priority patent/CN110223707A/en
Application filed by 海信视像科技股份有限公司 filed Critical 海信视像科技股份有限公司
Publication of WO2021004067A1 publication Critical patent/WO2021004067A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • This application relates to the field of electronic technology, and in particular to a display device.
  • With the continuous development of electronic technology, more and more display devices, such as mobile phones, tablet computers, and televisions, can interact with users through voice. The user can speak the instruction to be executed directly to the display device.
  • The display device collects the voice signal of its surrounding environment through a microphone, recognizes the instruction spoken by the user in that voice signal, and then executes the function corresponding to the instruction.
  • Some display devices also implement a far-field sound pickup function on top of the voice interaction function. After the display device receives the voice signal, it processes the signal with filtering, denoising, and echo cancellation, and then recognizes the instructions in the processed voice signal to obtain a higher far-field voice recognition accuracy.
  • When the display device performs echo cancellation on the received voice signal, the echo itself originates from the original playback signal determined by the System on Chip (SOC) controller in the display device.
  • The display device can therefore directly use the original playback signal in the SOC controller as the echo reference signal and perform echo cancellation on the received voice signal.
  • the present application provides a display device to improve the echo cancellation effect when the display device performs echo cancellation.
  • the display device includes: a voice processing circuit, a power amplifier, a speaker, a voice collection circuit, and an echo processing circuit;
  • the voice processing circuit, the power amplifier and the loudspeaker are connected in sequence; the voice processing circuit is connected to the voice collection circuit and the echo processing circuit respectively;
  • the voice processing circuit is used to send an original playback signal to the power amplifier;
  • the power amplifier is used to process the original playback signal, and then send the obtained signal to be played to the speaker for playback;
  • the voice collection circuit is used to collect voice signals to be processed in the environment where the display device is located;
  • the echo processing circuit is configured to obtain the signal to be played sent by the power amplifier to the speaker;
  • the voice processing circuit is further configured to perform echo cancellation processing on the voice signal to be processed according to the signal to be played.
  • the echo processing circuit is also used to preprocess the signal to be played; the voice processing circuit is specifically used to perform echo cancellation processing on the voice signal to be processed according to the preprocessed signal to be played.
  • the preprocessing includes: amplitude reduction processing.
  • the power amplifier is also used to perform differential processing on the signal to be played to obtain the corresponding left channel signal and right channel signal, and to send them to the speaker for playback;
  • the preprocessing also includes: converting to single-ended processing.
  • the left channel signal includes: a left channel positive differential signal and a left channel negative differential signal;
  • the echo processing circuit includes: a left channel processing circuit;
  • the right channel signal includes: a right channel positive differential signal and a right channel negative differential signal;
  • the echo processing circuit includes: a left channel processing circuit and a right channel processing circuit;
  • the left channel processing circuit is configured to perform amplitude reduction processing and single-ended conversion on the left channel positive differential signal and the left channel negative differential signal; the left channel processing circuit includes a first input resistor, a first feedback resistor, and a first operational amplifier; the left channel positive differential signal is connected to the non-inverting input terminal of the first operational amplifier, the left channel negative differential signal is connected to the inverting input terminal of the first operational amplifier through the first input resistor, and the output terminal of the first operational amplifier is connected to the inverting input terminal of the first operational amplifier through the first feedback resistor.
  • the right channel signal includes: a right channel positive differential signal and a right channel negative differential signal;
  • the echo processing circuit includes: a right channel processing circuit;
  • the right channel processing circuit is configured to perform amplitude reduction processing and single-ended conversion on the right channel positive differential signal and the right channel negative differential signal; the right channel processing circuit includes a second input resistor, a second feedback resistor, and a second operational amplifier; the right channel positive differential signal is connected to the non-inverting input terminal of the second operational amplifier, the right channel negative differential signal is connected to the inverting input terminal of the second operational amplifier through the second input resistor, and the output terminal of the second operational amplifier is connected to the inverting input terminal of the second operational amplifier through the second feedback resistor.
  • the voice collection circuit is composed of a MIC array, and the MIC array includes a plurality of MICs; the voice signal to be processed is a pulse density modulated PDM signal collected by the MIC array.
  • the MIC array includes a first MIC, a second MIC, a third MIC, and a fourth MIC arranged in sequence;
  • the first MIC and the third MIC are recorded as the first group of MICs, and the second MIC and the fourth MIC are recorded as the second group of MICs;
  • the MIC array specifically collects the to-be-processed voice signal in turn through the first group of MICs and the second group of MICs.
  • the voice processing circuit is also used to perform sampling and analog-to-digital conversion processing on the PDM signal.
  • the voice processing circuit is further configured to recognize the instruction in the to-be-processed voice signal after echo cancellation processing, and execute the function corresponding to the instruction.
  • the voice processing circuit is further configured to send the to-be-processed voice signal after echo cancellation processing to the server, so that the server recognizes the instruction in the voice signal and then sends an instruction message to the voice processing circuit; the voice processing circuit receives the instruction message sent by the server and executes the function corresponding to the instruction message.
  • the present application provides a display device.
  • the display device includes a speaker and a far-field voice processing circuit; the far-field voice processing circuit includes:
  • Speaker used to play the sound output by the device
  • a sound pickup circuit for picking up far-field sounds where the far-field sounds include far-field voices emitted by a user and sounds played by the speaker and transmitted to the sound pickup circuit;
  • a preprocessing circuit connected to the sound pickup circuit to receive the picked-up far-field sound, and the preprocessing circuit is connected to the front end of the speaker to obtain the playback sound recovery signal;
  • the echo processing circuit is connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and uses the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound to obtain the far-field voice emitted by the user.
  • the preprocessing circuit includes:
  • the pre-processing circuit is coupled with the sound pickup circuit and the front end of the speaker to convert the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit.
  • the pre-processing circuit is further used to adjust the phase of the picked-up far-field sound and of the playback sound recovery signal, so that the phase of the playback sound recovery signal leads the phase of the picked-up far-field sound by no more than a preset duration.
  • the preprocessing circuit further includes:
  • a first encoder; the pre-processing circuit is connected to the front end of the speaker through the first encoder, and the first encoder performs analog-to-digital conversion on the playback sound recovery signal.
  • the display device includes a power amplifier; the power amplifier is connected between the speaker and the echo processing circuit, and is used to provide the speaker with multiple channels of sound output by the device; and the playback sound recovery signal includes the multi-channel sound obtained from the front end of the speaker;
  • the first encoder is also used for synthesizing multiple sounds obtained from the front end of the speaker.
  • the sound pickup circuit includes a microphone array and a second encoder electrically connected to the microphone array, wherein the microphone array is used for picking up the far-field sound, and the second encoder is used to perform analog-to-digital conversion on the far-field sound;
  • the second encoder is also used for synthesizing multiple far-field sounds picked up by the microphone array.
  • the far-field sound processing circuit further includes a speech enhancement circuit and a sound source localization circuit, and the echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to the speech enhancement circuit and the sound source localization circuit respectively.
  • the speech enhancement circuit is connected to the sound source localization circuit to receive the sound source localization result output by the sound source localization circuit and, according to that result, enhances the echo-cancelled far-field sound to generate the far-field voice to be uploaded.
  • the display device further includes a voice engine circuit connected to the output terminal of the voice enhancement circuit; the voice engine circuit performs wake-up word recognition on the far-field voice to be uploaded and, when the preset wake-up word is recognized, encodes the far-field voice to be uploaded and transmits it to the designated terminal;
  • the voice engine circuit is also used to receive an instruction corresponding to the far-field voice returned from a designated terminal.
  • the display device has a main control chip, and the echo processing circuit, voice enhancement circuit, sound source localization circuit, and voice engine circuit are all integrated in the main control chip.
  • a far-field speech processing circuit includes:
  • a sound pickup circuit for picking up far-field sounds where the far-field sounds include far-field voices emitted by a user and sounds played by the speaker and transmitted to the sound pickup circuit;
  • a preprocessing circuit connected to the sound pickup circuit to receive the picked-up far-field sound, and the preprocessing circuit is connected to the front end of the speaker to obtain the playback sound recovery signal;
  • the echo processing circuit is connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and uses the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound to obtain the far-field voice emitted by the user.
  • The power amplifier performs related processing on the sound signal that needs to be played, so that signal undergoes non-linear changes as it passes through the power amplifier. This solution therefore obtains the playback sound recovery signal from the back end of the power amplifier, that is, the front end of the speaker. Even after non-linear signal processing such as equalization and amplification in the power amplifier, the playback sound recovery signal obtained by the preprocessing circuit is very close to the speaker playback sound picked up by the sound pickup circuit. Using the playback sound recovery signal to cancel the echo in the picked-up far-field sound can greatly reduce the echo interference in the far-field voice emitted by the user and improve the accuracy of recognizing the far-field voice, thereby improving the sensitivity of far-field pickup to interrupt wake-up and improving the user experience;
  • This embodiment sets up a preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, thereby overcoming the defect that many existing display device SOC chips lack the corresponding interfaces and cannot receive the far-field sound transmitted by the microphone array. The technical solution of this application therefore improves the adoption of far-field voice human-computer interaction technology on display devices.
  • The display device provided by the present application can obtain, through the echo processing circuit, the signal to be played that is output from the power amplifier to the speaker, and use it as the echo reference signal to perform echo cancellation on the voice signal to be processed received by the voice collection circuit. This reference more accurately represents the sound signal actually played by the speaker that is contained in the voice signal to be processed, so the voice processing circuit achieves a better echo cancellation effect, thereby improving the accuracy of subsequent voice signal recognition.
  • FIG. 1 is a schematic diagram of the application scenario of this application;
  • FIG. 2 is a schematic diagram of the processing flow of the voice signal by the display device of this application;
  • FIG. 3 is a schematic structural diagram of a display device in the related art;
  • FIG. 5 is a schematic structural diagram of an embodiment of a display device provided by this application;
  • FIG. 6 is a schematic structural diagram of a power amplifier;
  • FIG. 7 is a schematic structural diagram of an embodiment of a display device provided by this application;
  • FIG. 8 is a schematic structural diagram of an embodiment of an echo processing circuit provided by this application;
  • FIG. 9 is a schematic structural diagram of an embodiment of a display device provided by this application;
  • FIG. 10 is a schematic structural diagram of an embodiment of an arrangement of a MIC array in a voice collection circuit provided by this application;
  • FIG. 11 is a schematic circuit diagram of an embodiment of a voice collection circuit provided by this application;
  • FIG. 12 is a schematic diagram of the processing flow of the voice signal to be processed and the signal to be played by the voice processing circuit provided by this application;
  • FIG. 13 is a front view of an embodiment of a display device of the present application;
  • FIG. 14 is an exploded view of part of the structure of FIG. 13;
  • FIG. 16 is a circuit connection block diagram of an embodiment of the far-field speech processing circuit of the present application;
  • FIG. 17 is a circuit connection block diagram of another embodiment of the far-field speech processing circuit of the present application;
  • FIG. 18 is a circuit connection block diagram of still another embodiment of the far-field speech processing circuit of the present application;
  • FIG. 19 is a circuit diagram of the interface between the microphone array and the second encoder;
  • FIG. 20 is a block diagram of the functional structure of an embodiment of the main control chip;
  • FIG. 21 is a block diagram of partial circuit connections of an embodiment of the far-field speech processing circuit of the present application.
  • FIG. 1 is a schematic diagram of the application scenario of this application, in which each embodiment of this application is applied to a display device 1 with a voice interaction function, that is, the display device 1 can execute related functions corresponding to the instructions according to the instructions spoken by the user 2.
  • the display device 1 includes: mobile phones, tablet computers, notebook computers, televisions, and other smart devices with related signal processing functions or data processing functions, such as smart watches, smart speakers, smart appliances, etc.
  • the display device 1 is a television set as an exemplary description, rather than limiting it.
  • the flow of processing the voice signal by the display device 1 can be referred to as shown in FIG. 2, where FIG. 2 is a schematic diagram of the processing flow of the voice signal by the display device of this application.
  • the display device 1 collects sound signals of the surrounding environment where the display device 1 is located through a microphone to obtain a voice signal, and then performs detection processing on the collected voice signal.
  • Keyword detection is used to find, in the voice signal, the instructions issued by the user 2 that need to be executed by the display device, such as "turn on", "change channel", "increase or decrease volume", and "turn off".
  • the display device 1 executes the function corresponding to the recognized instruction. For example, after the display device 1 recognizes that the command in the collected voice signal is "shutdown", it executes the shutdown operation.
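  • The keyword-to-function step described above can be sketched as a simple lookup. This is only an illustration: it assumes a text transcript is already available (speech-to-text is outside the sketch), and the `COMMANDS` table, the action names, and `detect_instruction` are hypothetical and do not come from the application.

```python
# Hypothetical command table: the phrases follow the examples in the text,
# the action names are placeholders and not part of the application.
COMMANDS = {
    "turn on": "POWER_ON",
    "change channel": "CHANNEL_NEXT",
    "increase volume": "VOLUME_UP",
    "decrease volume": "VOLUME_DOWN",
    "turn off": "POWER_OFF",
    "shutdown": "POWER_OFF",
}


def detect_instruction(transcript):
    """Return the action for the first known keyword in the transcript, or None."""
    text = transcript.lower()
    for phrase, action in COMMANDS.items():
        if phrase in text:
            return action
    return None


# Usage: a recognized "shutdown" keyword maps to the power-off action.
action = detect_instruction("Please SHUTDOWN the TV")
```

    A real device would work on audio features rather than plain substrings; the lookup only shows how a detected keyword selects the function to execute.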
  • the display device 1 can usually detect the instructions spoken by the user 2 through the voice signal with a higher recognition rate.
  • When the distance between the user 2 and the display device 1 is relatively long, the instruction spoken by the user 2 arrives very weak and the voice signal includes interference such as noise and echo, so the recognition rate of the display device 1 for instructions spoken by a distant user 2 is low. Therefore, some display devices 1 also have a far-field sound pickup function.
  • After receiving a voice signal that includes instructions uttered by a distant user 2, the display device 1 also processes the voice signal with filtering, denoising, and echo cancellation.
  • The instruction of the user 2 is then detected in the processed voice signal, which improves the recognition accuracy for instructions spoken by a user 2 far away from the display device 1.
  • When the display device 1 is a television, the television itself plays sound signals through its speakers while, in order to recognize the instructions spoken by the user 2, it also collects the voice signals of the environment where it is located. The collected voice signal therefore inevitably includes the sound signal played by the display device 1 through the speaker. For the display device 1 to accurately detect the instructions spoken by the user 2, it needs to remove from the voice signal the sound played by the display device 1 itself. This processing is called "echo cancellation" in some technologies.
  • FIG. 3 is a schematic structural diagram of a display device in the related art.
  • The display device 1 shown in FIG. 3 includes: a MIC board 11, a main board 12, a left channel speaker 123, and a right channel speaker 124.
  • The SOC121 on the main board 12 determines the original playback signal to be played by the display device 1 and sends it to the AMP122 through the I2S interface for processing; the AMP122 amplifies the original playback signal, converts it from a single-ended signal into a left channel signal and a right channel signal, and sends these to the left channel speaker 123 and the right channel speaker 124 respectively for playback.
  • The microphone 111 on the MIC board 11 collects the voice signal of the environment where the display device 1 is located and sends it to the codec unit 112 for codec processing; the codec unit 112 converts it into a voice signal in I2S format and sends it to the MCU113 on the MIC board. After further processing by the MCU113, the voice signal is sent to the SOC121 on the main board 12 through the USB interface.
  • Because the SOC121 determines the original playback signal that the display device needs to play and also performs echo cancellation on the received voice signal, the SOC121 can directly use the original playback signal as the echo reference signal.
  • After performing echo cancellation on the voice signal sent by the MCU113, the SOC121 performs keyword detection, instruction recognition, and execution of the function corresponding to the instruction on the echo-cancelled voice signal.
  • In the related art, since the SOC121 in the display device 1 can determine the original playback signal to be played, it uses that signal as the echo reference signal for echo cancellation of the collected voice signal.
  • However, the original playback signal determined by the SOC121 is amplified and subjected to some non-linear processing by the AMP122 before it is played through the speaker. As a result, the signal to be played that the speaker actually plays differs considerably from the original playback signal determined by the SOC121.
  • The echo that needs to be eliminated from the voice signal received by the SOC121 is the signal to be played that the speaker actually plays. If the SOC121 still performs echo cancellation based only on the original playback signal that has not been processed by the AMP122, the reference does not closely match the signal actually played by the speaker. This reduces the echo cancellation effect and may affect the accuracy of subsequent voice signal recognition.
  • The present application provides a display device that collects the sound signal output to the speaker after AMP processing as the echo reference signal and performs echo cancellation on the voice signal with it, thereby improving the echo cancellation effect and, in turn, the accuracy of subsequent recognition of the voice signal.
  • the display device 3 includes: a voice processing circuit 31, a power amplifier 32, a speaker 33, a voice collection circuit 34, and an echo processing circuit 35. Among them, the voice processing circuit 31, the power amplifier 32, and the speaker 33 are connected in sequence, and the voice processing circuit 31 is connected to the voice collecting circuit 34 and the echo processing circuit 35 respectively.
  • the voice processing circuit 31, the power amplifier 32 and the speaker 33 are jointly used to implement the voice playback function of the display device 3.
  • the voice processing circuit 31 may be a circuit on a system-on-chip (SOC) on the motherboard of the display device 1, or a circuit on another processing device with processing capabilities, such as a central processing unit (CPU) or a graphics processing unit (GPU). This application does not limit the specific implementation of the voice processing circuit.
  • the voice processing circuit 31 is used to determine the original playback signal corresponding to the sound to be played by the display device 1 and send the original playback signal to the power amplifier 32 for processing.
  • the processing includes: amplifying the original playback signal.
  • The power amplifier 32 may also be referred to as an operational amplifier, or AMP for short. After the power amplifier 32 receives the original playback signal sent by the voice processing circuit 31, it amplifies that signal to obtain the signal to be played. The amplified signal to be played is sent to the speaker 33, and the speaker 33 in the display device 3 plays it.
  • the voice collection circuit 34, the echo processing circuit 35, and the voice processing circuit 31 are jointly used to implement the voice signal collection and voice signal processing functions of the display device 3.
  • the voice collection circuit 34 may be a microphone (MIC for short) provided in the display device 3; it collects sound signals in the surrounding environment where the display device 3 is located as the voice signals to be processed, and sends the collected voice signal to be processed to the voice processing circuit 31 for subsequent processing.
  • the echo processing circuit 35 is used to collect the to-be-played signal amplified by the power amplifier and output from the power amplifier 32 to the speaker 33, and send the collected to-be-played signal to the voice processing circuit 31 for subsequent processing.
  • After receiving the to-be-processed voice signal sent by the voice collection circuit 34 and the to-be-played signal sent by the echo processing circuit 35, the voice processing circuit 31 uses the to-be-played signal as the echo reference signal and performs echo cancellation processing on the to-be-processed voice signal.
  • the present application does not limit the specific manner in which the voice processing circuit 31 performs echo cancellation; and the echo cancellation can be implemented by a hardware circuit in the voice processing circuit 31, or it can also be implemented by a processor in the voice processing circuit 31 in a way of program software.
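  • Since the application leaves the echo cancellation method open, the following is only a minimal software sketch of one common choice, a normalized least-mean-squares (NLMS) adaptive filter, which is swapped in here for illustration and is not stated in the application. The function name, the tap count, the step size, and the simulated echo path are all assumptions.

```python
import random


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the echo of `ref` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    residual = []
    for d, x in zip(mic, ref):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate         # echo-cancelled output sample
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the reference, normalized by its energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


random.seed(0)
# Reference: the signal to be played, as tapped between amplifier and speaker.
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
# Simulated echo path (an assumption): attenuated copy delayed by two samples.
mic = [0.0, 0.0] + [0.6 * s for s in ref[:-2]]
out = nlms_echo_cancel(mic, ref)
```

    In this toy setup the microphone signal contains only echo, so the residual in `out` shrinks toward zero as the filter converges. With a user's voice mixed in, the residual would instead approximate that voice.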
  • FIG. 5 is a schematic structural diagram of an embodiment of a display device provided by this application.
  • the embodiment shown in FIG. 5 is a specific arrangement of various circuits in the display device based on the embodiment shown in FIG. 4.
  • the voice collection circuit 34 can be provided on the MIC board 301 of the display device 3, and the voice processing circuit 31, the power amplifier 32 and the echo processing circuit 35 can all be provided on the main board 302 of the display device 3.
  • The display device provided in the embodiments shown in FIG. 4 and FIG. 5 can obtain, through the echo processing circuit, the signal to be played that is output from the power amplifier to the speaker, and then use that signal as the echo reference signal to perform echo cancellation on the voice signal to be processed received by the voice collection circuit.
  • Because the to-be-played signal output by the power amplifier has already undergone processing such as amplification and can be played directly through the speaker, the difference between the to-be-played signal collected by the echo processing circuit and the sound signal actually played by the speaker of the display device is small.
  • When the voice processing circuit uses the to-be-played signal output by the power amplifier as the echo reference signal for echo cancellation of the to-be-processed voice signal, the reference more accurately represents the sound signal actually played by the speaker that is contained in the to-be-processed voice signal. The voice processing circuit therefore achieves a better echo cancellation effect, which improves the accuracy of subsequent voice signal recognition.
  • The power amplifiers in some display devices also need to be specially configured.
  • When the display device is a television, the audio signal played by the speakers of the television needs to meet requirements such as audio power. Therefore, the amplitude of the signal to be played that is output from the power amplifier to the speaker is relatively large. For example, if the audio power requirement of a 55-inch television is 10W, the amplitude of the signal to be played can reach 9V.
  • The signal amplitude that the voice processing circuit in the display device can receive, however, is small.
  • When the voice processing circuit is an SOC, the upper limit of the effective value of the signal amplitude that the SOC can receive is generally 1V.
  • The echo processing circuit therefore also needs to perform amplitude reduction processing on the collected signal to be played before sending it to the voice processing circuit for processing.
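  • The size of the required amplitude reduction can be estimated with a short calculation. The sketch below assumes a resistive 8 ohm speaker load, which the application does not state; under that assumption 10W corresponds to roughly 9V RMS, in line with the figure quoted above. The function names are illustrative.

```python
import math


def rms_voltage(power_w, load_ohm):
    """RMS voltage across a resistive load dissipating power_w watts."""
    return math.sqrt(power_w * load_ohm)


def required_gain(v_in_rms, v_limit_rms):
    """Largest linear gain that keeps the output at or below the limit."""
    return v_limit_rms / v_in_rms


v_speaker = rms_voltage(10.0, 8.0)       # 8 ohm load is an assumption
gain = required_gain(v_speaker, 1.0)     # 1 V effective-value limit from the text
gain_db = 20.0 * math.log10(gain)        # the same gain expressed in decibels
```

    Under these assumptions the echo processing circuit has to attenuate the tapped signal by a factor of about nine before it reaches the SOC.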
  • When the speakers specifically include a left-channel speaker and a right-channel speaker, the power amplifier is also used to convert the signal to be played into a differential signal and output it to the speaker for playback.
  • Specifically, the power amplifier converts the signal to be played from a single-ended signal into differential left channel and right channel signals, and then sends them to the left channel speaker and the right channel speaker for playback.
  • FIG. 6 is a schematic structural diagram of a power amplifier. The power amplifier receives the signal to be played sent by the voice processing circuit, which includes a left channel signal and a right channel signal. The power amplifier amplifies the left channel signal and the right channel signal separately and converts each from a single-ended signal into a differential signal; the two differential left channel signals are sent to the left channel speaker for playback, and the two differential right channel signals are sent to the right channel speaker for playback.
  • The left channel signal includes the differential AMP-Lout- and AMP-Lout+ signals, and the right channel signal includes the differential AMP-Rout+ and AMP-Rout- signals.
  • FIG. 7 is a schematic structural diagram of an embodiment of a display device provided by this application.
  • the echo processing circuit 35 also needs to receive the left channel differential signal and the right channel differential signal sent by the power amplifier 32 respectively. After receiving them, it converts the differential signals received from the power amplifier into single-ended signals, and sends the single-ended signal of the left channel and the single-ended signal of the right channel to the voice processing circuit.
  • this embodiment provides a specific implementation of the echo processing circuit, which reduces the amplitude of the signal to be played from the power amplifier and converts it to single-ended; the processed left channel signal and right channel signal are then sent to the voice processing circuit for processing.
  • FIG. 8 is a schematic structural diagram of an embodiment of the echo processing circuit provided by this application. As shown in FIG. 8, the echo processing circuit specifically includes a right channel processing circuit and a left channel processing circuit.
  • the input end of the first operational amplifier N1A can be connected to the AMP-Rout+ signal and the AMP-Rout- signal output by the power amplifier as shown in FIG. 6.
  • the AMP-Rout- signal of the right channel is processed by the first capacitor C11, and then connected to the inverting input terminal IN- of the first operational amplifier N1A through the first input resistor R11; the AMP-Rout+ of the right channel
  • the output terminal OUT of the first operational amplifier N1A is also connected through the first feedback resistor R12.
  • the positive input terminal IN+ of the first operational amplifier N1A is grounded, and the positive input terminal IN+ and the inverting input terminal IN- of the first operational amplifier N1A are "virtually shorted", so that the voltages at the positive input terminal IN+ and the inverting input terminal IN- are both zero.
  • because the input resistance at the inverting input terminal IN- is high, a "virtual open circuit" is formed, so that almost no current flows into or out of the inverting input terminal IN-.
  • the first input resistor R11 and the first feedback resistor R12 are connected in series, and the current flowing through the first input resistor R11 and the first feedback resistor R12 is the same.
  • the ratio of the voltage at the output terminal OUT of the first operational amplifier N1A to the signal voltage applied to the first input resistor R11 is, in magnitude, the ratio of the first feedback resistor R12 to the first input resistor R11.
  • the voltage of the single-ended signal AMP-RIN output by the first operational amplifier N1A is smaller than the voltage of the differential signals AMP-Rout- and AMP-Rout+ at the input of the first operational amplifier N1A. That is, the first operational amplifier N1A converts the differential signal to a single-ended signal and reduces its amplitude at the same time, and the single-ended signal AMP-RIN output by the first operational amplifier N1A can be used as the right channel signal of the signal to be played and sent directly to the voice processing circuit for processing.
  • the input end of the second operational amplifier N1B can be connected to the AMP-Lout+ signal and the AMP-Lout- signal output by the power amplifier as shown in FIG. 6.
  • the AMP-Lout- signal of the left channel is processed by the third capacitor C31 and is connected to the inverting input terminal IN- of the second operational amplifier N1B through the third input resistor R31; the AMP-Lout+ of the left channel
  • the output terminal OUT of the second operational amplifier N1B is also connected through the second feedback resistor R32.
  • the positive input terminal IN+ of the second operational amplifier N1B is grounded, and the positive input terminal IN+ and the inverting input terminal IN- of the second operational amplifier N1B are "virtually shorted", so that the voltages at the positive input terminal IN+ and the inverting input terminal IN- are both zero.
  • because the input resistance at the inverting input terminal IN- is high, a "virtual open circuit" is formed, so that almost no current flows into or out of the inverting input terminal IN-.
  • the third input resistor R31 and the second feedback resistor R32 are connected in series, and the current flowing through the third input resistor R31 and the second feedback resistor R32 is the same.
  • the ratio of the voltage at the output terminal OUT of the second operational amplifier N1B to the signal voltage applied to the third input resistor R31 is, in magnitude, the ratio of the second feedback resistor R32 to the third input resistor R31.
  • the second operational amplifier N1B converts the differential signal to a single-ended signal and reduces its amplitude at the same time, and the single-ended signal AMP-LIN output by the second operational amplifier N1B can be used as the left channel signal of the signal to be played and sent directly to the voice processing circuit for processing.
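  • the gain relationship described for the two operational amplifiers can be sketched with the ideal inverting-amplifier formula, Vout/Vin = -Rf/Rin. The resistor values below are hypothetical (the text does not give them), and the helper names are illustrative only.

```python
def inverting_gain(r_input_ohm: float, r_feedback_ohm: float) -> float:
    """Closed-loop gain of an ideal inverting amplifier: Vout/Vin = -Rf/Rin."""
    return -r_feedback_ohm / r_input_ohm

def feedback_for_gain(r_input_ohm: float, target_magnitude: float) -> float:
    """Feedback resistance that gives |Vout/Vin| = target_magnitude."""
    return r_input_ohm * target_magnitude

# Hypothetical values: a 100 kOhm input resistor (R11 or R31) and a target
# gain magnitude of 0.1, so a 9 V input maps to roughly 0.9 V at the output.
r_in = 100_000.0
r_fb = feedback_for_gain(r_in, 0.1)
v_out = inverting_gain(r_in, r_fb) * 9.0
print(f"feedback resistor: {r_fb:.0f} ohm, output for 9 V input: {v_out:.2f} V")
```

  Choosing the feedback resistor smaller than the input resistor is what makes the stage attenuate rather than amplify.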
  • since the voice collection circuit 34 provided by the present application is only used to collect the voice data to be processed, all subsequent processing of the voice data is executed by the voice processing circuit 31. Therefore, the voice collection circuit 34 provided in the present application may be a MIC array, and the to-be-processed voice signal received by the voice processing circuit is a pulse density modulation (Pulse Density Modulation, PDM) signal collected directly by the MIC array.
  • FIG. 9 is a schematic structural diagram of an embodiment of a display device provided by this application.
  • the voice collection circuit 34 of the display device 3 is a 4MIC array.
  • a 4MIC array is taken as an example of the MIC array.
  • the voice collection circuit 34 may also be a 2MIC, 8MIC, or 16MIC array; only the number of microphones changes and the implementation principle is the same, so the details are not repeated here.
  • FIG. 10 is a schematic structural diagram of an embodiment of the arrangement of the MIC array in the voice collection circuit provided by this application.
  • the 4 MICs of the MIC array can be arranged in order from left to right inside the display device 1. The figure also takes a television as an example of the display device 1.
  • FIG. 11 is a schematic circuit diagram of an embodiment of the voice collection circuit provided by this application, in which the four MICs MIC1, MIC2, MIC3, and MIC4 are arranged in parallel in the circuit structure; MIC1 and MIC3 are denoted as the first group D0, and MIC2 and MIC4 are denoted as the second group D1.
  • the collected PDM signals are used as voice signals to be processed and sent to the voice processing circuit for processing.
  • the four MICs of MIC1, MIC2, MIC3 and MIC4 can be controlled through the PDM_CLK signal.
  • the L/R pin of MIC1 is directly connected to VDD through the resistor R1
  • the L/R pin of MIC1 is set to a high level by VDD.
  • the L/R pin of MIC2 is directly grounded through resistor R879.
  • because resistor R9 is not connected in the figure, the L/R pin of MIC2 is set to a low level.
  • the L/R pin of MIC3 is set to high level
  • the L/R pin of MIC4 is set to low level.
  • the CLK pins of the four MICs MIC1, MIC2, MIC3, and MIC4 are connected to the PDM_CLK signal, which is a square wave. Between a rising edge of the PDM_CLK signal and the next falling edge, MIC1 and MIC3 (the first group D0) collect the voice signal to be processed and send the collected PDM_D0 signal and PDM_D1 signal to the voice processing circuit. Between a falling edge of the PDM_CLK signal and the next rising edge, MIC2 and MIC4 (the second group D1) collect the voice signal to be processed and send the collected PDM_D0 signal and PDM_D1 signal to the voice processing circuit.
  • the to-be-processed voice signals collected by different groups of MICs will be received at different times, and in the embodiments of the present application, the to-be-processed voice signals received by the voice processing circuit are PDM signals.
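  • the time-sharing of the PDM data lines described above can be sketched as follows. The sketch assumes that each data line (PDM_D0 or PDM_D1) is shared by one microphone from each group, that the receiver samples the line once per clock half-cycle, and that a simple block average is an acceptable stand-in for a real decimation filter; all function names are hypothetical.

```python
def split_by_clock_phase(line_bits):
    """Split bits sampled on alternating clock half-cycles into two streams.

    Even indices are assumed to be the half-cycle after a rising edge
    (L/R pin high), odd indices the half-cycle after a falling edge (L/R low).
    """
    return line_bits[0::2], line_bits[1::2]

def pdm_to_pcm(bits, decimation):
    """Crude PDM decoder: average each block of bits and map it to [-1, 1]."""
    samples = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        density = sum(block) / decimation  # fraction of ones in the block
        samples.append(2.0 * density - 1.0)
    return samples

# Hypothetical data line: the high-L/R microphone sends all ones and the
# low-L/R microphone sends all zeros.
pdm_d0 = [1, 0] * 64
mic_high, mic_low = split_by_clock_phase(pdm_d0)
pcm_high = pdm_to_pcm(mic_high, decimation=16)
pcm_low = pdm_to_pcm(mic_low, decimation=16)
print(pcm_high, pcm_low)
```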
  • the voice processing circuit can perform echo cancellation processing on the voice signal to be processed based on the received signal to be played, and the voice processing circuit can further perform operations such as voice recognition and semantic understanding on the voice signal to be processed.
  • FIG. 12 is a schematic diagram of the processing flow of the voice signal to be processed and the signal to be played by the voice processing circuit provided in this application.
  • in the voice processing circuit 31, after the to-be-processed voice signal is received from the voice collection circuit, it is first filtered and then sampled at 16 kHz to obtain the digitized voice signal to be processed; after gain control and delay control, the digitized voice signal to be processed is sent to a direct memory access (Direct Memory Access, DMA) unit for processing.
  • when the preprocessed signal to be played is received from the echo collection circuit, it is first subjected to analog-to-digital conversion and 16 kHz sampling to obtain the digitized signal to be played. The digitized signal to be played is likewise subjected to gain control and delay control and then sent to the DMA unit for processing.
  • the DMA unit is the memory of the voice processing circuit, and its manifestation can be DDR.
  • the two signals obtained by the DMA unit are stored in the static random access memory (Static Random-Access Memory, referred to as SRAM) of the speech processing circuit, and the SRAM can be the hard disk of the speech processing circuit.
  • the voice processing circuit uses the signal to be played stored in the SRAM as an echo reference signal, and performs echo cancellation processing on the voice data to be processed to obtain the final voice data.
  • the voice processing circuit before performing echo cancellation, the voice processing circuit also needs to set the amplitude of the voice signal to be processed and the amplitude of the signal to be played to improve the efficiency of echo cancellation processing.
  • delay control is applied to the voice signal to be processed and the signal to be played because the voice processing circuit receives them from different circuits, and the echo cancellation performed by the voice processing circuit is an asynchronous operation that lags behind the signals collected in real time. Therefore, after the voice processing circuit receives the to-be-processed voice signal and the to-be-played signal, it needs to synchronize the two.
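  • one common way to synchronize two such signals is to estimate their relative delay by cross-correlation and then trim the later one. The text does not say which method the voice processing circuit uses, so the following is only a minimal sketch with hypothetical names and toy data.

```python
def estimate_delay(reference, captured, max_lag):
    """Return the lag (in samples) at which captured best matches reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * c for r, c in zip(reference, captured[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def align(reference, captured, lag):
    """Drop the first `lag` captured samples so both sequences line up."""
    aligned = captured[lag:]
    n = min(len(reference), len(aligned))
    return reference[:n], aligned[:n]

# Toy data: the captured signal is the reference delayed by 3 samples
# and scaled by 0.5 (a stand-in for the gain difference).
reference = [0.0, 1.0, -2.0, 3.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
captured = [0.0, 0.0, 0.0] + [0.5 * x for x in reference]
lag = estimate_delay(reference, captured, max_lag=5)
ref_aligned, cap_aligned = align(reference, captured, lag)
print("estimated delay in samples:", lag)
```

  After alignment, the remaining amplitude difference is what the gain control step would compensate.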
  • after obtaining the voice data after the echo cancellation processing, the voice processing circuit may further detect the user's instruction in that voice data and, once the instruction is detected, execute the function corresponding to it. For example, when this embodiment is applied in the scene shown in FIG. 1 and the display device is a TV, if the user says the instruction "turn off" to the TV, the to-be-processed voice data collected by the TV includes the instruction "turn off".
  • after the TV performs echo cancellation processing on the to-be-processed voice data according to the method provided in any of the foregoing embodiments of this application, it further recognizes the "turn off" instruction in the voice data and executes the action of turning off the TV.
  • the voice processing circuit may also send the voice data after echo cancellation processing to the server on the network side through the communication circuit; the server further detects the user's instruction in the voice data and returns a corresponding message to the voice processing circuit according to the instruction, so that the voice processing circuit performs the corresponding function according to the received message.
  • the voice data after the echo cancellation processing is sent to the server.
  • after the server recognizes the "shutdown" instruction in the voice data, it sends a shutdown message to the TV. Finally, after the TV receives the shutdown message sent by the server, it executes the shutdown action.
  • the display device proposed in this embodiment has a human-machine voice interaction function.
  • FIG. 13 is a front view of the display device of this embodiment
  • FIG. 14 is an exploded view of the structure of the display device of this embodiment.
  • the display device includes a panel 41, a backlight assembly 42, a main board 43, a power supply board 44, a rear case 45, a base 46, and a pickup circuit 47.
  • the panel 41 is used to present images to the user;
  • the backlight assembly 42 is located below the panel 41 and usually consists of optical components that supply a light source of sufficient brightness and uniform distribution, so that the panel 41 can display images normally. The backlight assembly 42 also includes a back plate 4201; the main board 43 and the power supply board 44 are arranged on the back plate 4201, and some convex structures are usually stamped on the back plate 4201.
  • the main board 43 and the power supply board 44 are fixed on the convex structures by screws or hooks; the rear case 45 covers the panel 41 to hide the backlight assembly 42, the main board 43, the power supply board 44, and other display device components for a neat appearance; the base 46 is used to support the display device; and the pickup circuit 47 carries the microphone for picking up far-field voice.
  • the pickup circuit 47 can be arranged on the lower side of the rear case, and roughly located in the middle of the entire display device.
  • the pickup circuit 47 and the rear case 45 are an integrated structure, or can be detachably connected by screws, buckles, etc.
  • a microphone is provided on the remote control to pick up the voice uttered by the user.
  • when the user needs to perform voice interaction with the display device, the user must hold the remote control and speak into it. Therefore, when the remote control is not nearby, the user first has to look for it, and while holding the remote control to speak, the user's hand is occupied and cannot do other things, which is very inconvenient; in particular, some users with hand disabilities cannot fully use the human-machine voice interaction function of the display device.
  • a display device with a far-field sound pickup function appears.
  • the microphone array for picking up the user's voice is set on the display device, so the user's voice can be picked up directly by the display device without the remote control. This frees the user's hands and greatly facilitates use.
  • the far-field pickup is interrupted and the recognition effect deteriorates, which affects the user experience. This is because the user's far-field voice is often accompanied by local sound, such as songs or videos, played by the display device itself through the speakers. The microphone array therefore actually collects both the local sound emitted by the display device's speakers and the user's actual speech, and the purpose of echo cancellation is to remove the local sound from the speakers and keep only the user's voice.
  • the main board SOC of the display device sends the sound signal to be played to the power amplifier, which amplifies it and outputs it to the speaker for playing. Therefore, a playback sound recovery signal is usually led out at the output end of the SOC chip as the reference signal for echo cancellation.
  • the power amplifier performs related processing on the sound signal to be played, so the signal undergoes non-linear changes between the input and the output of the power amplifier. As a result, there is a gap between the collected sound recovery signal and the actual sound of the speaker, and even if the accuracy of the echo cancellation algorithm is high, the actual sound of the speaker cannot be completely eliminated. The problem of incomplete echo cancellation has therefore remained unsolved.
  • the motherboard 43 of the display device in this embodiment includes a SOC (System on Chip), and a power amplifier 550 connected to the SOC.
  • the output terminal of the power amplifier 550 is connected with a speaker 540, and the SOC outputs the audio signal to be played into the power amplifier 550.
  • the power amplifier 550 amplifies the audio signal and performs digital-to-analog conversion processing to drive the speaker 540 to play. Specifically, two or more speakers 540 may be provided.
  • the pickup circuit 47 includes a microphone board 58 on which a microphone array 511 is arranged.
  • the microphone array 511 includes a plurality of microphones arranged at intervals, and the distance between two adjacent microphones is approximately the same.
  • the microphone board 58 is also provided with a first encoder 522 for encoding the playback sound recovery signal obtained from the back end of the power amplifier 550, and a second encoder 512 for encoding the microphone output signal.
  • the main board 43 and the microphone board 58 need to transmit signals through the interface socket.
  • the far-field sound picked up by the microphone array 511 and the playback sound recovery signal acquired from the back end of the power amplifier 550 are all transmitted through the USB interface.
  • the interface socket can be a USB port, or a dedicated USB interface designed with the UAC (USB Audio Class) protocol of the USB as the interface protocol.
  • the embodiment of the present application proposes a far-field speech processing circuit of a device.
  • the device may be a smart terminal, such as a display device.
  • the application of the far-field speech processing circuit to the display device is taken as an example for description.
  • the far-field voice processing circuit includes a speaker 540, a sound pickup circuit 510, a preprocessing circuit 520, and a main control chip (not shown in the figure), and the main control chip integrates an echo processing circuit 531.
  • the speaker 540 is used to play the sound output by the device.
  • the sound pickup circuit 510 is used for picking up far-field sound, which is a mixture of the far-field voice emitted by the user and the sound played by the speaker 540 that travels to the sound pickup circuit 510.
  • the preprocessing circuit 520 is connected to the sound pickup circuit 510 to receive the picked up far-field sound, and the preprocessing circuit 520 is connected to the front end of the speaker 540 to obtain the playback sound recovery signal.
  • the echo processing circuit 531 is connected to the preprocessing circuit 520 to receive the picked-up far-field sound and the playback sound recovery signal, and uses the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound to obtain the far-field voice from the user.
  • the echo processing circuit 531 may be a separate circuit.
  • the user implements human-computer interaction with the display device by speaking, and the display device itself plays music, the voice in videos, and other sounds through the speaker 540 when it is working; therefore, the sound pickup circuit 510 inevitably picks up both the user's far-field voice and the sound played by the speaker 540.
  • the main control chip of the display device transmits the sound signal to be played to the power amplifier (referred to as the power amplifier 550), and the power amplifier 550 amplifies the sound signal to be played to drive the speaker 540 to play the sound.
  • the power amplifier 550 processes the sound signal to be played, so the signal undergoes non-linear changes between the input and the output of the power amplifier 550. Therefore, the signal acquired at the back end of the power amplifier 550, that is, at the front end of the speaker 540, is much closer to the real sound played by the speaker 540.
  • the recovery signal of the playback sound is obtained from the back end of the power amplifier 550 and the front end of the speaker 540, so it is very close to the sound played by the speaker 540 as picked up by the sound pickup circuit 510. Performing echo cancellation on the picked-up far-field sound based on the playback sound recovery signal can therefore greatly reduce the echo mixed into the user's far-field voice (the echo refers to the sound played by the speaker 540) and improve the accuracy of far-field voice recognition, thereby improving the sensitivity of far-field pickup interruption and wake-up and improving the user experience.
  • the "sound" in this embodiment may specifically refer to the sound wave signal corresponding to the sound and the analog signal and digital signal corresponding to the sound.
  • the sound pickup circuit 510 picks up the sound wave signal of the far-field sound, which is processed to form the digital signal of the far-field sound, and then is transmitted to the preprocessing circuit 520.
  • Those skilled in the art have the ability to judge some format changes that occur when sound is transmitted to different circuits.
  • the preprocessing circuit 520 includes a preprocessing circuit 521 and a first encoder 522.
  • the pre-processing circuit 521 may be an MCU, a single-chip microcomputer, or some other digital processing chips with audio interfaces.
  • the preprocessing circuit 521 is an MCU as an example for description.
  • the preprocessing circuit 521 is connected to the front end of the speaker 540 through the first encoder 522, and the first encoder 522 performs analog-to-digital conversion on the playback sound recovery signal.
  • the playback sound recovery signal output at the back end of the power amplifier 550 and the front end of the speaker 540 is an analog signal, so the first encoder 522 performs analog-to-digital conversion on it and transmits the converted playback sound recovery signal to the MCU (that is, the pre-processing circuit 521).
  • the first encoder 522 can perform analog-to-digital conversion on the playback sound recovery signals output by the multiple speakers 540 and convert them into a channel of digital signal output.
  • the output terminal of an audio signal corresponds to "one channel” here, and the multiple analog signals output by the multiple speakers can undergo analog-to-digital conversion in the encoder and output through one channel.
  • the first encoder 522 may specifically adopt the AC108 of X-POWER Company. The AC108 can convert the analog signals output by the two speakers 540 into a channel of digital signal output.
  • the far-field voice processing circuit includes a power amplifier, which is connected between the speaker 540 and the main control chip of the display device.
  • the playback sound recovery signal includes multiple sounds obtained from the front ends of the multiple speakers 540.
  • the far-field voice processing circuit further includes a signal processing circuit 570.
  • the input end of the signal processing circuit 570 is connected to the back end of the power amplifier 550 and the front end of the speaker 540, and the output end of the signal processing circuit 570 is connected to the first encoder 522. That is, the playback sound recovery signal output from the power amplifier 550 is input to the first encoder 522 after the signal processing circuit 570 performs voltage reduction and filtering on it.
  • the signal processing circuit 570 can use a BUCK step-down circuit or a resistor divider circuit to step down the playback sound recovery signal output from the power amplifier 550, and can use an RC filter circuit to filter the stepped-down playback sound recovery signal.
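  • for the resistor divider and RC filter option, the step-down ratio and the filter cutoff follow from standard formulas. The component values below are hypothetical and chosen only to make the arithmetic concrete; the text does not specify them.

```python
import math

def divider_ratio(r_top_ohm: float, r_bottom_ohm: float) -> float:
    """Output/input voltage ratio of an unloaded resistor divider."""
    return r_bottom_ohm / (r_top_ohm + r_bottom_ohm)

def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

# Hypothetical component values (not taken from the text).
ratio = divider_ratio(r_top_ohm=90_000.0, r_bottom_ohm=10_000.0)
cutoff = rc_cutoff_hz(r_ohm=1_000.0, c_farad=10e-9)
print(f"divider ratio: {ratio:.2f}, RC cutoff: {cutoff:.0f} Hz")
```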
  • the sound pickup circuit 510 (refer to FIGS. 16 and 18) includes a microphone array 511, and a second encoder 512 electrically connected to the microphone array 511.
  • the microphone array 511 includes multiple microphones, each of which can pick up far-field sounds; multiple microphones simultaneously pick up far-field sounds to generate multiple analog signals of far-field sounds.
  • the multiple microphones are arranged in a linear array; they collect the original far-field sound signals, convert them into analog electrical signals, and output them to the second encoder 512 at the back end.
  • the second encoder 512 is used for analog-to-digital conversion of the analog signal of the far-field sound.
  • after performing analog-to-digital conversion on the analog signals of the far-field sound, the second encoder 512 is also used to combine the digital signals of the multiple channels of far-field sound into one channel of audio signal for transmission to the MCU.
  • the second encoder 512 can use X-POWER’s AC108.
  • the AC108 contains a four-channel analog-to-digital converter, which can perform analog-to-digital conversion on the four analog signals output by the four microphones and combine them into a one-channel digital signal for output.
  • the one-channel digital audio signal converted by the first encoder 522 and the second encoder 512 may be in the IIS audio format or the TDM audio format.
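  • combining several digitized microphone channels into one stream, as in a TDM-style frame, amounts to interleaving one sample per channel in a fixed slot order. The sketch below illustrates only that idea with hypothetical helper names; it does not model the actual AC108 output format or IIS/TDM bit timing.

```python
def interleave_tdm(channels):
    """Interleave equal-length channels into one stream of frames.

    Each frame holds one sample per channel, in channel order.
    """
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    stream = []
    for frame in zip(*channels):
        stream.extend(frame)
    return stream

def deinterleave_tdm(stream, channel_count):
    """Recover the per-channel sample lists from an interleaved stream."""
    return [stream[slot::channel_count] for slot in range(channel_count)]

# Toy samples for four microphones (values are arbitrary).
mics = [[10, 11, 12], [20, 21, 22], [30, 31, 32], [40, 41, 42]]
tdm = interleave_tdm(mics)
print(tdm)
```

  The receiver only needs the channel count to split the stream back into per-microphone signals.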
  • the linear microphone array 511 should be kept as synchronized as possible during signal transmission, so that the phase difference between the transmitted waveforms does not exceed 180°.
  • a 1 kHz single-frequency signal can be fed into the microphone array 511 for testing, so as to better observe the phase difference between the microphone output signals.
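  • the phase difference between two microphone channels at the test tone can be measured with a single-bin DFT. The sketch below assumes a 16 kHz sampling rate and synthetic sine waves in place of real recordings; the function names are hypothetical.

```python
import cmath
import math

def tone_phasor(samples, tone_hz, sample_rate_hz):
    """Single-bin DFT: complex amplitude of `tone_hz` in `samples`."""
    step = -2j * math.pi * tone_hz / sample_rate_hz
    return sum(x * cmath.exp(step * n) for n, x in enumerate(samples))

def phase_difference_deg(a, b, tone_hz, sample_rate_hz):
    """Phase of channel b relative to channel a, in degrees (-180 to 180)."""
    ratio = tone_phasor(b, tone_hz, sample_rate_hz) / tone_phasor(a, tone_hz, sample_rate_hz)
    return math.degrees(cmath.phase(ratio))

FS = 16_000   # assumed sampling rate
TONE = 1_000  # 1 kHz test tone, as in the text
N = 160       # exactly 10 periods of the tone at 16 kHz

# Synthetic channels: mic_b leads mic_a by 30 degrees.
mic_a = [math.sin(2 * math.pi * TONE * n / FS) for n in range(N)]
mic_b = [math.sin(2 * math.pi * TONE * n / FS + math.radians(30)) for n in range(N)]
diff = phase_difference_deg(mic_a, mic_b, TONE, FS)
print(f"phase difference: {diff:.1f} degrees")
```

  Using a whole number of tone periods keeps the single-bin estimate free of leakage.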
  • the four microphones will correspondingly output four analog signals of the far-field sound to the second encoder 512, and the second encoder 512 will respond to the four analog signals of the far-field sound.
  • the one-channel audio signal substantially includes analog signals output by 4 microphones.
  • CON1-CON4 are interfaces for four microphones.
  • the microphones are placed equidistantly in a straight line, with a spacing of approximately 35 mm between adjacent microphones, forming a linear four-microphone array that meets the spatial requirements of the algorithm.
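  • two quantities that follow directly from the microphone spacing are the largest arrival-time difference between adjacent microphones and the frequency above which spatial aliasing can occur. The sketch assumes a speed of sound of 343 m/s and a 16 kHz sampling rate, neither of which is stated in the text.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed, air at about 20 degrees Celsius
SPACING_M = 0.035           # 35 mm spacing from the text
SAMPLE_RATE_HZ = 16_000     # assumed sampling rate

def max_adjacent_delay_s(spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Largest arrival-time difference between adjacent microphones
    (sound travelling along the array axis)."""
    return spacing_m / speed_m_s

def spatial_aliasing_limit_hz(spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Highest frequency whose half wavelength is at least the spacing."""
    return speed_m_s / (2.0 * spacing_m)

delay_s = max_adjacent_delay_s(SPACING_M)
delay_samples = delay_s * SAMPLE_RATE_HZ
alias_hz = spatial_aliasing_limit_hz(SPACING_M)
print(f"max delay: {delay_s * 1e6:.0f} us ({delay_samples:.2f} samples)")
print(f"spatial aliasing limit: {alias_hz:.0f} Hz")
```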
  • the analog signals of the four microphones are input directly into the second encoder 512, which completes signal processing such as analog-to-digital conversion and low-pass filtering and converts them into a one-channel IIS-format audio signal; the audio signal is then transmitted through the IIS interface to the corresponding IIS interface of the MCU.
  • the pre-processing circuit 521 is coupled to the sound pickup circuit 510 and the front end of the speaker 540 to convert the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit 531.
  • the pre-processing circuit 521 may be an MCU.
  • after the MCU receives the far-field sound signal converted into one channel and the playback sound recovery signal converted into one channel, it synthesizes the two into an audio signal in a format compatible with the echo processing circuit 531, so that the MCU can transmit the processed far-field sound signal and playback sound recovery signal to the echo processing circuit 531.
  • the echo processing circuit 531 is integrated in the SOC of the display device. Therefore, the MCU needs to synthesize the far-field sound signal and the playback sound recovery signal into an audio signal in a format compatible with the SOC.
  • the MCU converts the far-field sound signal and the playback sound recovery signal into a USB data format, so that audio data can be transmitted between the MCU and the SOC over a standard USB data cable using the UAC (USB Audio Class) protocol of the USB interface.
  • the pre-processing circuit 521 is provided to receive the picked-up far-field sound and the playback sound recovery signal, which overcomes the defect that the SOC chips of many existing display devices lack a corresponding audio transmission interface and cannot receive the far-field sound transmitted by the microphone array 511. The technical solution of the present application therefore makes far-field voice human-computer interaction technology easier to adopt on display devices.
  • the MCU before the format conversion, is also used to adjust the phase of the picked-up far-field sound and the playback sound recovery signal, so that the phase of the playback sound recovery signal is ahead of the phase of the picked-up far-field sound.
  • the phase of the playback sound recovery signal is ahead of the phase of the picked up far-field sound within 20 ms, so that the sound played by the speaker 540 can be better eliminated.
  • the MCU is also used to perform low-pass filtering, by means of an algorithm, on the picked-up far-field sound and the playback sound recovery signal to filter out audio with a frequency higher than 8 kHz, so that the far-field sound and playback sound recovery signal finally output by the MCU have no harmonics and no aliasing; this improves the preprocessing of the far-field sound and the playback sound recovery signal and thereby the echo processing effect.
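  • a software low-pass filter with an 8 kHz cutoff can be sketched as a windowed-sinc FIR filter. The text does not say which filter the MCU uses or at what rate it runs, so the 48 kHz sampling rate, the 63-tap Hamming design, and all names below are assumptions for illustration.

```python
import math

def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=63):
    """Hamming-windowed sinc low-pass FIR taps, normalized to unity DC gain."""
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff (cycles per sample)
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        sinc = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(samples, taps):
    """Direct-form FIR convolution; the input is treated as zero before n = 0."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]
        out.append(acc)
    return out

def rms(samples):
    """Root-mean-square level of a sample list."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

FS = 48_000  # assumed MCU-side sampling rate (not given in the text)
taps = lowpass_taps(cutoff_hz=8_000, sample_rate_hz=FS)
low = [math.sin(2 * math.pi * 1_000 * n / FS) for n in range(480)]
high = [math.sin(2 * math.pi * 16_000 * n / FS) for n in range(480)]
low_out = rms(fir_filter(low, taps)[96:])    # skip the start-up transient
high_out = rms(fir_filter(high, taps)[96:])
print(f"1 kHz tone RMS after filtering: {low_out:.3f}")
print(f"16 kHz tone RMS after filtering: {high_out:.5f}")
```

  The 1 kHz tone passes almost unchanged while the 16 kHz tone is strongly attenuated, which is the behaviour the preprocessing step relies on.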
  • the far-field sound and the playback sound recovery signal can first be low-pass filtered by the algorithm, then have the phase between them adjusted, and finally undergo format conversion; alternatively, the phase between the far-field sound and the playback sound recovery signal can be adjusted first, followed by filtering and finally format conversion.
  • after the MCU receives the digitized playback sound recovery signal output by the first encoder 522 at the front end and the digitized far-field sound signal output by the second encoder 512, it first performs low-pass filtering on them to prevent aliasing from affecting the echo cancellation algorithm, and then controls and adjusts the phase difference between the far-field sound signal and the playback sound recovery signal.
  • the processed far-field sound and the playback sound recovery signal are synthesized into a USB format audio signal and transmitted to the back-end SOC for processing.
  • the far-field speech processing circuit further includes an encryption chip 580, which is used to store the key of the far-field speech recognition algorithm, and the MCU is used to communicate with the encryption chip 580. The far-field speech recognition algorithm can be started only when the MCU and the encryption chip 580 communicate successfully. Specifically, after the display device is powered on, the MCU communicates with the encryption chip 580. After the communication succeeds, the far-field voice obtained after the SOC performs echo processing on the far-field sound can be further recognized by the subsequent far-field speech recognition algorithm to analyze the semantics of the far-field voice.
  • the echo processing algorithm is used to remove the part of the picked-up far-field sound that corresponds to the playback sound recovery signal, so as to preserve the far-field voice of the user.
  • Existing echo processing algorithms can all be applied in this embodiment, which is not specifically limited here.
  • the echo cancellation algorithm in the voice service program (voice server APK) integrated in the SOC dynamically determines the energy difference and phase difference between the far-field sound signal picked up by the microphone array 511 and the playback sound signal output from the speaker 540, so that the user's far-field voice can be extracted from the far-field sound signal picked up by the microphone array 511, thereby eliminating the echo interference caused by the sound played by the display device.
  • the echo-processed far-field voice needs further processing to restore, as far as possible, the far-field voice actually uttered by the user. Refer to Figure 20 and Figure 21.
  • the SOC also includes a speech enhancement circuit 633 and a sound source localization circuit 632.
  • the echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to the speech enhancement circuit 633 and the sound source localization circuit 632 respectively; the speech enhancement circuit 633 is connected to the sound source localization circuit 632 to receive the sound source localization result output by the sound source localization circuit 632 and, according to that result, to enhance the echo-cancelled far-field sound.
  • the speech enhancement circuit 633 may include one or more of a beam forming circuit 6331, a de-reverberation circuit 6332, and a noise reduction circuit 6333.
  • the speech enhancement circuit 633 also includes a beam forming circuit 6331, a de-reverberation circuit 6332, and a noise reduction circuit 6333 connected in sequence, which perform beam forming, de-reverberation, and noise reduction on the echo-cancelled far-field sound to generate the far-field voice to be uploaded.
  • the sound source localization circuit 632 is used to identify the source location of the user's far-field voice and to feed this location back to the speech enhancement circuit 633.
  • the speech enhancement circuit 633 performs beam forming based on the determined source location of the user's far-field voice, suppresses the voice in the corresponding area based on the formed beam, and further performs noise reduction to finally obtain the far-field voice to be uploaded.
  • the far-field voice to be uploaded obtained in this embodiment is already very close to the real far-field voice uttered by the user.
  • the SOC also includes a speech engine circuit 634.
  • the speech engine circuit 634 is connected to the output terminal of the speech enhancement circuit 633.
  • the speech engine circuit 634 performs wake-up word recognition on the far-field voice to be uploaded. When a preset wake-up word is recognized, a wake-up event is triggered, and the far-field voice to be uploaded is encoded and transmitted to the designated terminal 660; the speech engine circuit 634 is also used to receive the instruction, corresponding to the far-field voice, returned from the designated terminal 660.
  • the designated terminal 660 may be the cloud, or may be other processing circuits in the display device. Taking uploading to the cloud as an example, voice recognition and semantic understanding are performed in the cloud, and instructions corresponding to far-field sounds are generated through online voice synthesis. By executing the instructions, the entire process of human-machine voice interaction of the display device is completed.
  • the instructions received by the speech engine circuit 634 from the cloud may include voice response messages that answer questions raised by the user, and the voice response messages may be broadcast through the power amplifier 550 and the speaker 540 of the display device.
  • the instruction may also be a control instruction that controls the display device to respond according to the control requirement in the user's far-field voice; the SOC of the display device controls the relevant circuit to respond according to the control instruction. For example, if the control instruction is shutdown, the SOC coordinates the power supply system of the display device to stop supplying power to the display system.
  • the voice to be uploaded is synchronously uploaded to the voice service program (voice server APK), which then reports it to the algorithm provider's cloud service backend to realize closed-loop optimization of wake-up; this can improve the sensitivity of recognizing wake-up words uttered with different timbres and pronunciations.
  • the echo processing circuit 531, the speech enhancement circuit 633, the sound source localization circuit 632, and the speech engine circuit 634 may be separate circuits. In this embodiment, they are all algorithm circuits and are stored in the SOC.
  • the power amplifier 550 performs related processing on the sound signal to be played, so that signal undergoes nonlinear changes in passing through the power amplifier 550; this solution therefore obtains the playback sound recovery signal from the back end of the power amplifier 550, at the front end of the speaker 540. The playback sound recovery signal obtained by the preprocessing circuit 521 is then very close to the speaker sound picked up by the sound pickup circuit 510, so using the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound can greatly reduce the echo interference in the far-field voice uttered by the user and improve the accuracy of far-field voice recognition, thereby improving the sensitivity of interrupt wake-up during far-field sound pickup and improving the user experience.
  • the preprocessing circuit 521 is provided to receive the picked-up far-field sound and the playback sound recovery signal, thereby overcoming the defect that the SOC chips of many existing display devices lack a corresponding interface and cannot receive the far-field sound transmitted by the microphone array 511. The technical solution of the present application therefore promotes the adoption of far-field voice human-computer interaction technology on display devices.
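
The 8 kHz low-pass step described in the list above could be sketched as follows. This is only an illustration and not the MCU's actual algorithm: the Hamming-windowed sinc design, the 48 kHz sample rate, the 31-tap length, and the function names `lowpass_fir` and `fir_filter` are all assumptions made for the example.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=31):
    """Hamming-windowed sinc low-pass taps, normalised to unity gain at DC."""
    fc = cutoff_hz / sample_rate_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2                 # centre of the impulse response
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response (sinc), with its limit at x == 0.
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(samples, taps):
    """Direct-form FIR convolution; samples before the start are treated as zero."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * samples[i - k]
        out.append(acc)
    return out


# Assumed example: 8 kHz cutoff at a 48 kHz sample rate.
taps = lowpass_fir(8000, 48000)
filtered = fir_filter([1.0, 0.0, -1.0, 0.0] * 50, taps)
```

Under these assumptions the same filter would be applied to both the far-field sound and the playback sound recovery signal, so that both streams carry the same filter delay before their phase difference is adjusted.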

Abstract

A display device (3) for obtaining, by means of an echo processing circuit (35), a signal to be played output to a loudspeaker (33) by a power amplifier (32), and then performing, using said signal as an echo reference signal, echo cancellation processing on a voice signal to be processed received by a voice acquisition circuit (34), such that a voice processing circuit (31) can implement a good echo cancellation effect when performing echo cancellation processing on the voice signal to be processed, thereby improving the accuracy of subsequent recognition on the voice signal.

Description

A display device
This patent application claims priority to Chinese patent application No. 201910619184.2, filed on July 10, 2019, and Chinese patent application No. 201910620438.2, filed on July 10, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of electronic technology, and in particular to a display device.
Background
With the continuous development of electronic technology, more and more display devices, such as mobile phones, tablet computers, and televisions, can interact with users by voice. The user speaks the instruction to be executed directly to the display device; the display device collects the voice signal from its surrounding environment through a microphone and, after recognizing that the voice signal contains the instruction spoken by the user, executes the function corresponding to that instruction. To capture more clearly the instructions spoken by a user at a distance from the display device, some display devices also implement a far-field sound pickup function on top of the voice interaction function. After receiving the voice signal, such a display device also needs to perform processing such as filtering, denoising, and echo cancellation on it, and then recognizes the instruction in the processed voice signal, so as to obtain a higher far-field speech recognition accuracy.
In the related art, when the display device performs echo cancellation on the received voice signal, because the echo itself is caused by the original playback signal determined by the System on Chip (SOC) controller in the display device, the display device can directly use the original playback signal in the SOC controller as the echo reference signal to perform echo cancellation on the received voice signal.
However, in the related art, when the original playback signal in the SOC controller is used as the echo reference signal, the echo cancellation performed on the voice signal is less effective, which may in turn affect the accuracy of subsequent recognition of the voice signal. How to improve the echo cancellation effect of the display device is therefore a technical problem that urgently needs to be solved in this field.
Summary of the application
The present application provides a display device to improve the echo cancellation effect when the display device performs echo cancellation.
The display device provided by this application includes: a voice processing circuit, a power amplifier, a speaker, a voice collection circuit, and an echo processing circuit;
wherein the voice processing circuit, the power amplifier, and the speaker are connected in sequence; the voice processing circuit is connected to the voice collection circuit and the echo processing circuit respectively;
the voice processing circuit is used to send an original playback signal to the power amplifier; the power amplifier is used to process the original playback signal and send the resulting signal to be played to the speaker for playback;
the voice collection circuit is used to collect a voice signal to be processed from the environment where the display device is located;
the echo processing circuit is used to obtain the signal to be played that the power amplifier sends to the speaker;
the voice processing circuit is further used to perform echo cancellation on the voice signal to be processed according to the signal to be played.
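
As a rough illustration of echo cancellation against a reference signal, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter. The application does not say which echo cancellation algorithm is used, so NLMS, the tap count, the step size, and the name `nlms_echo_cancel` are assumptions; `reference` stands in for the signal to be played and `mic` for the voice signal to be processed.

```python
def nlms_echo_cancel(mic, reference, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns the residual signal, which ideally contains only the user's voice.
    """
    weights = [0.0] * num_taps   # estimated echo path (FIR coefficients)
    history = [0.0] * num_taps   # most recent reference samples, newest first
    residual = []
    for d, x in zip(mic, reference):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate    # what is left after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights along the error, scaled by input power.
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

The closer `reference` is to what the speaker actually emits, the better a linear filter of this kind can model the echo, which is the motivation the application gives for taking the reference after the power amplifier rather than from the SOC.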
In the above embodiment, the echo processing circuit is further used to preprocess the signal to be played; the voice processing circuit is specifically used to perform echo cancellation on the voice signal to be processed according to the preprocessed signal to be played.
In the above embodiment, the preprocessing includes amplitude reduction.
In the above embodiment, the power amplifier is further used to perform differential processing on the signal to be played to obtain a left channel signal and a right channel signal corresponding to the signal to be played, and to send them to the speaker for playback; the preprocessing further includes conversion to single-ended.
In the above embodiment, the left channel signal includes a left channel positive differential signal and a left channel negative differential signal; the echo processing circuit includes a left channel processing circuit; the right channel signal includes a right channel positive differential signal and a right channel negative differential signal;
the echo processing circuit includes a left channel processing circuit and a right channel processing circuit;
the left channel processing circuit is used to perform amplitude reduction and single-ended conversion on the left channel positive differential signal and the left channel negative differential signal; wherein the left channel processing circuit includes a first input resistor, a first feedback resistor, and a first operational amplifier; the left channel positive differential signal is connected to the non-inverting input of the first operational amplifier, the left channel negative differential signal is connected to the inverting input of the first operational amplifier through the first input resistor, and the output of the first operational amplifier is connected to the inverting input of the first operational amplifier through the first feedback resistor.
In the above embodiment, the right channel signal includes a right channel positive differential signal and a right channel negative differential signal; the echo processing circuit includes a right channel processing circuit;
the right channel processing circuit is used to perform amplitude reduction and single-ended conversion on the right channel positive differential signal and the right channel negative differential signal; wherein the right channel processing circuit includes a second input resistor, a second feedback resistor, and a second operational amplifier; the right channel positive differential signal is connected to the non-inverting input of the second operational amplifier, the right channel negative differential signal is connected to the inverting input of the second operational amplifier through the second input resistor, and the output of the second operational amplifier is connected to the inverting input of the second operational amplifier through the second feedback resistor.
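
The behaviour of the channel processing circuits described above can be approximated with the ideal op-amp model. The sketch below assumes an ideal operational amplifier and only the three components named in the text; the function name `single_ended_output` and the resistor values are hypothetical, and the actual circuit may contain further components (for example a bias network) that this model does not capture.

```python
def single_ended_output(v_pos, v_neg, r_in, r_fb):
    """Ideal op-amp output voltage for the topology described in the text.

    v_pos drives the non-inverting input directly, v_neg reaches the
    inverting input through r_in, and r_fb connects the output back to the
    inverting input. With the two inputs held at the same potential:
        v_out = v_pos * (1 + r_fb / r_in) - v_neg * (r_fb / r_in)
    """
    gain = r_fb / r_in
    return v_pos * (1.0 + gain) - v_neg * gain


# Assumed example values: +1 V / -1 V differential pair, 10 kOhm input
# resistor, 2 kOhm feedback resistor.
v_out = single_ended_output(1.0, -1.0, 10_000.0, 2_000.0)
```

With these assumed values a 2 V differential input becomes a single-ended output of about 1.4 V, so choosing a feedback resistor smaller than the input resistor gives both the single-ended conversion and the amplitude reduction in one stage.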
In the above embodiment, the voice collection circuit consists of a MIC array, and the MIC array includes a plurality of MICs; the voice signal to be processed is a pulse density modulation (PDM) signal collected by the MIC array.
In the above embodiment, the MIC array includes a first MIC, a second MIC, a third MIC, and a fourth MIC arranged in sequence; the first MIC and the third MIC, which are spaced apart, are denoted the first group of MICs, and the second MIC and the fourth MIC, which are spaced apart, are denoted the second group of MICs;
the MIC array collects the voice signal to be processed through the first group of MICs and the second group of MICs alternately, in a repeating cycle.
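
The alternating grouping can be pictured as a simple round-robin schedule. In the sketch below the notion of a discrete capture slot, the labels `MIC1` to `MIC4`, and the function name `capture_schedule` are assumptions made for illustration; the text does not state how the alternation is timed in hardware.

```python
from itertools import cycle, islice

# First group: first and third MIC; second group: second and fourth MIC.
MIC_GROUPS = (("MIC1", "MIC3"), ("MIC2", "MIC4"))


def capture_schedule(num_slots):
    """Return which MIC group is active in each of the first `num_slots` slots."""
    return list(islice(cycle(MIC_GROUPS), num_slots))


schedule = capture_schedule(4)
```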
In the above embodiment, the voice processing circuit is further used to perform sampling and analog-to-digital conversion on the PDM signal.
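
One common way to turn a 1-bit PDM stream into multi-bit samples is to average blocks of bits and keep one value per block. The application does not specify the conversion method, so the block-averaging approach, the decimation factor of 64, and the name `pdm_to_pcm` below are assumptions; practical decoders typically use cascaded filters rather than a plain average.

```python
def pdm_to_pcm(bits, decimation=64):
    """Average each full block of `decimation` PDM bits into one sample in [-1, 1]."""
    pcm = []
    for start in range(0, len(bits) - decimation + 1, decimation):
        block = bits[start:start + decimation]
        density = sum(block) / decimation      # fraction of ones in the block
        pcm.append(2.0 * density - 1.0)        # map density 0..1 to -1..1
    return pcm


# All ones, all zeros, then an even mix of ones and zeros.
samples = pdm_to_pcm([1] * 64 + [0] * 64 + [1, 0] * 32)
```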
In the above embodiment, the voice processing circuit is further used to recognize the instruction in the echo-cancelled voice signal to be processed and to execute the function corresponding to the instruction.
In the above embodiment, the voice processing circuit is further used to send the echo-cancelled voice signal to be processed to a server, so that the server, after recognizing the instruction in the voice signal to be processed, sends an indication message to the voice processing circuit; and to receive the indication message sent by the server and execute the function corresponding to the indication message.
In some embodiments, the present application provides a display device that includes a speaker and a far-field speech processing circuit; the far-field speech processing circuit includes:
a speaker, used to play the sound output by the device;
a sound pickup circuit, used to pick up far-field sound, the far-field sound including the far-field voice uttered by a user and the sound played by the speaker that reaches the sound pickup circuit;
a preprocessing circuit, connected to the sound pickup circuit to receive the picked-up far-field sound, and connected to the front end of the speaker to obtain a playback sound recovery signal;
an echo processing circuit, connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and to use the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound so as to obtain the far-field voice uttered by the user.
In some embodiments, the preprocessing circuit includes:
a front-end processing circuit, coupled to the sound pickup circuit and to the front end of the speaker, to convert the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit.
In some embodiments, the front-end processing circuit is further used to adjust the phases of the picked-up far-field sound and the playback sound recovery signal, so that the playback sound recovery signal leads the picked-up far-field sound in phase by no more than a preset duration.
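
Checking that the playback sound recovery signal leads the picked-up sound by no more than a preset amount requires an estimate of the lead. The sketch below estimates it by cross-correlation over integer sample lags; this technique, the search range, and the names `estimate_lead` and `lead_is_acceptable` are assumptions for illustration, since the application does not describe how the phase relationship is measured.

```python
def estimate_lead(reference, mic, max_lag):
    """Return how many samples `reference` leads `mic`, searching 0..max_lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate the reference with the microphone signal advanced by `lag`.
        score = sum(r * m for r, m in zip(reference, mic[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def lead_is_acceptable(reference, mic, preset_samples, max_lag=64):
    """True if the reference leads the microphone signal by at most `preset_samples`."""
    return estimate_lead(reference, mic, max_lag) <= preset_samples
```

If the estimated lead exceeds the preset value, the reference stream could be delayed by the difference before both streams are passed on for echo cancellation.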
In some embodiments, the preprocessing circuit further includes:
a first encoder, through which the front-end processing circuit is connected to the front end of the speaker; the first encoder performs analog-to-digital conversion on the playback sound recovery signal.
In some embodiments, the display device includes a power amplifier; the power amplifier is connected between the speaker and the echo processing circuit and is used to provide the speaker with the multiple channels of sound output by the device; the playback sound recovery signal includes the multiple channels of sound obtained from the front end of the speaker;
the first encoder is further used to combine the multiple channels of sound obtained from the front end of the speaker.
In some embodiments, the sound pickup circuit includes a microphone array and a second encoder electrically connected to the microphone array, wherein the microphone array is used to pick up the far-field sound and the second encoder is used to perform analog-to-digital conversion on the far-field sound;
the second encoder is further used to combine the multiple channels of far-field sound picked up by the microphone array.
In some embodiments, the far-field speech processing circuit further includes a speech enhancement circuit and a sound source localization circuit, and the echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to the speech enhancement circuit and the sound source localization circuit respectively;
the speech enhancement circuit is connected to the sound source localization circuit to receive the sound source localization result output by the sound source localization circuit and, according to that result, to enhance the echo-cancelled far-field sound so as to generate the far-field voice to be uploaded.
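
One simple way a localization result can drive enhancement is delay-and-sum beamforming. The sketch below assumes that the sound source localization result is expressed as one integer sample delay per microphone channel; that representation, the choice of delay-and-sum, and the name `delay_and_sum` are illustrative assumptions, not the method stated in the application.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the result.

    `channels` is a list of equal-rate sample lists, `delays[i]` is how many
    samples later the source arrives at channel i than at the earliest one.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + n] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


# Three channels that receive the same waveform 0, 1 and 2 samples late.
source = [0, 1, 0, -1, 0, 1, 0, -1]
channels = [source + [0, 0], [0] + source + [0], [0, 0] + source]
enhanced = delay_and_sum(channels, [0, 1, 2])
```

Sound arriving from the localized direction adds coherently after alignment, while sound from other directions is averaged with mismatched delays and is attenuated.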
In some embodiments, the display device further includes a speech engine circuit connected to the output of the speech enhancement circuit; the speech engine circuit performs wake-up word recognition on the far-field voice to be uploaded so that, when a preset wake-up word is recognized, the far-field voice to be uploaded is encoded and transmitted to a designated terminal;
the speech engine circuit is further used to receive the instruction, corresponding to the far-field voice, returned from the designated terminal.
In some embodiments, the display device has a main control chip, and the echo processing circuit, the speech enhancement circuit, the sound source localization circuit, and the speech engine circuit are all integrated in the main control chip.
In some embodiments, a far-field speech processing circuit is provided, which includes:
a sound pickup circuit, used to pick up far-field sound, the far-field sound including the far-field voice uttered by a user and the sound played by the speaker that reaches the sound pickup circuit;
a preprocessing circuit, connected to the sound pickup circuit to receive the picked-up far-field sound, and connected to the front end of the speaker to obtain a playback sound recovery signal;
an echo processing circuit, connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and to use the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound so as to obtain the far-field voice uttered by the user.
In the technical solution of this application, it is taken into account that, to meet the requirements of the device's audio system, the power amplifier performs related processing on the sound signal to be played, so that this signal has already undergone nonlinear changes after passing through the power amplifier. This solution therefore obtains the playback sound recovery signal from the back end of the power amplifier, at the front end of the speaker. Even after nonlinear signal processing such as equalization and amplification in the power amplifier, the playback sound recovery signal obtained by the preprocessing circuit is very close to the speaker sound picked up by the sound pickup circuit. Using this playback sound recovery signal to perform echo cancellation on the picked-up far-field sound can therefore greatly reduce the echo interference in the far-field voice uttered by the user and improve the accuracy of far-field voice recognition, thereby improving the sensitivity of interrupt wake-up during far-field sound pickup and improving the user experience.
In addition, this embodiment provides a preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, thereby overcoming the defect that the SOC chips of many existing display devices lack a corresponding interface and cannot receive the far-field sound transmitted by the microphone array. The technical solution of this application therefore promotes the adoption of far-field voice human-computer interaction technology on display devices.
In addition, the display device provided by this application can obtain, through the echo processing circuit, the signal to be played that the power amplifier outputs to the speaker, and then use that signal as the echo reference signal to perform echo cancellation on the voice signal to be processed received by the voice collection circuit. Because the signal to be played represents fairly accurately the sound actually played by the speaker that is present in the voice signal to be processed, the voice processing circuit can achieve a better echo cancellation effect when performing echo cancellation on the voice signal to be processed, which in turn improves the accuracy of subsequent recognition of the voice signal.
Description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic diagram of an application scenario of this application;
Figure 2 is a schematic flowchart of the processing of a voice signal by the display device of this application;
Figure 3 is a schematic structural diagram of a display device in the related art;
Figure 4 is a schematic structural diagram of an embodiment of the display device provided by this application;
Figure 5 is a schematic structural diagram of an embodiment of the display device provided by this application;
Figure 6 is a schematic structural diagram of a power amplifier;
Figure 7 is a schematic structural diagram of an embodiment of the display device provided by this application;
Figure 8 is a schematic structural diagram of an embodiment of the echo processing circuit provided by this application;
Figure 9 is a schematic structural diagram of an embodiment of the display device provided by this application;
Figure 10 is a schematic structural diagram of an embodiment of the arrangement of the MIC array in the voice collection circuit provided by this application;
Figure 11 is a schematic circuit diagram of an embodiment of the voice collection circuit provided by this application;
Figure 12 is a schematic flowchart of the processing, by the voice processing circuit provided by this application, of the voice signal to be processed and the signal to be played;
Figure 13 is a front view of an embodiment of the display device of this application;
Figure 14 is a partial exploded view of the structure in Figure 13;
Figure 15 is a circuit architecture diagram of the display device of this application;
Figure 16 is a circuit connection block diagram of an embodiment of the far-field speech processing circuit of this application;
Figure 17 is a circuit connection block diagram of another embodiment of the far-field speech processing circuit of this application;
Figure 18 is a circuit connection block diagram of yet another embodiment of the far-field speech processing circuit of this application;
Figure 19 is a circuit diagram of the interface between the microphone array and the second encoder;
Figure 20 is a functional block diagram of an embodiment of the main control chip;
Figure 21 is a partial circuit connection block diagram of an embodiment of the far-field speech processing circuit of this application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", "third", "fourth", etc. (if any) in the specification and claims of this application and the above-mentioned drawings are used to distinguish similar objects, without having to use To describe a specific order or sequence. It should be understood that the data used in this way can be interchanged under appropriate circumstances, so that the embodiments of the present application described herein, for example, can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the clearly listed Those steps or units may include other steps or units that are not clearly listed or are inherent to these processes, methods, products, or equipment.
FIG. 1 is a schematic diagram of an application scenario of this application. The embodiments of this application apply to a display device 1 with a voice interaction function; that is, the display device 1 can execute the function corresponding to an instruction spoken by the user 2. The display device 1 may be a mobile phone, a tablet computer, a notebook computer, a television, or another smart device with signal processing or data processing capability, such as a smart watch, a smart speaker, or a smart appliance. In the application scenario shown in FIG. 1, the display device 1 is a television by way of example and not limitation.
In some embodiments, the display device 1 processes a voice signal as shown in FIG. 2, which is a schematic diagram of the voice signal processing flow of the display device of this application. The display device 1 collects sound from its surrounding environment through a microphone to obtain a voice signal and then performs detection on the collected voice signal. Keyword detection is used to find, in the voice signal, the instructions issued by the user 2 for the display device to execute, such as "power on", "change channel", "volume up/down", and "power off". The display device 1 then executes the function corresponding to the recognized instruction. For example, after recognizing that the instruction in the collected voice signal is "power off", the display device 1 shuts down.
In the application scenario shown in FIG. 1, the display device 1 can usually detect the instructions spoken by the user 2 from the voice signal with a high recognition rate. However, when the user 2 is far from the display device 1, the instruction spoken by the user 2 is very weak in the collected voice signal, and the voice signal also contains interference such as noise and echo, so the recognition rate for instructions spoken by a distant user 2 is low. Therefore, some display devices 1 also provide a far-field sound pickup function: after receiving a voice signal that contains an instruction spoken by a distant user 2, the device applies filtering, denoising, echo cancellation, and similar processing to the voice signal, and then detects the instruction of the user 2 in the processed signal, so as to improve the recognition accuracy for instructions spoken by a user 2 far from the display device 1.
When the display device 1 is a television, the television itself is playing sound through its speakers. At the same time, in order to recognize instructions spoken by the user 2, the display device 1 collects the voice signal of its environment, and the collected voice signal inevitably contains the sound played by the display device 1 through its speakers. Therefore, to accurately detect the instruction spoken by the user 2, the display device 1 needs to remove from the voice signal the sound that the display device 1 itself plays. In some technologies this process is called "echo cancellation".
In the related art, the display device 1 usually uses the original playback sound signal determined by the SOC controller as the echo reference signal for echo cancellation of the collected voice signal. For example, FIG. 3 is a schematic structural diagram of a display device in the related art. The display device 1 shown in FIG. 3 includes a MIC board 11, a main board 12, a left channel speaker 123, and a right channel speaker 124. The SOC 121 on the main board 12 determines the original playback signal that the display device 1 needs to play and sends it to the AMP 122 through an I2S interface for processing. The AMP 122 then amplifies the original playback signal, converts the single-ended signal into a left channel signal and a right channel signal, and sends them to the left channel speaker 123 and the right channel speaker 124, respectively, for playback.
Meanwhile, the microphone 111 on the MIC board 11 collects the voice signal of the environment in which the display device 1 is located and sends it to the codec unit 112 for encoding and decoding. The signal is converted into an I2S-format voice signal and fed to the MCU 113 on the MIC board. After further processing by the MCU 113, the voice signal is sent to the SOC 121 on the main board 12 through a USB interface.
Because the SOC 121 both determines the original playback signal that the display device needs to play and performs echo cancellation on the received voice signal, the SOC 121 can directly use the original playback signal as the echo reference signal. After performing echo cancellation on the voice signal sent by the MCU 113, the SOC 121 performs keyword detection, instruction recognition, and execution of the function corresponding to the instruction, based on the echo-cancelled voice signal.
In summary, in the related art shown in FIG. 3, once the SOC 121 in the display device 1 has determined the original playback signal to be played, it can use that signal as the echo reference signal to perform echo cancellation on the collected voice signal. However, the original playback signal determined by the SOC 121 is amplified by the AMP 122 and undergoes some nonlinear processing to yield the actual to-be-played signal, and only this signal output by the AMP 122 is played through the speaker. As a result, the to-be-played signal that is actually sent to and played by the speaker after processing by the AMP 122 differs considerably from the original playback signal determined by the SOC 121.
The echo that actually needs to be removed from the voice signal received by the SOC 121 is the to-be-played signal actually played by the speaker. If the SOC 121 still performs echo cancellation based only on its internal original playback signal, which has not been processed by the AMP 122, that internal signal does not reproduce well the signal actually played by the speaker. This degrades the echo cancellation performed by the SOC 121 on the voice signal and may in turn reduce the accuracy of subsequent recognition of the voice signal.
Therefore, how to improve the echo cancellation effect of a display device, and thereby the accuracy of voice signal recognition, is a problem to be solved in this field. The present application provides a display device that collects the sound signal output to the speaker after AMP processing and uses it as the echo reference signal for echo cancellation of the voice signal, thereby improving the echo cancellation effect and further improving the accuracy of subsequent recognition of the voice signal.
The technical solution of the present application is described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments.
FIG. 4 is a schematic structural diagram of an embodiment of the display device provided by this application. The display device shown in FIG. 4 can be applied to the application scenario shown in FIG. 1, and this embodiment is described using a television as an example. As shown in FIG. 4, the display device 3 provided in this embodiment includes a voice processing circuit 31, a power amplifier 32, a speaker 33, a voice collection circuit 34, and an echo processing circuit 35. The voice processing circuit 31, the power amplifier 32, and the speaker 33 are connected in sequence, and the voice processing circuit 31 is connected to the voice collection circuit 34 and to the echo processing circuit 35.
In some embodiments, in function A of the display device 3 shown in FIG. 4, the voice processing circuit 31, the power amplifier 32, and the speaker 33 together implement the sound playback function of the display device 3.
In some embodiments, the voice processing circuit 31 may be a circuit on a system on chip (SOC) on the main board of the display device 1, or a circuit on another processing device with processing capability, such as a central processing unit (CPU) or a graphics processing unit (GPU). This application does not limit the specific implementation of the voice processing circuit. The voice processing circuit 31 determines the original playback signal corresponding to the sound to be played by the display device 1 and sends the original playback signal to the power amplifier 32 for processing, which includes amplifying the original playback signal.
The power amplifier 32 may also be called an operational amplifier, a power amp, or the like, abbreviated AMP. After receiving the original playback signal sent by the voice processing circuit 31, the power amplifier 32 amplifies it and sends the resulting to-be-played signal to the speaker 33, so that the speaker 33 of the display device 3 plays the to-be-played signal amplified by the power amplifier.
In some embodiments, in function B of the display device 3 shown in FIG. 4, the voice collection circuit 34, the echo processing circuit 35, and the voice processing circuit 31 together implement the voice signal collection and voice signal processing functions of the display device 3.
In some embodiments, the voice collection circuit 34 may be a microphone (MIC) provided in the display device 3. The voice collection circuit 34 collects the sound in the environment surrounding the display device 3 as the to-be-processed voice signal and sends it to the voice processing circuit 31 for subsequent processing. The echo processing circuit 35 collects the to-be-played signal that the power amplifier 32 outputs to the speaker 33 after amplification and sends the collected to-be-played signal to the voice processing circuit 31 for subsequent processing. After receiving the to-be-processed voice signal from the voice collection circuit 34 and the to-be-played signal from the echo processing circuit 35, the voice processing circuit 31 uses the to-be-played signal as the echo reference signal to perform echo cancellation on the to-be-processed voice signal.
This application does not limit the specific manner in which the voice processing circuit 31 performs echo cancellation. The echo cancellation may be implemented by a hardware circuit in the voice processing circuit 31, or by a processor in the voice processing circuit 31 running program software.
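Since the echo cancellation method is left open, the following is only a minimal software sketch of one common technique, a normalized least-mean-squares (NLMS) adaptive filter, which the application itself does not name. The function name, tap count, step size, and the toy echo path are all assumptions made for illustration. The reference input stands for the to-be-played signal supplied by the echo processing circuit, and the microphone input stands for the to-be-processed voice signal.

```python
import random

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Remove an adaptively estimated echo of `ref` from `mic` (illustrative NLMS)."""
    w = [0.0] * taps      # adaptive weights: running estimate of the echo path
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(mic, ref):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))    # estimated echo sample
        e = d - y                                      # echo-cancelled sample
        norm = sum(b * b for b in buf) + eps           # reference power (normalization)
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

def power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

# Toy data (assumed): the microphone hears only the reference through a short echo path.
random.seed(0)
ref = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo_path = [0.6, 0.3, -0.1]
mic = [sum(h * ref[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
       for n in range(len(ref))]
residual = nlms_echo_cancel(mic, ref)
```

In this toy setup the residual power falls far below the microphone power once the filter has converged. The closer the reference is to what the speaker actually emits, the less mismatch the filter has to absorb, which is the motivation for taking the reference after the power amplifier.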
FIG. 5 is a schematic structural diagram of an embodiment of the display device provided by this application. The embodiment shown in FIG. 5 is a specific arrangement of the circuits of the display device based on the embodiment shown in FIG. 4. The voice collection circuit 34 may be arranged on the MIC board 301 of the display device 3, and the voice processing circuit 31, the power amplifier 32, and the echo processing circuit 35 may all be arranged on the main board 302 of the display device 3.
In summary, the display device provided in the embodiments shown in FIG. 4 and FIG. 5 obtains, through the echo processing circuit, the to-be-played signal that the power amplifier outputs to the speaker, and uses it as the echo reference signal to perform echo cancellation on the to-be-processed voice signal received from the voice collection circuit. Because the to-be-played signal output by the power amplifier has already been amplified and otherwise processed and can be played directly by the speaker, the to-be-played signal collected by the echo processing circuit differs only slightly from the sound actually played by the speaker of the display device.
Therefore, when the voice processing circuit uses the to-be-played signal output by the power amplifier as the echo reference signal for echo cancellation of the to-be-processed voice signal, the reference represents fairly accurately the sound actually played by the speaker that is contained in the to-be-processed voice signal. The voice processing circuit can thus achieve better echo cancellation, which improves the accuracy of subsequent recognition of the voice signal.
In some embodiments, in specific implementations of the above embodiments, the power amplifier in some display devices requires special provisions. For example, if the display device is a television, the sound played by the television speaker must meet requirements such as sound power. The to-be-played signal that the power amplifier outputs to the speaker therefore has a large amplitude. For example, if the sound power requirement of a 55-inch television is 10 W, the amplitude of the to-be-played signal can reach 9 V. However, the voice processing circuit in the display device can accept only a small signal amplitude. For example, when the voice processing circuit is an SOC, the upper limit of the effective (RMS) value of the signal amplitude that the SOC can accept is generally 1 V.
In this case, sending the to-be-played signal collected by the echo processing circuit directly to the voice processing circuit could damage the voice processing circuit. Therefore, in this embodiment, the echo processing circuit must also reduce the amplitude of the collected to-be-played signal before sending it to the voice processing circuit for processing.
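The figures above can be checked with a short calculation. This is only a sketch: the 8-ohm speaker impedance is an assumption that the application does not state, and the function names are illustrative. With that assumption, 10 W corresponds to roughly 9 V RMS, and bringing that under a 1 V RMS limit requires an attenuation of about 19 dB.

```python
import math

def rms_voltage(power_w, load_ohm):
    """RMS voltage across a resistive load dissipating power_w, from P = V^2 / R."""
    return math.sqrt(power_w * load_ohm)

def max_gain(v_in_rms, v_limit_rms):
    """Largest linear gain that keeps the output at or below v_limit_rms."""
    return v_limit_rms / v_in_rms

SPEAKER_OHM = 8.0                           # assumed speaker impedance, not given in the application
v_speaker = rms_voltage(10.0, SPEAKER_OHM)  # 10 W sound power from the example above
gain = max_gain(v_speaker, 1.0)             # 1 V RMS input limit from the example above
gain_db = 20.0 * math.log10(gain)

print(f"speaker drive: {v_speaker:.2f} V RMS")
print(f"required gain: {gain:.3f} ({gain_db:.1f} dB)")
```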
In addition, in some display devices the speaker specifically includes a left channel speaker and a right channel speaker, and the power amplifier is further configured to convert the to-be-played signal into differential signals before outputting them to the speakers. Specifically, the power amplifier converts the to-be-played signal from single-ended signals into a differential left channel signal and a differential right channel signal, and sends them to the left channel speaker and the right channel speaker, respectively, for playback.
For example, FIG. 6 is a schematic structural diagram of a power amplifier. The power amplifier receives the to-be-played signal sent by the voice processing circuit, which includes a left channel signal and a right channel signal. The power amplifier amplifies the left channel signal and the right channel signal separately and converts each single-ended signal into a differential signal. It then sends the two differential left channel signals to the left channel speaker and the two differential right channel signals to the right channel speaker for playback. As shown in FIG. 6, the left channel signal consists of the differential AMP-Lout- and AMP-Lout+ signals, and the right channel signal consists of the differential AMP-Rout+ and AMP-Rout- signals.
Accordingly, FIG. 7 is a schematic structural diagram of an embodiment of the display device provided by this application. In the embodiment shown in FIG. 7, the echo processing circuit 35 also needs to receive the left channel differential signal and the right channel differential signal output by the power amplifier 32, convert the differential signals received from the power amplifier into single-ended signals, and send the single-ended left channel signal and the single-ended right channel signal to the voice processing circuit.
On this basis, because the echo processing circuit must perform both amplitude reduction and differential-to-single-ended conversion, this embodiment provides a specific implementation of the echo processing circuit. The echo processing circuit reduces the amplitude of the to-be-played signal output by the power amplifier, converts it to single-ended form, and sends the processed left channel signal and right channel signal to the voice processing circuit for processing.
In some embodiments, FIG. 8 is a schematic structural diagram of an embodiment of the echo processing circuit provided by this application. As shown in FIG. 8, the echo processing circuit specifically includes a right channel processing circuit and a left channel processing circuit.
In the right channel processing circuit, the inputs of the first operational amplifier N1A can be connected to the AMP-Rout+ and AMP-Rout- signals output by the power amplifier shown in FIG. 6. The right channel AMP-Rout- signal passes through the first capacitor C11 for DC blocking and is then connected to the inverting input terminal IN- of the first operational amplifier N1A through the first input resistor R11. The right channel AMP-Rout+ signal passes through the second capacitor C21 for DC blocking and is then connected to the non-inverting input terminal IN+ of the first operational amplifier N1A through the second resistor R21. In addition, after DC blocking by the first capacitor C11, the right channel AMP-Rout- signal is also connected to the output terminal OUT of the first operational amplifier N1A through the first feedback resistor R12.
In the right channel processing circuit, the non-inverting input terminal IN+ of the first operational amplifier N1A is grounded, so there is a "virtual short" between the non-inverting input terminal IN+ and the inverting input terminal IN- of the first operational amplifier N1A, and the voltages at IN+ and IN- are both 0. At the same time, when the input resistance R11 at the inverting input terminal IN- is high, a "virtual open" is formed, so that almost no current flows into or out of the inverting input terminal IN-. The first input resistor R11 and the first feedback resistor R12 are then effectively in series, and the same current flows through both. From Ohm's law and series-resistor voltage division, the ratio of the voltage at the output terminal OUT of the first operational amplifier N1A to the input voltage applied to the inverting input terminal IN- through R11 equals, in magnitude, the ratio of the first feedback resistor R12 to the first input resistor R11.
Thus, when the resistance of the first feedback resistor R12 is smaller than that of the first input resistor R11, the voltage of the single-ended signal AMP-RIN output by the first operational amplifier is lower than the voltage of the differential signals AMP-Rout- and AMP-Rout+ at the input of the first operational amplifier N1A. In other words, the first operational amplifier N1A performs both differential-to-single-ended conversion and amplitude reduction, and its single-ended output AMP-RIN can be fed directly to the voice processing circuit as the right channel signal of the to-be-played signal.
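The gain relation used above can be written out step by step. This is a sketch for an idealized operational amplifier; it treats the DC-blocking capacitor C11 as a short circuit at audio frequencies, which is an assumption, and the symbol V_in (the signal applied ahead of R11) is introduced here for illustration only.

```latex
\begin{aligned}
V_{-} &\approx V_{+} = 0
  && \text{(virtual short, IN+ grounded)} \\
I_{R11} &= \frac{V_{\mathrm{in}} - V_{-}}{R_{11}} = \frac{V_{\mathrm{in}}}{R_{11}}
  && \text{(virtual open: no current into IN-)} \\
V_{\mathrm{out}} &= V_{-} - I_{R11}\,R_{12} = -\frac{R_{12}}{R_{11}}\,V_{\mathrm{in}}
  && \text{(same current flows through } R_{12}\text{)} \\
\left|\frac{V_{\mathrm{out}}}{V_{\mathrm{in}}}\right| &= \frac{R_{12}}{R_{11}} < 1
  && \text{when } R_{12} < R_{11}
\end{aligned}
```

With hypothetical values R11 = 100 kΩ and R12 = 10 kΩ, the magnitude of the gain is 0.1, so a 9 V input would be reduced to 0.9 V. These resistor values are not taken from the application.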
In the left channel processing circuit, the inputs of the second operational amplifier N1B can be connected to the AMP-Lout+ and AMP-Lout- signals output by the power amplifier shown in FIG. 6. The left channel AMP-Lout- signal passes through the third capacitor C31 for DC blocking and is then connected to the inverting input terminal IN- of the second operational amplifier N1B through the third input resistor R31. The left channel AMP-Lout+ signal passes through the fourth capacitor C41 for DC blocking and is then connected to the non-inverting input terminal IN+ of the second operational amplifier N1B through the fourth resistor R41. In addition, after DC blocking by the third capacitor C31, the left channel AMP-Lout- signal is also connected to the output terminal OUT of the second operational amplifier N1B through the second feedback resistor R32.
In the left channel processing circuit, the non-inverting input terminal IN+ of the second operational amplifier N1B is grounded, so there is a "virtual short" between the non-inverting input terminal IN+ and the inverting input terminal IN- of the second operational amplifier N1B, and the voltages at IN+ and IN- are both 0. At the same time, when the input resistance R31 at the inverting input terminal IN- is high, a "virtual open" is formed, so that almost no current flows into or out of the inverting input terminal IN-. The third input resistor R31 and the second feedback resistor R32 are then effectively in series, and the same current flows through both. From Ohm's law and series-resistor voltage division, the ratio of the voltage at the output terminal OUT of the second operational amplifier N1B to the input voltage applied to the inverting input terminal IN- through R31 equals, in magnitude, the ratio of the second feedback resistor R32 to the third input resistor R31.
Thus, when the resistance of the second feedback resistor R32 is smaller than that of the third input resistor R31, the voltage of the single-ended signal AMP-LIN output by the second operational amplifier is lower than the voltage of the differential signals AMP-Lout- and AMP-Lout+ at the input of the second operational amplifier N1B. In other words, the second operational amplifier N1B performs both differential-to-single-ended conversion and amplitude reduction, and its single-ended output AMP-LIN can be fed directly to the voice processing circuit as the left channel signal of the to-be-played signal.
In some embodiments, based on the above embodiments, the voice collection circuit 34 provided by this application is used only to collect the to-be-processed voice data, and all subsequent processing of the voice data is performed by the voice processing circuit 31. Therefore, the voice collection circuit 34 provided by this application may be a MIC array, and the to-be-processed voice signal received by the voice processing circuit is the pulse density modulation (PDM) signal collected directly by the MIC array.
For example, FIG. 9 is a schematic structural diagram of an embodiment of the display device provided by this application. In the embodiment shown in FIG. 9, the voice collection circuit 34 of the display device 3 is a 4-MIC array. FIG. 9 uses a 4-MIC array as an example. In other possible implementations, the voice collection circuit 34 may have 2, 8, or 16 MICs; only the number differs, the implementation principle is the same, and the details are not repeated. Based on a MIC array of 4 MICs, FIG. 10 is a schematic structural diagram of an embodiment of the arrangement of the MIC array in the voice collection circuit provided by this application. In FIG. 10, the 4 MICs of the array are arranged in order from left to right inside the display device 1, which is again shown as a television by way of example.
In some embodiments, FIG. 11 is a circuit diagram of an embodiment of the voice collection circuit provided by this application. The four MICs, MIC1, MIC2, MIC3, and MIC4, are arranged in parallel in the circuit; MIC1 and MIC3 are denoted the first group D0, and MIC2 and MIC4 are denoted the second group D1. In this embodiment, the first group D0 and the second group D1 take turns collecting voice data cyclically, and the collected PDM signals are sent to the voice processing circuit as the to-be-processed voice signal.
In some embodiments, the four MICs, MIC1, MIC2, MIC3, and MIC4, can be controlled by the PDM_CLK signal. Because the L/R pin of MIC1 is connected directly to VDD through the resistor R1, when the resistor R7 in the figure is not fitted, the L/R pin of MIC1 is pulled high by VDD. The L/R pin of MIC2 is connected directly to ground through the resistor R879, so when the resistor R9 in the figure is not fitted, the L/R pin of MIC2 is set low. By the same principle, the L/R pin of MIC3 is set high and the L/R pin of MIC4 is set low.
When the CLK pins of the four MICs (MIC1, MIC2, MIC3, and MIC4) receive the square-wave PDM_CLK signal, then between a rising edge of the PDM_CLK signal and the next falling edge, MIC1 and MIC3 (the first group D0) collect the voice signal to be processed and send the resulting PDM_D0 and PDM_D1 signals to the voice processing circuit. Between a falling edge of the PDM_CLK signal and the next rising edge, MIC2 and MIC4 (the second group D1) collect the voice signal to be processed and send the resulting PDM_D0 and PDM_D1 signals to the voice processing circuit. The voice processing circuit therefore receives, at different times, the voice signals to be processed that were collected by different groups of MICs. In the embodiments of this application, the voice signal to be processed that the voice processing circuit receives is a PDM signal.
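The clock-phase sharing described above can be illustrated with a minimal sketch. It assumes, hypothetically, that two microphones with opposite L/R settings share one data line and that the line is sampled once per clock half-period, starting in the clock-high phase; the function name and the sample data are illustrative and do not come from this application.

```python
def demux_pdm_line(bits):
    """Split one shared PDM data line into its two microphones.

    `bits` holds the line state sampled once per clock half-period,
    starting with a clock-high phase: [high0, low0, high1, low1, ...].
    """
    high_phase = bits[0::2]  # microphone whose L/R pin is high (e.g. MIC1)
    low_phase = bits[1::2]   # microphone whose L/R pin is low (e.g. MIC2)
    return high_phase, low_phase


# Hypothetical samples from one shared data line.
line_bits = [1, 0, 1, 0, 1, 1, 0, 0]
mic_high, mic_low = demux_pdm_line(line_bits)
```

Each returned list is still a 1-bit PDM stream; turning it into PCM samples would require a decimation filter, which is outside this sketch.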
In the embodiments of this application, in addition to performing echo cancellation on the voice signal to be processed based on the received signal to be played, the voice processing circuit can further perform operations such as voice recognition and semantic understanding on the voice signal to be processed.
For example, FIG. 12 is a schematic diagram of the flow in which the voice processing circuit provided by this application processes the voice signal to be processed and the signal to be played. After the voice processing circuit 31 receives the voice signal to be processed from the voice collection circuit, it first filters the signal, then performs 16k sampling to obtain a digitized voice signal to be processed, then applies gain control and delay control to the digitized signal, and finally sends it to a direct memory access (DMA) unit for processing. After the voice processing circuit 31 receives the preprocessed signal to be played from the echo collection circuit, it first performs analog-to-digital conversion and 16k sampling on that signal to obtain a digitized signal to be played, then likewise applies gain control and delay control, and sends the result to the DMA unit for processing.
The DMA unit is the memory of the voice processing circuit and may take the form of DDR. The DMA unit stores the two signals it obtains in the static random-access memory (SRAM) of the voice processing circuit; the SRAM may serve as the hard disk, that is, the storage, of the voice processing circuit. Finally, the voice processing circuit uses the signal to be played stored in the SRAM as the echo reference signal and performs echo cancellation on the voice data to be processed to obtain the final voice data.
It should be noted that gain control is applied to the voice signal to be processed and the signal to be played because the closer the strengths of the two signals are, the easier it is for the echo cancellation algorithm to cancel the echo. Therefore, in this embodiment, before performing echo cancellation, the voice processing circuit also sets the amplitude of the voice signal to be processed and the amplitude of the signal to be played, to improve the efficiency of echo cancellation. Delay control is applied to the two signals because the voice processing circuit receives the voice signal to be processed and the signal to be played from different circuits, and because echo cancellation is an asynchronous operation that lags the signals collected in real time. Therefore, after receiving the voice signal to be processed and the signal to be played, the voice processing circuit needs to synchronize the two.
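The gain control and delay control described above can be sketched as follows. This is only one possible approach, assuming RMS matching for gain and a brute-force cross-correlation search for delay; this application does not specify either method, and all function names and sample values are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def match_gain(signal, target):
    """Scale `signal` so that its RMS level equals that of `target`."""
    level = rms(signal)
    if level == 0:
        return list(signal)
    gain = rms(target) / level
    return [v * gain for v in signal]


def estimate_delay(mic, ref, max_lag):
    """Lag in samples (0..max_lag) at which `ref` best matches `mic`."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(ref), len(mic) - lag)
        score = sum(mic[lag + i] * ref[i] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Hypothetical data: the microphone hears the reference 3 samples late
# and at half the amplitude.
ref = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0]
mic = [0.0, 0.0, 0.0] + [0.5 * v for v in ref]

lag = estimate_delay(mic, ref, max_lag=5)
ref_aligned = ([0.0] * lag + ref)[:len(mic)]   # delay the reference by `lag`
mic_scaled = match_gain(mic, ref_aligned)      # bring both to the same level
```

After these two steps the reference and the microphone signal have comparable amplitude and are time-aligned, which is the precondition the paragraph above gives for efficient echo cancellation.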
In some embodiments, building on FIG. 12, after obtaining the echo-cancelled voice data, the voice processing circuit can further detect a user instruction in that data and, once an instruction is detected, execute the function corresponding to it. For example, when this embodiment is applied in the scene shown in FIG. 1 and the display device is a television, suppose the user says the instruction "turn off" to the television. The voice data to be processed that the television collects then includes the "turn off" instruction. After performing echo cancellation on the voice data to be processed in the manner provided in any of the foregoing embodiments of this application, the television recognizes the "turn off" instruction in the data and turns itself off.
Alternatively, the voice processing circuit may send the echo-cancelled voice data through the communication circuit to a server on the network side. The server detects the user instruction in the voice data and returns a corresponding message to the voice processing circuit, which then executes the corresponding function according to the received message. For example, again when this embodiment is applied in the scene shown in FIG. 1 and the display device is a television, suppose the user says the instruction "turn off" to the television. The television performs echo cancellation on the voice data to be processed in the manner provided in any of the foregoing embodiments of this application and sends the echo-cancelled voice data to the server. After recognizing the "turn off" instruction in the voice data, the server sends a shutdown message to the television. Finally, after receiving the shutdown message from the server, the television turns itself off.
The display device proposed in this embodiment has a human-machine voice interaction function. The structure of the display device is described here with reference to FIG. 13 and FIG. 14. FIG. 13 is a front view of the display device of this embodiment, and FIG. 14 is an exploded view of the structure of the display device of this embodiment.
As shown in FIG. 13 and FIG. 14, the display device includes a panel 41, a backlight assembly 42, a main board 43, a power supply board 44, a rear case 45, a base 46, and a pickup circuit 47. The panel 41 presents images to the user. The backlight assembly 42 is located below the panel 41 and usually consists of optical components that supply a light source of sufficient brightness and uniform distribution, so that the panel 41 can display images normally. The backlight assembly 42 also includes a back plate 4201, on which the main board 43 and the power supply board 44 are arranged; convex hull structures are usually stamped on the back plate 4201, and the main board 43 and the power supply board 44 are fixed to the convex hulls by screws or hooks. The rear case 45 covers the panel 41 to hide components of the display device such as the backlight assembly 42, the main board 43, and the power supply board 44, which gives a tidy appearance. The base 46 supports the display device. The pickup circuit 47 contains microphones for picking up far-field voice. In this embodiment, the pickup circuit 47 may be arranged on the lower side of the rear case, roughly in the middle of the display device as a whole; the pickup circuit 47 and the rear case 45 either form an integrated structure or are detachably connected by screws, buckles, or similar structures.
In the related art, a microphone is provided on the remote control to pick up the voice uttered by the user. When the user needs to interact with the display device by voice, the user must hold the remote control and speak into it. When the remote control is not at hand, the user first has to find it, and while holding the remote control to speak, the user's hand is occupied and cannot do anything else. This is very inconvenient, and users with hand disabilities in particular cannot make full use of the human-machine voice interaction function of the display device.
In another related technology, display devices with a far-field sound pickup function have appeared. The microphone array used for sound pickup is arranged on the display device, so the user can speak without the remote control and be picked up directly by the display device. This frees the user's hands and is far more convenient. However, because echo cancellation is incomplete, barge-in wake-up and recognition in far-field pickup deteriorate, which harms the user experience. The reason is that while the user utters far-field voice, the display device itself is often playing local sound, such as songs or videos, through its speaker. The microphone array therefore collects both the local sound emitted by the speaker of the display device and the voice actually spoken by the user. The purpose of echo cancellation is to remove the local sound emitted by the speaker and keep only the user's voice.
In some embodiments, the SOC on the main board of the display device sends the sound signal to be played to the power amplifier, which amplifies it and outputs it to the speaker for playback. A sound recovery signal is therefore usually tapped at the output of the SOC chip to serve as the reference for the signal to be cancelled. In practice, however, to meet the requirements of the audio system of the display device, the power amplifier applies its own processing to the sound signal to be played, so the signal undergoes a non-linear change between the input and the output of the power amplifier. As a result, the collected sound recovery signal differs to some extent from the local sound actually emitted by the speaker. However accurate the echo cancellation algorithm is, it cannot completely cancel the local sound actually emitted by the speaker, and the problem of incomplete echo cancellation remains unsolved.
Referring to FIG. 15, the main board 43 of the display device of this embodiment includes an SOC (System on Chip) and a power amplifier 550 connected to the SOC. The output of the power amplifier 550 is connected to a speaker 540. The SOC outputs the audio signal to be played to the power amplifier 550, which amplifies the audio signal, performs analog-to-digital conversion processing on it, and then drives the speaker 540 to play it. Two or more speakers 540 may be provided.
The pickup circuit 47 in the foregoing embodiment includes a microphone board 58, on which a microphone array 511 is arranged. The microphone array 511 includes a plurality of spaced microphones, and the distance between each two adjacent microphones is approximately the same. The microphone board 58 also carries a first encoder 522, which encodes the playback sound recovery signal obtained from the back end of the power amplifier 550, and a second encoder 512, which encodes the microphone output signals.
The main board 43 and the microphone board 58 exchange signals through an interface socket. Both the far-field sound picked up by the microphone array 511 and the playback sound recovery signal obtained from the back end of the power amplifier 550 are transmitted through this interface. The interface socket may be a USB port, or a dedicated USB interface designed with the USB Audio Class (UAC) protocol as the interface protocol.
An embodiment of this application proposes a far-field voice processing circuit for a device. The device may be a smart terminal, such as a display device. The following embodiments take a far-field voice processing circuit applied to a display device as an example.
Referring to FIG. 16, the far-field voice processing circuit includes a speaker 540, a sound pickup circuit 510, a preprocessing circuit 520, and a main control chip (not shown in the figure) in which an echo processing circuit 531 is integrated. The speaker 540 plays the sound output by the device. The sound pickup circuit 510 picks up far-field sound, which is a mixture of the far-field voice uttered by the user and the sound played by the speaker 540 that reaches the sound pickup circuit 510.
The preprocessing circuit 520 is connected to the sound pickup circuit 510 to receive the picked-up far-field sound, and is connected to the front end of the speaker 540 to obtain the playback sound recovery signal. The echo processing circuit 531 is connected to the preprocessing circuit 520 to receive the picked-up far-field sound and the playback sound recovery signal, and uses the playback sound recovery signal to perform echo cancellation on the picked-up far-field sound, so as to obtain the far-field voice uttered by the user. In another embodiment, the echo processing circuit 531 may be a separate circuit.
The user interacts with the display device by speaking, while the display device itself, when operating, plays sounds such as music or the speech in a video through the speaker 540. The sound pickup circuit 510 therefore inevitably picks up both the far-field voice uttered by the user and the sound played by the speaker 540.
Referring to FIG. 15, in the solution of this embodiment, the main control chip of the display device transmits the sound signal to be played to the power amplifier (power amplifier 550), which amplifies it and then drives the speaker 540 to play the sound. To meet the requirements of the audio system of the display device, the power amplifier 550 applies its own processing to the sound signal to be played, so the signal undergoes a non-linear change between the input and the output of the power amplifier 550. Only the signal taken at the back end of the power amplifier 550, that is, at the front end of the speaker 540, closely matches the sound that the speaker 540 actually plays.
This embodiment obtains the playback sound recovery signal from the back end of the power amplifier 550, that is, from the front end of the speaker 540, so the recovery signal is very close to the speaker sound picked up by the sound pickup circuit 510. Performing echo cancellation on the picked-up far-field sound based on this recovery signal therefore greatly reduces the echo mixed into the user's far-field voice (the echo being the sound played by the speaker 540), improves the accuracy of far-field voice recognition, increases the sensitivity of barge-in wake-up during far-field pickup, and improves the user experience.
It can be understood that "sound" in this embodiment may refer to the sound wave signal corresponding to the sound as well as to the corresponding analog signal or digital signal. For example, the sound pickup circuit 510 picks up the sound wave signal of the far-field sound, which is processed into a digital signal of the far-field sound and then transmitted to the preprocessing circuit 520. Those skilled in the art are able to determine the format changes that occur as the sound is passed between different circuits.
Referring to FIG. 17, in this embodiment, the preprocessing circuit 520 includes a pre-processing circuit 521 and a first encoder 522. The pre-processing circuit 521 may be an MCU, a single-chip microcomputer, or another digital processing chip with audio interfaces. For ease of understanding, the following embodiments take the case in which the pre-processing circuit 521 is an MCU as an example.
First, regarding the first encoder 522: the pre-processing circuit 521 is connected to the front end of the speaker 540 through the first encoder 522, and the first encoder 522 performs analog-to-digital conversion on the playback sound recovery signal. Specifically, the playback sound recovery signal output at the back end of the power amplifier 550, that is, at the front end of the speaker 540, is an analog signal. The first encoder 522 therefore converts it from analog to digital and transmits the converted playback sound recovery signal to the MCU (that is, to the pre-processing circuit 521). When there are multiple speakers, the first encoder 522 can perform analog-to-digital conversion on the playback sound recovery signals of the multiple speakers 540 and convert them into a one-channel digital signal output.
It should be explained here that one audio signal output terminal corresponds to "one channel": the multiple analog signals output for the multiple speakers can undergo analog-to-digital conversion in the encoder and be output through a single channel. For example, the first encoder 522 may be the AC108 from X-POWER, which can convert the analog signals output for two speakers 540 into a one-channel digital signal output.
The far-field voice processing circuit includes a power amplifier connected between the speaker 540 and the main control chip of the display device. When there are multiple speakers 540, the playback sound recovery signal includes the multiple sound signals obtained from the front ends of the multiple speakers 540.
Referring to FIG. 18, in this embodiment the far-field voice processing circuit further includes a signal processing circuit 570. The input of the signal processing circuit 570 is connected to the back end of the power amplifier 550, that is, to the front end of the speaker 540, and the output of the signal processing circuit 570 is connected to the first encoder 522. In other words, the playback sound recovery signal output from the power amplifier 550 is stepped down and filtered by the signal processing circuit 570 before being input to the first encoder 522.
The signal processing circuit 570 may use a BUCK step-down circuit or a resistor divider circuit to step down the playback sound recovery signal output from the power amplifier 550, and may use an RC filter circuit to filter the stepped-down playback sound recovery signal.
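For the resistor divider and RC filter options, the standard first-order relations give a rough feel for component selection. The sketch below is illustrative only: the component values are hypothetical, no values are given in this application, and the loading of the divider by the filter is ignored.

```python
import math


def divider_output(v_in, r_top, r_bottom):
    """Unloaded output voltage of a two-resistor divider."""
    return v_in * r_bottom / (r_top + r_bottom)


def rc_cutoff_hz(r_ohms, c_farads):
    """-3 dB cutoff frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


# Hypothetical values: a 12 V peak amplifier output scaled down for an
# ADC input, followed by a 1 kOhm / 10 nF low-pass stage.
v_adc = divider_output(12.0, r_top=10_000.0, r_bottom=1_000.0)
f_cut = rc_cutoff_hz(1_000.0, 10e-9)
print(f"divider output: {v_adc:.3f} V, RC cutoff: {f_cut:.0f} Hz")
```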
The sound pickup circuit 510 (see FIG. 16 and FIG. 18) includes a microphone array 511 and a second encoder 512 electrically connected to the microphone array 511. The microphone array 511 includes multiple microphones, each of which can pick up far-field sound; the microphones pick up far-field sound simultaneously to generate multiple analog signals of the far-field sound. The microphones are arranged in a linear array, collect the original far-field sound signal, convert it into analog electrical signals, and output these to the second encoder 512 at the back end.
The second encoder 512 performs analog-to-digital conversion on the analog signals of the far-field sound. After this conversion, the second encoder 512 also converts the multiple digital signals of the far-field sound into a single audio signal, which it transmits to the MCU.
For example, the second encoder 512 may be the AC108 from X-POWER. The AC108 contains a four-channel analog-to-digital converter, which can convert the four analog signals output by four microphones into digital form and output them as a one-channel digital signal.
In the foregoing embodiments, the one-channel digital audio signal produced by the first encoder 522 and by the second encoder 512 may be in the IIS audio format or the TDM audio format.
It should be noted that, in this embodiment, the microphones of the linear microphone array 511 are kept as synchronized as possible during signal transmission, so that the phase difference between the transmitted waveforms does not exceed 180°. Specifically, a 1 kHz single-frequency electrical signal can be fed into the microphone array 511 for testing, which makes the phase difference between the output signals of the microphones easier to observe.
Specifically, when the microphone array 511 has four microphones, the four microphones output four analog signals of the far-field sound to the second encoder 512. The second encoder 512 performs analog-to-digital conversion on these four analog signals and combines them into a one-channel digital audio signal, which is transmitted to the corresponding audio interface of the MCU. It can be understood that this one-channel audio signal in substance contains the signals output by all four microphones.
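How four microphone streams can share a single digital channel may be sketched as simple time-division interleaving. This is a conceptual illustration only; the function names are hypothetical, and the actual slot layout of the encoder's IIS or TDM output is not described in this application.

```python
def tdm_mux(channels):
    """Interleave equal-length channels into one frame-ordered stream."""
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    # One frame holds one sample from every channel, in channel order.
    return [ch[i] for i in range(length) for ch in channels]


def tdm_demux(stream, num_channels):
    """Recover the individual channels from an interleaved stream."""
    return [stream[k::num_channels] for k in range(num_channels)]


# Two samples from each of four hypothetical microphones.
mics = [[1, 2], [10, 20], [100, 200], [1000, 2000]]
stream = tdm_mux(mics)
recovered = tdm_demux(stream, num_channels=4)
```

The receiver (here the MCU) only needs to know the number of slots per frame to separate the microphones again.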
Referring to FIG. 19, in one embodiment CON1 to CON4 are the connectors of four microphones. The microphones are placed equidistantly along a straight line, with a spacing of approximately 35 mm between adjacent microphones, forming a linear four-microphone array that meets the spatial requirements of the algorithm. The analog signals of the four microphones are input directly to the second encoder 512, which performs signal processing such as analog-to-digital conversion and low-pass filtering, converts the result into a one-channel audio signal in IIS format, and transmits that audio signal through its IIS interface to the corresponding IIS interface of the MCU.
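As a rough plausibility check of the 35 mm spacing, two textbook quantities of a uniform linear array can be computed: the largest arrival-time difference between adjacent microphones, and the half-wavelength frequency above which spatial aliasing can occur. The speed of sound (343 m/s) and the 16 kHz sample rate used here are assumptions, and this application does not state which criteria its algorithm actually imposes.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def max_adjacent_delay_samples(spacing_m, sample_rate_hz):
    """Largest inter-microphone delay (endfire arrival), in samples."""
    return spacing_m / SPEED_OF_SOUND * sample_rate_hz


def spatial_aliasing_limit_hz(spacing_m):
    """Frequency at which the spacing equals half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)


delay = max_adjacent_delay_samples(0.035, 16_000)
limit = spatial_aliasing_limit_hz(0.035)
print(f"max adjacent delay: {delay:.2f} samples, aliasing limit: {limit:.0f} Hz")
```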
The pre-processing circuit 521 is now described with reference to FIG. 17. The pre-processing circuit 521 is coupled to the sound pickup circuit 510 and to the front end of the speaker 540, and converts the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit 531. Specifically, the pre-processing circuit 521 may be an MCU. After the MCU receives the far-field sound signal converted into one channel and the playback sound recovery signal converted into one channel, it combines the two into an audio signal in a format compatible with the echo processing circuit 531, so that the MCU can transmit the processed far-field sound signal and playback sound recovery signal to the echo processing circuit 531. In this embodiment, the echo processing circuit 531 is integrated in the SOC of the display device, so the MCU needs to combine the far-field sound signal and the playback sound recovery signal into an audio signal in a format that the SOC supports.
In one embodiment, the MCU converts the far-field sound signal and the playback sound recovery signal into the USB data format, so that the MCU can transfer audio data to the SOC over a standard USB data cable using the USB Audio Class (UAC) protocol of the USB interface.
By providing the pre-processing circuit 521 to receive the picked-up far-field sound and the playback sound recovery signal, this embodiment overcomes a defect of many existing display devices, whose SOC chips have no suitable audio transmission interface and therefore cannot receive the far-field sound transmitted by the microphone array 511. The technical solution of this application thus helps bring far-field voice human-machine interaction to more display devices.
In some embodiments, before the format conversion, the MCU also adjusts the phases of the picked-up far-field sound and the playback sound recovery signal, so that the playback sound recovery signal leads the picked-up far-field sound by no more than a preset duration. This meets the requirements of the echo processing algorithm in the SOC and improves the echo processing result. Specifically, the playback sound recovery signal leads the picked-up far-field sound by no more than 20 ms, which allows the sound played by the speaker 540 to be cancelled more effectively.
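The phase constraint above can be expressed in samples and enforced by inserting delay into one of the two signals. The sketch below assumes a 16 kHz sample rate and assumes that the current lead has already been measured by some other means; the function names are hypothetical and the adjustment method is not specified in this application.

```python
def lead_in_samples(lead_ms, sample_rate_hz):
    """Convert a lead time in milliseconds to a whole number of samples."""
    return int(round(lead_ms * sample_rate_hz / 1000.0))


def enforce_reference_lead(mic, ref, measured_lead, max_lead):
    """Pad one signal with zeros so that 0 <= lead <= max_lead.

    `measured_lead` is the number of samples by which the reference
    currently leads the microphone signal (negative if it lags).
    """
    mic, ref = list(mic), list(ref)
    if measured_lead < 0:
        # Reference lags: delay the microphone signal until they coincide.
        mic = [0.0] * (-measured_lead) + mic
        new_lead = 0
    elif measured_lead > max_lead:
        # Reference is too far ahead: delay it down to the allowed maximum.
        ref = [0.0] * (measured_lead - max_lead) + ref
        new_lead = max_lead
    else:
        new_lead = measured_lead
    return mic, ref, new_lead


max_lead = lead_in_samples(20, 16_000)  # 20 ms at 16 kHz
mic_out, ref_out, lead = enforce_reference_lead(
    [0.1, 0.2], [0.3, 0.4], measured_lead=400, max_lead=max_lead
)
```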
In some embodiments, the MCU also applies an algorithmic low-pass filter to the picked-up far-field sound and the playback sound recovery signal to remove audio components above 8 kHz, so that the far-field sound and the playback sound recovery signal finally output by the MCU are free of harmonics and aliasing. This improves the preprocessing of the far-field sound and the playback sound recovery signal, and in turn the echo processing result.
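One common way to realize such a software low-pass filter is a windowed-sinc FIR filter. The sketch below is an assumption-laden illustration: this application does not name the filter type, and the 48 kHz sample rate and 63-tap length are hypothetical choices.

```python
import math


def lowpass_taps(cutoff_hz, sample_rate_hz, num_taps=63):
    """Hamming-windowed sinc low-pass taps, normalised to unit DC gain."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Direct-form FIR filtering; samples before the start count as zero."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * signal[i - k]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at `freq_hz`."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    re = sum(t * math.cos(w * k) for k, t in enumerate(taps))
    im = sum(t * math.sin(w * k) for k, t in enumerate(taps))
    return math.hypot(re, im)


taps = lowpass_taps(8_000, 48_000)
passband_gain = gain_at(taps, 1_000, 48_000)   # well below the cutoff
stopband_gain = gain_at(taps, 16_000, 48_000)  # well above the cutoff
```

Calling `fir_filter(samples, taps)` applies the filter to a block of samples; the two gain values indicate how strongly a tone below and above the cutoff is passed.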
It should be noted that, in the MCU, the far-field sound and the playback sound recovery signal may first be low-pass filtered by an algorithm, then phase-adjusted relative to each other, and finally format-converted; alternatively, they may first be phase-adjusted, then filtered, and finally format-converted. For example, after the MCU receives the digitized playback sound recovery signal output by the first encoder 522 and the digitized far-field sound signal output by the second encoder 512 at the front end, it first low-pass filters them to prevent aliasing from interfering with the echo cancellation algorithm, then controls and adjusts the phase difference between the far-field sound signal and the playback sound recovery signal, and finally combines the processed far-field sound and playback sound recovery signal into an audio signal in USB format, which it passes to the SOC at the back end for processing.
Referring to FIG. 18, in this embodiment the far-field speech processing circuit further includes an encryption chip 580. The encryption chip 580 stores the key of the far-field speech recognition algorithm, and the MCU communicates with the encryption chip 580. The far-field speech recognition algorithm can be started only after the MCU and the encryption chip 580 have communicated successfully. Specifically, after the display device is powered on, the MCU communicates with the encryption chip 580; only once this communication succeeds can the far-field voice that the SOC obtains by echo-processing the far-field sound be passed on to the subsequent far-field speech recognition algorithm, which parses its semantics.
In the SOC, an echo processing algorithm removes from the picked-up far-field sound the part that corresponds to the playback sound recovery signal, thereby retaining the far-field voice uttered by the user. Any existing echo processing algorithm may be used in this embodiment; no specific limitation is imposed here.
In one embodiment, after the far-field sound signal collected by the microphone array 511 is sent to the SOC, the echo cancellation algorithm in the voice service program (voice server APK) integrated in the SOC dynamically evaluates the energy difference and the phase difference between the far-field sound picked up by the microphone array 511 and the playback sound recovery signal output to the speaker 540. On that basis it extracts the user's far-field voice from the far-field sound signal picked up by the microphone array 511, eliminating the echo interference caused by the sound that the display device itself plays.
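The application leaves the choice of echo cancellation algorithm open. Purely as an illustration, the Python sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a widely used approach that is not named in the application: the filter learns how the reference signal appears in the microphone signal and subtracts that estimate. The tap count, step size, and the synthetic echo path (gain 0.6, delay of two samples) are assumptions.

```python
import random


def nlms_echo_cancel(mic, reference, num_taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `reference` from `mic`.

    Returns the residual signal and the learned filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps            # recent reference samples, newest first
    residual = []
    for m, r in zip(mic, reference):
        history = [r] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = m - echo_estimate         # what remains after removing the echo
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Synthetic demonstration: the microphone hears only a delayed, attenuated echo.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * r for r in reference[:-2]]
residual, weights = nlms_echo_cancel(mic, reference)
```

In this synthetic case the learned weight at index 2 approaches 0.6 and the residual decays toward zero; with a real user voice mixed into `mic`, the residual would instead retain that voice.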
After processing by the echo processing circuit 531 in the SOC, the echo-processed far-field voice needs further processing in order to restore, as far as possible, the far-field voice the user actually uttered. Refer to FIG. 20 and FIG. 21.
The SOC further includes a speech enhancement circuit 633 and a sound source localization circuit 632. The echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to both the speech enhancement circuit 633 and the sound source localization circuit 632. The speech enhancement circuit 633 is connected to the sound source localization circuit 632 to receive the sound source localization result it outputs, and enhances the echo-cancelled far-field sound according to that result. The speech enhancement circuit 633 may include one or more of a beamforming circuit 6331, a de-reverberation circuit 6332, and a noise reduction circuit 6333.
In one embodiment, the speech enhancement circuit 633 includes a beamforming circuit 6331, a de-reverberation circuit 6332, and a noise reduction circuit 6333 connected in sequence, so that beamforming, de-reverberation, and noise reduction are applied in turn to the echo-cancelled far-field sound, producing the far-field voice to be uploaded.
In this embodiment, the sound source localization circuit 632 identifies the position from which the user's far-field voice originates and feeds that position back to the speech enhancement circuit 633. Based on the determined source position, the speech enhancement circuit 633 performs beamforming, suppresses speech in the corresponding region based on the formed beam, and then performs noise reduction to obtain the far-field voice to be uploaded. The far-field voice to be uploaded obtained in this embodiment is already very close to the real far-field voice uttered by the user.
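The application does not specify the beamforming method. As a hedged illustration of how a localization result can steer a beam, the Python sketch below implements delay-and-sum beamforming for a linear array: per-microphone delays are computed from an assumed arrival angle, each channel is advanced by its delay, and the channels are averaged. The microphone spacing, the 16 kHz sample rate in the usage note, the whole-sample rounding, and the function names are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def steering_delays(mic_positions_m, angle_deg, sample_rate_hz):
    """Whole-sample arrival delays for a far-field source at angle_deg
    (0 = broadside) across a linear array, relative to the earliest microphone."""
    s = math.sin(math.radians(angle_deg))
    raw = [p * s / SPEED_OF_SOUND_M_S * sample_rate_hz for p in mic_positions_m]
    base = min(raw)
    return [round(d - base) for d in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average the channels.

    Sound from the steered direction adds coherently; sound from other
    directions is misaligned and partially cancels.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]
```

For four microphones spaced 5 cm apart and a source at 30 degrees, `steering_delays([0.0, 0.05, 0.10, 0.15], 30.0, 16000)` yields delays of 0, 1, 2, and 3 samples, which `delay_and_sum` then uses to align the channels before averaging.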
In some embodiments, after the far-field voice to be uploaded is obtained, semantic analysis still needs to be performed on it. Specifically, the SOC further includes a speech engine circuit 634 connected to the output of the speech enhancement circuit 633. The speech engine circuit 634 performs wake-word recognition on the far-field sound to be uploaded; when a preset wake word is recognized, a wake-up event is triggered, and the far-field sound to be uploaded is then encoded and transmitted to a designated terminal 660. The speech engine circuit 634 is also configured to receive, from the designated terminal 660, the instruction corresponding to the far-field sound.
Specifically, the designated terminal 660 may be the cloud, or it may be another processing circuit inside the display device. Taking upload to the cloud as an example: speech recognition and semantic understanding are performed in the cloud, an instruction corresponding to the far-field sound is generated through online speech synthesis, and executing that instruction completes the full human-machine voice interaction of the display device.
The instruction that the speech engine circuit 634 receives from the cloud may be voice reply information answering a question raised by the user; the voice reply information can be played through the power amplifier 550 and the speaker 540 of the display device. The instruction may also be a control instruction that makes the display device respond to a control request in the user's far-field voice; the SOC of the display device then controls the relevant circuits to carry out the control instruction. For example, if the control instruction is to shut down, the SOC coordinates the power supply system of the display device to stop supplying power to the display system.
In some embodiments, when the wake-up event is triggered, the voice to be uploaded is simultaneously uploaded to the voice service program (voice server APK), which reports it to the cloud service backend of the algorithm provider, enabling closed-loop optimization of wake-up. This can improve the sensitivity of recognizing wake words spoken with different timbres and pronunciations.
In the above embodiments, the echo processing circuit 531, the speech enhancement circuit 633, the sound source localization circuit 632, and the speech engine circuit 634 may be separate circuits; in this embodiment, they are all algorithm circuits stored in the SOC.
In the technical solution of this application, to meet the requirements of the device's audio system, the power amplifier 550 processes the sound signal to be played, so the signal undergoes a nonlinear change between the input and the output of the power amplifier 550. This solution therefore obtains the playback sound recovery signal after the power amplifier 550 and before the speaker 540. As a result, even after nonlinear signal processing such as equalization and amplification has been performed in the power amplifier 550, the playback sound recovery signal obtained by the preprocessing circuit 521 is very close to the sound of the speaker 540 as picked up by the sound pickup circuit 510. Performing echo cancellation on the picked-up far-field sound based on this playback sound recovery signal can thus substantially reduce the echo interference in the user's far-field voice and improve the accuracy of far-field speech recognition, which in turn improves the sensitivity of interrupt (barge-in) wake-up during far-field sound pickup and improves the user experience.
In addition, this embodiment provides the preprocessing circuit 521 to receive the picked-up far-field sound and the playback sound recovery signal, which overcomes the drawback that the SOC chips of many existing display devices lack a corresponding interface and therefore cannot receive the far-field sound transmitted by the microphone array 511. The technical solution of this application thus promotes wider adoption of far-field voice human-computer interaction technology in display devices.
The above are merely preferred embodiments of this application and do not limit this application in any form. Any simple modification, equivalent change, or refinement made to the above embodiments in accordance with the technical essence of this application still falls within the scope of the technical solution of this application.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (20)

  1. A display device, characterized by comprising:
    a voice processing circuit, a power amplifier, a speaker, a voice collection circuit, and an echo processing circuit;
    wherein the voice processing circuit, the power amplifier, and the speaker are connected in sequence, and the voice processing circuit is connected to the voice collection circuit and to the echo processing circuit, respectively;
    the voice processing circuit is configured to send an original playback signal to the power amplifier; the power amplifier is configured to process the original playback signal and send the resulting signal to be played to the speaker for playback;
    the voice collection circuit is configured to collect a voice signal to be processed in the environment where the display device is located;
    the echo processing circuit is configured to obtain the signal to be played that the power amplifier sends to the speaker;
    the voice processing circuit is further configured to perform echo cancellation processing on the voice signal to be processed according to the signal to be played.
  2. The display device according to claim 1, characterized in that:
    the echo processing circuit is further configured to preprocess the signal to be played;
    the voice processing circuit is specifically configured to perform echo cancellation processing on the voice signal to be processed according to the preprocessed signal to be played.
  3. The display device according to claim 2, characterized in that:
    the preprocessing includes amplitude reduction processing.
  4. The display device according to claim 3, characterized in that:
    the power amplifier is further configured to perform differential processing on the signal to be played to obtain a left channel signal and a right channel signal corresponding to the signal to be played, and to send them to the speaker for playback;
    the preprocessing further includes conversion to a single-ended signal.
  5. The display device according to claim 4, characterized in that:
    the left channel signal includes a left channel positive differential signal and a left channel negative differential signal;
    the echo processing circuit includes a left channel processing circuit;
    the left channel processing circuit is configured to perform amplitude reduction processing and single-ended conversion on the left channel positive differential signal and the left channel negative differential signal;
    wherein the left channel processing circuit includes a first input resistor, a first feedback resistor, and a first operational amplifier; the left channel positive differential signal is connected to the non-inverting input of the first operational amplifier, the left channel negative differential signal is connected to the inverting input of the first operational amplifier through the first input resistor, and the output of the first operational amplifier is connected to the inverting input of the first operational amplifier through the first feedback resistor.
  6. The display device according to claim 4, characterized in that:
    the right channel signal includes a right channel positive differential signal and a right channel negative differential signal;
    the echo processing circuit includes a right channel processing circuit;
    the right channel processing circuit is configured to perform amplitude reduction processing and single-ended conversion on the right channel positive differential signal and the right channel negative differential signal;
    wherein the right channel processing circuit includes a second input resistor, a second feedback resistor, and a second operational amplifier; the right channel positive differential signal is connected to the non-inverting input of the second operational amplifier, the right channel negative differential signal is connected to the inverting input of the second operational amplifier through the second input resistor, and the output of the second operational amplifier is connected to the inverting input of the second operational amplifier through the second feedback resistor.
  7. The display device according to any one of claims 1-6, characterized in that:
    the voice collection circuit consists of a MIC array, and the MIC array includes a plurality of MICs;
    the voice signal to be processed is a pulse density modulation (PDM) signal collected by the MIC array.
  8. The display device according to claim 7, characterized in that:
    the MIC array includes a first MIC, a second MIC, a third MIC, and a fourth MIC arranged in sequence; the first MIC and the third MIC, which are spaced apart, are denoted the first group of MICs, and the second MIC and the fourth MIC, which are spaced apart, are denoted the second group of MICs;
    the MIC array specifically collects the voice signal to be processed through the first group of MICs and the second group of MICs in alternating cycles.
  9. The display device according to claim 7 or 8, characterized in that the voice processing circuit is further configured to:
    perform sampling and analog-to-digital conversion processing on the PDM signal.
  10. The display device according to any one of claims 1-9, characterized in that the voice processing circuit is further configured to:
    recognize an instruction in the voice signal to be processed after the echo cancellation processing, and execute the function corresponding to the instruction.
  11. The display device according to any one of claims 1-9, characterized in that the voice processing circuit is further configured to:
    send the voice signal to be processed after the echo cancellation processing to a server, so that the server, after recognizing an instruction in the voice signal to be processed, sends an indication message to the voice processing circuit;
    receive the indication message sent by the server, and execute the function corresponding to the indication message.
  12. A display device, characterized by comprising a speaker and a far-field speech processing circuit, the far-field speech processing circuit comprising:
    a sound pickup circuit configured to pick up far-field sound, the far-field sound including far-field voice uttered by a user and the sound played by the speaker as it reaches the sound pickup circuit;
    a preprocessing circuit connected to the sound pickup circuit to receive the picked-up far-field sound, the preprocessing circuit also being connected to the front end of the speaker to obtain a playback sound recovery signal;
    an echo processing circuit connected to the preprocessing circuit to receive the picked-up far-field sound and the playback sound recovery signal, and configured to perform echo cancellation on the picked-up far-field sound using the playback sound recovery signal, so as to obtain the far-field voice uttered by the user.
  13. The display device according to claim 12, characterized in that the preprocessing circuit comprises:
    a pre-processing circuit coupled to the sound pickup circuit and to the front end of the speaker, to convert the picked-up far-field sound and the playback sound recovery signal into a format compatible with the echo processing circuit.
  14. The display device according to claim 13, characterized in that the pre-processing circuit is further configured to adjust the phase of the picked-up far-field sound and of the playback sound recovery signal, so that the phase of the playback sound recovery signal leads the phase of the picked-up far-field sound by no more than a preset duration.
  15. The display device according to claim 13, characterized in that the preprocessing circuit further comprises:
    a first encoder, through which the pre-processing circuit is connected to the front end of the speaker, the first encoder performing analog-to-digital conversion on the playback sound recovery signal.
  16. The display device according to claim 15, characterized in that the display device comprises a power amplifier; the power amplifier is connected between the speaker and the echo processing circuit and is configured to provide the speaker with multiple channels of sound output by the device; the playback sound recovery signal includes the multiple channels of sound obtained from the front end of the speaker;
    the first encoder is further configured to convert the multiple channels of sound obtained from the front end of the speaker into a single-channel digital signal for output.
  17. The display device of the smart device according to claim 12, characterized in that the sound pickup circuit comprises a microphone array and a second encoder electrically connected to the microphone array, wherein the microphone array is configured to pick up the far-field sound, and the second encoder is configured to perform analog-to-digital conversion on the far-field sound;
    the second encoder is further configured to combine the multiple channels of far-field sound picked up by the microphone array.
  18. The display device according to claim 12, characterized in that the far-field sound processing circuit further comprises a speech enhancement circuit and a sound source localization circuit, and the echo-cancelled far-field sound output by the echo cancellation circuit is transmitted to the speech enhancement circuit and to the sound source localization circuit, respectively;
    the speech enhancement circuit is connected to the sound source localization circuit to receive the sound source localization result output by the sound source localization circuit, and enhances the echo-cancelled far-field sound according to the sound source localization result, so as to generate the far-field voice to be uploaded.
  19. The display device according to claim 18, characterized in that the display device further comprises a speech engine circuit connected to the output of the speech enhancement circuit; the speech engine circuit performs wake-word recognition processing on the far-field voice to be uploaded so that, when a preset wake word is recognized, the far-field voice to be uploaded is encoded and transmitted to a designated terminal;
    the speech engine circuit is further configured to receive, from the designated terminal, an instruction corresponding to the far-field voice.
  20. The display device according to claim 19, characterized in that the display device has a main control chip, and the echo processing circuit, the speech enhancement circuit, the sound source localization circuit, and the speech engine circuit are all integrated in the main control chip.
PCT/CN2020/075958 2019-07-10 2020-02-20 Display device WO2021004067A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201910620438.2A CN110349582B (en) 2019-07-10 2019-07-10 Display device and far-field voice processing circuit
CN201910620438.2 2019-07-10
CN201910619184.2A CN110223707A (en) 2019-07-10 2019-07-10 Display device
CN201910619184.2 2019-07-10

Publications (1)

Publication Number Publication Date
WO2021004067A1 true WO2021004067A1 (en) 2021-01-14

Family

ID=74114937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075958 WO2021004067A1 (en) 2019-07-10 2020-02-20 Display device

Country Status (1)

Country Link
WO (1) WO2021004067A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825862A (en) * 2015-01-05 2016-08-03 沈阳新松机器人自动化股份有限公司 Robot man-machine dialogue echo cancellation system
US20170092272A1 (en) * 2015-09-10 2017-03-30 Crestron Electronics, Inc. System and method for determining recipient of spoken command in a control system
CN106782591A (en) * 2016-12-26 2017-05-31 惠州Tcl移动通信有限公司 A kind of devices and methods therefor that phonetic recognization rate is improved under background noise
CN106782589A (en) * 2016-12-12 2017-05-31 奇酷互联网络科技(深圳)有限公司 Mobile terminal and its pronunciation inputting method and device
CN109360562A (en) * 2018-12-07 2019-02-19 深圳创维-Rgb电子有限公司 Echo cancel method, device, medium and voice awakening method and equipment
CN109545237A (en) * 2018-10-24 2019-03-29 广东思派康电子科技有限公司 A kind of computer readable storage medium and the interactive voice speaker using the medium
US20190172463A1 (en) * 2014-09-10 2019-06-06 Fred Bargetzi Acoustic sensory network
CN209017204U (en) * 2018-12-25 2019-06-21 深圳创维-Rgb电子有限公司 Speech recognition system
CN110223707A (en) * 2019-07-10 2019-09-10 青岛海信电器股份有限公司 Display device
CN110349582A (en) * 2019-07-10 2019-10-18 青岛海信电器股份有限公司 Display device and far field speech processing circuit

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20836706

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 20836706

Country of ref document: EP

Kind code of ref document: A1