CN115460476A - Audio parameter processing method and device of intercom system and intercom system - Google Patents

Publication number
CN115460476A
Authority
CN
China
Prior art keywords
audio signal
audio
determining
playing
intercom system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211052806.6A
Other languages
Chinese (zh)
Inventor
王旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202211052806.6A
Publication of CN115460476A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q: SELECTING
    • H04Q5/00: Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange
    • H04Q5/24: Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange for two-party-line systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/007: Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/01: Correction of time axis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L21/0232: Processing in the frequency domain
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316: Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude

Abstract

The application discloses an audio parameter processing method and device of an intercom system, and the intercom system. The method comprises the following steps: generating a first audio signal whose frequency range lies within the ultrasonic frequency range; sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting audio signals; after the first audio signal is played, acquiring a second audio signal collected by the collecting device; and determining audio parameters according to the second audio signal and the first audio signal, and configuring the intercom system according to the audio parameters. The application solves the prior-art problems that measurement is inaccurate and the intercom system cannot be adjusted at any time when it is adjusted using audible signals such as white noise, so that the intercom system can be adjusted at any time and its sound quality is improved.

Description

Audio parameter processing method and device of intercom system and intercom system
Technical Field
The application relates to the field of audio processing, in particular to an audio parameter processing method and device of an intercom system and the intercom system.
Background
With the progress of network technology, intercom systems are used more and more widely. An intercom system comprises a playing device and a collecting device. For example, a video conference system includes a speaker as the playing device and a microphone as the collecting device. As another example, a mobile terminal (e.g., a mobile phone) may also be regarded as an intercom system, since it likewise includes a speaker as the playing device and a microphone as the collecting device.
An intercom system may be installed and used in different environments, and the conditions of each environment can degrade its audio quality. To address this, the user can manually adjust the parameters of the intercom system to suit the environment and thereby improve sound quality.
Manual adjustment relies on the experience of the user and may not bring the intercom system to a good operating state. Chinese patent publication No. CN113747336A provides a method, based on an audio processor, for tuning and adapting a sound field in different spaces. After a tuning key is pressed to start the tuning program of the audio processor, the following processing steps are executed automatically to adapt the sound field: the audio processor controls the loudspeaker to play sinusoidal signals and white noise, and the microphone collects the signals, forming a closed-loop system; with the hardware input gain fixed, the hardware output gain is adjusted step by step, and the gain is then fine-tuned again using the adjusted hardware output gain so that the closed-loop system tends toward stability.
The inventor has found that this prior publication relies on audible signals such as white noise. In a noisy environment, such signals have a frequency distribution similar to that of the environmental noise, which makes the measurement inaccurate. Moreover, because audible signals such as white noise can be heard by the human ear, they cannot be used to adjust the intercom system at any time.
Disclosure of Invention
The embodiment of the application provides an audio parameter processing method and device of an intercom system and the intercom system, and aims to at least solve the problems that in the prior art, the measurement is inaccurate when the intercom system is adjusted by audible sound signals such as white noise and the like, and the intercom system cannot be adjusted at any time.
According to one aspect of the application, an audio parameter processing method of an intercom system is provided, which comprises the following steps: generating a first audio signal, wherein the first audio signal frequency range is within an ultrasonic frequency range; sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting the audio signal; after the first audio signal is played, acquiring a second audio signal acquired by the acquisition equipment; and determining audio parameters according to the second audio signal and the first audio signal, and configuring the intercom system according to the audio parameters.
Further, the audio parameters include at least one of: time delay, environmental noise, and the playing volume of the playing device.
Further, determining the audio parameter from the second audio signal and the first audio signal comprises: acquiring a first time for sending the first audio signal to the playing device; detecting the first audio signal from the second audio signal; determining a second time when the first audio signal is acquired by the acquisition device in the case that the first audio signal is detected from the second audio signal; and determining the time delay according to the first time and the second time.
Further, determining the audio parameter from the second audio signal and the first audio signal comprises: in the case that the first audio signal includes a mute section between three single-frequency combined signals and one single-frequency signal, wherein the playing device plays no audio signal in the mute section, detecting the three single-frequency combined signals from the second audio signal; acquiring the audio signal of the mute section between the three single-frequency combined signals and the one single-frequency signal detected in the second audio signal; and determining the environmental noise according to the audio signal of the mute section.
Further, determining the environmental noise from the audio signal of the mute section includes: acquiring the average energy of the audio signal of the mute section in the time domain; and determining the magnitude of the environmental noise according to the average energy.
Further, determining the audio parameter from the second audio signal and the first audio signal comprises: detecting a third audio signal from the second audio signal in a case where the third audio signal having a predetermined length of time and a predetermined frequency is included in the first audio signal; and determining the playing volume of the playing device according to the third audio signal detected from the second audio signal.
Further, determining the playback volume of the playback device from the third audio signal detected from the second audio signal comprises: acquiring the average energy of the third audio signal detected from the second audio signal in the time domain; and determining the playing volume of the playing device according to the average energy and the frequency response curve of the playing device.
Further, configuring the intercom system according to the audio parameters comprises: taking the audio parameters as input parameters of an algorithm configured inside the intercom system so as to configure the intercom system, wherein the algorithm comprises at least one of the following: an echo cancellation algorithm, a noise reduction algorithm, and a gain control algorithm, wherein the time delay is used for the echo cancellation algorithm, the environmental noise is used for the noise reduction algorithm, and the playing volume of the playing device is used for the gain control algorithm.
Further, the first audio signal comprises a plurality of audio signals with different frequencies, wherein a mute part is arranged between the plurality of audio signals with different frequencies.
According to another aspect of the present application, there is also provided an audio parameter processing apparatus of an intercom system, including: a generating module for generating a first audio signal, wherein the first audio signal frequency range is within an ultrasonic frequency range; the sending module is used for sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting the audio signal; the acquisition module is used for acquiring a second audio signal acquired by the acquisition equipment after the first audio signal is played; and the processing module is used for determining audio parameters according to the second audio signal and the first audio signal and configuring the intercom system according to the audio parameters.
Further, the audio parameters include at least one of: time delay, environmental noise, and the playing volume of the playing device.
Further, the processing module is configured to: acquiring a first time for sending the first audio signal to the playing device; detecting the first audio signal from the second audio signal; determining a second time when the first audio signal is acquired by the acquisition device in the case that the first audio signal is detected from the second audio signal; and determining the time delay according to the first time and the second time.
Further, the processing module is configured to: in the case that the first audio signal includes a mute section between three single-frequency combined signals and one single-frequency signal, wherein the playing device plays no audio signal in the mute section, detect the three single-frequency combined signals from the second audio signal; acquire the audio signal of the mute section between the three single-frequency combined signals and the one single-frequency signal detected in the second audio signal; and determine the environmental noise according to the audio signal of the mute section.
Further, the processing module is configured to: acquire the average energy of the audio signal of the mute section in the time domain; and determine the magnitude of the environmental noise according to the average energy.
Further, the processing module is configured to: detecting a third audio signal from the second audio signal in a case where the third audio signal having a predetermined length of time and a predetermined frequency is included in the first audio signal; and determining the playing volume of the playing device according to the third audio signal detected from the second audio signal.
Further, the processing module is configured to: acquiring the average energy of the third audio signal detected from the second audio signal in the time domain; and determining the playing volume of the playing device according to the average energy and the frequency response curve of the playing device.
Further, the processing module is configured to: take the audio parameters as input parameters of an algorithm configured inside the intercom system so as to configure the intercom system, wherein the algorithm comprises at least one of the following: an echo cancellation algorithm, a noise reduction algorithm, and a gain control algorithm, wherein the time delay is used for the echo cancellation algorithm, the environmental noise is used for the noise reduction algorithm, and the playing volume of the playing device is used for the gain control algorithm.
Further, the first audio signal comprises a plurality of audio signals with different frequencies, wherein a mute part is arranged between the plurality of audio signals with different frequencies.
According to another aspect of the application, there is also provided an intercom system comprising a playing device, a collecting device, a memory and a processor, wherein the memory is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor to implement the above-described method steps.
According to another aspect of the present application, there is also provided a readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, implement the above-described method steps.
In the embodiment of the application, a first audio signal is generated whose frequency range lies within the ultrasonic frequency range; the first audio signal is sent to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting audio signals; after the first audio signal is played, a second audio signal collected by the collecting device is acquired; and audio parameters are determined according to the second audio signal and the first audio signal, and the intercom system is configured according to the audio parameters. This solves the prior-art problems that measurement is inaccurate and the intercom system cannot be adjusted at any time when it is adjusted using audible signals such as white noise, so that the intercom system can be adjusted at any time and its sound quality is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a schematic view of an intercom system according to an embodiment of the present application;
FIG. 2 is a flowchart of an audio parameter processing method of an intercom system according to an embodiment of the present application; and
FIG. 3 is a block diagram of an intercom system according to an embodiment of the present application.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in a different order.
The intercom system referred to in the following embodiments is illustrated in FIG. 1, a schematic diagram of an intercom system according to an embodiment of the application. As shown in FIG. 1, a speaker serving as the playing device is arranged at the upper left corner, and a microphone serving as the collecting device is arranged at the lower right corner. Any system that contains both a playing device and a collecting device may be regarded as an intercom system in the following embodiments. For example, a video conference system, which is a typical intercom system, generally includes one or more microphones as collecting devices and one or more speakers as playing devices, and the speakers and microphones may be arranged at different positions. The solution in the following embodiments applies not only to video conference systems but also to other intercom systems.
In an embodiment, an audio parameter processing method of an intercom system is provided. FIG. 2 is a flowchart of the method according to the embodiment of the application, and the steps shown in FIG. 2 are explained below.
Step S202, generating a first audio signal, wherein the frequency range of the first audio signal is within the ultrasonic frequency range;
Step S204, sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting the audio signal;
Step S206, after the first audio signal is played, acquiring a second audio signal collected by the collecting device;
Step S208, determining an audio parameter according to the second audio signal and the first audio signal, and configuring the intercom system according to the audio parameter.
The above steps use a first audio signal in the ultrasonic frequency range. Because the ultrasonic frequency range lies between 20 kHz and 40 kHz (i.e., 20,000 Hz to 40,000 Hz), which differs from the frequencies of the audio signals commonly present in the environment, a first audio signal in this range is not easily affected by environmental noise when it is used to adjust the intercom system. In addition, if the parameters of the intercom system are adjusted using audible signals such as white noise, the human ear can hear those signals, so the parameters can only be adjusted before the intercom system is in use and not while it is in use. A first audio signal in the ultrasonic frequency range cannot be heard by the human ear, so it can be sent at any time during use to adjust the intercom system, which improves the system's adaptability to environmental changes. The above steps therefore solve the prior-art problems that measurement is inaccurate and the intercom system cannot be adjusted at any time when it is adjusted using audible signals such as white noise, so that the intercom system can be adjusted at any time and its sound quality is improved.
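As an illustration of step S202, the following is a minimal Python sketch of generating a single ultrasonic tone. The function name `ultrasonic_tone`, the 96 kHz sample rate, and the amplitude are assumptions chosen for the example and do not come from the application; the only constraints taken from the text are the 20 kHz to 40 kHz range and the need for hardware that can reproduce it.

```python
import math

def ultrasonic_tone(freq_hz, duration_s, sample_rate_hz=96_000, amplitude=0.5):
    """Return a sine tone as a list of floats in [-amplitude, amplitude].

    The 96 kHz default sample rate is an assumption for this sketch.
    """
    if not 20_000 <= freq_hz <= 40_000:
        raise ValueError("frequency must lie in the 20-40 kHz range used in the text")
    if sample_rate_hz <= 2 * freq_hz:
        raise ValueError("sample rate must exceed twice the tone frequency (Nyquist)")
    n = int(round(duration_s * sample_rate_hz))
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [amplitude * math.sin(step * i) for i in range(n)]

# A 500 ms tone at 30 kHz.
tone = ultrasonic_tone(30_000, 0.5)
```

Whether a given speaker and microphone can actually reproduce and capture such a tone depends on their frequency response, which this sketch does not model.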
There are many types of audio parameters. In general, when an intercom system is in use, its volume needs to be controlled, environmental noise needs to be suppressed, echo needs to be cancelled, and so on. Accordingly, the parameters used to configure the intercom system in the above steps may include at least one of: time delay, environmental noise, and the playing volume of the playing device. The time delay is the time that elapses between an audio signal being sent to the playing device and the audio signal played by the playing device being collected by the collecting device. The environmental noise is generally audio, in the environment where the intercom system is located, that interferes with the intercom system. The playing volume is used to automatically raise or lower the volume of the playing device. There are many ways to determine the time delay, the environmental noise, and the playing volume of the playing device, and several examples are described below. It should be noted that the following examples are optional embodiments. Other methods may also be used to obtain the time delay, the environmental noise, and the playing volume; as long as the audio parameters are obtained, they can serve to improve the sound quality of the intercom system.
Example 1
In example 1, an optional implementation manner of obtaining a time delay is provided, and the optional implementation manner includes the following steps: acquiring a first time for sending the first audio signal to the playing device; detecting the first audio signal from the second audio signal; determining a second time when the first audio signal is acquired by the acquisition device in the case that the first audio signal is detected from the second audio signal; and determining the time delay according to the first time and the second time.
In this example, the time delay is obtained by starting timing from the time when the first audio signal is sent to the playing device and then taking the time when the first audio signal is collected by the collecting device as the end time, and this way of obtaining the time delay is more accurate.
The time delay obtained in Example 1 plays an important role in an intercom system. In one embodiment, the time delay may be used to align the second audio signal with the first audio signal, which makes it easier to extract information from the second audio signal. For example, if the first audio signal has a duration of 500 ms and a frequency of 30,000 Hz, and the time delay is 600 ms, then the portion of the second audio signal between 600 ms and 1100 ms is the captured first audio signal. As in Examples 2 and 3 below, the corresponding portions may be extracted from the second audio signal after it has been aligned with the first audio signal. Alternatively, a portion may be extracted directly from the second audio signal according to the features of that portion; the choice can be made flexibly according to actual needs.
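The alignment arithmetic in the example above can be sketched in Python as follows. The helper name `extract_played_segment` and the representation of the captured signal as a list of samples are assumptions made for illustration.

```python
def extract_played_segment(captured, delay_ms, duration_ms, sample_rate_hz):
    """Cut out the part of the captured signal that holds the played test signal.

    captured is a sequence of samples; delay_ms and duration_ms are integers.
    """
    start = delay_ms * sample_rate_hz // 1000
    stop = (delay_ms + duration_ms) * sample_rate_hz // 1000
    if stop > len(captured):
        raise ValueError("captured signal is shorter than delay + duration")
    return captured[start:stop]

# With a 600 ms delay and a 500 ms test signal, the slice covers 600 ms to 1100 ms.
# At 1000 samples per second each sample index equals a time in milliseconds.
captured = list(range(2000))
segment = extract_played_segment(captured, delay_ms=600, duration_ms=500,
                                 sample_rate_hz=1000)
```

The toy sample rate of 1000 Hz is used only so that indices can be read as milliseconds; a real capture of an ultrasonic signal would need a much higher rate.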
In another embodiment, the time delay may also be used for echo cancellation. There are many echo cancellation algorithms; the acquisition of the time delay and the cancellation of the echo are described below using one optional algorithm. In this optional embodiment, the time delay generally includes delay caused by hardware and delay caused by the environment; whatever its cause, the following steps yield an accurate time delay. First, the first audio signal is output and the first time, at which it is output, is recorded. Throughout this process the collecting device continuously receives external sound waves, converts them into the second audio signal, and synchronously sends the second audio signal to a buffer for buffering. When the first audio signal is sent to the playing device, it is also sent to the buffer. The playing device plays the first audio signal after receiving it, and the collecting device receives the corresponding sound wave and converts it into the second audio signal. The start time of the first audio signal within the second audio signal is found by comparing the first audio signal and the second audio signal stored in the buffer; this start time is taken as the second time, and the difference between the second time and the first time is taken as the time delay.
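The text does not specify how the buffered signals are compared. One common choice, assumed here purely for illustration, is to slide the reference over the captured signal and pick the lag with the largest cross-correlation. The function names are hypothetical, and both buffers are assumed to start at the first time.

```python
def estimate_delay_samples(reference, captured):
    """Return the lag (in samples) at which reference best matches captured.

    Assumption: a plain cross-correlation search; the application only says
    the two buffered signals are compared.
    """
    n = len(reference)
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(captured) - n + 1):
        score = sum(r * captured[lag + i] for i, r in enumerate(reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def estimate_delay_ms(reference, captured, sample_rate_hz):
    """Convert the best-matching lag into milliseconds."""
    return 1000.0 * estimate_delay_samples(reference, captured) / sample_rate_hz

# Toy data: a short tone burst that reappears, attenuated, 37 samples later.
reference = [0.0, 1.0, 0.0, -1.0] * 8
captured = [0.0] * 37 + [0.3 * r for r in reference] + [0.0] * 20
lag = estimate_delay_samples(reference, captured)
```

A pure tone correlates almost as well at lags shifted by whole periods, which may be one reason the description later suggests a first audio signal made of several different frequencies separated by mute parts.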
Echo cancellation then proceeds as follows. First, an audio signal f1 transmitted by the other party is received. The received audio signal f1 is sent to the playing device and is also sent to the buffer for storage. Throughout this process the collecting device continuously receives external sound waves, converts them into an audio signal f1', and synchronously sends f1' to the buffer for storage. The playing device plays f1 after receiving it, and the resulting sound wave propagates to the collecting device, which receives it. Finally, the audio signals f1 and f1' stored in the buffer are retrieved, f1 is removed from f1' according to the time delay, and the echo-cancelled audio signal is output.
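A deliberately simplified Python sketch of removing f1 from f1' according to the time delay is given below. It assumes the echo is a single delayed, scaled copy of f1 and estimates the scale by least squares; the function name `cancel_echo` and this single-tap model are assumptions, not the application's algorithm, and practical echo cancellers typically use adaptive multi-tap filters.

```python
def cancel_echo(captured, reference, delay_samples):
    """Subtract a delayed, scaled copy of reference (f1) from captured (f1').

    Assumption: the echo path is one delay plus one gain.
    """
    # Line up each captured sample with the reference sample played earlier.
    delayed = [0.0] * len(captured)
    for i in range(len(captured)):
        j = i - delay_samples
        if 0 <= j < len(reference):
            delayed[i] = reference[j]
    # Least-squares estimate of the echo gain.
    num = sum(c * d for c, d in zip(captured, delayed))
    den = sum(d * d for d in delayed)
    gain = num / den if den > 0 else 0.0
    return [c - gain * d for c, d in zip(captured, delayed)]

# Toy data: near-end signal plus an echo of f1 at half amplitude, 2 samples late.
f1 = [1.0, -2.0, 3.0, -1.0]
near_end = [0.2, -0.1, 0.0, 0.0, 0.0, 0.0, 0.3, 0.1]
echo = [0.0, 0.0] + [0.5 * s for s in f1] + [0.0, 0.0]
f1_captured = [n + e for n, e in zip(near_end, echo)]
cleaned = cancel_echo(f1_captured, f1, delay_samples=2)
```

In the toy data the near-end signal does not overlap the echo, so the gain estimate is exact; with overlapping speech the estimate would be biased, which is one reason adaptive filters are preferred in practice.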
Example 2
Example 2 provides an optional way of acquiring the environmental noise, in which a mute section is set in the first audio signal so that the audio collected by the collecting device during the mute section can be attributed to noise in the environment. This optional way of acquiring the environmental noise may comprise the following steps: in the case that the first audio signal includes a mute section between three single-frequency combined signals and one single-frequency signal, wherein the playing device plays no audio signal in the mute section, detecting the three single-frequency combined signals from the second audio signal; acquiring the audio signal of the mute section between the three single-frequency combined signals and the one single-frequency signal detected in the second audio signal; and determining the environmental noise according to the audio signal of the mute section.
After the environmental noise has been determined from the audio signal of the mute section, noise reduction processing may be applied. Commonly used noise reduction algorithms include spectral subtraction, linear filtering, and the wavelet transform. The basic principle of spectral subtraction is as follows: the noise spectrum estimated during gaps that contain no speech is used in place of the noise spectrum during speech, and is subtracted from the spectrum of the noisy speech to obtain an estimate of the speech spectrum. The linear filtering and wavelet transform methods are the same as in the prior art and are not described here.
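The principle of spectral subtraction can be sketched for a single frame as follows. This is an illustrative Python example, not the application's implementation: it uses a naive O(n^2) DFT to stay self-contained, subtracts magnitudes with a floor at zero, and keeps the noisy phase. The function names are assumptions; practical implementations typically use an FFT over overlapping windowed frames.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(noisy_frame, noise_frame):
    """Subtract the noise magnitude spectrum from the noisy frame's spectrum."""
    noisy_spec = dft(noisy_frame)
    noise_mag = [abs(v) for v in dft(noise_frame)]
    cleaned = []
    for value, n_mag in zip(noisy_spec, noise_mag):
        mag = max(abs(value) - n_mag, 0.0)  # never let a magnitude go negative
        cleaned.append(cmath.rect(mag, cmath.phase(value)))  # keep noisy phase
    return idft_real(cleaned)

# Toy frame: "speech" in DFT bin 2, "noise" in bin 5, frame length 32.
n = 32
speech = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
noise = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [s + v for s, v in zip(speech, noise)]
estimate = spectral_subtract(noisy, noise)
```

Here the noise frame plays the role of the signal captured in the mute section, and the toy noise is stationary and sits in its own frequency bin, so the subtraction recovers the speech almost exactly; real noise is only approximately stationary.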
The noise reduction level of the noise reduction algorithm is controlled according to the measured environmental noise. The higher the noise reduction level, the stronger the noise suppression but the greater the damage to the speech, so the level is configured to match the environmental noise in order to obtain the best noise reduction effect. The magnitude of the environmental noise can be obtained in many ways; for example, the average energy of the audio signal of the mute section in the time domain can be acquired, and the magnitude of the environmental noise determined from that average energy.
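A minimal Python sketch of the average-energy measurement and of mapping it to a noise reduction level follows. The mean-square definition of average energy, the decibel conversion relative to full scale, and the three thresholds are all assumptions made for illustration; the application does not give concrete values.

```python
import math

def average_energy(samples):
    """Mean of the squared samples (time-domain average energy)."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(s * s for s in samples) / len(samples)

def energy_db(samples, floor=1e-12):
    """Average energy in dB relative to full scale (1.0), floored to avoid log(0)."""
    return 10.0 * math.log10(max(average_energy(samples), floor))

def noise_reduction_level(noise_db, thresholds=(-60.0, -45.0, -30.0)):
    """Map a noise level to a level 0..3; the thresholds are hypothetical."""
    return sum(noise_db >= t for t in thresholds)

# Toy mute-section capture with amplitude 0.1.
mute_section = [0.1, -0.1, 0.1, -0.1]
level = noise_reduction_level(energy_db(mute_section))
```

Louder environments map to higher levels, reflecting the trade-off described above between stronger suppression and more damage to speech.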
In summary, the noise reduction processing after determining the ambient noise is described in the above example 2.
Example 3
In this example, the playing volume of the playing device may be adjusted to make the playing device at an appropriate playing volume, so as to prevent the playing volume from being too small or too large. This example may include the following steps: detecting a third audio signal from the second audio signal under the condition that the third audio signal with the time length being a preset time length and the frequency being a preset frequency is included in the first audio signal; and determining the playing volume of the playing device according to the third audio signal detected from the second audio signal.
The playing volume can be determined in many ways, one of which is based on the frequency response curve. The frequency response curve of the playing device relates frequency to output level in decibels. For example, suppose the test audio is a 21,000 Hz signal and the frequency response curve shows that the response of the playing device in the audible range is 10 dB higher than its response at 21,000 Hz. If the level measured with the 21,000 Hz signal is 50 dB, then the level at which the playing device plays audible signals is 60 dB. Therefore, after the third audio signal is received, the playing volume of the playing device can be obtained from the decibel value and frequency of the third audio signal together with the frequency response curve. That is, determining the playing volume of the playing device according to the third audio signal detected from the second audio signal may include the following steps: acquiring the average energy of the third audio signal detected from the second audio signal in the time domain; and determining the playing volume of the playing device according to the average energy and the frequency response curve of the playing device. For example, if the received third audio signal has a frequency of 30,000 Hz and a level of 70 dB, and the frequency response curve shows that the playing device raises the level by 5 dB when playing a 30,000 Hz signal, then the playing volume of the playing device is 70 dB - 5 dB = 65 dB.
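The two worked examples above follow the same correction: add the difference between the device's response in the audible range and its response at the test frequency. A small Python sketch, with the function name and the convention of expressing both responses in dB relative to a common reference being assumptions:

```python
def audible_playback_db(measured_db, response_at_test_db, response_audible_db=0.0):
    """Convert a level measured at the ultrasonic test frequency to the audible level.

    response_at_test_db and response_audible_db are read from the playing
    device's frequency response curve, in dB against the same reference.
    """
    return measured_db + (response_audible_db - response_at_test_db)

# 21,000 Hz case: response at the test frequency is 10 dB below the audible range.
first = audible_playback_db(50.0, response_at_test_db=-10.0)
# 30,000 Hz case: response at the test frequency is 5 dB above the audible range.
second = audible_playback_db(70.0, response_at_test_db=5.0)
```

With these inputs the function gives 60 dB and 65 dB, matching the figures in the text.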
The playing volume can be adjusted by a gain control algorithm, which changes the volume by changing the gain.
As can be seen from the above three examples, when various algorithms are configured inside an intercom system, the audio parameters may be used as input parameters of those algorithms to configure the intercom system. The algorithms include at least one of the following: an echo cancellation algorithm, a noise reduction algorithm, and a gain control algorithm, where the time delay is used for the echo cancellation algorithm, the environmental noise is used for the noise reduction algorithm, and the playing volume of the playing device is used for the gain control algorithm. The internally configured algorithms may be those mentioned in the above embodiments or other existing algorithms, which are not described again here.
In the above embodiment, the acquired second audio signal has to be compared with the first audio signal, so the more distinctive the characteristics of the first audio signal, the easier it is to extract the audio parameters. In an alternative, the first audio signal comprises a plurality of audio signals with different frequencies, separated from one another by mute parts. For example, the first audio signal may include three parts: the first part includes a plurality of audio signals of different frequencies, separated from one another by a mute signal of a first duration; the second part includes a mute signal of a second duration; and the third part includes a single-frequency audio signal of a third duration. The first part is used for acquiring the time delay, the second part for acquiring the environmental noise, and the third part for acquiring the playing volume of the playing device.
An alternative embodiment is described below, in which the playing device is a speaker and the collecting device is a microphone. The steps of the foregoing embodiment may be implemented by two systems in the alternative embodiment: a sound field detection system and an algorithm adaptive adjustment system. The sound field detection system obtains the audio parameters from the second audio signal and the first audio signal, and the algorithm adaptive adjustment system adjusts the algorithms according to the audio parameters. Fig. 3 is a block diagram of an intercom system according to an embodiment of the present application. As shown in Fig. 3, the speaker plays a test audio (i.e., the first audio signal) and the microphone captures a signal (i.e., the second audio signal) in real time. The captured signal is passed through the automatic sound field detection system to obtain the time delay of the system, the noise level in the environment, and the volume of the speaker. The playing device then adaptively controls the relevant parameters of the gain control algorithm, and the collecting device adaptively controls the relevant parameters of the echo cancellation algorithm and the noise reduction algorithm, thereby improving sound quality. The function shown in Fig. 3 may be implemented as a one-key tuning function: the user only needs to press a key and the adjustment runs automatically.
The loudspeaker plays a specified detection audio signal (namely, a first audio signal) and collects the audio signal through the microphone, and the sound field automatic detection system analyzes the audio signal (namely, a second audio signal) collected by the microphone, so that the time delay of the system, the noise in the environment and the volume of the loudspeaker are obtained.
In this alternative embodiment, the detection audio signal is structured as follows: first, three single-frequency signals at 20250 Hz, 21000 Hz, and 21750 Hz, each lasting 256 ms (milliseconds), with a mute interval of 128 ms between adjacent single-frequency signals; next, a mute section of 768 ms; and finally, a single-frequency signal of 21000 Hz lasting 768 ms.
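The layout of the detection audio signal can be sketched as follows. The frequencies and durations are taken from the description above; the 48000 Hz sample rate, the amplitude of 0.5, and the function names are assumptions made for illustration. The sketch also assumes that the 768 ms mute section follows the third tone directly, with 128 ms gaps only between adjacent tones.

```python
import math

SAMPLE_RATE = 48000  # assumption: must exceed twice the highest tone (21750 Hz)

def tone(freq_hz, duration_ms, amplitude=0.5):
    """Return duration_ms of a sine tone as a list of float samples."""
    count = SAMPLE_RATE * duration_ms // 1000
    step = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    return [amplitude * math.sin(step * i) for i in range(count)]

def silence(duration_ms):
    """Return duration_ms of zero-valued samples."""
    return [0.0] * (SAMPLE_RATE * duration_ms // 1000)

def build_detection_signal():
    """Three 256 ms tones with 128 ms gaps, 768 ms of silence, one 768 ms tone."""
    tones = (20250, 21000, 21750)
    signal = []
    for index, freq_hz in enumerate(tones):
        signal += tone(freq_hz, 256)
        if index < len(tones) - 1:
            signal += silence(128)  # gap only between adjacent tones
    signal += silence(768)          # section used for ambient noise
    signal += tone(21000, 768)      # section used for volume
    return signal

detection_signal = build_detection_signal()
print(len(detection_signal) / SAMPLE_RATE)  # 2.56 (seconds)
```

Under these assumptions the whole detection signal lasts 2.56 seconds.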
The following describes the delay detection, the environmental noise detection, and the volume detection in this alternative embodiment.
Time delay detection function: time delay detection is performed on the three single-frequency tones (20250 Hz, 21000 Hz, and 21750 Hz) in the first section of the received detection audio signal. The start positions of the three tones in the second audio signal are found, and the time delay of the system is calculated from these start positions and the sending times of the three tones in the first audio signal. Compared with a conventional white noise signal, estimating the time delay with a combination of single-frequency signals has the advantage of strong resistance to interference.
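One possible way to find the tone start positions is sketched below. The application does not name a detection technique; this sketch swaps in the Goertzel recurrence to measure the power of one frequency per frame, takes the first frame whose power reaches half of the maximum as the onset, and averages the three per-tone delays. The frame length, the relative threshold, the averaging, and all function names are assumptions.

```python
import math

def goertzel_power(frame, freq_hz, sample_rate):
    """Power of a single frequency in a frame, using the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def find_onset(captured, freq_hz, sample_rate, frame_len=240, rel_threshold=0.5):
    """Start index of the first frame whose tone power reaches
    rel_threshold times the largest frame power, or None if no tone is found."""
    powers = [
        goertzel_power(captured[start:start + frame_len], freq_hz, sample_rate)
        for start in range(0, len(captured) - frame_len + 1, frame_len)
    ]
    if not powers or max(powers) <= 0.0:
        return None
    limit = rel_threshold * max(powers)
    for index, power in enumerate(powers):
        if power >= limit:
            return index * frame_len
    return None

def estimate_delay_ms(captured, sent_onsets, sample_rate):
    """Average delay in milliseconds over all detected tones.

    sent_onsets maps each tone frequency (Hz) to its start sample
    in the played (first) audio signal.
    """
    delays = []
    for freq_hz, sent_start in sent_onsets.items():
        onset = find_onset(captured, freq_hz, sample_rate)
        if onset is not None:
            delays.append((onset - sent_start) * 1000.0 / sample_rate)
    return sum(delays) / len(delays) if delays else None
```

The resolution of this sketch is one frame (5 ms at 48000 Hz with 240-sample frames). Because the threshold is relative to the strongest frame, a capture that contains only noise would still yield an onset, so a practical detector would also need an absolute power check.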
Ambient noise detection function: the received detection audio signal is aligned using the calculated system time delay, the mute section between the three single-frequency tones and the final single-frequency tone is located, and the time-domain average energy of the signal in that mute section is calculated to obtain the level of the environmental noise.
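The alignment and energy calculation can be sketched as follows. The section offsets follow the signal layout described in this embodiment (three 256 ms tones, two 128 ms gaps, then 768 ms of silence); the guard interval trimmed from both ends of the mute section and all function names are assumptions added for illustration.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples."""
    return duration_ms * sample_rate // 1000

def mean_energy(samples):
    """Time-domain average energy: the mean of the squared sample values."""
    return sum(x * x for x in samples) / len(samples) if samples else 0.0

def silent_section_bounds(delay_samples, sample_rate):
    """Locate the 768 ms mute section in the captured signal after alignment."""
    start = delay_samples + ms_to_samples(3 * 256 + 2 * 128, sample_rate)
    return start, start + ms_to_samples(768, sample_rate)

def ambient_noise_energy(captured, delay_samples, sample_rate, guard_ms=20):
    """Mean energy of the mute section, trimmed by a guard interval.

    The guard (an assumption) keeps tone tails and small alignment errors
    out of the noise estimate.
    """
    start, end = silent_section_bounds(delay_samples, sample_rate)
    guard = ms_to_samples(guard_ms, sample_rate)
    return mean_energy(captured[start + guard:end - guard])
```

The same `mean_energy` helper would apply to the 21000 Hz segment used for volume detection, with different section bounds.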
Volume detection function: the received detection audio signal is aligned using the calculated system time delay, the 21000 Hz single-frequency segment is located, its time-domain average energy is calculated, and the volume of the loudspeaker is obtained by combining this energy with the preloaded frequency response curve of the device.
The automatic sound field detection system obtains, through testing, parameters such as the system time delay, the environmental noise, and the loudspeaker volume, and uses the measured parameters to adaptively adjust the related algorithms integrated in the collecting device and the playing device, thereby improving sound quality in different environments. For example, the measured system time delay is used by the echo cancellation algorithm at the collecting end: the collected signal and the far-end reference signal are delay-compensated according to the measured delay so that their data are aligned, which improves echo cancellation in different environments. The measured environmental noise is used to control the noise reduction level of the noise reduction algorithm at the collecting end: different environmental noise thresholds correspond to different noise reduction levels, and the larger the measured environmental noise, the higher the corresponding noise reduction level, which improves noise reduction in different environments. The measured loudspeaker volume is used to control the gain level of the gain control algorithm at the playing end: when the loudspeaker volume is too high, the gain level is reduced, and when it is too low, the gain level is increased, so that devices at different installation positions and distances in different environments are adjusted to an appropriate volume.
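The mapping from measured parameters to algorithm settings can be sketched as follows. The application gives no numeric thresholds, target volume range, or step size, so every constant and name below is hypothetical; only the direction of each rule (more noise, higher noise reduction level; too loud, lower gain; too quiet, higher gain) comes from the description.

```python
from dataclasses import dataclass

# Hypothetical tuning constants; the application does not specify values.
NOISE_THRESHOLDS_DB = (40.0, 55.0, 70.0)
TARGET_VOLUME_RANGE_DB = (60.0, 75.0)
GAIN_STEP_DB = 3.0

@dataclass
class TuningConfig:
    echo_delay_ms: float        # delay compensation for echo cancellation
    noise_reduction_level: int  # 0 (off) up to len(NOISE_THRESHOLDS_DB)
    gain_step_db: float         # change applied by gain control

def noise_reduction_level(noise_db):
    """Count the thresholds the noise reaches: more noise, higher level."""
    return sum(1 for threshold in NOISE_THRESHOLDS_DB if noise_db >= threshold)

def gain_step(volume_db):
    """Lower the gain when too loud, raise it when too quiet, else keep it."""
    low, high = TARGET_VOLUME_RANGE_DB
    if volume_db > high:
        return -GAIN_STEP_DB
    if volume_db < low:
        return GAIN_STEP_DB
    return 0.0

def build_config(delay_ms, noise_db, volume_db):
    """Bundle the three measured parameters into algorithm settings."""
    return TuningConfig(delay_ms, noise_reduction_level(noise_db), gain_step(volume_db))

print(build_config(12.5, 58.0, 80.0))
```

With these hypothetical constants, a 58 dB noise reading reaches two thresholds, and an 80 dB volume reading is above the target range, so the gain is stepped down.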
The function in this optional embodiment can be implemented as a one-key tuning function: the user only needs to press a button and the adjustment runs automatically. A single play-and-capture cycle by the device itself is enough to achieve adaptive sound quality adjustment, which is simple to operate, greatly reduces the workload, and can improve the accuracy of the adjustment, thereby improving sound quality. In addition, the optional embodiment uses ultrasound, which human ears cannot hear, so the detection can be carried out in any scene and at any time; compared with audible sound it causes no disturbance and has a wider range of application.
In the optional embodiment, the gain of the loudspeaker is automatically adjusted by acquiring the amplitude of the signal, the relevant parameters of the noise reduction algorithm are automatically configured according to the measured environmental noise, and the relevant parameters of the echo cancellation algorithm are automatically configured according to the measured system time delay, so that the effect of improving the self-adaptive tone quality is achieved.
In this embodiment, an electronic device is provided, comprising a memory in which a computer program is stored and a processor configured to run the computer program to perform the method in the above embodiments.
The programs described above may be run on a processor or stored in a memory (also referred to as a computer-readable medium). Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
These computer programs may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks, and corresponding steps may be implemented by different modules.
Such an apparatus is provided in this embodiment. The apparatus is called an audio parameter processing apparatus of an intercom system, and comprises: a generating module for generating a first audio signal, wherein the frequency range of the first audio signal is within the ultrasonic frequency range; a sending module for sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting audio signals; an acquisition module for acquiring a second audio signal collected by the collecting device after the first audio signal is played; and a processing module for determining audio parameters according to the second audio signal and the first audio signal and configuring the intercom system according to the audio parameters.
The system or the apparatus is configured to implement the functions of the method in the foregoing embodiments, and each module in the system or the apparatus corresponds to each step in the method, which has been already described in the method, and is not described again here.
Optionally, the audio parameters comprise at least one of: time delay, environmental noise, and the playing volume of the playing device.
Optionally, the processing module is configured to: acquiring a first time for sending the first audio signal to the playing device; detecting the first audio signal from the second audio signal; determining a second time when the first audio signal is acquired by the acquisition device in the case that the first audio signal is detected from the second audio signal; and determining the time delay according to the first time and the second time.
Optionally, the processing module is configured to: detecting three single-frequency combined signals from the second audio signal in case of a mute section between the three single-frequency combined signals and one single-frequency signal being included in the first audio signal, wherein the audio signal is not played by the playing device in the mute section; acquiring audio signals of a mute section between the three single-frequency combined signals and one single-frequency signal detected in the second audio signal; and determining the environmental noise according to the audio signal of the mute part.
Optionally, the processing module is configured to: acquiring the average energy of the audio signal of the mute part in the time domain; and determining the size of the environmental noise according to the average energy.
Optionally, the processing module is configured to: detecting a third audio signal from the second audio signal in a case where the third audio signal having a predetermined length of time and a predetermined frequency is included in the first audio signal; and determining the playing volume of the playing device according to the third audio signal detected from the second audio signal.
Optionally, the processing module is configured to: acquiring the average energy of the third audio signal detected from the second audio signal in the time domain; and determining the playing volume of the playing device according to the average energy and the frequency response curve of the playing device.
Optionally, the processing module is configured to: use the audio parameters as input parameters of an algorithm configured inside the intercom system to configure the intercom system, wherein the algorithm comprises at least one of the following: an echo cancellation algorithm, a noise reduction algorithm, and a gain control algorithm, wherein the time delay is used for the echo cancellation algorithm, the environmental noise is used for the noise reduction algorithm, and the playing volume of the playing device is used for the gain control algorithm.
Optionally, the first audio signal includes a plurality of audio signals with different frequencies, wherein a mute part is spaced between the plurality of audio signals with different frequencies.
In this embodiment, the detection results of the automatic sound field detection system guide the adaptive adjustment of the algorithms, ultimately improving sound quality under different equipment installation conditions in different environments.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (13)

1. An audio parameter processing method of an intercom system is characterized by comprising the following steps:
generating a first audio signal, wherein the first audio signal frequency range is within an ultrasonic frequency range;
sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting the audio signal;
after the first audio signal is played, acquiring a second audio signal acquired by the acquisition equipment;
and determining audio parameters according to the second audio signal and the first audio signal, and configuring the intercom system according to the audio parameters.
2. The method of claim 1, wherein the audio parameters comprise at least one of: time delay, environmental noise, and the playing volume of the playing device.
3. The method of claim 2, wherein determining the audio parameter from the second audio signal and the first audio signal comprises:
acquiring a first time for sending the first audio signal to the playing device;
detecting the first audio signal from the second audio signal;
determining a second time when the first audio signal is acquired by the acquisition device in the case that the first audio signal is detected from the second audio signal;
and determining the time delay according to the first time and the second time.
4. The method of claim 2, wherein determining the audio parameter from the second audio signal and the first audio signal comprises:
detecting three single-frequency combined signals from the second audio signal in a case where a mute section between the three single-frequency combined signals and one single-frequency signal is included in the first audio signal, wherein the audio signal is not played by the playing device in the mute section;
acquiring audio signals of a mute section between the three single-frequency combined signals and one single-frequency signal detected in the second audio signal;
and determining the environmental noise according to the audio signal of the mute part.
5. The method of claim 4, wherein determining the ambient noise from the audio signal of the silent section comprises:
acquiring the average energy of the audio signal of the mute part in the time domain;
and determining the size of the environmental noise according to the average energy.
6. The method of claim 2, wherein determining the audio parameter from the second audio signal and the first audio signal comprises:
detecting a third audio signal from the second audio signal in a case where the third audio signal having a predetermined length of time and a predetermined frequency is included in the first audio signal;
and determining the playing volume of the playing device according to the third audio signal detected from the second audio signal.
7. The method of claim 6, wherein determining the playback volume of the playback device based on the third audio signal detected from the second audio signal comprises:
acquiring the average energy of the third audio signal detected from the second audio signal in the time domain;
and determining the playing volume of the playing device according to the average energy and the frequency response curve of the playing device.
8. The method of claim 2, wherein configuring the intercom system according to the audio parameters comprises:
taking the audio parameters as input parameters of an algorithm configured inside the intercom system to configure the intercom system, wherein the algorithm comprises at least one of the following: an echo cancellation algorithm, a noise reduction algorithm, and a gain control algorithm, wherein the time delay is used for the echo cancellation algorithm, the environmental noise is used for the noise reduction algorithm, and the playing volume of the playing device is used for the gain control algorithm.
9. The method according to any one of claims 1 to 8, wherein the first audio signal comprises a plurality of audio signals with different frequencies, and wherein a mute part is spaced between the plurality of audio signals with different frequencies.
10. An audio parameter processing apparatus of an intercom system, comprising:
a generating module for generating a first audio signal, wherein the first audio signal frequency range is within an ultrasonic frequency range;
the sending module is used for sending the first audio signal to a playing device for playing, wherein the intercom system comprises the playing device and a collecting device, and the collecting device is used for collecting the audio signal;
the acquisition module is used for acquiring a second audio signal acquired by the acquisition equipment after the first audio signal is played;
and the processing module is used for determining audio parameters according to the second audio signal and the first audio signal and configuring the intercom system according to the audio parameters.
11. The apparatus of claim 10,
the audio parameters include at least one of: time delay, environmental noise, and playing volume of the playing device;
the processing module is used for: acquiring a first time for sending the first audio signal to the playing device; detecting the first audio signal from the second audio signal; determining a second time when the first audio signal is acquired by the acquisition device in the case that the first audio signal is detected from the second audio signal; determining the time delay according to the first time and the second time;
the processing module is used for: detecting three single-frequency combined signals from the second audio signal in a case where a mute section between the three single-frequency combined signals and one single-frequency signal is included in the first audio signal, wherein the audio signal is not played by the playing device in the mute section; acquiring audio signals of a mute portion between the three single-frequency combined signals and one single-frequency signal detected in the second audio signal; determining the environmental noise according to the audio signal of the mute part;
the processing module is used for: acquiring the average energy of the audio signal of the mute part in the time domain; determining the magnitude of the environmental noise according to the average energy;
the processing module is used for: detecting a third audio signal from the second audio signal in a case where the third audio signal having a predetermined length of time and a predetermined frequency is included in the first audio signal; determining the playing volume of the playing device according to the third audio signal detected from the second audio signal;
the processing module is used for: acquiring the average energy of the third audio signal detected from the second audio signal in the time domain; determining the playing volume of the playing device according to the average energy and the frequency response curve of the playing device;
the processing module is used for: taking the audio parameters as input parameters of an algorithm configured inside the intercom system so as to configure the intercom system, wherein the algorithm comprises at least one of the following: an echo cancellation algorithm, a noise reduction algorithm, and a gain control algorithm, wherein the time delay is used for the echo cancellation algorithm, the environmental noise is used for the noise reduction algorithm, and the playing volume of the playing device is used for the gain control algorithm;
the first audio signal comprises a plurality of sections of audio signals with different frequencies, wherein mute parts are arranged among the sections of audio signals with different frequencies.
12. An intercom system comprises a playing device, a collecting device, a memory and a processor; wherein the memory is to store one or more computer instructions, wherein the one or more computer instructions are to be executed by the processor to implement the method steps of any of claims 1 to 9.
13. A readable storage medium having stored thereon computer instructions, wherein the computer instructions, when executed by a processor, implement the method steps of any of claims 1 to 9.
CN202211052806.6A 2022-08-31 2022-08-31 Audio parameter processing method and device of intercom system and intercom system Pending CN115460476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211052806.6A CN115460476A (en) 2022-08-31 2022-08-31 Audio parameter processing method and device of intercom system and intercom system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211052806.6A CN115460476A (en) 2022-08-31 2022-08-31 Audio parameter processing method and device of intercom system and intercom system

Publications (1)

Publication Number Publication Date
CN115460476A true CN115460476A (en) 2022-12-09

Family

ID=84300510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211052806.6A Pending CN115460476A (en) 2022-08-31 2022-08-31 Audio parameter processing method and device of intercom system and intercom system

Country Status (1)

Country Link
CN (1) CN115460476A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination