CN117544262A - Dynamic control method, device, equipment and storage medium for directional broadcasting


Info

Publication number
CN117544262A
Authority
CN
China
Prior art keywords
signal data
frequency
data
frequency domain
audio
Prior art date
Legal status
Withdrawn
Application number
CN202311702972.0A
Other languages
Chinese (zh)
Inventor
冼子恩
Current Assignee
Jingjing Intelligent Acoustic Technology Shenzhen Co ltd
Original Assignee
Jingjing Intelligent Acoustic Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Jingjing Intelligent Acoustic Technology Shenzhen Co ltd filed Critical Jingjing Intelligent Acoustic Technology Shenzhen Co ltd
Priority to CN202311702972.0A
Publication of CN117544262A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/35 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users
    • H04H60/49 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying locations
    • H04H60/51 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying locations of receiving stations
    • H04H60/53 Arrangements for identifying or recognising characteristics with a direct linkage to broadcast information or to broadcast space-time, e.g. for identifying broadcast stations or for identifying users for identifying locations of destinations
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 Comfort noise or silence coding
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture

Abstract

The invention relates to the technical field of directional broadcasting, and discloses a dynamic control method, device, equipment and storage medium for directional broadcasting, used to improve the accuracy of dynamic control in multi-scene directional broadcasting. The method comprises the following steps: inputting a frequency feature set into a broadcast scene analysis model for scene analysis to obtain a target broadcast scene; constructing a high-frequency compression strategy for the frequency feature set corresponding to the initial audio signal data of each audio device, generating a target high-frequency compression strategy; performing high-frequency compression processing on the frequency feature set corresponding to the initial audio signal data of each audio device to obtain processed candidate audio signal data; and calculating the dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, then dynamically controlling the candidate audio signal data through that range to generate corresponding target audio signal data.

Description

Dynamic control method, device, equipment and storage medium for directional broadcasting
Technical Field
The present invention relates to the field of directional broadcasting technologies, and in particular, to a method, an apparatus, a device, and a storage medium for dynamic control of directional broadcasting.
Background
In the field of broadcast and audio transmission, multi-scene directional broadcasting is an important technology aimed at providing audio services that adapt automatically to the acoustic characteristics of different scenes and to listeners' needs. Broadcasting technology has made significant progress over the past decades, but some challenges and shortcomings remain.
In the prior art, audio quality and loudness tend to be inconsistent across different scenes. For example, in an outdoor environment the audio requires a higher loudness because of interference from ambient noise, while in a quiet indoor environment such high loudness is unnecessary. Such inconsistencies degrade the user's listening experience. Noisy environments also degrade the quality of the audio signal, and conventional broadcast systems often fail to handle and reduce ambient noise effectively, leaving listeners unable to hear clearly. In addition, many conventional broadcast systems require operators to adjust audio parameters manually to accommodate different scenarios, which costs manpower and time and can itself lead to inconsistencies.
Disclosure of Invention
The invention provides a dynamic control method, a device, equipment and a storage medium for directional broadcasting, which are used for improving the dynamic control accuracy of multi-scene directional broadcasting.
The first aspect of the present invention provides a dynamic control method for directional broadcasting, which includes:
performing position calibration on a plurality of preset audio devices to obtain a position information set, and acquiring audio signals sent by the audio devices through the position information set to obtain initial audio signal data of each audio device;
respectively carrying out audio frequency spectrum analysis on the initial audio signal data of each audio device to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device;
inputting a frequency characteristic set corresponding to the initial audio signal data of each audio device into a preset broadcast scene analysis model to perform scene analysis to obtain a target broadcast scene;
constructing a high-frequency compression strategy for a frequency feature set corresponding to the initial audio signal data of each audio device based on the target broadcasting scene, and generating a target high-frequency compression strategy;
respectively carrying out high-frequency compression processing on a frequency characteristic set corresponding to the initial audio signal data of each audio device through the target high-frequency compression strategy to obtain processed candidate audio signal data;
calculating a dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating a volume control range corresponding to the candidate audio signal data;
and dynamically controlling the candidate audio signal data through the dynamic control range, generating corresponding target audio signal data, and controlling a plurality of audio devices to carry out directional broadcasting based on the target audio signal data.
With reference to the first aspect, in a first implementation manner of the first aspect of the present invention, the performing audio spectrum analysis on the initial audio signal data of each audio device to obtain a frequency feature set corresponding to the initial audio signal data of each audio device includes:
respectively carrying out data conversion on the initial audio signal data of each audio device through a preset analog-to-digital converter to obtain a plurality of digital audio signals;
performing frequency domain conversion on a plurality of digital audio signals through a preset Fourier transform algorithm to obtain a plurality of frequency domain signal data;
respectively carrying out spectrum analysis on each frequency domain signal data to obtain a spectrum data set corresponding to each frequency domain signal data;
and respectively extracting frequency characteristics of the frequency spectrum data set corresponding to each frequency domain signal data to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect of the present invention, the performing spectral analysis on each of the frequency domain signal data to obtain a set of spectral data corresponding to each of the frequency domain signal data includes:
carrying out amplitude spectrum analysis on each frequency domain signal data to obtain amplitude spectrum data corresponding to each frequency domain signal data;
carrying out power spectrum calculation on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data to obtain power spectrum data of each frequency domain signal data;
carrying out logarithmic amplitude spectrum calculation on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data to obtain logarithmic amplitude spectrum data of each frequency domain signal data;
extracting phase data from each frequency domain signal data to obtain phase spectrum data corresponding to each frequency domain signal data;
and respectively carrying out data combination on the phase spectrum data corresponding to each frequency domain signal data, the logarithmic amplitude spectrum data of each frequency domain signal data, the power spectrum data of each frequency domain signal data and the amplitude spectrum data corresponding to each frequency domain signal data to obtain a frequency spectrum data set corresponding to each frequency domain signal data.
With reference to the second implementation manner of the first aspect, in a third implementation manner of the first aspect of the present invention, the performing frequency feature extraction on the frequency spectrum data set corresponding to each frequency domain signal data to obtain a frequency feature set corresponding to initial audio signal data of each audio device includes:
extracting main frequency components from amplitude spectrum data corresponding to each frequency domain signal data to obtain main frequency components of each frequency domain signal data;
performing frequency bandwidth calculation on the amplitude spectrum data corresponding to each frequency domain signal data to obtain frequency bandwidth data of each frequency domain signal data, and performing data combination on main frequency components of each frequency domain signal data and the frequency bandwidth data of each frequency domain signal data to obtain a first initial frequency characteristic set corresponding to each frequency domain signal data;
carrying out total power calculation on the power spectrum data of each frequency domain signal data to obtain total power data of each frequency domain signal data;
carrying out spectral peak power analysis on each frequency domain signal data to obtain spectral peak power data of each frequency domain signal data, and carrying out data combination on total power data of each frequency domain signal data and spectral peak power data of each frequency domain signal data to obtain a second initial frequency characteristic set corresponding to each frequency domain signal data;
calculating an amplitude average value of the amplitude spectrum data of each frequency domain signal data to obtain the amplitude average value of each frequency domain signal data;
extracting the maximum amplitude value of each frequency domain signal data to obtain the maximum amplitude value of each frequency domain signal data, and carrying out data combination on the average amplitude value of each frequency domain signal data and the maximum amplitude value of each frequency domain signal data to obtain a third initial frequency characteristic set corresponding to each frequency domain signal data;
calculating a phase average value of phase spectrum data corresponding to each frequency domain signal data to obtain the phase average value of each frequency domain signal data;
performing phase difference analysis on each frequency domain signal data to obtain phase difference data of each frequency domain signal data, and performing data combination on a phase average value of each frequency domain signal data and the phase difference data of each frequency domain signal data to obtain a fourth initial frequency characteristic set corresponding to each frequency domain signal data;
and respectively carrying out data combination on the first initial frequency characteristic set corresponding to each frequency domain signal data, the second initial frequency characteristic set corresponding to each frequency domain signal data, the third initial frequency characteristic set corresponding to each frequency domain signal data and the fourth initial frequency characteristic set corresponding to each frequency domain signal data to obtain the frequency characteristic set corresponding to the initial audio signal data of each audio device.
With reference to the first aspect, in a fourth implementation manner of the first aspect of the present invention, the inputting of the frequency feature set corresponding to the initial audio signal data of each audio device into a preset broadcast scene analysis model to perform scene analysis to obtain a target broadcast scene includes:
inputting a frequency characteristic set corresponding to the initial audio signal data of each audio device into a characteristic mapping layer of the broadcasting scene analysis model to perform frequency characteristic mapping, and outputting a frequency characteristic vector corresponding to each audio device;
inputting the frequency characteristic vector corresponding to each audio device into a vector coding layer of the broadcasting scene analysis model to perform vector coding, and outputting the coding characteristic vector corresponding to each audio device;
inputting the coding feature vector corresponding to each audio device into a convolution layer of the broadcasting scene analysis model to extract scene features to obtain a scene feature set;
and inputting the scene feature set into a full-connection layer of the broadcasting scene analysis model to perform scene analysis to obtain a target broadcasting scene.
With reference to the first aspect, in a fifth implementation manner of the first aspect of the present invention, the performing, based on the target broadcast scene, high-frequency compression strategy construction on a frequency feature set corresponding to initial audio signal data of each audio device to generate a target high-frequency compression strategy includes:
based on the target broadcasting scene, performing cut-off frequency calculation of a high-frequency component on a frequency characteristic set corresponding to the initial audio signal data of each audio device to obtain a target cut-off frequency;
performing high-frequency compression ratio calculation on a frequency characteristic set corresponding to the initial audio signal data of each audio device to obtain a target compression ratio;
and constructing a high-frequency compression strategy based on the target cut-off frequency and the target compression ratio to obtain the target high-frequency compression strategy.
With reference to the first aspect, in a sixth implementation manner of the first aspect of the present invention, the calculating a dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, where the dynamic control range is used to indicate a volume control range corresponding to the candidate audio signal data, includes:
detecting the maximum amplitude of the candidate audio signal data to obtain a target maximum amplitude;
detecting the minimum amplitude of the candidate audio signal data to obtain a target minimum amplitude;
and calculating the dynamic control range of the candidate audio signal data through the target maximum amplitude and the target minimum amplitude to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating a volume control range corresponding to the candidate audio signal data.
The second aspect of the present invention provides a dynamic control device for directional broadcasting, the dynamic control device for directional broadcasting comprising:
the calibration module is used for carrying out position calibration on a plurality of preset audio devices to obtain a position information set, and collecting audio signals sent by the audio devices through the position information set to obtain initial audio signal data of each audio device;
the analysis module is used for respectively carrying out audio frequency spectrum analysis on the initial audio signal data of each audio device to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device;
the input module is used for inputting the frequency characteristic set corresponding to the initial audio signal data of each audio device into a preset broadcast scene analysis model to perform scene analysis to obtain a target broadcast scene;
the construction module is used for constructing a high-frequency compression strategy for the frequency characteristic set corresponding to the initial audio signal data of each audio device based on the target broadcasting scene, and generating a target high-frequency compression strategy;
the processing module is used for respectively carrying out high-frequency compression processing on the frequency characteristic set corresponding to the initial audio signal data of each audio device through the target high-frequency compression strategy to obtain processed candidate audio signal data;
the computing module is used for computing the dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating the volume control range corresponding to the candidate audio signal data;
and the control module is used for dynamically controlling the candidate audio signal data through the dynamic control range, generating corresponding target audio signal data and controlling a plurality of audio devices to carry out directional broadcasting based on the target audio signal data.
A third aspect of the present invention provides a dynamic control apparatus for directional broadcasting, comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the dynamic control device of the directional broadcast to perform the dynamic control method of the directional broadcast described above.
A fourth aspect of the present invention provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the dynamic control method of directional broadcasting described above.
In the technical scheme provided by the invention, the positions of a plurality of audio devices are calibrated to obtain a position information set, and audio signals sent by the plurality of audio devices are collected through the position information set to obtain initial audio signal data of each audio device; audio frequency spectrum analysis is performed on the initial audio signal data of each audio device to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device; the frequency characteristic set corresponding to the initial audio signal data of each audio device is input into a broadcast scene analysis model for scene analysis to obtain a target broadcast scene; a high-frequency compression strategy is constructed for the frequency feature set corresponding to the initial audio signal data of each audio device based on the target broadcast scene, generating a target high-frequency compression strategy; high-frequency compression processing is performed on the frequency characteristic set corresponding to the initial audio signal data of each audio device through the target high-frequency compression strategy to obtain processed candidate audio signal data; the dynamic control range of the candidate audio signal data is calculated to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating a volume control range corresponding to the candidate audio signal data; and the candidate audio signal data is dynamically controlled through the dynamic control range, generating corresponding target audio signal data, and a plurality of audio devices are controlled to carry out directional broadcasting based on the target audio signal data. In this scheme, the method can automatically adapt to different broadcast scenes through position calibration, scene analysis and high-frequency compression strategy construction. Whether in indoor, outdoor, noisy environments or quiet spaces, the system can be optimized according to scene requirements, providing a better audio experience. The construction and application of high-frequency compression strategies can help reduce noise, reduce audio distortion, and ensure that audio remains of high quality in a variety of scenarios. The whole process is automated, and the system makes intelligent decisions according to the audio data and scene information acquired in real time. This reduces the burden on the broadcast operator and improves the efficiency and consistency of the broadcast. Through dynamic control range calculation and application, the system can intelligently manage the loudness range of the audio, preventing the audio from being too loud or too quiet. This helps to provide a more balanced volume level. Because the system can adaptively adjust according to scene and audio characteristics, personalized broadcast experiences can be provided for different listeners to meet their needs and preferences. By comprehensively considering the position information, scene analysis, high-frequency compression strategy and dynamic control range calculation, the scheme can provide broadcast services with higher quality and consistency, thereby improving the satisfaction and experience of users. Through automated scene analysis and audio processing, the need for manual intervention is reduced, and operation costs and maintenance work are reduced.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a dynamic control method for directional broadcasting according to an embodiment of the present invention;
FIG. 2 is a flowchart of performing spectrum analysis on each frequency domain signal data to obtain a spectrum data set corresponding to each frequency domain signal data in an embodiment of the present invention;
FIG. 3 is a flowchart of data combination of the phase spectrum data corresponding to each frequency domain signal data, the logarithmic amplitude spectrum data of each frequency domain signal data, the power spectrum data of each frequency domain signal data, and the amplitude spectrum data corresponding to each frequency domain signal data in an embodiment of the present invention;
FIG. 4 is a flowchart of inputting a frequency feature set corresponding to initial audio signal data of each audio device into a preset broadcast scene analysis model for scene analysis in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of a dynamic control device for directional broadcasting according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an embodiment of a dynamic control apparatus for directional broadcasting in an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a dynamic control method, a device, equipment and a storage medium for directional broadcasting, which are used for improving the dynamic control accuracy of multi-scene directional broadcasting.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For easy understanding, the following describes a specific flow of an embodiment of the present invention, referring to fig. 1, and one embodiment of a dynamic control method for directional broadcasting in an embodiment of the present invention includes:
s101, performing position calibration on a plurality of preset audio devices to obtain a position information set, and acquiring audio signals sent by the plurality of audio devices through the position information set to obtain initial audio signal data of each audio device;
It can be understood that the execution body of the present invention may be a dynamic control device for directional broadcasting, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution body as an example.
Specifically, the positions of a plurality of audio devices are calibrated. This is because the server needs to know the exact location of each device for directional broadcasting in subsequent audio collection and processing. Calibrating the location of an audio device is typically accomplished using a variety of techniques, such as the Global Positioning System (GPS), Wi-Fi positioning, sound localization, and image recognition. For example, in an outdoor scenario, the server uses a GPS receiver to obtain the geographic coordinates of the device, while in an indoor scenario, the server uses Wi-Fi signal strength and sound localization to estimate the location of the device. Once the server obtains the location information for each audio device, the next step is to integrate the information into a position information set and perform efficient data management. This is an important step in ensuring the normal operation of the multi-scene directional broadcasting system. The server collects the location information of each audio device periodically or on an event trigger, using suitable sensors or techniques. These location data are stored in a dedicated database or data structure, including information such as the unique identifier of each device, its geographic coordinates, and time stamps. If multiple positioning techniques are used, position data fusion is needed to improve the accuracy and reliability of the position information. This fusion process may take into account the weights and reliability of the different techniques to produce the most accurate location information. Audio signals transmitted by the plurality of audio devices are then collected to obtain initial audio signal data for each device. This requires consideration of different scenarios and device types. In large indoor or outdoor scenes, a microphone array is a powerful tool: it consists of multiple microphones that capture sound from different directions simultaneously, providing stereo or surround sound effects. In smaller indoor scenes or on portable devices, a single microphone is sufficient to capture sound; it is simpler but more susceptible to ambient noise. Sometimes audio signals need to be transmitted from a remote location to a central processing unit, which involves using a network or other communication means to ensure real-time transmission and collection of the audio data.
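The position fusion mentioned above can be illustrated with a short sketch. The following Python snippet is an illustration added to this text, not part of the original disclosure; the weights and coordinates are hypothetical, and a confidence-weighted average is only one plausible fusion method.

```python
from dataclasses import dataclass

@dataclass
class PositionEstimate:
    x: float          # metres, local coordinate frame
    y: float
    weight: float     # reliability assigned to the positioning technique (0..1)

def fuse_positions(estimates):
    """Weighted average of several position estimates (e.g. GPS, Wi-Fi, acoustic)."""
    total = sum(e.weight for e in estimates)
    x = sum(e.x * e.weight for e in estimates) / total
    y = sum(e.y * e.weight for e in estimates) / total
    return x, y

# Example: GPS trusted more than Wi-Fi signal strength in an outdoor scenario
device_pos = fuse_positions([
    PositionEstimate(x=12.1, y=4.0, weight=0.7),   # GPS estimate
    PositionEstimate(x=11.6, y=4.4, weight=0.3),   # Wi-Fi estimate
])
```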
S102, respectively carrying out audio frequency spectrum analysis on the initial audio signal data of each audio device to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device;
Specifically, the initial audio signal data of each audio device is subjected to data conversion. This may be done by a preset analog-to-digital converter, which converts the analog audio signal into a digital audio signal for subsequent digital signal processing. The goal of this step is to obtain a plurality of digital audio signals, each representing an input signal of an audio device. The server then performs frequency domain conversion on each digital audio signal. This can be achieved by a preset Fourier transform algorithm. The Fourier transform converts the time-domain signal into a frequency-domain signal, turning the audio signal from a time-domain representation into a representation in terms of frequency components. The result of this step is a plurality of frequency domain signal data, each corresponding to an input signal of an audio device. After obtaining the frequency domain signal, the server performs spectrum analysis. The purpose is to obtain a set of spectral data corresponding to each frequency domain signal data. The spectral data represents the intensity or amplitude distribution of the audio signal at different frequencies. Spectral analysis helps the server understand the frequency content and characteristics of the audio signal. Finally, the server extracts frequency characteristics from the frequency spectrum data set corresponding to each frequency domain signal data. The aim is to extract information about the frequency characteristics of the audio signal from the spectral data. The frequency characteristics may include the main frequency components of the audio signal, frequency ranges, energy distribution, etc. Extracting these features helps the server learn the sound characteristics of each audio device. For example, assume that a server is creating a multi-scene directional broadcast system for a musical event. The server presets a plurality of microphones, each for capturing the musical performance at a different location. The analog audio signal captured by each microphone is first converted to a digital audio signal by an analog-to-digital converter. For example, the server converts sound on the music stage, sound in the audience area, etc., into digital signals. The server converts these digital audio signals into frequency domain signals using Fourier transform algorithms. This converts the audio signal from the time domain to the frequency domain, enabling the server to observe the energy distribution of the audio signal over different frequencies. For each frequency domain signal captured by a microphone, the server performs a spectral analysis. The server calculates spectral data, i.e. amplitude or energy at different frequencies, for each frequency domain signal. This helps the server know the distribution over frequency of the sound captured at each location. Finally, the server extracts frequency features from each set of spectral data. This may include identifying major frequency components of the audio signal, such as bass, midrange and treble, and their relative intensities. These features help the server understand the sound characteristics of each location.
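As an illustrative sketch of this step (added for this text, not part of the original disclosure), the following Python code digitizes a hypothetical one-second capture and converts it to the frequency domain with a fast Fourier transform; the sample rate and test tone are assumptions.

```python
import numpy as np

fs = 48_000                         # sample rate in Hz (illustrative)
t = np.arange(fs) / fs              # one second of samples
# Stand-in for a digitised microphone capture: a 1 kHz tone plus noise
signal = np.sin(2 * np.pi * 1000 * t) + 0.1 * np.random.randn(fs)

# Frequency-domain conversion (the "preset Fourier transform algorithm")
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

amplitude = np.abs(spectrum)        # amplitude at each frequency bin
print(freqs[np.argmax(amplitude)])  # dominant frequency, approximately 1000.0 Hz
```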
Wherein an amplitude spectrum analysis is performed on each frequency domain signal data. The aim is to extract amplitude information, i.e. the amplitudes of the different frequency components, from the frequency domain signal. The amplitude spectrum analysis helps the server to know the amplitude distribution of each frequency domain signal at different frequencies. Power spectrum calculation is then carried out on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data. The power spectrum is the square of the amplitude, representing the energy of each frequency component. The power spectrum calculation helps the server quantify the energy distribution of the frequency domain signal. Meanwhile, logarithmic amplitude spectrum calculation is performed using the amplitude spectrum data corresponding to each frequency domain signal data. This calculation typically involves taking the logarithm of the amplitude spectrum to obtain a measure of amplitude, typically in decibels (dB), used to represent the relative strength of the signal. Further, phase data extraction is performed for each frequency domain signal data. The phase data represents the phase information of each frequency component, i.e. the offset in time of the signal. The phase information is very important for subsequent signal reconstruction. Finally, the server respectively performs data combination on the amplitude spectrum data, the power spectrum data, the logarithmic amplitude spectrum data and the phase spectrum data corresponding to each frequency domain signal data to obtain a frequency spectrum data set corresponding to each frequency domain signal data. This step integrates the different aspects of the frequency domain information, providing complete spectral data for subsequent audio processing and high-frequency compression strategy construction. For example, assume that a server is building a multi-scene directional broadcast system for a concert venue broadcast. The server captures musical performances at different locations using a plurality of microphones and obtains the respective frequency domain signal data. For the frequency domain signal data captured by each microphone, the server performs an amplitude spectrum analysis to obtain amplitude information of the different frequency components. For example, the server obtains the amplitude distribution of each frequency domain signal over bass, midrange and treble frequencies. From the amplitude spectrum data, the server calculates a power spectrum for each frequency domain signal to measure the energy of the different frequency components. This helps the server know the volume and importance of different frequencies in a musical performance. Meanwhile, the server performs logarithmic amplitude spectrum calculation using the amplitude spectrum data, taking the logarithm of the amplitude to represent the relative strength of the signal in decibels. This helps the server quantify the relative volumes of the different frequency components. The server also extracts phase information from each frequency domain signal data. Finally, the server combines the amplitude spectrum data, the power spectrum data, the logarithmic amplitude spectrum data and the phase spectrum data of the frequency domain signal captured by each microphone to obtain a complete frequency spectrum data set, which is used to construct the high-frequency compression strategy and process the audio.
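A minimal sketch of assembling the four spectra into one frequency spectrum data set might look as follows (Python with NumPy, added for illustration; the small epsilon guard against a logarithm of zero is an implementation detail not specified in the original).

```python
import numpy as np

def spectral_data_set(frequency_domain_signal):
    """Combine the four spectra described above into one data set.

    `frequency_domain_signal` is a complex rfft output for one device.
    """
    amplitude = np.abs(frequency_domain_signal)            # amplitude spectrum
    power = amplitude ** 2                                 # power spectrum
    amplitude_db = 20 * np.log10(amplitude + 1e-12)        # log-amplitude in dB
    phase = np.angle(frequency_domain_signal)              # phase spectrum (radians)
    return {
        "amplitude": amplitude,
        "power": power,
        "amplitude_db": amplitude_db,
        "phase": phase,
    }
```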
Wherein, for each frequency domain signal data, extraction of a main frequency component is performed first. This may be achieved by using Fourier transforms or other frequency domain analysis methods. The main frequency component represents the dominant frequency in the signal. For example, assume that the server has two microphones A and B, which capture frequency domain signal data respectively. In the signal of microphone A, the main frequency component is 1000Hz, while in the signal of microphone B, the main frequency component is 500Hz. The frequency bandwidth of each frequency domain signal data is calculated using the amplitude spectrum data. The frequency bandwidth represents the range of the distribution of the sound signal over the frequency domain and can be determined by analyzing the width of the amplitude curve. For example, the frequency bandwidth is 200Hz for the signal of microphone A and 100Hz for the signal of microphone B. The total power of each frequency domain signal data is calculated using the power spectrum data. The total power represents the total energy of the signal, typically the area or integral of the amplitude spectrum data. For example, the total power of the signal of microphone A is 2000 units, while the total power of the signal of microphone B is 800 units. Spectral peak power analysis is performed on the amplitude spectrum data to extract the spectral peak power of each frequency domain signal data. The spectral peak power generally corresponds to the strongest frequency component of the sound signal. For example, microphone A has a signal peak power of 800 units, while microphone B has a signal peak power of 400 units. An average value of the amplitude spectrum data is calculated to evaluate the overall amplitude level of the sound signal. The average value represents the average amplitude intensity of the signal. For example, the average signal amplitude of microphone A is 50 units, and the average signal amplitude of microphone B is 30 units. A maximum value of the amplitude is extracted from the amplitude spectrum data for determining the maximum volume of the audio signal. For example, the maximum signal amplitude of microphone A is 70 units, while the maximum signal amplitude of microphone B is 40 units. The phase spectrum data is used to calculate the phase average value of each frequency domain signal data. The phase average represents the average phase angle of the signal. For example, the average signal phase of microphone A is 30 degrees, while the average signal phase of microphone B is 45 degrees. Phase difference analysis is performed to obtain phase difference information among the different frequency domain signal data, which helps to understand the timing relationship of the signals. For example, the phase difference between microphones A and B is 15 degrees. Finally, the main frequency component, the frequency bandwidth data, the total power data, the spectral peak power data, the amplitude average value, the amplitude maximum value, the phase average value and the phase difference data of each frequency domain signal data are combined to obtain the frequency characteristic set corresponding to the initial audio signal data of each audio device. This set contains all the extracted frequency feature information.
For example, for microphone A, the server obtains a frequency characteristic set including the main frequency component, frequency bandwidth, total power, spectral peak power, amplitude average value, amplitude maximum value, phase average value and phase difference. For microphone B, there is a similar frequency characteristic set.
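The feature extraction described above can be sketched as the following illustrative Python function, reusing the spectral data set from the previous sketch; this is not the patent's specified procedure. In particular, the half-peak definition of bandwidth is an assumption, and the inter-device phase difference is omitted because it requires two signals.

```python
import numpy as np

def frequency_feature_set(freqs, spectra):
    """Extract the features listed above from one device's spectral data set."""
    amp, power, phase = spectra["amplitude"], spectra["power"], spectra["phase"]
    peak = np.argmax(amp)
    # Bandwidth taken here as the span of bins above half the peak amplitude
    strong = np.where(amp >= amp[peak] / 2)[0]
    return {
        "main_frequency": float(freqs[peak]),
        "bandwidth": float(freqs[strong[-1]] - freqs[strong[0]]),
        "total_power": float(np.sum(power)),
        "peak_power": float(power[peak]),
        "amplitude_mean": float(np.mean(amp)),
        "amplitude_max": float(np.max(amp)),
        "phase_mean": float(np.mean(phase)),
    }
```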
S103, inputting a frequency characteristic set corresponding to the initial audio signal data of each audio device into a preset broadcast scene analysis model for scene analysis to obtain a target broadcast scene;
Specifically, the frequency feature set corresponding to the initial audio signal data of each audio device is input into the feature mapping layer of the broadcast scene analysis model. The function of this layer is to map the set of frequency features into a frequency feature vector for further processing. For example, for microphone A, the feature mapping layer maps its set of frequency features into a frequency feature vector A; there is a similar frequency feature vector B for microphone B. The frequency feature vectors A and B are then input to the vector coding layer of the broadcast scene analysis model. The task of this layer is to encode each frequency feature vector into an encoded feature vector that better represents the characteristics of the audio device. For example, the frequency feature vector A of microphone A becomes the encoded feature vector A' after passing through the vector coding layer, and the frequency feature vector B of microphone B becomes the encoded feature vector B'. The encoded feature vectors A' and B' are fed into the convolution layer of the broadcast scene analysis model for extraction of scene features. This step helps to identify differences and common features between audio devices so as to better understand the current acoustic environment. For example, the convolution layer analyzes the encoded feature vectors A' and B' to extract the scene feature set C. Finally, the scene feature set C is sent to the full-connection layer of the broadcast scene analysis model for scene analysis to determine the target broadcast scene. The full-connection layer comprehensively considers the characteristics of the audio devices and the scene features to produce the best broadcast scene selection. For example, the full-connection layer analyzes the scene feature set C and determines that the currently most suitable broadcast scene is "indoor environment". Through this process, the server can automatically adjust the broadcast scene according to the frequency feature sets and scene features of different audio devices so as to provide the optimal audio experience. For example, assume that there are two microphones A and B located in different scenes: A is located in an indoor environment and B in an outdoor environment. Indoor environments typically require more low-frequency sound, while outdoor environments require more high-frequency sound to overcome ambient noise. The frequency feature sets of microphones A and B become frequency feature vectors A and B, respectively, through feature mapping. The frequency feature vectors A and B are encoded into encoded feature vectors A' and B', taking into account their specific acoustic properties. The convolution layer analyzes the encoded feature vectors A' and B' to extract the scene feature set C; in this example, C contains information about the low-frequency and high-frequency characteristics. The full-connection layer analyzes the scene feature set C, and the server automatically determines the broadcast scene most suitable for the current situation. In this case, the server chooses to provide more low-frequency sound in the indoor environment to improve the listening experience of the listener.
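The layer sequence described above (feature mapping, vector coding, convolution, full connection) can be sketched as a small neural network. The following PyTorch module is purely illustrative and added for this text; the framework, layer sizes, activation functions, and the pooling across devices are all assumptions, since the original does not specify them.

```python
import torch
import torch.nn as nn

class BroadcastSceneModel(nn.Module):
    """Feature mapping -> vector coding -> convolution -> full connection,
    mirroring the layer order described above. All sizes are illustrative."""

    def __init__(self, n_features=8, n_scenes=4):
        super().__init__()
        self.feature_map = nn.Linear(n_features, 32)            # feature mapping layer
        self.encoder = nn.Linear(32, 32)                        # vector coding layer
        self.conv = nn.Conv1d(1, 16, kernel_size=3, padding=1)  # scene feature extraction
        self.classifier = nn.Linear(16 * 32, n_scenes)          # full-connection layer

    def forward(self, x):                          # x: (devices, n_features)
        v = torch.relu(self.feature_map(x))        # frequency feature vectors
        v = torch.relu(self.encoder(v))            # encoded feature vectors
        v = torch.relu(self.conv(v.unsqueeze(1)))  # add channel dim, extract features
        return self.classifier(v.flatten(1))       # scene logits per device

model = BroadcastSceneModel()
logits = model(torch.randn(2, 8))      # feature sets of two devices (e.g. A and B)
scene = logits.sum(dim=0).argmax()     # one pooled scene decision across devices
```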
S104, constructing a high-frequency compression strategy for a frequency feature set corresponding to the initial audio signal data of each audio device based on the target broadcasting scene, and generating a target high-frequency compression strategy;
Specifically, based on the target broadcast scene, the cut-off frequency of the high-frequency component is calculated for the frequency feature set corresponding to the initial audio signal data of each audio device. The aim is to determine the target cut-off frequency, i.e. the frequency above which the spectrum of the audio signal will be compressed. For example, in an indoor environment, the target cut-off frequency is lower, e.g. 5kHz, to retain more high-frequency information; in an outdoor environment, the target cut-off frequency is higher, e.g. 10kHz, to reduce the effect of ambient noise. High-frequency compression ratio calculation is then performed on the frequency feature set corresponding to the initial audio signal data of each audio device. This step determines the degree of compression of the high-frequency part of the signal. The compression ratio is typically expressed in decibels (dB), with a negative value representing compression and a positive value representing expansion. For example, for an indoor environment, a smaller negative compression ratio, e.g. -6dB, may be selected to retain high-frequency information; for an outdoor environment, a larger negative compression ratio, e.g. -12dB, is selected to reduce the effect of ambient noise. Finally, the target high-frequency compression strategy is constructed based on the determined target cut-off frequency and target compression ratio. This strategy applies high-frequency compression to the high-frequency part of the audio signal to adapt to the target broadcast scene. For example, for an indoor environment, the target high-frequency compression strategy would specify that -6dB of high-frequency compression be applied in the frequency range above 5kHz; for an outdoor environment, it would specify that -12dB of high-frequency compression be applied in the frequency range above 10kHz. Through this process, the server can automatically construct a high-frequency compression strategy according to the target broadcast scene so as to adapt to the acoustic characteristics of different scenes. This helps to provide a better audio experience, ensuring consistent loudness and quality of broadcast audio in different environments. For example, assume that in a broadcast at a concert venue there are two microphones A and B located at different positions, both of which need to accommodate the same concert environment. The target cut-off frequency is set at 15kHz according to the acoustic properties of the concert to preserve the musical detail of the high-frequency part. To accommodate the high loudness of a concert, a smaller negative compression ratio, e.g. -6dB, may be selected to ensure that the audio signal is not distorted. From the above calculations, the target high-frequency compression strategy will specify that -6dB of high-frequency compression be applied in the frequency range above 15kHz. This ensures that the audio broadcast at the concert has consistent quality in the high-frequency part and adapts to the characteristics of the concert environment.
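A minimal sketch of the strategy construction, using the cut-off frequencies and compression ratios from the examples above, might be a simple scene-to-strategy lookup. The table form, dataclass, and names below are assumptions added for illustration, not the patent's specified data structures.

```python
from dataclasses import dataclass

@dataclass
class HighFreqCompressionStrategy:
    cutoff_hz: float        # target cut-off frequency
    compression_db: float   # negative = compression, positive = expansion

# Illustrative scene-to-strategy table based on the worked examples above
STRATEGIES = {
    "indoor":  HighFreqCompressionStrategy(cutoff_hz=5_000.0,  compression_db=-6.0),
    "outdoor": HighFreqCompressionStrategy(cutoff_hz=10_000.0, compression_db=-12.0),
    "concert": HighFreqCompressionStrategy(cutoff_hz=15_000.0, compression_db=-6.0),
}

def build_strategy(target_scene: str) -> HighFreqCompressionStrategy:
    """Return the target high-frequency compression strategy for a scene."""
    return STRATEGIES[target_scene]
```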
S105, respectively carrying out high-frequency compression processing on the frequency characteristic set corresponding to the initial audio signal data of each audio device through a target high-frequency compression strategy to obtain processed candidate audio signal data;
Specifically, the frequency characteristic set corresponding to the initial audio signal data of each audio device is input to the high-frequency compression processing module. These frequency characteristic sets contain information such as the amplitude and phase of the audio signal in different frequency ranges. High-frequency compression processing is carried out on the frequency characteristic set of each audio device using the previously constructed target high-frequency compression strategy. This process reduces the amplitude of the high-frequency portion of the frequency characteristic set based on the target cut-off frequency and compression ratio. The processed frequency characteristic set is then used to synthesize candidate audio signal data, which involves restoring the high-frequency-compressed frequency characteristic set to time-domain audio signal data. The above steps are repeated for each audio device to generate the respective candidate audio signal data. These candidate audio signal data are further processed and analyzed in subsequent steps to determine the final audio output. Through this process, the server can perform high-frequency compression processing on the audio signals of different audio devices according to the target high-frequency compression strategy so as to adapt to the acoustic requirements of a specific broadcast scene. This helps to ensure consistent loudness and quality of broadcast audio in different scenes. For example, assume that there are two audio devices A and B, located in indoor and outdoor environments, respectively. According to the target broadcast scene, the server has constructed a high-frequency compression strategy in which the target cut-off frequency of the indoor scene is 5kHz with a compression ratio of -6dB, and the target cut-off frequency of the outdoor scene is 10kHz with a compression ratio of -12dB. The server receives the initial audio signal data of audio device A and carries out high-frequency compression processing: inputting the frequency characteristic set, which contains the audio information captured in the indoor environment; applying the target high-frequency compression strategy, under which the server reduces the amplitude of the part of the frequency characteristic set above 5kHz according to the target cut-off frequency and compression ratio of the indoor scene; and generating candidate audio signal data, where the processed frequency characteristic set is used to synthesize candidate audio signal data for audio device A. Likewise, for audio device B, the server performs similar high-frequency compression processing according to the target cut-off frequency and compression ratio of the outdoor scene. Eventually, the server generates two sets of candidate audio signal data, respectively adapted to the acoustic properties of the indoor and outdoor environments, to provide consistent broadcast audio quality.
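The compression step itself can be sketched as a fixed attenuation of all spectral content above the target cut-off frequency, followed by an inverse transform back to the time domain. This is a simplification added for illustration; the original does not specify the exact filter shape or compression law.

```python
import numpy as np

def apply_high_freq_compression(signal, fs, cutoff_hz, compression_db):
    """Attenuate spectral content above `cutoff_hz` by `compression_db`,
    then return the processed time-domain signal."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    gain = 10 ** (compression_db / 20)          # e.g. -6 dB -> ~0.5 linear gain
    spectrum[freqs > cutoff_hz] *= gain         # compress the high-frequency part
    return np.fft.irfft(spectrum, n=len(signal))

# Indoor example from above: -6 dB above 5 kHz
# candidate = apply_high_freq_compression(initial, fs=48_000,
#                                         cutoff_hz=5_000, compression_db=-6.0)
```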
S106, calculating a dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating a volume control range corresponding to the candidate audio signal data;
Specifically, maximum amplitude detection is performed on the candidate audio signal data. This involves analyzing amplitude peaks in the audio signal to determine the maximum amplitude in the audio, in order to identify the highest volume level of the audio signal. Minimum amplitude detection is then performed on the same candidate audio signal data to find the lowest amplitude in the audio signal, which typically corresponds to the silent or very low volume portion of the audio. The dynamic control range is calculated using the detection results of the maximum amplitude and the minimum amplitude. This dynamic control range is used to indicate the volume control range of the audio signal. Typically, it represents the volume range within which the audio signal is allowed to vary, where the maximum amplitude corresponds to the highest volume and the minimum amplitude to the lowest volume. The calculation of the control range may use a variety of methods, such as linear scaling or logarithmic scaling, to ensure that the volume adjustment is uniform and appropriate. For example, suppose there is an audio device C located in a noisy outdoor scene, generating candidate audio signal data. Through maximum amplitude detection, the server determines that the maximum amplitude of the audio signal is +6dB, indicating a high volume level. Through minimum amplitude detection, the server determines that the minimum amplitude of the audio signal is -36dB, indicating silence or a very low volume level. The server uses this amplitude information to calculate the dynamic control range. In this case, the dynamic control range may be set to +6dB to -36dB, indicating that the volume of the audio signal may be adjusted within this range. The audio can be dynamically adjusted between high and low volumes according to the listener's needs to accommodate a noisy outdoor environment.
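A sketch of the range calculation follows (Python; the decibel convention relative to full scale and the handling of exact silence are assumptions added for illustration).

```python
import numpy as np

def dynamic_control_range(candidate, floor=1e-6):
    """Return (max_dB, min_dB) of the candidate audio, relative to full scale."""
    amplitude = np.abs(candidate)
    max_db = 20 * np.log10(np.max(amplitude) + floor)
    # Ignore exact-zero samples so silence does not drive the minimum to -inf
    nonzero = amplitude[amplitude > floor]
    min_db = 20 * np.log10(np.min(nonzero)) if nonzero.size else max_db
    return max_db, min_db
```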
And S107, dynamically controlling the candidate audio signal data through the dynamic control range, generating corresponding target audio signal data, and controlling a plurality of audio devices to carry out directional broadcasting based on the target audio signal data.
In the previous step, the server calculated a dynamic control range indicating the permitted volume range of the audio signal. The server adjusts the candidate audio signal data according to this range to generate target audio signal data. This process may be implemented in a variety of ways, depending on the design and requirements of the server. One common approach is to use audio processing algorithms, such as audio compression or expansion, to adjust the volume of the audio signal. These algorithms can amplify or attenuate the audio within the dynamic control range to ensure that it adapts to the acoustic characteristics of a particular scene and the needs of the listener. For example, in a noisy outdoor environment the audio volume needs to increase, while in a quiet indoor environment it needs to decrease. In addition to volume adjustment, other quality optimization processes may be applied to the audio signal, including noise reduction, spectrum equalization, and audio clarity enhancement. These processes may be adjusted according to the acoustic characteristics of the target scene to provide a better audio experience. For example, assume that the server runs a multi-scene directional broadcast system that includes audio devices A and B located in different environments: audio device A in a noisy stadium and audio device B in a quiet library. Both capture candidate audio signal data, and the server processes these audio signals using the dynamic control ranges. The server detects that the dynamic control range of audio device A is +6dB to -36dB, while that of audio device B is +3dB to -30dB. For audio device A, the server amplifies the volume to suit the noisy environment while performing noise reduction processing to reduce the effect of ambient noise. These processes generate target audio signal data that ensure the listener hears clear and loud audio in the stadium. For audio device B, the server moderately reduces the volume to ensure that the audio does not disturb the quiet library environment; at the same time it performs audio equalization processing to adapt to the acoustic properties of this environment, producing clear and soft audio. Finally, through such dynamic control and processing, the server generates target audio signal data adapted to different scenes for directional broadcasting. This ensures that high-quality and suitable audio services are provided in different environments without manual intervention. Through the application of dynamic control ranges, the multi-scene directional broadcast system can achieve adaptive audio processing, providing a better listening experience while reducing reliance on operators. This is one of the important innovations in modern broadcasting technology and is expected to improve the quality of service in the broadcast and audio transmission fields.
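One plausible reading of the dynamic control step, added here as an illustration because the original leaves the exact control law unspecified, is to scale the signal so that its peak sits at a desired level clamped inside the permitted range.

```python
import numpy as np

def apply_dynamic_control(candidate, range_db, target_peak_db=-3.0):
    """Scale the candidate signal so its peak sits at `target_peak_db`
    while staying inside the permitted dynamic control range."""
    max_db, min_db = range_db                   # as returned above
    desired_db = np.clip(target_peak_db, min_db, max_db)
    gain = 10 ** ((desired_db - max_db) / 20)   # shift the current peak level
    return candidate * gain

# target = apply_dynamic_control(candidate, dynamic_control_range(candidate))
```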
In the embodiment of the invention, the scheme automatically adapts to different broadcasting scenes through position calibration, scene analysis and high-frequency compression strategy construction. Whether indoors, outdoors, in noisy environments or in quiet spaces, the system can be optimized according to scene requirements to provide a better audio experience. The construction and application of high-frequency compression strategies help reduce noise and audio distortion and ensure that the audio remains high quality in a variety of scenarios. The whole process is automatic: the system makes intelligent decisions based on the audio data and scene information acquired in real time, which reduces the burden on the broadcast operator and improves the efficiency and consistency of broadcasting. Through dynamic control range calculation and application, the system intelligently manages the loudness range of the audio, preventing it from becoming too loud or too quiet and helping to provide a more balanced volume level. Because the system adapts to scene and audio characteristics, personalized broadcast experiences can be provided for different listeners to meet their needs and preferences. By jointly considering position information, scene analysis, the high-frequency compression strategy and the dynamic control range calculation, the scheme can provide broadcast services of higher quality and consistency, improving user satisfaction and experience. Automated scene analysis and audio processing reduce the need for manual intervention, lowering operating costs and maintenance effort.
In a specific embodiment, the process of executing step S102 may specifically include the following steps:
(1) Respectively carrying out data conversion on the initial audio signal data of each audio device through a preset analog-to-digital converter to obtain a plurality of digital audio signals;
(2) Performing frequency domain conversion on a plurality of digital audio signals through a preset Fourier transform algorithm to obtain a plurality of frequency domain signal data;
(3) Respectively carrying out spectrum analysis on each frequency domain signal data to obtain a spectrum data set corresponding to each frequency domain signal data;
(4) And respectively extracting frequency characteristics of the frequency spectrum data set corresponding to each frequency domain signal data to obtain the frequency characteristic set corresponding to the initial audio signal data of each audio device.
Specifically, the initial audio signal data of each audio device is first subjected to data conversion. This typically means converting the analog audio signal into a digital audio signal using a preset analog-to-digital converter, which turns the continuous analog waveform into a discrete digital representation; each audio device thus yields a digital audio signal. These digital audio signals are then converted to the frequency domain by a preset Fourier transform algorithm. The Fourier transform converts a time-domain signal into a frequency-domain signal, i.e. it turns the time-domain representation of the audio into a frequency representation, producing one set of frequency domain signal data per audio device. Each set of frequency domain signal data is then subjected to spectral analysis. A spectrum describes the energy distribution of a signal over different frequencies, so spectral analysis helps the server understand which frequency components the audio signal contains and how strong they are; this step generates a spectrum data set corresponding to each frequency domain signal data. Finally, frequency features are extracted from each spectrum data set. The frequency features include the main frequency components, frequency bandwidth, total power, spectral peak power, amplitude average, amplitude maximum, phase average, phase difference and similar quantities, and are used to describe the frequency domain characteristics of the audio signal. For example, assume the server has two audio devices, microphone A and microphone B, which capture sound from different speakers. The analog signal of microphone A is converted into digital audio signal A by an analog-to-digital converter, and that of microphone B into digital audio signal B. The digital audio signals A and B are converted into frequency domain signal data A and B using a Fourier transform algorithm; these data represent the distribution of the sound over different frequencies. Spectral analysis of frequency domain signal data A and B yields their spectrum data sets, which show the characteristics of the sound signals in the frequency domain. Frequency features such as the main frequency component and the frequency bandwidth are then extracted from spectrum data sets A and B. For example, the server may find that the main frequency component of microphone A is the speaker's voice frequency, while that of microphone B is noise from the surrounding environment.
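A minimal Python illustration of this pipeline, assuming NumPy, a 16 kHz sample rate and a synthetic tone standing in for the analog-to-digital converter's output (all of these are assumptions of the sketch, not requirements of the embodiment):

    import numpy as np

    fs = 16000                              # assumed sample rate
    t = np.arange(fs) / fs
    digital = np.sin(2 * np.pi * 440 * t)   # stands in for ADC output

    # Frequency domain conversion via the (fast) Fourier transform.
    spectrum = np.fft.rfft(digital)
    freqs = np.fft.rfftfreq(digital.size, d=1.0 / fs)

    # Elementary spectrum analysis: locate the dominant component.
    magnitude = np.abs(spectrum)
    print("main component: %.0f Hz" % freqs[magnitude.argmax()])  # ~440 Hz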
In a specific embodiment, as shown in fig. 2, the process of performing spectrum analysis on each frequency domain signal data to obtain a spectrum data set corresponding to each frequency domain signal data may specifically include the following steps:
S201, performing amplitude spectrum analysis on each frequency domain signal data to obtain amplitude spectrum data corresponding to each frequency domain signal data;
S202, performing power spectrum calculation on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data to obtain power spectrum data of each frequency domain signal data;
S203, performing logarithmic amplitude spectrum calculation on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data to obtain log-amplitude spectrum data of each frequency domain signal data;
S204, extracting phase data of each frequency domain signal data to obtain phase spectrum data corresponding to each frequency domain signal data;
S205, respectively carrying out data combination on the phase spectrum data corresponding to each frequency domain signal data, the log-amplitude spectrum data of each frequency domain signal data, the power spectrum data of each frequency domain signal data and the amplitude spectrum data corresponding to each frequency domain signal data to obtain a frequency spectrum data set corresponding to each frequency domain signal data.
Amplitude spectrum analysis is performed on each frequency domain signal data. An amplitude spectrum describes the amplitude, or magnitude, of a signal at different frequencies; this step extracts the amplitude spectrum data corresponding to each frequency domain signal data. Based on the amplitude spectrum data, the power spectrum of each frequency domain signal data is calculated. The power spectrum represents the energy distribution of the signal over frequency and is obtained by squaring the amplitude, yielding power spectrum data for each frequency domain signal data. The log-amplitude spectrum of each frequency domain signal data is also calculated from the amplitude spectrum data. The log-amplitude spectrum is a logarithmic representation of the amplitude, which better describes the relative strength of the signal; this generates log-amplitude spectrum data for each frequency domain signal data. Phase information is then extracted from each frequency domain signal data. The phase represents the relative offset, or time information, of the signal waveform, and is important in certain audio processing applications such as audio synthesis. Finally, the extracted phase spectrum data, amplitude spectrum data, power spectrum data and log-amplitude spectrum data are combined to obtain the spectrum data set corresponding to each frequency domain signal data; this set contains comprehensive information about the frequency domain characteristics of the signal. For example, assume the server has an audio signal representing a piece of music. Amplitude spectrum analysis on the frequency domain signal data of this audio signal extracts the amplitude at every frequency: for a certain frequency f1, the amplitude spectrum tells the server the magnitude of the signal's amplitude at f1. From the amplitude spectrum data, the power spectrum is calculated by squaring the amplitude, giving the power at each frequency: the power spectrum at a certain frequency f2 tells the server the signal's energy at f2. The log-amplitude spectrum, computed from the amplitude spectrum data, lets the server see which frequency components are more pronounced in the signal. The phase information extracted from the frequency domain signal data represents the relative shift of the signal waveform, which matters for the time-domain properties of the audio. Finally, the extracted phase spectrum, amplitude spectrum, power spectrum and log-amplitude spectrum data are merged into a spectrum data set containing comprehensive information about the audio signal at different frequencies, which can be used for various audio processing tasks.
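The four spectra merged in step S205 can all be derived from a single transform, as the sketch below shows (NumPy assumed; the dictionary keys and the dB convention for the log-amplitude spectrum are illustrative choices of this sketch):

    import numpy as np

    def spectrum_data_set(frame, eps=1e-12):
        # One frame of frequency domain signal data and its four
        # spectra, combined into a single set as in step S205.
        X = np.fft.rfft(np.asarray(frame, dtype=float))
        amplitude = np.abs(X)                             # S201
        power = amplitude ** 2                            # S202: squared amplitude
        log_amplitude = 20.0 * np.log10(amplitude + eps)  # S203: logarithmic form
        phase = np.angle(X)                               # S204
        return {"amplitude": amplitude, "power": power,
                "log_amplitude": log_amplitude, "phase": phase}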
In a specific embodiment, as shown in fig. 3, the process of extracting frequency features from the spectrum data set corresponding to each frequency domain signal data may specifically include the following steps:
S301, extracting main frequency components of amplitude spectrum data corresponding to each frequency domain signal data to obtain main frequency components of each frequency domain signal data;
S302, performing frequency bandwidth calculation on amplitude spectrum data corresponding to each frequency domain signal data to obtain frequency bandwidth data of each frequency domain signal data, and performing data combination on main frequency components of each frequency domain signal data and the frequency bandwidth data of each frequency domain signal data to obtain a first initial frequency characteristic set corresponding to each frequency domain signal data;
S303, performing total power calculation on the power spectrum data of each frequency domain signal data to obtain total power data of each frequency domain signal data;
S304, carrying out spectral peak power analysis on each frequency domain signal data to obtain spectral peak power data of each frequency domain signal data, and carrying out data combination on total power data of each frequency domain signal data and the spectral peak power data of each frequency domain signal data to obtain a second initial frequency characteristic set corresponding to each frequency domain signal data;
S305, calculating an amplitude average value of the amplitude spectrum data of each frequency domain signal data to obtain the amplitude average value of each frequency domain signal data;
S306, extracting the maximum amplitude value of each frequency domain signal data to obtain the maximum amplitude value of each frequency domain signal data, and carrying out data combination on the average amplitude value of each frequency domain signal data and the maximum amplitude value of each frequency domain signal data to obtain a third initial frequency characteristic set corresponding to each frequency domain signal data;
S307, calculating a phase average value of the phase spectrum data corresponding to each frequency domain signal data to obtain the phase average value of each frequency domain signal data;
S308, carrying out phase difference analysis on each frequency domain signal data to obtain phase difference data of each frequency domain signal data, and carrying out data combination on a phase average value of each frequency domain signal data and the phase difference data of each frequency domain signal data to obtain a fourth initial frequency characteristic set corresponding to each frequency domain signal data;
S309, respectively carrying out data combination on the first initial frequency characteristic set corresponding to each frequency domain signal data, the second initial frequency characteristic set corresponding to each frequency domain signal data, the third initial frequency characteristic set corresponding to each frequency domain signal data and the fourth initial frequency characteristic set corresponding to each frequency domain signal data to obtain the frequency characteristic set corresponding to the initial audio signal data of each audio device.
Starting from the amplitude spectrum data obtained above, the main frequency component of each frequency domain signal data is extracted. The main frequency component is typically the highest peak in the amplitude spectrum and represents the dominant frequency of the signal; these components describe the fundamental frequency characteristics of the signal. Based on the amplitude spectrum data, the frequency bandwidth of each frequency domain signal data is calculated; the bandwidth represents the width, or distribution range, of the signal in the frequency domain and tells the server how the signal's energy is spread over frequency. Also from the amplitude spectrum data, the power spectrum of each frequency domain signal data is calculated by squaring the amplitude; the power spectrum represents the energy distribution of the signal at different frequencies. The power spectrum is then analyzed to extract the spectral peak power data: the spectral peak power is the highest peak in the power spectrum and represents the strongest frequency component of the signal, describing its spectral peak characteristics. Averaging the amplitude spectrum of each frequency domain signal data yields the amplitude average, the mean amplitude of the signal over the whole frequency range, and the amplitude maximum extracted from the amplitude spectrum data represents the strongest amplitude in the signal. Likewise, averaging the phase spectrum yields the phase average, the mean phase over the whole frequency range, and phase difference analysis yields phase difference data describing how the signal's phase differs at different frequencies. Finally, the extracted main frequency components, frequency bandwidth data, total power data, spectral peak power data, amplitude average, amplitude maximum, phase average and phase difference data are combined to obtain the frequency feature set corresponding to each frequency domain signal data; this set contains comprehensive information about the frequency domain characteristics of the signal. For example, assume the server has an audio signal representing a song sung by a human voice. Amplitude spectrum analysis of its frequency domain signal data extracts the amplitude at every frequency: for a certain frequency f1, the amplitude spectrum tells the server the magnitude of the signal's amplitude at f1. The main frequency component, i.e. the highest peak frequency, extracted from the amplitude spectrum data can tell the server the singer's dominant pitch.
From the amplitude spectrum data, the frequency bandwidth is calculated, indicating the range of pitches spanned by the different notes in the music. The power spectrum, also derived from the amplitude spectrum data, reflects which frequencies in the music carry more volume and energy; analyzing it and extracting the highest power peak identifies the strongest frequency component, usually corresponding to the main melody. Averaging the amplitude spectrum gives the average amplitude of the audio signal, which helps the server gauge the overall volume of the music, while the maximum amplitude extracted from the amplitude spectrum data represents the strongest amplitude in the music. Averaging the phase spectrum gives the average phase of the audio signal, which can be used for audio synthesis and processing, and phase difference information is extracted across the different frequencies. Finally, the extracted feature data are combined into a frequency feature set comprising the main frequency components, frequency bandwidth, total power, spectral peak power, amplitude average, amplitude maximum, phase average and phase difference data. This set contains comprehensive information about the frequency domain characteristics of the music and can be used for various audio analysis and processing tasks.
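Continuing the sketch above, the eight features merged in step S309 might be computed from such a spectrum data set as follows (the -3 dB bandwidth criterion, the adjacent-bin reading of the phase difference, and all names are assumptions of this sketch):

    import numpy as np

    def frequency_feature_set(spec, freqs):
        # spec: a spectrum data set as returned by spectrum_data_set();
        # freqs: the center frequency of each bin.
        amp, power, phase = spec["amplitude"], spec["power"], spec["phase"]
        strong = freqs[amp >= amp.max() / np.sqrt(2)]  # bins within -3 dB of peak
        return {
            "main_frequency": freqs[amp.argmax()],          # S301
            "bandwidth": strong.max() - strong.min(),       # S302
            "total_power": power.sum(),                     # S303
            "peak_power": power.max(),                      # S304
            "amplitude_mean": amp.mean(),                   # S305
            "amplitude_max": amp.max(),                     # S306
            "phase_mean": phase.mean(),                     # S307
            "phase_diff": np.abs(np.diff(phase)).mean(),    # S308
        }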
In a specific embodiment, as shown in fig. 4, the process of performing step S103 may specifically include the following steps:
S401, inputting a frequency characteristic set corresponding to initial audio signal data of each audio device into a characteristic mapping layer of a broadcast scene analysis model to perform frequency characteristic mapping, and outputting a frequency characteristic vector corresponding to each audio device;
S402, inputting the frequency characteristic vector corresponding to each audio device into a vector coding layer of the broadcast scene analysis model to perform vector coding, and outputting the coding characteristic vector corresponding to each audio device;
S403, inputting the coding feature vector corresponding to each audio device into a convolution layer of the broadcast scene analysis model to extract scene features, so as to obtain a scene feature set;
S404, inputting the scene feature set into a full-connection layer of the broadcast scene analysis model to perform scene analysis, and obtaining a target broadcast scene.
Specifically, the frequency feature set corresponding to the initial audio signal data of each audio device is input into the feature mapping layer of the broadcast scene analysis model. This layer typically includes a convolution layer, a pooling layer or another feature extraction layer for deriving higher-level representations from the frequency features, and it generates a frequency feature vector for each audio device. The frequency feature vector corresponding to each audio device is then input into the vector coding layer of the broadcast scene analysis model. The vector coding layer may be a fully connected layer or another suitable layer that converts the frequency feature vectors into higher-level encoded feature vectors, which typically carry richer information about the audio features. The encoded feature vectors are input into a convolution layer, or another suitable layer, of the broadcast scene analysis model to extract scene features from the encoded features; this step generates a scene feature set describing the interaction between the audio devices and the scene information. Finally, the scene feature set is input into the fully connected layer, or other appropriate layers, of the broadcast scene analysis model for scene analysis to determine the target broadcast scene; depending on the server's specific needs, this step may involve classification, regression or other tasks. For example, assume the server is running a multi-scene directional broadcasting system covering both indoor and outdoor scenes, with one audio device indoors and one outdoors, and the server wants to switch automatically to the appropriate scene based on their audio input. For the indoor audio device, the frequency feature set corresponding to its initial audio signal data is input into the feature mapping layer, which can detect low- and high-frequency components in the audio such as ambient noise and speech signals. The mapped frequency feature vector is input into the vector coding layer, which converts it into a richer encoded feature vector containing more information about the audio. In an indoor scenario, the encoded feature vector carries information about the acoustic properties of the room, such as sound absorption and echoes, from which the convolution layer can extract the relevant scene features. The scene feature set is then input into the fully connected layer for scene analysis: based on the trained model, the server determines whether the current scene is indoor or outdoor and adjusts the audio processing parameters accordingly to provide the best sound quality and loudness.
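A minimal PyTorch sketch of such a four-layer model is shown below, assuming an eight-dimensional frequency feature set per device and an indoor/outdoor two-class output; the patent fixes none of these sizes, so every dimension here is an assumption:

    import torch
    import torch.nn as nn

    class BroadcastSceneModel(nn.Module):
        # Feature mapping -> vector coding -> convolution -> fully
        # connected scene analysis, mirroring steps S401-S404.
        def __init__(self, n_features=8, n_scenes=2):
            super().__init__()
            self.feature_map = nn.Linear(n_features, 32)            # S401
            self.encoder = nn.Linear(32, 64)                        # S402
            self.conv = nn.Conv1d(1, 4, kernel_size=3, padding=1)   # S403
            self.classifier = nn.Linear(4 * 64, n_scenes)           # S404

        def forward(self, x):                       # x: (batch, n_features)
            v = torch.relu(self.feature_map(x))     # frequency feature vector
            e = torch.relu(self.encoder(v))         # encoded feature vector
            s = torch.relu(self.conv(e.unsqueeze(1)))  # scene features
            return self.classifier(s.flatten(1))    # scene logits

    logits = BroadcastSceneModel()(torch.randn(2, 8))
    print(logits.shape)                             # torch.Size([2, 2])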
In a specific embodiment, the process of executing step S104 may specifically include the following steps:
(1) Based on a target broadcasting scene, performing cut-off frequency calculation of a high-frequency component on a frequency characteristic set corresponding to initial audio signal data of each audio device to obtain a target cut-off frequency;
(2) Performing high-frequency compression ratio calculation on a frequency characteristic set corresponding to the initial audio signal data of each audio device to obtain a target compression ratio;
(3) And constructing a high-frequency compression strategy based on the target cut-off frequency and the target compression ratio to obtain the target high-frequency compression strategy.
Specifically, the target cut-off frequency is determined according to the characteristics of the broadcast scene. It may be a fixed value or a value adjusted dynamically for different scenes, and it typically lies in the high-frequency part of the audio spectrum, where it controls how much of the high-frequency content is attenuated or retained. By analyzing the broadcast scenario and the user requirements, an appropriate target cut-off frequency can be determined: for a noisy outdoor environment, a higher cut-off frequency is needed to preserve speech intelligibility, while in a quiet indoor environment a lower cut-off frequency may be used to reduce noise. Once the target cut-off frequency is determined, the target compression ratio is calculated. This ratio adjusts the amplitude of the high-frequency components to achieve high-frequency compression and may be chosen according to the desired degree of high-frequency attenuation: since the strategy below multiplies high-frequency amplitudes by the ratio, a smaller ratio yields stronger attenuation of the high-frequency components, while a ratio closer to one retains more high-frequency information. A high-frequency compression strategy is then constructed from the target cut-off frequency and the target compression ratio. The strategy may take the form of a digital filter that attenuates the high-frequency components in the frequency domain. One common realization is to design a low-pass filter whose cut-off frequency equals the target cut-off frequency and to apply the compression ratio above that cut-off, i.e. to multiply the amplitude of the high-frequency portion by the compression ratio. For example, assume the server's multi-scene directional broadcasting system must automatically adapt to both indoor and outdoor environments. For the indoor environment, the server adopts a high-frequency compression strategy with a cut-off frequency of 8 kHz and a compression ratio of 0.7 to reduce the effects of noise. For the outdoor environment, higher speech intelligibility is required, so the server adopts a higher cut-off frequency of 12 kHz, preserving more of the high-frequency band, with a compression ratio of 0.5 applied above it. Target cut-off frequency: 8 kHz indoors, 12 kHz outdoors; target compression ratio: 0.7 indoors, 0.5 outdoors. For the indoor environment, a low-pass characteristic with an 8 kHz cut-off is constructed and the amplitude of frequencies above 8 kHz is multiplied by 0.7; for the outdoor environment, the cut-off is 12 kHz and the amplitude of frequencies above 12 kHz is multiplied by 0.5.
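The indoor strategy of this example (8 kHz cut-off, ratio 0.7) could be applied in the frequency domain roughly as below; single-frame FFT scaling is one simple realization, and a designed FIR/IIR shelving filter with block-wise overlap-add processing would be the more usual choice in a streaming system:

    import numpy as np

    def high_frequency_compress(x, fs, cutoff_hz, ratio):
        # Scale every spectral component above cutoff_hz by `ratio`,
        # leaving the band below the cut-off untouched.
        X = np.fft.rfft(np.asarray(x, dtype=float))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
        X[freqs > cutoff_hz] *= ratio
        return np.fft.irfft(X, n=len(x))

    fs = 48000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 10000 * t)
    y = high_frequency_compress(x, fs, cutoff_hz=8000, ratio=0.7)  # indoor case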
In a specific embodiment, the process of executing step S106 may specifically include the following steps:
(1) Detecting the maximum amplitude of the candidate audio signal data to obtain a target maximum amplitude;
(2) Detecting the minimum amplitude of the candidate audio signal data to obtain a target minimum amplitude;
(3) And calculating the dynamic control range of the candidate audio signal data through the target maximum amplitude and the target minimum amplitude to obtain the dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating the volume control range corresponding to the candidate audio signal data.
Specifically, amplitude detection is performed on the candidate audio signal data to find the maximum amplitude of the signal. This may be done by analyzing the sample values of the audio signal, typically using the root mean square (RMS) amplitude or the peak amplitude. Likewise, amplitude detection is performed to find the minimum amplitude of the signal, which is typically related to the background noise level of the audio signal. The dynamic control range is then calculated from the maximum and minimum amplitudes and indicates the volume control range of the candidate audio signal data. In general, the calculation may use either the difference between the two amplitudes, dynamic control range = maximum amplitude - minimum amplitude, or their ratio, dynamic control range = maximum amplitude / minimum amplitude; the method can be chosen according to the specific application and requirements. Using this control range, the server adjusts the volume range of the audio signal to suit different listeners and environments. For example, assume the server's broadcast system plays audio in different scenes, indoors and outdoors. The indoor environment has a lower background noise level, while outdoors the background noise is higher, and the server wants to control the volume range of the audio dynamically to accommodate both cases. Maximum amplitude detection: for the candidate audio signal of the indoor scene, a maximum amplitude of 1.0 is detected; for the outdoor scene, 2.0. Minimum amplitude detection: for the indoor scene, a minimum amplitude of 0.2 is detected; for the outdoor scene, 0.5. Using the difference method, the dynamic control range is 1.0 - 0.2 = 0.8 for the indoor scene and 2.0 - 0.5 = 1.5 for the outdoor scene. When playing audio, the server dynamically adjusts the volume range according to the detected scene: outdoors it widens the volume range to keep the audio clearly audible, while indoors it narrows the range to match the lower background noise level. In this way, the server provides a better listening experience and makes the broadcast more adaptive.
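Both candidate formulas reduce to a couple of lines; the sketch below reproduces the example's numbers (plain Python, hypothetical function name):

    def control_range(max_amp, min_amp, method="difference"):
        # The two readings of the dynamic control range named above.
        if method == "difference":
            return max_amp - min_amp
        return max_amp / min_amp    # "ratio" method

    print(control_range(1.0, 0.2))            # indoor scene  -> 0.8
    print(control_range(2.0, 0.5))            # outdoor scene -> 1.5
    print(control_range(1.0, 0.2, "ratio"))   # ratio form    -> 5.0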
The foregoing describes a method for dynamic control of directional broadcasting in an embodiment of the present invention, and the following describes a device for dynamic control of directional broadcasting in an embodiment of the present invention, referring to fig. 5, and one embodiment of the device for dynamic control of directional broadcasting in an embodiment of the present invention includes:
the calibration module 501 is configured to perform position calibration on a plurality of preset audio devices to obtain a position information set, and collect audio signals sent by the plurality of audio devices through the position information set to obtain initial audio signal data of each audio device;
the analysis module 502 is configured to perform audio spectrum analysis on the initial audio signal data of each audio device, so as to obtain a frequency feature set corresponding to the initial audio signal data of each audio device;
an input module 503, configured to input a frequency feature set corresponding to initial audio signal data of each audio device into a preset broadcast scene analysis model to perform scene analysis, so as to obtain a target broadcast scene;
the construction module 504 is configured to construct a high-frequency compression strategy for the frequency feature set corresponding to the initial audio signal data of each audio device based on the target broadcast scene, so as to generate a target high-frequency compression strategy;
the processing module 505 is configured to perform high-frequency compression processing on the frequency feature set corresponding to the initial audio signal data of each audio device through the target high-frequency compression strategy, so as to obtain processed candidate audio signal data;
a calculating module 506, configured to perform dynamic control range calculation on the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, where the dynamic control range is used to indicate a volume control range corresponding to the candidate audio signal data;
and the control module 507 is configured to dynamically control the candidate audio signal data through the dynamic control range, generate corresponding target audio signal data, and control a plurality of audio devices to perform directional broadcasting based on the target audio signal data.
Through the cooperation of the above modules, the scheme automatically adapts to different broadcasting scenes through position calibration, scene analysis and high-frequency compression strategy construction. Whether indoors, outdoors, in noisy environments or in quiet spaces, the system can be optimized according to scene requirements to provide a better audio experience. The construction and application of high-frequency compression strategies help reduce noise and audio distortion and ensure that the audio remains high quality in a variety of scenarios. The whole process is automatic: the system makes intelligent decisions based on the audio data and scene information acquired in real time, which reduces the burden on the broadcast operator and improves the efficiency and consistency of broadcasting. Through dynamic control range calculation and application, the system intelligently manages the loudness range of the audio, preventing it from becoming too loud or too quiet and helping to provide a more balanced volume level. Because the system adapts to scene and audio characteristics, personalized broadcast experiences can be provided for different listeners to meet their needs and preferences. By jointly considering position information, scene analysis, the high-frequency compression strategy and the dynamic control range calculation, the scheme can provide broadcast services of higher quality and consistency, improving user satisfaction and experience. Automated scene analysis and audio processing reduce the need for manual intervention, lowering operating costs and maintenance effort.
The above fig. 5 describes the dynamic control device for directional broadcasting in the embodiment of the present invention in detail from the point of view of the modularized functional entity, and the following describes the dynamic control device for directional broadcasting in the embodiment of the present invention in detail from the point of view of hardware processing.
Fig. 6 is a schematic structural diagram of a directional broadcast dynamic control device according to an embodiment of the present invention. The directional broadcast dynamic control device 600 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 610, a memory 620, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 633 or data 632. The memory 620 and the storage medium 630 may be transitory or persistent storage. The program stored in the storage medium 630 may include one or more modules (not shown), each of which may comprise a series of instruction operations on the dynamic control device 600. Further, the processor 610 may be configured to communicate with the storage medium 630 to execute the series of instruction operations in the storage medium 630 on the directional broadcast dynamic control device 600.
The directional broadcast dynamic control device 600 may also include one or more power supplies 640, one or more wired or wireless network interfaces 650, one or more input/output interfaces 660, and/or one or more operating systems 631, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the configuration shown in fig. 6 does not constitute a limitation of the directional broadcast dynamic control device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
The invention also provides a dynamic control device of directional broadcasting, which comprises a memory and a processor, wherein the memory stores computer readable instructions, and the computer readable instructions, when executed by the processor, cause the processor to execute the steps of the dynamic control method of directional broadcasting in the above embodiments.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, and may also be a volatile computer readable storage medium, in which instructions are stored which, when executed on a computer, cause the computer to perform the steps of the dynamic control method of directional broadcasting.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer readable storage medium. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A dynamic control method of directional broadcasting, characterized in that the dynamic control method of directional broadcasting comprises:
performing position calibration on a plurality of preset audio devices to obtain a position information set, and acquiring audio signals sent by the audio devices through the position information set to obtain initial audio signal data of each audio device;
respectively carrying out audio frequency spectrum analysis on the initial audio signal data of each audio device to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device;
inputting a frequency characteristic set corresponding to the initial audio signal data of each audio device into a preset broadcast scene analysis model to perform scene analysis to obtain a target broadcast scene;
Constructing a high-frequency compression strategy for a frequency feature set corresponding to the initial audio signal data of each audio device based on the target broadcasting scene, and generating a target high-frequency compression strategy;
respectively carrying out high-frequency compression processing on a frequency characteristic set corresponding to the initial audio signal data of each audio device through the target high-frequency compression strategy to obtain processed candidate audio signal data;
calculating a dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating a volume control range corresponding to the candidate audio signal data;
and dynamically controlling the candidate audio signal data through the dynamic control range, generating corresponding target audio signal data, and controlling a plurality of audio devices to carry out directional broadcasting based on the target audio signal data.
2. The method for dynamic control of directional broadcasting according to claim 1, wherein the performing audio spectrum analysis on the initial audio signal data of each audio device to obtain a frequency feature set corresponding to the initial audio signal data of each audio device includes:
Respectively carrying out data conversion on the initial audio signal data of each audio device through a preset analog-to-digital converter to obtain a plurality of digital audio signals;
performing frequency domain conversion on a plurality of digital audio signals through a preset Fourier transform algorithm to obtain a plurality of frequency domain signal data;
respectively carrying out spectrum analysis on each frequency domain signal data to obtain a spectrum data set corresponding to each frequency domain signal data;
and respectively extracting frequency characteristics of the frequency spectrum data set corresponding to each frequency domain signal data to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device.
3. The method for dynamic control of directional broadcasting according to claim 2, wherein the performing spectral analysis on each of the frequency domain signal data to obtain a set of spectral data corresponding to each of the frequency domain signal data includes:
carrying out amplitude spectrum analysis on each frequency domain signal data to obtain amplitude spectrum data corresponding to each frequency domain signal data;
carrying out power spectrum calculation on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data to obtain power spectrum data of each frequency domain signal data;
Carrying out logarithmic amplitude spectrum calculation on each frequency domain signal data through the amplitude spectrum data corresponding to each frequency domain signal data to obtain log-amplitude spectrum data of each frequency domain signal data;
extracting phase data from each frequency domain signal data to obtain phase spectrum data corresponding to each frequency domain signal data;
and respectively carrying out data combination on the phase spectrum data corresponding to each frequency domain signal data, the log-amplitude spectrum data of each frequency domain signal data, the power spectrum data of each frequency domain signal data and the amplitude spectrum data corresponding to each frequency domain signal data to obtain a frequency spectrum data set corresponding to each frequency domain signal data.
4. The method for dynamic control of directional broadcasting according to claim 3, wherein the step of extracting frequency characteristics from the set of spectral data corresponding to each of the frequency domain signal data to obtain the set of frequency characteristics corresponding to the initial audio signal data of each of the audio devices comprises:
extracting main frequency components from amplitude spectrum data corresponding to each frequency domain signal data to obtain main frequency components of each frequency domain signal data;
Performing frequency bandwidth calculation on the amplitude spectrum data corresponding to each frequency domain signal data to obtain frequency bandwidth data of each frequency domain signal data, and performing data combination on main frequency components of each frequency domain signal data and the frequency bandwidth data of each frequency domain signal data to obtain a first initial frequency characteristic set corresponding to each frequency domain signal data;
carrying out total power calculation on the power spectrum data of each frequency domain signal data to obtain total power data of each frequency domain signal data;
carrying out spectral peak power analysis on each frequency domain signal data to obtain spectral peak power data of each frequency domain signal data, and carrying out data combination on total power data of each frequency domain signal data and spectral peak power data of each frequency domain signal data to obtain a second initial frequency characteristic set corresponding to each frequency domain signal data;
calculating an amplitude average value of the amplitude spectrum data of each frequency domain signal data to obtain the amplitude average value of each frequency domain signal data;
extracting the maximum amplitude value of each frequency domain signal data to obtain the maximum amplitude value of each frequency domain signal data, and carrying out data combination on the average amplitude value of each frequency domain signal data and the maximum amplitude value of each frequency domain signal data to obtain a third initial frequency characteristic set corresponding to each frequency domain signal data;
Calculating a phase average value of phase spectrum data corresponding to each frequency domain signal data to obtain the phase average value of each frequency domain signal data;
performing phase difference analysis on each frequency domain signal data to obtain phase difference data of each frequency domain signal data, and performing data combination on a phase average value of each frequency domain signal data and the phase difference data of each frequency domain signal data to obtain a fourth initial frequency characteristic set corresponding to each frequency domain signal data;
and respectively carrying out data combination on the first initial frequency characteristic set corresponding to each frequency domain signal data, the second initial frequency characteristic set corresponding to each frequency domain signal data, the third initial frequency characteristic set corresponding to each frequency domain signal data and the fourth initial frequency characteristic set corresponding to each frequency domain signal data to obtain the frequency characteristic set corresponding to the initial audio signal data of each audio device.
5. The method for dynamic control of directional broadcasting according to claim 1, wherein inputting the frequency feature set corresponding to the initial audio signal data of each audio device into a preset broadcasting scene analysis model for scene analysis to obtain a target broadcasting scene comprises:
Inputting a frequency characteristic set corresponding to the initial audio signal data of each audio device into a characteristic mapping layer of the broadcasting scene analysis model to perform frequency characteristic mapping, and outputting a frequency characteristic vector corresponding to each audio device;
inputting the frequency characteristic vector corresponding to each audio device into a vector coding layer of the broadcasting scene analysis model to perform vector coding, and outputting the coding characteristic vector corresponding to each audio device;
inputting the coding feature vector corresponding to each audio device into a convolution layer of the broadcasting scene analysis model to extract scene features to obtain a scene feature set;
and inputting the scene feature set into a full-connection layer of the broadcasting scene analysis model to perform scene analysis to obtain a target broadcasting scene.
6. The method for dynamic control of directional broadcasting according to claim 1, wherein the constructing a high-frequency compression strategy for a frequency feature set corresponding to initial audio signal data of each audio device based on the target broadcasting scene, and generating a target high-frequency compression strategy, includes:
based on the target broadcasting scene, performing cut-off frequency calculation of a high-frequency component on a frequency characteristic set corresponding to the initial audio signal data of each audio device to obtain a target cut-off frequency;
Performing high-frequency compression ratio calculation on a frequency characteristic set corresponding to the initial audio signal data of each audio device to obtain a target compression ratio;
and constructing a high-frequency compression strategy based on the target cut-off frequency and the target compression ratio to obtain the target high-frequency compression strategy.
7. The method for dynamic control of directional broadcasting according to claim 1, wherein the calculating the dynamic control range of the candidate audio signal data to obtain the dynamic control range corresponding to the candidate audio signal data, the dynamic control range being used for indicating the volume control range corresponding to the candidate audio signal data, includes:
detecting the maximum amplitude of the candidate audio signal data to obtain a target maximum amplitude;
detecting the minimum amplitude of the candidate audio signal data to obtain a target minimum amplitude;
and calculating the dynamic control range of the candidate audio signal data through the target maximum amplitude and the target minimum amplitude to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating a volume control range corresponding to the candidate audio signal data.
8. A dynamic control apparatus for directional broadcasting, characterized in that the dynamic control apparatus for directional broadcasting comprises:
the calibration module is used for carrying out position calibration on a plurality of preset audio devices to obtain a position information set, and collecting audio signals sent by the audio devices through the position information set to obtain initial audio signal data of each audio device;
the analysis module is used for respectively carrying out audio frequency spectrum analysis on the initial audio signal data of each audio device to obtain a frequency characteristic set corresponding to the initial audio signal data of each audio device;
the input module is used for inputting the frequency characteristic set corresponding to the initial audio signal data of each audio device into a preset broadcast scene analysis model to perform scene analysis to obtain a target broadcast scene;
the construction module is used for constructing a high-frequency compression strategy for the frequency characteristic set corresponding to the initial audio signal data of each audio device based on the target broadcasting scene, and generating a target high-frequency compression strategy;
the processing module is used for respectively carrying out high-frequency compression processing on the frequency characteristic set corresponding to the initial audio signal data of each audio device through the target high-frequency compression strategy to obtain processed candidate audio signal data;
The computing module is used for computing the dynamic control range of the candidate audio signal data to obtain a dynamic control range corresponding to the candidate audio signal data, wherein the dynamic control range is used for indicating the volume control range corresponding to the candidate audio signal data;
and the control module is used for dynamically controlling the candidate audio signal data through the dynamic control range, generating corresponding target audio signal data and controlling a plurality of audio devices to carry out directional broadcasting based on the target audio signal data.
9. A directional broadcasting dynamic control apparatus, characterized in that the directional broadcasting dynamic control apparatus comprises: a memory and at least one processor, the memory having instructions stored therein;
the at least one processor invoking the instructions in the memory to cause the dynamic control device of the directed broadcast to perform the dynamic control method of the directed broadcast of any of claims 1-7.
10. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the dynamic control method of directional broadcasting according to any one of claims 1-7.