CN117079660B

CN117079660B - Panoramic sound real-time data noise reduction method for rebroadcasting vehicle

Info

Publication number: CN117079660B
Application number: CN202311349792.9A
Authority: CN
Inventors: 林广远; 于路; 李维; 彭炯; 梁超翔; 魏志元; 伍家杰
Original assignee: Guangdong Tusheng Ultra High Definition Innovation Center Co ltd
Current assignee: Guangdong Tusheng Ultra High Definition Innovation Center Co ltd
Priority date: 2023-10-18
Filing date: 2023-10-18
Publication date: 2023-12-19
Anticipated expiration: 2043-10-18
Also published as: CN117079660A

Abstract

The invention discloses a panoramic sound real-time data noise reduction method for a rebroadcasting vehicle, belonging to the field of electric digital data processing. The method comprises the following steps: acquiring speech spectrum space data of panoramic sound audio data to be denoised, acquired in real time; dividing the speech spectrum space data to obtain spectrum data sequences, and dividing each spectrum data sequence to obtain characteristic data strips; combining the characteristic data strips into characteristic data blocks; comparing the characteristic data blocks based on their data block feature vectors, grouping them to obtain target data sets, and obtaining characteristic values of the target data sets; and inputting the characteristic values into a pre-constructed neural network and deleting the speech spectrum space data corresponding to the noise data blocks identified by the network, thereby obtaining noise-reduced speech spectrum space data. By segmenting the speech spectrum space data of the panoramic sound audio into characteristic data blocks, comparing the blocks by their feature vectors to obtain characteristic values, and identifying noise from those values, the method achieves noise reduction and improves the noise reduction of the rebroadcasting vehicle's panoramic sound real-time data.

Description

Panoramic sound real-time data noise reduction method for rebroadcasting vehicle

Technical Field

The invention relates to the technical field of electric digital data processing, in particular to a panoramic sound real-time data noise reduction method for a rebroadcasting vehicle.

Background

Demand for audio quality keeps increasing, yet various kinds of noise are inevitably captured during audio collection. Traditional denoising methods include linear filtering, Wiener filtering, Kalman filtering and the like; with the development of artificial intelligence, deep learning has also been combined with audio noise reduction in an attempt to obtain better results.

Noise reduction for audio recorded in a complex environment generally requires noise samples to be acquired in advance, after which the current scene is denoised according to those samples. In the real-time rebroadcasting scenario of a news rebroadcasting vehicle, however, staff frequently move between different complex scenes and cannot acquire noise samples in advance. Existing methods that rely on recorded noise samples therefore cannot deliver high-quality noise reduction, and only more generally applicable algorithms can be used to improve audio quality, with unsatisfactory noise reduction.

Disclosure of Invention

The invention provides a panoramic sound real-time data noise reduction method for a rebroadcasting vehicle, aiming at improving the noise reduction effect of panoramic sound real-time data of the rebroadcasting vehicle.

In order to achieve the above object, the present invention provides a method for reducing noise of panoramic sound real-time data for a rebroadcasting vehicle, the method comprising:

performing a fast Fourier transform on the panoramic sound audio data to be denoised, acquired in real time, to obtain speech spectrum space data;

dividing the speech spectrum space data along the x-axis to obtain spectrum data sequences, and dividing each spectrum data sequence based on its maxima and minima to obtain a plurality of characteristic data strips;

combining the characteristic data strips into characteristic data blocks based on the mean differences between adjacent characteristic data strips;

comparing the characteristic data blocks based on their data block feature vectors, grouping them based on the comparison result to obtain target data sets, and obtaining characteristic values of the target data sets;

inputting the characteristic values into a pre-constructed neural network, and deleting the speech spectrum space data corresponding to the noise data blocks identified by the neural network to obtain noise-reduced speech spectrum space data.

Optionally, dividing the speech spectrum space data along the x-axis to obtain spectrum data sequences, and dividing each spectrum data sequence based on maxima and minima to obtain a plurality of characteristic data strips, includes:

dividing the audio time on the x-axis of the speech spectrum space data according to a preset duration to obtain spectrum data sequences;

differentiating the mean-filtered spectrum data sequence to obtain a derivative sequence, and marking the convex points and concave points in the derivative sequence;

marking, in the spectrum data sequence, the maxima corresponding to the convex points and the minima corresponding to the concave points;

extending data forward and backward along the spectrum data sequence from each maximum to obtain a data group;

and storing the data group of each maximum in the spectrum data sequence as a characteristic data strip of that spectrum data sequence.

Optionally, extending data forward and backward along the spectrum data sequence from the maximum to obtain a data group includes:

taking the maximum as a base point, extending forward along the spectrum data sequence to obtain first data, the first data and the maximum forming a first data group;

then extending backward along the spectrum data sequence to obtain second data, and calculating the average absolute error of the second data together with the spectrum data contained in the first data group;

if the average absolute error is less than or equal to a threshold, merging the second data into the first data group to obtain a second data group; otherwise, not merging the second data;

and continuing the extension and merging operations, stopping extension in a direction when it reaches a minimum, or stopping altogether when the spectrum data obtained by both forward and backward extension fail the merge condition, and storing all merged spectrum data as the data group corresponding to the maximum.

Optionally, combining the characteristic data strips into characteristic data blocks based on the mean differences of adjacent characteristic data strips includes:

if corresponding spectrum data in two adjacent spectrum data sequences both belong to characteristic data strips, calculating the mean difference between the two corresponding adjacent characteristic data strips, and classifying two strips whose mean difference is less than or equal to a preset mean difference as one class of data;

and obtaining the time length of each class of data, determining a class whose time length is greater than or equal to a preset duration as a noise data block and a class whose time length is less than the preset duration as a characteristic data block, the characteristic data block appearing visually as a curved surface in the speech spectrum space.

Optionally, comparing the characteristic data blocks based on their data block feature vectors, grouping them based on the comparison result to obtain target data sets, and obtaining the characteristic values of the target data sets includes:

determining a data block shape feature vector of each characteristic data block using sampling vectors based on the centroid projection of the characteristic data block;

calculating expansion coefficients and shape similarity coefficients between the characteristic data blocks based on the data block shape feature vectors;

grouping the characteristic data blocks based on the expansion coefficients and the shape similarity coefficients to obtain a plurality of target data sets;

and placing each target data set into a sampling space, and extracting the characteristic values of the target data set through a convolution kernel in the sampling space.

Optionally, determining the data block shape feature vector of the characteristic data block using sampling vectors based on its centroid projection includes:

when the characteristic data block is translated vertically along the y-axis, the curved surface of the characteristic data block in the speech spectrum space and the z=0 plane enclose a three-dimensional solid;

averaging the x-coordinates and the y-coordinates of all points in the three-dimensional solid to obtain an average x-coordinate value and an average y-coordinate value, and determining from these two averages the projection point of the characteristic data block's centroid on the z=0 plane;

determining a hemispherical surface and a sphere based on the coordinates of the centroid projection point and preset radii, determining a vector endpoint based on the centroid projection point, the hemispherical surface and the sphere, and then repeatedly determining new vector endpoints;

taking the centroid projection point of the characteristic data block as the starting point, and the vector endpoint and each new vector endpoint as end points, to obtain a plurality of sampling vectors;

and determining the data block shape feature vector OA based on the distances from the starting point to the intersection points of the sampling vectors with the three-dimensional solid of the characteristic data block.

Optionally, determining a hemispherical surface and a sphere based on the coordinates of the centroid projection point and the preset radii, determining a vector endpoint based on the centroid projection point, the hemispherical surface and the sphere, and repeatedly determining new vector endpoints includes:

determining a hemispherical surface based on the average x-coordinate value, the average y-coordinate value and a preset large sphere radius;

determining an initial vector endpoint based on the average x-coordinate value, the average y-coordinate value and the preset large sphere radius, and determining a sphere based on the initial vector endpoint and a preset small sphere radius;

determining a point on the sphere that meets a preset condition as a vector endpoint;

determining a new sphere centered at the vector endpoint, the intersection of the new sphere with the hemispherical surface forming a new circle;

selecting, as a new vector endpoint, the intersection point lying on the largest number of circles among the points where the new sphere intersects the hemispherical surface and that are contained in the new sphere;

and continuing to select new vector endpoints until no new vector endpoint can be generated on the hemispherical surface.

Optionally, grouping the characteristic data blocks based on the expansion coefficients and the shape similarity coefficients to obtain a plurality of target data sets includes:

if the expansion coefficient and the shape similarity coefficient of two characteristic data blocks meet preset requirements, classifying the two characteristic data blocks into one class;

and calculating the time span of each class of characteristic data blocks, and determining each class whose time span falls within a preset time range as a target data set, thereby obtaining a plurality of target data sets.

Optionally, placing the target data set into a sampling space and extracting the characteristic values of the target data set through a convolution kernel in the sampling space includes:

placing all characteristic data blocks of the target data set into the sampling space one after another along the x-axis according to their sampling time, where the x-axis of the sampling space is time, the y-axis is frequency and the z-axis is amplitude;

taking as a characteristic value the mean amplitude of the spectrum data contained in all characteristic data blocks within the region covered by the convolution kernel, the region convolution kernel sampling the space along an S-shaped (serpentine) path from the upper left to the lower left: it moves horizontally first and then vertically, U/2 units per horizontal move and 100 units per vertical move, where U is an empirical time span;

and obtaining 120 characteristic values once the convolution is complete, all characteristic values of each target data set being recorded in the order obtained.
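The serpentine sampling above can be sketched as follows, assuming the sampling space is stored as a 2-D amplitude grid with NaN wherever no characteristic data block is present. The function name `serpentine_means`, the window size, the step sizes and the choice of 0.0 for an empty window are all hypothetical; the text only fixes the S-shaped path and the U/2 and 100-unit steps.

```python
import numpy as np

def serpentine_means(grid, win_h, win_w, step_y, step_x):
    """Mean amplitude under a sliding window moved in an S-shaped path.

    grid: 2-D array, rows = frequency, columns = time; NaN marks cells
    that belong to no characteristic data block (an assumption).
    Row 0 is scanned left to right, the next row band right to left, etc.
    """
    grid = np.asarray(grid, dtype=float)
    H, W = grid.shape
    rows = range(0, H - win_h + 1, step_y)
    cols = list(range(0, W - win_w + 1, step_x))
    values = []
    for k, r in enumerate(rows):
        order = cols if k % 2 == 0 else cols[::-1]  # reverse every other band
        for c in order:
            patch = grid[r:r + win_h, c:c + win_w]
            finite = patch[np.isfinite(patch)]
            # Empty window -> 0.0 is a hypothetical choice, not from the source.
            values.append(float(finite.mean()) if finite.size else 0.0)
    return values
```

With the text's parameters one would pass `step_x = U / 2` (in time units) and `step_y = 100` (in frequency units); the number of values produced then depends on the grid size.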

Optionally, inputting the characteristic values into a pre-constructed neural network and deleting the speech spectrum space data corresponding to the noise data blocks identified by the neural network to obtain noise-reduced speech spectrum space data includes:

inputting the characteristic values into the pre-constructed neural network and obtaining an identification tag for the characteristic values through the neural network, the identification tags comprising broadcast voice and other noise;

deleting the speech spectrum space data corresponding to the characteristic values tagged as other noise to obtain the noise-reduced speech spectrum.
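A minimal sketch of the deletion step, assuming each characteristic data block is represented by a boolean mask over the spectrogram and that "deleting" means setting those cells to a low floor level. The mask representation, the `floor_db` value and the tag strings are hypothetical; the neural network itself is not modelled here.

```python
import numpy as np

def remove_noise_blocks(spec_db, block_masks, labels, floor_db=-120.0):
    """Return a copy of spec_db with cells of blocks tagged 'other noise' floored.

    spec_db: 2-D spectrogram in dB; block_masks: boolean arrays of the same
    shape, one per characteristic data block; labels: one tag per block.
    """
    cleaned = np.array(spec_db, dtype=float, copy=True)
    for mask, label in zip(block_masks, labels):
        if label == "other noise":  # blocks tagged "broadcast voice" are kept
            cleaned[np.asarray(mask, dtype=bool)] = floor_db
    return cleaned
```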

Compared with the prior art, the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle provided by the invention performs a fast Fourier transform on the panoramic sound audio data to be denoised, acquired in real time, to obtain speech spectrum space data; divides the speech spectrum space data along the x-axis into spectrum data sequences and divides each sequence based on its maxima and minima into a plurality of characteristic data strips; combines the characteristic data strips into characteristic data blocks based on the mean differences of adjacent strips; compares the characteristic data blocks based on their data block feature vectors, groups them according to the comparison result into target data sets, and obtains the characteristic values of the target data sets; and inputs the characteristic values into a pre-constructed neural network, deleting the speech spectrum space data corresponding to the noise data blocks identified by the network to obtain noise-reduced speech spectrum space data. Because the speech spectrum space data is segmented into characteristic data blocks, the blocks are compared by their feature vectors to obtain characteristic values, and noise is identified from those values, noise reduction is achieved and the noise reduction of the rebroadcasting vehicle's panoramic sound real-time data is improved.

Drawings

FIG. 1 is a flowchart of an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle of the present invention;

FIG. 2 is a schematic diagram of a first refinement flow of an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle of the present invention;

FIG. 3 is a schematic diagram of a second refinement flow of an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle of the present invention;

FIG. 4 is a schematic diagram of a neural network involved in an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle of the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to FIG. 1, FIG. 1 is a flowchart of an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle of the present invention.

As shown in FIG. 1, an embodiment of the present invention proposes a panoramic sound real-time data noise reduction method for a rebroadcasting vehicle, the method comprising:

Step S101, performing a fast Fourier transform on the panoramic sound audio data to be denoised, acquired in real time, to obtain speech spectrum space data;

This embodiment concerns enhancing the broadcaster's voice during panoramic sound noise reduction while a news rebroadcasting vehicle operates in a complex environment. Broadcast audio data acquired in real time through the rebroadcasting vehicle's sound pickup equipment is denoted as the panoramic sound audio data to be denoised.

The panoramic sound audio data to be denoised is taken as input and a fast Fourier transform is performed with a window size whose empirical value is 1024 sampling points, all other parameters being left at their defaults; this converts the audio into speech spectrum space data. The fast Fourier transform is well known in the art and is not described further. The speech spectrum space data is three-dimensional data visualized in the speech spectrum space, where the x-axis represents time, the y-axis frequency and the z-axis amplitude. In the visualization, the x-axis and y-axis form a Cartesian plane, and the value at each coordinate point gives the amplitude of the audio at that time and frequency. The speech spectrum space is the three-dimensional Cartesian coordinate system formed by these three axes.

The sampling window of the fast Fourier transform is 1024 points, and the common sampling frequency V of the rebroadcasting vehicle has an empirical value of 48 kHz, so each sampling window lasts about 1024/48000 ≈ 0.021 seconds. In this embodiment, characteristic data blocks are first marked in the speech spectrum space, characteristic values are then calculated from them, and finally each characteristic data block is judged as to whether it is a noise data block.
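A minimal sketch of how the speech spectrum space data could be produced with NumPy, using the 1024-point window and V = 48 kHz given in the text. The Hann window, non-overlapping frames (hop = 1024) and the dB scale are assumptions standing in for the unspecified "default parameters"; `spectrogram_db` is a hypothetical name.

```python
import numpy as np

def spectrogram_db(signal, n_fft=1024, hop=None):
    """Magnitude spectrogram in dB, shape (n_frames, n_fft // 2 + 1).

    Rows are time (x-axis), columns are frequency bins (y-axis), values are
    amplitudes (z-axis). Hann window and hop = n_fft are assumptions.
    """
    x = np.asarray(signal, dtype=float)
    hop = n_fft if hop is None else hop
    if len(x) < n_fft:
        raise ValueError("signal is shorter than one FFT window")
    n_frames = 1 + (len(x) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-12)  # small offset avoids log(0)

V = 48_000                  # sampling frequency from the text (48 kHz)
window_seconds = 1024 / V   # duration of one FFT window, about 0.0213 s
```

Each row of the result is one spectrum data sequence in the sense used below: one element per frequency bin, whose value is the amplitude at that frequency.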

Step S102, dividing the speech spectrum space data along the x-axis to obtain spectrum data sequences, and dividing each spectrum data sequence based on maxima and minima to obtain a plurality of characteristic data strips;

In manual audio denoising, a technician uses experience to distinguish complex environmental noise from human voice in a spectrogram and deletes the noise by hand. This embodiment simulates that manual process through the following operations.

Referring to FIG. 2, FIG. 2 is a schematic diagram of a first refinement flow of an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to the present invention. As shown in FIG. 2, step S102 includes:

step S1021: dividing the audio time of the x-axis in the speech spectrum space data according to a preset duration to obtain a spectrum data sequence;

Along the x-axis, the speech spectrum space data within the preset duration is segmented in units of t to obtain spectrum data sequences. Each element of a spectrum data sequence corresponds to a different frequency, and the value of the element is the amplitude at that frequency. The spectrum data sequence is then smoothed by mean filtering with a window length of 5, which makes each element of the sequence smoother and prevents too many extreme points from appearing in subsequent calculations.
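The mean filtering with a window length of 5 can be sketched as a moving average. Padding the sequence by repeating its edge values, so that the output keeps the input length, is an assumption; the text does not say how the ends are handled. `mean_filter` is a hypothetical name.

```python
import numpy as np

def mean_filter(seq, width=5):
    """Moving-average smoothing of a spectrum data sequence.

    Edges are padded by repeating the end values (an assumption), so the
    output has the same length as the input.
    """
    seq = np.asarray(seq, dtype=float)
    pad = width // 2
    padded = np.pad(seq, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")
```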

Step S1022: deriving the spectrum data sequence after mean value filtering to obtain a derivative sequence, and marking convex points and concave points in the derivative sequence;

The smoothed spectrum data sequence is differentiated to obtain a derivative sequence, and the convex points and concave points in the derivative sequence are marked.

Step S1023: marking a maximum value corresponding to the convex point and a minimum value corresponding to the concave point in the frequency spectrum data sequence;

A convex point is a zero of the derivative at which the derivative changes from positive, through 0, to negative; it corresponds to a maximum in the spectrum data sequence. A concave point is a zero at which the derivative changes from negative, through 0, to positive; it corresponds to a minimum in the spectrum data sequence.
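The convex and concave points can be found from sign changes of the discrete derivative (first difference). This is a minimal sketch; skipping plateaus where the difference is exactly zero is an assumption, since the text does not cover that case. `mark_extrema` is a hypothetical name.

```python
import numpy as np

def mark_extrema(seq):
    """Return (maxima, minima) indices from sign changes of the first difference.

    d[i-1] = seq[i] - seq[i-1] and d[i] = seq[i+1] - seq[i], so a change
    from + to - at position i marks a maximum at seq[i] (a convex point),
    and a change from - to + marks a minimum (a concave point).
    Plateaus (zero difference) are skipped, which is an assumption.
    """
    d = np.diff(np.asarray(seq, dtype=float))  # discrete derivative
    maxima, minima = [], []
    for i in range(1, len(d)):
        if d[i - 1] > 0 and d[i] < 0:
            maxima.append(i)
        elif d[i - 1] < 0 and d[i] > 0:
            minima.append(i)
    return maxima, minima
```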

Step S1024: extending data forward and backward along the spectrum data sequence from the maximum to obtain a data group;

Specifically, taking the maximum as the base point, the data is extended forward along the spectrum data sequence to obtain first data, and the first data and the maximum form a first data group. Starting from the maximum of the speech spectrum space data corresponding to each convex point in the derivative sequence, the data is extended point by point toward both ends. In this embodiment, speech spectrum space data with an earlier sampling time is regarded as the front, and data with a later sampling time as the back. At this stage the first data group contains only the first data and the base-point maximum.

Then the data is extended backward along the spectrum data sequence to obtain second data, and the average absolute error of the second data together with the spectrum data in the first data group is calculated. If the average absolute error is less than or equal to the threshold, the second data is merged with the first data group to obtain a second data group, which then contains three data; otherwise the second data is not merged. The empirical value of the threshold in this embodiment is 80. The average absolute error here is the mean absolute deviation, that is, the average of the absolute deviations of all individual observations from their arithmetic mean. To compute it for a data group, the mean of all spectrum data in the group is calculated, the absolute difference between each spectrum datum and that mean is taken, and the sum of these absolute differences is divided by the number of spectrum data in the group.

The extension and merging operations continue until the extension reaches a minimum or the spectrum data obtained by both forward and backward extension fail the merge condition, and all merged spectrum data are stored as the data group corresponding to the maximum. That is, extension in a given direction stops when it reaches a minimum point, and the whole extension stops when the candidates in both directions can no longer be merged.
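The extension and merging procedure can be sketched as a small region-growing loop over one spectrum data sequence. Several details are our reading of the text rather than its statement: "forward" is taken as the lower index, each round tries one candidate per open direction, a direction closes at a minimum or at the sequence edge (the minimum itself is not merged), and the whole growth stops when a round merges nothing. `mean_abs_dev` and `grow_strip` are hypothetical names.

```python
import numpy as np

def mean_abs_dev(values):
    """Average absolute deviation of the values from their arithmetic mean."""
    v = np.asarray(values, dtype=float)
    return float(np.mean(np.abs(v - v.mean())))

def grow_strip(seq, peak, minima, threshold=80.0):
    """Grow a characteristic data strip around index `peak`; return (lo, hi), inclusive."""
    seq = np.asarray(seq, dtype=float)
    stops = set(int(m) for m in minima)
    n = len(seq)
    lo = hi = peak
    # First forward step: merged without a test, as the text describes.
    if peak - 1 >= 0 and peak - 1 not in stops:
        lo = peak - 1
    open_dir = {"forward": True, "backward": True}
    while any(open_dir.values()):
        merged = False
        for direction in ("backward", "forward"):
            if not open_dir[direction]:
                continue
            cand = hi + 1 if direction == "backward" else lo - 1
            if cand < 0 or cand >= n or cand in stops:
                open_dir[direction] = False  # reached a minimum or the sequence edge
                continue
            trial = np.append(seq[lo:hi + 1], seq[cand])
            if mean_abs_dev(trial) <= threshold:  # merge condition from the text
                if direction == "backward":
                    hi = cand
                else:
                    lo = cand
                merged = True
        if not merged:
            break  # neither direction could merge: stop extending
    return lo, hi
```

A failed candidate is retried in the next round against the enlarged group, so extension only ends when both directions fail together, which matches the stopping rule described above.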

Step S1025: and storing the data group of each maximum value in the spectrum data sequence as a characteristic data strip of the spectrum data sequence.

When the extension is finished, each maximum point has a corresponding data group in the data sequence, and this extended data group is called a characteristic data strip. Completing the extension therefore cuts the spectrum data sequence into its characteristic data strips. One spectrum data sequence corresponds to several characteristic data strips, their number being equal to the number of maxima found.

Step S103, combining the characteristic data strips into characteristic data blocks based on the mean value difference of the adjacent characteristic data strips;

The spectrum data sequences at two adjacent moments contain the same number of elements, so their elements correspond one to one. After cutting, some spectrum data belong to a characteristic data strip and others do not. When the preceding and following spectrum data sequences are aligned element by element, if corresponding spectrum data belong to a characteristic data strip in both sequences, the mean difference between the two strips is calculated; if it is less than or equal to the preset mean difference E, the two characteristic data strips are connected into one class of data. The empirical value of E is 20 dB.

Each class of data has a time length, and when the time length of a class is greater than or equal to the preset duration U, the class is marked as a noise data block. Because human speech is produced as discrete syllables, the corresponding human voice characteristic data blocks in this embodiment do not span long times. The empirical value of the preset duration U is 0.7 seconds.

Each class of data whose time length is less than the preset duration U is marked as a characteristic data block. Every element of a characteristic data block has coordinates in the speech spectrum space, so the characteristic data block appears visually as a curved surface in the speech spectrum space.
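The linking of strips into classes and the duration test can be sketched with a union-find structure. The representation is an assumption: each strip is recorded as its frame index, inclusive frequency-bin range and mean amplitude in dB, two strips in adjacent frames "correspond" when their bin ranges overlap, and a class's time length is the number of frames it spans times the window duration. `Strip` and `link_strips` are hypothetical names; E = 20 dB and U = 0.7 s come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strip:
    frame: int      # index of the spectrum data sequence (time)
    lo: int         # first frequency bin of the strip
    hi: int         # last frequency bin of the strip (inclusive)
    mean_db: float  # mean amplitude of the strip

def link_strips(strips, E=20.0, U=0.7, t=1024 / 48000):
    """Group strips across adjacent frames; return (noise_blocks, feature_blocks)."""
    parent = list(range(len(strips)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    by_frame = {}
    for i, s in enumerate(strips):
        by_frame.setdefault(s.frame, []).append(i)
    for i, s in enumerate(strips):
        for j in by_frame.get(s.frame + 1, []):
            o = strips[j]
            overlap = s.lo <= o.hi and o.lo <= s.hi
            if overlap and abs(s.mean_db - o.mean_db) <= E:
                parent[find(i)] = find(j)  # connect into one class

    groups = {}
    for i in range(len(strips)):
        groups.setdefault(find(i), []).append(i)
    noise, features = [], []
    for members in groups.values():
        frames = [strips[k].frame for k in members]
        duration = (max(frames) - min(frames) + 1) * t
        (noise if duration >= U else features).append(sorted(members))
    return noise, features
```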

Step S104, comparing the characteristic data blocks based on the data block characteristic vectors of the characteristic data blocks, grouping based on the comparison result to obtain a target data set, and obtaining characteristic values of the target data set;

The characteristic data blocks screened by time may still contain noise data blocks. Noise characteristic data blocks differ from the human voice characteristic data blocks in time length and also in shape. Human voice characteristic data blocks show several blocks with the same curved-surface shape at the same moment, while blocks at different moments differ greatly in shape. Noise characteristic data blocks show few similarly shaped blocks at the same moment, but because noise tends to repeat cyclically, several similarly shaped or even identical curved-surface blocks appear at different moments. Noise can therefore be further distinguished from human voice based on the three-dimensional shape features of the characteristic data blocks.

Referring to FIG. 3, FIG. 3 is a schematic diagram of a second refinement flow of an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to the present invention. As shown in FIG. 3, step S104 includes:

Step S1041: determining the data block shape feature vector of each characteristic data block using sampling vectors based on the centroid projection of the characteristic data block;

Specifically, when the characteristic data block is translated up and down along the y-axis, the curved surface of the characteristic data block in the speech spectrum space and the z=0 plane enclose a three-dimensional solid;

The x-coordinates and y-coordinates of all points in the three-dimensional solid are averaged separately to obtain the average x-coordinate value $x_0$ and the average y-coordinate value $y_0$, and the projection of the characteristic data block's centroid onto the z=0 plane is determined from them: the centroid projection point O is $(x_0, y_0, 0)$. This embodiment also computes the projection of every characteristic data block onto the time axis; if the projected line segments of two characteristic data blocks overlap, their centroid projection points are considered to overlap.

Specifically, a hemispherical surface is determined from the average x-coordinate value $x_0$, the average y-coordinate value $y_0$ and the preset large sphere radius R: $(x - x_0)^2 + (y - y_0)^2 + z^2 = R^2$ with $z \ge 0$, where the empirical value of the large sphere radius R is 15.

An initial vector endpoint is determined from the average x-coordinate value, the average y-coordinate value and the preset large sphere radius, namely the apex $(x_0, y_0, R)$ of the hemisphere, and a sphere is determined from the initial vector endpoint and the preset small sphere radius r: $(x - x_0)^2 + (y - y_0)^2 + (z - R)^2 = r^2$, where the empirical value of r is 1.

A point on the sphere that meets a preset condition is determined as the vector endpoint. In this embodiment, the intersection of the sphere and the hemispherical surface is a circle, and the point on this circle nearest to … is determined as the vector endpoint.

A new sphere is then determined with the vector endpoint as its center: a sphere of radius r is formed around the vector endpoint, its intersection with the hemispherical surface forms a new circle, and this new circle intersects the circle corresponding to the initial vector endpoint.
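Each "circle" in this construction is the intersection of the small sphere (radius r) with the large sphere of radius R that carries the hemispherical surface. The sketch below computes that circle with the standard two-sphere intersection formula and samples points on it; keeping only points with $z \ge 0$ restricts them to the hemisphere. Taking the initial vector endpoint as the apex $(x_0, y_0, R)$ is an assumption, and the endpoint-selection rule itself is not modelled. `sphere_sphere_circle` and `circle_points` are hypothetical names.

```python
import numpy as np

def sphere_sphere_circle(c1, r1, c2, r2):
    """Circle where two spheres intersect: (center, radius, unit normal), or None."""
    c1, c2 = np.asarray(c1, dtype=float), np.asarray(c2, dtype=float)
    d_vec = c2 - c1
    d = np.linalg.norm(d_vec)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return None  # concentric, too far apart, or one sphere inside the other
    n = d_vec / d
    a = (d**2 + r1**2 - r2**2) / (2 * d)  # distance from c1 to the circle's plane
    h = np.sqrt(max(r1**2 - a**2, 0.0))   # circle radius
    return c1 + a * n, h, n

def circle_points(center, radius, normal, k=36):
    """k evenly spaced points on a circle in 3-D."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(normal, helper)
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)  # u, v span the circle's plane
    theta = np.linspace(0, 2 * np.pi, k, endpoint=False)
    return center + radius * (np.outer(np.cos(theta), u) + np.outer(np.sin(theta), v))

# Example with the text's values R = 15, r = 1 and O at the origin.
R, r = 15.0, 1.0
O = np.array([0.0, 0.0, 0.0])
apex = O + np.array([0.0, 0.0, R])  # assumed initial vector endpoint
center, h, n = sphere_sphere_circle(O, R, apex, r)
pts = circle_points(center, h, n)
on_hemisphere = pts[pts[:, 2] >= 0]  # points lying on the hemispherical surface
```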

Among the points where the new sphere intersects the hemispherical surface and that are contained in the new sphere, the intersection point lying on the most circles is selected as a new vector endpoint, and the selection is repeated until no new vector endpoint can be generated on the hemispherical surface. If several points lie on the same number of circles, the nearest point is selected as the vector endpoint. Each repetition yields a new vector endpoint, so that a plurality of new vector endpoints is finally obtained.

The centroid projection point of the characteristic data block is used as the start point, and the vector endpoint and each new vector endpoint are used as end points, giving a plurality of sampling vectors. Specifically, the projection of the centroid onto the z = 0 plane is the centroid projection point $O = (\bar{x}, \bar{y}, 0)$, and the vector from O to each endpoint is a sampling vector, $\overrightarrow{OP_i} = (x_i - \bar{x},\ y_i - \bar{y},\ z_i)$, where $(x_i, y_i, z_i)$ are the coordinates of the vector endpoint or of a new vector endpoint.

The data block outline feature vector is then determined from the distances between the start point and the intersection points of the sampling vectors with the three-dimensional stereo graph of the characteristic data block.

Because the endpoint construction involves no randomness, every three-dimensional stereo graph is sampled with the same set of sampling vectors, so the sampling vectors of different blocks correspond one to one. In this embodiment, the intersection of the n-th sampling vector with the three-dimensional stereo graph of the characteristic data block is denoted $A_n$, and the distance from the centroid projection point O (the start point of the sampling vector) to $A_n$ is denoted $a_n$, where n is the index of the sampling vector. The distances $a_n$ form the one-dimensional data block outline feature vector $OA = (a_1, a_2, \dots, a_N)$.
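The sampling-vector construction above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the method as filed: a characteristic data block is assumed to be a height field h[y, x] of amplitudes on a grid (zero outside the block), the three-dimensional stereo graph is taken as the solid 0 <= z <= h, the centroid is volume-weighted, and the circle-intersection procedure for the vector endpoints is replaced by a golden-angle spiral of unit directions on the upper hemisphere. The names centroid_projection, hemisphere_directions and outline_feature_vector are hypothetical.

```python
import numpy as np

def centroid_projection(h):
    """Volume-weighted mean (x, y) of the solid 0 <= z <= h[y, x]; returns O = (x_mean, y_mean, 0)."""
    w = np.clip(np.asarray(h, dtype=float), 0.0, None)
    total = w.sum()
    if total == 0:
        raise ValueError("empty characteristic data block")
    ys, xs = np.indices(w.shape)          # row index = y, column index = x
    return np.array([(xs * w).sum() / total, (ys * w).sum() / total, 0.0])

def hemisphere_directions(n):
    """SUBSTITUTE for the endpoint procedure: n unit vectors with z > 0 on a golden-angle spiral."""
    k = np.arange(n) + 0.5
    z = k / n                                # heights evenly spread in (0, 1)
    phi = k * np.pi * (3.0 - np.sqrt(5.0))   # golden-angle azimuth steps
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def outline_feature_vector(h, directions, step=0.01):
    """OA: distance from O along each direction to the last point still inside the solid."""
    h = np.asarray(h, dtype=float)
    o = centroid_projection(h)
    rows, cols = h.shape
    max_len = float(np.hypot(rows, cols) + h.max())   # no ray can stay inside longer
    oa = []
    for d in np.asarray(directions, dtype=float):
        t = 0.0
        while t < max_len:
            p = o + (t + step) * d                        # next sample point along the ray
            ix, iy = int(round(p[0])), int(round(p[1]))   # nearest grid cell
            if not (0 <= ix < cols and 0 <= iy < rows) or p[2] > h[iy, ix]:
                break                                     # ray has left the solid
            t += step
        oa.append(t)
    return np.array(oa)
```

Because the directions are generated deterministically, every block is sampled along the same directions, which keeps the elements of OA in one-to-one correspondence across blocks.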

Step S1042: calculating the expansion coefficient and the outline similarity coefficient between characteristic data blocks based on their data block outline feature vectors;

In this embodiment, the expansion coefficient between two characteristic data blocks is constructed from the differences between elements at corresponding positions in their data block outline feature vectors and the larger element of each pair, and the outline similarity coefficient is calculated from the expansion coefficient and the outline feature vectors.

Specifically, the expansion coefficient between characteristic data block j and characteristic data block k, denoted here $\alpha_{j,k}$, is:

$$\alpha_{j,k} = \frac{1}{N}\sum_{n=1}^{N}\frac{a_{n,j} - a_{n,k}}{\max(a_{n,j},\ a_{n,k})}$$

where N is the total number of elements in the data block outline feature vector OA, n is the position of an element in OA, $a_{n,j}$ is the n-th element of the outline feature vector OA of block j, $a_{n,k}$ is the n-th element of the outline feature vector OA of block k, and $\max(a_{n,j}, a_{n,k})$ is the larger of the two.

That is, the elements at corresponding positions of the two outline feature vectors are subtracted, and each difference is divided by the larger element of its pair. This expresses, for each sampling vector, how the length cut off by one block compares with the length cut off by the other, and normalizes the result at the same time; averaging these ratios gives the expansion coefficient. If block j has a larger overall volume than block k, the expansion coefficient is positive; if smaller, it is negative; if the two are the same size, it is 0. Its absolute value reflects the difference in overall volume between the two data blocks: the larger the absolute value, the greater the volume difference.

The volume ratio is not used directly as the expansion coefficient. Instead, the formula above is constructed, because the expansion in this embodiment describes the proportional expansion of different parts of the two characteristic data blocks and therefore also captures how the shapes of those parts compare, which makes it more accurate than a plain volume ratio.
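A direct transcription of the expansion coefficient formula, as a sketch: the function name is hypothetical, and element pairs that are both zero are assumed to contribute 0 so that the division is always defined.

```python
import numpy as np

def expansion_coefficient(oa_j, oa_k):
    """(1/N) * sum_n (a_{n,j} - a_{n,k}) / max(a_{n,j}, a_{n,k})."""
    oa_j = np.asarray(oa_j, dtype=float)
    oa_k = np.asarray(oa_k, dtype=float)
    if oa_j.shape != oa_k.shape:
        raise ValueError("outline feature vectors must have the same length N")
    diff = oa_j - oa_k
    peak = np.maximum(oa_j, oa_k)
    # Normalize each difference by the larger element; 0/0 pairs are assumed to give 0.
    ratio = np.divide(diff, peak, out=np.zeros_like(diff), where=peak > 0)
    return float(ratio.mean())
```

For example, OA vectors (2, 4) and (1, 2) give 0.5, and swapping the arguments gives -0.5, matching the sign convention described above.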

The outline similarity coefficient between characteristic data block j and characteristic data block k is computed as follows.

In this embodiment, the data block outline feature vector OA of the smaller characteristic data block is multiplied by (1 + expansion coefficient), the data block outline feature vector OA of the larger characteristic data block is subtracted, and the absolute values of the differences are accumulated; the smaller this value, the more similar the two blocks. The result is divided by a normalizing term and by N to normalize it, giving the outline similarity coefficient. This coefficient represents the similarity in outline between the two characteristic data blocks: the smaller it is, the more similar their outlines.
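The outline similarity coefficient can be sketched the same way. The normalizing term besides N is not specified above, so this sketch assumes it is the largest element of the larger block's OA; it also assumes that the larger block is the one with the non-negative expansion coefficient, so that (1 + expansion coefficient) enlarges the smaller block. Both choices and the function names are hypothetical.

```python
import numpy as np

def _expansion(oa_j, oa_k):
    """Expansion coefficient: mean of (a_j - a_k) / max(a_j, a_k), with 0/0 taken as 0."""
    diff = oa_j - oa_k
    peak = np.maximum(oa_j, oa_k)
    return float(np.mean(np.divide(diff, peak, out=np.zeros_like(diff), where=peak > 0)))

def outline_similarity(oa_j, oa_k):
    """Smaller result = more similar outlines. The normalizer is an ASSUMPTION."""
    a = np.asarray(oa_j, dtype=float)
    b = np.asarray(oa_k, dtype=float)
    alpha = _expansion(a, b)
    # Order the pair so that `large` is the block with the non-negative coefficient.
    large, small = (a, b) if alpha >= 0 else (b, a)
    alpha = abs(alpha)
    norm = large.max()                       # ASSUMED normalizing term
    if norm == 0:
        return 0.0
    scaled = (1.0 + alpha) * small           # enlarge the smaller block's OA
    return float(np.abs(scaled - large).sum() / (norm * a.size))
```

The result is symmetric in its two arguments, and two identical OA vectors give 0.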

Step S1043: grouping the characteristic data blocks based on the expansion coefficient and the outline similarity coefficient to obtain a plurality of target data sets;

Specifically, the degree of outline similarity of two characteristic data blocks is determined from the expansion coefficient and the data block outline feature vectors;

if the expansion coefficient and the degree of outline similarity of the two characteristic data blocks meet the preset requirements, the two blocks are classified into the same class;

For every pair of characteristic data blocks whose projected line segments on the time axis (x-axis) overlap, the blocks are shifted up or down along the frequency axis (y-axis) so that their centroid projection points O coincide, and the expansion coefficient and outline similarity coefficient of the two blocks are calculated. When both coefficients satisfy their empirical thresholds, the two characteristic data blocks are classified into the same class.

Because a single utterance does not last excessively long, this embodiment marks the characteristic data blocks in each class whose time span is greater than or equal to 3U as noise-class data blocks, and collects the characteristic data blocks with a time span smaller than 3U by class, giving one target data set per class. The empirical value of U is 0.7 s, so the threshold 3U is 2.1 s.
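The grouping and time-span filtering of step S1043 might look like the following sketch. Assumptions: pairwise "same class" decisions are merged transitively with a union-find structure, the thresholds alpha_max and beta_max are placeholders for the unspecified empirical values, and the coefficient functions are passed in as callables. Since each OA is measured from its own block's centroid projection point, shifting blocks to align their O points leaves OA unchanged, so the sketch only checks overlap on the time axis. The Block type and group_blocks function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Block:
    t_start: float          # start time in seconds (x-axis)
    t_end: float            # end time in seconds
    oa: Sequence[float]     # data block outline feature vector OA

def group_blocks(
    blocks: List[Block],
    expansion: Callable[[Sequence[float], Sequence[float]], float],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
    alpha_max: float,       # placeholder for the empirical expansion threshold
    beta_max: float,        # placeholder for the empirical similarity threshold
    u: float = 0.7,         # empirical time span U in seconds
) -> Tuple[List[List[int]], List[int]]:
    """Return (target data sets as lists of block indices, indices of noise-class blocks)."""
    parent = list(range(len(blocks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for j in range(len(blocks)):
        for k in range(j + 1, len(blocks)):
            bj, bk = blocks[j], blocks[k]
            # Only blocks whose projections on the time axis overlap are compared.
            if not (bj.t_start < bk.t_end and bk.t_start < bj.t_end):
                continue
            if abs(expansion(bj.oa, bk.oa)) < alpha_max and similarity(bj.oa, bk.oa) < beta_max:
                parent[find(j)] = find(k)   # same class

    noise: List[int] = []
    groups: dict = {}
    for i, b in enumerate(blocks):
        if b.t_end - b.t_start >= 3 * u:
            noise.append(i)                 # too long for a single utterance
        else:
            groups.setdefault(find(i), []).append(i)
    return list(groups.values()), noise
```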

Step S1044: and placing the target data set into a sampling space, and extracting the characteristic value of the target data set through a convolution kernel of the sampling space.

Specifically, each characteristic data block in the target data set is sequentially placed into a sampling space along an x-axis according to sampling time, wherein the x-axis of the sampling space is time, the y-axis is frequency, and the z-axis is amplitude;

Since the frequency band taken for adult voices is 20 to 2000 Hz and the time span of a human-voice data set is within 3U, a sampling space is constructed whose y-axis is frequency from 0 to 2000 Hz, whose x-axis is time from 0 to 3U, and whose z-axis is amplitude. The earliest characteristic data block of a data set is placed at the start of the time axis of the sampling space with its frequency and amplitude values unchanged; the remaining characteristic data blocks of the set are placed in the same way, and any portion extending beyond the sampling space in the x direction is disregarded.

Characteristic values are extracted with a convolution kernel in the sampling space. The kernel is 100 units (Hz) long in the y direction and U/2 long in the x direction, where U is the empirical time span; at each position, the mean amplitude of all spectrum data points of the characteristic data blocks inside the covered region is taken as a characteristic value. The kernel samples the space along an S-shaped path from the upper left to the lower left: it moves horizontally along the x-axis in steps of U/2 and, at the end of each row, moves vertically along the y-axis by 100 units before moving horizontally again in the opposite direction.

When the scan is complete, 3U / (U/2) = 6 positions along the x-axis and 2000 / 100 = 20 positions along the y-axis give 6 × 20 = 120 characteristic values, and the characteristic values of each target data set are recorded in the order in which they were obtained.
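The sampling space and the S-shaped kernel scan can be sketched as below. Assumptions: a target data set is given as sample points (t, f, amplitude), all blocks of the set share one time offset so that the earliest sample sits at x = 0, "upper" means higher frequency, and a kernel window that contains no samples yields 0. The function name serpentine_features is hypothetical. With U = 0.7 s the scan yields 120 characteristic values.

```python
import numpy as np

def serpentine_features(points, u=0.7, f_max=2000.0, f_step=100.0):
    """points: (M, 3) array-like of (t, f, amplitude) for one target data set."""
    pts = np.asarray(points, dtype=float)
    t = pts[:, 0] - pts[:, 0].min()        # ASSUMED common offset: earliest sample at x = 0
    f, a = pts[:, 1], pts[:, 2]
    t_step = u / 2.0                       # kernel width and horizontal step
    n_cols = int(round(3 * u / t_step))    # 6 positions along x
    n_rows = int(round(f_max / f_step))    # 20 positions along y
    feats = []
    for r in range(n_rows):
        f_hi = f_max - r * f_step          # start at the top (highest frequency)
        f_lo = f_hi - f_step
        # S-shaped path: left-to-right on even rows, right-to-left on odd rows.
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        for c in cols:
            t_lo = c * t_step
            mask = (t >= t_lo) & (t < t_lo + t_step) & (f >= f_lo) & (f < f_hi)
            feats.append(float(a[mask].mean()) if mask.any() else 0.0)
    return np.array(feats)
```

Because there are 20 rows (an even number), the last row is traversed right to left, so the scan ends at the lower left as described. Samples later than 3U are never covered by a window and are therefore ignored.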

Step S105, inputting the characteristic value into a pre-constructed neural network, deleting the speech spectrum space data corresponding to the noise data block determined by the neural network, and obtaining the noise-reduction speech spectrum space data.

The characteristic values are input into the pre-constructed neural network, which outputs an identification tag for them: either broadcast voice or other noise. The speech spectrum space data corresponding to the characteristic values tagged as other noise is deleted to obtain the noise-reduced speech spectrum.

In this embodiment, a neural network is pre-configured. Referring to fig. 4, fig. 4 is a schematic diagram of the neural network according to an embodiment of the panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to the present invention. As shown in fig. 4, the neural network includes an input layer, a fully connected (FC) layer, a Softmax function layer and an output layer.

In this embodiment, a large amount of panoramic sound real-time data is collected from rebroadcasts and converted into spectrograms by fast Fourier transform, and the data sets in these spectrograms are identified using the method described above.
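Converting recorded audio into speech spectrum space data (x = time, y = frequency, z = amplitude) by FFT could be sketched as a short-time Fourier transform. The frame length, hop size and Hann window are illustrative assumptions not given above, only a single channel is processed, and the function name is hypothetical.

```python
import numpy as np

def spectrogram(signal, fs, frame_len=1024, hop=512):
    """Magnitude STFT: rows are frequency bins (y), columns are frames (x).

    Assumes len(signal) >= frame_len.
    """
    x = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T          # amplitude = z value
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)        # y-axis values in Hz
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs   # x-axis frame centres in s
    return freqs, times, mag
```

For a one-second 1000 Hz tone sampled at 8 kHz, this gives 513 frequency bins by 14 frames, with the peak in the 1000 Hz bin.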

The data sets are marked in the spectrograms; relevant technicians judge them by listening and label each as {announcer voice, other noise}. The 120 characteristic values of each data set are then calculated and combined with its label to form one sample, and a plurality of samples forms the training data set.

The input layer takes the 120 characteristic values, the output is the label {broadcast voice, other noise}, the loss function is cross entropy, and the optimizer is Adam (Adaptive Moment Estimation). The training procedure is prior art and is not described further.
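The network of fig. 4 (input layer, FC layer, Softmax, output) trained with cross entropy and Adam can be sketched in plain NumPy as a single fully connected layer. The layer sizes follow the text (120 inputs, 2 labels); the learning rate, initialization and label encoding (0 = broadcast voice, 1 = other noise) are assumptions, and the class name is hypothetical.

```python
import numpy as np

class FcSoftmaxClassifier:
    """Input (120) -> fully connected -> softmax over {broadcast voice, other noise}."""

    def __init__(self, n_in=120, n_out=2, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(n_in, n_out))
        self.b = np.zeros(n_out)
        self.lr, self.beta1, self.beta2, self.eps = lr, 0.9, 0.999, 1e-8
        self.m = [np.zeros_like(self.W), np.zeros_like(self.b)]   # Adam first moments
        self.v = [np.zeros_like(self.W), np.zeros_like(self.b)]   # Adam second moments
        self.t = 0

    def predict_proba(self, X):
        z = X @ self.W + self.b
        z -= z.max(axis=1, keepdims=True)      # numerical stability for softmax
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def train_step(self, X, y):
        """One Adam step on mean cross-entropy; y holds integer labels. Returns the loss."""
        p = self.predict_proba(X)
        n = X.shape[0]
        loss = -np.mean(np.log(p[np.arange(n), y] + 1e-12))
        grad_z = p.copy()                      # d(loss)/d(logits) = p - one_hot(y), averaged
        grad_z[np.arange(n), y] -= 1.0
        grad_z /= n
        grads = [X.T @ grad_z, grad_z.sum(axis=0)]
        self.t += 1
        for i, (param, g) in enumerate(zip((self.W, self.b), grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)   # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            param -= self.lr * m_hat / (np.sqrt(v_hat) + self.eps)   # in-place update
        return loss
```

In use, each row of X would hold the 120 characteristic values of one target data set, and the predicted label is the argmax of predict_proba.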

In operation, the characteristic data blocks and target data sets are first divided, and each target data set is judged by the neural network. For the characteristic data blocks labelled as other noise, the corresponding target characteristic data strips are determined, then the target spectrum data sequences corresponding to those strips, and then the target speech spectrum space data corresponding to those sequences. Finally, the target speech spectrum space data is deleted from the original speech spectrum space data, and the resulting noise-reduced spectrogram is output. This removes the non-white noise in the scene and completes the panoramic sound real-time data noise reduction for the rebroadcasting vehicle.
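The final deletion step can be sketched as masking. Assumptions: each characteristic data block has already been traced back, through its data strips and spectrum data sequences, to a boolean mask over the spectrogram cells it came from, and "deleting" is implemented by setting those cells to zero. The function name and label strings are hypothetical.

```python
import numpy as np

def delete_noise(spec, block_masks, labels, noise_label="other noise"):
    """Return a copy of spec with every cell covered by a noise-labelled block set to zero."""
    out = np.array(spec, dtype=float, copy=True)   # leave the original spectrogram untouched
    for mask, label in zip(block_masks, labels):
        if label == noise_label:
            out[mask] = 0.0                        # ASSUMED meaning of "delete"
    return out
```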

According to this scheme, the panoramic sound audio data to be noise-reduced, acquired in real time, is subjected to fast Fourier transform to obtain speech spectrum space data; the speech spectrum space data is divided along the x-axis to obtain spectrum data sequences, and each spectrum data sequence is divided based on maxima and minima to obtain a plurality of characteristic data strips; the characteristic data strips are combined into characteristic data blocks based on the mean differences of adjacent strips; the characteristic data blocks are compared based on their data block feature vectors and grouped based on the comparison results to obtain target data sets, whose characteristic values are then obtained; and the characteristic values are input into a pre-constructed neural network, and the speech spectrum space data corresponding to the noise data blocks determined by the network is deleted to obtain the noise-reduced speech spectrum space data. In this way, the speech spectrum space data of the panoramic sound audio data is segmented into characteristic data blocks, the blocks are compared by their feature vectors to obtain characteristic values, and noise is identified from those values and removed, which improves the noise reduction of panoramic sound real-time data for the rebroadcasting vehicle.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention, and all equivalent structures or modifications in the process, or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims

1. A panoramic sound real-time data noise reduction method for a rebroadcasting vehicle, the method comprising:

dividing the speech spectrum space data along the x-axis to obtain a spectrum data sequence, and dividing the spectrum data sequence based on maxima and minima to obtain a plurality of characteristic data strips;

inputting the characteristic value into a pre-constructed neural network, and deleting the speech spectrum space data corresponding to the noise data block determined by the neural network to obtain noise-reduced speech spectrum space data;

comparing the characteristic data blocks based on their data block feature vectors, grouping them based on the comparison results to obtain target data sets, and obtaining the characteristic values of the target data sets comprises:

placing the target data set into a sampling space, and extracting the characteristic value of the target data set through a convolution kernel of the sampling space;

determining the data block outline feature vector of the characteristic data block using sampling vectors based on the centroid projection of the characteristic data block comprises:

when the characteristic data block is moved up and down along the y-axis, the curved surface of the characteristic data block and the z = 0 plane form a three-dimensional stereo graph;

averaging the x coordinates and the y coordinates of all points in the three-dimensional stereo graph to obtain an average x coordinate value and an average y coordinate value, and determining, from the average x coordinate value and the average y coordinate value, the centroid projection point of the characteristic data block on the z = 0 plane;

determining a data block outline feature vector OA based on the distance between the starting point and the intersection points of the plurality of sampling vectors and the three-dimensional stereo graph of the feature data block;

the determining the hemispherical curved surface and the spherical surface based on the coordinates of the centroid projection point and the preset radius, and determining the vector endpoint based on the coordinates of the centroid projection point, the hemispherical curved surface and the spherical surface, and continuously determining the new vector endpoint includes:

determining a hemispherical curved surface based on the average x coordinate value, the average y coordinate value and the preset large sphere radius: $(x - \bar{x})^2 + (y - \bar{y})^2 + z^2 = R^2,\ z \ge 0$, where R represents the preset large sphere radius, $\bar{x}$ represents the average x coordinate value and $\bar{y}$ represents the average y coordinate value;

determining an initial vector endpoint based on the average x coordinate value, the average y coordinate value and the preset large sphere radius, and determining a sphere based on the initial vector endpoint and the preset small sphere radius;

2. The panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to claim 1, wherein dividing the speech spectrum space data along the x-axis to obtain spectrum data sequences, and dividing each spectrum data sequence based on maxima and minima to obtain a plurality of characteristic data strips, comprises:

dividing the audio time along the x-axis of the speech spectrum space data according to a preset time length to obtain spectrum data sequences;

3. The panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to claim 2, wherein extending data in both the forward and backward directions of the spectrum data sequence based on the maxima to obtain a data set comprises:

4. The panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to claim 2, wherein combining the characteristic data strips into characteristic data blocks based on the mean difference of adjacent characteristic data strips comprises:

5. The panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to claim 1, wherein grouping the characteristic data blocks based on the expansion coefficient and the outline similarity coefficient to obtain a plurality of target data sets comprises:

6. The panoramic sound real-time data noise reduction method for a rebroadcasting vehicle according to claim 1, wherein placing the target data set into a sampling space and extracting the characteristic values of the target data set through a convolution kernel of the sampling space comprises:

7. The method for reducing noise of panoramic sound real-time data of a rebroadcasting vehicle according to claim 1, wherein inputting the feature value into a pre-constructed neural network, deleting the speech spectrum space data corresponding to the noise data block determined by the neural network to obtain noise-reduced speech spectrum space data comprises: