EP4066389A1 - Method and device for processing an audio signal

Info

Publication number: EP4066389A1
Application number: EP20807468.2A
Authority: EP
Inventors: Farhan Mirani; Yuehgoh FOUTSE; Hugo LEVARD; Sabrine LAZRAK
Original assignee: PSA Automobiles SA
Current assignee: Stellantis Auto SAS
Priority date: 2019-11-26
Filing date: 2020-10-21
Publication date: 2022-10-05
Also published as: FR3103660B1; FR3103660A1; WO2021105574A1

Abstract

The invention relates to a method and device for processing an audio signal, for example a signal received by a vehicle (103) and converted in a tuner (104). To that end, the audio signal is sampled in order to obtain a vector of values of amplitudes each associated with a sample of the signal. The input vector is passed through a plurality of convolutional layers of a neural network in order to obtain, as output, a classification of the audio signal, a class representative of a noise level being assigned to the audio signal.

Description

Title: Method and device for processing an audio signal

Technical field

The invention relates to a method and a device for processing an audio signal, in particular an audio signal from a radio receiver of a vehicle. The invention also relates to a method and a device for classifying the noise level of an audio signal.

Technological background

Contemporary vehicles are generally equipped with radio reception systems, for the entertainment or information of the driver and/or passengers, for the reception of service information (for example information relating to road traffic) and/or for communication of the vehicle with fixed equipment (e.g. antennas or roadside units) or mobile equipment (e.g. other vehicles).

These systems make it possible to pick up radiofrequency signals emitted by surrounding sources, these signals carrying information intended to be received and decoded by the vehicle. These signals correspond, for example, to radio signals of the FM type (standing for “Frequency Modulation”), which are then converted into audio signals to be transmitted to one or more loudspeakers fitted to the vehicle, which make the audio signal perceptible by the driver and/or passengers of the vehicle.

The performance of these systems depends on many factors, such as the intrinsic quality of the components of these systems, the arrangement of the components in the vehicle, the travel speed of the vehicle, the environment of use (for example urban or peri-urban environment, mountain area), etc. To improve the quality of these systems and in particular the quality of the audio signal perceived by the driver and/or the passengers, it is important to test these systems in real operating conditions, which involves a large number of tests, as the conditions of use are varied and the factors influencing system performance are numerous.

EP 1093245 discloses a method for testing the quality of the components of a vehicle radio reception system. This process is based on the use of two vehicles, each comprising a system to be tested and its own recording device. Both vehicles travel on the same road and acquire signals as they travel. The signals are compared to determine the performance level of the system under test.

Such an approach, like the known test approaches to test the performance of a radio reception system in a prototype vehicle, involves multiple operators or testers. These people sit in several vehicles, listen to, analyze and compare the quality of the audio signals received in a vehicle under test with the quality of a signal received in a reference vehicle which is traveling at the same time on the same stretch of road. The performance of the system is then established on the basis of the subjective perceptions of the testers in the test vehicle, these performances being evaluated and compared in real time with the subjective perceptions of the testers in the reference vehicle.

Such an approach has significant limitations. It assumes, for example, a great homogeneity of the respective perception of the audio quality by the different testers. The relevance of the comparison may otherwise be insignificant. In addition, there is a need to train personnel with the technical skills required to analyze such signals. Such training can be complex and expensive. In addition, the resources mobilized during the tests are significant. Typically, for testing an FM radio antenna, two vehicles are used and each must travel between 75 and 100 km for approximately 1 hour to get relevant results.

Finally, such tests are only carried out in real time. No recording of the tested signals is made. Therefore, the tests are not reproducible: they are performed only once. If they are repeated, for example after having modified the placement of a particular component of the reception chain in the vehicle, there is no guarantee that the same test conditions can be reproduced to assess the improvement relevantly and objectively.

Summary of the invention

An object of the present invention is to improve the evaluation of the quality of an audio signal.

According to a first aspect, the invention relates to a method for processing an audio signal, the method being implemented in a device for processing an audio signal implementing a neural network, the method comprising the following steps:

- receiving an input vector comprising a determined number of values each representative of an amplitude of a sample of the audio signal;

- determination of a first matrix by applying to the input vector 100 first convolution filters of size 1x7 with a step of 1;

- determination of a second matrix by applying to the first matrix 100 second convolution filters of size 1x7 with a step of 1;

- determination of a third matrix by applying to the second matrix a first so-called “pooling” operation without overlap with a filter of size 1x3, the maximum value of each sub-matrix obtained from the second matrix with the filter being retained during said first operation;

- determination of a fourth matrix by applying to the third matrix 128 third convolution filters of size 1x7 with a step of 1;

- determination of a fifth matrix by applying to the fourth matrix a second so-called “pooling” operation without overlap with a filter of size 1x3, the maximum value of each sub-matrix obtained from the second matrix with the filter being retained during the second operation;

- determination of a sixth matrix by application to the fifth matrix of 128 fourth convolution filters of size 1x7 with a step of 1, the size of the sixth matrix being equal to 1xNx128, with N a natural integer;

- determination of an output vector by applying to the sixth matrix a third so-called “pooling” operation without overlap with a filter of size 1xN, the mean value of each sub-matrix obtained from the sixth matrix with the filter being retained during the third operation;

- determination of a class representative of a noise level of the audio signal from a layer of densely connected neurons having as input the output vector and as output a number of neurons less than 10, each output neuron corresponding to a class representative of a different noise level.

According to a variant, an operation of random deactivation of a part of the neurons of the network is associated with the first operation of “pooling” with a probability of 0.3, an operation of random deactivation of a part of the neurons of the network is associated with the second “pooling” operation with a probability of 0.1 and a random deactivation operation of part of the neurons of said network is associated with the third “pooling” operation with a probability of 0.2.

According to another variant, the determined number of values of said input vector is equal to 16000.

According to an additional variant, the audio signal is obtained by converting a radiofrequency signal by a tuner of a vehicle.

According to an additional variant, the size of the input vector is equal to 1x16000, the size of the first matrix is equal to 1x15994x100, the size of the second matrix is equal to 1x15988x100, the size of the third matrix is equal to 1x5329x100, the size of the fourth matrix is 1x5323x128, the size of the fifth matrix is 1x1774x128, the size of the sixth matrix is 1x1768x128, and the size of the output vector is 1x1x128.
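The sizes listed in this variant follow directly from the filter sizes, and can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the helper names (`conv_len`, `pool_len`, `trace_lengths`) are hypothetical, and it assumes no padding and that an incomplete trailing pooling window is discarded, which is consistent with the sizes given above.

```python
def conv_len(length, kernel=7, stride=1):
    """Length after an unpadded convolution layer."""
    return (length - kernel) // stride + 1


def pool_len(length, window=3):
    """Length after non-overlapping pooling; a trailing partial window is dropped (assumption)."""
    return length // window


def trace_lengths(n_samples=16000):
    """Return (name, length, channels) for each tensor of the described network."""
    n = n_samples
    sizes = [("input vector", n, 1)]
    n = conv_len(n)
    sizes.append(("first matrix", n, 100))
    n = conv_len(n)
    sizes.append(("second matrix", n, 100))
    n = pool_len(n)
    sizes.append(("third matrix", n, 100))
    n = conv_len(n)
    sizes.append(("fourth matrix", n, 128))
    n = pool_len(n)
    sizes.append(("fifth matrix", n, 128))
    n = conv_len(n)
    sizes.append(("sixth matrix", n, 128))
    # The third pooling averages over all N positions, leaving one value per channel.
    sizes.append(("output vector", 1, 128))
    return sizes


for name, length, channels in trace_lengths():
    print(f"{name}: 1x{length}x{channels}")
```

Calling `trace_lengths` with another sample count (for example 32000) shows how N in the 1xNx128 sixth matrix changes with the input length.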

According to yet another variant, the number of output neurons is equal to 6.

According to a further variant, the method further comprises a step of processing the audio signal as a function of the class representative of a noise level associated with the signal in order to at least partially correct the noise.

According to yet another variant, the method further comprises a step of learning the values included in the convolution filters.

According to a second aspect, the invention relates to a device for processing an audio signal, the device comprising a memory associated with a processor configured for implementing the steps of the method according to the first aspect of the invention.

According to a third aspect, the invention relates to a computer program which comprises instructions adapted for the execution of the steps of the method according to the first aspect of the invention, in particular when the computer program is executed by at least one processor.

Such a computer program can use any programming language, and be in the form of a source code, an object code, or an intermediate code between a source code and an object code, such as in a partially compiled form, or in any other desirable form.

According to a fourth aspect, the invention relates to a recording medium readable by a computer on which is recorded a computer program comprising instructions for carrying out the steps of the method according to the first aspect of the invention.

On the one hand, the recording medium can be any entity or device capable of storing the program. For example, the medium may comprise a storage means, such as a ROM memory, a CD-ROM or a ROM memory of the microelectronic circuit type, or else a magnetic recording means or a hard disk.

On the other hand, this recording medium can also be a transmissible medium such as an electrical or optical signal, such a signal being able to be conveyed via an electrical or optical cable, by conventional or hertzian radio, by self-directed laser beam or by other means. The computer program according to the invention can in particular be downloaded from an Internet type network.

Alternatively, the recording medium can be an integrated circuit in which the computer program is incorporated, the integrated circuit being adapted to execute or to be used in the execution of the method in question.

Brief description of the figures

Other characteristics and advantages of the invention will emerge from the description of the non-limiting embodiments of the invention below, with reference to Figures 1 to 4 attached, in which:

[Fig. 1] schematically illustrates a system for evaluating the quality of an audio signal received by a vehicle, according to a particular embodiment of the present invention;

[Fig. 2] schematically illustrates the audio signal of FIG. 1, according to a particular embodiment of the present invention;

[Fig. 3] illustrates a flowchart of the different steps of a method for processing the audio signal of FIG. 2, according to a particular embodiment of the present invention.

[Fig. 4] schematically illustrates a device for processing the audio signal of FIG. 2, according to a particular embodiment of the present invention.

Description of the embodiments

A method and a device for processing an audio signal will now be described with reference jointly to FIGS. 1 to 4. The same elements are identified with the same reference signs throughout the description which follows.

According to a particular and non-limiting example of an embodiment of the invention, a method for processing an audio signal implemented in a neural network comprises receiving a vector at the input of the processing device, the vector comprising a determined number of values each representing the amplitude of a sample of the processed or evaluated audio signal. The audio signal is for example sampled so as to obtain 16,000 samples, with a value representative of the amplitude of the signal for each of these samples. A first matrix is obtained by applying 100 first convolution filters of size 1x7 with a step (from the English “stride”) of 1 to the input vector. The size of the first matrix is for example 1x15994x100. A second matrix is obtained by applying 100 second convolution filters of size 1x7 with a step of 1 to the first matrix. The size of the second matrix is for example 1x15988x100. A third matrix is obtained by applying a first so-called “pooling” operation without overlap with a filter of size 1x3 to the second matrix. The first “pooling” operation advantageously corresponds to a so-called “maximum pooling” operation for which the maximum value of each sub-matrix obtained by applying the filter to the second matrix is retained. The size of the third matrix is for example 1x5329x100. A fourth matrix is obtained by applying 128 third convolution filters of size 1x7 with a step of 1 to the third matrix. The size of the fourth matrix is for example 1x5323x128. A fifth matrix is obtained by applying a second operation called “maximum pooling” without overlap with a filter of size 1x3 to the fourth matrix. The size of the fifth matrix is for example equal to 1x1774x128. A sixth matrix is obtained by applying 128 fourth convolution filters of size 1x7 with a step of 1 to the fifth matrix. The size of the sixth matrix is 1xNx128, for example equal to 1x1768x128. Finally, an output vector is obtained by applying a third operation called “average pooling” without overlap with a filter of size 1xN to the sixth matrix. The size of the output vector is for example 1x1x128.

The input audio signal is classified, that is, a class (also called a label) is associated with the audio signal by applying the output vector to a layer of densely connected neurons, this layer having at output a number of neurons less than 10 (for example equal to 6), each output neuron corresponding to a class representative of a different noise level.

Such a method makes it possible to test the quality of an audio signal by determining to which noise level class the audio signal belongs, without human intervention. Furthermore, the use of such a method allows the system to learn as the audio signals to be classified are processed, which allows the classification results to be refined. Furthermore, the evaluation of the quality of the audio signal is all the more relevant as the analysis is carried out on the sampled audio signal as a whole, and not on certain fragmented and / or random characteristics of the signal.

The data from the various operations implemented in the invention are represented in the form of a vector, matrix or tensor. For simplification purposes, the sizes (or dimensions) of each of the vectors or matrices will be provided in the format corresponding to the size or dimension of a tensor, i.e. with three dimensions AxBxC, where A, B and C correspond to the numbers of rows, columns and channels.

Thus, a vector is a tensor for which 2 of the 3 dimensions are equal to 1 and a matrix is a tensor for which 1 of the 3 dimensions is equal to 1.
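As a small illustration of this convention, the following sketch classifies an AxBxC size by counting the dimensions equal to 1. The function name `tensor_kind` is hypothetical and the handling of the 1x1x1 case, which the text does not name, is an assumption.

```python
def tensor_kind(shape):
    """Classify an AxBxC size as vector, matrix or tensor by counting dimensions equal to 1."""
    if len(shape) != 3:
        raise ValueError("expected three dimensions: rows, columns, channels")
    ones = sum(1 for d in shape if d == 1)
    if ones == 2:
        return "vector"
    if ones == 1:
        return "matrix"
    # Covers full tensors and, by assumption, the degenerate 1x1x1 case.
    return "tensor"


print(tensor_kind((1, 16000, 1)))    # the input vector
print(tensor_kind((1, 15994, 100)))  # the first matrix
```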

[Fig. 1] schematically illustrates a system for evaluating the quality of an audio signal, according to a particular and non-limiting exemplary embodiment of the present invention.

The system 1 comprises a vehicle 103, for example a motor vehicle, receiving a radiofrequency signal 102 via one or more antennas on board or mounted on the vehicle 103. The radiofrequency signal corresponds for example to an FM-type radio signal carrying the data of the audio signal to be analyzed, evaluated or tested. The signal 102 is for example transmitted via one or more transmitting antennas 101. A tuner 104 on board the vehicle 103 receives the radiofrequency signal 102 via the receiving antenna on board the vehicle and converts it into an audio signal. This audio signal is then transmitted to one or more speakers of the vehicle 103 for a reproduction or a rendering of this audio signal perceptible by the driver and/or the passengers of the vehicle 103 in the form of sound waves.

The audio signal obtained from the tuner 104, or part of this signal (for example 1 second or a few seconds of this signal), is for example transmitted directly to a device or a processing unit 106 configured to implement the processing method of the audio signal of the invention with a view to evaluating this audio signal, that is to say with a view to obtaining one or more pieces of information representative of the quality of the audio signal.

According to a variant, the audio signal obtained from the tuner 104 (or part of this signal) is recorded, that is to say stored in a data storage device (for example a memory) 105, before being transmitted to the processing device 106. The storage of the audio signal makes it possible for example to repeat the evaluation of the signal, for example to compare the latter with a processed or corrected version of this recorded signal. The recording of the signal also allows a subsequent processing of this signal, for example in a test laboratory, which avoids the need to install the processing unit 106 in the vehicle 103 to carry out the tests and which makes it possible to carry out all the tests in a controlled environment, away from the noise of the road environment in which the vehicle 103 operates, for example. According to another example, the recording of the signal (or of several signals) also makes it possible to carry out one or more processing operations on this signal before implementing the evaluation of the audio signal.

[Fig. 2] illustrates a diagram 2 representing the audio signal 20 to be evaluated, according to a particular and non-limiting embodiment of the present invention.

Diagram 2 represents the evolution of the amplitude of an audio signal 20 received from the tuner 104 as a function of time. The time is represented on the abscissa and the amplitude of signal 20 is represented on the ordinate. According to the particular example of FIG. 2, the amplitude of signal 20 is normalized between a minimum value and a maximum value, for example between -5 and 5, -2 and 2, or -1 and 1. The evaluated audio signal 20 has for example a duration of 1 second. According to other examples, the duration of signal 20 is a few seconds, for example between 2 and 8 seconds, for example equal to 5 seconds.
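The description does not state how the amplitude is normalized. One simple possibility, given here purely as an assumption, is a linear (min-max) rescaling of the samples onto the chosen interval; the function name `normalize` and its parameters are hypothetical.

```python
def normalize(samples, low=-1.0, high=1.0):
    """Linearly rescale samples so that the smallest maps to low and the largest to high."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        # A constant signal has no dynamic range; map it to the middle of the interval.
        return [(low + high) / 2.0 for _ in samples]
    scale = (high - low) / (hi - lo)
    return [low + (s - lo) * scale for s in samples]


raw = [0, 5, 10, 2]
print(normalize(raw))             # rescaled between -1 and 1
print(normalize(raw, -5.0, 5.0))  # rescaled between -5 and 5
```

Other schemes, such as dividing by the peak absolute value, would also keep the signal within a symmetric interval.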

The audio signal 20 is advantageously sampled before being processed or classified according to the method described with reference to FIG. 3. The audio signal 20 is for example sampled at a frequency of 16 kHz, which makes it possible to obtain 16,000 samples of the signal 20 when the latter has a duration of 1 second, as in the example of FIG. 2. Such a sampling frequency makes it possible to obtain a large number of samples and to have a faithful representation of the audio signal 20, even once sampled for evaluation or classification.
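The relation between sampling frequency, duration and number of samples can be sketched as follows. Cutting a longer recording into consecutive 16000-sample vectors is an assumption made for illustration (the description only states that a signal of 1 second or a few seconds is evaluated); the names `samples_per_vector` and `split_into_vectors` are hypothetical.

```python
def samples_per_vector(sample_rate_hz=16000, duration_s=1.0):
    """Number of samples obtained for a given sampling frequency and duration."""
    return int(round(sample_rate_hz * duration_s))


def split_into_vectors(samples, vector_len=16000):
    """Cut a recording into consecutive input vectors; a trailing partial chunk is dropped."""
    count = len(samples) // vector_len
    return [samples[k * vector_len:(k + 1) * vector_len] for k in range(count)]


recording = [0.0] * samples_per_vector(16000, 5.0)  # 5 seconds of silence
vectors = split_into_vectors(recording)
print(len(vectors), len(vectors[0]))
```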

[Fig. 3] illustrates a flowchart of the various steps of a process for processing an audio signal, for example the audio signal 20 of FIG. 2, according to a particular and non-limiting example of the present invention. The method is advantageously implemented in the device 106.

In a first step 31, a determined number of values, each representing an amplitude of a sample of the audio signal 20, are received. These data are for example received from a memory, for example a buffer memory or a flash type memory, in which the signal samples obtained after sampling the audio signal 20 are stored. According to the example given with reference to FIG. 2, 16000 values are received in the form of an input vector of size 1x16000x1.

In a second step 32, a first matrix is determined or obtained by applying 100 first convolution filters of size 1x7, with a step (from the English “stride”) of 1, to the input vector obtained in step 31. In other words, the input vector passes through a first convolutional layer and convolution operations are applied to this input vector on the basis of 100 first filters of size 1x7, with a step of 1. Following the example provided with the input vector, the size of the first matrix is equal to 1x15994x100. The output of the first convolutional layer is a first matrix (also called a feature map, from the English “output feature map” or “activation map”), the OxOxC dimensions of which are obtained by:

[Equation 1]

O = (I - F + 2P) / S + 1,

With O a dimension of the first matrix, I the size of the input volume (the input vector in the case of the first matrix), F the size of the filter, P the size of the margin (from the English “padding”), this size being equal to 0 in the specific examples described below, S the step and C the number of channels.

Regarding the first matrix, 15994 = (16000 - 7) / 1 + 1 and C = 100, corresponding to the number of first filters.

The step (or "stride" in English) corresponds to the number of pixels by which the window corresponding to the filter moves in the input tensor (input vector in this case).
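The link between Equation 1 and the notion of step can be checked by listing the positions at which the filter window is placed. The following is a minimal sketch; the function names `output_size` and `window_starts` are hypothetical, and integer division is assumed when (I - F + 2P) is not a multiple of S.

```python
def output_size(i, f, s=1, p=0):
    """Equation 1: O = (I - F + 2P) / S + 1, using integer division."""
    return (i - f + 2 * p) // s + 1


def window_starts(i, f, s=1, p=0):
    """Start index of every filter position in an input of size i padded by p on each side."""
    padded = i + 2 * p
    return list(range(0, padded - f + 1, s))


# First convolution layer: I = 16000, F = 7, S = 1, P = 0.
starts = window_starts(16000, 7)
print(len(starts), output_size(16000, 7))

# With a step equal to the filter size, consecutive windows do not overlap.
print(window_starts(10, 3, s=3))
```

The number of window positions and the value given by Equation 1 coincide, which is why a step of 1 with a 1x7 filter shortens the input by 6 values.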

In a third step 33, a second matrix is determined or obtained by applying 100 second convolution filters of size 1x7, with a step of 1, to the first matrix obtained in step 32. In other words, the first matrix passes through a second convolution layer and convolution operations are applied to this first matrix on the basis of 100 second filters of size 1x7, with a step of 1. Following the example provided with the input vector of dimension 1x16000x1, the size of the second matrix is equal to 1x15988x100, with 15988 = (15994 - 7) / 1 + 1.

In a fourth step 34, a third matrix is determined or obtained by applying to the second matrix obtained in step 33 a first so-called “pooling” operation, and more specifically a first “maximum pooling” operation (from the English “max pooling”). This first operation is based on the use of a 1x3 filter, without overlap, i.e. the filter (a window of 1 pixel by 3 pixels if we consider that the matrices correspond to arrays of pixels) moves in the second matrix without overlap (i.e. there is no common pixel between the sub-matrices obtained from the second matrix by application of the filter moving in this second matrix). The first “maximum pooling” operation corresponds to a spatial reduction in the size of the second matrix by selecting only one value, the maximum value, in each sub-matrix obtained from the second matrix via the 1x3 filter. Thus, for each sub-matrix of size 1x3, only one value among the 3 is selected and kept, that is to say the maximum value among the 3 values. Following the example provided with the input vector of dimension 1x16000x1, the size of the third matrix is equal to 1x5329x100, with 5329 ≈ 15988 / 3.
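For a single channel, the non-overlapping maximum pooling described above can be sketched as follows. The function name `max_pool_1d` is hypothetical, and discarding an incomplete trailing window is an assumption, consistent with 15988 values giving 5329 outputs.

```python
def max_pool_1d(values, window=3):
    """Non-overlapping max pooling: keep the largest value of each full window."""
    usable = len(values) - len(values) % window
    return [max(values[k:k + window]) for k in range(0, usable, window)]


print(max_pool_1d([1, 5, 2, 7, 3, 4, 9, 8]))
print(len(max_pool_1d([0.0] * 15988)))
```

In the network, the same operation is applied independently to each of the 100 channels of the second matrix.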

According to an alternative embodiment, the first "pooling" operation is matched or associated with a random deactivation operation of some of the neurons of the network, with a probability of 0.3 (30%). This technique is known as “dropout”.
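A random deactivation of this kind can be sketched as below. The rescaling of the surviving values by 1 / (1 - p), often called “inverted dropout”, is an assumption that is not stated in the description, as are the function name `dropout` and its parameters. Deactivation is normally applied only during the learning phase.

```python
import random


def dropout(values, p=0.3, training=True, rng=None):
    """Zero each value with probability p during training; identity otherwise.

    Survivors are scaled by 1 / (1 - p) (assumption) so that the expected
    activation is unchanged.
    """
    if not training or p == 0.0:
        return list(values)
    rng = rng or random.Random()
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


activations = [1.0] * 1000
dropped = dropout(activations, p=0.3, rng=random.Random(0))
print(dropped.count(0.0) / len(dropped))  # close to 0.3
```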

In a fifth step 35, a fourth matrix is determined or obtained by applying 128 third convolution filters of size 1x7, with a step of 1, to the third matrix obtained in step 34. In other words, the third matrix passes through a third convolution layer and convolution operations are applied to this third matrix on the basis of 128 third filters of size 1x7, with a step of 1. Following the example provided with the input vector of dimension 1x16000x1, the size of the fourth matrix is equal to 1x5323x128, with 5323 = (5329 - 7) / 1 + 1.

In a sixth step 36, a fifth matrix is determined or obtained by applying to the fourth matrix obtained in step 35 a second so-called “pooling” operation, and more specifically a second “maximum pooling” operation (from the English “max pooling”). This second operation is based on the use of a 1x3 filter, without overlap. Following the example provided with the input vector of dimension 1x16000x1, the size of the fifth matrix is equal to 1x1774x128, with 1774 ≈ 5323 / 3.

According to an alternative embodiment, the second "pooling" operation is matched or associated with a random deactivation operation ("dropout") of part of the neurons of the network, with a probability of 0.1 (10%).

In a seventh step 37, a sixth matrix is determined or obtained by applying 128 fourth convolution filters of size 1x7, with a step of 1, to the fifth matrix obtained in step 36. In other words, the fifth matrix passes through a fourth convolution layer and convolution operations are applied to this fifth matrix on the basis of 128 fourth filters of size 1x7, with a step of 1. Following the example provided with the input vector of dimension 1x16000x1, the size of the sixth matrix is equal to 1x1768x128, with 1768 = (1774 - 7) / 1 + 1.

In an eighth step 38, a seventh matrix is determined or obtained by applying to the sixth matrix obtained in step 37 a third operation called “pooling”, and more specifically a third “average pooling” operation (from the English “average pooling”). This third operation is based on the use of a 1xN filter (N corresponding to the 1xNx128 dimension of the sixth matrix, or 1768 according to the particular example). The third “average pooling” operation corresponds to a spatial reduction in the size of the sixth matrix by retaining only one value, the average of the N values, for each of the 128 channels. Thus, for each channel or each sub-matrix of size 1xN, a single value (the average of the N values) is taken into account. Following the example provided with the input vector of dimension 1x16000x1, the size of the seventh matrix is equal to 1x1x128. This seventh matrix is also called the output vector since 2 of the 3 dimensions of the 1x1x128 tensor are equal to 1.
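The average pooling over all N positions can be sketched as follows, with the sixth matrix represented as a list of N rows of C channel values. This layout and the function name `global_average_pool` are assumptions made for illustration.

```python
def global_average_pool(matrix):
    """Average an N x C matrix over its N positions, giving one value per channel."""
    n = len(matrix)
    channels = len(matrix[0])
    return [sum(row[c] for row in matrix) / n for c in range(channels)]


# Two positions, two channels: each channel is averaged separately.
print(global_average_pool([[1.0, 10.0], [3.0, 20.0]]))

# A 1768 x 128 matrix is reduced to a vector of 128 values.
sixth = [[0.5] * 128 for _ in range(1768)]
print(len(global_average_pool(sixth)))
```

Because the window covers the whole length N, the output size is 1x1x128 whatever the number of input samples.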

According to an alternative embodiment, the third "pooling" operation is matched or associated with a random deactivation operation ("dropout") of part of the neurons of the network, with a probability of 0.2 (20%).

In a ninth step 39, the output vector passes through a layer of densely connected neurons, with 128 neurons each connected to each of the neurons of a layer comprising for example 6 neurons, a class representative of a different noise level being associated with each of these 6 neurons. The input audio signal is thus classified, that is to say that a class (also called label) is associated with the audio signal by applying the output vector to a layer of densely connected neurons, this layer having at output for example 6 neurons each corresponding to a class representative of a different noise level. The table below shows an example of 6 noise classes, identified from 0 to 5:

[Table 1]

Of course, the number of output neurons is not limited to 6 and is for example equal to 4, 8 or 10 and advantageously less than 10.
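The classification step can be sketched as a fully connected layer followed by the selection of the most activated output neuron. The use of biases and of a softmax to turn the outputs into probabilities are assumptions (the description does not specify an activation function), and the toy weights below are illustrative, not learned values.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Turn raw scores into probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(output_vector, weights, biases):
    """Return (class index, probabilities) for one output vector."""
    probs = softmax(dense(output_vector, weights, biases))
    return max(range(len(probs)), key=probs.__getitem__), probs


# Toy setup: 128 inputs, 6 output neurons, neuron k listens only to input k.
weights = [[1.0 if j == k else 0.0 for j in range(128)] for k in range(6)]
biases = [0.0] * 6
vector = [0.0] * 128
vector[3] = 1.0
label, probs = classify(vector, weights, biases)
print(label, [round(p, 3) for p in probs])
```

Changing the number of rows in `weights` changes the number of classes, for example 4 or 8 instead of 6.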

The values of the filters are advantageously determined in a learning phase, according to a method known to those skilled in the art. In a learning phase, a large number of input audio signals whose associated class (i.e. the noise level) is known are used to learn the different values or coefficients of the convolution filters. In such a learning phase, a method known as the backpropagation of the error gradient is, for example, implemented.
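The principle of learning by gradient can be illustrated on a single linear layer with 6 outputs. This is a deliberately reduced sketch: the cross-entropy loss, the learning rate and the function names are assumptions, and in the full network the same gradient would be propagated further back through the pooling and convolution layers by the chain rule.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(weights, x, label):
    """Cross-entropy loss of a linear classifier and its gradient with respect to the weights."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[label])
    # d(loss)/d(logit_k) = p_k - 1 for the true class, p_k otherwise.
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    grads = [[d * xi for xi in x] for d in delta]
    return loss, grads


def sgd_step(weights, x, label, lr=0.1):
    """Move every weight a small step against its gradient."""
    _, grads = loss_and_grads(weights, x, label)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grads)]


weights = [[0.0] * 4 for _ in range(6)]
x, label = [1.0, 0.5, -0.5, 2.0], 2  # one labelled training example
before, _ = loss_and_grads(weights, x, label)
weights = sgd_step(weights, x, label)
after, _ = loss_and_grads(weights, x, label)
print(before, after)  # the loss decreases after the update
```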

By way of example, the values included in one of the first, second, third or fourth convolution filters are equal to: -0.07848639; 0.04320003; 0.0651957; 0.054399617; 0.033398744; -0.006850287; and 0.07814157.
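To show how such a 1x7 filter acts on a single-channel input, the sketch below slides the seven example values over a short signal. It assumes no padding, no bias and no activation function, and, as is usual in neural-network libraries, the filter is not flipped (cross-correlation); the names `EXAMPLE_FILTER` and `conv1d_valid` are hypothetical.

```python
EXAMPLE_FILTER = [-0.07848639, 0.04320003, 0.0651957, 0.054399617,
                  0.033398744, -0.006850287, 0.07814157]


def conv1d_valid(signal, kernel, stride=1):
    """Slide the kernel over the signal without padding; one weighted sum per position."""
    width = len(kernel)
    return [sum(k * signal[start + j] for j, k in enumerate(kernel))
            for start in range(0, len(signal) - width + 1, stride)]


# A constant input of 16 samples yields 10 outputs, each equal to the sum of the weights.
flat = conv1d_valid([1.0] * 16, EXAMPLE_FILTER)
print(len(flat), flat[0])

# A 16000-sample input yields 15994 outputs, as for the first matrix.
print(len(conv1d_valid([0.0] * 16000, EXAMPLE_FILTER)))
```

In the first layer, 100 such filters are applied to the same input, giving the 100 channels of the first matrix.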

The number of input samples may be different from 16000, for example equal to 10,000 or 20,000 or 32,000. The number of samples of the input vector is for example taken into account as a learning parameter, having an impact on the values included in the convolution filters used in the different convolution layers.

According to an optional variant embodiment, a particular processing is applied to the audio signal as a function of the class with which the signal has been associated in step 39, the processing being for example implemented to improve the quality of the signal by reducing the noise. So-called spectral subtraction methods, for example based on Wiener filtering or Lim filtering, are for example implemented.

Such a classification via a neural network has the advantage that the system implementing the method (the neural network) finds or learns the properties of interest to make the classification (for example the values included in the convolution filters), which makes it possible to avoid the arbitrary and biased choices that a human being would make, for example.

[Fig. 4] schematically illustrates a device 106 configured to process an audio signal such as the audio signal 20, for example to classify the noise level of such a signal, according to a particular and non-limiting embodiment of the present invention.

The device 106 is for example configured for the implementation of the operations described with reference to FIGS. 1 and 2 and/or the steps of the method described with regard to FIG. 3. Examples of such a device 106 include, without being limited thereto, a computer, a server, a smart phone, a tablet, a calculator. The elements of the device 106, individually or in combination, can be integrated in a single integrated circuit, in several integrated circuits, and/or in discrete components. The device 106 can be produced in the form of electronic circuits or software (or computer) modules or else a combination of electronic circuits and software modules.

The device 106 comprises one (or more) processor(s) 40 configured to execute instructions for carrying out the steps of the method and/or for executing the instructions of the software embedded in the device 106. The processor 40 can include integrated memory, an input/output interface, and various circuits known to those skilled in the art. The device 106 further comprises at least one memory 41 corresponding, for example, to a volatile and/or non-volatile memory and/or comprises a memory storage device which may comprise volatile and/or non-volatile memory, such as EEPROM, ROM, PROM, RAM, DRAM, SRAM, flash, magnetic or optical disk.

The computer code of the on-board software (s) comprising the instructions to be loaded and executed by the processor is for example stored in the first memory 41.

According to a particular and non-limiting embodiment, the device 106 comprises a block 42 of interface elements for communicating with external devices, for example a remote server or the “cloud” or else the tuner 104. The interface elements of block 42 include one or more of the following interfaces:

- RF radiofrequency interface, for example of the Bluetooth® or Wi-Fi® type, LTE (“Long-Term Evolution”) or LTE-Advanced;

- USB interface (“Universal Serial Bus”);

- HDMI interface (“High Definition Multimedia Interface”).

According to another particular embodiment, the device 106 comprises a communication interface 43 which makes it possible to establish communication with other devices via a communication channel 430. The communication interface 43 corresponds for example to a transmitter configured for transmitting and receiving information and / or data via the communication channel 430.

According to a further particular embodiment, the device 106 can provide signals to and/or receive signals from one or more external devices, such as a keyboard 440, a mouse 450 and/or a screen 460, respectively via input/output interfaces 44, 45 and 46. According to a variant, one or the other of the external devices is integrated into the device 106. The display screen 460 corresponds for example to a touch screen.

Of course, the invention is not limited to the embodiments described above but extends to a method of classifying the noise level of an audio signal, and to the device configured for the implementation of such a method. The invention also relates to a method for evaluating an audio signal and more particularly to a method for evaluating the quality of an audio signal, and to the device configured for implementing such a method.

The invention also relates to a vehicle comprising the device 106.

Claims

1. A method of processing an audio signal (20), said method being implemented in an audio signal processing device (106) implementing a neural network, said method comprising the following steps:

- reception (31) of an input vector comprising a determined number of values each representative of an amplitude of a sample of said audio signal (20);

- determination (32) of a first matrix by applying to said input vector 100 first convolution filters of size 1x7 with a step of 1;

- determination (33) of a second matrix by applying to said first matrix 100 second convolution filters of size 1x7 with a step of 1;

- determination (34) of a third matrix by applying to said second matrix a first so-called “pooling” operation without overlap with a filter of size 1x3, the maximum value of each sub-matrix obtained from said second matrix with said filter being retained in said first operation;

- determination (35) of a fourth matrix by applying to said third matrix 128 third convolution filters of size 1x7 with a step of 1;

- determination (36) of a fifth matrix by applying to said fourth matrix a second so-called “pooling” operation without overlap with a filter of size 1x3, the maximum value of each sub-matrix obtained from said fourth matrix with said filter being retained in said second operation;

- determination (37) of a sixth matrix by applying to said fifth matrix 128 fourth convolution filters of size 1x7 with a step of 1, the size of said sixth matrix being equal to 1xNx128, with N a natural integer;

- determination (38) of an output vector by applying to said sixth matrix a third so-called “pooling” operation without overlap with a filter of size 1xN, the mean value of each sub-matrix obtained from said sixth matrix with said filter being retained in said third operation;

- determination (39) of a class representative of a noise level of said audio signal (20) from a layer of densely connected neurons having said output vector as input and, at output, a number of neurons less than 10, each output neuron corresponding to a class representative of a different noise level.
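By way of illustration only, the sequence of steps (31) to (39) can be sketched in plain Python as follows. This is a minimal sketch and not the claimed implementation: the function names are hypothetical, the filter counts are reduced (4 and 5 instead of 100 and 128) so that the example runs quickly, random weights stand in for learned filter values, and no activation function or bias is applied because the claim does not name one.

```python
import random


def conv1d(x, filters):
    """Unpadded 1-D convolution with a step of 1.
    x: list of rows, each row a list of input-channel values.
    filters: list of filters, each a list of taps, each tap a list of
    per-input-channel weights. Returns len(x) - taps + 1 rows."""
    k = len(filters[0])
    out = []
    for start in range(len(x) - k + 1):
        row = []
        for f in filters:
            acc = 0.0
            for t in range(k):
                for c, w in enumerate(f[t]):
                    acc += w * x[start + t][c]
            row.append(acc)
        out.append(row)
    return out


def max_pool(x, size=3):
    """Non-overlapping max pooling; rows that do not fill a window are dropped."""
    channels = len(x[0])
    return [[max(x[i * size + j][c] for j in range(size)) for c in range(channels)]
            for i in range(len(x) // size)]


def global_avg_pool(x):
    """Mean over all rows, per channel (pooling with a 1xN filter)."""
    return [sum(row[c] for row in x) / len(x) for c in range(len(x[0]))]


def dense(v, weights):
    """Densely connected layer without bias: one weight row per output neuron."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def random_filters(rng, n_out, taps, n_in):
    """Placeholder weights; in the method these values are learned."""
    return [[[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(taps)]
            for _ in range(n_out)]


def classify(samples, n1=4, n2=5, n_classes=6, seed=0):
    rng = random.Random(seed)
    x = [[s] for s in samples]                      # (31) input vector
    x = conv1d(x, random_filters(rng, n1, 7, 1))    # (32) first matrix
    x = conv1d(x, random_filters(rng, n1, 7, n1))   # (33) second matrix
    x = max_pool(x, 3)                              # (34) third matrix
    x = conv1d(x, random_filters(rng, n2, 7, n1))   # (35) fourth matrix
    x = max_pool(x, 3)                              # (36) fifth matrix
    x = conv1d(x, random_filters(rng, n2, 7, n2))   # (37) sixth matrix
    v = global_avg_pool(x)                          # (38) output vector
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n2)] for _ in range(n_classes)]
    scores = dense(v, w)                            # (39) one score per class
    return v, scores, scores.index(max(scores))


# Toy input of 160 samples (the claims use 16000 in claim 3).
toy = [((i * 37) % 100) / 100.0 - 0.5 for i in range(160)]
_, _, noise_class = classify(toy)
print("predicted class index:", noise_class)
```

In this sketch the class is taken as the index of the largest output score; that selection rule is an assumption made for the example.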

2. Method according to claim 1, for which a random deactivation operation of a part of the neurons of said network is associated with said first “pooling” operation with a probability of 0.3, a random deactivation operation of a part of the neurons of said network is associated with said second “pooling” operation with a probability of 0.1 and a random deactivation operation of a part of the neurons of said network is associated with said third “pooling” operation with a probability of 0.2.
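The random deactivation of claim 2 can be illustrated by the following minimal Python sketch. The names `DROPOUT_P` and `dropout` are hypothetical, and the rescaling of the retained values by 1/(1-p) is a common convention assumed here for the example; the claim specifies only the probabilities.

```python
import random

# Deactivation probabilities from claim 2, keyed by the pooling operation
# with which each is associated.
DROPOUT_P = {"first_pooling": 0.3, "second_pooling": 0.1, "third_pooling": 0.2}


def dropout(values, p, rng, training=True):
    """Set each value to zero with probability p during training.
    Retained values are divided by (1 - p); this rescaling is an
    assumption of the sketch and is not stated in the claim."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() >= p else 0.0 for v in values]


rng = random.Random(0)
pooled = [1.0] * 10
print(dropout(pooled, DROPOUT_P["first_pooling"], rng))
```

Outside training (`training=False`) the values pass through unchanged, which reflects the usual practice of deactivating neurons only while the filter values are being learned.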

3. The method of claim 1 or 2, wherein said determined number of values of said input vector is equal to 16000.

4. Method according to one of claims 1 to 3, wherein said audio signal (20) is obtained by converting a radio frequency signal by a tuner (104) of a vehicle (103).

5. Method according to one of claims 1 to 4, for which the size of the input vector is equal to 1x16000, the size of the first matrix is equal to 1x15994x100, the size of the second matrix is equal to 1x15988x100, the size of the third matrix is equal to 1x5329x100, the size of the fourth matrix is equal to 1x5323x128, the size of the fifth matrix is equal to 1x1774x128, the size of the sixth matrix is equal to 1x1768x128, and the size of the output vector is equal to 1x1x128.
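The sizes listed in claim 5 follow from the filter sizes of claim 1 and can be checked with the short Python sketch below. It assumes, consistently with the listed sizes, that the convolutions are applied without padding and that each non-overlapping pooling drops any trailing values that do not fill a complete 1x3 window; the function names are hypothetical.

```python
def conv_length(n, kernel=7, step=1):
    """Output length of an unpadded convolution (assumption: no padding)."""
    return (n - kernel) // step + 1


def pool_length(n, size=3):
    """Output length of non-overlapping pooling (assumption: remainder dropped)."""
    return n // size


def trace_sizes(n_samples=16000):
    """Return (name, size) pairs for each stage listed in claim 5."""
    sizes = [("input vector", (1, n_samples))]
    n = conv_length(n_samples)
    sizes.append(("first matrix", (1, n, 100)))
    n = conv_length(n)
    sizes.append(("second matrix", (1, n, 100)))
    n = pool_length(n)
    sizes.append(("third matrix", (1, n, 100)))
    n = conv_length(n)
    sizes.append(("fourth matrix", (1, n, 128)))
    n = pool_length(n)
    sizes.append(("fifth matrix", (1, n, 128)))
    n = conv_length(n)
    sizes.append(("sixth matrix", (1, n, 128)))
    # The third pooling uses a 1xN filter covering the whole sixth matrix.
    sizes.append(("output vector", (1, n // n, 128)))
    return sizes


for name, size in trace_sizes():
    print(name, "x".join(str(d) for d in size))
```

For example, the first matrix has 16000 - 7 + 1 = 15994 columns, and the third matrix has 15988 // 3 = 5329 columns, one value of the second matrix being left over.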

6. Method according to one of claims 1 to 5, for which the number of output neurons is equal to 6.

7. Method according to any one of claims 1 to 6, further comprising a step of processing said audio signal as a function of said class representative of a noise level associated with said signal to at least partially correct the noise.

8. Method according to any one of claims 1 to 7, further comprising a step of learning the values included in said convolution filters.

9. Device (106) for processing an audio signal, said device comprising a memory (41) associated with at least one processor (40) configured for implementing the steps of the method according to any one of claims 1 to 8.

10. Computer program product comprising instructions adapted for the execution of the steps of the method according to one of claims 1 to 8, when the computer program is executed by at least one processor.