CN118098278A - Audio processing method, device, equipment and storage medium - Google Patents

Audio processing method, device, equipment and storage medium

Info

Publication number
CN118098278A
Authority
CN
China
Prior art keywords
audio
warning sound
vehicle speed
pedestrian warning
calibration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410116328.3A
Other languages
Chinese (zh)
Inventor
闫启东
张冠男
宫宇
王运航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Great Wall Motor Co Ltd
Original Assignee
Great Wall Motor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Great Wall Motor Co Ltd filed Critical Great Wall Motor Co Ltd
Priority to CN202410116328.3A priority Critical patent/CN118098278A/en
Publication of CN118098278A publication Critical patent/CN118098278A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides an audio processing method, apparatus, device and storage medium. In the method, pedestrian warning sound audio to be processed is input into a trained target model to obtain gain values, output by the target model, of the pedestrian warning sound audio after calibration at partial vehicle speeds within a preset vehicle speed range, where the target model is trained based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at those partial vehicle speeds. This provides a way of processing audio in which the gain values of the pedestrian warning sound audio after calibration at partial vehicle speeds within the preset vehicle speed range are obtained quickly and the audio is calibrated, meeting the diversified requirements of users without requiring calibration by an automobile manufacturer or a professional sound engineer. The calibration process is thereby simplified, the speed of audio calibration is increased, and the user can update the pedestrian warning sound simply and conveniently.

Description

Audio processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio processing method, apparatus, device, and storage medium.
Background
With people's ever-increasing travel demands, automobiles have become the preferred means of travel for most people. Some automobiles, such as new energy vehicles, produce little noise while driving, and at low speeds, for example at driving speeds within 30 km/h, pedestrians, non-motorized vehicles and other road users find it difficult to perceive an approaching vehicle, which creates a potential safety hazard. For this reason, regulations have been enacted in many regions requiring such vehicles to be equipped with a pedestrian warning sound system to address this problem.
In the related art, audio calibration is required to ensure that pedestrian warning sound audio meets regulatory and sound quality requirements. However, such audio calibration usually has to be performed by an automobile manufacturer or a professional sound engineer, for example to obtain the gain values of the pedestrian warning sound audio after calibration at vehicle speeds within 30 km/h. This limits the audio that a user can use, so the user cannot select a preferred pedestrian warning sound, and the pedestrian warning sound can be updated only after the automobile manufacturer or a professional sound engineer develops other replaceable audio, which makes the process cumbersome.
Disclosure of Invention
The embodiments of the application provide an audio processing method, apparatus, device and storage medium, which are used to obtain the gain values of pedestrian warning sound audio after calibration at vehicle speeds within a preset vehicle speed range and to calibrate the pedestrian warning sound audio.
In a first aspect, an embodiment of the present application provides an audio processing method, including:
acquiring pedestrian warning sound audio to be processed;
inputting the pedestrian warning sound audio to be processed into a target model to obtain gain values, output by the target model, of the pedestrian warning sound audio after calibration at partial vehicle speeds within a preset vehicle speed range, where the target model is trained based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds.
In one possible implementation manner, after the pedestrian warning sound audio to be processed is input into the target model to obtain the gain values, output by the target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range, the method further includes:
determining the gain value of the pedestrian warning sound audio after calibration at each vehicle speed in the preset vehicle speed range, based on the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds and a pre-stored relation between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration;
obtaining the calibrated pedestrian warning sound audio at each vehicle speed according to the gain value of the pedestrian warning sound audio after calibration at that vehicle speed.
In one possible implementation manner, the relation includes parameters to be solved;
the determining the gain value of the pedestrian warning sound audio after calibration at each vehicle speed in the preset vehicle speed range, based on the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds and the pre-stored relation between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration, includes:
obtaining the values of the parameters to be solved in the relation according to the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds;
determining the gain value of the pedestrian warning sound audio after calibration at each vehicle speed based on the values of the parameters to be solved and the relation.
In one possible implementation manner, the obtaining the calibrated pedestrian warning sound audio at each vehicle speed according to the gain value of the pedestrian warning sound audio after calibration at that vehicle speed includes:
adjusting the pedestrian warning sound audio to be processed by using the gain value of the pedestrian warning sound audio after calibration at each vehicle speed and a pedestrian warning sound algorithm, to obtain the calibrated pedestrian warning sound audio at each vehicle speed;
wherein the pedestrian warning sound algorithm changes the frequency and the sound pressure level of the audio based on the gain value of the audio.
In one possible implementation, the training process of the target model includes:
inputting the plurality of pedestrian warning sound audios into the target model respectively, to obtain a gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds;
adjusting the target model according to the gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds, the gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds, and a loss function, to obtain the trained target model.
In one possible implementation, the target model includes a preprocessing unit and a convolutional neural network and recurrent neural network hybrid unit.
The inputting the plurality of pedestrian warning sound audios into the target model respectively, to obtain the gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds, includes:
obtaining, in the preprocessing unit, time, frequency and sound pressure level information of each pedestrian warning sound audio;
determining, in the convolutional neural network and recurrent neural network hybrid unit, the gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds based on the time, frequency and sound pressure level information of each pedestrian warning sound audio.
In a possible implementation manner, the adjusting the target model according to the gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds, the gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds, and the loss function, to obtain the trained target model, includes:
determining the value of the loss function based on the difference between the gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds and the gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds;
determining whether the value of the loss function is greater than a preset threshold;
if the value of the loss function is greater than the preset threshold, adjusting the target model, and, based on the adjusted target model, re-executing the step of inputting the plurality of pedestrian warning sound audios into the target model to obtain the gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds, until the value of the loss function is less than or equal to the preset threshold, thereby obtaining the trained target model.
In one possible implementation manner, the target model is trained based on the plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds for the region corresponding to the plurality of pedestrian warning sound audios.
In one possible implementation manner, the target models include target models of different regions, where each target model is added with a region identifier of its corresponding region;
the inputting the pedestrian warning sound audio to be processed into a target model to obtain the gain values, output by the target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range includes:
determining a region identifier corresponding to the pedestrian warning sound audio to be processed;
obtaining, from the target models of the different regions, the target model for processing the pedestrian warning sound audio to be processed, based on the region identifier corresponding to each target model and the region identifier corresponding to the pedestrian warning sound audio to be processed;
inputting the pedestrian warning sound audio to be processed into the obtained target model, to obtain the gain values, output by that target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range.
In one possible implementation manner, the obtaining, in the preprocessing unit, the time, frequency and sound pressure level information of each pedestrian warning sound audio includes:
obtaining, in the preprocessing unit, the time, frequency and sound pressure level information of each pedestrian warning sound audio through a short-time Fourier transform and the conversion of acoustic parameters.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including:
The acquisition module is used for acquiring pedestrian warning sound audio to be processed;
The processing module is used for inputting the pedestrian warning sound audio to be processed into a target model to obtain gain values, output by the target model, of the pedestrian warning sound audio after calibration at partial vehicle speeds within a preset vehicle speed range, where the target model is trained based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor implements the audio processing method according to any one of the first aspects when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the audio processing method according to any one of the first aspects.
It will be appreciated that the advantages of the second to fourth aspects may be found in the relevant description of the first aspect and are not repeated here.
According to the audio processing method, apparatus, device and storage medium provided by the application, the pedestrian warning sound audio to be processed is input into the trained target model to obtain the gain values, output by the target model, of the pedestrian warning sound audio after calibration at partial vehicle speeds within the preset vehicle speed range, where the target model is trained based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds. Using the idea of calibrating with a model, a way of processing audio is thus provided in which the gain values of the pedestrian warning sound audio after calibration at partial vehicle speeds within the preset vehicle speed range are obtained quickly and the audio is calibrated, meeting the diversified requirements of users without requiring calibration by an automobile manufacturer or a professional sound engineer. The audio calibration process is thereby simplified, the speed of audio calibration is increased, and the user can update the pedestrian warning sound simply and conveniently.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flow chart of an audio processing method according to an embodiment of the application;
FIG. 3 is a flow chart of an audio processing method according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a target model provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be more clearly described with reference to the following examples. The following examples will assist those skilled in the art in further understanding the function of the present application, but are not intended to limit the application in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In the description of the present specification and the appended claims, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Furthermore, references to "a plurality of" in embodiments of the present application should be interpreted as two or more.
The inventors have found that audio calibration is required in order to ensure that pedestrian warning sound audio meets regulatory and sound quality requirements. Existing audio calibration is typically performed by an automobile manufacturer or a professional sound engineer. When a professional sound engineer performs the audio calibration, the engineer obtains, from the audio to be calibrated and based on his or her own expertise and experience, the gain values after audio calibration at speeds within 30 km/h, for example the gain values after audio calibration at forward driving speeds of 0 km/h, 1 km/h, ..., 30 km/h and at reverse driving speeds of 0 km/h, 1 km/h, ..., 30 km/h, and then performs the audio calibration using these gain values. However, ordinary users do not have the relevant expertise and experience and cannot perform audio calibration, so the audio that a user can use is limited, the user cannot select a preferred pedestrian warning sound, and the pedestrian warning sound can be updated only if an automobile manufacturer or a professional sound engineer subsequently develops other replaceable audio, which makes the process cumbersome. A new method for calibrating pedestrian warning sound audio is therefore needed.
In order to calibrate the pedestrian warning sound audio, in the embodiments of the application the pedestrian warning sound audio to be processed is processed based on a trained target model to obtain the gain values of the pedestrian warning sound audio after calibration within a preset vehicle speed range, and the audio calibration is then performed. This gives users more personalized audio choices and does not require calibration by an automobile manufacturer or a professional sound engineer, so the calibration process is simplified, the cost of audio calibration is reduced, and the user can update the pedestrian warning sound simply and conveniently.
The inventors also found that if the target model were trained to output the gain values of the pedestrian warning sound audio after calibration at every vehicle speed within a preset vehicle speed range, for example within 30 km/h, the model would be difficult to train and the training period would be long. In addition, certain rules exist between the vehicle speeds within some speed ranges and the gain values of the pedestrian warning sound audio after calibration, as in the following example.
Take forward driving as an example, where the calibrated gain values of the pedestrian warning sound audio are considered in particular at speeds of 0 km/h, 10 km/h, 20 km/h and 30 km/h. The inventors found that from 0 km/h to 10 km/h the gain value increases as the forward driving speed increases and can be expressed by a relation such as a linear equation in two variables; from 10 km/h to 20 km/h the gain value likewise increases with the forward driving speed and can be expressed by a linear equation; and from 20 km/h to 30 km/h the gain value decreases as the forward driving speed increases and can be expressed by a linear equation. Since the gain values at 0 km/h and 30 km/h are known to be fixed at 0, the gain values corresponding to all vehicle speeds during forward driving can be obtained by obtaining only the gain values at 10 km/h and 20 km/h during forward driving. Alternatively, because the gain value increases with the forward driving speed and can be expressed by a single linear equation, the gain values corresponding to all vehicle speeds during forward driving can be obtained by obtaining only the gain value at 10 km/h or at 20 km/h during forward driving.
As another example, take reverse driving, where the calibrated gain values of the pedestrian warning sound audio are considered in particular at speeds of 0 km/h, 6 km/h, 27 km/h and 30 km/h. The inventors found that from 1 km/h to 27 km/h the gain value is a fixed value, and from 27 km/h to 30 km/h the gain value decreases as the reverse driving speed increases and can be expressed by a linear equation. It is also known that the gain value is fixed at 0 at reverse speeds of 0 km/h and 30 km/h. Therefore, by obtaining the gain value corresponding to a reverse speed of 6 km/h, or to any one speed other than 6 km/h within 1 km/h-27 km/h, such as 8 km/h or 10 km/h, the gain values corresponding to all vehicle speeds during reverse driving can be calculated.
The inventors therefore propose training a model that takes the pedestrian warning sound audio to be processed as input and outputs the gain values of the pedestrian warning sound audio after calibration at partial vehicle speeds, such as the gain values corresponding to forward speeds of 10 km/h and 20 km/h and a reverse speed of 6 km/h. According to the rules described above, the gain values of the pedestrian warning sound audio after calibration at all vehicle speeds, for example all forward speeds within 30 km/h and all reverse speeds within 30 km/h, are then obtained, so that the audio calibration can be performed. This solves the audio calibration problem while reducing the difficulty of training the model and increasing the training speed.
The target model may be a regression model, which is trained to establish the relationship between the audio before calibration, such as the pedestrian warning sound audio to be processed, and the gain values after audio calibration at vehicle speeds within the preset vehicle speed range; that is, the audio before calibration is input, and the gain values after audio calibration at the partial vehicle speeds within the preset vehicle speed range are output.
Referring first to Fig. 1, Fig. 1 schematically shows an application scenario provided according to an embodiment of the present application, in which the devices involved include a server 101.
When the application scenario is audio calibration, the server 101 stores the pedestrian warning sound audio to be processed and is provided with the trained model, and the gain values after audio calibration at partial vehicle speeds within the preset vehicle speed range can be obtained on the server 101 based on the pedestrian warning sound audio to be processed and the trained model, so that the audio calibration is performed.
Optionally, the devices involved in the application scenario further include a host 102 of the vehicle, and the server 101 and the host 102 may communicate over a network.
The vehicle host 102 is provided with a custom button for selecting pedestrian warning sound audio, which allows the user to upload a custom audio file. If the user activates the button, the recorded pedestrian warning sound audio is uploaded. With the user's authorization, the host 102 may record the pedestrian warning sound audio uploaded by the user and send the recorded pedestrian warning sound audio to the server 101 for audio calibration.
After the calibration is completed, the server 101 may also return the calibrated audio to the host 102. This meets the diversified requirements of users for audio calibration, gives users more personalized audio choices, avoids the monotony and fatigue caused by fixed audio, saves the tedious calibration process for each pedestrian warning sound audio, and simplifies the process by which the user updates the pedestrian warning sound. In addition, running the code on the server 101 saves the central control resources of the vehicle.
An audio processing method according to an exemplary embodiment of the present application is described below with reference to Figs. 2 to 4 in conjunction with the application scenario of Fig. 1. It should be noted that the above application scenario is shown only for the convenience of understanding the spirit and principle of the present application, and the embodiments of the present application are not limited in this respect. Rather, the embodiments of the application may be applied to any applicable scenario.
It should be noted that the embodiments of the present application may be applied to an electronic device, and the electronic device may be a server or the host of a vehicle; that is, the audio processing method provided by the exemplary embodiments of the present application may be executed on a server or on the host of a vehicle.
Wherein the server may be a monolithic server or a distributed server across multiple computers or computer data centers. The server may also be of various types, such as, but not limited to, a web server, an application server, or a database server, or a proxy server.
Alternatively, the server may comprise hardware, software, embedded logic components, or a combination of two or more such components for performing the appropriate functions supported or implemented by the server. For example, the server may be a blade server, a cloud server or the like, or a server group consisting of multiple servers, which may include one or more of the above-mentioned types of servers.
It should be noted that, the audio processing method according to the exemplary embodiment of the present application may be executed on the same device or may be executed on a different device.
Referring to fig. 2, fig. 2 is a flowchart illustrating an audio processing method according to an embodiment of the application.
As shown in fig. 2, the method in the embodiment of the present application may include:
step 201, obtaining pedestrian warning sound audio to be processed.
The pedestrian warning sound audio to be processed may, in this embodiment, be obtained from the host of the vehicle, for example via a custom button set on the host of the vehicle for selecting pedestrian warning sound audio, which allows the user to upload a custom audio file. In this embodiment, the audio may also be obtained from pre-stored audio, where the pre-stored audio may be custom audio files uploaded by users over a period of time.
Optionally, after the pedestrian warning sound audio to be processed is obtained, it can be processed, for example by data cleaning, feature quantization and normalization, so that the processed audio can be better handled by the target model and the accuracy of the subsequent audio calibration is improved.
By way of example, this embodiment may set a cleaning rule according to the actual situation, for example to remove non-dry audio (audio that is not a clean, dry recording) and/or audio whose duration is below a preset duration threshold, so as to remove poor-quality data. The pedestrian warning sound audio to be processed is then cleaned according to the cleaning rule, removing non-dry audio and/or audio whose duration is below the preset duration threshold, so as to improve the quality of the audio. The pedestrian warning sound audio to be processed after data cleaning is subsequently input into the above target model, which improves the accuracy of the subsequent audio calibration.
The preset time threshold may be determined according to practical situations, for example, 20 seconds or 30 seconds.
After the data cleaning, feature quantization and normalization can be performed on the cleaned pedestrian warning sound audio to be processed. Feature quantization converts non-numerical data into numerical data, and normalization performs dimensionless processing on the data, converting data of different dimensions into the same dimension. The pedestrian warning sound audio to be processed after feature quantization and normalization is then input into the target model for processing, to obtain the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range.
In this embodiment, some requirements of the target model for audio calibration are considered; for example, the target model processes numerical data, so the audio is converted into numerical attribute data. For instance, feature quantization is performed on the audio obtained after data cleaning, so that non-numerical data is converted into numerical data, which facilitates subsequent data processing. A conversion rule for converting non-numerical data into numerical data can be preset, and the audio obtained after data cleaning is then feature-quantized based on the conversion rule. For example, this embodiment may acquire the numerical data corresponding to a plurality of non-numerical data items that have already been converted, and then construct the above conversion rule from the acquired data.
In addition, this embodiment takes into account that data of different units and/or different dimensions is inconvenient for subsequent processing, so the data is further normalized, that is, processed to be dimensionless, and data of different dimensions is converted into the same dimension, which facilitates subsequent data processing.
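As a rough illustration of this preprocessing stage, the following sketch filters out clips shorter than a duration threshold and peak-normalizes the remaining waveforms; the function name, the default sampling rate and the peak-normalization choice are assumptions made for illustration and are not part of the described embodiment.

```python
import numpy as np

def clean_and_normalize(clips, sample_rate=48000, min_duration_s=20.0):
    """Drop clips shorter than the duration threshold and peak-normalize the rest.

    clips: list of 1-D numpy arrays holding raw pedestrian warning sound audio.
    Returns the cleaned, normalized clips.
    """
    cleaned = []
    for clip in clips:
        duration = len(clip) / sample_rate
        if duration < min_duration_s:      # cleaning rule: discard audio that is too short
            continue
        peak = np.max(np.abs(clip))
        if peak == 0:                      # silent clip, treated as bad data
            continue
        cleaned.append(clip / peak)        # dimensionless, same scale for every clip
    return cleaned
```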
Step 202, inputting the pedestrian warning sound audio to be processed into a target model to obtain gain values, output by the target model, of the pedestrian warning sound audio after calibration at partial vehicle speeds within a preset vehicle speed range, where the target model is trained based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds.
In this embodiment, the pedestrian warning sound audio may be input into the target model, and the target model processes the pedestrian warning sound audio to obtain the gain values of the corresponding audio after calibration at the partial vehicle speeds.
Here, the above preset vehicle speed range may be set according to the actual situation, for example as the vehicle speed range for low-speed driving, such as driving speeds within 30 km/h.
Before the pedestrian warning sound audio to be processed is input into the target model, model training is performed. In the training process, a plurality of pedestrian warning sound audios may be input into the target model to obtain gain prediction values of the corresponding audios after calibration at the partial vehicle speeds. The gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds is then compared with the gain value of that pedestrian warning sound audio after calibration at the corresponding vehicle speeds, where the latter is the real gain value of that pedestrian warning sound audio after calibration at the corresponding vehicle speeds. The target model is adjusted according to the comparison result, so that the trained model can output gain values of the corresponding audio after calibration at the partial vehicle speeds that meet the requirement. Here, meeting the requirement may be understood as the difference between the gain value after audio calibration at the above partial vehicle speeds output by the trained model and the real gain value after audio calibration at the corresponding vehicle speeds being small, for example within a preset difference range.
For different regions, the gain values of the corresponding audio after calibration at the partial vehicle speeds may differ. For example, the gain value after audio calibration at a forward speed of 10 km/h in region A is Gain1, while the gain value after audio calibration at a forward speed of 10 km/h in region B is Gain2, and Gain2 differs from Gain1. Therefore, when the target model is trained in this embodiment, it is trained based on the plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds for the region corresponding to those audios. For example, to train the target model corresponding to region A, the target model is trained with a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds for region A; to train the target model corresponding to region B, the target model is trained with a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds for region B. That is, this embodiment obtains a plurality of target models, and the target model of each region is trained with a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds for that region, thereby obtaining the target model corresponding to each region.
In this embodiment, in order to distinguish the target models of different regions, a region identifier of the corresponding region may be added to each target model. For example, the identifier of region A may be added to the target model of region A, such as adding the name of region A to the name of that target model, so that the target models of different regions can subsequently be used accurately.
For example, before the pedestrian warning sound audio to be processed is input into the target model, the region identifier corresponding to the pedestrian warning sound audio to be processed, such as a region name, is first determined. Then, based on the region identifier corresponding to each target model and the region identifier corresponding to the pedestrian warning sound audio to be processed, the target model for processing the pedestrian warning sound audio to be processed, that is, the target model of the region corresponding to that audio, is obtained from the target models of the different regions. The pedestrian warning sound audio to be processed is then input into the obtained target model, and the gain values, output by the target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range are obtained.
Here, in this embodiment, the gain values after audio calibration at the above partial vehicle speeds may differ between regions, so the target models obtained by training differ, that is, different regions correspond to different target models. Therefore, before the uncalibrated audio is processed with a trained target model, the region corresponding to the uncalibrated audio is first determined, and the uncalibrated audio is then processed with the target model of that region, which improves the accuracy of the audio processing result.
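To illustrate the region-identifier lookup just described, the sketch below keeps a registry of trained models keyed by region identifier; the identifiers, the stand-in lambda models and the `select_model` helper are hypothetical and only show the selection logic.

```python
from typing import Callable, Dict, Sequence

ModelFn = Callable[[Sequence[float]], Sequence[float]]

# Hypothetical registry: region identifier -> trained target model. The lambdas stand
# in for real models that map preprocessed audio to gain values at the partial speeds
# (e.g. forward 10 km/h, forward 20 km/h, reverse 6 km/h); the numbers are placeholders.
region_models: Dict[str, ModelFn] = {
    "region_A": lambda audio: [0.14, 0.39, 0.20],
    "region_B": lambda audio: [0.12, 0.35, 0.18],
}

def select_model(region_id: str, registry: Dict[str, ModelFn]) -> ModelFn:
    """Return the target model whose region identifier matches that of the audio."""
    if region_id not in registry:
        raise KeyError(f"no target model registered for region {region_id}")
    return registry[region_id]

# Usage: pick the model for the audio's region, then infer the partial gain values.
model = select_model("region_A", region_models)
partial_gains = model([0.0] * 48000)   # placeholder audio buffer
```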
In addition, in this embodiment, after the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range are obtained from the target model, the gain value of the pedestrian warning sound audio after calibration at each vehicle speed in the preset vehicle speed range may be determined based on those gain values and a pre-stored relation between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration, so that the calibrated pedestrian warning sound audio at each vehicle speed is obtained according to the gain value of the pedestrian warning sound audio after calibration at that vehicle speed.
The relation between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration may be determined from the gain values after audio calibration at the respective vehicle speeds. Take forward driving and the example vehicle speeds and calibrated gain values of the pedestrian warning sound audio given above.
From 0 km/h to 10 km/h, the gain value increases as the forward driving speed increases, and the relation between the gain value and the vehicle speed can be expressed by a linear equation, for example y = k1x + b1. From 10 km/h to 20 km/h, the gain value also increases with the forward driving speed and can be expressed by a linear equation, for example y = k2x + b2. From 20 km/h to 30 km/h, the gain value decreases as the forward driving speed increases and can be expressed by a linear equation, for example y = -k3x + b3. In addition, over 0 km/h to 20 km/h the gain value increases with the forward driving speed, and the relation between the gain value and the vehicle speed over this range may be expressed by a single linear equation, for example y = k4x + b4.
Optionally, the above relation between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration may include the relations between each vehicle speed and the gain value after audio calibration within the 30 km/h forward driving speed range, and the relations between each vehicle speed and the gain value after audio calibration within the 30 km/h reverse driving speed range. The relations for the forward driving speed range within 30 km/h may include the above y = k1x + b1, y = k2x + b2 and y = -k3x + b3, or alternatively y = k4x + b4 and y = -k3x + b3.
The determination of the relations between each vehicle speed and the gain value after audio calibration within the 30 km/h reverse driving speed range is similar to that for the forward driving speed range and is not described here again.
After the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds are obtained, the gain value of the pedestrian warning sound audio after calibration at each vehicle speed in the preset vehicle speed range can be determined based on those gain values and the relations between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration. The trained model therefore does not need to output the gain values of the pedestrian warning sound audio after calibration at every vehicle speed in the preset vehicle speed range, which reduces the difficulty of model training and shortens the model training period.
The above relations include parameters to be solved, for example the parameters k1 and b1 in the relation y = k1x + b1 for forward speeds of 0 km/h-10 km/h, the parameters k2 and b2 in the relation y = k2x + b2 for forward speeds of 10 km/h-20 km/h, the parameters k3 and b3 in the relation y = -k3x + b3 for forward speeds of 20 km/h-30 km/h, and the parameters k4 and b4 in the relation y = k4x + b4 for forward speeds of 0 km/h-20 km/h. In this embodiment, the values of the parameters to be solved in the relations can be obtained from the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds, and the gain value of the pedestrian warning sound audio after calibration at each vehicle speed can then be determined based on the values of the parameters to be solved and the relations.
Here, the partial vehicle speeds at which the gain values of the pedestrian warning sound audio after calibration are obtained can be chosen according to the actual situation, for example the gain values at forward speeds of 10 km/h and 20 km/h together with the gain value corresponding to a reverse speed of 6 km/h, or the gain value at a forward speed of 10 km/h together with the gain value corresponding to a reverse speed of 6 km/h, or the gain value at a forward speed of 20 km/h together with the gain value corresponding to a reverse speed of 6 km/h, and so on.
As described above, the gain values at forward speeds of 0 km/h and 30 km/h are fixed at 0. Therefore, this embodiment may substitute the gain values at forward speeds of 0 km/h and 10 km/h into the relation y = k1x + b1 to obtain the value 0.014 for the parameter k1 and the value 0 for b1, and the gain value of the pedestrian warning sound audio after calibration at each speed in the forward range 0 km/h-10 km/h can then be determined based on y = 0.014x. Similarly, this embodiment may substitute the gain values at forward speeds of 10 km/h and 20 km/h into the relation y = k2x + b2 to obtain k2 = 0.025 and b2 = -0.11, and determine the gain value at each speed in the forward range 10 km/h-20 km/h based on y = 0.025x - 0.11; and may substitute the gain values at forward speeds of 20 km/h and 30 km/h into the relation y = -k3x + b3 to obtain k3 = 0.039 and b3 = 1.17, and determine the gain value at each speed in the forward range 20 km/h-30 km/h based on y = -0.039x + 1.17. The gain values of the pedestrian warning sound audio after calibration at each speed in the forward range 0 km/h-30 km/h are thereby obtained.
Alternatively, this embodiment may substitute the gain values at forward speeds of 0 km/h and 10 km/h into the relation y = k4x + b4 to obtain the value 0.014 for the parameter k4 and the value 0 for b4. This embodiment can then determine the gain value of the pedestrian warning sound audio after calibration at each speed in the forward range 0 km/h-20 km/h based on y = 0.014x, and determine the gain value at each speed in the forward range 20 km/h-30 km/h based on y = -0.039x + 1.17, thereby obtaining the gain values of the pedestrian warning sound audio after calibration at each speed in the forward range 0 km/h-30 km/h.
Alternatively, this embodiment may substitute the gain values at forward speeds of 0 km/h and 20 km/h into the relation y = k4x + b4 to obtain the value 0.0195 for the parameter k4 and the value 0 for b4. This embodiment can then determine the gain value of the pedestrian warning sound audio after calibration at each speed in the forward range 0 km/h-20 km/h based on y = 0.0195x, and determine the gain value at each speed in the forward range 20 km/h-30 km/h based on y = -0.039x + 1.17, thereby obtaining the gain values of the pedestrian warning sound audio after calibration at each speed in the forward range 0 km/h-30 km/h.
The process of obtaining the gain value of the pedestrian warning sound audio after calibration at each vehicle speed during reverse driving is similar to the process for forward driving and is not repeated here.
In this embodiment, based on the gain values of the pedestrian warning sound audio after calibration at the partial vehicle speeds and the pre-stored relations between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration, the gain values of the pedestrian warning sound audio after calibration at each vehicle speed in the preset vehicle speed range, such as each forward speed within 30 km/h and each reverse speed within 30 km/h, are obtained, so that the calibrated pedestrian warning sound audio at each vehicle speed is obtained according to the gain value after calibration at that vehicle speed. This meets the diversified requirements of users for audio calibration, simplifies the calibration process, and increases the speed of audio calibration.
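To illustrate how the pre-stored relations turn the partial gain values into a gain at every speed, the sketch below reproduces the forward-driving example above (gains of 0, 0.14, 0.39 and 0 at 0, 10, 20 and 30 km/h) and the reverse-driving rule (a fixed value from 1 km/h to 27 km/h, falling linearly to 0 at 30 km/h); the function names and the 1 km/h step are assumptions made for illustration.

```python
def forward_gains(gain_10, gain_20, speeds=range(0, 31)):
    """Calibrated gain at each forward speed, given the gains at 10 km/h and 20 km/h.

    The gains at 0 km/h and 30 km/h are known to be fixed at 0, so each segment of
    the pre-stored relation y = kx + b can be solved from two points.
    """
    k1, b1 = gain_10 / 10.0, 0.0                 # 0-10 km/h, gain rising from 0
    k2 = (gain_20 - gain_10) / 10.0              # 10-20 km/h, gain still rising
    b2 = gain_10 - k2 * 10.0
    k3 = (0.0 - gain_20) / 10.0                  # 20-30 km/h, gain falling back to 0
    b3 = gain_20 - k3 * 20.0
    gains = {}
    for v in speeds:
        if v <= 10:
            gains[v] = k1 * v + b1
        elif v <= 20:
            gains[v] = k2 * v + b2
        else:
            gains[v] = k3 * v + b3
    return gains


def reverse_gains(gain_6, speeds=range(0, 31)):
    """Calibrated gain at each reverse speed, given the gain at 6 km/h (or any speed in 1-27 km/h)."""
    gains = {}
    for v in speeds:
        if v == 0 or v >= 30:
            gains[v] = 0.0                       # known fixed endpoints
        elif v <= 27:
            gains[v] = gain_6                    # fixed value over 1-27 km/h
        else:
            gains[v] = gain_6 * (30 - v) / 3.0   # 27-30 km/h: linear drop to 0 at 30 km/h
    return gains


# Using the worked example above: gains of 0.14 at 10 km/h and 0.39 at 20 km/h.
print(forward_gains(0.14, 0.39)[25])   # ~0.195, consistent with y = -0.039x + 1.17
```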
Optionally, in this embodiment, the pedestrian warning sound audio to be processed may be adjusted using the gain value of the pedestrian warning sound audio after calibration at each vehicle speed and a pedestrian warning sound algorithm, to obtain the calibrated pedestrian warning sound audio at each vehicle speed. The pedestrian warning sound algorithm changes the frequency and the sound pressure level of the audio based on the gain value of the audio.
For example, the audio calibration requirement during forward driving may be that from 0 km/h to 20 km/h the calibrated pedestrian warning sound becomes louder as the speed increases, with the frequency and the sound pressure level gradually increasing; from 20 km/h the sound begins to gradually fade, and at 30 km/h there is no sound. The frequency and the sound pressure level of the audio are related to the gain value of the audio; for example, when the gain value of the audio is Gain1 the frequency of the audio is f1 and the sound pressure level is s1, and when the gain value is Gain2 the frequency is f2 and the sound pressure level is s2, and so on. This embodiment adjusts the pedestrian warning sound audio to be processed using the gain value after calibration at each vehicle speed and the pedestrian warning sound algorithm, that is, the pedestrian warning sound algorithm changes the frequency and the sound pressure level of the pedestrian warning sound audio to be processed based on the calibrated gain values, so that the frequency and the sound pressure level of the adjusted audio meet the audio calibration requirement, for example the forward driving requirement that the calibrated pedestrian warning sound becomes louder as the speed increases, with the frequency and the sound pressure level gradually increasing, begins to fade from 20 km/h and fades to silence at 30 km/h. The calibrated pedestrian warning sound audio at each vehicle speed is thereby obtained.
If the audio calibration requirement is for reverse driving, the calibrated pedestrian warning sound remains unchanged when the speed V satisfies 0 < V < 27 km/h, with the frequency and the sound pressure level unchanged; when 27 ≤ V < 30 km/h the sound begins to gradually fade, fading until there is no sound. Here, the frequency and the sound pressure level of the audio are related to the gain value of the audio. This embodiment adjusts the pedestrian warning sound audio to be processed using the gain value after calibration at each vehicle speed and the pedestrian warning sound algorithm, so that the frequency and the sound pressure level of the adjusted audio meet the reverse driving requirement, namely that the sound is unchanged for 0 < V < 27 km/h and gradually fades to silence for 27 ≤ V < 30 km/h, thereby obtaining the calibrated pedestrian warning sound audio at each vehicle speed.
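The embodiment does not spell out the internals of the pedestrian warning sound algorithm. A minimal sketch of applying a per-speed gain might look as follows; treating the gain as a simple amplitude factor (which changes the sound pressure level) is an assumption, and any frequency shaping the real algorithm performs is omitted.

```python
import numpy as np

def apply_gain(audio, gain):
    """Scale the waveform by the calibrated gain.

    A gain of 0 yields silence (for example, forward driving at 30 km/h); a larger
    gain raises the sound pressure level. Frequency shaping, if required by the real
    pedestrian warning sound algorithm, would also be applied at this point.
    """
    return np.clip(audio * gain, -1.0, 1.0)

def calibrated_audio_per_speed(audio, gains_per_speed):
    """Build the calibrated pedestrian warning sound audio for each vehicle speed."""
    return {speed: apply_gain(audio, gain) for speed, gain in gains_per_speed.items()}
```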
Optionally, in this embodiment, after the calibrated pedestrian warning sound audio at each vehicle speed is obtained, the calibrated pedestrian warning sound audio at each vehicle speed may also be pushed.
In this embodiment, the calibrated pedestrian warning sound audio at each vehicle speed can be presented by pushing a message. For example, the host of the target vehicle is determined according to the pedestrian warning sound audio to be processed, and a message such as "please check the pedestrian warning sound audio calibrated at different vehicle speeds" is sent to the host, reminding the user to check the calibrated audio in time. This meets the diversified requirements of users for audio calibration and at the same time allows the pedestrian warning sound to be updated simply and conveniently.
For example, this embodiment may send the calibrated pedestrian warning sound audio at each vehicle speed to the host of the target vehicle. In addition, if the current execution subject is the host of the target vehicle, the host can display the calibrated pedestrian warning sound audio at each vehicle speed on its display screen, reminding the user to check it in time and save it for use.
Here, in addition to calibrating the pedestrian warning sound audio, this embodiment may also calibrate other audio, such as calibration of the automobile horn or calibration related to noise, vibration and harshness (NVH). The specific calibration process is the same as the above calibration process for the pedestrian warning sound audio and is not repeated here.
In the embodiments of the application, the pedestrian warning sound audio to be processed is processed based on the trained target model. Using the idea of calibrating with a model, a way of processing audio is thus provided in which the gain values of the pedestrian warning sound audio after calibration at partial vehicle speeds within the preset vehicle speed range are obtained quickly and the audio is calibrated, meeting the diversified requirements of users without requiring calibration by an automobile manufacturer or a professional sound engineer. The calibration process is thereby simplified, the speed of audio calibration is increased, and the user can update the pedestrian warning sound simply and conveniently.
In addition, before the pedestrian warning sound audio to be processed is input into the target model to obtain the gain values, output by the target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range, model training is performed to obtain the trained target model. During model training, training is carried out based on the training data, the loss function and the model output. The pedestrian warning sound audio to be processed is then processed based on the trained target model, which realizes custom calibration of the pedestrian warning sound audio, meets the diversified requirements of users for audio calibration, saves the tedious calibration process for each pedestrian warning sound audio, and simplifies the process by which the user updates the pedestrian warning sound. Fig. 3 is a flow chart of an audio processing method according to another embodiment of the present application. As shown in Fig. 3, the method includes:
Step 301, inputting a plurality of pedestrian warning sound audios into the target model respectively, to obtain a gain prediction value of each pedestrian warning sound audio after calibration at the partial vehicle speeds.
The training data may, in this embodiment, be obtained from pre-stored data, which may store a plurality of training data items accumulated over a period of time. The training data may include a plurality of pedestrian warning sound audios and the gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds. Here, the gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds is the real gain value of that audio after calibration at those vehicle speeds. In this embodiment, the gain values obtained when an automobile manufacturer or a professional sound engineer processed the plurality of pedestrian warning sound audios may be acquired and used as the real gain values of each pedestrian warning sound audio after calibration at the partial vehicle speeds.
Alternatively, as shown in fig. 4, the target model may include a preprocessing unit and a convolutional neural network and recurrent neural network hybrid unit. The preprocessing unit is used to obtain the time, frequency and sound pressure level information of each pedestrian warning sound audio, and the hybrid unit is used to determine, based on this time, frequency and sound pressure level information, the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds.
For example, in the preprocessing unit, the time, frequency and sound pressure level information of each pedestrian warning sound audio can be obtained through a short-time Fourier transform and conversion of acoustic parameters. Specifically, the preprocessing unit first converts the acoustic parameters of each pedestrian warning sound audio to obtain the corresponding acoustic parameter information, and then loads the converted audio, where the audio may be loaded at a preset sampling rate. After loading is complete, the time, frequency and sound pressure level information of the loaded audio is computed by the short-time Fourier transform. Here, the time, frequency and sound pressure level information of the audio can be understood as the variation of the frequency and sound pressure level of the audio along the time axis.
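As a hedged illustration of this preprocessing step, the sketch below computes a time/frequency/sound-pressure-level representation with a short-time Fourier transform; the window length and the reference pressure used to express magnitudes as SPL are assumptions, and a real system would need a calibration factor relating raw samples to physical sound pressure.

import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

P_REF = 20e-6  # reference sound pressure (Pa); mapping samples to pressure is assumed calibrated

def preprocess(path, nperseg=1024):
    """Load a warning-sound file and return its time axis, frequency axis and SPL matrix."""
    sr, audio = wavfile.read(path)
    audio = audio.astype(np.float32)
    if audio.ndim > 1:                    # mix to mono if the file is stereo
        audio = audio.mean(axis=1)
    f, t, Z = stft(audio, fs=sr, nperseg=nperseg)
    magnitude = np.abs(Z) + 1e-12         # avoid log of zero
    spl = 20.0 * np.log10(magnitude / P_REF)
    return t, f, spl                      # variation of frequency and SPL along the time axis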
In this embodiment, as shown in fig. 4, the convolutional neural network and recurrent neural network hybrid unit may include a convolutional neural network layer, a recurrent neural network layer, a fully connected layer, and the like. The convolutional layer can capture local frequency features by sliding a convolution kernel, thereby extracting frequency-related features from the audio. Moreover, audio is sequential in nature, and information at adjacent times is correlated; the recurrent layer can capture the long-term temporal dependencies in the audio and obtain global information. Therefore, the hybrid unit can comprehensively obtain both local and global information of the audio, and thus accurately determine the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds.
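A minimal sketch of such a hybrid network is given below (PyTorch, not taken from the original disclosure): convolution layers slide kernels over the time-frequency input to capture local frequency features, a GRU layer captures long-term temporal dependencies, and a fully connected layer outputs one gain value per partial vehicle speed. Layer sizes, the number of calibration speeds and the 513 frequency bins (matching nperseg=1024 in the preprocessing sketch) are illustrative assumptions.

import torch
import torch.nn as nn

class GainPredictor(nn.Module):
    """Hybrid CNN + RNN sketch: SPL spectrogram in, predicted gains at the partial vehicle speeds out."""
    def __init__(self, n_freq_bins=513, n_speeds=3):
        super().__init__()
        self.cnn = nn.Sequential(                        # local time-frequency features
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),            # pool along frequency only
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        self.rnn = nn.GRU(32 * (n_freq_bins // 4), 128, batch_first=True)  # global temporal context
        self.fc = nn.Linear(128, n_speeds)               # one gain per partial vehicle speed

    def forward(self, spl):                              # spl: (batch, 1, n_freq_bins, n_frames)
        x = self.cnn(spl)                                # (batch, 32, n_freq_bins // 4, n_frames)
        x = x.permute(0, 3, 1, 2).flatten(2)             # (batch, n_frames, features)
        _, h = self.rnn(x)                               # final hidden state summarises the clip
        return self.fc(h[-1])                            # (batch, n_speeds) predicted gain values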
Step 302, adjusting the target model according to the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds, the gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds, and the loss function, to obtain the trained target model.
Here, this embodiment may determine the value of the loss function based on the difference between the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds and the actual gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds. It is then determined whether the value of the loss function is greater than a preset threshold. If so, the target model is adjusted, and the step of inputting the plurality of pedestrian warning sound audios into the target model to obtain the predicted gain values at the partial vehicle speeds is re-executed based on the adjusted model, until the value of the loss function is less than or equal to the preset threshold, at which point the trained target model is obtained. The preset threshold may be determined according to the actual situation, for example as the value the loss function takes when the predicted gain values output by the target model at the partial vehicle speeds are close to the actual gain values at the corresponding vehicle speeds. Here, "close" can be understood as the predicted gain value differing only slightly from the true gain value, for example lying within a set range of it.
In this embodiment, when training the model, the plurality of pedestrian warning sound audios are used as the input data of the target model, and the gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds is used as the label data, so that the target model is trained in a supervised manner. The predicted gain values output by the target model thus approach the actual gain values at the corresponding vehicle speeds, the value of the loss function becomes smaller, and the accuracy of the target model's audio processing is improved.
Here, the value of the loss function may be determined based on the difference between the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds and the actual gain value at the corresponding vehicle speeds: the larger the difference, the larger the value of the loss function, and the smaller the difference, the smaller the value. The specific form of the loss function can be chosen according to the actual situation; for example, the variance between the predicted gain values and the actual gain values at the corresponding vehicle speeds may be used as the loss function.
In this embodiment, a loss value greater than the preset threshold indicates that the difference between the predicted gain values output by the target model at the partial vehicle speeds and the actual gain values at the corresponding vehicle speeds is large, and the target model needs to be adjusted. Here, adjusting the target model may include adjusting structural parameters of the model and/or adjusting the scale of the training data. Based on the adjusted target model, the step of inputting the plurality of pedestrian warning sound audios into the target model to obtain the predicted gain values at the partial vehicle speeds is re-executed, until the value of the loss function is less than or equal to the preset threshold, indicating that the difference between the predicted and actual gain values is small, and the trained target model is obtained.
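The following sketch shows one way this adjust-and-retrain loop could look, continuing the GainPredictor and dataset sketches above; the mean squared error stands in for the variance-style loss, and the threshold, learning rate and round limit are illustrative assumptions rather than values from the original disclosure.

import torch
import torch.nn as nn

def train_until_threshold(model, loader, threshold=1e-3, lr=1e-3, max_rounds=1000):
    """Keep adjusting the target model while the loss stays above the preset threshold."""
    criterion = nn.MSELoss()                             # variance-style loss on gain values
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_rounds):
        total = 0.0
        for spl, true_gains in loader:                   # true_gains: calibrated gains at partial speeds
            optimizer.zero_grad()
            loss = criterion(model(spl), true_gains)     # predicted vs. actual gain values
            loss.backward()
            optimizer.step()                             # adjust the target model
            total += loss.item() * spl.size(0)
        if total / len(loader.dataset) <= threshold:     # loss no longer above the preset threshold
            break
    return model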
In addition, after the trained target model is obtained, this embodiment may also verify it. Illustratively, the trained target model is verified with verification data, which includes one or more pedestrian warning sound audios and their gain values after calibration at the partial vehicle speeds. Each verification audio is input into the trained target model to obtain the gain values it outputs for the partial vehicle speeds, and these are compared with the corresponding gain values in the verification data. If the difference between them is small, for example if the variance between the model output and the reference gain values is smaller than a preset variance threshold, the trained target model is determined to pass verification; otherwise it is determined not to pass. When verification does not pass, the model training process is re-executed until verification passes.
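A hedged sketch of this verification check is shown below: the variance between the gains predicted for the verification audio and the reference gains in the verification data is compared with a preset variance threshold; the threshold value is an assumption used only for illustration.

import torch

def verify_model(model, val_features, val_gains, var_threshold=0.01):
    """Return True (verification passed) when the variance between predicted and
    reference gain values is below the preset variance threshold; otherwise retrain."""
    model.eval()
    with torch.no_grad():
        pred = model(val_features)
    variance = torch.mean((pred - val_gains) ** 2).item()
    return variance < var_threshold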
In this way, after the trained target model is obtained it is verified, and the subsequent steps are executed only after verification passes; otherwise the model is retrained. This further improves the training effect of the target model and, in turn, the accuracy of its audio processing.
Step 303, obtaining the pedestrian warning sound audio to be processed.
Step 304, inputting the pedestrian warning sound audio to be processed into the target model to obtain the gain values, output by the target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range.
The target model is obtained by training based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds.
The implementation of steps 303-304 is described in the embodiment of fig. 2, and is not described here.
In this embodiment, model training is performed before the pedestrian warning sound audio calibration to obtain the trained target model, and the pedestrian warning sound audio to be processed is then processed by that model. By applying the idea of model-based calibration, a way of processing the audio is provided: the gain values after calibration at the partial vehicle speeds within the preset vehicle speed range are obtained quickly, and the audio is then calibrated accordingly. This meets the diversified requirements of users without requiring an automobile manufacturer or a professional sound engineer to perform the calibration, which simplifies the calibration process, improves the speed of audio calibration, and allows the user to update the pedestrian warning sound simply and conveniently.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 5 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application. As shown in fig. 5, the audio processing apparatus provided in this embodiment may include: an acquisition module 501 and a processing module 502.
The acquiring module 501 is configured to acquire pedestrian warning sound audio to be processed.
The processing module 502 is configured to input the pedestrian warning sound audio to be processed into a target model and obtain the gain values, output by the target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range; the target model is obtained by training based on a plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds.
In a possible implementation manner, the apparatus further includes an obtaining module, configured to: after the processing module 502 inputs the pedestrian warning sound audio to be processed into the target model and obtains the gain values output by the target model for the partial vehicle speeds within the preset vehicle speed range, determine the gain values of the pedestrian warning sound audio after calibration at each vehicle speed within the preset vehicle speed range, based on the gain values after calibration at the partial vehicle speeds and a pre-stored relational expression between each vehicle speed within the preset vehicle speed range and the gain value after audio calibration;
and obtain the calibrated pedestrian warning sound audio at each vehicle speed according to the gain value of the pedestrian warning sound audio after calibration at each vehicle speed.
In one possible implementation, the relation includes parameters to be solved.
The obtaining module is specifically configured to:
Obtaining the value of the parameter to be solved in the relation according to the gain value after the pedestrian warning sound audio calibration under the partial vehicle speed;
and determining the gain value of the pedestrian warning sound audio after calibration at each vehicle speed based on the value of the parameter to be solved and the relational expression (a minimal sketch of this fitting step is given below).
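The original disclosure does not fix the form of the relational expression; purely for illustration, the sketch below assumes a low-order polynomial in vehicle speed whose coefficients are the parameters to be solved from the gains at the partial vehicle speeds, after which the gain at every speed in the preset range can be evaluated.

import numpy as np

def gains_at_all_speeds(partial_speeds, partial_gains, all_speeds, degree=2):
    """Solve the parameters of the assumed relation, then evaluate it over the speed range."""
    coeffs = np.polyfit(partial_speeds, partial_gains, deg=degree)  # parameters to be solved
    return np.polyval(coeffs, all_speeds)                           # gain at each vehicle speed

# usage sketch: predicted gains at 10/20/30 km/h, gains needed at every 1 km/h from 0 to 30
speeds = np.arange(0, 31, 1)
gains = gains_at_all_speeds([10, 20, 30], [3.2, 4.1, 5.5], speeds)  # gain values are hypothetical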
In one possible implementation manner, the obtaining module is specifically configured to:
adjusting the pedestrian warning sound audio to be processed by using the gain value of the pedestrian warning sound audio after calibration at each vehicle speed and a pedestrian warning sound algorithm, to obtain the calibrated pedestrian warning sound audio at each vehicle speed;
The pedestrian warning sound algorithm changes the frequency and the sound pressure level of the audio based on the gain value of the audio, as sketched below.
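The sketch below applies such a gain in the simplest way: only the sound pressure level is scaled, since how the pedestrian warning sound algorithm additionally reshapes the frequency content is not specified in the original disclosure; the waveform and gain values are hypothetical.

import numpy as np

def apply_gain(audio, gain_db):
    """Apply a calibrated gain (in dB) to a waveform by scaling its amplitude."""
    return audio * (10.0 ** (gain_db / 20.0))            # dB gain -> linear amplitude factor

raw_audio = np.zeros(48000, dtype=np.float32)            # placeholder warning-sound waveform
gain_per_speed = {10: 3.2, 20: 4.1, 30: 5.5}             # dB gains per vehicle speed (illustrative)
calibrated = {v: apply_gain(raw_audio, g) for v, g in gain_per_speed.items()}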
In one possible implementation, the device further includes a training module, configured to:
Respectively inputting the pedestrian warning sound audios into a target model to obtain a gain predicted value of each pedestrian warning sound audio after calibration under the partial vehicle speed;
and adjusting the target model according to the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds, the gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds, and the loss function, to obtain the trained target model.
In one possible implementation, the target model includes a preprocessing unit, and a convolutional neural network and recurrent neural network mixing unit.
The training module is specifically configured to:
acquiring, in the preprocessing unit, the time, frequency and sound pressure level information of each pedestrian warning sound audio;
and determining, at the convolutional neural network and recurrent neural network hybrid unit, the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds based on the time, frequency and sound pressure level information of each pedestrian warning sound audio.
In one possible implementation manner, the training module is specifically configured to:
determining the value of the loss function based on the difference between the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds and the gain value of each pedestrian warning sound audio after calibration at the corresponding vehicle speeds;
judging whether the value of the loss function is larger than a preset threshold value or not;
And if the value of the loss function is larger than the preset threshold, adjusting the target model, and based on the adjusted target model, re-executing the step of inputting the pedestrian warning sound audios into the target model to obtain a gain predicted value of each pedestrian warning sound audio after calibration under the partial vehicle speed until the value of the loss function is smaller than or equal to the preset threshold, so as to obtain the trained target model.
In one possible implementation manner, the target model is trained based on the plurality of pedestrian warning sound audios and the gain values of the corresponding audios after calibration at the partial vehicle speeds of the region corresponding to the plurality of pedestrian warning sound audios.
In one possible implementation, the target models include target models of different regions, wherein each target model carries a region identifier of its corresponding region.
The processing module is specifically configured to:
determining a region identifier corresponding to the pedestrian warning sound audio to be processed;
based on the region identifier corresponding to each target model and the region identifier corresponding to the pedestrian warning sound audio to be processed, obtaining target models for processing the pedestrian warning sound audio to be processed from the target models of different regions;
and inputting the pedestrian warning sound audio to be processed into the obtained target model, to obtain the gain values, output by that target model, of the pedestrian warning sound audio after calibration at the partial vehicle speeds within the preset vehicle speed range (a minimal selection sketch is given below).
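A minimal sketch of this region-based selection follows; the region identifiers and the mapping from identifier to trained model are assumptions used only to illustrate the lookup.

import torch

def predict_gains_for_region(models_by_region, features, region_id):
    """models_by_region: dict mapping a region identifier to its trained target model,
    e.g. {"CN": GainPredictor(), "EU": GainPredictor()} (illustrative keys)."""
    model = models_by_region[region_id]                  # match the audio's region identifier
    model.eval()
    with torch.no_grad():
        return model(features)                           # gains at the partial vehicle speeds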
In one possible implementation manner, the training module is specifically configured to:
and in the preprocessing unit, obtaining the time, frequency and sound pressure level information of each pedestrian warning sound audio through a short-time Fourier transform and conversion of acoustic parameters.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device 600 of this embodiment includes: a processor 610, a memory 620, and a computer program 621 stored in the memory 620 and executable on the processor 610. When the processor 610 executes the computer program 621, the steps of any of the method embodiments described above are implemented, such as steps 201 to 202 shown in fig. 2. Alternatively, when the processor 610 executes the computer program 621, the functions of the modules/units in the above apparatus embodiments are implemented, such as the functions of modules 501 to 502 shown in fig. 5.
By way of example, the computer program 621 may be partitioned into one or more modules/units, which are stored in the memory 620 and executed by the processor 610 to implement the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing particular functions, used to describe the execution of the computer program 621 in the electronic device 600.
It will be appreciated by those skilled in the art that fig. 6 is merely an example of an electronic device and does not constitute a limitation; the electronic device may include more or fewer components than shown, combine certain components, or use different components, such as input/output devices, network access devices, buses, and the like.
The processor 610 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 620 may be an internal storage unit of the electronic device, such as a hard disk or memory of the electronic device, or an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, or the like. The memory 620 may also include both an internal storage unit and an external storage device of the electronic device. The memory 620 is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program that when executed by a processor implements the above-described audio processing method.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated modules/units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention.

Claims (13)

1. An audio processing method, comprising:
acquiring pedestrian warning sound audio to be processed;
Inputting the pedestrian warning sound audio to be processed into a target model to obtain a gain value of the pedestrian warning sound audio after calibration in a preset vehicle speed range output by the target model; the target model is obtained through training based on a plurality of pedestrian warning sound audios and gain values after corresponding audio calibration under the partial vehicle speed.
2. The audio processing method according to claim 1, wherein after inputting the pedestrian warning sound audio to be processed into a target model to obtain a gain value of the pedestrian warning sound audio after calibration at a vehicle speed within a preset vehicle speed range output by the target model, further comprising:
Based on the gain value of the pedestrian warning sound after audio calibration in the partial vehicle speed and a pre-stored relation between each vehicle speed in the preset vehicle speed range and the gain value after audio calibration, determining the gain value of the pedestrian warning sound after audio calibration in each vehicle speed in the preset vehicle speed range;
and obtaining the calibrated pedestrian warning sound audio at each vehicle speed according to the gain value of the pedestrian warning sound audio after calibration at each vehicle speed.
3. The audio processing method according to claim 2, wherein the relation includes parameters to be solved;
the step of determining the gain value after the pedestrian warning sound audio calibration in the preset vehicle speed range based on the gain value after the pedestrian warning sound audio calibration in the partial vehicle speed and a pre-stored relation between each vehicle speed in the preset vehicle speed range and the gain value after the audio calibration comprises the following steps:
Obtaining the value of the parameter to be solved in the relation according to the gain value after the pedestrian warning sound audio calibration under the partial vehicle speed;
And determining the gain value of the pedestrian warning sound after audio calibration under each vehicle speed based on the value of the parameter to be solved and the relation.
4. The audio processing method according to claim 2, wherein the obtaining the calibrated pedestrian alert tone audio at each vehicle speed according to the calibrated gain value of the pedestrian alert tone audio at each vehicle speed comprises:
adjusting the pedestrian warning sound audio to be processed by using the gain value of the pedestrian warning sound audio after calibration at each vehicle speed and a pedestrian warning sound algorithm, to obtain the calibrated pedestrian warning sound audio at each vehicle speed;
The pedestrian warning sound algorithm changes the frequency and the sound pressure level of the audio based on the gain value of the audio.
5. The audio processing method according to any one of claims 1 to 4, characterized in that the training process of the target model includes:
Respectively inputting the pedestrian warning sound audios into a target model to obtain a gain predicted value of each pedestrian warning sound audio after calibration under the partial vehicle speed;
And adjusting the target model according to the gain predicted value of each pedestrian warning sound after audio calibration under the partial vehicle speed, the gain value of each pedestrian warning sound after audio calibration under the corresponding vehicle speed and the loss function to obtain the trained target model.
6. The audio processing method according to claim 5, wherein the target model includes a preprocessing unit, and a convolutional neural network and recurrent neural network hybrid unit;
The step of respectively inputting the pedestrian warning sound audios into a target model to obtain a gain predicted value of each pedestrian warning sound audio after calibration under the partial vehicle speed comprises the following steps:
acquiring, in the preprocessing unit, time, frequency and sound pressure level information of each pedestrian warning sound audio;
and determining, at the convolutional neural network and recurrent neural network hybrid unit, the predicted gain value of each pedestrian warning sound audio after calibration at the partial vehicle speeds based on the time, frequency and sound pressure level information of each pedestrian warning sound audio.
7. The method according to claim 5, wherein the adjusting the target model according to the gain predicted value after each pedestrian alert tone audio calibration at the partial vehicle speed, the gain value after each pedestrian alert tone audio calibration at the corresponding vehicle speed, and the loss function, to obtain the trained target model comprises:
Determining the value of the loss function based on the difference between the gain predicted value of each pedestrian warning sound after audio calibration under the partial vehicle speed and the gain value of each pedestrian warning sound after audio calibration under the corresponding vehicle speed;
judging whether the value of the loss function is larger than a preset threshold value or not;
And if the value of the loss function is larger than the preset threshold, adjusting the target model, and based on the adjusted target model, re-executing the step of inputting the pedestrian warning sound audios into the target model to obtain a gain predicted value of each pedestrian warning sound audio after calibration under the partial vehicle speed until the value of the loss function is smaller than or equal to the preset threshold, so as to obtain the trained target model.
8. The audio processing method according to any one of claims 1 to 4, wherein the target model is trained based on the plurality of pedestrian alert tones and corresponding audio calibrated gain values at the partial vehicle speeds of the region corresponding to the plurality of pedestrian alert tones.
9. The audio processing method of claim 8, wherein the target models comprise target models of different regions, wherein each target model adds a region identification of a corresponding region;
The step of inputting the pedestrian warning sound audio to be processed into a target model to obtain a gain value of the target model output after the pedestrian warning sound audio is calibrated at the vehicle speed in a preset vehicle speed range, comprises the following steps:
determining a region identifier corresponding to the pedestrian warning sound audio to be processed;
based on the region identifier corresponding to each target model and the region identifier corresponding to the pedestrian warning sound audio to be processed, obtaining target models for processing the pedestrian warning sound audio to be processed from the target models of different regions;
And inputting the pedestrian warning sound audio to be processed into an obtained target model to obtain a gain value of the target model output after the pedestrian warning sound audio is calibrated at the vehicle speed in the preset vehicle speed range.
10. The audio processing method according to claim 6, wherein obtaining, at the preprocessing unit, the time, frequency and sound pressure level information of each pedestrian warning sound audio comprises:
in the preprocessing unit, obtaining the time, frequency and sound pressure level information of each pedestrian warning sound audio through a short-time Fourier transform and conversion of acoustic parameters.
11. An audio processing apparatus, comprising:
The acquisition module is used for acquiring pedestrian warning sound audio to be processed;
The processing module is used for inputting the pedestrian warning sound audio to be processed into a target model to obtain a gain value of the pedestrian warning sound audio after calibration in a preset vehicle speed range output by the target model; the target model is obtained through training based on a plurality of pedestrian warning sound audios and gain values after corresponding audio calibration under the partial vehicle speed.
12. An electronic device comprising a memory and a processor, the memory having stored therein a computer program executable on the processor, wherein the processor implements the audio processing method of any of claims 1 to 10 when the computer program is executed by the processor.
13. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the audio processing method according to any one of claims 1 to 10.
CN202410116328.3A 2024-01-26 2024-01-26 Audio processing method, device, equipment and storage medium Pending CN118098278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410116328.3A CN118098278A (en) 2024-01-26 2024-01-26 Audio processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410116328.3A CN118098278A (en) 2024-01-26 2024-01-26 Audio processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118098278A true CN118098278A (en) 2024-05-28

Family

ID=91141429

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410116328.3A Pending CN118098278A (en) 2024-01-26 2024-01-26 Audio processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118098278A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination