CN112489682B - Audio processing method, device, electronic equipment and storage medium - Google Patents

Audio processing method, device, electronic equipment and storage medium

Info

Publication number
CN112489682B
CN112489682B
Authority
CN
China
Prior art keywords
scale
domain audio
frequency domain
audio signal
musical instrument
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011341834.0A
Other languages
Chinese (zh)
Other versions
CN112489682A (en)
Inventor
蒋慧军
徐伟
杨艾琳
姜凯英
肖京
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011341834.0A priority Critical patent/CN112489682B/en
Publication of CN112489682A publication Critical patent/CN112489682A/en
Priority to PCT/CN2021/083398 priority patent/WO2021213135A1/en
Application granted granted Critical
Publication of CN112489682B publication Critical patent/CN112489682B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: ...characterised by the type of extracted parameters
    • G10L25/18: ...the extracted parameters being spectral information of each sub-band
    • G10L25/27: ...characterised by the analysis technique
    • G10L25/30: ...using neural networks
    • G10L25/48: ...specially adapted for particular use
    • G10L25/51: ...for comparison or discrimination
    • G10L25/60: ...for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The application discloses an audio processing method, an audio processing apparatus, an electronic device and a storage medium, relating to the field of computer technology. The audio processing method comprises the following steps: acquiring a time-domain audio signal corresponding to the electronic musical instrument device to be detected; performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputting the frequency-domain audio signal into a pre-trained machine learning model; acquiring the fundamental frequency label corresponding to the frequency-domain audio signal output by the pre-trained machine learning model; determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining a sound level detection result for the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to that device. The time-domain audio signal can be uploaded by the client to any node server in a blockchain server system. This technical solution improves the accuracy of sound level detection for electronic musical instrument devices.

Description

Audio processing method, device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular, to an audio processing method, apparatus, electronic device, and storage medium.
Background
Electronic musical instrument devices, which produce sound from electronic signals, are widely popular. During production, the sound signals generated by an electronic musical instrument device must be analyzed to detect their sound level, and only devices that pass detection can be put on the market for sale.
In the related art, an electronic musical instrument device is detected mainly by comparing the similarity between the sound signal it generates and a standard sound signal; only devices whose similarity reaches a predetermined condition are identified as acceptable. This approach can only detect cases where there is a large difference between the generated sound signal and the standard sound signal, and therefore suffers from low detection accuracy.
Disclosure of Invention
Based on the above, the present application provides an audio processing method, an audio processing apparatus, an electronic device and a storage medium, which improve the accuracy of sound level detection for electronic musical instrument devices.
In a first aspect, the present application provides an audio processing method, including: acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be detected; performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputting the frequency-domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data comprising frequency-domain audio signals and the fundamental frequency labels corresponding to them; acquiring the fundamental frequency label corresponding to the frequency-domain audio signal output by the pre-trained machine learning model; determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining a sound level detection result for the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to that device.
In a second aspect, the present application provides an audio processing apparatus comprising: a first acquisition unit for acquiring a time-domain audio signal corresponding to the electronic musical instrument device to be detected; a conversion unit for performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; an input unit for inputting the frequency-domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data comprising frequency-domain audio signals and the fundamental frequency labels corresponding to them; a second acquisition unit for acquiring the fundamental frequency label corresponding to the frequency-domain audio signal output by the pre-trained machine learning model; a first execution unit for determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and a second execution unit for determining a sound level detection result for the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to that device.
In a third aspect, the present application provides an electronic device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the above-described audio processing method.
In a fourth aspect, the present application provides a storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above-described audio processing method.
The technical solutions provided by the embodiments of the present application can have the following beneficial effects. A frequency-domain audio signal is obtained by performing frequency-domain conversion processing on the time-domain audio signal, and fundamental frequency detection is performed on the frequency-domain audio signal to obtain the fundamental frequency of the time-domain audio signal corresponding to the electronic musical instrument device to be detected. The scale and pitch of the sound generated by that device are then determined from the detected fundamental frequency, and its sound level is evaluated on that basis. Compared with comparing the similarity between the sound signal generated by the device and a standard sound signal, detecting the fundamental frequency of the generated time-domain audio signal allows the sound level of the device to be detected more accurately, improving the accuracy of sound level detection for electronic musical instrument devices.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
Fig. 2 is a flow chart of an audio processing method according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 4 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 5 is a specific flowchart of step S230 of the audio processing method according to an exemplary embodiment of the present application.
Fig. 6 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 7 is a block diagram of an audio processing device according to one embodiment of the present application.
Fig. 8 is an exemplary block diagram of an electronic device for implementing the above-described audio processing method according to an exemplary embodiment of the present application.
Fig. 9 is a computer readable storage medium for implementing the above-described audio processing method according to an exemplary embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present application. One skilled in the relevant art will recognize, however, that the aspects of the application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application may be applied.
As shown in fig. 1, the system architecture may include an electronic musical instrument device 101 to be detected, a network 102, a client 103 and a server 104. The client 103 obtains the time-domain audio signal corresponding to the electronic musical instrument device 101 to be detected and uploads it to the server 104, which may be a server providing a sound level detection service. The client 103 may be one or more of a smart phone, a tablet computer and a portable computer, and of course may also be a desktop computer or the like. The network 102 is the medium providing communication links between the electronic musical instrument device 101 to be detected, the client 103, and the server 104; it may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the numbers of the electronic musical instrument device 101 to be detected, the network 102, the client 103, and the server 104 in fig. 1 are merely illustrative. There may be any number of electronic musical instrument devices 101 to be detected, a network 102, a client 103, and a server 104, for example, the server 104 may be a server cluster made up of a plurality of servers, or the like, as needed for implementation.
Optionally, when the client 103 of the present application uploads the time-domain audio signal corresponding to the electronic musical instrument device 101 to be detected to the server 104, it may specifically upload it to any node server of a blockchain server system. That node server determines the sound level detection result from the time-domain audio signal and stores the result, and the security and immutability of blockchain data sharing effectively guarantee the safety and reliability of the stored detection result.
In this embodiment, the server 104 acquires the time-domain audio signal corresponding to the electronic musical instrument device to be detected; performs frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputs the frequency-domain audio signal into a pre-trained machine learning model, obtained by training on sample data comprising frequency-domain audio signals and their corresponding fundamental frequency labels; acquires the fundamental frequency label output by the model for the frequency-domain audio signal; determines a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determines a sound level detection result based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device to be detected. Compared with comparing the similarity between the generated sound signal and a standard sound signal, detecting the fundamental frequency of the generated time-domain audio signal allows the sound of the device to be detected more accurately, improving the accuracy of sound level detection for electronic musical instrument devices.
It should be noted that, the audio processing method provided in the embodiments of the present application is generally executed by the server 104, and accordingly, the audio processing apparatus is generally disposed in the server 104. Implementation details of the technical solutions of the embodiments of the present application are set forth in detail below.
Referring to fig. 2, fig. 2 is a flowchart of an audio processing method according to an exemplary embodiment of the present application. The executing entity of the audio processing method provided in this embodiment is a server, which may specifically be the server 104 shown in fig. 1. The audio processing method shown in fig. 2 includes steps S210 to S260, described in detail below.
In step S210, a time-domain audio signal corresponding to the electronic musical instrument device to be detected is acquired.
In one embodiment, the electronic musical instrument device is a device that generates sound from an electronic signal, and may be an electronic organ, an electronic piano, an electronic synthesizer, an electronic drum, or the like. The electronic musical instrument device to be detected is a device that requires sound level detection. The device can be made to generate sound through a preset control instruction, which can be triggered by pressing a physical key on the device; the generated sound is recorded to obtain a sound signal, and the time-domain audio signal is a sound signal of a certain time period extracted from the sound signal generated by the device.
Optionally, in an embodiment, the audio processing method may further include: and carrying out sampling processing on sound signals generated by the electronic musical instrument equipment to be detected based on a preset sampling frequency to obtain time domain audio signals corresponding to the electronic musical instrument equipment to be detected.
When the time-domain audio signal corresponding to the electronic musical instrument equipment to be detected is obtained, a preset control instruction is input to the electronic musical instrument equipment to generate sound, the sound generated by the electronic musical instrument equipment is recorded through the recording equipment to obtain a sound signal, after the sound signal generated by the electronic musical instrument equipment is obtained, the sound signal generated by the electronic musical instrument equipment to be detected can be sampled through a preset sampling frequency to obtain a time-domain audio signal, and further, the time-domain audio signal is obtained, for example, the sound signal generated by the electronic musical instrument equipment to be detected is sampled every 2 seconds, and the time length of the audio signal sampled every time can be 0.5 seconds.
In step S220, a frequency domain conversion process is performed on the time domain audio signal, to obtain a frequency domain audio signal.
In one embodiment, after the time-domain audio signal is obtained, frequency-domain conversion processing must be performed on it to obtain the corresponding frequency-domain audio signal, so that fundamental frequency detection can be performed on the sound signal generated by the electronic musical instrument device to be detected. The frequency-domain conversion may specifically be a Fourier transform of the time-domain audio signal, which yields the corresponding frequency-domain audio signal.
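A minimal sketch of the frequency-domain conversion step, using NumPy's real FFT; the Hann window is an added implementation choice not mentioned in the text:

```python
import numpy as np

def to_frequency_domain(segment: np.ndarray, sr: int):
    """Convert a time-domain window to a magnitude spectrum via the FFT.

    Returns the bin frequencies (Hz) and the magnitude at each bin.
    The Hann window reduces spectral leakage; the text only specifies
    a Fourier transform, so the windowing is an assumption.
    """
    windowed = segment * np.hanning(len(segment))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    return freqs, spectrum
```

For a clean instrument tone, the fundamental appears as a strong low-frequency peak in the returned spectrum.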
In step S230, the frequency-domain audio signal is input into a pre-trained machine learning model, which is obtained by training on sample data comprising frequency-domain audio signals and the fundamental frequency labels corresponding to them.
In one embodiment, the frequency-domain audio signal obtained by the frequency-domain conversion processing is input into a pre-trained machine learning model, which is obtained by training a machine learning model on training sample data. The machine learning model may be a CNN (Convolutional Neural Network) model, a deep neural network model, or the like.
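The patent does not describe the network architecture; as a hedged illustration of the kind of forward pass a small 1-D CNN over the spectrum would perform (convolution, ReLU, pooling, then a softmax over fundamental frequency labels), with layer sizes chosen arbitrarily:

```python
import numpy as np

def conv1d_relu(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """One 1-D convolution layer with ReLU over the input spectrum."""
    feats = np.stack([np.convolve(x, k, mode="valid") for k in kernels])
    return np.maximum(feats, 0.0)

def tiny_cnn_forward(spectrum: np.ndarray, kernels: np.ndarray,
                     weights: np.ndarray) -> np.ndarray:
    """Minimal CNN forward pass: conv -> ReLU -> global average pool ->
    linear -> softmax over fundamental-frequency labels.

    Shapes: kernels (n_kernels, k), weights (n_labels, n_kernels).
    This is an illustrative sketch, not the patent's architecture.
    """
    feats = conv1d_relu(spectrum, kernels)   # (n_kernels, L)
    pooled = feats.mean(axis=1)              # global average pooling
    logits = weights @ pooled                # one score per label
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()                   # probability per label
```

The label with the highest probability would be taken as the predicted fundamental frequency label.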
Referring to fig. 3, fig. 3 is a flowchart of an audio processing method according to an exemplary embodiment of the present application, and the audio processing method in this embodiment may include steps S310 to S320, which are described in detail below.
In step S310, training set sample data for training the machine learning model to be trained is obtained, where each piece of sample data in the training set includes a frequency-domain audio signal and the fundamental frequency label corresponding to it.
In one embodiment, each piece of sample data in the training set includes a frequency-domain audio signal and a fundamental frequency label generated from the fundamental frequency corresponding to that signal.
In step S320, the machine learning model to be trained is trained by the training set sample data, and a trained machine learning model is obtained.
In one embodiment, the training set sample data is input into the machine learning model, and the model to be trained is trained on this data to obtain a trained machine learning model. Training the machine learning model means adjusting the coefficients in the network structure corresponding to the model so that, when an input frequency-domain audio signal is processed by those coefficients, the output is the corresponding fundamental frequency label.
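The coefficient adjustment described above can be sketched with a minimal gradient-descent loop; a plain softmax classifier stands in for the full network here, so this illustrates the principle rather than the patent's actual training procedure:

```python
import numpy as np

def train_softmax(features: np.ndarray, labels: np.ndarray,
                  n_labels: int, lr: float = 0.1,
                  epochs: int = 200, seed: int = 0) -> np.ndarray:
    """Adjust model coefficients W so each frequency-domain feature
    vector maps to its fundamental-frequency label (cross-entropy loss,
    full-batch gradient descent). Illustrative stand-in for CNN training."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(n_labels, features.shape[1]))
    for _ in range(epochs):
        logits = features @ W.T
        exp = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = exp / exp.sum(axis=1, keepdims=True)
        grad = probs.copy()
        grad[np.arange(len(labels)), labels] -= 1.0  # dL/dlogits
        W -= lr * grad.T @ features / len(labels)
    return W
```

After training, the predicted label for a new feature vector is the argmax of its logits.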
Referring to fig. 4, fig. 4 is a flowchart of an audio processing method according to an exemplary embodiment of the present application, and the audio processing method in this embodiment may include steps S410 to S430, which are described in detail below.
In step S410, test set sample data for validating the trained machine learning model is obtained, where each piece of sample data in the test set includes a frequency-domain audio signal and the fundamental frequency label corresponding to it.
In one embodiment, the trained machine learning model also needs to be validated to ensure that it achieves the expected effect. When validating the trained model, test set sample data can be obtained in which each piece of sample data includes a frequency-domain audio signal and the fundamental frequency label corresponding to it.
In step S420, the frequency-domain audio signal of each piece of test set sample data is input into the trained machine learning model, which outputs a predicted fundamental frequency label.
In one embodiment, the frequency-domain audio signal of each piece of test set sample data is input into the trained machine learning model, and the coefficients in the network structure corresponding to the model process the signal to produce a predicted fundamental frequency label for each piece of sample data.
In step S430, if the proportion of test set samples whose fundamental frequency label is consistent with the predicted fundamental frequency label, relative to the total number of test set samples, exceeds a predetermined ratio threshold, the trained machine learning model is identified as the pre-trained machine learning model.
In one embodiment, if the proportion of test set samples whose fundamental frequency label matches the predicted label exceeds the predetermined ratio threshold, the trained machine learning model meets the expected functional effect and can be identified as the pre-trained machine learning model; otherwise, it must continue to be trained on the training set sample data until it meets the expected functional effect.
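This acceptance check reduces to a simple ratio comparison; the 0.95 threshold below is an assumed value, since the text only says "predetermined ratio threshold":

```python
def passes_validation(true_labels, predicted_labels,
                      ratio_threshold: float = 0.95) -> bool:
    """Accept the trained model only if the share of test samples whose
    predicted fundamental-frequency label matches the true label exceeds
    the predetermined ratio threshold (0.95 is an assumed example)."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels) > ratio_threshold
```

If this returns `False`, training continues on the training set, as described above.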
Referring to fig. 5, fig. 5 is a specific flowchart of step S230 of the audio processing method according to an exemplary embodiment of the present application, and step S230 may include steps S510 to S520, which are described in detail below.
In step S510, a frequency-domain audio signal within a predetermined frequency range is selected from the frequency-domain audio signal, yielding the selected frequency-domain audio signal.
In one embodiment, before the frequency-domain audio signal is input into the pre-trained machine learning model, the portions of the signal corresponding to environmental noise need to be filtered out, since the signal may contain environmental noise and filtering it improves the accuracy of the fundamental frequency determined for the signal.
Specifically, the frequency range occupied by the frequency-domain audio signal may first be detected, and the signal within a predetermined frequency range selected on that basis. Because the sound signals generated by each type of electronic musical instrument device occupy a fixed frequency range, and the frequency range of environmental noise falls outside it, selecting the frequency-domain audio signal within the predetermined range filters out the components corresponding to environmental noise. The predetermined frequency range is associated with the type of the electronic musical instrument device to be detected, so the correspondence between device types and the frequency ranges of the sound signals they generate can be stored in a storage area of the system; when the predetermined frequency range for a device to be detected needs to be acquired, it can be determined from the device's type and this stored correspondence.
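A sketch of the band selection, assuming the spectrum is held as parallel NumPy arrays of bin frequencies and magnitudes; `fmin`/`fmax` would come from the stored type-to-frequency-range correspondence:

```python
import numpy as np

def select_band(freqs: np.ndarray, spectrum: np.ndarray,
                fmin: float, fmax: float):
    """Keep only the frequency bins inside the instrument's expected
    range, discarding bins attributable to environmental noise.

    `fmin`/`fmax` are looked up from the instrument type in practice.
    """
    mask = (freqs >= fmin) & (freqs <= fmax)
    return freqs[mask], spectrum[mask]
```

The filtered spectrum is what gets passed to the pre-trained model in step S520.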
In step S520, the selected frequency domain audio signal is input into a pre-trained machine learning model.
In one embodiment, the selected frequency-domain audio signal is input into the pre-trained machine learning model. Because environmental noise has been filtered from the frequency-domain audio signal corresponding to the electronic musical instrument equipment to be detected, the accuracy of the fundamental frequency detected by the pre-trained machine learning model is effectively improved.
Still referring to fig. 2, in step S240, a fundamental frequency label corresponding to the frequency-domain audio signal, output by the pre-trained machine learning model, is obtained.
In one embodiment, when the frequency-domain audio signal is input into the pre-trained machine learning model, the fundamental frequency label that the model outputs for that signal is obtained.
In step S250, the target scale and the target cent value are determined according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cent values.
In one embodiment, a scale refers to a series of notes generated by the electronic musical instrument equipment to be detected, arranged in order of pitch, and a cent value is a quantized measure of the intervals within a scale; different cent values reflect different sound frequencies. After the fundamental frequency label corresponding to the frequency-domain audio signal is obtained, the target scale and the target cent value corresponding to the signal can be determined from that label and a preset correspondence between fundamental frequency labels and scales and cent values, where the preset correspondence is generated from the international-standard relationship between the two parameters, scale and cent value, and their corresponding frequencies.
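Under twelve-tone equal temperament with the reference A4 = 440 Hz (an assumption for illustration; the text only says the correspondence follows the international standard), the mapping from a detected fundamental frequency to a scale note and a cent deviation can be sketched as:

```python
import math

A4 = 440.0  # assumed reference pitch

def freq_to_scale_and_cents(f0):
    """Map a fundamental frequency to the nearest equal-tempered note
    and its deviation from that note in cents (1 semitone = 100 cents)."""
    semitones = 12 * math.log2(f0 / A4)   # semitone offset relative to A4
    nearest = round(semitones)            # nearest note on the scale
    cents = 100 * (semitones - nearest)   # signed deviation in cents
    names = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
    name = names[nearest % 12]
    octave = 4 + (nearest + 9) // 12      # octave number changes at C
    return f"{name}{octave}", cents

note, cents = freq_to_scale_and_cents(440.0)   # -> ("A4", 0.0)
```

A correspondence table of labels to (scale, cent value) pairs, as the text describes, would amount to precomputing this mapping for each fundamental frequency label.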
In step S260, a pitch detection result of the electronic musical instrument equipment to be detected is determined based on the determined target scale and target cent value and the standard scale and standard cent value corresponding to the electronic musical instrument equipment to be detected.
In one embodiment, the standard scale and standard cent value corresponding to the electronic musical instrument equipment to be detected are the scale and cent value corresponding to the sound signal that the equipment should generate. The target scale and target cent value corresponding to the frequency-domain audio signal are compared with the standard scale and standard cent value, respectively, to determine the pitch detection result of the electronic musical instrument equipment to be detected.
Optionally, in one embodiment, step S260 may specifically include: if the scale difference between the target scale and the standard scale is smaller than a predetermined scale difference and the cent difference between the target cent value and the standard cent value is smaller than a predetermined cent difference, determining that the pitch detection result of the electronic musical instrument equipment to be detected meets the predetermined detection requirement; if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference and/or the cent difference between the target cent value and the standard cent value is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the electronic musical instrument equipment to be detected does not meet the predetermined detection requirement.
In one embodiment, the target scale corresponding to the frequency-domain audio signal is compared with the standard scale corresponding to the electronic musical instrument equipment to be detected to determine the scale difference between them, and the cent difference between the target cent value corresponding to the signal and the standard cent value corresponding to the equipment is likewise determined. If the scale difference is smaller than the predetermined scale difference and the cent difference is smaller than the predetermined cent difference, the pitch detection result of the electronic musical instrument equipment to be detected is determined to meet the predetermined detection requirement. Otherwise, if the scale difference is greater than or equal to the predetermined scale difference and/or the cent difference is greater than or equal to the predetermined cent difference, the pitch detection result is determined not to meet the predetermined detection requirement. It will be appreciated that the predetermined scale difference may be set to 1 and the predetermined cent difference may be set to 2; of course, other values may be used, and no limitation is imposed here.
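The comparison logic of this embodiment can be sketched as below, using the example thresholds from the text (scale difference 1, cent difference 2); representing scales as numeric indices and cent values as numbers is an assumption made for illustration:

```python
def pitch_detection_result(target_scale, target_cents, std_scale, std_cents,
                           max_scale_diff=1, max_cent_diff=2):
    """Return True when BOTH differences are strictly below the preset
    thresholds (the 'meets the detection requirement' branch); any
    difference at or above its threshold fails the check."""
    scale_diff = abs(target_scale - std_scale)
    cent_diff = abs(target_cents - std_cents)
    return scale_diff < max_scale_diff and cent_diff < max_cent_diff
```

Note the asymmetry in the text is preserved: passing requires both conditions, while failing requires only one ("and/or").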
As can be seen from the above, frequency-domain conversion processing is performed on the time-domain audio signal to obtain a frequency-domain audio signal, fundamental frequency detection is performed on the frequency-domain audio signal to obtain the fundamental frequency of the time-domain audio signal corresponding to the electronic musical instrument equipment to be detected, and the scale and cent value of the sound generated by the equipment are determined based on the detected fundamental frequency, thereby implementing pitch detection on that sound. Because the fundamental frequency of the time-domain audio signal generated by the equipment is detected directly, the accuracy of the pitch detection can be improved.
Referring to fig. 6, fig. 6 is a flowchart of an audio processing method according to an exemplary embodiment of the present application, and the audio processing method in this embodiment may include steps S610 to S620, which are described in detail below.
In step S610, a notification message of the pitch detection is generated based on the pitch detection result.
In one embodiment, after the pitch detection result is obtained, a notification message of the pitch detection may be generated according to that result; the notification message may be a voice message or a text message, which is not limited here.
In step S620, a predetermined notification operation is performed based on the generated notification message.
In one embodiment, after the notification message of the pitch detection is generated, a predetermined notification operation may be performed based on it. When the notification message is a voice message, the pitch detection result may be played through a voice device of the electronic device; when it is a text message, the message may be displayed through a display device of the electronic device, for example, in a display interface of the electronic device.
The technical solution of the embodiment shown in fig. 6 enables a user to obtain the pitch detection result of the electronic musical instrument equipment to be detected in a timely manner.
Referring to fig. 7, fig. 7 is a block diagram of an audio processing apparatus according to an embodiment of the present application, which may be integrated in an electronic device. An audio processing apparatus 700 according to an embodiment of the present application may include: a first acquisition unit 710, a conversion unit 720, an input unit 730, a second acquisition unit 740, a first execution unit 750, and a second execution unit 760. The first acquisition unit 710 is configured to acquire a time-domain audio signal corresponding to the electronic musical instrument equipment to be detected; the conversion unit 720 is configured to perform frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; the input unit 730 is configured to input the frequency-domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data comprising frequency-domain audio signals and the fundamental frequency labels corresponding to them; the second acquisition unit 740 is configured to acquire the fundamental frequency label corresponding to the frequency-domain audio signal output by the pre-trained machine learning model; the first execution unit 750 is configured to determine a target scale and a target cent value according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cent values; and the second execution unit 760 is configured to determine a pitch detection result of the electronic musical instrument equipment to be detected based on the determined target scale and target cent value and the standard scale and standard cent value corresponding to the equipment.
Optionally, the audio processing device further comprises: and the sampling unit is used for sampling the sound signals generated by the electronic musical instrument equipment to be detected based on a preset sampling frequency to obtain time domain audio signals corresponding to the electronic musical instrument equipment to be detected.
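The sampling unit's behaviour can be sketched as follows; the 44.1 kHz rate and the sine source are chosen purely for illustration, and a real implementation would read from an audio capture device:

```python
import numpy as np

def sample_signal(analog, duration_s, sample_rate):
    """Sample a continuous signal (given here as a function of time) at the
    preset sampling frequency to obtain a time-domain audio signal."""
    n = int(duration_s * sample_rate)
    t = np.arange(n) / sample_rate          # sample instants in seconds
    return np.array([analog(x) for x in t])

# Half a second of a 440 Hz tone sampled at 44.1 kHz.
audio = sample_signal(lambda x: np.sin(2 * np.pi * 440.0 * x), 0.5, 44100)
```

The resulting array is the time-domain audio signal that the conversion unit would transform into the frequency domain.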
Alternatively, the input unit 730 is configured to: selecting a frequency domain audio signal within a preset frequency range from the frequency domain audio signals to obtain a selected frequency domain audio signal; and inputting the selected frequency domain audio signals into the pre-trained machine learning model.
Optionally, the second execution unit 760 is configured to: if the scale difference between the target scale and the standard scale is smaller than a predetermined scale difference and the cent difference between the target cent value and the standard cent value is smaller than a predetermined cent difference, determine that the pitch detection result of the electronic musical instrument equipment to be detected meets the predetermined detection requirement; and if the scale difference is greater than or equal to the predetermined scale difference and/or the cent difference is greater than or equal to the predetermined cent difference, determine that the pitch detection result does not meet the predetermined detection requirement.
Optionally, the audio processing apparatus further comprises: a first generation unit, configured to generate a notification message of the pitch detection based on the pitch detection result; and a third execution unit, configured to perform a predetermined notification operation based on the generated notification message.
Optionally, the audio processing device further comprises: the second acquisition unit is used for acquiring training set sample data for training a machine learning model to be trained, and each piece of sample data in the training set sample data comprises a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal; and the training unit is used for training the machine learning model to be trained through the training set sample data to obtain a trained machine learning model.
Optionally, the audio processing apparatus further comprises: a third acquisition unit, configured to acquire test-set sample data for verifying the trained machine learning model, each item of which comprises a frequency-domain audio signal and the fundamental frequency label corresponding to it; a fourth execution unit, configured to input the frequency-domain audio signal of each test-set sample item into the trained machine learning model and obtain a predicted fundamental frequency label as output; and an identification unit, configured to identify the trained machine learning model as the pre-trained machine learning model if the ratio of the number of test-set sample items whose fundamental frequency label is consistent with the predicted fundamental frequency label to the total number of test-set sample items exceeds a predetermined ratio threshold.
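The check performed by the identification unit amounts to a label-accuracy test against a ratio threshold; a minimal sketch (the 0.95 default is an assumed value, since the text does not specify the threshold):

```python
def accept_model(true_labels, predicted_labels, ratio_threshold=0.95):
    """Accept the trained model as the 'pre-trained' model when the
    fraction of test samples whose predicted fundamental frequency label
    matches the ground-truth label exceeds the threshold."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels) > ratio_threshold
```

If the ratio does not exceed the threshold, the model would be trained further or retrained before being used for detection.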
The implementation process of the functions and roles of each unit in the above apparatus is described in detail in the implementation process of the corresponding steps in the audio processing method, and will not be repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments disclosed herein. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the various steps of the methods herein are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all of the illustrated steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," a "module," or a "system."
Referring to fig. 8, fig. 8 is an exemplary block diagram of an electronic device for implementing the above-described audio processing method according to an exemplary embodiment of the present application. The electronic device 800 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present application.
As shown in fig. 8, the electronic device 800 is embodied in the form of a general purpose computing device. Components of electronic device 800 may include, but are not limited to: at least one processing unit 810, at least one storage unit 820, and a bus 830 connecting the various system components (including the storage unit 820 and the processing unit 810).
Wherein the storage unit stores program code that is executable by the processing unit 810 such that the processing unit 810 performs steps according to various exemplary embodiments of the present application described in the above section of the "exemplary method" of the present specification. For example, the processing unit 810 may perform steps S210 to S260 as shown in fig. 2.
The storage unit 820 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 8201 and/or cache memory 8202, and may further include Read Only Memory (ROM) 8203.
Storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 830 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 800, and/or any device (e.g., router, modem, etc.) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 840. Also, electronic device 800 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 860. As shown, network adapter 860 communicates with other modules of electronic device 800 over bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 800, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a usb disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, a computer readable storage medium is also provided, on which a program product capable of implementing the method described in the present specification is stored. In some possible implementations, the various aspects of the present application may also be implemented in the form of a program product comprising program code for causing an electronic device to carry out the steps according to the various exemplary embodiments of the present application as described in the "exemplary methods" section of this specification, when the program product is run on a terminal device.
Referring to fig. 9, fig. 9 illustrates a computer readable storage medium for implementing the above audio processing method according to an exemplary embodiment of the present application. Fig. 9 depicts a program product 900 for implementing the above-described method according to an embodiment of the present application, which may employ a portable compact disc read-only memory (CD-ROM), includes program code, and may be run on an electronic device such as a personal computer. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described figures are only illustrative of the processes involved in the method according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (9)

1. An audio processing method, comprising:
acquiring a time domain audio signal corresponding to electronic musical instrument equipment to be detected;
performing frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal;
inputting the frequency domain audio signals into a pre-trained machine learning model, wherein the pre-trained machine learning model is obtained through training sample data comprising the frequency domain audio signals and fundamental frequency labels corresponding to the frequency domain audio signals;
acquiring a fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model;
determining a target scale and a target cent value according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cent values;
determining a pitch detection result of the electronic musical instrument equipment to be detected based on the determined target scale and target cent value and the standard scale and standard cent value corresponding to the electronic musical instrument equipment to be detected;
if the scale difference between the target scale and the standard scale is smaller than a predetermined scale difference and the cent difference between the target cent value and the standard cent value is smaller than a predetermined cent difference, determining that the pitch detection result of the electronic musical instrument equipment to be detected meets the predetermined detection requirement;
and if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference and/or the cent difference between the target cent value and the standard cent value is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the electronic musical instrument equipment to be detected does not meet the predetermined detection requirement.
2. The audio processing method according to claim 1, characterized in that the audio processing method further comprises:
and carrying out sampling processing on the sound signals generated by the electronic musical instrument equipment to be detected based on a preset sampling frequency to obtain time domain audio signals corresponding to the electronic musical instrument equipment to be detected.
3. The audio processing method of claim 1, wherein the inputting the frequency domain audio signal into a pre-trained machine learning model comprises:
selecting a frequency domain audio signal within a preset frequency range from the frequency domain audio signals to obtain a selected frequency domain audio signal;
and inputting the selected frequency domain audio signals into the pre-trained machine learning model.
4. The audio processing method according to claim 1, characterized in that the audio processing method further comprises:
generating a notification message of the pitch detection based on the pitch detection result;
a predetermined notification operation is performed based on the generated notification message.
5. The audio processing method according to claim 1, characterized in that the audio processing method further comprises:
acquiring training set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal;
and training the machine learning model to be trained through the training set sample data to obtain a trained machine learning model.
6. The audio processing method according to claim 5, characterized in that the audio processing method further comprises:
acquiring test set sample data for verifying a trained machine learning model, wherein each piece of sample data in the test set sample data comprises a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal;
inputting the frequency domain audio signal of each sample data of the test set to a trained machine learning model, and outputting to obtain a predicted fundamental frequency label;
and identifying the trained machine learning model as the pre-trained machine learning model if the ratio of the number of test-set sample data items whose fundamental frequency label is consistent with the predicted fundamental frequency label to the total number of sample data items in the test set exceeds a predetermined ratio threshold.
7. An audio processing apparatus, comprising:
the first acquisition unit is used for acquiring a time domain audio signal corresponding to the electronic musical instrument equipment to be detected;
the conversion unit is used for carrying out frequency domain conversion processing on the time domain audio signals to obtain frequency domain audio signals;
the input unit is used for inputting the frequency domain audio signals into a pre-trained machine learning model, wherein the pre-trained machine learning model is obtained through training sample data comprising the frequency domain audio signals and fundamental frequency labels corresponding to the frequency domain audio signals;
The second acquisition unit is used for acquiring a fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model;
the first execution unit is used for determining a target scale and a target cent value according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cent values;
the second execution unit is used for determining a pitch detection result of the electronic musical instrument equipment to be detected based on the determined target scale and target cent value and the standard scale and standard cent value corresponding to the electronic musical instrument equipment to be detected; if the scale difference between the target scale and the standard scale is smaller than a predetermined scale difference and the cent difference between the target cent value and the standard cent value is smaller than a predetermined cent difference, determining that the pitch detection result of the electronic musical instrument equipment to be detected meets the predetermined detection requirement; and if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference and/or the cent difference between the target cent value and the standard cent value is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the electronic musical instrument equipment to be detected does not meet the predetermined detection requirement.
8. An electronic device comprising a memory and a processor, the memory having stored therein computer readable instructions that, when executed by the processor, cause the processor to perform the audio processing method of any of claims 1-6.
9. A storage medium storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the audio processing method of any of claims 1 to 6.
CN202011341834.0A 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium Active CN112489682B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011341834.0A CN112489682B (en) 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium
PCT/CN2021/083398 WO2021213135A1 (en) 2020-11-25 2021-03-26 Audio processing method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011341834.0A CN112489682B (en) 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112489682A CN112489682A (en) 2021-03-12
CN112489682B true CN112489682B (en) 2023-05-23

Family

ID=74934478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011341834.0A Active CN112489682B (en) 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112489682B (en)
WO (1) WO2021213135A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489682B (en) * 2020-11-25 2023-05-23 平安科技(深圳)有限公司 Audio processing method, device, electronic equipment and storage medium
CN113744756B (en) * 2021-08-11 2024-08-16 浙江讯飞智能科技有限公司 Equipment quality inspection and audio data expansion method, related device, equipment and medium
CN113763930B (en) * 2021-11-05 2022-03-11 深圳市倍轻松科技股份有限公司 Voice analysis method, device, electronic equipment and computer readable storage medium
CN117041858B (en) * 2023-08-14 2024-04-09 央广云听文化传媒有限公司 Space audio playing optimization method and device
CN116861316B (en) * 2023-09-04 2023-12-15 国网浙江省电力有限公司余姚市供电公司 Electrical appliance monitoring method and device
CN116884438B (en) * 2023-09-08 2023-12-01 杭州育恩科技有限公司 Method and system for detecting musical instrument training sound level based on acoustic characteristics

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017111760A (en) * 2015-12-18 2017-06-22 カシオ計算機株式会社 Emotion estimation device creation method, emotion estimation device creation device, emotion estimation method, emotion estimation device and program
CN107705775A (en) * 2017-08-17 2018-02-16 广东工业大学 Multi-instrument tuning method based on RBF neural network
CN108172224A (en) * 2017-12-19 2018-06-15 浙江大学 Machine-learning-defense-based method against inaudible voice command control of voice assistants
CN207572057U (en) * 2017-07-28 2018-07-03 得理电子(上海)有限公司 Digital pitch accuracy detection module and pitch accuracy detection system
CN108766440A (en) * 2018-05-28 2018-11-06 平安科技(深圳)有限公司 Speaker separation model training method, two-speaker separation method, and related device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7279631B2 (en) * 2002-07-16 2007-10-09 Line 6, Inc. Stringed instrument with embedded DSP modeling for modeling acoustic stringed instruments
CN205388515U (en) * 2016-03-21 2016-07-20 王治泽 Panel shock frequency detector
CN111798814A (en) * 2020-06-23 2020-10-20 广州欧米勒钢琴有限公司 Self-service piano tuning system
CN112489682B (en) * 2020-11-25 2023-05-23 平安科技(深圳)有限公司 Audio processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112489682A (en) 2021-03-12
WO2021213135A1 (en) 2021-10-28

Similar Documents

Publication Publication Date Title
CN112489682B (en) Audio processing method, device, electronic equipment and storage medium
CN108630190B (en) Method and apparatus for generating speech synthesis model
US10095610B2 (en) Testing applications with a defined input format
US11004012B2 (en) Assessment of machine learning performance with limited test data
JP6732296B2 (en) Audio information processing method and device
US9451304B2 (en) Sound feature priority alignment
CN109767752A (en) Speech synthesis method and device based on attention mechanism
JP6967197B2 (en) Anomaly detection device, anomaly detection method and program
CN110245232B (en) Text classification method, device, medium and computing equipment
JP7268711B2 (en) SIGNAL PROCESSING SYSTEM, SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND PROGRAM
CN109346109B (en) Fundamental frequency extraction method and device
CN111783450B (en) Phrase extraction method and device in corpus text, storage medium and electronic equipment
CN108986843B (en) Audio data processing method and device, medium and computing equipment
CN108597538B (en) Evaluation method and system of speech synthesis system
CN111653274B (en) Wake-up word recognition method, device and storage medium
CN112309409A (en) Audio correction method and related device
CN116580702A (en) Speech recognition method, device, computer equipment and medium based on artificial intelligence
US10586529B2 (en) Processing of speech signal
CN114302301B (en) Frequency response correction method and related product
CN114999450A (en) Homomorphic and heteromorphic word recognition method and device, electronic equipment and storage medium
CN113053409A (en) Audio evaluation method and device
CN112951274A (en) Voice similarity determination method and device, and program product
Ranny et al. Separation of overlapping sound using nonnegative matrix factorization
CN110750979B (en) Method for determining continuity of chapters and detection device
CN116386611B (en) Denoising method for teaching sound field environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant