CN112489682A - Audio processing method and device, electronic equipment and storage medium

Audio processing method and device, electronic equipment and storage medium

Info

Publication number
CN112489682A
Authority
CN
China
Prior art keywords
audio signal
domain audio
frequency domain
scale
musical instrument
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011341834.0A
Other languages
Chinese (zh)
Other versions
CN112489682B (en)
Inventor
蒋慧军
徐伟
杨艾琳
姜凯英
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202011341834.0A priority Critical patent/CN112489682B/en
Publication of CN112489682A publication Critical patent/CN112489682A/en
Priority to PCT/CN2021/083398 priority patent/WO2021213135A1/en
Application granted granted Critical
Publication of CN112489682B publication Critical patent/CN112489682B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band
    • G10L25/27 - Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques characterised by the analysis technique using neural networks
    • G10L25/48 - Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination
    • G10L25/60 - Speech or voice analysis techniques specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The application discloses an audio processing method, an audio processing device, an electronic device and a storage medium, and relates to the field of computer technologies. The audio processing method includes the following steps: acquiring a time domain audio signal corresponding to an electronic musical instrument device to be detected; performing frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal; inputting the frequency domain audio signal into a pre-trained machine learning model; obtaining a fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model; determining a target scale and a target score according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and scores; and determining an intonation detection result of the electronic musical instrument device to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the device. In addition, the time domain audio signal can be uploaded by the client to any node server in a blockchain server system. The application improves the accuracy of intonation detection for electronic musical instrument devices.

Description

Audio processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing audio, an electronic device, and a storage medium.
Background
Electronic musical instrument devices, which generate sound from electronic signals, are widely popular. When an electronic musical instrument device is produced, the sound signal it generates needs to be analyzed to detect the intonation of that signal, and only qualified devices can be put on the market for sale.
In the related art, an electronic musical instrument device is detected mainly by comparing the similarity between the sound signal it generates and a standard sound signal, and only a device whose similarity meets a predetermined condition is deemed qualified. This approach can detect a fault only when the generated sound signal differs greatly from the standard sound signal, and therefore suffers from the technical problem of low detection precision.
Disclosure of Invention
Based on the above, the present application provides an audio processing method, an audio processing apparatus, an electronic device and a storage medium, which improve the accuracy of intonation detection for electronic musical instrument devices.
In a first aspect, the present application provides an audio processing method, including: acquiring a time domain audio signal corresponding to an electronic musical instrument device to be detected; performing frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal; inputting the frequency domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data containing frequency domain audio signals and their corresponding fundamental frequency labels; obtaining the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model; determining a target scale and a target score according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and scores; and determining an intonation detection result of the electronic musical instrument device to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the device.
In a second aspect, the present application provides an audio processing apparatus, comprising: a first acquisition unit, configured to acquire a time domain audio signal corresponding to the electronic musical instrument device to be detected; a conversion unit, configured to perform frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal; an input unit, configured to input the frequency domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data containing frequency domain audio signals and their corresponding fundamental frequency labels; a second acquisition unit, configured to obtain the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model; a first execution unit, configured to determine a target scale and a target score according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and scores; and a second execution unit, configured to determine an intonation detection result of the electronic musical instrument device to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the device.
In a third aspect, the present application provides an electronic device comprising a memory and a processor, the memory having stored therein computer-readable instructions, which, when executed by the processor, cause the processor to perform the steps of the audio processing method described above.
In a fourth aspect, the present application provides a storage medium having stored thereon computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the audio processing method described above.
The technical solutions provided by the embodiments of the present application can have the following beneficial effects: frequency domain conversion processing is performed on the time domain audio signal to obtain a frequency domain audio signal; fundamental frequency detection is performed on the frequency domain audio signal to obtain the fundamental frequency of the time domain audio signal corresponding to the electronic musical instrument device to be detected; and the scale and score of the sound generated by the device are determined based on the detected fundamental frequency, thereby realizing intonation detection of the sound generated by the electronic musical instrument device to be detected.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
Fig. 2 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 4 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 5 is a specific flowchart illustrating the step S230 of the audio processing method according to an exemplary embodiment of the present application.
Fig. 6 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application.
Fig. 7 is a block diagram of an audio processing device according to one embodiment of the present application.
Fig. 8 is a block diagram illustrating an example of an electronic device for implementing the audio processing method according to an example embodiment of the present application.
Fig. 9 illustrates a computer-readable storage medium for implementing the audio processing method according to an exemplary embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include an electronic musical instrument device 101 to be detected, a network 102, a client 103 and a server 104. The client 103 obtains a time domain audio signal corresponding to the electronic musical instrument device 101 to be detected and uploads it to the server 104. The server 104 may be a server providing an intonation detection service, and the client 103 may be one or more of a smart phone, a tablet computer and a portable computer, and certainly may also be a desktop computer and the like. The network 102 is the medium providing communication links between the electronic musical instrument device 101 to be detected, the client 103 and the server 104, and may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the numbers of the electronic musical instrument devices 101 to be detected, the network 102, the clients 103, and the servers 104 in fig. 1 are merely illustrative. There may be any number of electronic musical instrument devices 101 to be detected, networks 102, clients 103, and servers 104 according to implementation needs, for example, the server 104 may be a server cluster composed of a plurality of servers, and the like.
Optionally, when the client 103 uploads the time domain audio signal corresponding to the electronic musical instrument device 101 to be detected to the server 104, the signal may specifically be uploaded to any node server of a blockchain server system; that node server then determines the intonation detection result from the time domain audio signal and stores it.
In this embodiment, the server 104 obtains the time domain audio signal corresponding to the electronic musical instrument device to be detected; performs frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal; inputs the frequency domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data containing frequency domain audio signals and their corresponding fundamental frequency labels; obtains the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model; determines a target scale and a target score according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and scores; and determines an intonation detection result of the electronic musical instrument device to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the device. Compared with comparing the similarity between the sound signal generated by the device and a standard sound signal, detecting the fundamental frequency of the time domain audio signal generated by the device allows the intonation of the generated sound to be detected more precisely, thereby improving the accuracy of intonation detection for electronic musical instrument devices.
It should be noted that the audio processing method provided by the embodiment of the present application is generally executed by the server 104, and accordingly, the audio processing apparatus is generally disposed in the server 104. The details of implementation of the technical solution of the embodiments of the present application are set forth in the following.
Referring to fig. 2, fig. 2 is a flowchart of an audio processing method according to an exemplary embodiment of the present application, an execution subject of the audio processing method according to the embodiment of the present application is a server, and may specifically be the server 104 shown in fig. 1, and the audio processing method shown in fig. 2 includes steps S210 to S260, which are described in detail as follows.
In step S210, a time domain audio signal corresponding to the electronic musical instrument device to be detected is obtained.
In one embodiment, an electronic musical instrument device is a device that generates sound by means of an electronic signal, and may be an electronic organ, an electronic piano, an electronic synthesizer, an electronic drum, or the like. The electronic musical instrument device to be detected is the device that requires intonation detection. Such a device can generate sound through a preset control instruction, which can be triggered, for example, by pressing a physical key on the device; the generated sound is recorded to obtain a sound signal, and the time domain audio signal is the sound signal of a certain time period extracted from the sound signal generated by the device.
Optionally, in an embodiment, the audio processing method may further include: and sampling the sound signal generated by the electronic musical instrument equipment to be detected based on the preset sampling frequency to obtain a time domain audio signal corresponding to the electronic musical instrument equipment to be detected.
When acquiring the time domain audio signal corresponding to the electronic musical instrument device to be detected, a preset control instruction may be input to the device so that it generates sound, and the sound may be recorded by a recording device to obtain a sound signal. After the sound signal generated by the device is obtained, it may be sampled at a preset sampling frequency to obtain time domain audio signals of the time domain dimension. For example, the sound signal generated by the device to be detected may be sampled every 2 seconds, with each sampled audio segment lasting 0.5 seconds.
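For illustration only (this sketch is not part of the disclosed embodiments; the function name and array conventions are assumptions), the sampling scheme described above can be expressed in Python roughly as follows:

```python
import numpy as np

def extract_segments(recording: np.ndarray, sample_rate: int,
                     hop_seconds: float = 2.0, segment_seconds: float = 0.5):
    """Extract fixed-length time domain segments from a recorded sound signal.

    Mirrors the example above: one 0.5 s segment is taken every 2 s.
    `recording` is assumed to be a mono 1-D array of samples.
    """
    hop = int(hop_seconds * sample_rate)
    length = int(segment_seconds * sample_rate)
    segments = []
    for start in range(0, len(recording) - length + 1, hop):
        segments.append(recording[start:start + length])
    return segments
```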
In step S220, the time-domain audio signal is subjected to frequency domain conversion processing to obtain a frequency-domain audio signal.
In an embodiment, once the time domain audio signal is obtained, it must first be subjected to frequency domain conversion processing in order to perform fundamental frequency detection on the sound signal generated by the electronic musical instrument device to be detected. The frequency domain conversion processing may specifically be a Fourier transform of the time domain audio signal, which yields the corresponding frequency domain audio signal.
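As a hedged illustration of the Fourier transform step (the one-sided magnitude spectrum used here is one common choice, not mandated by the disclosure):

```python
import numpy as np

def to_frequency_domain(segment: np.ndarray, sample_rate: int):
    """Convert a time domain segment to a frequency domain representation.

    Returns the one-sided magnitude spectrum and the frequency (Hz) of each bin.
    """
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    return spectrum, freqs
```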
In step S230, the frequency domain audio signal is input into a pre-trained machine learning model, which is obtained by training sample data including the frequency domain audio signal and a fundamental frequency tag corresponding to the frequency domain audio signal.
In one embodiment, the frequency domain audio signal obtained by performing frequency domain conversion on the time domain audio signal is input into a pre-trained machine learning model, which is obtained by training a machine learning model on training sample data. The machine learning model may be a CNN (Convolutional Neural Network) model, a deep neural network model, or the like.
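The disclosure does not fix a network architecture, so the following PyTorch sketch of a small 1-D CNN classifier over spectrum bins is purely an assumed example of the kind of model that could be used:

```python
import torch
import torch.nn as nn

class FundamentalFreqCNN(nn.Module):
    """Minimal 1-D CNN mapping a magnitude spectrum to one of
    `num_labels` fundamental frequency label classes (illustrative only)."""

    def __init__(self, num_labels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),   # fixed-size features regardless of bin count
        )
        self.classifier = nn.Linear(32 * 8, num_labels)

    def forward(self, spectrum: torch.Tensor) -> torch.Tensor:
        # spectrum: (batch, num_bins) -> add a channel dimension for Conv1d
        x = self.features(spectrum.unsqueeze(1))
        return self.classifier(x.flatten(1))
```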
Referring to fig. 3, fig. 3 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application, where the audio processing method in this embodiment may include steps S310 to S320, which is described in detail as follows.
In step S310, training set sample data for training a machine learning model to be trained is obtained, where each sample data in the training set sample data includes a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal.
In one embodiment, each sample data in the training set sample data includes a frequency domain audio signal and a fundamental frequency tag generated according to a fundamental frequency corresponding to the frequency domain audio signal.
In step S320, the machine learning model to be trained is trained through the training set sample data, so as to obtain a trained machine learning model.
In one embodiment, the training set sample data is input into the machine learning model, and the machine learning model to be trained is trained on this data to obtain the trained machine learning model. Training the machine learning model amounts to adjusting the coefficients in the network structure corresponding to the model so that, for an input frequency domain audio signal, the operations performed by those coefficients output the determined fundamental frequency label.
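A minimal training loop sketch, under the assumption that the fundamental frequency labels are encoded as integer class indices and the data comes from a standard PyTorch data loader:

```python
import torch
import torch.nn as nn

def train_model(model: nn.Module, loader, num_epochs: int = 10, lr: float = 1e-3):
    """Adjust the network coefficients on (spectrum, label) training batches."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(num_epochs):
        for spectra, labels in loader:   # training set sample data
            optimizer.zero_grad()
            loss = loss_fn(model(spectra), labels)
            loss.backward()              # gradients drive the coefficient updates
            optimizer.step()
    return model
```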
Referring to fig. 4, fig. 4 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application, where the audio processing method in this embodiment may include steps S410 to S430, which are described in detail as follows.
In step S410, test set sample data for verifying the trained machine learning model is obtained, where each sample data in the test set sample data includes a frequency domain audio signal and a fundamental frequency tag corresponding to the frequency domain audio signal.
In one embodiment, the trained machine learning model may also need to be verified to ensure that the machine learning model meets the expected effect. When the trained machine learning model is verified, test set sample data used for verifying the trained machine learning model can be obtained, wherein each sample data in the test set sample data comprises a frequency domain audio signal and a fundamental frequency tag corresponding to the frequency domain audio signal.
In step S420, the frequency domain audio signal of each sample data of the test set sample data is input to the trained machine learning model, and the predicted fundamental frequency label is output.
In one embodiment, the frequency domain audio signal of each sample data of the test set sample data is input to the trained machine learning model, and each coefficient in the network structure corresponding to the machine learning model processes the frequency domain audio signal contained in each sample data to obtain the predicted fundamental frequency tag for each sample data.
In step S430, if the ratio of the number of sample data pieces in the sample data of the test set, in which the fundamental frequency label is consistent with the predicted fundamental frequency label, to the total number of sample data pieces in the sample data of the test set exceeds a predetermined ratio threshold, the trained machine learning model is identified as a pre-trained machine learning model.
In an embodiment, if the ratio of the number of test set samples whose fundamental frequency label is consistent with the predicted fundamental frequency label to the total number of samples in the test set exceeds a predetermined ratio threshold, the trained machine learning model conforms to the expected functional effect and may be identified as the pre-trained machine learning model; otherwise, the model must be trained further on the training set sample data until it conforms to the expected functional effect.
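The verification criterion above amounts to a simple accuracy check against a predetermined ratio threshold; a sketch (the threshold value 0.95 is an assumed placeholder, not from the disclosure):

```python
import torch

def passes_verification(model, test_loader, ratio_threshold: float = 0.95) -> bool:
    """Return True when the share of test set samples whose predicted label
    matches the true fundamental frequency label exceeds the threshold."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for spectra, labels in test_loader:
            predicted = model(spectra).argmax(dim=1)
            correct += (predicted == labels).sum().item()
            total += labels.numel()
    return total > 0 and correct / total > ratio_threshold
```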
Referring to fig. 5, fig. 5 is a detailed flowchart of step S230 of the audio processing method according to an exemplary embodiment of the present application, and step S230 may include step S510 to step S520, which is described in detail as follows.
In step S510, a frequency domain audio signal within a predetermined frequency range is selected from the frequency domain audio signals, resulting in a selected frequency domain audio signal.
In one embodiment, before the frequency domain audio signal is input into the pre-trained machine learning model, the components of the signal corresponding to environmental noise need to be filtered out: the frequency domain audio signal contains environmental noise, and removing it improves the accuracy of the fundamental frequency determined for the signal.
Specifically, the frequency range in which the frequency domain audio signal lies may be detected, and the frequency domain audio signal within a predetermined frequency range may be selected based on the detected range. Because the sound signals generated by each type of electronic musical instrument device lie in a fixed frequency range, and the frequency range of environmental noise falls outside it, selecting the frequency domain audio signal within the predetermined frequency range filters out the components corresponding to environmental noise. The predetermined frequency range is associated with the type of the electronic musical instrument device to be detected, so the correspondence between device types and the frequency ranges of the sound signals they generate can be stored in a storage area of the system; when the predetermined frequency range for a device to be detected is needed, it can be determined from the device type and this correspondence.
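A sketch of the range selection step; the mapping from instrument type to frequency range and the numeric bounds are illustrative placeholders, not values from the disclosure (bins outside the range are zeroed so the model input keeps a fixed size):

```python
import numpy as np

# Assumed correspondence between instrument type and expected frequency range (Hz).
INSTRUMENT_FREQ_RANGES = {
    "electronic_piano": (27.5, 4186.0),
    "electronic_organ": (32.0, 8000.0),
}

def select_frequency_range(spectrum: np.ndarray, freqs: np.ndarray,
                           instrument_type: str) -> np.ndarray:
    """Keep only spectrum bins inside the predetermined range for the
    instrument type, filtering components attributable to environmental noise."""
    low, high = INSTRUMENT_FREQ_RANGES[instrument_type]
    mask = (freqs >= low) & (freqs <= high)
    return np.where(mask, spectrum, 0.0)
```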
In step S520, the selected frequency domain audio signal is input into a pre-trained machine learning model.
In one embodiment, the selected frequency domain audio signal is input into the pre-trained machine learning model. Because environmental noise has been filtered out of the frequency domain audio signal corresponding to the electronic musical instrument device to be detected, the accuracy of the fundamental frequency detected by the pre-trained machine learning model is effectively improved.
Still referring to fig. 2, in step S240, a fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model is obtained.
In one embodiment, after the frequency domain audio signal is input to the pre-trained machine learning model, the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model is obtained.
In step S250, the target scale and the target score are determined according to the fundamental frequency tag and the corresponding relationship between the fundamental frequency tag and the scale and the score.
In one embodiment, the scale refers to the sequence formed by arranging the musical tones generated by the electronic musical instrument device to be detected in order of pitch, and the score refers to a quantized value of the musical interval within each scale degree; different scores reflect different sound frequencies. After the fundamental frequency label corresponding to the frequency domain audio signal is obtained, the target scale and target score corresponding to the signal can be determined from that label and the preset correspondence between fundamental frequency labels and scales and scores; this preset correspondence is generated from the internationally standardized relationship between the two parameters, scale and score, and their corresponding frequencies.
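The disclosure does not spell out the correspondence table, but under standard twelve-tone equal temperament (A4 = 440 Hz) a fundamental frequency can be mapped to the nearest scale degree plus a deviation in cents, which is one plausible reading of "scale" and "score":

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def fundamental_to_scale_and_score(f0_hz: float):
    """Map a fundamental frequency to the nearest equal-temperament scale
    degree and the deviation from it in cents (assumed interpretation)."""
    midi_float = 69.0 + 12.0 * np.log2(f0_hz / 440.0)
    midi_note = int(round(midi_float))
    cents = 100.0 * (midi_float - midi_note)   # quantized interval deviation
    name = NOTE_NAMES[midi_note % 12] + str(midi_note // 12 - 1)
    return name, cents
```

For example, fundamental_to_scale_and_score(440.0) returns ("A4", 0.0).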
In step S260, a intonation detection result of the electronic musical instrument device to be detected is determined based on the determined target scale and target score and the standard scale and standard score corresponding to the electronic musical instrument device to be detected.
In one embodiment, the standard scale and standard score corresponding to the electronic musical instrument device to be detected are the scale and score that the sound signal generated by the device should have. The target scale and target score corresponding to the frequency domain audio signal are compared respectively with the standard scale and standard score corresponding to the device to determine its intonation detection result.
Optionally, in an embodiment, step S260 may specifically include: if the scale difference value between the target scale and the standard scale is smaller than the preset scale difference value, and the score difference value between the target score and the standard score is smaller than the preset score difference value, determining that the intonation detection result of the electronic musical instrument equipment to be detected meets the preset detection requirement; and if the scale difference value between the target scale and the standard scale is larger than or equal to the preset scale difference value, and/or the score difference value between the target score and the standard score is larger than or equal to the preset score difference value, determining that the intonation detection result of the electronic musical instrument equipment to be detected does not meet the preset detection requirement.
In one embodiment, the target scale corresponding to the frequency domain audio signal is compared with the standard scale corresponding to the electronic musical instrument device to be detected to determine the scale difference value between them, and likewise the score difference value between the target score and the standard score is determined. If the scale difference value is smaller than the predetermined scale difference value and the score difference value is smaller than the predetermined score difference value, the intonation detection result of the device is determined to meet the predetermined detection requirement. Otherwise, if the scale difference value is greater than or equal to the predetermined scale difference value, and/or the score difference value is greater than or equal to the predetermined score difference value, the intonation detection result is determined not to meet the predetermined detection requirement. For example, the predetermined scale difference value may be set to 1 and the predetermined score difference value to 2, though both may of course take other values; this is not limited here.
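The threshold comparison itself reduces to two absolute-difference checks; a sketch using the example thresholds above, treating scale and score as numeric values for illustration:

```python
def meets_detection_requirement(target_scale: float, standard_scale: float,
                                target_score: float, standard_score: float,
                                scale_threshold: float = 1.0,
                                score_threshold: float = 2.0) -> bool:
    """Return True when both the scale difference value and the score
    difference value stay below their predetermined difference values."""
    scale_ok = abs(target_scale - standard_scale) < scale_threshold
    score_ok = abs(target_score - standard_score) < score_threshold
    return scale_ok and score_ok
```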
It can be seen from the above that, by performing frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal, performing fundamental frequency detection on the frequency domain audio signal to obtain the fundamental frequency of the time domain audio signal corresponding to the electronic musical instrument device to be detected, and determining the scale and score of the sound generated by the device based on the detected fundamental frequency, intonation detection of the sound generated by the electronic musical instrument device to be detected is realized.
Referring to fig. 6, fig. 6 is a flowchart illustrating an audio processing method according to an exemplary embodiment of the present application, where the audio processing method in this embodiment may include steps S610 to S620, which are described in detail as follows.
In step S610, a notification message of intonation detection is generated based on the intonation detection result.
In one embodiment, after obtaining the intonation detection result, a notification message for the intonation detection may be generated according to the intonation detection result, and the notification message may be a voice message or a text message, which is not limited herein.
In step S620, a predetermined notification operation is performed based on the generated notification message.
In one embodiment, after the notification message for intonation detection is generated, a predetermined notification operation may be performed based on it. When the notification message is a voice message, the intonation detection result may be played through a voice device of the electronic device; when it is a text message, it may be displayed through a display device, for example in a display interface of the electronic device.
The technical solution of the embodiment shown in fig. 6 enables the user to obtain the intonation detection result of the electronic musical instrument device to be detected in a timely manner.
Referring to fig. 7, fig. 7 is a block diagram of an audio processing apparatus according to an embodiment of the present application. The audio processing apparatus may be integrated in an electronic device, and an audio processing apparatus 700 according to an embodiment of the present application may include: a first obtaining unit 710, a converting unit 720, an input unit 730, a second obtaining unit 740, a first executing unit 750, and a second executing unit 760. The first obtaining unit 710 is configured to obtain a time domain audio signal corresponding to the electronic musical instrument device to be detected; the converting unit 720 is configured to perform frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal; the input unit 730 is configured to input the frequency domain audio signal into a pre-trained machine learning model, where the pre-trained machine learning model is obtained by training on sample data containing frequency domain audio signals and their corresponding fundamental frequency labels; the second obtaining unit 740 is configured to obtain the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model; the first executing unit 750 is configured to determine a target scale and a target score according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and scores; and the second executing unit 760 is configured to determine an intonation detection result of the electronic musical instrument device to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the device.
Optionally, the audio processing apparatus further includes: and the sampling unit is used for sampling the sound signal generated by the electronic musical instrument equipment to be detected based on a preset sampling frequency to obtain a time domain audio signal corresponding to the electronic musical instrument equipment to be detected.
Optionally, the input unit 730 is configured to: selecting a frequency domain audio signal in a preset frequency range from the frequency domain audio signals to obtain a selected frequency domain audio signal; and inputting the selected frequency domain audio signal into the pre-trained machine learning model.
Optionally, the second executing unit 760 is configured to: if the scale difference value between the target scale and the standard scale is smaller than a predetermined scale difference value, and the score difference value between the target score and the standard score is smaller than a predetermined score difference value, determine that the intonation detection result of the electronic musical instrument device to be detected meets the predetermined detection requirement; and if the scale difference value between the target scale and the standard scale is greater than or equal to the predetermined scale difference value, and/or the score difference value between the target score and the standard score is greater than or equal to the predetermined score difference value, determine that the intonation detection result of the electronic musical instrument device to be detected does not meet the predetermined detection requirement.
Optionally, the audio processing apparatus further includes: a first generation unit, configured to generate a notification message of intonation detection based on the intonation detection result; and a third execution unit, configured to execute a predetermined notification operation based on the generated notification message.
Optionally, the audio processing apparatus further includes: the second acquisition unit is used for acquiring training set sample data used for training a machine learning model to be trained, wherein each sample data in the training set sample data comprises a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal; and the training unit is used for training the machine learning model to be trained through the training set sample data to obtain the trained machine learning model.
Optionally, the audio processing apparatus further includes: a third obtaining unit, configured to obtain test set sample data used for verifying the trained machine learning model, where each piece of sample data in the test set sample data includes a frequency domain audio signal and a fundamental frequency tag corresponding to the frequency domain audio signal; the fourth execution unit is used for inputting the frequency domain audio signal of each sample data of the test set into the trained machine learning model and outputting to obtain a predicted fundamental frequency label; and the identification unit is used for identifying the trained machine learning model as the pre-trained machine learning model if the proportion of the number of sample data pieces with consistent fundamental frequency labels and predicted fundamental frequency labels in the sample data of the test set to the total number of sample data pieces in the sample data of the test set exceeds a preset proportion threshold.
The implementation process of the functions and actions of each module in the device is specifically described in the implementation process based on the corresponding steps in the audio processing method, and is not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments disclosed herein. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods herein are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
Referring to fig. 8, fig. 8 is a block diagram illustrating an example of an electronic device for implementing the audio processing method according to an example embodiment of the present application. The electronic device 800 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, electronic device 800 is in the form of a general purpose computing device. The components of the electronic device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
Wherein the storage unit stores program code that is executable by the processing unit 810 to cause the processing unit 810 to perform steps according to various exemplary embodiments of the present application described in the above section "exemplary methods" of the present specification. For example, the processing unit 810 may perform steps S210 to S260 as shown in fig. 2.
The storage unit 820 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 800 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 840. Also, the electronic device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 860. As shown, the network adapter 860 communicates with the other modules of the electronic device 800 via the bus 830. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
In an exemplary embodiment of the present application, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the present application may also be implemented in the form of a program product comprising program code means for causing an electronic device to carry out the steps according to various exemplary embodiments of the present application described in the above-mentioned "exemplary methods" section of the present description, when said program product is run on a terminal device.
Referring to fig. 9, fig. 9 illustrates a computer-readable storage medium for implementing the audio processing method according to an exemplary embodiment of the present application. Fig. 9 depicts a program product 900 for implementing the above-described method according to an embodiment of the present application, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (10)

1. An audio processing method, comprising:
acquiring a time domain audio signal corresponding to electronic musical instrument equipment to be detected;
performing frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal;
inputting the frequency domain audio signal into a pre-trained machine learning model, wherein the pre-trained machine learning model is obtained by training sample data containing the frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal;
obtaining a fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model;
determining a target scale and a target score according to the fundamental frequency tags and the corresponding relationship between the fundamental frequency tags and the scale and the score;
and determining an intonation detection result of the electronic musical instrument equipment to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the electronic musical instrument equipment to be detected.
2. The audio processing method according to claim 1, further comprising:
and sampling the sound signal generated by the electronic musical instrument equipment to be detected based on a preset sampling frequency to obtain a time domain audio signal corresponding to the electronic musical instrument equipment to be detected.
3. The audio processing method of claim 1, wherein the inputting the frequency domain audio signal into a pre-trained machine learning model comprises:
selecting a frequency domain audio signal in a preset frequency range from the frequency domain audio signals to obtain a selected frequency domain audio signal;
and inputting the selected frequency domain audio signal into the pre-trained machine learning model.
4. The audio processing method according to claim 1, wherein the determining the intonation detection result of the electronic musical instrument device to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the electronic musical instrument device to be detected comprises:
if the scale difference value between the target scale and the standard scale is smaller than a preset scale difference value, and the score difference value between the target score and the standard score is smaller than a preset score difference value, determining that the intonation detection result of the electronic musical instrument equipment to be detected meets the preset detection requirement;
and if the scale difference value between the target scale and the standard scale is greater than or equal to the preset scale difference value, and/or the score difference value between the target score and the standard score is greater than or equal to the preset score difference value, determining that the intonation detection result of the electronic musical instrument equipment to be detected does not meet the preset detection requirement.
5. The audio processing method according to claim 1, further comprising:
generating a notification message of intonation detection based on the intonation detection result;
and performing a predetermined notification operation based on the generated notification message.
6. The audio processing method according to claim 1, further comprising:
acquiring training set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training set sample data comprises a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal;
and training the machine learning model to be trained through the training set sample data to obtain the trained machine learning model.
7. The audio processing method according to claim 6, further comprising:
obtaining test set sample data for verifying a trained machine learning model, wherein each piece of sample data in the test set sample data comprises a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal;
inputting the frequency domain audio signal of each piece of test set sample data into the trained machine learning model to output a predicted fundamental frequency label;
and if the proportion of pieces of test set sample data whose fundamental frequency labels are consistent with the predicted fundamental frequency labels, relative to the total number of pieces in the test set sample data, exceeds a preset proportion threshold, identifying the trained machine learning model as the pre-trained machine learning model.
8. An audio processing apparatus, comprising:
the first acquisition unit is used for acquiring a time domain audio signal corresponding to the electronic musical instrument device to be detected;
the conversion unit is used for carrying out frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal;
the input unit is used for inputting the frequency domain audio signal into a pre-trained machine learning model, wherein the pre-trained machine learning model is obtained by training on sample data containing frequency domain audio signals and fundamental frequency labels corresponding to the frequency domain audio signals;
the second acquisition unit is used for acquiring a fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model;
the first execution unit is used for determining a target scale and a target score according to the fundamental frequency label and a correspondence between fundamental frequency labels and scales and scores;
and the second execution unit is used for determining the intonation detection result of the electronic musical instrument equipment to be detected based on the determined target scale and target score and the standard scale and standard score corresponding to the electronic musical instrument equipment to be detected.
9. An electronic device comprising a memory and a processor, the memory having stored therein computer-readable instructions that, when executed by the processor, cause the processor to perform the audio processing method of any of claims 1 to 7.
10. A storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the audio processing method of any of claims 1-7.
CN202011341834.0A 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium Active CN112489682B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011341834.0A CN112489682B (en) 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium
PCT/CN2021/083398 WO2021213135A1 (en) 2020-11-25 2021-03-26 Audio processing method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011341834.0A CN112489682B (en) 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112489682A (en) 2021-03-12
CN112489682B (en) 2023-05-23

Family

ID=74934478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011341834.0A Active CN112489682B (en) 2020-11-25 2020-11-25 Audio processing method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112489682B (en)
WO (1) WO2021213135A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763930B (en) * 2021-11-05 2022-03-11 深圳市倍轻松科技股份有限公司 Voice analysis method, device, electronic equipment and computer readable storage medium
CN117041858B (en) * 2023-08-14 2024-04-09 央广云听文化传媒有限公司 Space audio playing optimization method and device
CN116861316B (en) * 2023-09-04 2023-12-15 国网浙江省电力有限公司余姚市供电公司 Electrical appliance monitoring method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7279631B2 (en) * 2002-07-16 2007-10-09 Line 6, Inc. Stringed instrument with embedded DSP modeling for modeling acoustic stringed instruments
CN205388515U * 2016-03-21 2016-07-20 王治泽 Panel shock frequency detector
CN111798814A * 2020-06-23 2020-10-20 广州欧米勒钢琴有限公司 Self-service piano tuning system
CN112489682B (en) * 2020-11-25 2023-05-23 平安科技(深圳)有限公司 Audio processing method, device, electronic equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017111760A * 2015-12-18 2017-06-22 カシオ計算機株式会社 Emotion estimation device creation method, emotion estimation device creation device, emotion estimation method, emotion estimation device and program
CN207572057U * 2017-07-28 2018-07-03 得理电子(上海)有限公司 Digital intonation detection module and intonation detection system
CN107705775A * 2017-08-17 2018-02-16 广东工业大学 Multi-instrument tuning method based on RBF neural network
CN108172224A * 2017-12-19 2018-06-15 浙江大学 Machine-learning-based method for defending voice assistants against silent command control
CN108766440A * 2018-05-28 2018-11-06 平安科技(深圳)有限公司 Speaker separation model training method, two-speaker separation method and related device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021213135A1 (en) * 2020-11-25 2021-10-28 平安科技(深圳)有限公司 Audio processing method and apparatus, electronic device and storage medium
CN113744756A (en) * 2021-08-11 2021-12-03 浙江讯飞智能科技有限公司 Equipment quality inspection and audio data expansion method and related device, equipment and medium
CN116884438A (en) * 2023-09-08 2023-10-13 杭州育恩科技有限公司 Method and system for detecting musical instrument training sound level based on acoustic characteristics
CN116884438B (en) * 2023-09-08 2023-12-01 杭州育恩科技有限公司 Method and system for detecting musical instrument training sound level based on acoustic characteristics

Also Published As

Publication number Publication date
CN112489682B (en) 2023-05-23
WO2021213135A1 (en) 2021-10-28

Similar Documents

Publication Publication Date Title
CN112489682B (en) Audio processing method, device, electronic equipment and storage medium
CN108630190B (en) Method and apparatus for generating speech synthesis model
US9451304B2 (en) Sound feature priority alignment
CN110047514B (en) Method for evaluating purity of accompaniment and related equipment
CN109346109B (en) Fundamental frequency extraction method and device
JP7268711B2 (en) SIGNAL PROCESSING SYSTEM, SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD, AND PROGRAM
CN107705782B (en) Method and device for determining phoneme pronunciation duration
CN108986843B (en) Audio data processing method and device, medium and computing equipment
CN112634858B (en) Speech synthesis method, device, computer equipment and storage medium
JPWO2019220620A1 (en) Anomaly detection device, anomaly detection method and program
EP3779814A1 (en) Method and device for training adaptation level evaluation model, and method and device for evaluating adaptation level
JP7069819B2 (en) Code identification method, code identification device and program
CN112309409A (en) Audio correction method and related device
JP2010009446A (en) System, method and program for retrieving voice file
Yue English spoken stress recognition based on natural language processing and endpoint detection algorithm
CN113555031B (en) Training method and device of voice enhancement model, and voice enhancement method and device
Agnihotri et al. Quantifying vocal mimicry in the greater racket-tailed drongo: a comparison of automated methods and human assessment
CN114302301A (en) Frequency response correction method and related product
Ranny et al. Separation of overlapping sound using nonnegative matrix factorization
Jadhav et al. Transfer Learning for Audio Waveform to Guitar Chord Spectrograms Using the Convolution Neural Network
JP7409475B2 (en) Utterance end detection device, control method, and program
Cañadas-Quesada et al. Note-event detection in polyphonic musical signals based on harmonic matching pursuit and spectral smoothness
CN113345468B (en) Voice quality inspection method, device, equipment and storage medium
CN114822492B (en) Speech synthesis method and device, electronic equipment and computer readable storage medium
Schmidt et al. A simple web utility for automatic speech quantification in dyadic reading interactions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant