WO2021213135A1 - Audio processing method, apparatus, electronic device and storage medium - Google Patents

Audio processing method, apparatus, electronic device and storage medium

Info

Publication number
WO2021213135A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
domain audio
scale
machine learning
learning model
Application number
PCT/CN2021/083398
Other languages
English (en)
French (fr)
Inventor
蒋慧军
徐伟
杨艾琳
姜凯英
肖京
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021213135A1 publication Critical patent/WO2021213135A1/zh


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/27 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G10L25/48 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Definitions

  • This application relates to the field of artificial intelligence, and in particular to audio processing methods, apparatuses, electronic devices, and storage media.
  • An electronic musical instrument device, as a musical instrument that generates sound through electronic signals, is widely popular. During the production of electronic musical instrument devices, the sound signal generated must be analyzed to detect its pitch; only qualified electronic musical instrument devices can be put on the market for sale.
  • the inventor realized that in the related art, when detecting electronic musical instruments, the sound signal generated by the electronic musical instrument is mainly compared with a standard sound signal, and only an electronic musical instrument whose similarity reaches a predetermined condition is certified as qualified.
  • the intonation detection method for the sound signal generated by the electronic musical instrument device proposed in the related art can perform detection only when there is a large difference between the generated sound signal and the standard sound signal, which presents the technical problem of low detection accuracy.
  • one of the objectives of the embodiments of the present application is to provide an audio processing method, apparatus, electronic device, and storage medium, to solve the technical problem that the prior-art method of detecting the intonation of the sound signal generated by an electronic musical instrument can perform detection only when there is a large difference between the generated sound signal and the standard sound signal, and thus has low detection accuracy.
  • an embodiment of the present application provides an audio processing method, the method including: acquiring a time domain audio signal corresponding to the electronic musical instrument device to be detected; performing frequency domain conversion on the time domain audio signal to obtain a frequency domain audio signal; inputting the frequency domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency domain audio signals and the fundamental frequency labels corresponding to them; acquiring the fundamental frequency label corresponding to the frequency domain audio signal output by the model; determining the target scale and target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
  • an audio processing device including:
  • the first acquiring unit is configured to acquire a time domain audio signal corresponding to the electronic musical instrument device to be detected
  • a conversion unit configured to perform frequency domain conversion processing on the time domain audio signal to obtain a frequency domain audio signal
  • the input unit is configured to input the frequency domain audio signal into a pre-trained machine learning model, and the pre-trained machine learning model passes samples containing the frequency domain audio signal and the fundamental frequency label corresponding to the frequency domain audio signal Obtained by data training;
  • the second acquiring unit is configured to acquire the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model
  • the first execution unit is configured to determine the target scale and the target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents;
  • the second execution unit is configured to determine the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
  • an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor.
  • when the processor executes the computer program, the steps of the audio processing method described above are implemented.
  • the embodiments of the present application provide a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the audio processing method described above.
  • the embodiments of the present application have the beneficial effect that a frequency domain audio signal is obtained by performing frequency domain conversion on the time domain audio signal, and the fundamental frequency of the frequency domain audio signal is detected to obtain the fundamental frequency of the time domain audio signal corresponding to the electronic musical instrument device to be detected. Compared with the similarity comparison between the sound signal generated by the electronic musical instrument device and a standard sound signal, detecting the fundamental frequency of the time domain audio signal generated by the device to be detected enables more accurate pitch detection of the sound it generates, improving the accuracy of pitch detection for electronic musical instrument devices.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • Fig. 2 is a flowchart of an audio processing method shown in an exemplary embodiment of the application.
  • Fig. 3 is a flowchart of an audio processing method shown in an exemplary embodiment of the application.
  • Fig. 4 is a flowchart of an audio processing method shown in an exemplary embodiment of the application.
  • Fig. 5 is a specific flowchart of step S230 of the audio processing method according to an exemplary embodiment of the application.
  • Fig. 6 is a flowchart of an audio processing method according to an exemplary embodiment of the application.
  • Fig. 7 is a block diagram of an audio processing device according to an embodiment of the present application.
  • Fig. 8 is an exemplary block diagram of an electronic device for implementing the foregoing audio processing method according to an exemplary embodiment of the application.
  • Fig. 9 is a computer-readable storage medium for implementing the above-mentioned audio processing method according to an exemplary embodiment of the application.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • the system architecture may include an electronic musical instrument device to be detected 101, a network 102, a client 103, and a server 104.
  • the client 103 obtains the time domain audio signal corresponding to the electronic musical instrument device 101 to be detected and uploads it to the server 104.
  • the server 104 may be a server that provides a pitch detection service.
  • the client 103 may be one or more of a smart phone, a tablet computer, and a portable computer; it may of course also be a desktop computer, and so on.
  • the network 102 is used to provide a medium for communication links between the electronic musical instrument device 101 to be detected and the client 103, and the client 103 and the server 104.
  • the network 102 may include various connection types, such as wired communication links, wireless communication links, and so on.
  • server 104 may be a server cluster composed of multiple servers.
  • when the client 103 of the present application uploads the time domain audio signal corresponding to the electronic musical instrument device 101 to be detected to the server 104, it may specifically upload it to any node server of a blockchain data server system.
  • the server 104 in this embodiment acquires the time domain audio signal corresponding to the electronic musical instrument device to be detected; performs frequency domain conversion on the time domain audio signal to obtain a frequency domain audio signal; inputs the frequency domain audio signal into a pre-trained machine learning model, which is obtained by training on sample data containing frequency domain audio signals and the fundamental frequency labels corresponding to them; acquires the fundamental frequency label corresponding to the frequency domain audio signal output by the model; determines the target scale and target cent; and determines the pitch detection result of the device based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device. Detecting the fundamental frequency of the time domain audio signal generated by the device to be detected allows more accurate pitch detection of the sound it produces and improves the accuracy of pitch detection for electronic musical instrument devices.
  • the audio processing method provided by the embodiment of the present application is generally executed by the server 104, and correspondingly, the audio processing device is generally set in the server 104.
  • the implementation details of the technical solutions of the embodiments of the present application will be described in detail below.
  • FIG. 2 is a flowchart of an audio processing method shown in an exemplary embodiment of the present application.
  • the audio processing method provided by the embodiment of the present application is executed by a server, which may specifically be the server 104 shown in FIG. 1.
  • the audio processing method shown in FIG. 2 includes steps S210 to S260, which are described in detail as follows.
  • step S210 a time domain audio signal corresponding to the electronic musical instrument device to be detected is acquired.
  • the electronic musical instrument device is a device that generates sound through electronic signals, and may be an electronic piano, an electric piano, an electronic synthesizer, an electronic drum, and other devices.
  • the electronic musical instrument device to be detected is an electronic musical instrument device that needs to perform pitch detection.
  • the electronic musical instrument device can generate sound through a preset control instruction.
  • the control instruction can be triggered by clicking a physical button in the electronic musical instrument device.
  • a sound signal is obtained by audio recording of the sound of an electronic musical instrument.
  • the time-domain audio signal is a sound signal of a certain period of time extracted from the sound signal generated by the electronic musical instrument device.
  • the audio processing method may further include: sampling the sound signal generated by the electronic musical instrument device to be detected based on a preset sampling frequency to obtain a time-domain audio signal corresponding to the electronic musical instrument device to be detected .
  • the sound can be generated by inputting preset control instructions to the electronic musical instrument device, and the sound signal can be obtained by audio recording the generated sound through the recording device.
  • the sound signal generated by the electronic musical instrument device can be sampled at a preset sampling frequency to obtain the time domain audio signal. For example, the audio signal generated by the electronic musical instrument device to be detected may be sampled every 2 seconds, and the duration of each sampled audio signal may be 0.5 seconds.
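  • As a minimal sketch of this sampling scheme (the 8 kHz sampling rate and the synthetic stand-in recording are illustrative assumptions, not values from this application):

```python
import numpy as np

fs = 8000  # assumed sampling rate in Hz (not specified by this application)
# Stand-in for a recorded sound signal: 10 seconds of noise.
recording = np.random.default_rng(0).normal(size=fs * 10)

hop = 2 * fs     # sample the signal every 2 seconds
frame = fs // 2  # each sampled time domain audio signal lasts 0.5 seconds
frames = [recording[start:start + frame]
          for start in range(0, len(recording) - frame + 1, hop)]
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```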
  • step S220 frequency domain conversion processing is performed on the time domain audio signal to obtain a frequency domain audio signal.
  • the manner of performing frequency domain conversion processing on the time domain audio signal may specifically be to perform Fourier transform on the time domain audio signal to obtain the corresponding frequency domain audio signal.
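  • A sketch of this conversion step with NumPy (the 440 Hz test tone and 8 kHz sampling rate are illustrative assumptions):

```python
import numpy as np

fs = 8000                      # assumed sampling rate, Hz
t = np.arange(0, 0.5, 1 / fs)  # one 0.5-second time domain frame
signal = np.sin(2 * np.pi * 440.0 * t)  # illustrative 440 Hz tone

spectrum = np.abs(np.fft.rfft(signal))          # magnitude spectrum
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)  # bin index -> frequency in Hz
peak_hz = freqs[spectrum.argmax()]
print(f"dominant frequency: {peak_hz:.1f} Hz")
```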
  • step S230 the frequency domain audio signal is input into the pre-trained machine learning model, and the pre-trained machine learning model is obtained by training the sample data containing the frequency domain audio signal and the fundamental frequency label corresponding to the frequency domain audio signal.
  • the frequency domain audio signal obtained by performing frequency domain conversion on the time domain audio signal is input into the pre-trained machine learning model, which is obtained by training a machine learning model on the training sample data.
  • the machine learning model may be a CNN (convolutional neural network) model or a deep neural network model.
  • FIG. 3 is a flowchart of an audio processing method shown in an exemplary embodiment of this application.
  • the audio processing method in this embodiment may include step S310 to step S320, which are described in detail as follows.
  • step S310 the training set sample data used for training the machine learning model to be trained is obtained, and each piece of sample data in the training set sample data includes a frequency domain audio signal and a fundamental frequency label corresponding to the frequency domain audio signal.
  • each piece of sample data in the training set sample data includes a frequency domain audio signal and a fundamental frequency label generated according to the fundamental frequency corresponding to the frequency domain audio signal.
  • step S320 the machine learning model to be trained is trained through the training set sample data to obtain the trained machine learning model.
  • the training set sample data is input into the machine learning model, and the machine learning model to be trained is trained through the training set sample data to obtain the trained machine learning model.
  • the process of training the machine learning model is to adjust the coefficients in the network structure corresponding to the model so that, for an input frequency domain audio signal, the calculation performed with those coefficients outputs the determined fundamental frequency label.
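  • A minimal sketch of this idea, substituting a plain softmax classifier for the CNN or deep network and using synthetic spectra whose fundamental frequency label indexes the peak bin (all sizes and values here are illustrative, not from this application):

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_labels, n_samples = 64, 8, 400

# Synthetic training set: each frequency domain "spectrum" has an energy
# peak at a bin determined by its fundamental frequency label.
labels = rng.integers(0, n_labels, n_samples)
X = rng.normal(0.0, 0.1, (n_samples, n_bins))
X[np.arange(n_samples), labels * (n_bins // n_labels)] += 1.0

# Adjust the model coefficients (here, one weight matrix) by gradient
# descent on the softmax cross-entropy so the output matches the label.
W = np.zeros((n_bins, n_labels))
for _ in range(200):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n_samples), labels] -= 1.0  # gradient of cross-entropy
    W -= 0.5 * (X.T @ p) / n_samples

accuracy = ((X @ W).argmax(axis=1) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```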
  • FIG. 4 is a flowchart of an audio processing method shown in an exemplary embodiment of this application.
  • the audio processing method in this embodiment may include step S410 to step S430, which are described in detail as follows.
  • step S410 test set sample data for verifying the trained machine learning model is obtained, and each piece of sample data in the test set sample data includes a frequency domain audio signal and the fundamental frequency label corresponding to the frequency domain audio signal.
  • the trained machine learning model needs to be verified to ensure that the machine learning model meets the expected effect.
  • step S420 the frequency domain audio signal of each sample data of the test set sample data is input to the trained machine learning model, and the predicted fundamental frequency label is output.
  • the frequency domain audio signal of each piece of sample data in the test set is input into the trained machine learning model and processed by the coefficients contained in the network structure corresponding to the model, yielding the predicted fundamental frequency label for each piece of sample data.
  • step S430 if the ratio of the number of pieces of sample data whose fundamental frequency label in the test set sample data is consistent with the predicted fundamental frequency label to the total number of pieces of sample data in the test set exceeds a predetermined ratio threshold, the trained machine learning model is recognized as the pre-trained machine learning model.
  • if the ratio of the number of consistent pieces of sample data to the total number of pieces of sample data in the test set exceeds the predetermined ratio threshold, the trained machine learning model meets the expected functional effect and can be identified as the pre-trained machine learning model; otherwise, it must continue to be trained with the training set sample data until it meets the expected functional effect.
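  • The acceptance check of step S430 reduces to a ratio comparison; a minimal sketch (the 0.95 threshold and the function name are assumed examples, as the application does not fix a value):

```python
def meets_ratio_threshold(true_labels, predicted_labels, ratio_threshold=0.95):
    """True if the share of test samples whose fundamental frequency label
    matches the predicted label reaches the predetermined ratio threshold."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels) >= ratio_threshold

print(meets_ratio_threshold([1, 2, 3, 4], [1, 2, 3, 4]))  # all labels match
print(meets_ratio_threshold([1, 2, 3, 4], [1, 2, 3, 0]))  # only 75% match
```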
  • FIG. 5 is a specific flowchart of step S230 of the audio processing method according to an exemplary embodiment of the application.
  • Step S230 may include step S510 to step S520, which are described in detail as follows.
  • step S510 among the frequency domain audio signals, a frequency domain audio signal within a predetermined frequency range is selected to obtain the selected frequency domain audio signal.
  • before the frequency domain audio signal is input into the pre-trained machine learning model, since the frequency domain audio signal contains environmental noise, the frequency domain components corresponding to the environmental noise need to be filtered out in order to improve the accuracy of the fundamental frequency determined for the frequency domain audio signal.
  • the frequency range of the frequency domain audio signal may be detected first, and the frequency domain audio signal within the predetermined frequency range selected accordingly. Since the sound signals produced by a given type of electronic musical instrument device fall within a fixed frequency range, while the frequency components of environmental noise generally fall outside it, selecting the frequency domain audio signal within the predetermined frequency range yields the selected frequency domain audio signal and filters out the frequency domain components corresponding to the environmental noise.
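  • The selection of a predetermined frequency range can be sketched as a simple mask over the spectrum (NumPy; the 50–1000 Hz range is a hypothetical example, since the actual range depends on the instrument type):

```python
import numpy as np

def select_band(freqs, spectrum, f_lo, f_hi):
    """Zero out spectral components outside [f_lo, f_hi], filtering the
    frequency domain components attributed to environmental noise."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return np.where(mask, spectrum, 0.0)

freqs = np.array([0.0, 100.0, 500.0, 3000.0])  # illustrative bin frequencies
spectrum = np.ones(4)
print(select_band(freqs, spectrum, 50.0, 1000.0))
```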
  • the predetermined frequency range contained in the frequency domain audio signal is related to the type of electronic musical instrument device to be detected.
  • the storage area of the system can store the correspondence between the types of electronic musical instrument devices and the frequency ranges of the sound signals generated by each type.
  • the predetermined frequency range corresponding to the electronic musical instrument device to be detected can then be determined from the type of the device and the above correspondence.
  • step S520 the selected frequency domain audio signal is input into the pre-trained machine learning model.
  • the selected frequency domain audio signal is input into the pre-trained machine learning model. Since the environmental noise has been filtered out of the frequency domain audio signal corresponding to the electronic musical instrument device to be detected, the accuracy of the fundamental frequency detected by the pre-trained machine learning model is effectively improved.
  • step S240 the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model is obtained.
  • step S250 the target scale and the target cent are determined according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents.
  • the scale refers to a sequence of tones produced by the electronic musical instrument device to be detected, arranged in order of pitch
  • the cent refers to a quantized value of the interval contained in each scale, and different cent values reflect different sound frequencies.
  • the target scale and target cent corresponding to the frequency domain audio signal can be determined according to the fundamental frequency label corresponding to the frequency domain audio signal and the preset correspondence between fundamental frequency labels and scales and cents.
  • the preset correspondence between fundamental frequency labels and scales and cents is generated based on the internationally standard correspondence between the two parameters, scale and cent, and their corresponding frequencies.
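  • Under twelve-tone equal temperament, the correspondence between a fundamental frequency and a scale/cent pair can be sketched as follows (a common formulation, not necessarily the exact correspondence table used by this application; A4 = 440 Hz and MIDI note numbering are assumptions):

```python
import math

def freq_to_scale_and_cents(f_hz, ref_a4=440.0):
    """Map a fundamental frequency to the nearest equal-tempered note
    (expressed as a MIDI note number) and its deviation in cents."""
    semitones = 12.0 * math.log2(f_hz / ref_a4) + 69.0  # 69 = MIDI A4
    note = round(semitones)
    cents = 100.0 * (semitones - note)
    return note, cents

print(freq_to_scale_and_cents(440.0))  # A4, exactly in tune
print(freq_to_scale_and_cents(446.0))  # A4, sharp by some cents
```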
  • step S260 based on the determined target scale and target pitch, and the standard scale and standard pitch corresponding to the electronic musical instrument device to be detected, the pitch detection result of the electronic musical instrument device to be detected is determined.
  • the standard scale and standard cent corresponding to the electronic musical instrument device to be detected are the standard scale and standard cent corresponding to the sound signal that the device should generate. After the target scale and target cent corresponding to the frequency domain audio signal are obtained, they are compared respectively with the standard scale and standard cent corresponding to the device to be detected to determine its pitch detection result.
  • step S260 may specifically include: if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cent and the standard cent is less than a predetermined cent difference, it is determined that the pitch detection result of the electronic musical instrument device to be detected meets the predetermined detection requirement; if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cent and the standard cent is greater than or equal to the predetermined cent difference, it is determined that the pitch detection result of the electronic musical instrument device to be detected does not meet the predetermined detection requirement.
  • the target scale corresponding to the frequency domain audio signal is compared with the standard scale corresponding to the electronic musical instrument device to be detected to determine the scale difference between the two, and the target cent corresponding to the frequency domain audio signal is compared with the standard cent corresponding to the device to determine the cent difference between the two. If the scale difference between the target scale and the standard scale is less than the predetermined scale difference, and the cent difference between the target cent and the standard cent is less than the predetermined cent difference, it is determined that the pitch detection result of the electronic musical instrument device to be detected meets the predetermined detection requirement.
  • the predetermined scale difference may, for example, be set to 1, and the predetermined cent difference to 2.
  • the predetermined scale difference and cent difference may also be other values, which are not limited herein.
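  • The pass/fail rule of step S260 can be sketched as follows (the threshold defaults mirror the example values above; the function name is illustrative):

```python
def pitch_detection_passes(target_scale, target_cent, std_scale, std_cent,
                           max_scale_diff=1, max_cent_diff=2):
    """Meets the predetermined detection requirement only when both the
    scale difference and the cent difference are below their thresholds."""
    scale_ok = abs(target_scale - std_scale) < max_scale_diff
    cent_ok = abs(target_cent - std_cent) < max_cent_diff
    return scale_ok and cent_ok

print(pitch_detection_passes(69, 1.5, 69, 0.0))  # within both thresholds
print(pitch_detection_passes(70, 0.0, 69, 0.0))  # scale difference too large
```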
  • in the technical solution of this embodiment, the frequency domain audio signal is obtained by frequency domain conversion of the time domain audio signal, and the fundamental frequency of the frequency domain audio signal is detected to obtain the fundamental frequency of the time domain audio signal corresponding to the electronic musical instrument device to be detected. Compared with the related-art similarity comparison against a standard sound signal, this enables more accurate pitch detection of the device to be detected.
  • FIG. 6 is a flowchart of an audio processing method shown in an exemplary embodiment of this application.
  • the audio processing method in this embodiment may include steps S610 to S620, which are described in detail as follows.
  • step S610 based on the result of the pitch detection, a notification message of pitch detection is generated.
  • a notification message for the pitch detection may be generated according to the result of the pitch detection.
  • the notification message may be a voice message or a text message, which is not limited herein.
  • step S620 a predetermined notification operation is performed based on the generated notification message.
  • a predetermined notification operation can be performed based on the generated notification message.
  • if the notification message is a voice message, the result of the pitch detection can be played through the voice device of the electronic device.
  • if the notification message is a text message, the notification message can be displayed through the display device of the electronic device, for example on the display interface of the electronic device.
  • the technical solution of the embodiment shown in FIG. 6 can enable the user to obtain the result of the pitch detection of the electronic musical instrument device to be detected in time.
  • FIG. 7 is a block diagram of an audio processing apparatus according to an embodiment of the present application.
  • the audio processing apparatus may be integrated in an electronic device.
  • the audio processing apparatus 700 may include: a first acquiring unit 710, a conversion unit 720, an input unit 730, a second acquiring unit 740, a first execution unit 750, and a second execution unit 760. The first acquiring unit 710 is configured to acquire the time domain audio signal corresponding to the electronic musical instrument device to be detected; the conversion unit 720 is configured to perform frequency domain conversion on the time domain audio signal to obtain a frequency domain audio signal; the input unit 730 is configured to input the frequency domain audio signal into a pre-trained machine learning model, which is obtained by training on sample data containing frequency domain audio signals and the fundamental frequency labels corresponding to them; the second acquiring unit 740 is configured to acquire the fundamental frequency label corresponding to the frequency domain audio signal output by the pre-trained machine learning model; the first execution unit 750 is configured to determine the target scale and target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and the second execution unit 760 is configured to determine the pitch detection result of the device based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
  • the audio processing device further includes: a sampling unit, configured to sample the sound signal generated by the electronic musical instrument device to be detected based on a preset sampling frequency to obtain the time domain corresponding to the electronic musical instrument device to be detected audio signal.
  • The input unit 730 is configured to: select, from the frequency-domain audio signal, the frequency-domain audio signal within a predetermined frequency range, obtaining a selected frequency-domain audio signal; and input the selected frequency-domain audio signal into the pre-trained machine learning model.
  • The second execution unit 760 is configured to: if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cent and the standard cent is less than a predetermined cent difference, determine that the pitch detection result of the electronic musical instrument device to be detected meets the predetermined detection requirements; and if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cent and the standard cent is greater than or equal to the predetermined cent difference, determine that the pitch detection result of the electronic musical instrument device to be detected does not meet the predetermined detection requirements.
  • The audio processing apparatus may further include: a first generation unit, configured to generate a pitch detection notification message based on the pitch detection result; and a third execution unit, configured to perform a predetermined notification operation based on the generated notification message.
  • The audio processing apparatus may further include: a second acquiring unit, configured to acquire training-set sample data used to train the machine learning model to be trained, each piece of sample data in the training-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to it; and a training unit, configured to train the machine learning model to be trained with the training-set sample data, obtaining the trained machine learning model.
  • The audio processing apparatus may further include: a third acquiring unit, configured to acquire test-set sample data used to verify the trained machine learning model, each piece of sample data in the test-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to it; a fourth execution unit, configured to input the frequency-domain audio signal of each piece of test-set sample data into the trained machine learning model and output a predicted fundamental frequency label; and an identification unit, configured to identify the trained machine learning model as the pre-trained machine learning model if the proportion of pieces of test-set sample data whose fundamental frequency label is consistent with the predicted fundamental frequency label, out of the total number of pieces of test-set sample data, exceeds a predetermined ratio threshold.
  • The example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a mobile terminal, a network device, etc.) to execute the method according to the embodiments of this application.
  • an electronic device capable of implementing the above method.
  • The electronic device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor. When executing the computer program, the processor implements: acquiring the time-domain audio signal corresponding to the electronic musical instrument device to be detected; performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and the fundamental frequency labels corresponding to them; acquiring the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal; determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
  • FIG. 8 is an exemplary block diagram of an electronic device for implementing the above audio processing method according to an exemplary embodiment of the application.
  • The electronic device 800 shown in FIG. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
  • the electronic device 800 is represented in the form of a general-purpose computing device.
  • the components of the electronic device 800 may include, but are not limited to: the aforementioned at least one processing unit 810, the aforementioned at least one storage unit 820, and a bus 830 connecting different system components (including the storage unit 820 and the processing unit 810).
  • The storage unit stores program code, which can be executed by the processing unit 810 so that the processing unit 810 performs the steps according to the various exemplary embodiments described in the "Exemplary Method" section of this specification.
  • the processing unit 810 may perform step S210 to step S260 as shown in FIG. 2.
  • the storage unit 820 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 8201 and/or a cache storage unit 8202, and may further include a read-only storage unit (ROM) 8203.
  • The storage unit 820 may also include a program/utility 8204 having a set of (at least one) program modules 8205. Such program modules 8205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • The bus 830 may represent one or more of several types of bus structures, including a storage-unit bus or storage-unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • The electronic device 800 can also communicate with one or more external devices 1000 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (e.g., a router or modem) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 840.
  • the electronic device 800 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network, such as the Internet) through the network adapter 860. As shown in the figure, the network adapter 860 communicates with other modules of the electronic device 800 through the bus 830.
  • The example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of this application.
  • a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored.
  • In some possible implementations, various aspects of this application can also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the electronic device to perform the steps according to the various exemplary embodiments of this application described in the "Exemplary Method" section above.
  • FIG. 9 is a computer-readable storage medium for implementing the above-mentioned data verification method according to an exemplary embodiment of this application.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium stores a computer program that, when executed by a processor, implements: acquiring the time-domain audio signal corresponding to the electronic musical instrument device to be detected; performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and the fundamental frequency labels corresponding to them; acquiring the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal; determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
  • FIG. 9 depicts a program product 900 for implementing the above-mentioned method according to an embodiment of this application, which may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device, for example a personal computer.
  • However, the program product of this application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by, or in combination with, an instruction execution system, apparatus, or device.
  • the program product can use any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of the present application can be written in any combination of one or more programming languages.
  • The programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages.
  • The program code can be executed entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • The remote computing device can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

An audio processing method, an apparatus (700), an electronic device (800), and a storage medium, relating to the field of artificial intelligence. The audio processing method comprises: acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be detected (S210); performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal (S220); inputting the frequency-domain audio signal into a pre-trained machine learning model (S230); acquiring a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal (S240); determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents (S250); and determining a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected (S260). The time-domain audio signal can be uploaded by a client to any node server in a blockchain server system, improving the accuracy of pitch detection for electronic musical instrument devices.

Description

Audio processing method and apparatus, electronic device, and storage medium
This application claims priority to the Chinese patent application with application number 202011341834.0, entitled "Audio processing method and apparatus, electronic device, and storage medium", filed with the Patent Office of the China National Intellectual Property Administration of the People's Republic of China on November 25, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to an audio processing method and apparatus, an electronic device, and a storage medium.
Background
Electronic musical instrument devices, as instruments that produce sound through electronic signals, are widely popular. During the production of an electronic musical instrument device, the sound signal it produces needs to be analyzed to detect the pitch accuracy of that sound signal; only devices that pass the detection can be released to the market for sale.
The inventors have realized that in the related art, an electronic musical instrument device is detected mainly by comparing the similarity between the sound signal produced by the device and a standard sound signal, and only devices whose similarity reaches a predetermined condition are judged to be qualified. The way of detecting the pitch accuracy of the sound signal produced by an electronic musical instrument device proposed in the related art can only detect cases where there is a large difference between the sound signal produced by the device and the standard sound signal, and thus suffers from the technical problem of low detection accuracy.
Technical Problem
One of the objectives of the embodiments of this application is to provide an audio processing method and apparatus, an electronic device, and a storage medium, so as to solve the technical problem in the prior art that the existing way of detecting the pitch accuracy of the sound signal produced by an electronic musical instrument device can only detect cases where there is a large difference between the produced sound signal and the standard sound signal, resulting in low detection accuracy.
Technical Solution
In a first aspect, an embodiment of this application provides an audio processing method, the method comprising:
acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be detected;
performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal;
inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
acquiring a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal;
determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents;
determining a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected.
In a second aspect, an embodiment of this application provides an audio processing apparatus, comprising:
a first acquiring unit, configured to acquire a time-domain audio signal corresponding to an electronic musical instrument device to be detected;
a conversion unit, configured to perform frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal;
an input unit, configured to input the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
a second acquiring unit, configured to acquire a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal;
a first execution unit, configured to determine a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents;
a second execution unit, configured to determine a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected.
In a third aspect, an embodiment of this application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein when executing the computer program, the processor implements:
acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be detected;
performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal;
inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
acquiring a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal;
determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents;
determining a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium, which may be non-volatile or volatile; the computer-readable storage medium stores a computer program that, when executed by a processor, implements:
acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be detected;
performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal;
inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
acquiring a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal;
determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents;
determining a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected.
Beneficial Effects
Compared with the prior art, the embodiments of this application have the following beneficial effect: frequency-domain conversion processing is performed on the time-domain audio signal to obtain a frequency-domain audio signal, fundamental frequency detection is performed on the frequency-domain audio signal to obtain the fundamental frequency of the time-domain audio signal corresponding to the electronic musical instrument device to be detected, and the scale and cent of the sound produced by the device are determined based on the determined fundamental frequency, thereby achieving pitch detection of the sound produced by the device to be detected. Compared with comparing the similarity between the sound signal produced by the device and a standard sound signal, detecting the fundamental frequency of the time-domain audio signal produced by the device allows the pitch of the produced sound to be detected more precisely, improving the accuracy of pitch detection for electronic musical instrument devices.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments or exemplary technologies are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
FIG. 2 is a flowchart of an audio processing method according to an exemplary embodiment of this application.
FIG. 3 is a flowchart of an audio processing method according to an exemplary embodiment of this application.
FIG. 4 is a flowchart of an audio processing method according to an exemplary embodiment of this application.
FIG. 5 is a detailed flowchart of step S230 of an audio processing method according to an exemplary embodiment of this application.
FIG. 6 is a flowchart of an audio processing method according to an exemplary embodiment of this application.
FIG. 7 is a block diagram of an audio processing apparatus according to an embodiment of this application.
FIG. 8 is an exemplary block diagram of an electronic device for implementing the above audio processing method according to an exemplary embodiment of this application.
FIG. 9 illustrates a computer-readable storage medium for implementing the above data verification method according to an exemplary embodiment of this application.
Embodiments of the Invention
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be more thorough and complete, and will fully convey the concepts of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of the embodiments of this application. However, those skilled in the art will appreciate that the technical solutions of this application may be practiced without one or more of the specific details, or that other methods, components, apparatuses, steps, and the like may be employed. In other instances, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring aspects of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must they be executed in the order described. For example, some operations/steps may be decomposed, while others may be combined or partially combined, so the actual execution order may change according to the actual situation.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
As shown in FIG. 1, the system architecture may include an electronic musical instrument device to be detected 101, a network 102, a client 103, and a server 104. The client 103 acquires the time-domain audio signal corresponding to the device 101 and uploads it to the server 104, which may be a server providing a pitch detection service. The client 103 may be one or more of a smartphone, a tablet computer, and a portable computer, or of course a desktop computer, and the like. The network 102 is the medium providing communication links between the device 101 and the client 103, and between the client 103 and the server 104, and may include various connection types, such as wired and wireless communication links.
It should be understood that the numbers of devices to be detected 101, networks 102, clients 103, and servers 104 in FIG. 1 are merely illustrative. According to implementation needs, there may be any number of each; for example, the server 104 may be a server cluster composed of multiple servers.
Optionally, when the client 103 of this application uploads the time-domain audio signal corresponding to the device 101 to the server 104, the signal may specifically be uploaded to any node server of a blockchain data server system; that node server determines the pitch detection result from the time-domain audio signal corresponding to the device 101 and stores the result. Based on the security and tamper-resistance of blockchain data sharing, the security and reliability of the pitch detection result are effectively guaranteed.
After acquiring the time-domain audio signal corresponding to the electronic musical instrument device to be detected, the server 104 in this embodiment performs frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputs the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals; acquires the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal; determines a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determines the pitch detection result of the device based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device. Compared with comparing the similarity between the sound signal produced by the device and a standard sound signal, detecting the fundamental frequency of the time-domain audio signal produced by the device allows the pitch of the produced sound to be detected more precisely, improving the accuracy of pitch detection for electronic musical instrument devices.
It should be noted that the audio processing method provided by the embodiments of this application is generally executed by the server 104, and accordingly the audio processing apparatus is generally disposed in the server 104. The implementation details of the technical solutions of the embodiments of this application are elaborated below.
Referring to FIG. 2, which is a flowchart of an audio processing method according to an exemplary embodiment of this application, the execution subject of the audio processing method provided by the embodiments of this application is a server, which may specifically be the server 104 shown in FIG. 1. The audio processing method shown in FIG. 2 includes steps S210 to S260, described in detail as follows.
In step S210, a time-domain audio signal corresponding to the electronic musical instrument device to be detected is acquired.
In one embodiment, an electronic musical instrument device is a device that produces sound through electronic signals, such as an electronic keyboard, a digital piano, an electronic synthesizer, or an electronic drum. The device to be detected is an electronic musical instrument device that requires pitch detection. The device can produce sound via a preset control instruction, which can be triggered by pressing a physical key on the device; the sound produced by the device is recorded to obtain a sound signal, and the time-domain audio signal is a sound signal of a certain time period extracted from the sound signal produced by the device.
Optionally, in one embodiment, the audio processing method may further include: sampling, based on a preset sampling frequency, the sound signal produced by the electronic musical instrument device to be detected, to obtain the time-domain audio signal corresponding to the device.
When acquiring the time-domain audio signal corresponding to the device to be detected, a preset control instruction may first be input to the device to produce sound, and a recording device may record the produced sound to obtain a sound signal. After the sound signal produced by the device is obtained, it can be sampled at a preset sampling frequency to obtain the time-domain audio signal, i.e., an audio signal in the time dimension; for example, the sound signal produced by the device may be sampled every 2 seconds, with each sampled audio segment lasting 0.5 seconds.
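The interval sampling described above can be sketched as follows. This is only an illustration, assuming the recorded sound signal is available as a NumPy array; the 2-second interval and 0.5-second segment length are the example values from the text, while the 8 kHz sample rate is invented for the demonstration.

```python
import numpy as np

def sample_segments(signal, sr, interval_s=2.0, segment_s=0.5):
    """Extract one short segment every `interval_s` seconds from a recording
    sampled at rate `sr`; each segment lasts `segment_s` seconds."""
    hop = int(interval_s * sr)        # samples between segment starts
    seg_len = int(segment_s * sr)     # samples per segment
    return [signal[start:start + seg_len]
            for start in range(0, len(signal) - seg_len + 1, hop)]

# A 10-second recording at an assumed 8 kHz rate yields 5 half-second segments.
sr = 8000
recording = np.zeros(10 * sr)
segs = sample_segments(recording, sr)
```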
In step S220, frequency-domain conversion processing is performed on the time-domain audio signal to obtain a frequency-domain audio signal.
In one embodiment, after the time-domain audio signal is obtained, in order to perform fundamental frequency detection on the sound signal produced by the device to be detected, the time-domain audio signal must first undergo frequency-domain conversion processing to obtain the corresponding frequency-domain audio signal. The frequency-domain conversion may specifically be performed by applying a Fourier transform to the time-domain audio signal, thereby obtaining the corresponding frequency-domain audio signal.
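The frequency-domain conversion can be illustrated with a discrete Fourier transform over one sampled segment. The patent only specifies that a Fourier transform is applied; the tone frequency and sample rate below are invented for the demonstration.

```python
import numpy as np

def to_frequency_domain(segment, sr):
    """Fourier-transform a time-domain segment into a magnitude spectrum,
    returning the bin frequencies alongside the magnitudes."""
    spectrum = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    return freqs, spectrum

# A pure 440 Hz tone peaks at the spectral bin closest to 440 Hz.
sr = 8000
t = np.arange(sr) / sr                       # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)
freqs, spectrum = to_frequency_domain(tone, sr)
peak_hz = freqs[np.argmax(spectrum)]         # 440.0 with 1 Hz bin spacing
```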
In step S230, the frequency-domain audio signal is input into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals.
In one embodiment, the frequency-domain audio signal obtained by performing frequency-domain conversion on the time-domain audio signal is input into the pre-trained machine learning model, which is obtained by training a machine learning model on training sample data. The machine learning model may be a CNN (Convolutional Neural Network) model, or a deep neural network model, among others.
Referring to FIG. 3, which is a flowchart of an audio processing method according to an exemplary embodiment of this application, the audio processing method in this embodiment may include steps S310 to S320, described in detail as follows.
In step S310, training-set sample data used to train the machine learning model to be trained is acquired, each piece of sample data in the training-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to the frequency-domain audio signal.
In one embodiment, each piece of sample data in the training-set sample data contains a frequency-domain audio signal and a fundamental frequency label generated from the fundamental frequency corresponding to that frequency-domain audio signal.
In step S320, the machine learning model to be trained is trained with the training-set sample data to obtain a trained machine learning model.
In one embodiment, the training-set sample data is input into the machine learning model, and the model to be trained is trained with the training-set sample data to obtain the trained machine learning model. Training the machine learning model is a process of adjusting the coefficients in the network structure corresponding to the model so that, for an input frequency-domain audio signal, the operations performed by those coefficients produce the determined fundamental frequency label as the output.
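The patent leaves the model architecture open (a CNN or a deep neural network are mentioned as options). Purely as an illustration of training on (frequency-domain signal, fundamental-frequency label) sample pairs, the sketch below fits a minimal softmax classifier to synthetic spectra; the bin count, label set, learning rate, and iteration count are all invented for the example and are not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "frequency-domain" training samples: 64-bin spectra whose peak
# position encodes one of four fundamental-frequency labels (all invented).
n_bins = 64
peak_bins = [8, 16, 24, 32]           # one peak bin per label class
X, y = [], []
for cls, peak in enumerate(peak_bins):
    for _ in range(50):
        spec = rng.normal(0.0, 0.05, n_bins)   # background noise floor
        spec[peak] += 1.0                      # energy at the fundamental
        X.append(spec)
        y.append(cls)
X = np.array(X)
y = np.array(y)

# Minimal stand-in for "adjusting the coefficients of the network":
# one softmax layer trained by batch gradient descent.
W = np.zeros((n_bins, len(peak_bins)))
onehot = np.eye(len(peak_bins))[y]
for _ in range(200):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.5 * X.T @ (p - onehot) / len(X)

train_acc = float(np.mean(np.argmax(X @ W, axis=1) == y))
```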
Referring to FIG. 4, which is a flowchart of an audio processing method according to an exemplary embodiment of this application, the audio processing method in this embodiment may include steps S410 to S430, described in detail as follows.
In step S410, test-set sample data used to verify the trained machine learning model is acquired, each piece of sample data in the test-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to the frequency-domain audio signal.
In one embodiment, the trained machine learning model also needs to be verified to ensure that it achieves the expected effect. When verifying the trained model, test-set sample data used for verification can be acquired, where each piece of sample data includes a frequency-domain audio signal and the fundamental frequency label corresponding to it.
In step S420, the frequency-domain audio signal of each piece of test-set sample data is input into the trained machine learning model, which outputs a predicted fundamental frequency label.
In one embodiment, the frequency-domain audio signal of each piece of test-set sample data is input into the trained model, and the coefficients in the network structure corresponding to the model process the frequency-domain audio signal contained in each piece of sample data, producing a predicted fundamental frequency label for each piece.
In step S430, if the proportion of pieces of test-set sample data whose fundamental frequency label is consistent with the predicted fundamental frequency label, out of the total number of pieces of test-set sample data, exceeds a predetermined ratio threshold, the trained machine learning model is identified as the pre-trained machine learning model.
In one embodiment, for the pieces of test-set sample data, if the proportion of pieces whose fundamental frequency label is consistent with the predicted label, out of the total number of pieces, exceeds the predetermined ratio threshold, the trained model achieves the expected functional effect and can be identified as the pre-trained machine learning model; otherwise, it needs to be trained further with the training-set sample data until it achieves the expected functional effect.
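The acceptance criterion described above (the proportion of matching labels must exceed a predetermined ratio threshold) can be expressed as a small check; the threshold values below are placeholders, since the patent does not fix one.

```python
def accept_model(true_labels, predicted_labels, ratio_threshold=0.95):
    """Accept the trained model only when the fraction of test samples whose
    predicted fundamental-frequency label matches the ground-truth label
    strictly exceeds the predetermined ratio threshold."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels) > ratio_threshold

# 19 of 20 correct (95%) does not strictly exceed a 0.95 threshold, so with
# that threshold the model would be sent back for further training.
accepted = accept_model([0] * 20, [0] * 19 + [1], ratio_threshold=0.9)
```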
Referring to FIG. 5, which is a detailed flowchart of step S230 of an audio processing method according to an exemplary embodiment of this application, step S230 may include steps S510 to S520, described in detail as follows.
In step S510, a frequency-domain audio signal within a predetermined frequency range is selected from the frequency-domain audio signal, obtaining a selected frequency-domain audio signal.
In one embodiment, before the frequency-domain audio signal is input into the pre-trained machine learning model, since the frequency-domain audio signal will contain environmental noise, the frequency-domain components corresponding to that environmental noise need to be filtered out in order to improve the accuracy of the fundamental frequency determined for the frequency-domain audio signal.
Specifically, the frequency range occupied by the frequency-domain audio signal can first be detected, and based on the detected range, the frequency-domain audio signal within the predetermined frequency range is selected. Since the sound signals produced by a given type of electronic musical instrument device occupy a fixed frequency range, while the frequency range corresponding to environmental noise may fall outside it, the frequency-domain audio signal within the predetermined range can be selected from the frequency-domain audio signal, obtaining a selected frequency-domain audio signal and thereby filtering out the frequency-domain components corresponding to environmental noise. The predetermined frequency range is associated with the type of device to be detected, so the system's storage area can store correspondences between types of electronic musical instrument devices and the frequency ranges of the sound signals they produce; when the predetermined frequency range of a device to be detected is needed, it can be determined from the device type and the above correspondence.
In step S520, the selected frequency-domain audio signal is input into the pre-trained machine learning model.
In one embodiment, the selected frequency-domain audio signal is input into the pre-trained machine learning model. Since environmental noise has been filtered from the frequency-domain audio signal corresponding to the device to be detected, the accuracy of the fundamental frequency detected by the pre-trained machine learning model for the frequency-domain audio signal can be effectively improved.
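The selection of a predetermined frequency range can be sketched as a mask over the frequency bins. The piano-like range of 27.5-4186 Hz used below is only a placeholder; per the text, the actual range would be looked up from the stored correspondence between device types and frequency ranges.

```python
import numpy as np

def select_band(freqs, spectrum, f_lo, f_hi):
    """Keep only the spectral bins inside the instrument's expected frequency
    range, discarding bins expected to contain only environmental noise."""
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[mask], spectrum[mask]

freqs = np.arange(0.0, 4001.0, 1.0)      # 1 Hz bins up to 4 kHz
spectrum = np.ones_like(freqs)
f, s = select_band(freqs, spectrum, 27.5, 4186.0)
```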
Continuing with FIG. 2, in step S240, the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal is acquired.
In one embodiment, after the frequency-domain audio signal is input into the pre-trained machine learning model, the fundamental frequency label output by the pre-trained machine learning model for the frequency-domain audio signal is acquired.
In step S250, a target scale and a target cent are determined according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents.
In one embodiment, the scale refers to the sequence of notes, arranged in order of pitch, of the musical mode produced by the device to be detected, and the cent is a quantized value of the intervals contained in each scale; different cents reflect different sound frequencies. After the fundamental frequency label corresponding to the frequency-domain audio signal is obtained, the target scale and target cent corresponding to the frequency-domain audio signal can be determined from that label and the preset correspondence between fundamental frequency labels and scales and cents, where the preset correspondence is generated from the internationally standardized correspondence between the scale and cent parameters and their frequencies.
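The patent relies on a preset table mapping fundamental-frequency labels to scales and cents, generated from the international standard correspondence between these parameters and frequency. As an illustration of that standard relation (not the patent's actual table), the equal-temperament mapping can be computed directly, using the common convention of A4 = 440 Hz:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_scale_and_cents(f0, reference_a4=440.0):
    """Map a fundamental frequency to the nearest equal-tempered note and its
    deviation from that note in cents (100 cents = one semitone)."""
    semitones = 12 * math.log2(f0 / reference_a4) + 69   # MIDI note number
    nearest = round(semitones)
    cents = 100.0 * (semitones - nearest)
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents

note, cents = frequency_to_scale_and_cents(440.0)   # exactly A4, 0 cents
```

For a slightly sharp tone such as 446 Hz, the same function reports the nearest note A4 with a positive deviation of roughly 23 cents.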
In step S260, the pitch detection result of the electronic musical instrument device to be detected is determined based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
In one embodiment, the standard scale and standard cent corresponding to the device to be detected are the standard scale and standard cent corresponding to the sound signal produced by the electronic musical instrument device. After the target scale and target cent corresponding to the frequency-domain audio signal are obtained, they are compared respectively with the standard scale and standard cent corresponding to the device to determine its pitch detection result.
Optionally, in one embodiment, step S260 may specifically include: if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cent and the standard cent is less than a predetermined cent difference, determining that the pitch detection result of the device meets the predetermined detection requirements; and if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cent and the standard cent is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the device does not meet the predetermined detection requirements.
In one embodiment, the target scale corresponding to the frequency-domain audio signal is compared with the standard scale corresponding to the device to determine the scale difference between them, and the cent difference between the target cent corresponding to the frequency-domain audio signal and the standard cent corresponding to the device is likewise determined. If the scale difference is less than the predetermined scale difference and the cent difference is less than the predetermined cent difference, the pitch detection result of the device is determined to meet the predetermined detection requirements. Otherwise, if the scale difference is greater than or equal to the predetermined scale difference, and/or the cent difference is greater than or equal to the predetermined cent difference, the pitch detection result is determined not to meet the predetermined detection requirements. It can be understood that the predetermined scale difference may be set to 1 and the predetermined cent difference to 2; of course, the predetermined cent difference may also take other values, which is not limited here.
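The threshold comparison described above can be sketched as follows. Representing the scale as an integer note index is an assumption made for the illustration, and the default thresholds are the example values mentioned in the text (scale difference 1, cent difference 2):

```python
def meets_detection_requirements(target_scale, standard_scale,
                                 target_cent, standard_cent,
                                 max_scale_diff=1, max_cent_diff=2):
    """The device passes only when the scale difference AND the cent
    difference are both strictly below their predetermined thresholds;
    otherwise the pitch detection result fails the requirements."""
    scale_ok = abs(target_scale - standard_scale) < max_scale_diff
    cent_ok = abs(target_cent - standard_cent) < max_cent_diff
    return scale_ok and cent_ok

# Same scale, 1.2 cents off: passes.  Same scale, 2.5 cents off: fails.
passed = meets_detection_requirements(60, 60, 1.2, 0.0)
failed = meets_detection_requirements(60, 60, 2.5, 0.0)
```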
As can be seen from the above, frequency-domain conversion processing is performed on the time-domain audio signal to obtain a frequency-domain audio signal, fundamental frequency detection is performed on the frequency-domain audio signal to obtain the fundamental frequency of the time-domain audio signal corresponding to the device to be detected, and the scale and cent of the sound produced by the device are determined based on the determined fundamental frequency, thereby achieving pitch detection of the sound produced by the device. Compared with comparing the similarity between the sound signal produced by the device and a standard sound signal, detecting the fundamental frequency of the time-domain audio signal produced by the device allows the pitch of the produced sound to be detected more precisely, improving the accuracy of pitch detection for electronic musical instrument devices.
Referring to FIG. 6, which is a flowchart of an audio processing method according to an exemplary embodiment of this application, the audio processing method in this embodiment may include steps S610 to S620, described in detail as follows.
In step S610, a pitch detection notification message is generated based on the pitch detection result.
In one embodiment, after the pitch detection result is obtained, a notification message for the pitch detection can be generated from that result; the notification message may be a voice message or a text message, which is not limited here.
In step S620, a predetermined notification operation is performed based on the generated notification message.
In one embodiment, after the notification message for the pitch detection is generated, a predetermined notification operation can be performed based on it. When the notification message is a voice message, the pitch detection result can be played through the voice apparatus of the electronic device; when the notification message is a text message, it can be displayed through the display apparatus of the electronic device, for example on the display interface of the electronic device.
The technical solution of the embodiment shown in FIG. 6 enables the user to obtain the pitch detection result of the electronic musical instrument device to be detected in a timely manner.
Referring to FIG. 7, which is a block diagram of an audio processing apparatus according to an embodiment of this application, the audio processing apparatus may be integrated in an electronic device. The audio processing apparatus 700 according to an embodiment of this application may include: a first acquiring unit 710, a conversion unit 720, an input unit 730, a second acquiring unit 740, a first execution unit 750, and a second execution unit 760. The first acquiring unit 710 is configured to acquire the time-domain audio signal corresponding to the electronic musical instrument device to be detected; the conversion unit 720 is configured to perform frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; the input unit 730 is configured to input the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals; the second acquiring unit 740 is configured to acquire the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal; the first execution unit 750 is configured to determine a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and the second execution unit 760 is configured to determine the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
Optionally, the audio processing apparatus further includes: a sampling unit, configured to sample, based on a preset sampling frequency, the sound signal produced by the electronic musical instrument device to be detected, to obtain the time-domain audio signal corresponding to the device.
Optionally, the input unit 730 is configured to: select, from the frequency-domain audio signal, the frequency-domain audio signal within a predetermined frequency range, obtaining a selected frequency-domain audio signal; and input the selected frequency-domain audio signal into the pre-trained machine learning model.
Optionally, the second execution unit 760 is configured to: if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cent and the standard cent is less than a predetermined cent difference, determine that the pitch detection result of the device meets the predetermined detection requirements; and if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cent and the standard cent is greater than or equal to the predetermined cent difference, determine that the pitch detection result of the device does not meet the predetermined detection requirements.
Optionally, the audio processing apparatus further includes: a first generation unit, configured to generate a pitch detection notification message based on the pitch detection result; and a third execution unit, configured to perform a predetermined notification operation based on the generated notification message.
Optionally, the audio processing apparatus further includes: a second acquiring unit, configured to acquire training-set sample data used to train the machine learning model to be trained, each piece of sample data in the training-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to it; and a training unit, configured to train the machine learning model to be trained with the training-set sample data, obtaining the trained machine learning model.
Optionally, the audio processing apparatus further includes: a third acquiring unit, configured to acquire test-set sample data used to verify the trained machine learning model, each piece of sample data in the test-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to it; a fourth execution unit, configured to input the frequency-domain audio signal of each piece of test-set sample data into the trained machine learning model and output a predicted fundamental frequency label; and an identification unit, configured to identify the trained machine learning model as the pre-trained machine learning model if the proportion of pieces of test-set sample data whose fundamental frequency label is consistent with the predicted fundamental frequency label, out of the total number of pieces of test-set sample data, exceeds a predetermined ratio threshold.
For the implementation process of the functions and roles of each module in the above apparatus, refer to the implementation process of the corresponding steps in the above audio processing method, which will not be repeated here.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments disclosed in this application, the features and functions of two or more modules or units described above may be embodied in one module or unit; conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
Furthermore, although the steps of the methods in this application are described in a specific order in the drawings, this does not require or imply that the steps must be executed in that specific order, or that all of the illustrated steps must be executed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps, and so on.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a mobile terminal, a network device, etc.) to execute the method according to the embodiments of this application.
In an exemplary embodiment of this application, an electronic device capable of implementing the above method is also provided. The electronic device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor. When executing the computer program, the processor implements: acquiring the time-domain audio signal corresponding to the electronic musical instrument device to be detected; performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and the fundamental frequency labels corresponding to them; acquiring the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal; determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
Those skilled in the art will understand that the various aspects of this application can be implemented as a system, a method, or a program product. Therefore, the various aspects of this application can be specifically implemented in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software, which may collectively be referred to here as a "circuit", "module", or "system".
Referring to FIG. 8, which is an exemplary block diagram of an electronic device for implementing the above audio processing method according to an exemplary embodiment of this application, the electronic device 800 shown in FIG. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of this application.
As shown in FIG. 8, the electronic device 800 takes the form of a general-purpose computing device. The components of the electronic device 800 may include, but are not limited to: the aforementioned at least one processing unit 810, the aforementioned at least one storage unit 820, and a bus 830 connecting different system components (including the storage unit 820 and the processing unit 810).
The storage unit stores program code, which can be executed by the processing unit 810 so that the processing unit 810 performs the steps according to the various exemplary embodiments of this application described in the "Exemplary Method" section of this specification. For example, the processing unit 810 may perform steps S210 to S260 as shown in FIG. 2.
The storage unit 820 may include a readable medium in the form of a volatile storage unit, such as a random access memory (RAM) unit 8201 and/or a cache storage unit 8202, and may further include a read-only memory (ROM) unit 8203.
The storage unit 820 may also include a program/utility 8204 having a set of (at least one) program modules 8205; such program modules 8205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
The bus 830 may represent one or more of several types of bus structures, including a storage-unit bus or storage-unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 800 may also communicate with one or more external devices 1000 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 800, and/or with any device (such as a router or modem) that enables the electronic device 800 to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 840. Moreover, the electronic device 800 can also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 860. As shown in the figure, the network adapter 860 communicates with the other modules of the electronic device 800 through the bus 830. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to execute the method according to the embodiments of this application.
In an exemplary embodiment of this application, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. In some possible implementations, the various aspects of this application can also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the electronic device to perform the steps according to the various exemplary embodiments of this application described in the "Exemplary Method" section above.
Referring to FIG. 9, which illustrates a computer-readable storage medium for implementing the above data verification method according to an exemplary embodiment of this application. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium stores a computer program that, when executed by a processor, implements: acquiring the time-domain audio signal corresponding to the electronic musical instrument device to be detected; performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal; inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and the fundamental frequency labels corresponding to them; acquiring the fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal; determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and determining the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the device.
Illustratively, FIG. 9 depicts a program product 900 for implementing the above method according to an embodiment of this application, which may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on an electronic device, for example a personal computer. However, the program product of this application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by, or in combination with, an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium, which can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.
The program code for performing the operations of this application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computing device, partly on the user's device, as a standalone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
Furthermore, the above drawings are merely schematic illustrations of the processing included in the methods according to the exemplary embodiments of this application and are not intended to be limiting. It is easy to understand that the processing shown in the above drawings does not indicate or limit the chronological order of these processes; it is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of this application will readily occur to those skilled in the art after considering the specification and practicing the application disclosed here. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed by this application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application being indicated by the claims.

Claims (20)

  1. An audio processing method, comprising:
    acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be detected;
    performing frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal;
    inputting the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
    acquiring a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal;
    determining a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and
    determining a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected.
  2. The audio processing method according to claim 1, further comprising:
    sampling, based on a preset sampling frequency, the sound signal produced by the electronic musical instrument device to be detected, to obtain the time-domain audio signal corresponding to the electronic musical instrument device to be detected.
  3. The audio processing method according to claim 1, wherein inputting the frequency-domain audio signal into the pre-trained machine learning model comprises:
    selecting, from the frequency-domain audio signal, the frequency-domain audio signal within a predetermined frequency range, to obtain a selected frequency-domain audio signal; and
    inputting the selected frequency-domain audio signal into the pre-trained machine learning model.
  4. The audio processing method according to claim 1, wherein determining the pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and the standard scale and standard cent corresponding to the electronic musical instrument device to be detected comprises:
    if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cent and the standard cent is less than a predetermined cent difference, determining that the pitch detection result of the electronic musical instrument device to be detected meets predetermined detection requirements; and
    if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cent and the standard cent is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the electronic musical instrument device to be detected does not meet the predetermined detection requirements.
  5. The audio processing method according to claim 1, further comprising:
    generating a pitch detection notification message based on the pitch detection result; and
    performing a predetermined notification operation based on the generated notification message.
  6. The audio processing method according to claim 1, further comprising:
    acquiring training-set sample data used to train a machine learning model to be trained, each piece of sample data in the training-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to the frequency-domain audio signal; and
    training the machine learning model to be trained with the training-set sample data to obtain a trained machine learning model.
  7. The audio processing method according to claim 6, further comprising:
    acquiring test-set sample data used to verify the trained machine learning model, each piece of sample data in the test-set sample data including a frequency-domain audio signal and the fundamental frequency label corresponding to the frequency-domain audio signal;
    inputting the frequency-domain audio signal of each piece of the test-set sample data into the trained machine learning model, and outputting a predicted fundamental frequency label; and
    if the proportion of pieces of sample data in the test-set sample data whose fundamental frequency label is consistent with the predicted fundamental frequency label, out of the total number of pieces of sample data in the test-set sample data, exceeds a predetermined ratio threshold, identifying the trained machine learning model as the pre-trained machine learning model.
  8. An audio processing apparatus, comprising:
    a first acquiring unit, configured to acquire a time-domain audio signal corresponding to an electronic musical instrument device to be detected;
    a conversion unit, configured to perform frequency-domain conversion processing on the time-domain audio signal to obtain a frequency-domain audio signal;
    an input unit, configured to input the frequency-domain audio signal into a pre-trained machine learning model, the pre-trained machine learning model being obtained by training on sample data containing frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
    a second acquiring unit, configured to acquire a fundamental frequency label, output by the pre-trained machine learning model, corresponding to the frequency-domain audio signal;
    a first execution unit, configured to determine a target scale and a target cent according to the fundamental frequency label and the correspondence between fundamental frequency labels and scales and cents; and
    a second execution unit, configured to determine a pitch detection result of the electronic musical instrument device to be detected based on the determined target scale and target cent and a standard scale and standard cent corresponding to the electronic musical instrument device to be detected.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements:
    acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be tested;
    performing frequency-domain conversion on the time-domain audio signal to obtain a frequency-domain audio signal;
    inputting the frequency-domain audio signal into a pre-trained machine learning model, wherein the pre-trained machine learning model is trained on sample data comprising frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
    acquiring a fundamental frequency label, output by the pre-trained machine learning model, that corresponds to the frequency-domain audio signal;
    determining a target scale and target cents according to the fundamental frequency label and a correspondence between fundamental frequency labels and scales and cents;
    determining a pitch detection result for the electronic musical instrument device to be tested based on the determined target scale and target cents and on the standard scale and standard cents corresponding to the electronic musical instrument device to be tested.
  10. The electronic device according to claim 9, wherein the processor, when executing the computer program, further implements:
    sampling, at a preset sampling frequency, a sound signal produced by the electronic musical instrument device to be tested, to obtain the time-domain audio signal corresponding to the electronic musical instrument device to be tested.
  11. The electronic device according to claim 9, wherein the processor, when executing the computer program, further implements:
    selecting, from the frequency-domain audio signal, the components lying within a predetermined frequency range to obtain a selected frequency-domain audio signal;
    inputting the selected frequency-domain audio signal into the pre-trained machine learning model.
  12. The electronic device according to claim 9, wherein the processor, when executing the computer program, further implements:
    if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cents and the standard cents is less than a predetermined cent difference, determining that the pitch detection result of the electronic musical instrument device to be tested meets the predetermined detection requirement;
    if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cents and the standard cents is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the electronic musical instrument device to be tested does not meet the predetermined detection requirement.
  13. The electronic device according to claim 9, wherein the processor, when executing the computer program, further implements:
    generating a pitch detection notification message based on the pitch detection result;
    performing a predetermined notification operation based on the generated notification message.
  14. The electronic device according to claim 9, wherein the processor, when executing the computer program, further implements:
    acquiring training-set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training-set sample data comprises a frequency-domain audio signal and a fundamental frequency label corresponding to the frequency-domain audio signal;
    training the machine learning model to be trained with the training-set sample data to obtain a trained machine learning model.
  15. The electronic device according to claim 14, wherein the processor, when executing the computer program, further implements:
    acquiring test-set sample data for validating the trained machine learning model, wherein each piece of sample data in the test-set sample data comprises a frequency-domain audio signal and a fundamental frequency label corresponding to the frequency-domain audio signal;
    inputting the frequency-domain audio signal of each piece of sample data in the test-set sample data into the trained machine learning model to output a predicted fundamental frequency label;
    if the proportion of pieces of sample data in the test-set sample data whose fundamental frequency label matches the predicted fundamental frequency label, relative to the total number of pieces of sample data in the test-set sample data, exceeds a predetermined ratio threshold, identifying the trained machine learning model as the pre-trained machine learning model.
  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:
    acquiring a time-domain audio signal corresponding to an electronic musical instrument device to be tested;
    performing frequency-domain conversion on the time-domain audio signal to obtain a frequency-domain audio signal;
    inputting the frequency-domain audio signal into a pre-trained machine learning model, wherein the pre-trained machine learning model is trained on sample data comprising frequency-domain audio signals and fundamental frequency labels corresponding to the frequency-domain audio signals;
    acquiring a fundamental frequency label, output by the pre-trained machine learning model, that corresponds to the frequency-domain audio signal;
    determining a target scale and target cents according to the fundamental frequency label and a correspondence between fundamental frequency labels and scales and cents;
    determining a pitch detection result for the electronic musical instrument device to be tested based on the determined target scale and target cents and on the standard scale and standard cents corresponding to the electronic musical instrument device to be tested.
  17. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by a processor, further implements:
    sampling, at a preset sampling frequency, a sound signal produced by the electronic musical instrument device to be tested, to obtain the time-domain audio signal corresponding to the electronic musical instrument device to be tested.
  18. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by a processor, further implements:
    selecting, from the frequency-domain audio signal, the components lying within a predetermined frequency range to obtain a selected frequency-domain audio signal;
    inputting the selected frequency-domain audio signal into the pre-trained machine learning model.
  19. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by a processor, further implements:
    if the scale difference between the target scale and the standard scale is less than a predetermined scale difference, and the cent difference between the target cents and the standard cents is less than a predetermined cent difference, determining that the pitch detection result of the electronic musical instrument device to be tested meets the predetermined detection requirement;
    if the scale difference between the target scale and the standard scale is greater than or equal to the predetermined scale difference, and/or the cent difference between the target cents and the standard cents is greater than or equal to the predetermined cent difference, determining that the pitch detection result of the electronic musical instrument device to be tested does not meet the predetermined detection requirement.
  20. The computer-readable storage medium according to claim 16, wherein the computer program, when executed by a processor, further implements:
    acquiring training-set sample data for training a machine learning model to be trained, wherein each piece of sample data in the training-set sample data comprises a frequency-domain audio signal and a fundamental frequency label corresponding to the frequency-domain audio signal;
    training the machine learning model to be trained with the training-set sample data to obtain a trained machine learning model.
PCT/CN2021/083398 2020-11-25 2021-03-26 Audio processing method and apparatus, electronic device, and storage medium WO2021213135A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011341834.0 2020-11-25
CN202011341834.0A CN112489682B (zh) 2020-11-25 2020-11-25 Audio processing method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021213135A1 true WO2021213135A1 (zh) 2021-10-28

Family

ID=74934478

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083398 WO2021213135A1 (zh) 2020-11-25 2021-03-26 音频处理方法、装置、电子设备和存储介质

Country Status (2)

Country Link
CN (1) CN112489682B (zh)
WO (1) WO2021213135A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489682B (zh) * 2020-11-25 2023-05-23 平安科技(深圳)有限公司 Audio processing method and apparatus, electronic device, and storage medium
CN113744756A (zh) * 2021-08-11 2021-12-03 浙江讯飞智能科技有限公司 Device quality inspection and audio data augmentation method, and related apparatus, device, and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7812243B2 (en) * 2002-07-16 2010-10-12 Line 6, Inc. Stringed instrument with embedded DSP modeling for modeling acoustic stringed instruments
CN205388515U (zh) * 2016-03-21 2016-07-20 王治泽 Board vibration frequency detector
CN107705775A (zh) * 2017-08-17 2018-02-16 广东工业大学 RBF-neural-network-based tuning method for multiple musical instruments
CN207572057U (zh) * 2017-07-28 2018-07-03 得理电子(上海)有限公司 Digital pitch detection module and pitch detection system
CN111798814A (zh) * 2020-06-23 2020-10-20 广州欧米勒钢琴有限公司 Piano self-service tuning system
CN112489682A (zh) * 2020-11-25 2021-03-12 平安科技(深圳)有限公司 Audio processing method and apparatus, electronic device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6720520B2 (ja) * 2015-12-18 2020-07-08 カシオ計算機株式会社 Emotion estimator generation method, emotion estimator generation device, emotion estimation method, emotion estimation device, and program
CN108172224B (zh) * 2017-12-19 2019-08-27 浙江大学 Machine-learning-based method for defending voice assistants against silent command control
CN108766440B (zh) * 2018-05-28 2020-01-14 平安科技(深圳)有限公司 Speaker separation model training method, two-speaker separation method, and related device


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763930A (zh) * 2021-11-05 2021-12-07 深圳市倍轻松科技股份有限公司 Speech analysis method and apparatus, electronic device, and computer-readable storage medium
CN117041858A (zh) * 2023-08-14 2023-11-10 央广云听文化传媒有限公司 Spatial audio playback optimization method and apparatus
CN117041858B (zh) * 2023-08-14 2024-04-09 央广云听文化传媒有限公司 Spatial audio playback optimization method and apparatus
CN116861316A (zh) * 2023-09-04 2023-10-10 国网浙江省电力有限公司余姚市供电公司 Electrical appliance monitoring method and apparatus
CN116861316B (zh) * 2023-09-04 2023-12-15 国网浙江省电力有限公司余姚市供电公司 Electrical appliance monitoring method and apparatus
CN116884438A (zh) * 2023-09-08 2023-10-13 杭州育恩科技有限公司 Acoustic-feature-based piano practice pitch detection method and system
CN116884438B (zh) * 2023-09-08 2023-12-01 杭州育恩科技有限公司 Acoustic-feature-based piano practice pitch detection method and system

Also Published As

Publication number Publication date
CN112489682B (zh) 2023-05-23
CN112489682A (zh) 2021-03-12


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21791637

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21791637

Country of ref document: EP

Kind code of ref document: A1