CN112435680A - Audio processing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112435680A
CN112435680A (application CN201910728211.XA)
Authority
CN
China
Prior art keywords
audio
pitch
tone
audio processing
original audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910728211.XA
Other languages
Chinese (zh)
Inventor
冯穗豫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd filed Critical Beijing ByteDance Network Technology Co Ltd
Priority to CN201910728211.XA priority Critical patent/CN112435680A/en
Publication of CN112435680A publication Critical patent/CN112435680A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/007: Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G10L21/013: Adapting to target pitch
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: using spectral analysis, e.g. transform vocoders or subband vocoders
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/167: Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/007: Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G10L21/013: Adapting to target pitch
    • G10L2021/0135: Voice conversion or morphing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals
    • G10L2025/906: Pitch tracking

Abstract

The present disclosure provides an audio processing method and apparatus, an electronic device, and a computer-readable storage medium. The audio processing method comprises: receiving original audio from an audio source; detecting the number of occurrences of each of a plurality of pitches in the original audio; sorting the detected pitches by their numbers of occurrences to obtain a pitch sequence of the original audio; and determining the tonality of the original audio from the pitch sequence. The method addresses the prior-art problem that the tonality of audio cannot be determined automatically, which leads to poor audio processing results.

Description

Audio processing method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of audio processing, and in particular, to an audio processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the continuous development of mobile terminal technology, mobile phone applications have penetrated every field. Sound is one of the most direct carriers of personal characteristics and is therefore widely used on media platforms, most commonly in music players, videos, and the like. In most applications sound is an auxiliary feature, and sound processing and intelligent voice interaction have become leading research-and-development directions in acoustics. Voice-changing technology has a wide range of real-life applications, such as text-to-speech systems, dubbing for video programs, improving the speech intelligibility of speakers with damaged vocal tracts, and personalized voice disguise in confidential communication. Telephone voice changers and voice-changing software currently available on domestic and foreign markets are typical voice-changing devices that can be used for personalized voice disguise.
However, existing voice-changing methods either process sound in a single fixed mode or require the user to select a processing mode manually; the processed sound is often unsatisfactory and does not achieve the best possible effect.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, an embodiment of the present disclosure provides an audio processing method, including:
receiving raw audio from an audio source;
detecting a number of occurrences of a plurality of pitches in the original audio;
sorting the detected pitches according to their numbers of occurrences to obtain a pitch sequence of the original audio; and
determining the tonality of the original audio according to the pitch sequence.
In a second aspect, an embodiment of the present disclosure provides an audio processing apparatus, including:
an audio receiving module for receiving raw audio from an audio source;
a pitch detection module for detecting the occurrence number of a plurality of pitches in the original audio;
a pitch sorting module for sorting the detected pitches according to their numbers of occurrences to obtain a pitch sequence of the original audio; and
a tonality determination module for determining the tonality of the original audio according to the pitch sequence.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the audio processing method of any one of the foregoing first aspect.
In a fourth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the audio processing method of any one of the foregoing first aspects.
The foregoing is a summary of the present disclosure; to make its technical means clearer and easier to understand, embodiments are described in detail below with reference to the drawings.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flowchart of an embodiment of an audio processing method provided by the present disclosure;
fig. 2 is a flowchart illustrating a specific example of step S102 in an embodiment of an audio processing method provided in the present disclosure;
fig. 3 is a flowchart illustrating a specific example of step S104 in an embodiment of an audio processing method provided in the present disclosure;
FIG. 4 is a flow chart of a further embodiment of an audio processing method provided by the present disclosure;
fig. 5 is a schematic structural diagram of an embodiment of an audio processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 1 is a flowchart of an embodiment of the audio processing method provided by an embodiment of the present disclosure. The method may be performed by an audio processing apparatus, which may be implemented as software or as a combination of software and hardware, and which may be integrated in a device of an audio processing system, such as an audio processing server or an audio processing terminal device. As shown in fig. 1, the method comprises the following steps:
step S101, receiving original audio from an audio source;
Optionally, in this step, the audio source is an audio sensor, i.e. any device capable of capturing audio; a typical audio sensor is a microphone. In this embodiment, the audio sensor may be a microphone on the terminal device, which can directly record, for example, a song sung by the user.
Optionally, in this step, the audio source is a memory, the audio is an audio file received from the memory, and the audio file is an audio file recorded in advance and stored in the memory. Optionally, the memory is a local memory, a removable memory, or a network memory.
Optionally, in this step, receiving the original audio may include converting audio from one format to another. A typical example is a CD: the audio is read from the disc and converted into the original audio format used in the present disclosure.
It will be appreciated that the audio source is not limited to the types listed above and any source that can acquire audio may be used in the present disclosure.
Step S102, detecting the occurrence frequency of a plurality of pitches in the original audio;
in the present disclosure, the original audio includes at least a melody sound, such as a piece of music or a piece of singing voice.
In this step, a plurality of pitches in the melody of the original audio are detected, together with the number of times each pitch appears in the audio. Pitch is determined by the vibration frequency of the sound: the higher the frequency, the higher the pitch. In music, pitches are commonly represented by the note names C, D, E, F, G, A, B, corresponding to the solmization syllables do, re, mi, fa, sol, la, si. In different keys the same syllable falls on different note names; for example, do is C in C major but D in D major, with the remaining syllables following in order.
Optionally, as shown in fig. 2, the detecting the number of occurrences of multiple pitches in the original audio includes:
step S201, detecting a plurality of sound frequencies in original audio;
step S202, determining the pitch corresponding to each sound frequency;
in step S203, the number of occurrences of each pitch is calculated.
In steps S201 and S202, the sound frequency may be detected with a frequency-domain method, typically the harmonic-peak method based on the Fast Fourier Transform (FFT): the signal is transformed by FFT into a discrete spectrum, and the largest peak corresponds to the fundamental frequency. The pitch for each sound frequency is then determined from the correspondence between fundamental frequency and pitch. Mapping every detected fundamental frequency in the audio to a pitch completes the detection of steps S201 and S202; counting identical pitches in step S203 then yields the number of times each pitch appears in the audio.
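The harmonic-peak idea described above can be sketched as follows. This is a simplified, naive form (it takes the single largest spectral peak as the fundamental, with no harmonic checking); the function name and A4 = 440 Hz reference are illustrative assumptions, not from the patent.

```python
import numpy as np

# Twelve pitch classes, written with flats as in the key tables of this disclosure.
PITCH_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def detect_pitch_class(frame, sample_rate):
    """Estimate the pitch class of one audio frame from its largest FFT peak."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    peak = np.argmax(spectrum[1:]) + 1          # skip the DC bin
    f0 = freqs[peak]
    # Map the fundamental to a MIDI note number, then to a pitch class (A4 = 440 Hz).
    midi = int(round(69 + 12 * np.log2(f0 / 440.0)))
    return PITCH_NAMES[midi % 12]

# A pure 440 Hz tone should be classified as A.
sr = 44100
t = np.arange(2048) / sr
print(detect_pitch_class(np.sin(2 * np.pi * 440 * t), sr))  # → A
```

Running this detector frame by frame over the audio produces the stream of pitch labels that step S203 counts.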
Step S103, sequencing the detected multiple pitches according to the occurrence times of the multiple pitches to obtain a pitch sequence of the original audio;
Optionally, the detected pitches are arranged in descending order of their occurrence counts to obtain the pitch sequence of the original audio; the sequence thus indicates which pitches appear most often in the audio file.
Optionally, the pitch sequence includes not only the arrangement order of the pitches but also the specific occurrence number of the pitches.
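Steps S102-S103 amount to counting labels and sorting them by count. A minimal sketch (the toy input is hypothetical):

```python
from collections import Counter

def pitch_sequence(detected_pitches):
    """Order pitches by occurrence count, most frequent first, keeping the
    counts as well, as described for step S103."""
    return Counter(detected_pitches).most_common()

# Pitch labels detected frame by frame (hypothetical toy input).
frames = ["D"] * 5 + ["C"] * 4 + ["G"] * 3 + ["Bb"] * 2 + ["F"]
print(pitch_sequence(frames))
# → [('D', 5), ('C', 4), ('G', 3), ('Bb', 2), ('F', 1)]
```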
And step S104, determining the tone of the original audio according to the pitch sequence.
Tonality is the general term for the tonic and the mode of a key. For example, a major mode with C as the tonic has the tonality "C major", and a minor mode with A as the tonic has the tonality "A minor". In this way, common music comprises 24 keys in total.
Typically, the pitches of C major are C, D, E, F, G, A, B; of Db major, Db, Eb, F, Gb, Ab, Bb, C; of D major, D, E, Gb, G, A, B, Db; of Eb major, Eb, F, G, Ab, Bb, C, D; of A major, A, B, Db, D, E, Gb, Ab; of Bb major, Bb, C, D, Eb, F, G, A; of B major, B, Db, Eb, E, Gb, Ab, Bb. The other major and minor keys follow the same pattern and are not listed here.
Given these key patterns, the pitch sequence obtained in step S103 is matched against them to obtain the tonality of the original audio. Optionally, an absolute matching method may be used, i.e. only checking whether each pitch of a key pattern appears in the pitch sequence. In one specific example, the pitch sequence is D, C, G, Bb, B, Db, F, A, E. Since a key pattern contains only 7 pitches, only the first 7 pitches of the sequence are compared, i.e. D, C, G, Bb, B, Db and F. Matching C major yields D, C, G, B and F; Db major yields C, Bb, Db and F; D major yields D, G, B and Db; Eb major yields D, C, G, Bb and F; A major yields D, B and Db; Bb major yields D, C, G, Bb and F; B major yields Bb, B and Db; the remaining keys are matched in the same way. In this example the highest matching degree is shared by C major, Eb major and Bb major, each matching 5 pitches of the sequence, so the tonality of the audio can be narrowed down to one of these keys.
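A minimal sketch of the absolute matching step, using the key pitch sets listed above (note that, counted against those sets, C major also matches five of the first seven pitches of the example sequence). Only a subset of the 24 keys is included for brevity:

```python
# Pitch sets of a few major keys, taken from the key tables above.
MAJOR_KEYS = {
    "C":  {"C", "D", "E", "F", "G", "A", "B"},
    "Db": {"Db", "Eb", "F", "Gb", "Ab", "Bb", "C"},
    "D":  {"D", "E", "Gb", "G", "A", "B", "Db"},
    "Eb": {"Eb", "F", "G", "Ab", "Bb", "C", "D"},
    "Bb": {"Bb", "C", "D", "Eb", "F", "G", "A"},
}

def match_keys(pitch_sequence, top_n=7):
    """Absolute matching: count how many of the first top_n pitches of the
    sequence belong to each candidate key; return the best-scoring keys."""
    candidates = set(pitch_sequence[:top_n])
    scores = {key: len(candidates & pitches) for key, pitches in MAJOR_KEYS.items()}
    best = max(scores.values())
    return [key for key, s in scores.items() if s == best]

seq = ["D", "C", "G", "Bb", "B", "Db", "F", "A", "E"]
print(match_keys(seq))  # → ['C', 'Eb', 'Bb'], the tie that weighting must break
```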
In the scenario above, sometimes a single key with the highest matching degree is obtained, in which case the tonality of the audio is determined directly. However, several keys may score the same for one pitch sequence, as in the example above, where the sequence matches the keys of C, Eb and Bb equally well; in that case the exact tonality must still be determined, and optionally it can be determined in a weighted manner. Optionally, before step S104, the method further includes: obtaining a weight value of each pitch in each key.
Obtaining the weight value of each pitch in each key means obtaining it according to the position of the pitch within the key. As described in step S102, each pitch corresponds to a note name; for each key, the first pitch is do, represented numerically as 1, and do, re, mi, fa, sol, la, si correspond to scale degrees 1 to 7. The stable degrees 1, 2, 3, 5 and 6 are given weight 5, the unstable degrees 4 and 7 weight 2, and any pitch outside the key weight 0. Because the scale degree of a given pitch differs from key to key, its weight also differs from key to key.
Based thereon, the determining the tonality of the original audio from the pitch sequence comprises:
step S301, obtaining the N pitches in the first N positions of the pitch sequence, where N is an integer greater than zero;
step S302, calculating a tonal weight value of the original audio relative to each tone according to the weight value and the N pitches;
step S303, determining the tonality with the largest tonality weight value as the tonality of the original audio.
Specifically, continuing the example above, suppose that in the pitch sequence D occurs 20 times, C 18 times, G 15 times, Bb 12 times, B 10 times, Db 7 times, F 5 times, A 3 times and E 2 times; the weights are computed over the first 7 pitches of the sequence, i.e. D, C, G, Bb, B, Db and F. For Eb major, Eb, F, G, Ab, Bb, C, D correspond to degrees 1 to 7, so the matched pitches D, C, G, Bb, F fall on degrees 7, 6, 3, 5, 2 with weights 2, 5, 5, 5, 5, giving the tonality weight value 2 × 20 + 5 × 18 + 5 × 15 + 5 × 12 + 5 × 5 = 290. For C major, the matched pitches D, C, G, B, F fall on degrees 2, 1, 5, 7, 4 with weights 5, 5, 5, 2, 2, giving 5 × 20 + 5 × 18 + 5 × 15 + 2 × 10 + 2 × 5 = 295. For Bb major, Bb, C, D, Eb, F, G, A correspond to degrees 1 to 7, so the matched pitches D, C, G, Bb, F fall on degrees 3, 2, 6, 1, 5, all with weight 5, giving 5 × 20 + 5 × 18 + 5 × 15 + 5 × 12 + 5 × 5 = 350. Since Bb major has the largest tonality weight value, the tonality of the audio is determined to be Bb major.
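The weighted selection can be sketched as follows; the degree weights (5 for the stable degrees 1, 2, 3, 5, 6 and 2 for degrees 4 and 7) follow the rule above, and the occurrence counts are those of the worked example:

```python
# Scale degrees of the tied candidate keys (index 0 = degree 1 = do).
KEY_DEGREES = {
    "C":  ["C", "D", "E", "F", "G", "A", "B"],
    "Eb": ["Eb", "F", "G", "Ab", "Bb", "C", "D"],
    "Bb": ["Bb", "C", "D", "Eb", "F", "G", "A"],
}
STABLE = {1, 2, 3, 5, 6}  # stable degrees get weight 5; degrees 4 and 7 get 2

def tonality_score(key, pitch_counts):
    """Weighted matching: sum weight(degree) * occurrence count over the
    counted pitches that belong to the key; out-of-key pitches score 0."""
    degrees = KEY_DEGREES[key]
    score = 0
    for pitch, count in pitch_counts.items():
        if pitch in degrees:
            degree = degrees.index(pitch) + 1
            score += (5 if degree in STABLE else 2) * count
    return score

# Occurrence counts of the first 7 pitches of the example sequence.
counts = {"D": 20, "C": 18, "G": 15, "Bb": 12, "B": 10, "Db": 7, "F": 5}
for key in KEY_DEGREES:
    print(key, tonality_score(key, counts))
# → C 295, Eb 290, Bb 350 — Bb major wins and is chosen as the tonality
```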
It is understood that the two embodiments above for determining the tonality may be used alone or in combination: either method may determine the tonality by itself, or the first method may be applied first and, when it cannot single out one key, the second method is then applied to the remaining candidate keys.
After obtaining the tonality to which the audio belongs, the audio may be further processed according to the tonality, as shown in fig. 4, after step S104, further comprising:
step S401, obtaining an audio processing parameter corresponding to the tone according to the tone of the original audio;
step S402, processing the original audio according to the audio processing parameter to obtain a first audio.
In the present disclosure, each supported tone corresponds to an audio processing parameter, and the audio processing parameter is used to process the audio to obtain a processed first audio.
Optionally, obtaining the audio processing parameter corresponding to the tonality of the original audio includes: obtaining a configuration file of audio processing parameters, and looking up in the configuration file, according to the tonality of the original audio, the adjustment frequency of each pitch in that key. In this method a configuration file is preset in which the correspondence between keys and audio processing parameters is recorded; an audio processing parameter is the adjustment frequency for each pitch, or another pitch to which each pitch should be adjusted. Multiple configuration files can be preset to obtain different effects in different processing scenarios, such as an electronic ("auto-tune") effect, a Japanese-style effect or a Chinese-style effect, each requiring different processing parameters. Because processed audio must still obey the rules of its key, the processing differs per key: if, for example, a song in D major were processed with the parameters for C major, the result may sound off-key and unnatural, which is why audio processing parameters are set for each key separately.
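The per-key lookup of steps S401-S402 can be sketched as a configuration file mapping an effect and a tonality to per-pitch adjustment parameters. The file layout, effect name and ratio values below are purely illustrative assumptions; the patent does not specify a format:

```python
import json

# Hypothetical configuration: effect -> tonality -> pitch -> frequency ratio
# to apply when resynthesising that pitch. All names and numbers are invented
# for illustration only.
CONFIG = json.loads("""
{
  "electronic": {
    "Bb": {"Bb": 1.0, "C": 1.059, "D": 1.122}
  }
}
""")

def processing_parameters(effect, tonality):
    """Look up the per-pitch adjustment parameters for a tonality (step S401)."""
    return CONFIG[effect][tonality]

params = processing_parameters("electronic", "Bb")
print(params["C"])  # → 1.059
```

In a real system the JSON would be read from disk, and step S402 would apply the looked-up adjustment to each detected pitch of the original audio.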
Although the steps in the above method embodiments are described in the given order, it should be clear to those skilled in the art that they are not necessarily performed in that order; they may also be performed in reverse, in parallel, interleaved, or in other orders, and further steps may be added on top of them. Such obvious modifications or equivalents also fall within the protection scope of the present disclosure and are not described again here.
Fig. 5 is a schematic structural diagram of an embodiment of an audio processing apparatus according to an embodiment of the disclosure. As shown in fig. 5, the apparatus 500 includes an audio receiving module 501, a pitch detection module 502, a pitch sorting module 503, and a tonality determination module 504.
an audio receiving module 501 for receiving original audio from an audio source;
a pitch detection module 502 for detecting the number of occurrences of a plurality of pitches in the original audio;
a pitch sorting module 503, configured to sort the detected multiple pitches according to the occurrence times of the detected multiple pitches to obtain a pitch sequence of the original audio;
a tonality determination module 504, configured to determine a tonality of the original audio according to the pitch sequence.
Further, the audio processing apparatus 500 further includes:
the processing parameter acquisition module is used for acquiring audio processing parameters corresponding to the tone according to the tone of the original audio;
and the audio processing submodule is used for processing the original audio according to the audio processing parameter to obtain a first audio.
Further, the audio processing apparatus 500 further includes:
and the weight value acquisition module is used for acquiring the weight value of each pitch in each tone.
Further, the weight value obtaining module is further configured to:
acquiring the weight value of the pitch in the tone according to the position of the pitch in the tone.
Further, the tonality determining module 504 further includes:
a weighted pitch obtaining module, configured to obtain the N pitches in the first N positions of the pitch sequence, where N is an integer greater than zero;
a tonality weight value calculation module, configured to calculate a tonality weight value of the original audio with respect to each tonality according to the weight value and the N pitches;
and the tonality determining submodule is used for determining the tonality with the maximum tonality weight value as the tonality of the original audio.
Further, the pitch detection module 502 further includes:
the sound frequency detection module is used for detecting a plurality of sound frequencies in the original audio;
the pitch determining module is used for determining the pitch corresponding to each sound frequency;
and the frequency calculating module is used for calculating the occurrence frequency of each pitch.
Further, the processing parameter obtaining module further includes:
the configuration file acquisition module is used for acquiring a configuration file of the audio processing parameters;
and the parameter searching module is used for searching the adjusting frequency of each pitch in the tone in the configuration file according to the tone of the original audio.
The apparatus shown in fig. 5 can perform the method of the embodiment shown in fig. 1-4, and the detailed description of this embodiment can refer to the related description of the embodiment shown in fig. 1-4. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to fig. 4, and are not described herein again.
Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602 and the RAM 603 are connected to each other via a bus 604; an input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the methods illustrated by the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, by contrast, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: an electrical wire, an optical cable, RF (radio frequency), or any suitable combination of the foregoing.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive original audio from an audio source; detect the numbers of occurrences of a plurality of pitches in the original audio; sort the detected pitches by their numbers of occurrences to obtain a pitch sequence of the original audio; and determine the tonality of the original audio according to the pitch sequence.
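The pitch-statistics step above (count how often each pitch occurs, then order the pitches by occurrence count) can be sketched in Python. This is an illustrative sketch rather than the patented implementation; the pitch-class labels are hypothetical detector output.

```python
from collections import Counter

def pitch_sequence(pitches):
    """Sort detected pitches by their number of occurrences (most frequent first)."""
    counts = Counter(pitches)
    # most_common() yields (pitch, count) pairs in descending count order
    return [pitch for pitch, _ in counts.most_common()]

# Hypothetical detector output for a fragment centered on C major
seq = pitch_sequence(["C", "G", "C", "E", "G", "C", "D"])
# The most frequent pitch ("C" here) heads the pitch sequence
```

The resulting sequence is what the later steps consume: the head of the list is the pitch that occurred most often in the original audio.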
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided an audio processing method including:
receiving original audio from an audio source;
detecting the numbers of occurrences of a plurality of pitches in the original audio;
sorting the detected pitches by their numbers of occurrences to obtain a pitch sequence of the original audio;
and determining the tonality of the original audio according to the pitch sequence.
Further, the method further comprises:
acquiring an audio processing parameter corresponding to the tonality according to the tonality of the original audio;
and processing the original audio according to the audio processing parameter to obtain a first audio.
Further, before the determining the tonality of the original audio according to the pitch sequence, the method further includes:
acquiring a weight value of each pitch in each tonality.
Further, the acquiring a weight value of each pitch in each tonality includes:
acquiring the weight value of a pitch in a tonality according to the position of the pitch in the tonality.
Further, the determining the tonality of the original audio from the pitch sequence includes:
acquiring the N pitches ranked in the top N positions of the pitch sequence, where N is an integer greater than zero;
calculating a tonality weight value of the original audio relative to each tonality according to the weight values and the N pitches;
and determining the tonality with the largest tonality weight value as the tonality of the original audio.
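A minimal sketch of this scoring step, assuming hypothetical per-tonality weight tables in which a pitch's weight is derived from its position in the tonality. The table contents and the choice of N are illustrative, not taken from the patent.

```python
# Hypothetical weight tables: earlier scale positions receive larger weights.
KEY_WEIGHTS = {
    "C major": {"C": 7, "D": 6, "E": 5, "F": 4, "G": 3, "A": 2, "B": 1},
    "G major": {"G": 7, "A": 6, "B": 5, "C": 4, "D": 3, "E": 2, "F#": 1},
}

def detect_tonality(pitch_sequence, n=3):
    """Sum the weights of the top-N pitches for each candidate tonality and
    return the tonality with the largest tonality weight value."""
    top_n = pitch_sequence[:n]
    scores = {
        tonality: sum(weights.get(p, 0) for p in top_n)
        for tonality, weights in KEY_WEIGHTS.items()
    }
    return max(scores, key=scores.get)

# A sequence headed by C, G, E scores 7+3+5 = 15 for C major
# but only 4+7+2 = 13 for G major, so C major wins.
```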
Further, the detecting the numbers of occurrences of a plurality of pitches in the original audio includes:
detecting a plurality of sound frequencies in the original audio;
determining the pitch corresponding to each sound frequency;
and calculating the number of occurrences of each pitch.
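These three sub-steps can be sketched with the standard equal-temperament mapping (A4 = 440 Hz = MIDI note 69). The patent does not specify the frequency-to-pitch conversion, so this mapping is an assumption used for illustration.

```python
import math
from collections import Counter

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def freq_to_pitch(freq_hz, a4=440.0):
    """Map a detected sound frequency to the nearest equal-tempered pitch class."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))  # A4 = MIDI note 69
    return NOTE_NAMES[midi % 12]

def count_pitches(freqs):
    """Determine the pitch for each detected frequency, then count occurrences."""
    return Counter(freq_to_pitch(f) for f in freqs)

# 261.63 Hz is middle C; slightly detuned input still maps to the nearest pitch
counts = count_pitches([261.63, 262.0, 440.0])
# counts["C"] == 2, counts["A"] == 1
```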
Further, the acquiring audio processing parameters corresponding to the tonality according to the tonality of the original audio includes:
acquiring a configuration file of audio processing parameters;
and looking up, in the configuration file, the adjustment frequency of each pitch in the tonality according to the tonality of the original audio.
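A sketch of the configuration-file lookup, using a hypothetical JSON layout; the real file format, key names, and adjustment-frequency values are not specified in the patent.

```python
import json

# Hypothetical configuration: per-tonality adjustment frequencies (Hz) per pitch.
CONFIG_JSON = """
{
  "C major": {"C": 261.63, "D": 293.66, "E": 329.63},
  "G major": {"G": 392.00, "A": 440.00, "B": 493.88}
}
"""

def lookup_adjustment_frequencies(tonality, config_text=CONFIG_JSON):
    """Look up, in the configuration file, the adjustment frequency of each
    pitch for the detected tonality of the original audio."""
    config = json.loads(config_text)
    return config[tonality]

params = lookup_adjustment_frequencies("C major")
# params["C"] == 261.63
```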
According to one or more embodiments of the present disclosure, there is provided an audio processing apparatus including:
an audio receiving module for receiving original audio from an audio source;
a pitch detection module for detecting the numbers of occurrences of a plurality of pitches in the original audio;
a pitch sorting module for sorting the detected pitches by their numbers of occurrences to obtain a pitch sequence of the original audio;
and a tonality determining module for determining the tonality of the original audio according to the pitch sequence.
Further, the audio processing apparatus further includes:
the processing parameter acquisition module is used for acquiring audio processing parameters corresponding to the tonality according to the tonality of the original audio;
and the audio processing submodule is used for processing the original audio according to the audio processing parameters to obtain a first audio.
Further, the audio processing apparatus further includes:
and the weight value acquisition module is used for acquiring a weight value of each pitch in each tonality.
Further, the weight value acquisition module is further configured to:
acquire the weight value of a pitch in a tonality according to the position of the pitch in the tonality.
Further, the tonality determining module further includes:
a weighted pitch acquisition module, configured to acquire the N pitches ranked in the top N positions of the pitch sequence, where N is an integer greater than zero;
a tonality weight value calculation module, configured to calculate a tonality weight value of the original audio relative to each tonality according to the weight values and the N pitches;
and a tonality determining submodule, configured to determine the tonality with the largest tonality weight value as the tonality of the original audio.
Further, the pitch detection module further includes:
the sound frequency detection module is used for detecting a plurality of sound frequencies in the original audio;
the pitch determining module is used for determining the pitch corresponding to each sound frequency;
and the occurrence counting module is used for calculating the number of occurrences of each pitch.
Further, the processing parameter obtaining module further includes:
the configuration file acquisition module is used for acquiring a configuration file of audio processing parameters;
and the parameter lookup module is used for looking up, in the configuration file, the adjustment frequency of each pitch in the tonality according to the tonality of the original audio.
According to one or more embodiments of the present disclosure, there is provided an electronic device including: at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform any of the audio processing methods described above.
According to one or more embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform any of the audio processing methods described above.
The foregoing description is merely a description of preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with features of similar function disclosed in (but not limited to) the present disclosure.

Claims (10)

1. An audio processing method, comprising:
receiving original audio from an audio source;
detecting the numbers of occurrences of a plurality of pitches in the original audio;
sorting the detected pitches by their numbers of occurrences to obtain a pitch sequence of the original audio;
and determining the tonality of the original audio according to the pitch sequence.
2. The audio processing method of claim 1, wherein the method further comprises:
acquiring an audio processing parameter corresponding to the tonality according to the tonality of the original audio;
and processing the original audio according to the audio processing parameter to obtain a first audio.
3. The audio processing method of claim 1, wherein before the determining the tonality of the original audio from the pitch sequence, the method further comprises:
acquiring a weight value of each pitch in each tonality.
4. The audio processing method of claim 3, wherein the acquiring a weight value of each pitch in each tonality comprises:
acquiring the weight value of a pitch in a tonality according to the position of the pitch in the tonality.
5. The audio processing method of claim 3 or 4, wherein the determining the tonality of the original audio from the pitch sequence comprises:
acquiring the N pitches ranked in the top N positions of the pitch sequence, where N is an integer greater than zero;
calculating a tonality weight value of the original audio relative to each tonality according to the weight values and the N pitches;
and determining the tonality with the largest tonality weight value as the tonality of the original audio.
6. The audio processing method of claim 1, wherein the detecting a number of occurrences of a plurality of pitches in the original audio comprises:
detecting a plurality of sound frequencies in the original audio;
determining the pitch corresponding to each sound frequency;
and calculating the number of occurrences of each pitch.
7. The audio processing method of claim 2, wherein the acquiring an audio processing parameter corresponding to the tonality according to the tonality of the original audio comprises:
acquiring a configuration file of audio processing parameters;
and looking up, in the configuration file, the adjustment frequency of each pitch in the tonality according to the tonality of the original audio.
8. An audio processing apparatus comprising:
an audio receiving module for receiving original audio from an audio source;
a pitch detection module for detecting the numbers of occurrences of a plurality of pitches in the original audio;
a pitch sorting module for sorting the detected pitches by their numbers of occurrences to obtain a pitch sequence of the original audio;
and a tonality determining module for determining the tonality of the original audio according to the pitch sequence.
9. An electronic device, comprising:
a memory for storing computer readable instructions; and
a processor for executing the computer readable instructions, such that the processor, when executing the instructions, implements the audio processing method of any one of claims 1-7.
10. A non-transitory computer-readable storage medium storing computer-readable instructions that, when executed by a computer, cause the computer to perform the audio processing method of any one of claims 1-7.
CN201910728211.XA 2019-08-08 2019-08-08 Audio processing method and device, electronic equipment and computer readable storage medium Pending CN112435680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910728211.XA CN112435680A (en) 2019-08-08 2019-08-08 Audio processing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910728211.XA CN112435680A (en) 2019-08-08 2019-08-08 Audio processing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112435680A true CN112435680A (en) 2021-03-02

Family

ID=74689468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910728211.XA Pending CN112435680A (en) 2019-08-08 2019-08-08 Audio processing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112435680A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113178183A (en) * 2021-04-30 2021-07-27 杭州网易云音乐科技有限公司 Sound effect processing method and device, storage medium and computing equipment
CN113178183B (en) * 2021-04-30 2024-05-14 杭州网易云音乐科技有限公司 Sound effect processing method, device, storage medium and computing equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10293586A (en) * 1998-06-01 1998-11-04 Yamaha Corp Automatic accompaniment device
KR20030095474A (en) * 2002-06-10 2003-12-24 휴먼씽크(주) Method and apparatus for analysing a pitch, method and system for discriminating a corporal punishment, and computer readable medium storing a program thereof
JP2006195384A (en) * 2005-01-17 2006-07-27 Matsushita Electric Ind Co Ltd Musical piece tonality calculating device and music selecting device
JP2006201614A (en) * 2005-01-21 2006-08-03 Victor Co Of Japan Ltd Device for recognizing musical interval, and voice conversion device using the same
JP2007041234A (en) * 2005-08-02 2007-02-15 Univ Of Tokyo Method for deducing key of music sound signal, and apparatus for deducing key
CN105845115A (en) * 2016-03-16 2016-08-10 腾讯科技(深圳)有限公司 Song mode determining method and song mode determining device
CN106057208A (en) * 2016-06-14 2016-10-26 科大讯飞股份有限公司 Audio correction method and device
CN109979488A (en) * 2019-03-14 2019-07-05 浙江大学 Voice based on stress analysis turns music notation system



Similar Documents

Publication Publication Date Title
WO2020119150A1 (en) Rhythm point recognition method and apparatus, electronic device, and storage medium
CN111798821B (en) Sound conversion method, device, readable storage medium and electronic equipment
CN111785238B (en) Audio calibration method, device and storage medium
CN112489606B (en) Melody generation method, device, readable medium and electronic equipment
CN111309962A (en) Method and device for extracting audio clip and electronic equipment
CN111785247A (en) Voice generation method, device, equipment and computer readable medium
US20150310874A1 (en) Adaptive audio signal filtering
CN106653049A (en) Addition of virtual bass in time domain
CN111369968A (en) Sound reproduction method, device, readable medium and electronic equipment
CN112669878B (en) Sound gain value calculation method and device and electronic equipment
CN111429881B (en) Speech synthesis method and device, readable medium and electronic equipment
US20130178963A1 (en) Audio system with adaptable equalization
CN112562633A (en) Singing synthesis method and device, electronic equipment and storage medium
CN113496706B (en) Audio processing method, device, electronic equipment and storage medium
CN112259076A (en) Voice interaction method and device, electronic equipment and computer readable storage medium
CN110085214B (en) Audio starting point detection method and device
CN111653261A (en) Speech synthesis method, speech synthesis device, readable storage medium and electronic equipment
CN112435680A (en) Audio processing method and device, electronic equipment and computer readable storage medium
CN113593527B (en) Method and device for generating acoustic features, training voice model and recognizing voice
CN112309410A (en) Song sound repairing method and device, electronic equipment and storage medium
CN111444384B (en) Audio key point determining method, device, equipment and storage medium
CN113763976B (en) Noise reduction method and device for audio signal, readable medium and electronic equipment
CN109375892B (en) Method and apparatus for playing audio
CN113179354A (en) Sound signal processing method and device and electronic equipment
CN111680754A (en) Image classification method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination