CN115331682A

CN115331682A - Method and apparatus for correcting pitch of audio

Info

Publication number: CN115331682A
Application number: CN202110512936.2A
Authority: CN
Inventors: 张超; 朱洁
Original assignee: Beijing Qiyin Miaoxiao Technology Co ltd
Current assignee: Beijing Qiyin Miaoxiao Technology Co ltd
Priority date: 2021-05-11
Filing date: 2021-05-11
Publication date: 2022-11-11

Abstract

The invention discloses a method for correcting the pitch of audio, comprising: obtaining a fundamental frequency sequence of the audio, and obtaining an original pitch sequence of the audio based on the fundamental frequency sequence; determining the closest standard tonal scale using a similarity algorithm; generating a minimum pitch difference based on the standard tonal scale; forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency at each time point and its offset direction; and taking the audio and the pitch difference sequence to be corrected as input, and correcting the audio sequentially through resampling and a PSOLA algorithm to obtain corrected audio. By finding the most similar tonal scale, and thereby the fundamental frequencies of the closest musical tonal scale, the pitch difference sequence to be corrected can be calculated quickly. The application also discloses a corresponding device for correcting the pitch of audio.

Description

Method and apparatus for correcting pitch of audio

Technical Field

The invention relates to the fields of audio signal processing and algorithmic composition, and in particular to a method and a device for correcting the pitch of audio content based on musical mode information.

Background

With the development and rise of the internet music industry, functions such as music playback, online karaoke (singing along to an accompaniment system), and voice-based social networking have become popular directions in the mobile internet industry. Audio signal processing, such as male-to-female voice conversion, female-to-male voice conversion, special-effect voice changing, and pitch correction in karaoke products, increasingly appears in mobile internet products, and users' demand for personalized audio processing is growing ever stronger.

However, when using karaoke software, not every user can sing every note at the correct pitch. Moreover, as the demand for personalization grows among the younger generation, users are no longer satisfied with singing at the same pitch and rhythm as the original performance, and often hum original melodies to compose music in their own style. Users often cannot sing existing or original songs accurately in a specific key, and therefore there is a need for a method and system for correcting pitch deviations when singing original music with or without accompaniment.

Disclosure of Invention

Therefore, the invention provides a method and a device for correcting the pitch of audio content based on music mode information.

Some embodiments of the present application provide a method of modifying pitch of audio, comprising the steps of: obtaining a fundamental frequency sequence of the audio, wherein the fundamental frequency sequence comprises a plurality of time points and a fundamental frequency value of each time point; obtaining an original pitch sequence of the audio based on the sequence of fundamental frequencies; calculating the tonal scale which is closest to the original pitch sequence in different reference tonal scales by using a similarity algorithm and taking the tonal scale as a standard tonal scale; generating a standard-tone scale fundamental frequency reference table based on the standard-tone scale; calculating the minimum pitch difference between the fundamental frequency of each time point of the fundamental frequency sequence and the fundamental frequency in the standard tone scale fundamental frequency reference table by using a sorting method; forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency of each time point and the offset direction thereof; and taking the audio and the pitch difference sequence to be corrected as input, and sequentially correcting the audio through resampling and a PSOLA algorithm to obtain the corrected audio.

Some embodiments of the present application provide a method of modifying pitch of audio, comprising the steps of: obtaining a fundamental frequency sequence of the audio, wherein the fundamental frequency sequence comprises a plurality of time points and a fundamental frequency value of each time point; obtaining an original pitch sequence of the audio based on the sequence of fundamental frequencies; calculating the tonal scale which is closest to the original pitch sequence among different reference tonal scales by using a cosine similarity algorithm and taking that tonal scale as a standard tonal scale; generating a standard tonal scale fundamental frequency reference table based on the standard tonal scale; calculating the minimum pitch difference between the fundamental frequency of each time point of the fundamental frequency sequence and the fundamental frequencies in the standard tonal scale fundamental frequency reference table by using a quicksort method; forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency of each time point and the offset direction thereof; and taking the audio and the pitch difference sequence to be corrected as input, and correcting the audio sequentially through resampling and a PSOLA algorithm to obtain the corrected audio.

In some embodiments, calculating, using the similarity algorithm or the cosine similarity algorithm, the tonal scale closest to the original pitch sequence among the different reference tonal scales as the standard tonal scale includes: using scales of different keys under a preset combination of modes as the reference tonal scales; or using scales of different keys determined by the user as the reference tonal scales.

In some embodiments, calculating, using the similarity algorithm or the cosine similarity algorithm, the tonal scale closest to the original pitch sequence among the different reference tonal scales as the standard tonal scale includes: screening the reference tonal scales with the twelve pitches in each octave as a reference.

In some embodiments, the number of pitches screened out by the scale permutation within an octave is increased or decreased depending on the composition of the reference tonal scale.

In some embodiments, before the reference tonal scale with the highest similarity is selected as the standard tonal scale using the similarity algorithm or the cosine similarity algorithm, the pitch sequence is converted, according to the octave relation, into the same octave range as the reference tonal scale.

In some embodiments, the different reference tonal scales include one or more of: scales in the twelve keys of the common natural major, ancient mode scales, modern musical mode scales based on the pentatonic scale and the blues scale, ethnic scales, and the like.

In some embodiments, generating the reference table of standard-toned scale fundamental frequencies for the standard-toned scale comprises generating a standard pitch sequence from the number of octaves of the standard-toned scale and the number of pitches of the toned scale in each octave, and converting the standard pitch sequence into a frequency sequence of standard pitches according to an international standard pitch-to-frequency comparison table, thereby obtaining the reference table of standard-toned scale fundamental frequencies.

In some embodiments, the number of octaves is preset or set by a user.

In some embodiments, the number of pitches of the toned scale in each octave is preset or set by the user.

In some embodiments, calculating the pitch difference between the fundamental frequency of the audio and the fundamental frequencies in the standard tonal scale fundamental frequency reference table using the quicksort method comprises: shifting the fundamental frequency of each sampling point in steps in a first direction; stopping the shifting and recording a first accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when the difference between the shifted frequency and that frequency is less than a predefined minimum tolerance; and determining the first accumulated offset as the minimum pitch difference for that sampling point.

In some embodiments, calculating the pitch difference between the fundamental frequency of the audio and the fundamental frequencies in the standard tonal scale fundamental frequency reference table using the quicksort method comprises: shifting the fundamental frequency of each sampling point in steps in a first direction; stopping the shifting and recording a first accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when the difference between the shifted frequency and that frequency is less than a predefined minimum tolerance; shifting the fundamental frequency of each sampling point in steps in a second direction opposite to the first direction; stopping the shifting and recording a second accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when the difference between the shifted frequency and that frequency is less than the predefined minimum tolerance; and comparing the first and second accumulated offsets and determining the smaller one as the minimum pitch difference for that sampling point.

In some embodiments, the size of each step offset is preset or set by a user.

In some embodiments, the fundamental frequency sequence is determined using the pYIN algorithm.

In some embodiments, a sequence of multiples by which the pitch of the audio needs to be raised is determined from the obtained pitch difference sequence to be corrected; the singing audio is resampled at 1/S times the sampling rate used when the user's singing was captured, to obtain resampled audio; and the resampled audio is stretched to S times its length using the PSOLA algorithm and the sequence of multiples by which the pitch to be corrected needs to be raised.

In some embodiments, the sequence of multiples by which the pitch to be corrected needs to be raised is a calculated array of decimals, or is a single fixed decimal rather than an array of decimals.

In some embodiments, the method further comprises re-synthesizing the human voice audio based on a PSOLA algorithm, with each pitch difference as a pitch variation coefficient of a formant at a corresponding time instant in the sequence of standard pitch human voice fundamental frequencies.

Further embodiments of the present application provide an apparatus for modifying a pitch of audio, comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform any of the above methods of modifying a pitch of audio.

Compared with conventional pitch correction technology, the application has the following advantages: for the first time, the most similar tonal scale is calculated using a similarity algorithm, in particular a cosine similarity algorithm, so that the tonal scale is obtained automatically from the captured vocal audio; and the pitch difference sequence to be corrected is calculated quickly by combining the two methods, which reduces the amount of computation.

Compared with conventional pitch correction technology, the application also differs in its applicable scenarios. First, whereas the prior art compares music information extracted from the original singer's voice with music information extracted from the user's voice, the present method is better suited to users' original works. Second, the application can select, as defined by the user, tonal scales for singing in ethnic styles such as Japanese style and Persian style, and is therefore better suited to personalized singing output.

Drawings

FIG. 1 is a schematic diagram of a method of modifying pitch of audio according to an embodiment of the application;

FIG. 2 is a schematic diagram of audio processing steps in a method of modifying pitch of audio according to an embodiment of the application;

FIG. 3 is a schematic diagram of a standard-toned scale processing step in a method of modifying the pitch of audio according to an embodiment of the present application;

FIGS. 4A and 4B are schematic diagrams of the sorting algorithm steps in a method of modifying the pitch of audio according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a resampling and PSOLA algorithm in a method of modifying a pitch of audio according to an embodiment of the present application.

Detailed Description

The following detailed description of embodiments of the present application refers to the accompanying drawings.

Definition of terms:

To clearly convey the scope of the present application and to avoid ambiguity, the general terms of the present application are defined as follows:

Mode: several tones of different pitches, organized around one core tone (the tonic) according to certain interval relations, form a system called a mode.

Tonal scale: the tones of a mode arranged in order of pitch in the form of a scale, with the tonic of the mode as the starting point and the end point. Tonal scales include natural modes, Indian modes, and medieval modes, each of which includes major and minor modes.

Musical interval: the pitch relationship between two tones, expressed in "degrees". Intervals include the perfect unison, minor second, augmented unison, major second, diminished third, minor third, augmented second, major third, diminished fourth, perfect fourth, augmented fourth, diminished fifth, perfect fifth, minor sixth, major sixth, minor seventh, major seventh, perfect octave, etc.

It will be readily understood that the components of certain exemplary embodiments, as generally described and illustrated in the figures herein, could be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of some example embodiments of systems, methods, apparatuses, and computer program products related to an interactive multimedia architecture is not intended to limit the scope of some embodiments, but is representative of selected example embodiments.

The features, structures, or characteristics of the example embodiments described throughout this specification may be combined in any suitable manner in one or more example embodiments. For example, throughout the specification, use of the phrases "certain embodiments," "some embodiments," or other similar language refers to the fact that: a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment. Thus, appearances of the phrases "in certain embodiments," "in some embodiments," "in other embodiments," or other similar language throughout this specification are not necessarily all referring to the same group of embodiments, and the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In addition, the phrase "a group" refers to a group that includes one or more of the referenced group members. Thus, the phrases "a set," "one or more," and "at least one," or the equivalent may be used interchangeably. In addition, "or" is intended to mean "and/or" unless explicitly stated otherwise.

In addition, the different functions or operations discussed below may be performed in a different order and/or concurrently with each other, if desired. Furthermore, if desired, one or more of the described functions or operations may be optional or may be combined. As such, the following description should be considered as merely illustrative of the principles and teachings of certain exemplary embodiments, and not in limitation thereof.

For scenarios in which a user sings an original song, adapts an existing song, or otherwise produces audio containing a human voice, the application provides a method and a system for correcting the pitch of the voice in the song to pitches in a standard tonal scale.

The application relates to a method for correcting the pitch of audio, in particular vocal audio, comprising: capturing the user's singing voice at a certain sampling rate while the user hums a song, to obtain an audio file, for example in WAV format (step S100); performing audio processing on the audio file to obtain its fundamental frequency sequence and pitch sequence (step S200); comparing the obtained vocal pitch sequence with different reference tonal scales using a similarity algorithm, calculating the similarity, determining the closest reference tonal scale as the standard tonal scale, and generating a standard tonal scale fundamental frequency reference table (step S300); comparing each standard pitch in the standard tonal scale with the vocal fundamental frequency sequence using a sorting algorithm, and calculating the minimum pitch difference between each vocal fundamental frequency and the closest standard pitch in the standard tonal scale (step S400); and using the minimum pitch difference corresponding to each pitch to be corrected in the vocal fundamental frequency sequence as an input parameter to obtain a vocal fundamental frequency sequence in the standard tonal scale (step S500).

The audio processing of the audio file may be performed using the pYIN algorithm. It may comprise using the pYIN algorithm to obtain a fundamental frequency sequence of the audio comprising a plurality of time points (step S201). The fundamental frequency sequence may be stored in a sequence format in which each record contains a time point and the fundamental frequency value at that time point. After the fundamental frequency sequence is obtained, the pitch corresponding to the fundamental frequency at each time point can be looked up in an international pitch-to-frequency comparison table (Scientific Pitch Notation) to obtain the pitch sequence of the audio (step S202).
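As a non-limiting illustration of step S202, the lookup in a pitch-to-frequency comparison table can be sketched as a direct conversion from frequency to a note name in Scientific Pitch Notation. The sketch assumes twelve-tone equal temperament with A4 = 440 Hz, and the function names `hz_to_midi`, `hz_to_pitch_name`, and `f0_to_pitch_sequence` are hypothetical; none of these details are specified by the application.

```python
import math

# Note names within one octave, sharps only (an assumption of this sketch).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number for a frequency (A4 = MIDI 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)

def hz_to_pitch_name(freq_hz, a4_hz=440.0):
    """Nearest equal-tempered pitch in Scientific Pitch Notation."""
    midi = int(round(hz_to_midi(freq_hz, a4_hz)))
    octave = midi // 12 - 1  # MIDI 60 is C4
    return f"{NOTE_NAMES[midi % 12]}{octave}"

def f0_to_pitch_sequence(f0_sequence):
    """Map (time, f0) records to (time, pitch name); unvoiced frames stay None."""
    return [(t, hz_to_pitch_name(f) if f else None) for t, f in f0_sequence]

print(f0_to_pitch_sequence([(0.00, 261.63), (0.01, None), (0.02, 440.0)]))
```

Frames without a detected fundamental frequency are passed through as `None` here; how unvoiced frames are represented in practice depends on the pitch tracker used.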

After the pitch sequence of the audio is obtained, it can be compared with one or more different reference tonal scales, preset by the system or selected by the user, using a similarity algorithm such as cosine similarity, to obtain the tonal scale closest to the pitch sequence of the audio. Besides cosine similarity, the similarity may also be calculated using measures such as Euclidean distance, Manhattan distance, Minkowski distance, Jaccard similarity, or the Pearson correlation coefficient.

The reference tonal scale may be any known tonal scale, such as the scales of the twelve keys of the natural major, or one or more of traditional Chinese mode scales, modern musical mode scales based on the pentatonic and blues scales, ethnic scales, and the like.

Taking the natural major as an example, scales in twelve different keys can be formed, including C major, G major, E major, B major, D major, A major, F-sharp major, B-flat major, E-flat major, A-flat major, and D-flat major, which serve as the reference tonal scales for comparison.
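As a non-limiting illustration, the comparison against the natural major scales in twelve keys can be sketched with cosine similarity. The application does not state how the pitch sequence and the scales are turned into vectors; this sketch assumes that the sung pitches are folded into one octave and counted in a 12-bin histogram, and that each scale is a 12-element 0/1 template. All names are hypothetical.

```python
import math

MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a natural major scale
KEY_NAMES = ["C", "Db", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

def scale_template(tonic):
    """0/1 vector marking the pitch classes of the major scale on `tonic`."""
    template = [0.0] * 12
    for step in MAJOR_STEPS:
        template[(tonic + step) % 12] = 1.0
    return template

def pitch_class_histogram(midi_notes):
    """Fold all notes into one octave and count each pitch class."""
    hist = [0.0] * 12
    for note in midi_notes:
        hist[note % 12] += 1.0
    return hist

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def closest_major_key(midi_notes):
    """Return the best-matching key name and the score for every key."""
    hist = pitch_class_histogram(midi_notes)
    scores = {KEY_NAMES[t]: cosine_similarity(hist, scale_template(t))
              for t in range(12)}
    return max(scores, key=scores.get), scores

# One octave of G major (G4 A4 B4 C5 D5 E5 F#5) as MIDI note numbers.
best, scores = closest_major_key([67, 69, 71, 72, 74, 76, 78])
print(best, round(scores[best], 3))
```

Short melodies that use only a few pitch classes can match several keys equally well; in this sketch such ties simply resolve to the first key in `KEY_NAMES`, which is an artifact of the illustration rather than a recommended rule.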

After the standard tonal scale of the audio is determined, the fundamental frequency of each standard pitch in the standard tonal scale is compared with the fundamental frequency sequence of the audio using a quicksort algorithm, and the minimum pitch difference between each fundamental frequency of the audio and the closest standard pitch in the standard tonal scale is calculated. For example, pre-stored fundamental frequencies of the standard pitches of the standard tonal scale may be used for the comparison, or, after the standard tonal scale is determined, its standard pitches may be converted into frequencies according to the international standard pitch-to-frequency comparison table (Scientific Pitch Notation). The number of standard pitches may be preset or set by the user. For example, for a determined standard tonal scale, a number N of ranges (for example, five) may be selected according to the octave relation or another interval relation (step S301), and the number of tones in each range may be selected (step S302). If, for example, one octave contains seven tones and N octaves are selected, a sequence of 7N standard pitches is generated. This standard pitch sequence is then converted into frequencies according to the international standard pitch-to-frequency comparison table (Scientific Pitch Notation), yielding the standard tonal scale fundamental frequency reference table (step S303).

The octave relation refers to the correspondence between different octaves; for example, the pitch C1 corresponds by octave to C2, C3, C4, C5, C6, and C7. When C major is determined to be the standard tonal scale, the pitches in one or more of the ranges C1-B1, C2-B2, C3-B3, C4-B4, C5-B5, and C6-B6 can be selected as the pitches of the standard pitch sequence. That is, the number of octave ranges is preset or set by the user, and the number of pitches of the tonal scale within each octave range is preset or set by the user.
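As a non-limiting illustration of steps S301 to S303, a reference table of 7N frequencies can be generated from a tonic, a starting octave, and a number of octaves N. The sketch assumes equal temperament with A4 = 440 Hz and MIDI-style note numbering; the function names and parameters are hypothetical.

```python
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # seven tones per octave (natural major)

def midi_to_hz(midi, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = MIDI 69)."""
    return a4_hz * 2.0 ** ((midi - 69) / 12.0)

def scale_reference_table(tonic_pc, first_octave, n_octaves, steps=MAJOR_STEPS):
    """Frequencies of the scale tones over n_octaves octaves, in ascending order.

    tonic_pc is the pitch class of the tonic (0 = C, 7 = G, ...). Each octave
    range runs from the tonic upward, so for C major it spans C to B.
    """
    table = []
    for octave in range(first_octave, first_octave + n_octaves):
        base = 12 * (octave + 1) + tonic_pc  # MIDI number of the tonic
        table.extend(midi_to_hz(base + s) for s in steps)
    return table

# C major over the six ranges C1-B1 ... C6-B6: 7 tones x 6 octaves = 42 entries.
table = scale_reference_table(tonic_pc=0, first_octave=1, n_octaves=6)
print(len(table), round(table[0], 2), round(table[-1], 2))
```

Changing `steps` to another interval pattern, or `n_octaves` to another value, corresponds to the preset or user-set number of tones per octave and number of octave ranges described above.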

Finding the minimum pitch difference using the quicksort algorithm may comprise: shifting the fundamental frequency at each time point of the audio in steps in a first direction, for example upward by 2 Hz each time (+2 Hz); stopping when the shifted frequency reaches the frequency of the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when its distance from that frequency is less than a preset minimum tolerance, and recording the accumulated offset (step 401A); and using the accumulated offset and the offset direction as the parameters of the minimum pitch difference which, together with the time point, form the pitch difference sequence to be corrected (step 402A). The elements of this sequence comprise the time point, the accumulated offset, and the offset direction. This unidirectional offset is most suitable when the audio as a whole is flat or sharp.
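As a non-limiting illustration of steps 401A and 402A, the unidirectional step offset can be sketched as follows. The 2 Hz step comes from the example above; the 1 Hz tolerance, the `max_steps` guard, and all names are assumptions of this sketch.

```python
def accumulated_offset(f0_hz, table_hz, direction=+1, step_hz=2.0,
                       tolerance_hz=1.0, max_steps=10_000):
    """Shift f0_hz in fixed steps until it lands within tolerance of a table entry.

    Returns the accumulated offset in Hz, or None if no entry is reached.
    """
    offset = 0.0
    for _ in range(max_steps):
        shifted = f0_hz + direction * offset
        if shifted <= 0.0:
            return None  # ran past 0 Hz while shifting downward
        if any(abs(shifted - ref) <= tolerance_hz for ref in table_hz):
            return offset
        offset += step_hz
    return None

def pitch_difference_sequence(f0_sequence, table_hz, direction=+1):
    """Build (time, accumulated offset, direction) records for voiced frames."""
    records = []
    for t, f0 in f0_sequence:
        if f0 is None:
            continue  # unvoiced frame, nothing to correct
        offset = accumulated_offset(f0, table_hz, direction)
        if offset is not None:
            records.append((t, offset, direction))
    return records

table = [392.0, 440.0, 493.88]  # G4, A4, B4
print(pitch_difference_sequence([(0.0, 430.0), (0.1, None), (0.2, 439.5)], table))
```

With a fixed step, a reference frequency can be stepped over if the tolerance is smaller than half the step, so the sketch keeps `tolerance_hz` at `step_hz / 2`.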

However, when it cannot be determined whether the audio as a whole is flat or sharp, the minimum pitch difference can be found as follows: shifting the fundamental frequency at each time point of the audio in steps in a first direction, for example upward by 2 Hz each time (+2 Hz), stopping when the shifted frequency reaches the frequency of the nearest standard pitch in the standard tonal scale fundamental frequency reference table or when its distance from that frequency is less than a preset minimum tolerance, and recording a first accumulated offset (step 401B); repeating the shifting process in a second direction opposite to the first direction, stopping under the same condition, and recording a second accumulated offset (step 402B); and taking the smaller of the two accumulated offsets at that time point, recording it as the minimum pitch difference together with its offset direction, and calculating the minimum pitch difference for each time point in the fundamental frequency sequence of the audio to form the pitch difference sequence to be corrected (step 403B). The elements of the pitch difference sequence to be corrected comprise the time point, the pitch difference measured in frequency, and the offset direction.

The step size may be preset or configured to be user settable as desired. The minimum tolerance defines the distance from a standard pitch below which the shifted frequency is treated as having reached that pitch. The minimum tolerance may be preset or configured to be user adjustable.

In addition to the quicksort algorithm, other sorting algorithms such as insertion sort, bubble sort, merge sort, and selection sort may be used to calculate the minimum pitch difference at each time point in the fundamental frequency sequence of the audio, so as to form the pitch difference sequence to be corrected.
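As a non-limiting illustration of steps 401B to 403B, the bidirectional search can be sketched by running the same step search upward and downward and keeping the smaller accumulated offset. The step size, tolerance, tie-breaking rule, and all names are assumptions of this sketch.

```python
def step_search(f0_hz, table_hz, direction, step_hz=2.0, tolerance_hz=1.0,
                max_steps=10_000):
    """Accumulated offset (Hz) until f0 lands within tolerance of a table entry."""
    for k in range(max_steps):
        shifted = f0_hz + direction * k * step_hz
        if shifted <= 0.0:
            break
        if any(abs(shifted - ref) <= tolerance_hz for ref in table_hz):
            return k * step_hz
    return None

def minimum_pitch_difference(f0_hz, table_hz):
    """Return (offset, direction) for the smaller of the two accumulated offsets."""
    candidates = []
    for direction in (+1, -1):
        offset = step_search(f0_hz, table_hz, direction)
        if offset is not None:
            candidates.append((offset, direction))
    if not candidates:
        return None
    # On a tie the upward shift is kept; this is an arbitrary choice of the sketch.
    return min(candidates, key=lambda c: c[0])

table = [392.0, 440.0]  # G4 and A4
for f0 in (430.0, 400.0):
    print(f0, minimum_pitch_difference(f0, table))
```

Here 430 Hz is closer to A4 and is shifted up by 10 Hz, while 400 Hz is closer to G4 and is shifted down by 8 Hz.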
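As a non-limiting illustration of how a sorted reference table helps here, the nearest standard pitch can also be found by sorting the table once and performing a binary search for each fundamental frequency. This is a substitute technique shown for comparison, not the step-offset procedure described above, and the function names are hypothetical.

```python
from bisect import bisect_left

def nearest_reference(f0_hz, sorted_table_hz):
    """Closest entry of an ascending, non-empty table to f0_hz."""
    i = bisect_left(sorted_table_hz, f0_hz)
    # Only the entries just below and just above f0 can be the nearest one.
    neighbours = sorted_table_hz[max(i - 1, 0): i + 1]
    return min(neighbours, key=lambda ref: abs(ref - f0_hz))

def signed_pitch_differences(f0_values, table_hz):
    """Signed difference (Hz) to the nearest reference: positive means raise."""
    sorted_table = sorted(table_hz)  # any sorting algorithm could be used here
    return [nearest_reference(f0, sorted_table) - f0 for f0 in f0_values]

print(signed_pitch_differences([430.0, 400.0, 440.0], [493.88, 392.0, 440.0]))
```

The sign of each result carries the offset direction, so a positive value means the frequency is to be raised and a negative value means it is to be lowered.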

After the pitch difference sequence to be corrected is obtained, the captured singing audio (in a common audio format such as WAV, MP3, or WMA), its fundamental frequencies, and the pitch difference sequence to be corrected are used as input parameters, and the pitch-shift calculation is completed sequentially through resampling and a PSOLA algorithm to obtain a corrected audio file and a corrected vocal fundamental frequency sequence.

The sequence of multiples by which the pitch of the audio needs to be raised can be determined from the pitch difference sequence to be corrected; for example, the multiple for a peak value is measured as a scaling factor, that is, by a division operation. The processing includes resampling the audio, whose rhythm is already correct, at 1/S times the sampling rate used when the user's singing was captured, to obtain resampled audio (step S501); and stretching the resampled audio to S times its length using the PSOLA algorithm and the sequence of multiples by which the pitch to be corrected needs to be raised (step S502).
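As a non-limiting illustration of step S501, resampling a signal to 1/S of its original number of samples can be sketched with linear interpolation. Played back at the original sampling rate, the resampled signal is S times higher in pitch and 1/S as long, which is why the subsequent PSOLA stretch by a factor of S restores the original duration. The PSOLA stage itself is not shown, and `resample_linear` is a hypothetical helper, not part of the application.

```python
def resample_linear(samples, factor):
    """Resample to about len(samples) * factor points by linear interpolation."""
    n_in = len(samples)
    n_out = max(1, int(round(n_in * factor)))
    if n_in == 1 or n_out == 1:
        return [samples[0]] * n_out
    scale = (n_in - 1) / (n_out - 1)  # input positions per output sample
    out = []
    for k in range(n_out):
        pos = k * scale
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, n_in - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

S = 2.0                                  # raise the pitch by one octave
signal = [float(n) for n in range(8)]    # stand-in for captured audio samples
shorter = resample_linear(signal, 1.0 / S)
print(len(signal), len(shorter))         # 8 samples become 4
```

Production code would normally use a band-limited resampler to avoid aliasing; linear interpolation is used here only to keep the sketch self-contained.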

In some embodiments, the sequence of multiples by which the pitch to be corrected needs to be raised is an array of decimals, such as [1.2, 2.1, -1.5, ..., 0.8]. In other embodiments, it may be a single fixed decimal rather than an array of decimals, for example when directly transposing from the key of C to the key of D. In the pitch-shift coefficient calculation step, each pitch difference in the pitch difference sequence to be corrected is used as the pitch-shift coefficient of the formant at the corresponding moment in the standard-pitch vocal fundamental frequency sequence, and the vocal audio is re-synthesized based on the PSOLA algorithm (step S503).

It is to be understood that the described embodiments are merely some, and not all, of the embodiments of the present application. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
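As a non-limiting illustration, one way to derive a multiple from a pitch difference by division is to take the ratio of the target frequency to the sung frequency. Reading the multiple as a frequency ratio is an assumption of this sketch, as are the function names; the application does not define the values in the array above.

```python
import math

def pitch_ratio(f0_hz, signed_diff_hz):
    """Ratio of the corrected frequency to the sung frequency."""
    return (f0_hz + signed_diff_hz) / f0_hz

def ratio_sequence(f0_values, signed_diffs):
    """One ratio per time point, from the sung f0 and its signed difference."""
    return [pitch_ratio(f0, d) for f0, d in zip(f0_values, signed_diffs)]

def ratio_to_semitones(ratio):
    """Express a frequency ratio as a number of equal-tempered semitones."""
    return 12.0 * math.log2(ratio)

# Per-time-point ratios: 430 Hz is raised by 10 Hz, 400 Hz is lowered by 8 Hz.
print(ratio_sequence([430.0, 400.0], [10.0, -8.0]))

# A single fixed ratio: transposing from the key of C to the key of D
# is a shift of two semitones in equal temperament.
c_to_d = 2.0 ** (2.0 / 12.0)
print(round(c_to_d, 4), round(ratio_to_semitones(c_to_d), 6))
```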

The technical solution of the present application is further described below with reference to application examples.

The technical solution of the application can be delivered as a standalone application, app, applet, or the like, or integrated as a program module into an application, app, or applet, so as to provide an automatic vocal accompaniment function for human humming.

The app or program module can offer the user three automatic pitch correction options, "high", "medium", and "low". The "high" automatic correction corresponds to converting the 7-tone scale of a natural major key into a 5-tone scale, so that the change in fundamental frequency at each time point is larger; the "medium" automatic correction corresponds to determining the pitch difference sequence to be corrected through the unidirectional offset described above; and the "low" automatic correction corresponds to determining the pitch difference sequence to be corrected through the bidirectional offset described above.

The program may also provide the user with mode options; after receiving the user's selection of a mode, the program determines the specific tonal scale within the range of that mode. For example, the user may be offered options such as "ancient Indian style", "ancient Chinese style", and "standard style", where each option corresponds to a set of tonal scales. Selecting a style selects the corresponding set of tonal scales, which determines the range of tonal scales used in the similarity determination. For example, the ancient Indian style corresponds to the Indian scale sequences in twelve keys, and when the user selects the ancient Indian style, those sequences are formed as the comparison standard.

The application scenarios of the technical solution are not limited. For example, the invention may run on a standalone system or on a network system, such as a network system with a client-server architecture. In such a network system, an application may be deployed at the client to present a graphical user interface on the client's display device and to capture user input, triggers, and other operations through a capture device such as a touch screen. A sound capture device provided in or connected to the client may be used to capture the user's humming, and a playback device of the client, such as a loudspeaker, may be used to play back the corrected humming audio. The humming audio captured by the client may be processed at the server, and the corrected humming audio may be transmitted back to the client.

The client may capture the user's complete humming at a certain sampling rate over a certain period of time and store the result in an audio file, such as a file in WAV format. The server may obtain the humming by reading the audio file, such as the WAV file, at a certain sampling rate, or by reading the client's cache directly so as to read the audio captured by the client in real time.

In example one of the method and system for correcting the pitch of audio, the correction option may be selected before audio capture starts. After the user triggers the "low" automatic pitch correction function, the following processing flow is entered:

Scheme 1A: the server obtains the storage address of the humming audio file recorded by the user at the client, and reads the user's WAV-format audio file at the negotiated sampling rate of 44100 Hz;

Scheme 2A: obtaining the fundamental frequency sung by the user and the standard pitch corresponding to that fundamental frequency by using the pYIN algorithm; forming a fundamental frequency and pitch sequence corresponding to each time point; and storing the data in a database at the server;

Scheme 3A: forming natural major scale sequences in each of the twelve keys as the comparison standard, comparing the pitch sequence obtained in Scheme 2A with the natural major scale sequence in each of the twelve keys by using a cosine similarity algorithm to obtain the natural major scale with the maximum similarity, and determining and/or recording that scale as the tonal scale of the track;

Scheme 4A: if the tonal scale of the track is determined to be C major, selecting all natural major scale notes in each octave range from C2 to C7 to form a natural major scale sequence: "C1", "D1", "E1", ..., "A7", "B7", "C7"; and converting the natural major scale sequence into a frequency sequence with reference to an international standard pitch-to-frequency table;

Scheme 5A: calculating the minimum pitch difference between the fundamental frequency sung by the user and the fundamental frequencies in the reference sequence by using a quicksort method to form a pitch difference sequence to be corrected, wherein a frequency that needs to be raised is recorded as a positive number and a frequency that needs to be lowered is recorded as a negative number;

Scheme 6A: using the pitch difference sequence to be corrected as an input parameter, and completing the pitch correction calculation by resampling and the PSOLA algorithm in sequence to obtain a corrected pitch sequence and a corresponding corrected human voice audio file;

Scheme 7A: returning the corrected human voice audio file to the client.
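By way of a hedged illustration only, the scale-matching step of this example (comparison against the natural major scale in each of the twelve keys using cosine similarity) might be sketched as follows. The sketch assumes that the pitch sequence is first folded into a twelve-bin pitch-class histogram and that each candidate scale is represented as a binary twelve-element template; both representations, the A4 = 440 Hz reference, and all function names are assumptions made for illustration and are not stated in the description.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets of a natural major scale


def hz_to_pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to its nearest equal-tempered pitch class (0 = C)."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_major_key(f0_sequence):
    """Return (tonic name, similarity) of the best-matching natural major scale."""
    histogram = [0] * 12
    for f0 in f0_sequence:
        if f0 and f0 > 0:  # skip unvoiced frames (None or 0)
            histogram[hz_to_pitch_class(f0)] += 1
    best = None
    for tonic in range(12):
        template = [0] * 12
        for step in MAJOR_STEPS:
            template[(tonic + step) % 12] = 1
        score = cosine_similarity(histogram, template)
        if best is None or score > best[1]:
            best = (NOTE_NAMES[tonic], score)
    return best
```

Other reference scales could be matched in the same way by replacing `MAJOR_STEPS` with the semitone offsets of the scale in question.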

In a second example of the method and system for correcting the pitch of audio, a hummed melody can be converted into an ancient Indian singing style. After the user triggers the "ancient Indian style" function, the corresponding processing flow is as follows:

Scheme 1B: the server obtains the storage address of the humming audio file recorded by the user at the client, and reads the user's WAV-format audio file at a sampling rate of 44100 Hz;

Scheme 2B: obtaining the fundamental frequency sung by the user and the standard pitch corresponding to that fundamental frequency by using the pYIN algorithm; forming a fundamental frequency and pitch sequence corresponding to each time point; and storing the data in a database at the server;

Scheme 3B: using Indian scale sequences in each of the twelve keys as the comparison standard, where, for example, the Indian scale in the key of C consists of the notes C, D, E, F#, G, A, B; comparing the pitch sequence obtained in Scheme 2B with the Indian scale sequence in each key by using a cosine similarity algorithm to obtain the Indian scale with the maximum similarity, and determining and/or recording that scale as the tonal scale of the audio;

Scheme 4B: according to the determined tonal scale of the audio, selecting the scale sequences of all pitches of that tonal scale across the octave ranges, and converting the Indian scale sequences into frequency sequences with reference to an international standard pitch-to-frequency table. For example, if the tonal scale of the audio is determined to be the Indian scale in the key of C, all Indian scale sequences in each octave range from "C2" to "C7" are selected and converted into frequency sequences with reference to the international standard pitch-to-frequency table;

Scheme 5B: calculating the minimum pitch difference between the fundamental frequency of the audio and the fundamental frequencies in the reference sequence by using a quicksort method to form a pitch difference sequence to be corrected, wherein a frequency that needs to be raised is recorded as a positive number and a frequency that needs to be lowered is recorded as a negative number;

Scheme 6B: using the pitch difference sequence to be corrected as an input parameter, and performing resampling and the PSOLA algorithm in sequence to complete the correction calculation on the pitch sequence, thereby obtaining a corrected pitch sequence and then a corrected human voice audio file.
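For illustration only, building the reference frequency table for the C-key Indian scale and forming the signed pitch difference sequence could be sketched as below. Several points are assumptions of this sketch rather than statements of the description: the pitch-to-frequency table is taken to be twelve-tone equal temperament with A4 = 440 Hz, differences are measured in Hz, and the nearest table entry is located with Python's built-in `sorted` (Timsort) and a binary search from `bisect`, which stands in for the quicksort method named above. All function names are hypothetical.

```python
import bisect

INDIAN_STEPS = [0, 2, 4, 6, 7, 9, 11]  # C, D, E, F#, G, A, B when the tonic is C


def midi_to_hz(midi, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return a4_hz * 2 ** ((midi - 69) / 12)


def scale_frequency_table(tonic_pc, steps, low_octave=2, high_octave=7):
    """Sorted frequencies of the scale notes from <tonic><low> to <tonic><high>."""
    low = 12 * (low_octave + 1) + tonic_pc    # MIDI number, e.g. C2 = 36
    high = 12 * (high_octave + 1) + tonic_pc  # e.g. C7 = 96
    notes = [m for m in range(low, high + 1) if (m - tonic_pc) % 12 in steps]
    return sorted(midi_to_hz(m) for m in notes)


def min_pitch_difference(f0, table):
    """Signed Hz difference to the nearest table entry (positive = raise)."""
    i = bisect.bisect_left(table, f0)
    candidates = table[max(i - 1, 0):i + 1]  # neighbours below and above f0
    nearest = min(candidates, key=lambda f: abs(f - f0))
    return nearest - f0


def correction_sequence(f0_sequence, table):
    """One signed difference per frame; unvoiced frames (None or 0) get 0.0."""
    return [min_pitch_difference(f, table) if f else 0.0 for f in f0_sequence]
```

With the C-key table, a frame at 435 Hz yields a difference of about +5 Hz (raise to A4) and a frame at 445 Hz yields about -5 Hz (lower to A4), which matches the sign convention of Scheme 5B.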

In some example embodiments, the functions of any of the methods, processes, signaling diagrams, algorithms, or flow diagrams described herein may be implemented by software and/or computer program code or portions of code stored in memory or other computer-readable or tangible media, and executed by a processor.

In some example embodiments, an apparatus may be included or associated with at least one software application, module, unit or entity configured as arithmetic operations, or as programs or portions thereof (including added or updated software routines), executed by at least one operating processor. Programs, also referred to as program products or computer programs, including software routines, applets and macros, may be stored in any device-readable data storage medium and may include program instructions for performing particular tasks.

A sequence is a unit of a data structure that may include strings, lists, tuples, and the like.

A computer program product may include one or more computer-executable components configured to perform some example embodiments when the program is run. The one or more computer-executable components may be at least one software code or code portion. Changes and configurations to implement the functions of the example embodiments may be performed as routines, which may be implemented as added or updated software routines. In an example, a software routine may be downloaded into the device.

By way of example, the software or computer program code or portions of code may be in source code form, object code form, or in some intermediate form, and may be stored on some type of carrier, distribution medium, or computer-readable medium, which may be any entity or device capable of carrying the program. Such a carrier may comprise, for example, a record medium, computer memory, read-only memory, an optical and/or electrical carrier signal, a telecommunication signal and/or a software distribution package. Depending on the required processing power, the computer program may be executed in a single electronic digital computer or may be distributed over a plurality of computers. The computer-readable medium or computer-readable storage medium may be a non-transitory medium.

In other example embodiments, the functions may be performed by a router, for example, using an Application Specific Integrated Circuit (ASIC), a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or any other hardware and software combination. In yet another example embodiment, the functionality may be implemented as a signal, such as an intangible means that may be carried by an electromagnetic signal downloaded from the Internet or other network.

According to example embodiments, an apparatus such as a node, device or response means may be configured as a circuit, computer or microprocessor (such as a single-chip computer element) or chipset that may include at least a memory for providing storage capacity for arithmetic operations and/or an arithmetic processor for performing arithmetic operations.

The example embodiments described herein are equally applicable to both singular and plural implementations, regardless of whether the language used to describe certain embodiments is in the singular or plural. For example, embodiments describing the operation of a single computing device are equally applicable to embodiments that include multiple instances of the computing device, and vice versa.

One of ordinary skill in the art will readily appreciate that the example embodiments described above may be implemented with operations in a different order and/or with hardware elements in configurations different from those disclosed. Thus, while some embodiments have been described based upon these example embodiments, certain modifications, variations, and alternative constructions would be apparent to those of ordinary skill in the art while remaining within the spirit and scope of the example embodiments.

Claims

1. A method of modifying the pitch of audio, characterized by comprising the steps of:

obtaining a fundamental frequency sequence of the audio, wherein the fundamental frequency sequence comprises a plurality of time points and a fundamental frequency value of each time point;

obtaining an original pitch sequence of the audio based on the sequence of fundamental frequencies;

determining, by using a similarity algorithm, which of a plurality of different reference tonal scales is closest to the original pitch sequence, and using that tonal scale as a standard tonal scale;

generating a standard tonal scale fundamental frequency reference table based on the standard tonal scale;

calculating the minimum pitch difference between the fundamental frequency of each time point of the fundamental frequency sequence and the fundamental frequency in the standard tonal scale fundamental frequency reference table by using a sorting method;

forming a pitch difference sequence to be corrected based on the minimum pitch difference of the fundamental frequency of each time point and the offset direction thereof;

and taking the audio and the pitch difference sequence to be corrected as input, and correcting the audio sequentially through resampling and a PSOLA algorithm to obtain a corrected audio.

2. A method of modifying the pitch of audio according to claim 1, wherein: the calculating, as a standard tonal scale, a tonal scale closest to the standard pitch sequence among the different reference tonal scales using a similarity algorithm includes: using scales of different tones under a preset tone combination as the reference tonal scale; or using a user-determined scale of a different key as the reference tonal scale.

3. A method of modifying the pitch of audio according to claim 1, wherein: the calculating, as a standard tonal scale, a tonal scale closest to the standard pitch sequence among the different reference tonal scales using a similarity algorithm includes: screening the reference tonal scale by taking the twelve pitches in each octave as references.

4. A method of modifying the pitch of audio according to claim 3, wherein: increasing or decreasing the number of pitches screened out by the scale arrangement within one octave according to the constitution of the reference tonal scale.

5. A method of modifying the pitch of audio according to claim 1, wherein: before the reference tonal scale with the maximum similarity is selected as the standard tonal scale by using a similarity algorithm, the standard pitch sequence is converted, according to the octave relation, into the same octave range as the reference tonal scale.

6. A method of modifying the pitch of audio according to claim 1, wherein: the different reference tonal scales comprise one or more of: natural major scales in the twelve different keys, Chinese-style scales, modern music scales based on the pentatonic scale and the blues scale, ethnic scales, and the like.

7. A method of modifying the pitch of audio according to claim 1, wherein: generating the standard tonal scale fundamental frequency reference table comprises generating a standard pitch sequence according to the number of octaves of the standard tonal scale and the number of pitches of the tonal scale in each octave, and converting the standard pitch sequence into a frequency sequence of standard pitches according to an international standard pitch-to-frequency table, thereby obtaining the standard tonal scale fundamental frequency reference table.

8. A method of modifying the pitch of audio according to claim 7, wherein: the number of octaves is preset or set by a user; and/or the number of pitches of the tonal scale in each octave is preset or set by the user.

9. A method of modifying the pitch of audio according to claim 1, wherein: the step of calculating, by using a sorting method or a quicksort method, the pitch difference between the fundamental frequency of the audio and the fundamental frequency in the standard tonal scale fundamental frequency reference table comprises: shifting the fundamental frequency of each sampling point in a first direction step by step; stopping the shifting and recording a first accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when the difference between the shifted frequency and that frequency is less than a predefined minimum tolerance; and determining the first accumulated offset as the minimum pitch difference for that sampling point.

10. A method of modifying the pitch of audio according to claim 1, wherein: the step of calculating, by using a sorting method or a quicksort method, the minimum pitch difference between the fundamental frequency of the audio and the fundamental frequency in the standard tonal scale fundamental frequency reference table comprises: shifting the fundamental frequency of each sampling point in a first direction step by step; stopping the shifting and recording a first accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when the difference between the shifted frequency and that frequency is less than a predefined minimum tolerance; shifting the fundamental frequency of each sampling point step by step in a second direction opposite to the first direction; stopping the shifting and recording a second accumulated offset when the shifted frequency reaches the frequency corresponding to the nearest standard pitch in the standard tonal scale fundamental frequency reference table, or when the difference between the shifted frequency and that frequency is less than the predefined minimum tolerance; and comparing the first and second accumulated offsets and determining the smaller accumulated offset as the minimum pitch difference for that sampling point.

11. A method of modifying the pitch of audio according to claim 9 or 10, wherein: the step size of the shifting is preset or set by a user.

12. A method of modifying the pitch of audio according to claim 1, wherein: a multiple sequence by which the pitch of the audio needs to be raised is determined according to the obtained pitch difference sequence to be corrected; the singing voice audio is resampled at 1/S times the sampling rate used when the user's singing voice was collected, to obtain resampled audio; and the resampled audio is stretched to S times its length through the PSOLA algorithm and the multiple sequence by which the pitch of the audio to be corrected needs to be raised.

13. A method of modifying the pitch of audio according to claim 12, wherein: the multiple sequence by which the pitch to be corrected needs to be raised is an array of decimal values obtained by calculation or a single fixed decimal value.

14. A method of modifying the pitch of audio according to claim 12, wherein: each pitch difference is used as a pitch variation coefficient of a formant at the corresponding moment in the standard pitch human voice fundamental frequency sequence, and the human voice audio is re-synthesized based on the PSOLA algorithm.

15. A method of modifying the pitch of audio according to claim 1, wherein: a range of the reference tonal scale is determined before the calculation using the similarity algorithm; and/or a calculation method for the minimum pitch difference is determined before the calculation using the sorting method.

16. A method of modifying the pitch of audio according to claim 1, wherein: the fundamental frequency sequence is determined by using the pYIN algorithm; and/or the similarity algorithm is a cosine similarity algorithm; and/or the sorting method is a quicksort method.

17. Apparatus for modifying a pitch of audio, comprising at least one processor and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform the method of modifying a pitch of audio of any of claims 1 to 16.