CN109192218B - Method and apparatus for audio processing

Info

Publication number: CN109192218B
Application number: CN201811066716.6A
Authority: CN
Inventors: 劳振锋
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2018-09-13
Filing date: 2018-09-13
Publication date: 2021-05-07
Anticipated expiration: 2038-09-13
Also published as: CN109192218A

Abstract

The invention discloses an audio processing method and apparatus, belonging to the technical field of audio editing. The method comprises the following steps: acquiring a timbre reference audio frame from a target audio, and extracting spectral envelope characteristic information of the timbre reference audio frame; extracting fundamental frequency information of the source audio frame in a source audio that has the same playing time point as the timbre reference audio frame; and generating a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope characteristic information. The invention can effectively solve the technical problem that the pitch changes during voice changing.

Description

Method and apparatus for audio processing

Technical Field

The present invention relates to the field of audio editing technologies, and in particular, to a method and an apparatus for audio processing.

Background

At present, voice-changing software is installed on many mobile phones, and users find it entertaining to convert between male and female voices or to change a voice into a child's voice.

The principle of voice changing in the related art is as follows: every several audio frames, an audio frame is copied from the source audio and the copy is inserted immediately after the original frame, yielding a slowed-down audio with a longer duration. The slowed-down audio is then resampled to obtain a new audio with the same duration as the source audio. Both the pitch and the timbre of the new audio are changed, which achieves the purpose of voice changing.

In the process of implementing the invention, the inventor finds that the related art has at least the following problems:

when the voice-changed human voice audio is to be combined with accompaniment audio into a song audio, there are two cases: if the accompaniment audio is pitch-shifted correspondingly, its timbre also changes, which damages the sound quality of the accompaniment audio and lowers the quality of the finally synthesized song audio; if the accompaniment audio is not pitch-shifted, the human voice audio and the accompaniment audio are no longer at the same pitch, and the synthesized song audio sounds poor.

Disclosure of Invention

In order to solve the problems in the related art, embodiments of the present invention provide a method and an apparatus for audio processing. The technical scheme is as follows:

in a first aspect, a method of audio processing is provided, the method comprising:

acquiring a timbre reference audio frame from a target audio, and extracting spectral envelope characteristic information of the timbre reference audio frame;

extracting fundamental frequency information of the source audio frame in a source audio that has the same playing time point as the timbre reference audio frame;

and generating a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope characteristic information.

Optionally, the method further includes:

extracting consonant information of the source audio frame;

the generating a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope characteristic information includes:

generating the timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope characteristic information and the consonant information.

Optionally, before obtaining the timbre reference audio frame in the target audio, the method further includes:

performing pitch-shifting processing on the source audio to obtain the target audio.

Optionally, the performing pitch-shifting processing on the source audio to obtain the target audio includes:

in the source audio, selecting a second preset number of source audio frames at intervals of a first preset number of source audio frames, copying the selected source audio frames, and inserting the copied source audio frames after the selected source audio frames to obtain a slowed-down audio corresponding to the source audio;

and resampling the slowed-down audio to obtain the target audio with the same number of frames and the same duration as the source audio.

Optionally, the method further includes:

displaying a local audio list;

and acquiring the target audio when a selection instruction for the option of the target audio in the local audio list is received.

In a second aspect, there is provided an apparatus for audio processing, the apparatus comprising:

an obtaining module, configured to obtain a timbre reference audio frame from the target audio;

an extracting module, configured to extract spectral envelope characteristic information of the timbre reference audio frame, and to extract fundamental frequency information of the source audio frame in the source audio that has the same playing time point as the timbre reference audio frame;

and a generating module, configured to generate a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope characteristic information.

Optionally, the extracting module is further configured to extract consonant information of the source audio frame;

the generating module is further configured to generate a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope characteristic information, and the consonant information.

Optionally, the apparatus further comprises:

a pitch-shifting module, configured to perform pitch-shifting processing on the source audio to obtain the target audio.

Optionally, the pitch-shifting module is configured to select a second preset number of source audio frames at intervals of a first preset number of source audio frames in the source audio, copy the selected source audio frames, and insert the copied source audio frames after the selected source audio frames to obtain a slowed-down audio corresponding to the source audio; and to resample the slowed-down audio to obtain the target audio with the same number of frames and the same duration as the source audio.

Optionally, the apparatus further comprises:

the display module is used for displaying the local audio list;

the obtaining module is further configured to obtain the target audio when a selection instruction of an option of the target audio in the local audio list is received.

In a third aspect, a terminal is provided, the terminal comprising a processor and a memory, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the method of audio processing according to the first aspect.

In a fourth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, the at least one instruction being loaded and executed by a processor to implement the method of audio processing according to the first aspect.

The technical solutions provided by the embodiments of the invention have at least the following beneficial effects:

in the embodiments of the invention, the finally obtained timbre-changed audio frame contains the fundamental frequency information of the source audio frame and the spectral envelope characteristic information of the timbre reference audio frame. The timbre of the timbre-changed audio is therefore changed, which achieves the purpose of voice changing, while its pitch is the same as that of the source audio and is not changed. Since the pitch of the timbre-changed audio is not changed, the pitch of the accompaniment audio does not have to be changed either. Because neither pitch is changed, the timbre-changed audio can be synthesized directly with the accompaniment audio into a song audio. The sound quality of the final song audio is not damaged, and the problem of a poor listening effect does not arise.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for audio processing according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for audio processing according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an audio processing terminal according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a computer device for audio processing according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a song selection interface according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of a karaoke interface according to an embodiment of the present invention;

Fig. 8 is a schematic diagram of a voice-change category interface according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The embodiment of the invention provides an audio processing method, which can be realized by a terminal. The terminal can be a mobile terminal such as a mobile phone, a tablet computer and a notebook computer, and can also be a fixed terminal such as a desktop computer.

The terminal may include a processor, a memory, an audio output component, an audio input component, and the like. The processor may be a CPU (Central Processing Unit) and may be used to edit an audio file, control a display to show content, and perform similar processing. The memory may be a RAM (Random Access Memory), a flash memory, or the like, and may be used to store received data, data required by the processing procedure, and data generated in the processing procedure, such as a timbre reference audio frame, a source audio frame, and a timbre-changed audio frame. The audio output component may be a speaker, headphones, or the like. The audio input component may be a microphone or the like. The terminal may also include an input component, a screen, and so on. The input component may be a mouse, a touch screen, a touch pad, a keyboard, or the like, and can generate corresponding instructions based on the user's operations. The screen may be a touch screen or a non-touch screen, and may be used to display the operation interface of an application program and the like.

As shown in fig. 1, the processing flow of the method may include the following steps:

in step 101, pitch-shifting processing is performed on a source audio to obtain a target audio.

The source audio may be the user's human voice audio.

In implementation, a user can install a karaoke and audio processing application on the terminal. When the user wants to sing karaoke and apply voice changing to his or her own singing, the user can click the shortcut icon to run the application and select the karaoke function option in its main interface. A song selection interface (as shown in fig. 6) may then be displayed in the application. The song selection interface may display an audio list containing a plurality of song audio options, and the user can browse the audio list and click the option of the song audio that he or she wants to sing. After the selection is completed, the application enters the karaoke interface (as shown in fig. 7). The terminal plays the accompaniment in the song audio and may display the lyrics on the screen. Meanwhile, the terminal can start an audio input component (such as a microphone) to record audio while the user sings along with the accompaniment, and the terminal takes the recorded human voice audio as the source audio.

After the recording of the human voice audio is completed, a voice-change category interface (as shown in fig. 8) is displayed in the application. The interface may display a sound type list containing a plurality of sound type options, and the user can browse the list and click the sound type that he or she wants to change to. For example, a user who wants to change the voice into a child's voice can click the child voice option. The terminal then performs pitch-shifting processing that changes the user's source audio toward a child's voice. The pitch-shifted source audio is the target audio.

Optionally, the pitch-shifting processing may use the SoundTouch algorithm (an audio pitch-shifting algorithm), a frequency-domain pitch-shifting method, or a parametric pitch-shifting method. Taking the SoundTouch algorithm as an example, its pitch-shifting principle is as follows: in the source audio, a second preset number of source audio frames are selected at intervals of a first preset number of source audio frames, the selected source audio frames are copied, and the copies are inserted after the selected source audio frames to obtain a slowed-down audio corresponding to the source audio; the slowed-down audio is then resampled to obtain a target audio with the same number of frames and the same duration as the source audio.

The specific numerical values of the first preset number and the second preset number may be fixed numerical values, or may be determined according to a fundamental frequency of the source audio and a fundamental frequency corresponding to a sound type selected by the user.
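The frame-duplication and resampling principle described above can be sketched roughly as follows. This is a minimal illustration and not the SoundTouch implementation: the function names `slow_down`, `resample_linear` and `pitch_shift` are hypothetical, frames are plain lists of samples, `interval` and `copies` stand in for the first and second preset numbers, and linear interpolation is assumed as the resampling method.

```python
def slow_down(frames, interval, copies):
    """After every `interval` frames, select the next `copies` frames and
    insert a duplicate of each one right after it (lengthens the audio)."""
    if interval < 0 or copies < 0 or interval + copies == 0:
        raise ValueError("interval and copies must be non-negative and not both zero")
    out = []
    i = 0
    while i < len(frames):
        # Pass `interval` frames through unchanged.
        out.extend(frames[i:i + interval])
        i += interval
        # Select the next `copies` frames and insert their copies.
        for frame in frames[i:i + copies]:
            out.append(frame)
            out.append(list(frame))
        i += copies
    return out


def resample_linear(samples, target_len):
    """Resample `samples` to `target_len` points by linear interpolation."""
    if target_len <= 0 or not samples:
        return []
    if target_len == 1 or len(samples) == 1:
        return [float(samples[0])] * target_len
    out = []
    last = len(samples) - 1
    for k in range(target_len):
        pos = k * last / (target_len - 1)  # fractional index into `samples`
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def pitch_shift(samples, frame_len, interval, copies):
    """Duplicate frames, then squeeze the result back to the original length."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    slowed = [s for frame in slow_down(frames, interval, copies) for s in frame]
    return resample_linear(slowed, len(samples))
```

Because the lengthened audio is compressed back to the original duration, the output has the same number of samples as the input while its pitch is raised; as noted in the Background, this simple approach changes the timbre as well.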

In step 102, a timbre reference audio frame is obtained in the target audio, and spectral envelope characteristic information of the timbre reference audio frame is extracted.

The spectral envelope characteristic information of the timbre reference audio frame describes the shape of the spectral curve and can represent the timbre.

In implementation, after the target audio is generated, its audio frames may be acquired one by one in playing order, starting from the first audio frame; each acquired frame serves as a timbre reference audio frame. For each acquired timbre reference audio frame, its fundamental frequency information is extracted first, and its spectral envelope characteristic information is then extracted in combination with that fundamental frequency information.
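As a rough illustration of extracting a spectral envelope in combination with the fundamental frequency, the sketch below samples the magnitude spectrum at the harmonics of the fundamental frequency and interpolates linearly between them. The source does not specify the extraction algorithm, so this harmonic-sampling approach, the naive DFT, and the names `magnitude_spectrum` and `spectral_envelope` are all assumptions made for illustration.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 (naive O(n^2) DFT, fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectral_envelope(frame, sample_rate, f0):
    """Estimate the envelope by reading the spectrum at multiples of f0
    and interpolating linearly between those harmonic bins."""
    if f0 <= 0:
        raise ValueError("f0 must be positive for a voiced frame")
    mag = magnitude_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    # Collect (bin, magnitude) anchors at each harmonic below the Nyquist bin.
    anchors = []
    h = 1
    while True:
        b = round(h * f0 / bin_hz)
        if b >= len(mag):
            break
        if not anchors or b != anchors[-1][0]:
            anchors.append((b, mag[b]))
        h += 1
    if not anchors:
        return mag[:]
    env = []
    j = 0
    for i in range(len(mag)):
        # Advance to the last anchor at or before bin i.
        while j + 1 < len(anchors) and anchors[j + 1][0] <= i:
            j += 1
        b0, v0 = anchors[j]
        if i <= b0 or j + 1 == len(anchors):
            env.append(v0)  # hold the value outside the anchor range
        else:
            b1, v1 = anchors[j + 1]
            env.append(v0 + (v1 - v0) * (i - b0) / (b1 - b0))
    return env
```

The result is a smooth curve through the harmonic peaks, which is largely independent of where the harmonics themselves fall and can therefore serve as a description of timbre.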

In step 103, fundamental frequency information is extracted from the source audio frame in the source audio that has the same playing time point as the timbre reference audio frame.

The fundamental frequency information may be a peak frequency of the audio frame spectrum.

In implementation, while extracting the spectral envelope characteristic information from the timbre reference audio frame, the terminal may obtain from the source audio the source audio frame with the same playing time point as the timbre reference audio frame. As can be seen from the above process of generating the target audio from the source audio, the source audio and the target audio have the same number of frames and the same duration. Therefore, when the first audio frame of the target audio is obtained as the timbre reference audio frame, the first source audio frame may be obtained from the source audio; when the second audio frame of the target audio is obtained as the timbre reference audio frame, the second source audio frame may be obtained from the source audio; and so on, so that each obtained source audio frame has the same playing time point as the corresponding timbre reference audio frame. After the source audio frame is acquired, its fundamental frequency information may be extracted.
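The pairing of frames by playing time point and the peak-frequency reading of the fundamental frequency can be sketched as follows. The names `split_frames`, `peak_frequency` and `paired_f0` are hypothetical, fixed-length non-overlapping frames are assumed, and a naive DFT is used for brevity; a practical system would more likely use a dedicated fundamental frequency estimator, since the strongest spectral peak of a voice is not always the fundamental.

```python
import cmath
import math


def split_frames(samples, frame_len):
    """Cut audio into consecutive, non-overlapping frames of `frame_len` samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def peak_frequency(frame, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin of the frame's spectrum."""
    n = len(frame)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        mag = abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n)))
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n


def paired_f0(source, target, frame_len, sample_rate):
    """For each frame index, pair the source frame's fundamental frequency
    with the timbre reference frame at the same playing time point."""
    src_frames = split_frames(source, frame_len)
    ref_frames = split_frames(target, frame_len)
    return [(i, peak_frequency(s, sample_rate), r)
            for i, (s, r) in enumerate(zip(src_frames, ref_frames))]
```

Because the source audio and the target audio have the same number of frames and the same duration, matching frames by index is equivalent to matching them by playing time point.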

Optionally, in order to make the voice-changing effect more natural and realistic, the consonant information in the source audio can also be extracted.

In implementation, the process of generating the target audio based on the source audio is described above. After the source audio frame is acquired, the fundamental frequency information and the consonant information of the source audio frame can be extracted.
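The source does not specify how consonant information is represented or extracted. Purely as a hypothetical illustration, unvoiced consonants tend to be noise-like, so a frame could be flagged as consonant-like when its zero-crossing rate is high; the function names and the threshold value below are assumptions, not part of the described method.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_consonant_like(frame, zcr_threshold=0.3):
    """Heuristic flag: noise-like frames cross zero far more often than
    voiced (vowel-like) frames. The threshold is an illustrative guess."""
    return zero_crossing_rate(frame) > zcr_threshold
```

A per-frame flag of this kind is only one possible stand-in; a vocoder-based system might instead carry an aperiodicity measure for each frame.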

In step 104, a timbre-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information and the spectral envelope characteristic information.

In implementation, the terminal may invoke the WORLD tool (a tool that can generate audio) to generate new human voice audio. The fundamental frequency information of the source audio and the spectral envelope characteristic information of the target audio are input into the WORLD tool, which generates the new human voice audio. Because this audio has the fundamental frequency information of the source audio and the spectral envelope characteristic information of the target audio, its pitch is consistent with the source audio and its timbre is consistent with the target audio. The overall effect is that the timbre is changed while the pitch remains the same as in the originally recorded human voice audio.
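To illustrate why combining one frame's fundamental frequency with another frame's spectral envelope keeps the pitch and changes the timbre, the sketch below synthesizes a frame as a sum of harmonics of the fundamental frequency whose amplitudes are read from the envelope. This is a simplified additive-synthesis stand-in and not the WORLD synthesis procedure; the function name `synthesize_frame` and the assumption that the envelope is sampled uniformly from 0 Hz to the Nyquist frequency are illustrative.

```python
import math


def synthesize_frame(f0, envelope, sample_rate, frame_len):
    """Build one frame from harmonics of `f0` (taken from the source frame),
    weighted by `envelope` (taken from the timbre reference frame).
    `envelope[i]` is assumed to be the magnitude at i * nyquist / (len - 1) Hz."""
    if len(envelope) < 2:
        raise ValueError("envelope needs at least two points")
    out = [0.0] * frame_len
    if f0 <= 0:
        return out  # treat non-positive f0 as an unvoiced, silent frame
    nyquist = sample_rate / 2
    bin_hz = nyquist / (len(envelope) - 1)
    h = 1
    while h * f0 < nyquist:
        # Read the harmonic's amplitude from the envelope (linear interpolation).
        pos = h * f0 / bin_hz
        lo = int(pos)
        hi = min(lo + 1, len(envelope) - 1)
        amp = envelope[lo] + (envelope[hi] - envelope[lo]) * (pos - lo)
        for t in range(frame_len):
            out[t] += amp * math.sin(2 * math.pi * h * f0 * t / sample_rate)
        h += 1
    return out
```

The period of the output is determined only by `f0`, so the pitch follows the source frame, while the relative strength of the harmonics is determined only by `envelope`, so the timbre follows the timbre reference frame.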

Optionally, when the consonant information of the source audio frame has been extracted, the corresponding processing in step 104 is as follows: a timbre-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information, the spectral envelope characteristic information and the consonant information.

In implementation, the terminal may invoke the WORLD tool (a tool that can generate audio) to generate new human voice audio. The fundamental frequency information of the source audio, the consonant information of the source audio and the spectral envelope characteristic information of the target audio are input into the WORLD tool, which generates the new human voice audio. Because this audio has the fundamental frequency information of the source audio and the spectral envelope characteristic information of the target audio, its pitch is consistent with the source audio and its timbre is consistent with the target audio. Because it also has the consonant information of the source audio, it sounds more natural than human voice audio generated from the fundamental frequency information and the spectral envelope characteristic information alone.

Finally, after the voice changing is completed, the user can click the audition button, and the terminal plays the timbre-changed audio together with the accompaniment audio. If the user is satisfied with the voice-changing effect, the user can click the save button to synthesize the timbre-changed audio and the accompaniment audio into a new song audio and store it in a local file; if the user is not satisfied, the user can click the re-record button to record again and repeat the voice-changing operation. The user may also click the publish button to upload the new song audio to the network.

As shown in fig. 2, the processing flow of the method may include the following steps:

in step 201, a local audio list is displayed, and the target audio is acquired when a selection instruction for the option of the target audio in the local audio list is received.

The target audio may be a human voice audio pre-stored on the terminal.

In implementation, a user can install a karaoke and audio processing application on the terminal. When the user wants to sing karaoke and apply voice changing to his or her own singing, the user can click the shortcut icon to run the application and select the karaoke function option in its main interface. A song selection interface (as shown in fig. 6) may then be displayed in the application. The song selection interface may display an audio list containing a plurality of song audio options, and the user can browse the audio list and click the option of the song audio that he or she wants to sing. After the selection is completed, the application enters the karaoke interface (as shown in fig. 7). The terminal plays the accompaniment in the song audio and may display the lyrics on the screen. The terminal takes the human voice audio in the song audio as the target audio. Meanwhile, the terminal can start an audio input component (such as a microphone) to record audio while the user sings along with the accompaniment, and the terminal takes the recorded human voice audio as the source audio.

In step 202, a timbre reference audio frame is obtained from the target audio, and spectral envelope characteristic information of the timbre reference audio frame is extracted.

For the detailed implementation process, refer to step 102.

In step 203, fundamental frequency information is extracted from the source audio frame in the source audio that has the same playing time point as the timbre reference audio frame.

For the detailed implementation process, refer to step 103.

In step 204, a timbre-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information and the spectral envelope characteristic information.

For the detailed implementation process, refer to step 104.

Based on the same technical concept, an embodiment of the present invention further provides an apparatus for audio processing, which may be the terminal in the foregoing embodiment, as shown in fig. 3, and the apparatus includes:

an obtaining module 301, configured to obtain a timbre reference audio frame in a target audio;

an extracting module 302, configured to extract spectral envelope characteristic information of the timbre reference audio frame, and to extract fundamental frequency information of the source audio frame in the source audio that has the same playing time point as the timbre reference audio frame;

and a generating module 303, configured to generate a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope characteristic information.

Optionally, the extracting module 302 is further configured to extract consonant information of the source audio frame;

the generating module 303 is further configured to generate a timbre-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope characteristic information, and the consonant information.

Optionally, the apparatus further comprises:

a pitch-shifting module 304, configured to perform pitch-shifting processing on the source audio to obtain the target audio.

Optionally, the pitch-shifting module 304 is configured to select a second preset number of source audio frames at intervals of a first preset number of source audio frames in the source audio, copy the selected source audio frames, and insert the copied source audio frames after the selected source audio frames to obtain a slowed-down audio corresponding to the source audio; and to resample the slowed-down audio to obtain the target audio with the same number of frames and the same duration as the source audio.

Optionally, the apparatus further comprises:

a display module 305 for displaying a local audio list;

the obtaining module 306 is further configured to obtain the target audio when a selection instruction for an option of the target audio in the local audio list is received.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

It should be noted that, when the audio processing apparatus provided in the foregoing embodiment performs audio processing, the division into the functional modules described above is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio processing apparatus and the audio processing method provided by the above embodiments belong to the same concept; the specific implementation process of the apparatus is described in the method embodiments and is not repeated here.

Fig. 4 is a block diagram of a terminal according to an embodiment of the present invention. The terminal 400 may be a portable mobile terminal such as: smart phones, tablet computers. The terminal 400 may also be referred to by other names such as user equipment, portable terminal, etc.

Generally, the terminal 400 includes: a processor 401 and a memory 402.

Processor 401 may include one or more processing cores, for example a 4-core processor. The processor 401 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.

Memory 402 may include one or more computer-readable storage media, which may be tangible and non-transitory. Memory 402 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 402 is used to store at least one instruction for execution by processor 401 to implement the methods of audio processing provided herein.

In some embodiments, the terminal 400 may further optionally include: a peripheral interface 403 and at least one peripheral. Specifically, the peripheral device includes: at least one of radio frequency circuitry 404, touch screen display 405, camera 406, audio circuitry 407, positioning components 408, and power supply 409.

The peripheral interface 403 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 401 and the memory 402. In some embodiments, processor 401, memory 402, and peripheral interface 403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 401, the memory 402 and the peripheral interface 403 may be implemented on a separate chip or circuit board, which is not limited by this embodiment.

The Radio Frequency circuit 404 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuitry 404 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 404 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 404 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 404 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the rf circuit 404 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.

The touch display screen 405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display screen 405 also has the ability to capture touch signals on or over the surface of the touch display screen 405. The touch signal may be input to the processor 401 as a control signal for processing. The touch screen display 405 is used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the touch display screen 405 may be one, providing the front panel of the terminal 400; in other embodiments, the touch screen display 405 may be at least two, respectively disposed on different surfaces of the terminal 400 or in a folded design; in still other embodiments, the touch display 405 may be a flexible display disposed on a curved surface or on a folded surface of the terminal 400. Even more, the touch screen display 405 can be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The touch screen 405 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), and other materials.

The camera assembly 406 is used to capture images or video. Optionally, the camera assembly 406 includes a front camera and a rear camera. Generally, the front camera is used for video calls or selfies, and the rear camera is used for taking pictures or videos. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera and a wide-angle camera, so that the main camera and the depth-of-field camera can be combined to realize a background blurring function, and the main camera and the wide-angle camera can be combined to realize panoramic shooting and VR (Virtual Reality) shooting functions. In some embodiments, the camera assembly 406 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.

The audio circuit 407 is used to provide an audio interface between the user and the terminal 400. The audio circuit 407 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 401 for processing, or inputting the electric signals to the radio frequency circuit 404 for realizing voice communication. For the purpose of stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 400. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 401 or the radio frequency circuit 404 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 407 may also include a headphone jack.

The positioning component 408 is used to locate the current geographic position of the terminal 400 for navigation or LBS (Location Based Service). The positioning component 408 may be based on the Global Positioning System (GPS) of the United States, the BeiDou system of China, or the Galileo system of the European Union.

The power supply 409 is used to supply power to the various components in the terminal 400. The power supply 409 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 409 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.

In some embodiments, the terminal 400 also includes one or more sensors 410. The one or more sensors 410 include, but are not limited to: acceleration sensor 411, gyro sensor 412, pressure sensor 413, fingerprint sensor 414, optical sensor 415, and proximity sensor 416.

The acceleration sensor 411 may detect the magnitude of acceleration along the three coordinate axes of a coordinate system established with respect to the terminal 400. For example, the acceleration sensor 411 may be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 401 may control the touch display screen 405 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 411. The acceleration sensor 411 may also be used to collect motion data of a game or of the user.
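The landscape/portrait decision described above can be sketched as follows. This is only a minimal illustration: the function name `choose_orientation`, the axis convention (x along the short edge of the screen, y along the long edge) and the simple magnitude comparison are assumptions made for the example and are not specified in this disclosure.

```python
def choose_orientation(gx, gy):
    """Pick a view from the gravity components along the screen axes.

    gx: gravity component along the short edge of the screen (assumed x axis)
    gy: gravity component along the long edge of the screen (assumed y axis)
    """
    # When gravity pulls mostly along the long edge, the terminal is upright.
    if abs(gy) >= abs(gx):
        return "portrait"
    return "landscape"


# Hypothetical readings in m/s^2.
upright = choose_orientation(0.1, 9.8)
on_its_side = choose_orientation(-9.8, 0.3)
```

A practical implementation would typically add hysteresis so that the view does not flip back and forth when the two components are nearly equal.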

The gyro sensor 412 may detect the body orientation and rotation angle of the terminal 400, and may cooperate with the acceleration sensor 411 to capture the user's 3D motion of the terminal 400. From the data collected by the gyro sensor 412, the processor 401 may implement the following functions: motion sensing (such as changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.

The pressure sensor 413 may be disposed on a side frame of the terminal 400 and/or on a lower layer of the touch display screen 405. When the pressure sensor 413 is disposed on a side frame of the terminal 400, it can detect the user's grip signal on the terminal 400, and left/right hand recognition or shortcut operations can be performed according to the grip signal. When the pressure sensor 413 is disposed on the lower layer of the touch display screen 405, an operable control on the UI can be controlled according to the user's pressure operation on the touch display screen 405. The operable control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.

The fingerprint sensor 414 is used for collecting a fingerprint of the user to identify the identity of the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, processor 401 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 414 may be disposed on the front, back, or side of the terminal 400. When a physical key or vendor Logo is provided on the terminal 400, the fingerprint sensor 414 may be integrated with the physical key or vendor Logo.

The optical sensor 415 is used to collect the ambient light intensity. In one embodiment, the processor 401 may control the display brightness of the touch display screen 405 based on the ambient light intensity collected by the optical sensor 415. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 405 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 405 is turned down. In another embodiment, the processor 401 may also dynamically adjust the shooting parameters of the camera assembly 406 according to the ambient light intensity collected by the optical sensor 415.
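As a rough illustration of the brightness adjustment described above, the sketch below maps ambient light intensity linearly onto a display brightness level. The function name `display_brightness`, the lux range and the brightness limits are hypothetical values chosen for the example; the disclosure only states that brightness rises with high ambient light and falls with low ambient light.

```python
def display_brightness(ambient_lux, min_level=0.1, max_level=1.0, max_lux=1000.0):
    """Map ambient light (lux) linearly onto a brightness level.

    The result is clamped to [min_level, max_level]; all defaults are
    illustrative assumptions, not values from the disclosure.
    """
    ratio = min(max(ambient_lux / max_lux, 0.0), 1.0)
    return min_level + ratio * (max_level - min_level)


dim_room = display_brightness(50.0)      # low ambient light, low brightness
sunlight = display_brightness(20000.0)   # very bright, clamped to max_level
```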

The proximity sensor 416, also known as a distance sensor, is typically disposed on the front side of the terminal 400. The proximity sensor 416 is used to collect the distance between the user and the front surface of the terminal 400. In one embodiment, when the proximity sensor 416 detects that the distance between the user and the front surface of the terminal 400 gradually decreases, the processor 401 controls the touch display screen 405 to switch from the screen-on state to the screen-off state; when the proximity sensor 416 detects that the distance between the user and the front surface of the terminal 400 gradually increases, the processor 401 controls the touch display screen 405 to switch from the screen-off state to the screen-on state.
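The screen switching driven by the proximity sensor can be sketched as a two-state update on consecutive distance readings. The function name `next_screen_state`, the state labels `"on"` and `"off"`, and the use of only two consecutive readings are assumptions for illustration.

```python
def next_screen_state(state, previous_distance, current_distance):
    """Return the new screen state given two consecutive distance readings."""
    # The user is approaching the front surface: turn the lit screen off.
    if state == "on" and current_distance < previous_distance:
        return "off"
    # The user is moving away from the front surface: turn the screen back on.
    if state == "off" and current_distance > previous_distance:
        return "on"
    return state


# Hypothetical distance readings in centimetres.
state = "on"
for prev, cur in [(10.0, 6.0), (6.0, 2.0), (2.0, 8.0)]:
    state = next_screen_state(state, prev, cur)
```

In practice the comparison would usually be made against a threshold or over several samples so that sensor noise does not toggle the screen.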

Those skilled in the art will appreciate that the configuration shown in fig. 4 does not limit the terminal 400, which may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.

In an exemplary embodiment, a computer-readable storage medium is further provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the method for audio processing in the above embodiments. For example, the computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 5 is a schematic structural diagram of a computer device 500 according to an embodiment of the present invention. The computer device 500 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the method for audio processing.
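As an illustration of one step that such instructions might carry out, the sketch below estimates the fundamental frequency of a single audio frame as the peak frequency of its spectrum, which is how the claims define the fundamental frequency information. It is a minimal sketch under stated assumptions: the function names, the direct DFT, the choice to skip the DC bin, and the synthetic test frame are all hypothetical and are not taken from the disclosure.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return |X[k]| for k = 0..N/2 using a direct DFT (O(N^2), fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def peak_frequency(frame, sample_rate):
    """Frequency in Hz of the largest non-DC bin of the frame's magnitude spectrum."""
    spectrum = magnitude_spectrum(frame)
    # Skip bin 0 so that a constant offset is not mistaken for the fundamental.
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(frame)


# Hypothetical frame: 256 samples of a 250 Hz sine sampled at 8 kHz (exactly 8 cycles).
SAMPLE_RATE = 8000
frame = [math.sin(2 * math.pi * 250 * i / SAMPLE_RATE) for i in range(256)]
f0 = peak_frequency(frame, SAMPLE_RATE)  # 250.0 for this frame
```

The frequency resolution of this estimate is the sample rate divided by the frame length (31.25 Hz here), so a real implementation would likely use longer frames, an FFT, or interpolation around the peak.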

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method of audio processing, the method comprising:

carrying out tone modification processing on source audio to obtain target audio, wherein the source audio is human voice audio of a user;

displaying a local audio list;

when a selection instruction of the target audio option in the local audio list is received, acquiring the target audio;

acquiring a tone reference audio frame from the target audio, and extracting spectral envelope characteristic information of the tone reference audio frame;

extracting fundamental frequency information of a source audio frame in the source audio whose playing time point is the same as that of the tone reference audio frame, and extracting consonant information of the source audio frame, wherein the fundamental frequency information is the peak frequency of the spectrum of the source audio frame;

2. The method of claim 1, wherein the carrying out tone modification processing on the source audio to obtain the target audio comprises:

3. An apparatus for audio processing, the apparatus comprising:

the transposition module is used for carrying out tone modification processing on source audio to obtain target audio, wherein the source audio is human voice audio of a user;

the display module is used for displaying the local audio list;

the acquisition module is used for acquiring the target audio when receiving a selection instruction of the target audio option in the local audio list;

the acquisition module is further used for acquiring a tone reference audio frame from the target audio;

the extraction module is used for extracting the spectral envelope characteristic information of the tone reference audio frame, extracting the fundamental frequency information of a source audio frame in the source audio whose playing time point is the same as that of the tone reference audio frame, and extracting the consonant information of the source audio frame, wherein the fundamental frequency information is the peak frequency of the spectrum of the source audio frame;

and the generating module is used for generating a variable tone color audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope characteristic information and the consonant information.

4. The apparatus of claim 3, wherein the transposition module is configured to:

5. A terminal, characterized in that the terminal comprises a processor and a memory, in which at least one instruction is stored, which is loaded and executed by the processor to implement the method of audio processing according to any of claims 1 or 2.

6. A computer-readable storage medium having stored therein at least one instruction, which is loaded and executed by a processor, to implement the method of audio processing according to any of claims 1 or 2.