CN109192218A - Method and apparatus for audio processing - Google Patents

Method and apparatus for audio processing

Info

Publication number
CN109192218A
CN109192218A
Authority
CN
China
Prior art keywords
audio
audio frame
frame
source
source audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811066716.6A
Other languages
Chinese (zh)
Other versions
CN109192218B (en)
Inventor
劳振锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201811066716.6A
Publication of CN109192218A
Application granted
Publication of CN109192218B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques in which the extracted parameters are spectral information of each sub-band
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing


Abstract

The invention discloses a method and apparatus for audio processing, belonging to the field of audio editing technology. The method includes: obtaining a timbre reference audio frame from a target audio and extracting spectral envelope feature information of the timbre reference audio frame; extracting fundamental frequency information of the source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and generating, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame. The invention effectively solves the technical problem of pitch change during voice conversion.

Description

Method and apparatus for audio processing
Technical field
The present invention relates to the field of audio editing technology, and in particular to a method and apparatus for audio processing.
Background art
Many mobile phones now offer voice-changing software that converts speech between male and female voices, or between a child's voice and an adult's voice, which many users find entertaining.
The voice-changing principle in the related art is as follows: every few audio frames of the source audio, one audio frame is copied and the copy is inserted immediately after the original, yielding a slowed-down audio whose duration is longer than the source. The slowed-down audio is then resampled to obtain a new audio with the same duration as the source. Both the pitch and the timbre of the new audio are changed, thereby achieving the voice-changing effect.
In the course of implementing the present invention, the inventor found that the related art has at least the following problems:
When the pitch-shifted vocal audio is to be synthesized with an accompaniment audio into a song audio, there are two cases. If the accompaniment audio is pitch-shifted accordingly, its sound quality is damaged by the pitch change, and the quality of the synthesized song audio declines. If the accompaniment audio is left unchanged, the pitch-shifted vocal audio and the unshifted accompaniment audio are not in the same key, and the synthesized song audio sounds poor.
Summary of the invention
To solve the problems in the related art, embodiments of the present invention provide a method and apparatus for audio processing. The technical solution is as follows:
In a first aspect, a method of audio processing is provided, the method comprising:
obtaining a timbre reference audio frame from a target audio, and extracting spectral envelope feature information of the timbre reference audio frame;
extracting fundamental frequency information of a source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and
generating, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame.
Optionally, the method further comprises:
extracting consonant information of the source audio frame;
wherein generating the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope feature information comprises:
generating the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
Optionally, before obtaining the timbre reference audio frame from the target audio, the method further comprises:
performing pitch-shift processing on the source audio to obtain the target audio.
Optionally, performing pitch-shift processing on the source audio to obtain the target audio comprises:
in the source audio, at intervals of a first preset number of source audio frames, selecting a second preset number of source audio frames, copying the second preset number of source audio frames, and inserting the copied source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resampling the slowed-down audio to obtain the target audio, which has the same number of frames and the same duration as the source audio.
Optionally, before obtaining the timbre reference audio frame from the target audio, the method further comprises:
displaying a local audio list; and
obtaining the target audio when an instruction selecting the option of the target audio in the local audio list is received.
In a second aspect, an apparatus for audio processing is provided, the apparatus comprising:
an obtaining module, configured to obtain a timbre reference audio frame from a target audio;
an extraction module, configured to extract spectral envelope feature information of the timbre reference audio frame and to extract fundamental frequency information of a source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and
a generation module, configured to generate, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame.
Optionally, the extraction module is further configured to extract consonant information of the source audio frame;
and the generation module is further configured to generate the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
Optionally, the apparatus further comprises:
a pitch-shift module, configured to perform pitch-shift processing on the source audio to obtain the target audio.
Optionally, the pitch-shift module is configured to: in the source audio, at intervals of a first preset number of source audio frames, select a second preset number of source audio frames, copy them, and insert the copies after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resample the slowed-down audio to obtain the target audio, which has the same number of frames and the same duration as the source audio.
Optionally, the apparatus further comprises:
a display module, configured to display a local audio list;
wherein the obtaining module is further configured to obtain the target audio when an instruction selecting the option of the target audio in the local audio list is received.
In a third aspect, a terminal is provided, the terminal comprising a processor and a memory, the memory storing at least one instruction that is loaded and executed by the processor to implement the method of audio processing according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction that is loaded and executed by a processor to implement the method of audio processing according to the first aspect.
The beneficial effects of the technical solutions provided by the embodiments of the present invention include at least the following:
In the embodiments of the present invention, the resulting voice-changed audio frame contains the fundamental frequency information of the source audio frame and the spectral envelope feature information of the timbre reference audio frame. The timbre of the voice-changed audio is therefore altered, achieving the voice-changing purpose, while its pitch remains identical to that of the source audio. Since the pitch of the voice-changed audio is unchanged, the pitch of the accompaniment audio need not be changed either. With neither pitch altered, the voice-changed audio can be synthesized directly with the accompaniment audio into a song audio, whose sound quality is not damaged and which does not suffer from poor auditory effect.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for the description of the embodiments are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method of audio processing according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method of audio processing according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for audio processing according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a terminal for audio processing according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a computer device for audio processing according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a song selection interface according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of a karaoke interface according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of a voice-change category interface according to an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a method of audio processing, which can be performed by a terminal. The terminal may be a mobile terminal such as a mobile phone, tablet computer, or laptop, or a fixed terminal such as a desktop computer.
The terminal may include a processor, a memory, an audio output component, an audio input component, and the like. The processor, which may be a CPU (Central Processing Unit), can be used for editing audio files, controlling the display, and other processing. The memory, which may be RAM (Random Access Memory), Flash memory, etc., can store received data, data required during processing, and data generated during processing, such as timbre reference audio frames, source audio frames, and voice-changed audio frames. The audio output component may be a speaker, earphones, etc., and the audio input component may be a microphone. The terminal may also include an input component and a screen. The input component, which may be a mouse, touch screen, touchpad, or keyboard, generates corresponding instructions from user operations. The screen, which may be a touch or non-touch screen, can display the operation interface of an application program.
As shown in Fig. 1, the processing flow of the method may include the following steps:
In step 101, pitch-shift processing is performed on the source audio to obtain the target audio.
The source audio may be the user's vocal audio.
In implementation, the user may install an application for karaoke and audio processing on the terminal. When the user wants to sing karaoke and apply voice changing to the recording, the user can tap the shortcut icon to run the application and select the karaoke function in its main interface. The application then displays a song selection interface (as shown in Fig. 6), which shows an audio list containing options for multiple song audios; the user can browse the list and tap the option of the song to be sung. After the selection, the application enters the karaoke interface (as shown in Fig. 7). The terminal plays the accompaniment of the song audio and can display the lyrics on the screen. Meanwhile, the terminal starts the audio input component (e.g., a microphone) to record, and the user sings along with the accompaniment; the terminal uses the recorded vocal audio as the source audio.
After the vocal audio is recorded, the application can display a voice-change category interface (as shown in Fig. 8), which shows a list of sound types; the user can browse the list and tap the sound type to convert to. If the user wants a child's voice, the user taps the child's-voice option. The terminal then performs pitch-shift processing on the user's source audio toward a child's voice, and the pitch-shifted source audio becomes the target audio.
Optionally, the pitch-shift processing may use the SoundTouch algorithm (an audio pitch-shifting algorithm), a frequency-domain pitch-shift method, or a parametric pitch-shift method. Taking the SoundTouch algorithm as an example, its pitch-shifting principle is as follows: in the source audio, at intervals of a first preset number of source audio frames, a second preset number of source audio frames are selected and copied, and the copies are inserted after the selected frames, yielding a slowed-down audio corresponding to the source audio; the slowed-down audio is then resampled to obtain the target audio, which has the same number of frames and the same duration as the source audio.
The specific values of the first and second preset numbers may be fixed, or may be determined from the fundamental frequency of the source audio and the fundamental frequency corresponding to the sound type selected by the user.
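The duplicate-and-resample principle above can be sketched as follows. This is a simplified illustration, not the SoundTouch implementation: the frame list, the preset values, and the linear-interpolation resampler are all assumptions of this sketch.

```python
def slow_down(frames, first_preset, second_preset):
    """Every `first_preset` frames, copy the next `second_preset` frames
    and insert the copies right after the originals (audio gets longer)."""
    out = []
    i = 0
    while i < len(frames):
        out.extend(frames[i:i + first_preset])   # interval frames, unchanged
        i += first_preset
        chosen = frames[i:i + second_preset]
        out.extend(chosen)                       # selected frames
        out.extend(chosen)                       # inserted copies
        i += second_preset
    return out

def resample(samples, target_len):
    """Linear-interpolation resampling back to the source length,
    which raises the pitch of the slowed-down audio."""
    if target_len == 1:
        return [samples[0]]
    out = []
    step = (len(samples) - 1) / (target_len - 1)
    for n in range(target_len):
        pos = n * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

source = [float(n) for n in range(12)]
slowed = slow_down(source, first_preset=3, second_preset=1)
target = resample(slowed, len(source))   # same length as the source again
```

With 12 source frames and presets (3, 1), every fourth frame is duplicated, so the slowed-down audio has 15 frames before being resampled back to 12.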
In step 102, a timbre reference audio frame is obtained from the target audio, and the spectral envelope feature information of the timbre reference audio frame is extracted.
The spectral envelope feature information of the timbre reference audio frame is information describing the shape of the spectral curve; it characterizes timbre.
In implementation, after the target audio is generated, its audio frames can be obtained one by one in playback order, starting from the first frame; each obtained frame is a timbre reference audio frame. For each obtained timbre reference audio frame, its fundamental frequency information is extracted first, and its spectral envelope feature information is then extracted in combination with the fundamental frequency information.
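As a hedged illustration of the envelope idea (not the patent's extraction method), one simple way to approximate a spectral envelope is to smooth a frame's magnitude spectrum with a moving average. The naive O(N²) DFT, the 64-sample frame, and the smoothing width are arbitrary choices of this sketch.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitude for the positive-frequency half of the spectrum."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_envelope(frame, width=3):
    """Moving-average smoothing of the magnitude spectrum: a crude stand-in
    for the 'shape of the spectral curve' described in the text."""
    mag = magnitude_spectrum(frame)
    env = []
    for k in range(len(mag)):
        lo, hi = max(0, k - width), min(len(mag), k + width + 1)
        env.append(sum(mag[lo:hi]) / (hi - lo))
    return env

# a 64-sample frame of a 200 Hz tone at an assumed 8 kHz sampling rate
frame = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(64)]
env = spectral_envelope(frame)
```

Real systems (e.g. the WORLD tool mentioned later) compute the envelope pitch-adaptively, which is why the text extracts the fundamental frequency first.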
In step 103, the fundamental frequency information of the source audio frame in the source audio whose playback time is the same as that of the timbre reference audio frame is extracted.
The fundamental frequency information may be the peak frequency of the audio frame's spectrum.
In implementation, while the terminal extracts the spectral envelope feature information from the timbre reference audio frame, it can obtain the source audio frame with the same playback time from the source audio. From the process of generating the target audio from the source audio described above, the source audio and the target audio have the same number of frames and the same duration. Therefore, when the first audio frame is obtained from the target audio as a timbre reference audio frame, the first source audio frame can be obtained from the source audio; when the second audio frame is obtained, the second source audio frame can be obtained; and so on, so that the obtained source audio frame and timbre reference audio frame always have the same playback time. After the source audio frame is obtained, its fundamental frequency information can be extracted.
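Following the note above that the fundamental frequency may be taken as the peak frequency of the frame's spectrum, here is a toy estimator that picks the strongest DFT bin. The frame size, sampling rate, and test tone are assumptions of this sketch; production F0 trackers are considerably more robust.

```python
import cmath
import math

def peak_frequency(frame, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin of the frame."""
    n = len(frame)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2):          # skip the DC bin
        mag = abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                      for t in range(n)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n     # bin index -> Hz

sample_rate = 8000
# 400 samples of a 200 Hz tone: bin 10 aligns exactly with 200 Hz
frame = [math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(400)]
f0 = peak_frequency(frame, sample_rate)
```

For this aligned test tone the estimator recovers 200 Hz exactly; for real voiced frames, windowing and interpolation between bins would be needed.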
Optionally, to make the voice-changing effect more natural and realistic, the consonant information in the source audio can also be extracted.
In implementation, based on the above process of generating the target audio from the source audio, after the source audio frame is obtained, both its fundamental frequency information and its consonant information can be extracted.
In step 104, a voice-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information and the spectral envelope feature information.
In implementation, the terminal may call the WORLD tool (a tool capable of generating audio) to generate a new vocal audio. The fundamental frequency information of the source audio and the spectral envelope feature information of the target audio are input into the WORLD tool, which generates the new vocal audio. This vocal audio has the fundamental frequency information of the source audio and the spectral envelope feature information of the target audio, so its pitch is consistent with the source audio while its timbre is consistent with the target audio. The overall effect is that, compared with the originally recorded vocal audio, the timbre of the voice-changed vocal audio is altered while the pitch is unchanged.
Optionally, using the consonant information of the source audio frame extracted above, the processing of step 104 may be as follows: a voice-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
In implementation, the terminal may call the WORLD tool to generate a new vocal audio. The fundamental frequency information of the source audio, the consonant information of the source audio, and the spectral envelope feature information of the target audio are input into the WORLD tool, which generates the new vocal audio. This vocal audio has the fundamental frequency information of the source audio and the spectral envelope feature information of the target audio, so its pitch is consistent with the source audio and its timbre is consistent with the target audio. Because this vocal audio also carries the consonant information of the source audio, it sounds more natural than a vocal audio generated from the fundamental frequency information and the spectral envelope feature information alone.
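The source-filter idea behind step 104 can be sketched with simple additive synthesis: generate the harmonics of the source frame's fundamental frequency and shape each harmonic's amplitude with the target frame's spectral envelope. This is only a minimal illustration of the combination principle, not the WORLD vocoder; the envelope function below is invented for the example.

```python
import math

def synthesize_frame(f0, envelope, sample_rate, n_samples):
    """Additive synthesis: one sinusoid per harmonic of f0, with the
    amplitude of each harmonic read off the spectral envelope.  Pitch thus
    follows f0 (source) while timbre follows the envelope (target)."""
    n_harmonics = int((sample_rate / 2) // f0)   # stay below Nyquist
    out = []
    for t in range(n_samples):
        s = 0.0
        for h in range(1, n_harmonics + 1):
            s += envelope(h * f0) * math.sin(
                2 * math.pi * h * f0 * t / sample_rate)
        out.append(s)
    return out

# toy envelope: energy falls off with frequency (stands in for the envelope
# extracted from the timbre reference frame)
envelope = lambda freq: 1.0 / (1.0 + freq / 500.0)

frame = synthesize_frame(f0=200.0, envelope=envelope,
                         sample_rate=8000, n_samples=256)
```

Swapping in a different envelope function changes the timbre of the result without moving its pitch, which is exactly the property the embodiment relies on when mixing the voice-changed vocal with an unshifted accompaniment.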
Finally, after the voice change is complete, the user can tap the audition button, and the terminal plays the voice-changed audio together with the accompaniment audio. If the user is satisfied with the result, the user can tap the save button, and the voice-changed audio is synthesized with the accompaniment audio into a new song audio and stored in a local file. If the user is dissatisfied, the user can tap the re-record button to record again and repeat the voice-changing operation. The user can also tap the publish button to upload the new song audio online.
As shown in Fig. 2, the processing flow of the method may include the following steps:
In step 201, a local audio list is displayed. When an instruction selecting the option of the target audio in the local audio list is received, the target audio is obtained.
The target audio may be a vocal audio stored in advance on the terminal.
In implementation, the user may install an application for karaoke and audio processing on the terminal. When the user wants to sing karaoke and apply voice changing to the recording, the user can tap the shortcut icon to run the application and select the karaoke function in its main interface. The application then displays a song selection interface (as shown in Fig. 6), which shows an audio list containing options for multiple song audios; the user can browse the list and tap the option of the song to be sung. After the selection, the application enters the karaoke interface (as shown in Fig. 7). The terminal plays the accompaniment of the song audio and can display the lyrics on the screen, and uses the vocal audio in that song audio as the target audio. Meanwhile, the terminal starts the audio input component (e.g., a microphone) to record, the user sings along with the accompaniment, and the terminal uses the recorded vocal audio as the source audio.
In step 202, a timbre reference audio frame is obtained from the target audio, and the spectral envelope feature information of the timbre reference audio frame is extracted.
For the specific implementation, refer to step 102.
In step 203, the fundamental frequency information of the source audio frame in the source audio whose playback time is the same as that of the timbre reference audio frame is extracted.
For the specific implementation, refer to step 103.
In step 204, a voice-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information and the spectral envelope feature information.
For the specific implementation, refer to step 104.
Based on the same technical idea, an embodiment of the present invention further provides an apparatus for audio processing, which may be the terminal in the above embodiments. As shown in Fig. 3, the apparatus includes:
an obtaining module 301, configured to obtain a timbre reference audio frame from a target audio;
an extraction module 302, configured to extract spectral envelope feature information of the timbre reference audio frame and to extract fundamental frequency information of a source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and
a generation module 303, configured to generate, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame.
Optionally, the extraction module 302 is further configured to extract consonant information of the source audio frame;
and the generation module 303 is further configured to generate the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
Optionally, the apparatus further includes:
a pitch-shift module 304, configured to perform pitch-shift processing on the source audio to obtain the target audio.
Optionally, the pitch-shift module 304 is configured to: in the source audio, at intervals of a first preset number of source audio frames, select a second preset number of source audio frames, copy them, and insert the copies after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resample the slowed-down audio to obtain the target audio, which has the same number of frames and the same duration as the source audio.
Optionally, the apparatus further includes:
a display module 305, configured to display a local audio list;
wherein the obtaining module 301 is further configured to obtain the target audio when an instruction selecting the option of the target audio in the local audio list is received.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
It should be noted that the apparatus for audio processing provided by the above embodiments is illustrated only by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for audio processing provided by the above embodiments belongs to the same concept as the method embodiments of audio processing; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
Fig. 4 is a structural block diagram of a terminal according to an embodiment of the present invention. The terminal 400 may be a portable mobile terminal, such as a smartphone or a tablet computer. The terminal 400 may also be called user equipment, a portable terminal, or other names.
Generally, the terminal 400 includes a processor 401 and a memory 402.
The processor 401 may include one or more processing cores, for example a 4-core processor. The processor 401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 402 may include one or more computer-readable storage media, which may be tangible and non-transitory. The memory 402 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 402 stores at least one instruction, which is executed by the processor 401 to implement the method of audio processing provided by the present application.
In some embodiments, the terminal 400 optionally further includes a peripheral device interface 403 and at least one peripheral device. Specifically, the peripheral device includes at least one of a radio frequency circuit 404, a touch display screen 405, a camera assembly 406, an audio circuit 407, a positioning component 408, and a power supply 409.
The peripheral device interface 403 may be configured to connect at least one I/O (Input/Output)-related peripheral device to the processor 401 and the memory 402. In some embodiments, the processor 401, the memory 402, and the peripheral device interface 403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 401, the memory 402, and the peripheral device interface 403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 404 is configured to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 404 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 404 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 404 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: the World Wide Web, metropolitan area networks, intranets, the generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 404 may further include an NFC (Near Field Communication)-related circuit, which is not limited in the present application.
The touch display screen 405 is configured to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display screen 405 also has the ability to collect touch signals on or above its surface. A touch signal may be input to the processor 401 as a control signal for processing. The touch display screen 405 is configured to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one touch display screen 405, arranged on the front panel of the terminal 400; in other embodiments, there may be at least two touch display screens 405, respectively arranged on different surfaces of the terminal 400 or in a folded design; in still other embodiments, the touch display screen 405 may be a flexible display screen, arranged on a curved surface or a folded surface of the terminal 400. The touch display screen 405 may even be set to a non-rectangular irregular shape, that is, a special-shaped screen. The touch display screen 405 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 406 is configured to collect images or video. Optionally, the camera assembly 406 includes a front camera and a rear camera. Generally, the front camera is used for video calls or selfies, and the rear camera is used for taking photos or video. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, and a wide-angle camera, so that the main camera and the depth-of-field camera are fused to implement a background blurring function, and the main camera and the wide-angle camera are fused to implement panoramic shooting and VR (Virtual Reality) shooting functions. In some embodiments, the camera assembly 406 may further include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and may be used for light compensation at different color temperatures.
The audio circuit 407 is configured to provide an audio interface between a user and the terminal 400. The audio circuit 407 may include a microphone and a speaker. The microphone is configured to collect sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 401 for processing, or input them to the radio frequency circuit 404 for voice communication. For stereo collection or noise reduction purposes, there may be multiple microphones, respectively arranged at different parts of the terminal 400. The microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is configured to convert electrical signals from the processor 401 or the radio frequency circuit 404 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 407 may further include a headphone jack.
The positioning component 408 is configured to determine the current geographic location of the terminal 400 to implement navigation or LBS (Location Based Service). The positioning component 408 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 409 is configured to supply power to the components in the terminal 400. The power supply 409 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 409 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 400 further includes one or more sensors 410. The one or more sensors 410 include but are not limited to: an acceleration sensor 411, a gyroscope sensor 412, a pressure sensor 413, a fingerprint sensor 414, an optical sensor 415, and a proximity sensor 416.
The acceleration sensor 411 may detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 400. For example, the acceleration sensor 411 may be configured to detect the components of gravitational acceleration on the three coordinate axes. The processor 401 may, according to the gravitational acceleration signal collected by the acceleration sensor 411, control the touch display screen 405 to display the user interface in a landscape view or a portrait view. The acceleration sensor 411 may also be used for collecting game or user motion data.
The gyroscope sensor 412 may detect the body direction and rotation angle of the terminal 400, and may cooperate with the acceleration sensor 411 to collect the user's 3D actions on the terminal 400. The processor 401 may implement the following functions according to the data collected by the gyroscope sensor 412: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 413 may be arranged on a side frame of the terminal 400 and/or a lower layer of the touch display screen 405. When the pressure sensor 413 is arranged on the side frame of the terminal 400, it can detect the user's grip signal on the terminal 400, and left/right hand recognition or quick operations may be performed according to the grip signal. When the pressure sensor 413 is arranged on the lower layer of the touch display screen 405, operability controls on the UI can be controlled according to the user's pressure operation on the touch display screen 405. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 414 is configured to collect the user's fingerprint, and the user's identity is recognized according to the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 401 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 414 may be arranged on the front, back, or side of the terminal 400. When a physical button or a manufacturer logo is arranged on the terminal 400, the fingerprint sensor 414 may be integrated with the physical button or the manufacturer logo.
The optical sensor 415 is configured to collect ambient light intensity. In one embodiment, the processor 401 may control the display brightness of the touch display screen 405 according to the ambient light intensity collected by the optical sensor 415. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 405 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 405 is decreased. In another embodiment, the processor 401 may also dynamically adjust the shooting parameters of the camera assembly 406 according to the ambient light intensity collected by the optical sensor 415.
The proximity sensor 416, also called a distance sensor, is generally arranged on the front of the terminal 400. The proximity sensor 416 is configured to collect the distance between the user and the front of the terminal 400. In one embodiment, when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 gradually decreases, the processor 401 controls the touch display screen 405 to switch from an on-screen state to an off-screen state; when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 gradually increases, the processor 401 controls the touch display screen 405 to switch from the off-screen state to the on-screen state.
Those skilled in the art will understand that the structure shown in Fig. 4 does not constitute a limitation on the terminal 400, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
In an exemplary embodiment, a computer-readable storage medium is further provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the audio processing method in the foregoing embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 5 is a structural schematic diagram of a computer device according to an embodiment of the present invention. The computer device 500 may vary greatly due to differences in configuration or performance, and may include one or more processors (central processing units, CPU) 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the above audio processing method.
Those of ordinary skill in the art will appreciate that all or some of the steps of the foregoing embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method of audio processing, characterized in that the method comprises:
obtaining a timbre reference audio frame from a target audio, and extracting spectral envelope characteristic information of the timbre reference audio frame;
extracting fundamental frequency information of a source audio frame in a source audio, the source audio frame having the same play time as the timbre reference audio frame; and
generating, based on the fundamental frequency information and the spectral envelope characteristic information, a voice-changed audio frame corresponding to the source audio frame.
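The recombination in claim 1 can be illustrated with a short, non-authoritative sketch: the fundamental frequency comes from the source frame, the spectral envelope from the timbre reference frame, and the two are recombined into a voice-changed frame. The sample rate, frame length, estimator choices, and all function names below are illustrative assumptions; the patent does not prescribe any particular implementation.

```python
import numpy as np

SR = 16000     # sample rate in Hz (assumption; the patent does not fix one)
FRAME = 1024   # frame length in samples (assumption)

def spectral_envelope(frame, smooth=8):
    """Crude spectral envelope: the smoothed FFT magnitude of a frame."""
    mag = np.abs(np.fft.rfft(frame))
    return np.convolve(mag, np.ones(smooth) / smooth, mode="same")

def fundamental_freq(frame, sr=SR, fmin=60.0, fmax=500.0):
    """Crude F0 estimate from the autocorrelation peak in a plausible lag range."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

def voice_changed_frame(f0, envelope, sr=SR, n=FRAME):
    """Resynthesis: harmonics on the source frame's F0 grid, with
    amplitudes read off the reference frame's spectral envelope."""
    t = np.arange(n) / sr
    bin_hz = (sr / 2) / (len(envelope) - 1)  # Hz per envelope bin
    out = np.zeros(n)
    k = 1
    while k * f0 < sr / 2:                   # all harmonics below Nyquist
        out += envelope[int(round(k * f0 / bin_hz))] * np.sin(2 * np.pi * k * f0 * t)
        k += 1
    return out / (np.max(np.abs(out)) + 1e-12)

# Source frame at 110 Hz; timbre reference frame dominated by 220 Hz:
t = np.arange(FRAME) / SR
source = np.sin(2 * np.pi * 110 * t)
reference = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)

f0 = fundamental_freq(source)        # fundamental frequency of the source frame
env = spectral_envelope(reference)   # timbre of the reference frame
changed = voice_changed_frame(f0, env)
```

The resulting frame keeps the source's harmonic spacing (about 110 Hz) while its strongest partial follows the reference's envelope peak near 220 Hz, which is the timbre-transfer effect the claim describes.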
2. The method according to claim 1, characterized in that the method further comprises:
extracting consonant information of the source audio frame; and
the generating, based on the fundamental frequency information and the spectral envelope characteristic information, a voice-changed audio frame corresponding to the source audio frame comprises:
generating, based on the fundamental frequency information, the spectral envelope characteristic information, and the consonant information, the voice-changed audio frame corresponding to the source audio frame.
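Claim 2 additionally feeds consonant information into the synthesis. The patent does not define how that information is extracted or blended, so the following is only a hedged sketch under stated assumptions: the consonant (unvoiced) content is estimated via the zero-crossing rate, and envelope-shaped noise is mixed into the resynthesized voiced part in proportion to it. Every name and constant here is illustrative.

```python
import numpy as np

SR, N = 16000, 1024   # sample rate and frame length (assumptions)

def consonant_ratio(frame):
    """Crude consonant (unvoiced) indicator: the zero-crossing rate,
    which is high for noise-like consonants and low for voiced vowels."""
    signs = np.sign(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def mix_consonant(voiced, envelope, ratio, rng):
    """Blend envelope-shaped noise into a resynthesized voiced frame,
    weighted by the consonant ratio (the blend rule is an assumption)."""
    spec = np.fft.rfft(rng.standard_normal(len(voiced)))
    spec *= envelope / (np.max(envelope) + 1e-12)   # impose the envelope
    shaped = np.fft.irfft(spec, n=len(voiced))
    shaped /= np.max(np.abs(shaped)) + 1e-12
    return (1.0 - ratio) * voiced + ratio * shaped

rng = np.random.default_rng(0)
t = np.arange(N) / SR
vowel = np.sin(2 * np.pi * 150 * t)      # voiced, vowel-like frame
noise_frame = rng.standard_normal(N)     # noisy, consonant-like frame
r_vowel = consonant_ratio(vowel)         # low zero-crossing rate
r_noise = consonant_ratio(noise_frame)   # high zero-crossing rate
envelope = np.abs(np.fft.rfft(vowel))    # stand-in spectral envelope
mixed = mix_consonant(vowel, envelope, r_noise, rng)
```

The ratio cleanly separates the two frame types, which is the minimum a consonant feature must do before it can steer the synthesis.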
3. The method according to claim 1, characterized in that before the obtaining a timbre reference audio frame from a target audio, the method further comprises:
performing pitch modification processing on the source audio to obtain the target audio.
4. The method according to claim 3, characterized in that the performing pitch modification processing on the source audio to obtain the target audio comprises:
in the source audio, at intervals of a first preset number of source audio frames, selecting a second preset number of source audio frames, duplicating the selected source audio frames, and inserting the duplicated source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resampling the slowed-down audio to obtain the target audio, the target audio having the same frame count and the same duration as the source audio.
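The two steps of claim 4 can be sketched as follows. This is an illustrative reading, not the patent's implementation: the frame representation, the parameter names `first` and `second`, and the use of plain linear interpolation for resampling are all assumptions (a production implementation would resample concatenated samples, e.g. with a windowed-sinc filter). Inserting duplicates lengthens the audio by the factor (first + 2·second)/(first + second); resampling back to the original duration compresses it in time again, which raises the pitch by that same factor.

```python
import numpy as np

def slow_down(frames, first, second):
    """Claim 4's reduction-of-speed step: after every `first` source
    frames, the next `second` frames are kept and a duplicated copy of
    them is inserted immediately after, lengthening the audio."""
    out = []
    i = 0
    while i < len(frames):
        out.extend(frames[i:i + first])              # pass-through frames
        dup = frames[i + first:i + first + second]   # frames to duplicate
        out.extend(dup)                              # the selected frames
        out.extend(dup)                              # their inserted copies
        i += first + second
    return out

def resample(samples, target_len):
    """Linear-interpolation resampling back to the source length,
    so the slowed audio regains the original duration."""
    x_old = np.linspace(0.0, 1.0, num=len(samples), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=target_len, endpoint=False)
    return np.interp(x_new, x_old, np.asarray(samples, dtype=float))

# Symbolic "frames" make the insertion pattern visible:
frames = ["a", "b", "c", "d", "e"]
slowed = slow_down(frames, first=2, second=1)

# Resampling a short ramp up to twice the point count:
ramp = resample([0.0, 1.0, 2.0, 3.0], 8)
```

With `first=2, second=1`, every third frame appears twice in the slowed output, matching the claim's "insert the duplicated frames after the selected frames" wording.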
5. The method according to claim 1, characterized in that before the obtaining a timbre reference audio frame from a target audio, the method further comprises:
displaying a local audio list; and
obtaining the target audio when a selection instruction for an option of the target audio in the local audio list is received.
6. A device of audio processing, characterized in that the device comprises:
an obtaining module, configured to obtain a timbre reference audio frame from a target audio;
an extraction module, configured to extract spectral envelope characteristic information of the timbre reference audio frame, and to extract fundamental frequency information of a source audio frame in a source audio having the same play time as the timbre reference audio frame; and
a generation module, configured to generate, based on the fundamental frequency information and the spectral envelope characteristic information, a voice-changed audio frame corresponding to the source audio frame.
7. The device according to claim 6, characterized in that the extraction module is further configured to extract consonant information of the source audio frame; and
the generation module is configured to generate, based on the fundamental frequency information, the spectral envelope characteristic information, and the consonant information, the voice-changed audio frame corresponding to the source audio frame.
8. The device according to claim 6, characterized in that the device further comprises:
a pitch modification module, configured to perform pitch modification processing on the source audio to obtain the target audio.
9. The device according to claim 8, characterized in that the pitch modification module is configured to:
in the source audio, at intervals of a first preset number of source audio frames, select a second preset number of source audio frames, duplicate the selected source audio frames, and insert the duplicated source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resample the slowed-down audio to obtain the target audio, the target audio having the same frame count and the same duration as the source audio.
10. The device according to claim 6, characterized in that the device further comprises a display module, configured to display a local audio list; and
the obtaining module is further configured to obtain the target audio when a selection instruction for an option of the target audio in the local audio list is received.
11. A terminal, characterized in that the terminal comprises a processor and a memory, the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method of audio processing according to any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the method of audio processing according to any one of claims 1 to 5.
CN201811066716.6A 2018-09-13 2018-09-13 Method and apparatus for audio processing Active CN109192218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811066716.6A CN109192218B (en) 2018-09-13 2018-09-13 Method and apparatus for audio processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811066716.6A CN109192218B (en) 2018-09-13 2018-09-13 Method and apparatus for audio processing

Publications (2)

Publication Number Publication Date
CN109192218A true CN109192218A (en) 2019-01-11
CN109192218B CN109192218B (en) 2021-05-07

Family

ID=64910608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811066716.6A Active CN109192218B (en) 2018-09-13 2018-09-13 Method and apparatus for audio processing

Country Status (1)

Country Link
CN (1) CN109192218B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097890A (en) * 2019-04-16 2019-08-06 北京搜狗科技发展有限公司 A kind of method of speech processing, device and the device for speech processes
CN110459196A (en) * 2019-09-05 2019-11-15 长沙市回音科技有限公司 A kind of method, apparatus and system adjusting singing songs difficulty
CN111435591A (en) * 2020-01-17 2020-07-21 珠海市杰理科技股份有限公司 Sound synthesis method and system, audio processing chip and electronic equipment
CN111782865A (en) * 2020-06-23 2020-10-16 腾讯音乐娱乐科技(深圳)有限公司 Audio information processing method and device and storage medium
CN112259072A (en) * 2020-09-25 2021-01-22 北京百度网讯科技有限公司 Voice conversion method and device and electronic equipment
CN115461809A (en) * 2020-09-04 2022-12-09 罗兰株式会社 Information processing apparatus and information processing method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1719514A (en) * 2004-07-06 2006-01-11 中国科学院自动化研究所 Based on speech analysis and synthetic high-quality real-time change of voice method
JP5772739B2 (en) * 2012-06-21 2015-09-02 ヤマハ株式会社 Audio processing device
WO2015166981A1 (en) * 2014-04-30 2015-11-05 ヤマハ株式会社 Pitch information generation device, pitch information generation method, program, and computer-readable recording medium
CN106971703A (en) * 2017-03-17 2017-07-21 西北师范大学 A kind of song synthetic method and device based on HMM
CN107731241A (en) * 2017-09-29 2018-02-23 广州酷狗计算机科技有限公司 Handle the method, apparatus and storage medium of audio signal
CN107863095A (en) * 2017-11-21 2018-03-30 广州酷狗计算机科技有限公司 Acoustic signal processing method, device and storage medium


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097890A (en) * 2019-04-16 2019-08-06 北京搜狗科技发展有限公司 A kind of method of speech processing, device and the device for speech processes
CN110097890B (en) * 2019-04-16 2021-11-02 北京搜狗科技发展有限公司 Voice processing method and device for voice processing
CN110459196A (en) * 2019-09-05 2019-11-15 长沙市回音科技有限公司 A kind of method, apparatus and system adjusting singing songs difficulty
CN111435591A (en) * 2020-01-17 2020-07-21 珠海市杰理科技股份有限公司 Sound synthesis method and system, audio processing chip and electronic equipment
CN111782865A (en) * 2020-06-23 2020-10-16 腾讯音乐娱乐科技(深圳)有限公司 Audio information processing method and device and storage medium
CN115461809A (en) * 2020-09-04 2022-12-09 罗兰株式会社 Information processing apparatus and information processing method
US11922913B2 (en) 2020-09-04 2024-03-05 Roland Corporation Information processing device, information processing method, and non-transitory computer readable recording medium
CN112259072A (en) * 2020-09-25 2021-01-22 北京百度网讯科技有限公司 Voice conversion method and device and electronic equipment

Also Published As

Publication number Publication date
CN109192218B (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN109192218A Method and apparatus for audio processing
CN109379643A Image synthesis method, apparatus, terminal and storage medium
CN109729297A Method and apparatus for adding special effects to a video
CN108008930A Method and apparatus for determining karaoke song scores
CN108595239A Image processing method, apparatus, terminal and computer-readable storage medium
CN110688082B Method, apparatus, device and storage medium for determining volume adjustment ratio information
CN108965757A Video recording method, apparatus, terminal and storage medium
CN108922506A Song audio generation method, apparatus and computer-readable storage medium
CN109147757A Song synthesis method and apparatus
CN108538302A Method and apparatus for synthesizing audio
CN110491358A Method, apparatus, device, system and storage medium for audio recording
CN109300485A Audio signal scoring method, apparatus, electronic device and computer storage medium
CN109168073A Method and apparatus for displaying a live-streaming room cover
CN108965922A Video cover generation method, apparatus and storage medium
CN109346111A Data processing method, apparatus, terminal and storage medium
CN108320756A Method and apparatus for detecting whether an audio is instrumental audio
CN109547843A Method and apparatus for processing audio and video
CN109003621A Audio processing method, apparatus and storage medium
CN110956971A Audio processing method, apparatus, terminal and storage medium
CN107871012A Audio processing method, apparatus, storage medium and terminal
CN108922562A Singing evaluation result display method and apparatus
CN109065068A Audio processing method, apparatus and storage medium
CN110189771A Sound quality detection method, apparatus and storage medium for source audio
CN109192223A Method and apparatus for audio alignment
CN109243479A Audio signal processing method, apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant