CN109192218A - Method and apparatus for audio processing - Google Patents
Method and apparatus for audio processing
- Publication number
- CN109192218A CN109192218A CN201811066716.6A CN201811066716A CN109192218A CN 109192218 A CN109192218 A CN 109192218A CN 201811066716 A CN201811066716 A CN 201811066716A CN 109192218 A CN109192218 A CN 109192218A
- Authority
- CN
- China
- Prior art keywords
- audio
- audio frame
- frame
- source
- source audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Telephone Function (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a method and apparatus for audio processing, belonging to the technical field of audio editing. The method includes: obtaining a timbre reference audio frame from a target audio, and extracting spectral envelope feature information of the timbre reference audio frame; extracting fundamental frequency information of the source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and generating, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame. The invention effectively solves the technical problem of pitch change during voice conversion.
Description
Technical field
The present invention relates to the technical field of audio editing, and in particular to a method and apparatus for audio processing.
Background technique
Nowadays, many mobile phones carry voice-changing software that converts speech between male and female voices, or into a child's or an old man's voice, which users find very entertaining.
The voice-changing principle in the related art is as follows: every several audio frames of the source audio, one audio frame is copied and the copy is inserted right after the frame it was copied from, producing a slowed-down audio of longer duration. The slowed-down audio is then resampled to obtain a new audio whose duration is the same as that of the source audio. Both the pitch and the timbre of the new audio are changed, thereby achieving the voice-changing effect.
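The duplicate-and-resample principle described above can be sketched in a few lines of Python. This is an illustrative reconstruction under assumed parameters (160-sample frames, one copy inserted every 4 frames), not the related art's actual code:

```python
import math

def slow_down(samples, block=160, every=4):
    """After every `every` blocks, re-insert a copy of the last block,
    lengthening the audio without altering the pitch inside each block."""
    out = []
    for b in range(len(samples) // block):
        chunk = samples[b * block:(b + 1) * block]
        out.extend(chunk)
        if (b + 1) % every == 0:
            out.extend(chunk)  # the copied block goes right after the original
    return out

def resample(samples, new_len):
    """Linear-interpolation resampling to `new_len` samples."""
    step = (len(samples) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# 0.1 s of a 100 Hz sine at an 8 kHz sample rate
sr = 8000
src = [math.sin(2 * math.pi * 100 * n / sr) for n in range(800)]
slowed = slow_down(src)               # one extra block: the audio is 20% longer
shifted = resample(slowed, len(src))  # squeezed back: same duration, pitch raised
```

Resampling the lengthened audio back to the original duration scales every frequency component up by the same factor, which is why both the pitch and the timbre shift together in this scheme.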
In the process of implementing the present invention, the inventor found that the related art has at least the following problems:
When the voice audio obtained after pitch shifting is to be synthesized with an accompaniment audio into a song audio, two situations arise. If the accompaniment audio is pitch-shifted accordingly, its sound quality is damaged by the pitch change, and the quality of the finally synthesized song audio declines. If the pitch of the accompaniment audio is left unchanged, the pitch-shifted voice audio and the unchanged accompaniment audio are not in the same key, and the synthesized song audio sounds poor.
Summary of the invention
In order to solve the problems in the related art, embodiments of the present invention provide a method and apparatus for audio processing. The technical solution is as follows:
In a first aspect, a method of audio processing is provided, the method comprising:
obtaining a timbre reference audio frame from a target audio, and extracting spectral envelope feature information of the timbre reference audio frame;
extracting fundamental frequency information of the source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and
generating, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame.
Optionally, the method further comprises:
extracting consonant information of the source audio frame;
wherein generating the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information and the spectral envelope feature information comprises:
generating the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
Optionally, before obtaining the timbre reference audio frame from the target audio, the method further comprises:
performing pitch-shift processing on the source audio to obtain the target audio.
Optionally, performing pitch-shift processing on the source audio to obtain the target audio comprises:
in the source audio, at intervals of a first preset number of source audio frames, selecting a second preset number of source audio frames, copying the second preset number of source audio frames, and inserting the copied source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resampling the slowed-down audio to obtain the target audio, whose frame count and duration are the same as those of the source audio.
Optionally, before obtaining the timbre reference audio frame from the target audio, the method further comprises:
displaying a local audio list; and
obtaining the target audio when a selection instruction for an option of the target audio in the local audio list is received.
In a second aspect, an apparatus for audio processing is provided, the apparatus comprising:
an obtaining module, configured to obtain a timbre reference audio frame from a target audio;
an extraction module, configured to extract spectral envelope feature information of the timbre reference audio frame and to extract fundamental frequency information of the source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and
a generation module, configured to generate, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame.
Optionally, the extraction module is further configured to extract consonant information of the source audio frame; and
the generation module is further configured to generate the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
Optionally, the apparatus further comprises:
a pitch-shift module, configured to perform pitch-shift processing on the source audio to obtain the target audio.
Optionally, the pitch-shift module is configured to: in the source audio, at intervals of a first preset number of source audio frames, select a second preset number of source audio frames, copy the second preset number of source audio frames, and insert the copied source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and
resample the slowed-down audio to obtain the target audio, whose frame count and duration are the same as those of the source audio.
Optionally, the apparatus further comprises:
a display module, configured to display a local audio list; and
the obtaining module is further configured to obtain the target audio when a selection instruction for an option of the target audio in the local audio list is received.
In a third aspect, a terminal is provided. The terminal comprises a processor and a memory storing at least one instruction, the at least one instruction being loaded and executed by the processor to implement the method of audio processing according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, the computer-readable storage medium storing at least one instruction, the at least one instruction being loaded and executed by a processor to implement the method of audio processing according to the first aspect.
The beneficial effects of the technical solutions provided in the embodiments of the present invention include at least the following:
In the embodiments of the present invention, the finally obtained voice-changed audio frame carries the fundamental frequency information of the source audio frame and the spectral envelope feature information of the timbre reference audio frame. The timbre of the voice-changed audio is therefore changed, achieving the voice-changing purpose, while its pitch remains the same as that of the source audio. Since the pitch of the voice-changed audio is unchanged, the pitch of the accompaniment audio does not need to be changed either. With neither pitch altered, the voice-changed audio can be synthesized directly with the accompaniment audio into a song audio whose sound quality is not damaged and which does not sound poor.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method of audio processing provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method of audio processing provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for audio processing provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a terminal for audio processing provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a computer device for audio processing provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of a song selection interface provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a karaoke interface provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of a voice-change category interface provided by an embodiment of the present invention.
Specific embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a method of audio processing, which can be implemented by a terminal. The terminal may be a mobile terminal such as a mobile phone, tablet computer, or notebook, or a fixed terminal such as a desktop computer.
The terminal may include a processor, a memory, an audio output component, an audio input component, and the like. The processor may be a CPU (Central Processing Unit) or the like, and can be used to edit audio files, control the display, and perform other processing. The memory may be RAM (Random Access Memory), Flash memory, or the like, and can be used to store received data, data needed during processing, and data generated during processing, such as the timbre reference audio frame, the source audio frame, and the voice-changed audio frame. The audio output component may be a speaker, an earphone, or the like. The audio input component may be a microphone or the like. The terminal may further include an input component, a screen, and the like. The input component may be a mouse, touch screen, touch pad, keyboard, or the like, and can generate corresponding instructions based on user operations. The screen, which may or may not be touch-sensitive, can be used to display the operation interface of an application and the like.
As shown in Fig. 1, the process flow of the method may include the following steps:
In step 101, pitch-shift processing is performed on the source audio to obtain the target audio.
The source audio may be the user's voice audio.
In implementation, the user may install an application for karaoke and audio processing on the terminal. When the user wants to sing karaoke and apply voice changing to the recording, the user can tap the shortcut icon to run the application and select the karaoke function in its main interface. The application then displays a song selection interface (as shown in Fig. 6), which shows an audio list containing options for multiple song audios. The user can browse the list and tap the option of the song he or she wants to sing. After the selection, the application enters the karaoke interface (as shown in Fig. 7). The terminal plays the accompaniment of the song audio, and the lyrics can be displayed on the screen. Meanwhile, the terminal starts the audio input component (such as a microphone) to record audio; the user sings along with the accompaniment, and the terminal takes the recorded voice audio as the source audio.
After the voice audio is recorded, the application can display a voice-change category interface (as shown in Fig. 8), which shows a sound type list containing multiple sound type options. The user can browse the list and tap the sound type he or she wants. For example, if the user wants to change the voice to a child's voice, the user taps the child's-voice option. The terminal then performs pitch-shift processing on the user's source audio toward a child's voice, obtaining the target audio.
Optionally, the pitch-shift processing may use the SoundTouch algorithm (an audio pitch-shifting algorithm), a frequency-domain pitch-shift method, or a parametric pitch-shift method. Taking the SoundTouch algorithm as an example, its pitch-shifting principle is as follows: in the source audio, at intervals of a first preset number of source audio frames, a second preset number of source audio frames are selected and copied, and the copies are inserted after the selected source audio frames, yielding a slowed-down audio corresponding to the source audio; the slowed-down audio is then resampled to obtain the target audio, whose frame count and duration are the same as those of the source audio.
The specific values of the first preset number and the second preset number may be fixed, or may be determined from the fundamental frequency of the source audio and the fundamental frequency corresponding to the sound type selected by the user.
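The relation between the two preset numbers and the resulting pitch shift can be illustrated with a small sketch. The patent gives no formula, so the derivation below is an assumption for illustration: inserting `second` copied frames after every `first` frames lengthens the audio by a factor of (first + second) / first, and resampling back to the original duration raises the pitch by the same factor.

```python
from fractions import Fraction

def preset_numbers(pitch_ratio, max_interval=64):
    """Pick (first, second) preset numbers so that
    (first + second) / first approximates the desired pitch ratio (> 1)."""
    frac = Fraction(pitch_ratio - 1).limit_denominator(max_interval)
    return frac.denominator, frac.numerator

# Raising the pitch by 25% means inserting 1 copied frame every 4 frames.
first, second = preset_numbers(1.25)
```

Note that this sketch only covers raising the pitch; lowering it would instead require dropping frames before resampling, which the simple insertion scheme does not express.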
In step 102, a timbre reference audio frame is obtained from the target audio, and spectral envelope feature information of the timbre reference audio frame is extracted.
The spectral envelope feature information of the timbre reference audio frame describes the shape of the spectrum curve and can characterize the timbre.
In implementation, after the target audio is generated, its audio frames can be obtained one by one in playback order, starting from the first audio frame; each obtained frame serves as a timbre reference audio frame. For each obtained timbre reference audio frame, its fundamental frequency information is extracted first, and then, combined with that fundamental frequency information, its spectral envelope feature information is extracted.
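The patent does not name the envelope estimator. One textbook approach, shown here only as an illustrative stand-in (production vocoders such as WORLD use more refined F0-adaptive estimators), is cepstral smoothing: transform the log-magnitude spectrum to the quefrency domain, keep only the low-quefrency coefficients, and transform back. The frame size and lifter length below are arbitrary demo choices.

```python
import cmath
import math

def dft(x):
    """Naive DFT (O(n^2)), adequate for a short demo frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_envelope(frame, lifter=12):
    """Log-magnitude spectrum smoothed by low-quefrency liftering."""
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(frame)]
    cepstrum = idft(log_mag)
    n = len(cepstrum)
    # keep only the slowly varying part of the log spectrum
    kept = [cepstrum[q] if q < lifter or q >= n - lifter else 0 for q in range(n)]
    return [c.real for c in dft(kept)]

# envelope of a 128-sample frame of a 500 Hz tone at 8 kHz (spectral peak at bin 8)
sr, n = 8000, 128
frame = [math.sin(2 * math.pi * 500 * t / sr) for t in range(n)]
env = spectral_envelope(frame)
```

The liftered result is a smooth curve over frequency: it follows the broad shape of the spectrum (the timbre) while discarding the fine harmonic ripple tied to the pitch.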
In step 103, fundamental frequency information of the source audio frame in the source audio whose playback time is the same as that of the timbre reference audio frame is extracted.
The fundamental frequency information may be the peak frequency of the audio frame's spectrum.
In implementation, while the terminal extracts the spectral envelope feature information from the timbre reference audio frame, it can obtain from the source audio the source audio frame whose playback time is the same as that of the timbre reference audio frame. From the above process of generating the target audio from the source audio, it can be seen that the source audio and the target audio have the same frame count and duration. So, while the first audio frame in the target audio is obtained as a timbre reference audio frame, the first source audio frame can be obtained from the source audio; while the second audio frame in the target audio is obtained as a timbre reference audio frame, the second source audio frame can be obtained from the source audio; and so on, so that each obtained source audio frame and timbre reference audio frame have the same playback time. After a source audio frame is obtained, its fundamental frequency information can be extracted.
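The patent does not specify the F0 extractor either (the WORLD tool mentioned below ships dedicated estimators such as DIO and Harvest). A minimal autocorrelation-based sketch, under assumed search limits of 60-400 Hz, illustrates the idea:

```python
import math

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Return the frequency whose lag maximizes the autocorrelation
    within the plausible pitch-period range."""
    lo = int(sr / fmax)                       # shortest period considered
    hi = min(int(sr / fmin), len(frame) - 1)  # longest period considered
    best_lag, best_corr = lo, float("-inf")
    for lag in range(lo, hi + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sr / best_lag

# a 100 Hz sine should produce an estimate very close to 100 Hz
sr = 8000
frame = [math.sin(2 * math.pi * 100 * n / sr) for n in range(800)]
f0 = estimate_f0(frame, sr)
```

A periodic frame correlates strongly with itself shifted by one full period, so the winning lag is the pitch period and its reciprocal (times the sample rate) is the fundamental frequency.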
Optionally, to make the voice-changing effect more natural and realistic, consonant information in the source audio can also be extracted.
In implementation, based on the above process of generating the target audio from the source audio, after a source audio frame is obtained, both its fundamental frequency information and its consonant information can be extracted.
In step 104, a voice-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information and the spectral envelope feature information.
In implementation, the terminal can call the WORLD tool (a tool that can generate audio) to generate a new voice audio. The fundamental frequency information of the source audio and the spectral envelope feature information of the target audio are input into the WORLD tool, which then generates the new voice audio. Since this voice audio has the fundamental frequency information of the source audio and the spectral envelope feature information of the target audio, its pitch is consistent with the source audio while its timbre is consistent with the target audio. The overall effect is that, compared with the originally recorded voice audio, the voice-changed audio has a changed timbre but an unchanged pitch.
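The WORLD vocoder performs considerably more sophisticated synthesis than can be shown here (in Python it is typically driven through bindings such as pyworld). The bare-bones harmonic sketch below only illustrates the combination step itself: harmonics of the source F0 are weighted by the target spectral envelope. The envelope is a made-up linear-amplitude curve for the demo, not a WORLD spectrogram.

```python
import math

def synthesize_frame(f0, envelope, sr, n):
    """Sum harmonics of the source F0, each weighted by the target
    envelope sampled at the harmonic's frequency (linear interpolation)."""
    nyquist = sr / 2.0
    n_harm = int(nyquist // f0)
    out = []
    for t in range(n):
        s = 0.0
        for h in range(1, n_harm + 1):
            freq = h * f0
            pos = freq / nyquist * (len(envelope) - 1)
            lo = int(pos)
            hi = min(lo + 1, len(envelope) - 1)
            amp = envelope[lo] + (envelope[hi] - envelope[lo]) * (pos - lo)
            s += amp * math.sin(2 * math.pi * freq * t / sr)
        out.append(s)
    return out

# source pitch 100 Hz; a gently falling made-up target envelope (64 bins)
sr, n = 8000, 160
envelope = [1.0 / (1 + k) for k in range(64)]
voiced = synthesize_frame(100.0, envelope, sr, n)
```

The output frame repeats at the source pitch while the relative strength of its harmonics follows the target envelope, which is exactly the pitch-from-source, timbre-from-target split the step describes.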
Optionally, when the consonant information of the source audio frame has been extracted as described above, the processing of step 104 is as follows: a voice-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
In implementation, the terminal can call the WORLD tool to generate the new voice audio. The fundamental frequency information of the source audio, the consonant information of the source audio, and the spectral envelope feature information of the target audio are input into the WORLD tool, which then generates the new voice audio. This voice audio has the fundamental frequency information of the source audio and the spectral envelope feature information of the target audio, so its pitch is consistent with the source audio and its timbre is consistent with the target audio. Since this voice audio also carries the consonant information of the source audio, it sounds more natural than a voice audio generated based only on the fundamental frequency information and the spectral envelope feature information.
Finally, after the voice change is completed, the user can tap the audition button, and the terminal plays the voice-changed audio together with the accompaniment audio. If the user is satisfied with the result, the user can tap the save button; the voice-changed audio is synthesized with the accompaniment audio into a new song audio and stored in a local file. If the user is dissatisfied with the result, the user can tap the re-record button to record again and repeat the voice-changing operation. The user can also tap the publish button and choose to upload the new song audio to the Internet.
As shown in Fig. 2, the process flow of the method may include the following steps:
In step 201, a local audio list is displayed. When a selection instruction for an option of the target audio in the local audio list is received, the target audio is obtained.
The target audio may be a voice audio stored in advance on the terminal.
In implementation, the user may install an application for karaoke and audio processing on the terminal. When the user wants to sing karaoke and apply voice changing to the recording, the user can tap the shortcut icon to run the application and select the karaoke function in its main interface. The application then displays a song selection interface (as shown in Fig. 6), which shows an audio list containing options for multiple song audios. The user can browse the list and tap the option of the song he or she wants to sing. After the selection, the application enters the karaoke interface (as shown in Fig. 7). The terminal plays the accompaniment of the song audio, and the lyrics can be displayed on the screen. The terminal takes the voice audio in this song audio as the target audio. Meanwhile, the terminal starts the audio input component (such as a microphone) to record audio; the user sings along with the accompaniment, and the terminal takes the recorded voice audio as the source audio.
In step 202, a timbre reference audio frame is obtained from the target audio, and spectral envelope feature information of the timbre reference audio frame is extracted.
See step 102 for the specific implementation process.
In step 203, fundamental frequency information of the source audio frame in the source audio whose playback time is the same as that of the timbre reference audio frame is extracted.
See step 103 for the specific implementation process.
In step 204, a voice-changed audio frame corresponding to the source audio frame is generated based on the fundamental frequency information and the spectral envelope feature information.
See step 104 for the specific implementation process.
Based on the same technical idea, an embodiment of the present invention further provides an apparatus for audio processing, which can be the terminal in the above embodiments. As shown in Fig. 3, the apparatus includes:
an obtaining module 301, configured to obtain a timbre reference audio frame from a target audio;
an extraction module 302, configured to extract spectral envelope feature information of the timbre reference audio frame and to extract fundamental frequency information of the source audio frame in a source audio whose playback time is the same as that of the timbre reference audio frame; and
a generation module 303, configured to generate, based on the fundamental frequency information and the spectral envelope feature information, a voice-changed audio frame corresponding to the source audio frame.
Optionally, the extraction module 302 is further configured to extract consonant information of the source audio frame; and
the generation module 303 is further configured to generate the voice-changed audio frame corresponding to the source audio frame based on the fundamental frequency information, the spectral envelope feature information, and the consonant information.
Optionally, the apparatus further includes:
a pitch-shift module 304, configured to perform pitch-shift processing on the source audio to obtain the target audio.
Optionally, the pitch-shift module 304 is configured to: in the source audio, at intervals of a first preset number of source audio frames, select a second preset number of source audio frames, copy the second preset number of source audio frames, and insert the copied source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio; and resample the slowed-down audio to obtain the target audio, whose frame count and duration are the same as those of the source audio.
Optionally, the apparatus further includes:
a display module 305, configured to display a local audio list; and
an obtaining module 306, further configured to obtain the target audio when a selection instruction for an option of the target audio in the local audio list is received.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method and is not elaborated here.
It should be noted that, when the apparatus for audio processing provided in the above embodiments processes audio, the division into the above functional modules is merely an example. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for audio processing provided in the above embodiments and the method embodiments of audio processing belong to the same concept; see the method embodiments for the specific implementation process, which is not repeated here.
Fig. 4 is a structural block diagram of a terminal provided by an embodiment of the present invention. The terminal 400 may be a portable mobile terminal such as a smart phone or a tablet computer. The terminal 400 may also be called user equipment, a portable terminal, or other names. Generally, the terminal 400 includes a processor 401 and a memory 402.
The processor 401 may include one or more processing cores, for example, a 4-core processor. The processor 401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to show. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 402 may include one or more computer-readable storage media, which may be tangible and non-transitory. The memory 402 may also include high-speed random access memory and nonvolatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 402 is used to store at least one instruction, which is executed by the processor 401 to implement the audio processing method provided in the present application.
In some embodiments, the terminal 400 optionally further includes a peripheral device interface 403 and at least one peripheral device. Specifically, the peripheral device includes at least one of a radio frequency circuit 404, a touch display screen 405, a camera assembly 406, an audio circuit 407, a positioning component 408, and a power supply 409.
The peripheral device interface 403 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 401 and the memory 402. In some embodiments, the processor 401, the memory 402, and the peripheral device interface 403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 401, the memory 402, and the peripheral device interface 403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 404 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 404 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 404 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 404 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 404 may also include NFC (Near Field Communication)-related circuits, which is not limited in the present application.
The touch display screen 405 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. The touch display screen 405 also has the ability to collect touch signals on or above its surface. A touch signal may be input to the processor 401 as a control signal for processing. The touch display screen 405 is used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one touch display screen 405, disposed on the front panel of the terminal 400; in other embodiments, there may be at least two touch display screens 405, respectively disposed on different surfaces of the terminal 400 or in a folded design; in still other embodiments, the touch display screen 405 may be a flexible display screen disposed on a curved or folded surface of the terminal 400. The touch display screen 405 may even be set to a non-rectangular irregular shape, that is, a special-shaped screen. The touch display screen 405 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 406 is used to capture images or video. Optionally, the camera assembly 406 includes a front camera and a rear camera. Generally, the front camera is used for video calls or selfies, and the rear camera is used for taking photos or video. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, or a wide-angle camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions. In some embodiments, the camera assembly 406 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuit 407 is used to provide an audio interface between the user and the terminal 400. The audio circuit 407 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 401 for processing, or input them to the radio frequency circuit 404 to realize voice communication. For purposes of stereo collection or noise reduction, there may be multiple microphones, respectively disposed at different parts of the terminal 400. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is used to convert an electrical signal from the processor 401 or the radio frequency circuit 404 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 407 may also include a headphone jack.
The positioning component 408 is used to locate the current geographic position of the terminal 400 to implement navigation or LBS (Location Based Service). The positioning component 408 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 409 is used to supply power to the various components in the terminal 400. The power supply 409 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 409 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 400 further includes one or more sensors 410. The one or more sensors 410 include, but are not limited to: an acceleration sensor 411, a gyroscope sensor 412, a pressure sensor 413, a fingerprint sensor 414, an optical sensor 415, and a proximity sensor 416.
The acceleration sensor 411 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 400. For example, the acceleration sensor 411 may be used to detect the components of gravitational acceleration on the three coordinate axes. Based on the gravitational acceleration signal collected by the acceleration sensor 411, the processor 401 may control the touch display screen 405 to display the user interface in a landscape view or a portrait view. The acceleration sensor 411 may also be used to collect motion data of a game or of the user.
The gyroscope sensor 412 can detect the body orientation and rotation angle of the terminal 400, and can cooperate with the acceleration sensor 411 to collect the user's 3D actions on the terminal 400. Based on the data collected by the gyroscope sensor 412, the processor 401 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 413 may be disposed on a side frame of the terminal 400 and/or at a lower layer of the touch display screen 405. When the pressure sensor 413 is disposed on the side frame of the terminal 400, it can detect the user's grip signal on the terminal 400, and left/right-hand recognition or shortcut operations can be performed according to the grip signal. When the pressure sensor 413 is disposed at the lower layer of the touch display screen 405, the operable controls on the UI can be controlled according to the user's pressure operation on the touch display screen 405. The operable controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 414 is used to collect the user's fingerprint and identify the user according to the collected fingerprint. When the identified identity is trusted, the processor 401 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 414 may be disposed on the front, back, or side of the terminal 400. When a physical button or a manufacturer logo is provided on the terminal 400, the fingerprint sensor 414 may be integrated with the physical button or the manufacturer logo.
The optical sensor 415 is used to collect the ambient light intensity. In one embodiment, the processor 401 may control the display brightness of the touch display screen 405 according to the ambient light intensity collected by the optical sensor 415. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 405 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 405 is decreased. In another embodiment, the processor 401 may also dynamically adjust the shooting parameters of the camera assembly 406 according to the ambient light intensity collected by the optical sensor 415.
The proximity sensor 416, also called a distance sensor, is generally disposed on the front of the terminal 400. The proximity sensor 416 is used to collect the distance between the user and the front of the terminal 400. In one embodiment, when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 gradually decreases, the processor 401 controls the touch display screen 405 to switch from a screen-on state to a screen-off state; when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 gradually increases, the processor 401 controls the touch display screen 405 to switch from the screen-off state to the screen-on state.
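The approach/withdraw behavior just described amounts to a small state machine; a hypothetical sketch follows (the compare-to-previous-reading rule and the distance values are assumptions for illustration — real drivers use thresholds and debouncing):

```python
def screen_states(distances, state="on"):
    """Map a series of proximity readings to screen states:
    a decreasing distance turns the screen off, an increasing one turns it on."""
    states = []
    prev = distances[0]
    for d in distances[1:]:
        if d < prev:
            state = "off"   # user approaching the front of the terminal
        elif d > prev:
            state = "on"    # user moving away
        prev = d
        states.append(state)
    return states
```

For example, the reading sequence `[10, 8, 6, 6, 9]` yields `["off", "off", "off", "on"]`: the screen goes dark while the distance shrinks and lights up again once it grows.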
Those skilled in the art will understand that the structure shown in Fig. 4 does not constitute a limitation on the terminal 400, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a computer-readable storage medium is also provided. At least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the audio processing method in the above embodiments. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 5 is a structural schematic diagram of a computer device provided by an embodiment of the present invention. The computer device 500 may vary greatly depending on its configuration or performance, and may include one or more processors (central processing units, CPU) 501 and one or more memories 502, where at least one instruction is stored in the memory 502, and the at least one instruction is loaded and executed by the processor 501 to implement the above audio processing method.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (12)
1. A method of audio processing, characterized in that the method comprises:
obtaining a timbre reference audio frame in a target audio, and extracting spectrum envelope characteristic information of the timbre reference audio frame;
extracting fundamental frequency information of a source audio frame in a source audio whose playing time is identical to that of the timbre reference audio frame;
generating, based on the fundamental frequency information and the spectrum envelope characteristic information, a voice-changed audio frame corresponding to the source audio frame.
2. The method according to claim 1, characterized in that the method further comprises:
extracting consonant information of the source audio frame;
wherein the generating, based on the fundamental frequency information and the spectrum envelope characteristic information, a voice-changed audio frame corresponding to the source audio frame comprises:
generating, based on the fundamental frequency information, the spectrum envelope characteristic information, and the consonant information, the voice-changed audio frame corresponding to the source audio frame.
3. The method according to claim 1, characterized in that, before the obtaining a timbre reference audio frame in a target audio, the method further comprises:
performing pitch modification processing on the source audio to obtain the target audio.
4. The method according to claim 3, characterized in that the performing pitch modification processing on the source audio to obtain the target audio comprises:
in the source audio, at intervals of a first preset number of source audio frames, selecting a second preset number of source audio frames, duplicating the second preset number of source audio frames, and inserting the duplicated source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio;
resampling the slowed-down audio to obtain the target audio, which has the same frame count and the same duration as the source audio.
5. The method according to claim 1, characterized in that, before the obtaining a timbre reference audio frame in a target audio, the method further comprises:
displaying a local audio list;
obtaining the target audio when an instruction selecting the option of the target audio in the local audio list is received.
6. A device of audio processing, characterized in that the device comprises:
an obtaining module, configured to obtain a timbre reference audio frame in a target audio;
an extraction module, configured to extract spectrum envelope characteristic information of the timbre reference audio frame, and to extract fundamental frequency information of a source audio frame in a source audio whose playing time is identical to that of the timbre reference audio frame;
a generation module, configured to generate, based on the fundamental frequency information and the spectrum envelope characteristic information, a voice-changed audio frame corresponding to the source audio frame.
7. The device according to claim 6, characterized in that the extraction module is further configured to extract consonant information of the source audio frame;
the generation module is configured to generate, based on the fundamental frequency information, the spectrum envelope characteristic information, and the consonant information, the voice-changed audio frame corresponding to the source audio frame.
8. The device according to claim 6, characterized in that the device further comprises:
a pitch modification module, configured to perform pitch modification processing on the source audio to obtain the target audio.
9. The device according to claim 8, characterized in that the pitch modification module is configured to:
in the source audio, at intervals of a first preset number of source audio frames, select a second preset number of source audio frames, duplicate the second preset number of source audio frames, and insert the duplicated source audio frames after the selected source audio frames, to obtain a slowed-down audio corresponding to the source audio;
resample the slowed-down audio to obtain the target audio, which has the same frame count and the same duration as the source audio.
10. The device according to claim 6, characterized in that the device further comprises: a display module, configured to display a local audio list;
the obtaining module is further configured to obtain the target audio when an instruction selecting the option of the target audio in the local audio list is received.
11. A terminal, characterized in that the terminal comprises a processor and a memory, at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor to implement the method of audio processing according to any one of claims 1 to 5.
12. A computer-readable storage medium, characterized in that at least one instruction is stored in the storage medium, and the at least one instruction is loaded and executed by a processor to implement the method of audio processing according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811066716.6A CN109192218B (en) | 2018-09-13 | 2018-09-13 | Method and apparatus for audio processing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811066716.6A CN109192218B (en) | 2018-09-13 | 2018-09-13 | Method and apparatus for audio processing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109192218A true CN109192218A (en) | 2019-01-11 |
CN109192218B CN109192218B (en) | 2021-05-07 |
Family
ID=64910608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811066716.6A Active CN109192218B (en) | 2018-09-13 | 2018-09-13 | Method and apparatus for audio processing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109192218B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1719514A (en) * | 2004-07-06 | 2006-01-11 | 中国科学院自动化研究所 | High-quality real-time voice changing method based on speech analysis and synthesis
JP5772739B2 (en) * | 2012-06-21 | 2015-09-02 | ヤマハ株式会社 | Audio processing device |
WO2015166981A1 (en) * | 2014-04-30 | 2015-11-05 | ヤマハ株式会社 | Pitch information generation device, pitch information generation method, program, and computer-readable recording medium |
CN106971703A (en) * | 2017-03-17 | 2017-07-21 | 西北师范大学 | A kind of song synthetic method and device based on HMM |
CN107731241A (en) * | 2017-09-29 | 2018-02-23 | 广州酷狗计算机科技有限公司 | Handle the method, apparatus and storage medium of audio signal |
CN107863095A (en) * | 2017-11-21 | 2018-03-30 | 广州酷狗计算机科技有限公司 | Acoustic signal processing method, device and storage medium |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097890A (en) * | 2019-04-16 | 2019-08-06 | 北京搜狗科技发展有限公司 | A kind of method of speech processing, device and the device for speech processes |
CN110097890B (en) * | 2019-04-16 | 2021-11-02 | 北京搜狗科技发展有限公司 | Voice processing method and device for voice processing |
CN110459196A (en) * | 2019-09-05 | 2019-11-15 | 长沙市回音科技有限公司 | A kind of method, apparatus and system adjusting singing songs difficulty |
CN111435591A (en) * | 2020-01-17 | 2020-07-21 | 珠海市杰理科技股份有限公司 | Sound synthesis method and system, audio processing chip and electronic equipment |
CN111782865A (en) * | 2020-06-23 | 2020-10-16 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio information processing method and device and storage medium |
CN115461809A (en) * | 2020-09-04 | 2022-12-09 | 罗兰株式会社 | Information processing apparatus and information processing method |
US11922913B2 (en) | 2020-09-04 | 2024-03-05 | Roland Corporation | Information processing device, information processing method, and non-transitory computer readable recording medium |
CN112259072A (en) * | 2020-09-25 | 2021-01-22 | 北京百度网讯科技有限公司 | Voice conversion method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109192218B (en) | 2021-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109192218A (en) | The method and apparatus of audio processing | |
CN109379643A (en) | Image synthesizing method, device, terminal and storage medium | |
CN109729297A (en) | The method and apparatus of special efficacy are added in video | |
CN108008930A (en) | The method and apparatus for determining K song score values | |
CN108595239A (en) | image processing method, device, terminal and computer readable storage medium | |
CN110688082B (en) | Method, device, equipment and storage medium for determining adjustment proportion information of volume | |
CN108965757A (en) | video recording method, device, terminal and storage medium | |
CN108922506A (en) | Song audio generation method, device and computer readable storage medium | |
CN109147757A (en) | Song synthetic method and device | |
CN108538302A (en) | The method and apparatus of Composite tone | |
CN110491358A (en) | Carry out method, apparatus, equipment, system and the storage medium of audio recording | |
CN109300485A (en) | Methods of marking, device, electronic equipment and the computer storage medium of audio signal | |
CN109168073A (en) | The method and apparatus that direct broadcasting room cover is shown | |
CN108965922A (en) | Video cover generation method, device and storage medium | |
CN109346111A (en) | Data processing method, device, terminal and storage medium | |
CN108320756A (en) | It is a kind of detection audio whether be absolute music audio method and apparatus | |
CN109547843A (en) | The method and apparatus that audio-video is handled | |
CN109003621A (en) | A kind of audio-frequency processing method, device and storage medium | |
CN110956971A (en) | Audio processing method, device, terminal and storage medium | |
CN107871012A (en) | Audio-frequency processing method, device, storage medium and terminal | |
CN108922562A (en) | Sing evaluation result display methods and device | |
CN109065068A (en) | Audio-frequency processing method, device and storage medium | |
CN110189771A (en) | With the sound quality detection method, device and storage medium of source audio | |
CN109192223A (en) | The method and apparatus of audio alignment | |
CN109243479A (en) | Acoustic signal processing method, device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||