CN109391842A - Dubbing method and mobile terminal - Google Patents
- Publication number
- CN109391842A CN109391842A CN201811368673.7A CN201811368673A CN109391842A CN 109391842 A CN109391842 A CN 109391842A CN 201811368673 A CN201811368673 A CN 201811368673A CN 109391842 A CN109391842 A CN 109391842A
- Authority
- CN
- China
- Prior art keywords
- feature
- main object
- video data
- area
- facial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
Abstract
The present invention provides a dubbing method, a mobile terminal and a computer-readable storage medium, relating to the technical field of image processing. The method comprises: receiving video data to be dubbed; determining the characteristic information of each frame image in the video data; and dubbing the video data according to the characteristic information. Embodiments of the present invention can dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a dubbing method and a mobile terminal.
Background art
Currently, short videos, GIF (Graphics Interchange Format) animations and animated emoticons are essential elements of online social interaction and entertainment. People strengthen their relationships by sharing interesting short videos, and substitute animated emoticons or GIF images for text they would otherwise write themselves.
Sometimes a short video lacks a dub, or the user wants to give it a different dubbing effect. In particular, more and more people now keep pets, filming one's pet has become popular, and adding dubbing effects to pet videos is a real pleasure. Moreover, adding a dubbing effect to an animation or emoticon pack can better convey its expression and also increase its overall appeal.
In the prior art, dubbing a video requires professional video-processing and audio-processing software, with the user personally recording, adjusting and synthesizing the dub. For a user without professional skills, the dubbing process is very complicated.
Summary of the invention
The present invention provides a dubbing method to solve the problem that the dubbing process is very complicated for users.
In a first aspect, an embodiment of the present invention provides a dubbing method applied to a mobile terminal, the method comprising:
receiving video data to be dubbed;
determining the characteristic information of each frame image in the video data;
dubbing the video data according to the characteristic information.
In a second aspect, an embodiment of the present invention provides a mobile terminal, comprising:
a receiving module, configured to receive video data to be dubbed;
a determining module, configured to determine the characteristic information of each frame image in the video data;
a dubbing module, configured to dub the video data according to the characteristic information.
In a third aspect, a mobile terminal is provided, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the dubbing method of the present invention.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the dubbing method of the present invention.
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows a flow chart of a dubbing method according to Embodiment 1 of the present invention;
Fig. 2 shows a flow chart of a dubbing method according to Embodiment 2 of the present invention;
Fig. 3 shows a structural block diagram of a mobile terminal according to Embodiment 3 of the present invention;
Fig. 4 shows a structural block diagram of a mobile terminal according to Embodiment 3 of the present invention;
Fig. 5 shows a structural block diagram of a mobile terminal according to Embodiment 4 of the present invention.
Specific embodiment
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present invention will be thoroughly understood and its scope fully conveyed to those skilled in the art.
Embodiment one
Referring to Fig. 1, a flow chart of the dubbing method of Embodiment 1 of the present invention is shown. The method may specifically include the following steps:
Step 101: receive video data to be dubbed.
In embodiments of the present invention, the video data includes a short video, a GIF animation, an animated emoticon, or the like, where a short video is a video with a limited number of frames, such as a 10-second video recorded in WeChat.
In embodiments of the present invention, the video data may also be video in other forms, which is not limited here.
Step 102: determine the characteristic information of each frame image in the video data.
In embodiments of the present invention, the characteristic information of each frame image includes: the main object in the frame image, the form of the main object, the background in which the main object is located, and so on.
The main object may be a person, an animal or a cartoon character, where persons can be divided into the elderly, youths and children, animals can be classified by species, and cartoons by cartoon character. The form of the main object covers its states and actions, such as standing, climbing on an object, eating, or speaking. The background in which the main object is located includes its surrounding environment, the weather, and so on.
For example, the main object of each frame image in the video data is identified as a little boy whose form is eating at home; the characteristic information of these frame images, joined together, is then "a little boy eating at home".
Step 103: dub the video data according to the characteristic information.
In embodiments of the present invention, dubbing information corresponding to various kinds of characteristic information can be stored in advance, where the dubbing information includes text and sound information. For example, when the characteristic information of the video data is determined to be "a little boy eating at home", the corresponding dubbing information can be called to dub this video data. The correspondence between dubbing information and characteristic information is established in advance, so that when a piece of characteristic information is recognized, the corresponding dubbing information can be called directly to dub the video data.
For example, when the characteristic information is "a little boy eating", the dubbing information "mmm, smells so good" can be called directly for the dub, where the dubbing information may be a pre-recorded voice of a little boy or a computer-synthesized one. When the characteristic information is "a dog chasing a car", the dubbing information "where are you going" can be called directly, where the dubbing information may likewise be pre-recorded or computer-synthesized. Such dubbing can make the video data more entertaining.
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Embodiment two
Referring to Fig. 2, a flow chart of the dubbing method of Embodiment 2 of the present invention is shown. The method may specifically include the following steps:
Step 201: receive video data to be dubbed.
See step 101; details are not repeated here.
Step 202: determine the main object and the background information of each frame image in the video data.
In embodiments of the present invention, the characteristic information includes a body feature and a background feature.
In embodiments of the present invention, the main object may be a person, an animal, a cartoon, etc., or a specific creature or object, for example a little boy, an old lady, a young schoolgirl, a dog, a cat, Sailor Moon, Doraemon, or a desk or stool drawn with an expression.
In embodiments of the present invention, the background feature describes the environment in which the main object is located, such as the woods, home, school, or a road.
In embodiments of the present invention, the region where the main object is located can be extracted using image matting, and the main object is recognized from that region; the background information is recognized from the remaining regions after the main object's region is removed. This improves recognition accuracy.
Step 203: determine the body feature of the main object.
In embodiments of the present invention, the body feature includes at least one of a global feature, a facial feature and a mouth feature of the main object.
In embodiments of the present invention, the global feature refers to the overall form of the main object. For example, when the main object is a person, the global feature includes whether the person is standing, sitting, lying down, running, dancing or swimming; specifically, the global feature is the state shown by the movement of the person's whole body. The facial feature refers to the facial expression of the main object, such as surprise, fear, disgust, anger, happiness or sadness. The mouth feature refers to the mouth shape: a person has many mouth shapes, from which the content the person intends to express can be determined; an animal has only a few specific mouth shapes, such as wide open, slightly open or closed; a cartoon character's mouth shapes can be as varied as a person's.
In embodiments of the present invention, when the body feature includes the global feature, step 203 includes:
Sub-step 2031: extract the body region in each frame image in the video data.
In embodiments of the present invention, the body region in each frame image can be extracted using image matting. When the main object shows only its back or side in the image, i.e. neither the face nor the mouth is present, only the body region is extracted.
Sub-step 2032: determine the global feature of the main object according to the body region.
In embodiments of the present invention, the body region is recognized; when the main object in the body region is a person or an animal, the global feature can be recognized by recognizing the limb movements of the main object.
Sub-step 2032 is specifically: if the body region has been extracted, input the body region into a first recognition model to obtain the global feature of the main object.
In embodiments of the present invention, the first recognition model can be trained in advance on first training samples, where a first training sample is an image corresponding to a body region, and the global feature is a description of the main object. When the first recognition model is used, the body region is input into it to obtain the global feature of the main object.
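Sub-steps 2031–2032 reduce to "extract a region, feed it to a classifier, map the class to a label". A sketch with an assumed label set and a stub standing in for the trained first recognition model:

```python
from typing import Callable, List

# illustrative label set for the global feature; the patent does not fix one
GLOBAL_FEATURES = ["standing", "sitting", "running", "eating", "speaking"]

def recognize_global_feature(body_region: List[List[int]],
                             model: Callable[[List[List[int]]], int]) -> str:
    """Input the matted body region into the first recognition model and
    map the predicted class index to a global-feature label."""
    class_index = model(body_region)
    return GLOBAL_FEATURES[class_index]

# stub classifier standing in for the trained model
stub_model = lambda region: 3
print(recognize_global_feature([[0, 1], [1, 0]], stub_model))  # eating
```

The second, third and fourth recognition models described below follow the same pattern over the facial, mouth and background regions.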
In embodiments of the present invention, when the body feature includes the global feature and the facial feature, step 203 includes:
Sub-step 2033: extract the body region in each frame image in the video data and the facial region in the body region.
In embodiments of the present invention, the body region in each frame image can be extracted using image matting; when the facial feature needs to be determined, the facial region in the body region is extracted.
When the main object is a bird or another animal without distinct mouth shapes, the body region and the facial region of each frame image are extracted.
Sub-step 2034: determine the global feature of the main object according to the body region.
Sub-step 2035: determine the facial feature of the main object according to the facial region.
In embodiments of the present invention, sub-step 2035 is specifically: input the facial region into a second recognition model to obtain the facial feature of the main object.
In embodiments of the present invention, the second recognition model can be trained in advance on second training samples, where a second training sample is an image corresponding to a facial region, and the facial feature is a description of the main object's face. When the second recognition model is used, the facial region is input into it to obtain the facial feature of the main object.
In embodiments of the present invention, when the body feature includes the global feature, the facial feature and the mouth feature, step 203 includes:
Sub-step 2036: extract the body region in each frame image in the video data, the facial region in the body region, and the mouth region in the facial region.
In embodiments of the present invention, the body region in each frame image can be extracted using image matting, and the facial region in the body region is extracted when the facial feature needs to be determined. When the mouth region needs to be analyzed, it can be extracted directly from the body region, or extracted from the facial region, and then analyzed.
In embodiments of the present invention, when the main object is a person or an animal with distinct mouth shapes and the image shows the person's or animal's front, the body region, the facial region and the mouth region of each frame image are all extracted.
Sub-step 2037: determine the global feature of the main object according to the body region.
Sub-step 2038: determine the facial feature of the main object according to the facial region.
Sub-step 2039: determine the mouth feature of the main object according to the mouth region.
In embodiments of the present invention, sub-step 2039 is specifically: input the mouth region into a third recognition model to obtain the mouth feature of the main object.
In embodiments of the present invention, the third recognition model can be trained in advance on third training samples, where a third training sample is an image corresponding to a mouth region, and the mouth feature is a description of the main object's mouth. When the third recognition model is used, the mouth region is input into it to obtain the mouth feature of the main object.
In embodiments of the present invention, when the body feature includes the facial feature and the mouth feature, step 203 includes:
Sub-step 20310: extract the facial region in each frame image in the video data and the mouth region in the facial region.
In embodiments of the present invention, some images contain only a facial region and a mouth region and no limbs; in such cases, only the facial region and the mouth region are extracted.
Sub-step 20311: determine the facial feature of the main object according to the facial region.
Sub-step 20312: determine the mouth feature of the main object according to the mouth region.
In embodiments of the present invention, when an image contains only a facial region and a mouth region, only those two regions are extracted, and the corresponding features are determined from them.
Step 204: determine the background feature of the background information.
In embodiments of the present invention, step 204 includes: input the background information of each frame image into a fourth recognition model to obtain the background feature.
In embodiments of the present invention, the fourth recognition model can be trained in advance on fourth training samples, where a fourth training sample is an image corresponding to a background region, and the background feature is a description of the background. When the fourth recognition model is used, the background region is input into it to obtain the background feature.
Step 205: determine target sound information according to the body feature.
In embodiments of the present invention, corresponding target sound information can be stored for each body feature, where the sound information includes pitch, timbre and volume. For example, when the body feature is a person running, the facial feature is a pained expression and the mouth shape changes slowly, the pitch is determined to lie within a certain frequency range, the volume may be relatively low, and the timbre may be determined according to the character's traits.
In embodiments of the present invention, correspondences between different body features and sound information can be established in advance; when a body feature is determined, the corresponding target sound information is called directly.
Step 206: determine target text information according to the body feature and the background feature.
In embodiments of the present invention, corresponding target text information can be stored in advance for combinations of different body features and background features. After the body feature and the background feature are determined, the corresponding target text information is called directly.
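Steps 205 and 206 both resolve to pre-stored lookups: body feature → sound information, and (body feature, background feature) → text. A sketch whose table entries are invented examples, not values from the patent:

```python
# sound information stored per body feature (pitch range in Hz, timbre, volume)
SOUND_TABLE = {
    "running, pained expression": {"pitch_hz": (80, 120), "timbre": "boy", "volume": "low"},
    "eating": {"pitch_hz": (150, 250), "timbre": "boy", "volume": "medium"},
}

# target text stored per (body feature, background feature) combination
TEXT_TABLE = {
    ("eating", "at home"): "mmm, smells so good",
    ("running", "on the road"): "wait for me",
}

def pick_dub(body_feature: str, background_feature: str):
    """Call the stored target sound information and target text directly."""
    sound = SOUND_TABLE.get(body_feature)
    text = TEXT_TABLE.get((body_feature, background_feature))
    return sound, text

sound, text = pick_dub("eating", "at home")
print(text)  # mmm, smells so good
```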
Step 207: dub the video data according to the target sound information and the target text information.
In embodiments of the present invention, the target sound information and the target text information are combined to obtain the dubbing information, and the dubbing information is synthesized with the video data to complete the dub.
In embodiments of the present invention, step 207 includes:
Sub-step 2071: based on a generic sound library, obtain the generic sound data corresponding to the target text information, where the generic sound library includes the generic sound data corresponding to each character.
In embodiments of the present invention, the characters cover all the text that may be used in expression, and each character has its own fixed generic sound data, where the generic sound data includes timbre, pitch and volume; that is, each character has a fixed timbre, pitch and volume. In an example of the present invention, every character can be stored with the same timbre, pitch and volume, forming the generic sound library. In the generic sound library, each character thus has a standard volume, pitch and timbre.
Sub-step 2072: adjust the generic sound data according to the target sound information to obtain target sound data.
In embodiments of the present invention, once the target sound information is obtained, the generic sound data corresponding to the target text information is adjusted according to the target sound information, so that the volume, timbre and pitch of the target text information reach the effect of the target sound information.
Sub-step 2073: dub the video data based on the target sound data and the target text information.
In embodiments of the present invention, each character has many possible combinations of pitch, timbre and volume; storing every kind of sound information for every character would require a large database. Therefore, each character is stored only as generic sound data, and after the target sound information is determined, the generic sound data is adjusted directly, reducing the storage needed for characters and sound information.
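The storage-saving scheme of sub-steps 2071–2072 — one standard rendition per character, adjusted toward the target sound information on demand — can be sketched as follows. The standard values and the multiplicative adjustment are assumptions for illustration only:

```python
# generic sound library: one standard (pitch, volume) entry per character
GENERIC_SOUND_LIBRARY = {
    "好": {"pitch_hz": 220.0, "volume": 1.0},
    "香": {"pitch_hz": 220.0, "volume": 1.0},
}

def render_text(text: str, target_sound_info: dict) -> list:
    """Adjust each character's generic sound data toward the target sound
    information (simple multiplicative scaling here) to get target sound data."""
    target_sound_data = []
    for ch in text:
        generic = GENERIC_SOUND_LIBRARY[ch]
        target_sound_data.append({
            "char": ch,
            "pitch_hz": generic["pitch_hz"] * target_sound_info["pitch_scale"],
            "volume": generic["volume"] * target_sound_info["volume_scale"],
        })
    return target_sound_data

data = render_text("好香", {"pitch_scale": 1.5, "volume_scale": 0.5})
print(data[0]["pitch_hz"], data[1]["volume"])  # 330.0 0.5
```

Only one entry per character is stored, while any pitch/volume combination is produced by adjustment, which is exactly the storage saving the text describes.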
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Embodiment three
Referring to Fig. 3, a structural block diagram of a mobile terminal 300 according to Embodiment 3 of the present invention is shown, which may specifically include:
a receiving module 301, configured to receive video data to be dubbed;
a determining module 302, configured to determine the characteristic information of each frame image in the video data;
a dubbing module 303, configured to dub the video data according to the characteristic information.
Optionally, on the basis of Fig. 3 and referring to Fig. 4, the determining module 302 includes:
a first determining unit 3021, configured to determine the main object and the background information of each frame image in the video data, where the characteristic information includes a body feature and a background feature;
a second determining unit 3022, configured to determine the body feature of the main object;
a third determining unit 3023, configured to determine the background feature of the background information.
The dubbing module 303 includes:
a fourth determining unit 3031, configured to determine target sound information according to the body feature;
a fifth determining unit 3032, configured to determine target text information according to the body feature and the background feature;
a dubbing unit 3033, configured to dub the video data according to the target sound information and the target text information.
When the body feature includes the global feature, the second determining unit 3022 includes:
a first extracting subunit, configured to extract the body region in each frame image in the video data;
a first determining subunit, configured to determine the global feature of the main object according to the body region.
When the body feature includes the global feature and the facial feature, the second determining unit 3022 includes:
a second extracting subunit, configured to extract the body region in each frame image in the video data and the facial region in the body region;
a second determining subunit, configured to determine the global feature of the main object according to the body region;
a third determining subunit, configured to determine the facial feature of the main object according to the facial region.
When the body feature includes the global feature, the facial feature and the mouth feature, the second determining unit 3022 includes:
a third extracting subunit, configured to extract the body region in each frame image in the video data, the facial region in the body region, and the mouth region in the facial region;
a fourth determining subunit, configured to determine the global feature of the main object according to the body region;
a fifth determining subunit, configured to determine the facial feature of the main object according to the facial region;
a sixth determining subunit, configured to determine the mouth feature of the main object according to the mouth region.
When the body feature includes the facial feature and the mouth feature, the second determining unit 3022 includes:
a fourth extracting subunit, configured to extract the facial region in each frame image in the video data and the mouth region in the facial region;
a seventh determining subunit, configured to determine the facial feature of the main object according to the facial region;
an eighth determining subunit, configured to determine the mouth feature of the main object according to the mouth region.
The first determining subunit is specifically configured to input the body region into the first recognition model to obtain the global feature of the main object.
The second determining subunit is specifically configured to input the facial region into the second recognition model to obtain the facial feature of the main object.
The third determining subunit is specifically configured to input the mouth region into the third recognition model to obtain the mouth feature of the main object.
The third determining unit includes:
a fourth determining subunit, configured to input the background information of each frame image into the fourth recognition model to obtain the background feature.
The dubbing unit 3033 then includes:
an obtaining subunit, configured to obtain, based on the generic sound library, the generic sound data corresponding to the target text information, where the generic sound library includes the generic sound data corresponding to each character;
an adjusting subunit, configured to adjust the generic sound data according to the target sound information to obtain target sound data;
a dubbing subunit, configured to dub the video data based on the target sound data and the target text information.
The mobile terminal provided by the embodiments of the present invention can implement each process implemented by the mobile terminal in the method embodiments of Fig. 1 and Fig. 2; to avoid repetition, details are not repeated here.
In embodiments of the present invention, the mobile terminal receives video data to be dubbed, determines the characteristic information of each frame image in the video data, and dubs the video data according to that characteristic information. The mobile terminal of the embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Embodiment four
Fig. 5 is a schematic diagram of the hardware structure of a mobile terminal implementing each embodiment of the present invention.
The mobile terminal 500 includes but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, a power supply 511, and other components. Those skilled in the art will understand that the mobile terminal structure shown in Fig. 5 does not constitute a limitation on the mobile terminal; the mobile terminal may include more or fewer components than illustrated, combine certain components, or arrange components differently. In embodiments of the present invention, the mobile terminal includes but is not limited to a mobile phone, a tablet computer, a laptop, a palmtop computer, an in-vehicle terminal, a wearable device, a pedometer, and the like.
The processor 510 is configured to: receive video data to be dubbed; determine the characteristic information of each frame image in the video data; and dub the video data according to the characteristic information.
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
It should be understood that, in this embodiment of the present invention, the radio frequency unit 501 may be configured to send and receive signals during information transmission and reception or during a call. Specifically, after receiving downlink data from a base station, the radio frequency unit 501 delivers the data to the processor 510 for processing; in addition, it sends uplink data to the base station. Generally, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 may also communicate with a network and other devices through a wireless communication system.
The mobile terminal provides users with wireless broadband Internet access through the network module 502, for example, helping users send and receive e-mails, browse web pages, and access streaming media.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502, or stored in the memory 509, into an audio signal and output it as sound. Moreover, the audio output unit 503 may also provide audio output related to a specific function performed by the mobile terminal 500 (for example, a call signal reception sound or a message reception sound). The audio output unit 503 includes a loudspeaker, a buzzer, a receiver, and the like.
The input unit 504 is configured to receive audio or video signals. The input unit 504 may include a graphics processing unit (Graphics Processing Unit, GPU) 5041 and a microphone 5042. The graphics processing unit 5041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processing unit 5041 may be stored in the memory 509 (or another storage medium) or sent via the radio frequency unit 501 or the network module 502. The microphone 5042 can receive sound and process such sound into audio data. In a telephone call mode, the processed audio data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 501 for output.
The mobile terminal 500 further includes at least one sensor 505, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 5061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 5061 and/or the backlight when the mobile terminal 500 is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes) and can detect the magnitude and direction of gravity when stationary, which can be used to identify the posture of the mobile terminal (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). The sensor 505 may further include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described herein again.
The display unit 506 is configured to display information input by the user or information provided to the user. The display unit 506 may include a display panel 5061, which may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
The user input unit 507 may be configured to receive input numeric or character information and to generate key signal input related to user settings and function control of the mobile terminal. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. The touch panel 5071, also referred to as a touch screen, can collect touch operations by the user on or near it (for example, operations by the user on or near the touch panel 5071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 5071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 5071, the user input unit 507 may further include other input devices 5072. Specifically, the other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key and an on/off key), a trackball, a mouse, and a joystick, which are not described herein again.
Further, the touch panel 5071 may cover the display panel 5061. After the touch panel 5071 detects a touch operation on or near it, it transmits the operation to the processor 510 to determine the type of the touch event, and the processor 510 then provides corresponding visual output on the display panel 5061 according to the type of the touch event. Although in Fig. 5 the touch panel 5071 and the display panel 5061 implement the input and output functions of the mobile terminal as two independent components, in some embodiments the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the mobile terminal, which is not specifically limited herein.
The interface unit 508 is an interface through which an external device is connected to the mobile terminal 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be configured to receive input (for example, data information and electric power) from an external device and transmit the received input to one or more elements in the mobile terminal 500, or may be configured to transmit data between the mobile terminal 500 and an external device.
The memory 509 may be configured to store software programs and various data. The memory 509 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function and an image playback function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book), and the like. In addition, the memory 509 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 510 is the control center of the mobile terminal. It connects all parts of the entire mobile terminal through various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, thereby monitoring the mobile terminal as a whole. The processor 510 may include one or more processing units. Preferably, the processor 510 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 510.
The mobile terminal 500 may further include a power supply 511 (such as a battery) that supplies power to each component. Preferably, the power supply 511 may be logically connected to the processor 510 through a power management system, so as to implement functions such as charging management, discharging management, and power consumption management through the power management system.
In addition, the mobile terminal 500 includes some functional modules that are not shown, which are not described herein again.
Preferably, an embodiment of the present invention further provides a mobile terminal, including a processor 510, a memory 509, and a computer program stored on the memory 509 and executable on the processor 510. When executed by the processor 510, the computer program implements each process of the foregoing dubbing method embodiments and can achieve the same technical effects. To avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the foregoing dubbing method embodiments and can achieve the same technical effects. To avoid repetition, details are not described herein again. The computer-readable storage medium is, for example, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements not only includes those elements but also includes other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or apparatus. In the absence of more constraints, an element preceded by "includes a ..." does not preclude the existence of additional identical elements in the process, method, article, or apparatus that includes the element.
Through the description of the foregoing embodiments, a person skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software together with a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The embodiments of the present invention are described above with reference to the accompanying drawings, but the present invention is not limited to the foregoing specific implementations. The foregoing specific implementations are merely illustrative rather than restrictive. Inspired by the present invention, a person of ordinary skill in the art can also devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (12)
1. A dubbing method, applied to a mobile terminal, wherein the method comprises:
receiving video data to be dubbed;
determining characteristic information of each frame image in the video data; and
dubbing the video data according to the characteristic information.
2. The dubbing method according to claim 1, wherein the characteristic information comprises a body feature and a background feature; the step of determining the characteristic information of each frame image in the video data comprises:
determining a main object and background information of each frame image in the video data;
determining the body feature of the main object; and
determining the background feature of the background information; and
the step of dubbing the video data according to the characteristic information comprises:
determining target voice sound information according to the body feature;
determining target text information according to the body feature and the background feature; and
dubbing the video data according to the target voice sound information and the target text information.
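The decomposition in claim 2 (the body feature selects the voice, the body and background features together produce the text, and both drive the dubbing) can be illustrated with a toy sketch. The mapping table, function names, and feature labels below are all hypothetical stand-ins, not details from the patent.

```python
# Hypothetical mapping from a body feature to target voice sound
# information; a real system would derive pitch/timbre parameters
# from recognized attributes of the main object.
VOICE_BY_BODY = {"adult_male": {"pitch": "low"}, "puppy": {"pitch": "high"}}

def determine_voice(body_feature):
    # Target voice sound information is chosen from the body feature alone.
    return VOICE_BY_BODY.get(body_feature, {"pitch": "medium"})

def determine_text(body_feature, background_feature):
    # Target text information is derived from body and background features.
    return f"A {body_feature} appears in a {background_feature}."

def dub(body_feature, background_feature):
    voice = determine_voice(body_feature)
    text = determine_text(body_feature, background_feature)
    # The dubbing step consumes both the voice parameters and the text.
    return {"voice": voice, "text": text}

print(dub("puppy", "park"))
```

Note that the two determination steps are independent: the voice depends only on the main object, while the spoken content also takes the scene into account.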
3. The dubbing method according to claim 2, wherein:
when the body feature comprises a global feature, the step of determining the body feature of the main object comprises:
extracting a body region in each frame image in the video data; and
determining the global feature of the main object according to the body region;
when the body feature comprises a global feature and a facial feature, the step of determining the body feature of the main object comprises:
extracting a body region in each frame image in the video data and a facial area in the body region;
determining the global feature of the main object according to the body region; and
determining the facial feature of the main object according to the facial area;
when the body feature comprises a global feature, a facial feature, and a mouth feature, the step of determining the body feature of the main object comprises:
extracting a body region in each frame image in the video data, a facial area in the body region, and a mouth area in the facial area;
determining the global feature of the main object according to the body region;
determining the facial feature of the main object according to the facial area; and
determining the mouth feature of the main object according to the mouth area; and
when the body feature comprises a facial feature and a mouth feature, the step of determining the body feature of the main object comprises:
extracting a facial area in each frame image in the video data and a mouth area in the facial area;
determining the facial feature of the main object according to the facial area; and
determining the mouth feature of the main object according to the mouth area.
4. The dubbing method according to claim 3, wherein the step of determining the global feature of the main object according to the body region comprises:
inputting the body region into a first identification model to obtain the global feature of the main object;
the step of determining the facial feature of the main object according to the facial area comprises:
inputting the facial area into a second identification model to obtain the facial feature of the main object;
the step of determining the mouth feature of the main object according to the mouth area comprises:
inputting the mouth area into a third identification model to obtain the mouth feature of the main object; and
the step of determining the background feature of the background information comprises:
inputting the background information of each frame image into a fourth identification model to obtain the background feature.
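The nested region extraction and the four identification models recited above can be sketched as a set of interchangeable callables, one per input type. The stub "models" below are hypothetical placeholders; in practice each would be a trained classifier (for example, a CNN) over image pixels rather than a rule over a dict.

```python
# Stub identification models: each maps one extracted region (or the
# background information) to one feature, mirroring the four-model split.

def first_model(body_region):     # body region -> global feature
    return {"size": "large" if body_region["height"] > 100 else "small"}

def second_model(face_region):    # facial area -> facial feature
    return {"expression": face_region.get("expression", "neutral")}

def third_model(mouth_region):    # mouth area -> mouth feature
    return {"speaking": mouth_region.get("open", False)}

def fourth_model(background):     # background info -> background feature
    return {"scene": background.get("scene", "unknown")}

# A frame with its nested regions already extracted (body -> face -> mouth).
frame = {
    "body": {"height": 120},
    "face": {"expression": "happy"},
    "mouth": {"open": True},
    "background": {"scene": "park"},
}
features = {
    "global": first_model(frame["body"]),
    "facial": second_model(frame["face"]),
    "mouth": third_model(frame["mouth"]),
    "background": fourth_model(frame["background"]),
}
print(features)
```

Keeping one model per region type is what lets the method run any of the feature combinations in claim 3: only the models whose regions were extracted need to be invoked.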
5. The method according to claim 2, wherein the step of dubbing the video data according to the target voice sound information and the target text information comprises:
obtaining, based on a generic sound library, generic sound data corresponding to the target text information, wherein the generic sound library comprises generic sound data corresponding to each piece of text information;
adjusting the generic sound data according to the target voice sound information to obtain target voice data; and
dubbing the video data based on the target voice data and the target text information.
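The library lookup and adjustment described above can be sketched as follows. The library contents, the gain-based "adjustment", and all names are toy assumptions for illustration; a real implementation would store recorded or synthesized audio per text entry and modify pitch and timbre, not just amplitude.

```python
# Hypothetical generic sound library: one waveform (as a list of sample
# values) per text entry.
GENERIC_SOUND_LIBRARY = {
    "hello": [0.1, 0.2, 0.1],
    "goodbye": [0.3, 0.1, 0.2],
}

def get_generic_sound(text):
    # Step 1: look up the generic sound data for the target text.
    return GENERIC_SOUND_LIBRARY[text]

def adjust(sound, voice_info):
    # Step 2: adjust the generic sound data according to the target voice
    # sound information (here, a simple amplitude gain stands in for
    # pitch/timbre modification).
    gain = voice_info.get("gain", 1.0)
    return [round(s * gain, 6) for s in sound]

target = adjust(get_generic_sound("hello"), {"gain": 2.0})
print(target)  # -> [0.2, 0.4, 0.2]
```

Separating the neutral library entry from the per-object adjustment is the design point: one stored recording per text can serve many differently voiced main objects.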
6. A mobile terminal, wherein the mobile terminal comprises:
a receiving module, configured to receive video data to be dubbed;
a determining module, configured to determine characteristic information of each frame image in the video data; and
a dubbing module, configured to dub the video data according to the characteristic information.
7. The mobile terminal according to claim 6, wherein the characteristic information comprises a body feature and a background feature; the determining module comprises:
a first determining unit, configured to determine a main object and background information of each frame image in the video data;
a second determining unit, configured to determine the body feature of the main object; and
a third determining unit, configured to determine the background feature of the background information; and
the dubbing module comprises:
a fourth determining unit, configured to determine target voice sound information according to the body feature;
a fifth determining unit, configured to determine target text information according to the body feature and the background feature; and
a dubbing unit, configured to dub the video data according to the target voice sound information and the target text information.
8. The mobile terminal according to claim 7, wherein:
when the body feature comprises a global feature, the second determining unit comprises:
a first extracting subunit, configured to extract a body region in each frame image in the video data; and
a first determining subunit, configured to determine the global feature of the main object according to the body region;
when the body feature comprises a global feature and a facial feature, the second determining unit comprises:
a second extracting subunit, configured to extract a body region in each frame image in the video data and a facial area in the body region;
a first determining subunit, configured to determine the global feature of the main object according to the body region; and
a second determining subunit, configured to determine the facial feature of the main object according to the facial area;
when the body feature comprises a global feature, a facial feature, and a mouth feature, the second determining unit comprises:
a third extracting subunit, configured to extract a body region in each frame image in the video data, a facial area in the body region, and a mouth area in the facial area;
a first determining subunit, configured to determine the global feature of the main object according to the body region;
a second determining subunit, configured to determine the facial feature of the main object according to the facial area; and
a third determining subunit, configured to determine the mouth feature of the main object according to the mouth area; and
when the body feature comprises a facial feature and a mouth feature, the second determining unit comprises:
a fourth extracting subunit, configured to extract a facial area in each frame image in the video data and a mouth area in the facial area;
a second determining subunit, configured to determine the facial feature of the main object according to the facial area; and
a third determining subunit, configured to determine the mouth feature of the main object according to the mouth area.
9. The mobile terminal according to claim 8, wherein:
the first determining subunit is specifically configured to input the body region into a first identification model to obtain the global feature of the main object;
the second determining subunit is specifically configured to input the facial area into a second identification model to obtain the facial feature of the main object;
the third determining subunit is specifically configured to input the mouth area into a third identification model to obtain the mouth feature of the main object; and
the third determining unit comprises:
a fourth determining subunit, configured to input the background information of each frame image into a fourth identification model to obtain the background feature.
10. The mobile terminal according to claim 7, further comprising:
a providing module, configured to provide a generic sound library, wherein the generic sound library comprises generic sound data corresponding to each piece of text information;
wherein the dubbing unit comprises:
an obtaining subunit, configured to obtain, based on the generic sound library, generic sound data corresponding to the target text information;
an adjusting subunit, configured to adjust the generic sound data according to the target voice sound information to obtain target voice data; and
a dubbing subunit, configured to dub the video data based on the target voice data and the target text information.
11. A mobile terminal, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the dubbing method according to any one of claims 1 to 5 are implemented.
12. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the dubbing method according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368673.7A CN109391842B (en) | 2018-11-16 | 2018-11-16 | Dubbing method and mobile terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368673.7A CN109391842B (en) | 2018-11-16 | 2018-11-16 | Dubbing method and mobile terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109391842A true CN109391842A (en) | 2019-02-26 |
CN109391842B CN109391842B (en) | 2021-01-26 |
Family
ID=65429646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811368673.7A Active CN109391842B (en) | 2018-11-16 | 2018-11-16 | Dubbing method and mobile terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109391842B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110459200A (en) * | 2019-07-05 | 2019-11-15 | 深圳壹账通智能科技有限公司 | Phoneme synthesizing method, device, computer equipment and storage medium |
CN111046814A (en) * | 2019-12-18 | 2020-04-21 | 维沃移动通信有限公司 | Image processing method and electronic device |
CN112954453A (en) * | 2021-02-07 | 2021-06-11 | 北京有竹居网络技术有限公司 | Video dubbing method and apparatus, storage medium, and electronic device |
CN113630630A (en) * | 2021-08-09 | 2021-11-09 | 咪咕数字传媒有限公司 | Method, device and equipment for processing dubbing information of video commentary |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070172214A1 (en) * | 2006-01-24 | 2007-07-26 | Funai Electric Co., Ltd. | Optical disc signal reproducing device |
US20080005767A1 (en) * | 2006-01-27 | 2008-01-03 | Samsung Electronics Co., Ltd. | Multimedia processing apparatus and method for mobile phone |
CN101937570A (en) * | 2009-10-11 | 2011-01-05 | 上海本略信息科技有限公司 | Animation mouth shape automatic matching implementation method based on voice and text recognition |
CN103763480A (en) * | 2014-01-24 | 2014-04-30 | 三星电子(中国)研发中心 | Method and equipment for obtaining video dubbing |
CN106060424A (en) * | 2016-06-14 | 2016-10-26 | 徐文波 | Video dubbing method and device |
CN106254933A (en) * | 2016-08-08 | 2016-12-21 | 腾讯科技(深圳)有限公司 | Subtitle extraction method and device |
CN107172449A (en) * | 2017-06-19 | 2017-09-15 | 微鲸科技有限公司 | Multi-medium play method, device and multimedia storage method |
CN107293286A (en) * | 2017-05-27 | 2017-10-24 | 华南理工大学 | A kind of speech samples collection method that game is dubbed based on network |
CN107659850A (en) * | 2016-11-24 | 2018-02-02 | 腾讯科技(北京)有限公司 | Media information processing method and device |
2018
- 2018-11-16: CN CN201811368673.7A filed; granted as CN109391842B (status: Active)
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110459200A (en) * | 2019-07-05 | 2019-11-15 | 深圳壹账通智能科技有限公司 | Phoneme synthesizing method, device, computer equipment and storage medium |
CN111046814A (en) * | 2019-12-18 | 2020-04-21 | 维沃移动通信有限公司 | Image processing method and electronic device |
CN112954453A (en) * | 2021-02-07 | 2021-06-11 | 北京有竹居网络技术有限公司 | Video dubbing method and apparatus, storage medium, and electronic device |
CN113630630A (en) * | 2021-08-09 | 2021-11-09 | 咪咕数字传媒有限公司 | Method, device and equipment for processing dubbing information of video commentary |
CN113630630B (en) * | 2021-08-09 | 2023-08-15 | 咪咕数字传媒有限公司 | Method, device and equipment for processing video comment dubbing information |
Also Published As
Publication number | Publication date |
---|---|
CN109391842B (en) | 2021-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109391842A (en) | A kind of dubbing method, mobile terminal | |
CN107864353B (en) | A kind of video recording method and mobile terminal | |
CN108337558A (en) | Audio and video clipping method and terminal | |
CN108197185A (en) | A kind of music recommends method, terminal and computer readable storage medium | |
CN109743504A (en) | A kind of auxiliary photo-taking method, mobile terminal and storage medium | |
CN108920119A (en) | A kind of sharing method and mobile terminal | |
CN108174236A (en) | A kind of media file processing method, server and mobile terminal | |
CN109819167A (en) | A kind of image processing method, device and mobile terminal | |
CN109308178A (en) | A kind of voice drafting method and its terminal device | |
CN109040641A (en) | A kind of video data synthetic method and device | |
CN109660728A (en) | A kind of photographic method and device | |
CN109215007A (en) | A kind of image generating method and terminal device | |
CN108874352A (en) | A kind of information display method and mobile terminal | |
CN108989558A (en) | The method and device of terminal call | |
CN109167884A (en) | A kind of method of servicing and device based on user speech | |
CN108682040A (en) | A kind of sketch image generation method, terminal and computer readable storage medium | |
CN109215655A (en) | The method and mobile terminal of text are added in video | |
CN108986026A (en) | A kind of picture joining method, terminal and computer readable storage medium | |
CN110490897A (en) | Imitate the method and electronic equipment that video generates | |
CN108197206A (en) | Expression packet generation method, mobile terminal and computer readable storage medium | |
CN108668024A (en) | A kind of method of speech processing and terminal | |
CN109618218A (en) | A kind of method for processing video frequency and mobile terminal | |
CN109981904A (en) | A kind of method for controlling volume and terminal device | |
CN109816601A (en) | A kind of image processing method and terminal device | |
CN109189303A (en) | Method for editing text and mobile terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |