CN109391842A - Dubbing method and mobile terminal - Google Patents
- Publication number
- CN109391842A CN109391842A CN201811368673.7A CN201811368673A CN109391842A CN 109391842 A CN109391842 A CN 109391842A CN 201811368673 A CN201811368673 A CN 201811368673A CN 109391842 A CN109391842 A CN 109391842A
- Authority
- CN
- China
- Prior art keywords
- feature
- main object
- video data
- area
- facial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
Abstract
The present invention provides a dubbing method, a mobile terminal and a computer-readable storage medium, relating to the technical field of image processing. The method comprises: receiving video data to be dubbed; determining the characteristic information of each frame image in the video data; and dubbing the video data according to the characteristic information. Embodiments of the present invention can dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Description
Technical field
The present invention relates to the field of image processing, and in particular to a dubbing method and a mobile terminal.
Background art
Currently, short videos, GIF (Graphics Interchange Format) animations and animated emoticons are essential elements of online social interaction and entertainment. People strengthen their relationships by sharing interesting short videos, and substitute animated emoticons or GIF images for text they would otherwise write themselves.
Sometimes a short video lacks a dub, or the user wants to give it a different dubbing effect. In particular, more and more people now keep pets, filming one's pet has become popular, and adding dubbing effects to pet videos is a real pleasure. Moreover, adding a dubbing effect to an animation or emoticon pack can better convey its expression and also increase its overall appeal.
In the prior art, dubbing a video requires professional video-processing and audio-processing software, with the user personally recording, adjusting and synthesizing the dub. For a user without professional skills, the dubbing process is very complicated.
Summary of the invention
The present invention provides a dubbing method to solve the problem that the dubbing process is very complicated for users.
In a first aspect, an embodiment of the present invention provides a dubbing method applied to a mobile terminal, the method comprising:
receiving video data to be dubbed;
determining the characteristic information of each frame image in the video data;
dubbing the video data according to the characteristic information.
In a second aspect, an embodiment of the present invention provides a mobile terminal, comprising:
a receiving module, configured to receive video data to be dubbed;
a determining module, configured to determine the characteristic information of each frame image in the video data;
a dubbing module, configured to dub the video data according to the characteristic information.
In a third aspect, a mobile terminal is provided, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the dubbing method of the present invention.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the dubbing method of the present invention.
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Detailed description of the invention
To illustrate the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 shows a flow chart of a dubbing method according to Embodiment 1 of the present invention;
Fig. 2 shows a flow chart of a dubbing method according to Embodiment 2 of the present invention;
Fig. 3 shows a structural block diagram of a mobile terminal according to Embodiment 3 of the present invention;
Fig. 4 shows a structural block diagram of a mobile terminal according to Embodiment 3 of the present invention;
Fig. 5 shows a structural block diagram of a mobile terminal according to Embodiment 4 of the present invention.
Specific embodiment
Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present invention will be thoroughly understood and its scope fully conveyed to those skilled in the art.
Embodiment one
Referring to Fig. 1, a flow chart of the dubbing method of Embodiment 1 of the present invention is shown. The method may specifically include the following steps:
Step 101: receive video data to be dubbed.
In embodiments of the present invention, the video data includes a short video, a GIF animation, an animated emoticon, or the like, where a short video is a video with a limited number of frames, such as a 10-second video recorded in WeChat.
In embodiments of the present invention, the video data may also be video in other forms, which is not limited here.
Step 102: determine the characteristic information of each frame image in the video data.
In embodiments of the present invention, the characteristic information of each frame image includes: the main object in the frame image, the form of the main object, the background in which the main object is located, and so on.
The main object may be a person, an animal or a cartoon character, where persons can be divided into the elderly, youths and children, animals can be classified by species, and cartoons by cartoon character. The form of the main object covers its states and actions, such as standing, climbing on an object, eating, or speaking. The background in which the main object is located includes its surrounding environment, the weather, and so on.
For example, the main object of each frame image in the video data is identified as a little boy whose form is eating at home; the characteristic information of these frame images, joined together, is then "a little boy eating at home".
Step 103: dub the video data according to the characteristic information.
In embodiments of the present invention, dubbing information corresponding to various kinds of characteristic information can be stored in advance, where the dubbing information includes text and sound information. For example, when the characteristic information of the video data is determined to be "a little boy eating at home", the corresponding dubbing information can be called to dub this video data. The correspondence between dubbing information and characteristic information is established in advance, so that when a piece of characteristic information is recognized, the corresponding dubbing information can be called directly to dub the video data.
For example, when the characteristic information is "a little boy eating", the dubbing information "mmm, smells so good" can be called directly for the dub, where the dubbing information may be a pre-recorded voice of a little boy or a computer-synthesized one. When the characteristic information is "a dog chasing a car", the dubbing information "where are you going" can be called directly, where the dubbing information may likewise be pre-recorded or computer-synthesized. Such dubbing can make the video data more entertaining.
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Embodiment two
Referring to Fig. 2, a flow chart of the dubbing method of Embodiment 2 of the present invention is shown. The method may specifically include the following steps:
Step 201: receive video data to be dubbed.
See step 101; details are not repeated here.
Step 202: determine the main object and the background information of each frame image in the video data.
In embodiments of the present invention, the characteristic information includes a body feature and a background feature.
In embodiments of the present invention, the main object may be a person, an animal, a cartoon, etc., or a specific creature or object, for example a little boy, an old lady, a young schoolgirl, a dog, a cat, Sailor Moon, Doraemon, or a desk or stool drawn with an expression.
In embodiments of the present invention, the background feature describes the environment in which the main object is located, such as the woods, home, school, or a road.
In embodiments of the present invention, the region where the main object is located can be extracted using image matting, and the main object is recognized from that region; the background information is recognized from the remaining regions after the main object's region is removed. This improves recognition accuracy.
Step 203: determine the body feature of the main object.
In embodiments of the present invention, the body feature includes at least one of a global feature, a facial feature and a mouth feature of the main object.
In embodiments of the present invention, the global feature refers to the overall form of the main object. For example, when the main object is a person, the global feature includes whether the person is standing, sitting, lying down, running, dancing or swimming; specifically, the global feature is the state shown by the movement of the person's whole body. The facial feature refers to the facial expression of the main object, such as surprise, fear, disgust, anger, happiness or sadness. The mouth feature refers to the mouth shape: a person has many mouth shapes, from which the content the person intends to express can be determined; an animal has only a few specific mouth shapes, such as wide open, slightly open or closed; a cartoon character's mouth shapes can be as varied as a person's.
In embodiments of the present invention, when the body feature includes the global feature, step 203 includes:
Sub-step 2031: extract the body region in each frame image in the video data.
In embodiments of the present invention, the body region in each frame image can be extracted using image matting. When the main object shows only its back or side in the image, i.e. neither the face nor the mouth is present, only the body region is extracted.
Sub-step 2032: determine the global feature of the main object according to the body region.
In embodiments of the present invention, the body region is recognized; when the main object in the body region is a person or an animal, the global feature can be recognized by recognizing the limb movements of the main object.
Sub-step 2032 is specifically: if the body region has been extracted, input the body region into a first recognition model to obtain the global feature of the main object.
In embodiments of the present invention, the first recognition model can be trained in advance on first training samples, where a first training sample is an image corresponding to a body region, and the global feature is a description of the main object. When the first recognition model is used, the body region is input into it to obtain the global feature of the main object.
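Sub-steps 2031–2032 reduce to "extract a region, feed it to a classifier, map the class to a label". A sketch with an assumed label set and a stub standing in for the trained first recognition model:

```python
from typing import Callable, List

# illustrative label set for the global feature; the patent does not fix one
GLOBAL_FEATURES = ["standing", "sitting", "running", "eating", "speaking"]

def recognize_global_feature(body_region: List[List[int]],
                             model: Callable[[List[List[int]]], int]) -> str:
    """Input the matted body region into the first recognition model and
    map the predicted class index to a global-feature label."""
    class_index = model(body_region)
    return GLOBAL_FEATURES[class_index]

# stub classifier standing in for the trained model
stub_model = lambda region: 3
print(recognize_global_feature([[0, 1], [1, 0]], stub_model))  # eating
```

The second, third and fourth recognition models described below follow the same pattern over the facial, mouth and background regions.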
In embodiments of the present invention, when the body feature includes the global feature and the facial feature, step 203 includes:
Sub-step 2033: extract the body region in each frame image in the video data and the facial region in the body region.
In embodiments of the present invention, the body region in each frame image can be extracted using image matting; when the facial feature needs to be determined, the facial region in the body region is extracted.
When the main object is a bird or another animal without distinct mouth shapes, the body region and the facial region of each frame image are extracted.
Sub-step 2034: determine the global feature of the main object according to the body region.
Sub-step 2035: determine the facial feature of the main object according to the facial region.
In embodiments of the present invention, sub-step 2035 is specifically: input the facial region into a second recognition model to obtain the facial feature of the main object.
In embodiments of the present invention, the second recognition model can be trained in advance on second training samples, where a second training sample is an image corresponding to a facial region, and the facial feature is a description of the main object's face. When the second recognition model is used, the facial region is input into it to obtain the facial feature of the main object.
In embodiments of the present invention, when the body feature includes the global feature, the facial feature and the mouth feature, step 203 includes:
Sub-step 2036: extract the body region in each frame image in the video data, the facial region in the body region, and the mouth region in the facial region.
In embodiments of the present invention, the body region in each frame image can be extracted using image matting, and the facial region in the body region is extracted when the facial feature needs to be determined. When the mouth region needs to be analyzed, it can be extracted directly from the body region, or extracted from the facial region, and then analyzed.
In embodiments of the present invention, when the main object is a person or an animal with distinct mouth shapes and the image shows the person's or animal's front, the body region, the facial region and the mouth region of each frame image are all extracted.
Sub-step 2037: determine the global feature of the main object according to the body region.
Sub-step 2038: determine the facial feature of the main object according to the facial region.
Sub-step 2039: determine the mouth feature of the main object according to the mouth region.
In embodiments of the present invention, sub-step 2039 is specifically: input the mouth region into a third recognition model to obtain the mouth feature of the main object.
In embodiments of the present invention, the third recognition model can be trained in advance on third training samples, where a third training sample is an image corresponding to a mouth region, and the mouth feature is a description of the main object's mouth. When the third recognition model is used, the mouth region is input into it to obtain the mouth feature of the main object.
In embodiments of the present invention, when the body feature includes the facial feature and the mouth feature, step 203 includes:
Sub-step 20310: extract the facial region in each frame image in the video data and the mouth region in the facial region.
In embodiments of the present invention, some images contain only a facial region and a mouth region and no limbs; in such cases, only the facial region and the mouth region are extracted.
Sub-step 20311: determine the facial feature of the main object according to the facial region.
Sub-step 20312: determine the mouth feature of the main object according to the mouth region.
In embodiments of the present invention, when an image contains only a facial region and a mouth region, only those two regions are extracted, and the corresponding features are determined from them.
Step 204: determine the background feature of the background information.
In embodiments of the present invention, step 204 includes: input the background information of each frame image into a fourth recognition model to obtain the background feature.
In embodiments of the present invention, the fourth recognition model can be trained in advance on fourth training samples, where a fourth training sample is an image corresponding to a background region, and the background feature is a description of the background. When the fourth recognition model is used, the background region is input into it to obtain the background feature.
Step 205: determine target sound information according to the body feature.
In embodiments of the present invention, corresponding target sound information can be stored for each body feature, where the sound information includes pitch, timbre and volume. For example, when the body feature is a person running, the facial feature is a pained expression and the mouth shape changes slowly, the pitch is determined to lie within a certain frequency range, the volume may be relatively low, and the timbre may be determined according to the character's traits.
In embodiments of the present invention, correspondences between different body features and sound information can be established in advance; when a body feature is determined, the corresponding target sound information is called directly.
Step 206: determine target text information according to the body feature and the background feature.
In embodiments of the present invention, corresponding target text information can be stored in advance for combinations of different body features and background features. After the body feature and the background feature are determined, the corresponding target text information is called directly.
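Steps 205 and 206 both resolve to pre-stored lookups: body feature → sound information, and (body feature, background feature) → text. A sketch whose table entries are invented examples, not values from the patent:

```python
# sound information stored per body feature (pitch range in Hz, timbre, volume)
SOUND_TABLE = {
    "running, pained expression": {"pitch_hz": (80, 120), "timbre": "boy", "volume": "low"},
    "eating": {"pitch_hz": (150, 250), "timbre": "boy", "volume": "medium"},
}

# target text stored per (body feature, background feature) combination
TEXT_TABLE = {
    ("eating", "at home"): "mmm, smells so good",
    ("running", "on the road"): "wait for me",
}

def pick_dub(body_feature: str, background_feature: str):
    """Call the stored target sound information and target text directly."""
    sound = SOUND_TABLE.get(body_feature)
    text = TEXT_TABLE.get((body_feature, background_feature))
    return sound, text

sound, text = pick_dub("eating", "at home")
print(text)  # mmm, smells so good
```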
Step 207: dub the video data according to the target sound information and the target text information.
In embodiments of the present invention, the target sound information and the target text information are combined to obtain the dubbing information, and the dubbing information is synthesized with the video data to complete the dub.
In embodiments of the present invention, step 207 includes:
Sub-step 2071: based on a generic sound library, obtain the generic sound data corresponding to the target text information, where the generic sound library includes the generic sound data corresponding to each character.
In embodiments of the present invention, the characters cover all the text that may be used in expression, and each character has its own fixed generic sound data, where the generic sound data includes timbre, pitch and volume; that is, each character has a fixed timbre, pitch and volume. In an example of the present invention, every character can be stored with the same timbre, pitch and volume, forming the generic sound library. In the generic sound library, each character thus has a standard volume, pitch and timbre.
Sub-step 2072: adjust the generic sound data according to the target sound information to obtain target sound data.
In embodiments of the present invention, once the target sound information is obtained, the generic sound data corresponding to the target text information is adjusted according to the target sound information, so that the volume, timbre and pitch of the target text information reach the effect of the target sound information.
Sub-step 2073: dub the video data based on the target sound data and the target text information.
In embodiments of the present invention, each character has many possible combinations of pitch, timbre and volume; storing every kind of sound information for every character would require a large database. Therefore, each character is stored only as generic sound data, and after the target sound information is determined, the generic sound data is adjusted directly, reducing the storage needed for characters and sound information.
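The storage-saving scheme of sub-steps 2071–2072 — one standard rendition per character, adjusted toward the target sound information on demand — can be sketched as follows. The standard values and the multiplicative adjustment are assumptions for illustration only:

```python
# generic sound library: one standard (pitch, volume) entry per character
GENERIC_SOUND_LIBRARY = {
    "好": {"pitch_hz": 220.0, "volume": 1.0},
    "香": {"pitch_hz": 220.0, "volume": 1.0},
}

def render_text(text: str, target_sound_info: dict) -> list:
    """Adjust each character's generic sound data toward the target sound
    information (simple multiplicative scaling here) to get target sound data."""
    target_sound_data = []
    for ch in text:
        generic = GENERIC_SOUND_LIBRARY[ch]
        target_sound_data.append({
            "char": ch,
            "pitch_hz": generic["pitch_hz"] * target_sound_info["pitch_scale"],
            "volume": generic["volume"] * target_sound_info["volume_scale"],
        })
    return target_sound_data

data = render_text("好香", {"pitch_scale": 1.5, "volume_scale": 0.5})
print(data[0]["pitch_hz"], data[1]["volume"])  # 330.0 0.5
```

Only one entry per character is stored, while any pitch/volume combination is produced by adjustment, which is exactly the storage saving the text describes.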
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Embodiment three
Referring to Fig. 3, a structural block diagram of a mobile terminal 300 according to Embodiment 3 of the present invention is shown, which may specifically include:
a receiving module 301, configured to receive video data to be dubbed;
a determining module 302, configured to determine the characteristic information of each frame image in the video data;
a dubbing module 303, configured to dub the video data according to the characteristic information.
Optionally, on the basis of Fig. 3 and referring to Fig. 4, the determining module 302 includes:
a first determining unit 3021, configured to determine the main object and the background information of each frame image in the video data, where the characteristic information includes a body feature and a background feature;
a second determining unit 3022, configured to determine the body feature of the main object;
a third determining unit 3023, configured to determine the background feature of the background information.
The dubbing module 303 includes:
a fourth determining unit 3031, configured to determine target sound information according to the body feature;
a fifth determining unit 3032, configured to determine target text information according to the body feature and the background feature;
a dubbing unit 3033, configured to dub the video data according to the target sound information and the target text information.
When the body feature includes the global feature, the second determining unit 3022 includes:
a first extracting subunit, configured to extract the body region in each frame image in the video data;
a first determining subunit, configured to determine the global feature of the main object according to the body region.
When the body feature includes the global feature and the facial feature, the second determining unit 3022 includes:
a second extracting subunit, configured to extract the body region in each frame image in the video data and the facial region in the body region;
a second determining subunit, configured to determine the global feature of the main object according to the body region;
a third determining subunit, configured to determine the facial feature of the main object according to the facial region.
When the body feature includes the global feature, the facial feature and the mouth feature, the second determining unit 3022 includes:
a third extracting subunit, configured to extract the body region in each frame image in the video data, the facial region in the body region, and the mouth region in the facial region;
a fourth determining subunit, configured to determine the global feature of the main object according to the body region;
a fifth determining subunit, configured to determine the facial feature of the main object according to the facial region;
a sixth determining subunit, configured to determine the mouth feature of the main object according to the mouth region.
When the body feature includes the facial feature and the mouth feature, the second determining unit 3022 includes:
a fourth extracting subunit, configured to extract the facial region in each frame image in the video data and the mouth region in the facial region;
a seventh determining subunit, configured to determine the facial feature of the main object according to the facial region;
an eighth determining subunit, configured to determine the mouth feature of the main object according to the mouth region.
The first determining subunit is specifically configured to input the body region into the first recognition model to obtain the global feature of the main object.
The second determining subunit is specifically configured to input the facial region into the second recognition model to obtain the facial feature of the main object.
The third determining subunit is specifically configured to input the mouth region into the third recognition model to obtain the mouth feature of the main object.
The third determining unit includes:
a fourth determining subunit, configured to input the background information of each frame image into the fourth recognition model to obtain the background feature.
The dubbing unit 3033 then includes:
an obtaining subunit, configured to obtain, based on the generic sound library, the generic sound data corresponding to the target text information, where the generic sound library includes the generic sound data corresponding to each character;
an adjusting subunit, configured to adjust the generic sound data according to the target sound information to obtain target sound data;
a dubbing subunit, configured to dub the video data based on the target sound data and the target text information.
The mobile terminal provided by the embodiments of the present invention can implement each process implemented by the mobile terminal in the method embodiments of Fig. 1 and Fig. 2; to avoid repetition, details are not repeated here.
In embodiments of the present invention, the mobile terminal receives video data to be dubbed, determines the characteristic information of each frame image in the video data, and dubs the video data according to that characteristic information. The mobile terminal of the embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
Embodiment four
Fig. 5 is a schematic diagram of the hardware structure of a mobile terminal implementing each embodiment of the present invention.
The mobile terminal 500 includes but is not limited to: a radio frequency unit 501, a network module 502, an audio output unit 503, an input unit 504, a sensor 505, a display unit 506, a user input unit 507, an interface unit 508, a memory 509, a processor 510, a power supply 511, and other components. Those skilled in the art will understand that the mobile terminal structure shown in Fig. 5 does not constitute a limitation on the mobile terminal; the mobile terminal may include more or fewer components than illustrated, combine certain components, or arrange components differently. In embodiments of the present invention, the mobile terminal includes but is not limited to a mobile phone, a tablet computer, a laptop, a palmtop computer, an in-vehicle terminal, a wearable device, a pedometer, and the like.
The processor 510 is configured to: receive video data to be dubbed; determine the characteristic information of each frame image in the video data; and dub the video data according to the characteristic information.
In embodiments of the present invention, video data to be dubbed is received; the characteristic information of each frame image in the video data is determined; and the video data is dubbed according to that characteristic information. Embodiments of the present invention can thus dub video data automatically according to the characteristic information of each frame image, avoiding manual dubbing and synthesis and improving dubbing efficiency.
It should be understood that, in this embodiment of the present invention, the radio frequency unit 501 may be configured to send and receive signals during information transmission and reception or during a call. Specifically, after receiving downlink data from a base station, the radio frequency unit 501 delivers the data to the processor 510 for processing; in addition, it sends uplink data to the base station. Generally, the radio frequency unit 501 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 501 may also communicate with a network and other devices through a wireless communication system.
The mobile terminal provides users with wireless broadband Internet access through the network module 502, for example, helping users send and receive e-mails, browse web pages, and access streaming media.
The audio output unit 503 may convert audio data received by the radio frequency unit 501 or the network module 502, or stored in the memory 509, into an audio signal and output it as sound. Moreover, the audio output unit 503 may also provide audio output related to a specific function performed by the mobile terminal 500 (for example, a call signal reception sound or a message reception sound). The audio output unit 503 includes a loudspeaker, a buzzer, a receiver, and the like.
The input unit 504 is configured to receive audio or video signals. The input unit 504 may include a graphics processing unit (Graphics Processing Unit, GPU) 5041 and a microphone 5042. The graphics processing unit 5041 processes image data of still pictures or video obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 506. The image frames processed by the graphics processing unit 5041 may be stored in the memory 509 (or another storage medium) or sent via the radio frequency unit 501 or the network module 502. The microphone 5042 can receive sound and process such sound into audio data. In a telephone call mode, the processed audio data may be converted into a format that can be sent to a mobile communication base station via the radio frequency unit 501 for output.
The mobile terminal 500 further includes at least one sensor 505, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor includes an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 5061 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 5061 and/or the backlight when the mobile terminal 500 is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes) and can detect the magnitude and direction of gravity when stationary, which can be used to identify the posture of the mobile terminal (such as landscape/portrait switching, related games, and magnetometer posture calibration) and for vibration-recognition-related functions (such as a pedometer and tapping). The sensor 505 may further include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which are not described herein again.
The display unit 506 is configured to display information input by the user or information provided to the user. The display unit 506 may include a display panel 5061, which may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), or the like.
The user input unit 507 may be configured to receive input numeric or character information and to generate key signal input related to user settings and function control of the mobile terminal. Specifically, the user input unit 507 includes a touch panel 5071 and other input devices 5072. The touch panel 5071, also referred to as a touch screen, can collect touch operations by the user on or near it (for example, operations by the user on or near the touch panel 5071 using a finger, a stylus, or any other suitable object or accessory). The touch panel 5071 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 510, and receives and executes commands sent by the processor 510. In addition, the touch panel 5071 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 5071, the user input unit 507 may further include other input devices 5072. Specifically, the other input devices 5072 may include, but are not limited to, a physical keyboard, function keys (such as a volume control key and an on/off key), a trackball, a mouse, and a joystick, which are not described herein again.
Further, the touch panel 5071 may cover the display panel 5061. After the touch panel 5071 detects a touch operation on or near it, it transmits the operation to the processor 510 to determine the type of the touch event, and the processor 510 then provides corresponding visual output on the display panel 5061 according to the type of the touch event. Although in Fig. 5 the touch panel 5071 and the display panel 5061 implement the input and output functions of the mobile terminal as two independent components, in some embodiments the touch panel 5071 and the display panel 5061 may be integrated to implement the input and output functions of the mobile terminal, which is not specifically limited herein.
The interface unit 508 is an interface through which an external device is connected to the mobile terminal 500. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 508 may be configured to receive input (for example, data information and electric power) from an external device and transmit the received input to one or more elements in the mobile terminal 500, or may be configured to transmit data between the mobile terminal 500 and an external device.
The memory 509 may be configured to store software programs and various data. The memory 509 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function and an image playback function), and the like; the data storage area may store data created according to the use of the mobile phone (such as audio data and a phone book), and the like. In addition, the memory 509 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 510 is the control center of the mobile terminal. It connects all parts of the entire mobile terminal through various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509, thereby monitoring the mobile terminal as a whole. The processor 510 may include one or more processing units. Preferably, the processor 510 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 510.
The mobile terminal 500 may further include a power supply 511 (such as a battery) that supplies power to each component. Preferably, the power supply 511 may be logically connected to the processor 510 through a power management system, so as to implement functions such as charging management, discharging management, and power consumption management through the power management system.
In addition, the mobile terminal 500 includes some functional modules that are not shown, which are not described herein again.
Preferably, an embodiment of the present invention further provides a mobile terminal, including a processor 510, a memory 509, and a computer program stored on the memory 509 and executable on the processor 510. When executed by the processor 510, the computer program implements each process of the foregoing dubbing method embodiments and can achieve the same technical effects. To avoid repetition, details are not described herein again.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the foregoing dubbing method embodiments and can achieve the same technical effects. To avoid repetition, details are not described herein again. The computer-readable storage medium is, for example, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), a magnetic disk, or an optical disc.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements not only includes those elements but also includes other elements not explicitly listed, or further includes elements inherent to such a process, method, article, or apparatus. In the absence of more constraints, an element preceded by "includes a ..." does not preclude the existence of additional identical elements in the process, method, article, or apparatus that includes the element.
Through the description of the foregoing embodiments, a person skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by means of software together with a necessary general-purpose hardware platform, and certainly can also be implemented by hardware, although in many cases the former is the better implementation. Based on such an understanding, the technical solutions of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The embodiments of the present invention are described above with reference to the accompanying drawings, but the present invention is not limited to the foregoing specific implementations. The foregoing specific implementations are merely illustrative rather than restrictive. Inspired by the present invention, a person of ordinary skill in the art can also devise many other forms without departing from the purpose of the present invention and the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (12)
1. A dubbing method, applied to a mobile terminal, wherein the method comprises:
receiving video data to be dubbed;
determining characteristic information of each frame image in the video data; and
dubbing the video data according to the characteristic information.
2. The dubbing method according to claim 1, wherein the characteristic information comprises a body feature and a background feature; the step of determining the characteristic information of each frame image in the video data comprises:
determining a main object and background information of each frame image in the video data;
determining the body feature of the main object; and
determining the background feature of the background information; and
the step of dubbing the video data according to the characteristic information comprises:
determining target voice sound information according to the body feature;
determining target text information according to the body feature and the background feature; and
dubbing the video data according to the target voice sound information and the target text information.
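The decomposition in claim 2 (the body feature selects the voice, the body and background features together produce the text, and both drive the dubbing) can be illustrated with a toy sketch. The mapping table, function names, and feature labels below are all hypothetical stand-ins, not details from the patent.

```python
# Hypothetical mapping from a body feature to target voice sound
# information; a real system would derive pitch/timbre parameters
# from recognized attributes of the main object.
VOICE_BY_BODY = {"adult_male": {"pitch": "low"}, "puppy": {"pitch": "high"}}

def determine_voice(body_feature):
    # Target voice sound information is chosen from the body feature alone.
    return VOICE_BY_BODY.get(body_feature, {"pitch": "medium"})

def determine_text(body_feature, background_feature):
    # Target text information is derived from body and background features.
    return f"A {body_feature} appears in a {background_feature}."

def dub(body_feature, background_feature):
    voice = determine_voice(body_feature)
    text = determine_text(body_feature, background_feature)
    # The dubbing step consumes both the voice parameters and the text.
    return {"voice": voice, "text": text}

print(dub("puppy", "park"))
```

Note that the two determination steps are independent: the voice depends only on the main object, while the spoken content also takes the scene into account.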
3. The dubbing method according to claim 2, wherein:
when the body feature comprises a global feature, the step of determining the body feature of the main object comprises:
extracting a body region in each frame image in the video data; and
determining the global feature of the main object according to the body region;
when the body feature comprises a global feature and a facial feature, the step of determining the body feature of the main object comprises:
extracting a body region in each frame image in the video data and a facial area in the body region;
determining the global feature of the main object according to the body region; and
determining the facial feature of the main object according to the facial area;
when the body feature comprises a global feature, a facial feature, and a mouth feature, the step of determining the body feature of the main object comprises:
extracting a body region in each frame image in the video data, a facial area in the body region, and a mouth area in the facial area;
determining the global feature of the main object according to the body region;
determining the facial feature of the main object according to the facial area; and
determining the mouth feature of the main object according to the mouth area; and
when the body feature comprises a facial feature and a mouth feature, the step of determining the body feature of the main object comprises:
extracting a facial area in each frame image in the video data and a mouth area in the facial area;
determining the facial feature of the main object according to the facial area; and
determining the mouth feature of the main object according to the mouth area.
4. The dubbing method according to claim 3, wherein the step of determining the global feature of the main object according to the body region comprises:
inputting the body region into a first identification model to obtain the global feature of the main object;
the step of determining the facial feature of the main object according to the facial area comprises:
inputting the facial area into a second identification model to obtain the facial feature of the main object;
the step of determining the mouth feature of the main object according to the mouth area comprises:
inputting the mouth area into a third identification model to obtain the mouth feature of the main object; and
the step of determining the background feature of the background information comprises:
inputting the background information of each frame image into a fourth identification model to obtain the background feature.
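The nested region extraction and the four identification models recited above can be sketched as a set of interchangeable callables, one per input type. The stub "models" below are hypothetical placeholders; in practice each would be a trained classifier (for example, a CNN) over image pixels rather than a rule over a dict.

```python
# Stub identification models: each maps one extracted region (or the
# background information) to one feature, mirroring the four-model split.

def first_model(body_region):     # body region -> global feature
    return {"size": "large" if body_region["height"] > 100 else "small"}

def second_model(face_region):    # facial area -> facial feature
    return {"expression": face_region.get("expression", "neutral")}

def third_model(mouth_region):    # mouth area -> mouth feature
    return {"speaking": mouth_region.get("open", False)}

def fourth_model(background):     # background info -> background feature
    return {"scene": background.get("scene", "unknown")}

# A frame with its nested regions already extracted (body -> face -> mouth).
frame = {
    "body": {"height": 120},
    "face": {"expression": "happy"},
    "mouth": {"open": True},
    "background": {"scene": "park"},
}
features = {
    "global": first_model(frame["body"]),
    "facial": second_model(frame["face"]),
    "mouth": third_model(frame["mouth"]),
    "background": fourth_model(frame["background"]),
}
print(features)
```

Keeping one model per region type is what lets the method run any of the feature combinations in claim 3: only the models whose regions were extracted need to be invoked.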
5. The method according to claim 2, wherein the step of dubbing the video data according to the target voice sound information and the target text information comprises:
obtaining, based on a generic sound library, generic sound data corresponding to the target text information, wherein the generic sound library comprises generic sound data corresponding to each piece of text information;
adjusting the generic sound data according to the target voice sound information to obtain target voice data; and
dubbing the video data based on the target voice data and the target text information.
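The library lookup and adjustment described above can be sketched as follows. The library contents, the gain-based "adjustment", and all names are toy assumptions for illustration; a real implementation would store recorded or synthesized audio per text entry and modify pitch and timbre, not just amplitude.

```python
# Hypothetical generic sound library: one waveform (as a list of sample
# values) per text entry.
GENERIC_SOUND_LIBRARY = {
    "hello": [0.1, 0.2, 0.1],
    "goodbye": [0.3, 0.1, 0.2],
}

def get_generic_sound(text):
    # Step 1: look up the generic sound data for the target text.
    return GENERIC_SOUND_LIBRARY[text]

def adjust(sound, voice_info):
    # Step 2: adjust the generic sound data according to the target voice
    # sound information (here, a simple amplitude gain stands in for
    # pitch/timbre modification).
    gain = voice_info.get("gain", 1.0)
    return [round(s * gain, 6) for s in sound]

target = adjust(get_generic_sound("hello"), {"gain": 2.0})
print(target)  # -> [0.2, 0.4, 0.2]
```

Separating the neutral library entry from the per-object adjustment is the design point: one stored recording per text can serve many differently voiced main objects.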
6. A mobile terminal, wherein the mobile terminal comprises:
a receiving module, configured to receive video data to be dubbed;
a determining module, configured to determine characteristic information of each frame image in the video data; and
a dubbing module, configured to dub the video data according to the characteristic information.
7. The mobile terminal according to claim 6, wherein the characteristic information comprises a body feature and a background feature; the determining module comprises:
a first determining unit, configured to determine a main object and background information of each frame image in the video data;
a second determining unit, configured to determine the body feature of the main object; and
a third determining unit, configured to determine the background feature of the background information; and
the dubbing module comprises:
a fourth determining unit, configured to determine target voice sound information according to the body feature;
a fifth determining unit, configured to determine target text information according to the body feature and the background feature; and
a dubbing unit, configured to dub the video data according to the target voice sound information and the target text information.
8. The mobile terminal according to claim 7, wherein:
when the body feature comprises a global feature, the second determining unit comprises:
a first extracting subunit, configured to extract a body region in each frame image in the video data; and
a first determining subunit, configured to determine the global feature of the main object according to the body region;
when the body feature comprises a global feature and a facial feature, the second determining unit comprises:
a second extracting subunit, configured to extract a body region in each frame image in the video data and a facial area in the body region;
a first determining subunit, configured to determine the global feature of the main object according to the body region; and
a second determining subunit, configured to determine the facial feature of the main object according to the facial area;
when the body feature comprises a global feature, a facial feature, and a mouth feature, the second determining unit comprises:
a third extracting subunit, configured to extract a body region in each frame image in the video data, a facial area in the body region, and a mouth area in the facial area;
a first determining subunit, configured to determine the global feature of the main object according to the body region;
a second determining subunit, configured to determine the facial feature of the main object according to the facial area; and
a third determining subunit, configured to determine the mouth feature of the main object according to the mouth area; and
when the body feature comprises a facial feature and a mouth feature, the second determining unit comprises:
a fourth extracting subunit, configured to extract a facial area in each frame image in the video data and a mouth area in the facial area;
a second determining subunit, configured to determine the facial feature of the main object according to the facial area; and
a third determining subunit, configured to determine the mouth feature of the main object according to the mouth area.
9. The mobile terminal according to claim 8, wherein:
the first determining subunit is specifically configured to input the body region into a first identification model to obtain the global feature of the main object;
the second determining subunit is specifically configured to input the facial area into a second identification model to obtain the facial feature of the main object;
the third determining subunit is specifically configured to input the mouth area into a third identification model to obtain the mouth feature of the main object; and
the third determining unit comprises:
a fourth determining subunit, configured to input the background information of each frame image into a fourth identification model to obtain the background feature.
10. The mobile terminal according to claim 7, further comprising:
a providing module, configured to provide a generic sound library, wherein the generic sound library comprises generic sound data corresponding to each piece of text information;
wherein the dubbing unit comprises:
an obtaining subunit, configured to obtain, based on the generic sound library, generic sound data corresponding to the target text information;
an adjusting subunit, configured to adjust the generic sound data according to the target voice sound information to obtain target voice data; and
a dubbing subunit, configured to dub the video data based on the target voice data and the target text information.
11. A mobile terminal, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the dubbing method according to any one of claims 1 to 5 are implemented.
12. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the dubbing method according to any one of claims 1 to 5 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368673.7A CN109391842B (en) | 2018-11-16 | 2018-11-16 | Dubbing method and mobile terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811368673.7A CN109391842B (en) | 2018-11-16 | 2018-11-16 | Dubbing method and mobile terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109391842A true CN109391842A (en) | 2019-02-26 |
CN109391842B CN109391842B (en) | 2021-01-26 |
Family
ID=65429646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811368673.7A Active CN109391842B (en) | 2018-11-16 | 2018-11-16 | Dubbing method and mobile terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109391842B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110459200A (en) * | 2019-07-05 | 2019-11-15 | 深圳壹账通智能科技有限公司 | Phoneme synthesizing method, device, computer equipment and storage medium |
CN111046814A (en) * | 2019-12-18 | 2020-04-21 | 维沃移动通信有限公司 | Image processing method and electronic device |
CN112954453A (en) * | 2021-02-07 | 2021-06-11 | 北京有竹居网络技术有限公司 | Video dubbing method and apparatus, storage medium, and electronic device |
CN113630630A (en) * | 2021-08-09 | 2021-11-09 | 咪咕数字传媒有限公司 | Method, device and equipment for processing dubbing information of video commentary |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070172214A1 (en) * | 2006-01-24 | 2007-07-26 | Funai Electric Co., Ltd. | Optical disc signal reproducing device |
US20080005767A1 (en) * | 2006-01-27 | 2008-01-03 | Samsung Electronics Co., Ltd. | Multimedia processing apparatus and method for mobile phone |
CN101937570A (en) * | 2009-10-11 | 2011-01-05 | 上海本略信息科技有限公司 | Animation mouth shape automatic matching implementation method based on voice and text recognition |
CN103763480A (en) * | 2014-01-24 | 2014-04-30 | 三星电子(中国)研发中心 | Method and equipment for obtaining video dubbing |
CN106060424A (en) * | 2016-06-14 | 2016-10-26 | 徐文波 | Video dubbing method and device |
CN106254933A (en) * | 2016-08-08 | 2016-12-21 | 腾讯科技(深圳)有限公司 | Subtitle extraction method and device |
CN107172449A (en) * | 2017-06-19 | 2017-09-15 | 微鲸科技有限公司 | Multi-medium play method, device and multimedia storage method |
CN107293286A (en) * | 2017-05-27 | 2017-10-24 | 华南理工大学 | A kind of speech samples collection method that game is dubbed based on network |
CN107659850A (en) * | 2016-11-24 | 2018-02-02 | 腾讯科技(北京)有限公司 | Media information processing method and device |
2018
- 2018-11-16: CN CN201811368673.7A filed; granted as CN109391842B (status: Active)
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110459200A (en) * | 2019-07-05 | 2019-11-15 | 深圳壹账通智能科技有限公司 | Phoneme synthesizing method, device, computer equipment and storage medium |
CN111046814A (en) * | 2019-12-18 | 2020-04-21 | 维沃移动通信有限公司 | Image processing method and electronic device |
CN112954453A (en) * | 2021-02-07 | 2021-06-11 | 北京有竹居网络技术有限公司 | Video dubbing method and apparatus, storage medium, and electronic device |
CN113630630A (en) * | 2021-08-09 | 2021-11-09 | 咪咕数字传媒有限公司 | Method, device and equipment for processing dubbing information of video commentary |
CN113630630B (en) * | 2021-08-09 | 2023-08-15 | 咪咕数字传媒有限公司 | Method, device and equipment for processing video comment dubbing information |
Also Published As
Publication number | Publication date |
---|---|
CN109391842B (en) | 2021-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109391842A (en) | A kind of dubbing method, mobile terminal | |
CN107864353B (en) | A kind of video recording method and mobile terminal | |
CN108337558A (en) | Audio and video clipping method and terminal | |
CN108197185A (en) | A kind of music recommends method, terminal and computer readable storage medium | |
CN109743504A (en) | A kind of auxiliary photo-taking method, mobile terminal and storage medium | |
CN108920119A (en) | A kind of sharing method and mobile terminal | |
CN108174236A (en) | A kind of media file processing method, server and mobile terminal | |
CN109819167A (en) | A kind of image processing method, device and mobile terminal | |
CN109308178A (en) | A kind of voice drafting method and its terminal device | |
CN109040641A (en) | A kind of video data synthetic method and device | |
CN109660728A (en) | A kind of photographic method and device | |
CN109215007A (en) | A kind of image generating method and terminal device | |
CN108874352A (en) | A kind of information display method and mobile terminal | |
CN108989558A (en) | The method and device of terminal call | |
CN109167884A (en) | A kind of method of servicing and device based on user speech | |
CN108682040A (en) | A kind of sketch image generation method, terminal and computer readable storage medium | |
CN109215655A (en) | The method and mobile terminal of text are added in video | |
CN108986026A (en) | A kind of picture joining method, terminal and computer readable storage medium | |
CN110490897A (en) | Imitate the method and electronic equipment that video generates | |
CN108197206A (en) | Expression packet generation method, mobile terminal and computer readable storage medium | |
CN108668024A (en) | A kind of method of speech processing and terminal | |
CN109618218A (en) | A kind of method for processing video frequency and mobile terminal | |
CN109981904A (en) | A kind of method for controlling volume and terminal device | |
CN109816601A (en) | A kind of image processing method and terminal device | |
CN109189303A (en) | Method for editing text and mobile terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |