CN106060629A - Picture extraction method and terminal - Google Patents

Picture extraction method and terminal

Info

Publication number
CN106060629A
CN106060629A CN201610592552.5A
Authority
CN
China
Prior art keywords
picture
target
data
video data
pending
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610592552.5A
Other languages
Chinese (zh)
Inventor
白斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd filed Critical Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201610592552.5A priority Critical patent/CN106060629A/en
Publication of CN106060629A publication Critical patent/CN106060629A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/426Internal components of the client ; Characteristics thereof
    • H04N21/42653Internal components of the client ; Characteristics thereof for processing graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiment of the invention discloses a picture extraction method, which comprises the following steps: acquiring waveform data of audio data in audio and video data to be processed; acquiring target waveform data from the waveform data, and acquiring target audio data matched with the target waveform data from the audio data; acquiring target audio and video data corresponding to the target audio data from the audio and video data to be processed; and extracting a picture from the target audio and video data to obtain a target picture. The embodiment of the invention also discloses a terminal. By adopting the method and the device, the efficiency of extracting the target picture is improved, and the extraction cost is reduced.

Description

Picture extraction method and terminal
Technical field
The present invention relates to the field of electronic technology, and in particular to a picture extraction method and a terminal.
Background technology
A video usually contains many highlight moments. To make effective use of the pictures of these highlights, content providers currently have operators manually capture the highlight pictures during playback and then use them, for example, to produce advertisements or to make video trailers.
However, because the highlight pictures are captured manually, the result depends heavily on the personal preferences and skill of the operator, so the quality of the captured pictures is uncontrollable and cannot be guaranteed. In addition, a large amount of labor is needed to watch the video and perform the capture operation, which increases the provider's cost and makes picture extraction inefficient.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a picture extraction method and a terminal, which can improve the efficiency of extracting a target picture and reduce the extraction cost.
To solve the above technical problem, an embodiment of the present invention provides a picture extraction method, including:
acquiring waveform data of audio data in audio/video data to be processed;
acquiring target waveform data from the waveform data, and acquiring, from the audio data, target audio data matching the target waveform data;
acquiring, from the audio/video data to be processed, target audio/video data corresponding to the target audio data;
extracting a picture from the target audio/video data to obtain a target picture.
Wherein, acquiring target waveform data from the waveform data, and acquiring, from the audio data, target audio data matching the target waveform data includes:
detecting the waveform data, and acquiring waveform data whose amplitude is greater than a preset amplitude threshold;
setting the waveform data whose amplitude is greater than the preset amplitude threshold as the target waveform data.
Wherein, extracting a picture from the target audio/video data to obtain a target picture includes:
extracting target video data from the target audio/video data;
dividing the target video data into shots to obtain video data of each shot;
extracting pictures from the video data of each shot respectively to obtain at least one target picture.
Wherein, extracting pictures from the video data of each shot respectively to obtain at least one target picture includes:
extracting pictures from the video data of each shot respectively to obtain at least one candidate picture;
when only one candidate picture is obtained, setting the candidate picture as the target picture;
when at least two candidate pictures are obtained, filtering the at least two candidate pictures to obtain the at least one target picture.
Wherein, filtering the at least two candidate pictures to obtain the at least one target picture includes:
calculating a similarity between any two candidate pictures among the at least two candidate pictures;
judging whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two candidate pictures and setting the other candidate picture as the target picture;
when the similarity is less than or equal to the preset threshold, setting both of the two candidate pictures as target pictures.
Wherein, after extracting a picture from the target audio/video data to obtain a target picture, the method further includes:
splicing at least two target pictures into a video to obtain a target video, and outputting the target video.
An embodiment of the present invention further provides a terminal, including:
a first acquiring unit, configured to acquire waveform data of audio data in audio/video data to be processed;
a second acquiring unit, configured to acquire target waveform data from the waveform data and to acquire, from the audio data, target audio data matching the target waveform data;
a third acquiring unit, configured to acquire, from the audio/video data to be processed, target audio/video data corresponding to the target audio data;
an extraction unit, configured to extract a picture from the target audio/video data to obtain a target picture.
Wherein, the second acquiring unit includes:
a detection subunit, configured to detect the waveform data and acquire waveform data whose amplitude is greater than a preset amplitude threshold;
a first setting subunit, configured to set the waveform data whose amplitude is greater than the preset amplitude threshold as the target waveform data.
Wherein, the extraction unit includes:
a first extraction subunit, configured to extract target video data from the target audio/video data;
a division subunit, configured to divide the target video data into shots to obtain video data of each shot;
a second extraction subunit, configured to extract pictures from the video data of each shot respectively to obtain at least one target picture.
Wherein, the second extraction subunit includes:
a third extraction subunit, configured to extract pictures from the video data of each shot respectively to obtain at least one candidate picture;
a second setting subunit, configured to, when only one candidate picture is obtained, set the candidate picture as the target picture;
a processing subunit, configured to, when at least two candidate pictures are obtained, filter the at least two candidate pictures to obtain the at least one target picture.
Wherein, the processing subunit includes:
a calculation subunit, configured to calculate a similarity between any two candidate pictures among the at least two candidate pictures;
a judging subunit, configured to judge whether the similarity is greater than a preset threshold;
a filtering subunit, configured to, when the judging subunit judges that the similarity is greater than the preset threshold, filter out either one of the two candidate pictures and set the other candidate picture as the target picture;
a third setting subunit, configured to, when the judging subunit judges that the similarity is less than or equal to the preset threshold, set both of the two candidate pictures as target pictures.
Wherein, the terminal further includes:
a splicing unit, configured to splice at least two target pictures into a video to obtain a target video and output the target video.
An embodiment of the present invention further provides a terminal, including a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the terminal; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
acquiring waveform data of audio data in audio/video data to be processed;
acquiring target waveform data from the waveform data, and acquiring, from the audio data, target audio data matching the target waveform data;
acquiring, from the audio/video data to be processed, target audio/video data corresponding to the target audio data;
extracting a picture from the target audio/video data to obtain a target picture.
In the embodiments of the present invention, the terminal acquires waveform data of audio data in audio/video data to be processed, acquires target waveform data from the waveform data, acquires, from the audio data, target audio data matching the target waveform data, acquires, from the audio/video data to be processed, target audio/video data corresponding to the target audio data, and extracts a picture from the target audio/video data to obtain a target picture. This improves the efficiency of extracting target pictures from audio/video data and reduces the extraction cost.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these accompanying drawings without creative effort.
Fig. 1 is a schematic flowchart of an embodiment of a picture extraction method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of an embodiment of a terminal according to an embodiment of the present invention;
Fig. 3 is a structural diagram of another embodiment of a terminal according to an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The executive body in the embodiments of the present invention may be a terminal, and the terminal may include intelligent terminals such as a computer, a tablet computer and a notebook computer. The terminals listed above are merely examples and are not exhaustive; the embodiments include but are not limited to the listed terminals.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an embodiment of a picture extraction method according to an embodiment of the present invention. The picture extraction method of the embodiment of the present invention includes the following steps:
S100: acquiring waveform data of audio data in audio/video data to be processed.
In the embodiments of the present invention, the audio/video data consists of audio data and video data; the audio of the audio/video data can be output by an audio player and the video by a video player. For example, the audio/video data may be a broadcast television program with sound output, a recording with sound output on a mobile phone, or other similar audio/video data.
In the embodiments of the present invention, the audio/video data to be processed is the audio/video data that the user selects for processing. For example, audio/video data received by the terminal may serve as the audio/video data to be processed, or the terminal may store multiple pieces of audio/video data from which the user selects one as the audio/video data to be processed.
In the embodiments of the present invention, after the terminal determines the audio/video data to be processed, the terminal may decode the audio/video data to be processed and extract the audio data contained in it.
In the embodiments of the present invention, after the terminal obtains the audio data in the audio/video data to be processed, the terminal may process the audio data to obtain waveform data, where the waveform data may be the waveform of the audio data in the time domain. The waveform data and the audio data both correspond to the same time reference line.
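As an illustration of this step only, the following minimal sketch decodes the audio track of an audio/video file into a mono time-domain waveform. It assumes the terminal can invoke ffmpeg and uses NumPy; the sample rate and function names are illustrative and are not prescribed by the patent.
    # Minimal sketch (not the patented implementation): decode the audio stream
    # of an audio/video file into a mono float waveform in [-1, 1] via ffmpeg.
    import subprocess
    import numpy as np

    def load_waveform(av_path, sample_rate=16000):
        cmd = [
            "ffmpeg", "-i", av_path,
            "-f", "s16le", "-acodec", "pcm_s16le",  # raw signed 16-bit PCM
            "-ac", "1",                             # mono
            "-ar", str(sample_rate),                # resample for a uniform time base
            "-",                                    # write raw samples to stdout
        ]
        raw = subprocess.run(cmd, capture_output=True, check=True).stdout
        return np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0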
S101: acquiring target waveform data from the waveform data, and acquiring, from the audio data, target audio data matching the target waveform data.
In the embodiments of the present invention, when background music appears during playback of the audio data, the amplitude of the corresponding waveform data increases. Therefore, the terminal may monitor the waveform data; when a sudden change in the waveform data is detected, the terminal may determine that background music starts in the audio data corresponding to that section of waveform data, and when the waveform later recovers to its shape before the sudden change, the terminal may determine that the background music in the corresponding audio data has ended. The terminal can therefore record the time points of these two changes and intercept the waveform data between the two time points as the target waveform data. In other words, the terminal may monitor the waveform data, acquire the waveform data whose amplitude is greater than or equal to a preset amplitude threshold, and set that waveform data as the target waveform data; the amplitude threshold can be set by the user and is not limited here.
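A sketch of the amplitude-threshold detection described above is given below. It computes a short-time mean amplitude and reports every span that stays at or above a preset threshold as a start/end pair of timestamps; the window length and the threshold value are assumed values, since the patent leaves them to the user.
    # Illustrative sketch: locate waveform spans whose smoothed amplitude is at
    # or above a preset threshold and return their start/end times in seconds.
    import numpy as np

    def find_target_segments(samples, sample_rate, threshold=0.3, window_s=0.5):
        win = max(1, int(window_s * sample_rate))
        n_win = len(samples) // win
        # Short-time mean absolute amplitude, one value per window.
        frames = np.abs(samples[:n_win * win]).reshape(n_win, win).mean(axis=1)
        above = frames >= threshold

        segments, start = [], None
        for i, flag in enumerate(above):
            if flag and start is None:
                start = i                       # a loud span begins
            elif not flag and start is not None:
                segments.append((start * window_s, i * window_s))
                start = None                    # the loud span has ended
        if start is not None:
            segments.append((start * window_s, n_win * window_s))
        return segments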
In the embodiments of the present invention, the acquired target waveform data may carry a timestamp, that is, a character sequence uniquely identifying a moment in time; here it covers the start time point and the end time point of the target waveform data. Since the waveform data and the audio data correspond to the same time reference line, the terminal can use the timestamp of the target waveform data to acquire the corresponding audio data as the target audio data, thereby obtaining the target audio data matching the target waveform data.
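Because the waveform and the audio data share one time reference line, mapping a target waveform segment back to target audio data amounts to slicing the decoded samples, as in this small sketch (the function and variable names are illustrative):
    # Sketch: slice the target audio data matching a target waveform segment.
    def target_audio(samples, sample_rate, start_s, end_s):
        return samples[int(start_s * sample_rate):int(end_s * sample_rate)]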
S102: acquiring, from the audio/video data to be processed, target audio/video data corresponding to the target audio data.
In the embodiments of the present invention, the audio data, the video data and the audio/video data all carry timestamps. Because the audio data in the audio/video data must be played in synchronization with the video data in the audio/video data, the timestamps of the audio data, the video data and the audio/video data all correspond to the same time reference line, so that the audio and the video are played synchronously when the terminal outputs the audio/video data. Therefore, the terminal can use the timestamp of the target audio data to acquire the corresponding audio/video data from the audio/video data to be processed, and set the audio/video data corresponding to that timestamp as the target audio/video data.
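One way to realise this step is to cut the segment between the two timestamps directly out of the source file, for example with ffmpeg stream copy so that neither the audio nor the video is re-encoded. This is only a sketch under the assumption that ffmpeg is available; the patent does not prescribe a particular tool.
    # Sketch: cut the target audio/video segment [start_s, end_s] out of the
    # source file without re-encoding (stream copy keeps audio and video as-is).
    import subprocess

    def cut_segment(av_path, start_s, end_s, out_path):
        subprocess.run([
            "ffmpeg", "-y",
            "-ss", f"{start_s:.3f}",            # seek to the segment start
            "-i", av_path,
            "-t", f"{end_s - start_s:.3f}",     # keep only the segment duration
            "-c", "copy",
            out_path,
        ], check=True)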
S103: extracting a picture from the target audio/video data to obtain a target picture.
In the embodiments of the present invention, the target audio/video data may include target audio data and target video data, and the terminal may extract the target video data contained in the target audio/video data.
After the terminal obtains the target video data, the terminal may extract at least one picture at at least one preset position in the target video data. The at least one position may be the start position, the midpoint position and the end position of the target video data, or any other position set by the user. Therefore, when the preset positions of the terminal include the start position, the midpoint position and the end position, the terminal may extract one picture at each of these positions in the target video data and save or output the pictures as target pictures.
Further, after the terminal obtains the target video data, the terminal may segment the target video data by shot to obtain the video data of each shot, and extract pictures from the video data of each shot to obtain target pictures. The terminal may extract at least one picture at at least one preset position in the video data of each shot; the at least one position may be one or more of the start position, the midpoint position and the end position of the video data of each shot, or any other position set by the user. Therefore, when the preset positions of the terminal include the start position, the midpoint position and the end position, the terminal may extract one picture at each of these positions in the video data of each shot and save or output the pictures as target pictures.
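The following sketch illustrates the picture-extraction step for one shot-level clip using OpenCV: it grabs one frame at the start position, the midpoint position and the end position. The three positions match the defaults mentioned above; other preset positions could be substituted, and the library choice is an assumption.
    # Sketch: extract candidate pictures at the start, midpoint and end of a clip.
    import cv2

    def extract_candidate_frames(clip_path):
        cap = cv2.VideoCapture(clip_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in (0, total // 2, max(total - 1, 0)):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # jump to the preset position
            ok, frame = cap.read()
            if ok:
                frames.append(frame)
        cap.release()
        return frames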
Further, the terminal may also treat the pictures extracted above as candidate pictures. That is, the terminal may extract pictures from the video data of each shot respectively to obtain at least one candidate picture, count the number of candidate pictures obtained, and perform the corresponding step according to that number. Specifically, when the terminal obtains only one candidate picture, the terminal sets that candidate picture as the target picture; when the terminal obtains at least two candidate pictures, the terminal may filter all of the obtained candidate pictures to obtain the target pictures. The filtering may proceed as follows: the terminal calculates the similarity between any two of the obtained candidate pictures, for example by analysing the two pictures and computing the similarity of their content. After calculating the similarity between the two candidate pictures, the terminal judges whether the similarity is greater than a preset threshold. When the terminal judges that the similarity is greater than the preset threshold, the terminal may filter out either one of the two candidate pictures and set the other candidate picture as a target picture; when the terminal judges that the similarity is less than or equal to the preset threshold, the terminal may set both candidate pictures as target pictures. In this way the terminal obtains the target pictures. The terminal may pair the obtained candidate pictures two by two, so that calculating the similarity between any two candidate pictures means calculating the similarity for each pair.
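A sketch of this pairwise filtering is given below. Colour-histogram correlation stands in for the similarity measure, which the patent does not specify, and the 0.9 threshold is an assumed value: a candidate picture is kept only if it is not overly similar to any picture already kept.
    # Sketch: drop one of any pair of candidate pictures whose similarity exceeds
    # a preset threshold; the remaining pictures become target pictures.
    import cv2

    def similarity(a, b):
        ha = cv2.calcHist([a], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        hb = cv2.calcHist([b], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        cv2.normalize(ha, ha)
        cv2.normalize(hb, hb)
        return cv2.compareHist(ha, hb, cv2.HISTCMP_CORREL)

    def filter_pictures(candidates, threshold=0.9):
        kept = []
        for frame in candidates:
            if all(similarity(frame, k) <= threshold for k in kept):
                kept.append(frame)
        return kept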
In the embodiments of the present invention, after the terminal obtains the target picture, the terminal may output the target picture, or provide it to the user as a highlight picture for producing other content, such as a video trailer or an advertisement.
Further, in the embodiments of the present invention, when the terminal obtains at least two target pictures, the terminal may splice all of the target pictures into a video to obtain a target video and output the target video. The terminal may also determine the playback duration of the target video according to the number of target pictures and play the target video within that duration.
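The optional splicing step can be sketched as writing the target pictures out as a short video in which each picture is shown for a fixed duration; the per-picture duration, frame rate and codec below are assumptions for illustration.
    # Sketch: splice target pictures into a target video (each picture is held
    # on screen for a fixed number of seconds).
    import cv2

    def splice_pictures(pictures, out_path, seconds_per_picture=2.0, fps=25):
        h, w = pictures[0].shape[:2]
        writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                                 fps, (w, h))
        for pic in pictures:
            frame = cv2.resize(pic, (w, h))     # unify the frame size
            for _ in range(int(seconds_per_picture * fps)):
                writer.write(frame)
        writer.release()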
In the embodiments of the present invention, the terminal acquires waveform data of audio data in audio/video data to be processed, acquires target waveform data from the waveform data, acquires, from the audio data, target audio data matching the target waveform data, acquires, from the audio/video data to be processed, target audio/video data corresponding to the target audio data, and extracts a picture from the target audio/video data to obtain a target picture. This improves the efficiency of extracting target pictures from audio/video data and reduces the extraction cost.
Referring to Fig. 2, Fig. 2 is a structural diagram of an embodiment of a terminal according to an embodiment of the present invention. The terminal of the embodiment of the present invention includes:
a first acquiring unit 100, configured to acquire waveform data of audio data in audio/video data to be processed;
a second acquiring unit 200, configured to acquire target waveform data from the waveform data and to acquire, from the audio data, target audio data matching the target waveform data;
a third acquiring unit 300, configured to acquire, from the audio/video data to be processed, target audio/video data corresponding to the target audio data;
an extraction unit 400, configured to extract a picture from the target audio/video data to obtain a target picture.
In the embodiments of the present invention, the audio/video data consists of audio data and video data; the audio of the audio/video data can be output by an audio player and the video by a video player. For example, the audio/video data may be a broadcast television program with sound output, a recording with sound output on a mobile phone, or other similar audio/video data.
In the embodiments of the present invention, the audio/video data to be processed is the audio/video data that the user selects for processing. For example, audio/video data received by the terminal may serve as the audio/video data to be processed, or the terminal may store multiple pieces of audio/video data from which the user selects one as the audio/video data to be processed.
In the embodiments of the present invention, after the terminal determines the audio/video data to be processed, the terminal may decode the audio/video data to be processed and extract the audio data contained in it.
In the embodiments of the present invention, after the terminal obtains the audio data in the audio/video data to be processed, the first acquiring unit 100 may process the audio data to obtain waveform data, where the waveform data may be the waveform of the audio data in the time domain. The waveform data and the audio data both correspond to the same time reference line.
In the embodiments of the present invention, when background music appears during playback of the audio data, the amplitude of the corresponding waveform data increases. Therefore, the second acquiring unit 200 may monitor the waveform data; when a sudden change in the waveform data is detected, the second acquiring unit 200 may determine that background music starts in the audio data corresponding to that section of waveform data, and when the waveform later recovers to its shape before the sudden change, the second acquiring unit 200 may determine that the background music in the corresponding audio data has ended. The second acquiring unit 200 can therefore record the time points of these two changes and intercept the waveform data between the two time points as the target waveform data. In other words, the second acquiring unit 200 may monitor the waveform data, acquire the waveform data whose amplitude is greater than or equal to a preset amplitude threshold, and set that waveform data as the target waveform data; the amplitude threshold can be set by the user and is not limited here.
In the embodiments of the present invention, the acquired target waveform data may carry a timestamp, that is, a character sequence uniquely identifying a moment in time; here it covers the start time point and the end time point of the target waveform data. Since the waveform data and the audio data correspond to the same time reference line, the second acquiring unit 200 can use the timestamp of the target waveform data to acquire the corresponding audio data as the target audio data, thereby obtaining the target audio data matching the target waveform data.
In the embodiments of the present invention, the audio data, the video data and the audio/video data all carry timestamps. Because the audio data in the audio/video data must be played in synchronization with the video data in the audio/video data, the timestamps of the audio data, the video data and the audio/video data all correspond to the same time reference line, so that the audio and the video are played synchronously when the terminal outputs the audio/video data. Therefore, the third acquiring unit 300 can use the timestamp of the target audio data to acquire the corresponding audio/video data from the audio/video data to be processed, and set the audio/video data corresponding to that timestamp as the target audio/video data.
In the embodiments of the present invention, the target audio/video data may include target audio data and target video data, and the extraction unit 400 may extract the target video data contained in the target audio/video data.
After the extraction unit 400 obtains the target video data, the extraction unit 400 may extract at least one picture at at least one preset position in the target video data. The at least one position may be the start position, the midpoint position and the end position of the target video data, or any other position set by the user. Therefore, when the preset positions of the terminal include the start position, the midpoint position and the end position, the extraction unit 400 may extract one picture at each of these positions in the target video data and save or output the pictures as target pictures.
Further, after the extraction unit 400 obtains the target video data, the extraction unit 400 may segment the target video data by shot to obtain the video data of each shot, and extract pictures from the video data of each shot to obtain target pictures. The extraction unit 400 may extract at least one picture at at least one preset position in the video data of each shot; the at least one position may be one or more of the start position, the midpoint position and the end position of the video data of each shot, or any other position set by the user. Therefore, when the preset positions of the terminal include the start position, the midpoint position and the end position, the extraction unit 400 may extract one picture at each of these positions in the video data of each shot and save or output the pictures as target pictures.
Further, the extraction unit 400 may also treat the pictures extracted above as candidate pictures. That is, the extraction unit 400 may extract pictures from the video data of each shot respectively to obtain at least one candidate picture, count the number of candidate pictures obtained, and perform the corresponding step according to that number. Specifically, when the extraction unit 400 obtains only one candidate picture, the extraction unit 400 sets that candidate picture as the target picture; when the extraction unit 400 obtains at least two candidate pictures, the extraction unit 400 may filter all of the obtained candidate pictures to obtain the target pictures. The filtering may proceed as follows: the extraction unit 400 calculates the similarity between any two of the obtained candidate pictures, for example by analysing the two pictures and computing the similarity of their content. After calculating the similarity between the two candidate pictures, the extraction unit 400 judges whether the similarity is greater than a preset threshold. When it judges that the similarity is greater than the preset threshold, the extraction unit 400 may filter out either one of the two candidate pictures and set the other candidate picture as a target picture; when it judges that the similarity is less than or equal to the preset threshold, the extraction unit 400 may set both candidate pictures as target pictures. In this way the extraction unit 400 obtains the target pictures. The extraction unit 400 may pair the obtained candidate pictures two by two, so that calculating the similarity between any two candidate pictures means calculating the similarity for each pair.
In the embodiments of the present invention, after the extraction unit 400 obtains the target picture, the terminal may output the target picture, or provide it to the user as a highlight picture for producing other content, such as a video trailer or an advertisement.
Further, in the embodiments of the present invention, when the extraction unit 400 obtains at least two target pictures, the terminal may splice all of the target pictures into a video to obtain a target video and output the target video. The terminal may also determine the playback duration of the target video according to the number of target pictures and play the target video within that duration.
Wherein, the second acquiring unit 200 includes:
a detection subunit, configured to detect the waveform data and acquire waveform data whose amplitude is greater than a preset amplitude threshold;
a first setting subunit, configured to set the waveform data whose amplitude is greater than the preset amplitude threshold as the target waveform data.
The extraction unit 400 includes:
a first extraction subunit, configured to extract target video data from the target audio/video data;
a division subunit, configured to divide the target video data into shots to obtain video data of each shot;
a second extraction subunit, configured to extract pictures from the video data of each shot respectively to obtain at least one target picture.
The second extraction subunit includes:
a third extraction subunit, configured to extract pictures from the video data of each shot respectively to obtain at least one candidate picture;
a second setting subunit, configured to, when only one candidate picture is obtained, set the candidate picture as the target picture;
a processing subunit, configured to, when at least two candidate pictures are obtained, filter the at least two candidate pictures to obtain the at least one target picture.
The processing subunit includes:
a calculation subunit, configured to calculate a similarity between any two candidate pictures among the at least two candidate pictures;
a judging subunit, configured to judge whether the similarity is greater than a preset threshold;
a filtering subunit, configured to, when the judging subunit judges that the similarity is greater than the preset threshold, filter out either one of the two candidate pictures and set the other candidate picture as the target picture;
a third setting subunit, configured to, when the judging subunit judges that the similarity is less than or equal to the preset threshold, set both of the two candidate pictures as target pictures.
The terminal further includes:
a splicing unit, configured to splice at least two target pictures into a video to obtain a target video and output the target video.
It can be understood that the functions of the functional units of the terminal of this embodiment may be implemented according to the methods in the foregoing method embodiment; for the specific implementation process, reference may be made to the related description of the foregoing method embodiment, and details are not repeated here.
Referring to Fig. 3, Fig. 3 is a structural diagram of another embodiment of a terminal according to the present invention. As shown in Fig. 3, the terminal of this embodiment includes:
a housing 301, a processor 302, a memory 303, a circuit board 307 and a power supply circuit 305, wherein the circuit board 307 is arranged inside the space enclosed by the housing 301, and the processor 302 and the memory 303 are arranged on the circuit board 307; the power supply circuit 305 is configured to supply power to each circuit or device of the terminal; the memory 303 is configured to store executable program code; and the processor 302 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 303, so as to perform the following steps:
acquiring waveform data of audio data in audio/video data to be processed;
acquiring target waveform data from the waveform data, and acquiring, from the audio data, target audio data matching the target waveform data;
acquiring, from the audio/video data to be processed, target audio/video data corresponding to the target audio data;
extracting a picture from the target audio/video data to obtain a target picture.
Wherein, the processor 302 acquiring target waveform data from the waveform data includes:
detecting the waveform data, and acquiring waveform data whose amplitude is greater than a preset amplitude threshold;
setting the waveform data whose amplitude is greater than the preset amplitude threshold as the target waveform data.
Wherein, the processor 302 extracting a picture from the target audio/video data to obtain a target picture includes:
extracting target video data from the target audio/video data;
dividing the target video data into shots to obtain video data of each shot;
extracting pictures from the video data of each shot respectively to obtain at least one target picture.
Wherein, the processor 302 extracting pictures from the video data of each shot respectively to obtain at least one target picture includes:
extracting pictures from the video data of each shot respectively to obtain at least one candidate picture;
when only one candidate picture is obtained, setting the candidate picture as the target picture;
when at least two candidate pictures are obtained, filtering the at least two candidate pictures to obtain the at least one target picture.
Wherein, the processor 302 filtering the at least two candidate pictures to obtain the at least one target picture includes:
calculating a similarity between any two candidate pictures among the at least two candidate pictures;
judging whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two candidate pictures and setting the other candidate picture as the target picture;
when the similarity is less than or equal to the preset threshold, setting both of the two candidate pictures as target pictures.
Wherein, after the processor 302 extracts a picture from the target audio/video data to obtain a target picture, the processor 302 further performs:
splicing at least two target pictures into a video to obtain a target video, and outputting the target video.
It can be understood that the functions of the functional units of the terminal of this embodiment may be implemented according to the methods in the foregoing method embodiment; for the specific implementation process, reference may be made to the related description of the foregoing method embodiment, and details are not repeated here.
In the embodiments of the present invention, the terminal acquires waveform data of audio data in audio/video data to be processed, acquires target waveform data from the waveform data, acquires, from the audio data, target audio data matching the target waveform data, acquires, from the audio/video data to be processed, target audio/video data corresponding to the target audio data, and extracts a picture from the target audio/video data to obtain a target picture. This improves the efficiency of extracting target pictures from audio/video data and reduces the extraction cost.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program may be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.
The above disclosure is merely preferred embodiments of the present invention and certainly cannot be used to limit the scope of the claims of the present invention. Therefore, equivalent variations made according to the claims of the present invention still fall within the scope of the present invention.

Claims (10)

1. the extracting method of a picture, it is characterised in that described method includes:
Obtain the Wave data of voice data in pending audio, video data;
From described Wave data, obtain target waveform data, obtain and described target waveform data in described voice data The target audio data joined;
The target sound video data corresponding with described target audio data is obtained in described pending audio, video data;
Carry out extracting picture from described target sound video data, it is thus achieved that target picture.
2. the method for claim 1, it is characterised in that described acquisition target waveform packet from described Wave data Include:
Detect described Wave data, obtain amplitude more than the Wave data presetting amplitude thresholds;
Described amplitude is set to target waveform data more than the Wave data presetting amplitude thresholds.
3. the method for claim 1, it is characterised in that described carrying out from described target sound video data extracts picture Face, it is thus achieved that target picture includes:
Extract the target video data in described target sound video data;
Described target video data is carried out camera lens division, it is thus achieved that the video data of each camera lens;
Picture extraction is carried out respectively, it is thus achieved that at least one target picture from the video data of described each camera lens.
4. method as claimed in claim 3, it is characterised in that described carry respectively from the video data of described each camera lens Take picture, it is thus achieved that at least one target picture includes:
Carry out respectively extracting picture from the video data of described each camera lens, it is thus achieved that at least one pending extraction picture;
When only getting a pending extraction picture, described pending extraction picture is set to target picture;
When getting the pending extraction picture of at least two, the extraction picture pending to described at least two filters Process, it is thus achieved that at least one target picture described.
5. method as claimed in claim 4, it is characterised in that the described extraction picture pending to described at least two is carried out Filter process, it is thus achieved that at least one target picture described includes:
In pending the extracting of described at least two, picture calculates any two pending similarities extracted between picture;
Judge that whether described similarity is more than the threshold value preset;
When described similarity is more than the threshold value preset, filter described any two pending any one extracted in picture Pending extraction picture, in described any two pending extracting except described any one pending extraction in picture Another pending extraction picture outside picture is set to described target picture;
When described similarity is less than the threshold value preset, described any two pending extraction pictures are disposed as described mesh Mark picture.
6. method as claimed in claim 3, it is characterised in that described carrying out from described target sound video data extracts picture Face, it is thus achieved that after target picture, also includes:
At least two target picture is carried out video-splicing, it is thus achieved that target video also exports described target video.
7. a terminal, it is characterised in that described terminal includes:
First acquiring unit, for obtaining the Wave data of the voice data in pending audio, video data;
Second acquisition unit, for from described Wave data obtain target waveform data, in described voice data obtain with The target audio data of described target waveform Data Matching;
3rd acquiring unit, for obtaining the mesh corresponding with described target audio data in described pending audio, video data Mark with phonetic symbols video data;
Extraction unit, for carrying out extraction picture, it is thus achieved that target picture from described target sound video data.
8. terminal as claimed in claim 7, it is characterised in that described second acquisition unit includes:
Detection sub-unit, is used for detecting described Wave data, obtains amplitude more than the Wave data presetting amplitude thresholds;
First arranges subelement, for described amplitude is set to target waveform data more than the Wave data presetting amplitude thresholds.
9. terminal as claimed in claim 7, it is characterised in that described extraction unit includes:
First extracts subelement, for extracting the target video data in described target sound video data;
Divide subelement, for described target video data is carried out camera lens division, it is thus achieved that the video data of each camera lens;
Second extracts subelement, for carrying out picture extraction from the video data of described each camera lens respectively, it is thus achieved that at least one Target picture.
10. a terminal, it is characterised in that including: housing, processor, memorizer, circuit board and power circuit, wherein, described Circuit board is placed in the interior volume that described housing surrounds, described processor and described memorizer and is arranged on described circuit board; Described power circuit, powers for each circuit or the device for described mobile terminal;Described memorizer is used for storing and can perform Program code;Described processor is run by the executable program code of storage in the described memorizer of reading and performs with described The program that program code is corresponding, for performing following steps:
Obtain the Wave data of voice data in pending audio, video data;
From described Wave data, obtain target waveform data, obtain and described target waveform data in described voice data The target audio data joined;
The target sound video data corresponding with described target audio data is obtained in described pending audio, video data;
Carry out extracting picture from described target sound video data, it is thus achieved that target picture.
CN201610592552.5A 2016-07-25 2016-07-25 Picture extraction method and terminal Pending CN106060629A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610592552.5A CN106060629A (en) 2016-07-25 2016-07-25 Picture extraction method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610592552.5A CN106060629A (en) 2016-07-25 2016-07-25 Picture extraction method and terminal

Publications (1)

Publication Number Publication Date
CN106060629A true CN106060629A (en) 2016-10-26

Family

ID=57417395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610592552.5A Pending CN106060629A (en) 2016-07-25 2016-07-25 Picture extraction method and terminal

Country Status (1)

Country Link
CN (1) CN106060629A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1658663A (en) * 2004-02-18 2005-08-24 三星电子株式会社 Method and apparatus for summarizing a plurality of frames
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract
US20100104261A1 (en) * 2008-10-24 2010-04-29 Zhu Liu Brief and high-interest video summary generation
CN102006498A (en) * 2010-12-10 2011-04-06 北京中科大洋科技发展股份有限公司 Safe broadcast monitoring method based on video and audio comparison
CN103546667A (en) * 2013-10-24 2014-01-29 中国科学院自动化研究所 Automatic news splitting method for volume broadcast television supervision

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108848411A (en) * 2018-08-01 2018-11-20 夏颖 System and method for defining program boundaries and advertisement boundaries based on audio signal waveforms
CN108848411B (en) * 2018-08-01 2020-09-25 夏颖 System and method for defining program boundaries and advertisement boundaries based on audio signal waveforms

Similar Documents

Publication Publication Date Title
CN103559150B (en) The implementation method of main frame external camera and device and mobile terminal
CN109190539A (en) Face identification method and device
CN108024079A (en) Record screen method, apparatus, terminal and storage medium
CN105744292A (en) Video data processing method and device
CN105338238A (en) Photographing method and electronic device
CN104469487B (en) A kind of detection method and device of scene switching point
CN107517313A (en) Awakening method and device, terminal and readable storage medium storing program for executing
CN108460120A (en) Data save method, device, terminal device and storage medium
CN104023176B (en) Handle method, device and the terminal device of audio and image information
RU2625336C2 (en) Method and device for content control in electronic device
CN105867899A (en) Method and device for identifying device
CN103152633B (en) A kind of recognition methods of keyword and device
CN109065017B (en) Voice data generation method and related device
CN106060629A (en) Picture extraction method and terminal
CN103685349A (en) Method for information processing and electronic equipment
CN114120969A (en) Method and system for testing voice recognition function of intelligent terminal and electronic equipment
JP2015082692A (en) Video editing device, video editing method, and video editing program
WO2019015411A1 (en) Screen recording method and apparatus, and electronic device
CN106210878A (en) Picture extraction method and terminal
CN109034059B (en) Silence type face living body detection method, silence type face living body detection device, storage medium and processor
CN105141739B (en) The method and device that fingerprint and volume key are recorded is combined in the standby state
CN104834549B (en) The application file update method and device of mobile terminal
CN104869232A (en) Terminal
CN108170800A (en) The classification storage method and terminal of image
CN115293985A (en) Super-resolution noise reduction method and device for image optimization

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161026

RJ01 Rejection of invention patent application after publication