CN106210878A - Picture extraction method and terminal - Google Patents

Picture extraction method and terminal

Info

Publication number
CN106210878A
CN106210878A (application CN201610592540.2A)
Authority
CN
China
Prior art keywords: picture, target, data, video data, pending
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610592540.2A
Other languages
Chinese (zh)
Inventor
白斌 (Bai Bin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd filed Critical Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201610592540.2A priority Critical patent/CN106210878A/en
Publication of CN106210878A publication Critical patent/CN106210878A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams

Abstract

The embodiment of the invention discloses a picture extraction method, which comprises the following steps: extracting audio data in audio and video data to be processed; acquiring preset background music characteristics, and detecting target audio data matched with the background music characteristics in the audio data; acquiring target audio and video data corresponding to the target audio data from the audio and video data to be processed; and extracting a picture from the target audio and video data to obtain a target picture. The embodiment of the invention also discloses a terminal. By adopting the method and the device, the efficiency of extracting the target picture is improved, and the extraction cost is reduced.

Description

Picture extraction method and terminal
Technical field
The present invention relates to the field of electronic technology, and in particular to a picture extraction method and a terminal.
Background art
At present, video content contains many highlight moments. In order to make effective use of the pictures of these highlights, manufacturers often manually capture the pictures of these highlights during playback and use them, for example, to make advertisements or to produce a video synopsis.
However, because the target picture is captured manually, the result often depends on the preferences and personal skill of the operator performing the capture. The quality of manually captured pictures is therefore uncontrollable and the quality of the target picture cannot be guaranteed; moreover, considerable labor cost must be spent on reviewing the video and performing the capture operation, which increases the manufacturer's cost overhead and makes picture extraction inefficient.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a picture extraction method and a terminal, which can improve the efficiency of extracting a target picture and reduce the extraction cost.
In order to solve the above technical problem, an embodiment of the present invention provides a picture extraction method, including:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
Wherein, acquiring the preset background music feature and detecting, in the audio data, the target audio data that matches the background music feature includes:
acquiring the preset background music feature;
dividing the audio data to obtain at least one segment of audio data;
performing feature extraction on each segment of audio data to obtain feature data corresponding to each segment of audio data;
acquiring, from the feature data corresponding to each segment of audio data, target feature data that matches the background music feature;
acquiring the audio data corresponding to the target feature data, and setting the audio data corresponding to the target feature data as the target audio data.
Wherein, extracting a picture from the target audio and video data to obtain a target picture includes:
extracting target video data from the target audio and video data;
dividing the target video data into shots to obtain video data of each shot;
extracting pictures from the video data of each shot to obtain at least one target picture.
Wherein, extracting pictures from the video data of each shot to obtain at least one target picture includes:
extracting pictures from the video data of each shot to obtain at least one extracted picture to be processed;
when only one extracted picture to be processed is obtained, setting the extracted picture to be processed as the target picture;
when at least two extracted pictures to be processed are obtained, filtering the at least two extracted pictures to be processed to obtain the at least one target picture.
Wherein, filtering the at least two extracted pictures to be processed to obtain the at least one target picture includes:
calculating, among the at least two extracted pictures to be processed, the similarity between any two extracted pictures to be processed;
determining whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two extracted pictures to be processed, and setting the other extracted picture to be processed as the target picture;
when the similarity is less than the preset threshold, setting both of the two extracted pictures to be processed as target pictures.
Wherein, after extracting a picture from the target audio and video data to obtain a target picture, the method further includes:
splicing at least two target pictures into a video to obtain a highlight video, and outputting the highlight video.
An embodiment of the present invention further provides a terminal, including:
an extraction unit, configured to extract audio data from audio and video data to be processed;
a detection unit, configured to acquire a preset background music feature and detect, in the audio data, target audio data that matches the background music feature;
an acquisition unit, configured to acquire, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
a picture extraction unit, configured to extract a picture from the target audio and video data to obtain a target picture.
Wherein, the detection unit includes:
a feature acquisition subunit, configured to acquire the preset background music feature;
a first division subunit, configured to divide the audio data to obtain at least one segment of audio data;
a first extraction subunit, configured to perform feature extraction on each segment of audio data to obtain feature data corresponding to each segment of audio data;
an acquisition subunit, configured to acquire, from the feature data corresponding to each segment of audio data, target feature data that matches the background music feature;
a first setting subunit, configured to acquire the audio data corresponding to the target feature data and set the audio data corresponding to the target feature data as the target audio data.
Wherein, the picture extraction unit includes:
a second extraction subunit, configured to extract target video data from the target audio and video data;
a second division subunit, configured to divide the target video data into shots to obtain video data of each shot;
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one target picture.
Wherein, the third extraction subunit includes:
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one extracted picture to be processed;
a second setting subunit, configured to, when only one extracted picture to be processed is obtained, set the extracted picture to be processed as the target picture;
a processing subunit, configured to, when at least two extracted pictures to be processed are obtained, filter the at least two extracted pictures to be processed to obtain the at least one target picture.
Wherein, the filtering subunit includes:
a calculation subunit, configured to calculate, among the extracted pictures to be processed, the similarity between any two extracted pictures to be processed;
a judgment subunit, configured to determine whether the similarity is greater than a preset threshold;
a filtering subunit, configured to, when the judgment subunit determines that the similarity is greater than the preset threshold, filter out either one of the two extracted pictures to be processed and set the other extracted picture to be processed as the target picture;
a third setting subunit, configured to, when the judgment subunit determines that the similarity is less than the preset threshold, set both of the two extracted pictures to be processed as target pictures.
Wherein, the terminal further includes:
a splicing unit, configured to splice at least two target pictures into a video to obtain a highlight video and output the highlight video.
An embodiment of the present invention further provides a terminal, including a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the terminal; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
In the embodiments of the present invention, the terminal extracts audio data from the audio and video data to be processed, acquires a preset background music feature, detects in the audio data the target audio data that matches the background music feature, acquires from the audio and video data to be processed the target audio and video data corresponding to the target audio data, and extracts a picture from the target audio and video data to obtain a target picture. The terminal can thus extract the target picture from the audio and video data automatically, which improves the efficiency of extracting the target picture from audio and video data and reduces the extraction cost.
Brief description of the drawings
In order to describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may further obtain other drawings based on these accompanying drawings without creative efforts.
Fig. 1 is a schematic flowchart of an embodiment of a picture extraction method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of an embodiment of a terminal according to an embodiment of the present invention;
Fig. 3 is a structural diagram of another embodiment of a terminal according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
The execution body in the embodiments of the present invention may be a terminal, and the terminal may include intelligent terminals such as a computer, a tablet computer and a notebook computer. The above terminals are merely examples and are not exhaustive; the terminal includes but is not limited to the above terminals.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of a picture extraction method according to an embodiment of the present invention, the picture extraction method of this embodiment of the present invention includes the following steps:
S100: extract audio data from the audio and video data to be processed.
In the embodiments of the present invention, audio and video data consists of audio data and video data; the audio may be output through an audio player and the video through a video player. The audio and video data may be, for example, broadcast television program content with audio output, or a recording with audio output stored on a mobile phone.
In the embodiments of the present invention, the audio and video data to be processed is the audio and video data that the user selects for processing. For example, audio and video data received by the terminal may serve as the audio and video data to be processed, or the terminal may store multiple pieces of audio and video data from which the user selects one as the audio and video data to be processed.
In the embodiments of the present invention, after the terminal determines the audio and video data to be processed, the terminal may decode the audio and video data to be processed and extract the audio data contained in it.
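For illustration only (not part of the original disclosure), the following Python sketch shows one way this decoding step could be realized, by calling the ffmpeg command-line tool to demux and decode the audio track into a WAV file; the file names, the sample rate and the use of ffmpeg itself are assumptions.

```python
import subprocess
from pathlib import Path

def extract_audio(av_path: str, wav_path: str, sample_rate: int = 16000) -> Path:
    """Demux and decode the audio track of an A/V file into a mono WAV file."""
    subprocess.run(
        [
            "ffmpeg", "-y",           # overwrite output if it already exists
            "-i", av_path,            # audio and video data to be processed
            "-vn",                    # drop the video stream
            "-ac", "1",               # mono
            "-ar", str(sample_rate),  # resample for later feature extraction
            wav_path,
        ],
        check=True,
    )
    return Path(wav_path)

# Example (paths are hypothetical):
# extract_audio("pending_av.mp4", "pending_audio.wav")
```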
S101: acquire a preset background music feature, and detect, in the audio data, target audio data that matches the background music feature.
In the embodiments of the present invention, the audio data may contain multiple types of audio, such as background-music audio, narration (voice-over) audio and silent audio.
In the embodiments of the present invention, since the target pictures in the audio and video generally appear in segments that contain background music, the terminal may identify the audio data that contains background music and process it to obtain the target picture.
In the embodiments of the present invention, the terminal may identify the background-music audio as follows: acquire the preset background music feature, and detect in the audio data the audio data that matches the background music feature; when such audio data is detected, extract it and use it as the target audio data. The preset background music feature may be pre-stored by the user. Specifically, detecting, in the audio data, the target audio data that matches the background music feature may be: dividing the audio data to obtain at least one segment of audio data, for example dividing along the time axis into segments whose playback time is 1 s each; after the division, performing feature extraction on each segment of audio data to obtain the feature data corresponding to each segment, acquiring from these feature data the target feature data that matches the background music feature, acquiring the audio data corresponding to the target feature data, and setting the audio data corresponding to the target feature data as the target audio data. When the terminal obtains multiple pieces of target feature data, the terminal may acquire the audio data corresponding to each piece of target feature data and splice these pieces of audio data to obtain the target audio data.
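The patent does not specify which audio features are used or how matching is performed. As a non-authoritative sketch of this step, the code below divides the audio into 1 s segments, uses mean MFCC vectors (computed with librosa) as per-segment feature data, matches them against a preset background-music feature vector by cosine similarity, and splices adjacent matched segments into continuous target-audio time ranges; the feature type, similarity measure and threshold are illustrative assumptions.

```python
import numpy as np
import librosa

def detect_bgm_segments(wav_path, bgm_feature, seg_sec=1.0, match_thresh=0.8):
    """Split audio into fixed-length segments, extract a feature vector per
    segment, and return the (start, end) times of segments whose feature data
    matches the preset background-music feature."""
    y, sr = librosa.load(wav_path, sr=None, mono=True)
    seg_len = int(seg_sec * sr)
    matched = []
    for start in range(0, len(y) - seg_len + 1, seg_len):
        seg = y[start:start + seg_len]
        # per-segment feature data: mean MFCC vector (an illustrative choice)
        feat = librosa.feature.mfcc(y=seg, sr=sr, n_mfcc=20).mean(axis=1)
        sim = np.dot(feat, bgm_feature) / (
            np.linalg.norm(feat) * np.linalg.norm(bgm_feature) + 1e-9)
        if sim > match_thresh:  # this segment matches the background music feature
            matched.append((start / sr, (start + seg_len) / sr))
    # splice adjacent matched segments into continuous target-audio ranges
    merged = []
    for s, e in matched:
        if merged and abs(s - merged[-1][1]) < 1e-6:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged
```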
S102: acquire, from the audio and video data to be processed, the target audio and video data corresponding to the target audio data.
In the embodiments of the present invention, the audio data, the video data and the audio and video data all carry timestamps, where a timestamp is a character string that uniquely identifies a moment in time. Since the audio data and the video data in the audio and video data need to be played synchronously, the timestamps of the audio data, of the video data and of the audio and video data all correspond to the same time reference line, so that the audio data and the video data can be played synchronously; that is, when the terminal outputs the audio and video data for playback, the output audio and video are played in synchronization. Therefore, according to the timestamps in the target audio data, the terminal can acquire the corresponding audio and video data in the audio and video data, set the audio and video data corresponding to these timestamps as the target audio and video data, and thereby obtain the target audio and video data.
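A minimal sketch of this step, assuming the target-audio time ranges produced above and using ffmpeg to cut the corresponding clips out of the original file; because audio and video share one time reference line, the same start and end timestamps address both streams. The function and file names are hypothetical.

```python
import subprocess

def cut_target_av(av_path, time_ranges, out_prefix="target_av"):
    """For each (start, end) range detected on the audio timeline, cut the
    corresponding target audio and video clip out of the original file."""
    clips = []
    for i, (start, end) in enumerate(time_ranges):
        out = f"{out_prefix}_{i:03d}.mp4"
        subprocess.run(
            ["ffmpeg", "-y",
             "-ss", f"{start:.3f}",        # clip start on the shared timeline
             "-i", av_path,
             "-t", f"{end - start:.3f}",   # clip duration
             "-c", "copy",                 # stream copy; re-encode for frame accuracy
             out],
            check=True,
        )
        clips.append(out)
    return clips
```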
S103: extract a picture from the target audio and video data to obtain a target picture.
In the embodiments of the present invention, the target audio and video data may include target audio data and target video data, and the terminal may extract the target video data contained in the target audio and video data.
After the terminal obtains the target video data, the terminal may extract at least one picture at at least one preset position in the target video data. The at least one position may be the start position, the midpoint position and the end position of the target video data; furthermore, the position may also be another position set by the user. Therefore, when the positions preset by the terminal include the start position, the midpoint position and the end position, the terminal may extract one picture at each of the start position, the midpoint position and the end position of the target video data, and save or output these pictures as target pictures.
Furthermore, after the terminal obtains the target video data, the terminal may segment the target video data by shot to obtain the video data of each shot, and extract pictures from the video data of each shot to obtain the target pictures. The terminal may extract at least one picture at at least one preset position in the video data of each shot, where the at least one position may be one or more of the start position, the midpoint position and the end position of the video data of each shot; furthermore, the position may also be another position set by the user. Therefore, when the positions preset by the terminal include the start position, the midpoint position and the end position, the terminal may extract one picture at each of the start position, the midpoint position and the end position of the video data of each shot, save them as target pictures and output them.
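As an illustrative sketch only, the following code segments a video into shots by comparing colour histograms of consecutive frames (a shot-detection heuristic assumed here, not prescribed by the patent) and then grabs a frame at the start position, midpoint position and end position of each shot using OpenCV.

```python
import cv2

def split_shots_and_grab_frames(video_path, cut_thresh=0.5):
    """Detect shot boundaries via colour-histogram correlation of consecutive
    frames, then grab the start, midpoint and end frame of each shot."""
    cap = cv2.VideoCapture(video_path)
    hists = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        hists.append(hist)
    frames_total = len(hists)

    boundaries = [0]
    for i in range(1, frames_total):
        # low correlation between consecutive histograms -> shot boundary
        if cv2.compareHist(hists[i - 1], hists[i], cv2.HISTCMP_CORREL) < cut_thresh:
            boundaries.append(i)
    boundaries.append(frames_total)

    pictures = []
    for s, e in zip(boundaries[:-1], boundaries[1:]):
        if e <= s:
            continue
        for pos in (s, (s + e) // 2, e - 1):   # start, midpoint, end positions
            cap.set(cv2.CAP_PROP_POS_FRAMES, pos)
            ok, frame = cap.read()
            if ok:
                pictures.append(frame)
    cap.release()
    return pictures
```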
Furthermore, the terminal may also treat the pictures extracted above as extracted pictures to be processed. That is, the terminal may extract pictures from the video data of each shot to obtain at least one extracted picture to be processed, count the number of extracted pictures to be processed that have been obtained, and perform the corresponding step according to that number. Specifically, when the terminal obtains only one extracted picture to be processed, the terminal sets that picture as the target picture; when the terminal obtains at least two extracted pictures to be processed, the terminal may filter all of the obtained extracted pictures to be processed to obtain the target pictures. The filtering may be as follows: the terminal calculates the similarity between any two of the obtained extracted pictures to be processed, for example by performing picture detection on the two pictures and calculating the similarity of their content. After the terminal calculates the similarity between the two pictures, the terminal determines whether the similarity is greater than a preset threshold. When the terminal determines that the similarity is greater than the preset threshold, the terminal may filter out either one of the two extracted pictures to be processed and set the remaining one as the target picture; when the terminal determines that the similarity is less than or equal to the preset threshold, the terminal may set both pictures as target pictures. In this way the terminal obtains the target pictures. The terminal may pair up the obtained extracted pictures to be processed two by two, so that calculating the similarity between any two pictures means calculating the similarity between the pictures of each pair.
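A hedged sketch of this filtering step: pairwise similarity is approximated here by colour-histogram correlation, and a picture whose similarity to an already kept picture exceeds the preset threshold is filtered out. The similarity measure and the greedy pairwise order are assumptions; the patent only requires comparing pairs of pictures against a preset threshold.

```python
import cv2

def filter_near_duplicates(pictures, sim_thresh=0.9):
    """Keep a picture only if it is not too similar to one already kept."""
    def hist(img):
        h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                         [0, 256, 0, 256, 0, 256])
        cv2.normalize(h, h)
        return h

    kept, kept_hists = [], []
    for pic in pictures:
        h = hist(pic)
        # similarity greater than the threshold -> filter out this picture
        if any(cv2.compareHist(h, kh, cv2.HISTCMP_CORREL) > sim_thresh
               for kh in kept_hists):
            continue
        kept.append(pic)
        kept_hists.append(h)
    return kept  # the target pictures
```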
In the embodiments of the present invention, after the terminal obtains the target picture, the terminal may also output the target picture, or provide it to the user for producing other material, for example using the target picture as a highlight picture to produce a video synopsis or an advertisement.
Furthermore, in the embodiments of the present invention, when the terminal obtains at least two target pictures, the terminal may splice all of the target pictures into a video to obtain a highlight video and output the highlight video. Meanwhile, the terminal may also determine the playback duration of the highlight video from the number of target pictures, and play it within that playback duration.
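For illustration, a sketch of splicing target pictures into a highlight video with OpenCV's VideoWriter; the frame rate, the seconds shown per picture (which derive the playback duration from the number of pictures) and the output codec are assumptions.

```python
import cv2

def splice_into_highlight_video(pictures, out_path="highlight.mp4",
                                fps=1.0, seconds_per_picture=2):
    """Splice target pictures into a highlight video; the playback duration
    follows from the number of pictures times seconds_per_picture."""
    if not pictures:
        return None
    h, w = pictures[0].shape[:2]
    writer = cv2.VideoWriter(out_path,
                             cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for pic in pictures:
        frame = cv2.resize(pic, (w, h))  # normalise size to the first picture
        for _ in range(int(fps * seconds_per_picture)):
            writer.write(frame)
    writer.release()
    return out_path
```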
In the embodiments of the present invention, the terminal extracts audio data from the audio and video data to be processed, acquires a preset background music feature, detects in the audio data the target audio data that matches the background music feature, acquires from the audio and video data to be processed the target audio and video data corresponding to the target audio data, and extracts a picture from the target audio and video data to obtain a target picture. The terminal can thus extract the target picture from the audio and video data automatically, which improves the efficiency of extracting the target picture from audio and video data and reduces the extraction cost.
Referring to Fig. 2, which is a structural diagram of an embodiment of a terminal according to an embodiment of the present invention, the terminal of this embodiment of the present invention includes:
an extraction unit 100, configured to extract audio data from audio and video data to be processed;
a detection unit 200, configured to acquire a preset background music feature and detect, in the audio data, target audio data that matches the background music feature;
an acquisition unit 300, configured to acquire, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
a picture extraction unit 400, configured to extract a picture from the target audio and video data to obtain a target picture.
In the embodiments of the present invention, audio and video data consists of audio data and video data; the audio may be output through an audio player and the video through a video player. The audio and video data may be, for example, broadcast television program content with audio output, or a recording with audio output stored on a mobile phone.
In the embodiments of the present invention, the audio and video data to be processed is the audio and video data that the user selects for processing. For example, audio and video data received by the terminal may serve as the audio and video data to be processed, or the terminal may store multiple pieces of audio and video data from which the user selects one as the audio and video data to be processed.
In the embodiments of the present invention, after the terminal determines the audio and video data to be processed, the extraction unit 100 may decode the audio and video data to be processed and extract the audio data contained in it.
In the embodiments of the present invention, the audio data may contain multiple types of audio, such as background-music audio, narration (voice-over) audio and silent audio.
In the embodiments of the present invention, since the target pictures in the audio and video generally appear in segments that contain background music, the detection unit 200 may identify the audio data that contains background music and process it to obtain the target picture.
In the embodiments of the present invention, the detection unit 200 may identify the background-music audio as follows: the detection unit 200 acquires the preset background music feature and detects, in the audio data, the audio data that matches the background music feature; when such audio data is detected, it is extracted and used as the target audio data. The preset background music feature may be pre-stored by the user. Specifically, the detection unit 200 may detect the target audio data that matches the background music feature by dividing the audio data to obtain at least one segment of audio data, for example dividing along the time axis into segments whose playback time is 1 s each. After the division, the detection unit 200 may perform feature extraction on each segment of audio data to obtain the feature data corresponding to each segment, acquire from these feature data the target feature data that matches the background music feature, acquire the audio data corresponding to the target feature data, and set it as the target audio data. When the detection unit 200 obtains multiple pieces of target feature data, the detection unit 200 may acquire the audio data corresponding to each piece of target feature data and splice these pieces of audio data to obtain the target audio data.
In the embodiments of the present invention, the audio data, the video data and the audio and video data all carry timestamps, where a timestamp is a character string that uniquely identifies a moment in time. Since the audio data and the video data in the audio and video data need to be played synchronously, the timestamps of the audio data, of the video data and of the audio and video data all correspond to the same time reference line, so that the audio data and the video data can be played synchronously; that is, when the terminal outputs the audio and video data for playback, the output audio and video are played in synchronization. Therefore, the acquisition unit 300 can acquire, according to the timestamps in the target audio data, the corresponding audio and video data, and set the audio and video data corresponding to these timestamps as the target audio and video data, thereby obtaining the target audio and video data.
In the embodiments of the present invention, the target audio and video data may include target audio data and target video data, and the picture extraction unit 400 may extract the target video data contained in the target audio and video data.
After the picture extraction unit 400 obtains the target video data, the picture extraction unit 400 may extract at least one picture at at least one preset position in the target video data. The at least one position may be the start position, the midpoint position and the end position of the target video data; furthermore, the position may also be another position set by the user. Therefore, when the positions preset by the terminal include the start position, the midpoint position and the end position, the picture extraction unit 400 may extract one picture at each of the start position, the midpoint position and the end position of the target video data, and save or output these pictures as target pictures.
Furthermore, after the picture extraction unit 400 obtains the target video data, the picture extraction unit 400 may segment the target video data by shot to obtain the video data of each shot, and extract pictures from the video data of each shot to obtain the target pictures. The picture extraction unit 400 may extract at least one picture at at least one preset position in the video data of each shot, where the at least one position may be one or more of the start position, the midpoint position and the end position of the video data of each shot; furthermore, the position may also be another position set by the user. Therefore, when the positions preset by the terminal include the start position, the midpoint position and the end position, the picture extraction unit 400 may extract one picture at each of the start position, the midpoint position and the end position of the video data of each shot, save them as target pictures and output them.
Furthermore, the picture extraction unit 400 may also treat the pictures extracted above as extracted pictures to be processed. That is, the picture extraction unit 400 may extract pictures from the video data of each shot to obtain at least one extracted picture to be processed, count the number of extracted pictures to be processed that have been obtained, and perform the corresponding step according to that number. Specifically, when only one extracted picture to be processed is obtained, the picture extraction unit 400 sets that picture as the target picture; when at least two extracted pictures to be processed are obtained, the picture extraction unit 400 may filter all of the obtained extracted pictures to be processed to obtain the target pictures. The filtering may be as follows: the picture extraction unit 400 calculates the similarity between any two of the obtained extracted pictures to be processed, for example by performing picture detection on the two pictures and calculating the similarity of their content. After the similarity between the two pictures is calculated, the picture extraction unit 400 determines whether the similarity is greater than a preset threshold. When the picture extraction unit 400 determines that the similarity is greater than the preset threshold, the picture extraction unit 400 may filter out either one of the two pictures and set the remaining one as the target picture; when the picture extraction unit 400 determines that the similarity is less than or equal to the preset threshold, the picture extraction unit 400 may set both pictures as target pictures. In this way the picture extraction unit 400 obtains the target pictures. The picture extraction unit 400 may pair up the obtained extracted pictures to be processed two by two, so that calculating the similarity between any two pictures means calculating the similarity between the pictures of each pair.
In the embodiments of the present invention, after the picture extraction unit 400 obtains the target picture, the terminal may also output the target picture, or provide it to the user for producing other material, for example using the target picture as a highlight picture to produce a video synopsis or an advertisement.
Furthermore, in the embodiments of the present invention, when the picture extraction unit 400 obtains at least two target pictures, the terminal may splice all of the target pictures into a video to obtain a highlight video and output the highlight video. Meanwhile, the terminal may also determine the playback duration of the highlight video from the number of target pictures, and play it within that playback duration.
Wherein, the detection unit 200 includes:
a feature acquisition subunit, configured to acquire the preset background music feature;
a first division subunit, configured to divide the audio data to obtain at least one segment of audio data;
a first extraction subunit, configured to perform feature extraction on each segment of audio data to obtain feature data corresponding to each segment of audio data;
an acquisition subunit, configured to acquire, from the feature data corresponding to each segment of audio data, target feature data that matches the background music feature;
a first setting subunit, configured to acquire the audio data corresponding to the target feature data and set the audio data corresponding to the target feature data as the target audio data.
The picture extraction unit 400 includes:
a second extraction subunit, configured to extract target video data from the target audio and video data;
a second division subunit, configured to divide the target video data into shots to obtain video data of each shot;
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one target picture.
The third extraction subunit includes:
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one extracted picture to be processed;
a second setting subunit, configured to, when only one extracted picture to be processed is obtained, set the extracted picture to be processed as the target picture;
a processing subunit, configured to, when at least two extracted pictures to be processed are obtained, filter the at least two extracted pictures to be processed to obtain the at least one target picture.
The filtering subunit includes:
a calculation subunit, configured to calculate, among the extracted pictures to be processed, the similarity between any two extracted pictures to be processed;
a judgment subunit, configured to determine whether the similarity is greater than a preset threshold;
a filtering subunit, configured to, when the judgment subunit determines that the similarity is greater than the preset threshold, filter out either one of the two extracted pictures to be processed and set the other extracted picture to be processed as the target picture;
a third setting subunit, configured to, when the judgment subunit determines that the similarity is less than the preset threshold, set both of the two extracted pictures to be processed as target pictures.
The terminal further includes:
a splicing unit, configured to splice at least two target pictures into a video to obtain a highlight video and output the highlight video.
It may be understood that the functions of the functional modules and units of the terminal of this embodiment may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, reference may be made to the relevant description of the foregoing method embodiments, and details are not repeated here.
In the embodiments of the present invention, the terminal extracts audio data from the audio and video data to be processed, acquires a preset background music feature, detects in the audio data the target audio data that matches the background music feature, acquires from the audio and video data to be processed the target audio and video data corresponding to the target audio data, and extracts a picture from the target audio and video data to obtain a target picture. The terminal can thus extract the target picture from the audio and video data automatically, which improves the efficiency of extracting the target picture from audio and video data and reduces the extraction cost.
Referring to Fig. 3, which is a structural diagram of another embodiment of a terminal according to the present invention. As shown in Fig. 3, the terminal of this embodiment includes:
a housing 301, a processor 302, a memory 303, a circuit board 307 and a power supply circuit 305, wherein the circuit board 307 is arranged inside the space enclosed by the housing 301, and the processor 302 and the memory 303 are arranged on the circuit board 307; the power supply circuit 305 is configured to supply power to each circuit or device of the terminal; the memory 303 is configured to store executable program code; and the processor 302 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 303, so as to perform the following steps:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
Wherein, the processor 302 acquiring the preset background music feature and detecting, in the audio data, the target audio data that matches the background music feature includes:
acquiring the preset background music feature;
dividing the audio data to obtain at least one segment of audio data;
performing feature extraction on each segment of audio data to obtain feature data corresponding to each segment of audio data;
acquiring, from the feature data corresponding to each segment of audio data, target feature data that matches the background music feature;
acquiring the audio data corresponding to the target feature data, and setting the audio data corresponding to the target feature data as the target audio data.
Wherein, the processor 302 extracting a picture from the target audio and video data to obtain a target picture includes:
extracting target video data from the target audio and video data;
dividing the target video data into shots to obtain video data of each shot;
extracting pictures from the video data of each shot to obtain at least one target picture.
Wherein, the processor 302 extracting pictures from the video data of each shot to obtain at least one target picture includes:
extracting pictures from the video data of each shot to obtain at least one extracted picture to be processed;
when only one extracted picture to be processed is obtained, setting the extracted picture to be processed as the target picture;
when at least two extracted pictures to be processed are obtained, filtering the at least two extracted pictures to be processed to obtain the at least one target picture.
Wherein, the processor 302 filtering the at least two extracted pictures to be processed to obtain the at least one target picture includes:
calculating, among the at least two extracted pictures to be processed, the similarity between any two extracted pictures to be processed;
determining whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two extracted pictures to be processed, and setting the other extracted picture to be processed as the target picture;
when the similarity is less than the preset threshold, setting both of the two extracted pictures to be processed as target pictures.
Wherein, after the processor 302 extracts a picture from the target audio and video data to obtain a target picture, the processor 302 further performs:
splicing at least two target pictures into a video to obtain a highlight video, and outputting the highlight video.
It may be understood that the functions of the functional modules of the terminal of this embodiment may be implemented according to the methods in the foregoing method embodiments; for the specific implementation process, reference may be made to the relevant description of the foregoing method embodiments, and details are not repeated here.
In the embodiments of the present invention, the terminal extracts audio data from the audio and video data to be processed, acquires a preset background music feature, detects in the audio data the target audio data that matches the background music feature, acquires from the audio and video data to be processed the target audio and video data corresponding to the target audio data, and extracts a picture from the target audio and video data to obtain a target picture. The terminal can thus extract the target picture from the audio and video data automatically, which improves the efficiency of extracting the target picture from audio and video data and reduces the extraction cost.
A person of ordinary skill in the art may understand that all or some of the processes of the methods in the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the foregoing method embodiments may be included. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The foregoing disclosure is merely preferred embodiments of the present invention and certainly cannot be used to limit the scope of the claims of the present invention. Therefore, equivalent variations made according to the claims of the present invention shall still fall within the scope covered by the present invention.

Claims (10)

1. A picture extraction method, characterized in that the method comprises:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
2. The method according to claim 1, characterized in that acquiring the preset background music feature and detecting, in the audio data, the target audio data that matches the background music feature comprises:
acquiring the preset background music feature;
dividing the audio data to obtain at least one segment of audio data;
performing feature extraction on each segment of audio data to obtain feature data corresponding to each segment of audio data;
acquiring, from the feature data corresponding to each segment of audio data, target feature data that matches the background music feature;
acquiring the audio data corresponding to the target feature data, and setting the audio data corresponding to the target feature data as the target audio data.
3. The method according to claim 1, characterized in that extracting a picture from the target audio and video data to obtain a target picture comprises:
extracting target video data from the target audio and video data;
dividing the target video data into shots to obtain video data of each shot;
extracting pictures from the video data of each shot to obtain at least one target picture.
4. The method according to claim 3, characterized in that extracting pictures from the video data of each shot to obtain at least one target picture comprises:
extracting pictures from the video data of each shot to obtain at least one extracted picture to be processed;
when only one extracted picture to be processed is obtained, setting the extracted picture to be processed as the target picture;
when at least two extracted pictures to be processed are obtained, filtering the at least two extracted pictures to be processed to obtain the at least one target picture.
5. The method according to claim 4, characterized in that filtering the at least two extracted pictures to be processed to obtain the at least one target picture comprises:
calculating, among the at least two extracted pictures to be processed, the similarity between any two extracted pictures to be processed;
determining whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two extracted pictures to be processed, and setting the other extracted picture to be processed as the target picture;
when the similarity is less than the preset threshold, setting both of the two extracted pictures to be processed as target pictures.
6. The method according to claim 3, characterized in that after extracting a picture from the target audio and video data to obtain a target picture, the method further comprises:
splicing at least two target pictures into a video to obtain a highlight video, and outputting the highlight video.
7. A terminal, characterized in that the terminal comprises:
an extraction unit, configured to extract audio data from audio and video data to be processed;
a detection unit, configured to acquire a preset background music feature and detect, in the audio data, target audio data that matches the background music feature;
an acquisition unit, configured to acquire, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
a picture extraction unit, configured to extract a picture from the target audio and video data to obtain a target picture.
8. The terminal according to claim 7, characterized in that the detection unit comprises:
a feature acquisition subunit, configured to acquire the preset background music feature;
a first division subunit, configured to divide the audio data to obtain at least one segment of audio data;
a first extraction subunit, configured to perform feature extraction on each segment of audio data to obtain feature data corresponding to each segment of audio data;
an acquisition subunit, configured to acquire, from the feature data corresponding to each segment of audio data, target feature data that matches the background music feature;
a first setting subunit, configured to acquire the audio data corresponding to the target feature data and set the audio data corresponding to the target feature data as the target audio data.
9. The terminal according to claim 7, characterized in that the picture extraction unit comprises:
a second extraction subunit, configured to extract target video data from the target audio and video data;
a second division subunit, configured to divide the target video data into shots to obtain video data of each shot;
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one target picture.
10. A terminal, characterized in that the terminal comprises a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the terminal; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
CN201610592540.2A 2016-07-25 2016-07-25 Picture extraction method and terminal Pending CN106210878A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610592540.2A CN106210878A (en) 2016-07-25 2016-07-25 Picture extraction method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610592540.2A CN106210878A (en) 2016-07-25 2016-07-25 Picture extraction method and terminal

Publications (1)

Publication Number Publication Date
CN106210878A true CN106210878A (en) 2016-12-07

Family

ID=57495129

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610592540.2A Pending CN106210878A (en) 2016-07-25 2016-07-25 Picture extraction method and terminal

Country Status (1)

Country Link
CN (1) CN106210878A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107135419A (en) * 2017-06-14 2017-09-05 北京奇虎科技有限公司 A kind of method and apparatus for editing video

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1658663A (en) * 2004-02-18 2005-08-24 三星电子株式会社 Method and apparatus for summarizing a plurality of frames
US20070168864A1 (en) * 2006-01-11 2007-07-19 Koji Yamamoto Video summarization apparatus and method
CN101431689A (en) * 2007-11-05 2009-05-13 华为技术有限公司 Method and device for generating video abstract
US20100104261A1 (en) * 2008-10-24 2010-04-29 Zhu Liu Brief and high-interest video summary generation
CN104867161A (en) * 2015-05-14 2015-08-26 国家电网公司 Video-processing method and device
CN105721955A (en) * 2016-01-20 2016-06-29 天津大学 Video key frame selecting method


Similar Documents

Publication Publication Date Title
CN109729420B (en) Picture processing method and device, mobile terminal and computer readable storage medium
EP4068793A1 (en) Video editing method, video editing apparatus, terminal, and readable storage medium
CN106851401A (en) A kind of method and system of automatic addition captions
CN108024079A (en) Record screen method, apparatus, terminal and storage medium
CN104756188A (en) Device and method for changing shape of lips on basis of automatic word translation
CN105578097A (en) Video recording method and terminal
CN105407261A (en) Image processing device and method, and electronic equipment
CN106982344B (en) Video information processing method and device
KR20100002090A (en) Electronic apparatus, video content editing method, and program
WO2016197708A1 (en) Recording method and terminal
CN107170432A (en) A kind of music generating method and device
CN107517313A (en) Awakening method and device, terminal and readable storage medium storing program for executing
CN105554361A (en) Processing method and system of dynamic video shooting
CN104469487B (en) A kind of detection method and device of scene switching point
CN104090883A (en) Playing control processing method and playing control processing device for audio file
CN105938390A (en) Content output apparatus and content output method
CN105787976A (en) Method and apparatus for processing pictures
CN110062163B (en) Multimedia data processing method and device
CN105224936B (en) A kind of iris feature information extracting method and device
CN105100647A (en) Subtitle correction method and terminal
CN108153882A (en) A kind of data processing method and device
CN106210878A (en) Picture extraction method and terminal
CN101923474A (en) Program running parameter configuration method and computer
WO2021052130A1 (en) Video processing method, apparatus and device, and computer-readable storage medium
JP2015082692A (en) Video editing device, video editing method, and video editing program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161207

RJ01 Rejection of invention patent application after publication