CN106210878A - Picture extraction method and terminal - Google Patents
- Publication number
- CN106210878A (application CN201610592540.2A)
- Authority
- CN
- China
- Prior art keywords
- picture
- target
- data
- video data
- pending
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
Abstract
The embodiment of the invention discloses a picture extraction method, which comprises the following steps: extracting audio data in audio and video data to be processed; acquiring preset background music characteristics, and detecting target audio data matched with the background music characteristics in the audio data; acquiring target audio and video data corresponding to the target audio data from the audio and video data to be processed; and extracting a picture from the target audio and video data to obtain a target picture. The embodiment of the invention also discloses a terminal. By adopting the method and the device, the efficiency of extracting the target picture is improved, and the extraction cost is reduced.
Description
Technical field
The present invention relates to the field of electronic technology, and in particular to a picture extraction method and a terminal.
Background
A video contains many highlights. To make effective use of the pictures in these highlights, manufacturers currently tend to capture them manually during playback and then put them to use, for example to produce advertisements or video synopses.
However, because the target pictures are captured manually, the result often depends on the personal preferences and personal qualities of the operator doing the capturing. The quality of manually captured pictures is therefore uncontrollable, and the quality of the target pictures cannot be guaranteed. In addition, considerable labor is needed to watch the video and perform the capture, which increases the manufacturer's costs and makes picture extraction inefficient.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a picture extraction method and a terminal that can improve the efficiency of extracting target pictures and reduce the extraction cost.
To solve the above technical problem, an embodiment of the present invention provides a picture extraction method, including:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
Wherein acquiring the preset background music feature and detecting, in the audio data, the target audio data that matches the background music feature includes:
acquiring the preset background music feature;
dividing the audio data to obtain at least one segment of audio data;
performing feature extraction on each segment of audio data to obtain the feature data corresponding to each segment;
acquiring, from the feature data corresponding to each segment, target feature data that matches the background music feature;
acquiring the audio data corresponding to the target feature data, and setting that audio data as the target audio data.
Wherein extracting a picture from the target audio and video data to obtain a target picture includes:
extracting the target video data from the target audio and video data;
dividing the target video data into shots to obtain the video data of each shot;
extracting pictures from the video data of each shot to obtain at least one target picture.
Wherein extracting pictures from the video data of each shot to obtain at least one target picture includes:
extracting pictures from the video data of each shot to obtain at least one candidate picture (an extracted picture that is still to be processed);
when only one candidate picture is obtained, setting that candidate picture as the target picture;
when at least two candidate pictures are obtained, filtering the at least two candidate pictures to obtain the at least one target picture.
Wherein filtering the at least two candidate pictures to obtain the at least one target picture includes:
calculating the similarity between any two of the at least two candidate pictures;
judging whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two candidate pictures and setting the other one as the target picture;
when the similarity is less than the preset threshold, setting both candidate pictures as target pictures.
Wherein, after extracting a picture from the target audio and video data to obtain a target picture, the method further includes:
splicing at least two target pictures into a video to obtain a highlight video, and outputting the highlight video.
An embodiment of the present invention further provides a terminal, including:
an extraction unit, configured to extract audio data from audio and video data to be processed;
a detection unit, configured to acquire a preset background music feature and to detect, in the audio data, target audio data that matches the background music feature;
an acquisition unit, configured to acquire, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
a picture extraction unit, configured to extract a picture from the target audio and video data to obtain a target picture.
Wherein the detection unit includes:
a feature acquisition subunit, configured to acquire the preset background music feature;
a first division subunit, configured to divide the audio data to obtain at least one segment of audio data;
a first extraction subunit, configured to perform feature extraction on each segment of audio data to obtain the feature data corresponding to each segment;
an acquisition subunit, configured to acquire, from the feature data corresponding to each segment, target feature data that matches the background music feature;
a first setting subunit, configured to acquire the audio data corresponding to the target feature data and to set that audio data as the target audio data.
Wherein the picture extraction unit includes:
a second extraction subunit, configured to extract the target video data from the target audio and video data;
a second division subunit, configured to divide the target video data into shots to obtain the video data of each shot;
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one target picture.
Wherein the third extraction subunit includes:
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one candidate picture;
a second setting subunit, configured to set the candidate picture as the target picture when only one candidate picture is obtained;
a processing subunit, configured to filter the at least two candidate pictures when at least two candidate pictures are obtained, to obtain the at least one target picture.
Wherein the processing subunit includes:
a calculation subunit, configured to calculate the similarity between any two of the at least two candidate pictures;
a judgment subunit, configured to judge whether the similarity is greater than a preset threshold;
a filtering subunit, configured to, when the judgment subunit judges that the similarity is greater than the preset threshold, filter out either one of the two candidate pictures and set the other one as the target picture;
a third setting subunit, configured to, when the judgment subunit judges that the similarity is less than the preset threshold, set both candidate pictures as target pictures.
Wherein the terminal further includes:
a splicing unit, configured to splice at least two target pictures into a video to obtain a highlight video, and to output the highlight video.
An embodiment of the present invention further provides a terminal, including a housing, a processor, a memory, a circuit board and a power supply circuit, wherein the circuit board is placed inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the terminal; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
extracting audio data from audio and video data to be processed;
acquiring a preset background music feature, and detecting, in the audio data, target audio data that matches the background music feature;
acquiring, from the audio and video data to be processed, target audio and video data corresponding to the target audio data;
extracting a picture from the target audio and video data to obtain a target picture.
In the embodiments of the present invention, the terminal extracts the audio data from the audio and video data to be processed, acquires the preset background music feature, detects in the audio data the target audio data that matches the background music feature, acquires from the audio and video data to be processed the target audio and video data corresponding to the target audio data, and extracts a picture from the target audio and video data to obtain a target picture. The terminal can thus extract target pictures from audio and video data automatically, which improves the efficiency of extracting target pictures and reduces the extraction cost.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an embodiment of a picture extraction method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of an embodiment of a terminal provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of another embodiment of a terminal provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The executing entity in the embodiments of the present invention may be a terminal. The terminal may include intelligent terminals such as a computer, a tablet computer or a notebook. These terminals are examples only and are not exhaustive; the terminal includes but is not limited to them.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of a picture extraction method provided by an embodiment of the present invention, the picture extraction method of this embodiment includes the following steps:
S100: extract the audio data from the audio and video data to be processed.
In the embodiments of the present invention, audio and video data consists of audio data and video data; the audio can be output by an audio player and the video by a video player. For example, audio and video data may be television program content with sound, or a video recording with sound on a mobile phone.
In the embodiments of the present invention, the audio and video data to be processed is the audio and video data that the user selects for processing. For example, audio and video data received by the terminal may serve as the audio and video data to be processed, or the terminal may store multiple items of audio and video data from which the user selects one as the audio and video data to be processed.
In the embodiments of the present invention, once the terminal has determined the audio and video data to be processed, it can decode that data and extract the audio data it contains.
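The decoding step in S100 can be illustrated with a small sketch. The patent does not name a decoder, so the use of the `ffmpeg` command-line tool here is an assumption, as are the function name, the file names and the 16 kHz mono output format. The function only builds the argument list; running it requires `ffmpeg` to be installed.

```python
from typing import List


def build_audio_extract_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that keeps only the audio track.

    Hypothetical helper: the patent only says the terminal decodes the
    audio and video data and extracts the audio data it contains.
    """
    return [
        "ffmpeg",
        "-i", src,               # input audio and video file
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # decode the audio to 16-bit PCM
        "-ar", str(sample_rate), # resample to a fixed rate
        "-ac", "1",              # mix down to one channel
        dst,
    ]


if __name__ == "__main__":
    print(" ".join(build_audio_extract_command("program.mp4", "program.wav")))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` to perform the extraction.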
S101: acquire the preset background music feature, and detect, in the audio data, the target audio data that matches the background music feature.
In the embodiments of the present invention, the audio data may include several types of audio data, such as audio data of the background music type, of the narration (voice-over) type, and of the silence type.
In the embodiments of the present invention, because the target pictures in audio and video usually occur in the portions of the audio and video data that have background music, the terminal can identify the audio data that contains background music and then process it to obtain the target pictures.
In the embodiments of the present invention, the terminal may identify the audio data that contains background music as follows: it acquires the preset background music feature and detects, in the audio data, audio data that matches the background music feature; when such audio data is detected, the terminal extracts it and uses it as the target audio data. The preset background music feature may be preset and stored by the user.
Specifically, detecting the target audio data that matches the background music feature may proceed as follows. The terminal divides the audio data to obtain at least one segment of audio data. The division may be by time period; for example, with a time period of 1 s, each segment of audio data has a playback time of 1 s. After dividing the audio data, the terminal performs feature extraction on each segment to obtain the feature data corresponding to each segment, acquires from that feature data the target feature data that matches the background music feature, acquires the audio data corresponding to the target feature data, and sets that audio data as the target audio data. When the terminal obtains multiple items of target feature data, it can acquire the audio data corresponding to each of them and splice these pieces of audio data together to obtain the target audio data.
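The segment-and-match procedure of S101 can be sketched as follows. The patent does not say which audio features are used or how a match is decided, so the feature (RMS energy plus zero-crossing rate), the Euclidean distance test and the `tolerance` parameter are all illustrative assumptions, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Feature = Tuple[float, float]


def split_into_segments(samples: Sequence[float], sample_rate: int,
                        seconds: float = 1.0) -> List[Sequence[float]]:
    """Divide the audio samples into fixed-length time segments (1 s by default)."""
    size = max(1, int(sample_rate * seconds))
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def extract_features(segment: Sequence[float]) -> Feature:
    """Hypothetical feature data: (RMS energy, zero-crossing rate)."""
    if not segment:
        return (0.0, 0.0)
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a < 0) != (b < 0))
    return (rms, crossings / len(segment))


def matches(feature: Feature, preset: Feature, tolerance: float = 0.1) -> bool:
    """Assumed matching rule: Euclidean distance to the preset feature."""
    return math.dist(feature, preset) <= tolerance


def detect_target_segments(samples: Sequence[float], sample_rate: int,
                           preset: Feature, tolerance: float = 0.1) -> List[int]:
    """Return the indices of the segments whose features match the preset feature."""
    segments = split_into_segments(samples, sample_rate)
    return [i for i, seg in enumerate(segments)
            if matches(extract_features(seg), preset, tolerance)]


if __name__ == "__main__":
    # One second of silence followed by one second of an alternating signal,
    # at a toy sample rate of 8 samples per second.
    audio = [0.0] * 8 + [0.5, -0.5] * 4
    print(detect_target_segments(audio, 8, preset=(0.5, 0.875)))
```

The matched segment indices stand in for the target audio data; splicing the corresponding segments back together gives the target audio data described above.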
S102: acquire, from the audio and video data to be processed, the target audio and video data corresponding to the target audio data.
In the embodiments of the present invention, the audio data, the video data and the audio and video data all carry timestamps, where a timestamp is a character string that uniquely identifies a moment in time. Because the audio data and the video data in the audio and video data must be played synchronously, the timestamps of the audio data, of the video data and of the audio and video data all correspond to the same time reference line, so that the audio data and the video data can be played in sync; that is, when the terminal outputs the audio and video data for playback, the output audio and video are synchronized. The terminal can therefore use the timestamps of the target audio data to find, in the audio and video data, the audio and video data with the corresponding timestamps, and set it as the target audio and video data.
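The timestamp lookup in S102 can be sketched like this. The patent describes timestamps as character strings on a shared time reference line; for simplicity this sketch assumes they have already been parsed into seconds as floats, and that the matched audio is given as indices of fixed-length segments. Both function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def segments_to_intervals(segment_indices: List[int],
                          seconds: float = 1.0) -> List[Interval]:
    """Merge consecutive matched audio segments into [start, end) time intervals."""
    intervals: List[Interval] = []
    for idx in sorted(segment_indices):
        start, end = idx * seconds, (idx + 1) * seconds
        if intervals and abs(intervals[-1][1] - start) < 1e-9:
            # This segment continues the previous interval, so extend it.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


def select_frames(frame_timestamps: List[float],
                  intervals: List[Interval]) -> List[int]:
    """Return the indices of the video frames whose timestamps fall in any interval."""
    return [i for i, t in enumerate(frame_timestamps)
            if any(start <= t < end for start, end in intervals)]


if __name__ == "__main__":
    intervals = segments_to_intervals([1, 2, 5])
    print(intervals)
    print(select_frames([0.0, 0.5, 1.0, 1.5, 2.5, 3.0, 5.5], intervals))
```

Because audio and video share one time reference, the same intervals select both the target audio and the matching video frames.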
S103: extract a picture from the target audio and video data to obtain a target picture.
In the embodiments of the present invention, the target audio and video data may include the target audio data and target video data, and the terminal can extract the target video data contained in the target audio and video data.
After obtaining the target video data, the terminal can extract at least one picture at at least one preset position in the target video data. The at least one position may be the start position, the midpoint position and the end position of the target video data; it may also be another position, which the user can set. For example, when the positions preset in the terminal are the start position, the midpoint position and the end position, the terminal extracts one picture at each of these positions in the target video data and saves or outputs the pictures as target pictures.
Further, after obtaining the target video data, the terminal can segment the target video data by shot to obtain the video data of each shot, and extract pictures from the video data of each shot to obtain the target pictures. The terminal can extract at least one picture at at least one preset position in the video data of each shot. The at least one position may be any one or more of the start position, the midpoint position and the end position of the video data of each shot; it may also be another position, which the user can set. For example, when the positions preset in the terminal are the start position, the midpoint position and the end position, the terminal extracts one picture at each of these positions in the video data of each shot, and saves and outputs the pictures as target pictures.
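As a minimal sketch of the preset-position extraction just described, the following function maps the start, midpoint and end positions to frame indices. Treating positions as frame indices is an assumption (the patent does not say how positions are measured), and the function name is hypothetical.

```python
from typing import List


def preset_frame_indices(frame_count: int) -> List[int]:
    """Return the frame indices at the start, midpoint and end positions.

    Duplicates are removed so that very short clips do not yield the
    same picture more than once.
    """
    if frame_count <= 0:
        return []
    picked: List[int] = []
    for idx in (0, frame_count // 2, frame_count - 1):
        if idx not in picked:
            picked.append(idx)
    return picked


if __name__ == "__main__":
    print(preset_frame_indices(101))
```

A user-defined position could be supported by passing extra fractions of the clip length, which this sketch leaves out.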
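The shot-based variant can be sketched as below. The patent does not describe how shots are detected, so this sketch assumes each frame has been reduced to a single number (for example its mean brightness) and declares a cut wherever that number jumps by more than `cut_threshold`; the signature values, threshold and function names are all illustrative.

```python
from typing import List, Tuple

Shot = Tuple[int, int]  # (first frame index, last frame index), inclusive


def split_into_shots(frame_signatures: List[float],
                     cut_threshold: float = 0.3) -> List[Shot]:
    """Split frames into shots at large jumps between consecutive signatures."""
    if not frame_signatures:
        return []
    shots: List[Shot] = []
    start = 0
    for i in range(1, len(frame_signatures)):
        if abs(frame_signatures[i] - frame_signatures[i - 1]) > cut_threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frame_signatures) - 1))
    return shots


def pick_per_shot(shots: List[Shot]) -> List[int]:
    """Pick the start, midpoint and end frame of every shot, without duplicates."""
    picked: List[int] = []
    for first, last in shots:
        for idx in (first, (first + last) // 2, last):
            if idx not in picked:
                picked.append(idx)
    return picked


if __name__ == "__main__":
    signatures = [0.1, 0.1, 0.12, 0.9, 0.88, 0.9, 0.91]
    shots = split_into_shots(signatures)
    print(shots, pick_per_shot(shots))
```

Extracting per shot rather than per clip means every scene change contributes at least one picture.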
Further, the terminal may also treat the pictures extracted above as candidate pictures (extracted pictures that are still to be processed). That is, the terminal extracts pictures from the video data of each shot to obtain at least one candidate picture, counts the candidate pictures obtained, and performs the corresponding step according to that count. Specifically, when the terminal obtains only one candidate picture, it sets that candidate picture as the target picture; when the terminal obtains at least two candidate pictures, it filters all of the candidate pictures obtained to obtain the target pictures.
The filtering may proceed as follows. The terminal calculates the similarity between any two of the candidate pictures obtained; to do so, it may perform picture detection on each of the two candidate pictures and calculate the similarity of their content. After calculating the similarity of the two candidate pictures, the terminal judges whether the similarity is greater than a preset threshold. When the similarity is greater than the preset threshold, the terminal filters out either one of the two candidate pictures and sets the other one as a target picture; when the similarity is less than or equal to the preset threshold, the terminal sets both candidate pictures as target pictures. The terminal thereby obtains the target pictures. The terminal may pair up the candidate pictures obtained, so that calculating the similarity between any two candidate pictures means calculating the similarity between the candidate pictures of each pair.
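The pairwise filtering can be sketched as follows. The patent only says the terminal compares the content of two pictures, so the similarity measure here (histogram intersection over normalized histograms), the threshold value and the choice to drop the later picture of a similar pair are assumptions for illustration.

```python
from itertools import combinations
from typing import List, Sequence


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Histogram intersection; lies in [0, 1] when both histograms sum to 1."""
    return sum(min(x, y) for x, y in zip(a, b))


def filter_similar(candidates: List[Sequence[float]],
                   threshold: float = 0.9) -> List[int]:
    """Return the indices of the candidate pictures kept as target pictures.

    Every pair is compared; when a pair is more similar than the
    threshold, one picture of the pair (here the later one) is dropped.
    """
    dropped = set()
    for i, j in combinations(range(len(candidates)), 2):
        if i in dropped or j in dropped:
            continue  # a picture already filtered out needs no further comparison
        if similarity(candidates[i], candidates[j]) > threshold:
            dropped.add(j)
    return [k for k in range(len(candidates)) if k not in dropped]


if __name__ == "__main__":
    pictures = [
        [0.5, 0.5, 0.0],    # picture A
        [0.5, 0.45, 0.05],  # nearly identical to A
        [0.0, 0.1, 0.9],    # clearly different
    ]
    print(filter_similar(pictures))
```

With a single candidate there are no pairs to compare, so that candidate is kept as the target picture, which matches the one-candidate case described above.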
In the embodiments of the present invention, after obtaining a target picture, the terminal may also output the target picture, or provide it to the user for producing other material, for example using the target picture as a highlight picture in a video synopsis or an advertisement.
Further, in the embodiments of the present invention, when the terminal obtains at least two target pictures, it can splice all of the target pictures into a video to obtain a highlight video and output the highlight video. The terminal can also derive the playback time of the highlight video from the number of target pictures and play the highlight video within that playback time.
In the embodiments of the present invention, the terminal extracts the audio data from the audio and video data to be processed, acquires the preset background music feature, detects in the audio data the target audio data that matches the background music feature, acquires from the audio and video data to be processed the target audio and video data corresponding to the target audio data, and extracts a picture from the target audio and video data to obtain a target picture. The terminal can thus extract target pictures from audio and video data automatically, which improves the efficiency of extracting target pictures and reduces the extraction cost.
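How the playback time might follow from the number of target pictures can be sketched as below. The patent gives no formula, so the fixed display time per picture (`seconds_per_picture`) and the function name are assumptions.

```python
from typing import List, Tuple


def build_timeline(picture_ids: List[str],
                   seconds_per_picture: float = 2.0) -> List[Tuple[str, float, float]]:
    """Give each target picture an equal slot in the highlight video.

    Returns (picture id, start time, end time) tuples; the end time of
    the last tuple is the total playback time.
    """
    if len(picture_ids) < 2:
        raise ValueError("splicing needs at least two target pictures")
    return [(pid, i * seconds_per_picture, (i + 1) * seconds_per_picture)
            for i, pid in enumerate(picture_ids)]


if __name__ == "__main__":
    timeline = build_timeline(["a", "b", "c"])
    print(timeline, "total:", timeline[-1][2])
```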
See Fig. 2, be a kind of embodiment schematic flow sheet of a kind of terminal that the embodiment of the present invention provides.The present invention implements
A kind of terminal of example includes:
Extraction unit 100, for extracting the voice data in pending audio, video data.
Detector unit 200, for obtaining preset background music feature, detection and described background in described voice data
The target audio data that musical features matches.
Acquiring unit 300 is corresponding with described target audio data for obtaining in described pending audio, video data
Target sound video data.
Extract picture unit 400, for carrying out extraction picture from described target sound video data, it is thus achieved that target picture.
In the embodiments of the present invention, audio and video data consists of audio data and video data; the audio can be output by an audio player and the video by a video player. For example, audio and video data may be television program content with sound, or a video recording with sound on a mobile phone.
In the embodiments of the present invention, the audio and video data to be processed is the audio and video data that the user selects for processing. For example, audio and video data received by the terminal may serve as the audio and video data to be processed, or the terminal may store multiple items of audio and video data from which the user selects one as the audio and video data to be processed.
In the embodiments of the present invention, once the terminal has determined the audio and video data to be processed, the extraction unit 100 can decode that data and extract the audio data it contains.
In the embodiments of the present invention, the audio data may include several types of audio data, such as audio data of the background music type, of the narration (voice-over) type, and of the silence type.
In the embodiments of the present invention, because the target pictures in audio and video usually occur in the portions of the audio and video data that have background music, the detection unit 200 can identify the audio data that contains background music so that it can be processed to obtain the target pictures.
In the embodiments of the present invention, the detection unit 200 may identify the audio data that contains background music as follows: the detection unit 200 acquires the preset background music feature and detects, in the audio data, audio data that matches the background music feature; when such audio data is detected, it extracts that audio data and uses it as the target audio data. The preset background music feature may be preset and stored by the user.
Specifically, the detection unit 200 may detect the target audio data that matches the background music feature as follows. The detection unit 200 divides the audio data to obtain at least one segment of audio data. The division may be by time period; for example, with a time period of 1 s, each segment of audio data has a playback time of 1 s. After dividing the audio data, the detection unit 200 performs feature extraction on each segment to obtain the feature data corresponding to each segment, acquires from that feature data the target feature data that matches the background music feature, acquires the audio data corresponding to the target feature data, and sets that audio data as the target audio data. When the detection unit 200 obtains multiple items of target feature data, it can acquire the audio data corresponding to each of them and splice these pieces of audio data together to obtain the target audio data.
In the embodiments of the present invention, the audio data, the video data and the audio and video data all carry timestamps, where a timestamp is a character string that uniquely identifies a moment in time. Because the audio data and the video data in the audio and video data must be played synchronously, the timestamps of the audio data, of the video data and of the audio and video data all correspond to the same time reference line, so that the audio data and the video data can be played in sync; that is, when the terminal outputs the audio and video data for playback, the output audio and video are synchronized. The acquisition unit 300 can therefore use the timestamps of the target audio data to find, in the audio and video data, the audio and video data with the corresponding timestamps, and set it as the target audio and video data.
In the embodiments of the present invention, the target audio and video data may include the target audio data and target video data, and the picture extraction unit 400 can extract the target video data contained in the target audio and video data.
After obtaining the target video data, the picture extraction unit 400 can extract at least one picture at at least one preset position in the target video data. The at least one position can be the start position, the midpoint position, and the end position of the target video data; it can also be another position, which the user can set. Thus, when the positions preset in the terminal include the start position, the midpoint position, and the end position, the picture extraction unit 400 can extract one picture at each of these three positions in the target video data and save or output them as target pictures.
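As a minimal sketch of extraction at preset positions, assume the target video data has already been decoded into a list of frames. The function names and the position labels `"start"`, `"middle"`, and `"end"` are hypothetical; skipping a position that maps to an already chosen frame is a design choice of this sketch, not something the text requires.

```python
def preset_indices(frame_count, positions=("start", "middle", "end")):
    """Map preset position names to frame indices, skipping repeats."""
    if frame_count <= 0:
        return []
    lookup = {
        "start": 0,
        "middle": (frame_count - 1) // 2,
        "end": frame_count - 1,
    }
    indices = []
    for name in positions:
        index = lookup[name]
        if index not in indices:  # very short clips can map two names to one frame
            indices.append(index)
    return indices

def extract_pictures(frames, positions=("start", "middle", "end")):
    """Return the frames found at the preset positions."""
    return [frames[i] for i in preset_indices(len(frames), positions)]
```

User-defined positions could be supported by adding further entries to `lookup`, for example a fractional offset into the clip.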
Further, after obtaining the target video data, the picture extraction unit 400 can segment the target video data by shot to obtain the video data of each shot, and extract pictures from the video data of each shot to obtain target pictures. Specifically, the picture extraction unit 400 can extract at least one picture at each of at least one preset position in the video data of each shot. The at least one position can be any one or more of the start position, the midpoint position, and the end position in the video data of each shot; it can also be another position, which the user can set. Thus, when the positions preset in the terminal include the start position, the midpoint position, and the end position, the picture extraction unit 400 can extract one picture at each of these three positions in the video data of each shot, and save and output them as target pictures.
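The per-shot variant can be sketched as below. The patent does not say how shots are detected, so the boundary rule here (a new shot starts when the mean absolute difference between consecutive frames exceeds `cut_threshold`) is purely an assumption, as are the function names and the representation of a frame as a non-empty list of pixel intensities.

```python
def mean_abs_diff(a, b):
    """Average absolute pixel difference between two equal-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def split_into_shots(frames, cut_threshold=50.0):
    """Start a new shot whenever consecutive frames differ by more than cut_threshold."""
    shots = []
    for frame in frames:
        if shots and mean_abs_diff(shots[-1][-1], frame) <= cut_threshold:
            shots[-1].append(frame)
        else:
            shots.append([frame])
    return shots

def pictures_per_shot(frames, cut_threshold=50.0):
    """Take the start, midpoint, and end frame of every shot as candidates."""
    candidates = []
    for shot in split_into_shots(frames, cut_threshold):
        last = len(shot) - 1
        for index in sorted({0, last // 2, last}):
            candidates.append(shot[index])
    return candidates
```

Using a set of indices means a one-frame or two-frame shot contributes each of its frames only once.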
Further, the picture extraction unit 400 can also treat the pictures extracted above as pending extracted pictures; that is, the picture extraction unit 400 extracts pictures from the video data of each shot, counts the pending extracted pictures obtained, and performs the corresponding step according to that count. Specifically, when the picture extraction unit 400 obtains only one pending extracted picture, it sets that picture as the target picture; when it obtains at least two pending extracted pictures, it can filter all of the pending extracted pictures obtained to get the target pictures.
The filtering can proceed as follows. The picture extraction unit 400 calculates the similarity between any two of the pending extracted pictures obtained; to do so, the terminal can perform picture detection on each of the two pictures and calculate the similarity of their content. The picture extraction unit 400 then judges whether the similarity is greater than a preset threshold. When the similarity is greater than the preset threshold, the picture extraction unit 400 can filter out either one of the two pending extracted pictures and set the other as a target picture. When the similarity is less than or equal to the preset threshold, the picture extraction unit 400 can set both pending extracted pictures as target pictures. In this way the picture extraction unit 400 obtains the target pictures. To cover every pair, the picture extraction unit 400 can combine the pending extracted pictures two by two and calculate the similarity between the pictures in each combination.
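The pairwise filtering can be illustrated with the following sketch. The similarity measure is not specified in the text, so `similarity` below (one minus the normalized mean absolute difference of grayscale pixels in the range 0 to 255) is a stand-in, and the threshold value of 0.9 is hypothetical. The sketch drops the later picture of a too-similar pair and skips pictures that were already dropped, which is one possible reading of "filter out either one".

```python
from itertools import combinations

def similarity(a, b):
    """Hypothetical content similarity in [0, 1] for equal-sized grayscale pictures."""
    diff = sum(abs(x - y) for x, y in zip(a, b)) / (255 * len(a))
    return 1.0 - diff

def filter_pictures(candidates, threshold=0.9):
    """Return target pictures: of each pair more similar than threshold, keep one."""
    if len(candidates) == 1:
        return list(candidates)  # a single pending picture is the target picture
    dropped = set()
    for i, j in combinations(range(len(candidates)), 2):
        if i in dropped or j in dropped:
            continue
        if similarity(candidates[i], candidates[j]) > threshold:
            dropped.add(j)  # filter one picture of the pair, keep the other
    return [p for k, p in enumerate(candidates) if k not in dropped]
```

With n pending pictures this compares at most n(n-1)/2 pairs, which matches the two-by-two combination described in the text.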
In this embodiment of the present invention, after the picture extraction unit 400 obtains the target pictures, the terminal can also output them, or provide them to the user for producing other material, for example using the target pictures as highlight pictures to make a video synopsis or an advertisement.
Further, in this embodiment of the present invention, when the picture extraction unit 400 obtains at least two target pictures, the terminal can splice all of the target pictures into a highlight video and output it. The terminal can also obtain the playback time of the highlight video from the number of target pictures and play the video within that playback time.
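One simple way to derive the playback time from the number of target pictures is to show each picture for a fixed duration. The text does not state the rule, so the `seconds_per_picture` parameter and the function name below are assumptions made for this sketch.

```python
def build_highlight(pictures, seconds_per_picture=2.0):
    """Return a timeline of (start_time, picture) pairs and the total playback time."""
    if len(pictures) < 2:
        raise ValueError("splicing needs at least two target pictures")
    timeline = [(i * seconds_per_picture, p) for i, p in enumerate(pictures)]
    return timeline, len(pictures) * seconds_per_picture
```

Under this rule the playback time grows linearly with the number of target pictures; an actual encoder would then render each picture for its slot on the timeline.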
The detection unit 200 includes:
a feature obtaining subunit, configured to obtain a preset background music feature;
a first dividing subunit, configured to divide the audio data to obtain at least one audio segment;
a first extraction subunit, configured to perform feature extraction on each audio segment to obtain the feature data corresponding to each segment;
an obtaining subunit, configured to obtain, from the feature data corresponding to each audio segment, the target feature data that match the background music feature;
a first setting subunit, configured to obtain the audio data corresponding to the target feature data and set that audio data as the target audio data.
The picture extraction unit 400 includes:
a second extraction subunit, configured to extract the target video data from the target audio-video data;
a second dividing subunit, configured to divide the target video data into shots to obtain the video data of each shot;
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one target picture.
The third extraction subunit includes:
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one pending extracted picture;
a second setting subunit, configured to set the pending extracted picture as the target picture when only one pending extracted picture is obtained;
a processing subunit, configured to filter the at least two pending extracted pictures when at least two pending extracted pictures are obtained, to obtain the at least one target picture.
The processing subunit, which performs the filtering, includes:
a calculation subunit, configured to calculate the similarity between any two of the at least two pending extracted pictures;
a judging subunit, configured to judge whether the similarity is greater than a preset threshold;
a filtering subunit, configured to, when the judging subunit judges that the similarity is greater than the preset threshold, filter out either one of the two pending extracted pictures and set the other of the two pending extracted pictures as the target picture;
a third setting subunit, configured to set both of the two pending extracted pictures as target pictures when the judging subunit judges that the similarity is less than the preset threshold.
The terminal also includes:
a splicing unit, configured to splice at least two target pictures into a video, obtain a highlight video, and output the highlight video.
It can be understood that the functions of the functional modules and units in the terminal of this embodiment can be implemented according to the methods in the above method embodiments. For the specific implementation process, refer to the related description of the above method embodiments; it is not repeated here.
In this embodiment of the present invention, the terminal extracts the audio data from the pending audio-video data, obtains a preset background music feature, detects in the audio data the target audio data that match the background music feature, obtains from the pending audio-video data the target audio-video data corresponding to the target audio data, and extracts pictures from the target audio-video data to obtain target pictures. This allows the terminal to extract target pictures from audio-video data automatically, which can improve the efficiency of extracting target pictures from audio-video data and reduce the extraction cost.
Refer to Fig. 3, which is a schematic diagram of another embodiment of a terminal of the present invention. As shown in Fig. 3, the terminal described in this embodiment includes:
a housing 301, a processor 302, a memory 303, a circuit board 307, and a power circuit 305, where the circuit board 307 is placed inside the space enclosed by the housing 301, and the processor 302 and the memory 303 are arranged on the circuit board 307; the power circuit 305 is configured to supply power to each circuit or device of the terminal; the memory 303 is configured to store executable program code; the processor 302 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 303, so as to perform the following steps:
extracting the audio data from the pending audio-video data;
obtaining a preset background music feature, and detecting, in the audio data, the target audio data that match the background music feature;
obtaining, from the pending audio-video data, the target audio-video data corresponding to the target audio data;
extracting pictures from the target audio-video data to obtain a target picture.
The processor 302 obtaining the preset background music feature and detecting, in the audio data, the target audio data that match the background music feature includes:
obtaining the preset background music feature;
dividing the audio data to obtain at least one audio segment;
performing feature extraction on each audio segment to obtain the feature data corresponding to each segment;
obtaining, from the feature data corresponding to each audio segment, the target feature data that match the background music feature;
obtaining the audio data corresponding to the target feature data, and setting that audio data as the target audio data.
The processor 302 extracting pictures from the target audio-video data to obtain the target picture includes:
extracting the target video data from the target audio-video data;
dividing the target video data into shots to obtain the video data of each shot;
extracting pictures from the video data of each shot to obtain at least one target picture.
The processor 302 extracting pictures from the video data of each shot to obtain at least one target picture includes:
extracting pictures from the video data of each shot to obtain at least one pending extracted picture;
when only one pending extracted picture is obtained, setting the pending extracted picture as the target picture;
when at least two pending extracted pictures are obtained, filtering the at least two pending extracted pictures to obtain the at least one target picture.
The processor 302 filtering the at least two pending extracted pictures to obtain the at least one target picture includes:
calculating the similarity between any two of the at least two pending extracted pictures;
judging whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two pending extracted pictures, and setting the other of the two pending extracted pictures as the target picture;
when the similarity is less than the preset threshold, setting both of the two pending extracted pictures as target pictures.
After the processor 302 extracts pictures from the target audio-video data and obtains the target picture, the processor 302 also performs:
splicing at least two target pictures into a video, obtaining a highlight video, and outputting the highlight video.
It can be understood that the functions of the functional modules of the terminal of this embodiment can be implemented according to the methods in the above method embodiments. For the specific implementation process, refer to the related description of the above method embodiments; it is not repeated here.
In this embodiment of the present invention, the terminal extracts the audio data from the pending audio-video data, obtains a preset background music feature, detects in the audio data the target audio data that match the background music feature, obtains from the pending audio-video data the target audio-video data corresponding to the target audio data, and extracts pictures from the target audio-video data to obtain target pictures. This allows the terminal to extract target pictures from audio-video data automatically, which can improve the efficiency of extracting target pictures from audio-video data and reduce the extraction cost.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the embodiments of the above methods. The storage medium can be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
What is disclosed above is merely the preferred embodiments of the present invention and certainly cannot be used to limit the scope of the rights of the present invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.
Claims (10)
1. A picture extraction method, characterized in that the method comprises:
extracting audio data from pending audio-video data;
obtaining a preset background music feature, and detecting, in the audio data, target audio data that match the background music feature;
obtaining, from the pending audio-video data, target audio-video data corresponding to the target audio data;
extracting pictures from the target audio-video data to obtain a target picture.
2. The method according to claim 1, characterized in that the obtaining a preset background music feature and detecting, in the audio data, target audio data that match the background music feature comprises:
obtaining the preset background music feature;
dividing the audio data to obtain at least one audio segment;
performing feature extraction on each audio segment to obtain the feature data corresponding to each segment;
obtaining, from the feature data corresponding to each audio segment, target feature data that match the background music feature;
obtaining the audio data corresponding to the target feature data, and setting that audio data as the target audio data.
3. The method according to claim 1, characterized in that the extracting pictures from the target audio-video data to obtain a target picture comprises:
extracting target video data from the target audio-video data;
dividing the target video data into shots to obtain the video data of each shot;
extracting pictures from the video data of each shot to obtain at least one target picture.
4. The method according to claim 3, characterized in that the extracting pictures from the video data of each shot to obtain at least one target picture comprises:
extracting pictures from the video data of each shot to obtain at least one pending extracted picture;
when only one pending extracted picture is obtained, setting the pending extracted picture as the target picture;
when at least two pending extracted pictures are obtained, filtering the at least two pending extracted pictures to obtain the at least one target picture.
5. The method according to claim 4, characterized in that the filtering the at least two pending extracted pictures to obtain the at least one target picture comprises:
calculating the similarity between any two of the at least two pending extracted pictures;
judging whether the similarity is greater than a preset threshold;
when the similarity is greater than the preset threshold, filtering out either one of the two pending extracted pictures, and setting the other of the two pending extracted pictures as the target picture;
when the similarity is less than the preset threshold, setting both of the two pending extracted pictures as target pictures.
6. The method according to claim 3, characterized in that, after the extracting pictures from the target audio-video data to obtain a target picture, the method further comprises:
splicing at least two target pictures into a video, obtaining a highlight video, and outputting the highlight video.
7. A terminal, characterized in that the terminal comprises:
an extraction unit, configured to extract audio data from pending audio-video data;
a detection unit, configured to obtain a preset background music feature and detect, in the audio data, target audio data that match the background music feature;
an acquiring unit, configured to obtain, from the pending audio-video data, target audio-video data corresponding to the target audio data;
a picture extraction unit, configured to extract pictures from the target audio-video data to obtain a target picture.
8. The terminal according to claim 7, characterized in that the detection unit comprises:
a feature obtaining subunit, configured to obtain the preset background music feature;
a first dividing subunit, configured to divide the audio data to obtain at least one audio segment;
a first extraction subunit, configured to perform feature extraction on each audio segment to obtain the feature data corresponding to each segment;
an obtaining subunit, configured to obtain, from the feature data corresponding to each audio segment, target feature data that match the background music feature;
a first setting subunit, configured to obtain the audio data corresponding to the target feature data and set that audio data as the target audio data.
9. The terminal according to claim 7, characterized in that the picture extraction unit comprises:
a second extraction subunit, configured to extract target video data from the target audio-video data;
a second dividing subunit, configured to divide the target video data into shots to obtain the video data of each shot;
a third extraction subunit, configured to extract pictures from the video data of each shot to obtain at least one target picture.
10. A terminal, characterized by comprising: a housing, a processor, a memory, a circuit board, and a power circuit, where the circuit board is placed inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power circuit is configured to supply power to each circuit or device of the mobile terminal; the memory is configured to store executable program code; the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
extracting audio data from pending audio-video data;
obtaining a preset background music feature, and detecting, in the audio data, target audio data that match the background music feature;
obtaining, from the pending audio-video data, target audio-video data corresponding to the target audio data;
extracting pictures from the target audio-video data to obtain a target picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610592540.2A CN106210878A (en) | 2016-07-25 | 2016-07-25 | Picture extraction method and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106210878A (en) | 2016-12-07 |
Family
ID=57495129
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610592540.2A Pending CN106210878A (en) | 2016-07-25 | 2016-07-25 | Picture extraction method and terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106210878A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107135419A (en) * | 2017-06-14 | 2017-09-05 | 北京奇虎科技有限公司 | A kind of method and apparatus for editing video |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1658663A (en) * | 2004-02-18 | 2005-08-24 | 三星电子株式会社 | Method and apparatus for summarizing a plurality of frames |
US20070168864A1 (en) * | 2006-01-11 | 2007-07-19 | Koji Yamamoto | Video summarization apparatus and method |
CN101431689A (en) * | 2007-11-05 | 2009-05-13 | 华为技术有限公司 | Method and device for generating video abstract |
US20100104261A1 (en) * | 2008-10-24 | 2010-04-29 | Zhu Liu | Brief and high-interest video summary generation |
CN104867161A (en) * | 2015-05-14 | 2015-08-26 | 国家电网公司 | Video-processing method and device |
CN105721955A (en) * | 2016-01-20 | 2016-06-29 | 天津大学 | Video key frame selecting method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20161207 |