CN105760488A - Image expressing method and device based on multi-level feature fusion - Google Patents

Image expressing method and device based on multi-level feature fusion

Info

Publication number
CN105760488A
CN105760488A (application CN201610089958.1A)
Authority
CN
China
Prior art keywords
input image
feature
said input
level feature
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610089958.1A
Other languages
Chinese (zh)
Other versions
CN105760488B (en)
Inventor
田永鸿
鄢科
梁大为
王耀威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201610089958.1A priority Critical patent/CN105760488B/en
Publication of CN105760488A publication Critical patent/CN105760488A/en
Application granted granted Critical
Publication of CN105760488B publication Critical patent/CN105760488B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5838Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

An embodiment of the invention provides an image expression method and device based on multi-level feature fusion. The method includes: obtaining at least two features of an input image, the features being a combination of at least two of a scene-level feature, an object-level feature, and a point-level feature; fusing the features into a feature space of the input image to serve as the expression of the input image; and processing the input image according to its expression. The expressive power of the image can thereby be improved.

Description

Image expression method and apparatus based on multi-level feature fusion
Technical field
The present invention relates to the field of computer vision, and in particular to an image expression method and apparatus based on multi-level feature fusion.
Background technology
With the rise of the mobile Internet and the rapid spread of cameras in terminal devices, it has become increasingly convenient for people to record images, and image data is growing exponentially. How to accurately and reasonably express the information in an image is the key to many computer vision tasks, such as image retrieval and image classification. Over the past decade, the Scale-Invariant Feature Transform (SIFT) has been widely used in many areas of computer vision; it has good geometric invariance to rotation, scale changes, and so on. To let SIFT express the semantic information of an image more effectively, models such as Bag-of-Words (BoW), the Fisher Vector, and the Vector of Locally Aggregated Descriptors (VLAD) have been proposed. However, these methods do not solve the "semantic gap" well.
In recent years, the convolutional neural network (CNN), as a representative of deep learning, has significantly surpassed traditional methods in many visual tasks such as image classification and object detection, and can better express the high-level semantic information of an image. Most current CNN-based image expressions feed the full image into a trained network to obtain a feature vector of fixed dimension that expresses the image. Although CNN expressions carry relatively rich semantic information, they are rather sensitive to some geometric transformations, and their performance is strongly affected by the training data; they therefore cannot perform well in tasks where training data is lacking or scarce.
Existing feature descriptions are mainly based on SIFT or CNN expressions and generally describe only the global information of the image, i.e., the feature distribution of the whole image. If an object occupies only a small proportion of the image, it is hard for such features to describe that object. Obviously, this greatly affects related applications such as object retrieval and classification.
Summary of the invention
Embodiments of the present invention provide an image expression method and apparatus based on multi-level feature fusion, which can improve the expressive power for images.
To achieve these goals, the present invention adopts the following technical solutions.
An image expression method based on multi-level feature fusion, comprising:
obtaining at least two features of an input image, the at least two features being a combination of at least two of a scene-level feature, an object-level feature, and a point-level feature;
fusing the at least two features into a feature space of the input image, as the expression of the input image;
processing the input image according to the expression of the input image.
An image expression apparatus based on multi-level feature fusion, comprising:
an acquiring unit, which obtains at least two features of an input image, the at least two features being a combination of at least two of a scene-level feature, an object-level feature, and a point-level feature;
an aggregation unit, which fuses the at least two features into a feature space of the input image, as the expression of the input image;
a processing unit, which processes the input image according to its expression.
The present invention proposes an image expression method and apparatus based on multi-level feature fusion; the fused feature has strong expressive power, and its performance remains stable even at relatively low dimensionality.
Additional aspects and advantages of the present invention will be given in part in the following description; they will become apparent from the description, or can be learned through practice of the present invention.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an image expression method based on multi-level feature fusion provided by Embodiment 1 of the present invention;
Fig. 2 is a block diagram of the image expression method based on multi-level feature fusion provided by another embodiment of the present invention;
Fig. 3 is a flowchart of the object-level feature expression in the image expression method of an embodiment of the present invention;
Fig. 4 is a schematic diagram of the image expression method based on multi-level feature fusion for one image;
Fig. 5 is a flowchart of retrieval based on the image expression method of the present invention;
Fig. 6 is a flowchart of classification based on the image expression method of the present invention;
Fig. 7 is a connection diagram of an image expression apparatus based on multi-level feature fusion provided by an embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the drawings, where the same or similar reference numbers throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are illustrative, are only used to explain the present invention, and shall not be construed as limiting the claims.
Those skilled in the art will appreciate that, unless expressly stated otherwise, the singular forms "a", "an", "the" and "said" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the description of the present invention refers to the presence of said features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intermediate elements may also be present. In addition, "connected" or "coupled" as used herein may include wireless connection or coupling. The word "and/or" as used herein includes any unit of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will appreciate that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in common dictionaries should be understood to have meanings consistent with their meaning in the context of the prior art and, unless defined as here, will not be interpreted in an idealized or overly formal sense.
To facilitate understanding of the embodiments of the present invention, several specific embodiments are further explained below with reference to the drawings; these embodiments do not constitute a limitation of the embodiments of the present invention.
As shown in Fig. 1, an embodiment of the image expression method based on multi-level feature fusion comprises:
Step 11: obtaining at least two features of an input image; the at least two features are a combination of at least two of a scene-level feature, an object-level feature, and a point-level feature;
Step 12: fusing the at least two features into a feature space of the input image, as the expression of the input image;
Step 13: processing the input image according to the expression of the input image, where the processing can be a concrete application carried out according to the expression of the input image.
For example, step 13 includes:
classifying the input image according to its feature space; or
retrieving the input image according to its feature space.
Alternatively, step 13 includes:
Step 131: post-processing the expression of the input image;
Step 132: processing the input image according to the post-processed expression.
Step 131 includes:
performing power normalization, normalization, dimensionality reduction, or whitening on the feature space of the input image.
Step 12 includes:
fusing the at least two features with equal or different weights, so as to fuse them into the feature space of the input image.
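The equal-weight or weighted fusion just described can be sketched as follows in NumPy; the function name `fuse_features`, the per-level L2 normalization, and the example weights are assumptions of this sketch, not specifics of the patent:

```python
import numpy as np

def fuse_features(features, weights=None):
    """Fuse the level features by L2-normalizing each one and concatenating,
    optionally scaling each level by a weight (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(features)  # equal-weight fusion
    parts = []
    for f, w in zip(features, weights):
        f = np.asarray(f, dtype=float)
        norm = np.linalg.norm(f)
        if norm > 0:
            f = f / norm  # normalize each level before fusion
        parts.append(w * f)
    return np.concatenate(parts)

# Fuse a scene-level, object-level, and point-level feature with equal weights.
f_s, f_o, f_p = np.ones(4), np.arange(4.0), np.array([0.0, 1.0, 0.0, 1.0])
fused = fuse_features([f_s, f_o, f_p])
print(fused.shape)  # (12,)
```

The fused vector has the summed dimension of the three normalized parts, so every image ends up with a feature of the same fixed size.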
The obtaining step of the scene-level feature includes:
using deep-learning and/or hand-crafted features to obtain information of the input image at the full-image level, as the scene-level feature of the input image.
The obtaining step of the object-level feature includes:
generating object-region rectangular boxes of the input image;
obtaining the deep-learning and/or hand-crafted features of each object-region box;
aggregating the deep-learning and/or hand-crafted features of all the object-region boxes, as the object-level feature of the input image.
The step of aggregating the deep-learning and/or hand-crafted features of the object-region boxes as the object-level feature of the input image includes:
aggregating the deep-learning and/or hand-crafted features of each object-region box;
post-processing the aggregated features, as the object-level feature of the input image, where the post-processing can be normalization or dimensionality reduction.
The obtaining step of the point-level feature of the input image includes:
obtaining features of the input image based on scale-space extreme-point detection;
aggregating the features based on scale-space extreme-point detection into a feature of fixed dimension, as the point-level feature of the input image.
The present invention can express the features of an image at multiple levels simultaneously and can well overcome the defects of conventional image expressions: it describes the information of the scene, object, and point levels at the same time and further produces a compact description, thereby substantially improving the accuracy of tasks based on image expression.
Another embodiment of the present invention is described in detail below.
As shown in Fig. 2, the method comprises the following steps:
Step 1, scene-level feature expression: obtain an abstract description of the full image;
Step 2, object-level feature expression: for object regions or other important regions, obtain the feature description of each region, then aggregate the expressions of the regions;
Step 3, point-level feature expression: obtain the description of the image based on scale-space extreme points;
Step 4, multi-level feature fusion: using feature fusion and post-processing methods, fuse the multi-level features into one feature space.
Each of the above steps is described in detail below.
Step 1, scene-level feature expression
The scene-level feature of an image is a feature, produced after considering the global information of the image, that can express its global abstraction. The scene-level feature is related to the semantics of the image as a whole, and every image has a scene-level feature of the same dimension. The obtaining step includes: for the input image, obtain its abstract scene-level feature; deep-learning and/or hand-crafted features can be used to describe the input image at the full-image level.
Step 2, object-level feature expression
Object-level feature expression refers to describing the features of the object regions or other important regions in the image. The extraction step includes: for the input image, obtain the features of all its object regions or other important regions.
As shown in Fig. 3, its steps include:
a) Region selection: use an object-proposal method, or manually select several rectangular regions, to mark the objects or important regions in the image;
b) Region feature extraction: obtain the deep-learning and/or hand-crafted features of each selected region;
c) Region feature aggregation: aggregate the features of all selected regions;
d) Region feature post-processing: post-process the aggregated region features.
Step 3, point-level feature expression
Point-level feature expression refers to describing the image based on key points, which gives the image expression good geometric invariance while keeping a relatively stable expression effect. The obtaining step includes: for the input image, obtain its features based on scale-space extreme points. It specifically includes the following steps:
a) Extreme-point feature description: obtain the feature descriptions of the image detected at the extreme points;
b) Point-level feature aggregation: aggregate the features of step a) into a feature of fixed dimension;
c) Point-level feature post-processing: post-process the aggregated feature.
Step 4, multi-level feature fusion
Multi-level feature fusion refers to fusing the multi-level features of an image into one feature space, so that every image can express its multi-level features through a feature of the same dimension. The fusion can combine the features of any two levels or of all three levels. The fusion step includes using feature fusion and post-processing methods to fuse the multi-level features into one feature space.
During feature fusion, the features of different levels can be fused with equal or different weights. The fused feature can be post-processed to produce a compact image expression.
Fig. 4 is a schematic diagram of the image expression based on multi-level feature fusion for one image.
The present invention can be applied to many computer vision tasks. This is illustrated below with an image retrieval embodiment and an image classification embodiment, both based on multi-level feature fusion.
The following embodiment is image retrieval based on multi-level feature fusion. Fig. 5 is the flowchart of the image retrieval based on multi-level feature fusion of this embodiment of the present invention. The embodiment comprises the following steps:
Step 51, scene-level feature expression: this example uses the fully connected layer of a convolutional neural network to express the scene-level information. Network architectures that can be used to extract scene-level features include GoogLeNet, AlexNet, VGGNet, etc.
Step 52, object-level feature expression: the object-level feature of an image describes the objects that appear in the image. The main steps are as follows:
(1) Object proposals: use an object-proposal method to detect object proposal regions in the image; available methods include EdgeBoxes, Selective Search, BING, etc.
(2) Region screening: use relevant prior knowledge to screen and rank the candidate regions, obtaining the candidate regions needed;
(3) Object-region feature extraction: feed each object region into the convolutional neural network and extract the features of its fully connected layer;
(4) Object-level feature aggregation: aggregate the features of the object regions into a feature of the same dimension. Available aggregation methods include Sum (Average) aggregation, Max aggregation, VLAD aggregation and their variants; these object-level aggregations are described below:
(a) Sum aggregation sums the features of all objects dimension by dimension:

$f_o = \left[\sum_{n=1}^{N} f_{o_n}(1),\ \ldots,\ \sum_{n=1}^{N} f_{o_n}(D)\right]$

where $f_o$ is the aggregated feature, $f_{o_n}(i)$ denotes the $i$-th dimension of the feature vector of the $n$-th object region, $N$ is the number of object regions, and $D$ is the dimension of each object feature. Average aggregation replaces $\sum_{n=1}^{N} f_{o_n}(i)$ with $\frac{1}{N}\sum_{n=1}^{N} f_{o_n}(i)$.
(b) Max aggregation takes, in each dimension, the maximum over all objects:

$f_o = \left[\max\{f_{o_n}(1)\}_{n=1}^{N},\ \ldots,\ \max\{f_{o_n}(D)\}_{n=1}^{N}\right]$

where $f_o$ is the aggregated feature, $f_{o_n}(i)$ denotes the $i$-th dimension of the feature vector of the $n$-th object region, $N$ is the number of object regions, and $D$ is the dimension of each object feature.
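Under the definitions above, Sum/Average and Max aggregation amount to column-wise reductions over the $N \times D$ matrix of object features. A small NumPy sketch (the function names are illustrative):

```python
import numpy as np

def sum_aggregate(object_feats, average=False):
    """Sum (or average) the N object features dimension by dimension."""
    F = np.asarray(object_feats, dtype=float)  # shape (N, D)
    return F.mean(axis=0) if average else F.sum(axis=0)

def max_aggregate(object_feats):
    """Take the maximum over all objects in each dimension."""
    return np.asarray(object_feats, dtype=float).max(axis=0)

feats = [[1.0, 5.0], [3.0, 1.0]]          # N=2 object regions, D=2
print(sum_aggregate(feats))                # [4. 6.]
print(sum_aggregate(feats, average=True))  # [2. 3.]
print(max_aggregate(feats))                # [3. 5.]
```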
(c) VLAD aggregation is a more complex aggregation. First, $k$ cluster centers $c_1, c_2, \ldots, c_k$ are obtained with the k-means algorithm; each object feature $f_{o_i}$ is assigned to its nearest cluster center $NC(f_{o_i})$, and its residual with respect to that center is retained. VLAD is the sum of the residuals retained at each cluster center:

$F_v = \left[\sum_{NC(f_{o_i})=c_1} (f_{o_i} - c_1),\ \ldots,\ \sum_{NC(f_{o_i})=c_k} (f_{o_i} - c_k)\right]$

where $\sum_{NC(f_{o_i})=c_t}$ accumulates over all feature vectors whose nearest center is $c_t$. The object-level feature expressed with VLAD therefore has dimension $k \times D$, which is rather high, so it can afterwards be normalized and reduced in dimension to obtain $f_o$.
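The VLAD formula above can be sketched as follows, assuming the $k$ cluster centers have already been obtained (the k-means step is omitted); the function name and toy data are illustrative:

```python
import numpy as np

def vlad_aggregate(feats, centers):
    """VLAD: for each cluster center, sum the residuals of the features
    whose nearest center it is; the output has dimension k*D."""
    feats = np.asarray(feats, dtype=float)      # (N, D)
    centers = np.asarray(centers, dtype=float)  # (k, D)
    k, D = centers.shape
    v = np.zeros((k, D))
    # assign each feature to its nearest center and accumulate its residual
    for f in feats:
        i = int(np.argmin(((centers - f) ** 2).sum(axis=1)))
        v[i] += f - centers[i]
    return v.ravel()  # flatten to a k*D vector

centers = [[0.0, 0.0], [10.0, 10.0]]  # k=2 precomputed centers, D=2
feats = [[1.0, 1.0], [9.0, 11.0]]
print(vlad_aggregate(feats, centers))  # [ 1.  1. -1.  1.]
```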
Step 53, point-level feature expression.
This example performs point-level feature expression based on SIFT. Specifically, the SIFT keypoint descriptors of the image are first extracted; these keypoints are then aggregated to obtain an expression $f_p$ of fixed dimension. Available aggregation methods include Fisher Vector, VLAD, etc.; PCA can afterwards be used as post-processing to reduce the feature dimension.
Step 54, feature fusion.
The feature fusion module of this example fuses the features of all three levels with equal weights, so that every image produces a feature of fixed dimension that expresses the information of the three levels simultaneously. Before fusion, the three expressions are brought to the same dimension. During fusion, the features of the three parts are first normalized separately and then concatenated:

$f = [f_s, f_o, f_p]$

where $f_s$ is the scene-level feature, $f_o$ the object-level feature, $f_p$ the point-level feature, and $f$ the concatenated feature. $f$ then undergoes dimensionality reduction and whitening: principal component analysis can be used to reduce the dimension of $f$, followed by whitening:

$f_{whiten} = \mathrm{diag}\left(1/\sqrt{v_1},\ 1/\sqrt{v_2},\ \ldots,\ 1/\sqrt{v_h}\right) \cdot U \cdot f$

where $h$ is the feature dimension retained after the principal component analysis, $v_i$ is the $i$-th singular value, and $U$ is the transformation matrix of the principal component analysis. After whitening, normalization is performed again to obtain the final expression. The compact final expression thus obtained describes the information of the three levels of the image simultaneously.
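The PCA reduction, whitening by $\mathrm{diag}(1/\sqrt{v_i}) \cdot U$, and re-normalization described above can be sketched as follows; fitting PCA via the eigendecomposition of the covariance matrix, and all function names and toy data, are assumptions of this sketch:

```python
import numpy as np

def pca_whiten_fit(X, h):
    """Fit PCA on the rows of X, keeping h components.
    Returns (U, v): transform matrix and its variances, to be used as
    f_whiten = diag(1/sqrt(v)) @ U @ f."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(X)            # covariance matrix
    vals, vecs = np.linalg.eigh(cov)
    order = np.argsort(vals)[::-1][:h]  # top-h eigenvalues
    return vecs[:, order].T, vals[order]

def pca_whiten_apply(f, U, v, eps=1e-12):
    """Project f with U, scale each dimension by 1/sqrt(v_i), re-normalize."""
    g = np.diag(1.0 / np.sqrt(v + eps)) @ U @ np.asarray(f, dtype=float)
    n = np.linalg.norm(g)
    return g / n if n > 0 else g

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))  # 100 concatenated features f, dimension 6
U, v = pca_whiten_fit(X, h=3)
fw = pca_whiten_apply(X[0], U, v)
print(fw.shape)  # (3,)
```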
Step 55, feature metric: use Euclidean distance or another distance metric to compute the distance between images, and find the images in the database most similar to the query image.
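A minimal sketch of this Euclidean-distance retrieval step, where the database is simply a list of fused feature vectors (names and data are illustrative):

```python
import numpy as np

def retrieve(query, database, top=3):
    """Rank database features by Euclidean distance to the query;
    return the indices and distances of the top matches."""
    db = np.asarray(database, dtype=float)
    d = np.linalg.norm(db - np.asarray(query, dtype=float), axis=1)
    order = np.argsort(d)[:top]
    return order.tolist(), d[order].tolist()

db = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]  # three database images
idx, dist = retrieve([0.9, 1.1], db, top=2)
print(idx)  # [1, 0]
```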
The following embodiment is image classification based on multi-level feature fusion.
Fig. 6 is the flowchart of the image classification based on multi-level feature fusion of Embodiment 2 of the present invention. The scene-level feature expression step, object-level feature expression step, point-level feature expression step, and feature fusion step of this embodiment are identical to those of Embodiment 1, so only the classifier step is introduced here.
Classifier step: the image classification task needs a classifier trained on a training set. For the images in the training set, the image expression method based on multi-level feature fusion of the present invention is used to describe each image; then, combined with the image category labels, a suitable classifier (SVM, Logistic Regression, etc.) is trained to obtain the classifier parameters. An image to be classified is first given the same feature expression; the feature is then input into the trained classifier to obtain the classification result.
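As a stand-in for the classifier step, here is a self-contained logistic-regression sketch trained by gradient descent on toy "fused feature" vectors; the patent itself would use an SVM or Logistic Regression from a standard library, and everything here (names, data, learning rate) is illustrative:

```python
import numpy as np

def train_logreg(X, y, lr=0.5, steps=500):
    """Train a binary logistic-regression classifier on fused image
    features X with labels y, by plain gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        w -= lr * X.T @ (p - y) / len(y)        # gradient step on weights
        b -= lr * np.mean(p - y)                # gradient step on bias
    return w, b

def predict(X, w, b):
    return (np.asarray(X, dtype=float) @ w + b > 0).astype(int)

# toy "fused features": class 0 near (-1, -1), class 1 near (1, 1)
X = [[-1.0, -1.0], [-1.2, -0.8], [1.0, 1.0], [0.8, 1.2]]
y = [0, 0, 1, 1]
w, b = train_logreg(X, y)
print(predict(X, w, b).tolist())  # [0, 0, 1, 1]
```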
As shown in Fig. 7, an image expression apparatus based on multi-level feature fusion of the present invention comprises:
an acquiring unit 71, which obtains at least two features of the input image, the at least two features being a combination of at least two of a scene-level feature, an object-level feature, and a point-level feature; the acquiring unit 71 includes a scene-level feature processing unit, an object-level feature processing unit, and a point-level feature processing unit;
an aggregation unit 72, which fuses the at least two features into a feature space of the input image, as the expression of the input image;
a processing unit 73, which processes the input image according to the expression of the input image.
Another apparatus embodiment of the present invention is described below.
An image expression apparatus based on multi-level feature fusion, the apparatus comprising:
a scene-level feature processing unit, which, for an input image, extracts deep-learning and/or hand-crafted features;
an object-level feature processing unit, which, for the input image, generates object-region rectangular boxes, extracts the deep-learning and/or hand-crafted features of the object regions, and aggregates all object-region features;
a point-level feature processing unit, which, for the input image, extracts its features based on extreme points and aggregates them to a fixed dimension;
a multi-level feature fusion unit, which fuses the features input by the level-wise feature processing units into one feature space.
Those of ordinary skill in the art will understand that the drawings are schematic diagrams of one embodiment, and the modules or flows in the drawings are not necessarily required for implementing the present invention. From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general hardware platform. Based on such an understanding, the technical solutions of the present invention, or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disc, and includes instructions causing a computer device (which can be a personal computer, a server, a network device, etc.) to perform the method described in each embodiment, or in some parts of the embodiments, of the present invention.
Each embodiment in this specification is described in a progressive manner; for identical or similar parts between embodiments, reference can be made to each other, and each embodiment focuses on its differences from the others. In particular, for the apparatus or system embodiments, since they are substantially similar to the method embodiments, their description is relatively simple; for relevant parts, refer to the description of the method embodiments. The apparatus and system embodiments described above are merely schematic; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the embodiment's solution, which those of ordinary skill in the art can understand and implement without creative effort. The above is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any change or replacement readily conceivable by those familiar with the art within the technical scope disclosed by the invention shall be encompassed within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the scope of the claims.

Claims (10)

1. the image expression method merged based on multi-level features, it is characterised in that including:
Obtain at least two feature of input picture;Described at least two is characterized as the combination of at least two of scene level characteristics, object level characteristics, some level characteristics;
By the feature space of described at least two Feature Fusion to described input picture, as the expression of described input picture;
Expression according to described input picture, processes described input picture.
2. method according to claim 1, it is characterised in that the described expression according to described input picture, the step that described input picture is processed includes:
Feature space according to described input picture, classifies to described input picture;Or
Feature space according to described input picture, retrieves described input picture.
3. method according to claim 1, it is characterised in that the described expression according to described input picture, the step that described input picture is processed includes:
The expression of described input picture is carried out post processing;
The expression of the described input picture according to post processing, processes described input picture.
4. The method according to claim 3, characterized in that the step of post-processing the feature space comprises:
performing power normalization, normalization, dimensionality reduction, or whitening on the feature space of the input image.
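By way of illustration only (not part of the claimed subject matter): the post-processing operations recited in claim 4 — power normalization, normalization, dimensionality reduction, and whitening — form a standard chain in image-representation pipelines and can be sketched with NumPy as follows. The exponent `alpha` and the output dimension are hypothetical parameters, not values fixed by the patent.

```python
import numpy as np

def postprocess(x, alpha=0.5, out_dim=None):
    """Power-normalize, L2-normalize, and optionally PCA-whiten feature vectors.

    x: (n_samples, dim) array of fused image representations.
    alpha: power-normalization exponent (hypothetical default).
    out_dim: if set, reduce to this dimension via PCA whitening.
    """
    # Power normalization: sign-preserving element-wise power.
    x = np.sign(x) * np.abs(x) ** alpha
    # L2 normalization per sample.
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    if out_dim is not None:
        # PCA whitening: project onto principal axes, rescale by singular values.
        mean = x.mean(axis=0)
        u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
        proj = vt[:out_dim].T / (s[:out_dim] + 1e-12)
        x = (x - mean) @ proj
        # Re-normalize after projection.
        x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return x
```

Applying power normalization before L2 normalization is a common design choice because it dampens bursty feature components before the vectors are compared.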
5. The method according to claim 1, characterized in that the step of fusing the at least two features into the feature space of the input image comprises:
fusing the at least two features with equal weights or unequal weights, so as to fuse the at least two features into the feature space of the input image.
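As a non-authoritative sketch of the equal-weight or unequal-weight fusion recited in claim 5, one common realization is weighted concatenation of the per-level feature vectors after per-level normalization; the normalization step and the default weights below are assumptions, not details taken from the patent.

```python
import numpy as np

def fuse_features(features, weights=None):
    """Concatenate per-level features into one fused representation.

    features: list of 1-D arrays (e.g. scene-, object-, point-level vectors).
    weights: optional per-level weights; None means equal-weight fusion.
    """
    if weights is None:
        weights = [1.0] * len(features)  # equal-weight fusion
    # L2-normalize each level first so that no level dominates by scale alone,
    # then scale by its fusion weight and concatenate.
    parts = [w * f / (np.linalg.norm(f) + 1e-12)
             for w, f in zip(weights, features)]
    return np.concatenate(parts)
```

The fused vector's dimension is the sum of the per-level dimensions, which is why the dimensionality-reduction step of claim 4 often follows fusion.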
6. The method according to claim 1, characterized in that the step of obtaining the scene-level features comprises:
obtaining, by means of deep learning and/or hand-crafted features, information of the input image at the whole-image level as the scene-level features of the input image.
7. The method according to claim 1, characterized in that the step of obtaining the object-level features comprises:
generating object-region bounding boxes of the input image;
obtaining deep-learning and/or hand-crafted features of each object-region bounding box; and
aggregating the deep-learning and/or hand-crafted features of the object-region bounding boxes as the object-level features of the input image.
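Claims 7 and 8 describe extracting a feature per object-region box and aggregating across boxes. A minimal sketch follows, assuming max- or mean-pooling as the aggregation operator and a caller-supplied feature extractor (a deep CNN or a hand-crafted descriptor); the patent does not fix either choice, so both are hypothetical.

```python
import numpy as np

def object_level_features(image, boxes, extract, aggregate="max"):
    """Aggregate per-box features into one object-level descriptor.

    image: the input image as a 2-D or 3-D array.
    boxes: iterable of (x0, y0, x1, y1) object-region rectangles.
    extract: callable mapping an image crop to a 1-D feature vector.
    aggregate: 'max' or 'mean' pooling over boxes (assumed operators).
    """
    # Extract one feature vector per object-region crop.
    feats = np.stack([extract(image[y0:y1, x0:x1])
                      for x0, y0, x1, y1 in boxes])
    # Pool across boxes to obtain a single fixed-length descriptor.
    return feats.max(axis=0) if aggregate == "max" else feats.mean(axis=0)
```

The pooled descriptor can then be post-processed as in claim 8 (for instance with the normalization chain of claim 4).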
8. The method according to claim 7, characterized in that the step of aggregating the deep-learning and/or hand-crafted features of the object-region bounding boxes as the object-level features of the input image comprises:
aggregating the deep-learning and/or hand-crafted features of the object-region bounding boxes; and
post-processing the aggregated features as the object-level features of the input image.
9. The method according to claim 1, characterized in that the step of obtaining the point-level features of the input image comprises:
obtaining features of the input image based on scale-space extremum point detection; and
aggregating the features based on scale-space extremum point detection into features of a fixed dimension as the point-level features of the input image.
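Claim 9 aggregates local descriptors obtained by scale-space extremum detection (SIFT-style keypoints) into a fixed-dimension vector. VLAD is one widely used fixed-dimension aggregator (VLAD-based methods appear in this patent's citation list, e.g. CN104408479A), though the patent itself does not name the aggregator; the codebook below is assumed to be learned offline and is not specified by the patent.

```python
import numpy as np

def vlad(descriptors, codebook):
    """Aggregate a variable number of local descriptors to a fixed dimension.

    descriptors: (n, d) local descriptors (e.g. from scale-space keypoints).
    codebook: (k, d) cluster centers learned offline (assumed given).
    Returns a flattened (k*d,) L2-normalized VLAD vector.
    """
    # Assign each descriptor to its nearest codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    k, d = codebook.shape
    v = np.zeros((k, d))
    for i in range(k):
        sel = descriptors[assign == i]
        if len(sel):
            # Accumulate residuals to the assigned center.
            v[i] = (sel - codebook[i]).sum(axis=0)
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)
```

Because the output dimension is k*d regardless of how many keypoints the image yields, the point-level vector can be concatenated with the scene- and object-level vectors in the fusion step of claim 1.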
10. An image expression device based on multi-level feature fusion, characterized by comprising:
an acquisition unit, configured to obtain at least two features of an input image, wherein the at least two features are a combination of at least two of scene-level features, object-level features, and point-level features;
a fusion unit, configured to fuse the at least two features into a feature space of the input image as an expression of the input image; and
a processing unit, configured to process the input image according to the expression of the input image.
CN201610089958.1A 2016-02-17 2016-02-17 Image expression method and device based on multi-level feature fusion Active CN105760488B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610089958.1A CN105760488B (en) 2016-02-17 2016-02-17 Image expression method and device based on multi-level feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610089958.1A CN105760488B (en) 2016-02-17 2016-02-17 Image expression method and device based on multi-level feature fusion

Publications (2)

Publication Number Publication Date
CN105760488A true CN105760488A (en) 2016-07-13
CN105760488B CN105760488B (en) 2020-06-16

Family

ID=56330947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610089958.1A Active CN105760488B (en) 2016-02-17 2016-02-17 Image expression method and device based on multi-level feature fusion

Country Status (1)

Country Link
CN (1) CN105760488B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446930A (en) * 2016-06-28 2017-02-22 沈阳工业大学 Deep convolutional neural network-based robot working scene identification method
CN107122712A (en) * 2017-03-27 2017-09-01 大连大学 It polymerize the palmprint image recognition methods of description vectors based on convolutional neural networks and two-way local feature
CN108090203A (en) * 2017-12-25 2018-05-29 上海七牛信息技术有限公司 Video classification methods, device, storage medium and electronic equipment
CN108563767A (en) * 2018-04-19 2018-09-21 深圳市商汤科技有限公司 Image search method and device
CN108830330A (en) * 2018-06-22 2018-11-16 西安电子科技大学 Classification of Multispectral Images method based on self-adaptive features fusion residual error net
CN108875750A (en) * 2017-08-25 2018-11-23 北京旷视科技有限公司 object detecting method, device and system and storage medium
CN109446887A (en) * 2018-09-10 2019-03-08 易诚高科(大连)科技有限公司 It is a kind of for picture quality subjectivity evaluation and test image scene generation method is described
CN110135473A (en) * 2019-04-25 2019-08-16 暗物智能科技(广州)有限公司 The construction method of award and image description model in image description model
CN112016574A (en) * 2020-10-22 2020-12-01 北京科技大学 Image classification method based on feature fusion
CN107871106B (en) * 2016-09-26 2021-07-06 北京眼神科技有限公司 Face detection method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345645A (en) * 2013-06-27 2013-10-09 复旦大学 Commodity image category forecasting method based on online shopping platform
CN103839063A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Image feature extraction method based on weighted pyramid structure
CN104111960A (en) * 2013-04-22 2014-10-22 阿里巴巴集团控股有限公司 Page matching method and device
CN104408479A (en) * 2014-11-28 2015-03-11 电子科技大学 Massive image classification method based on deep vector of locally aggregated descriptors (VLAD)
CN104462502A (en) * 2014-12-19 2015-03-25 中国科学院深圳先进技术研究院 Image retrieval method based on feature fusion
US20150339516A1 (en) * 2014-05-20 2015-11-26 Canon Kabushiki Kaisha Collation apparatus and method for the same, and image searching apparatus and method for the same

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839063A (en) * 2012-11-21 2014-06-04 大连灵动科技发展有限公司 Image feature extraction method based on weighted pyramid structure
CN104111960A (en) * 2013-04-22 2014-10-22 阿里巴巴集团控股有限公司 Page matching method and device
CN103345645A (en) * 2013-06-27 2013-10-09 复旦大学 Commodity image category forecasting method based on online shopping platform
US20150339516A1 (en) * 2014-05-20 2015-11-26 Canon Kabushiki Kaisha Collation apparatus and method for the same, and image searching apparatus and method for the same
CN104408479A (en) * 2014-11-28 2015-03-11 电子科技大学 Massive image classification method based on deep vector of locally aggregated descriptors (VLAD)
CN104462502A (en) * 2014-12-19 2015-03-25 中国科学院深圳先进技术研究院 Image retrieval method based on feature fusion

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446930B (en) * 2016-06-28 2019-11-22 沈阳工业大学 Robot operative scenario recognition methods based on deep layer convolutional neural networks
CN106446930A (en) * 2016-06-28 2017-02-22 沈阳工业大学 Deep convolutional neural network-based robot working scene identification method
CN107871106B (en) * 2016-09-26 2021-07-06 北京眼神科技有限公司 Face detection method and device
CN107122712A (en) * 2017-03-27 2017-09-01 大连大学 It polymerize the palmprint image recognition methods of description vectors based on convolutional neural networks and two-way local feature
CN108875750B (en) * 2017-08-25 2021-08-10 北京旷视科技有限公司 Object detection method, device and system and storage medium
CN108875750A (en) * 2017-08-25 2018-11-23 北京旷视科技有限公司 object detecting method, device and system and storage medium
CN108090203A (en) * 2017-12-25 2018-05-29 上海七牛信息技术有限公司 Video classification methods, device, storage medium and electronic equipment
CN108563767B (en) * 2018-04-19 2020-11-27 深圳市商汤科技有限公司 Image retrieval method and device
CN108563767A (en) * 2018-04-19 2018-09-21 深圳市商汤科技有限公司 Image search method and device
CN108830330A (en) * 2018-06-22 2018-11-16 西安电子科技大学 Classification of Multispectral Images method based on self-adaptive features fusion residual error net
CN108830330B (en) * 2018-06-22 2021-11-02 西安电子科技大学 Multispectral image classification method based on self-adaptive feature fusion residual error network
CN109446887A (en) * 2018-09-10 2019-03-08 易诚高科(大连)科技有限公司 It is a kind of for picture quality subjectivity evaluation and test image scene generation method is described
CN109446887B (en) * 2018-09-10 2022-03-25 易诚高科(大连)科技有限公司 Image scene description generation method for subjective evaluation of image quality
CN110135473A (en) * 2019-04-25 2019-08-16 暗物智能科技(广州)有限公司 The construction method of award and image description model in image description model
CN110135473B (en) * 2019-04-25 2021-03-30 暗物智能科技(广州)有限公司 Construction method of reward and image description model in image description model
CN112016574A (en) * 2020-10-22 2020-12-01 北京科技大学 Image classification method based on feature fusion

Also Published As

Publication number Publication date
CN105760488B (en) 2020-06-16

Similar Documents

Publication Publication Date Title
CN105760488A (en) Image expressing method and device based on multi-level feature fusion
Huang et al. Generative dual adversarial network for generalized zero-shot learning
Wang et al. Enhancing sketch-based image retrieval by cnn semantic re-ranking
Mohedano et al. Bags of local convolutional features for scalable instance search
Yan et al. Event oriented dictionary learning for complex event detection
Wu et al. Harvesting discriminative meta objects with deep CNN features for scene classification
KR20180036709A (en) Media classification
Lian et al. Probabilistic models for supervised dictionary learning
Zhou et al. Joint image and text representation for aesthetics analysis
CN110222218A (en) Image search method based on multiple dimensioned NetVLAD and depth Hash
Chen et al. Discriminative BoW framework for mobile landmark recognition
Dawar et al. Continuous detection and recognition of actions of interest among actions of non-interest using a depth camera
CN103631932A (en) Method for detecting repeated video
Strat et al. Hierarchical late fusion for concept detection in videos
CN108154156B (en) Image set classification method and device based on neural topic model
CN107292642A (en) A kind of Method of Commodity Recommendation and system based on image
Evgeniou et al. Image representations and feature selection for multimedia database search
CN105930792A (en) Human action classification method based on video local feature dictionary
Tran et al. Aggregating image and text quantized correlated components
Wu et al. Multiple models fusion for emotion recognition in the wild
Hu et al. Action recognition using multiple pooling strategies of CNN features
Can et al. Modeling concept dependencies for event detection
Chen et al. A viewpoint aware multi-task learning framework for fine-grained vehicle recognition
CN105117735A (en) Image detection method in big data environment
Mironica et al. Fisher kernel based relevance feedback for multimodal video retrieval

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant