CN103312938B - Video processing apparatus, video processing method and device - Google Patents

Video processing apparatus, video processing method and device Download PDF

Info

Publication number
CN103312938B
CN103312938B (application CN201210071078.3A)
Authority
CN
China
Prior art keywords
video shot
video
shot
frame
soft label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210071078.3A
Other languages
Chinese (zh)
Other versions
CN103312938A (en)
Inventor
李斐
刘汝杰
石原正树
上原祐介
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201210071078.3A priority Critical patent/CN103312938B/en
Priority to JP2013053509A priority patent/JP6015504B2/en
Publication of CN103312938A publication Critical patent/CN103312938A/en
Application granted granted Critical
Publication of CN103312938B publication Critical patent/CN103312938B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention provides a video processing apparatus, a video processing method and a device, at least overcoming the poor processing results of existing supervised and semi-supervised video processing techniques. The video processing apparatus includes: a preprocessing unit for extracting representative frames and performing image segmentation; a feature extraction unit for extracting shot-level, frame-level and region-level visual features; a weighted-graph construction unit for building shot-level, frame-level and region-level weighted graphs; a function construction unit for constructing a cost function; a computation unit for obtaining the soft labels of the video shots, representative frames and regions by solving the optimization problem of the cost function; and a video processing unit for carrying out video processing according to these soft labels. The video processing method performs processing that realizes the functions of the video processing apparatus. The device includes the video processing apparatus. Applying the above technique of the present invention, good video processing results can be obtained, and the technique can be applied to the field of video processing.

Description

Video processing apparatus, video processing method and device
Technical field
The present invention relates to the field of video processing, and in particular to a video processing apparatus, a video processing method and a device.
Background technology
With the rapid increase in the number of digital videos, effective video processing techniques need to be researched and developed. In many existing video processing techniques, the user must provide some training video shots, and the corresponding video processing is then carried out according to these training shots. The training video shots may include labeled video shots and unlabeled video shots, and the labeled video shots generally include positive-example video shots (that is, video shots with positive labels) and negative-example video shots (that is, video shots with negative labels). According to the type of training video shots used, these techniques can be divided into two classes: supervised video processing techniques and semi-supervised video processing techniques.
In supervised video processing techniques, the training video shots are all labeled. However, the number of labeled video shots is usually very limited, so the results of processing with such techniques are usually poor, and the information in unlabeled video shots cannot be exploited.
In semi-supervised video processing techniques, the training video shots include both labeled and unlabeled video shots. Compared with supervised techniques, semi-supervised techniques can make relatively effective use of the information contained in unlabeled video shots. However, most existing semi-supervised techniques use only a shot-level weighted graph or only a frame-level weighted graph. Even the few techniques that use both shot-level and frame-level weighted graphs compute on the two graphs separately and then simply combine the two results, without considering the connection between the graphs during computation, so their processing results are still poor.
Summary of the invention
The following presents a brief summary of the present invention in order to provide a basic understanding of certain aspects of the invention. It should be appreciated that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention, nor to delimit the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description discussed later.
In view of the above drawbacks of the prior art, an object of the present invention is to provide a video processing apparatus, a video processing method and a device that at least overcome the poor processing results of existing supervised and semi-supervised video processing techniques.
To achieve these objects, according to one aspect of the present invention, a video processing apparatus is provided. The video processing apparatus includes: a preprocessing unit configured to extract at least one representative frame from each video shot in a video shot set and to divide each extracted representative frame into multiple regions, where at least some of the video shots in the set are labeled; a feature extraction unit configured to extract the shot-level, frame-level and region-level visual features of each video shot in the set; a weighted-graph construction unit configured to build a shot-level weighted graph from the shot-level visual features, a frame-level weighted graph from the frame-level visual features, and a region-level weighted graph from the region-level visual features; a function construction unit configured to construct a cost function, taking as unknowns the soft label of each video shot in the set, the soft label of each representative frame of each video shot, and the soft label of each region in each representative frame, according to the structural information of the shot-level, frame-level and region-level weighted graphs and the relations among the soft labels of the video shots, the representative frames and the regions; a computation unit configured to obtain the computed values of the unknowns by solving the optimization problem of the cost function; and a video processing unit configured to carry out video processing according to the computed values obtained by the computation unit.
According to another aspect of the present invention, a video processing method is also provided. The video processing method includes: extracting at least one representative frame from each video shot in a video shot set and dividing each extracted representative frame into multiple regions, where at least some of the video shots in the set are labeled; extracting the shot-level, frame-level and region-level visual features of each video shot in the set; building a shot-level weighted graph from the shot-level visual features, a frame-level weighted graph from the frame-level visual features, and a region-level weighted graph from the region-level visual features; constructing a cost function, taking as unknowns the soft label of each video shot in the set, the soft label of each representative frame of each video shot, and the soft label of each region in each representative frame, according to the structural information of the three weighted graphs and the relations among the soft labels of the video shots, the representative frames and the regions; obtaining the computed values of the unknowns by solving the optimization problem of the cost function; and carrying out video processing according to the computed values obtained.
According to another aspect of the present invention, a device is also provided, which includes the video processing apparatus described above.
According to other aspects of the present invention, a corresponding computer-readable storage medium is also provided, on which a computer program executable by a computing device is stored, the program being capable of causing the computing device to perform the above video processing method when executed.
The above video processing apparatus and video processing method according to the embodiments of the present invention, and the device including the video processing apparatus, can achieve at least one of the following benefits: by using the three kinds of weighted graphs, the feature information of the video shots is fully exploited and the connections among the three graphs are fully mined, so that good video processing results can be obtained; video processing can be realized by further using unlabeled video shots on the basis of labeled video shots, thereby improving the processing results; more accurate video retrieval results can be obtained; and more accurate video concept detection results can be obtained.
These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention in conjunction with the accompanying drawings.
Brief description of the drawings
The present invention may be better understood with reference to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout to denote the same or similar parts. The accompanying drawings, together with the detailed description below, are included in and form part of this specification, and serve to further illustrate the preferred embodiments of the present invention and to explain the principles and advantages of the invention. In the drawings:
Fig. 1 is a block diagram schematically showing an exemplary structure of a video processing apparatus according to an embodiment of the present invention.
Fig. 2 is a block diagram schematically showing a possible exemplary structure of the weighted-graph construction unit in Fig. 1.
Fig. 3 is a block diagram schematically showing a possible exemplary structure of the function construction unit in Fig. 1.
Fig. 4 is a block diagram schematically showing a possible exemplary structure of the computation unit in Fig. 1.
Fig. 5 is a block diagram schematically showing a possible exemplary structure of the video processing unit in Fig. 1.
Fig. 6 is a flowchart schematically showing an exemplary process of a video processing method according to an embodiment of the present invention.
Fig. 7 is a flowchart schematically showing a possible exemplary process of step S660 shown in Fig. 6.
Fig. 8 is a flowchart schematically showing a possible exemplary process of step S670 shown in Fig. 6, in the example case where the video processing is video concept detection.
Fig. 9 shows a structural diagram of the hardware configuration of a possible information processing device that can be used to realize the video processing apparatus and video processing method according to the embodiments of the present invention.
Those skilled in the art will appreciate that elements in the drawings are illustrated only for simplicity and clarity and are not necessarily drawn to scale. For example, the size of some elements in the drawings may be exaggerated relative to other elements in order to help improve the understanding of the embodiments of the present invention.
Detailed description of the invention
Exemplary embodiments of the present invention will be described below in conjunction with the accompanying drawings. For clarity and conciseness, not all features of an actual embodiment are described in this specification. It should be recognized, however, that in developing any such practical embodiment, many implementation-specific decisions must be made in order to achieve the developer's specific goals, such as compliance with system-related and business-related constraints, and that these constraints may vary from one implementation to another. Moreover, it should be appreciated that, although such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, in order to avoid obscuring the present invention with unnecessary details, only the apparatus structures and/or process steps closely related to the solution according to the present invention are shown in the drawings, while other details of little relevance to the present invention are omitted.
As described above, when the supervised or semi-supervised video processing techniques of the prior art process video shots, the reasons described above lead to poor processing results. In order to improve the video processing results, the present invention proposes a video processing apparatus that can use the shot-level, frame-level and region-level visual features of the video shots simultaneously, making full use of the information in the video shots and thus better reflecting the features of the video shots and the relations among them.
The video processing apparatus includes: a preprocessing unit configured to extract at least one representative frame from each video shot in a video shot set and to divide each extracted representative frame into multiple regions, where at least some of the video shots in the set are labeled; a feature extraction unit configured to extract the shot-level, frame-level and region-level visual features of each video shot in the set; a weighted-graph construction unit configured to build a shot-level weighted graph from the shot-level visual features, a frame-level weighted graph from the frame-level visual features, and a region-level weighted graph from the region-level visual features; a function construction unit configured to construct a cost function, taking as unknowns the soft label of each video shot in the set, the soft label of each representative frame of each video shot, and the soft label of each region in each representative frame, according to the structural information of the three weighted graphs and the relations among these soft labels; a computation unit configured to obtain the computed values of the unknowns by solving the optimization problem of the cost function; and a video processing unit configured to carry out video processing according to the computed values obtained by the computation unit.
The video processing apparatus according to an embodiment of the present invention is described in detail below in conjunction with Fig. 1 to Fig. 5.
Fig. 1 is a block diagram schematically showing an exemplary structure of a video processing apparatus 100 according to an embodiment of the present invention. As shown in Fig. 1, the video processing apparatus 100 includes a preprocessing unit 110, a feature extraction unit 120, a weighted-graph construction unit 130, a function construction unit 140, a computation unit 150 and a video processing unit 160.
As shown in Fig. 1, the preprocessing unit 110 in the video processing apparatus 100 extracts at least one representative frame from each video shot in the video shot set, and performs image segmentation on each extracted representative frame, that is, divides each representative frame of each video shot into multiple regions. The representative frames extracted from a video shot can be any one frame or any several frames of that shot, or frames extracted by an existing keyframe-extraction method. In addition, the image segmentation mentioned here can be realized with any image segmentation method of the prior art, and is not described in detail here. The video shot set can include multiple video shots, at least some of which are labeled. That is, the video shots in the set can all be labeled, or some can be labeled while the rest are unlabeled. A labeled video shot can be a video shot with a positive label (hereinafter a "positive-example video shot") or a video shot with a negative label (hereinafter a "negative-example video shot"). It should be noted that the "label" carried by a video shot (also called a hard label) is a kind of annotation information, usually a piece of information characterizing the category of an object (such as a video shot) that is marked on the object in advance by a user. A video shot with a positive label (that is, with a positive hard label) is usually a video shot that matches a particular category, while a video shot with a negative label (that is, with a negative hard label) is usually a video shot that does not match that category. For example, a positive label can take the form "A", and the corresponding negative label the form "not A". A simple example is "A" being "tiger": a video shot with a positive label is one labeled "tiger" (the shot matches the category "tiger", indicating that it contains a tiger), and a video shot with a negative label is one labeled "not tiger" (the shot does not match the category "tiger", indicating that it contains no tiger).
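As a concrete, non-authoritative illustration of the preprocessing step, the sketch below picks the middle frame of a shot as its representative frame and divides it into a fixed grid of regions. The patent allows any frame choice, any existing keyframe-extraction method, and any prior-art image segmentation method, so both choices here are simplifying assumptions.

```python
import numpy as np

def extract_representative_frame(shot):
    """Pick the middle frame of a shot (shape (T, H, W, 3)) as its
    representative frame; any frame or keyframe method would do."""
    return shot[len(shot) // 2]

def segment_into_regions(frame, grid=(2, 2)):
    """Divide a representative frame into a fixed grid of regions
    (a placeholder for any prior-art segmentation method)."""
    h, w = frame.shape[:2]
    gh, gw = grid
    return [frame[i * h // gh:(i + 1) * h // gh,
                  j * w // gw:(j + 1) * w // gw]
            for i in range(gh) for j in range(gw)]

shot = np.zeros((30, 64, 64, 3), dtype=np.uint8)  # dummy 30-frame shot
rep = extract_representative_frame(shot)          # one 64x64 frame
regions = segment_into_regions(rep)               # four 32x32 regions
```

Different shots may yield different numbers of representative frames, and different frames different numbers of regions, exactly as the patent permits.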
It should be noted that the number of representative frames extracted from each video shot in the set can be the same or different. In addition, each representative frame of each video shot in the set is divided into multiple regions by image segmentation, but the numbers of regions obtained from different representative frames can be the same or different.
Then, the feature extraction unit 120 extracts the shot-level, frame-level and region-level visual features of each video shot in the video shot set. The shot-level visual feature of a video shot is the visual feature of the shot extracted at the shot level; the frame-level visual feature is the visual feature extracted at the frame level; and the region-level visual feature is the visual feature extracted at the region level. A "visual feature" here is information that can reflect the content of a video shot to a certain extent; it can be, for example, any one of color features, texture features, shape features and other visual features, or any combination of several of them. In addition, the present invention can use any of the various methods for extracting visual features that exist in the prior art, which are not described in detail here.
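As one hedged example of the feature extraction step, the sketch below computes a normalized histogram of pixel values and applies it at the shot, frame and region levels. The patent names color, texture and shape features (or combinations) without fixing one, so the histogram is an assumed stand-in.

```python
import numpy as np

def color_histogram(pixels, bins=8):
    """Normalized histogram of pixel values: one simple visual
    feature (an assumed stand-in for the color/texture/shape
    features mentioned in the description)."""
    hist, _ = np.histogram(pixels, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def shot_level_feature(shot):      # feature of a whole shot (T, H, W, 3)
    return color_histogram(shot)

def frame_level_feature(frame):    # feature of one representative frame
    return color_histogram(frame)

def region_level_feature(region):  # feature of one region of a frame
    return color_histogram(region)
```

The three extractors happen to share one implementation here; in practice each level could use a different feature or combination.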
From the shot-level, frame-level and region-level visual features of each video shot extracted by the feature extraction unit 120, the weighted-graph construction unit 130 can build weighted graphs of three types. Specifically, the weighted-graph construction unit 130 can build a shot-level weighted graph from the shot-level visual features of the video shots, a frame-level weighted graph from the frame-level visual features (that is, the visual features of the representative frames), and a region-level weighted graph from the region-level visual features (that is, the visual features of the regions).
In one implementation of the video processing apparatus according to an embodiment of the present invention, the weighted-graph construction unit 130 shown in Fig. 1 can adopt the structure shown in Fig. 2. Fig. 2 is a block diagram schematically showing a possible exemplary structure of the weighted-graph construction unit 130 in Fig. 1. As shown in Fig. 2, the weighted-graph construction unit 130 can include a first construction subunit 210, a second construction subunit 220 and a third construction subunit 230.
The first construction subunit 210 can build the shot-level weighted graph, for example with each video shot in the video shot set as a node and the similarity in shot-level visual features between each two nodes as the weight of the edge between those two nodes. In other words, in the shot-level weighted graph built by the first construction subunit 210, each node represents one of the video shots in the set, and the weight of the edge connecting two nodes represents the similarity, based on shot-level visual features, between the two video shots corresponding to those nodes. The nodes in the shot-level weighted graph correspond one-to-one to the video shots in the set.
Similarly, the second construction subunit 220 can build the frame-level weighted graph, for example with each representative frame of each video shot in the set as a node and the similarity in frame-level visual features between each two nodes as the weight of the edge between those two nodes. In other words, in the frame-level weighted graph built by the second construction subunit 220, each node represents one representative frame of one of the video shots in the set, and the weight of the edge connecting two nodes represents the similarity, based on frame-level visual features, between the two representative frames corresponding to those nodes. The nodes in the frame-level weighted graph correspond one-to-one to the representative frames of the video shots in the set.
In addition, the third construction subunit 230 can build the region-level weighted graph, for example with each region of each representative frame of each video shot in the set as a node and the similarity in region-level visual features between each two nodes as the weight of the edge between those two nodes. In other words, in the region-level weighted graph built by the third construction subunit 230, each node represents one region of one representative frame of one of the video shots in the set, and the weight of the edge connecting two nodes represents the similarity, based on region-level visual features, between the two regions corresponding to those nodes. The nodes in the region-level weighted graph correspond one-to-one to the regions contained in the representative frames of the video shots in the set.
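The three construction subunits share one pattern: nodes at a given level, with pairwise feature similarity as edge weights. A minimal sketch applicable at any of the three levels is shown below; the Gaussian kernel on Euclidean distance is an assumed similarity measure, since the description does not prescribe one.

```python
import numpy as np

def build_weighted_graph(features, sigma=1.0):
    """Affinity matrix of one level's weighted graph: nodes are the
    shots, frames or regions, and the edge weight between two nodes
    is the similarity of their features (Gaussian kernel assumed)."""
    X = np.asarray(features, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W
```

The same routine would be called three times: once with the shot-level features, once with the frame-level features, and once with the region-level features.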
Returning to Fig. 1, after the weighted-graph construction unit 130 has built the shot-level, frame-level and region-level weighted graphs, the function construction unit 140 can construct the cost function. In this cost function, the unknowns are the soft label of each video shot in the video shot set, the soft label of each representative frame of each video shot, and the soft label of each region of each representative frame. The cost function can then be constructed according to the structural information of the shot-level, frame-level and region-level weighted graphs built by the weighted-graph construction unit 130, and according to the relations among the soft labels of the video shots, the soft labels of their representative frames, and the soft labels of the regions in those representative frames.
It should be noted that a soft label is a concept defined relative to a hard label. A hard label is usually a piece of real annotation information, marked in advance on a predetermined sample (such as a video shot), that reflects the category of the sample; a soft label is a piece of virtual annotation information that generally reflects the degree to which the object it is attached to (such as a video shot, a frame or a region) matches the category information characterized by the hard labels of the predetermined samples. Usually, the soft label can be any real number between -1 and 1 (inclusive). In this case, the closer the value of a soft label is to 1 (that is, the larger it is), the better the object corresponding to the soft label matches the category of the positively labeled objects among the predetermined samples; conversely, the closer the value is to -1 (that is, the smaller it is), the worse the object matches that category. In other words, the larger the value of the soft label, the higher the probability that the corresponding object matches the category of the positively labeled objects, and the smaller the value, the lower that probability. It should also be noted that the soft label can be set to other real numbers, for example real numbers greater than 1 or less than -1; in such cases, similarly, a larger soft label indicates that the corresponding object better matches the category of the positively labeled objects among the predetermined samples.
For example, when the predetermined samples include positively and negatively labeled video shots, the positively labeled shots being those labeled "tiger" and the negatively labeled shots those labeled "not tiger", if the soft label of one video shot is 0.1 and that of another is 0.8, the probability that the shot with soft label 0.8 contains a tiger is much higher than for the shot with soft label 0.1.
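Read as a confidence-like score, a soft label induces a ranking over shots. A tiny sketch of this reading (the shot names and values are invented for illustration):

```python
# Soft labels computed for three shots (names and values invented).
soft_labels = {"shot_a": 0.1, "shot_b": 0.8, "shot_c": -0.6}

def rank_by_soft_label(soft_labels):
    """Order shots from most to least likely to contain the positive
    category (e.g. 'tiger'): values near +1 rank first."""
    return sorted(soft_labels, key=soft_labels.get, reverse=True)

ranking = rank_by_soft_label(soft_labels)  # shot_b first, shot_c last
```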
Specifically, the structure shown in Fig. 3 can be adopted to realize the function and operation of the function construction unit 140. Fig. 3 is a block diagram schematically showing a possible exemplary structure of the function construction unit 140 in Fig. 1.
As shown in Fig. 3, the function construction unit 140 can include a first setting subunit 310, a second setting subunit 320 and a function construction subunit 330. The first setting subunit 310 sets a first constraint according to the structural information of the shot-level, frame-level and region-level weighted graphs built by the weighted-graph construction unit 130; the second setting subunit 320 sets a second constraint according to the relations among the soft labels of the labeled video shots in the set, the soft labels of the representative frames of those labeled shots, and the soft labels of the regions in those representative frames; the function construction subunit 330 then constructs the cost function according to these two constraints. As described above, the unknowns in the cost function are the soft label of each video shot in the set, the soft label of each representative frame of each video shot, and the soft label of each region in each representative frame.
Specifically, taking the structural information of the three weighted graphs into account, the first setting subunit 310 can set the first constraint as follows: the more similar the shot-level visual features of two video shots, the smaller the difference between their soft labels should be; the more similar the frame-level visual features of two representative frames, the smaller the difference between their soft labels should be; and the more similar the region-level visual features of two regions, the smaller the difference between their soft labels should be.
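As a minimal illustrative sketch (not the patented implementation), the first constraint for one weighted graph can be written as a smoothness cost: pairs of nodes joined by a heavy edge are penalized when their soft labels differ. The function name and the degree normalization written as f/d follow the first term of the cost function below (the original may normalize by the square root of the degree instead); both are assumptions for illustration.

```python
import numpy as np

def smoothness_cost(W, f):
    """First-constraint cost for one weighted graph:
    (1/2) * sum_{g,h} W[g,h] * (f[g]/d[g] - f[h]/d[h])**2,
    where d[g] is the degree (row sum of W) of node g."""
    d = W.sum(axis=1)                 # node degrees
    fn = f / d                        # degree-normalized soft labels
    diff = fn[:, None] - fn[None, :]  # all pairwise label differences
    return 0.5 * np.sum(W * diff ** 2)

# Two strongly connected nodes (similar visual features) are penalized
# when their soft labels differ:
W = np.array([[0.0, 1.0], [1.0, 0.0]])
print(smoothness_cost(W, np.array([1.0, 1.0])))   # 0.0 - identical labels
print(smoothness_cost(W, np.array([1.0, -1.0])))  # 4.0 - opposite labels
```

Minimizing this term over all three graphs simultaneously drives visually similar shots, frames, and regions toward similar soft labels.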
In addition, for the labeled video shots in the video shot set, the soft label of each negatively labeled shot should be driven as close as possible to −1, and the soft label of each positively labeled shot as close as possible to 1. This is because a positively labeled shot contains content of a certain particular category while a negatively labeled shot does not; when the soft label is allowed to be any real number between −1 and 1, the closer a shot's soft label is to 1, the more likely the shot contains content of that category, and the closer it is to −1, the less likely. For example, for a shot labeled "non-tiger" (i.e., a negative label), the soft label of that shot can be driven as close as possible to −1; conversely, for a shot labeled "tiger" (i.e., a positive label), the soft label can be driven as close as possible to 1.

As for the representative frames of negatively labeled video shots: if a shot carries a negative label, the shot does not contain the above "content of the particular category", so no frame in the shot contains it, nor does any region of any frame in the shot. Consequently, the soft label of every representative frame in a negatively labeled shot can be driven as close as possible to −1, and likewise the soft label of every region of every representative frame in such a shot.
For the representative frames of positively labeled video shots and the regions therein, the situation is slightly more complex.

If a shot carries a positive label, the shot contains the above "content of the particular category", which means at least one frame in the shot contains it, but it cannot be determined which frame. Considering only representative frames, it can be assumed that at least one representative frame of a positively labeled shot contains the "content of the particular category", though again it cannot be determined which one. For a positively labeled shot, therefore, only the representative frame with the maximum soft label in that shot is considered, and its soft label is driven as close as possible to the soft label of the shot. In this way the shot-level and frame-level weighted graphs are linked to each other.

Similarly, when at least one representative frame of a positively labeled shot is assumed to contain the "content of the particular category", each of those frames should contain at least one region that contains it. For each such representative frame, only the region with the maximum soft label in that frame is considered, and its soft label is driven as close as possible to the soft label of the frame it belongs to. In this way the frame-level and region-level weighted graphs are linked to each other.
Furthermore, it should be noted that in general one cannot know which frames of a positively labeled shot are actually positive (i.e., which frames contain the above "content of the particular category"). Therefore, frames that are probably positive (i.e., likely to contain the "content of the particular category", hereinafter "possible positive frames") can be selected according to some criterion. For example, the possible positive frames may be those representative frames whose soft label exceeds a fifth predetermined threshold, or those representative frames containing a region whose soft label exceeds a sixth predetermined threshold.
Thus, the second setting subunit 320 can set the second constraint as follows: drive the soft label of each negatively labeled shot, the soft labels of all representative frames in negatively labeled shots, and the soft labels of all regions of representative frames in negatively labeled shots as close as possible to −1; drive the soft label of each positively labeled shot as close as possible to 1; drive the soft label of the representative frame with the maximum soft label in each positively labeled shot as close as possible to the soft label of the shot it belongs to; and drive the soft label of the region with the maximum soft label in each possible positive frame of a positively labeled shot as close as possible to the soft label of the frame it belongs to.
From these two constraints, the function construction subunit 330 can then construct the cost function. For example, the function construction subunit 330 may construct the following cost function:
Expression one:

$$
\begin{aligned}
Q(f^S, f^F, f^R) =\; & \frac{1}{2}\sum_{g,h} W^S_{gh}\Big(f^S_g/d^S_g - f^S_h/d^S_h\Big)^2 + \frac{\mu^F_G}{2}\sum_{i,j} W^F_{ij}\Big(f^F_i/d^F_i - f^F_j/d^F_j\Big)^2 \\
& + \frac{\mu^R_G}{2}\sum_{k,l} W^R_{kl}\Big(f^R_k/d^R_k - f^R_l/d^R_l\Big)^2 + \mu^S_-\sum_{S_g\in S^-} H_1(f^S_g, -1) \\
& + \mu^F_-\sum_{F_i\in F^-} H_1(f^F_i, -1) + \mu^R_-\sum_{R_k\in R^-} H_1(f^R_k, -1) + \mu^S_+\sum_{S_g\in S^+} H_2(f^S_g, 1) \\
& + \mu^F_+\sum_{S_g\in S^+} H_2\Big(\max_{F_i\in S_g} f^F_i,\; f^S_g\Big) + \mu^R_+\sum_{F_i\in C^+} H_2\Big(\max_{R_l\in F_i} f^R_l,\; f^F_i\Big)
\end{aligned}
$$
where f^S_g and f^S_h denote the soft labels of the g-th and h-th video shots in the video shot set, with g = 1, 2, …, L, h = 1, 2, …, L, and L the number of video shots in the set; f^F_i and f^F_j denote the soft labels of the i-th and j-th representative frames among all representative frames of all shots in the set, with i = 1, 2, …, M, j = 1, 2, …, M, and M the total number of representative frames of all shots in the set; and f^R_k and f^R_l denote the soft labels of the k-th and l-th regions among all regions of all representative frames, with k = 1, 2, …, N, l = 1, 2, …, N, and N the total number of such regions. Further, f^S denotes the vector formed by the soft labels of all video shots in the set, f^F the vector formed by the soft labels of all their representative frames, and f^R the vector formed by the soft labels of all regions of those representative frames. W^S_{gh} denotes the weight of the edge between the nodes corresponding to the g-th and h-th video shots in the shot-level weighted graph, and W^S denotes the matrix formed by the weights of all edges of the shot-level graph; that is, W^S_{gh} is the element in row g, column h of W^S. In addition, d^S_g and d^S_h denote the sums of all elements of row g and of row h of W^S, respectively. Similarly, W^F_{ij} denotes the weight of the edge between the nodes corresponding to the i-th and j-th representative frames in the frame-level weighted graph, W^F denotes the matrix formed by the weights of all edges of that graph (W^F_{ij} being its row-i, column-j element), and d^F_i and d^F_j denote the sums of all elements of rows i and j of W^F. Likewise, W^R_{kl} denotes the weight of the edge between the nodes corresponding to the k-th and l-th regions in the region-level weighted graph, W^R denotes the matrix formed by the weights of all edges of that graph (W^R_{kl} being its row-k, column-l element), and d^R_k and d^R_l denote the sums of all elements of rows k and l of W^R.
In addition, in Expression one, S_g denotes the g-th video shot in the video shot set; S^+ and S^- denote the sets of positively and negatively labeled shots in the set, respectively; F_i denotes the i-th representative frame among all representative frames of all shots in the set; F^- denotes the set of all representative frames of the negatively labeled shots; R_k denotes the k-th region among all regions of the representative frames; R^- denotes the set of all regions of the representative frames of the negatively labeled shots; and C^+ is the set of all possible positive frames among the representative frames of all shots in the set. H_1(x, y) and H_2(x, y) are functions measuring the inconsistency between two quantities (i.e., between x and y); one possible form is H_1(x, y) = (max(x − y, 0))^2 and H_2(x, y) = (max(y − x, 0))^2. Further, μ^F_G, μ^R_G, μ^S_−, μ^F_−, μ^R_−, μ^S_+, μ^F_+, and μ^R_+ are the weight coefficients of the corresponding cost terms; their values can be set empirically or determined in advance by experiment.
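The one-sided penalties H_1 and H_2 defined above can be sketched directly; this is a minimal illustration of the quoted form H_1(x, y) = (max(x − y, 0))^2 and H_2(x, y) = (max(y − x, 0))^2, not the patented implementation.

```python
def H1(x, y):
    """One-sided inconsistency penalty (max(x - y, 0))**2: costs nothing
    once x <= y. Used with y = -1 to push negatively labeled soft labels
    down toward -1."""
    return max(x - y, 0.0) ** 2

def H2(x, y):
    """One-sided inconsistency penalty (max(y - x, 0))**2: costs nothing
    once x >= y. Used to pull positively labeled soft labels up toward
    their target (1, or the label of the containing shot/frame)."""
    return max(y - x, 0.0) ** 2

print(H1(0.5, -1.0))  # 2.25 - a "negative" item whose label drifted up is penalized
print(H2(0.5, 1.0))   # 0.25 - a "positive" item below its target is penalized
print(H2(1.0, 1.0))   # 0.0  - no cost once the target is reached
```

Because both penalties are one-sided, a negative item already at or below −1 and a positive item already at or above its target contribute nothing to the cost.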
In Expression one, the first three terms are the cost terms corresponding to the first constraint, and the remaining six terms are the cost terms corresponding to the second constraint. The superscript "S" appearing in the formula denotes video shots, the superscript "F" denotes frames, and the superscript "R" denotes regions.
It should be noted that the formula given above is only an exemplary expression of the cost function and does not limit the scope of the present invention. For example, the cost function may instead be expressed as:
Expression two:

$$
\begin{aligned}
Q(f^S, f^F, f^R) =\; & \frac{1}{2}\sum_{g,h} W^S_{gh}\big(f^S_g - f^S_h\big)^2 + \frac{\mu^F_G}{2}\sum_{i,j} W^F_{ij}\big(f^F_i - f^F_j\big)^2 \\
& + \frac{\mu^R_G}{2}\sum_{k,l} W^R_{kl}\big(f^R_k - f^R_l\big)^2 + \mu^S_-\sum_{S_g\in S^-} H_1(f^S_g, -1) \\
& + \mu^F_-\sum_{F_i\in F^-} H_1(f^F_i, -1) + \mu^R_-\sum_{R_k\in R^-} H_1(f^R_k, -1) + \mu^S_+\sum_{S_g\in S^+} H_2(f^S_g, 1) \\
& + \mu^F_+\sum_{S_g\in S^+} H_2\Big(\max_{F_i\in S_g} f^F_i,\; f^S_g\Big) + \mu^R_+\sum_{F_i\in C^+} H_2\Big(\max_{R_l\in F_i} f^R_l,\; f^F_i\Big)
\end{aligned}
$$
Compared with Expression one, Expression two omits the degree normalizers d^S_g and d^S_h in the first term, d^F_i and d^F_j in the second term, and d^R_k and d^R_l in the third term.
The cost function also admits other variants. For instance, in Expressions one and two, H_1 and H_2 may instead take the concrete form H_1(x, y) = (x − y)^2 and H_2(x, y) = (x − y)^2, and so on. Variants, improvements, and other expressions of the above formulas that those skilled in the art can obtain from the disclosure above and/or common general knowledge shall all fall within the scope of the present invention.
Next, in order to compute the unknowns in the constructed cost function, i.e., to obtain the value of the soft label of each video shot in the set, of each representative frame of each of those shots, and of each region of each representative frame, the optimization problem of the cost function can be solved by the computing unit 150. Specifically, the functions and operations of the computing unit 150 can be realized by the structure shown in Fig. 4.
Fig. 4 is a block diagram schematically showing one possible exemplary structure of the computing unit 150 in Fig. 1. As shown in Fig. 4, the computing unit 150 may include an initializing subunit 410, a third computing subunit 420, a fourth computing subunit 430, a fifth computing subunit 440, and a third judging subunit 450. With the exemplary structure of Fig. 4, the computing unit 150 can solve the above optimization problem by an iterative method: assign initial values to f^S and f^F, iterate using the cost function, and finally obtain the values of f^R, f^F, and f^S. The specific functions and processes of each subunit of the computing unit 150 shown in Fig. 4 are described below.
As shown in Fig. 4, the initializing subunit 410 assigns initial values to the soft labels f^S of the video shots in the set and to the soft labels f^F of the representative frames of those shots.
For example, the initializing subunit 410 can set the initial value f^S(0) of the soft label of each video shot as follows: if S_g is a positively labeled shot, set f^S_g(0) = 1; if S_g is a negatively labeled shot, set f^S_g(0) = −1; and if S_g is an unlabeled shot, set f^S_g(0) = 0.
Likewise, the initializing subunit 410 can set the initial value f^F(0) of the soft label of each representative frame as follows: if F_i is a representative frame of a positively labeled shot, set f^F_i(0) = 1; if F_i is a representative frame of a negatively labeled shot, set f^F_i(0) = −1; and if F_i is a representative frame of an unlabeled shot, set f^F_i(0) = 0.
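The initialization rule above can be sketched in a few lines; the exact initial values (1, −1, 0) follow the label targets of the second constraint and are an assumption where the source leaves them implicit, and the function name and label encoding ('+', '-', None) are hypothetical.

```python
def init_soft_labels(labels):
    """Assign initial soft labels f(0): 1.0 for positively labeled items,
    -1.0 for negatively labeled items, 0.0 for unlabeled ones.
    `labels` is a sequence of '+', '-' or None."""
    return [1.0 if l == '+' else -1.0 if l == '-' else 0.0 for l in labels]

# One positive shot, one negative shot, two unlabeled shots:
print(init_soft_labels(['+', '-', None, None]))  # [1.0, -1.0, 0.0, 0.0]
```

The same rule initializes both f^S (per shot) and f^F (per representative frame, using the label of the shot the frame belongs to).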
As shown in Fig. 4, the third computing subunit 420, according to the current value of the soft labels f^S of the video shots in the set and the current value of the soft labels f^F of their representative frames, converts the cost function into a constrained minimization problem and solves it using the constrained concave-convex procedure (CCCP), thereby obtaining a computed value of the soft labels f^R of the regions of the representative frames, which is used as the current value of f^R.
For example, in the first round of computation the current values of f^S and f^F are their initial values; given the current values of f^S and f^F, a cost function of the form of Expression one reduces to the following formula,
Expression 1-1:
$$
Q(f^R) = \frac{\mu^R_G}{2}\sum_{k,l} W^R_{kl}\Big(f^R_k/d^R_k - f^R_l/d^R_l\Big)^2 + \mu^R_-\sum_{R_k\in R^-} H_1(f^R_k, -1) + \mu^R_+\sum_{F_i\in C^+} H_2\Big(\max_{R_l\in F_i} f^R_l,\; f^F_i\Big)
$$
Wherein, expression formula one by one in the implication of each amount identical with expression formula one.Additionally, expression formula one by one in, the set C of all possible positive frames represented in frame that all video lens that above-mentioned video lens is concentrated comprise+Can be such defined thatWherein, THFNamely the 5th predetermined threshold value mentioned above, and THFValue can determine according to following formula:
$$
TH^F = \max\{t \mid \forall S_g \in S^+,\ \exists F_i \in S_g,\ f^F_i \ge t\} = \min_{S_g\in S^+}\ \max_{F_i\in S_g} f^F_i.
$$
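The threshold TH^F and the set C^+ defined above can be sketched as follows; this is a minimal illustration under assumed data structures (a list of frame-index lists for the positive shots and a dict of current frame soft labels), not the patented implementation.

```python
def possible_positive_frames(pos_shots, fF):
    """TH_F is the largest t such that EVERY positively labeled shot still
    contains a representative frame with soft label >= t, i.e. the minimum
    over positive shots of their best frame's soft label. C_plus collects
    the frames whose soft label is at or above TH_F.
    pos_shots: list of frame-index lists, one per positive shot;
    fF: dict mapping frame index -> current soft label."""
    TH_F = min(max(fF[i] for i in shot) for shot in pos_shots)
    C_plus = sorted(i for i, v in fF.items() if v >= TH_F)
    return TH_F, C_plus

fF = {0: 0.9, 1: 0.2, 2: 0.6, 3: 0.1}
TH_F, C_plus = possible_positive_frames([[0, 1], [2, 3]], fF)
print(TH_F)    # 0.6 - capped by the second positive shot's best frame
print(C_plus)  # [0, 2]
```

Choosing the threshold this way guarantees each positive shot keeps at least one "possible positive" frame, which is what the second constraint needs.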
By introducing relaxation factors, a cost function of the form of Expression 1-1 is converted into a constrained minimization problem, which can then be solved with CCCP. For a detailed description of CCCP, see A. J. Smola, S. V. N. Vishwanathan, and T. Hofmann, "Kernel Methods for Missing Variables," in Proc. Int. Workshop on Artificial Intelligence and Statistics, 2005.
In this way, using the current values of f^S and f^F together with the cost function, the third computing subunit 420 can obtain a computed value of f^R to be used as the current value of f^R.
As shown in Fig. 4, the fourth computing subunit 430, according to the current value of the soft labels f^S of the video shots and the current value of the soft labels f^R of the regions of the representative frames, converts the cost function into a constrained minimization problem and solves it with CCCP, obtaining a computed value of the soft labels f^F of the representative frames, which is used as the current value of f^F.
Specifically, with the shot soft labels f^S and the region soft labels f^R fixed, a cost function of the form of Expression one reduces to:
Expression 1-2:
$$
Q(f^F) = \frac{\mu^F_G}{2}\sum_{i,j} W^F_{ij}\Big(f^F_i/d^F_i - f^F_j/d^F_j\Big)^2 + \mu^F_-\sum_{F_i\in F^-} H_1(f^F_i, -1) + \mu^F_+\sum_{S_g\in S^+} H_2\Big(\max_{F_i\in S_g} f^F_i,\; f^S_g\Big) + \mu^R_+\sum_{F_i\in C^+} H_2\Big(\max_{R_l\in F_i} f^R_l,\; f^F_i\Big)
$$
where each quantity in Expression 1-2 has the same meaning as in Expression one. In Expression 1-2, the set C^+ of all possible positive frames among the representative frames of all video shots in the set can be defined as C^+ = {F_i | max_{R_k∈F_i} f^R_k ≥ TH^R}, where TH^R is the sixth predetermined threshold mentioned above, whose value can be determined according to the following formula:
$$
TH^R = \max\{t \mid \forall S_g \in S^+,\ \exists R_k \in S_g,\ f^R_k \ge t\} = \min_{S_g\in S^+}\ \max_{R_k\in S_g} f^R_k.
$$
Similarly, by introducing relaxation factors, a cost function of the form of Expression 1-2 is converted into a constrained minimization problem, which can be solved with the constrained concave-convex procedure. In this way, using the current values of f^S and f^R together with the cost function, the fourth computing subunit 430 can obtain a computed value of f^F to be used as the current value of f^F.
As shown in Fig. 4, the fifth computing subunit 440, according to the current value of the soft labels f^F of the representative frames and the current value of the soft labels f^R of the regions of those frames, directly uses the cost function to compute the soft labels f^S of the video shots, which are used as the current value of f^S.
Specifically, with the frame soft labels f^F and the region soft labels f^R fixed, a cost function of the form of Expression one reduces to:
Expression 1-3:
$$
Q(f^S) = \frac{1}{2}\sum_{g,h} W^S_{gh}\Big(f^S_g/d^S_g - f^S_h/d^S_h\Big)^2 + \mu^S_-\sum_{S_g\in S^-} H_1(f^S_g, -1) + \mu^S_+\sum_{S_g\in S^+} H_2(f^S_g, 1) + \mu^F_+\sum_{S_g\in S^+} H_2\Big(\max_{F_i\in S_g} f^F_i,\; f^S_g\Big)
$$
where each quantity in Expression 1-3 has the same meaning as in Expression one. According to Expression 1-3, the fifth computing subunit 440 can obtain the value of f^S by direct solution and use it as the current value of f^S.
As shown in Fig. 4, after the third computing subunit 420, the fourth computing subunit 430, and the fifth computing subunit 440 have each performed one round of computation in turn, the third judging subunit 450 judges whether the current computed results of f^R, f^F, and f^S have converged. If so, the current results of f^R, f^F, and f^S are retained as the computed values of the unknowns in the cost function described above; otherwise, the third, fourth, and fifth computing subunits perform the next round of iterative computation, the third judging subunit 450 judges again, and so on, the iteration being repeated until the third judging subunit 450 judges that the current results of f^R, f^F, and f^S have converged.
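The alternating iteration of the computing unit 150 can be sketched as a skeleton; the solve_* callbacks below stand in for the CCCP and direct sub-solvers described above, the toy one-dimensional solvers in the usage example are purely hypothetical, and a simple max-change criterion is assumed for the convergence test of the third judging subunit.

```python
def alternating_minimize(fS, fF, fR, solve_fR, solve_fF, solve_fS,
                         tol=1e-6, max_iter=100):
    """Iterative scheme: update f^R from (f^S, f^F), then f^F from
    (f^S, f^R), then f^S from (f^F, f^R); stop once no soft label
    changes by more than tol."""
    for _ in range(max_iter):
        new_fR = solve_fR(fS, fF)
        new_fF = solve_fF(fS, new_fR)
        new_fS = solve_fS(new_fF, new_fR)
        change = max(abs(a - b)
                     for old, new in ((fR, new_fR), (fF, new_fF), (fS, new_fS))
                     for a, b in zip(old, new))
        fR, fF, fS = new_fR, new_fF, new_fS
        if change < tol:
            break
    return fS, fF, fR

# Toy one-dimensional sub-solvers whose joint fixed point is fS = fF = fR = 1:
solve_fR = lambda fS, fF: [fF[0]]
solve_fF = lambda fS, fR: [(fS[0] + fR[0]) / 2]
solve_fS = lambda fF, fR: [(fF[0] + 1.0) / 2]
fS, fF, fR = alternating_minimize([0.0], [0.0], [0.0],
                                  solve_fR, solve_fF, solve_fS)
print(round(fS[0], 3), round(fF[0], 3), round(fR[0], 3))  # 1.0 1.0 1.0
```

Each sub-problem lowers the same cost function, so the alternation decreases Q monotonically; the loop guards against non-convergence with max_iter.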
As described above, through the processing of the preprocessing unit 110, the feature extraction unit 120, the weighted graph construction unit 130, the function construction unit 140, and the computing unit 150, the computed soft-label values of each video shot in the set, of each representative frame, and of each region can be obtained, and the video processing unit 160 can then perform video processing according to these computed values.
The video processing performed by the video processing unit 160 can be any processing whose operation can make use of the above soft labels.
For example, in one application example of the video processing apparatus according to an embodiment of the present invention, the above "video processing" can be video retrieval; that is, the video processing apparatus can be a video retrieval apparatus.
In general, in order to retrieve the desired video shots, the user provides some labeled training video shots to the retrieval system as query video shots. This technique can be applied to many aspects of daily life, such as digital video libraries, personal video recording and management, and online movie and television websites.
In this example, the user may provide one query video shot or several. When there is a single query shot, it is a positively labeled video shot. When there are several, they may all be positively labeled, or they may be a combination of positively and negatively labeled shots. In the special case where a query shot consists of only a single frame image, the query shot is a query image, and the representative frame extracted for it is that query image itself.
As described above, through the sequence of processing operations of the preprocessing unit 110, the feature extraction unit 120, the weighted graph construction unit 130, the function construction unit 140, and the computing unit 150, the computed soft-label values of each video shot in the set, of each representative frame of each shot, and of each region of each representative frame can be obtained. Using these computed soft labels, the video processing unit 160 can determine the similarity between each video shot in the set (other than the query shots) and the query shots, and then judge those shots whose similarity to the query shots falls within a preset range to be the video retrieval results.
For example, in one case the video processing unit 160 can judge a video shot to be a retrieval result when the following conditions are met: the soft label of the shot itself is higher than a first predetermined threshold; the soft label of the representative frame with the maximum soft label in the shot is higher than a second predetermined threshold; and, within that representative frame, the soft label of the region with the maximum soft label is higher than a third predetermined threshold. The values of the first, second, and third predetermined thresholds may be equal or different. For instance, the video processing unit 160 may take as retrieval results those shots for which, in the final computed results, the shot's soft label exceeds 0.8, the soft label of its representative frame with the maximum soft label exceeds 0.75, and the soft label of the region with the maximum soft label in that frame exceeds 0.7.
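The three-level threshold test just described can be sketched as a filter; the data layout (a dict mapping a shot id to its shot label, frame labels, and per-frame region labels) and the example threshold values 0.8 / 0.75 / 0.7 from the text are assumptions for illustration.

```python
def retrieve(shots, th_shot=0.8, th_frame=0.75, th_region=0.7):
    """Keep a shot when its own soft label, the soft label of its best
    representative frame, and the best region label inside that frame all
    exceed the first/second/third predetermined thresholds.
    shots: dict id -> (fS, [fF per frame], [[fR per region] per frame])."""
    hits = []
    for sid, (fS, fF, fR) in shots.items():
        best = max(range(len(fF)), key=lambda i: fF[i])  # frame with max soft label
        if fS > th_shot and fF[best] > th_frame and max(fR[best]) > th_region:
            hits.append(sid)
    return hits

shots = {
    'a': (0.9, [0.8, 0.3], [[0.75, 0.1], [0.2, 0.0]]),
    'b': (0.9, [0.9, 0.1], [[0.5, 0.6], [0.1, 0.0]]),  # best region only 0.6
}
print(retrieve(shots))  # ['a']
```

Requiring all three levels to pass rejects shots whose high shot-level score is not supported by any concrete frame or region evidence, as with shot 'b' above.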
In another example, the video processing unit 160 can judge the following shots to be the retrieval results: the top N video shots (N a positive integer) ranked by the weighted sum of the shot's soft label, the soft label of its representative frame with the maximum soft label, and the soft label of the region with the maximum soft label in that frame. For example, the weighted sum may take the form

$$
\alpha f^S_g + \beta \max_{F_i\in S_g} f^F_i + (1-\alpha-\beta)\max_{R_k\in F_{i0}} f^R_k.
$$

That is, for each video shot S_g (g = 1, 2, …, L), a corresponding weighted sum is computed according to the above formula, and the shots corresponding to the N largest weighted sums are taken as the final retrieval results. Here max_{F_i∈S_g} f^F_i is the soft label of the representative frame with the maximum soft label in shot S_g, F_{i0} denotes that representative frame, and max_{R_k∈F_{i0}} f^R_k is the soft label of the region with the maximum soft label in F_{i0}. In addition, α and β are linear combination coefficients with 0 < α < 1, 0 < β < 1, and 0 < α + β < 1.
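The weighted-sum ranking can be sketched as follows; the data layout and the particular coefficient values α = 0.5, β = 0.3 are assumptions for illustration (any values satisfying the stated constraints would do).

```python
def rank_shots(shots, alpha=0.5, beta=0.3, top_n=2):
    """Score each shot as alpha*fS + beta*(best frame's soft label)
    + (1 - alpha - beta)*(best region label inside that frame),
    then return the ids of the top_n highest-scoring shots.
    shots: dict id -> (fS, [fF per frame], [[fR per region] per frame])."""
    def score(entry):
        fS, fF, fR = entry
        i0 = max(range(len(fF)), key=lambda i: fF[i])  # frame F_{i0}
        return alpha * fS + beta * fF[i0] + (1 - alpha - beta) * max(fR[i0])
    return sorted(shots, key=lambda sid: score(shots[sid]), reverse=True)[:top_n]

shots = {
    'a': (0.9, [0.8], [[0.7]]),  # score 0.45 + 0.24 + 0.14 = 0.83
    'b': (0.2, [0.9], [[0.9]]),  # score 0.10 + 0.27 + 0.18 = 0.55
    'c': (0.8, [0.7], [[0.6]]),  # score 0.40 + 0.21 + 0.12 = 0.73
}
print(rank_shots(shots, top_n=2))  # ['a', 'c']
```

Unlike the hard-threshold variant, this ranking always returns N results, trading off shot-, frame-, and region-level evidence through α and β.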
In addition, the video processing unit 160 can output the retrieval results to the user in any one of the following orders: by the soft label of the shot corresponding to each retrieval result; by the soft label of the representative frame with the maximum soft label in each such shot; by the soft label of the region with the maximum soft label in that representative frame; or by the weighted sum of these three soft labels.
In this example, according to the query video shots and their label information provided by the user, the video processing apparatus uses the structural features of the shot-level, frame-level, and region-level weighted graphs and the connections among the three to obtain the soft labels of each video shot in the set and of its representative frames and their regions, then determines from these soft labels the relevance (or similarity) between each non-query shot in the set and the query shots, and finally takes the shots most relevant (or most similar) to the query shots as the retrieval results. Compared with existing video retrieval techniques, the video processing apparatus according to an embodiment of the present invention can use the shot-level, frame-level, and region-level weighted graphs simultaneously to realize video retrieval, fully exploiting the connections among the three, and can use both labeled and unlabeled video shots, so it is not affected by the problem of scarce labeled shots; it can therefore achieve a better video processing effect, that is, more accurate retrieval results.
In another application example of the video processing apparatus according to an embodiment of the present invention, the above "video processing" can also be video concept detection; that is, the video processing apparatus can be a video concept detection apparatus.
In general, the purpose of video concept detection is to determine whether (or to what degree) a video shot under test contains some given semantic concept. This technique can be applied to many aspects of daily life, such as video library management, home video, and video on demand.
In this example, the video shot under test is an unlabeled shot; it may be contained in the above video shot set, or not. There may be one shot under test or several. In addition, as described above, at least some of the shots in the set are labeled; this is in order to determine whether the shot under test contains the semantic concept associated with the labeled shots in the set.
Similarly to the preceding example, through the sequence of processing operations of the preprocessing unit 110, the feature extraction unit 120, the weighted graph construction unit 130, the function construction unit 140, and the computing unit 150, the computed soft-label values of each video shot in the set, of each representative frame, and of each region can be obtained. Using these computed values, the video processing unit 160 can determine whether the shot under test includes the above semantic concept, i.e., whether it contains the semantic concept associated with the labeled shots in the set. For example, when the set contains positively and negatively labeled shots, with the positive shots labeled "tiger" and the negative shots labeled "non-tiger", then clearly the "semantic concept associated with the labeled shots in the set" is "tiger"; that is, the video processing unit 160 needs to judge whether the content of the shot under test contains a tiger. Specifically, the functions and processes of the video processing unit 160 can be realized by the structure shown in Fig. 5.
Fig. 5 is a block diagram schematically showing, for this application example, one possible exemplary structure of the video processing unit 160 shown in Fig. 1. As shown in Fig. 5, the video processing unit 160 may include a first judging subunit 510, a first computing subunit 520, a second computing subunit 530 and a second judging subunit 540.
In order to judge whether the shot to be detected contains "the semantic concept associated with the labeled shots in the video shot set", the first judging subunit 510 first judges whether the shot to be detected is contained in the above-mentioned set; the subsequent computation is then described for the two resulting cases.
In the first case, that is, when the shot to be detected is not contained in the video shot set, the first computing subunit 520 may first extract at least one representative frame of the shot to be detected, then perform image segmentation on each extracted representative frame to obtain the regions of each representative frame, and then, based on the results obtained by the computing unit 150 (namely, the computed soft-label values of every shot in the set, of every representative frame of every shot, and of every region of every representative frame), obtain the computed value of the soft label of the shot to be detected, the computed values of the soft labels of its representative frames, and the computed values of the soft labels of the regions of those representative frames (the concrete computation will be described below). Then, according to these computed values, the second computing subunit 530 can compute the degree value with which the shot to be detected contains the semantic concept associated with the labeled shots in the set.
In this case, the soft label of the shot to be detected and the soft labels of its representative frames and regions can be computed according to the following Expressions Three to Five:
Expression Three:

$$f^S(S_t)=\frac{\sum_g\left[f_g^S\,W^S(S_t,S_g)/d_g^S\right]}{\sum_g W^S(S_t,S_g)/d_t^S}=d_t^S\,\frac{\sum_g\left[f_g^S\,W^S(S_t,S_g)/d_g^S\right]}{\sum_g W^S(S_t,S_g)}$$
Expression Four:

$$f^F(F_t)=\frac{\sum_i\left[f_i^F\,W^F(F_t,F_i)/d_i^F\right]}{\sum_i W^F(F_t,F_i)/d_t^F}=d_t^F\,\frac{\sum_i\left[f_i^F\,W^F(F_t,F_i)/d_i^F\right]}{\sum_i W^F(F_t,F_i)}$$
Expression Five:

$$f^R(R_t)=\frac{\sum_k\left[f_k^R\,W^R(R_t,R_k)/d_k^R\right]}{\sum_k W^R(R_t,R_k)/d_t^R}=d_t^R\,\frac{\sum_k\left[f_k^R\,W^R(R_t,R_k)/d_k^R\right]}{\sum_k W^R(R_t,R_k)}$$
Here, $S_t$ denotes the shot to be detected, $F_t$ denotes a representative frame of the shot to be detected, and $R_t$ denotes a region in a representative frame of the shot to be detected; $f^S(S_t)$ denotes the soft label of the shot to be detected, $f^F(F_t)$ denotes the soft label of the representative frame $F_t$, and $f^R(R_t)$ denotes the soft label of the region $R_t$ in the representative frame $F_t$. $S_g$, $F_i$ and $R_k$ have the same meanings as described above. $W^S(S_t,S_g)$ is the similarity, based on the shot-level visual features, between the shot to be detected $S_t$ and the g-th shot $S_g$ in the set; $d_g^S$ is the sum of the similarities between the g-th shot $S_g$ and the shots corresponding to all nodes in the shot-level weighted graph, and $d_t^S$ is the sum of the similarities between the shot to be detected and the shots corresponding to all nodes in the shot-level weighted graph. $W^F(F_t,F_i)$ is the similarity, based on the frame-level visual features, between a representative frame $F_t$ of the shot to be detected and the i-th representative frame $F_i$ among all representative frames of all shots in the set; $d_i^F$ is the sum of the similarities between $F_i$ and the representative frames corresponding to all nodes in the frame-level weighted graph, and $d_t^F$ is the sum of the similarities between $F_t$ and the representative frames corresponding to all nodes in the frame-level weighted graph. $W^R(R_t,R_k)$ is the similarity, based on the region-level visual features, between a region $R_t$ in a representative frame $F_t$ of the shot to be detected and the k-th region $R_k$ among all regions contained in all representative frames of all shots in the set; $d_k^R$ is the sum of the similarities between $R_k$ and the regions corresponding to all nodes in the region-level weighted graph, and $d_t^R$ is the sum of the similarities between $R_t$ and the regions corresponding to all nodes in the region-level weighted graph.
In another implementation, the soft label of the shot to be detected and the soft labels of its representative frames and regions may instead be computed according to the following Expressions Six to Eight:
Expression Six:

$$f^S(S_t)=\frac{\sum_g f_g^S\,W^S(S_t,S_g)}{\sum_g W^S(S_t,S_g)}$$
Expression Seven:

$$f^F(F_t)=\frac{\sum_i f_i^F\,W^F(F_t,F_i)}{\sum_i W^F(F_t,F_i)}$$
Expression Eight:

$$f^R(R_t)=\frac{\sum_k f_k^R\,W^R(R_t,R_k)}{\sum_k W^R(R_t,R_k)}$$
It should be noted that when Expression One described above is used to construct the cost function, Expressions Three to Five should be used to compute the soft label of the shot to be detected and the soft labels of its representative frames and regions; similarly, when Expression Two described above is used to construct the cost function, Expressions Six to Eight should be used.
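As a concrete illustration, the two propagation variants above can be sketched as follows; the function names and the toy values in the usage note are illustrative assumptions, and the same code applies unchanged at the frame level (Expressions Four and Seven) and the region level (Expressions Five and Eight).

```python
def soft_label_expr3(labels, weights, degrees, d_t):
    """Expression Three: degree-normalized similarity-weighted average.

    labels:  soft labels f_g of the set's elements (shots here)
    weights: similarities W(x_t, x_g) between the element to be detected
             and the set's elements
    degrees: similarity sums d_g of the set's elements
    d_t:     similarity sum of the element to be detected
    """
    num = sum(f * w / d for f, w, d in zip(labels, weights, degrees))
    return d_t * num / sum(weights)


def soft_label_expr6(labels, weights):
    """Expression Six: plain similarity-weighted average."""
    return sum(f * w for f, w in zip(labels, weights)) / sum(weights)
```

For two set shots with soft labels 1 and -1 and similarities 3 and 1 to the shot to be detected, Expression Six gives (3 - 1)/4 = 0.5, a value pulled toward the more similar, positively labeled shot.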
In the second case, that is, when the shot to be detected is contained in the video shot set, the computed values of its soft label and of the soft labels of its representative frames and regions have already been obtained by the computation of the computing unit 150; therefore the second computing subunit 530 can directly compute, in the manner described above, the degree value with which the shot to be detected contains the semantic concept associated with the labeled shots in the set.
In both of the above cases, the degree value with which the shot to be detected contains the semantic concept associated with the labeled shots in the set can be computed using the following formula: $\alpha f_g^S+\beta\max_{F_i\in S_g} f_i^F+(1-\alpha-\beta)\max_{R_k\in F_{i_0}} f_k^R$, where the parameters have the same definitions as above and are not repeated here.
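Under the stated assumption that the parameters alpha and beta are those defined earlier in the description (the example values below are placeholders), the degree-value formula above can be sketched as:

```python
def degree_value(shot_label, frame_labels, region_labels, alpha=0.4, beta=0.3):
    """Degree value alpha*f^S + beta*max_i f^F_i + (1-alpha-beta)*max_k f^R_k.

    shot_label:    soft label f^S of the shot
    frame_labels:  soft labels f^F_i of its representative frames
    region_labels: soft labels f^R_k of the regions in the representative
                   frame with the maximum soft label (F_{i_0})
    alpha, beta:   weighting parameters (illustrative defaults)
    """
    return (alpha * shot_label
            + beta * max(frame_labels)
            + (1 - alpha - beta) * max(region_labels))
```

With alpha = 0.4 and beta = 0.3, a shot with soft label 1.0, best-frame label 0.9 and best-region label 0.8 receives the degree value 0.4 + 0.27 + 0.24 = 0.91.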
Thus, in this example, the first judging subunit 510, the first computing subunit 520 and the second computing subunit 530 can obtain the degree value with which the shot to be detected contains the semantic concept associated with the labeled shots in the set. For example, when the positive label is "tiger", these three subunits 510-530 can determine to what degree the content of the shot to be detected contains a tiger.
Then, if the degree value is greater than or equal to a fourth predetermined threshold (for example, 0.75), the second judging subunit 540 judges that the content of the shot to be detected contains "the semantic concept associated with the labeled shots in the video shot set"; if the degree value is less than the fourth predetermined threshold, the second judging subunit 540 judges that the content of the shot to be detected does not contain that semantic concept.
According to this judgment result, when the second judging subunit 540 judges that the shot to be detected contains "the semantic concept associated with the labeled shots in the set", the second judging subunit 540 may further annotate the shot to be detected with that semantic concept, that is, it may annotate the shot to be detected with the label information of the positively labeled shots in the set. For example, when the second judging subunit 540 judges that the shot to be detected contains a tiger, the label "tiger" can be attached to the shot to be detected.
In this example, the video processing apparatus uses the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, together with the structural features of the three graphs and the relations among them, to obtain the soft labels of every shot in the set and of every representative frame and every region in every shot, and then determines from these soft labels whether the shot to be detected contains the semantic concept associated with the labeled shots in the set. Compared with existing video concept detection techniques, the video concept detection realized by the above example of the video processing apparatus according to an embodiment of the invention uses the three kinds of weighted graphs simultaneously, makes fuller use of the feature information of the video shots, and fully exploits the relations among the three graphs; moreover, it can use the unlabeled shots in addition to the labeled shots, and can therefore achieve a better video processing effect, that is, a more accurate concept detection result.
As described above, by applying the video processing apparatus according to an embodiment of the invention, the three types of weighted graphs (shot-level, frame-level and region-level) can be used, the feature information of the video shots can be used more fully, and the relations among the three graphs can be fully exploited, so that a good video processing effect can be obtained.
In addition, an embodiment of the invention also provides a video processing method. An exemplary process of this method is described below with reference to Fig. 6 and Fig. 7.
Fig. 6 is a flow chart schematically showing an exemplary process of the video processing method according to an embodiment of the invention. As shown in Fig. 6, the processing flow 600 of the video processing method starts at step S610 and then proceeds to step S620.
In step S620, at least one representative frame of each shot in the video shot set is extracted, and each extracted representative frame is segmented into multiple regions, wherein at least some of the shots in the set are labeled shots. Step S630 is then performed. The image segmentation involved in step S620 may adopt the methods discussed above.
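A minimal sketch of step S620 is given below; the middle-frame choice and the fixed-grid split are stand-in assumptions, since the description above allows any of the representative-frame extraction and image segmentation methods discussed earlier.

```python
def middle_frame(shot_frames):
    """Use the temporally middle frame as the shot's representative frame
    (one simple choice; the method allows other extraction schemes)."""
    return shot_frames[len(shot_frames) // 2]


def grid_regions(frame, rows=2, cols=2):
    """Split a frame (a 2-D list of pixels) into a rows x cols grid of
    regions, a crude stand-in for the image segmentation of step S620."""
    h, w = len(frame), len(frame[0])
    rh, rw = h // rows, w // cols
    return [[row[c * rw:(c + 1) * rw] for row in frame[r * rh:(r + 1) * rh]]
            for r in range(rows) for c in range(cols)]
```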
In step S630, the shot-level, frame-level and region-level visual features of each shot in the video shot set are extracted. Step S640 is then performed. For the characteristics, selection and extraction methods of these three kinds of visual features, reference may be made to the corresponding contents described above, and their detailed description is omitted here.
In step S640, the shot-level weighted graph is built according to the shot-level visual features, the frame-level weighted graph is built according to the frame-level visual features, and the region-level weighted graph is built according to the region-level visual features. Step S650 is then performed.
In one implementation, the three weighted graphs may be built as follows: the shot-level weighted graph is built by taking each shot in the set as a node and the similarity in shot-level visual features between each two nodes as the weight of the weighted edge between those two nodes; the frame-level weighted graph is built by taking each representative frame of each shot in the set as a node and the similarity in frame-level visual features between each two nodes as the weight of the weighted edge between those two nodes; and the region-level weighted graph is built by taking each region in each representative frame of each shot in the set as a node and the similarity in region-level visual features between each two nodes as the weight of the weighted edge between those two nodes.
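The construction in step S640 can be sketched for the shot level as follows (the frame and region levels are identical apart from the features used); the Gaussian-kernel similarity and the toy feature vectors are illustrative assumptions, not prescribed by the description above.

```python
import numpy as np


def build_weighted_graph(features, sigma=1.0):
    """Build a fully connected weighted graph from feature vectors.

    features: (n, d) array, one visual-feature vector per node.
    Returns an (n, n) symmetric weight matrix W, where W[a, b] is the
    similarity between nodes a and b (a Gaussian kernel is assumed here).
    """
    n = len(features)
    W = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            dist2 = np.sum((features[a] - features[b]) ** 2)
            W[a, b] = W[b, a] = np.exp(-dist2 / (2 * sigma ** 2))
    return W


# Toy shot-level features: 4 shots, 3-dimensional feature vectors.
shot_features = np.array([[0.0, 0.1, 0.2],
                          [0.0, 0.1, 0.25],
                          [1.0, 0.9, 0.8],
                          [1.0, 1.0, 0.9]])
W_shot = build_weighted_graph(shot_features)
d = W_shot.sum(axis=1)  # similarity sums d_g used in the expressions above
```

Shots 0 and 1 have nearly identical features, so their edge weight exceeds the weight between shots 0 and 2, which is exactly the structure the first constraint later exploits.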
In step S650, taking the soft labels of the shots in the set, the soft labels of their representative frames, and the soft labels of the regions in those representative frames as unknowns, a cost function is constructed according to the structural information of the shot-level, frame-level and region-level weighted graphs and according to the relations among the soft labels of the shots, the soft labels of the representative frames and the soft labels of the regions. Step S660 is then performed.
Specifically, the cost function may be constructed using the method described below.
For example, according to the structural information of the shot-level, frame-level and region-level weighted graphs, a first constraint may be set as follows: the more similar the shot-level visual features of two shots are, the smaller the difference between their soft labels should be; the more similar the frame-level visual features of two representative frames are, the smaller the difference between their soft labels should be; and the more similar the region-level visual features of two regions are, the smaller the difference between their soft labels should be.
In addition, according to the relations between the soft labels of the labeled shots in the set and the soft labels of the representative frames and regions in those labeled shots, a second constraint may be set as follows: the soft label of a negatively labeled shot, the soft labels of all representative frames in a negatively labeled shot and the soft labels of all regions in the representative frames of a negatively labeled shot should be as close to -1 as possible; the soft label of a positively labeled shot should be as close to 1 as possible; in a positively labeled shot, the soft label of the representative frame with the maximum soft label should be as close as possible to the soft label of the shot to which that frame belongs; and in each possible positive frame of a positively labeled shot, the soft label of the region with the maximum soft label should be as close as possible to the soft label of the representative frame to which that region belongs.
It should be noted that a possible positive frame may be a frame whose soft label is higher than a fifth predetermined threshold, or a frame that contains a region whose soft label is higher than a sixth predetermined threshold.
The cost function can then be constructed according to the first constraint and the second constraint. The cost function here may adopt any of the forms described above, which are not repeated here.
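For the first constraint, the smoothness part of such a cost function is commonly written as a sum of similarity-weighted squared label differences over one graph; the quadratic form below is a sketch under that assumption (the concrete cost forms are defined elsewhere in the description), shown for a single level only.

```python
def smoothness_cost(W, f):
    """Pairwise cost sum over a < b of W[a][b] * (f[a] - f[b])**2: it is
    small exactly when nodes joined by heavily weighted edges carry similar
    soft labels, which is what the first constraint requires."""
    n = len(f)
    return sum(W[a][b] * (f[a] - f[b]) ** 2
               for a in range(n) for b in range(a + 1, n))
```

For two strongly connected nodes with opposite labels 1 and -1 the cost is large (here 4), while identical labels on the same graph cost 0.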
Then, in step S660, the optimization problem of the cost function is solved to obtain the computed values of the unknowns. Step S670 is then performed.
In step S660, an iterative computation may be adopted to solve the optimization problem, that is, $f^S$ and $f^F$ are assigned initial values, the cost function is used to compute iteratively, and the values of $f^R$, $f^F$ and $f^S$ are finally obtained. A possible exemplary computation process of step S660 is described below with reference to Fig. 7.
Fig. 7 is a flow chart schematically showing a possible exemplary process of step S660 shown in Fig. 6. As shown in Fig. 7, in step S710, initial values are first assigned to the soft label $f^S$ of each shot in the set and to the soft label $f^F$ of each representative frame in each shot of the set. In step S710, initial values may be assigned to $f^S$ and $f^F$ by the same method as the processing performed by the initializing subunit 410 described with reference to Fig. 4, which is not repeated here. Step S720 is then performed.
Next, the values of $f^R$, $f^F$ and $f^S$ are computed through the loop processing of steps S720 to S750.
In step S720, according to the current value of the soft label $f^S$ of each shot in the set and the current value of the soft label $f^F$ of each representative frame in each shot, the cost function is converted into a constrained minimization problem, and the concave-convex procedure (CCCP) is used to solve this constrained minimization problem, so as to obtain the computed value of the soft label $f^R$ of each region in each representative frame of each shot in the set, which is used as the current value of $f^R$. In step S720, $f^R$ may be obtained by the same method as the processing performed by the third computing subunit 420 described above with reference to Fig. 4, which is not repeated here. Step S730 is then performed.
In step S730, according to the current value of the soft label $f^S$ of each shot in the set and the current value of the soft label $f^R$ of each region in each representative frame of each shot, the cost function is converted into a constrained minimization problem, and CCCP is used to solve this constrained minimization problem, so as to obtain the computed value of the soft label $f^F$ of each representative frame in each shot in the set, which is used as the current value of $f^F$. In step S730, $f^F$ may be obtained by the same method as the processing performed by the fourth computing subunit 430 described above with reference to Fig. 4, which is not repeated here. Step S740 is then performed.
In step S740, according to the current value of the soft label $f^F$ of each representative frame in each shot in the set and the current value of the soft label $f^R$ of each region in each representative frame, the cost function is directly used to compute the soft label $f^S$ of each shot in the set, which is used as the current value of $f^S$. In step S740, $f^S$ may be obtained by the same method as the processing performed by the fifth computing subunit 440 described above with reference to Fig. 4, which is not repeated here. Step S750 is then performed.
In step S750, it is judged whether the current computation results of $f^R$, $f^F$ and $f^S$ have converged: if so, the current values of the soft labels of the shots, of the representative frames and of the regions are retained as the computed values of the unknowns in the cost function, and step S670 is then performed; otherwise, the process returns to step S720 for the next iteration.
In this way, through the loop iteration of steps S720 to S750, each of $f^R$, $f^F$ and $f^S$ is solved in turn while the other two are held fixed; that is, with two of the three vectors determined, the elements of the remaining vector are taken as variables and solved for, and the iteration proceeds in the order $f^R \to f^F \to f^S \to f^R \to f^F \to f^S \to \cdots$ until the values converge. The computed values of the unknowns in the cost function described in step S650 are thus obtained.
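The block-coordinate iteration of steps S720 to S750 can be sketched as follows; the per-block solvers passed in as `solve_fR`, `solve_fF` and `solve_fS` stand in for the CCCP-based minimizations and the direct $f^S$ computation, and are assumptions of this sketch.

```python
import numpy as np


def alternating_optimization(init_fS, init_fF, solve_fR, solve_fF, solve_fS,
                             tol=1e-6, max_iter=100):
    """Sketch of the f^R -> f^F -> f^S iteration of Fig. 7.

    Each solve_* callable takes the current values of the other two blocks
    and returns the minimizer of the cost function over its own block.
    """
    fS, fF = np.asarray(init_fS, float), np.asarray(init_fF, float)
    fR = solve_fR(fS, fF)
    for _ in range(max_iter):
        fR_new = solve_fR(fS, fF)          # step S720
        fF_new = solve_fF(fS, fR_new)      # step S730
        fS_new = solve_fS(fF_new, fR_new)  # step S740
        change = (np.abs(fR_new - fR).max() + np.abs(fF_new - fF).max()
                  + np.abs(fS_new - fS).max())
        fR, fF, fS = fR_new, fF_new, fS_new
        if change < tol:                   # step S750: convergence check
            break
    return fR, fF, fS
```

With toy "solvers" that simply average the other two blocks, the three values contract toward a common fixed point, illustrating the convergence that step S750 tests for.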
Returning to Fig. 6, in step S670, video processing is carried out according to the computed values of the unknowns obtained above. Step S680 is then performed.
In one example of the video processing method according to an embodiment of the invention, the video processing involved may be video retrieval; in this case, the video shot set includes a labeled query video shot. In this case, in step S670, according to the obtained computed values, those shots in the set, other than the query shot, whose similarity to the query shot lies within a predetermined range may be judged to be the retrieval result. The retrieval result may be those shots whose soft label is higher than a first predetermined threshold, in which the soft label of the representative frame with the maximum soft label is higher than a second predetermined threshold, and in which the soft label of the region with the maximum soft label in that representative frame is higher than a third predetermined threshold.
Alternatively, the retrieval result may be the top N shots with the largest weighted sum of three values, namely the soft label of the shot, the soft label of the representative frame with the maximum soft label in the shot, and the soft label of the region with the maximum soft label in that representative frame, where N is a positive integer. In the special case in which the query shot itself includes only one frame of image, the query shot is a query image, and the representative frame extracted for the query shot is that query image itself. In addition, the retrieval result may also be output in a certain order; for example, the retrieval result may be output in any of the orders described above, which are not repeated here.
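The top-N ranking just described can be sketched as follows; the weighting parameters and the shot-tuple layout are illustrative assumptions.

```python
def top_n_retrieval(shots, n, alpha=0.4, beta=0.3):
    """Rank candidate shots by the weighted sum of the shot, best-frame and
    best-region soft labels, and return the top N (retrieval case of S670).

    shots: list of (shot_id, f_S, frame_labels, region_labels_of_best_frame)
    """
    def score(entry):
        _, f_s, frame_labels, region_labels = entry
        return (alpha * f_s + beta * max(frame_labels)
                + (1 - alpha - beta) * max(region_labels))

    ranked = sorted(shots, key=score, reverse=True)
    return [shot_id for shot_id, *_ in ranked[:n]]
```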
In another example of the video processing method according to an embodiment of the invention, the video processing involved may be video concept detection. In this case, in step S670, it may be judged, according to the computed values of the soft labels obtained in step S660, whether an unlabeled shot to be detected contains the semantic concept associated with the labeled shots in the set. In this case, the processing of step S670 can be realized by steps S810-S860 shown in Fig. 8; this concrete processing is described below.
Fig. 8 is a flow chart schematically showing a possible exemplary process of step S670 shown in Fig. 6 in the example case in which the video processing is video concept detection. As shown in Fig. 8, in step S810, it is judged whether the shot to be detected is contained in the video shot set: if so, the soft label of the shot to be detected and the soft labels of its representative frames and regions have all already been obtained, so step S830 can be performed directly for the next computation; if not, the soft label of the shot to be detected and the soft labels of its representative frames and regions are all unknown, so step S820 is performed to obtain these soft labels.
In step S820, at least one frame of the shot to be detected may first be extracted to serve as a representative frame of that shot; each representative frame of the shot to be detected is then segmented into multiple regions, and then, according to the computed values of the unknowns already obtained, the computed value of the soft label of the shot to be detected, the computed values of the soft labels of its representative frames, and the computed values of the soft labels of the regions of those representative frames are obtained. For the concrete computation, reference may be made to the methods described above for computing the soft labels of the shot to be detected and of its representative frames and regions, which are not repeated here. After step S820 is performed, step S830 is performed.
In step S830, according to the obtained computed value of the soft label of the shot to be detected, the computed values of the soft labels of its representative frames, and the computed values of the soft labels of the regions of those representative frames, the degree value with which the shot to be detected contains the semantic concept associated with the labeled shots in the set is computed; for this computation, reference may be made to the computation of the degree value described in the corresponding part above, which is likewise not repeated here. Step S840 is then performed.
In step S840, it is judged whether this degree value is greater than or equal to the fourth predetermined threshold: if so, step S850 is performed, that is, it is judged in step S850 that the shot to be detected contains "the semantic concept associated with the labeled shots in the video shot set", and the subsequent step (for example step S680 shown in Fig. 6) is then performed; otherwise, step S860 is performed, that is, it is judged in step S860 that the shot to be detected does not contain that semantic concept, and the subsequent step (for example step S680 shown in Fig. 6) is then performed.
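The decision of steps S840-S860 reduces to a threshold test on the degree value, optionally followed by annotation; a sketch (reusing the 0.75 example threshold from the apparatus description, with the function name as an assumption) is:

```python
def detect_and_annotate(degree_value, positive_label, fourth_threshold=0.75):
    """Steps S840-S860: judge whether the shot to be detected contains the
    concept, and if so return the label to attach (e.g. 'tiger')."""
    if degree_value >= fourth_threshold:
        return positive_label  # S850: concept present, annotate the shot
    return None                # S860: concept absent
```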
It should be noted that the processes or sub-processes of the steps in the above video processing method according to an embodiment of the invention may have processing procedures capable of realizing the operations or functions of the units, subunits, modules or submodules of the video processing apparatus described above, and can achieve similar technical effects; their descriptions are omitted here.
As described above, by applying the video processing method according to an embodiment of the invention, the three types of weighted graphs (shot-level, frame-level and region-level) can be used, the feature information of the video shots can be used more fully, and the relations among the three graphs can be fully exploited, so that a good video processing effect can be obtained. In addition, the video processing method according to an embodiment of the invention can simultaneously use labeled shots and unlabeled shots, thereby greatly enriching the available resources and enabling a better, more accurate processing effect.
In addition, an embodiment of the invention also provides an equipment that includes the video processing apparatus described above. This equipment may be, for example, a camera, a video camera, a computer (for example, a desktop or notebook computer), a mobile phone (for example, a smart phone), a personal digital assistant, a multimedia processing apparatus (for example, an MP3 or MP4 player with video playback capability), and so on.
With the above equipment according to an embodiment of the present invention, since the above video processing apparatus is integrated, the three types of weighted graphs (shot-level, frame-level and region-level) can be used, the feature information of the video shots can be used more fully, and the relations among the three graphs can be fully exploited, so that a good video processing effect can be obtained.
The component units, subunits and the like in the above video processing apparatus according to an embodiment of the invention may be configured by means of software, firmware, hardware or any combination thereof. When realized by software or firmware, a program constituting the software or firmware may be installed, from a storage medium or a network, onto a machine with a dedicated hardware structure (for example, the general-purpose machine 900 shown in Fig. 9), and the machine, when installed with the various programs, is able to perform the various functions of the above component units and subunits.
Fig. 9 shows a schematic structural diagram of the hardware configuration of a possible information processing device that can be used to realize the video processing apparatus and the video processing method according to embodiments of the invention.
In Fig. 9, a central processing unit (CPU) 901 performs various processes according to programs stored in a read-only memory (ROM) 902 or programs loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores, as required, data needed when the CPU 901 performs the various processes. The CPU 901, the ROM 902 and the RAM 903 are connected to one another via a bus 904. An input/output interface 905 is also connected to the bus 904.
The following components are also connected to the input/output interface 905: an input section 906 (including a keyboard, a mouse and the like), an output section 907 (including a display, such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker and the like), the storage section 908 (including a hard disk and the like), and a communication section 909 (including a network interface card such as a LAN card, a modem and the like). The communication section 909 performs communication processing via a network such as the Internet. A drive 910 may be connected to the input/output interface 905 as required. A removable medium 911, such as a magnetic disk, an optical disc, a magneto-optical disc or a semiconductor memory, may be mounted on the drive 910 as required, so that a computer program read therefrom is installed into the storage section 908 as required.
When the above series of processes is realized by software, a program constituting the software may be installed from a network such as the Internet, or from a storage medium such as the removable medium 911.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 911 shown in Fig. 9, in which the program is stored and which is distributed separately from the device so as to provide the program to the user. Examples of the removable medium 911 include a magnetic disk (including a floppy disk), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disc (including a MiniDisc (MD) (registered trademark)) and a semiconductor memory. Alternatively, the storage medium may be the ROM 902, a hard disk contained in the storage section 908 or the like, in which the program is stored and which is distributed to the user together with the device containing it.
In addition, the present invention further provides a program product storing machine-readable instruction codes. When read and executed by a machine, the instruction codes can perform the video processing method according to the embodiments of the present invention described above. Accordingly, the various storage media for carrying such a program product, such as a magnetic disk, an optical disk, a magneto-optical disk and a semiconductor memory, are also included in the disclosure of the present invention.
In the above description of the specific embodiments of the present invention, features described and/or illustrated with respect to one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features in other embodiments, or substituted for features in other embodiments.
It should be emphasized that the term "include/comprise", as used herein, refers to the presence of a feature, an element, a step or a component, but does not exclude the presence or addition of one or more other features, elements, steps or components. The ordinal terms "first", "second" and the like do not indicate an order of implementation or a degree of importance of the features, elements, steps or components they qualify, but are merely used for identification among these features, elements, steps or components for clarity of description.
In addition, the methods of the embodiments of the present invention are not limited to being performed in the chronological order described in the specification or shown in the drawings; they may also be performed in other chronological orders, in parallel, or independently. Therefore, the order of execution of the methods described in this specification does not limit the technical scope of the present invention.
It should further be understood that the operations of the above-described methods according to the present invention may also be implemented as computer-executable programs stored in various machine-readable storage media.
Moreover, the object of the present invention may also be achieved in the following way: a storage medium storing the above executable program codes is supplied to a system or device directly or indirectly, and a computer or a central processing unit (CPU) in the system or device reads out and executes the program codes.
In this case, as long as the system or device has a function of executing a program, the embodiments of the present invention are not limited to the program, and the program may take any form, for example, an object program, a program executed by an interpreter, a script program supplied to an operating system, or the like.
The machine-readable storage media mentioned above include, but are not limited to: various memories and storage units; semiconductor devices; disk units such as optical, magnetic and magneto-optical disks; and other media suitable for storing information.
In addition, the present invention may also be implemented by a client computer connecting to a corresponding website on the Internet, downloading and installing the computer program codes according to the present invention into a computer, and then executing the program.
Finally, it should further be noted that, herein, relational terms such as left and right or first and second are merely used to distinguish one entity or operation from another entity or operation, without necessarily requiring or implying any such actual relation or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Unless further limited, an element defined by the statement "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device including that element.
In summary, the embodiments of the present invention provide the following notes:
Note 1. A video processing apparatus, comprising: a pre-processing unit configured to extract at least one representative frame of each video shot in a video shot set and to divide each extracted representative frame into a plurality of regions, wherein at least some of the video shots in the video shot set are labeled video shots; a feature extraction unit configured to extract shot-level visual features, frame-level visual features and region-level visual features of the video shot set; a weighted graph construction unit configured to construct a shot-level weighted graph according to the shot-level visual features, a frame-level weighted graph according to the frame-level visual features, and a region-level weighted graph according to the region-level visual features; a function construction unit configured to construct a cost function, with the soft label of each video shot in the video shot set, the soft label of each representative frame in each video shot and the soft label of each region in each representative frame as unknowns, according to the structure information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, and according to the relations among the soft labels of the video shots, the soft labels of the representative frames and the soft labels of the regions; a calculation unit configured to obtain calculated values of the unknowns by solving the optimization problem of the cost function; and a video processing unit configured to perform video processing according to the calculated values obtained by the calculation unit.
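The apparatus of Note 1 operates on a three-level hierarchy: shots, their representative frames, and the regions of each representative frame, each carrying one soft-label unknown. A minimal data-structure sketch of that hierarchy follows; all type and field names are illustrative assumptions, not terminology from the patent:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Region:
    features: List[float]              # region-level visual features
    soft_label: float = 0.0            # unknown solved by the calculation unit

@dataclass
class RepFrame:
    features: List[float]              # frame-level visual features
    regions: List[Region] = field(default_factory=list)
    soft_label: float = 0.0

@dataclass
class Shot:
    features: List[float]              # shot-level visual features
    frames: List[RepFrame] = field(default_factory=list)
    soft_label: float = 0.0
    hard_label: Optional[int] = None   # +1 / -1 for labeled shots, else None
```

A video shot set in this sketch is simply a list of `Shot` objects, of which at least some have `hard_label` set; the calculation unit fills in every `soft_label`.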
Note 2. The video processing apparatus according to Note 1, wherein the video processing apparatus is a video retrieval apparatus, the video shot set includes a labeled query video shot, and the video processing unit is configured to determine, according to the calculated values obtained by the calculation unit, those video shots in the video shot set, other than the query video shot, whose similarity to the query video shot is within a predetermined range as retrieval results.
Note 3. The video processing apparatus according to Note 2, wherein a video shot in the video shot set, other than the query video shot, whose similarity to the query video shot is within the predetermined range is one of the following: a video shot whose soft label is higher than a first predetermined threshold, in which the soft label of the representative frame having the maximum soft label is higher than a second predetermined threshold and the soft label of the region having the maximum soft label in that representative frame is higher than a third predetermined threshold; or one of the top N video shots having the maximum weighted sum of the soft label of the video shot, the soft label of its representative frame having the maximum soft label, and the soft label of the region having the maximum soft label in that representative frame, where N is a positive integer.
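The two selection rules of Note 3 can be sketched as follows. The threshold values and the weights of the weighted sum are free parameters of the scheme; representing each shot as a triple (shot soft label, maximum frame soft label, maximum region soft label) is an assumption made here for illustration:

```python
def select_by_thresholds(shots, t1, t2, t3):
    """Rule 1 of Note 3: keep a shot whose own soft label, best representative
    frame soft label and best region soft label each exceed their thresholds."""
    return [s for s in shots if s[0] > t1 and s[1] > t2 and s[2] > t3]

def select_top_n(shots, weights, n):
    """Rule 2 of Note 3: rank shots by the weighted sum of the three soft
    labels and keep the top N."""
    w1, w2, w3 = weights
    ranked = sorted(shots, key=lambda s: w1 * s[0] + w2 * s[1] + w3 * s[2],
                    reverse=True)
    return ranked[:n]
```

Either rule (or both, as alternatives) yields the retrieval results of Note 2.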
Note 4. The video processing apparatus according to Note 2 or 3, wherein, when the query video shot includes only one frame of image, the query video shot is a query image, and the representative frame of the query video shot is the query image itself.
Note 5. The video processing apparatus according to Note 1, wherein the video processing apparatus is a video concept detection apparatus, and the video processing unit is configured to determine, according to the results obtained by the calculation unit, whether a video shot under test contains a semantic concept related to the labeled video shots in the video shot set.
Note 6. The video processing apparatus according to Note 5, wherein the video processing unit includes: a first judgment sub-unit configured to judge whether the video shot under test is included in the video shot set; a first calculation sub-unit configured to, when the video shot under test is not included in the video shot set, extract at least one representative frame of the video shot under test, divide each representative frame of the video shot under test into a plurality of regions, and obtain, according to the results obtained by the calculation unit, calculated values of the soft label of the video shot under test, of the soft labels of the representative frames of the video shot under test and of the soft labels of the regions of each representative frame of the video shot under test; a second calculation sub-unit configured to calculate, according to the results obtained by the first calculation sub-unit, a degree value with which the video shot under test contains the semantic concept related to the labeled video shots in the video shot set; and a second judgment sub-unit configured to judge that the video shot under test contains the semantic concept related to the labeled video shots in the video shot set when the degree value calculated by the second calculation sub-unit is greater than or equal to a fourth predetermined threshold, and to judge that the video shot under test does not contain the semantic concept when the degree value is less than the fourth predetermined threshold.
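The second calculation and second judgment sub-units of Note 6 can be sketched as one function. The particular combination rule used here (a weighted sum of the shot soft label and the maxima over frame and region soft labels) is an illustrative assumption; the patent only requires some degree value derived from the three levels, compared against the fourth predetermined threshold:

```python
def detect_concept(shot_label, frame_labels, region_labels, t4,
                   weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the calculated soft labels of a shot under test into a degree
    value and compare it with the fourth predetermined threshold t4.
    Returns (contains_concept, degree_value)."""
    w1, w2, w3 = weights
    degree = (w1 * shot_label
              + w2 * max(frame_labels)
              + w3 * max(region_labels))
    return degree >= t4, degree
```

When the result is positive, Note 7 then labels the shot under test with the label of the positively labeled shots in the set.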
Note 7. The video processing apparatus according to Note 5 or 6, wherein the video processing unit is further configured to, when the video shot under test is judged to contain the semantic concept related to the labeled video shots in the video shot set, label the video shot under test with the label of the video shots with positive labels in the video shot set.
Note 8. The video processing apparatus according to any one of Notes 1 to 7, wherein the weighted graph construction unit includes: a first construction sub-unit configured to construct the shot-level weighted graph by taking each video shot in the video shot set as a node, and the similarity between each two nodes in terms of shot-level visual features as the weight of the weighted edge between the two nodes; a second construction sub-unit configured to construct the frame-level weighted graph by taking each representative frame of each video shot in the video shot set as a node, and the similarity between each two nodes in terms of frame-level visual features as the weight of the weighted edge between the two nodes; and a third construction sub-unit configured to construct the region-level weighted graph by taking each region of each representative frame of each video shot in the video shot set as a node, and the similarity between each two nodes in terms of region-level visual features as the weight of the weighted edge between the two nodes.
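All three weighted graphs of Note 8 have the same form: nodes are items at one granularity (shots, representative frames, or regions) and edge weights are feature similarities. A common similarity choice, assumed here for illustration and not mandated by the patent, is a Gaussian kernel on squared feature distance:

```python
import math

def build_weighted_graph(features, sigma=1.0):
    """Return a symmetric weight matrix W where W[i][j] is the similarity
    between the visual feature vectors of nodes i and j. The same routine
    serves the shot-level, frame-level and region-level graphs; only the
    feature vectors passed in differ."""
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            w = math.exp(-d2 / (2 * sigma ** 2))   # Gaussian kernel weight
            W[i][j] = W[j][i] = w
    return W
```

Identical feature vectors receive weight 1 and distant ones a weight near 0, so the graph structure directly encodes the "more similar features, smaller soft-label difference" requirement of Note 9.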
Note 9. The video processing apparatus according to any one of Notes 1 to 8, wherein the function construction unit includes: a first setting sub-unit configured to set, according to the structure information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, a first constraint condition such that: the more similar the shot-level visual features of two video shots are, the smaller the difference between their soft labels is; the more similar the frame-level visual features of two representative frames are, the smaller the difference between their soft labels is; and the more similar the region-level visual features of two regions are, the smaller the difference between their soft labels is; a second setting sub-unit configured to set, according to the relations among the soft labels of the video shots, the soft labels of the representative frames and the soft labels of the regions, a second constraint condition such that: the soft label of a video shot with a negative label, the soft labels of all representative frames in the video shot with the negative label and the soft labels of all regions of the representative frames in the video shot with the negative label are as close to -1 as possible; the soft label of a video shot with a positive label is as close to 1 as possible; the soft label of the representative frame having the maximum soft label in the video shot with the positive label is as close as possible to the soft label of the video shot to which that representative frame belongs; and the soft label of the region having the maximum soft label in each possible positive frame in the video shot with the positive label is as close as possible to the soft label of the representative frame to which that region belongs; and a function construction sub-unit configured to construct the cost function, according to the first constraint condition and the second constraint condition, with the soft label of each video shot in the video shot set, the soft label of each representative frame of each video shot in the video shot set and the soft label of each region of each representative frame of each video shot in the video shot set as unknowns.
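The first constraint condition of Note 9 is the familiar graph-smoothness penalty from graph-based semi-supervised learning. As a hedged illustration (the patent does not commit to this exact functional form), a cost term of the following shape satisfies the condition at each of the three levels:

```latex
% Smoothness term at one level of the hierarchy: f_i denotes the soft
% label of node i and w_{ij} the weight of the edge between nodes i and j.
E_{\mathrm{smooth}}(f) = \sum_{i,j} w_{ij}\,(f_i - f_j)^2
```

A larger weight $w_{ij}$ (more similar visual features) forces $f_i$ and $f_j$ closer, which is exactly the first constraint condition. The full cost would add one such term per weighted graph, plus fitting terms for the second constraint, for example penalties of the shape $(f_i + 1)^2$ for negatively labeled shots and $(f_i - 1)^2$ for positively labeled ones; these specific fitting terms are an assumption here, consistent with but not quoted from the patent.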
Note 10. The video processing apparatus according to Note 9, wherein the possible positive frame is a frame whose soft label value is higher than a fifth predetermined threshold, or a frame that includes a region whose soft label value is higher than a sixth predetermined threshold.
Note 11. The video processing apparatus according to any one of Notes 1 to 10, wherein the calculation unit includes:
an initialization sub-unit configured to assign initial values to the soft label of each video shot in the video shot set and to the soft labels of the representative frames of each video shot in the video shot set;
a third calculation sub-unit configured to convert the cost function into a constrained minimization problem according to the current values of the soft labels of the video shots in the video shot set and the current values of the soft labels of the representative frames of the video shots, and to solve the constrained minimization problem using the constrained concave-convex procedure, so as to obtain calculated values of the soft labels of the regions of the representative frames of the video shots in the video shot set;
a fourth calculation sub-unit configured to convert the cost function into a constrained minimization problem according to the current values of the soft labels of the video shots in the video shot set and the current values of the soft labels of the regions of the representative frames of the video shots, and to solve the constrained minimization problem using the constrained concave-convex procedure, so as to obtain calculated values of the soft labels of the representative frames of the video shots in the video shot set;
a fifth calculation sub-unit configured to calculate, using the cost function, calculated values of the soft labels of the video shots in the video shot set according to the current values of the soft labels of the representative frames of the video shots and the current values of the soft labels of the regions of the representative frames; and
a third judgment sub-unit configured to judge, each time after the third, fourth and fifth calculation sub-units have successively performed one round of calculation, whether the current values of the soft labels of the video shots in the video shot set, of the soft labels of the representative frames of the video shots and of the soft labels of the regions of the representative frames have converged: if so, the current values of the soft labels of the video shots, of the soft labels of the representative frames and of the soft labels of the regions are retained as the calculated values of the unknowns in the cost function; otherwise, the third, fourth and fifth calculation sub-units perform the next iteration of calculation, until the third judgment sub-unit judges that the current values of the soft labels of the video shots, of the soft labels of the representative frames and of the soft labels of the regions have converged.
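The third through fifth calculation sub-units of Note 11 together form an alternating (block-coordinate) minimization, iterated until the soft labels stop changing. The control flow can be sketched as follows, with the three per-block solvers passed in as functions; in the patent each block is solved with the constrained concave-convex procedure, whereas here the solvers are abstract parameters purely for illustration:

```python
def alternating_minimization(init_shots, init_frames, init_regions,
                             update_regions, update_frames, update_shots,
                             tol=1e-6, max_iter=100):
    """Iterate the three block updates of Note 11 until convergence.
    Each update_* argument plays the role of one calculation sub-unit:
    update_regions  ~ third calculation sub-unit,
    update_frames   ~ fourth calculation sub-unit,
    update_shots    ~ fifth calculation sub-unit."""
    shots, frames, regions = init_shots, init_frames, init_regions
    for _ in range(max_iter):
        new_regions = update_regions(shots, frames)
        new_frames = update_frames(shots, new_regions)
        new_shots = update_shots(new_frames, new_regions)
        # Third judgment sub-unit: largest change across all soft labels.
        delta = max(
            max(abs(a - b) for a, b in zip(new_shots, shots)),
            max(abs(a - b) for a, b in zip(new_frames, frames)),
            max(abs(a - b) for a, b in zip(new_regions, regions)),
        )
        shots, frames, regions = new_shots, new_frames, new_regions
        if delta < tol:   # converged: retain current values as the result
            break
    return shots, frames, regions
```

Each block update holds the other two groups of soft labels fixed, which is what allows the non-convex joint problem to be attacked one constrained subproblem at a time.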
Note 12. A video processing method, comprising: extracting at least one representative frame of each video shot in a video shot set, and dividing each extracted representative frame into a plurality of regions, wherein at least some of the video shots in the video shot set are labeled video shots; extracting shot-level visual features, frame-level visual features and region-level visual features of the video shot set; constructing a shot-level weighted graph according to the shot-level visual features, a frame-level weighted graph according to the frame-level visual features, and a region-level weighted graph according to the region-level visual features; constructing a cost function, with the soft label of each video shot in the video shot set, the soft label of each representative frame in each video shot and the soft label of each region in each representative frame as unknowns, according to the structure information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, and according to the relations among the soft labels of the video shots, the soft labels of the representative frames and the soft labels of the regions; obtaining calculated values of the unknowns by solving the optimization problem of the cost function; and performing video processing according to the calculated values obtained.
Note 13. The video processing method according to Note 12, wherein the video processing is video retrieval, the video shot set includes a labeled query video shot, and the step of performing video processing according to the calculated values obtained includes: determining, according to the calculated values obtained, those video shots in the video shot set, other than the query video shot, whose similarity to the query video shot is within a predetermined range as retrieval results.
Note 14. The video processing method according to Note 13, wherein a video shot in the video shot set, other than the query video shot, whose similarity to the query video shot is within the predetermined range is one of the following: a video shot whose soft label is higher than a first predetermined threshold, in which the soft label of the representative frame having the maximum soft label is higher than a second predetermined threshold and the soft label of the region having the maximum soft label in that representative frame is higher than a third predetermined threshold; or one of the top N video shots having the maximum weighted sum of the soft label of the video shot, the soft label of its representative frame having the maximum soft label, and the soft label of the region having the maximum soft label in that representative frame, where N is a positive integer.
Note 15. The video processing method according to Note 13 or 14, wherein, when the query video shot includes only one frame of image, the query video shot is a query image, and the representative frame of the query video shot is the query image itself.
Note 16. The video processing method according to Note 12, wherein the video processing is video concept detection, and the step of performing video processing according to the calculated values obtained includes: determining, according to the calculated values obtained, whether a video shot under test contains a semantic concept related to the labeled video shots in the video shot set.
Note 17. The video processing method according to Note 16, wherein determining whether the video shot under test contains the semantic concept related to the labeled video shots in the video shot set includes: judging whether the video shot under test is included in the video shot set; when the video shot under test is not included in the video shot set, extracting at least one representative frame of the video shot under test, dividing each representative frame of the video shot under test into a plurality of regions, and obtaining, according to the calculated values of the unknowns, calculated values of the soft label of the video shot under test, of the soft labels of the representative frames of the video shot under test and of the soft labels of the regions of each representative frame of the video shot under test; calculating, according to the obtained calculated values of the soft label of the video shot under test, of the soft labels of the representative frames of the video shot under test and of the soft labels of the regions of each representative frame of the video shot under test, a degree value with which the video shot under test contains the semantic concept related to the labeled video shots in the video shot set; and judging that the video shot under test contains the semantic concept related to the labeled video shots in the video shot set when the degree value is greater than or equal to a fourth predetermined threshold, and judging that the video shot under test does not contain the semantic concept when the degree value is less than the fourth predetermined threshold.
Note 18. The video processing method according to Note 16 or 17, further comprising: when the video shot under test is judged to contain the semantic concept related to the labeled video shots in the video shot set, labeling the video shot under test with the label of the video shots with positive labels in the video shot set.
Note 19. The video processing method according to any one of Notes 12 to 18, wherein constructing the shot-level weighted graph according to the shot-level visual features, the frame-level weighted graph according to the frame-level visual features and the region-level weighted graph according to the region-level visual features includes: constructing the shot-level weighted graph by taking each video shot in the video shot set as a node, and the similarity between each two nodes in terms of shot-level visual features as the weight of the weighted edge between the two nodes; constructing the frame-level weighted graph by taking each representative frame of each video shot in the video shot set as a node, and the similarity between each two nodes in terms of frame-level visual features as the weight of the weighted edge between the two nodes; and constructing the region-level weighted graph by taking each region of each representative frame of each video shot in the video shot set as a node, and the similarity between each two nodes in terms of region-level visual features as the weight of the weighted edge between the two nodes.
Note 20. The video processing method according to any one of Notes 12 to 19, wherein the cost function is constructed in the following way: setting, according to the structure information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, a first constraint condition such that: the more similar the shot-level visual features of two video shots are, the smaller the difference between their soft labels is; the more similar the frame-level visual features of two representative frames are, the smaller the difference between their soft labels is; and the more similar the region-level visual features of two regions are, the smaller the difference between their soft labels is; setting, according to the relations among the soft labels of the video shots, the soft labels of the representative frames and the soft labels of the regions, a second constraint condition such that: the soft label of a video shot with a negative label, the soft labels of all representative frames in the video shot with the negative label and the soft labels of all regions of the representative frames in the video shot with the negative label are as close to -1 as possible; the soft label of a video shot with a positive label is as close to 1 as possible; the soft label of the representative frame having the maximum soft label in the video shot with the positive label is as close as possible to the soft label of the video shot to which that representative frame belongs; and the soft label of the region having the maximum soft label in each possible positive frame in the video shot with the positive label is as close as possible to the soft label of the representative frame to which that region belongs; and constructing the cost function, according to the first constraint condition and the second constraint condition, with the soft label of each video shot in the video shot set, the soft label of each representative frame of each video shot in the video shot set and the soft label of each region of each representative frame of each video shot in the video shot set as unknowns.
Note 21. The video processing method according to Note 20, wherein the possible positive frame is a frame whose soft label value is higher than a fifth predetermined threshold, or a frame that includes a region whose soft label value is higher than a sixth predetermined threshold.
Note 22. The video processing method according to any one of Notes 12 to 21, wherein obtaining the calculated values of the unknowns by solving the optimization problem of the cost function includes:
assigning initial values to the soft label of each video shot in the video shot set and to the soft labels of the representative frames of each video shot in the video shot set;
converting the cost function into a constrained minimization problem according to the current values of the soft labels of the video shots in the video shot set and the current values of the soft labels of the representative frames of the video shots, and solving the constrained minimization problem using the constrained concave-convex procedure, so as to obtain calculated values of the soft labels of the regions of the representative frames of the video shots in the video shot set;
converting the cost function into a constrained minimization problem according to the current values of the soft labels of the video shots in the video shot set and the current values of the soft labels of the regions of the representative frames of the video shots, and solving the constrained minimization problem using the constrained concave-convex procedure, so as to obtain calculated values of the soft labels of the representative frames of the video shots in the video shot set;
calculating, using the cost function, calculated values of the soft labels of the video shots in the video shot set according to the current values of the soft labels of the representative frames of the video shots and the current values of the soft labels of the regions of the representative frames; and
judging whether the current values of the soft labels of the video shots, of the soft labels of the representative frames and of the soft labels of the regions have converged: if so, retaining the current values of the soft labels of the video shots, of the soft labels of the representative frames and of the soft labels of the regions as the calculated values of the unknowns in the cost function; otherwise, performing the next iteration of calculation to successively calculate the calculated values of the soft labels of the regions, of the soft labels of the representative frames and of the soft labels of the video shots, until the calculated current values of the soft labels of the regions, of the soft labels of the representative frames and of the soft labels of the video shots have converged.
Note 23. A device, including the video processing apparatus according to any one of Notes 1 to 11.
Note 24. The device according to Note 23, wherein the device is any one of the following: a camera, a video camera, a computer, a mobile phone, a personal digital assistant and a multimedia processing apparatus.
Note 25. A computer-readable storage medium storing a computer program executable by a computing device, the program, when executed, causing the computing device to perform the video processing method according to any one of Notes 12 to 22.

Claims (10)

1. A video processing apparatus, comprising:
a preprocessing unit configured to extract at least one representative frame from each video shot in a video shot set and to divide each extracted representative frame into a plurality of regions, wherein at least some of the video shots in the video shot set are labeled video shots;
a feature extraction unit configured to extract shot-level visual features, frame-level visual features and region-level visual features of the video shot set;
a weighted graph establishing unit configured to build a shot-level weighted graph according to the shot-level visual features, a frame-level weighted graph according to the frame-level visual features, and a region-level weighted graph according to the region-level visual features;
a function construction unit configured to construct a cost function taking as unknowns the soft label of each video shot in the video shot set, the soft label of each representative frame in each video shot and the soft label of each region in each representative frame, according to structural information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, and according to the relations among the soft labels of the video shots, of the representative frames and of the regions;
a calculation unit configured to obtain calculated values of the unknowns by solving the optimization problem of the cost function; and
a video processing unit configured to perform video processing according to the calculated values obtained by the calculation unit.
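The claim leaves the concrete construction of the three weighted graphs open. As an illustrative sketch only (not part of the claimed method), a common way to build such a graph at any of the three levels is a Gaussian-kernel affinity over the extracted visual features; the function name, the `sigma` bandwidth and the `k`-nearest-neighbour sparsification are all assumptions for illustration:

```python
import numpy as np

def build_weighted_graph(features, sigma=1.0, k=2):
    """Gaussian-kernel k-nearest-neighbour affinity graph.

    Each node is one item (a shot, a representative frame, or a region);
    the edge weight between items i and j decays with the squared
    Euclidean distance between their visual feature vectors.
    """
    x = np.asarray(features, dtype=float)
    n = len(x)
    diff = x[:, None, :] - x[None, :, :]
    dist2 = (diff * diff).sum(axis=2)
    w = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)              # no self-loops
    graph = np.zeros_like(w)
    for i in range(n):                    # keep only the k strongest edges
        nearest = np.argsort(w[i])[::-1][:k]
        graph[i, nearest] = w[i, nearest]
    return np.maximum(graph, graph.T)     # symmetrise
```

The same routine could be applied to the shot-level, frame-level and region-level feature sets in turn to obtain the three weighted graphs the claim refers to.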
2. The video processing apparatus according to claim 1, wherein the video processing apparatus is a video retrieval apparatus,
the video shot set includes a labeled query video shot, and
the video processing unit is configured to determine, according to the calculated values obtained by the calculation unit, the video shots in the video shot set, other than the query video shot, whose similarity to the query video shot lies within a predetermined range as retrieval results.
3. The video processing apparatus according to claim 2, wherein a video shot in the video shot set, other than the query video shot, whose similarity to the query video shot lies within the predetermined range is one of the following:
a video shot whose soft label is higher than a first predetermined threshold, in which the soft label of the representative frame having the maximum soft label is higher than a second predetermined threshold and the soft label of the region having the maximum soft label in that representative frame is higher than a third predetermined threshold; and
one of the top N video shots having the largest weighted sum of three values, namely the shot's soft label, the soft label of its representative frame having the maximum soft label, and the soft label of the region having the maximum soft label in that representative frame, where N is a positive integer.
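As a hypothetical illustration of the second alternative above (the top-N weighted-sum criterion), the following sketch ranks candidate shots by a weighted sum of the three soft labels; the weight values and the data layout are assumptions, since the claim fixes neither:

```python
def select_retrieval_results(candidates, weights=(0.5, 0.3, 0.2), top_n=2):
    """Rank candidate shots by the weighted sum of three soft labels:
    the shot's own label, the label of its best representative frame,
    and the label of the best region inside that frame; return the
    names of the top-N shots.

    Each candidate is (name, shot_label, frame_labels, region_labels),
    where region_labels[i] lists the region labels of frame i.
    """
    scored = []
    for name, shot_label, frame_labels, region_labels in candidates:
        best_f = max(range(len(frame_labels)), key=lambda i: frame_labels[i])
        best_r = max(region_labels[best_f])
        score = (weights[0] * shot_label
                 + weights[1] * frame_labels[best_f]
                 + weights[2] * best_r)
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]
```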
4. The video processing apparatus according to claim 1, wherein the video processing apparatus is a video concept detection apparatus, and
the video processing unit is configured to determine, according to the result obtained by the calculation unit, whether a video shot to be tested contains a semantic concept related to the labeled video shots in the video shot set.
5. The video processing apparatus according to claim 4, wherein the video processing unit comprises:
a first judging subunit configured to judge whether the video shot to be tested is included in the video shot set;
a first calculation subunit configured to, when the video shot to be tested is not included in the video shot set, extract at least one representative frame of the video shot to be tested, divide each representative frame of the video shot to be tested into a plurality of regions and, according to the result obtained by the calculation unit, obtain the calculated values of the soft label of the video shot to be tested, of the soft label of each representative frame in the video shot to be tested, and of the soft label of each region in each representative frame of the video shot to be tested;
a second calculation subunit configured to calculate, according to the result obtained by the first calculation subunit, a degree value to which the video shot to be tested contains the semantic concept related to the labeled video shots in the video shot set; and
a second judging subunit configured to judge that the video shot to be tested contains the semantic concept related to the labeled video shots in the video shot set when the degree value calculated by the second calculation subunit is greater than or equal to a fourth predetermined threshold, and to judge that the video shot to be tested does not contain that semantic concept when the degree value is less than the fourth predetermined threshold.
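The claim does not fix how the degree value is computed from the three levels of soft labels; one plausible choice, shown purely for illustration, is a weighted combination of the shot, best-frame and best-region labels compared against the fourth predetermined threshold:

```python
def contains_concept(shot_label, frame_labels, region_labels,
                     weights=(0.5, 0.3, 0.2), fourth_threshold=0.5):
    """Decide whether a tested shot contains the semantic concept.

    The degree value here is a weighted combination of the shot's soft
    label, the label of its best representative frame, and the label of
    the best region in that frame (an assumed definition, not fixed by
    the claim); the shot is judged positive when the degree value is
    greater than or equal to the fourth predetermined threshold.
    """
    best_f = max(range(len(frame_labels)), key=lambda i: frame_labels[i])
    best_r = max(region_labels[best_f])
    degree = (weights[0] * shot_label
              + weights[1] * frame_labels[best_f]
              + weights[2] * best_r)
    return degree >= fourth_threshold
```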
6. The video processing apparatus according to any one of claims 1-5, wherein the function construction unit comprises:
a first setting subunit configured to set, according to the structural information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, a first constraint as follows: the more similar the shot-level visual features of two video shots are, the smaller the difference between their soft labels; the more similar the frame-level visual features of two representative frames are, the smaller the difference between their soft labels; and the more similar the region-level visual features of two regions are, the smaller the difference between their soft labels;
a second setting subunit configured to set, according to the relations among the soft labels of the video shots, of the representative frames and of the regions, a second constraint as follows: the soft label of a negatively labeled video shot, the soft labels of all representative frames in it and the soft labels of all regions of those representative frames are made as close to -1 as possible; the soft label of a positively labeled video shot is made as close to 1 as possible; in a positively labeled video shot, the soft label of the representative frame having the maximum soft label is made as close as possible to the soft label of the video shot to which it belongs; and in each possible positive frame of a positively labeled video shot, the soft label of the region having the maximum soft label is made as close as possible to the soft label of the representative frame to which it belongs; and
a function construction subunit configured to construct the cost function according to the first constraint and the second constraint, taking as unknowns the soft label of each video shot in the video shot set, the soft label of each representative frame of each video shot in the video shot set, and the soft label of each region of each representative frame of each video shot in the video shot set.
7. The video processing apparatus according to claim 6, wherein the possible positive frame is a frame satisfying either of the following:
the value of the soft label of the frame is higher than a fifth predetermined threshold; or
the frame includes a region whose soft label is higher than a sixth predetermined threshold.
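The two conditions of claim 7 translate directly into a small predicate; the default threshold values below are placeholders standing in for the fifth and sixth predetermined thresholds, which the claim does not quantify:

```python
def is_possible_positive_frame(frame_label, region_labels,
                               fifth_threshold=0.5, sixth_threshold=0.6):
    """A frame is a 'possible positive frame' when either its own soft
    label exceeds the fifth predetermined threshold, or it contains at
    least one region whose soft label exceeds the sixth threshold."""
    return (frame_label > fifth_threshold
            or any(r > sixth_threshold for r in region_labels))
```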
8. The video processing apparatus according to any one of claims 1-5, wherein the calculation unit comprises:
an initializing subunit configured to assign initial values to the soft label of each video shot in the video shot set and to the soft label of each representative frame in each video shot of the video shot set;
a third calculation subunit configured to convert the cost function into a constrained minimization problem according to the current values of the soft labels of the video shots in the video shot set and of the soft labels of the representative frames in each video shot, and to solve the constrained minimization problem by a constrained concave-convex procedure, so as to obtain the calculated value of the soft label of each region of each representative frame in each video shot of the video shot set;
a fourth calculation subunit configured to convert the cost function into a constrained minimization problem according to the current values of the soft labels of the video shots in the video shot set and of the soft labels of the regions of the representative frames in each video shot, and to solve the constrained minimization problem by a constrained concave-convex procedure, so as to obtain the calculated value of the soft label of each representative frame in each video shot of the video shot set;
a fifth calculation subunit configured to calculate, using the cost function, the calculated value of the soft label of each video shot in the video shot set according to the current values of the soft labels of the representative frames in each video shot and of the soft labels of the regions of those representative frames; and
a third judging subunit configured to, each time after the third, fourth and fifth calculation subunits have successively performed one calculation each, judge whether the current values of the soft labels of the video shots, of the representative frames and of the regions converge: if so, retain the current values of the soft labels of the video shots, of the representative frames and of the regions as the calculated values of the unknowns in the cost function; otherwise, perform a next iteration with the third, fourth and fifth calculation subunits until the third judging subunit judges that the current values of the soft labels of the video shots, of the representative frames and of the regions converge.
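The alternating scheme of claim 8 (update regions, then frames, then shots, repeated until the values converge) can be sketched generically as follows; the three update callbacks stand in for the constrained concave-convex sub-solves, whose internals the claim does not spell out, and the tolerance and iteration cap are assumed values:

```python
import numpy as np

def alternating_minimisation(update_regions, update_frames, update_shots,
                             shots0, frames0, max_iter=100, tol=1e-6):
    """Alternately recompute region, frame and shot soft labels until the
    largest change between successive iterations falls below tol, or
    max_iter iterations have been performed.

    Each update_* callback stands in for one constrained sub-problem of
    the cost function (e.g. a constrained concave-convex solve).
    """
    shots = np.asarray(shots0, dtype=float)
    frames = np.asarray(frames0, dtype=float)
    regions = update_regions(shots, frames)   # initial region estimate
    for _ in range(max_iter):
        new_regions = update_regions(shots, frames)
        new_frames = update_frames(shots, new_regions)
        new_shots = update_shots(new_frames, new_regions)
        delta = max(np.max(np.abs(new_shots - shots)),
                    np.max(np.abs(new_frames - frames)),
                    np.max(np.abs(new_regions - regions)))
        shots, frames, regions = new_shots, new_frames, new_regions
        if delta < tol:                       # convergence check
            break
    return shots, frames, regions
```

With simple averaging callbacks the three label sets contract to a common fixed point, which illustrates the convergence test; in the claimed apparatus each callback would instead minimize the cost function over one group of unknowns with the other two held fixed.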
9. A video processing method, comprising:
extracting at least one representative frame from each video shot in a video shot set and dividing each extracted representative frame into a plurality of regions, wherein at least some of the video shots in the video shot set are labeled video shots;
extracting shot-level visual features, frame-level visual features and region-level visual features of the video shot set;
building a shot-level weighted graph according to the shot-level visual features, a frame-level weighted graph according to the frame-level visual features, and a region-level weighted graph according to the region-level visual features;
constructing a cost function taking as unknowns the soft label of each video shot in the video shot set, the soft label of each representative frame in each video shot and the soft label of each region in each representative frame, according to structural information of the shot-level weighted graph, the frame-level weighted graph and the region-level weighted graph, and according to the relations among the soft labels of the video shots, of the representative frames and of the regions;
obtaining calculated values of the unknowns by solving the optimization problem of the cost function; and
performing video processing according to the obtained calculated values.
10. A device comprising the video processing apparatus according to any one of claims 1-8.
CN201210071078.3A 2012-03-16 2012-03-16 Video process apparatus, method for processing video frequency and equipment Active CN103312938B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201210071078.3A CN103312938B (en) 2012-03-16 2012-03-16 Video process apparatus, method for processing video frequency and equipment
JP2013053509A JP6015504B2 (en) 2012-03-16 2013-03-15 Video processing apparatus, video processing method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210071078.3A CN103312938B (en) 2012-03-16 2012-03-16 Video process apparatus, method for processing video frequency and equipment

Publications (2)

Publication Number Publication Date
CN103312938A CN103312938A (en) 2013-09-18
CN103312938B true CN103312938B (en) 2016-07-06

Family

ID=49137695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210071078.3A Active CN103312938B (en) 2012-03-16 2012-03-16 Video process apparatus, method for processing video frequency and equipment

Country Status (2)

Country Link
JP (1) JP6015504B2 (en)
CN (1) CN103312938B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103310221B (en) * 2012-03-16 2016-04-13 富士通株式会社 Image processing apparatus, image processing method and equipment
CN111368732B (en) * 2020-03-04 2023-09-01 阿波罗智联(北京)科技有限公司 Method and device for detecting lane lines
CN114390200B (en) * 2022-01-12 2023-04-14 平安科技(深圳)有限公司 Camera cheating identification method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6078688A (en) * 1996-08-23 2000-06-20 Nec Research Institute, Inc. Method for image segmentation by minimizing the ratio between the exterior boundary cost and the cost of the enclosed region
CN101299241A (en) * 2008-01-14 2008-11-05 浙江大学 Method for detecting multi-mode video semantic conception based on tensor representation
CN102184242A (en) * 2011-05-16 2011-09-14 天津大学 Cross-camera video abstract extracting method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5633734B2 (en) * 2009-11-11 2014-12-03 ソニー株式会社 Information processing apparatus, information processing method, and program
JP5531865B2 (en) * 2010-09-03 2014-06-25 カシオ計算機株式会社 Image processing apparatus, image processing method, and program


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Multi-Layer Multi-Instance Learning for Video Concept Detection;Zhiwei Gu et al.;《IEEE TRANSACTIONS ON MULTIMEDIA》;20081212;第10卷(第8期);第1605-1616页 *
Video Retrieval Based on the Fusion of Visual and Structural Spectrum Features;Zhai Sulan et al.;《Computer Engineering and Applications》;20121111;pp. 176-180 *

Also Published As

Publication number Publication date
JP6015504B2 (en) 2016-10-26
JP2013196700A (en) 2013-09-30
CN103312938A (en) 2013-09-18


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant