CN102393909B - Method for detecting goal events in soccer video based on hidden markov model - Google Patents


Info

Publication number
CN102393909B
CN102393909B (application CN201110180084.8A)
Authority
CN
China
Prior art keywords
shot
semantic
frame image
width
hue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110180084.8A
Other languages
Chinese (zh)
Other versions
CN102393909A (en)
Inventor
同鸣
谢文娟
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201110180084.8A priority Critical patent/CN102393909B/en
Publication of CN102393909A publication Critical patent/CN102393909A/en
Application granted granted Critical
Publication of CN102393909B publication Critical patent/CN102393909B/en

Abstract

The invention discloses a method for detecting goal events in soccer video based on a hidden Markov model (HMM), addressing the complex event-detection models and low detection rates of the prior art. The method comprises the following steps: first, performing physical shot segmentation and semantic shot labeling on the training and test videos, and forming a training data set and a test data set from the resulting semantic shot sequences; second, computing the initial parameters of the hidden Markov model from the training data set; third, training the initial model with the Baum-Welch algorithm on the training data set to establish a hidden Markov model of the goal event; fourth, computing with the forward algorithm the probability that the model generates each training sequence and deriving a decision threshold; and finally, computing the probability that the model generates each test sequence and detecting goal events in the test video against the decision threshold. The method detects semantic goal events accurately and is applicable to semantic analysis tasks such as highlight-event detection in soccer video.

Description

Method for detecting goal events in soccer video based on a hidden Markov model (HMM)
Technical field
The invention belongs to the field of video information retrieval technology and relates to sports video semantic analysis; it can be used for goal event detection in soccer video, detecting goal events quickly and accurately.
Background technology
Sports video has attracted wide attention from researchers and society at large because of its huge audience and great commercial value. Automatic detection of highlight events in sports video has long been a focus of video semantic analysis research; its difficulty lies in bridging the semantic gap between low-level features and high-level semantics. Scholars at home and abroad have studied this problem extensively and achieved notable results.
Current machine-learning-based methods mainly include:
(1) Ding Y, Fan G L. Sports Video Mining via Multichannel Segmental Hidden Markov Models [J]. IEEE Trans. on Multimedia, 2009, 11(7): 1301-1309. Building on the good performance of hidden Markov models in modeling temporal regularities, this method constructs a multichannel segmental hidden Markov model that analyzes video hierarchically and in parallel, capturing the interactions among multiple hidden Markov chains more accurately. Its semantic event detection accuracy reaches 87.06%, but the model structure is rather complex.
(2) Sadlier D A, O'Connor N E. Event detection in field sports video using audio-visual features and a support vector machine [J]. IEEE Trans. on Circuits and Systems for Video Technology, 2005, 15(10): 1225-1233. This method builds audio-visual feature detection units and fuses the extracted features with a support vector machine to detect eventful and non-eventful segments in soccer, rugby, and similar videos. Because it treats semantic event detection directly as a feature-classification problem and does not fully exploit semantic information, its event detection accuracy reaches only 74%.
(3) Xu C S, Zhang Y F, Zhu G Y, et al. Using webcast text for semantic event detection in broadcast sports video [J]. IEEE Trans. on Multimedia, 2008, 10(7): 1342-1355. This method uses latent semantic analysis to extract key events from webcast text and feeds the text detection results together with low-level features into a conditional random field model, detecting multiple semantic events in soccer and basketball video. However, building the model is time-consuming, and without hidden state variables it cannot effectively mine the latent regularities of semantic events, limiting the achievable detection performance.
Summary of the invention
To remedy the above shortcomings of the prior art, the invention proposes a method for detecting goal events in soccer video based on a hidden Markov model. It builds a simple and effective goal-event model, uses hidden state variables to mine the temporal regularities of semantic events, and introduces semantic information to improve event detection accuracy.
To achieve the above object, the technical scheme of the invention comprises the following steps:
(1) Perform physical shot segmentation on each of the N_1 training video clips and N_2 test video clips, obtaining the physical shot sequence P_d of the d-th training clip and the physical shot sequence Q_e of the e-th test clip, where d ∈ {1, 2, …, N_1} and e ∈ {1, 2, …, N_2};
(2) Semantically label the physical shots in each sequence P_d and each sequence Q_e, obtaining the semantic shot sequence O_d of the d-th training clip and the semantic shot sequence Z_e of the e-th test clip, each composed of long shots, medium shots, close-up shots, audience shots, and replay shots; take the N_1 training sequences O_1, O_2, …, O_{N_1} as the training data set O = {O_1, O_2, …, O_{N_1}} and the N_2 test sequences Z_1, Z_2, …, Z_{N_2} as the test data set Z = {Z_1, Z_2, …, Z_{N_2}};
(3) For each of the N_1 semantic shot sequences O_1, O_2, …, O_{N_1} in the training data set O, manually judge the game status of every semantic shot, i.e. the in-play state θ_1 or the break state θ_2, obtaining N_1 state sequences W_1, W_2, …, W_{N_1};
(4) Define the semantic shot set ε = {s_1, s_2, s_3, s_4, s_5}, where s_1, s_2, s_3, s_4, s_5 denote the five kinds of semantic shots: s_1 the long shot, s_2 the medium shot, s_3 the close-up shot, s_4 the audience shot, and s_5 the replay shot;
(5) From the N_1 semantic shot sequences O_1, O_2, …, O_{N_1} in the training data set O and the corresponding N_1 state sequences W_1, W_2, …, W_{N_1}, compute the initial model parameters λ = (U, A, C) of the hidden Markov model, where U is the initial state probability vector, A is the state transition probability matrix, and C is the observation probability matrix;
(6) Using the training data set O, train the initial parameters λ = (U, A, C) with the Baum-Welch algorithm, obtaining the final model parameters λ̄ = (Ū, Ā, C̄) of the goal-event hidden Markov model, where Ū is the final initial state probability vector, Ā is the final state transition matrix, and C̄ is the final observation probability matrix, and use these final parameters to establish the goal-event hidden Markov model;
(7) For the d-th semantic shot sequence O_d in the training data set O, compute with the forward algorithm the probability P(O_d | λ̄) that the goal-event hidden Markov model generates O_d;
(8) From the probabilities P(O_1 | λ̄), P(O_2 | λ̄), …, P(O_{N_1} | λ̄) that the model generates the N_1 training sequences, select the minimum as the decision threshold T_1 for goal events:
T_1 = min{P(O_1 | λ̄), P(O_2 | λ̄), …, P(O_{N_1} | λ̄)};
(9) For the e-th semantic shot sequence Z_e in the test data set Z, compute with the forward algorithm the probability P(Z_e | λ̄) that the goal-event hidden Markov model generates Z_e;
(10) If P(Z_e | λ̄) ≥ T_1, the e-th test video clip contains a goal event; if P(Z_e | λ̄) < T_1, the e-th test video clip does not contain a goal event.
Compared with the prior art, the invention has the following advantages:
(1) Because it establishes a hidden Markov model of the goal event, the invention makes full use of the semantic information in the physical shots and improves goal-event detection performance, while the model construction process is simple and needs no complicated training;
(2) Because the physical shots of the video are labeled as semantic shots and the semantic shot sequences are then fed to the hidden Markov model for goal-event detection, the invention effectively narrows the semantic gap between low-level features and high-level semantics, further improving detection performance.
Description of the drawings
Fig. 1 shows example representative frames of a goal sequence and a non-goal sequence in soccer video;
Fig. 2 is the flowchart of the invention.
Embodiment
One. Introduction to the basic theory
Soccer matches are deeply loved by the public, but the video data of a match is enormous, and the highlight events of interest to viewers usually occupy only a very small part of the whole match. Analyzing and processing match video to realize semantic detection of highlight events such as goals and penalty kicks is therefore essential in the field of soccer video semantic analysis. A soccer match video has a specific structure; accurately and deeply mining these inherent structural features and relations to build an effective structural model of soccer match video makes semantic detection of highlight events possible, and has important theoretical value and market prospects in sports video semantic analysis.
Soccer match video clips can be divided into goal clips and non-goal clips, each comprising long shots, medium shots, close-up shots, audience shots, and replay shots. Analysis of many real match videos shows that goal clips contain more close-up and replay shots and fewer long and medium shots. Fig. 1 shows representative frames of a goal sequence and a non-goal sequence in soccer video: Fig. 1(a) is a goal sequence presenting one goal event with 5 shots, namely a long shot of the pitch panorama, a close-up shot of a player, an audience shot, a medium shot containing several players, and a replay shot; Fig. 1(b) is a non-goal sequence presenting a non-goal event with alternating long and medium shots.
A hidden Markov model is a doubly stochastic process: a Markov chain, the basic stochastic process, describes the transitions among states, which are themselves invisible; a second stochastic process describes the statistical relationship between the states and the visible observation sequence they generate.
The hidden Markov model is defined as
λ = (N, M, U, A, C), abbreviated λ = (U, A, C)
where N is the number of states of the model, {θ_1, θ_2, …, θ_N} are its N states, and q_t is the state occupied at time t, q_t ∈ {θ_1, θ_2, …, θ_N}; M is the number of observation values a state can produce at any time, {s_1, s_2, …, s_M} are the M observation values, and E_t is the observation produced by state q_t at time t, E_t ∈ {s_1, s_2, …, s_M}; U is the initial state probability vector, U = {U_1, U_2, …, U_N}, U_i = P(q_1 = θ_i), i = 1, 2, …, N, where U_i is the probability that the model is in state θ_i at time t = 1; A is the state transition probability matrix, A = (a_ij)_{N×N}, a_ij = P(q_{t+1} = θ_j | q_t = θ_i), j = 1, 2, …, N, where a_ij is the probability of being in state θ_j at time t+1 given state θ_i at time t; C is the observation probability matrix, C = (c_jk)_{N×M}, c_jk = P(E_t = s_k | q_t = θ_j), k = 1, 2, …, M, where c_jk is the probability that state θ_j at time t produces the observation s_k.
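As a concrete illustration of the triple λ = (U, A, C), the following Python sketch stores an HMM with N = 2 states and M = 5 observation symbols (the five semantic shot classes) and checks that the parameters are properly stochastic. All numeric values and the function name `is_stochastic` are illustrative assumptions, not taken from the patent.

```python
# Assumed toy parameters: theta_1 = "in play", theta_2 = "break";
# observations s_1..s_5 are the five semantic shot classes.
N, M = 2, 5

U = [0.6, 0.4]                        # U_i = P(q_1 = theta_i)
A = [[0.7, 0.3],                      # a_ij = P(q_{t+1} = theta_j | q_t = theta_i)
     [0.4, 0.6]]
C = [[0.40, 0.25, 0.15, 0.05, 0.15],  # c_jk = P(E_t = s_k | q_t = theta_j)
     [0.10, 0.15, 0.30, 0.25, 0.20]]

def is_stochastic(model):
    """True if U, every row of A, and every row of C each sum to 1."""
    U, A, C = model
    rows = [U] + list(A) + list(C)
    return all(abs(sum(row) - 1.0) < 1e-9 for row in rows)
```

Any parameter set produced by the estimation steps below must pass this check, since each quantity is a conditional probability distribution.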
Two. Soccer video goal event detection method
With reference to Fig. 2, the method for detecting goal events in soccer video based on a hidden Markov model proceeds as follows:
Step 1. Perform physical shot segmentation on the video clips to obtain physical shot sequences.
Goal video clips are chosen as the training clips; goal and non-goal video clips together form the test clips. Perform physical shot segmentation on each of the N_1 training clips and N_2 test clips, obtaining the physical shot sequence P_d of the d-th training clip and the physical shot sequence Q_e of the e-th test clip, where d ∈ {1, 2, …, N_1} and e ∈ {1, 2, …, N_2}.
Step 2. Semantically label the physical shots in each sequence P_d and each sequence Q_e, assigning a semantic label to every physical shot that carries semantic information, and obtain the semantic shot sequence O_d of the d-th training clip and the semantic shot sequence Z_e of the e-th test clip, each composed of long shots, medium shots, close-up shots, audience shots, and replay shots.
(2.1) Label every physical shot in the sequence P_d of the d-th training clip and in the sequence Q_e of the e-th test clip as either a live shot or a replay shot:
(2.1a) Convert each frame image of a training or test clip containing N_3 frames from the RGB color space, formed by the red component R, green component G, and blue component B, to the HSV color space, obtaining the value h of the hue component H, the value s of the saturation component S, and the value v of the luminance component V:

h = 0, if MAX = MIN
h = (1/6) × (g − b)/(MAX − MIN), if MAX = r and g ≥ b
h = (1/6) × (g − b)/(MAX − MIN) + 1, if MAX = r and g < b
h = (1/6) × (b − r)/(MAX − MIN) + 1/3, if MAX = g
h = (1/6) × (r − g)/(MAX − MIN) + 2/3, if MAX = b

s = 0, if MAX = 0; otherwise s = (MAX − MIN)/MAX = 1 − MIN/MAX

v = MAX

where r, g, b are the normalized values of the red, green, and blue components of each pixel, and MAX and MIN are the maximum and minimum of r, g, b, computed as:

MAX = max(r, g, b)
MIN = min(r, g, b)
r = r′/255, g = g′/255, b = b′/255

where r′, g′, b′ are the (8-bit) values of the red, green, and blue components of each pixel;
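The piecewise conversion of step (2.1a) can be sketched directly in Python; the function name is illustrative, and the output ranges follow the formulas above (h, s, v all in [0, 1]).

```python
def rgb_to_hsv(r8, g8, b8):
    """Convert 8-bit RGB values to (h, s, v) per the piecewise formula
    of step (2.1a): normalize to [0, 1], then branch on MAX."""
    r, g, b = r8 / 255.0, g8 / 255.0, b8 / 255.0
    mx, mn = max(r, g, b), min(r, g, b)
    if mx == mn:
        h = 0.0
    elif mx == r and g >= b:
        h = (g - b) / (mx - mn) / 6.0
    elif mx == r:                          # MAX = r and g < b
        h = (g - b) / (mx - mn) / 6.0 + 1.0
    elif mx == g:
        h = (b - r) / (mx - mn) / 6.0 + 1.0 / 3.0
    else:                                  # MAX = b
        h = (r - g) / (mx - mn) / 6.0 + 2.0 / 3.0
    s = 0.0 if mx == 0 else 1.0 - mn / mx  # (MAX - MIN)/MAX
    v = mx
    return h, s, v
```

For example, a pure red pixel (255, 0, 0) maps to h = 0, s = 1, v = 1, and a pure green pixel lands at h = 1/3, matching the branch for MAX = g.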
(2.1b) From the number of pixels num(hue_l) whose hue value h falls in the l-th hue level hue_l of the n′-th frame, compute the value hist_{n′}(hue_l) of the 256-bin hue histogram of the n′-th frame:

hist_{n′}(hue_l) = num(hue_l)

where n′ ∈ {1, 2, …, N_3}, hue_l is the l-th level index of the hue component of the n′-th frame, l ∈ {1, 2, …, 256}, hue_l ∈ {1, 2, …, 256};
(2.1c) From the value hist_{n+1}(hue_l) of the hue histogram of the (n+1)-th frame and the value hist_n(hue_l) of the hue histogram of the n-th frame, compute the chroma histogram difference HHD_n between the two frames:

HHD_n = (1/(L × K)) Σ_{l=1}^{256} |hist_{n+1}(hue_l) − hist_n(hue_l)|

where L is the height and K the width of each frame, n ∈ {1, 2, …, N_3 − 1};
(2.1d) From the differences HHD_n, compute the mean HHD of the N_3 − 1 chroma histogram differences of the clip:

HHD = (1/(N_3 − 1)) Σ_{n=1}^{N_3−1} HHD_n;
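Steps (2.1b) and (2.1c) can be sketched as follows; each frame is assumed here to be given as a flat list of per-pixel hue values h in [0, 1), and the function names are illustrative.

```python
def hue_histogram(frame_hues, bins=256):
    """256-bin hue histogram of one frame (step 2.1b). frame_hues holds
    the hue value h of each pixel, in [0, 1)."""
    hist = [0] * bins
    for h in frame_hues:
        hist[min(int(h * bins), bins - 1)] += 1
    return hist

def hhd(frame_a, frame_b, height, width):
    """Chroma histogram difference HHD between two consecutive frames,
    normalised by the frame size L x K (step 2.1c)."""
    ha, hb = hue_histogram(frame_a), hue_histogram(frame_b)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / (height * width)
```

Identical frames give HHD = 0; a frame pair with no hue overlap gives the maximum value 2, so shot-boundary frames stand out as peaks of HHD_n.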
(2.1e) Select the frames whose HHD_n exceeds the threshold T_2, where T_2 is twice the mean HHD of the clip; here T_2 = 0.1938;
(2.1f) Select the shots ls_w lasting 10 to 20 frames, obtaining a series of candidate logo shots ls_1, ls_2, …, ls_{N_4}, where w ∈ {1, 2, …, N_4} and N_4 is the total number of candidate logo shots;
(2.1g) Real logo shots must occur in pairs; the segment between a pair of logo shots is a replay segment, and a replay segment contains at least one shot. Using the shot segmentation step, count the shots contained in the video segment between candidate logo shot ls_{w′−1} and candidate logo shot ls_{w′}: if the segment contains more than one shot, label its shots as replay shots; if it contains exactly one shot, label that shot a live shot, where w′ ∈ {2, 3, …, N_4};
(2.2) Further label the live shots as long shots, medium shots, and non-field shots. A long shot gives an overview of how the match is proceeding and usually contains a very large field area; a medium shot depicts the whole body and actions of one or several players and also contains some field area, though less than a long shot. The field ratio PR, the ratio of field pixels to total pixels in a frame, is therefore used to distinguish long shots from medium shots. When a long shot contains part of the audience region, the field area and hence PR decrease, which easily mislabels long and medium shots; the invention therefore crops away the top third of each frame and then, according to the field ratio PR of the cropped frame and chosen thresholds, labels the live shots as long shots, medium shots, or non-field shots:
(2.2a) Choose 60 long-view frames from the live shots and, from the value hist_p(hue_l) of the 256-bin hue histogram of the p-th frame, compute the value sh(hue_l) of the cumulative hue histogram of the 60 long-view frames:

sh(hue_l) = Σ_{p=1}^{60} hist_p(hue_l)

where hue_l is the l-th level index of the hue component of the p-th frame, l ∈ {1, 2, …, 256}, hue_l ∈ {1, 2, …, 256}, p ∈ {1, 2, …, 60};
(2.2b) From the values sh(hue_l) of the cumulative histogram, compute its peak F:

F = max{sh(hue_1), sh(hue_2), …, sh(hue_256)};

(2.2c) From the values of the cumulative histogram and its peak F, determine the lower-bound index hue_low satisfying:

sh(hue_low) ≥ 0.2 × F
sh(hue_low − 1) < 0.2 × F

where sh(hue_low) is the cumulative histogram value at hue_low and sh(hue_low − 1) the value at hue_low − 1;
(2.2d) Likewise, from the values of the cumulative histogram and its peak F, determine the upper-bound index hue_up satisfying:

sh(hue_up) ≥ 0.2 × F
sh(hue_up + 1) < 0.2 × F

where sh(hue_up) is the cumulative histogram value at hue_up and sh(hue_up + 1) the value at hue_up + 1;
(2.2e) Crop away the top third of each frame of a live shot, count the field pixels C_1 of the cropped frame whose hue value h lies in the interval [hue_low/256, hue_up/256], and compute the field ratio PR of the frame:

PR = C_1 / ((2/3) × L × K)

where L is the height and K the width of each frame;
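Steps (2.2b) through (2.2e) can be sketched as follows. As a simplifying assumption, the bound search below takes the lowest and highest bins above 0.2 × F rather than scanning for the exact boundary conditions, which coincides with the patent's conditions when the histogram is unimodal around the field hue; the function names and the bin-coded hue input are illustrative.

```python
def field_hue_bounds(sh, peak_fraction=0.2):
    """From the cumulative hue histogram sh over sample long-view frames,
    find the lowest and highest bin whose count reaches peak_fraction of
    the peak F (steps 2.2b-2.2d; assumes a unimodal field-hue peak)."""
    F = max(sh)
    idx = [l for l, v in enumerate(sh) if v >= peak_fraction * F]
    return min(idx), max(idx)

def field_ratio(cropped_hue_bins, hue_low, hue_up, height, width):
    """Field ratio PR of one frame whose top third is already cropped away:
    field pixels / ((2/3) * L * K), per step (2.2e). cropped_hue_bins holds
    the hue bin index of each remaining pixel."""
    c1 = sum(1 for h in cropped_hue_bins if hue_low <= h <= hue_up)
    return c1 / (2.0 / 3.0 * height * width)
```

With T_3 = 0.70 and T_4 = 0.30 as in step (2.2f), a frame with PR above 0.70 would then be labeled long-view, between 0.30 and 0.70 medium-view, and below 0.30 non-field.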
(2.2f) Judge the type of each frame from the set thresholds T_3, T_4 and its field ratio PR:
If the field ratio PR of a frame is greater than T_3, the frame is a long-view frame;
If PR is less than or equal to T_3 and greater than or equal to T_4, the frame is a medium-view frame;
If PR is less than T_4, the frame is a non-field frame;
here T_3 = 0.70 and T_4 = 0.30;
(2.2g) If more than 55% of the frames of a live shot to be labeled are long-view frames, label the shot a long shot; if more than 55% are medium-view frames, label it a medium shot; otherwise label it a non-field shot;
(2.3) Further label the non-field shots as close-up shots and audience shots. Audience shots contain many spectators, complex backgrounds, and rich edge information, while in close-up shots the person occupies a large proportion of the frame and smooth regions dominate; the edge pixel ratio EPR, the ratio of edge pixels to total pixels in a frame, is therefore used. According to EPR and chosen thresholds, the non-field shots are labeled as follows:
(2.3a) Convert each frame of a non-field shot from the RGB color space to the YCbCr color space, obtaining the value y of the luminance component Y, the value cb of the blue-chroma component Cb, and the value cr of the red-chroma component Cr:

y = 0.299 r′ + 0.578 g′ + 0.114 b′
cb = 0.564 (b′ − y)
cr = 0.713 (r′ − y)

where r′, g′, b′ are the values of the red, green, and blue components of each pixel;
(2.3b) From the luminance values y of each frame, detect the edge pixels of the frame with the Canny operator, obtaining the edge pixel count C_2;
(2.3c) From the edge pixel count C_2 of each frame, compute the edge pixel ratio EPR of each frame of the non-field shot to be labeled:

EPR = C_2 / (L × K)

where L is the height and K the width of each frame;
(2.3d) If the EPR of a frame exceeds the threshold T_5, label it an audience frame, otherwise a close-up frame; here T_5 = 0.10;
(2.3e) If more than 55% of the frames of a non-field shot to be labeled are audience frames, label the shot an audience shot; otherwise label it a close-up shot.
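Steps (2.3a) through (2.3c) can be sketched as follows. Two caveats: the luma coefficient 0.578 is copied from the patent text as printed (the standard BT.601 weight is 0.587, so 0.578 may be a misprint); and since the patent's Canny detector is not a standard-library facility, a simple luminance-difference edge test with an assumed threshold stands in for it here, purely for illustration.

```python
def rgb_to_ycbcr(r8, g8, b8):
    """YCbCr conversion with the coefficients printed in step (2.3a).
    Note: 0.578 is as printed in the patent; BT.601 uses 0.587."""
    y = 0.299 * r8 + 0.578 * g8 + 0.114 * b8
    cb = 0.564 * (b8 - y)
    cr = 0.713 * (r8 - y)
    return y, cb, cr

def edge_pixel_ratio(luma, threshold=32.0):
    """EPR = edge pixels / (L * K), per step (2.3c). The patent uses Canny;
    as a self-contained stand-in, a pixel counts as an edge when its
    horizontal or vertical luminance difference exceeds `threshold`
    (assumed value). luma is a 2-D list of y values."""
    L, K = len(luma), len(luma[0])
    edges = 0
    for i in range(L):
        for j in range(K):
            dx = abs(luma[i][j] - luma[i][j - 1]) if j > 0 else 0.0
            dy = abs(luma[i][j] - luma[i - 1][j]) if i > 0 else 0.0
            if max(dx, dy) > threshold:
                edges += 1
    return edges / (L * K)
```

A frame would then be labeled an audience frame when its EPR exceeds T_5 = 0.10, per step (2.3d).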
Step 3. Take the N_1 semantic shot sequences O_1, O_2, …, O_{N_1} of the training clips as the training data set O = {O_1, O_2, …, O_{N_1}}, and the N_2 semantic shot sequences Z_1, Z_2, …, Z_{N_2} of the test clips as the test data set Z = {Z_1, Z_2, …, Z_{N_2}}.
Step 4. For each of the N_1 semantic shot sequences O_1, O_2, …, O_{N_1} in the training data set O, manually judge the game status of every semantic shot, i.e. the in-play state θ_1 or the break state θ_2, obtaining N_1 state sequences W_1, W_2, …, W_{N_1}.
Step 5. Define the semantic shot set ε = {s_1, s_2, s_3, s_4, s_5}, where s_1, s_2, s_3, s_4, s_5 denote the five kinds of semantic shots: s_1 the long shot, s_2 the medium shot, s_3 the close-up shot, s_4 the audience shot, and s_5 the replay shot.
Step 6. From the N_1 semantic shot sequences O_1, O_2, …, O_{N_1} in the training data set O and the corresponding N_1 state sequences W_1, W_2, …, W_{N_1}, compute the initial model parameters λ = (U, A, C) of the hidden Markov model, where U is the initial state probability vector, A is the state transition matrix, and C is the observation probability matrix.
(6.1) From the number x_i of semantic shots in state θ_i across the N_1 semantic shot sequences O_1, O_2, …, O_{N_1} and the total number x of semantic shots in those sequences, compute the initial state probability vector U:

U = {U_1, U_2, …, U_N}
U_i = x_i / x

where i ∈ {1, 2, …, N}, U_i is the probability that the model is in state θ_i at time t = 1, N is the number of states of the model (N = 2), and q_t ∈ {θ_1, θ_2} is the state occupied at time t;
(6.2) Count the number x_(i,j) of semantic shot transitions from state θ_i to state θ_j across the N_1 sequences O_1, O_2, …, O_{N_1} and the number x_(i,*) of transitions from state θ_i to any state, and from x_(i,j) and x_(i,*) compute the state transition matrix A:

A = (a_ij)_{N×N}
a_ij = x_(i,j) / x_(i,*)

where j ∈ {1, 2, …, N} and a_ij is the probability of being in state θ_j at time t+1 given state θ_i at time t;
(6.3) From the number x_{j,k} of semantic shots s_k in state θ_j across the N_1 sequences O_1, O_2, …, O_{N_1} and the total number x_j of semantic shots in state θ_j, compute the observation probability matrix C:

C = (c_jk)_{N×M}
c_jk = x_{j,k} / x_j

where k ∈ {1, 2, …, M}, c_jk is the probability that state θ_j at time t produces the semantic shot s_k as observation E_t, and M = 5 is the number of observation values a state can produce at any time, the five semantic shots s_1, s_2, s_3, s_4, s_5 being the five observation values of the model.
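The counting estimates of steps (6.1)-(6.3) can be sketched as follows, with shots coded as integers 0..M-1 and states as 0..N-1; the function name and encoding are illustrative. One caveat noted in a comment: a state with no outgoing transitions in the training data leaves its row of A at zero here.

```python
from collections import Counter

def initial_hmm_params(shot_seqs, state_seqs, N=2, M=5):
    """Count-based initial estimates of lambda = (U, A, C) from labelled
    training sequences (steps 6.1-6.3)."""
    u_cnt, a_cnt, c_cnt = Counter(), Counter(), Counter()
    for shots, states in zip(shot_seqs, state_seqs):
        for t, (o, q) in enumerate(zip(shots, states)):
            u_cnt[q] += 1              # x_i: shots observed in state theta_i
            c_cnt[(q, o)] += 1         # x_{j,k}: shot s_k seen in state theta_j
            if t + 1 < len(states):
                a_cnt[(q, states[t + 1])] += 1   # x_(i,j): transitions i -> j
    x = sum(u_cnt.values())            # total shot count
    U = [u_cnt[i] / x for i in range(N)]
    # A state never left in the data keeps a zero row (max(..., 1) avoids 0/0).
    A = [[a_cnt[(i, j)] / max(sum(a_cnt[(i, jj)] for jj in range(N)), 1)
          for j in range(N)] for i in range(N)]
    C = [[c_cnt[(j, k)] / max(u_cnt[j], 1) for k in range(M)]
         for j in range(N)]
    return U, A, C
```

For one toy sequence with shots [0, 1, 2] in states [0, 0, 1], this gives U = [2/3, 1/3], A row 0 = [0.5, 0.5], and C[1][2] = 1.0, exactly the ratios x_i/x, x_(i,j)/x_(i,*), and x_{j,k}/x_j above.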
Step 7. Using the training data set O, train the initial model parameters λ = (U, A, C) with the Baum-Welch algorithm, obtaining the final model parameters λ̄ = (Ū, Ā, C̄) of the goal-event hidden Markov model, where Ū is the final initial state probability vector, Ā is the final state transition matrix, and C̄ is the final observation probability matrix, and use these final parameters to establish the goal-event hidden Markov model.
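As a sketch of step 7, the following is a plain, unscaled Baum-Welch re-estimation loop for a single discrete observation sequence; the patent trains on the whole set O, so this single-sequence version is a simplification, and long sequences would additionally need log-space or scaled recursions to avoid underflow. All names and the demo parameters are illustrative.

```python
def baum_welch(obs, U, A, C, n_iter=10):
    """Unscaled Baum-Welch re-estimation of (U, A, C) from one
    integer-coded observation sequence (sketch of step 7)."""
    N, T = len(U), len(obs)
    for _ in range(n_iter):
        # Forward: alpha[t][i] = P(o_1..o_t, q_t = theta_i)
        alpha = [[U[i] * C[i][obs[0]] for i in range(N)]]
        for t in range(1, T):
            alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N))
                          * C[j][obs[t]] for j in range(N)])
        # Backward: beta[t][i] = P(o_{t+1}..o_T | q_t = theta_i)
        beta = [[1.0] * N for _ in range(T)]
        for t in range(T - 2, -1, -1):
            for i in range(N):
                beta[t][i] = sum(A[i][j] * C[j][obs[t + 1]] * beta[t + 1][j]
                                 for j in range(N))
        p = sum(alpha[T - 1][j] for j in range(N))    # P(O | lambda)
        # Posteriors gamma_t(i) and xi_t(i, j)
        gamma = [[alpha[t][i] * beta[t][i] / p for i in range(N)]
                 for t in range(T)]
        xi = [[[alpha[t][i] * A[i][j] * C[j][obs[t + 1]] * beta[t + 1][j] / p
                for j in range(N)] for i in range(N)] for t in range(T - 1)]
        # Re-estimation
        U = gamma[0][:]
        A = [[sum(xi[t][i][j] for t in range(T - 1))
              / sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
        C = [[sum(gamma[t][j] for t in range(T) if obs[t] == k)
              / sum(gamma[t][j] for t in range(T))
              for k in range(len(C[0]))] for j in range(N)]
    return U, A, C

# Tiny 2-state, 2-symbol demo with assumed parameters (illustrative only).
U1, A1, C1 = baum_welch([0, 1, 0, 1], [0.5, 0.5],
                        [[0.7, 0.3], [0.4, 0.6]],
                        [[0.6, 0.4], [0.3, 0.7]], n_iter=5)
```

Each re-estimation step preserves stochasticity (U, rows of A, and rows of C still sum to 1) and never decreases the sequence likelihood, which is the property the training in step 7 relies on.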
Step 8, according to d semantic shot sequence O in the Hidden Markov Model (HMM) of goal event and training dataset O d, the Hidden Markov Model (HMM) that adopts forward direction algorithm to calculate goal event produces d semantic shot sequence O dprobability
(8.1) according to d semantic shot sequence O in the Hidden Markov Model (HMM) of goal event and training dataset O din the 1st semantic shot O d, 1, calculate in final mask parameter
Figure BDA0000072416440000116
under condition, Hidden Markov Model (HMM) is at t=1 status q constantly 1for state θ iand the 1st observed value is d semantic shot sequence O in training dataset O din the 1st semantic shot O d, 1probability
Figure BDA0000072416440000117
&alpha; 1 d ( i ) = U &OverBar; i &times; &eta; i ( O d , 1 )
Wherein, for end-state probability vector
Figure BDA00000724164400001110
i element, η i(O d, 1) represent that Hidden Markov Model (HMM) is at t=1 moment status q 1for state θ iunder condition, state q 1the observed value E producing 1for d semantic shot sequence O in training dataset O din the 1st semantic shot O d, 1probability, work as O d, 1be k kind semantic shot s ktime,
Figure BDA00000724164400001112
for final observed value probability matrix the capable k column element of i;
(8.2) according to d semantic shot sequence in training dataset O
Figure BDA00000724164400001114
and probability
Figure BDA00000724164400001115
wherein, T dfor d semantic shot sequence O in training dataset O din semantic shot number, calculate in final mask parameter
Figure BDA00000724164400001116
under condition, Hidden Markov Model (HMM) is at t+1 status q constantly t+1for state θ jand the 1st observed value is d semantic shot sequence O in training dataset O to t+1 observed value din the 1st semantic shot O d, 1to t+1 semantic shot O d, t+1probability
Figure BDA00000724164400001117
obtain
&alpha; t + 1 d ( j ) = [ &Sigma; i = 1 n &alpha; t d ( i ) a &OverBar; ij ] &eta; j ( O d , t + 1 ) , t = 1,2 , L , T d - 1
Wherein,
Figure BDA00000724164400001120
for in final mask parameter
Figure BDA00000724164400001121
under condition, Hidden Markov Model (HMM) is at t status q constantly tfor state θ iand the 1st observed value is d semantic shot sequence O in training dataset O to t observed value din the 1st semantic shot O d, 1to t semantic shot O d, tprobability,
Figure BDA00000724164400001122
for end-state transition probability matrix
Figure BDA00000724164400001123
the capable j column element of i, η j(O d, t+1) represent that Hidden Markov Model (HMM) is at t+1 moment status q t+1for state θ junder condition, state q t+1the observed value E producing t+1for d semantic shot sequence O in training dataset O din t+1 semantic shot O d, t+1probability, as semantic shot O d, t+1be k kind semantic shot s ktime,
Figure BDA00000724164400001124
Figure BDA00000724164400001125
for final observed value probability matrix
Figure BDA00000724164400001126
the capable k column element of j;
(8.3) according to probability
Figure BDA0000072416440000121
the Hidden Markov Model (HMM) of calculating goal event produces d semantic shot sequence O dprobability
Figure BDA0000072416440000122
P ( O d | &lambda; &OverBar; ) = &Sigma; j = 1 N &alpha; T d d ( j ) .
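The recursion of steps (8.1)-(8.3) can be sketched as a few lines of NumPy. This is a minimal illustration, not the patent's implementation; the function name and the 0-indexed observation symbols are assumptions.

```python
import numpy as np

def forward_probability(obs, U, A, C):
    """Forward algorithm: probability that the HMM (U, A, C) generates
    the observation sequence `obs` (symbols indexed from 0).
    U : (N,)   initial state probability vector
    A : (N, N) state transition probability matrix
    C : (N, M) observed value probability matrix
    """
    N = len(U)
    T = len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = U * C[:, obs[0]]                  # (8.1) alpha_1(i) = U_i * eta_i(O_1)
    for t in range(T - 1):                       # (8.2) recursion for t = 1..T-1
        alpha[t + 1] = (alpha[t] @ A) * C[:, obs[t + 1]]
    return alpha[T - 1].sum()                    # (8.3) P(O|lambda) = sum_j alpha_T(j)
```

For short shot sequences this unscaled form suffices; longer sequences would need the usual per-step scaling to avoid underflow.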
Step 9: according to the probabilities $P(O_1|\bar\lambda),P(O_2|\bar\lambda),\ldots,P(O_{N_1}|\bar\lambda)$ with which the hidden Markov model of the goal event generates the $N_1$ semantic shot sequences $O_1,O_2,\ldots,O_{N_1}$ in the training dataset O, select the minimum as the decision threshold $T_1$ of the goal event:

$$T_1=\min\{P(O_1|\bar\lambda),P(O_2|\bar\lambda),\ldots,P(O_{N_1}|\bar\lambda)\}.$$
Step 10: according to the hidden Markov model of the goal event and the e-th semantic shot sequence $Z_e$ in the test dataset Z, adopt the forward algorithm to calculate the probability $P(Z_e|\bar\lambda)$ that the model generates $Z_e$.

(10.1) According to the 1st semantic shot $Z_{e,1}$ of $Z_e$, calculate the probability $\beta_1^e(i)$ that, under the final model parameters $\bar\lambda=(\bar U,\bar A,\bar C)$, the hidden Markov model is in state $q_1=\theta_i$ at time t = 1 and the 1st observed value is $Z_{e,1}$:

$$\beta_1^e(i)=\bar U_i\times\gamma_i(Z_{e,1})$$

where $\bar U_i$ is the i-th element of the final initial state probability vector $\bar U$, and $\gamma_i(Z_{e,1})$ is the probability that, given state $q_1=\theta_i$ at time t = 1, the observed value $E_1$ produced by $q_1$ is $Z_{e,1}$; when $Z_{e,1}$ is the k-th kind of semantic shot $s_k$, $\gamma_i(Z_{e,1})=\bar c_{ik}$, the element in row i, column k of the final observed value probability matrix $\bar C$.

(10.2) According to the e-th semantic shot sequence $Z_e=(Z_{e,1},Z_{e,2},\ldots,Z_{e,T'_e})$ and the probabilities $\beta_t^e(i)$, where $T'_e$ is the number of semantic shots in $Z_e$, calculate the probability $\beta_{t+1}^e(j)$ that, under $\bar\lambda$, the model is in state $q_{t+1}=\theta_j$ at time t + 1 and the 1st to (t+1)-th observed values are $Z_{e,1}$ to $Z_{e,t+1}$:

$$\beta_{t+1}^e(j)=\Big[\sum_{i=1}^{N}\beta_t^e(i)\,\bar a_{ij}\Big]\gamma_j(Z_{e,t+1}),\quad t=1,2,\ldots,T'_e-1$$

where $\beta_t^e(i)$ is the probability that, under $\bar\lambda$, the model is in state $q_t=\theta_i$ at time t and the 1st to t-th observed values are $Z_{e,1}$ to $Z_{e,t}$, $\bar a_{ij}$ is the element in row i, column j of the final state transition probability matrix $\bar A$, and $\gamma_j(Z_{e,t+1})$ is the probability that, given state $q_{t+1}=\theta_j$ at time t + 1, the observed value $E_{t+1}$ produced by $q_{t+1}$ is $Z_{e,t+1}$; when $Z_{e,t+1}$ is the k-th kind of semantic shot $s_k$, $\gamma_j(Z_{e,t+1})=\bar c_{jk}$, the element in row j, column k of $\bar C$.

(10.3) According to the probabilities $\beta_{T'_e}^e(j)$, calculate the probability that the goal-event model generates $Z_e$:

$$P(Z_e|\bar\lambda)=\sum_{j=1}^{N}\beta_{T'_e}^e(j).$$
Step 11: if $P(Z_e|\bar\lambda)\ge T_1$, the e-th test video fragment contains a goal event; if $P(Z_e|\bar\lambda)<T_1$, it does not. Here $T_1$ is the goal-event decision threshold, namely the minimum of the probabilities $P(O_1|\bar\lambda),P(O_2|\bar\lambda),\ldots,P(O_{N_1}|\bar\lambda)$ with which the goal-event model generates the $N_1$ semantic shot sequences in the training dataset O.
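Steps 9 and 11 amount to a one-line threshold and a one-line decision. The sketch below (illustrative names, not from the patent) bundles them with a minimal forward pass so it is self-contained:

```python
import numpy as np

def forward_probability(obs, U, A, C):
    # forward algorithm of steps 8/10; obs uses 0-indexed symbols
    alpha = U * C[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * C[:, o]
    return alpha.sum()

def goal_threshold(train_seqs, U, A, C):
    # Step 9: T1 is the smallest model likelihood over the training sequences
    return min(forward_probability(o, U, A, C) for o in train_seqs)

def contains_goal(test_seq, U, A, C, T1):
    # Step 11: declare a goal event iff P(Z_e | lambda) >= T1
    return forward_probability(test_seq, U, A, C) >= T1
```

Because every training fragment contains a goal, the smallest training likelihood is a natural lower bound for accepting a test fragment as a goal event.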
The effect of the present invention is further illustrated by the following simulation.

1) Simulation conditions
The experiment videos are selected from several matches of the 2010 FIFA World Cup in South Africa, in MPEG-1 format with a frame resolution of 352 × 288. The videos are divided into two parts: one part serves as the training video fragments and contains 21 goal video fragments; the remainder serves as the test video fragments and contains 29 goal video fragments and 10 non-goal video fragments. The software environment is Matlab R2008a.

2) Simulation content and results
Simulation 1: using the established hidden Markov model of the goal event, the probability of generating the test data under this model is computed for each of the 39 test video fragments, and the decision threshold determines whether each fragment contains a goal event. The experimental results are shown in Table 1.

Table 1 (goal-event detection results; reproduced as an image in the original)

As can be seen from Table 1, the precision of the present invention for goal-event detection in soccer video reaches 92.31% and the recall reaches 82.76%, a good detection result.

The above simulation results show that the soccer-video goal-event detection method based on the hidden Markov model proposed by the present invention can detect goal events accurately.
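The patent reports only the two ratios; one hypothetical breakdown consistent with them is 24 correctly detected goals, 2 false alarms and 5 misses (24/26 ≈ 92.31%, 24/29 ≈ 82.76%). The counts below are that assumed breakdown, not figures stated in the source:

```python
def precision_recall(tp, fp, fn):
    """Precision (detected goals that are real) and recall
    (real goals that are detected) from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall
```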

Claims (8)

1. A soccer-video goal-event detection method based on a hidden Markov model, comprising the steps of:

(1) performing physical shot segmentation on $N_1$ training video fragments and $N_2$ test video fragments respectively, to obtain the physical shot sequence $P_d$ of the d-th training video fragment and the physical shot sequence $Q_e$ of the e-th test video fragment, where $d\in\{1,2,\ldots,N_1\}$, $e\in\{1,2,\ldots,N_2\}$;

(2) performing semantic labeling on the physical shots in $P_d$ and on the physical shots in $Q_e$ respectively, to obtain the semantic shot sequence $O_d$ of the d-th training video fragment and the semantic shot sequence $Z_e$ of the e-th test video fragment, each composed of far shots, medium shots, close-up shots, audience shots and replay shots; taking the $N_1$ training semantic shot sequences as the training dataset $O=\{O_1,O_2,\ldots,O_{N_1}\}$ and the $N_2$ test semantic shot sequences as the test dataset $Z=\{Z_1,Z_2,\ldots,Z_{N_2}\}$;

(3) for the $N_1$ semantic shot sequences $O_1,O_2,\ldots,O_{N_1}$ in the training dataset O, manually judging the game status in which each semantic shot of each sequence resides, i.e. the match-in-progress state $\theta_1$ or the match-break state $\theta_2$, to obtain $N_1$ status sequences;

(4) defining the semantic shot set $\varepsilon=\{s_1,s_2,s_3,s_4,s_5\}$, where $s_1,s_2,s_3,s_4,s_5$ denote the five kinds of semantic shots, i.e. $s_1$ is the far shot, $s_2$ the medium shot, $s_3$ the close-up shot, $s_4$ the audience shot and $s_5$ the replay shot;

(5) according to the $N_1$ semantic shot sequences in the training dataset O and the corresponding $N_1$ status sequences, calculating the initial model parameters $\lambda=(U,A,C)$ of the hidden Markov model, where U is the initial state probability vector, A the state transition probability matrix, and C the observed value probability matrix;

(6) according to the training dataset O, training the initial model parameters $\lambda=(U,A,C)$ with the Baum-Welch algorithm to obtain the final model parameters $\bar\lambda=(\bar U,\bar A,\bar C)$ of the hidden Markov model of the goal event, where $\bar U$ is the final initial state probability vector, $\bar A$ the final state transition probability matrix, and $\bar C$ the final observed value probability matrix, and establishing the goal-event hidden Markov model with these final parameters;

(7) according to the goal-event hidden Markov model and the d-th semantic shot sequence $O_d$ in the training dataset O, adopting the forward algorithm to calculate the probability $P(O_d|\bar\lambda)$ that the model generates $O_d$;

(8) according to the probabilities $P(O_1|\bar\lambda),P(O_2|\bar\lambda),\ldots,P(O_{N_1}|\bar\lambda)$ with which the goal-event model generates the $N_1$ semantic shot sequences in O, selecting the minimum as the decision threshold $T_1$ of the goal event:

$$T_1=\min\{P(O_1|\bar\lambda),P(O_2|\bar\lambda),\ldots,P(O_{N_1}|\bar\lambda)\};$$

(9) according to the goal-event hidden Markov model and the e-th semantic shot sequence $Z_e$ in the test dataset Z, adopting the forward algorithm to calculate the probability $P(Z_e|\bar\lambda)$ that the model generates $Z_e$;

(10) if $P(Z_e|\bar\lambda)\ge T_1$, the e-th test video fragment contains a goal event; if $P(Z_e|\bar\lambda)<T_1$, it does not.
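Step (6) trains the initial parameters with the Baum-Welch algorithm. A compact re-estimation loop for a discrete-observation HMM over several training sequences can be sketched as follows; it is unscaled (adequate only for short sequences such as shot sequences), and all names are illustrative rather than taken from the patent:

```python
import numpy as np

def baum_welch(seqs, U, A, C, n_iter=10):
    """Baum-Welch re-estimation for a discrete HMM from several
    observation sequences (0-indexed symbols). Returns updated (U, A, C)."""
    N, M = C.shape
    for _ in range(n_iter):
        U_num = np.zeros(N)
        A_num = np.zeros((N, N)); A_den = np.zeros(N)
        C_num = np.zeros((N, M)); C_den = np.zeros(N)
        for obs in seqs:
            T = len(obs)
            # forward pass
            alpha = np.zeros((T, N))
            alpha[0] = U * C[:, obs[0]]
            for t in range(T - 1):
                alpha[t + 1] = (alpha[t] @ A) * C[:, obs[t + 1]]
            # backward pass
            beta = np.zeros((T, N)); beta[-1] = 1.0
            for t in range(T - 2, -1, -1):
                beta[t] = A @ (C[:, obs[t + 1]] * beta[t + 1])
            p_obs = alpha[-1].sum()
            gamma = alpha * beta / p_obs          # P(state i at time t | obs)
            U_num += gamma[0]
            for t in range(T - 1):                # xi_t(i,j) accumulation
                A_num += (alpha[t][:, None] * A * C[:, obs[t + 1]] * beta[t + 1]) / p_obs
            A_den += gamma[:-1].sum(axis=0)
            for t in range(T):
                C_num[:, obs[t]] += gamma[t]
            C_den += gamma.sum(axis=0)
        U = U_num / len(seqs)
        A = A_num / A_den[:, None]
        C = C_num / C_den[:, None]
    return U, A, C
```

Each iteration is guaranteed not to decrease the likelihood of the training data, which is why the loop can simply run a fixed number of iterations or until the likelihood plateaus.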
2. The soccer-video goal-event detection method according to claim 1, wherein in step (2), "performing semantic labeling on the physical shots in the physical shot sequence $P_d$ of the d-th training video fragment and on the physical shots in the physical shot sequence $Q_e$ of the e-th test video fragment respectively" is carried out as follows:
(2.1) labeling each physical shot in $P_d$ and each physical shot in $Q_e$ as either a real-time shot or a replay shot;
(2.2) further labeling the real-time shots as far shots, medium shots or non-field shots;
(2.3) further labeling the non-field shots as close-up shots or audience shots.
3. The soccer-video goal-event detection method according to claim 2, wherein step (2.1), "labeling each physical shot in $P_d$ and each physical shot in $Q_e$ as either a real-time shot or a replay shot", is carried out as follows:

(2.1a) converting each frame of a training or test video fragment containing $N_3$ frames from the RGB color space to the HSV color space, obtaining the value h of the hue component, the value s of the saturation component and the value v of the luminance component by the standard conversion (with h normalized to [0, 1]):

$$s=\begin{cases}0,&MAX=0\\ \dfrac{MAX-MIN}{MAX},&\text{otherwise}\end{cases}\qquad
h'=\begin{cases}0,&MAX=MIN\\ 60\times\dfrac{g-b}{MAX-MIN},&MAX=r\\ 60\times\dfrac{b-r}{MAX-MIN}+120,&MAX=g\\ 60\times\dfrac{r-g}{MAX-MIN}+240,&MAX=b\end{cases}$$

with 360 added to $h'$ when it is negative, $h=h'/360$, and

v = MAX

where r is the normalized value of the red component R of each pixel, g the normalized value of the green component G, b the normalized value of the blue component B, and MAX and MIN the maximum and minimum of r, g, b for each pixel, calculated as follows:

MAX = max(r, g, b)
MIN = min(r, g, b)
r = r'/255, g = g'/255, b = b'/255

where r', g', b' are the values of the red, green and blue components R, G, B of each pixel;

(2.1b) according to the number of pixels $num(hue_l)$ whose hue value corresponds to the l-th level index $hue_l$ in the n'-th frame, computing the value $hist_{n'}(hue_l)$ of the 256-bin hue histogram of the n'-th frame at index $hue_l$:

$$hist_{n'}(hue_l)=num(hue_l)$$

where $n'\in\{1,2,\ldots,N_3\}$, $hue_l$ is the l-th level index of the hue component of the n'-th frame, $l\in\{1,2,\ldots,256\}$, $hue_l\in\{1,2,\ldots,256\}$;

(2.1c) according to the values $hist_{n+1}(hue_l)$ and $hist_n(hue_l)$ of the hue histograms of the (n+1)-th and n-th frames, computing the hue histogram difference $HHD_n$ between the two frames:

$$HHD_n=\frac{1}{L\times K}\sum_{l=1}^{256}\bigl|hist_{n+1}(hue_l)-hist_n(hue_l)\bigr|$$

where L is the height and K the width of each frame, $n\in\{1,2,\ldots,N_3-1\}$;

(2.1d) according to the differences $HHD_n$, computing the mean HHD of the $N_3-1$ hue histogram differences of the video fragment:

$$HHD=\frac{1}{N_3-1}\sum_{n=1}^{N_3-1}HHD_n;$$

(2.1e) selecting the frames whose $HHD_n$ exceeds the threshold $T_2$, where $T_2$ is 2 times the mean HHD of the video fragment;

(2.1f) selecting the shots whose duration is 10 to 20 frames as shots $ls_w$, obtaining a series of candidate logo shots $ls_1,ls_2,\ldots,ls_{N_4}$, where $w\in\{1,2,\ldots,N_4\}$ and $N_4$ is the total number of candidate logo shots;

(2.1g) using the shot segmentation step to detect the number of shots contained in the video segment between candidate logo shot $ls_{w'}$ and candidate logo shot $ls_{w'-1}$: if the segment contains more than 1 shot, labeling the shots in the segment as replay shots; if it contains exactly 1 shot, labeling the shot in the segment as a real-time shot, where $w'\in\{2,3,\ldots,N_4\}$.
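Steps (2.1b)-(2.1e) boil down to per-frame hue histograms, their per-bin absolute differences, and an adaptive 2× mean threshold. A minimal sketch (frames represented as 2-D arrays of normalized hue values; all names illustrative):

```python
import numpy as np

def hue_histogram(frame_hue, bins=256):
    # 256-bin histogram of hue values in [0, 1) for one frame
    hist, _ = np.histogram(frame_hue, bins=bins, range=(0.0, 1.0))
    return hist

def hue_histogram_difference(frames):
    """HHD_n of step (2.1c): per-bin absolute difference between
    consecutive frames, normalized by the frame area L*K."""
    L, K = frames[0].shape
    hists = [hue_histogram(f) for f in frames]
    return [np.abs(h2 - h1).sum() / (L * K) for h1, h2 in zip(hists, hists[1:])]

def candidate_boundaries(frames):
    # step (2.1e): keep positions whose HHD exceeds twice the clip mean
    hhd = np.array(hue_histogram_difference(frames))
    return np.flatnonzero(hhd > 2 * hhd.mean())
```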
4. The soccer-video goal-event detection method according to claim 3, wherein step (2.2), "further labeling the real-time shots as far shots, medium shots or non-field shots", is carried out as follows:

(2.2a) choosing 60 far-view frames from the real-time shots and, according to the value $hist_p(hue_l)$ of the 256-bin hue histogram of the p-th frame at index $hue_l$, computing the value $sh(hue_l)$ of the cumulative hue histogram of the 60 far-view frames at index $hue_l$:

$$sh(hue_l)=\sum_{p=1}^{60}hist_p(hue_l)$$

where $hue_l$ is the l-th level index of the hue component of the p-th frame, $l\in\{1,2,\ldots,256\}$, $hue_l\in\{1,2,\ldots,256\}$, $p\in\{1,2,\ldots,60\}$;

(2.2b) according to the values $sh(hue_l)$, computing the peak F of the cumulative histogram:

F = max{sh(hue_1), sh(hue_2), …, sh(hue_256)};

(2.2c) according to the value at each index of the cumulative histogram and the peak F, determining the lower-limit index $hue_{low}$ satisfying:

sh(hue_low) ≥ 0.2 × F
sh(hue_low − 1) < 0.2 × F

where $sh(hue_{low})$ is the value of the cumulative histogram at the lower-limit index $hue_{low}$ and $sh(hue_{low}-1)$ is its value at index $hue_{low}-1$;

(2.2d) likewise determining the upper-limit index $hue_{up}$ satisfying:

sh(hue_up) ≥ 0.2 × F
sh(hue_up + 1) < 0.2 × F

where $sh(hue_{up})$ is the value of the cumulative histogram at the upper-limit index $hue_{up}$ and $sh(hue_{up}+1)$ is its value at index $hue_{up}+1$;

(2.2e) cutting away the top third of each frame of a real-time shot, counting the number $C_1$ of field pixels in the remaining part whose hue value h lies in the interval $[hue_{low}/256, hue_{up}/256]$, and computing the field ratio PR of each frame:

$$PR=\frac{C_1}{\frac{2}{3}\times L\times K}$$

where L is the height and K the width of each frame;

(2.2f) judging the type of each frame according to its field ratio PR: the frame is a far-view frame if PR ≥ $T_3$, a medium-view frame if $T_4$ ≤ PR < $T_3$, and a non-field frame if PR < $T_4$, taking the thresholds $T_3$ = 0.70 and $T_4$ = 0.30;

(2.2g) if more than 55% of the frames of a real-time shot to be labeled are far-view frames, labeling the shot a far shot; if more than 55% of its frames are medium-view frames, labeling it a medium shot; otherwise labeling it a non-field shot.
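The frame-level rule of steps (2.2e)-(2.2f) can be sketched directly. The function below assumes a frame given as a 2-D array of normalized hue values and uses the claim's thresholds; the name and return labels are illustrative:

```python
import numpy as np

def classify_frame(hue, hue_low, hue_up, T3=0.70, T4=0.30):
    """Field-ratio rule of steps (2.2e)-(2.2f): discard the top third
    of the frame, measure the fraction of field-colored pixels in the
    rest, then threshold the ratio."""
    L, K = hue.shape
    cropped = hue[L // 3:, :]                       # keep the lower two thirds
    field = (cropped >= hue_low / 256) & (cropped <= hue_up / 256)
    PR = field.sum() / (2 / 3 * L * K)              # field ratio
    if PR >= T3:
        return "far"
    if PR >= T4:
        return "medium"
    return "non-field"
```

The shot label of step (2.2g) then follows by majority vote (55%) over the frame labels of the shot.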
5. The soccer-video goal-event detection method according to claim 2, wherein step (2.3), "further labeling the non-field shots as close-up shots or audience shots", is carried out as follows:

(2.3a) converting each frame of a non-field shot from the RGB color space to the YCbCr color space, obtaining the value y of the luminance component Y, the value cb of the blue-difference chroma component Cb and the value cr of the red-difference chroma component Cr:

y = 0.299r' + 0.587g' + 0.114b'
cb = 0.564(b' − y)
cr = 0.713(r' − y)

where r', g', b' are the values of the red, green and blue components R, G, B of each pixel of each frame;

(2.3b) according to the value y of each pixel in the luminance component Y of each frame, detecting the edge pixels in the frame with the Canny operator, obtaining the number of edge pixels $C_2$;

(2.3c) according to the number of edge pixels $C_2$, computing the edge pixel ratio EPR of each frame of the non-field shot to be labeled:

$$EPR=\frac{C_2}{L\times K}$$

where L is the height and K the width of each frame;

(2.3d) if the EPR of a frame is greater than the threshold $T_5$, labeling it an audience frame, otherwise labeling it a close-up frame, taking $T_5$ = 0.10;

(2.3e) if more than 55% of the frames of a non-field shot to be labeled are audience frames, labeling the shot an audience shot, otherwise labeling it a close-up shot.
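The intuition in steps (2.3b)-(2.3d) is that audience frames are edge-dense while close-ups are smooth. The patent uses the Canny operator; the sketch below substitutes a plain gradient-magnitude threshold as a stand-in so that it needs no image library, and its names and the gradient threshold are assumptions:

```python
import numpy as np

def edge_pixel_ratio(y, grad_thresh=0.2):
    """EPR of steps (2.3b)-(2.3c), with a simple gradient-magnitude
    edge detector standing in for Canny on the luminance channel y."""
    gy, gx = np.gradient(y.astype(float))
    edges = np.hypot(gx, gy) > grad_thresh
    return edges.sum() / y.size

def label_non_field_frame(y, T5=0.10):
    # step (2.3d): edge-dense frames are audience, smooth ones close-ups
    return "audience" if edge_pixel_ratio(y) > T5 else "close-up"
```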
6. The soccer-video goal-event detection method according to claim 1, wherein step (5), "according to the $N_1$ semantic shot sequences in the training dataset O and the corresponding $N_1$ status sequences, calculating the initial model parameters $\lambda=(U,A,C)$ of the hidden Markov model", is carried out as follows:

(5.1) according to the number $x_i$ of semantic shots in state $\theta_i$ over the $N_1$ semantic shot sequences and the total number x of semantic shots in the $N_1$ sequences, computing the initial state probability vector U:

U = {U_1, U_2}
$$U_i=\frac{x_i}{x}$$

where $i\in\{1,2\}$, $U_i$ is the probability that the hidden Markov model is in state $q_1=\theta_i$ at time t = 1, and $q_t\in\{\theta_1,\theta_2\}$ is the state of the model at time t;

(5.2) counting the number $x_{(i,j)}$ of semantic shot transitions from state $\theta_i$ to state $\theta_j$ over the $N_1$ sequences and the number $x_{(i,*)}$ of transitions from $\theta_i$ to any state, and from $x_{(i,j)}$ and $x_{(i,*)}$ computing the state transition probability matrix A:

$$A=(a_{ij})_{2\times 2},\qquad a_{ij}=\frac{x_{(i,j)}}{x_{(i,*)}}$$

where $j\in\{1,2\}$, $a_{ij}$ is the probability that, given state $q_t=\theta_i$ at time t, the model is in state $q_{t+1}=\theta_j$ at time t + 1;

(5.3) according to the number $x_{j,k}$ of semantic shots $s_k$ in state $\theta_j$ over the $N_1$ sequences and the total number $x_j$ of semantic shots in state $\theta_j$, computing the observed value probability matrix C:

$$C=(c_{jk})_{2\times M},\qquad c_{jk}=\frac{x_{j,k}}{x_j}$$

where $k\in\{1,2,\ldots,M\}$, $c_{jk}$ is the probability that, given state $q_t=\theta_j$ at time t, the observed value $E_t$ produced by $q_t$ is the semantic shot $s_k$, and M is the number of observed values the model can produce in any state, M = 5, the five semantic shots $s_1,s_2,s_3,s_4,s_5$ being the five observed values of the hidden Markov model.
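The counting estimates of steps (5.1)-(5.3) translate directly into code. A sketch with 0-indexed integer shots and states (names illustrative):

```python
import numpy as np

def initial_hmm_parameters(shot_seqs, state_seqs, N=2, M=5):
    """Counting estimates of claim 6:
    U_i = x_i / x, a_ij = x_(i,j) / x_(i,*), c_jk = x_{j,k} / x_j."""
    U = np.zeros(N)
    A = np.zeros((N, N))
    C = np.zeros((N, M))
    for shots, states in zip(shot_seqs, state_seqs):
        for t, (k, i) in enumerate(zip(shots, states)):
            U[i] += 1                     # x_i: shots observed in state i
            C[i, k] += 1                  # x_{i,k}: shot kind k seen in state i
            if t + 1 < len(states):
                A[i, states[t + 1]] += 1  # x_(i,j): transitions i -> j
    U /= U.sum()
    A /= A.sum(axis=1, keepdims=True)
    C /= C.sum(axis=1, keepdims=True)
    return U, A, C
```

Note that per step (5.1) the vector U is estimated from the occupancy of each state over all shots, not only from the first shot of each sequence.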
7. The soccer-video goal-event detection method according to claim 1, wherein step (7), "according to the goal-event hidden Markov model and the d-th semantic shot sequence $O_d$ in the training dataset O, adopting the forward algorithm to calculate the probability $P(O_d|\bar\lambda)$ that the model generates $O_d$", is carried out as follows:

(7.1) according to the 1st semantic shot $O_{d,1}$ of $O_d$, calculating the probability $\alpha_1^d(i)$ that, under the final model parameters $\bar\lambda=(\bar U,\bar A,\bar C)$, the model is in state $q_1=\theta_i$ at time t = 1 and the 1st observed value is $O_{d,1}$:

$$\alpha_1^d(i)=\bar U_i\times\eta_i(O_{d,1})$$

where $i\in\{1,2\}$, $\bar U_i$ is the i-th element of the final initial state probability vector $\bar U$, and $\eta_i(O_{d,1})$ is the probability that, given state $q_1=\theta_i$ at time t = 1, the observed value $E_1$ produced by $q_1$ is $O_{d,1}$; when $O_{d,1}$ is the k-th kind of semantic shot $s_k$, $\eta_i(O_{d,1})=\bar c_{ik}$, the element in row i, column k of the final observed value probability matrix $\bar C$, $k\in\{1,2,\ldots,M\}$, M = 5;

(7.2) according to the d-th semantic shot sequence $O_d=(O_{d,1},O_{d,2},\ldots,O_{d,T_d})$ and the probabilities $\alpha_t^d(i)$, where $T_d$ is the number of semantic shots in $O_d$, calculating the probability $\alpha_{t+1}^d(j)$ that, under $\bar\lambda$, the model is in state $q_{t+1}=\theta_j$ at time t + 1 and the 1st to (t+1)-th observed values are successively $O_{d,1}$ to $O_{d,t+1}$:

$$\alpha_{t+1}^d(j)=\Big[\sum_{i=1}^{2}\alpha_t^d(i)\,\bar a_{ij}\Big]\eta_j(O_{d,t+1}),\quad t=1,2,\ldots,T_d-1$$

where $j\in\{1,2\}$, $\alpha_t^d(i)$ is the probability that, under $\bar\lambda$, the model is in state $q_t=\theta_i$ at time t and the 1st to t-th observed values are successively $O_{d,1}$ to $O_{d,t}$, $\bar a_{ij}$ is the element in row i, column j of the final state transition probability matrix $\bar A$, and $\eta_j(O_{d,t+1})$ is the probability that, given state $q_{t+1}=\theta_j$ at time t + 1, the observed value $E_{t+1}$ produced by $q_{t+1}$ is $O_{d,t+1}$; when $O_{d,t+1}$ is the k-th kind of semantic shot $s_k$, $\eta_j(O_{d,t+1})=\bar c_{jk}$, the element in row j, column k of $\bar C$;

(7.3) according to the probabilities $\alpha_{T_d}^d(j)$, calculating the probability that the goal-event model generates $O_d$:

$$P(O_d|\bar\lambda)=\sum_{j=1}^{2}\alpha_{T_d}^d(j).$$
8. The soccer-video goal-event detection method according to claim 1, wherein step (9), "according to the goal-event hidden Markov model and the e-th semantic shot sequence $Z_e$ in the test dataset Z, adopting the forward algorithm to calculate the probability $P(Z_e|\bar\lambda)$ that the model generates $Z_e$", is carried out as follows:

(9.1) according to the 1st semantic shot $Z_{e,1}$ of $Z_e$, calculating the probability $\beta_1^e(i)$ that, under the final model parameters $\bar\lambda=(\bar U,\bar A,\bar C)$, the model is in state $q_1=\theta_i$ at time t = 1 and the 1st observed value is $Z_{e,1}$:

$$\beta_1^e(i)=\bar U_i\times\gamma_i(Z_{e,1})$$

where $i\in\{1,2\}$, $\bar U_i$ is the i-th element of the final initial state probability vector $\bar U$, and $\gamma_i(Z_{e,1})$ is the probability that, given state $q_1=\theta_i$ at time t = 1, the observed value $E_1$ produced by $q_1$ is $Z_{e,1}$; when $Z_{e,1}$ is the k-th kind of semantic shot $s_k$, $\gamma_i(Z_{e,1})=\bar c_{ik}$, the element in row i, column k of the final observed value probability matrix $\bar C$, $k\in\{1,2,\ldots,M\}$, M = 5;

(9.2) according to the e-th semantic shot sequence $Z_e=(Z_{e,1},Z_{e,2},\ldots,Z_{e,T'_e})$ and the probabilities $\beta_t^e(i)$, where $T'_e$ is the number of semantic shots in $Z_e$, calculating the probability $\beta_{t+1}^e(j)$ that, under $\bar\lambda$, the model is in state $q_{t+1}=\theta_j$ at time t + 1 and the 1st to (t+1)-th observed values are successively $Z_{e,1}$ to $Z_{e,t+1}$:

$$\beta_{t+1}^e(j)=\Big[\sum_{i=1}^{2}\beta_t^e(i)\,\bar a_{ij}\Big]\gamma_j(Z_{e,t+1}),\quad t=1,2,\ldots,T'_e-1$$

where $j\in\{1,2\}$, $\beta_t^e(i)$ is the probability that, under $\bar\lambda$, the model is in state $q_t=\theta_i$ at time t and the 1st to t-th observed values are successively $Z_{e,1}$ to $Z_{e,t}$, $\bar a_{ij}$ is the element in row i, column j of the final state transition probability matrix $\bar A$, and $\gamma_j(Z_{e,t+1})$ is the probability that, given state $q_{t+1}=\theta_j$ at time t + 1, the observed value $E_{t+1}$ produced by $q_{t+1}$ is $Z_{e,t+1}$; when $Z_{e,t+1}$ is the k-th kind of semantic shot $s_k$, $\gamma_j(Z_{e,t+1})=\bar c_{jk}$, the element in row j, column k of $\bar C$;

(9.3) according to the probabilities $\beta_{T'_e}^e(j)$, calculating the probability that the goal-event model generates $Z_e$:

$$P(Z_e|\bar\lambda)=\sum_{j=1}^{2}\beta_{T'_e}^e(j).$$
CN201110180084.8A 2011-06-29 2011-06-29 Method for detecting goal events in soccer video based on hidden markov model Expired - Fee Related CN102393909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110180084.8A CN102393909B (en) 2011-06-29 2011-06-29 Method for detecting goal events in soccer video based on hidden markov model


Publications (2)

Publication Number Publication Date
CN102393909A CN102393909A (en) 2012-03-28
CN102393909B true CN102393909B (en) 2014-01-15

Family

ID=45861229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110180084.8A Expired - Fee Related CN102393909B (en) 2011-06-29 2011-06-29 Method for detecting goal events in soccer video based on hidden markov model

Country Status (1)

Country Link
CN (1) CN102393909B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701460B (en) * 2016-01-07 2019-01-29 王跃明 A kind of basketball goal detection method and apparatus based on video
CN107241645B (en) * 2017-06-09 2020-07-24 成都索贝数码科技股份有限公司 Method for automatically extracting goal wonderful moment through caption recognition of video
CN107247942B (en) * 2017-06-23 2019-12-20 华中科技大学 Tennis video event detection method integrating multi-mode features
CN108846433A (en) * 2018-06-08 2018-11-20 汕头大学 A kind of team value amount appraisal procedure of basket baller

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127866A (en) * 2007-08-10 2008-02-20 西安交通大学 A method for detecting wonderful section of football match video
CN101482925A (en) * 2009-01-16 2009-07-15 西安电子科技大学 Photograph generation method based on local embedding type hidden Markov model
JP2011028320A (en) * 2009-07-21 2011-02-10 Nippon Telegr & Teleph Corp <Ntt> Hidden markov model searching device and method and program


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Yuchi et al., "Semantic Structure Analysis of Soccer Video Based on HMM," Computer Engineering and Applications, no. 28, Oct. 2006, pp. 174-176. *
Peng Limin, "Research on Semantic Analysis of Soccer Video Based on HMM," Computer Engineering and Design, vol. 29, no. 19, Oct. 2008, pp. 5002-5005. *
Ma Chao, "Detection of Typical Events in Soccer Video Based on Hidden Markov Model," China Master's Theses Full-text Database, 2005. *

Also Published As

Publication number Publication date
CN102393909A (en) 2012-03-28

Similar Documents

Publication Publication Date Title
Huang et al. Tracknet: A deep learning network for tracking high-speed and tiny objects in sports applications
CN101604325B (en) Method for classifying sports video based on key frame of main scene lens
CN101639354B (en) Method and apparatus for object tracking
CN102306153B (en) Method for detecting goal events based on normalized semantic weighting and regular football video
CN102306154B (en) Football video goal event detection method based on hidden condition random field
CN102819749A (en) Automatic identification system and method for offside of football based on video analysis
CN102393909B (en) Method for detecting goal events in soccer video based on hidden markov model
CN106709453A (en) Sports video key posture extraction method based on deep learning
CN103942751A (en) Method for extracting video key frame
Renò et al. A technology platform for automatic high-level tennis game analysis
CN104166983A (en) Motion object real time extraction method of Vibe improvement algorithm based on combination of graph cut
CN103929685A (en) Video abstract generating and indexing method
CN101794451A (en) Tracing method based on motion track
US8306109B2 (en) Method for scaling video content based on bandwidth rate
CN111291617A (en) Badminton event video wonderful segment extraction method based on machine learning
Hari et al. Event detection in cricket videos using intensity projection profile of Umpire gestures
Nasir et al. Event detection and summarization of cricket videos
Hsu et al. Coachai: A project for microscopic badminton match data collection and tactical analysis
US8300894B2 (en) Method for decomposition and rendering of video content and user interface for operating the method thereof
Tien et al. Shot classification of basketball videos and its application in shooting position extraction
CN106559714A (en) A kind of extraction method of key frame towards digital video copyright protection
Renò et al. Real-time tracking of a tennis ball by combining 3d data and domain knowledge
Chen et al. Tracking ball and players with applications to highlight ranking of broadcasting table tennis video
CN109978916A (en) Vibe moving target detecting method based on gray level image characteristic matching
Goyani et al. Key frame detection based semantic event detection and classification using heirarchical approach for cricket sport video indexing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140115

Termination date: 20190629