CN110363220A - Behavior category detection method, device, electronic equipment and computer-readable medium - Google Patents

Behavior category detection method, device, electronic equipment and computer-readable medium

Info

Publication number
CN110363220A
CN110363220A (application CN201910503133.3A)
Authority
CN
China
Prior art keywords
behavior classification
behavior
sample
classification
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910503133.3A
Other languages
Chinese (zh)
Other versions
CN110363220B (en)
Inventor
Yang Yang (杨洋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910503133.3A priority Critical patent/CN110363220B/en
Publication of CN110363220A publication Critical patent/CN110363220A/en
Application granted granted Critical
Publication of CN110363220B publication Critical patent/CN110363220B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G06F16/784Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content the detected or recognised objects being people
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The embodiments of the present application disclose a behavior category detection method, device, electronic equipment, and computer-readable medium. An embodiment of the method includes: performing human-body detection on a frame in a target video to determine the human object region in the frame; determining the scene region in the frame; inputting the human object region and the scene region separately into a pre-trained behavior category detection model to obtain behavior category detection results corresponding to the human object region and the scene region respectively; and aggregating the obtained behavior category detection results to determine the behavior category of the human object in the frame. This embodiment improves the accuracy of detecting the behavior category of human objects in video frames.

Description

Behavior category detection method, device, electronic equipment and computer-readable medium
Technical field
The embodiments of the present application relate to the field of computer technology, and in particular to a behavior category detection method, device, electronic equipment, and computer-readable medium.
Background technique
Video understanding, as the premise and means of automatically analyzing and processing video, has important value and significance for video recommendation, highlight extraction, video tagging, and the like. For example, in videos such as films and television dramas, key behavioral actions are often central to analyzing the plot of the video content. Therefore, detecting the human behavior category in video frames can provide support for video analysis.
Related approaches usually focus only on the motion exhibited by the human body in a video frame, and judge the behavior category directly from that motion. However, in many scenes (such as films and television dramas), different behaviors may exhibit similar motions (for example, drinking water versus eating a meal), so determining the behavior category from the human motion alone is usually not very accurate.
Summary of the invention
The embodiments of the present application propose a behavior category detection method, device, electronic equipment, and computer-readable medium, to solve the technical problem in the prior art that determining the behavior category only from the human body's appearance makes behavior category detection insufficiently accurate.
In a first aspect, an embodiment of the present application provides a behavior category detection method, comprising: performing human-body detection on a frame in a target video to determine the human object region in the frame; determining the scene region in the frame, and inputting the human object region and the scene region separately into a pre-trained behavior category detection model to obtain behavior category detection results corresponding to the human object region and the scene region respectively, wherein the behavior category detection model characterizes the correspondence between images and behavior categories; and aggregating the obtained behavior category detection results to determine the behavior category of the human object in the frame.
In some embodiments, a behavior category detection result includes, for each preset behavior category, the probability that the behavior belongs to that category; and aggregating the obtained behavior category detection results to determine the behavior category of the human object in the frame comprises: combining, across the obtained behavior category detection results, the probabilities of each identical preset behavior category to obtain a score for each preset behavior category; and determining the behavior category of the human object in the frame based on the scores of the preset behavior categories.
In some embodiments, before the behavior category of the human object in the frame is determined based on the obtained scores, aggregating the obtained behavior category detection results and determining the behavior category of the human object in the frame further includes: inputting the frame into a pre-trained object detection model to obtain an object detection result, wherein the object detection model detects physical objects in an image; and, for each preset behavior category that involves interaction with an object, determining the interactive object corresponding to that preset behavior category, extracting the score of the interactive object from the object detection result, computing a weighted combination of the preset behavior category's score and the interactive object's score, and taking the weighted result as the updated score of that preset behavior category.
In some embodiments, determining the behavior category of the human object in the frame based on the scores of the preset behavior categories comprises: determining whether any score is greater than a preset threshold; and, in response to determining that one exists, determining the preset behavior category corresponding to the maximum score as the behavior category of the human object in the frame.
In some embodiments, determining the behavior category of the human object in the frame based on the scores of the preset behavior categories further comprises: in response to determining that no score is greater than the preset threshold, selecting at least one preset behavior category in descending order of score; for each selected preset behavior category, retrieving the behavior category judging model matching that category and inputting the frame into it to obtain a judgment result, wherein a behavior category judging model determines whether the behavior category of the human object in an image is that preset behavior category; and determining the behavior category of the human object in the frame based on the judgment results corresponding to the selected preset behavior categories.
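This two-stage decision rule (threshold test, then a fallback to per-category judging models) can be sketched as follows. All names, the threshold, and the candidate count are illustrative assumptions; the judging models are stood in for by arbitrary callables:

```python
def decide_behavior(frame, scores, judges, threshold=0.5, top_k=2):
    """Pick a behavior category from per-category scores; fall back to
    per-category judging models when no score clears the threshold."""
    best = max(scores, key=scores.get)
    if scores[best] > threshold:
        return best  # a score exceeds the preset threshold: take the maximum
    # otherwise consult judging models for the top-scoring candidates
    for cls in sorted(scores, key=scores.get, reverse=True)[:top_k]:
        judge = judges.get(cls)
        if judge is not None and judge(frame):
            return cls
    return None  # no category could be confirmed
```

In the confident case the judging models are never invoked, which keeps the common path cheap.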
In some embodiments, the method further includes: smoothing the behavior categories of the human objects in the frames of the target video according to the temporal order of the frames, to generate a behavior category information sequence.
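The patent does not fix a smoothing scheme; a minimal sketch under the assumption of majority voting over a centered sliding window would be:

```python
from collections import Counter

def smooth_labels(labels, window=3):
    """Replace each frame's behavior category with the majority category
    in a centered window over the temporally ordered per-frame labels."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        segment = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(segment).most_common(1)[0][0])
    return smoothed
```

This suppresses isolated single-frame misclassifications while leaving sustained behaviors intact.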
In some embodiments, the behavior category detection model is trained through the following model training steps: obtaining a training sample set, wherein each sample in the training sample set includes a training image sample and first markup information, the first markup information indicating the behavior category of the human object in the training image sample; and training the behavior category detection model with a machine learning method, taking the training image samples in the training sample set as input and the first markup information corresponding to the input training image samples as output.
In some embodiments, after the behavior category detection model is trained, the model training steps further include: obtaining a test sample set, wherein each sample in the test sample set includes a test image sample and second markup information, the second markup information indicating the behavior category of the human object in the test image sample; and, for each sample extracted from the test sample set, performing the following test steps: inputting the test image sample of the extracted sample into the behavior category detection model; judging whether the behavior category detection result output by the model matches the second markup information of the extracted sample; and, in response to determining a mismatch, marking the extracted sample as a hard sample.
In some embodiments, after the test steps are completed, the model training steps further include: adding each hard sample to the corresponding target sample set according to its behavior category, wherein behavior categories and target sample sets correspond one-to-one; and, for each target sample set, taking the behavior category corresponding to that target sample set as the target behavior category, and training, with a machine learning method, a behavior category judging model corresponding to the target behavior category, using the test image samples of the samples in the set as input and the second markup information corresponding to the input test image samples as output.
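The hard-sample bucketing step above can be sketched in a few lines (the function and parameter names are illustrative; `predict` stands in for the trained behavior category detection model):

```python
def mine_hard_samples(test_set, predict):
    """Bucket each mismatched (hard) test sample into a per-category
    target sample set, keyed by the sample's annotated behavior category."""
    buckets = {}
    for image, label in test_set:
        if predict(image) != label:  # prediction disagrees with second markup
            buckets.setdefault(label, []).append((image, label))
    return buckets
```

Each resulting bucket then serves as the training set for one per-category judging model.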
In a second aspect, an embodiment of the present application provides a behavior category detection device, comprising: a human detection unit, configured to perform human-body detection on a frame in a target video and determine the human object region in the frame; a behavior category detection unit, configured to determine the scene region in the frame, input the human object region and the scene region separately into a pre-trained behavior category detection model, and obtain behavior category detection results corresponding to the human object region and the scene region respectively, wherein the behavior category detection model characterizes the correspondence between images and behavior categories; and a statistics unit, configured to aggregate the obtained behavior category detection results and determine the behavior category of the human object in the frame.
In some embodiments, a behavior category detection result includes, for each preset behavior category, the probability that the behavior belongs to that category; and the statistics unit comprises: a statistics module, configured to combine, across the obtained behavior category detection results, the probabilities of each identical preset behavior category to obtain a score for each preset behavior category; and a determining module, configured to determine the behavior category of the human object in the frame based on the scores of the preset behavior categories.
In some embodiments, the statistics module is further configured to: input the frame into a pre-trained object detection model to obtain an object detection result, wherein the object detection model detects physical objects in an image; and, for each preset behavior category that involves interaction with an object, determine the interactive object corresponding to that preset behavior category, extract the score of the interactive object from the object detection result, compute a weighted combination of the preset behavior category's score and the interactive object's score, and take the weighted result as the updated score of that preset behavior category.
In some embodiments, the determining module is further configured to: determine whether any score is greater than a preset threshold; and, in response to determining that one exists, determine the preset behavior category corresponding to the maximum score as the behavior category of the human object in the frame.
In some embodiments, the determining module is further configured to: in response to determining that no score is greater than the preset threshold, select at least one preset behavior category in descending order of score; for each selected preset behavior category, retrieve the behavior category judging model matching that category and input the frame into it to obtain a judgment result, wherein a behavior category judging model determines whether the behavior category of the human object in an image is that preset behavior category; and determine the behavior category of the human object in the frame based on the judgment results corresponding to the selected preset behavior categories.
In some embodiments, the device further includes: a smoothing unit, configured to smooth the behavior categories of the human objects in the frames of the target video according to the temporal order of the frames, and generate a behavior category information sequence.
In some embodiments, the behavior category detection model is trained through the following model training steps: obtaining a training sample set, wherein each sample in the training sample set includes a training image sample and first markup information, the first markup information indicating the behavior category of the human object in the training image sample; and training the behavior category detection model with a machine learning method, taking the training image samples in the training sample set as input and the first markup information corresponding to the input training image samples as output.
In some embodiments, after the behavior category detection model is trained, the model training steps further include: obtaining a test sample set, wherein each sample in the test sample set includes a test image sample and second markup information, the second markup information indicating the behavior category of the human object in the test image sample; and, for each sample extracted from the test sample set, performing the following test steps: inputting the test image sample of the extracted sample into the behavior category detection model; judging whether the behavior category detection result output by the model matches the second markup information of the extracted sample; and, in response to determining a mismatch, marking the extracted sample as a hard sample.
In some embodiments, after the test steps are completed, the model training steps further include: adding each hard sample to the corresponding target sample set according to its behavior category, wherein behavior categories and target sample sets correspond one-to-one; and, for each target sample set, taking the behavior category corresponding to that target sample set as the target behavior category, and training, with a machine learning method, a behavior category judging model corresponding to the target behavior category, using the test image samples of the samples in the set as input and the second markup information corresponding to the input test image samples as output.
In a third aspect, an embodiment of the present application provides an electronic equipment, comprising: one or more processors; and a storage device on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, which, when executed by a processor, implements the method of any embodiment of the first aspect.
In the behavior category detection method and device provided by the embodiments of the present application, human-body detection is performed on a frame in a target video to determine the human object region in the frame. The scene region in the frame is then determined, so that the human object region and the scene region can be input separately into a pre-trained behavior category detection model, obtaining behavior category detection results corresponding to the human object region and the scene region respectively. Finally, the obtained behavior category detection results are aggregated to determine the behavior category of the human object in the frame. The human body's appearance is thus combined with the scene, so that more information is incorporated into the behavior category detection process, which helps improve the accuracy of detecting the behavior category of human objects in video frames.
Detailed description of the invention
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description of non-restrictive embodiments, taken in conjunction with the accompanying drawings:
Fig. 1 is a flowchart of one embodiment of the behavior category detection method according to the present application;
Fig. 2 is a flowchart of another embodiment of the behavior category detection method according to the present application;
Fig. 3 is a schematic diagram of a processing procedure of the behavior category detection method according to the present application;
Fig. 4 is a structural schematic diagram of one embodiment of the behavior category detection device according to the present application;
Fig. 5 is a structural schematic diagram of a computer system adapted to implement the electronic equipment of the embodiments of the present application.
Specific embodiment
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the related invention, rather than to limit it. It should also be noted that, for convenience of description, only the parts relevant to the related invention are shown in the drawings.
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Referring to Fig. 1, which illustrates a process 100 of one embodiment of the behavior category detection method according to the present application. The behavior category detection method comprises the following steps:
Step 101: perform human-body detection on a frame in the target video, and determine the human object region in the frame.
In the present embodiment, the executing subject of the behavior category detection method (e.g., electronic equipment such as a server) may perform human-body detection on a frame in the target video. Here, the target video may be any video currently awaiting processing. In practice, a video can be described in terms of frames (Frame). A frame is the minimum visual unit composing a video; each frame is a static image, and a temporally continuous sequence of frames, synthesized together, forms a video.
Here, the executing subject may perform human-body detection in various ways. For example, the frame in the target video may be input into a pre-trained human detection model, which determines the human object region in the frame. The human detection model is used for detecting human object regions in images. It may be obtained by supervised training of an existing convolutional neural network (Convolutional Neural Network, CNN) with a machine learning method, based on a sample set (containing human-body image samples and annotations indicating the positions of the human object regions). The convolutional neural network may adopt various existing structures, such as DenseBox, VGGNet, ResNet, SegNet, etc.
In practice, a convolutional neural network is a feedforward neural network whose artificial neurons respond to surrounding units within part of the coverage range; it performs outstandingly at image processing, so a convolutional neural network can be used for feature extraction and feature processing of the frames in the target video. A convolutional neural network may include convolutional layers, pooling layers, fully connected layers, etc. Convolutional layers can be used to extract image features, and pooling layers can be used to downsample the input information. It should be noted that the above machine learning method and supervised training method are well-known technologies that are widely researched and applied at present, and are not described in detail here.
In some optional implementations of the present embodiment, the faster-RCNN network structure may be used to train the human detection model. In practice, faster-RCNN is a neural network structure that can be used for object detection and can accurately find an object's position in an image, so it can be used to determine the human object region. Since the faster-RCNN network structure includes an RPN (Region Proposal Network), which can quickly determine the region of an image containing a specified object, it can detect the human object region faster than other network structures.
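Whichever detector is used, its raw output still has to be reduced to human object regions. A minimal post-processing sketch, assuming detections arrive as (label, score, box) tuples (this format and the confidence threshold are assumptions for illustration, not part of the patent):

```python
def person_regions(detections, min_score=0.7):
    """Keep the bounding boxes of confident 'person' detections as the
    human object regions of step 101."""
    return [box for label, score, box in detections
            if label == "person" and score >= min_score]
```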
Step 102: determine the scene region in the frame, input the human object region and the scene region separately into the pre-trained behavior category detection model, and obtain behavior category detection results corresponding to the human object region and the scene region respectively.
In the present embodiment, the executing subject may first determine the scene region in the frame. The scene region may be the region of the scene in which the human object in the frame is located. For example, if the human object in a frame is eating in a dining room, the dining-room area may be taken as the scene region. Here, existing detection approaches such as selective search or the sliding-window approach may be used to determine the scene region.
Optionally, when the faster-RCNN network structure is used as the structure of the human detection model, the RPN in that network structure may also be reused to determine the scene region, thereby increasing the detection speed of the scene region.
After determining the scene region, the executing subject may input the human object region and the scene region separately into the pre-trained behavior category detection model, obtaining a behavior category detection result corresponding to the human object region and a behavior category detection result corresponding to the scene region. The behavior category detection model can be used to characterize the correspondence between images and behavior categories. As an example, it may be a mapping table pre-established by technicians, based on statistics over a mass of data, that characterizes the correspondence between images and behavior categories. As another example, it may be a model trained with a machine learning method, which performs behavior category detection on the human object in an image.
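The two-region inference of step 102 can be sketched as below. All names are illustrative; `classify` stands in for the pre-trained behavior category detection model, and the frame is modeled as a plain 2D list so the crop is explicit:

```python
def detect_regions(frame, regions, classify):
    """Crop each region (human object region, scene region) out of the
    frame and run the same behavior-category model on each crop,
    returning one category->probability dict per region."""
    def crop(box):
        x0, y0, x1, y1 = box  # frame modeled as a 2D list of pixel rows
        return [row[x0:x1] for row in frame[y0:y1]]
    return [classify(crop(box)) for box in regions]
```

The key design point is that one shared model scores both crops; the results are only fused afterwards, in step 103.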
In some optional implementations of the present embodiment, the behavior category detection model may be trained through the following model training steps:
First step: obtain a training sample set. Each sample in the training sample set may include a training image sample and first markup information. The first markup information indicates the behavior category of the human object in the training image sample.
Second step: train the behavior category detection model with a machine learning method, taking the training image samples in the training sample set as input and the first markup information corresponding to the input training image samples as output. Here, various existing neural network structures with a classification function (such as DenseBox, VGGNet, ResNet, SegNet, etc.) may be used as the initial model for training the behavior category detection model.
Step 103: aggregate the obtained behavior category detection results, and determine the behavior category of the human object in the frame.
In the present embodiment, the executing subject may aggregate the obtained behavior category detection results (including the behavior category detection result corresponding to the human object region and the one corresponding to the scene region) to determine the behavior category of the human object in the frame. Various aggregation approaches may be used. For example, a category detection result may include the scores of different behavior categories, where a higher score for a behavior category means a greater possibility of belonging to that category. The executing subject may add or weight the scores of the identical behavior category across the two category detection results, and determine the behavior category corresponding to the maximum computed value as the behavior category of the human object.
In some optional implementations of the present embodiment, a behavior category detection result may include, for each preset behavior category, the probability that the behavior belongs to that category. The executing subject may first combine (for example, directly add or weight) the probabilities of each identical preset behavior category across the obtained behavior category detection results, obtaining a score for each preset behavior category. Then, the behavior category of the human object in the frame may be determined based on the scores of the preset behavior categories. As an example, the behavior category corresponding to the maximum score may be directly determined as the behavior category of the human object.
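The probability-combining step can be sketched as a weighted sum per preset behavior category (the weight is an assumed parameter; with equal weights this reduces to plain addition up to a constant factor):

```python
def aggregate_scores(body_probs, scene_probs, body_weight=0.5):
    """Combine the per-category probabilities from the human-object-region
    result and the scene-region result into one score per preset
    behavior category."""
    classes = set(body_probs) | set(scene_probs)
    return {c: body_weight * body_probs.get(c, 0.0)
               + (1 - body_weight) * scene_probs.get(c, 0.0)
            for c in classes}
```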
Optionally, in the above implementation, after obtaining the scores of the preset behavior categories, the executing subject may also detect the physical objects in the frame and adjust the scores of the preset behavior categories in combination with the object detection result. Specifically, this may be performed according to the following steps:
First, the frame may be input into a pre-trained object detection model to obtain an object detection result, where the object detection model is used for detecting physical objects in images. In practice, the YOLO framework for object detection (including convolutional layers and fully connected layers) may be used to perform the object detection.
Afterwards, for each preset behavior category that involves interaction with an object, the interactive object corresponding to that preset behavior category may be determined; the score of that interactive object is extracted from the object detection result, a weighted combination of the preset behavior category's score and the interactive object's score is computed, and the weighted result is taken as the updated score of that preset behavior category, so as to update the scores. After the scores are updated in this way, the behavior category of the human object in the frame can then be determined based on the updated scores.
It should be noted that which preset behavior categories involve interaction with objects, and the correspondence between preset behavior categories and interactive objects, may be known in advance. As an example, suppose a preset behavior category is playing the guitar. Since playing the guitar requires interacting with an item (namely, a guitar), this preset behavior category is a behavior category that interacts with an object, and its corresponding interactive object is the guitar. It should be pointed out that the weight coefficient of the preset behavior category's score and the weight coefficient of the interactive object's score may both be preset by technicians as needed.
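The score-update rule can be sketched as follows (a minimal sketch: the category-to-object mapping is passed in explicitly, and `alpha` stands for the preset weight coefficient chosen by technicians):

```python
def update_with_objects(scores, object_scores, interactions, alpha=0.7):
    """For each preset behavior category that involves an interactive
    object (e.g. 'play guitar' -> 'guitar'), blend the category score
    with the detected object's score from the object detection result."""
    updated = dict(scores)
    for cls, obj in interactions.items():
        if cls in updated and obj in object_scores:
            updated[cls] = (alpha * updated[cls]
                            + (1 - alpha) * object_scores[obj])
    return updated
```

Categories without an associated interactive object keep their original score untouched.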
The method provided by the above embodiment of the present application performs human detection on a frame in a target video, thereby determining the human object region in the above frame. The scene region in the above frame is then determined, so that the above human object region and the above scene region may be separately input into a pre-trained behavior category detection model, obtaining behavior category detection results corresponding to the above human object region and the above scene region respectively. Finally, the obtained behavior category detection results are aggregated to determine the behavior category of the human object in the above frame. Human body and scene information are thereby combined, so that more information is incorporated in the process of detecting the behavior category, which helps to improve the accuracy of detecting the behavior category of the human object in the video frame.
With further reference to Fig. 2, a process 200 of another embodiment of the behavior category detection method is illustrated. The process 200 of the behavior category detection method comprises the following steps:
Step 201: perform human detection on a frame in a target video, and determine the human object region in the above frame.
In the present embodiment, the execution subject of the behavior category detection method (e.g., the server 105 shown in Fig. 1) may train the human detection model using the Faster R-CNN network structure. In practice, Faster R-CNN is a neural network structure that may be used for target detection and can accurately locate object positions in an image; accordingly, this network may be used to determine the human object region. Since the Faster R-CNN network structure includes an RPN (Region Proposal Network), the detection of the human object region can be performed faster than with other network structures.
Step 202: determine the scene region in the above frame, input the human object region and the scene region separately into a pre-trained behavior category detection model, and obtain behavior category detection results corresponding to the human object region and the scene region respectively.
In the present embodiment, the above execution subject may first determine the scene region in the above frame. Here, the RPN in the above network structure may be reused to determine the scene region, thereby improving the detection speed for the scene region. After the scene region is determined, the above execution subject may input the human object region and the scene region separately into the pre-trained behavior category detection model, obtaining a behavior category detection result corresponding to the human object region and a behavior category detection result corresponding to the scene region. Here, a behavior category detection result may include the probability that the behavior category is each preset behavior category.
In the present embodiment, the above behavior category detection model may be obtained through the following model training steps:
First step: obtain a training sample set. Here, a sample in the above training sample set may include a training image sample and first annotation information. The above first annotation information may be used to indicate the behavior category of the human object in the training image sample.
Second step: take the training image samples in the above training sample set as input and the first annotation information corresponding to each input training image sample as output, and train the behavior category detection model using a machine learning method. Here, various existing neural network structures with a classification function (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.) may be used as the initial model for training the behavior category detection model.
In the present embodiment, after the behavior category detection model is obtained through training, the following steps may also be performed to determine the samples that the behavior category detection model has difficulty detecting (which may be referred to as hard samples):
First step: obtain a test sample set. Here, a sample in the above test sample set may include a test image sample and second annotation information. The above second annotation information may be used to indicate the behavior category of the human object in the test image sample.
Second step: extract a sample from the above test sample set and perform the following test steps. First, input the test image sample in the extracted sample into the above behavior category detection model. Then, determine whether the behavior category detection result output by the above behavior category detection model matches the second annotation information in the extracted sample. Specifically, the behavior category detection model may calculate the probability that the behavior of the human object in the test image sample belongs to each preset behavior category. It may then be determined whether the preset behavior category corresponding to the maximum probability value is identical to the behavior category indicated by the above second annotation information. If they are identical, it may be determined that the behavior category detection result output by the above behavior category detection model matches the second annotation information in the extracted sample; otherwise, they do not match. In response to determining a mismatch, the extracted sample may be determined to be a hard sample.
Third step: according to behavior category, add each hard sample to the corresponding target sample set, where behavior categories correspond one-to-one with target sample sets. Here, a target sample set corresponding to each behavior category may be pre-established. After a sample is determined to be a hard sample, the behavior category corresponding to that sample may be determined according to the second annotation information in the sample, and the sample may then be added to the target sample set corresponding to that behavior category. It should be noted that, for a certain behavior category, in addition to the added hard samples, the corresponding target sample set may also contain pre-added normal samples of that behavior category.
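The hard-sample mining described in the second and third steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the detection model is represented by a `detect` function returning a per-category probability dictionary, and the sample and category names in the usage example are hypothetical.

```python
def mine_hard_samples(test_samples, detect):
    """Route mispredicted test samples into per-category target sample sets.

    test_samples: list of (image, label) pairs, where label is the
                  annotated behavior category (second annotation info).
    detect: function mapping an image to a dict {category: probability}
            (the behavior category detection model's output).
    """
    target_sets = {}  # behavior category -> list of hard samples
    for image, label in test_samples:
        probs = detect(image)
        predicted = max(probs, key=probs.get)  # category with max probability
        if predicted != label:                 # mismatch -> hard sample
            target_sets.setdefault(label, []).append((image, label))
    return target_sets
```

In practice the pre-added normal samples of each category would be merged into the same target sample sets before training the per-category judgment models.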
In the present embodiment, the target sample sets, after the hard samples have been added, may also be used to train category judgment models. For a certain behavior category, the category judgment model trained using the target sample set corresponding to that behavior category may be used to determine whether the behavior of the human object in an image belongs to that behavior category. Specifically, for each target sample set, the test image samples of the samples in that target sample set may be taken as input, and the second annotation information corresponding to each input test image sample as output; using a machine learning method, a behavior category judgment model corresponding to the target behavior category is obtained through training. In practice, the above category judgment model may also use an existing neural network structure with a classification function (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.).
Step 203: aggregate the probabilities of identical preset behavior categories in the obtained behavior category detection results, and obtain the score of each preset behavior category.
In the present embodiment, the above execution subject may first aggregate the probabilities of identical preset behavior categories in the obtained behavior category detection results (e.g., by direct addition, or by weighting), obtaining the score of each preset behavior category.
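The aggregation in step 203 can be sketched as follows; this is a minimal illustration assuming each behavior category detection result is a dictionary of per-category probabilities. Direct addition corresponds to equal weights of 1; the weight list is an illustrative assumption.

```python
def aggregate_scores(detection_results, weights=None):
    """Sum (optionally weighted) per-category probabilities across the
    behavior category detection results, e.g. those of the human object
    region and the scene region."""
    if weights is None:
        weights = [1.0] * len(detection_results)  # direct addition
    scores = {}
    for result, w in zip(detection_results, weights):
        for category, prob in result.items():
            scores[category] = scores.get(category, 0.0) + w * prob
    return scores
```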
Step 204: input the above frame into a pre-trained object detection model, and obtain an object detection result.
In the present embodiment, the above execution subject may input the above frame into a pre-trained object detection model to obtain an object detection result. Here, the above object detection model may be used to detect objects in an image. In practice, the YOLO framework for target detection (including convolutional layers and fully connected layers) may be used to perform object detection.
Step 205: for each preset behavior category that involves interaction with an object, determine the interactive object corresponding to that preset behavior category, extract the score of the interactive object from the object detection result, weight the score of the preset behavior category with the score of the interactive object, and take the weighted result as the score of the preset behavior category, so as to perform a score update.
In the present embodiment, for each preset behavior category that involves interaction with an object, the above execution subject may determine the interactive object corresponding to that preset behavior category. Then, the score of the above interactive object may be extracted from the object detection result obtained in step 204, and the score of the preset behavior category may be weighted with the score of the above interactive object. Finally, the weighted result may be taken as the score of the preset behavior category, so as to perform a score update.
In this way, the object information in the video frame is further incorporated in the process of detecting the behavior category, thereby further improving the accuracy of detecting the behavior category of the human object in the video frame.
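The score update of step 205 can be sketched as follows. The category-to-object mapping is assumed to be known in advance, and the weight coefficients (here 0.7/0.3) are illustrative; as noted above, they would be preset by a technician as needed.

```python
def update_scores_with_objects(scores, object_scores, interaction_map,
                               w_behavior=0.7, w_object=0.3):
    """For each preset behavior category that involves an interactive object,
    replace its score with a weighted combination of the category score and
    the detected score of the corresponding object.

    scores: {category: score} from step 203.
    object_scores: {object name: detection score} from step 204.
    interaction_map: {category: interactive object}, known in advance.
    """
    updated = dict(scores)
    for category, obj in interaction_map.items():
        if category in updated:
            obj_score = object_scores.get(obj, 0.0)  # 0 if object not detected
            updated[category] = w_behavior * updated[category] + w_object * obj_score
    return updated
```

Categories that do not involve object interaction keep their original scores unchanged.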
Step 206: determine the behavior category of the human object in the above frame based on the score of each preset behavior category.
In the present embodiment, the above execution subject may determine the behavior category of the human object in the above frame based on the score of each preset behavior category. As an example, the preset behavior category corresponding to the maximum score may be determined as the behavior category of the human object in the above frame.
In some optional implementations of the present embodiment, the above execution subject may first determine whether there is a score greater than a preset threshold. In response to determining that there is, the preset behavior category corresponding to the maximum score may be determined as the behavior category of the human object in the above frame.
In the above implementation, in response to determining that there is no score greater than the above preset threshold (at this point it may be considered that the behavior category detection model cannot accurately determine the behavior category), the following steps may be performed:
First, select at least one preset behavior category in descending order of score.
Then, for each selected preset behavior category, extract the behavior category judgment model that matches that preset behavior category, input the above frame into the above behavior category judgment model, and obtain a judgment result. Here, the above behavior category judgment model may be used to determine whether the behavior category of the human object in an image is that preset behavior category. The generation steps of the above category judgment models are described in step 202 and are not repeated here.
As an example, suppose the preset behavior categories selected in descending order of score are playing the guitar, riding, and singing. At this point, a first behavior category judgment model for determining whether the behavior category is "playing the guitar", a second for determining whether it is "riding", and a third for determining whether it is "singing" may be selected. The frame may then be separately input into these three behavior category judgment models, obtaining the judgment results output by the three models. A judgment result may include the probability of belonging to the corresponding behavior category. For example, the probability output by the first behavior category judgment model is 0.8; the probability output by the second is 0.2; the probability output by the third is 0.5.
Finally, the behavior category of the human object in the above frame may be determined based on the judgment results corresponding to the selected preset behavior categories. Here, the category corresponding to the behavior category judgment model that outputs the maximum probability may be determined as the behavior category of the human object in the above frame.
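The decision logic above can be sketched as follows. The threshold value, the top-k cut-off, and the judgment models (represented here as callables returning a probability) are illustrative assumptions; the usage example reproduces the 0.8 / 0.2 / 0.5 figures from the text.

```python
def decide_category(frame, scores, judges, threshold=0.5, top_k=3):
    """Decide the behavior category from the per-category scores.

    If some score exceeds the threshold, take the category with the
    maximum score.  Otherwise fall back to the per-category behavior
    category judgment models: run the judges of the top-k categories
    (in descending score order) and take the category whose judge
    outputs the highest probability.

    judges: {category: callable(frame) -> probability}.
    """
    if max(scores.values()) > threshold:
        return max(scores, key=scores.get)
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]
    probs = {c: judges[c](frame) for c in candidates if c in judges}
    return max(probs, key=probs.get)
```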
In this way, when the behavior category detection model cannot reliably detect the behavior category, the behavior category is determined by the behavior category judgment models, thereby improving the precision of detecting the behavior category of the human object in the video frame and further improving the accuracy of detection.
Step 207: smooth the behavior categories of the human objects in the frames of the above target video according to the temporal order of the frames in the video, and generate a behavior category information sequence.
In the present embodiment, the above execution subject may smooth the behavior categories of the human objects in the frames of the above target video according to the temporal order of the frames in the video, generating a behavior category information sequence. Here, a piece of behavior category information in the behavior category information sequence may be used to indicate a behavior category.
Here, for each frame, it may first be determined whether the behavior category information corresponding to the frames before and after that frame is identical. If the behavior category information corresponding to the frames before and after the frame is identical, and the behavior category information corresponding to the frame differs from that of the frames before and after it, then the behavior category information of that frame may be deleted, completing the smoothing of the behavior category information. In this way, misidentified behavior category information can be eliminated, further improving the accuracy of identification.
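The smoothing rule above can be sketched as follows. The text says the outlier frame's label "may be deleted"; this sketch interprets the deletion as replacing the label with the neighbours' shared label, which is one possible reading.

```python
def smooth_labels(labels):
    """Temporal smoothing of per-frame behavior category labels.

    For each interior frame, if the labels of the previous and next
    frames agree with each other but differ from the current frame's
    label, the current label is treated as a misidentification and
    replaced with the neighbours' shared label.
    """
    smoothed = list(labels)
    for i in range(1, len(labels) - 1):
        prev, cur, nxt = labels[i - 1], labels[i], labels[i + 1]
        if prev == nxt and cur != prev:
            smoothed[i] = prev  # eliminate the isolated outlier label
    return smoothed
```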
As can be seen from Fig. 2, compared with the embodiment corresponding to Fig. 1, the process 200 of the behavior category detection method in the present embodiment adds the step of determining an object detection result using an object detection model, and the step of weighting the score of a preset behavior category with the score of its interactive object. The scheme described in the present embodiment thus further incorporates the object information in the video frame in the process of detecting the behavior category, thereby further improving the accuracy of detecting the behavior category of the human object in the video frame. On the other hand, when the behavior category detection model cannot detect the behavior category, the behavior category is determined by the behavior category judgment models, thereby improving the precision of detecting the behavior category of the human object in the video frame. At the same time, the behavior categories of the human objects in the frames of the above target video are smoothed according to the temporal order of the frames in the video, so that the behavior category detection results can be smoothed in the time domain, further improving the accuracy of detection.
With continued reference to the signal that Fig. 3, Fig. 2 are according to a treatment process of the behavior category detection method of the present embodiment Figure.In the treatment process of Fig. 3, need to carry out the detection of behavior classification to target video.The electronics of process performing classification detection is set It can store behavior classification detection model trained in advance, object detection model, scene detection model etc. in standby.
After obtaining the target video, the above electronic device may first perform human detection on a frame in the target video and determine the human object region in the above frame.
Then, the above electronic device may perform scene analysis on the above frame, determine the scene region in the above frame, and input the human object region and the scene region separately into the pre-trained behavior category detection model, obtaining behavior category detection results corresponding to the human object region and the scene region respectively. Then, the above electronic device may aggregate the probabilities of identical preset behavior categories in the obtained behavior category detection results, obtaining the score of each preset behavior category.
Then, the above electronic device may perform object detection on the above frame. Specifically, the above frame may be input into a pre-trained object detection model, obtaining an object detection result.
Then, the above electronic device may further process the object detection result. Specifically, for each preset behavior category that involves interaction with an object, the above electronic device may determine the interactive object corresponding to that preset behavior category. Then, the score of the interactive object is extracted from the object detection result. After that, the score of the preset behavior category is weighted with the score of the interactive object, and the weighted result is taken as the score of the preset behavior category, so as to perform a score update.
Then, the above electronic device may determine the behavior category of the human object in the above frame based on the result of this further processing (i.e., the updated score of each preset behavior category).
Then, the above electronic device may smooth the behavior categories of the human objects in the frames of the above target video according to the temporal order of the frames in the video, generating a behavior category information sequence.
In this way, object detection, human detection, scene analysis, and so on are performed in the process of detecting the behavior category, so that multiple kinds of information are fused, improving the accuracy of detecting the behavior category of the human object in the video frame.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a behavior category detection apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in Fig. 4, the behavior category detection apparatus 400 described in the present embodiment includes: a human detection unit 401, configured to perform human detection on a frame in a target video and determine the human object region in the above frame; a behavior category detection unit 402, configured to determine the scene region in the above frame and input the above human object region and the above scene region separately into a pre-trained behavior category detection model, obtaining behavior category detection results corresponding to the above human object region and the above scene region respectively, where the above behavior category detection model is used to characterize the correspondence between images and behavior categories; and a statistics unit 403, configured to aggregate the obtained behavior category detection results and determine the behavior category of the human object in the above frame.
In some optional implementations of the present embodiment, a behavior category detection result may include the probability that the behavior category is each preset behavior category. The above statistics unit 403 may include a statistics module 4031 and a determination module 4032. The above statistics module 4031 may be configured to aggregate the probabilities of identical preset behavior categories in the obtained behavior category detection results, obtaining the score of each preset behavior category. The above determination module 4032 is configured to determine the behavior category of the human object in the above frame based on the score of each preset behavior category.
In some optional implementations of the present embodiment, the apparatus may be further configured to: input the above frame into a pre-trained object detection model, obtaining an object detection result, where the above object detection model is used to detect objects in an image; and, for each preset behavior category that involves interaction with an object, determine the interactive object corresponding to that preset behavior category, extract the score of the above interactive object from the above object detection result, weight the score of the preset behavior category with the score of the above interactive object, and take the weighted result as the score of the preset behavior category, so as to perform a score update.
In some optional implementations of the present embodiment, the above determination module may be further configured to: determine whether there is a score greater than a preset threshold; and, in response to determining that there is, determine the preset behavior category corresponding to the maximum score as the behavior category of the human object in the above frame.
In some optional implementations of the present embodiment, the above determination module may be further configured to: in response to determining that there is no score greater than the above preset threshold, select at least one preset behavior category in descending order of score; for each selected preset behavior category, extract the behavior category judgment model that matches that preset behavior category, input the above frame into the above behavior category judgment model, and obtain a judgment result, where the above behavior category judgment model is used to determine whether the behavior category of the human object in an image is that preset behavior category; and determine the behavior category of the human object in the above frame based on the judgment results corresponding to the selected preset behavior categories.
In some optional implementations of the present embodiment, the apparatus may also include a smoothing unit 404. The above smoothing unit 404 may be configured to smooth the behavior categories of the human objects in the frames of the above target video according to the temporal order of the frames in the video, generating a behavior category information sequence.
In some optional implementations of the present embodiment, the above behavior category detection model may be obtained through the following model training steps: obtain a training sample set, where a sample in the above training sample set includes a training image sample and first annotation information, the above first annotation information being used to indicate the behavior category of the human object in the training image sample; take the training image samples in the above training sample set as input and the first annotation information corresponding to each input training image sample as output, and train the behavior category detection model using a machine learning method.
In some optional implementations of the present embodiment, after the behavior category detection model is obtained through the above training, the above model training steps may also include: obtain a test sample set, where a sample in the above test sample set includes a test image sample and second annotation information, the above second annotation information being used to indicate the behavior category of the human object in the test image sample; extract a sample from the above test sample set and perform the following test steps: input the test image sample in the extracted sample into the above behavior category detection model; determine whether the behavior category detection result output by the above behavior category detection model matches the second annotation information in the extracted sample; in response to determining a mismatch, determine the extracted sample to be a hard sample; and, according to behavior category, add each hard sample to the corresponding target sample set, where behavior categories correspond one-to-one with target sample sets.
In some optional implementations of the present embodiment, the above model training steps may also include: for each target sample set, take the behavior category corresponding to that target sample set as the target behavior category, take the test image samples of the samples in that target sample set as input and the second annotation information corresponding to each input test image sample as output, and, using a machine learning method, train a behavior category judgment model corresponding to the above target behavior category.
The apparatus provided by the above embodiment of the present application performs human detection on a frame in a target video through the human detection unit 401, thereby determining the human object region in the above frame. The behavior category detection unit 402 then determines the scene region in the above frame, so as to input the above human object region and the above scene region separately into the pre-trained behavior category detection model, obtaining behavior category detection results corresponding to the above human object region and the above scene region respectively. Finally, the statistics unit 403 aggregates the obtained behavior category detection results and determines the behavior category of the human object in the above frame. Human body and scene information are thereby combined, so that more information is incorporated in the process of detecting the behavior category, improving the accuracy of detecting the behavior category of the human object in the video frame.
Referring now to Fig. 5, a structural schematic diagram of a computer system 500 of an electronic device suitable for implementing the embodiments of the present application is shown. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, etc.; an output section 507 including a liquid crystal display (LCD), a speaker, etc.; a storage section 508 including a hard disk, etc.; and a communication section 509 including a network interface card such as a LAN card, a modem, etc. The communication section 509 performs communication processing via a network such as the Internet. A driver 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 510 as needed, so that a computer program read therefrom is installed into the storage section 508 as needed.
Particularly, in accordance with an embodiment of the present disclosure, it may be implemented as computer above with reference to the process of flow chart description Software program.For example, embodiment of the disclosure includes a kind of computer program product comprising be carried on computer-readable medium On computer program, which includes the program code for method shown in execution flow chart.In such reality It applies in example, which can be downloaded and installed from network by communications portion 509, and/or from detachable media 511 are mounted.When the computer program is executed by central processing unit (CPU) 501, limited in execution the present processes Above-mentioned function.It should be noted that computer-readable medium described herein can be computer-readable signal media or Computer readable storage medium either the two any combination.Computer readable storage medium for example can be --- but Be not limited to --- electricity, magnetic, optical, electromagnetic, infrared ray or semiconductor system, device or device, or any above combination. 
The more specific example of computer readable storage medium can include but is not limited to: have one or more conducting wires electrical connection, Portable computer diskette, hard disk, random access storage device (RAM), read-only memory (ROM), erasable type may be programmed read-only deposit Reservoir (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), light storage device, magnetic memory Part or above-mentioned any appropriate combination.In this application, computer readable storage medium, which can be, any include or stores The tangible medium of program, the program can be commanded execution system, device or device use or in connection.And In the application, computer-readable signal media may include in a base band or the data as the propagation of carrier wave a part are believed Number, wherein carrying computer-readable program code.The data-signal of this propagation can take various forms, including but not It is limited to electromagnetic signal, optical signal or above-mentioned any appropriate combination.Computer-readable signal media can also be computer Any computer-readable medium other than readable storage medium storing program for executing, the computer-readable medium can send, propagate or transmit use In by the use of instruction execution system, device or device or program in connection.Include on computer-readable medium Program code can transmit with any suitable medium, including but not limited to: wireless, electric wire, optical cable, RF etc., Huo Zheshang Any appropriate combination stated.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as comprising a human detection unit, a behavior category detection unit, and a statistics unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the human detection unit may also be described as "a unit that performs human detection on a frame in a target video and determines the human object region in the frame".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: perform human detection on a frame in a target video to determine a human object region in the frame; determine a scene region in the frame, and input the human object region and the scene region separately into a pre-trained behavior category detection model to obtain behavior category detection results corresponding respectively to the human object region and the scene region; and aggregate the obtained behavior category detection results to determine the behavior category of the human object in the frame.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features; without departing from the above inventive concept, it also covers other technical solutions formed by any combination of the above technical features or their equivalents, for example, solutions in which the above features are replaced with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (20)

1. A behavior category detection method, characterized in that the method comprises:
performing human detection on a frame in a target video to determine a human object region in the frame;
determining a scene region in the frame, and inputting the human object region and the scene region separately into a pre-trained behavior category detection model to obtain behavior category detection results corresponding respectively to the human object region and the scene region, wherein the behavior category detection model is used to characterize the correspondence between images and behavior categories; and
aggregating the obtained behavior category detection results to determine the behavior category of the human object in the frame.
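Claim 1's per-frame pipeline can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: `detect_human_regions`, `scene_region`, and `classify_region` are hypothetical stand-ins for the human detector and the pre-trained behavior category detection model.

```python
def detect_human_regions(frame):
    # Stand-in for a human detector: returns bounding boxes (x, y, w, h).
    h, w = len(frame), len(frame[0])
    return [(0, 0, w // 2, h)]  # pretend one person occupies the left half

def scene_region(frame):
    # In this sketch the whole frame serves as the scene region.
    h, w = len(frame), len(frame[0])
    return (0, 0, w, h)

def classify_region(region):
    # Stand-in behavior category model: probabilities per preset category.
    return {"walking": 0.6, "sitting": 0.4}

def frame_behavior(frame):
    regions = detect_human_regions(frame) + [scene_region(frame)]
    results = [classify_region(r) for r in regions]
    # Aggregate the per-region results: sum the probability per category.
    totals = {}
    for res in results:
        for cat, p in res.items():
            totals[cat] = totals.get(cat, 0.0) + p
    return max(totals, key=totals.get)

frame = [[0] * 8 for _ in range(4)]  # dummy 4x8 "frame"
print(frame_behavior(frame))  # -> walking
```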
2. The behavior category detection method according to claim 1, characterized in that each behavior category detection result comprises, for each preset behavior category, a probability that the behavior category is that preset behavior category; and
the aggregating the obtained behavior category detection results to determine the behavior category of the human object in the frame comprises:
summing, over the obtained behavior category detection results, the probabilities of the same preset behavior category to obtain a score for each preset behavior category; and
determining the behavior category of the human object in the frame based on the score of each preset behavior category.
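The aggregation in claim 2 amounts to summing, per preset behavior category, the probabilities returned for each region. A minimal sketch, with illustrative category names:

```python
def category_scores(detection_results):
    # Sum the probability each region assigns to the same preset category.
    scores = {}
    for result in detection_results:
        for category, prob in result.items():
            scores[category] = scores.get(category, 0.0) + prob
    return scores

results = [
    {"running": 0.75, "walking": 0.25},  # from the human object region
    {"running": 0.5, "walking": 0.5},    # from the scene region
]
scores = category_scores(results)
print(scores)                       # {'running': 1.25, 'walking': 0.75}
print(max(scores, key=scores.get))  # running
```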
3. The behavior category detection method according to claim 2, characterized in that, before the determining the behavior category of the human object in the frame based on the obtained scores, the aggregating the obtained behavior category detection results to determine the behavior category of the human object in the frame further comprises:
inputting the frame into a pre-trained object detection model to obtain an object detection result, wherein the object detection model is used to detect objects in an image; and
for each preset behavior category that involves interaction with an object, determining the interactive object corresponding to that preset behavior category, extracting the score of the interactive object from the object detection result, weighting the score of the preset behavior category with the score of the interactive object, and taking the weighted result as the updated score of the preset behavior category, so as to perform the score update.
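The score update of claim 3 can be illustrated as a weighted combination of a category score and the detected interactive object's score. The category-to-object pairing and the weights below are assumptions for illustration only:

```python
# Hypothetical mapping from interaction-type categories to their objects.
INTERACTIVE_OBJECT = {"riding": "bicycle", "drinking": "cup"}

def update_scores(category_scores, object_scores, w_cat=0.5, w_obj=0.5):
    # Weight each interaction category's score with its object's score.
    updated = dict(category_scores)
    for category, obj in INTERACTIVE_OBJECT.items():
        if category in updated and obj in object_scores:
            updated[category] = (w_cat * updated[category]
                                 + w_obj * object_scores[obj])
    return updated

cats = {"riding": 0.5, "walking": 0.9}
objs = {"bicycle": 1.0}           # from the object detection model
print(update_scores(cats, objs))  # riding: 0.5*0.5 + 0.5*1.0 = 0.75
```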
4. The behavior category detection method according to claim 3, characterized in that the determining the behavior category of the human object in the frame based on the score of each preset behavior category comprises:
determining whether there is a score greater than a preset threshold; and
in response to determining that there is, determining the preset behavior category corresponding to the maximum score as the behavior category of the human object in the frame.
5. The behavior category detection method according to claim 4, characterized in that the determining the behavior category of the human object in the frame based on the score of each preset behavior category further comprises:
in response to determining that there is no score greater than the preset threshold, selecting at least one preset behavior category in descending order of score;
for each selected preset behavior category, retrieving the behavior category judging model matching that preset behavior category, and inputting the frame into the behavior category judging model to obtain a judgment result, wherein the behavior category judging model is used to determine whether the behavior category of the human object in an image is that preset behavior category; and
determining the behavior category of the human object in the frame based on the judgment result corresponding to each selected preset behavior category.
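Claims 4 and 5 together describe a threshold-then-fallback decision. A sketch, where the per-category judging models are represented by hypothetical boolean callables:

```python
def decide(scores, judges, frame, threshold=1.0, top_k=2):
    # Claim 4: a confident score exists -> take the arg-max category.
    if max(scores.values()) > threshold:
        return max(scores, key=scores.get)
    # Claim 5: no confident score -> consult per-category judging
    # models for the top-scoring candidates, best first.
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]
    for category in candidates:
        if judges[category](frame):
            return category
    return None  # undetermined

judges = {"running": lambda f: False, "walking": lambda f: True}
print(decide({"running": 0.6, "walking": 0.5}, judges, frame=None))  # walking
print(decide({"running": 1.4, "walking": 0.5}, judges, frame=None))  # running
```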
6. The behavior category detection method according to claim 1, characterized in that the method further comprises:
smoothing, in the temporal order of the frames in the video, the behavior categories of the human object in the frames of the target video, to generate a behavior category information sequence.
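The smoothing of claim 6 is not specified further in the claims; one plausible reading is a sliding-window majority vote over the per-frame categories, sketched here with an illustrative window size:

```python
from collections import Counter

def smooth(categories, window=3):
    # Replace each frame's category with the majority vote in a window
    # centered on it, suppressing single-frame flicker.
    half = window // 2
    out = []
    for i in range(len(categories)):
        lo, hi = max(0, i - half), min(len(categories), i + half + 1)
        votes = Counter(categories[lo:hi])
        out.append(votes.most_common(1)[0][0])
    return out

per_frame = ["walk", "walk", "run", "walk", "walk"]  # "run" is a glitch
print(smooth(per_frame))  # ['walk', 'walk', 'walk', 'walk', 'walk']
```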
7. The behavior category detection method according to any one of claims 1-6, characterized in that the behavior category detection model is obtained by training through the following model training steps:
obtaining a training sample set, wherein each sample in the training sample set comprises a training image sample and first annotation information, the first annotation information being used to indicate the behavior category of the human object in the training image sample; and
taking the training image samples in the training sample set as input and the first annotation information corresponding to the input training image samples as output, and training the behavior category detection model using a machine learning method.
8. The behavior category detection method according to claim 7, characterized in that, after the behavior category detection model is obtained through training, the model training steps further comprise:
obtaining a test sample set, wherein each sample in the test sample set comprises a test image sample and second annotation information, the second annotation information being used to indicate the behavior category of the human object in the test image sample; and
extracting samples from the test sample set and executing the following test steps: inputting the test image sample in an extracted sample into the behavior category detection model; judging whether the behavior category detection result output by the behavior category detection model matches the second annotation information in the extracted sample; and, in response to determining a mismatch, determining the extracted sample to be a hard sample.
9. The behavior category detection method according to claim 8, characterized in that, after the test steps are completed, the model training steps further comprise:
adding each hard sample, according to its behavior category, to the corresponding target sample set, wherein behavior categories and target sample sets correspond one-to-one; and
for each target sample set, taking the behavior category corresponding to that target sample set as a target behavior category, taking the test image samples of the samples in that target sample set as input and the second annotation information corresponding to the input test image samples as output, and training, using a machine learning method, a behavior category judging model corresponding to the target behavior category.
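Claims 8 and 9 describe hard-sample mining followed by per-category training. A sketch, where `predict` stands in for the trained behavior category detection model and each per-key group would feed the training of one dedicated judging model:

```python
def mine_hard_samples(test_set, predict):
    # Claim 8: samples whose annotation the model fails to reproduce
    # are hard samples. test_set: list of (image, annotated_category).
    return [(img, cat) for img, cat in test_set if predict(img) != cat]

def group_by_category(hard_samples):
    # Claim 9: one target sample set per behavior category.
    groups = {}
    for img, cat in hard_samples:
        groups.setdefault(cat, []).append(img)
    return groups

test_set = [("img_a", "run"), ("img_b", "sit"), ("img_c", "run")]
predict = lambda img: "run"      # toy detector that always answers "run"
hard = mine_hard_samples(test_set, predict)
groups = group_by_category(hard)
print(groups)  # {'sit': ['img_b']} -> train one judging model per key
```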
10. A behavior category detection apparatus, characterized in that the apparatus comprises:
a human detection unit configured to perform human detection on a frame in a target video and determine a human object region in the frame;
a behavior category detection unit configured to determine a scene region in the frame and input the human object region and the scene region separately into a pre-trained behavior category detection model to obtain behavior category detection results corresponding respectively to the human object region and the scene region, wherein the behavior category detection model is used to characterize the correspondence between images and behavior categories; and
a statistics unit configured to aggregate the obtained behavior category detection results and determine the behavior category of the human object in the frame.
11. The behavior category detection apparatus according to claim 10, characterized in that each behavior category detection result comprises, for each preset behavior category, a probability that the behavior category is that preset behavior category; and
the statistics unit comprises:
a statistics module configured to sum, over the obtained behavior category detection results, the probabilities of the same preset behavior category to obtain a score for each preset behavior category; and
a determining module configured to determine the behavior category of the human object in the frame based on the score of each preset behavior category.
12. The behavior category detection apparatus according to claim 11, characterized in that the statistics module is further configured to:
input the frame into a pre-trained object detection model to obtain an object detection result, wherein the object detection model is used to detect objects in an image; and
for each preset behavior category that involves interaction with an object, determine the interactive object corresponding to that preset behavior category, extract the score of the interactive object from the object detection result, weight the score of the preset behavior category with the score of the interactive object, and take the weighted result as the updated score of the preset behavior category, so as to perform the score update.
13. The behavior category detection apparatus according to claim 12, characterized in that the determining module is further configured to:
determine whether there is a score greater than a preset threshold; and
in response to determining that there is, determine the preset behavior category corresponding to the maximum score as the behavior category of the human object in the frame.
14. The behavior category detection apparatus according to claim 13, characterized in that the determining module is further configured to:
in response to determining that there is no score greater than the preset threshold, select at least one preset behavior category in descending order of score;
for each selected preset behavior category, retrieve the behavior category judging model matching that preset behavior category, and input the frame into the behavior category judging model to obtain a judgment result, wherein the behavior category judging model is used to determine whether the behavior category of the human object in an image is that preset behavior category; and
determine the behavior category of the human object in the frame based on the judgment result corresponding to each selected preset behavior category.
15. The behavior category detection apparatus according to claim 10, characterized in that the apparatus further comprises:
a smoothing unit configured to smooth, in the temporal order of the frames in the video, the behavior categories of the human object in the frames of the target video, to generate a behavior category information sequence.
16. The behavior category detection apparatus according to any one of claims 10-15, characterized in that the behavior category detection model is obtained by training through the following model training steps:
obtaining a training sample set, wherein each sample in the training sample set comprises a training image sample and first annotation information, the first annotation information being used to indicate the behavior category of the human object in the training image sample; and
taking the training image samples in the training sample set as input and the first annotation information corresponding to the input training image samples as output, and training the behavior category detection model using a machine learning method.
17. The behavior category detection apparatus according to claim 16, characterized in that, after the behavior category detection model is obtained through training, the model training steps further comprise:
obtaining a test sample set, wherein each sample in the test sample set comprises a test image sample and second annotation information, the second annotation information being used to indicate the behavior category of the human object in the test image sample; and
extracting samples from the test sample set and executing the following test steps: inputting the test image sample in an extracted sample into the behavior category detection model; judging whether the behavior category detection result output by the behavior category detection model matches the second annotation information in the extracted sample; and, in response to determining a mismatch, determining the extracted sample to be a hard sample.
18. The behavior category detection apparatus according to claim 17, characterized in that, after the test steps are completed, the model training steps further comprise:
adding each hard sample, according to its behavior category, to the corresponding target sample set, wherein behavior categories and target sample sets correspond one-to-one; and
for each target sample set, taking the behavior category corresponding to that target sample set as a target behavior category, taking the test image samples of the samples in that target sample set as input and the second annotation information corresponding to the input test image samples as output, and training, using a machine learning method, a behavior category judging model corresponding to the target behavior category.
19. An electronic device, characterized by comprising:
one or more processors; and
a storage device on which one or more programs are stored,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-9.
20. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-9 is implemented.
CN201910503133.3A 2019-06-11 2019-06-11 Behavior class detection method and device, electronic equipment and computer readable medium Active CN110363220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910503133.3A CN110363220B (en) 2019-06-11 2019-06-11 Behavior class detection method and device, electronic equipment and computer readable medium


Publications (2)

Publication Number Publication Date
CN110363220A true CN110363220A (en) 2019-10-22
CN110363220B CN110363220B (en) 2021-08-20

Family

ID=68217225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910503133.3A Active CN110363220B (en) 2019-06-11 2019-06-11 Behavior class detection method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN110363220B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710865A (en) * 2018-05-28 2018-10-26 电子科技大学 A kind of driver's anomaly detection method based on neural network
CN108986474A (en) * 2018-08-01 2018-12-11 平安科技(深圳)有限公司 Fix duty method, apparatus, computer equipment and the computer storage medium of traffic accident
CN109492595A (en) * 2018-11-19 2019-03-19 浙江传媒学院 Behavior prediction method and system suitable for fixed group

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027390B (en) * 2019-11-11 2023-10-10 北京三快在线科技有限公司 Object class detection method and device, electronic equipment and storage medium
CN111027390A (en) * 2019-11-11 2020-04-17 北京三快在线科技有限公司 Object class detection method and device, electronic equipment and storage medium
CN111242007A (en) * 2020-01-10 2020-06-05 上海市崇明区生态农业科创中心 Farming behavior supervision method
CN111325141A (en) * 2020-02-18 2020-06-23 上海商汤临港智能科技有限公司 Interaction relation identification method, device, equipment and storage medium
CN111325141B (en) * 2020-02-18 2024-03-26 上海商汤临港智能科技有限公司 Interactive relationship identification method, device, equipment and storage medium
WO2021164662A1 (en) * 2020-02-18 2021-08-26 上海商汤临港智能科技有限公司 Interaction relationship recognition method and apparatus, and device and storage medium
CN113642360A (en) * 2020-04-27 2021-11-12 杭州海康威视数字技术股份有限公司 Behavior timing method and device, electronic equipment and storage medium
CN112784760A (en) * 2021-01-25 2021-05-11 北京百度网讯科技有限公司 Human behavior recognition method, device, equipment and storage medium
US11823494B2 (en) 2021-01-25 2023-11-21 Beijing Baidu Netcom Science Technology Co., Ltd. Human behavior recognition method, device, and storage medium
CN112784760B (en) * 2021-01-25 2024-04-12 北京百度网讯科技有限公司 Human behavior recognition method, device, equipment and storage medium
CN112926481A (en) * 2021-03-05 2021-06-08 浙江大华技术股份有限公司 Abnormal behavior detection method and device
CN112926481B (en) * 2021-03-05 2024-04-19 浙江大华技术股份有限公司 Abnormal behavior detection method and device
CN113723226A (en) * 2021-08-13 2021-11-30 浙江大华技术股份有限公司 Mobile stall detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110363220B (en) 2021-08-20

Similar Documents

Publication Publication Date Title
CN110363220A (en) Behavior category detection method, device, electronic equipment and computer-readable medium
CN109344908A (en) Method and apparatus for generating model
CN109902678A (en) Model training method, character recognition method, device, electronic equipment and computer-readable medium
CN109508681A (en) The method and apparatus for generating human body critical point detection model
CN109191453A (en) Method and apparatus for generating image category detection model
CN109145784A (en) Method and apparatus for handling video
CN108494778A (en) Identity identifying method and device
CN109447156B (en) Method and apparatus for generating a model
CN112259105B (en) Training method of voiceprint recognition model, storage medium and computer equipment
CN109446990A (en) Method and apparatus for generating information
CN108830837A (en) A kind of method and apparatus for detecting ladle corrosion defect
CN107423398A (en) Exchange method, device, storage medium and computer equipment
CN109346088A (en) Personal identification method, device, medium and electronic equipment
CN109389589A (en) Method and apparatus for statistical number of person
CN110347872A (en) Video cover image extracting method and device, storage medium and electronic equipment
CN109145828A (en) Method and apparatus for generating video classification detection model
CN108345387A (en) Method and apparatus for output information
CN110147745A (en) A kind of key frame of video detection method and device
CN109558779A (en) Image detecting method and device
CN110225366A (en) Video data processing and advertisement position determine method, apparatus, medium and electronic equipment
CN109086780A (en) Method and apparatus for detecting electrode piece burr
CN110032916A (en) A kind of method and apparatus detecting target object
CN109978870A (en) Method and apparatus for output information
CN110443824A (en) Method and apparatus for generating information
CN108509921A (en) Method and apparatus for generating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant