CN109426793A - A kind of image behavior recognition methods, equipment and computer readable storage medium - Google Patents
A kind of image behavior recognition methods, equipment and computer readable storage medium
- Publication number
- CN109426793A (application CN201710780212.XA / CN201710780212A)
- Authority
- CN
- China
- Prior art keywords
- subassembly
- region
- target
- image
- behavior
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image behavior recognition method, a device, and a computer-readable storage medium. The method comprises: dividing the region where the target is located in an image to be recognized into sub-components, and determining the region where each sub-component is located; extracting the features of each sub-component from the region where it is located, and determining the behavior category of the target according to the features of each sub-component. By performing image recognition with a local-region fully convolutional network (LRFCN) model, the present invention effectively improves recognition accuracy while adding only a small computational overhead.
Description
Technical field
The present invention relates to the technical field of image processing, and in particular to an image behavior recognition method, a device, and a computer-readable storage medium.
Background technique
In recent years, as electronic monitoring equipment has become ubiquitous in every field, the demand for efficiently extracting valuable information from surveillance video has grown increasingly prominent. Traditional monitoring relies on manual inspection, but manually watching video is inefficient and its accuracy is hard to guarantee. There is therefore an urgent need for a method that can intelligently recognize behaviors in video and detect the behaviors of interest.
Summary of the invention
The present invention provides an image behavior recognition method, a device, and a computer-readable storage medium, so as to improve the accuracy of behavior recognition while adding only a small computational overhead.
To achieve the above object, the present invention adopts the following technical solutions:
According to one aspect of the present invention, an image behavior recognition method is provided, the method comprising:
dividing the region where the target is located in an image to be recognized into sub-components, and determining the region where each sub-component is located;
extracting the features of each sub-component from the region where it is located, and determining the behavior category of the target according to the features of each sub-component.
Optionally, dividing the region where the target is located in the image to be recognized into sub-components and determining the region where each sub-component is located comprises:
dividing the target region into sub-components according to preset sub-component average proportion values;
performing foreground/background segmentation on each divided sub-component region using a region segmentation algorithm, to obtain the foreground segmentation result of each sub-component.
Optionally, before dividing the target region into sub-components according to the preset sub-component average proportion values, the method further comprises:
labeling the sub-components in the images containing targets in a sample data set;
determining, from the labeled sub-component regions, the proportion of the image occupied by each sub-component;
summing, over the sample data set, the proportion values of each kind of sub-component, and determining the sub-component average proportion values from these sums, wherein the sub-component average proportion values are the ratios between the sums for the different sub-components.
Optionally, the region segmentation algorithm comprises at least one of: the GrabCut algorithm, the GraphCut algorithm, and the RandomWalker algorithm.
Optionally, extracting the features of each sub-component from the region where it is located and determining the behavior category of the target according to the features of each sub-component comprises:
performing feature extraction on the region of each sub-component and on the target region, respectively;
cascading the features extracted from the sub-components with the features extracted from the target region, and taking the cascaded features as the target features;
determining the behavior category of the target from a preset classification model according to the target features.
Optionally, determining the behavior category of the target from the preset classification model according to the target features comprises:
determining, from the preset classification model and the target features, the probability of each behavior category;
selecting the behavior category with the highest probability as the behavior category of the target.
Optionally, before determining the behavior category of the target from the preset classification model according to the target features, the method further comprises:
obtaining a pre-trained classification model;
establishing a sample data set containing multiple categories of behavior, labeling the target regions, behavior categories, and sub-component regions in the sample data set, and training the pre-trained classification model on the labeled sample data set to obtain the preset classification model.
Optionally, after obtaining the preset classification model, the method further comprises:
cropping the images in the sample data set to expand the sample data set;
optimizing the loss function on the expanded sample data set to obtain an optimized preset classification model.
According to one aspect of the present invention, an image behavior recognition device is provided, comprising a memory and a processor, wherein the memory stores computer instructions that, when executed by the processor, implement all or part of the steps of the above image behavior recognition method.
According to one aspect of the present invention, a computer-readable storage medium is provided, which stores one or more programs that, when executed by a processor, implement all or part of the steps of the above image behavior recognition method.
The present invention has the following beneficial effects:
The image behavior recognition method, device, and computer-readable storage medium provided by the embodiments of the present invention use a local-region fully convolutional network with an improved pooling process: the recognized target region is divided into sub-components, and the final behavior category is determined from the features obtained for each sub-component. By extracting local features, the present invention effectively improves recognition accuracy while adding only a small computational overhead.
The above is only an overview of the technical scheme of the present invention. In order to better understand the technical means of the present invention so that it can be implemented in accordance with the contents of the specification, and to make the above and other objects, features, and advantages of the present invention clearer, specific embodiments of the present invention are given below.
Detailed description of the invention
In order to more clearly illustrate the embodiments of the present invention or the existing schemes, the accompanying drawings required by the embodiments or the existing description are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without any creative labor.
Fig. 1 is a flow chart of the image behavior recognition method provided in an embodiment of the present invention;
Fig. 2 is the network structure of the image behavior recognition method provided in an embodiment of the present invention;
Fig. 3 is a schematic diagram of feature cascading in an embodiment of the present invention;
Fig. 4 is a functional block diagram of the image behavior recognition device provided in an embodiment of the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it.
In the field of computer vision, many methods can be used for behavior recognition. However, in many cases the real-time performance and precision of background modeling and of foreground target detection and tracking are difficult to bring up to the required level. Deep learning, as a new branch of machine learning, brings good improvements in both real-time performance and accuracy. In the field of object detection there are some typical deep learning model schemes, broadly divided into two classes: regression-based methods such as YOLO (You Only Look Once) and SSD (Single Shot Multibox Detector), which are relatively efficient but of limited precision, and candidate-region-based methods such as Faster RCNN (region-based convolutional neural networks) and RFCN (Region-based Fully Convolutional Networks), which are more precise but somewhat less efficient.
Considering that the behavior recognition problem has a certain similarity to the object detection problem but is more difficult, the present invention improves on RFCN, currently the most accurate of these models, and proposes a behavior recognition method based on a local-region fully convolutional network (Local Region-based Fully Convolutional Networks, abbreviated LRFCN) for behavior recognition in video.
Embodiment of the method
As shown in Fig. 1 and Fig. 2, the image behavior recognition method provided by the embodiment of the present invention specifically comprises the following steps:
Step 101: divide the region where the target is located in the image to be recognized into sub-components, and determine the region where each sub-component is located.
Step 102: extract the features of each sub-component from the region where it is located, and determine the behavior category of the target according to the features of each sub-component.
The embodiment of the present invention divides the target region into sub-components and determines the final behavior category of the target from the features extracted for each sub-component. On this basis, by recognizing with local features, the present invention effectively improves recognition accuracy while adding only a small computational overhead.
In an alternative embodiment of the present invention, when recognizing the target region of the image to be recognized (also called the region of interest, RoI), a region proposal network (RPN) can be used to identify the RoI. Since the region proposal network RPN is a technique already known to those skilled in the art, it is not described here. Other recognition techniques can of course also be used to identify the region of interest; no undue restriction is placed on this here. Before the target region of the image to be recognized is identified, the image to be recognized is normalized, so that after normalization all images take a unified, standard form.
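The normalization step can be sketched as rescaling every input so that its longest side matches a standard bound; the helper name is illustrative, and the 600-pixel bound is taken from step 2041 of the training example later in this description:

```python
def normalized_size(width, height, max_side=600):
    """Return (new_width, new_height) scaled so the longest side
    does not exceed max_side, preserving the aspect ratio."""
    scale = min(1.0, max_side / float(max(width, height)))
    return int(round(width * scale)), int(round(height * scale))

# A 1920x1080 surveillance frame is shrunk so its longest side is 600.
print(normalized_size(1920, 1080))  # -> (600, 338)
```

Images already within the bound are left unchanged, so the normalization is idempotent.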
In an alternative embodiment of the present invention, dividing the target region in the image to be recognized into sub-components includes dividing the target region according to preset sub-component average proportion values, and then performing foreground/background segmentation on each divided sub-component region with a region segmentation algorithm to obtain the foreground segmentation result of each sub-component.
Here the sub-component regions are preliminarily divided by the sub-component average proportion values. The average proportion values are determined from the sample data set used to train the classification model (the LRFCN model), which guarantees the accuracy of the values.
Specifically, before dividing the target region into sub-components according to the preset sub-component average proportion values, the average proportion values need to be obtained. They are determined as follows:
label the sub-components in the images containing targets in the sample data set;
determine, from the labeled sub-component regions, the proportion of the image occupied by each sub-component;
sum, over the sample data set, the proportion values of each kind of sub-component, and determine the sub-component average proportion values from these sums, wherein the sub-component average proportion values are the ratios between the sums for the different sub-components.
Specifically, the sub-component average proportion values are calculated as follows:

$$\overline{part_1} : \overline{part_2} : \cdots : \overline{part_K} = \sum_{i=1}^{n}(part_1)_i : \sum_{i=1}^{n}(part_2)_i : \cdots : \sum_{i=1}^{n}(part_K)_i$$

where $(part_1)_i + (part_2)_i + \cdots + (part_K)_i = 1$; $(part_k)_i$ is the proportion of the target region occupied by the $k$-th sub-component in the $i$-th target; $K$ is the number of sub-components; and $n$ is the number of targets contained in the training library of the sample data set.
That is to say, in the present invention the proportions of the RoI occupied by sub-component 1, sub-component 2, ..., sub-component K are summed separately over all targets in the sample data set, and the average proportion values between the sub-components are then determined from the ratio of these sums. For example, in one specific embodiment the human body is divided into three parts: head, body, and lower limbs. The head, body, and lower-limb proportions of everyone in the sample data set are averaged. Suppose the normalized proportions of the $i$-th person are $Head_i : BodyUp_i : BodyDown_i$, where $Head_i + BodyUp_i + BodyDown_i = 1$. If the sample data set contains $n$ people in total, the average proportion values are

$$\sum_{i=1}^{n}Head_i : \sum_{i=1}^{n}BodyUp_i : \sum_{i=1}^{n}BodyDown_i$$
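The computation above can be sketched in a few lines; the function name and the three-part example annotations are illustrative, not part of the patent:

```python
def average_proportions(per_target_ratios):
    """per_target_ratios: one tuple per annotated target, e.g.
    (head, body, lower), each tuple summing to 1.
    Returns the average proportion values as a normalized tuple:
    the ratio of the per-part sums over all n targets."""
    n_parts = len(per_target_ratios[0])
    sums = [sum(t[k] for t in per_target_ratios) for k in range(n_parts)]
    total = sum(sums)  # equals n, since each target's proportions sum to 1
    return tuple(s / total for s in sums)

# Three annotated people with slightly different head:body:lower splits.
people = [(0.15, 0.45, 0.40), (0.20, 0.40, 0.40), (0.10, 0.50, 0.40)]
print(average_proportions(people))  # approximately (0.15, 0.45, 0.40)
```

The returned tuple is itself normalized to sum to 1, matching the constraint on each target's proportions.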
Here, in order to guarantee the accuracy of the sub-component region division, the preliminarily divided regions need to be further segmented precisely. Specifically, when segmenting, a region segmentation algorithm is used to exclude background interference: background and foreground are distinguished within each sub-component region, yielding the foreground segmentation result of the sub-component.
Preferably, the region segmentation algorithm uses any one of the GrabCut, GraphCut, or RandomWalker algorithms. Other algorithms can of course also be used; they are not introduced here, and as long as they do not depart from the core idea of the present invention they fall within its scope. Here, taking the GrabCut algorithm as an example, the specific segmentation process is explained.
First, an energy function $E$ describing the optimization objective of the segmentation is defined as follows:

$$E(\alpha, k, \theta, z) = U(\alpha, k, \theta, z) + V(\alpha, z)$$

where the function $U$ represents the region data term of the energy function and the function $V$ represents its smoothness (boundary) term; $\alpha$ is the initial label of each pixel (background label 0, foreground label 1); $k$ is the number of Gaussian components of the GMM (Gaussian mixture model); $\theta$ is the set of statistical parameters of the GMM (the weights, mean vectors, and covariance matrices of the Gaussian components); and $z$ is the image data of the sub-component.
Then, solving the min-cut of this energy function yields the foreground/background pixel segmentation.
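As a toy illustration of the energy being minimized (not the real GrabCut solver, which fits a GMM colour model and runs a graph min-cut), the following sketch brute-forces the labelling of a short 1-D pixel strip under a simplified $E = U + V$ with fixed foreground/background intensity means; all constants are invented for the example:

```python
from itertools import product

def energy(labels, pixels, fg_mean=200.0, bg_mean=50.0, smooth=40.0):
    """Simplified E(alpha, z) = U + V: U penalizes the squared distance of
    each pixel to the mean of its assigned model; V charges `smooth` for
    every pair of neighbours with different labels (the boundary term)."""
    u = sum((p - (fg_mean if a else bg_mean)) ** 2 / 100.0
            for a, p in zip(labels, pixels))
    v = smooth * sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return u + v

def min_cut_brute_force(pixels):
    """Enumerate all labelings (feasible only for tiny strips) and return
    the one of minimum energy -- a stand-in for the min-cut step."""
    return min(product([0, 1], repeat=len(pixels)),
               key=lambda labels: energy(labels, pixels))

strip = [40, 55, 60, 190, 210, 205]   # dark background then bright foreground
print(min_cut_brute_force(strip))     # -> (0, 0, 0, 1, 1, 1)
```

The smoothness term is what keeps the labelling from flipping at every noisy pixel; the real algorithm replaces the fixed means with a learned GMM and the enumeration with an efficient min-cut.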
In an alternative embodiment of the present invention, extracting the features of each sub-component from the region where it is located and determining the behavior category of the target according to the features of each sub-component comprises:
performing feature extraction on the region of each sub-component and on the target region, respectively;
cascading the features extracted from the sub-components with the features extracted from the target region, and taking the cascaded features as the target features;
determining the behavior category of the target from the preset classification model according to the target features.
Specifically, when the features of each sub-component are extracted, the pixels of the sub-component region are convolved with convolution kernels, and the values after convolution are the features of the sub-component. However, because the LRFCN network usually has many layers, the convolution operation is iterated many times, so the range a feature actually corresponds to in the initial original image is larger than the region of the segmentation result.
In order to improve the accuracy of image recognition, the global features of the region where the target is located are extracted at the same time as the features of each sub-component. For example, as shown in Fig. 3, the local features corresponding to the three partial regions (head, body, lower limbs) and the global features corresponding to the entire human-body region are combined into one cascade, which constitutes the features ultimately used to describe the whole region. On this basis, by cascading the features of each sub-component (local pooling) with the features extracted from the entire RoI (global pooling), the feature quantity available for recognition is increased at only a small additional computational cost, effectively improving recognition accuracy. Of course, in an alternative embodiment of the present invention, the cascade of the sub-component features alone can also be used as the target features for recognition; compared with recognizing only with features extracted from the entire RoI, this too can effectively improve recognition accuracy.
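The cascade of Fig. 3 can be sketched with plain lists standing in for pooled feature vectors; in the real LRFCN the features come from position-sensitive score maps, so the pooling, the row-slice part layout, and the dimensions here are purely illustrative:

```python
def avg_pool(region):
    """Average-pool a rectangular region (a list of rows) to one value."""
    vals = [v for row in region for v in row]
    return sum(vals) / len(vals)

def cascade_features(feature_map, part_slices):
    """Local pooling over each sub-component slice of the RoI feature map,
    followed by global pooling over the whole RoI; the results are
    concatenated (cascaded) into the final target feature vector."""
    local = [avg_pool(feature_map[top:bottom]) for top, bottom in part_slices]
    global_feat = avg_pool(feature_map)
    return local + [global_feat]

# A 6-row RoI map split head/body/lower as rows 0-1 / 2-3 / 4-5.
roi = [[1, 1], [1, 1], [2, 2], [2, 2], [3, 3], [3, 3]]
print(cascade_features(roi, [(0, 2), (2, 4), (4, 6)]))  # -> [1.0, 2.0, 3.0, 2.0]
```

The last entry is the global (whole-RoI) pooling; dropping it gives the sub-component-only variant described above.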
In an alternative embodiment of the present invention, determining the behavior category of the target from the preset classification model according to the target features comprises: determining the probability of each behavior category for the target features, and selecting the behavior category with the highest probability as the behavior category of the target.
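The pick-the-most-probable-category step is the usual softmax-plus-argmax; the scores and label names below are invented for illustration:

```python
import math

def softmax(scores):
    """Convert per-class scores into probabilities (numerically stabilized
    by subtracting the maximum score before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_behavior(scores, labels):
    """Select the behavior category with the highest probability."""
    probs = softmax(scores)
    return labels[probs.index(max(probs))]

labels = ["eating", "watching TV", "playing device", "falling", "abuse"]
print(predict_behavior([0.2, 1.1, 3.4, 0.5, -0.7], labels))  # -> playing device
```

Because softmax is monotonic, the argmax over probabilities equals the argmax over raw scores; the probabilities matter when a confidence threshold is applied.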
Further, in one embodiment of the present invention, before determining the behavior category of the target from the preset classification model according to the target features, the preset classification model (the LRFCN model) needs to be determined. The preset classification model is determined as follows:
obtain a pre-trained classification model;
establish a sample data set containing multiple categories of behavior, label the target regions, behavior categories, and sub-component regions in the sample data set, and train the pre-trained classification model on the labeled sample data set to obtain the preset classification model.
Here, the pre-trained classification model is obtained by training on a large database, such as the large ImageNet database. Specifically, when the multi-category behavior sample data set is established, the images in the data set should differ to a certain extent in background, shooting angle, illumination, and scale. The target regions, behavior types, and sub-component regions in the images are then labeled manually, and the pre-trained model is trained on the labeled sample data set so as to adjust the parameters of the LRFCN model.
Further and optionally, after training the pre-trained classification model on the labeled sample data set to obtain the preset classification model, the method further includes:
randomly cropping the images in the sample data set to expand the sample data set;
optimizing the loss function on the expanded sample data set to obtain an optimized preset classification model.
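The random-crop expansion can be sketched as generating several crop boxes per training image; the crop scale, helper names, and per-image crop count are illustrative choices, not values fixed by the patent:

```python
import random

def random_crop(width, height, scale=0.8, rng=random):
    """Return one random crop box (left, top, right, bottom) covering
    `scale` of each image dimension."""
    cw, ch = int(width * scale), int(height * scale)
    left = rng.randrange(width - cw + 1)
    top = rng.randrange(height - ch + 1)
    return (left, top, left + cw, top + ch)

def expand_dataset(sizes, crops_per_image, rng):
    """Generate several crop boxes per image to enlarge the training library."""
    return [random_crop(w, h, rng=rng)
            for w, h in sizes
            for _ in range(crops_per_image)]

rng = random.Random(0)  # seeded for reproducibility
boxes = expand_dataset([(600, 400), (500, 300)], crops_per_image=3, rng=rng)
print(len(boxes))  # -> 6: the two-image set has been tripled
```

Each crop keeps most of the frame, so the annotated target usually survives the crop; crops that lose the target would be filtered out during labeling.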
Specifically, when the LRFCN model is trained, the loss function is the sum of the cross-entropy loss and the bounding-box regression loss, as shown by the following formula:

$$L(s, t) = L_{cls}(s_{c^*}) + \lambda\,[c^* > 0]\,L_{reg}(t, t^*)$$

where $s$ is the softmax response for each class, with $s_c = e^{r_c} / \sum_{c'} e^{r_{c'}}$ and $r_c$ the average pooling of the position-sensitive score of class $c$ over the spatial positions of the RoI; $t^*$ is the offset of the ground truth relative to the preset box, and $t$ is the offset of the prediction relative to the preset box; $c^* = 0$ indicates that the label of the RoI is background, and $[c^* > 0] = 1$ when $c^* > 0$, otherwise 0. $L_{reg}$ denotes the bounding-box loss, calculated as shown in the following formulas:

$$L_{reg}(t, t^*) = R(t - t^*)$$

$$t_x = (x - x_a)/w_a,\quad t_y = (y - y_a)/h_a,\quad t_w = \log(w/w_a),\quad t_h = \log(h/h_a)$$

$$t_x^* = (x^* - x_a)/w_a,\quad t_y^* = (y^* - y_a)/h_a,\quad t_w^* = \log(w^*/w_a),\quad t_h^* = \log(h^*/h_a)$$

where $R$ is the Smooth L1 loss function; $x$, $y$, $w$, $h$ are the center coordinates and the width and height of the predicted bounding box; subscript $a$ denotes the center coordinates and width and height of the preset box; and superscript $*$ denotes those of the ground truth.
From the above it can be seen that the loss function can be used to estimate the degree of inconsistency between the model's predicted values and the true values: the smaller the loss, the better the model's accuracy. Therefore, training to minimize the loss function guarantees the accuracy of the LRFCN model and thereby improves recognition accuracy.
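A minimal sketch of the regression side of this loss, under the assumption (consistent with the formulas above) that $R$ is the standard Smooth L1 applied per offset component; the function names and the example boxes are illustrative:

```python
import math

def smooth_l1(d):
    """Smooth L1: 0.5*d^2 for |d| < 1, |d| - 0.5 otherwise."""
    return 0.5 * d * d if abs(d) < 1 else abs(d) - 0.5

def box_offsets(box, anchor):
    """t_x, t_y, t_w, t_h of `box` relative to the preset (anchor) box;
    boxes are given as (center_x, center_y, width, height)."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return [(x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha)]

def l_reg(pred, gt, anchor):
    """Bounding-box regression loss L_reg(t, t*): Smooth L1 summed over
    the four offset components."""
    t = box_offsets(pred, anchor)
    t_star = box_offsets(gt, anchor)
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))

anchor = (100.0, 100.0, 50.0, 80.0)
gt = (110.0, 95.0, 55.0, 85.0)
print(l_reg(anchor, gt, anchor))  # small positive loss for an unmoved anchor
print(l_reg(gt, gt, anchor))      # -> 0.0 for a perfect prediction
```

The log parameterization of width and height keeps the regression targets scale-invariant, which is why the same Smooth L1 works for boxes of very different sizes.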
The training process of the LRFCN model in the present invention is illustrated below, taking the recognition of five categories of behavior in home surveillance video (eating, watching TV, playing with electronic equipment, falling down, and child abuse) as an example:
Step 201: establish a sample data set containing multiple categories of behavior.
Here, aimed at the problem posed above, a database containing the five behavior categories of eating, watching TV, playing with electronic equipment, falling down, and child abuse is first established. Each category contains about 2000 images, all of which come from real home surveillance video.
Next, two thirds of the images are randomly selected as training samples and put into the training library, and the remaining third serve as test samples.
Step 202: manually label the target regions, behavior categories, and local regions such as the target's head, body, and lower limbs in the images. Specifically:
Step 2021: manually annotate the ground truth for the targets in the images, marking out the target regions with bounding boxes together with behavior class labels, e.g. class labels 0, 1, 2, 3, 4;
Step 2022: calibrate the head, body, and lower-limb parts of the human targets in the sample images, and, according to the calibration result, calculate the average proportions occupied by the head, body, and lower limbs within each region;
Step 2023: calibrate the specific pixel positions covered by the head, body, and lower-limb parts of the human targets in the sample images, recording the pixel positions with image templates.
Step 203: obtain the pre-trained LRFCN network model.
Because the neural network in the LRFCN network model contains a large number of parameters while the self-built sample data set contains relatively few samples, training directly on the sample data set is prone to over-fitting. The LRFCN network model is therefore first pre-trained on the larger ImageNet database, and the pre-trained model is then trained on the training library.
Step 204: train the pre-trained LRFCN network model on the training library and fine-tune the parameters of the network model. This process can be divided into the following sub-steps:
Step 2041: normalize the size of the images in the training library so that the longest side of each image is less than 600 pixels;
Step 2042: randomly crop every image in the training library to expand the database;
Since the network has many parameters and the samples are few, in order to avoid over-fitting, images randomly cropped from the training images are added to the training library for network training, increasing the sample count.
Step 2043: optimize the above loss function to obtain the final LRFCN network model. During training, the initial learning rate is set to 0.000001, and 50% of the parameters are randomly dropped according to a dropout rate of 0.5. The optimization process, for example by least squares or gradient descent, is not introduced further here.
From the above it can be seen that the behavior recognition method based on a local-region fully convolutional network (LRFCN) proposed by the present invention can detect target behaviors in video highly accurately, filling to a certain degree the current technology gap in intelligent security.
Apparatus embodiments
According to an embodiment of the present invention, an image behavior recognition device is provided for implementing the above image behavior recognition method, as shown in Fig. 4. The device includes a processor 42 and a memory 41 storing instructions executable by the processor 42. Specifically, in the image behavior recognition device provided in an embodiment of the present invention, when the executable instructions in the memory 41 are executed by the processor 42, the image behavior recognition method provided in the method embodiment is implemented. For the specific implementation, reference may be made to the detailed description in the method embodiment; it is not repeated in this embodiment.
The processor 42 may be a general-purpose processor, such as a central processing unit (CPU); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiment of the present invention.
The memory 41 is used to store program code and to transfer the program code to the CPU. The memory 41 may include volatile memory, such as random access memory (RAM); the memory 41 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 41 may also include a combination of the above kinds of memory.
Storage medium embodiment
The embodiment of the present invention also provides a computer-readable storage medium storing one or more programs. The computer-readable storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state drive; it may also include a combination of the above kinds of memory. The one or more programs in the computer-readable storage medium can be executed by one or more processors to implement all or part of the steps of the image behavior recognition method provided in the method embodiment. For the specific implementation of the steps, reference may be made to the detailed description in the method embodiment; it is not repeated in this embodiment.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods.
Although the present application is described by way of embodiments, those skilled in the art will know that the present application admits many modifications and variations that do not depart from the spirit and scope of the present invention. If these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, then the present invention is also intended to include them.
Claims (10)
1. An image behavior recognition method, characterized by comprising:
dividing the region where the target is located in an image to be recognized into sub-components, and determining the region where each sub-component is located;
extracting the features of each sub-component from the region where it is located, and determining the behavior category of the target according to the features of each sub-component.
2. The image behavior recognition method according to claim 1, wherein dividing the region where the target is located in the image to be recognized into sub-components and determining the region where each sub-component is located comprises:
dividing the region where the target is located into sub-components according to preset sub-component average proportion values;
performing foreground-background segmentation on each divided sub-component region using a region segmentation algorithm to obtain a foreground segmentation result for each sub-component.
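The first step of claim 2, dividing the region where the target is located according to preset sub-component average proportion values, can be sketched as follows. This is a minimal illustration, not the patented implementation: the sub-component names, the proportion values, and the assumption of a vertical split along the target's height are all hypothetical.

```python
# Minimal sketch (not the patented implementation) of dividing the region
# where the target is located into sub-component regions according to
# preset sub-component average proportion values. Names, values, and the
# vertical-split assumption are hypothetical.

def divide_target_region(box, proportions):
    """Split a target bounding box (top, left, height, width) vertically,
    giving each sub-component its preset share of the target's height."""
    top, left, height, width = box
    regions = {}
    y = top
    for name, share in proportions.items():
        h = round(height * share)
        regions[name] = (y, left, h, width)
        y += h
    return regions

# Hypothetical preset average proportion values for a person target.
proportions = {"head": 0.2, "torso": 0.4, "legs": 0.4}
regions = divide_target_region((0, 10, 100, 50), proportions)
```

Each returned region could then be handed to a foreground-background segmentation algorithm, as the claim goes on to specify.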
3. The image behavior recognition method according to claim 2, further comprising, before dividing the region where the target is located into sub-components according to the preset sub-component average proportion values:
annotating the sub-components in the target-containing images of a sample data set;
determining, according to the annotated sub-component regions, the proportion of the image occupied by each sub-component;
summing the proportion values of the same sub-component across the sample data set, and determining the sub-component average proportion values according to the sums, wherein the sub-component average proportion values are the ratios between the sums of different sub-components.
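The computation described in claim 3 can be sketched as below. It assumes each annotation supplies the pixel size of a sub-component's bounding box; the sample data, the sub-component labels, and the choice to express each sum relative to the total of all sums are illustrative assumptions, not details from the patent.

```python
# Sketch of claim 3: sum the per-image proportion values of each
# sub-component over the sample data set, then derive average proportion
# values as ratios between the sums. Sample data are hypothetical.

def region_proportion(box_wh, image_wh):
    """Fraction of the image area occupied by an annotated sub-component."""
    (bw, bh), (iw, ih) = box_wh, image_wh
    return (bw * bh) / (iw * ih)

def average_proportion_values(samples):
    """Sum each sub-component's proportion values over the data set and
    normalize each sum by the total of all sums (one way to take the
    'ratio between sums of different sub-components')."""
    sums = {}
    for image_wh, annotations in samples:
        for label, box_wh in annotations:
            sums[label] = sums.get(label, 0.0) + region_proportion(box_wh, image_wh)
    total = sum(sums.values())
    return {label: s / total for label, s in sums.items()}

# Two hypothetical annotated images: (image size, [(sub-component, box size), ...]).
samples = [
    ((100, 200), [("head", (20, 20)), ("torso", (40, 80))]),
    ((100, 200), [("head", (22, 18)), ("torso", (38, 82))]),
]
ratios = average_proportion_values(samples)
```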
4. The image behavior recognition method according to claim 2, wherein the region segmentation algorithm includes at least one of the following: the GrabCut algorithm, the GraphCut algorithm, and the Random Walker algorithm.
5. The image behavior recognition method according to claim 1, wherein extracting the feature of each sub-component from the region where that sub-component is located and determining the behavior category of the target according to the features of the sub-components comprises:
performing feature extraction separately on the regions where the sub-components are located and on the region where the target is located;
cascading the features extracted from the sub-components with the feature extracted from the region where the target is located, the cascaded feature serving as the target feature;
determining the behavior category of the target from a preset classification model according to the target feature.
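The feature cascade of claim 5 is plain concatenation of per-region feature vectors. In the sketch below the learned feature extractor is replaced by a hypothetical stand-in that computes simple statistics; in practice a learned extractor (e.g. a convolutional network) would produce each per-region feature, and only the cascading step reflects the claim.

```python
# Sketch of the feature cascade in claim 5: features from each
# sub-component region and from the whole target region are concatenated
# into a single target feature. extract_feature is a hypothetical stand-in.

def extract_feature(region_pixels):
    """Hypothetical fixed-length feature: mean, variance, min, max."""
    n = len(region_pixels)
    mean = sum(region_pixels) / n
    var = sum((p - mean) ** 2 for p in region_pixels) / n
    return [mean, var, min(region_pixels), max(region_pixels)]

def cascade_features(subcomponent_regions, target_region):
    """Concatenate per-sub-component features with the whole-target feature."""
    feature = []
    for region in subcomponent_regions:
        feature.extend(extract_feature(region))
    feature.extend(extract_feature(target_region))
    return feature

# Hypothetical pixel lists for two sub-component regions and the target.
head, torso = [10, 12, 11], [40, 42, 44, 38]
target_feature = cascade_features([head, torso], head + torso)
```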
6. The image behavior recognition method according to claim 5, wherein determining the behavior category of the target from the preset classification model according to the target feature comprises:
determining, according to the target feature, the probability of each behavior category from the preset classification model;
selecting the behavior category with the highest probability as the behavior category of the target.
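Claim 6's selection of the highest-probability behavior category can be sketched with a softmax over class scores followed by an argmax; the category names and scores below are illustrative, not from the patent.

```python
import math

# Sketch of claim 6: the classification model yields a probability for
# each behavior category (here a softmax over hypothetical class scores),
# and the category with the highest probability is selected.

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, categories):
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return categories[best], probs[best]

categories = ["walking", "running", "waving"]  # hypothetical behavior classes
label, prob = classify([1.2, 3.4, 0.5], categories)
```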
7. The image behavior recognition method according to claim 5, further comprising, before determining the behavior category of the target from the preset classification model according to the target feature:
obtaining a pre-trained classification model;
establishing a sample data set containing multiple behavior categories, and annotating, in the sample data set, the region where the target is located, the behavior category, and the regions of the sub-components; and training the pre-trained classification model on the annotated sample data set to obtain the preset classification model.
8. The image behavior recognition method according to claim 7, wherein after obtaining the preset classification model, the method further comprises:
cropping the images in the sample data set to expand the sample data set;
optimizing the loss function according to the expanded sample data set to obtain an optimized preset classification model.
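The data-set expansion by cropping in claim 8 can be sketched as follows; the fixed crop offsets and the list-of-lists image representation are illustrative assumptions, since the claim does not specify how the crops are chosen.

```python
# Sketch of claim 8's data-set expansion: each sample image is cropped at
# several offsets so the expanded set contains multiple shifted views of
# each original. Offsets and image representation are hypothetical.

def crop(image, top, left, height, width):
    """Extract a height x width sub-image starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def expand_dataset(images, crop_h, crop_w,
                   offsets=((0, 0), (0, 1), (1, 0), (1, 1))):
    expanded = []
    for img in images:
        for top, left in offsets:
            expanded.append(crop(img, top, left, crop_h, crop_w))
    return expanded

# One hypothetical 3x3 "image" yields four 2x2 crops.
img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
augmented = expand_dataset([img], 2, 2)
```

The expanded set would then be used for the further optimization of the classification model that the claim describes.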
9. An image behavior recognition device, comprising a memory and a processor, wherein the memory stores computer instructions which, when executed by the processor, implement the steps of the image behavior recognition method according to any one of claims 1 to 8.
10. A computer-readable storage medium storing one or more programs which, when executed by a processor, implement the steps of the image behavior recognition method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710780212.XA CN109426793A (en) | 2017-09-01 | 2017-09-01 | A kind of image behavior recognition methods, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109426793A true CN109426793A (en) | 2019-03-05 |
Family
ID=65504993
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710780212.XA Pending CN109426793A (en) | 2017-09-01 | 2017-09-01 | A kind of image behavior recognition methods, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109426793A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110688936A (en) * | 2019-09-24 | 2020-01-14 | 深圳市银星智能科技股份有限公司 | Method, machine and storage medium for representing characteristics of environment image |
CN110929628A (en) * | 2019-11-18 | 2020-03-27 | 北京三快在线科技有限公司 | Human body identification method and device |
CN111985269A (en) * | 2019-05-21 | 2020-11-24 | 顺丰科技有限公司 | Detection model construction method, detection device, server and medium |
CN112183666A (en) * | 2020-10-28 | 2021-01-05 | 阳光保险集团股份有限公司 | Image description method and device, electronic equipment and storage medium |
CN112560767A (en) * | 2020-12-24 | 2021-03-26 | 南方电网深圳数字电网研究院有限公司 | Document signature identification method and device and computer readable storage medium |
CN112712133A (en) * | 2021-01-15 | 2021-04-27 | 北京华捷艾米科技有限公司 | Deep learning network model training method, related device and storage medium |
WO2021093344A1 (en) * | 2019-11-15 | 2021-05-20 | 五邑大学 | Semi-automatic image data labeling method, electronic apparatus, and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701467A (en) * | 2016-01-13 | 2016-06-22 | Hohai University Changzhou Campus | A multi-person abnormal behavior recognition method based on human shape features |
CN106570480A (en) * | 2016-11-07 | 2017-04-19 | Nanjing University of Posts and Telecommunications | A posture-recognition-based method for classifying human movements |
Non-Patent Citations (2)
Title |
---|
JIFENG DAI et al.: "R-FCN: Object Detection via Region-based Fully Convolutional Networks" *
TAO LING: "Research on Human Behavior Recognition Combining Global and Local Features", China Master's Theses Full-text Database, Information Science and Technology, no. 3, page 5 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109426793A (en) | An image behavior recognition method, device, and computer-readable storage medium | |
CN109753975B (en) | Training sample obtaining method and device, electronic equipment and storage medium | |
Kristan et al. | The visual object tracking vot2015 challenge results | |
US8401292B2 (en) | Identifying high saliency regions in digital images | |
Xu et al. | Learning-based shadow recognition and removal from monochromatic natural images | |
US20160180196A1 (en) | Object re-identification using self-dissimilarity | |
Kim et al. | Color–texture segmentation using unsupervised graph cuts | |
Sukanya et al. | A survey on object recognition methods | |
CN103577875B (en) | A FAST-based computer-aided (CAD) people-counting method | |
CN105469029A (en) | System and method for object re-identification | |
CN105303152B (en) | A person re-identification method | |
Ma et al. | Counting people crossing a line using integer programming and local features | |
CN106778635A (en) | A human region detection method based on visual saliency | |
Bouma et al. | Re-identification of persons in multi-camera surveillance under varying viewpoints and illumination | |
CN104732534B (en) | A method and system for extracting salient objects from an image | |
CN106157330A (en) | A visual tracking method based on a joint target appearance model | |
Chi | Self‐organizing map‐based color image segmentation with k‐means clustering and saliency map | |
Gündoğdu et al. | The visual object tracking VOT2016 challenge results | |
Kang et al. | A multiobjective piglet image segmentation method based on an improved noninteractive GrabCut algorithm | |
CN112241736A (en) | Text detection method and device | |
CN117037049B (en) | Image content detection method and system based on YOLOv5 deep learning | |
Fowlkes et al. | How much does globalization help segmentation? | |
Lobry et al. | Deep learning models to count buildings in high-resolution overhead images | |
Kalboussi et al. | Object proposals for salient object segmentation in videos | |
Xu et al. | Crowd density estimation of scenic spots based on multifeature ensemble learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||