CN108446649A - Method and device for alarm - Google Patents

Method and device for alarm

Info

Publication number
CN108446649A
CN108446649A (application CN201810256338.1A)
Authority
CN
China
Prior art keywords
video
behavior
history
feature vector
obtains
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810256338.1A
Other languages
Chinese (zh)
Inventor
Yang Rui (杨锐)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810256338.1A
Publication of CN108446649A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; scene-specific elements
    • G06V20/40 - Scenes; scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose a method and device for alarm. One specific implementation of the method includes: performing image recognition on captured video to obtain person information; importing a video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label for that person, the behavior recognition model being used to recognize a person's behavior and assign it a behavior label; and, in response to the behavior label being unsafe, issuing an alarm signal. This embodiment improves the accuracy of recognizing behavior in video, so that unsafe behavior can be detected in time and an alarm signal issued.

Description

Method and device for alarm
Technical field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and device for alarm.
Background
With the development of science and technology, electronic equipment is used in an ever wider range of applications. Electronic equipment can be installed at locations such as intersections or shops to collect video or images of vehicles or pedestrians. When an accident occurs, staff can analyze it from the video or images collected by the electronic equipment.
Summary
An object of embodiments of the present application is to propose a method and device for alarm.
In a first aspect, an embodiment of the present application provides a method for alarm, the method including: performing image recognition on captured video to obtain person information; importing a video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label for that person, the behavior recognition model being used to recognize a person's behavior and assign it a behavior label; and, in response to the behavior label being unsafe, issuing an alarm signal.
In some embodiments, performing image recognition on the captured video to obtain person information includes: in response to a person image being present in the video, extracting the person image; and identifying the person information of the person corresponding to the person image, the person information including at least one of: gender, height, and clothing color.
In some embodiments, the behavior recognition model includes a convolutional neural network, a recurrent neural network, and a fully connected layer.
In some embodiments, importing the video clip of the person corresponding to the person information into the pre-trained behavior recognition model to obtain the behavior label of that person includes: inputting the video clip into the convolutional neural network to obtain a feature vector for each frame image of the video clip, the convolutional neural network characterizing the correspondence between a video clip and the feature vectors of its frame images; inputting the feature vectors of the frame images into the recurrent neural network to obtain a feature vector of the video clip, the recurrent neural network characterizing the correspondence between the feature vectors of the frame images and the feature vector of the video clip, where the feature vector of the video clip characterizes the associations among the feature vectors of its frame images; and inputting the feature vector of the video clip into the fully connected layer to obtain the behavior label, the fully connected layer characterizing the correspondence between the feature vector of a video clip and a behavior label.
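As a concrete illustration of this CNN, then RNN, then fully connected pipeline, the following is a minimal sketch in a PyTorch-style API. The layer sizes, the choice of an LSTM as the recurrent network, and the two-class (safe/unsafe) output are assumptions made for illustration; this application does not prescribe them.

```python
import torch
import torch.nn as nn

class BehaviorRecognitionModel(nn.Module):
    """Sketch of the CNN -> RNN -> fully connected pipeline described above.
    Layer sizes and the choice of LSTM are illustrative assumptions."""

    def __init__(self, feature_dim=256, hidden_dim=128, num_labels=2):
        super().__init__()
        # CNN: maps each frame image to a per-frame feature vector
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feature_dim),
        )
        # RNN: aggregates the per-frame feature vectors into one clip vector
        self.rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        # Fully connected layer: maps the clip vector to behavior-label logits
        self.fc = nn.Linear(hidden_dim, num_labels)

    def forward(self, clip):                       # clip: (batch, frames, 3, H, W)
        b, t = clip.shape[:2]
        frame_feats = self.cnn(clip.flatten(0, 1))    # (b*t, feature_dim)
        frame_feats = frame_feats.view(b, t, -1)      # (b, t, feature_dim)
        _, (clip_feat, _) = self.rnn(frame_feats)     # last hidden state
        return self.fc(clip_feat[-1])                 # (b, num_labels) logits
```

With this wiring, a clip tensor of shape (batch, frames, 3, H, W) yields one behavior-label logit vector per clip, matching the three steps described above.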
In some embodiments, the behavior recognition model is trained as follows: obtaining multiple history videos that record person behavior, together with a historical behavior label for each of the history videos, where a historical behavior label identifies whether the person's behavior is safe; and training the behavior recognition model with each history video as input and the historical behavior label of that history video as output.
In some embodiments, training the behavior recognition model with each history video as input and its historical behavior label as output includes performing the following training step: sequentially inputting each of the history videos into an initial behavior recognition model to obtain a predicted historical behavior label for each history video; comparing the predicted historical behavior label of each history video with the historical behavior label of that history video to obtain the recognition accuracy of the initial behavior recognition model; determining whether the recognition accuracy exceeds a preset accuracy threshold; and, if it exceeds the preset accuracy threshold, taking the initial behavior recognition model as the trained behavior recognition model.
In some embodiments, the training further includes: in response to the recognition accuracy not exceeding the preset accuracy threshold, adjusting the parameters of the initial behavior recognition model and continuing to perform the training step.
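The training procedure of the two preceding paragraphs can be sketched as follows, under assumptions: this application specifies only the accuracy test against a preset threshold and the parameter adjustment, so the threshold value, optimizer, and loss function below are illustrative choices, not part of the described method.

```python
import torch
import torch.nn as nn

def train_behavior_model(model, history_clips, history_labels,
                         accuracy_threshold=0.95, lr=1e-3, max_rounds=100):
    """Train until recognition accuracy exceeds the preset threshold.
    Threshold, optimizer, and loss are illustrative assumptions.
    history_labels is a LongTensor of label indices, one per history video."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(max_rounds):
        # Feed each history video in turn and collect predicted labels
        logits = torch.stack([model(clip.unsqueeze(0)).squeeze(0)
                              for clip in history_clips])
        predictions = logits.argmax(dim=1)
        # Compare predictions with the annotated historical behavior labels
        accuracy = (predictions == history_labels).float().mean().item()
        if accuracy > accuracy_threshold:
            return model                  # training is complete
        # Otherwise adjust the parameters and repeat the training step
        optimizer.zero_grad()
        loss_fn(logits, history_labels).backward()
        optimizer.step()
    return model
```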
In a second aspect, an embodiment of the present application provides a device for alarm, the device including: a person information recognition unit for performing image recognition on captured video to obtain person information; a behavior label acquisition unit for importing a video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label for that person, the behavior recognition model being used to recognize a person's behavior and assign it a behavior label; and an alarm unit for issuing an alarm signal in response to the behavior label being unsafe.
In some embodiments, the person information recognition unit includes: a person image extraction subunit for extracting a person image in response to a person image being present in the video; and a person information identification subunit for identifying the person information of the person corresponding to the person image, the person information including at least one of: gender, height, and clothing color.
In some embodiments, the behavior recognition model includes a convolutional neural network, a recurrent neural network, and a fully connected layer.
In some embodiments, the behavior label acquisition unit includes: an image feature vector subunit for inputting the video clip into the convolutional neural network to obtain a feature vector for each frame image of the video clip, the convolutional neural network characterizing the correspondence between a video clip and the feature vectors of its frame images; a video feature vector subunit for inputting the feature vectors of the frame images into the recurrent neural network to obtain a feature vector of the video clip, the recurrent neural network characterizing the correspondence between the feature vectors of the frame images and the feature vector of the video clip, where the feature vector of the video clip characterizes the associations among the feature vectors of its frame images; and a behavior label subunit for inputting the feature vector of the video clip into the fully connected layer to obtain the behavior label, the fully connected layer characterizing the correspondence between the feature vector of a video clip and a behavior label.
In some embodiments, the device includes a behavior recognition model training unit, which includes: a historical information acquisition subunit for obtaining multiple history videos that record person behavior, together with a historical behavior label for each of the history videos, where a historical behavior label identifies whether the person's behavior is safe; and a behavior recognition model training subunit for training the behavior recognition model with each history video as input and the historical behavior label of that history video as output.
In some embodiments, the behavior recognition model training subunit includes a behavior recognition model training module for: sequentially inputting each of the history videos into an initial behavior recognition model to obtain a predicted historical behavior label for each history video; comparing the predicted historical behavior label of each history video with the historical behavior label of that history video to obtain the recognition accuracy of the initial behavior recognition model; determining whether the recognition accuracy exceeds a preset accuracy threshold; and, if it exceeds the preset accuracy threshold, taking the initial behavior recognition model as the trained behavior recognition model.
In some embodiments, the behavior recognition model training subunit further includes a parameter adjustment module for, in response to the recognition accuracy not exceeding the preset accuracy threshold, adjusting the parameters of the initial behavior recognition model and continuing to perform the training step.
In a third aspect, an embodiment of the present application provides a server, including: one or more processors; a memory for storing one or more programs; and a camera for obtaining images and/or video. When the one or more programs are executed by the one or more processors, the one or more processors carry out the method for alarm of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method for alarm of the first aspect.
The method and device for alarm provided by embodiments of the present application first perform image recognition on captured video to obtain person information; then import a video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label for that person; and finally issue an alarm signal when the behavior label is unsafe. The present application can analyze the behavior contained in video, assign it a behavior label, and issue an alarm signal when the behavior label is unsafe. This improves the accuracy of recognizing behavior in video, so that unsafe behavior can be detected in time and an alarm signal issued.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the method for alarm according to the present application;
Fig. 3 is a flowchart of one embodiment of the behavior recognition model training method according to the present application;
Fig. 4 is a schematic diagram of an application scenario of the method for alarm according to the present application;
Fig. 5 is a structural schematic diagram of one embodiment of the device for alarm according to the present application;
Fig. 6 is a structural schematic diagram of a computer system suitable for implementing a server of embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for alarm, or the device for alarm, of embodiments of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 provides the medium for communication links between the terminal devices 101, 102, and 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, and 103 to interact with the server 105 over the network 104 to receive or send information. Various video processing applications may be installed on the terminal devices 101, 102, and 103, such as image processing applications, video acquisition applications, video control applications, video segmentation applications, video sending tools, and video tracking applications.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices that have a camera and support image and video acquisition, including but not limited to surveillance cameras, network cameras, and traffic-violation cameras. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, for example a server that performs data processing on video provided by the terminal devices 101, 102, and 103. The server may analyze and otherwise process the received video and other data, and may issue an alarm signal when unsafe behavior is found in the video.
It should be noted that the method for alarm provided by embodiments of the present application is generally executed by the server 105; correspondingly, the device for alarm is generally arranged in the server 105.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers as required by the implementation.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for alarm according to the present application is shown. The method for alarm includes the following steps:
Step 201: perform image recognition on the captured video to obtain person information.
In the present embodiment, the execution body of the method for alarm (for example, the server 105 shown in Fig. 1) may receive video from the terminal devices 101, 102, and 103 through a wired or wireless connection; the terminal devices 101, 102, and 103 may send video captured in real time or non-real time to the server 105. The person information describes the persons in the video. It should be pointed out that the wireless connection may include, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, and UWB (ultra wideband) connections, as well as other wireless connections now known or developed in the future.
Existing cameras installed at locations such as intersections or shops can capture video within a set range. Staff can control the cameras to a certain degree, for example by controlling a camera's acquisition angle to obtain video within a particular acquisition range, or by using equipment carried by the camera, such as a fill light, to control the brightness of the captured video to some extent. When an accident occurs, staff can retrieve the relevant video obtained earlier to analyze the accident. However, such analysis takes place only after the accident has occurred; it lags in time, can only serve to analyze the cause or course of the accident, and cannot prevent the accident from happening.
For this reason, the execution body of the present embodiment may obtain video captured in real time or non-real time by the terminal devices 101, 102, and 103. The execution body may then perform image recognition and other processing on each frame image of the video, to obtain from the frame images the person information contained in the video.
In some optional implementations of the present embodiment, performing image recognition on the captured video to obtain person information may include the following two steps, sketched in the code after the second step:
First step: in response to a person image being present in the video, extract the person image.
A video may or may not have persons recorded in it. In general, analysis of a video is analysis of persons or of information related to persons. Therefore, when a person image is present in the video, the execution body can extract the corresponding person image.
Second step: identify the person information of the person corresponding to the person image.
After the person image is obtained, image processing is performed on it, and the person information of the person corresponding to the person image can be recognized. The person information is used to describe a person in the video and may include at least one of: number of persons, gender, height, clothing color, and so on.
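The two extraction steps above can be sketched as follows. detect_persons and describe_person are hypothetical stand-ins (this application names no specific detector or attribute classifier), so this is a minimal sketch under those assumptions, not the implementation.

```python
from typing import Dict, List

def detect_persons(frame) -> List:
    """Hypothetical person detector: returns cropped person images,
    or an empty list when the frame contains no person."""
    raise NotImplementedError  # stand-in for a real detector

def describe_person(person_image) -> Dict[str, str]:
    """Hypothetical attribute classifier for the attributes listed above,
    e.g. {"gender": ..., "height": ..., "clothing_color": ...}."""
    raise NotImplementedError  # stand-in for real classifiers

def extract_person_info(frame) -> List[Dict[str, str]]:
    # First step: only proceed when a person image is present in the frame
    person_images = detect_persons(frame)
    # Second step: describe each detected person (gender, height, clothing color)
    return [describe_person(img) for img in person_images]
```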
Step 202: import the video clip of the person corresponding to the person information into the pre-trained behavior recognition model to obtain the behavior label of that person.
The data volume of a video is usually large. To make it easier to analyze a specific person, the execution body may extract from the video a video clip of that person. The execution body may then import the person's video clip into the pre-trained behavior recognition model to obtain the behavior label of that person. The behavior recognition model of the present application can be used to recognize a person's behavior and assign the behavior a behavior label. Behaviors can be of many kinds, such as walking, quarreling, or fighting. A behavior label classifies behaviors by some standard. For example, the standard may be whether the behavior is safe, in which case the content of the behavior label may be safe behavior or unsafe behavior; the standard may also be the relationship between persons, in which case the content of the behavior label may be friends, family, strangers, and so on.
In some optional implementations of the present embodiment, the behavior recognition model of the present embodiment may be an artificial neural network and may include a convolutional neural network, a recurrent neural network, and a fully connected layer. Correspondingly, importing the video clip of the person corresponding to the person information into the pre-trained behavior recognition model to obtain the behavior label of that person may include the following steps:
First step: input the video clip into the convolutional neural network to obtain the feature vector of each frame image of the video clip.
In the present embodiment, the execution body may input the video clip into the convolutional neural network to obtain the feature vector of each frame image contained in the video clip. A video clip is composed of multiple frame images, and the feature vector of each frame image can describe the features of that image. For example, a frame image may have a feature vector corresponding to a "holding" action, a feature vector corresponding to a "mouth-covering" action, and a feature vector corresponding to a "running" action. The feature vector of an image may be represented by marker points labeling each action: marker points are set on the image, and the feature vector is realized as the coordinate values of the marker points in planar or spatial coordinates. The execution body may train, in several ways, a convolutional neural network that characterizes the correspondence between a video clip and the feature vectors of its frame images.
In the present embodiment, the convolutional neural network may be a feedforward neural network whose artificial neurons respond to surrounding units within a partial coverage area, and which performs outstandingly in large-scale image processing. In general, the basic structure of a convolutional neural network includes two layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer and extracts the features of that local area; once a local feature is extracted, its positional relationship to other features is also determined. The second is the feature mapping layer: each computational layer of the network is composed of multiple feature maps, each feature map is a plane, and the weights of all neurons on the plane are equal. Here, the execution body may input the video clip at the input side of the convolutional neural network, pass it through the processing of the parameters of each layer of the convolutional neural network in turn, and output it at the output side; the information output at the output side is the feature vectors of the frame images of the video clip.
As an example, the execution body may, based on statistics over a large number of sample video clips and the feature vectors of their frame images, generate a correspondence table storing multiple sample video clips and the feature vectors of their frame images, and use this correspondence table as the convolutional neural network. The execution body may then compare the video clip in turn with the multiple sample video clips in the correspondence table. If some sample video clip in the correspondence table is the same as or similar to the video clip, the feature vectors of the frame images of that sample video clip can be taken as the feature vectors of the frame images of the video clip.
As another example, the execution body may first obtain sample video clips and the feature vectors of their frame images, and then train a convolutional neural network that can characterize the correspondence between a sample video clip and the feature vectors of its frame images, with the sample video clips as input and the feature vectors of their frame images as output. The execution body may then input the video clip at the input side of the convolutional neural network, pass it through the processing of the parameters of each layer in turn, and output it at the output side; the information output at the output side is the feature vectors of the frame images of the video clip.
Second step: input the feature vectors of the frame images of the video clip into the recurrent neural network to obtain the feature vector of the video clip.
In the present embodiment, the execution body may input the feature vectors of the frame images of the video clip into the recurrent neural network to obtain the feature vector of the video clip. The feature vector of the video clip can characterize the associations among the feature vectors of its frame images: combining the feature vectors of the frame images at multiple moments yields the feature vector of the video clip. For example, combining in chronological order the feature vector of the "holding" action, the feature vector of the "mouth-covering" action, and the feature vector of the "running" action from the frame images above yields the feature vector of the corresponding video clip. The feature vector of the video clip can thus characterize a coherent sequence of actions, and combining multiple actions can correspond to a certain behavior. The feature vector of a video clip can therefore be considered a dynamic expression of the feature vectors of the individual images, containing more information than the feature vector of any single frame image. The execution body may train, in various ways, a recurrent neural network that can characterize the association between the feature vectors of the frame images of a video clip and the feature vector of the video clip.
In the present embodiment, the recurrent neural network is an artificial neural network in which directed connections between nodes form cycles. The essential feature of such a network is that its processing units have both internal feedback connections and feedforward connections, so its internal state can exhibit dynamic temporal behavior.
As an example, the execution body may, based on statistics over the feature vectors of the frame images of a large number of sample video clips and the feature vectors of those sample video clips, generate a correspondence table storing the feature vectors of the frame images of multiple sample video clips and the feature vectors of the sample video clips, and use this correspondence table as the recurrent neural network. The execution body may then compute parameters such as the Euclidean distance between the feature vectors of the frame images of the video clip and the feature vectors of the frame images of the multiple sample video clips in the correspondence table. If the Euclidean distance between the feature vectors of the frame images of some sample video clip in the correspondence table and those of the video clip is less than a preset distance threshold, the feature vector of that sample video clip can be taken as the feature vector of the video clip.
As another example, the execution body may obtain the feature vectors of the frame images of sample video clips and the feature vectors of the sample video clips, and then train a recurrent neural network that can characterize the correspondence between the feature vectors of the frame images of a sample video clip and the feature vector of that sample video clip, with the feature vectors of the frame images as input and the feature vector of the sample video clip as output. The execution body may then input the feature vectors of the frame images of the video clip at the input side of the recurrent neural network, pass them through the processing of the parameters of each layer in turn, and output at the output side; the information output at the output side is the feature vector of the video clip.
Third step: input the feature vector of the video clip into the fully connected layer to obtain the behavior label.
In the present embodiment, the execution body may input the feature vector of the video clip into the fully connected layer to obtain the behavior label of the video clip. A video clip may contain behaviors of multiple types, such as walking, quarreling, and fighting, and different behaviors may correspond to different behavior labels. For example, combining in chronological order the feature vector of the "holding" action, the feature vector of the "mouth-covering" action, and the feature vector of the "running" action from the frame images above yields the feature vector of the corresponding video clip, which may correspond to behaviors such as "child abduction" or "robbery". In that case, the behavior label set for the feature vector of the video clip may have the content "unsafe". The execution body may train, in various ways, a fully connected layer that can characterize the correspondence between the feature vector of a video clip and a behavior label.
In the present embodiment, each node of the fully connected layer is connected to all nodes of the output layer of the recurrent neural network, and integrates the feature vector of the video output by that output layer. Because of this full connectivity, the fully connected layer generally has the most parameters. Meanwhile, after linearly transforming the feature vector of the video with the parameters of the fully connected layer, a nonlinear excitation function can be applied to the result of the linear transformation to introduce nonlinear factors and enhance the expressive power of the behavior recognition model. The excitation function may be the softmax function, a common excitation function in artificial neural networks, which is not described in detail here.
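As a concrete illustration of the linear transformation plus softmax excitation just described, a minimal PyTorch-style sketch; the feature size and the two-label (safe/unsafe) output are assumptions for illustration, not taken from this application:

```python
import torch
import torch.nn as nn

clip_feature = torch.randn(1, 128)   # clip feature vector (assumed size 128)
fc = nn.Linear(128, 2)               # fully connected layer: 2 behavior labels
probabilities = torch.softmax(fc(clip_feature), dim=1)  # softmax excitation
label = "unsafe" if probabilities[0, 1] > 0.5 else "safe"
```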
As an example, the execution body may, based on statistics over the feature vectors and behavior labels of a large number of sample video clips, generate a correspondence table storing the correspondence between the feature vectors of multiple sample video clips and sample behavior labels, and use this correspondence table as the fully connected layer. The execution body may then compute the Euclidean distance between the feature vector of the video clip and the feature vectors of the multiple sample video clips in the correspondence table. If the Euclidean distance between the feature vector of some sample video clip in the correspondence table and the feature vector of the video clip is less than a preset distance threshold, the sample behavior label corresponding to that sample video clip is taken as the behavior label of the video clip.
As another example, the execution body may first obtain the feature vectors and sample behavior labels of sample video clips, and then train a fully connected layer that can characterize the correspondence between the feature vector of a sample video clip and a sample behavior label, with the feature vectors of the sample video clips as input and the sample behavior labels as output. The execution body may then input the feature vector of the video clip at the input side of the fully connected layer, process it with the parameters and excitation function of the fully connected layer, and output at the output side; the information output at the output side is the behavior label.
Step 203: in response to the behavior label being unsafe, issue an alarm signal.
As can be seen from the above description, the behavior label obtained from the behavior recognition model can have many kinds of content. When the content of the behavior label is unsafe, the execution body can issue an alarm signal in time, to prevent an accident from happening or to reduce the loss it causes. For example, when the terminal devices 101, 102, and 103 capture video of an adult trailing a child, covering the child's mouth, and carrying the child away, the behavior recognition model on the execution body can analyze the video clip of that adult and produce a behavior label indicating that the adult's behavior is unsafe. The execution body can then issue an alarm signal in time, so that the relevant personnel can monitor or apprehend the adult promptly.
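Putting steps 201 to 203 together, a hedged end-to-end sketch; extract_person_clips and send_alarm_signal are hypothetical stand-ins, and the index of the "unsafe" label is an assumption:

```python
UNSAFE = 1  # assumed index of the "unsafe" behavior label

def process_video(video, behavior_model):
    """End-to-end sketch of steps 201-203. extract_person_clips and
    send_alarm_signal are hypothetical stand-ins for real components."""
    for person_clip in extract_person_clips(video):        # step 201
        logits = behavior_model(person_clip.unsqueeze(0))  # step 202
        if logits.argmax().item() == UNSAFE:               # step 203
            send_alarm_signal()  # hypothetical alarm hook
```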
With further reference to Fig. 3, a flow 300 of one embodiment of the behavior recognition model training method according to the present application is shown. The flow 300 of the behavior recognition model training method includes the following steps:
Step 301: obtain multiple history videos that record person behavior, together with the historical behavior label corresponding to each of the history videos.
In the present embodiment, the execution body on which the behavior recognition model training method runs (for example, the server 105 shown in Fig. 1) may obtain multiple history videos that record person behavior, together with the historical behavior label corresponding to each of the history videos.
In the present embodiment, the execution body may obtain multiple history videos that record person behavior and play them for those skilled in the art, who may, based on experience, annotate each of the history videos with a historical behavior label. The historical behavior label of the present embodiment can identify whether the person's behavior in the history video is safe. For example, when the video content of a history video is behavior with a violent tendency, such as fighting or quarreling, the technician may set for the history video a historical behavior label whose content is unsafe. The content of the historical behavior label can vary as needed, depending on actual requirements.
Step 302: sequentially input each of the history videos into an initial behavior recognition model to obtain the predicted historical behavior label corresponding to each of the history videos.
In the present embodiment, based on the multiple obtained history videos recording person behavior, the execution body may sequentially input each of the history videos into the initial behavior recognition model to obtain the predicted historical behavior label corresponding to each history video. Here, the execution body may input each history video at the input side of the initial behavior recognition model, pass it through the processing of the parameters of each layer in turn, and output at the output side; the information output at the output side is the predicted historical behavior label corresponding to that history video. The initial behavior recognition model may be an untrained behavior recognition model or a behavior recognition model whose training is not yet complete; each of its layers is provided with initialization parameters, which can be adjusted continually during the training of the behavior recognition model.
The initial behavior recognition model may be trained as follows:
First step: obtain sample video clips of multiple different behavior types, and a sample behavior label for the sample video clip of each behavior type.
When training the initial behavior recognition model, sample behavior labels may be set for sample video clips containing typical behavior types. For example, the content of a sample video clip may include child-abduction behavior, comprising a "mouth-covering" action toward a child and a "walking away quickly" action; a sample behavior label whose content is "abduction" or "unsafe" may then be set for that sample video clip.
Second step: using a machine learning method, train the initial behavior recognition model with the sample video clip of each behavior type as input and the sample behavior label of the sample video clip of that behavior type as output.
After the sample video clips and their corresponding sample behavior labels are obtained, the initial behavior recognition model can be trained with the sample video clip of each behavior type as input and the sample behavior label of that sample video clip as output. The initial behavior recognition model may be realized on the basis of models such as deep learning models, and may have a structure including a convolutional neural network, a recurrent neural network, a fully connected layer, and the like.
Step 303: compare the predicted historical behavior label corresponding to each of the history videos with the historical behavior label corresponding to that history video, to obtain the recognition accuracy of the initial behavior recognition model.
In the present embodiment, based on the predicted historical behavior labels obtained in step 302, the execution body may compare the predicted historical behavior label corresponding to each history video with the historical behavior label corresponding to that history video, to obtain the recognition accuracy of the initial behavior recognition model. Specifically, if the predicted historical behavior label corresponding to a history video is the same as or close to the historical behavior label corresponding to that history video, the initial behavior recognition model recognized it correctly; if they are different and not close, the recognition was incorrect. Here, the execution body may compute the ratio of the number of correct recognitions to the total number of samples and take it as the recognition accuracy of the initial behavior recognition model.
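The accuracy computation in this step is simply the ratio of correctly recognized history videos to the total number of samples; a minimal sketch:

```python
def recognition_accuracy(predicted_labels, true_labels):
    # Ratio of correctly recognized history videos to the total sample count
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)
```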
Step 304: determine whether the recognition accuracy exceeds a preset accuracy threshold.
In the present embodiment, based on the recognition accuracy of the initial behavior recognition model obtained in step 303, the execution body may compare that recognition accuracy with a preset accuracy threshold. If it exceeds the preset accuracy threshold, step 305 is performed; if it does not exceed the preset accuracy threshold, step 306 is performed.
Step 305: take the initial behavior recognition model as the trained behavior recognition model.
In the present embodiment, when the recognition accuracy of the initial behavior recognition model exceeds the preset accuracy threshold, training of the behavior recognition model is complete. At this point, the execution body may take the initial behavior recognition model as the trained behavior recognition model.
Step 306: adjust the parameters of the initial behavior recognition model.
In the present embodiment, when the recognition accuracy of the initial behavior recognition model does not exceed the preset accuracy threshold, the execution body may adjust the parameters of the initial behavior recognition model and return to step 302, until a behavior recognition model that can recognize the behavior of persons in video has been trained.
With continued reference to Fig. 4, Fig. 4 is a schematic diagram of an application scenario of the method for alarm according to the present embodiment. In the application scenario of Fig. 4, the terminal device 101 captures video of two men fighting. The terminal device 101 then sends the captured video to the server 105 through the network 104. The server 105 performs image recognition on the video and determines the person information contained in the video; for example, the person information may be "2 persons, male". The server 105 then imports the video clips of the persons corresponding to the person information into the behavior recognition model, confirms that the persons' behavior is fighting, and obtains a behavior label whose content is unsafe. Finally, the server 105 issues an alarm signal according to the behavior label; the relevant personnel can determine the persons' location from the alarm signal and intervene.
The method provided by the above embodiment of the present application first performs image recognition on captured video to obtain person information; then imports the video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain the behavior label of that person; and finally issues an alarm signal when the behavior label is unsafe. The present application can analyze the behavior contained in video, assign it a behavior label, and issue an alarm signal when the behavior label is unsafe. This improves the accuracy of recognizing behavior in video, so that unsafe behavior can be detected in time and an alarm signal issued.
With further reference to Fig. 5, as an implementation of the methods shown in the figures above, the present application provides one embodiment of a device for alarm. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.
As shown in Fig. 5, the device 500 for alarm of the present embodiment may include: a person information recognition unit 501, a behavior label acquisition unit 502, and an alarm unit 503. The person information recognition unit 501 performs image recognition on captured video to obtain person information; the behavior label acquisition unit 502 imports the video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain the behavior label of that person, the behavior recognition model being used to recognize a person's behavior and assign the behavior a behavior label; and the alarm unit 503 issues an alarm signal in response to the behavior label being unsafe.
In some optional implementations of the present embodiment, the person information recognition unit 501 may include a person image extraction subunit (not shown) and a person information identification subunit (not shown). The person image extraction subunit extracts a person image in response to a person image being present in the video; the person information identification subunit identifies the person information of the person corresponding to the person image, where the person information may include at least one of: gender, height, clothing color, and so on.
In some optional implementations of the present embodiment, the behavior recognition model includes a convolutional neural network, a recurrent neural network, and a fully connected layer.
In some optional implementations of the present embodiment, the behavior label acquisition unit 502 may include: an image feature vector subunit (not shown), a video feature vector subunit (not shown), and a behavior label subunit (not shown). The image feature vector subunit inputs the video clip into the convolutional neural network to obtain the feature vector of each frame image of the video clip, the convolutional neural network characterizing the correspondence between a video clip and the feature vectors of its frame images. The video feature vector subunit inputs the feature vectors of the frame images of the video clip into the recurrent neural network to obtain the feature vector of the video clip, the recurrent neural network characterizing the correspondence between the feature vectors of the frame images of a video clip and the feature vector of the video clip, where the feature vector of the video clip characterizes the associations among the feature vectors of its frame images. The behavior label subunit inputs the feature vector of the video clip into the fully connected layer to obtain the behavior label, the fully connected layer characterizing the correspondence between the feature vector of a video clip and a behavior label.
In some optional implementations of the present embodiment, the device 500 for alarm may include a behavior recognition model training unit (not shown), which includes a historical information acquisition subunit (not shown) and a behavior recognition model training subunit (not shown). The historical information acquisition subunit obtains multiple history videos that record person behavior, together with the historical behavior label corresponding to each of the history videos, where a historical behavior label identifies whether the person's behavior is safe. The behavior recognition model training subunit trains the behavior recognition model with each of the history videos as input and the historical behavior label corresponding to each history video as output.
In some optional implementations of the present embodiment, the behavior recognition model training subunit may include a behavior recognition model training module (not shown), which sequentially inputs each of the history videos into an initial behavior recognition model to obtain the predicted historical behavior label corresponding to each history video; compares the predicted historical behavior label corresponding to each history video with the historical behavior label corresponding to that history video to obtain the recognition accuracy of the initial behavior recognition model; determines whether the recognition accuracy exceeds a preset accuracy threshold; and, if it exceeds the preset accuracy threshold, takes the initial behavior recognition model as the trained behavior recognition model.
In some optional implementations of the present embodiment, the behavior recognition model training subunit may further include a parameter adjustment module (not shown), which, in response to the recognition accuracy not exceeding the preset accuracy threshold, adjusts the parameters of the initial behavior recognition model and continues to perform the training step.
The present embodiment also provides a server, including: one or more processors; a memory for storing one or more programs; and a camera for obtaining images and/or video. When the one or more programs are executed by the one or more processors, the one or more processors carry out the method for alarm described above.
The present embodiment also provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, it implements the method for alarm described above.
Below with reference to Fig. 6, it illustrates the computer systems 600 suitable for the server for realizing the embodiment of the present application Structural schematic diagram.Server shown in Fig. 6 is only an example, should not be to the function and use scope band of the embodiment of the present application Carry out any restrictions.
As shown in fig. 6, computer system 600 includes central processing unit (CPU) 601, it can be read-only according to being stored in Program in memory (ROM) 602 or be loaded into the program in random access storage device (RAM) 603 from storage section 608 and Execute various actions appropriate and processing.In RAM 603, also it is stored with system 600 and operates required various programs and data. CPU 601, ROM 602 and RAM 603 are connected with each other by bus 604.Input/output (I/O) interface 605 is also connected to always Line 604.
It is connected to I/O interfaces 605 with lower component:Importation 606 including keyboard, mouse etc.;It is penetrated including such as cathode The output par, c 607 of spool (CRT), liquid crystal display (LCD) etc. and loud speaker etc.;Storage section 608 including hard disk etc.; And the communications portion 609 of the network interface card including LAN card, modem etc..Communications portion 609 via such as because The network of spy's net executes communication process.Driver 610 is also according to needing to be connected to I/O interfaces 605.Detachable media 611, such as Disk, CD, magneto-optic disk, semiconductor memory etc. are mounted on driver 610, as needed in order to be read from thereon Computer program be mounted into storage section 608 as needed.Camera 612 is also according to needing to be connected to I/O interfaces 605.
Particularly, in accordance with an embodiment of the present disclosure, it may be implemented as computer above with reference to the process of flow chart description Software program.For example, embodiment of the disclosure includes a kind of computer program product comprising be carried on computer-readable medium On computer program, which includes the program code for method shown in execution flow chart.In such reality It applies in example, which can be downloaded and installed by communications portion 609 from network, and/or from detachable media 611 are mounted.When the computer program is executed by central processing unit (CPU) 601, limited in execution the present processes Above-mentioned function.
It should be noted that the above-mentioned computer-readable medium of the application can be computer-readable signal media or meter Calculation machine readable storage medium storing program for executing either the two arbitrarily combines.Computer readable storage medium for example can be --- but not Be limited to --- electricity, magnetic, optical, electromagnetic, infrared ray or semiconductor system, device or device, or arbitrary above combination.Meter The more specific example of calculation machine readable storage medium storing program for executing can include but is not limited to:Electrical connection with one or more conducting wires, just It takes formula computer disk, hard disk, random access storage device (RAM), read-only memory (ROM), erasable type and may be programmed read-only storage Device (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), light storage device, magnetic memory device, Or above-mentioned any appropriate combination.In this application, can be any include computer readable storage medium or storage journey The tangible medium of sequence, the program can be commanded the either device use or in connection of execution system, device.And at this In application, computer-readable signal media may include in a base band or as the data-signal that a carrier wave part is propagated, Wherein carry computer-readable program code.Diversified forms may be used in the data-signal of this propagation, including but unlimited In electromagnetic signal, optical signal or above-mentioned any appropriate combination.Computer-readable signal media can also be that computer can Any computer-readable medium other than storage medium is read, which can send, propagates or transmit and be used for By instruction execution system, device either device use or program in connection.Include on computer-readable medium Program code can transmit with any suitable medium, including but not limited to:Wirelessly, electric wire, optical cable, RF etc. or above-mentioned Any appropriate combination.
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, a processor may be described as comprising a person information recognition unit, a behavior label acquiring unit, and an alarm unit. The names of these units do not, in certain cases, limit the units themselves; for example, the alarm unit may also be described as "a unit that sends out an alarm signal when the behavior label is unsafe".
As another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs that, when executed by the apparatus, cause the apparatus to: perform image recognition on an acquired video to obtain person information; import a video clip of the person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label of that person, the behavior recognition model being used to recognize the behavior of a person and to set a behavior label for the behavior; and, in response to the behavior label being unsafe, send out an alarm signal.
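For orientation only, the following minimal Python sketch shows how one such program might chain the three operations above (person recognition, behavior labeling, alarming). It is a sketch under assumptions rather than the disclosed implementation: detect_persons, clip_for_person, model.predict, and raise_alarm are hypothetical placeholder names, and only the OpenCV capture calls are real library APIs.

```python
# Illustrative sketch of the pipeline described above. detect_persons,
# clip_for_person, model.predict, and raise_alarm are hypothetical
# placeholders; only the cv2 capture calls are real OpenCV APIs.
import cv2

UNSAFE = "unsafe"  # assumed value of the non-safe behavior label

def detect_persons(frame):
    """Placeholder person detector; returns (person_info, box) pairs."""
    return []

def clip_for_person(frames, box):
    """Placeholder: crop the buffered frames to one person's region."""
    return frames

def raise_alarm(person_info):
    """Placeholder alarm hook; a real system might notify security staff."""
    print("ALARM:", person_info)

def run_alarm_pipeline(video_source, model, clip_len=16):
    cap = cv2.VideoCapture(video_source)
    frames = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) < clip_len:
            continue  # wait until a full clip is buffered
        # Step 1: image recognition on the acquired video -> person information.
        for person_info, box in detect_persons(frame):
            # Step 2: import the person's video clip into the pre-trained
            # behavior recognition model -> behavior label.
            clip = clip_for_person(frames, box)
            label = model.predict(clip)
            # Step 3: in response to an unsafe label, send out an alarm signal.
            if label == UNSAFE:
                raise_alarm(person_info)
        frames.pop(0)  # slide the clip window forward by one frame
    cap.release()
```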
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features, and, without departing from the inventive concept, also covers other technical solutions formed by any combination of the above technical features or their equivalents, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (16)

1. A method for alarm, comprising:
performing image recognition on an acquired video to obtain person information;
importing a video clip of a person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label of the person, wherein the behavior recognition model is used to recognize a behavior of a person and to set a behavior label for the behavior; and
in response to the behavior label being unsafe, sending out an alarm signal.
2. The method according to claim 1, wherein performing image recognition on the acquired video to obtain person information comprises:
in response to a person image being present in the video, extracting the person image; and
identifying person information of a person corresponding to the person image, the person information comprising at least one of the following: gender, height, and clothing color.
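Purely as an illustration of this claim (the application does not prescribe any particular detector), person images and a coarse clothing color could be obtained with OpenCV's stock HOG pedestrian detector, as sketched below; gender and height estimation would require separate, dedicated models.

```python
# One possible, non-prescribed way to extract person images and a coarse
# clothing color using OpenCV's built-in HOG pedestrian detector.
import cv2
import numpy as np

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def extract_person_info(frame):
    """Returns a list of (person_image, info) pairs for detected persons."""
    rects, _weights = hog.detectMultiScale(frame, winStride=(8, 8))
    results = []
    for (x, y, w, h) in rects:
        person = frame[y:y + h, x:x + w]
        torso = person[h // 4: h // 2]  # rough torso band for clothing color
        mean_bgr = np.mean(torso.reshape(-1, 3), axis=0)
        results.append((person, {"dressing_color_bgr": tuple(mean_bgr)}))
    return results
```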
3. The method according to claim 1, wherein the behavior recognition model comprises a convolutional neural network, a recurrent neural network, and a fully connected layer.
4. The method according to claim 3, wherein importing the video clip of the person corresponding to the person information into the pre-trained behavior recognition model to obtain the behavior label of the person comprises:
inputting the video clip into the convolutional neural network to obtain a feature vector of each frame image of the video clip, wherein the convolutional neural network is used to characterize a correspondence between a video clip and the feature vectors of its frame images;
inputting the feature vectors of the frame images of the video clip into the recurrent neural network to obtain a feature vector of the video clip, wherein the recurrent neural network is used to characterize a correspondence between the feature vectors of the frame images of a video clip and the feature vector of the video clip, and the feature vector of a video clip characterizes the association among the feature vectors of its frame images; and
inputting the feature vector of the video clip into the fully connected layer to obtain the behavior label, wherein the fully connected layer is used to characterize a correspondence between the feature vector of a video clip and a behavior label.
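As a concrete, hedged reading of claims 3 and 4, the composition of convolutional neural network, recurrent neural network, and fully connected layer can be sketched in PyTorch as below; the ResNet-18 backbone, the GRU, the hidden size, and the two-label output head are illustrative assumptions, not choices fixed by the application.

```python
# Minimal PyTorch sketch of a CNN -> RNN -> fully-connected behavior model.
# ResNet-18, the GRU, hidden_size=256, and the two-class head ("safe" /
# "unsafe") are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class BehaviorRecognitionModel(nn.Module):
    def __init__(self, hidden_size=256, num_labels=2):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()            # per-frame 512-d feature vectors
        self.cnn = backbone
        self.rnn = nn.GRU(512, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_labels)

    def forward(self, clip):                   # clip: (batch, frames, 3, H, W)
        b, t, c, h, w = clip.shape
        frame_feats = self.cnn(clip.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, last_hidden = self.rnn(frame_feats) # clip-level feature vector
        return self.fc(last_hidden[-1])        # logits over behavior labels
```

In this reading, the GRU's final hidden state plays the role of the clip-level feature vector that, per the claim, characterizes the association among the per-frame feature vectors.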
5. The method according to claim 1, wherein the behavior recognition model is obtained by training as follows:
obtaining a plurality of history videos in which person behaviors are recorded, and a historical behavior label corresponding to each history video of the plurality of history videos, wherein a historical behavior label identifies whether a person behavior is safe; and
training the behavior recognition model with each history video of the plurality of history videos as input and the historical behavior label corresponding to that history video as output.
6. The method according to claim 5, wherein training the behavior recognition model with each history video of the plurality of history videos as input and the corresponding historical behavior label as output comprises:
executing the following training step: sequentially inputting each history video of the plurality of history videos into an initial behavior recognition model to obtain a predicted historical behavior label corresponding to each history video; comparing the predicted historical behavior label corresponding to each history video with the historical behavior label corresponding to that history video to obtain a recognition accuracy of the initial behavior recognition model; determining whether the recognition accuracy exceeds a preset accuracy threshold; and, if it exceeds the preset accuracy threshold, taking the initial behavior recognition model as the trained behavior recognition model.
7. The method according to claim 6, wherein training the behavior recognition model with each history video of the plurality of history videos as input and the corresponding historical behavior label as output further comprises:
in response to the recognition accuracy not exceeding the preset accuracy threshold, adjusting parameters of the initial behavior recognition model and continuing to execute the training step.
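The training step of claims 5 to 7 maps naturally onto the loop sketched below; the Adam optimizer, the cross-entropy loss, and the 0.95 threshold are assumptions made for illustration, since the claims only specify comparing predicted labels against historical labels and iterating until a preset accuracy threshold is exceeded.

```python
# Hedged sketch of the claimed training step: iterate until recognition
# accuracy on the labeled history videos exceeds a preset threshold.
# Optimizer, loss, and threshold value are illustrative assumptions.
import torch
import torch.nn as nn

def train_until_accurate(model, history_clips, history_labels,
                         accuracy_threshold=0.95, lr=1e-4):
    """history_clips: list of (frames, 3, H, W) tensors; history_labels: ints."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    while True:
        correct = 0
        for clip, label in zip(history_clips, history_labels):
            logits = model(clip.unsqueeze(0))  # predict one history video
            correct += int(logits.argmax(dim=1).item() == label)
            # compare prediction with the historical label and
            # "adjust the parameters of the initial model"
            loss = loss_fn(logits, torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        accuracy = correct / len(history_clips)
        if accuracy > accuracy_threshold:
            return model  # training complete; model becomes the trained model
```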
8. An apparatus for alarm, comprising:
a person information recognition unit configured to perform image recognition on an acquired video to obtain person information;
a behavior label acquiring unit configured to import a video clip of a person corresponding to the person information into a pre-trained behavior recognition model to obtain a behavior label of the person, wherein the behavior recognition model is used to recognize a behavior of a person and to set a behavior label for the behavior; and
an alarm unit configured to send out an alarm signal in response to the behavior label being unsafe.
9. The apparatus according to claim 8, wherein the person information recognition unit comprises:
a person image extraction subunit configured to extract a person image in response to the person image being present in the video; and
a person information identification subunit configured to identify person information of a person corresponding to the person image, the person information comprising at least one of the following: gender, height, and clothing color.
10. The apparatus according to claim 8, wherein the behavior recognition model comprises a convolutional neural network, a recurrent neural network, and a fully connected layer.
11. The apparatus according to claim 10, wherein the behavior label acquiring unit comprises:
an image feature vector obtaining subunit configured to input the video clip into the convolutional neural network to obtain a feature vector of each frame image of the video clip, wherein the convolutional neural network is used to characterize a correspondence between a video clip and the feature vectors of its frame images;
a video feature vector obtaining subunit configured to input the feature vectors of the frame images of the video clip into the recurrent neural network to obtain a feature vector of the video clip, wherein the recurrent neural network is used to characterize a correspondence between the feature vectors of the frame images of a video clip and the feature vector of the video clip, and the feature vector of a video clip characterizes the association among the feature vectors of its frame images; and
a behavior label obtaining subunit configured to input the feature vector of the video clip into the fully connected layer to obtain the behavior label, wherein the fully connected layer is used to characterize a correspondence between the feature vector of a video clip and a behavior label.
12. The apparatus according to claim 8, wherein the apparatus comprises a behavior recognition model training unit, the behavior recognition model training unit comprising:
a historical information obtaining subunit configured to obtain a plurality of history videos in which person behaviors are recorded, and a historical behavior label corresponding to each history video of the plurality of history videos, wherein a historical behavior label identifies whether a person behavior is safe; and
a behavior recognition model training subunit configured to train the behavior recognition model with each history video of the plurality of history videos as input and the historical behavior label corresponding to that history video as output.
13. The apparatus according to claim 12, wherein the behavior recognition model training subunit comprises:
a behavior recognition model training module configured to sequentially input each history video of the plurality of history videos into an initial behavior recognition model to obtain a predicted historical behavior label corresponding to each history video; compare the predicted historical behavior label corresponding to each history video with the historical behavior label corresponding to that history video to obtain a recognition accuracy of the initial behavior recognition model; determine whether the recognition accuracy exceeds a preset accuracy threshold; and, if it exceeds the preset accuracy threshold, take the initial behavior recognition model as the trained behavior recognition model.
14. The apparatus according to claim 13, wherein the behavior recognition model training subunit further comprises:
a parameter adjustment module configured to, in response to the recognition accuracy not exceeding the preset accuracy threshold, adjust parameters of the initial behavior recognition model and continue to execute the training step.
15. A server, comprising:
one or more processors;
a memory configured to store one or more programs; and
a camera configured to acquire images and/or video;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any one of claims 1 to 7.
16. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 7.
CN201810256338.1A 2018-03-27 2018-03-27 Method and device for alarm Pending CN108446649A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810256338.1A CN108446649A (en) 2018-03-27 2018-03-27 Method and device for alarm

Publications (1)

Publication Number Publication Date
CN108446649A 2018-08-24

Family

ID=63196750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810256338.1A Method and device for alarm 2018-03-27 2018-03-27 (Pending, published as CN108446649A)

Country Status (1)

Country Link
CN (1) CN108446649A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542301A (en) * 2011-12-28 2012-07-04 浙江大学 Early-stage drowning behavior detection method based on videos
CN106682050A (en) * 2015-11-24 2017-05-17 北京中科汇联科技股份有限公司 System and method capable of achieving intelligent questioning and answering
CN105574133A (en) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 Multi-mode intelligent question answering system and method
CN105631415A (en) * 2015-12-25 2016-06-01 中通服公众信息产业股份有限公司 Video pedestrian recognition method based on convolution neural network
CN106096568A (en) * 2016-06-21 2016-11-09 同济大学 A kind of pedestrian's recognition methods again based on CNN and convolution LSTM network
CN106446015A (en) * 2016-08-29 2017-02-22 北京工业大学 Video content access prediction and recommendation method based on user behavior preference
CN107045623A (en) * 2016-12-30 2017-08-15 厦门瑞为信息技术有限公司 A kind of method of the indoor dangerous situation alarm based on human body attitude trace analysis
CN107291232A (en) * 2017-06-20 2017-10-24 深圳市泽科科技有限公司 A kind of somatic sensation television game exchange method and system based on deep learning and big data
CN107330392A (en) * 2017-06-26 2017-11-07 司马大大(北京)智能系统有限公司 Video scene annotation equipment and method
CN107832708A (en) * 2017-11-09 2018-03-23 云丁网络技术(北京)有限公司 A kind of human motion recognition method and device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079477A (en) * 2018-10-19 2020-04-28 北京奇虎科技有限公司 Monitoring analysis method and monitoring analysis system
CN111222370A (en) * 2018-11-26 2020-06-02 浙江宇视科技有限公司 Case studying and judging method, system and device
CN109697815A (en) * 2019-01-24 2019-04-30 广州市天河区保安服务公司 Anti-theft communication network alarming method, appliance arrangement and storage medium
CN110046278B (en) * 2019-03-11 2021-10-15 北京奇艺世纪科技有限公司 Video classification method and device, terminal equipment and storage medium
CN110046278A (en) * 2019-03-11 2019-07-23 北京奇艺世纪科技有限公司 Video classification methods, device, terminal device and storage medium
CN110795703A (en) * 2019-09-20 2020-02-14 华为技术有限公司 Data anti-theft method and related product
WO2021052201A1 (en) * 2019-09-20 2021-03-25 华为技术有限公司 Data theft prevention method and related product
CN110795703B (en) * 2019-09-20 2024-04-16 华为技术有限公司 Data theft prevention method and related product
CN112667883A (en) * 2019-10-16 2021-04-16 腾讯科技(深圳)有限公司 Resource information pushing method and device, storage medium and computer equipment
CN112667883B (en) * 2019-10-16 2023-12-26 腾讯科技(深圳)有限公司 Resource information pushing method and device, storage medium and computer equipment
CN112911204A (en) * 2019-12-03 2021-06-04 宇龙计算机通信科技(深圳)有限公司 Monitoring method, monitoring device, storage medium and electronic equipment
CN111881946A (en) * 2020-07-09 2020-11-03 珠海格力电器股份有限公司 Safety monitoring method and device, storage medium, electronic equipment and air conditioner
CN114724080A (en) * 2022-03-31 2022-07-08 慧之安信息技术股份有限公司 Construction site intelligent safety identification method and device based on security video monitoring
CN114724080B (en) * 2022-03-31 2023-10-27 慧之安信息技术股份有限公司 Building site intelligent safety identification method and device based on security video monitoring

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180824