CN113837066A - Behavior recognition method and device, electronic equipment and computer storage medium


Info

Publication number
CN113837066A
CN113837066A (application CN202111109050.XA)
Authority
CN
China
Prior art keywords
information
event
face
behavior
human body
Prior art date
Legal status
Pending
Application number
CN202111109050.XA
Other languages
Chinese (zh)
Inventor
舒杰
罗春能
Current Assignee
Shenzhen Sensetime Technology Co Ltd
Original Assignee
Shenzhen Sensetime Technology Co Ltd
Application filed by Shenzhen Sensetime Technology Co Ltd filed Critical Shenzhen Sensetime Technology Co Ltd
Priority to CN202111109050.XA
Publication of CN113837066A
Priority to PCT/CN2022/077461 (WO2023045239A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure disclose a behavior recognition method and apparatus, an electronic device, and a computer storage medium. The behavior recognition method comprises the following steps: acquiring an image frame sequence; tracking at least one object detected in the image frame sequence to obtain tracking information of each object; performing behavior recognition on the at least one object based on the tracking information of each object to obtain an identifier of a behavior event in the image frame sequence; detecting object information of at least one object in the image frame sequence participating in the behavior event; and associating the identifier of the behavior event with the object information of the at least one object to obtain a behavior recognition result.

Description

Behavior recognition method and device, electronic equipment and computer storage medium
Technical Field
The embodiments of the present disclosure relate to, but are not limited to, the field of computer vision technologies, and in particular, to a behavior recognition method and apparatus, an electronic device, and a computer storage medium.
Background
With the gradual adoption of Artificial Intelligence (AI) algorithms in smart city applications, their advantages over traditional algorithms have become apparent: higher accuracy, better adaptability to different scenes, and so on. Because video carries an enormous amount of information, it becomes necessary to identify that information efficiently. The related art focuses only on detecting objects in video and does not associate the information in the detection results, so the detected objects cannot be traced afterwards.
Disclosure of Invention
The embodiment of the disclosure provides a behavior identification method and device, electronic equipment and a computer storage medium.
In a first aspect, a behavior recognition method is provided, the method including: acquiring an image frame sequence; tracking at least one object detected in the image frame sequence to obtain tracking information of each object; performing behavior recognition on the at least one object based on the tracking information of each object to obtain an identifier of a behavior event in the image frame sequence; detecting object information of at least one object in the sequence of image frames participating in the behavioral event; and associating the identification of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
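As an illustrative sketch of these five steps (all class and function names below are hypothetical; the disclosure does not prescribe an implementation), the method of the first aspect might be organized as follows, where `detector`, `tracker`, and `recognizer` stand in for the detection, tracking, and behavior recognition models described in the embodiments below:

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorRecognitionResult:
    event_id: str                                     # identifier of the behavior event
    object_info: dict = field(default_factory=dict)   # object information, keyed by object id

def recognize_behavior_in_sequence(frames, detector, tracker, recognizer):
    """Run the five steps of the first aspect over one image frame sequence."""
    # Step 1: `frames` is the acquired image frame sequence.
    # Step 2: track each detected object to obtain its tracking information.
    tracking_info = {obj_id: tracker.track(obj_id, frames)
                     for obj_id in detector.detect(frames)}
    # Step 3: behavior recognition yields the identifier of a behavior event (or None).
    event_id = recognizer.recognize(tracking_info)
    if event_id is None:
        return None                                   # no behavior event in this sequence
    # Step 4: detect object information of the objects participating in the event.
    object_info = {obj_id: detector.describe(obj_id, frames)
                   for obj_id in recognizer.participants(event_id)}
    # Step 5: associate the event identifier with the object information.
    return BehaviorRecognitionResult(event_id=event_id, object_info=object_info)
```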
In some embodiments, the detecting object information of at least one object in the sequence of image frames that participates in the behavioral event includes: determining an event region in each image of the sequence of image frames in which the behavioral event exists; determining object information of the at least one object from an event region in which the behavior event exists in each image.
In this way, because the event region in which the behavior event exists can be determined in each image of the image frame sequence, the at least one object can be determined within that event region. This reduces the detection region used for object detection, so the at least one object can be detected quickly from the event region, which improves the speed of obtaining the object information of the at least one object.
In some embodiments, the object information of the at least one object includes a face identification and a body identification of the at least one object; associating the identifier of the behavior event with the object information of the at least one object to obtain a behavior recognition result, including: the face identification and the human body identification belonging to the same object in the face identification and the human body identification of the at least one object are associated to obtain an association result; and associating the identifier of the behavior event with the association result to obtain the behavior identification result.
Therefore, the face identification and the human body identification belonging to the same object are associated to obtain an association result, and then the identification of the behavior event and the association result are associated to obtain a behavior recognition result, so that the face and the human body participating in the behavior event can be obtained from the behavior recognition result, and the information retrieval is more comprehensive.
In some embodiments, the associating the face identifier and the body identifier belonging to the same object in the face identifier and the body identifier of the at least one object to obtain an association result includes: determining a position relationship between each face and each human body participating in the behavioral event in each image of the image frame sequence; and associating the face identifier and the human body identifier belonging to the same object in the face identifier and the human body identifier of the at least one object based on the position relationship to obtain an association result.
Therefore, the face identification and the human body identification belonging to the same object are associated through the determined position relationship between each face and each human body participating in the behavior event, so that an association scheme of the face and the human body is provided, and the associated face and the associated human body can be accurately determined.
In some embodiments, the associating the face identifier and the body identifier belonging to the same object in the face identifier and the body identifier of the at least one object to obtain an association result includes: acquiring a face identifier and a body identifier belonging to the same object from the face identifiers and the body identifiers of the at least one object; assigning the same label value to the face identifier and the body identifier belonging to the same object, wherein the face identifiers of different objects correspond to different label values, and the body identifiers of different objects correspond to different label values; and associating the face identifier and the body identifier corresponding to the same label value to obtain the association result.
In this way, the face identifier and the body identifier belonging to the same object are obtained, the same label value is corresponding to the face identifier and the body identifier belonging to the same object, and the face identifier and the body identifier corresponding to the same label value are associated to obtain an association result, so that the face identifier and the body identifier belonging to the same object can be easily associated.
In some embodiments, the object information includes: face feature information and/or face attribute information; the detecting object information of at least one object in the image frame sequence participating in the behavior event includes: detecting face feature information of the at least one object participating in the behavior event in the image frame sequence; determining face attribute information of the at least one object based on the face feature information of the at least one object; the method further comprises the following steps: determining identity information of the at least one object based on the facial feature information and/or the facial attribute information of the at least one object.
Therefore, the identity information of at least one object is determined based on the face feature information and/or the face attribute information of at least one object, so that the identity information of the object participating in the action event can be determined through the face information of the object, and the object participating in the action event can be subjected to post-incident tracing.
In some embodiments, the object information includes: human body feature information and/or human body attribute information; the detecting object information of at least one object participating in the behavior event in the image frame sequence includes: detecting human body feature information of the at least one object participating in the behavior event in the image frame sequence; and determining human body attribute information of the at least one object based on the human body feature information of the at least one object. The method further comprises: determining identity information of the at least one object based on the human body feature information and/or the human body attribute information of the at least one object.
Therefore, the identity information of the at least one object is determined based on the human body characteristic information and/or the human body attribute information of the at least one object, so that the identity information of the object participating in the action event can be determined through the human body information of the object, and the object participating in the action event can be subjected to post-incident tracing.
In some embodiments, the method further comprises: outputting alarm information in the case that the behavior event is a preset event. The alarm information includes at least one of the following shown in the image frame sequence: the event region where the behavior event occurs, the spatio-temporal information of the behavior event, the face frame of an object participating in the behavior event, the body frame of an object participating in the behavior event, the face image in the face frame, the body image in the body frame, the attribute information of the face and/or body of an object participating in the behavior event, and the attribute information of family members associated with an object participating in the behavior event.
Therefore, when the behavior event is a preset event, the alarm information can be output, so that the staff can determine that the preset event exists, and further can process the preset event in time.
In a second aspect, there is provided a behavior recognition apparatus, the apparatus comprising: an obtaining module, configured to obtain a sequence of image frames; the tracking module is used for tracking at least one object detected in the image frame sequence to obtain tracking information of each object; the identification module is used for carrying out behavior identification on the at least one object based on the tracking information of each object to obtain the identification of a behavior event in the image frame sequence; a detection module for detecting object information of at least one object in the sequence of image frames participating in the behavioral event; and the association module is used for associating the identification of the behavior event with the object information of the at least one object to obtain a behavior identification result.
In a third aspect, an electronic device is provided, including: a memory storing a computer program operable on a processor, and a processor that implements the steps of the above method when executing the computer program.
In a fourth aspect, a computer storage medium is provided that stores one or more programs executable by one or more processors to implement the steps in the above-described method.
In the embodiment of the present disclosure, since the identifier of the behavior event in the image frame sequence is obtained by performing behavior recognition on at least one object, and then the identifier of the behavior event is associated with the object information of the at least one object participating in the behavior event, the object participating in the behavior event can be traced afterwards through the behavior recognition result obtained by association.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without inventive efforts.
Fig. 1 is a schematic flow chart illustrating an implementation of a behavior recognition method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart illustrating an implementation of another behavior identification method according to an embodiment of the present disclosure;
fig. 3 is a schematic flow chart illustrating an implementation of another behavior identification method according to an embodiment of the present disclosure;
fig. 4 is a schematic flow chart illustrating an implementation of another behavior identification method according to an embodiment of the present disclosure;
fig. 5 is a schematic flow chart illustrating an implementation of a behavior recognition method according to another embodiment of the present disclosure;
fig. 6 is a schematic flow chart illustrating an implementation of a behavior recognition method according to another embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a behavior recognition apparatus according to an embodiment of the present disclosure;
fig. 8 is a hardware entity diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The technical solution of the present disclosure will be specifically described below by way of examples with reference to the accompanying drawings. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
It should be noted that: in the examples of this disclosure, "first," "second," etc. are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. Plural in the embodiments of the present disclosure means at least two.
In addition, the technical solutions described in the embodiments of the present disclosure can be arbitrarily combined without conflict. In the description of the present disclosure, "plurality" means two or more, and "multiframe" means two or more frames unless specifically limited otherwise.
In some embodiments, the following manner is provided to determine whether a behavior event exists in a video: first obtaining the grayscale difference between adjacent frames, then obtaining a binary image by comparing the grayscale difference with a certain threshold, and finally judging whether a behavior event exists based on the regions with large changes in the binary image.
In other embodiments, the following manner is provided to determine whether a behavior event exists in a video: training a target detector in advance, obtaining a pedestrian coordinate frame in each frame of the video, grouping the crowd in each frame according to the distances between the pedestrian coordinate frames, and sending the grouped crowd to an abnormal behavior classifier model for processing to obtain an abnormal behavior probability.
However, both of these ways of determining whether there is a behavior event rely on information in neighboring images or information of the current frame alone; because the amount of information used for the determination is small, the false detection rate is high.
In addition, after determining that the behavior event occurs, the relevant personnel need to go to the scene to process the behavior event, and there is a lag in time, for example, after the relevant personnel arrive at the scene, it is likely that the behavior event has ended, so that the source of the object participating in the behavior event cannot be traced.
The embodiment of the disclosure provides a behavior recognition method, which uses deep learning and fully considers the temporal characteristics of a video to analyze pedestrians in a video scene, so as to determine whether a behavior event exists in the video. In addition, basic information of the persons participating in the behavior event can be retained and stored for later use by staff. The behavior event may be a dangerous event, a fighting event, an event of riding an electric vehicle without wearing a helmet, a speeding event, or an event of not driving according to regulations. In the embodiments of the present disclosure, the behavior event is described as a fighting event by way of example. The present disclosure does not limit the specific content of the behavior event; in other embodiments, the behavior event may be otherwise, and the embodiments of the present disclosure do not limit this.
Fig. 1 is a schematic flow chart of an implementation of a behavior recognition method according to an embodiment of the present disclosure. As shown in Fig. 1, the method is applied to a behavior recognition device. In some embodiments, the behavior recognition device may be a processor or a chip, and the processor or the chip may be applied to an electronic device; in other embodiments, the behavior recognition device may itself be an electronic device. The electronic device, the other devices described below, the display device, or the terminal device in the embodiments of the present disclosure may include at least one of: a server, a Mobile Phone, a tablet computer (Pad), a computer with wireless transceiving function, a palmtop computer, a desktop computer, a personal digital assistant, a portable media player, an intelligent sound box, a navigation device, wearable devices such as an intelligent watch, intelligent glasses and an intelligent necklace, a pedometer, a digital TV, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a wireless terminal in Industrial Control, a wireless terminal in Self Driving, a wireless terminal in Remote Medical Surgery, a wireless terminal in a Smart Grid, a wireless terminal in Transportation Safety, a wireless terminal in a Smart City, a wireless terminal in a Smart Home, and a vehicle, a vehicle-mounted device, or a vehicle-mounted module in an Internet of Vehicles system. The method comprises the following steps:
S101, acquiring an image frame sequence.
In some embodiments, one or more cameras may respectively capture real-time videos, and send the captured real-time videos to the behavior recognition device, so that the behavior recognition device can obtain the real-time videos sent by one or more cameras. In other embodiments, a camera may be disposed on the behavior recognition device, and the real scene is photographed by the camera on the behavior recognition device, so as to obtain a real-time video. In still other embodiments, the behavior recognition device may receive a video that is pre-stored by the other device or retrieved from its own memory.
After the behavior recognition device obtains the real-time video or the pre-stored video, clips of a set time length can be intercepted from it at intervals of the set time length, so that the behavior recognition device obtains a video of the set time length at each interval and determines an image frame sequence based on that video. The set time length may be fixed, or it may vary based on the current time and/or the flow of people and/or vehicles in the current shooting scene. The set time length may range from 1 second to 10 minutes; for example, it may be 1 second, 3 seconds, 1 minute, or 10 minutes. In some embodiments, the number of images included in different image frame sequences is the same.
In this way, an image frame sequence is acquired at each interval, so a behavior recognition result is obtained for each sequence; whether a behavior event occurs is determined from a whole image frame sequence, which fully considers the temporal and spatial information of the sequence and improves the reliability of the obtained behavior recognition result.
The image frame sequence is determined based on the video of the set duration. In some embodiments, all the images of the video may be taken as the image frame sequence; in other embodiments, in order to reduce the amount of calculation of the behavior recognition device, a set proportion of images or a set number of images may be extracted from the video of the set duration and determined as the image frame sequence.
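As a minimal sketch of this interception and sampling step, assuming a fixed frame rate, a set duration, and a set proportion (the concrete values below are examples only, not values fixed by the disclosure):

```python
def make_frame_sequence(video_frames, fps=25, set_duration_s=3, keep_ratio=0.5):
    """Intercept a clip of the set duration and keep a set proportion of its frames."""
    clip_len = int(fps * set_duration_s)       # frames in one clip of the set duration
    clip = video_frames[:clip_len]             # one clip intercepted from the stream
    step = max(1, round(1 / keep_ratio))       # keep roughly keep_ratio of the frames
    return clip[::step]                        # the resulting image frame sequence

# Example: a 3-second clip at 25 fps, keeping half the frames -> 38 images.
assert len(make_frame_sequence(list(range(100)))) == 38
```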
In the case where the behavior recognition device obtains real-time videos sent by multiple cameras, in some embodiments, the real-time videos sent by the multiple cameras may be processed in parallel; in other embodiments, the real-time video transmitted by each camera may be processed in turn.
S102, tracking at least one object detected in the image frame sequence to obtain tracking information of each object.
The objects in the at least one object may share the same attribute; for example, the at least one object consists of persons. Alternatively, they may have different attributes; for example, the at least one object may include at least one of a person, an animal, a vehicle, and the like.
An implementation of the foregoing S102 may include: detecting the objects in each image of the image frame sequence in turn to obtain the tracking information of each object. The tracking information of each object may include at least one of: a coordinate frame of the object in the image frame sequence, a screenshot of the coordinate frame, and the like. The object may include a human face and/or a human body. Each object may have a plurality of coordinate frames, and the number of coordinate frames of each object may be the same as the number of times the object is detected in the image frame sequence.
S103, performing behavior recognition on the at least one object based on the tracking information of each object to obtain the identifier of the behavior event in the image frame sequence.
In some embodiments, a deep neural network may first be obtained, the tracking information of each object is input to the deep neural network, and the deep neural network performs behavior recognition on the at least one object, thereby obtaining a detection result of whether a behavior event exists in the image frame sequence. In the case where the detection result characterizes the presence of a behavior event in the image frame sequence, an identifier of the behavior event may be determined. For example, in the case where the behavior event is a fighting event, the deep neural network may be a human posture estimation network, through which it is determined whether a fighting event exists. In some embodiments, the deep neural network may analyze each image in the image frame sequence to obtain at least one of: whether a fighting event exists in each image, the number of people participating in the fighting event, the severity of the fighting event, and the like. It may then determine whether a behavior event exists in the image frame sequence based on at least one of: the ratio of images in which a fighting event exists to all images of the sequence, the number of people participating in the fighting event in each such image, and the severity of the fighting event in each such image.
In this way, because the deep neural network is obtained by training, having it produce the detection result of whether a behavior event exists in the image frame sequence makes that detection result accurate.
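A sketch of how the sequence-level decision described above might combine the per-image outputs (the thresholds and field names are assumptions; the disclosure fixes no values):

```python
def sequence_has_event(per_frame_results, ratio_thresh=0.3, min_people=2):
    """Decide whether a behavior event exists in the whole image frame sequence.

    per_frame_results: one dict per image, e.g.
        {"has_event": bool, "num_people": int, "severity": float}
    """
    if not per_frame_results:
        return False
    event_frames = [r for r in per_frame_results if r["has_event"]]
    ratio = len(event_frames) / len(per_frame_results)   # share of images with the event
    enough_people = any(r["num_people"] >= min_people for r in event_frames)
    return ratio >= ratio_thresh and enough_people
```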
Taking the behavior event as a fighting event as an example, in other embodiments an image of each object (e.g., an image of a human body or an image of a human face) may be extracted from the image frame sequence, and then at least one of the number of arm raises, the amplitude of arm swings, the number of leg raises, the amplitude of leg raises, age information of the face, expression information of the face, wound information of the face, and the like may be determined based on the extracted images to decide whether each object participates in the fighting event.
The identifier of the behavior event may correspond to one image frame sequence. In some embodiments, the identifier of the behavior event may be determined based on at least one of a capture timestamp of the image frame sequence, real capture location information of the image frame sequence, numbering information of the camera, and the like. That is, different image frame sequences yield different behavior event identifiers, and there is a one-to-one correspondence between image frame sequences and behavior event identifiers.
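As one hypothetical way to derive such a one-to-one identifier from the capture timestamp, camera number, and location (the hashing scheme is an assumption, not part of the disclosure):

```python
import hashlib

def make_event_id(capture_timestamp, camera_no, location=""):
    """Derive a per-sequence event identifier from the capture timestamp,
    camera number and (optionally) shooting location, so that different
    image frame sequences map to different identifiers."""
    raw = f"{capture_timestamp}|{camera_no}|{location}"
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()[:16]

le = make_event_id("2021-09-22T10:15:00", camera_no=7, location="gate-3")
```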
And S104, detecting object information of at least one object participating in the behavior event in the image frame sequence.
The at least one object participating in the behavior event may include: at least one face and/or at least one human body participating in the behavior event. In some embodiments, by detecting the at least one object participating in the behavior event in the image frame sequence, an object frame of the at least one object may be obtained (the object frame comprises a coordinate frame and/or a screenshot of the coordinate frame). For example, the object frame of the at least one object may include at least one of: a face frame of at least one face, a screenshot of the face frame of at least one face, a body frame of at least one human body, and a screenshot of the body frame of at least one human body.
The object frame in the embodiment of the present disclosure may be coordinate information of the object frame, the face frame may be coordinate information of the face frame, and the body frame may be coordinate information of the body frame. The screenshot of the object frame in the embodiment of the disclosure may be an image corresponding to the object frame, the screenshot of the face frame may be an image corresponding to the face frame, and the screenshot of the body frame may be an image corresponding to the body frame.
The at least one object participating in the behavior event is determined from the image frame sequence.
The object information of each object of the at least one object may include at least one of: position information of each object and/or an object frame of each object in each image in the image frame sequence, feature information of each object, attribute information of each object, identification information of each object, and an object tag value of each object.
In the disclosed embodiment, the objects may include faces and/or bodies, the at least one object may include at least one face and/or at least one body, and each object may include each face and/or each body.
And S105, associating the identification of the behavior event with the object information of the at least one object to obtain a behavior identification result.
In some embodiments, the behavior recognition results may also be stored. For example, the behavior recognition result may be sent to a storage device to cause the storage device to store the behavior recognition result. The storage device may be a device separate from the behavior recognizing apparatus, for example, the storage device may be a distributed storage device. In other embodiments, storing the behavior recognition results may include: the behavior recognizing result is stored in the behavior recognizing means.
In the case of retrieving the behavior recognition result, at least one of the following may be obtained: identification of a behavioral event, object information of at least one object. In some embodiments, the field information to be retrieved may include at least one of: the time interval information to be retrieved, the address information to be retrieved, the face information to be retrieved, the human body information to be retrieved and the name information to be retrieved.
In some embodiments, the behavior recognition result may be used to output at least one of the following if the obtained field information to be retrieved corresponds to the identification of the behavior event: identification of a behavioral event, object information of at least one object.
In some embodiments, the object identifications of the same object in different images in the sequence of image frames are the same. In other embodiments, the object identification of the same object in different images in the sequence of image frames is different.
The identification of one behavior event can be associated with the object information of at least one object to obtain associated information, and the associated information can represent the association relationship between the identification of the behavior event and the identification of each object, so that each object participating in the behavior event in the image frame sequence can be easily determined based on the association relationship and the identification of the behavior event.
For example, for a certain image frame sequence, a unique identifier of the behavior event corresponding to the image frame sequence is generated; the unique identifier of the behavior event may be denoted as le, and the association information may be included in an association set of the event, associations1. For example, associations1 = {{le, lf_i}, {le, lp_j}, ...}, where the association information includes {le, lf_i}, {le, lp_j}, and so on; lf_i represents the i-th face participating in the behavior event in the image frame sequence, and lp_j represents the j-th human body participating in the behavior event in the image frame sequence.
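A sketch of this association structure in code (the names mirror the notation above; the list-of-sets layout is an assumption):

```python
def build_associations1(le, face_ids, body_ids):
    """associations1 = {{le, lf_i}, {le, lp_j}, ...} for one image frame sequence."""
    return [{le, lf} for lf in face_ids] + [{le, lp} for lp in body_ids]

associations1 = build_associations1("le", ["lf_1", "lf_2"], ["lp_1"])
# -> [{'le', 'lf_1'}, {'le', 'lf_2'}, {'le', 'lp_1'}]
```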
In the embodiment of the present disclosure, since the identifier of the behavior event in the image frame sequence is obtained by performing behavior recognition on at least one object, and then the identifier of the behavior event is associated with the object information of the at least one object participating in the behavior event, the object participating in the behavior event can be traced afterwards through the behavior recognition result obtained by association.
Fig. 2 is a schematic flow chart illustrating an implementation of another behavior recognition method according to an embodiment of the present disclosure. As shown in Fig. 2, the method is applied to a behavior recognition device, and the method includes:
S201, acquiring an image frame sequence.
S202, tracking at least one object detected in the image frame sequence to obtain tracking information of each object.
S203, performing behavior recognition on the at least one object based on the tracking information of each object to obtain the identifier of the behavior event in the image frame sequence.
S204, determining an event region in which the behavior event exists in each image of the image frame sequence.
S205, determining object information of the at least one object from the event area of each image in which the behavior event exists.
In some embodiments, the tracking information of each object is input to a deep neural network, and the following information can be obtained through the deep neural network: an event region in each image of the sequence of image frames where a behavioral event exists. The behavior recognition means may then determine at least one object from the event region in each image in which the behavior event exists.
In some embodiments, the behavior recognizing means may determine all objects in an event area where the behavior event exists in each image as the at least one object. For example, in the case where one face and two bodies are detected through the event area, the one face and the two bodies may be determined as at least one object. In other embodiments, to make the determined objects participating in the behavioral event accurate, an event region in each image where the behavioral event exists may be identified to obtain at least one object.
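A minimal sketch of restricting detection to the event region, assuming images are H×W×C arrays and `detector` is some object detector returning boxes in crop coordinates (both assumptions, not prescribed by the disclosure):

```python
def detect_objects_in_event_region(image, event_region, detector):
    """Detect participating objects only inside the event region of one image.

    event_region: (x1, y1, x2, y2) of the region where the behavior event exists.
    Cropping first shrinks the detection region, which is what speeds up S205.
    """
    x1, y1, x2, y2 = event_region
    crop = image[y1:y2, x1:x2]              # assumes an H x W x C array
    detections = detector(crop)             # hypothetical detector call
    # Map boxes back from crop coordinates to full-image coordinates.
    return [(bx1 + x1, by1 + y1, bx2 + x1, by2 + y1)
            for (bx1, by1, bx2, by2) in detections]
```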
S206, associating the identification of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
In the embodiment of the present disclosure, because the event region in which the behavior event exists can be determined in each image of the image frame sequence, the at least one object can be determined within the obtained event region. This reduces the detection region used for object detection, so the at least one object can be detected quickly from the event region, which improves the speed of obtaining the object information of the at least one object.
Fig. 3 is a schematic flow chart of an implementation of another behavior identification method provided in an embodiment of the present disclosure, and as shown in fig. 3, the method is applied to a behavior identification device, and the method includes:
S301, acquiring an image frame sequence.
S302, tracking at least one object detected in the image frame sequence to obtain tracking information of each object.
S303, performing behavior recognition on the at least one object based on the tracking information of each object to obtain the identifier of the behavior event in the image frame sequence.
S304, detecting object information of at least one object participating in the behavior event in the image frame sequence.
Wherein the object information of the at least one object comprises a face identifier and a body identifier of the at least one object.
S305, the face identification and the human body identification belonging to the same object in the face identification and the human body identification of the at least one object are associated to obtain an association result.
In some embodiments, S305 may be implemented by: determining a position relationship between each face and each human body participating in the behavioral event in each image of the image frame sequence; and associating the face identifier and the human body identifier belonging to the same object in the face identifier and the human body identifier of the at least one object based on the position relationship to obtain an association result.
The behavior recognition device may determine the position information, in each image, of the face frame of each face participating in the behavior event, and the position information, in each image, of the body frame of each human body participating in the behavior event, and then determine the positional relationship between each face and each human body participating in the behavior event based on the two sets of position information.
By the method, the face identification and the human body identification belonging to the same object are associated through the determined position relationship between each face and each human body participating in the behavior event, so that an association scheme of the face and the human body is provided, and the associated face and the associated human body can be accurately determined.
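One possible positional relationship is containment of the face frame within the body frame (an assumption consistent with the embodiments further below); a sketch under that assumption:

```python
def contains(body_box, face_box):
    """True if the face box lies inside the body box (one possible position relation)."""
    bx1, by1, bx2, by2 = body_box
    fx1, fy1, fx2, fy2 = face_box
    return bx1 <= fx1 and by1 <= fy1 and fx2 <= bx2 and fy2 <= by2

def associate_faces_with_bodies(faces, bodies):
    """faces / bodies: dicts mapping identifier -> box (x1, y1, x2, y2).
    Each face is matched to at most one body whose box contains it."""
    result, used = [], set()
    for lf, face_box in faces.items():
        for lp, body_box in bodies.items():
            if lp not in used and contains(body_box, face_box):
                result.append((lf, lp))          # the {lf_i, lp_j} association
                used.add(lp)
                break
    return result
```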
In some embodiments, the associating the face identifier and the body identifier belonging to the same object in the face identifier and the body identifier of the at least one object to obtain an association result may include: acquiring a face identifier and a body identifier belonging to the same object from the face identifiers and the body identifiers of the at least one object; assigning the same label value to the face identifier and the body identifier belonging to the same object, wherein the face identifiers of different objects correspond to different label values, and the body identifiers of different objects correspond to different label values; and associating the face identifier and the body identifier corresponding to the same label value to obtain the association result.
In some embodiments, the object information of the object may include a tag value of a human face and/or a tag value of a human body. The label value of the face belonging to the same person is the same as the label value of the human body, or the label value of the face belonging to the same person has a mapping relation with the label value of the human body.
In some embodiments, the identifier of the behavior event, the identifier of the at least one face, and the label value of the at least one face may be determined as the behavior recognition result. In other embodiments, the identifier of the behavior event, the identifier of the at least one human body, and the label value of the at least one human body may be determined as the behavior recognition result. In still other embodiments, the identifier of the behavior event, the identifier of the at least one face, the label value of the at least one face, the identifier of the at least one human body, and the label value of the at least one human body may be determined as the behavior recognition result.
In this way, the face identifier and the body identifier belonging to the same object are obtained, the same label value is corresponding to the face identifier and the body identifier belonging to the same object, and the face identifier and the body identifier corresponding to the same label value are associated to obtain an association result, so that the face identifier and the body identifier belonging to the same object can be easily associated.
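A sketch of the label-value scheme described above (non-zero values, one value per object; the dict layout is an assumption):

```python
def assign_label_values(pairs):
    """pairs: (face_id, body_id) tuples belonging to the same object.
    Each object gets one non-zero label value shared by its face and body
    identifiers; different objects get different label values."""
    face_labels, body_labels = {}, {}
    for label_value, (lf, lp) in enumerate(pairs, start=1):
        face_labels[lf] = label_value
        body_labels[lp] = label_value
    return face_labels, body_labels

def association_from_labels(face_labels, body_labels):
    """Associate the face identifier and body identifier sharing a label value."""
    face_by_label = {v: lf for lf, v in face_labels.items()}
    return [(face_by_label[v], lp)
            for lp, v in body_labels.items() if v in face_by_label]
```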
S306, correlating the identification of the behavior event with the correlation result to obtain the behavior recognition result.
In some embodiments, the association information may be obtained by associating the identifier of the behavior event with the association result.
In some embodiments, the behavior recognition device may obtain the face information corresponding to each face identifier and determine whether a face label value exists in that face information, or determine that the face has an associated human body when the face label value in the face information is not zero, so that the face identifier and the body identifier belonging to the same object can be associated to obtain an association result. In other embodiments, the behavior recognition device may determine whether the face frame corresponding to each face is contained within another frame; if so, it determines that the face has an associated human body, namely the human body in that other frame, so that the face identifier and the body identifier belonging to the same object can be associated to obtain an association result; if not, it determines that the face has no associated human body. One scenario in which a face has no associated human body is that the human body is occluded, so the human body corresponding to the face cannot be identified.
For example, suppose the association information includes {le, lf_1}, {le, lf_2}, {le, lf_3}, and the face identifiers determined to have associated human bodies are lf_1 and lf_3. If the body identifier associated with the face identifier lf_1 is determined to be lp_1, and the body identifier associated with the face identifier lf_3 is determined to be lp_2, then lf_1 is associated with lp_1 to obtain {lf_1, lp_1}, and lf_3 is associated with lp_2 to obtain {lf_3, lp_2}.
In some embodiments, the association result may be included in a face association set associations2, where associations2 = {{le, lf_i}, {lf_i, lp_j}}, and {lf_i, lp_j} characterizes that the face identifier lf_i is associated with the body identifier lp_j. In some embodiments, the association result may be associations2 or {lf_i, lp_j}.
In some embodiments, the behavior recognition device may obtain the human body information corresponding to each body identifier and determine whether a body label value exists in that body information, or determine that the human body has an associated face when the body label value in the body information is not zero, so that the face identifier and the body identifier belonging to the same object can be associated to obtain an association result. In other embodiments, the behavior recognition device may determine whether the body frame corresponding to each human body contains another frame; if so, it determines that the human body has an associated face, namely the face in that other frame, so that the face identifier and the body identifier belonging to the same object can be associated to obtain an association result; if not, it determines that the human body has no associated face. One scenario in which a human body has no associated face is that the face is occluded, so the face corresponding to the human body cannot be identified.
For example, suppose the association information includes {le, lp_1}, {le, lp_2}, {le, lp_3}, and the body identifiers determined to have associated faces are lp_1 and lp_2. If the face identifier associated with the body identifier lp_1 is determined to be lf_1, and the face identifier associated with the body identifier lp_2 is determined to be lf_3, then lp_1 is associated with lf_1 to obtain {lf_1, lp_1}, and lp_2 is associated with lf_3 to obtain {lp_2, lf_3}.
In some embodiments, the association result may be included in a human body association set associations3, where associations3 = {{le, lp_j}, {lp_j, lf_i}}, and {lp_j, lf_i} characterizes that the body identifier lp_j is associated with the face identifier lf_i. In some embodiments, the association result may be associations3 or {lp_j, lf_i}.
In the embodiment of the disclosure, the face identifier and the body identifier belonging to the same object are associated to obtain an association result, and then the identifier of the behavior event and the association result are associated to obtain a behavior recognition result, so that the face and the body participating in the behavior event can be obtained from the behavior recognition result, and the information retrieval is more comprehensive.
Fig. 4 is a schematic flow chart illustrating an implementation of another behavior recognition method provided in an embodiment of the present disclosure. As shown in Fig. 4, in this embodiment the object information includes: face feature information and/or face attribute information. The method is applied to a behavior recognition device and comprises the following steps:
S401, acquiring an image frame sequence.
S402, tracking at least one object detected in the image frame sequence to obtain tracking information of each object.
S403, performing behavior recognition on the at least one object based on the tracking information of each object to obtain the identifier of the behavior event in the image frame sequence.
S404, detecting the face feature information of the at least one object participating in the behavior event in the image frame sequence.
S405, determining face attribute information of the at least one object based on the face feature information of the at least one object.
S406, determining identity information of the at least one object based on the face feature information and/or the face attribute information of the at least one object.
In some embodiments, in the case that the identity information of the at least one object is determined, the identity information of the at least one object may be output, so that the relevant person can quickly determine the identity of the object participating in the behavioral event based on the identity information of the at least one object.
S407, associating the identifier of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
In some embodiments, at least one of face location information, face feature information, face attribute information, and an identification of at least one face is included in the object information of the object. The face location information may be: face position information of at least one face in each image of the sequence of image frames.
The face position information of the at least one face in each image may be position information of a face frame of the at least one face in each image. For example, the position information of the face frame of at least one face in each image may include: and the position information of the upper left corner and the lower right corner of the face frame of at least one face in each image.
By determining the face feature information of at least one face, the feature information of each face in the at least one face may be matched with one or more images in the face library to determine at least one of a name, a school, a class, a work unit, an identity, attribute information of a family member, and the like of a person to which each face in the at least one face belongs. The attribute information of the family may include at least one of a name, a gender, a relationship with an object participating in the behavioral event, a contact address, and the like of the family.
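A sketch of matching a face feature against a face library; cosine similarity and the threshold are assumptions, since the disclosure only states that features are matched against one or more library images:

```python
import numpy as np

def match_identity(face_feature, face_library, threshold=0.6):
    """Match one face feature vector against a face library.

    face_library: dict mapping identity info (e.g. a name) -> feature vector.
    Returns the best-matching identity, or None when nothing is similar enough.
    """
    best_name, best_score = None, threshold
    q = face_feature / np.linalg.norm(face_feature)
    for name, ref in face_library.items():
        score = float(q @ (ref / np.linalg.norm(ref)))   # cosine similarity
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```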
The attribute information of the face may include at least one of the following: whether glasses are worn, whether a mask is worn, age, gender, attribute information of the facial features, and the like.
In the embodiment of the disclosure, the identity information of at least one object is determined based on the face feature information and/or the face attribute information of at least one object, so that the identity information of the object participating in the behavior event can be determined through the face information of the object, and further the object participating in the behavior event can be subjected to post-incident tracing.
Fig. 5 is a schematic flow chart illustrating an implementation of a behavior recognition method according to another embodiment of the present disclosure. As shown in Fig. 5, in this embodiment the object information includes: human body feature information and/or human body attribute information. The method is applied to a behavior recognition device and comprises the following steps:
S501, acquiring an image frame sequence.
S502, tracking at least one object detected in the image frame sequence to obtain tracking information of each object.
S503, performing behavior recognition on the at least one object based on the tracking information of each object to obtain the identifier of the behavior event in the image frame sequence.
S504, detecting human body feature information of the at least one object participating in the behavior event in the image frame sequence.
And S505, determining human body attribute information of the at least one object based on the human body feature information of the at least one object.
S506, determining identity information of the at least one object based on the human body characteristic information and/or the human body attribute information of the at least one object.
S507, associating the identification of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
In some embodiments, at least one of the human body position information, the human body feature information, the human body attribute information, the identification of the at least one human body is included in the object information of the object. The human body position information may be: and human body position information of at least one human body in each image of the image frame sequence.
The human body position information of the at least one human body in each image may be position information of a human body frame of the at least one human body in each image. For example, the position information of the body frame of the at least one body in each image may include: and position information of the upper left corner and the lower right corner of the human body frame of at least one human body in each image.
By determining the human body feature information of at least one human body, the feature information of each human body in the at least one human body can be matched with one or more images in the human body library, and at least one of the name, school, class, work unit, identity, attribute information of family members and the like of the person to which each human body in the at least one human body belongs is determined.
The attribute information of the human body may include at least one of the following: height, body type, weight, clothing information, and the like.
In the embodiment of the disclosure, the identity information of at least one object is determined based on the human body characteristic information and/or the human body attribute information of at least one object, so that the identity information of the object participating in the behavior event can be determined through the human body information of the object, and further the object participating in the behavior event can be subjected to post-incident tracing.
Fig. 6 is a schematic flow chart of an implementation of a behavior recognition method according to another embodiment of the present disclosure, as shown in fig. 6, the method is applied to a behavior recognition apparatus, and the method includes:
S601, acquiring an image frame sequence.
S602, tracking at least one object detected in the image frame sequence to obtain tracking information of each object.
S603, performing behavior recognition on the at least one object based on the tracking information of each object to obtain the identifier of the behavior event in the image frame sequence.
S604, detecting object information of at least one object participating in the behavior event in the image frame sequence.
S605, associating the identification of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
And S606, outputting alarm information under the condition that the behavior event is a preset event.
The preset event may be a predetermined dangerous event; for example, the preset event may include at least one of: a fighting event, an event of riding an electric vehicle without wearing a helmet, a speeding event, and an event of not driving according to regulations.
In some embodiments, outputting the alert information may include: outputting alarm information to a designated device, wherein the designated device comprises at least one of the following: the system comprises a display device, a terminal device of a worker in an area where the action event occurs, and a terminal device of a family member of an object participating in the action event.
Wherein the alarm information includes at least one of the following shown in the image frame sequence: the event region where the behavior event occurs, the spatio-temporal information of the behavior event, the face frame of an object participating in the behavior event, the body frame of an object participating in the behavior event, the face image in the face frame, the body image in the body frame, the attribute information of the face and/or body of an object participating in the behavior event, and the attribute information of family members associated with an object participating in the behavior event.
The alarm information comprises at least one of a face frame of an object participating in the behavior event, a body frame of the object participating in the behavior event and an event area where the behavior event occurs, so that at least one of the face frame, the body frame and the event area where the behavior event occurs can be correspondingly displayed on the displayed real-time video, and a worker can easily know that the behavior event occurs through the displayed real-time video.
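A sketch of assembling the alarm information above into one message for the designated devices (the field names are illustrative, not mandated by the disclosure):

```python
def build_alert(event_id, event_region, spatio_temporal, faces, bodies, attrs):
    """Assemble the alarm information for output to the designated devices."""
    return {
        "event_id": event_id,
        "event_region": event_region,        # region where the behavior event occurs
        "spatio_temporal": spatio_temporal,  # time and place of the behavior event
        "face_frames": faces,                # face frames and face images
        "body_frames": bodies,               # body frames and body images
        "attributes": attrs,                 # face/body attributes, family attributes
    }
```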
In the embodiment of the disclosure, when the behavior event is a preset event, the alarm information may be output, so that a worker can determine that the preset event exists, and further can process the preset event in time.
In some embodiments, the camera collects a video stream and sends it to the server, so that the server performs the steps of the behavior recognition method provided by the embodiments of the present disclosure; when the server determines that a behavior event occurs in a certain image frame sequence, it may display an alarm of the behavior event through an alarm page. In the case that a behavior event occurs in a plurality of consecutive image frame sequences, an alarm may be output. In some embodiments, an alarm is output once after the event duration reaches a first preset duration, together with the detection results of a plurality of key frames spaced by a second preset duration.
The behavior recognition method provided by the embodiments of the present disclosure can be realized by the following modules: a video decoding module, a target detection and tracking module, a fighting detection module, a face-body matching module, a feature and attribute extraction module, an event output module, and an alarm display module. The modules may be provided on one device or distributed across different devices.
The video decoding module is used for decoding the accessed video stream. The target detection and tracking module is used for detecting and tracking pedestrians in the video stream. The fighting detection module is configured to input the image frame sequence into a human body posture estimation model (i.e., the deep neural network described above) to obtain an estimate of the behavior event. The face and human body detection module is used for performing face and human body detection on pedestrians in the region where fighting occurs. The face and human body matching module is used for matching the detected faces with the detected human bodies. The feature and attribute extraction module is used for extracting feature information and attribute information of faces and human bodies. The event output module is used for outputting the event occurrence region, the event, and the face information of objects participating in the event to the alarm display module. The alarm display module is used for displaying alarms.
The target detection and tracking module detects pedestrians appearing in the video using a human body detection algorithm. The embodiment of the present disclosure uses an image detection algorithm based on deep learning, which has a higher detection rate than traditional algorithms and can cope with human bodies in complex environments, and the detected human body objects are tracked.
The fighting detection module is used for inputting the detected human bodies into the human body posture estimation model, taking the sequence of human body detections in the video as input, and outputting an estimation result of a fighting event at intervals, which fully takes into account the temporal and spatial information of the video and improves the reliability of the output results.
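For illustration, the buffering and interval-based estimation could look like the following Python sketch; the model interface (model.predict returning a fighting probability) and the window and interval values are assumptions, not the disclosed network.

    from collections import deque

    class FightingDetector:
        # Buffer per-frame human detections into a temporal sequence and
        # query a posture-estimation-based classifier every `interval` frames.
        def __init__(self, model, window=16, interval=8, threshold=0.5):
            self.model = model            # hypothetical sequence classifier
            self.buffer = deque(maxlen=window)
            self.interval = interval
            self.threshold = threshold
            self.frame_count = 0

        def push(self, human_crops):
            # human_crops: list of per-person image crops for one frame
            self.buffer.append(human_crops)
            self.frame_count += 1
            if self.frame_count % self.interval != 0 or len(self.buffer) < self.buffer.maxlen:
                return None  # no estimate yet
            score = self.model.predict(list(self.buffer))  # assumed to return a probability
            return score >= self.threshold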
The face and human body matching module is used for associating detected faces and human bodies that belong to the same object through spatial geometric relationships.
The event output module is used for organizing information such as face features and attributes of objects participating in the fighting event and outputting it to a designated location for downstream storage or later retrieval.
The alarm display module is used for displaying the region where fighting occurs on a World Wide Web (Web) page.
The face and human body matching module finds the correspondence between the face set and the human body set of the pedestrians in the region where fighting occurs. For a face set {f1, f2, ..., fn} and a human body set {p1, p2, ..., pm}, each item in the face set has 0 or 1 corresponding element in the human body set. For each element in the face set, such as fi, a unique mark is generated, denoted lfi; likewise, for each element in the human body set, such as pj, a unique mark is generated, denoted lpj. If for a face fi there is a corresponding human body pj, a new mark mfi is added to fi with mfi = lpj; similarly, a new mark mpj is added to pj with mpj = lfi.
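The spatial geometric relationship used for matching is described only at a high level above; the Python sketch below is one possible reading, assuming a simple criterion in which a face box must lie within the upper half of a body box, with at most one body per face, and showing how the unique marks lfi, lpj and the cross marks mfi, mpj could be assigned.

    import uuid

    def face_inside_body(body, face):
        # Assumed criterion: the face box lies inside the upper half of the
        # body box. Boxes are (x1, y1, x2, y2) pixel coordinates.
        bx1, by1, bx2, by2 = body
        fx1, fy1, fx2, fy2 = face
        upper_limit = by1 + 0.5 * (by2 - by1)
        return bx1 <= fx1 and fx2 <= bx2 and by1 <= fy1 and fy2 <= upper_limit

    def match_faces_to_bodies(faces, bodies):
        # Assign unique marks lf/lp to every face and body; for each matched
        # pair add the cross marks mf_i = lp_j and mp_j = lf_i.
        face_recs = [{"box": f, "lf": uuid.uuid4().hex} for f in faces]
        body_recs = [{"box": p, "lp": uuid.uuid4().hex} for p in bodies]
        used = set()
        for fr in face_recs:
            for j, pr in enumerate(body_recs):
                if j not in used and face_inside_body(pr["box"], fr["box"]):
                    fr["mf"] = pr["lp"]  # mf_i = lp_j
                    pr["mp"] = fr["lf"]  # mp_j = lf_i
                    used.add(j)
                    break
        return face_recs, body_recs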
The face and human body attribute detection module is used for extracting attributes and features of the detected face set and human body set. For each element of the face set {f1, f2, ..., fn}, such as fi, the corresponding feature fei, the face attribute fai, and other basic information are extracted, forming a new set {F1, F2, ..., Fn}, where Fi = {fi, fei, fai, lfi, mfi} (mfi included if present). In the same way a set {P1, P2, ..., Pm} is formed, where Pi = {pi, pei, pai, lpi, mpi} (mpi included if present).
The event output module is used for packaging an event as a whole. When the fighting detection module detects a fighting event, an event E is generated, and a unique mark, denoted le, is generated for it. When the event is output, association items are set containing all faces and human bodies related to the event, i.e., the association items in event E are {{le, lfi}, {le, lpj}}. For a face participating in the event, such as Fi, association items are set containing the event itself and the corresponding human body, i.e., the association items in Fi are {{le, lfi}, {lfi, lpj}}; likewise for a human body Pj, the association items in Pj are {{le, lpj}, {lpj, lfi}}. The association of the event with the faces and human bodies is thus completed, and the faces, human bodies, and events are output and stored separately. Indexing between the event and the faces and human bodies can then be completed according to the association relationships.
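Continuing the sketch above (illustrative only; the dict-and-set representation is an assumption, not the disclosed storage format), the event packaging and association items could be built as follows:

    import uuid

    def package_event(face_recs, body_recs):
        # Generate a unique mark le for a detected event E and build the
        # association items {le, lf_i} and {le, lp_j}, plus the per-face and
        # per-body items {lf_i, lp_j} / {lp_j, lf_i} for matched pairs.
        le = uuid.uuid4().hex
        event = {"le": le, "associations": []}
        for fr in face_recs:
            event["associations"].append({le, fr["lf"]})
            fr.setdefault("associations", []).append({le, fr["lf"]})
            if "mf" in fr:
                fr["associations"].append({fr["lf"], fr["mf"]})
        for pr in body_recs:
            event["associations"].append({le, pr["lp"]})
            pr.setdefault("associations", []).append({le, pr["lp"]})
            if "mp" in pr:
                pr["associations"].append({pr["lp"], pr["mp"]})
        return event  # event, faces, and bodies can then be stored separately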
Pedestrian trajectories are detected by deep learning, and pedestrian behavior is modeled and analyzed using the image frame sequence, so that pedestrians in the video can be analyzed accurately and accurate behavior prediction results can be output.
The embodiment of the present disclosure outputs the entire fighting event and completes a closed loop for the whole event. In the behavior recognition method, tracking is started when participants appear, behavior recognition is started, the fighting event is output when it occurs, basic information of the participants in the fighting event is output, and the information is finally stored for use by workers.
The solution provided by the embodiment of the present disclosure can be applied to various places where cameras exist, including but not limited to traffic venues, shopping malls, stations, schools, and squares.
The embodiment of the present disclosure may further provide a behavior recognition method, applied to a behavior recognition device, where after the behavior recognition result is obtained, the method may further include: acquiring field information to be retrieved; and acquiring, from a behavior recognition result set, the identifier of a target behavior event corresponding to the field information to be retrieved and/or object information of an object participating in the target behavior event. In some embodiments, the identifier of the target behavior event and/or the object information of the object participating in the target behavior event may also be output.
In some embodiments, the behavior recognition device may include a display screen through which a worker inputs the field information to be retrieved, so that the behavior recognition device obtains the field information to be retrieved. In other embodiments, the worker inputs the field information to be retrieved through a terminal device different from the behavior recognition device, and the terminal device sends the field information to be retrieved to the behavior recognition device, so that the behavior recognition device obtains it.
The field information to be retrieved includes: time period information to be retrieved and address information to be retrieved.

In some embodiments, obtaining the field information to be retrieved may be implemented as follows: in response to input of the time period information to be retrieved and the address information to be retrieved, the time period information to be retrieved and the address information to be retrieved are obtained.

The time period information may be a single continuous time period, or at least two discontinuous time periods.

In this embodiment, the worker may directly input the time period information to be retrieved and the address information to be retrieved on the display screen of the behavior recognition device or the display screen of the terminal device, so that the behavior recognition device obtains them. The address information may be, for example, a particular road, or an intersection of two roads.
In other embodiments, obtaining the field information to be retrieved may be implemented as follows: in response to input of the time period information to be retrieved, outputting at least one piece of address information matching the time period information to be retrieved; and in response to a triggering operation on the address information to be retrieved among the at least one piece of address information, obtaining the address information to be retrieved.
In this embodiment, after the worker inputs the time period information to be retrieved on the display screen of the behavior recognition device or of the terminal device, at least one piece of address information of behavior events that occurred within the time period to be retrieved may be popped up on the display screen, and the worker may select the address information to be retrieved from the displayed address information.
The address information to be retrieved in the embodiments of the present disclosure may be one or more pieces of address information. In some embodiments, the at least one piece of address information may also be related to the authority of the account with which the worker logs in.
In this way, the worker can first input the time period information to be retrieved, determine from it at which addresses behavior events occurred, select the address information of interest from the displayed address information, and then view the object information of objects participating in events at the address to be retrieved.
In still other embodiments, obtaining the field information to be retrieved may be implemented as follows: in response to input of the address information to be retrieved, outputting at least one piece of time period information matching the address information to be retrieved; and in response to a triggering operation on the time period information to be retrieved among the at least one piece of time period information, obtaining the time period information to be retrieved.
In this way, the identifier of the target behavior event acquired by the behavior recognition device is related not only to the time period information to be retrieved that is of interest to the worker but also to the address information to be retrieved that is of interest to the worker, so that the worker can easily retrieve, from a large amount of information, the object information of objects participating in behavior events within the time period and at the address of interest.
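The two interaction orders above (time period first, or address first) both reduce to filtering the stored result set. Below is a minimal sketch, assuming each stored result is a dict with event_id, time, address, and objects fields; this schema is hypothetical and only for illustration.

    def retrieve_events(results, period=None, address=None):
        # Filter behavior recognition results by a (start, end) time period
        # and/or an address string.
        hits = []
        for r in results:
            if period is not None and not (period[0] <= r["time"] <= period[1]):
                continue
            if address is not None and r["address"] != address:
                continue
            hits.append(r)
        return hits

    def addresses_for_period(results, period):
        # Given a time period, list the addresses where behavior events
        # occurred, for the worker to choose from.
        return sorted({r["address"] for r in retrieve_events(results, period=period)})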
The behavior recognition result set may correspond to a plurality of image frame sequences. The behavior recognition result set may include behavior recognition results of at least one image frame sequence, the behavior recognition result of each image frame sequence including: the identifier of a behavior event occurring in the image frame sequence and object information of at least one object participating in the behavior event.
It should be noted that the identifiers of different behavior events may all identify fighting events; that is, different identifiers may correspond to different occurrences of the same type of event.
The identifiers of the at least one behavior event may correspond one-to-one to the at least one image frame sequence. There may be one or more identifiers of target behavior events. For example, in a case where the duration corresponding to the input time period information is one hour and the set duration of each image frame sequence is one minute, 60 identifiers of target behavior events are obtained.
In the embodiment of the present disclosure, by acquiring the identifier of the target behavior event corresponding to the field information to be retrieved and/or the object information of objects participating in the target behavior event, the worker can easily obtain the behavior event or object information of interest corresponding to the field information to be retrieved.
In some embodiments, the object information of objects participating in the target behavior event includes: face information corresponding to the identifiers of all faces associated with the identifier of the target behavior event, and/or human body information corresponding to the identifiers of all human bodies associated with the identifier of the target behavior event. In some embodiments, after obtaining the behavior recognition result, the method may further include: acquiring field information to be retrieved; acquiring the identifier of a target behavior event corresponding to the field information to be retrieved from the behavior recognition result set; acquiring the identifiers of all faces and/or all human bodies associated with the identifier of the target behavior event from a plurality of pieces of association information stored in the behavior recognition result set; acquiring face information corresponding to the identifiers of all faces and/or human body information corresponding to the identifiers of all human bodies from the object information in the behavior recognition result set; and outputting the identifier of the target behavior event and/or the object information of objects participating in the target behavior event. The plurality of pieces of association information may include association information between the identifier of each event and the identifiers of objects; for example, the plurality of pieces of association information may include: {le1, lfi}, {le1, lpj}, {le2, lfi}, {le2, lpj}, and so on. Here, {le1, lfi} represents an association between the identifier le1 of behavior event 1 and the face identifier lfi; {le1, lpj} represents an association between the identifier le1 of behavior event 1 and the human body identifier lpj; {le2, lfi} represents an association between the identifier le2 of behavior event 2 and the face identifier lfi; and {le2, lpj} represents an association between the identifier le2 of behavior event 2 and the human body identifier lpj. Since each piece of association information includes an association between the identifier of a behavior event and the identifier of an object, the identifiers of all faces and/or all human bodies associated with the identifier of the target behavior event can be obtained based on the plurality of pieces of association information.
By storing the object information of the at least one object in the behavior recognition device or a storage device, the face information corresponding to the identifiers of all faces and the human body information corresponding to the identifiers of all human bodies can be obtained from the identifiers of all faces and/or all human bodies.
In the embodiment of the present disclosure, the stored plurality of pieces of association information are used to obtain the identifiers of all faces and/or all human bodies associated with the identifier of the target behavior event, so that the face information corresponding to the identifiers of all faces and/or the human body information corresponding to the identifiers of all human bodies can be obtained easily.
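A minimal sketch of this lookup follows, assuming the association information is stored as two-element sets such as {le1, lfi} and the object information as dictionaries keyed by identifier; both representations are assumptions made for illustration.

    def objects_for_event(event_id, associations, face_info, body_info):
        # Collect all face and body identifiers linked to a target event,
        # then resolve them to the stored object information.
        face_ids, body_ids = [], []
        for pair in associations:  # each pair is a set like {le, lf_i}
            if event_id in pair:
                other = next(x for x in pair if x != event_id)
                if other in face_info:
                    face_ids.append(other)
                elif other in body_info:
                    body_ids.append(other)
        faces = [face_info[i] for i in face_ids]
        bodies = [body_info[i] for i in body_ids]
        return faces, bodies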
In some embodiments, the object information of objects participating in the target behavior event includes: human body information corresponding to the identifiers of the human bodies associated with specified faces among all the faces. In some embodiments, after obtaining the behavior recognition result, the method may further include: acquiring field information to be retrieved; acquiring the identifier of a target behavior event corresponding to the field information to be retrieved from the behavior recognition result set; acquiring the identifiers of all faces associated with the identifier of the target behavior event from the plurality of pieces of association information stored in the behavior recognition result set; acquiring face information corresponding to the identifiers of all faces from the object information in the behavior recognition result set; acquiring the identifier of the human body associated with a specified face among all the faces from the association results stored in the behavior recognition result set, the specified face being a face that has an associated human body; acquiring human body information corresponding to the identifier of the associated human body from the object information stored in the behavior recognition result set; and outputting the identifier of the target behavior event and/or the object information of objects participating in the target behavior event.
In other embodiments, after obtaining the human body information corresponding to the identifiers of all human bodies from the object information stored in the behavior recognition result set, the following steps may also be performed: acquiring the identifier of the face associated with a specified human body among all the human bodies from the association results stored in the behavior recognition result set, the specified human body being a human body that has an associated face; acquiring face information corresponding to the identifier of the associated face from the object information stored in the behavior recognition result set; and outputting the identifier of the target behavior event and/or the object information of objects participating in the target behavior event.
The association result may include the identifier of the human body associated with the identifier of any face; for example, the association result may include {{le, lfi}, {lfi, lpj}} or {lfi, lpj}, and thus the identifier of the human body associated with a specified face can be determined based on the association result.
In the embodiment of the present disclosure, not only the face information corresponding to the identifiers of all faces but also the human body information corresponding to the identifiers of the associated human bodies is obtained, so that the worker can learn not only the face information of objects participating in the behavior event but also the human body information matched with those faces, and thus can learn about the objects participating in the behavior event more comprehensively.
The association result may include the identifier of the face associated with the identifier of any human body; for example, the association result may include {{le, lpj}, {lpj, lfi}} or {lpj, lfi}, and thus the identifier of the face associated with a specified human body can be determined based on the association result.
In the embodiment of the present disclosure, not only the human body information corresponding to the identifiers of all human bodies but also the face information corresponding to the identifiers of the associated faces is obtained, so that the worker can learn not only the human body information of objects participating in the behavior event but also the face information matched with those human bodies, and thus can learn about the objects participating in the behavior event more comprehensively.
Based on the foregoing embodiments, the present disclosure provides a behavior recognition apparatus, which includes modules that can be implemented by a processor in a terminal device.
Fig. 7 is a schematic structural diagram of a component of a behavior recognition apparatus according to an embodiment of the present disclosure, and as shown in fig. 7, a behavior recognition apparatus 700 includes: an obtaining module 701, configured to obtain a sequence of image frames; a tracking module 702, configured to track at least one object detected in the image frame sequence to obtain tracking information of each object; a recognition module 703, configured to perform behavior recognition on the at least one object based on the tracking information of each object, so as to obtain an identifier of a behavior event in the image frame sequence; a detecting module 704 for detecting object information of at least one object in the sequence of image frames participating in the behavior event; the associating module 705 is configured to associate the identifier of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
In some embodiments, the detecting module 704 is further configured to determine an event region in each image of the image frame sequence where the behavior event exists; determining object information of the at least one object from an event region in which the behavior event exists in each image.
In some embodiments, the object information of the at least one object includes a face identification and a body identification of the at least one object; the association module 705 is further configured to associate the face identifier and the human body identifier belonging to the same object among the face identifiers and the human body identifiers of the at least one object, so as to obtain an association result; and associating the identifier of the behavior event with the association result to obtain the behavior identification result.
In some embodiments, the association module 705 is further configured to determine, in each image of the sequence of image frames, a positional relationship between each face and each human body participating in the behavioral event; and associating the face identifier and the human body identifier belonging to the same object in the face identifier and the human body identifier of the at least one object based on the position relationship to obtain an association result.
In some embodiments, the association module 705 is further configured to obtain, from the face identifiers and the human body identifiers of the at least one object, a face identifier and a human body identifier that belong to the same object; correspond the same label value to the face identifier and the human body identifier belonging to the same object, wherein the face identifiers of different objects correspond to different label values and the human body identifiers of different objects correspond to different label values; and associate the face identifier and the human body identifier corresponding to the same label value to obtain the association result.
In some embodiments, the object information includes: face feature information and/or face attribute information; the detecting module 704 is further configured to detect face feature information of the at least one object participating in the behavior event in the image frame sequence, and determine face attribute information of the at least one object based on the face feature information of the at least one object; the behavior recognition apparatus 700 further includes: a determining module 706, configured to determine identity information of the at least one object based on the face feature information and/or the face attribute information of the at least one object.
In some embodiments, the object information includes: human body feature information and/or human body attribute information; the detecting module 704 is further configured to detect human body feature information of the at least one object participating in the behavior event in the image frame sequence, and determine human body attribute information of the at least one object based on the human body feature information of the at least one object; the determining module 706 is further configured to determine identity information of the at least one object based on the human body feature information and/or the human body attribute information of the at least one object.
In some embodiments, the behavior recognition apparatus 700 further includes: an output module 707, configured to output alarm information when the behavior event is a preset event; the alarm information includes at least one of the following shown in the image frame sequence: the event region where the behavior event occurs, the spatio-temporal information of the behavior event, the face frame of an object participating in the behavior event, the body frame of an object participating in the behavior event, the face image in the face frame, the body image in the body frame, the attribute information of the face and/or body of an object participating in the behavior event, and the attribute information of a family member associated with an object participating in the behavior event.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present disclosure, reference is made to the description of the embodiments of the method of the present disclosure.
It should be noted that, in the embodiment of the present disclosure, if the behavior recognition method is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer storage medium. Based on such understanding, the technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a terminal device to execute all or part of the methods of the embodiments of the present disclosure.
Fig. 8 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present disclosure, and as shown in fig. 8, the hardware entity of the electronic device 800 includes: a processor 801 and a memory 802, wherein the memory 802 stores a computer program operable on the processor 801, and the processor 801 executes the program to implement the steps of the method of any of the above embodiments. The electronic device 800 may include the behavior recognition apparatus described above.
The Memory 802 stores a computer program that can be executed on the processor, and the Memory 802 is configured to store instructions and applications that can be executed by the processor 801, and can also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 801 and modules in the electronic device 800, and can be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM). The processor 801, when executing the program, implements the steps of any of the behavior recognition methods described above. The processor 801 generally controls the overall operation of the electronic device 800.
The embodiments of the present disclosure provide a computer storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the behavior recognition method according to any one of the above embodiments.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present disclosure, reference is made to the description of the embodiments of the method of the present disclosure.
The behavior recognition device, chip or processor may include an integration of any one or more of the following: an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an embedded neural Network Processing Unit (NPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device implementing the above processor function may be other, and the embodiments of the present disclosure are not particularly limited.
The computer storage medium/memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); it may also be any of various terminals including one or any combination of the above-mentioned memories, such as mobile phones, computers, tablet devices, and personal digital assistants.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment of the present disclosure" or "a previous embodiment" or "some implementations" or "some embodiments" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, the appearances of the phrase "in one embodiment" or "in an embodiment" or "the presently disclosed embodiment" or "the foregoing embodiments" or "some implementations" or "some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure. The above-mentioned serial numbers of the embodiments of the present disclosure are merely for description and do not represent the merits of the embodiments.
Unless specifically stated otherwise, when the behavior recognition device executes any step in the embodiments of the present disclosure, the processor of the behavior recognition device may execute the step. Unless otherwise specified, the embodiments of the present disclosure do not limit the order in which the behavior recognition device performs the steps. In addition, the data may be processed in the same way or in different ways in different embodiments. It should be further noted that any step in the embodiments of the present disclosure may be performed by the behavior recognition device independently; that is, when the behavior recognition device performs any step in the above embodiments, it may not depend on the performance of other steps.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the modules is only one logical functional division, and there may be other division ways in actual implementation, such as: multiple modules or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or modules may be electrical, mechanical or other.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units; some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional modules in the embodiments of the present disclosure may be integrated into one processing module, or each module may be separately regarded as one module, or two or more modules may be integrated into one module; the integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
The methods disclosed in the several method embodiments provided in this disclosure may be combined arbitrarily without conflict to arrive at new method embodiments. Features disclosed in several of the product embodiments provided in this disclosure may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in this disclosure may be combined in any combination to arrive at a new method or apparatus embodiment without conflict.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the integrated modules of the present disclosure may be stored in a computer storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a removable storage device, a ROM, a magnetic or optical disk, or other various media that can store program code.
In the embodiments of the present disclosure, the descriptions of the same steps and the same contents in different embodiments may be mutually referred to. In the embodiment of the present disclosure, the term "and" does not affect the order of the steps, for example, the electronic device executes a and executes B, where the electronic device may execute a first and then execute B, or the electronic device executes B first and then execute a, or the electronic device executes B while executing a.
As used in the disclosed embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein is merely one type of association that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
It should be noted that, in the embodiments of the present disclosure, all the steps may be executed or some of the steps may be executed, as long as a complete technical solution can be formed.
The above description is only an embodiment of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and all the changes or substitutions should be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (11)

1. A method of behavior recognition, the method comprising:
acquiring an image frame sequence;
tracking at least one object detected in the image frame sequence to obtain tracking information of each object;
performing behavior recognition on the at least one object based on the tracking information of each object to obtain an identifier of a behavior event in the image frame sequence;
detecting object information of at least one object in the sequence of image frames participating in the behavioral event;
and associating the identification of the behavior event with the object information of the at least one object to obtain a behavior recognition result.
2. The method of claim 1, wherein said detecting object information for at least one object in the sequence of image frames that is associated with the behavioral event comprises:
determining an event region in each image of the sequence of image frames in which the behavioral event exists;
determining object information of the at least one object from an event region in which the behavior event exists in each image.
3. The method according to claim 1 or 2, wherein the object information of the at least one object comprises a face identification and a body identification of the at least one object; associating the identifier of the behavior event with the object information of the at least one object to obtain a behavior recognition result, including:
the face identification and the human body identification belonging to the same object in the face identification and the human body identification of the at least one object are associated to obtain an association result;
and associating the identifier of the behavior event with the association result to obtain the behavior identification result.
4. The method according to claim 3, wherein the associating the face identifier and the body identifier belonging to the same object, among the face identifiers and the body identifiers of the at least one object, to obtain an association result comprises:
determining a position relationship between each face and each human body participating in the behavioral event in each image of the image frame sequence;
and associating the face identifier and the human body identifier belonging to the same object in the face identifier and the human body identifier of the at least one object based on the position relationship to obtain an association result.
5. The method according to claim 3 or 4, wherein the associating the face identifier and the body identifier belonging to the same object in the face identifier and the body identifier of the at least one object to obtain an association result comprises:
acquiring a face identifier and a human body identifier belonging to the same object from the face identifier and the human body identifier of the at least one object;
corresponding the same label value to the face identification and the human body identification belonging to the same object; wherein: the face identifications of different objects correspond to different label values, and the body identifications of different objects correspond to different label values;
and associating the face identification and the human body identification corresponding to the same label value to obtain the association result.
6. The method of any of claims 1 to 5, wherein the object information comprises: face feature information and/or face attribute information; the detecting object information of at least one object in the image frame sequence participating in the behavior event includes:
detecting face feature information of the at least one object participating in the behavior event in the image frame sequence;
determining face attribute information of the at least one object based on the face feature information of the at least one object;
the method further comprises the following steps: determining identity information of the at least one object based on the facial feature information and/or the facial attribute information of the at least one object.
7. The method of any of claims 1 to 5, wherein the object information comprises: human body feature information and/or human body attribute information; the detecting object information of at least one object participating in the behavior event in the image frame sequence includes:
detecting human characteristic information of the at least one object participating in the behavior event in the image frame sequence;
determining human body attribute information of the at least one object based on the human body feature information of the at least one object;
the method further comprises the following steps: determining identity information of the at least one object based on the body characteristic information and/or the body attribute information of the at least one object.
8. The method of any of claims 1 to 7, wherein the method further comprises: outputting alarm information under the condition that the behavior event is a preset event;
the alert information includes at least one of the following shown in the sequence of image frames:
the event region where the behavioral event occurs, the spatio-temporal information of the behavioral event, the face frame participating in the behavioral event object, the body frame participating in the behavioral event object, the face image in the face frame, the body image in the body frame, the attribute information of the face and/or body of the object participating in the behavioral event, and the attribute information of the object-associated family participating in the behavioral event.
9. A behavior recognition device, the device comprising:
an obtaining module, configured to obtain a sequence of image frames;
the tracking module is used for tracking at least one object detected in the image frame sequence to obtain tracking information of each object;
the identification module is used for carrying out behavior identification on the at least one object based on the tracking information of each object to obtain the identification of a behavior event in the image frame sequence;
a detection module for detecting object information of at least one object in the sequence of image frames participating in the behavioral event;
and the association module is used for associating the identification of the behavior event with the object information of the at least one object to obtain a behavior identification result.
10. An electronic device, comprising: a memory and a processor, wherein the processor is capable of,
the memory stores a computer program operable on the processor,
the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 8.
11. A computer storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the method of any one of claims 1 to 8.
CN202111109050.XA 2021-09-22 2021-09-22 Behavior recognition method and device, electronic equipment and computer storage medium Pending CN113837066A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111109050.XA CN113837066A (en) 2021-09-22 2021-09-22 Behavior recognition method and device, electronic equipment and computer storage medium
PCT/CN2022/077461 WO2023045239A1 (en) 2021-09-22 2022-02-23 Behavior recognition method and apparatus, device, medium, chip, product, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111109050.XA CN113837066A (en) 2021-09-22 2021-09-22 Behavior recognition method and device, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN113837066A true CN113837066A (en) 2021-12-24

Family

ID=78960367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111109050.XA Pending CN113837066A (en) 2021-09-22 2021-09-22 Behavior recognition method and device, electronic equipment and computer storage medium

Country Status (2)

Country Link
CN (1) CN113837066A (en)
WO (1) WO2023045239A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023045239A1 (en) * 2021-09-22 2023-03-30 上海商汤智能科技有限公司 Behavior recognition method and apparatus, device, medium, chip, product, and program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN111985428A (en) * 2020-08-27 2020-11-24 上海商汤智能科技有限公司 Security detection method and device, electronic equipment and storage medium
CN113111839A (en) * 2021-04-25 2021-07-13 上海商汤智能科技有限公司 Behavior recognition method and device, equipment and storage medium
CN113223046A (en) * 2020-07-10 2021-08-06 浙江大华技术股份有限公司 Method and system for identifying prisoner behaviors

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102317628B1 (en) * 2019-07-04 2021-10-26 (주)넷비젼텔레콤 Method and system for recognizing situation based on event tagging
CN111860430B (en) * 2020-07-30 2023-04-07 浙江大华技术股份有限公司 Identification method and device of fighting behavior, storage medium and electronic device
CN113837066A (en) * 2021-09-22 2021-12-24 深圳市商汤科技有限公司 Behavior recognition method and device, electronic equipment and computer storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018133666A1 (en) * 2017-01-17 2018-07-26 腾讯科技(深圳)有限公司 Method and apparatus for tracking video target
CN113223046A (en) * 2020-07-10 2021-08-06 浙江大华技术股份有限公司 Method and system for identifying prisoner behaviors
CN111985428A (en) * 2020-08-27 2020-11-24 上海商汤智能科技有限公司 Security detection method and device, electronic equipment and storage medium
CN113111839A (en) * 2021-04-25 2021-07-13 上海商汤智能科技有限公司 Behavior recognition method and device, equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023045239A1 (en) * 2021-09-22 2023-03-30 上海商汤智能科技有限公司 Behavior recognition method and apparatus, device, medium, chip, product, and program

Also Published As

Publication number Publication date
WO2023045239A1 (en) 2023-03-30

Similar Documents

Publication Publication Date Title
US11735018B2 (en) Security system with face recognition
CN109271832B (en) People stream analysis method, people stream analysis device, and people stream analysis system
Feng et al. Spatio-temporal fall event detection in complex scenes using attention guided LSTM
Tsai et al. Implementation of fall detection system based on 3D skeleton for deep learning technique
US20190258866A1 (en) Human presence detection in edge devices
US10009579B2 (en) Method and system for counting people using depth sensor
CN109299646B (en) Crowd abnormal event detection method, device, system and storage medium
CN108154075A (en) The population analysis method learnt via single
CN105405150B (en) Anomaly detection method and device based on fusion feature
US10956753B2 (en) Image processing system and image processing method
CN111259751A (en) Video-based human behavior recognition method, device, equipment and storage medium
Bertoni et al. Perceiving humans: from monocular 3d localization to social distancing
US11288954B2 (en) Tracking and alerting traffic management system using IoT for smart city
Himeur et al. Deep visual social distancing monitoring to combat COVID-19: A comprehensive survey
Fei et al. Flow-pose Net: An effective two-stream network for fall detection
An et al. VFP290k: A large-scale benchmark dataset for vision-based fallen person detection
Zheng et al. Lightweight Fall Detection Algorithm Based on AlphaPose Optimization Model and ST‐GCN
Khel et al. Real-time monitoring of COVID-19 SOP in public gathering using deep learning technique
CN113837066A (en) Behavior recognition method and device, electronic equipment and computer storage medium
CN114360182B (en) Intelligent alarm method, device, equipment and storage medium
Fabbri et al. Inter-homines: Distance-based risk estimation for human safety
Huang et al. Enhancing multi-camera people tracking with anchor-guided clustering and spatio-temporal consistency ID re-assignment
JP2021026744A (en) Information processing device, image recognition method, and learning model generation method
Chen A video surveillance system designed to detect multiple falls
CN114913470A (en) Event detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40064036

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20211224

RJ01 Rejection of invention patent application after publication