CN112380971A - Behavior detection method, device and equipment - Google Patents

Behavior detection method, device and equipment

Info

Publication number
CN112380971A
CN112380971A (application CN202011260947.8A)
Authority
CN
China
Prior art keywords
behavior
detected
target
feature
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011260947.8A
Other languages
Chinese (zh)
Other versions
CN112380971B (en)
Inventor
赵飞 (Zhao Fei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202011260947.8A priority Critical patent/CN112380971B/en
Publication of CN112380971A publication Critical patent/CN112380971A/en
Application granted granted Critical
Publication of CN112380971B publication Critical patent/CN112380971B/en
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a behavior detection method, apparatus and device, wherein the method comprises: acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected; selecting a plurality of target images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of target images to be detected; acquiring a plurality of behavior features of the target object according to the slice sequence; for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to a specified type of behavior, and selecting a target cluster center feature from the plurality of cluster center features based on the similarity between the behavior feature and each cluster center feature; and determining, based on the target cluster center features corresponding to the plurality of behavior features, that the specified type of behavior exists or does not exist in the video to be detected. Through the technical solution of the application, the accuracy of video behavior detection is high.

Description

Behavior detection method, device and equipment
Technical Field
The present application relates to the field of video monitoring technologies, and in particular, to a behavior detection method, apparatus, and device.
Background
A video is a sequence of consecutive images. Because of the persistence-of-vision effect of the human eye, when a video is played back at a certain rate, the viewer perceives continuous motion.
Video behavior sequence compliance detection is an intelligent analysis technique for judging whether a target behavior sequence in a video conforms to a specification. The technique can be widely applied in application scenarios such as security monitoring, human-computer interaction, smart parks, smart classrooms, and smart farms. For example, it can detect whether the behavior sequence of an operator in an industrial production process conforms to the standard behavior specification, whether the behavior sequence of a chef conforms to the catering specification, whether an animal feeding action sequence is compliant, whether a student's chemistry experiment operation is compliant, and so on.
In the related art, video behavior sequence compliance detection suffers from problems such as low detection accuracy and complex detection procedures, and a general-purpose video behavior sequence compliance detection technique is lacking.
Disclosure of Invention
The application provides a behavior detection method, comprising:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type behaviors; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
For example, the determining, based on the target cluster center features corresponding to the plurality of behavior features, that the video to be detected has or does not have the specified type of behavior includes: if the target cluster center features corresponding to the plurality of behavior features are exactly the plurality of cluster center features, and the order of the target cluster center features corresponding to the plurality of behavior features matches the order of the plurality of cluster center features, determining that the video to be detected has the specified type of behavior;
otherwise, determining that the video to be detected does not have the specified type of behavior;
wherein the order of the target cluster center features corresponding to the plurality of behavior features matches the time order of the plurality of behavior features; the specified type of behavior comprises a plurality of child behaviors, the number of child behaviors is the same as the number of cluster center features, the child behaviors are in one-to-one correspondence with the cluster center features, and the order of the plurality of cluster center features matches the occurrence order of the plurality of child behaviors.
Illustratively, the selecting a plurality of images to be detected of the same target object from the plurality of images to be detected includes: carrying out target detection on a specific target in the multiple images to be detected to obtain the object position of the specific target in the multiple candidate images to be detected; the candidate images to be detected are images to be detected with specific targets in the multiple images to be detected, wherein the specific targets comprise at least one target object;
carrying out target tracking on the same target object in the candidate images to be detected to obtain the object positions of the target object in the target images to be detected; the target image to be detected is an image to be detected in which the target object exists in the plurality of candidate images to be detected.
Illustratively, the acquiring a slice sequence of the target object based on the plurality of target images to be detected includes: determining the target frame position of the target object based on the object positions of the target object in the plurality of target images to be detected, wherein the target frame position is the maximum circumscribed rectangle of all the object positions;
intercepting a plurality of sub-images from the plurality of target images to be detected based on the target frame position;
slicing the plurality of sub-images according to unit length to obtain at least one slice sequence;
wherein, the interval between two adjacent slice sequences is a fixed interval value.
Illustratively, selecting a target cluster center feature from the plurality of cluster center features based on the similarity between the behavior feature and each cluster center feature includes: determining the maximum similarity based on the similarity of the behavior features and the center features of each cluster; determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature; or, determining whether the maximum similarity is greater than a similarity threshold, and if so, determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature.
Illustratively, the manner for acquiring the central features of the plurality of clusters corresponding to the specified type of behavior includes:
acquiring a plurality of calibration sample images of the specified type of behaviors;
selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence comprises sub-images intercepted from the plurality of target sample images based on the sample frame position of the sample object;
obtaining a plurality of sample features of the sample object from the sample sequence;
clustering the plurality of sample characteristics to obtain a plurality of clusters, wherein each cluster comprises at least one sample characteristic; and determining the cluster center characteristics of each cluster based on the sample characteristics in the cluster to obtain the cluster center characteristics corresponding to the plurality of clusters.
Illustratively, for each sample feature within the class cluster, feature values for a plurality of feature dimensions are included; the determining the cluster center feature of the cluster based on the sample feature in the cluster comprises:
for each feature dimension, determining a target feature value of the feature dimension based on feature values of the feature dimension in all sample features within the class cluster;
determining a cluster center feature of the cluster based on the target feature values of the plurality of feature dimensions.
Illustratively, after determining the cluster center feature of the cluster based on the sample features in the cluster, the method further includes: selecting sample characteristics closest to the central characteristics of the clusters from all the sample characteristics in the clusters, and determining a sample sequence corresponding to the selected sample characteristics as behavior cluster samples of the clusters; after determining that the video to be detected has the behavior of the specified type or the video to be detected does not have the behavior of the specified type based on the target cluster center features corresponding to the behavior features, the method further includes: if the video to be detected has the specified type of behavior, displaying a slice sequence with the specified type of behavior, and displaying a behavior cluster sample of each cluster; if the video to be detected does not have the specified type of behavior, displaying the slice sequence with the specified type of behavior and the slice sequence without the specified type of behavior, and displaying the behavior cluster sample of each cluster.
The application provides a behavior detection device, the device includes:
the first acquisition module is used for acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected; selecting a plurality of target images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of target images to be detected, wherein the slice sequence comprises sub-images intercepted from the plurality of target images to be detected based on the target frame position of the target object;
a second obtaining module, configured to obtain multiple behavior features of the target object according to the slice sequence;
the determining module is used for determining the similarity between the behavior characteristics and the central characteristics of each cluster based on a plurality of central characteristics of the cluster corresponding to the specified type of behaviors aiming at each behavior characteristic; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature; and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
The application provides a behavior detection device, including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;
the processor is configured to execute machine executable instructions to perform the steps of:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type behaviors; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
According to the above technical solution, a video behavior detection method based on cluster center features is provided: whether the specified type of behavior exists in the video to be detected is determined based on the plurality of cluster center features corresponding to the specified type of behavior.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings according to these drawings.
FIG. 1 is a schematic flow chart of a behavior detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process in one embodiment of the present application;
FIGS. 3A and 3B are schematic diagrams of sample frame positions in one embodiment of the present application;
FIG. 4 is a schematic illustration of a detection process in one embodiment of the present application;
FIGS. 5A-5D are schematic diagrams of a similarity matrix in one embodiment of the present application;
FIG. 5E is a schematic view of a compliance visualization interface in one embodiment of the present application;
FIG. 6 is a schematic diagram of a behavior detection device according to an embodiment of the present application;
fig. 7 is a hardware configuration diagram of a behavior detection device according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. In addition, depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
An embodiment of the present application provides a behavior detection method, which is shown in fig. 1 and is a schematic flow chart of the behavior detection method, where the method may be applied to any device (e.g., an analog Camera, an IPC (IP Camera), a background server, an application server, and the like), and the method may include:
step 101, a video to be detected is obtained, wherein the video to be detected comprises a plurality of images to be detected.
Step 102, selecting a plurality of target images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of target images to be detected, wherein the slice sequence comprises sub-images intercepted from the plurality of target images to be detected based on the target frame position of the target object.
In a possible embodiment, selecting the plurality of target images to be detected of the same target object from the plurality of images to be detected may include, but is not limited to: performing target detection on a specific target in the plurality of images to be detected to obtain the object position of the specific target in a plurality of candidate images to be detected; a candidate image to be detected is an image to be detected, among the plurality of images to be detected, in which the specific target exists, and the specific target may include at least one target object. Then, target tracking may be performed on the same target object in the plurality of candidate images to be detected to obtain the object positions of the target object in a plurality of target images to be detected; a target image to be detected is a candidate image to be detected in which the target object exists.
In one possible embodiment, acquiring the slice sequence of the target object based on the plurality of target images to be detected may include, but is not limited to: determining the target frame position of the target object based on the object positions of the target object in the plurality of target images to be detected, where the target frame position may be the maximum circumscribed rectangle of all the object positions; intercepting a plurality of sub-images from the plurality of target images to be detected based on the target frame position; and slicing the plurality of sub-images by unit length to obtain at least one slice sequence. In the at least one slice sequence, the interval between two adjacent slice sequences may be a fixed interval value.
Step 103, acquiring a plurality of behavior features of the target object according to the slice sequence.
Step 104, for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; and selecting a target cluster center feature from the plurality of cluster center features based on the similarity between the behavior feature and each cluster center feature.
Illustratively, the maximum similarity is determined based on the similarity of the behavior feature and the center feature of each cluster class. And determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature corresponding to the behavior feature. Or, determining whether the maximum similarity is greater than a similarity threshold (which may be configured according to experience, such as 0.5, and the like), if so, determining the class cluster center feature corresponding to the maximum similarity as the target class cluster center feature corresponding to the behavior feature, and if not, determining that the behavior feature does not have the corresponding target class cluster center feature.
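By way of illustration only, the selection logic of step 104 can be sketched as follows; this is a minimal sketch that assumes cosine similarity as the similarity measure and NumPy vectors for the features, and the function and parameter names are illustrative rather than taken from the application:

```python
import numpy as np

def match_cluster_center(behavior_feature, cluster_center_features, sim_threshold=0.5):
    """Return the index of the target cluster center feature for one behavior feature,
    or None when the maximum similarity does not exceed the threshold."""
    centers = np.asarray(cluster_center_features, dtype=float)   # shape (m, w)
    feature = np.asarray(behavior_feature, dtype=float)          # shape (w,)
    sims = centers @ feature / (
        np.linalg.norm(centers, axis=1) * np.linalg.norm(feature) + 1e-12)
    best = int(np.argmax(sims))                                  # maximum similarity
    return best if sims[best] > sim_threshold else None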
Step 105, determining that the video to be detected has the specified type of behavior or does not have the specified type of behavior based on the target cluster center features corresponding to the plurality of behavior features.
In a possible implementation manner, if the target cluster center features corresponding to the plurality of behavior features are exactly the plurality of cluster center features, and the order of the target cluster center features corresponding to the plurality of behavior features matches the order of the plurality of cluster center features, it is determined that the video to be detected has the specified type of behavior; otherwise, it is determined that the video to be detected does not have the specified type of behavior. The order of the target cluster center features corresponding to the plurality of behavior features matches the time order of the plurality of behavior features; the specified type of behavior comprises a plurality of child behaviors, the number of child behaviors is the same as the number of cluster center features, the child behaviors are in one-to-one correspondence with the cluster center features, and the order of the plurality of cluster center features matches the occurrence order of the plurality of child behaviors.
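As a sketch only, one possible reading of this matching rule (names illustrative) is given below, where matched_indices holds, in the time order of the behavior features, the index of the target cluster center feature matched by each behavior feature:

```python
def has_specified_behavior(matched_indices, num_centers):
    """matched_indices: per behavior feature (in time order), the index of its target
    cluster center feature, or None when no cluster center was matched."""
    hits = [i for i in matched_indices if i is not None]
    covers_all = set(hits) == set(range(num_centers))        # every cluster center matched
    in_order = all(a <= b for a, b in zip(hits, hits[1:]))   # matches follow cluster order
    return covers_all and in_order
```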
In a possible implementation manner, the manner for acquiring the central features of the plurality of clusters corresponding to the specified type behavior may include, but is not limited to: the method includes acquiring a plurality of calibration sample images in which a specified type of behavior occurs, selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, where the sample sequence may include sub-images that are cut out from the plurality of target sample images based on a sample frame position of the sample object. Obtaining a plurality of sample characteristics of the sample object according to the sample sequence, and clustering the plurality of sample characteristics to obtain a plurality of class clusters, wherein each class cluster comprises at least one sample characteristic; and determining the cluster center characteristics of each cluster based on the sample characteristics in the cluster so as to obtain the cluster center characteristics corresponding to the plurality of clusters.
For example, for each sample feature within a class cluster, feature values for a plurality of feature dimensions may be included; based on this, the cluster center feature of the cluster is determined based on the sample features in the cluster, which may include but is not limited to: for each feature dimension, based on the feature value of the feature dimension in all sample features in the class cluster, a target feature value of the feature dimension (e.g., an average of all feature values of the feature dimension) may be determined; and determining cluster center characteristics of the cluster based on the target characteristic values of the plurality of characteristic dimensions.
For example, a sample feature closest to the center feature of the cluster may be selected from all sample features in the cluster, and a sample sequence corresponding to the selected sample feature may be determined as a behavior cluster sample of the cluster. On this basis, after step 105, if the video to be detected has the specified type of behavior, the slice sequence with the specified type of behavior can be displayed, and the behavior cluster sample of each cluster is displayed; if the video to be detected does not have the specified type of behavior, the slice sequence with the specified type of behavior and the slice sequence without the specified type of behavior can be displayed, and the behavior cluster sample of each cluster can be displayed.
For example, the execution sequence is only an example given for convenience of description, and in practical applications, the execution sequence between the steps may also be changed, and the execution sequence is not limited. Moreover, in other embodiments, the steps of the respective methods do not have to be performed in the order shown and described herein, and the methods may include more or less steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
According to the above technical solution, a video behavior detection method based on cluster center features is provided: whether the specified type of behavior exists in the video to be detected is determined based on the plurality of cluster center features corresponding to the specified type of behavior.
The above technical solution of the embodiment of the present application is described below with reference to specific application scenarios.
The video behavior sequence compliance detection technology can be widely applied to application scenes such as security monitoring field, human-computer interaction field, intelligent park, intelligent classroom, intelligent farm and the like. In the application scenario, the embodiment of the application provides a behavior detection method, the behavior detection method is used for realizing a universal video behavior sequence compliance detection technology, and the behavior detection method aims to improve the universality of the video behavior sequence compliance detection technology, reduce the use threshold of the technology, and facilitate the rapid popularization of the technology in various fields.
For example, the behavior detection method may be used to detect whether the behavior sequence of an operator in an industrial production process meets the standard behavior specification (the standard process is: turn on the power supply, operate the instrument panel, fill the melt-blown material, scrub and disinfect the operation table, and turn off the motor). The behavior of the operator is called the specified type of behavior, which comprises the child behaviors of turning on the power supply, operating the instrument panel, filling the melt-blown material, scrubbing and disinfecting the operation table, and turning off the motor, and the occurrence order of the child behaviors is: turn on the power supply, operate the instrument panel, fill the melt-blown material, scrub and disinfect the operation table, turn off the motor. For another example, the behavior detection method may be used to detect whether the behavior sequence of a chef meets the catering specification (the standard process is: wear the chef cap, wash the food materials, cut the vegetables, light the fire, cook the dish, plate the dish, and turn off the fire); the behavior of the chef is called the specified type of behavior, which comprises the child behaviors of wearing the chef cap, washing the food materials, cutting the vegetables, lighting the fire, cooking the dish, plating the dish, and turning off the fire. For another example, the behavior detection method may be used to detect whether an animal feeding action sequence is compliant (the compliant process is: directional feeding, drinking, eating, and returning); the behavior of the animal is called the specified type of behavior, which comprises the child behaviors of directional feeding, drinking, eating, and returning. As another example, the behavior detection method may be used to detect whether a student's chemistry experiment operation is compliant (the standard process is: add the reagent to the beaker, light the alcohol lamp, stir the reagent in the beaker, record the experiment data, remove the alcohol lamp, and extinguish the alcohol lamp); the behavior of the student is called the specified type of behavior, which comprises a plurality of child behaviors.
Of course, the above are just a few examples of application scenarios, and are not limiting. For convenience of description, the following takes as an example to detect whether the behavior sequence of the operator in the industrial production process conforms to the standard behavior specification.
The behavior detection method of the embodiment of the application may relate to a training process and a detection process, which are described below. Referring to fig. 2, a schematic diagram of a training process is shown, and a plurality of cluster center features corresponding to a behavior of a specified type can be obtained through the training process.
Step 201, obtaining a plurality of calibration sample images of the specified type of behavior.
For example, a sample training video may be obtained, which may include a plurality of sample training images that are consecutive images, e.g., the sample training video includes consecutive sample training images 1, 2, …, m. For the sample training video, a sample training image in the sample training video where a specified type of behavior occurs (e.g., where any child behavior of the specified type of behavior occurs) may be calibrated, and the calibrated sample training image is referred to as a calibration sample image, that is, the calibration sample image is the sample training image in the sample training video where the specified type of behavior occurs.
The calibration sample images may be calibrated in the following manner: for each calibration sample image in which the specified type of behavior occurs, the spatial position of the specified type of behavior is calibrated by drawing a frame (including but not limited to a rectangular frame, a circular frame, a polygonal frame, and the like), and annotation information such as the behavior category is provided.
For example, for a sample training video input by a user, assuming that frames 10-19 (sample training images) contain child behavior 1 (turning on the power supply) of the specified type of behavior, frames 20-28 contain child behavior 2 (operating the instrument panel), frames 29-35 contain child behavior 3 (filling the melt-blown material), frames 36-43 contain child behavior 4 (scrubbing and disinfecting the operation table), and frames 44-50 contain child behavior 5 (turning off the motor), then frames 10-50 are the calibration sample images.
For each calibration sample image, the object (e.g., a person) exhibiting the specified type of behavior in the calibration sample image is selected by drawing a frame. Taking a rectangular frame as an example, the rectangular frame encloses the object exhibiting the specified type of behavior, the calibrated spatial position is the object position of that object, and the object position may include coordinate information of the rectangular frame, such as the upper left corner coordinates (upper left abscissa and upper left ordinate) and the lower right corner coordinates (lower right abscissa and lower right ordinate), or the lower left corner coordinates (lower left abscissa and lower left ordinate) and the upper right corner coordinates (upper right abscissa and upper right ordinate). Of course, the above is only an example of the coordinate information of the rectangular frame, and is not limiting. For example, the coordinate information may be the upper left corner coordinates together with the width and height of the rectangular frame, from which the lower right corner coordinates can be determined. For another example, the coordinate information may be the lower left corner coordinates together with the width and height of the rectangular frame, from which the upper right corner coordinates can be determined. Obviously, the rectangular frame of the object exhibiting the specified type of behavior in the calibration sample image, i.e., the object position, can be determined from the coordinate information.
In step 202, a plurality of target sample images of the same sample object are selected from the plurality of calibration sample images.
For example, if only one object in the sample training video has a specified type of behavior, the object is used as a sample object, and all calibration sample images are used as target sample images of the sample object.
And if at least two objects in the sample training video have the specified type behaviors, taking each object as a sample object. For each sample object (taking the processing procedure of one sample object as an example later), a calibration sample image in which the sample object exists is selected from all calibration sample images, and the selected calibration sample image is used as a target sample image of the sample object. For example, if the calibration sample image of the sample object 1 with the specified type of behavior is 1-10, and the calibration sample image of the sample object 2 with the specified type of behavior is 11-25, the calibration sample images 1-10 are used as the target sample images of the sample object 1.
In order to select a target sample image of the sample object from all the calibration sample images, a tracking algorithm may be used to track the sample object to obtain a plurality of calibration sample images of the sample object, and the calibration sample images are used as the target sample images of the sample object, which is not described again.
Step 203, a sample sequence of the sample object is acquired based on the plurality of target sample images, and the sample sequence may include sub-images intercepted from the plurality of target sample images based on the sample frame position of the sample object.
Illustratively, for step 203, the sample sequence may be obtained by:
step 2031, based on the object positions of the sample object in the plurality of target sample images, determining the sample frame position of the sample object, where the sample frame position may represent the spatial range of all object positions, and the spatial range may include, but is not limited to, a circumscribed rectangle frame, a circumscribed circle frame, a circumscribed polygon frame, etc., and then take the case of a circumscribed rectangle frame, that is, the sample frame position may be the maximum circumscribed rectangle of all object positions.
Referring to the above embodiment, the object position of the sample object in each target sample image can be known, and based on this, the maximum bounding rectangle of all the object positions is taken as the sample frame position of the sample object.
For example, when a coordinate system is established with the upper left corner of the target sample image as the origin, the horizontal rightward direction as the horizontal axis, and the vertical downward direction as the vertical axis, the object position may include an upper left corner abscissa, an upper left corner ordinate, a lower right corner abscissa, and a lower right corner ordinate. On this basis, the minimum value of the upper left corner abscissa is selected based on the upper left corner abscissas of the object positions in all the target sample images; the minimum value of the upper left corner ordinate is selected based on the upper left corner ordinates of the object positions in all the target sample images; the maximum value of the lower right corner abscissa is selected based on the lower right corner abscissas of the object positions in all the target sample images; and the maximum value of the lower right corner ordinate is selected based on the lower right corner ordinates of the object positions in all the target sample images.
Then, the position of the sample frame of the sample object is determined according to the minimum value of the horizontal coordinate of the upper left corner, the minimum value of the vertical coordinate of the upper left corner, the maximum value of the horizontal coordinate of the lower right corner and the maximum value of the vertical coordinate of the lower right corner.
Referring to fig. 3A, the object positions of the sample object in each target sample image each include the upper left corner coordinates (left_top_x, left_top_y) and the lower right corner coordinates (right_bottom_x, right_bottom_y). The minimum value of the upper left corner abscissa is selected based on all the upper left corner abscissas and recorded as min({left_top_x}); the minimum value of the upper left corner ordinate is selected based on all the upper left corner ordinates and recorded as min({left_top_y}); the maximum value of the lower right corner abscissa is selected based on all the lower right corner abscissas and recorded as max({right_bottom_x}); and the maximum value of the lower right corner ordinate is selected based on all the lower right corner ordinates and recorded as max({right_bottom_y}).
Then, min({left_top_x}) and min({left_top_y}) are combined into a coordinate point A1, max({right_bottom_x}) and max({right_bottom_y}) are combined into a coordinate point A2, and the rectangular frame determined by coordinate point A1 and coordinate point A2 is the sample frame position of the sample object.
For another example, when a coordinate system is established with the lower left corner of the target sample image as the origin, the horizontal rightward direction as the horizontal axis, and the vertical upward direction as the vertical axis, the object position may include a lower left corner abscissa, a lower left corner ordinate, an upper right corner abscissa, and an upper right corner ordinate. On this basis, the minimum value of the lower left corner abscissa is selected based on the lower left corner abscissas of the object positions in all the target sample images; the minimum value of the lower left corner ordinate is selected based on the lower left corner ordinates of the object positions in all the target sample images; the maximum value of the upper right corner abscissa is selected based on the upper right corner abscissas of the object positions in all the target sample images; and the maximum value of the upper right corner ordinate is selected based on the upper right corner ordinates of the object positions in all the target sample images.
Then, the position of the sample frame of the sample object is determined according to the minimum value of the horizontal coordinate of the lower left corner, the minimum value of the vertical coordinate of the lower left corner, the maximum value of the horizontal coordinate of the upper right corner and the maximum value of the vertical coordinate of the upper right corner.
Referring to fig. 3B, the object positions of the sample object in each target sample image each include the lower left corner coordinates (left_bottom_x, left_bottom_y) and the upper right corner coordinates (right_top_x, right_top_y). The minimum value of the lower left corner abscissa is selected based on all the lower left corner abscissas and recorded as min({left_bottom_x}); the minimum value of the lower left corner ordinate is selected based on all the lower left corner ordinates and recorded as min({left_bottom_y}); the maximum value of the upper right corner abscissa is selected based on all the upper right corner abscissas and recorded as max({right_top_x}); and the maximum value of the upper right corner ordinate is selected based on all the upper right corner ordinates and recorded as max({right_top_y}).
Then, min({left_bottom_x}) and min({left_bottom_y}) may be combined into a coordinate point B1, max({right_top_x}) and max({right_top_y}) may be combined into a coordinate point B2, and the rectangular frame determined by coordinate point B1 and coordinate point B2 is the sample frame position of the sample object.
Of course, the above manner is only an example, and the present invention is not limited thereto as long as the sample frame position can be determined.
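A minimal sketch of the sample frame position computation in step 2031, assuming each object position is stored as a tuple (left_top_x, left_top_y, right_bottom_x, right_bottom_y) in the coordinate system of Fig. 3A (the function and variable names are illustrative):

```python
def sample_frame_position(object_positions):
    """Maximum circumscribed rectangle over the object positions of one sample object.
    object_positions: list of (left_top_x, left_top_y, right_bottom_x, right_bottom_y)."""
    x1 = min(p[0] for p in object_positions)   # min({left_top_x})
    y1 = min(p[1] for p in object_positions)   # min({left_top_y})
    x2 = max(p[2] for p in object_positions)   # max({right_bottom_x})
    y2 = max(p[3] for p in object_positions)   # max({right_bottom_y})
    return (x1, y1), (x2, y2)                  # coordinate points A1 and A2
```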
Step 2032, a plurality of sub-images are cut out from the plurality of target sample images based on the sample frame position.
For example, for each target sample image, a sub-image that matches the sample frame location may be truncated from the target sample image. For example, a rectangular frame is determined based on the sample frame position, the abscissa of the upper left corner of the rectangular frame is the minimum of the abscissas of the upper left corner, the ordinate of the upper left corner of the rectangular frame may be the minimum of the ordinate of the upper left corner, the abscissa of the lower right corner of the rectangular frame may be the maximum of the abscissa of the lower right corner, and the ordinate of the lower right corner of the rectangular frame may be the maximum of the ordinate of the lower right corner.
After the above-described processing is performed on each target sample image, a plurality of sub-images can be obtained.
Step 2033, slicing the multiple sub-images according to unit length (which can be configured empirically) to obtain at least one sample sequence, where an interval between two adjacent sample sequences is a fixed interval value.
For example, the unit length may be denoted as N, and a value of N may be configured according to experience, such as 16 frames, 32 frames, and the like, which is not limited in this respect. The fixed interval value can be empirically configured, such as-N/2, -N/4, 0, N/4, N/2, etc., and is not limited. By setting the fixed interval values to be-N/2, -N/4, 0, N/4, N/2 and the like, the granularity of the slice can be ensured to adapt to the sub-behaviors of different intervals.
For example, assuming that N is 16 and the fixed interval value is-N/2, after slicing the plurality of sub-images (it is necessary to slice the plurality of sub-images in the video according to the order of the sub-images, that is, sub-image 1 is earlier than sub-image 2, sub-image 2 is earlier than sub-image 3, and so on), sample sequence 1 may include sub-image 1-sub-image 16, sample sequence 2 may include sub-image 9-sub-image 24, sample sequence 3 may include sub-image 17-sub-image 32, and so on. Obviously, there are 8 frames (i.e., N/2) of images that repeat from sample sequence 2 to sample sequence 1, 8 frames of images that repeat from sample sequence 3 to sample sequence 2, and so on.
For another example, assuming that N is 16 and the fixed interval value is 0, after slicing the plurality of sub-images, sample sequence 1 may include sub-image 1-sub-image 16, sample sequence 2 may include sub-image 17-sub-image 32, sample sequence 3 may include sub-image 33-sub-image 48, and so on.
For another example, assuming that N is 16 and the fixed interval value is N/2, after slicing the plurality of sub-images, sample sequence 1 may include sub-image 1-sub-image 16, sample sequence 2 may include sub-image 25-sub-image 40, sample sequence 3 may include sub-image 49-sub-image 64, and so on. Obviously, sample sequence 2 is separated from sample sequence 1 by 8 frames (i.e., N/2) of images, and so on.
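The slicing of step 2033 can be sketched as follows, assuming the stride between the start frames of adjacent sequences is unit_length + fixed_interval, which reproduces the three examples above; the names are illustrative:

```python
def slice_into_sequences(sub_images, unit_length=16, fixed_interval=0):
    """Slice time-ordered sub-images into sequences of unit_length frames; a negative
    fixed_interval makes adjacent sequences overlap, a positive one leaves a gap."""
    stride = unit_length + fixed_interval
    sequences, start = [], 0
    while start + unit_length <= len(sub_images):
        sequences.append(sub_images[start:start + unit_length])
        start += stride
    return sequences
```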
Step 204, a plurality of sample characteristics of the sample object are obtained according to the sample sequence.
For example, for each sample sequence, feature extraction may be performed on the sample sequence (i.e., on the plurality of sub-images) to obtain vectorized feature data capable of representing the sample sequence, and the vectorized feature data is used as a sample feature of the sample object. For example, the sample feature can be obtained by performing feature extraction on the sample sequence using a general behavior recognition model (including but not limited to LSTM, two-stream networks, C3D, P3D, I3D, SlowFast, etc.) together with a classification neural network (including but not limited to ResNet18, ResNet50, ResNet101, ResNet152, VGG, etc.). Of course, the above is only an example of performing feature extraction on the sample sequence, and is not limiting, as long as the sample feature can be obtained. This embodiment does not limit the manner in which the general behavior recognition model and the classification neural network extract features from the sample sequence.
For each sample sequence, after feature extraction is performed on the sample sequence, a sample feature can be obtained (for example, one sample feature corresponding to the sample sequence, and of course, the sample sequence may also correspond to a plurality of sample features). Then, the sample features corresponding to all the sample sequences may be combined to form a behavior feature set, where the behavior feature set includes the sample features corresponding to all the sample sequences. In the behavior feature set, the precedence order relationship of the sample features is maintained, for example, the behavior feature set sequentially includes the sample feature corresponding to the sample sequence 1, the sample feature corresponding to the sample sequence 2, and the sample feature corresponding to the sample sequence 3, and so on, that is, the sample feature corresponding to the sample sequence 1 is located before the sample feature corresponding to the sample sequence 2, and the sample feature corresponding to the sample sequence 2 is located before the sample feature corresponding to the sample sequence 3.
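As a sketch, building the behavior feature set while preserving order can be expressed as follows, where extract_features stands in for any general behavior recognition model plus classification network; it is an assumed callable, not an interface defined by the application:

```python
def build_behavior_feature_set(sample_sequences, extract_features):
    # one sample feature vector per sample sequence, preserving the time order
    return [extract_features(sequence) for sequence in sample_sequences]
```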
For steps 201-204, assume that the user has calibrated k compliant behavior sequences corresponding to k target trajectories, denoted as {A1, A2, ..., Ak}; each target trajectory is a plurality of sub-images (the sub-images before slicing). After slicing target trajectory A1, the sample sequences S1^{A1}, S2^{A1}, ... can be obtained; after slicing target trajectory A2, the sample sequences S1^{A2}, S2^{A2}, ... can be obtained; and by analogy, after slicing target trajectory Ak, the sample sequences S1^{Ak}, S2^{Ak}, ... can be obtained. In summary, a sample sequence set S = {S1^{A1}, S2^{A1}, ..., S1^{A2}, S2^{A2}, ..., S1^{Ak}, S2^{Ak}, ...} comprising a plurality of sample sequences can be obtained. After feature extraction is carried out on sample sequence S1^{A1}, the sample feature F1^{A1} can be obtained; after feature extraction is carried out on sample sequence S2^{A1}, the sample feature F2^{A1} can be obtained; and by analogy, after feature extraction is carried out on every sample sequence in the sample sequence set S, the behavior feature set F = {F1^{A1}, F2^{A1}, ..., F1^{Ak}, F2^{Ak}, ...} can be obtained.
Step 205, clustering the plurality of sample features to obtain a plurality of class clusters, each class cluster including at least one sample feature, the plurality of class clusters being a plurality of class clusters corresponding to the specified type behavior.
For example, for a specific type of behavior, the specific type of behavior may include multiple child behaviors, and since the same child behavior has the same or similar features and different child behaviors have different features, after a sample sequence is subjected to feature extraction to obtain sample features, all the sample features may be clustered to obtain multiple clusters, that is, the same or similar sample features are clustered to the same cluster, and different sample features are clustered to different clusters to obtain multiple clusters, where each cluster includes at least one sample feature.
Obviously, the number of the class clusters and the number of the child behaviors may be the same, that is, the child behaviors correspond to the class clusters one to one, and when the specified type behavior includes a plurality of child behaviors, the specified type behavior corresponds to the plurality of class clusters.
When all sample features are clustered to obtain a plurality of clusters, all sample features can be clustered by adopting an unsupervised clustering algorithm (such as a hierarchical-based clustering algorithm, a density-based clustering algorithm and the like, without limitation) to obtain a plurality of clusters, and the clustering mode is not limited as long as the same or similar sample features can be clustered to the same cluster and different sample features can be clustered to different clusters.
For example, the specified type of behavior may include m child behaviors, and since the same child behaviors have the same or similar characteristics and different child behaviors have different characteristics, m class clusters may be obtained after clustering the behavior characteristic set F, and the m child behaviors are in one-to-one correspondence with the m class clusters.
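A minimal clustering sketch for step 205, assuming scikit-learn is available and the number of child behaviors m is known; any unsupervised hierarchy-based or density-based algorithm could be substituted, and the names are illustrative:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_sample_features(behavior_feature_set, m):
    """Group the sample features into m class clusters; returns, per class cluster,
    the indices of its sample features (indices follow the order of the feature set)."""
    features = np.stack([np.asarray(f, dtype=float) for f in behavior_feature_set])
    labels = AgglomerativeClustering(n_clusters=m).fit_predict(features)
    return [np.flatnonzero(labels == c) for c in range(m)]
```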
In a possible implementation manner, the specified type of behavior includes a plurality of child behaviors, the plurality of child behaviors are in one-to-one correspondence with the plurality of class clusters, and the order of the plurality of class clusters is matched with the occurrence order of the plurality of child behaviors, for example, the occurrence order of the plurality of child behaviors is child behavior 1, child behavior 2, and child behavior 3 in turn, and then the order of the plurality of class clusters is class cluster corresponding to child behavior 1, class cluster corresponding to child behavior 2, and class cluster corresponding to child behavior 3 in turn.
For example, the sequential relationship of the plurality of class clusters may be determined based on the sequential relationship of the sample features within the class clusters. For example, referring to the above embodiment, the behavior feature set retains the precedence relationship of the sample features, the precedence relationship of the sample features is determined based on the precedence relationship of the sample sequence, and the precedence relationship of the sample sequence is determined based on the precedence relationship of the target sample image in the video, so that the precedence relationship of the sample features is matched with the occurrence sequence of the plurality of child behaviors in the video. On this basis, for any two clusters (denoted as cluster 1 and cluster 2), if the sample feature (any sample feature) in the cluster 1 is located before the sample feature in the cluster 2, the cluster 1 is located before the cluster 2, and if the sample feature in the cluster 1 is located after the sample feature in the cluster 2, the cluster 1 is located after the cluster 2.
In summary, after obtaining the plurality of class clusters, the order relationship of the plurality of class clusters can be obtained.
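One way to realize this order relationship is to sort the class clusters by the earliest sample feature they contain; the following is a sketch under that assumption, reusing the index arrays produced above:

```python
def order_class_clusters(clusters):
    """clusters: list of index arrays (one per class cluster); sorting by the earliest
    sample feature index aligns the cluster order with the occurrence order of the
    child behaviors in the video."""
    return sorted(clusters, key=lambda indices: int(min(indices)))
```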
Step 206, determining a cluster center feature of each of the plurality of clusters based on the sample features in the cluster to obtain cluster center features corresponding to the plurality of clusters.
Obviously, for each class cluster, the class cluster has one class cluster center feature, that is, the number of the class cluster center features is the same as the number of the class clusters, that is, the class cluster center features and the class clusters are in one-to-one correspondence.
The specified type of behavior includes a plurality of child behaviors. The number of class clusters is the same as the number of child behaviors (the child behaviors correspond to the class clusters one to one), and the number of class cluster center features is the same as the number of class clusters (the class cluster center features correspond to the class clusters one to one). Therefore, the number of child behaviors is the same as the number of class cluster center features, the plurality of child behaviors correspond to the plurality of class cluster center features one to one, and the specified type of behavior corresponds to the plurality of class cluster center features.
In one possible implementation, the plurality of child behaviors correspond to the plurality of class cluster center features one to one, and the order of the plurality of class cluster center features matches the occurrence order of the plurality of child behaviors. For example, if the occurrence order of the child behaviors is child behavior 1, child behavior 2, and child behavior 3, the order of the class cluster center features is: the class cluster center feature corresponding to child behavior 1 (i.e., the class cluster center feature of the class cluster corresponding to child behavior 1), the class cluster center feature corresponding to child behavior 2, and the class cluster center feature corresponding to child behavior 3.
Obviously, referring to the above embodiment, the sequential relationship of the plurality of class clusters has already been obtained. Therefore, the sequential relationship of the plurality of class cluster center features may be determined based on the sequential relationship of the plurality of class clusters; for example, if class cluster 1 is located before class cluster 2, the class cluster center feature of class cluster 1 is located before the class cluster center feature of class cluster 2.
In one possible implementation, for each sample feature within a class cluster, the sample feature may be written as Fi = [v1, v2, ..., vw], where w is the number of feature dimensions, v1 is the feature value of feature dimension 1, v2 is the feature value of feature dimension 2, and vw is the feature value of feature dimension w. On this basis, the feature values of feature dimension 1 of all sample features in the class cluster may be averaged to obtain the target feature value c1 of feature dimension 1, the feature values of feature dimension 2 of all sample features in the class cluster may be averaged to obtain the target feature value c2 of feature dimension 2, and so on, until the feature values of feature dimension w of all sample features in the class cluster are averaged to obtain the target feature value cw of feature dimension w. These target feature values may then be combined to obtain the class cluster center feature of the class cluster, for example Fc = [c1, c2, ..., cw]. Assuming that there are m class clusters, the class cluster center features of the m class clusters may be {Fc1, Fc2, ..., Fcm}.
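The dimension-wise averaging described above could be sketched as follows; this is a minimal illustration, and the array layout is an assumption.

```python
import numpy as np

def cluster_center_feature(cluster_features):
    """Compute the class cluster center feature of one class cluster.

    cluster_features: array of shape (num_samples_in_cluster, w), one row per
                      sample feature Fi = [v1, v2, ..., vw].
    Returns Fc = [c1, c2, ..., cw], the per-dimension average over the cluster.
    """
    cluster_features = np.asarray(cluster_features, dtype=float)
    # Average each feature dimension over all sample features in the cluster.
    return cluster_features.mean(axis=0)
```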
Step 207, for each of the plurality of clusters, selecting a sample feature closest to the cluster center feature (i.e., the cluster center feature of the cluster) from all the sample features in the cluster, and determining a sample sequence corresponding to the selected sample feature as a behavior cluster sample of the cluster.
For example, after the cluster center feature of each cluster is obtained, the distance (such as the euclidean distance, the cosine distance, and the like, without limitation) between each sample feature in the cluster and the cluster center feature may be calculated, the sample feature closest to the cluster center feature is used as the representative feature of the cluster, and the sample sequence corresponding to the sample feature is used as the behavior cluster sample of the cluster.
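Selecting the behavior cluster sample, i.e., the sample sequence whose feature is closest to the class cluster center feature, might be sketched as below; the Euclidean distance is assumed here purely for illustration (the cosine distance or another metric could equally be used).

```python
import numpy as np

def select_behavior_cluster_sample(cluster_features, sample_sequences, center):
    """Pick the sample sequence whose feature is nearest to the cluster center.

    cluster_features: array (k, w) of the sample features in the class cluster.
    sample_sequences: list of k sample sequences, aligned with cluster_features.
    center: the class cluster center feature Fc of shape (w,).
    """
    cluster_features = np.asarray(cluster_features, dtype=float)
    distances = np.linalg.norm(cluster_features - center, axis=1)
    nearest = int(np.argmin(distances))
    # The sample sequence of the nearest feature becomes the behavior cluster sample.
    return sample_sequences[nearest]
```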
In summary, in the training process, a plurality of cluster center features corresponding to the specified type of behavior can be obtained, the sequence of the cluster center features can be obtained, and the number of clusters and the behavior cluster sample can be obtained.
The detection process is described below. Referring to fig. 4, which is a schematic diagram of the detection process, the detection process detects whether the specified type of behavior exists in the video to be detected.
Step 401, a video to be detected is obtained, wherein the video to be detected comprises a plurality of images to be detected.
For example, the video to be detected may include a plurality of consecutive images to be detected, for example, the video to be detected may include consecutive image to be detected 1, image to be detected 2, …, and image to be detected n.
Step 402, performing target detection on a specific target in a plurality of images to be detected to obtain the object position of the specific target in a plurality of candidate images to be detected, where the candidate images to be detected are the images to be detected in which the specific target exists in the plurality of images to be detected, and the specific target may include at least one target object.
For example, a target detection algorithm may be used to perform target detection on a specific target (including but not limited to a person, a vehicle, an animal, and the like) in a video to be detected (i.e., a plurality of images to be detected), to obtain an image to be detected with the specific target, to mark the image to be detected with the specific target as a candidate image to be detected, and to determine an object position of the specific target in the candidate images to be detected.
By way of example, and not limitation, the target detection algorithm may include HOG, DPM, Faster R-CNN, YOLO-V3, SSD, and the like. Regarding the process of performing target detection on a specific target in a video to be detected by using a target detection algorithm, the present embodiment is not limited.
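As an illustration only, filtering the images to be detected down to candidate images might look like the sketch below; `detector.detect` is a hypothetical interface standing in for whichever detection algorithm (HOG, DPM, Faster R-CNN, YOLO-V3, SSD, etc.) is actually used.

```python
def find_candidate_images(images, detector):
    """Return (image index, object boxes) for images containing the specific target.

    detector is assumed (hypothetically) to expose detect(image) -> list of
    bounding boxes (x1, y1, x2, y2); an empty list means the specific target
    is absent from that image to be detected.
    """
    candidates = []
    for idx, image in enumerate(images):
        boxes = detector.detect(image)  # hypothetical detector interface
        if boxes:
            # The image is a candidate image to be detected; keep the positions.
            candidates.append((idx, boxes))
    return candidates
```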
Step 403, performing target tracking on the same target object in the multiple candidate images to be detected to obtain object positions of the target object in multiple target images to be detected, where a target image to be detected is a candidate image to be detected in which the target object exists.
Illustratively, when target detection is performed on the specific target in the plurality of images to be detected, candidate images to be detected containing at least one target object can be obtained. For each target object (the following takes the processing procedure of one target object as an example), it is necessary to select, from all the candidate images to be detected, the target images to be detected in which the target object exists, and to determine the object position of the target object in each target image to be detected.
For example, a tracking algorithm may be used to perform target tracking on the same target object in the multiple candidate images to be detected, to obtain the images to be detected in which the target object exists, and to mark these images as target images to be detected; the tracking algorithm is also used to determine the object position of the target object in the multiple target images to be detected, where the object position may include coordinate information of a rectangular frame.
Exemplary tracking algorithms may include, but are not limited to, MOT and DeepSORT, without limitation. The process of performing target tracking on a target object by using a tracking algorithm is not limited in this embodiment. For example, based on the target detection result (i.e., the object positions of the specific target in the multiple candidate images to be detected), detections of the same target object are associated across images, and the trajectory information of the target object is generated; the trajectory information may include the object positions of the target object in the multiple target images to be detected.
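A simple greedy IoU-based association is sketched below purely to illustrate how object positions of the same target object can be linked into trajectory information; it is a stand-in for the illustration only, whereas in practice a dedicated tracker such as DeepSORT would normally be used.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def track_target_object(candidates, iou_threshold=0.3):
    """Greedily link detections of one target object across candidate images.

    candidates: list of (image index, boxes) from the detection step.
    Returns a trajectory: list of (image index, box) for the target object.
    """
    trajectory = []
    for idx, boxes in candidates:
        if not trajectory:
            trajectory.append((idx, boxes[0]))  # start the track on the first detection
            continue
        last_box = trajectory[-1][1]
        # Keep the detection that overlaps the previous position the most.
        best = max(boxes, key=lambda b: iou(last_box, b))
        if iou(last_box, best) >= iou_threshold:
            trajectory.append((idx, best))
    return trajectory
```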
Step 404, acquiring a slice sequence of the target object based on the plurality of target images to be detected, where the slice sequence includes sub-images intercepted from the plurality of target images to be detected based on the target frame position of the target object.
Illustratively, for step 404, the slice sequence may be obtained by:
Step 4041, determining a target frame position of the target object based on the object positions of the target object in the multiple target images to be detected. The target frame position represents a spatial range covering all object positions corresponding to the target object; the spatial range may include, but is not limited to, a circumscribed rectangular frame, a circumscribed circular frame, a circumscribed polygonal frame, and the like. Taking the circumscribed rectangular frame as an example, the target frame position may be the maximum circumscribed rectangle of all the object positions. For example, the object positions of the target object in the multiple target images to be detected are known, and based on these object positions, the maximum circumscribed rectangle of all the object positions is used as the target frame position of the target object.
Illustratively, the implementation process of step 4041 is similar to step 2031, and will not be repeated here.
Step 4042, a plurality of sub-images are extracted from the plurality of images to be detected of the target based on the position of the target frame.
Illustratively, the implementation process of step 4042 is similar to step 2032, and will not be repeated here.
Step 4043, slicing the plurality of sub-images according to the unit length to obtain at least one slice sequence, where an interval between two adjacent slice sequences in the at least one slice sequence may be a fixed interval value.
Exemplarily, the implementation process of step 4043 is similar to that of step 2033, except that the sample sequence is replaced by the slice sequence, and the processing of the sample sequence and the slice sequence is the same, and will not be repeated here.
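Steps 4041 to 4043 could be sketched as follows; the unit length and interval values are placeholders, integer pixel coordinates are assumed, and the maximum circumscribed rectangle is used as the target frame position as described above.

```python
def target_frame_position(object_positions):
    """Maximum circumscribed rectangle of all (x1, y1, x2, y2) object positions."""
    xs1, ys1, xs2, ys2 = zip(*object_positions)
    return min(xs1), min(ys1), max(xs2), max(ys2)

def build_slice_sequences(target_images, object_positions, unit_length=16, interval=8):
    """Crop sub-images at the target frame position and slice them by unit length.

    target_images: list of target images to be detected (e.g. numpy arrays H x W x 3).
    object_positions: object positions of the target object in those images.
    Returns a list of slice sequences, each containing unit_length sub-images,
    with a fixed interval between the start frames of adjacent slice sequences.
    """
    x1, y1, x2, y2 = target_frame_position(object_positions)
    # Intercept a sub-image from every target image at the target frame position.
    sub_images = [img[y1:y2, x1:x2] for img in target_images]

    slices = []
    for start in range(0, len(sub_images) - unit_length + 1, interval):
        slices.append(sub_images[start:start + unit_length])
    return slices
```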
Step 405, a plurality of behavior features of the target object are obtained according to the slice sequence.
For example, for each slice sequence, feature extraction may be performed on the slice sequence (i.e., a plurality of sub-images) to obtain vectorized feature data capable of expressing the slice sequence, and the vectorized feature data is used as the behavior feature of the target object. For example, feature extraction may be performed on the slice sequence by using a general behavior recognition model and a classification neural network to obtain behavior features. For each slice sequence, after feature extraction is performed on the slice sequence, behavior features can be obtained. Then, the behavior features corresponding to all the slice sequences may be combined to form a behavior feature set, where the behavior feature set includes the behavior features corresponding to all the slice sequences. In the behavior feature set, the precedence order relationship of the behavior features is reserved.
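Step 405 could be sketched like this; `recognition_model` stands in for the general behavior recognition model (for example a 3D convolutional network), and its `extract` method is a hypothetical interface, not an API defined by the method.

```python
def extract_behavior_features(slice_sequences, recognition_model):
    """Turn each slice sequence into a vectorized behavior feature.

    recognition_model.extract(slice_sequence) is assumed to return a 1-D
    feature vector. The returned list preserves the temporal order of the
    slice sequences, matching the behavior feature set described above.
    """
    return [recognition_model.extract(seq) for seq in slice_sequences]
```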
Step 406, determining, for each behavior feature, a similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; and selecting a target cluster center feature from the plurality of cluster center features based on the similarity between the behavior feature and each cluster center feature.
Referring to step 206, a plurality of class cluster center features corresponding to the specified type of behavior have been obtained. Therefore, the similarity (e.g., cosine similarity) between the behavior feature and each class cluster center feature can be determined, and the maximum similarity can be determined from these similarities. Then, it is determined whether the maximum similarity is greater than a similarity threshold; if so, the class cluster center feature corresponding to the maximum similarity is determined as the target class cluster center feature corresponding to the behavior feature; if not, it is determined that the behavior feature has no corresponding target class cluster center feature.
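A sketch of selecting the target class cluster center feature for one behavior feature follows; cosine similarity is used and the threshold value is illustrative only.

```python
import numpy as np

def select_target_center(behavior_feature, center_features, threshold=0.8):
    """Return the index of the target class cluster center feature, or None.

    center_features: array (m, w) of class cluster center features {Fc1, ..., Fcm}.
    """
    f = np.asarray(behavior_feature, dtype=float)
    centers = np.asarray(center_features, dtype=float)
    # Cosine similarity between the behavior feature and every center feature.
    sims = centers @ f / (np.linalg.norm(centers, axis=1) * np.linalg.norm(f) + 1e-12)
    best = int(np.argmax(sims))
    # Only accept the match if the maximum similarity exceeds the threshold.
    return best if sims[best] > threshold else None
```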
In summary, the target class cluster center feature corresponding to each behavior feature can be obtained. For example, the behavior feature set sequentially includes behavior feature 1, behavior feature 2, and behavior feature 3, and these behavior features have a precedence relationship. In step 406, the target class cluster center feature corresponding to behavior feature 1 (e.g., class cluster center feature 1), the target class cluster center feature corresponding to behavior feature 2 (e.g., class cluster center feature 2), and the target class cluster center feature corresponding to behavior feature 3 (e.g., class cluster center feature 3) may be obtained.
Illustratively, because the plurality of behavior features have a precedence relationship, the target cluster center features corresponding to the plurality of behavior features also have a precedence relationship, and the order of the target cluster center features corresponding to the plurality of behavior features matches with the time precedence order of the plurality of behavior features. For example, the time sequence of the behavior features is behavior feature 1, behavior feature 2, and behavior feature 3, and the sequence of the target cluster center features corresponding to the behavior features is: the target cluster center characteristic corresponding to the behavior characteristic 1, the target cluster center characteristic corresponding to the behavior characteristic 2 and the target cluster center characteristic corresponding to the behavior characteristic 3.
Step 407, determining, based on the target class cluster center features corresponding to the plurality of behavior features, that the specified type of behavior exists in the video to be detected or that the specified type of behavior does not exist in the video to be detected. For example, if the target class cluster center features corresponding to the plurality of behavior features are completely the same as all the class cluster center features corresponding to the specified type of behavior, and the order of the target class cluster center features corresponding to the plurality of behavior features (which, as described in the above embodiment, matches the time order of the plurality of behavior features) matches the order of the plurality of class cluster center features (which, as described in the above embodiment, matches the occurrence order of the plurality of child behaviors), it is determined that the specified type of behavior exists in the video to be detected; otherwise, it is determined that the specified type of behavior does not exist in the video to be detected.
For example, the statement that the target class cluster center features corresponding to the multiple behavior features are completely the same as all class cluster center features corresponding to the specified type of behavior means: assuming that the specified type of behavior corresponds to class cluster center feature 1, class cluster center feature 2, and class cluster center feature 3, the target class cluster center features corresponding to the behavior features need to include class cluster center feature 1, class cluster center feature 2, and class cluster center feature 3. If the target class cluster center features corresponding to the behavior features include only part (not all) of class cluster center feature 1, class cluster center feature 2, and class cluster center feature 3, they are not completely the same as all the class cluster center features corresponding to the specified type of behavior.
Illustratively, the matching of the order of the target cluster center features corresponding to the plurality of behavior features and the order of the plurality of cluster center features refers to: assuming that the sequence of all cluster center features corresponding to the specified type behavior is cluster center feature 1, cluster center feature 2 and cluster center feature 3 in sequence, the sequence of the target cluster center features corresponding to the plurality of behavior features also needs to be cluster center feature 1, cluster center feature 2 and cluster center feature 3. If the sequence of the target cluster center features corresponding to the plurality of behavior features is not the cluster center feature 1, the cluster center feature 2 and the cluster center feature 3, it is indicated that the target cluster center features are not matched with the sequence of the plurality of cluster center features.
For example, if the sequence of the target class cluster center features corresponding to the plurality of behavior features is class cluster center feature 1, class cluster center feature 1, class cluster center feature 2, class cluster center feature 3, and class cluster center feature 3 in sequence, the target class cluster center features match the order of the plurality of class cluster center features: the two class cluster center features 1 are merged and the two class cluster center features 3 are merged, yielding the order class cluster center feature 1, class cluster center feature 2, class cluster center feature 3.
If the sequence of the target class cluster center features corresponding to the plurality of behavior features is class cluster center feature 1, class cluster center feature 1, class cluster center feature 3, class cluster center feature 2, and class cluster center feature 2 in sequence, the target class cluster center features do not match the order of the plurality of class cluster center features: the two class cluster center features 1 are merged and the two class cluster center features 2 are merged, yielding the order class cluster center feature 1, class cluster center feature 3, class cluster center feature 2, which differs from the order class cluster center feature 1, class cluster center feature 2, class cluster center feature 3.
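The decision in step 407, including the merging of consecutive identical target class cluster center features, could be sketched as follows; this is a minimal illustration of the matching rule described above, not a definitive implementation.

```python
def specified_behavior_present(target_center_sequence, expected_center_order):
    """Decide whether the specified type of behavior exists in the video.

    target_center_sequence: target class cluster center indices per behavior
        feature, in temporal order, with None for behavior features that
        matched no center feature.
    expected_center_order: ordered class cluster center indices of the
        specified type of behavior, e.g. [0, 1, 2, ..., m-1].
    """
    # Drop behavior features without a target center, then merge consecutive
    # duplicates (e.g. 1, 1, 2, 3, 3 -> 1, 2, 3).
    merged = []
    for center in target_center_sequence:
        if center is None:
            continue
        if not merged or merged[-1] != center:
            merged.append(center)
    # The behavior is present only if all expected centers appear, in order.
    return merged == list(expected_center_order)
```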
In one possible implementation, referring to fig. 5A, a similarity matrix of the behavior feature (i.e., each behavior feature in the behavior feature set) and the cluster center feature (i.e., each cluster center feature corresponding to a specific type of behavior) may be obtained, where the similarity matrix may be m rows and q columns. Illustratively, m rows represent m cluster-center-like features, the order of the cluster-center-like features is 1, 2, …, m, q columns represent q behavior features, and the order of the behavior features is 1, 2, …, q. Based on the similarity matrix, the numerical values in the first row represent the similarity between each behavior feature and the central feature of the first cluster, the numerical values in the second row represent the similarity between each behavior feature and the central feature of the second cluster, and so on.
As can be seen from fig. 5A, behavior feature 1, behavior feature 2, behavior feature 3, and behavior feature 13 (i.e., q) do not have corresponding target cluster center features, target cluster center features corresponding to behavior feature 4 and behavior feature 5 are all cluster center features 1, target cluster center feature corresponding to behavior feature 6 is cluster center feature 2, target cluster center feature corresponding to behavior feature 7 is cluster center feature 3, target cluster center feature corresponding to behavior feature 8 is cluster center feature 4, target cluster center feature corresponding to behavior feature 9 is cluster center feature 5, target cluster center feature corresponding to behavior feature 10 is cluster center feature 6, and target cluster center features corresponding to behavior feature 11 and behavior feature 12 are all cluster center features 7 (i.e., m).
To sum up, the target class cluster center features corresponding to all the behavior features are completely the same as all the class cluster center features corresponding to the specified type of behavior (i.e., class cluster center feature 1 to class cluster center feature 7), and the order of the target class cluster center features corresponding to all the behavior features matches the order of all the class cluster center features corresponding to the specified type of behavior, i.e., class cluster center feature 1, ..., class cluster center feature 7.
Referring to fig. 5B, the target class cluster center features corresponding to all the behavior features do not include class cluster center feature 4, and therefore are not completely the same as all the class cluster center features corresponding to the specified type of behavior.
Referring to fig. 5C, the target class cluster center features corresponding to all the behavior features do not include class cluster center feature 1, and therefore are not completely the same as all the class cluster center features corresponding to the specified type of behavior.
Referring to fig. 5D, the target class cluster center features corresponding to all the behavior features do not include class cluster center feature 7 (i.e., m), and therefore are not completely the same as all the class cluster center features corresponding to the specified type of behavior.
Referring to fig. 5D, the order of the target class cluster center features corresponding to the behavior features is: class cluster center feature 1, class cluster center feature 2, class cluster center feature 3, class cluster center feature 4, class cluster center feature 5, class cluster center feature 6, and class cluster center feature 2, which does not match the order of the plurality of class cluster center features.
In a possible implementation manner, if the video to be detected has the specified type of behavior, a slice sequence with the specified type of behavior may be displayed, and a behavior cluster sample of each cluster may be displayed. Or, if the video to be detected does not have the specified type of behavior, the slice sequence with the specified type of behavior and the slice sequence without the specified type of behavior can be displayed, and the behavior cluster sample of each cluster can be displayed.
For example, if the target behavior sequence is detected to be compliant, the slice sequences of compliant behaviors are displayed in series in a time sequence, and the behavior cluster samples are displayed in an overlapping manner. And if the target behavior sequence is detected to be not compliant, serially displaying the compliant slice sequence and the non-compliant slice sequence according to the time sequence, and overlaying and displaying the behavior cluster samples.
Displaying the compliant slice sequences and the non-compliant slice sequences, and displaying the behavior cluster samples in an overlaid manner, prompts the user to self-check non-compliance and improve quality, and facilitates summary analysis and quality improvement by management personnel. Referring to fig. 5E, which is a schematic view of a compliance visualization interface.
In this embodiment, the above process may be implemented by a target detection module, a target tracking module, a time series slicing module, a feature extraction module, a feature clustering module, a similarity measurement module, and a visualization module. For example, step 202 is implemented based on the target tracking module, step 203 is implemented based on the time-series slicing module, step 204 is implemented based on the feature extraction module, and step 205-step 207 are implemented based on the feature clustering module. For another example, the step 402 is implemented based on the target detection module, the step 403 is implemented based on the target tracking module, the step 404 is implemented based on the time-series slicing module, the step 405 is implemented based on the feature extraction module, the step 406-step 407 is implemented based on the similarity measurement module, and the display of the slicing sequence and the behavior cluster sample is implemented based on the visualization module.
According to the technical scheme, the video behavior detection method is high in accuracy and simple in detection mode, is an automatic general video behavior sequence compliance detection method, can improve the universality of a video behavior sequence compliance detection technology, reduces the use threshold of the technology, and facilitates rapid popularization of the technology in various fields. The method avoids the detection of each sub-behavior in the behavior sequence, uses the general behavior recognition model to extract the characteristics, does not need to pay more attention to the behavior recognition problem of a specific scene, and is convenient to be widely used under a plurality of scenes and a plurality of tasks. The key information modeling (cluster center characteristics, sequence of the cluster center characteristics and the like) of any compliance behavior sequence is automatically completed by adopting unsupervised clustering, the clustering algorithm has stronger capability of resisting noise interference, the difficulty of sample data collection is reduced, and the related personnel can quickly position the non-compliance segments.
Based on the same application concept as the method, an embodiment of the present application provides a behavior detection apparatus, as shown in fig. 6, which is a schematic structural diagram of the behavior detection apparatus, and the apparatus may include:
the first acquisition module 61 is configured to acquire a video to be detected, where the video to be detected includes a plurality of images to be detected; selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object; a second obtaining module 62, configured to obtain a plurality of behavior characteristics of the target object according to the slice sequence; a determining module 63, configured to determine, for each behavior feature, a similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to a specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature; and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
For example, the determining module 63, when determining, based on the target cluster center features corresponding to the multiple behavior features, that the video to be detected has the specified type of behavior or that the video to be detected does not have the specified type of behavior, is specifically configured to: if the target cluster center features corresponding to the behavior features are completely the same as the cluster center features, and the sequence of the target cluster center features corresponding to the behavior features is matched with the sequence of the cluster center features, determining that the video to be detected has the specified type of behavior; otherwise, determining that the video to be detected does not have the specified type of behavior; the sequence of the target cluster center features corresponding to the behavior features is matched with the time sequence of the behavior features; the specified type of behavior comprises a plurality of child behaviors, the number of the child behaviors is the same as the number of the cluster center features, the child behaviors are in one-to-one correspondence with the cluster center features, and the sequence of the cluster center features is matched with the occurrence sequence of the child behaviors.
For example, when the first obtaining module 61 selects a plurality of images to be detected of the same target object from the plurality of images to be detected, it is specifically configured to: carrying out target detection on a specific target in the multiple images to be detected to obtain the object position of the specific target in the multiple candidate images to be detected; the candidate images to be detected are images to be detected with specific targets in the multiple images to be detected, wherein the specific targets comprise at least one target object; carrying out target tracking on the same target object in the candidate images to be detected to obtain the object positions of the target object in the target images to be detected; the target image to be detected is an image to be detected in which the target object exists in the plurality of candidate images to be detected.
For example, the first obtaining module 61 is specifically configured to, when obtaining the slice sequence of the target object based on the multiple target images to be detected: determining the target frame position of the target object based on the object positions of the target object in a plurality of images to be detected of the target object, wherein the target frame position is the maximum circumscribed rectangle of all the object positions; intercepting a plurality of sub-images from the plurality of images to be detected of the target based on the position of the target frame; slicing the plurality of sub-images according to unit length to obtain at least one slice sequence; wherein, the interval between two adjacent slice sequences is a fixed interval value.
Illustratively, the first obtaining module 61 is further configured to: obtaining a plurality of cluster center features corresponding to the specified type of behavior, and specifically configured to: acquiring a plurality of calibration sample images of the specified type of behaviors;
selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence comprises sub-images intercepted from the plurality of target sample images based on the sample frame position of the sample object;
obtaining a plurality of sample features of the sample object from the sample sequence;
clustering the plurality of sample characteristics to obtain a plurality of clusters, wherein each cluster comprises at least one sample characteristic; and determining the cluster center characteristics of each cluster based on the sample characteristics in the cluster to obtain the cluster center characteristics corresponding to the plurality of clusters.
Based on the same application concept as the method, the embodiment of the present application provides a behavior detection device, as shown in fig. 7, the behavior detection device may include: a processor 71 and a machine-readable storage medium 72, the machine-readable storage medium 72 storing machine-executable instructions executable by the processor 71; the processor 71 is configured to execute machine executable instructions to perform the following steps:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type behaviors; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where a plurality of computer instructions are stored, and when the computer instructions are executed by a processor, the behavior detection method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (random Access Memory), a volatile Memory, a non-volatile Memory, a flash Memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a dvd, etc.), or similar storage medium, or a combination thereof.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method of behavior detection, the method comprising:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type behaviors; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
2. The method of claim 1,
the determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center features corresponding to the plurality of behavior features includes:
if the target cluster central features corresponding to the behavior features are completely the same as the cluster central features, and the sequence of the target cluster central features corresponding to the behavior features is matched with the sequence of the cluster central features, determining that the video to be detected has the specified type of behavior;
otherwise, determining that the video to be detected does not have the specified type of behavior;
the sequence of the target cluster center features corresponding to the behavior features is matched with the time sequence of the behavior features; the specified type of behavior comprises a plurality of child behaviors, the number of the child behaviors is the same as the number of the cluster center features, the child behaviors are in one-to-one correspondence with the cluster center features, and the sequence of the cluster center features is matched with the occurrence sequence of the child behaviors.
3. The method according to claim 1, wherein the selecting a plurality of images to be detected of the same target object from the plurality of images to be detected comprises:
carrying out target detection on a specific target in the multiple images to be detected to obtain the object position of the specific target in the multiple candidate images to be detected; the candidate images to be detected are images to be detected with specific targets in the multiple images to be detected, wherein the specific targets comprise at least one target object;
carrying out target tracking on the same target object in the candidate images to be detected to obtain the object positions of the target object in the target images to be detected; the target image to be detected is an image to be detected in which the target object exists in the plurality of candidate images to be detected.
4. The method of claim 3,
the acquiring of the slice sequence of the target object based on the plurality of target images to be detected comprises:
determining the target frame position of the target object based on the object positions of the target object in a plurality of images to be detected of the target object, wherein the target frame position is the maximum circumscribed rectangle of all the object positions;
intercepting a plurality of sub-images from the plurality of images to be detected of the target based on the position of the target frame;
slicing the plurality of sub-images according to unit length to obtain at least one slice sequence;
wherein, the interval between two adjacent slice sequences is a fixed interval value.
5. The method of claim 1, wherein selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature to each cluster center feature comprises:
determining the maximum similarity based on the similarity of the behavior features and the center features of each cluster;
determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature;
or, determining whether the maximum similarity is greater than a similarity threshold, and if so, determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature.
6. The method according to any one of claims 1 to 5,
the obtaining mode of the central features of the plurality of clusters corresponding to the specified type of behavior comprises the following steps:
acquiring a plurality of calibration sample images of the specified type of behaviors;
selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence comprises sub-images intercepted from the plurality of target sample images based on the sample frame position of the sample object;
obtaining a plurality of sample features of the sample object from the sample sequence;
clustering the plurality of sample characteristics to obtain a plurality of clusters, wherein each cluster comprises at least one sample characteristic; and determining the cluster center characteristics of each cluster based on the sample characteristics in the cluster to obtain the cluster center characteristics corresponding to the plurality of clusters.
7. The method of claim 6,
for each sample feature within the class cluster, feature values for a plurality of feature dimensions are included;
the determining the cluster center feature of the cluster based on the sample feature in the cluster comprises:
for each feature dimension, determining a target feature value of the feature dimension based on feature values of the feature dimension in all sample features within the class cluster;
determining a cluster center feature of the cluster based on the target feature values of the plurality of feature dimensions.
8. The method of claim 6,
after determining the cluster center feature of the cluster based on the sample features in the cluster, the method further includes: selecting sample characteristics closest to the central characteristics of the clusters from all the sample characteristics in the clusters, and determining a sample sequence corresponding to the selected sample characteristics as behavior cluster samples of the clusters;
after determining that the video to be detected has the behavior of the specified type or the video to be detected does not have the behavior of the specified type based on the target cluster center features corresponding to the behavior features, the method further includes: if the video to be detected has the specified type of behavior, displaying a slice sequence with the specified type of behavior, and displaying a behavior cluster sample of each cluster; if the video to be detected does not have the specified type of behavior, displaying the slice sequence with the specified type of behavior and the slice sequence without the specified type of behavior, and displaying the behavior cluster sample of each cluster.
9. A behavior detection device, characterized in that the device comprises:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a video to be detected, and the video to be detected comprises a plurality of images to be detected; selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object;
a second obtaining module, configured to obtain multiple behavior features of the target object according to the slice sequence;
the determining module is used for determining the similarity between the behavior characteristics and the central characteristics of each cluster based on a plurality of central characteristics of the cluster corresponding to the specified type of behaviors aiming at each behavior characteristic; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature; and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
10. A behavior detection device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;
the processor is configured to execute machine executable instructions to perform the steps of:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of images to be detected of the same target object from the plurality of images to be detected, and acquiring a slice sequence of the target object based on the plurality of images to be detected of the target object, wherein the slice sequence comprises sub-images intercepted from the plurality of images to be detected of the target object based on the position of a target frame of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type behaviors; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
CN202011260947.8A 2020-11-12 2020-11-12 Behavior detection method, device and equipment Active CN112380971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011260947.8A CN112380971B (en) 2020-11-12 2020-11-12 Behavior detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011260947.8A CN112380971B (en) 2020-11-12 2020-11-12 Behavior detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN112380971A true CN112380971A (en) 2021-02-19
CN112380971B CN112380971B (en) 2023-08-25

Family

ID=74583226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011260947.8A Active CN112380971B (en) 2020-11-12 2020-11-12 Behavior detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN112380971B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003067884A1 (en) * 2002-02-06 2003-08-14 Nice Systems Ltd. Method and apparatus for video frame sequence-based object tracking
KR101575857B1 (en) * 2014-12-23 2015-12-08 영남대학교 산학협력단 Method of human action pattern recognition using histogram of motion intensity and direction from multi view
US20170000081A1 (en) * 2015-06-30 2017-01-05 Mousera, Inc System and method of automatic classification of animal behaviors
US20190187987A1 (en) * 2017-12-14 2019-06-20 Adobe Inc. Automation of sequences of actions
CN108509979A (en) * 2018-02-28 2018-09-07 努比亚技术有限公司 A kind of method for detecting abnormality, server and computer readable storage medium
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN110838353A (en) * 2019-10-11 2020-02-25 科大讯飞(苏州)科技有限公司 Action matching method and related product
CN111191498A (en) * 2019-11-07 2020-05-22 腾讯科技(深圳)有限公司 Behavior recognition method and related product
CN111339898A (en) * 2020-02-21 2020-06-26 上海商汤智能科技有限公司 Behavior detection method and apparatus, computer readable storage medium, computer device
CN111767783A (en) * 2020-04-22 2020-10-13 杭州海康威视数字技术股份有限公司 Behavior detection method, behavior detection device, model training method, model training device, electronic equipment and storage medium
CN111860140A (en) * 2020-06-10 2020-10-30 北京迈格威科技有限公司 Target event detection method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MABEL IGLESIAS-HAM 等: "Convex Deficiencies for Human Action Recognition", 《DOI 10.1007/S10846-011-9540-1》 *
夏利民 等: "监控视频中基于案例推理的人体可疑行为识别", 《小型微型计算机系统》 *
赵飞: "信息单向传输过程网络安全趋势感知研究", 《计算机仿真》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673342A (en) * 2021-07-19 2021-11-19 浙江大华技术股份有限公司 Behavior detection method, electronic device, and storage medium
CN114724230A (en) * 2022-06-10 2022-07-08 湖北微模式科技发展有限公司 Method and system for identifying identity of signatory

Also Published As

Publication number Publication date
CN112380971B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
EP3191989B1 (en) Video processing for motor task analysis
WO2019137137A1 (en) Activity recognition method using videotubes
CN111985385B (en) Behavior detection method, device and equipment
US7307652B2 (en) Method and apparatus for object tracking and detection
JP5422018B2 (en) Image processing method and image processing apparatus
CN111209774B (en) Target behavior recognition and display method, device, equipment and readable medium
US10152123B2 (en) Method and system for detecting objects of interest
CN110705424B (en) Method and device for positioning commodity display position and storage medium
CN112380971B (en) Behavior detection method, device and equipment
CN109977824B (en) Article taking and placing identification method, device and equipment
Buayai et al. End-to-end automatic berry counting for table grape thinning
JP2018512567A (en) Barcode tag detection in side view sample tube images for laboratory automation
CN107133629B (en) Picture classification method and device and mobile terminal
CN112784672A (en) Computer vision based surgical scene assessment
CN111402235B (en) Growth recording method and device for colony image, electronic equipment and storage medium
CN112417970A (en) Target object identification method, device and electronic system
CN103530646B (en) The complex object cascaded using grader is detected
Zhu et al. Egoobjects: A large-scale egocentric dataset for fine-grained object understanding
CN113468914A (en) Method, device and equipment for determining purity of commodities
de Oliveira Feijó et al. An algorithm to track laboratory zebrafish shoals
CN111783627B (en) Commodity stock determining method, device and equipment
CN112686114A (en) Behavior detection method, device and equipment
JP2007140718A (en) Unique video detection device, unique video detection method and program
CN107844734A (en) Monitoring objective determines method and device, video frequency monitoring method and device
Ramisa et al. Evaluation of the sift object recognition method in mobile robots

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant