CN112380971B - Behavior detection method, device and equipment


Info

Publication number: CN112380971B (granted publication of application CN112380971A)
Application number: CN202011260947.8A
Authority: CN (China)
Legal status: Active (granted)
Prior art keywords: behavior, target, detected, feature, cluster center
Other languages: Chinese (zh)
Inventor: 赵飞
Assignee / Applicant: Hangzhou Hikvision Digital Technology Co Ltd
Priority: CN202011260947.8A

Classifications

    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06V2201/07 Target detection (indexing scheme relating to image or video recognition or understanding)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a behavior detection method, device and equipment. The method comprises: acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected; selecting, from the plurality of images to be detected, a plurality of target images to be detected that contain the same target object, and acquiring a slice sequence of the target object based on the plurality of target images to be detected; acquiring a plurality of behavior features of the target object according to the slice sequence; for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to a specified type of behavior, and selecting a target cluster center feature from the plurality of cluster center features based on these similarities; and determining, based on the target cluster center features corresponding to the behavior features, that the specified type of behavior exists or does not exist in the video to be detected. With this technical solution, the accuracy of video behavior detection is high.

Description

Behavior detection method, device and equipment
Technical Field
The present application relates to the field of video monitoring technologies, and in particular, to a behavior detection method, apparatus, and device.
Background
A video is a sequence of successive images. Due to the persistence-of-vision effect of the human eye, when a video is played back at a certain rate, the human eye perceives a continuously moving sequence of images.
Video behavior sequence compliance detection is an intelligent analysis technique for determining whether a target's behavior sequence in a video complies with a specification. It can be widely applied in fields such as security monitoring, human-computer interaction, smart parks, smart classrooms and smart farms. For example, it can detect whether the behavior sequence of an operator in an industrial production process complies with a standard behavior specification, whether a chef's behavior sequence complies with catering regulations, whether an animal feeding action sequence is compliant, or whether a student's chemistry experiment operations are compliant.
In the related art, video behavior sequence compliance detection suffers from problems such as low detection accuracy and complex detection procedures, and a generally applicable video behavior sequence compliance detection technique is lacking.
Disclosure of Invention
The application provides a behavior detection method, which comprises the following steps:
Acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting, from the plurality of images to be detected, a plurality of target images to be detected that contain the same target object, and acquiring a slice sequence of the target object based on the plurality of target images to be detected, wherein the slice sequence comprises sub-images cropped from the plurality of target images to be detected based on the target frame position of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
The determining, based on the target cluster center features corresponding to the behavior features, that the specified type of behavior exists or does not exist in the video to be detected includes: if the target cluster center features corresponding to the behavior features include all of the cluster center features, and the order of the target cluster center features corresponding to the behavior features matches the order of the cluster center features, determining that the specified type of behavior exists in the video to be detected;
otherwise, determining that the specified type of behavior does not exist in the video to be detected;
wherein the order of the target cluster center features corresponding to the behavior features matches the temporal order of the behavior features; the specified type of behavior comprises a plurality of sub-behaviors, the number of sub-behaviors is the same as the number of cluster center features, the plurality of sub-behaviors correspond one-to-one to the cluster center features, and the order of the cluster center features matches the order of occurrence of the plurality of sub-behaviors.
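As an illustration of this decision rule, the following minimal Python sketch assumes that each behavior feature has already been assigned, in temporal order, the index of its target cluster center (or None when no center passed the similarity threshold), and that the reference order of the cluster centers is 0, 1, ..., m-1; the function name and the collapsing of consecutive repeats are illustrative choices, not taken from the patent:

```python
from typing import List, Optional

def is_compliant(assigned: List[Optional[int]], num_clusters: int) -> bool:
    """Decide whether the specified type of behavior exists in the video.

    assigned[t] is the index of the target cluster center matched by the
    t-th behavior feature (temporal order), or None if no center passed
    the similarity threshold.
    """
    matched = [c for c in assigned if c is not None]
    # Condition 1: every cluster center must be matched at least once.
    if set(matched) != set(range(num_clusters)):
        return False
    # Condition 2: after collapsing consecutive repeats (several slice
    # sequences may belong to the same sub-behavior), the temporal order of
    # the matched centers must follow the reference order of the sub-behaviors.
    collapsed = [c for i, c in enumerate(matched) if i == 0 or c != matched[i - 1]]
    return collapsed == list(range(num_clusters))

# e.g. is_compliant([0, 0, 1, 2, 2], 3) -> True; is_compliant([0, 2, 1], 3) -> False
```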
The selecting, from the plurality of images to be detected, of a plurality of target images to be detected that contain the same target object includes: performing target detection on a specific target in the plurality of images to be detected to obtain object positions of the specific target in a plurality of candidate images to be detected; a candidate image to be detected is an image to be detected, among the plurality of images to be detected, in which the specific target exists, wherein the specific target comprises at least one target object;
performing target tracking on the same target object in the plurality of candidate images to be detected to obtain object positions of the target object in a plurality of target images to be detected; a target image to be detected is an image to be detected, among the plurality of candidate images to be detected, in which the target object exists.
Illustratively, the acquiring of a slice sequence of the target object based on the plurality of target images to be detected includes: determining a target frame position of the target object based on the object positions of the target object in the plurality of target images to be detected, wherein the target frame position is the maximum circumscribed rectangle of all the object positions;
cropping a plurality of sub-images from the plurality of target images to be detected based on the target frame position;
slicing the plurality of sub-images according to a unit length to obtain at least one slice sequence;
wherein the interval between two adjacent slice sequences is a fixed interval value.
Illustratively, selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature to each cluster center feature includes: determining the maximum similarity based on the similarities between the behavior feature and the cluster center features, and determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature; or determining whether the maximum similarity is greater than a similarity threshold, and if so, determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature.
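A small sketch of this selection step, using cosine similarity as one possible similarity measure (the patent does not fix a particular metric here) and a configurable threshold; names and defaults are illustrative:

```python
import numpy as np

def select_target_center(behavior_feat: np.ndarray,
                         center_feats: np.ndarray,
                         sim_threshold: float = 0.5):
    """Return the index of the target cluster center for one behavior feature,
    or None if the maximum similarity does not exceed the threshold.

    behavior_feat: shape (w,); center_feats: shape (m, w).
    """
    norms = np.linalg.norm(center_feats, axis=1) * np.linalg.norm(behavior_feat)
    sims = center_feats @ behavior_feat / np.maximum(norms, 1e-12)  # cosine similarity
    best = int(np.argmax(sims))          # cluster center with the maximum similarity
    return best if sims[best] > sim_threshold else None
```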
Exemplary, the method for obtaining the central characteristics of the multiple class clusters corresponding to the specified type of behavior includes:
acquiring a plurality of calibration sample images of the specified type of behavior;
selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence comprises sub-images cropped from the plurality of target sample images based on the sample frame position of the sample object;
acquiring a plurality of sample features of the sample object according to the sample sequence;
clustering the plurality of sample features to obtain a plurality of class clusters, wherein each class cluster comprises at least one sample feature; and determining the center characteristics of the class clusters based on the sample characteristics in the class clusters aiming at each class cluster so as to obtain the center characteristics of the class clusters corresponding to the plurality of class clusters respectively.
Illustratively, each sample feature within the class cluster comprises feature values of a plurality of feature dimensions; the determining of the cluster center feature of the class cluster based on the sample features in the class cluster includes:
for each feature dimension, determining a target feature value for the feature dimension based on feature values for the feature dimension in all sample features within the class cluster;
And determining a cluster center feature of the cluster based on the target feature values of the feature dimensions.
Illustratively, after determining the cluster center feature of the class cluster based on the sample features in the class cluster, the method further includes: selecting a sample feature closest to the cluster center feature from all sample features in the class cluster, and determining the sample sequence corresponding to the selected sample feature as the behavior class cluster sample of the class cluster. After determining, based on the target cluster center features corresponding to the behavior features, that the specified type of behavior exists or does not exist in the video to be detected, the method further includes: if the specified type of behavior exists in the video to be detected, displaying the slice sequences in which the specified type of behavior occurs, and displaying the behavior class cluster sample of each class cluster; and if the specified type of behavior does not exist in the video to be detected, displaying the slice sequences in which the specified type of behavior occurs and the slice sequences in which it does not occur, and displaying the behavior class cluster sample of each class cluster.
The application provides a behavior detection device, comprising:
The first acquisition module is used for acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected; selecting, from the plurality of images to be detected, a plurality of target images to be detected that contain the same target object; and acquiring a slice sequence of the target object based on the plurality of target images to be detected, wherein the slice sequence comprises sub-images cropped from the plurality of target images to be detected based on the target frame position of the target object;
the second acquisition module is used for acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
the determining module is used for, for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature; and determining, based on the target cluster center features corresponding to the behavior features, that the specified type of behavior exists or does not exist in the video to be detected.
The present application provides a behavior detection apparatus including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;
The processor is configured to execute machine-executable instructions to perform the steps of:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting, from the plurality of images to be detected, a plurality of target images to be detected that contain the same target object, and acquiring a slice sequence of the target object based on the plurality of target images to be detected, wherein the slice sequence comprises sub-images cropped from the plurality of target images to be detected based on the target frame position of the target object;
acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
According to the above technical solution, in the embodiments of the application, whether the specified type of behavior exists in the video to be detected can be determined based on the plurality of cluster center features corresponding to the specified type of behavior. The video behavior detection accuracy is high and the detection procedure is simple. This is an automated, general-purpose video behavior sequence compliance detection method, which can improve the universality of the video behavior sequence compliance detection technique, lower its usage threshold, and allow the technique to be popularized quickly and conveniently in various fields.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings.
FIG. 1 is a flow chart of a behavior detection method in one embodiment of the application;
FIG. 2 is a schematic diagram of a training process in one embodiment of the application;
FIGS. 3A and 3B are schematic diagrams of sample frame positions in one embodiment of the application;
FIG. 4 is a schematic diagram of a detection process in one embodiment of the application;
FIGS. 5A-5D are schematic diagrams of a similarity matrix in one embodiment of the application;
FIG. 5E is a schematic diagram of a compliance visualization interface in one embodiment of the present application;
FIG. 6 is a schematic diagram of a behavior detection apparatus in one embodiment of the present application;
fig. 7 is a hardware configuration diagram of a behavior detection apparatus in one embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations including one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. Furthermore, depending on the context, the word "if" as used may be interpreted as "when" or "upon" or "in response to a determination".
In the embodiment of the present application, a behavior detection method is provided, and referring to fig. 1, which is a flow chart of the behavior detection method, the method may be applied to any device (such as an analog Camera, an IPC (internet protocol Camera), a background server, an application server, etc.), and the method may include:
Step 101, obtaining a video to be detected, wherein the video to be detected comprises a plurality of images to be detected.
Step 102, selecting, from the plurality of images to be detected, a plurality of target images to be detected that contain the same target object, and acquiring a slice sequence of the target object based on the plurality of target images to be detected, wherein the slice sequence comprises sub-images cropped from the plurality of target images to be detected based on the target frame position of the target object.
In one possible implementation, the selecting of the plurality of target images to be detected that contain the same target object from the plurality of images to be detected may include, but is not limited to: performing target detection on a specific target in the plurality of images to be detected to obtain object positions of the specific target in a plurality of candidate images to be detected; a candidate image to be detected is an image to be detected, among the plurality of images to be detected, in which the specific target (which may include at least one target object) exists. Then, target tracking may be performed on the same target object in the plurality of candidate images to be detected to obtain object positions of the target object in a plurality of target images to be detected; a target image to be detected is an image to be detected, among the plurality of candidate images to be detected, in which the target object exists.
In one possible implementation, acquiring a slice sequence of the target object based on the plurality of target images to be detected may include, but is not limited to: determining the target frame position of the target object based on the object positions of the target object in the plurality of target images to be detected, wherein the target frame position may be the maximum circumscribed rectangle of all the object positions; cropping a plurality of sub-images from the plurality of target images to be detected based on the target frame position; and slicing the plurality of sub-images according to a unit length to obtain at least one slice sequence. Among the at least one slice sequence, the interval between two adjacent slice sequences may be a fixed interval value.
Step 103, obtaining a plurality of behavior characteristics of the target object according to the slice sequence.
Step 104, determining the similarity between each behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; and selecting a target cluster center feature from a plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature.
Illustratively, a maximum similarity is determined based on the similarity of the behavioral characteristics to the central characteristics of each class cluster. And determining the central characteristic of the class cluster corresponding to the maximum similarity as the central characteristic of the target class cluster corresponding to the behavior characteristic. Or determining whether the maximum similarity is greater than a similarity threshold (which can be configured empirically, such as 0.5, etc.), if so, determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature corresponding to the behavior feature, and if not, determining that the behavior feature does not have the corresponding target cluster center feature.
And 105, determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target class cluster center characteristics corresponding to the behavior characteristics.
In a possible implementation, if the target cluster center features corresponding to the behavior features include all of the cluster center features, and the order of the target cluster center features corresponding to the behavior features matches the order of the cluster center features, it is determined that the specified type of behavior exists in the video to be detected; otherwise, it is determined that the specified type of behavior does not exist in the video to be detected. The order of the target cluster center features corresponding to the behavior features matches the temporal order of the behavior features; the specified type of behavior comprises a plurality of sub-behaviors, the number of sub-behaviors is the same as the number of cluster center features, the plurality of sub-behaviors correspond one-to-one to the cluster center features, and the order of the cluster center features matches the order of occurrence of the plurality of sub-behaviors.
In one possible implementation, the plurality of cluster center features corresponding to the specified type of behavior may be obtained in a manner including, but not limited to, the following: acquiring a plurality of calibration sample images in which the specified type of behavior occurs, selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence may comprise sub-images cropped from the plurality of target sample images based on the sample frame position of the sample object. A plurality of sample features of the sample object are obtained according to the sample sequence, and the plurality of sample features are clustered to obtain a plurality of class clusters, wherein each class cluster comprises at least one sample feature; for each class cluster, the cluster center feature of the class cluster is determined based on the sample features in the class cluster, so as to obtain the cluster center features respectively corresponding to the plurality of class clusters.
For example, for each sample feature within a class cluster, feature values for multiple feature dimensions may be included; on this basis, cluster-like center features of the clusters are determined based on sample features within the clusters, which may include, but are not limited to: for each feature dimension, a target feature value for the feature dimension (e.g., an average of all feature values for the feature dimension) may be determined based on the feature values for the feature dimension in all sample features within the class of clusters; and determining the cluster-like central feature of the cluster based on the target feature values of the feature dimensions.
For example, a sample feature closest to the cluster center feature of the class cluster may be selected from all sample features in the class cluster, and the sample sequence corresponding to the selected sample feature may be determined as the behavior class cluster sample of the class cluster. On this basis, after step 105, if the specified type of behavior exists in the video to be detected, the slice sequences in which the specified type of behavior occurs may be displayed, together with the behavior class cluster sample of each class cluster; if the specified type of behavior does not exist in the video to be detected, the slice sequences in which the specified type of behavior occurs and the slice sequences in which it does not occur may be displayed, together with the behavior class cluster sample of each class cluster.
It should be noted that the above execution order is only an example given for convenience of description; in practical applications, the execution order of the steps may be changed, which is not limited herein. Moreover, in other embodiments, the steps of the corresponding methods need not be performed in the order shown and described herein, and the methods may include more or fewer steps than described herein. Furthermore, an individual step described in this specification may, in other embodiments, be split into multiple steps, and multiple steps described in this specification may, in other embodiments, be combined into a single step.
According to the above technical solution, in the embodiments of the application, whether the specified type of behavior exists in the video to be detected can be determined based on the plurality of cluster center features corresponding to the specified type of behavior. The video behavior detection accuracy is high and the detection procedure is simple. This is an automated, general-purpose video behavior sequence compliance detection method, which can improve the universality of the video behavior sequence compliance detection technique, lower its usage threshold, and allow the technique to be popularized quickly and conveniently in various fields.
The above technical solution of the embodiments of the present application is described below with reference to specific application scenarios.
The video behavior sequence compliance detection technique can be widely applied in fields such as security monitoring, human-computer interaction, smart parks, smart classrooms and smart farms. For these application scenarios, the embodiment of the application provides a behavior detection method for realizing a general-purpose video behavior sequence compliance detection technique, aiming to improve the universality of the technique, lower its usage threshold, and facilitate its rapid popularization in various fields.
For example, the behavior detection method is used to detect whether the behavior sequence of an operator in an industrial production process complies with a standard behavior specification (the standard flow is: turn on the power supply, operate the instrument panel, load the melt-blown material, scrub and disinfect the operation table, turn off the motor). The operator's behavior is called a specified type of behavior, and the specified type of behavior comprises sub-behaviors such as turning on the power supply, operating the instrument panel, loading the melt-blown material, scrubbing and disinfecting the operation table, and turning off the motor, with the order of occurrence of these sub-behaviors being: turn on the power supply, operate the instrument panel, load the melt-blown material, scrub and disinfect the operation table, turn off the motor. For another example, the behavior detection method is used to detect whether a chef's behavior sequence complies with catering regulations (the standard flow is: put on the chef's hat, wash the food materials, cut the vegetables, light the stove, cook the dish, pack the dish, turn off the stove). The chef's behavior is called a specified type of behavior, and the specified type of behavior comprises sub-behaviors such as putting on the chef's hat, washing the food materials, cutting the vegetables, lighting the stove, cooking the dish, packing the dish, and turning off the stove. For another example, the behavior detection method is used to detect whether an animal feeding action sequence is compliant (the compliant flow is: leave the pen, drink water, eat grass, return to the pen). The animal's behavior is called a specified type of behavior, and the specified type of behavior comprises sub-behaviors such as leaving the pen, drinking water, eating grass, and returning to the pen. For another example, the behavior detection method is used to detect whether a student's chemistry experiment operation is compliant (the standard flow is: add the reagent to the beaker, light the alcohol lamp, stir the reagent in the beaker, record the experimental data, remove the alcohol lamp, extinguish the alcohol lamp). The student's behavior is called a specified type of behavior, and the specified type of behavior comprises a plurality of sub-behaviors.
Of course, the above are just a few examples of application scenarios, which are not limiting. For convenience of description, it is taken as an example to detect whether the behavior sequence of the operator in the industrial production process meets the standard behavior specification.
The behavior detection method of the embodiment of the application can relate to a training process and a detection process, and the training process and the detection process are respectively described below. Referring to fig. 2, a schematic diagram of a training process is shown, through which a plurality of cluster-like center features corresponding to a specific type of behavior can be obtained.
Step 201, a plurality of calibration sample images in which a specified type of behavior occurs are acquired.
For example, a sample training video may be acquired, which may include a plurality of consecutive sample training images, such as sample training image 1, sample training image 2, ..., sample training image m. For the sample training video, the sample training images in which the specified type of behavior occurs (for example, any sub-behavior of the specified type of behavior) can be calibrated, and a calibrated sample training image is referred to as a calibration sample image; that is, a calibration sample image is a sample training image in the sample training video in which the specified type of behavior occurs.
The calibration sample images may be calibrated in the following manner: for each calibration sample image in which the specified type of behavior occurs, the spatial position of the specified type of behavior is calibrated by drawing a frame (including but not limited to a rectangular frame, a circular frame, a polygonal frame, etc.), and labeling information such as the behavior category is given.
For example, for a sample training video input by a user, assuming that sub-behavior 1 (turning on the power supply) of the specified type of behavior occurs in frames 10 to 19 (sample training images), sub-behavior 2 (operating the instrument panel) occurs in frames 20 to 28, sub-behavior 3 (loading the melt-blown material) occurs in frames 29 to 35, sub-behavior 4 (scrubbing and disinfecting the operation table) occurs in frames 36 to 43, and sub-behavior 5 (turning off the motor) occurs in frames 44 to 50, then frames 10 to 50 are calibration sample images.
For each calibration sample image, the object (such as a person) in which the specified type of behavior occurs is selected by drawing a bounding box. Taking a rectangular frame as an example, the rectangular frame encloses the object in which the specified type of behavior occurs, and the spatial position in the calibration sample image is the object position of the object. The object position may comprise the coordinate information of the rectangular frame, such as the upper left corner coordinates (upper left abscissa and upper left ordinate) and the lower right corner coordinates (lower right abscissa and lower right ordinate), or the lower left corner coordinates (lower left abscissa and lower left ordinate) and the upper right corner coordinates (upper right abscissa and upper right ordinate). Of course, the above is merely an example of the coordinate information of the rectangular frame, and is not limited thereto. For example, the coordinate information may be the upper left corner coordinates and the width and height of the rectangular frame, and the lower right corner coordinates may be determined from the upper left corner coordinates and the width and height of the rectangular frame. For another example, the coordinate information may be the lower left corner coordinates and the width and height of the rectangular frame, and the upper right corner coordinates may be determined from the lower left corner coordinates and the width and height of the rectangular frame. Obviously, the rectangular frame of the object in which the specified type of behavior occurs, namely the object position in the calibration sample image, can be determined from the coordinate information.
Step 202, selecting a plurality of target sample images of the same sample object from a plurality of calibration sample images.
For example, if only one object exists in the sample training video and a specified type of behavior occurs, the object is taken as a sample object, and all calibration sample images are taken as target sample images of the sample object.
If at least two objects in which the specified type of behavior occurs exist in the sample training video, each object is taken as a sample object. For each sample object (the subsequent processing takes one sample object as an example), the calibration sample images in which the sample object exists are selected from all calibration sample images, and the selected calibration sample images are taken as the target sample images of the sample object. For example, if the calibration sample images of sample object 1 in which the specified type of behavior occurs are images 1-10, and the calibration sample images of sample object 2 in which the specified type of behavior occurs are images 11-25, then calibration sample images 1-10 are taken as the target sample images of sample object 1, and calibration sample images 11-25 are taken as the target sample images of sample object 2.
In order to select the target sample image of the sample object from all the calibration sample images, a tracking algorithm may be used to track the sample object, so as to obtain a plurality of calibration sample images of the sample object, and these calibration sample images are used as the target sample image of the sample object, which is not described in detail.
Step 203, a sample sequence of the sample object is acquired based on the plurality of target sample images, where the sample sequence may comprise sub-images cropped from the plurality of target sample images based on the sample frame position of the sample object.
Illustratively, for step 203, the sample sequence may be obtained by:
In step 2031, a sample frame position of the sample object is determined based on the object positions of the sample object in the plurality of target sample images. The sample frame position may represent the spatial range of all the object positions, where the spatial range may include, but is not limited to, a circumscribed rectangular frame, a circumscribed circular frame, a circumscribed polygonal frame, etc.; the circumscribed rectangular frame is taken as an example hereafter, i.e., the sample frame position may be the maximum circumscribed rectangle of all the object positions.
With reference to the above embodiment, the object position of the sample object in each target sample image can be known, and based on this, the maximum bounding rectangle of all the object positions is taken as the sample frame position of the sample object.
For example, when a coordinate system is established with the upper left corner of the target sample image as the origin, the horizontal rightward direction as the horizontal axis, and the vertical downward direction as the vertical axis, the object position may include the upper left abscissa, the upper left ordinate, the lower right abscissa, and the lower right ordinate. On this basis, the minimum value of the upper left abscissa is selected based on the upper left abscissas of the object positions in the target sample images; the minimum value of the upper left ordinate is selected based on the upper left ordinates of the object positions in the target sample images; the maximum value of the lower right abscissa is selected based on the lower right abscissas of the object positions in the target sample images; and the maximum value of the lower right ordinate is selected based on the lower right ordinates of the object positions in the target sample images.
Then, the sample frame position of the sample object is determined from the minimum value of the upper left-hand abscissa, the minimum value of the upper left-hand ordinate, the maximum value of the lower right-hand abscissa, and the maximum value of the lower right-hand ordinate.
Referring to fig. 3A, for the sample object, the object position in each target sample image includes the upper left corner coordinates (upper left abscissa left_top_x, upper left ordinate left_top_y) and the lower right corner coordinates (lower right abscissa right_bottom_x, lower right ordinate right_bottom_y). The minimum value of the upper left abscissa is selected from all upper left abscissas and denoted min({left_top_x}), and the minimum value of the upper left ordinate is selected from all upper left ordinates and denoted min({left_top_y}). The maximum value of the lower right abscissa is selected from all lower right abscissas and denoted max({right_bottom_x}), and the maximum value of the lower right ordinate is selected from all lower right ordinates and denoted max({right_bottom_y}).

Then, min({left_top_x}) and min({left_top_y}) are combined into one coordinate point A1, max({right_bottom_x}) and max({right_bottom_y}) are combined into one coordinate point A2, and the rectangular frame formed by coordinate point A1 and coordinate point A2 is the sample frame position of the sample object.
For another example, when a coordinate system is established with the lower left corner of the target sample image as the origin, the horizontal rightward direction as the horizontal axis, and the vertical upward direction as the vertical axis, the object position may include the lower left abscissa, the lower left ordinate, the upper right abscissa, and the upper right ordinate. On this basis, the minimum value of the lower left abscissa is selected based on the lower left abscissas of the object positions in the target sample images; the minimum value of the lower left ordinate is selected based on the lower left ordinates of the object positions in the target sample images; the maximum value of the upper right abscissa is selected based on the upper right abscissas of the object positions in the target sample images; and the maximum value of the upper right ordinate is selected based on the upper right ordinates of the object positions in the target sample images.
Then, the sample frame position of the sample object is determined from the minimum value of the lower left-hand abscissa, the minimum value of the lower left-hand ordinate, the maximum value of the upper right-hand abscissa, and the maximum value of the upper right-hand ordinate.
Referring to fig. 3B, for the sample object, the object position in each target sample image includes the lower left corner coordinates (lower left abscissa left_bottom_x, lower left ordinate left_bottom_y) and the upper right corner coordinates (upper right abscissa right_top_x, upper right ordinate right_top_y). The minimum value of the lower left abscissa is selected from all lower left abscissas and denoted min({left_bottom_x}), and the minimum value of the lower left ordinate is selected from all lower left ordinates and denoted min({left_bottom_y}). The maximum value of the upper right abscissa is selected from all upper right abscissas and denoted max({right_top_x}), and the maximum value of the upper right ordinate is selected from all upper right ordinates and denoted max({right_top_y}).

Then, min({left_bottom_x}) and min({left_bottom_y}) may be combined into one coordinate point B1, max({right_top_x}) and max({right_top_y}) may be combined into one coordinate point B2, and the rectangular frame formed by coordinate point B1 and coordinate point B2 may be the sample frame position of the sample object.
Of course, the above manner is merely an example, and is not limited thereto, as long as the sample frame position can be determined.
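As a hedged illustration of step 2031 under the coordinate convention of fig. 3A (top-left origin, boxes given by top-left and bottom-right corners), the sample frame position can be computed as the maximum circumscribed rectangle of the per-image object positions; the function name is illustrative:

```python
def sample_frame_position(object_positions):
    """object_positions: list of (left_top_x, left_top_y, right_bottom_x, right_bottom_y),
    one per target sample image.  Returns the maximum circumscribed rectangle."""
    left_top_x = min(p[0] for p in object_positions)      # min({left_top_x})
    left_top_y = min(p[1] for p in object_positions)      # min({left_top_y})
    right_bottom_x = max(p[2] for p in object_positions)  # max({right_bottom_x})
    right_bottom_y = max(p[3] for p in object_positions)  # max({right_bottom_y})
    return (left_top_x, left_top_y, right_bottom_x, right_bottom_y)
```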
Step 2032, a plurality of sub-images are cropped from the plurality of target sample images based on the sample frame position.
For example, for each target sample image, a sub-image matching the sample frame position may be cropped from the target sample image. For example, a rectangular frame is determined based on the sample frame position: the upper left abscissa of the rectangular frame is the minimum value of the upper left abscissa, the upper left ordinate of the rectangular frame may be the minimum value of the upper left ordinate, the lower right abscissa of the rectangular frame may be the maximum value of the lower right abscissa, and the lower right ordinate of the rectangular frame may be the maximum value of the lower right ordinate. After the rectangular frame is obtained, a sub-image matching the rectangular frame may be cut out from the target sample image.
After the above processing is performed on each target sample image, a plurality of sub-images can be obtained.
Step 2033, slicing the plurality of sub-images according to a unit length (which may be empirically configured) to obtain at least one sample sequence, where an interval between two adjacent sample sequences is a fixed interval value.
Illustratively, the unit length may be denoted N, and the value of N may be configured empirically, such as 16 frames or 32 frames, which is not limited. The fixed interval value may also be configured empirically, such as -N/2, -N/4, 0, N/4, or N/2, which is not limited either. By setting the fixed interval value to values such as -N/2, -N/4, 0, N/4, or N/2, the granularity of the slices can be adapted to sub-behaviors of different durations.
For example, assuming that N is 16 and the fixed interval value is-N/2, then the plurality of sub-images are sliced (it is necessary to slice the plurality of sub-images in the video in order of the video, e.g., sub-image 1 is earlier than sub-image 2, sub-image 2 is earlier than sub-image 3, and so on), sample sequence 1 may include sub-image 1-sub-image 16, sample sequence 2 may include sub-image 9-sub-image 24, sample sequence 3 may include sub-image 17-sub-image 32, and so on. Obviously, there are 8 repeated frames (i.e., N/2) of images for sample sequence 2 and sample sequence 1, 8 repeated frames of images for sample sequence 3 and sample sequence 2, and so on.
For another example, assuming that N is 16 and the fixed interval value is 0, after slicing the plurality of sub-images, sample sequence 1 may include sub-image 1-sub-image 16, sample sequence 2 may include sub-image 17-sub-image 32, sample sequence 3 may include sub-image 33-sub-image 48, and so on.
For another example, assuming that N is 16 and the fixed interval value is N/2, after slicing the plurality of sub-images, sample sequence 1 may include sub-images 1-16, sample sequence 2 may include sub-images 25-40, sample sequence 3 may include sub-images 49-64, and so on. Obviously, sample sequence 2 is separated from sample sequence 1 by 8 frames (i.e., N/2) of images, and so on.
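A short sketch of steps 2032-2033 under the conventions of the examples above: sub-images (in video order) are cut into sequences of unit length N, and the start of each subsequent sequence is offset by N plus the fixed interval value, so -N/2 gives 8 overlapping frames for N = 16, 0 gives back-to-back sequences, and N/2 leaves an 8-frame gap; names are illustrative and the interval is assumed to be greater than -N:

```python
def slice_sub_images(sub_images, unit_len=16, interval=0):
    """Slice cropped sub-images (in video order) into sample/slice sequences.

    interval may be negative (overlap), zero (adjacent) or positive (gap),
    e.g. -unit_len // 2, 0 or unit_len // 2; it must exceed -unit_len so that
    the step between sequence starts stays positive.
    """
    step = unit_len + interval   # offset between the starts of adjacent sequences
    sequences = []
    for start in range(0, len(sub_images) - unit_len + 1, step):
        sequences.append(sub_images[start:start + unit_len])
    return sequences

# With 48 sub-images, unit_len=16 and interval=-8 this yields sequences starting
# at sub-images 1, 9, 17, ... (matching the overlapping example above).
```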
Step 204, obtaining a plurality of sample features of a sample object from a sample sequence.
For each sample sequence, feature extraction may be performed on the sample sequence (i.e., on the plurality of sub-images) to obtain vectorized feature data capable of expressing the sample sequence, and the vectorized feature data is taken as a sample feature of the sample object. For example, features may be extracted from the sample sequence by using a general behavior recognition model (including but not limited to LSTM, two-stream networks, C3D, P3D, I3D, SlowFast, etc.) and a classification neural network (including but not limited to resnet18, resnet50, resnet101, resnet152, VGG, etc.) to obtain the sample feature. Of course, the above is merely an example of extracting features from a sample sequence, and is not limited thereto, as long as the sample features can be obtained. The method of extracting features from a sample sequence with a general behavior recognition model and a classification neural network is not limited in this embodiment.
For each sample sequence, after the feature extraction is performed on the sample sequence, a sample feature (such as a sample feature corresponding to the sample sequence, of course, the sample sequence may also correspond to a plurality of sample features) may be obtained. The sample features corresponding to all sample sequences may then be combined to form a behavioral feature set comprising sample features corresponding to all sample sequences. In the behavior feature set, the sequence relation of the sample features is reserved, for example, the behavior feature set sequentially comprises the sample features corresponding to the sample sequence 1, the sample features corresponding to the sample sequence 2 and the sample features corresponding to the sample sequence 3, so that the sample features corresponding to the sample sequence 1 are positioned before the sample features corresponding to the sample sequence 2, and the sample features corresponding to the sample sequence 2 are positioned before the sample features corresponding to the sample sequence 3.
For steps 201-204, assume that the user has calibrated k compliant behavior sequences corresponding to k target tracks, denoted {A_1, A_2, ..., A_k}, where each target track is a plurality of sub-images (the sub-images before slicing). After slicing target track A_1, a set of sample sequences can be obtained; after slicing target track A_2, another set of sample sequences can be obtained; and so on, after slicing target track A_k, a further set of sample sequences can be obtained. From the above, a sample sequence set including all of these sample sequences can be obtained.

After feature extraction is performed on a sample sequence, the sample feature of that sample sequence can be obtained. In the same way, after feature extraction is performed on each sample sequence in the sample sequence set, the behavior feature set F can be obtained.
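Step 204 does not prescribe a particular network. As one hedged illustration only, a pretrained 3D video backbone from torchvision (r3d_18, used here as a stand-in for the behavior recognition models listed above, not the patent's model) can turn each N-frame sample sequence into a single sample feature vector:

```python
import torch
import torchvision

# Illustrative feature extractor: an R3D-18 video backbone with its classifier removed
# (pretrained Kinetics-400 weights; requires torchvision >= 0.13, downloads on first use).
model = torchvision.models.video.r3d_18(weights="DEFAULT")
model.fc = torch.nn.Identity()   # output a feature vector instead of class logits
model.eval()

def extract_sample_feature(sample_sequence: torch.Tensor) -> torch.Tensor:
    """sample_sequence: float tensor of shape (3, N, H, W), the stacked sub-images of
    one sample sequence (resized/normalized as the backbone expects).
    Returns a 1-D sample feature (512 dimensions for r3d_18)."""
    with torch.no_grad():
        return model(sample_sequence.unsqueeze(0)).squeeze(0)
```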
Step 205, clustering the plurality of sample features to obtain a plurality of class clusters, wherein each class cluster comprises at least one sample feature, and the plurality of class clusters are a plurality of class clusters corresponding to the specified type of behavior.
For example, for a specific type of behavior, the specific type of behavior may include multiple sub-behaviors, and since the same sub-behavior has the same or similar characteristics, and different sub-behaviors have different characteristics, after extracting the characteristics of the sample sequence to obtain sample characteristics, all the sample characteristics may be clustered to obtain multiple class clusters, that is, the same or similar sample characteristics are clustered to the same class cluster, and different sample characteristics are clustered to different class clusters, so as to obtain multiple class clusters, where each class cluster includes at least one sample characteristic.
Obviously, the number of class clusters and the number of sub-behaviors can be the same, namely, the sub-behaviors are in one-to-one correspondence with the class clusters, and when the specified type behavior comprises a plurality of sub-behaviors, the specified type behavior corresponds to the plurality of class clusters.
When clustering all sample features to obtain a plurality of class clusters, an unsupervised clustering algorithm (such as a hierarchical clustering algorithm, a density-based clustering algorithm, etc., which is not limited thereto) may be used to cluster all sample features to obtain a plurality of class clusters, and the clustering mode is not limited thereto, so long as the same or similar sample features can be clustered to the same class cluster, and different sample features can be clustered to different class clusters.
For example, the specified type of behavior may include m sub-behaviors, and since the same sub-behaviors have the same or similar features, different sub-behaviors have different features, after the behavior feature set F is clustered, m class clusters may be obtained, where the m sub-behaviors are in one-to-one correspondence with the m class clusters.
In one possible implementation manner, the specified type of behavior includes a plurality of sub-behaviors, the plurality of sub-behaviors are in one-to-one correspondence with the plurality of class clusters, and the order of the plurality of class clusters matches the order of occurrence of the plurality of sub-behaviors, for example, the order of occurrence of the plurality of sub-behaviors is sub-behavior 1, sub-behavior 2 and sub-behavior 3 in turn, and then the order of the plurality of class clusters is class cluster corresponding to sub-behavior 1, class cluster corresponding to sub-behavior 2 and class cluster corresponding to sub-behavior 3 in turn.
For example, the order relationship of the plurality of class clusters may be determined based on the order relationship of the sample features within the class clusters. For example, referring to the above embodiment, the precedence relationship of the sample feature is reserved in the behavior feature set, and the precedence relationship of the sample feature is determined based on the precedence relationship of the sample sequence, and the precedence relationship of the sample sequence is determined based on the precedence relationship of the target sample image in the video, so that the precedence relationship of the sample feature is matched with the occurrence sequence of the plurality of sub-behaviors in the video. On this basis, for any two clusters (denoted as cluster 1 and cluster 2), if the sample feature (any sample feature) in cluster 1 is located before the sample feature in cluster 2, then cluster 1 is located before cluster 2, and if the sample feature in cluster 1 is located after the sample feature in cluster 2, then cluster 1 is located after cluster 2.
In summary, after obtaining a plurality of class clusters, a sequential relationship of the plurality of class clusters can be obtained.
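A hedged sketch of step 205 together with this ordering rule, assuming agglomerative (hierarchical) clustering from scikit-learn as one possible unsupervised algorithm, and ordering the class clusters by the earliest temporal index of their sample features (one reasonable reading of the rule above):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_and_order(behavior_feature_set: np.ndarray, num_sub_behaviors: int):
    """behavior_feature_set: (n, w) sample features kept in temporal order.
    Returns one cluster label per sample feature, plus the cluster ids sorted so
    that the cluster whose sample features occur earliest comes first."""
    labels = AgglomerativeClustering(n_clusters=num_sub_behaviors).fit_predict(behavior_feature_set)
    first_index = {c: int(np.where(labels == c)[0][0]) for c in set(labels)}
    ordered_clusters = sorted(first_index, key=first_index.get)
    return labels, ordered_clusters
```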
Step 206, determining a class cluster center feature of the class cluster based on the sample feature in the class cluster for each of the plurality of class clusters to obtain class cluster center features respectively corresponding to the plurality of class clusters.
Obviously, for each class cluster, the class cluster has a class cluster center feature, that is, the number of class cluster center features is the same as the number of class clusters, that is, the class cluster center features are in one-to-one correspondence with the class clusters.
The specified type behavior comprises a plurality of sub-behaviors, the number of the class clusters is the same as the number of the sub-behaviors (the sub-behaviors are in one-to-one correspondence with the class clusters), the number of the class cluster center features is the same as the number of the class clusters (the class cluster center features are in one-to-one correspondence with the class clusters), and therefore the number of the sub-behaviors is the same as the number of the class cluster center features, and the plurality of sub-behaviors are in one-to-one correspondence with the plurality of class cluster center features, namely the specified type behavior is corresponding to the plurality of class cluster center features.
In one possible implementation, the plurality of sub-behaviors are in one-to-one correspondence with the plurality of class cluster center features, and the order of the plurality of class cluster center features matches the order of occurrence of the plurality of sub-behaviors. For example, if the sub-behaviors occur in the order sub-behavior 1, sub-behavior 2, sub-behavior 3, then the class cluster center features are ordered as: the class cluster center feature corresponding to sub-behavior 1 (i.e., the class cluster center feature of the class cluster corresponding to sub-behavior 1), the class cluster center feature corresponding to sub-behavior 2, and the class cluster center feature corresponding to sub-behavior 3.
Obviously, referring to the above embodiment, the order relationship of the plurality of class clusters has already been obtained. Therefore, the order relationship of the plurality of class cluster center features may be determined from the order relationship of the class clusters: for example, if class cluster 1 is located before class cluster 2, the class cluster center feature of class cluster 1 is located before the class cluster center feature of class cluster 2.
In one possible implementation, each sample feature within a class cluster may be denoted F_i = [v_1, v_2, ..., v_w], where w is the feature dimension, v_1 is the feature value of feature dimension 1, v_2 is the feature value of feature dimension 2, and v_w is the feature value of feature dimension w. On this basis, the feature values of feature dimension 1 of all sample features in the class cluster may be averaged to obtain a target feature value v̄_1, the feature values of feature dimension 2 of all sample features in the class cluster may be averaged to obtain a target feature value v̄_2, and so on, until the feature values of feature dimension w of all sample features in the class cluster are averaged to obtain a target feature value v̄_w. These target feature values are then combined to obtain the class cluster center feature of the class cluster; for example, the class cluster center feature may be Fc = [v̄_1, v̄_2, ..., v̄_w].
Assuming there are m class clusters, the class cluster center features of the m class clusters may be denoted {Fc_1, Fc_2, ..., Fc_m}.
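A minimal sketch of this per-dimension averaging follows, assuming the sample features are stored as NumPy arrays; the function and variable names are illustrative, not part of the original method:

```python
import numpy as np

def cluster_center_feature(sample_features):
    """Class cluster center feature as the per-dimension mean.

    sample_features: array of shape (num_samples_in_cluster, w), each row a
    sample feature F_i = [v_1, ..., v_w].
    Returns the center feature Fc = [v̄_1, ..., v̄_w].
    """
    return np.asarray(sample_features, dtype=float).mean(axis=0)

def all_cluster_centers(features, labels, cluster_order):
    """Return {Fc_1, ..., Fc_m} following the precedence order of the clusters."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return [cluster_center_feature(features[labels == c]) for c in cluster_order]
```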
Step 207, for each of the plurality of class clusters, selecting, from all sample features in the class cluster, the sample feature closest to the class cluster center feature (i.e., the class cluster center feature of that class cluster), and determining the sample sequence corresponding to the selected sample feature as the behavior class cluster sample of the class cluster.
For each class cluster, after the class cluster center feature is obtained, the distance between each sample feature in the class cluster and the class cluster center feature can be calculated (e.g., Euclidean distance or cosine distance, without limitation). The sample feature closest to the class cluster center feature is taken as the representative feature of the class cluster, and the sample sequence corresponding to that sample feature is taken as the behavior class cluster sample of the class cluster.
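A minimal sketch of selecting the behavior class cluster sample, assuming Euclidean distance (one of the allowed choices above) and that sample sequences are kept aligned with their features; names are illustrative:

```python
import numpy as np

def behavior_cluster_sample(sample_features, sample_sequences, center):
    """Pick the sample sequence whose feature is closest to the cluster center.

    sample_features: (n, w) array of features belonging to one class cluster.
    sample_sequences: list of n sample sequences (sub-image lists) aligned
    with the feature rows.
    """
    distances = np.linalg.norm(np.asarray(sample_features, dtype=float) - center, axis=1)
    return sample_sequences[int(np.argmin(distances))]
```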
In summary, during the training process, the plurality of class cluster center features corresponding to the specified type of behavior, the order of those class cluster center features, the number of class clusters, and the behavior class cluster samples can all be obtained.
The following describes the detection process. Referring to fig. 4, which is a schematic diagram of the detection process, this process detects whether a specified type of behavior exists in a video to be detected.
Step 401, obtaining a video to be detected, where the video to be detected includes a plurality of images to be detected.
For example, the video to be detected may include a plurality of consecutive images to be detected, e.g., the video to be detected may include consecutive image to be detected 1, image to be detected 2, …, image to be detected n.
Step 402, performing target detection on a specific target in the plurality of images to be detected to obtain the object positions of the specific target in a plurality of candidate images to be detected. The candidate images to be detected are the images, among the plurality of images to be detected, in which the specific target exists, and the specific target may include at least one target object.
For example, a target detection algorithm may be used to detect a specific target (including but not limited to a person, a vehicle, an animal, etc.) in the video to be detected (i.e., the plurality of images to be detected), obtaining the images to be detected in which the specific target exists. Each image to be detected in which the specific target exists is recorded as a candidate image to be detected, and the target detection algorithm is used to determine the object position of the specific target in each of the plurality of candidate images to be detected.
By way of example, the target detection algorithm may include, but is not limited to, HOG, DPM, Faster R-CNN, YOLOv3, SSD, and the like; the target detection algorithm is not limited herein. This embodiment also places no limitation on the process of performing target detection on the specific target in the video to be detected using the target detection algorithm.
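A minimal sketch of collecting the candidate images to be detected is given below. The `detect` callable stands in for any of the detectors above and is an assumption (it is taken to return per-image class names and bounding boxes); it is not an API of any specific library:

```python
def collect_candidates(images, detect, target_class="person"):
    """Run target detection frame by frame and keep frames containing the target.

    images: list of images to be detected, in video order.
    detect: callable(image) -> list of (class_name, (x1, y1, x2, y2)) boxes.
    Returns a list of (frame_index, boxes) pairs, one per candidate image.
    """
    candidates = []
    for idx, image in enumerate(images):
        boxes = [box for cls, box in detect(image) if cls == target_class]
        if boxes:  # the specific target exists in this image to be detected
            candidates.append((idx, boxes))
    return candidates
```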
Step 403, performing target tracking on the same target object in the plurality of candidate images to be detected to obtain the object positions of the target object in a plurality of target images to be detected. A target image to be detected is, for example, a candidate image to be detected in which the target object exists.
For example, when target detection is performed on the specific target in the plurality of images to be detected, candidate images to be detected containing at least one target object may be obtained. For each target object (the processing of one target object is taken as an example below), the target images to be detected in which that target object exists need to be selected from all candidate images to be detected, and the object position of the target object in each target image to be detected is determined.
For example, a tracking algorithm may be used to perform target tracking on the target object across the plurality of candidate images to be detected, obtaining the images to be detected in which the target object exists. Each such image is recorded as a target image to be detected, and the tracking algorithm is used to determine the object position of the target object in each of the plurality of target images to be detected, where the object position may include the coordinate information of a rectangular frame.
By way of example, the tracking algorithm may include, but is not limited to, MOT, DeepSORT, etc.; the tracking algorithm is not limited herein, and this embodiment places no limitation on the process of performing target tracking on the target object using the tracking algorithm. For example, based on the target detection result (i.e., the object positions of the specific target in the plurality of candidate images to be detected), detections of the same target object are associated to generate track information for that target object, and the track information may include the object positions of the target object in the plurality of target images to be detected.
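As one illustration of how per-frame detections can be associated into per-object tracks, the sketch below uses greedy IoU matching; this is an assumed simplification, not the MOT or DeepSORT algorithms named above:

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def build_tracks(candidates, iou_threshold=0.3):
    """Greedily associate per-frame detections into per-object tracks.

    candidates: list of (frame_index, boxes) pairs from the detection step.
    Returns a list of tracks, each a list of (frame_index, box) pairs, i.e.
    the object positions of one target object in its target images to be detected.
    """
    tracks = []
    for frame_index, boxes in candidates:
        for box in boxes:
            best, best_iou = None, iou_threshold
            for track in tracks:
                score = iou(track[-1][1], box)
                if score > best_iou:
                    best, best_iou = track, score
            if best is None:
                tracks.append([(frame_index, box)])
            else:
                best.append((frame_index, box))
    return tracks
```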
Step 404, acquiring a slice sequence of the target object based on the plurality of target to-be-detected images, wherein the slice sequence comprises sub-images intercepted from the plurality of target to-be-detected images based on the target frame position of the target object.
Illustratively, for step 404, the slice sequence may be obtained using the following steps:
Step 4041, determining the target frame position of the target object based on the object positions of the target object in the plurality of target images to be detected. The target frame position represents the spatial range of all object positions corresponding to the target object; this spatial range may include, but is not limited to, a circumscribed rectangle, a circumscribed circle, a circumscribed polygon, and the like. Taking the circumscribed rectangle as an example, the target frame position may be the maximum circumscribed rectangle of all object positions: the object positions of the target object in the plurality of target images to be detected are known, and the maximum circumscribed rectangle of all these object positions is taken as the target frame position of the target object.
The implementation process of step 4041 is similar to that of step 2031, and will not be repeated here.
Step 4042, intercepting a plurality of sub-images from the plurality of target images to be detected based on the target frame position.
The implementation process of step 4042 is similar to that of step 2032, and will not be repeated here.
Step 4043, slicing the plurality of sub-images according to the unit length to obtain at least one slice sequence, wherein in the at least one slice sequence, the interval between two adjacent slice sequences may be a fixed interval value.
The implementation process of step 4043 is similar to that of step 2033, except that the sample sequence is replaced by a slice sequence, and the processing of the sample sequence and the slice sequence is the same, so that the detailed description will not be repeated.
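A minimal sketch of steps 4041-4043 follows, assuming images are NumPy arrays indexed [row, column], boxes are (x1, y1, x2, y2), and taking unit_length = 16 and interval = 8 as example values (the actual unit length and fixed interval value are not specified here):

```python
import numpy as np

def target_frame_position(object_positions):
    """Maximum circumscribed rectangle of all object positions of one target object."""
    boxes = np.asarray(object_positions, dtype=float)
    return boxes[:, 0].min(), boxes[:, 1].min(), boxes[:, 2].max(), boxes[:, 3].max()

def slice_sequences(target_images, object_positions, unit_length=16, interval=8):
    """Crop sub-images at the target frame position and slice them by unit length.

    target_images: the target images to be detected for one target object.
    The fixed interval value between two adjacent slice sequences is the
    stride of the sliding window.
    """
    x1, y1, x2, y2 = (int(v) for v in target_frame_position(object_positions))
    sub_images = [img[y1:y2, x1:x2] for img in target_images]
    sequences = []
    for start in range(0, len(sub_images) - unit_length + 1, interval):
        sequences.append(sub_images[start:start + unit_length])
    return sequences
```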
Step 405, acquiring a plurality of behavior features of a target object according to a slice sequence.
For each slice sequence, feature extraction may be performed on the slice sequence (i.e., on its plurality of sub-images) to obtain vectorized feature data capable of expressing the slice sequence, and this vectorized feature data is taken as one behavior feature of the target object. For example, a general behavior recognition model and a classification neural network can be used to perform feature extraction on the slice sequence; one behavior feature is obtained per slice sequence. The behavior features corresponding to all slice sequences may then be combined to form a behavior feature set, which includes the behavior features corresponding to all slice sequences and in which the precedence relationship of the behavior features is preserved.
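A minimal sketch of assembling the behavior feature set is given below. The `extract_feature` callable stands in for the general behavior recognition model / classification network and is an assumption; it is taken to map one slice sequence to a fixed-length vector:

```python
import numpy as np

def behavior_feature_set(slice_sequences, extract_feature):
    """Extract one behavior feature per slice sequence, keeping temporal order.

    slice_sequences: list of slice sequences (each a list of sub-images), in
    the order they occur in the video to be detected.
    extract_feature: callable(slice_sequence) -> 1-D feature vector of length w.
    """
    return np.stack([extract_feature(seq) for seq in slice_sequences])
```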
Step 406, for each behavior feature, determining a similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; and selecting a target cluster center feature from a plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature.
Referring to step 206, the plurality of class cluster center features corresponding to the specified type of behavior have already been obtained, so the similarity (such as cosine similarity) between the behavior feature and each class cluster center feature can be determined, and the maximum similarity can be selected from these similarities. It is then determined whether the maximum similarity is greater than a similarity threshold: if so, the class cluster center feature corresponding to the maximum similarity is determined to be the target class cluster center feature corresponding to the behavior feature; if not, the behavior feature has no corresponding target class cluster center feature.
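A minimal sketch of this selection follows, using cosine similarity (one of the allowed choices) and an assumed example threshold of 0.8; names and values are illustrative:

```python
import numpy as np

def target_cluster_center(behavior_feature, cluster_centers, threshold=0.8):
    """Select the target class cluster center feature for one behavior feature.

    cluster_centers: list of class cluster center features {Fc_1, ..., Fc_m}.
    Returns the index of the matched cluster center, or None if the maximum
    similarity does not exceed the threshold.
    """
    f = np.asarray(behavior_feature, dtype=float)
    sims = [float(np.dot(f, c) / (np.linalg.norm(f) * np.linalg.norm(c) + 1e-9))
            for c in cluster_centers]
    best = int(np.argmax(sims))
    return best if sims[best] > threshold else None
```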
In summary, the target class cluster center feature corresponding to each behavior feature can be obtained. For example, if the behavior feature set sequentially includes behavior feature 1, behavior feature 2 and behavior feature 3 (these behavior features having a precedence relationship), then, based on the class cluster center features obtained in step 206, the target class cluster center feature corresponding to behavior feature 1 (e.g., class cluster center feature 1), the target class cluster center feature corresponding to behavior feature 2 (e.g., class cluster center feature 2), and the target class cluster center feature corresponding to behavior feature 3 (e.g., class cluster center feature 3) may be obtained.
For example, since the plurality of behavior features have a precedence relationship, the target class cluster center features corresponding to those behavior features also have a precedence relationship, and the order of the target class cluster center features matches the chronological order of the behavior features. For example, if the chronological order of the behavior features is behavior feature 1, behavior feature 2, behavior feature 3, then the order of the corresponding target class cluster center features is: the target class cluster center feature corresponding to behavior feature 1, the target class cluster center feature corresponding to behavior feature 2, and the target class cluster center feature corresponding to behavior feature 3.
Step 407, determining, based on the target class cluster center features corresponding to the plurality of behavior features, that the specified type of behavior exists in the video to be detected, or that the specified type of behavior does not exist in the video to be detected. For example, if the target class cluster center features corresponding to the plurality of behavior features are identical to all class cluster center features corresponding to the specified type of behavior, and the order of the target class cluster center features (which, per the above embodiment, has been determined as the chronological order of the plurality of behavior features) matches the order of the plurality of class cluster center features (which, per the above embodiment, has been determined and matches the order of occurrence of the plurality of sub-behaviors), it is determined that the specified type of behavior exists in the video to be detected; otherwise, it is determined that the specified type of behavior does not exist in the video to be detected.
By way of example, that the target class cluster center features corresponding to the plurality of behavior features are identical to all class cluster center features corresponding to the specified type of behavior means: assuming the specified type of behavior corresponds to class cluster center feature 1, class cluster center feature 2 and class cluster center feature 3, the target class cluster center features corresponding to the behavior features need to include class cluster center feature 1, class cluster center feature 2 and class cluster center feature 3. If the target class cluster center features include only part (but not all) of class cluster center features 1, 2 and 3, they are not identical to all class cluster center features corresponding to the specified type of behavior.
Illustratively, that the order of the target class cluster center features corresponding to the plurality of behavior features matches the order of the plurality of class cluster center features means: assuming the order of all class cluster center features corresponding to the specified type of behavior is class cluster center feature 1, class cluster center feature 2, class cluster center feature 3, the order of the target class cluster center features corresponding to the behavior features also needs to be class cluster center feature 1, class cluster center feature 2, class cluster center feature 3; otherwise, the order of the target class cluster center features does not match the order of the class cluster center features.
For example, if the order of the target class cluster center features corresponding to the behavior features is class cluster center feature 1, class cluster center feature 1, class cluster center feature 2, class cluster center feature 3, class cluster center feature 3, then the order matches the order of the class cluster center features: the two class cluster center features 1 are merged and the two class cluster center features 3 are merged, yielding the order class cluster center feature 1, class cluster center feature 2, class cluster center feature 3.
If the order of the target class cluster center features corresponding to the behavior features is class cluster center feature 1, class cluster center feature 1, class cluster center feature 3, class cluster center feature 2, class cluster center feature 2, then the order does not match the order of the class cluster center features: the two class cluster center features 1 are merged and the two class cluster center features 2 are merged, yielding the order class cluster center feature 1, class cluster center feature 3, class cluster center feature 2.
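A minimal sketch of this order check follows, under the assumption that class cluster centers are indexed 0..m-1 in their precedence order and that behavior features without a matching center are represented as None; consecutive duplicates are merged exactly as in the examples above:

```python
def sequence_matches(target_centers, m):
    """Check that the target class cluster center features cover and follow
    the order class cluster center feature 1, ..., class cluster center feature m.

    target_centers: cluster-center indices assigned to the behavior features
    in chronological order; None entries (no matching center) are ignored.
    """
    assigned = [c for c in target_centers if c is not None]
    merged = []
    for c in assigned:
        if not merged or merged[-1] != c:
            merged.append(c)
    return merged == list(range(m))
```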
In one possible implementation, referring to fig. 5A, a similarity matrix of the behavior features (i.e., each behavior feature in the behavior feature set) and the class cluster center features (i.e., each class cluster center feature corresponding to the specified type of behavior) may be obtained, where the similarity matrix has m rows and q columns. Illustratively, the m rows represent the m class cluster center features, in the order 1, 2, …, m, and the q columns represent the q behavior features, in the order 1, 2, …, q. In this similarity matrix, the values in the first row represent the similarity of each behavior feature to the first class cluster center feature, the values in the second row represent the similarity of each behavior feature to the second class cluster center feature, and so on.
As can be seen from fig. 5A, the behavior feature 1, the behavior feature 2, the behavior feature 3, and the behavior feature 13 (i.e., q) do not have corresponding target cluster center features, the target cluster center features corresponding to the behavior feature 4 and the behavior feature 5 are both the cluster center feature 1, the target cluster center feature corresponding to the behavior feature 6 is the cluster center feature 2, the target cluster center feature corresponding to the behavior feature 7 is the cluster center feature 3, the target cluster center feature corresponding to the behavior feature 8 is the cluster center feature 4, the target cluster center feature corresponding to the behavior feature 9 is the cluster center feature 5, the target cluster center feature corresponding to the behavior feature 10 is the cluster center feature 6, and the target cluster center features corresponding to the behavior feature 11 and the behavior feature 12 are both the cluster center feature 7 (i.e., m).
In summary, the target class cluster center features corresponding to all behavior features are identical to all class cluster center features corresponding to the specified type of behavior (i.e., class cluster center feature 1 through class cluster center feature 7), and the order of the target class cluster center features matches the order of all class cluster center features corresponding to the specified type of behavior, i.e., class cluster center feature 1, …, class cluster center feature 7.
Referring to fig. 5B, the target class cluster center features corresponding to all behavior features do not include class cluster center feature 4; therefore, they are not identical to all class cluster center features corresponding to the specified type of behavior.
Referring to fig. 5C, the target class cluster center features corresponding to all behavior features do not include class cluster center feature 1; therefore, they are not identical to all class cluster center features corresponding to the specified type of behavior.
Referring to fig. 5D, the target class cluster center features corresponding to all behavior features do not include class cluster center feature 7 (i.e., m); therefore, they are not identical to all class cluster center features corresponding to the specified type of behavior.
Referring to fig. 5D, if the order of the target class cluster center features corresponding to the plurality of behavior features is class cluster center feature 1, 2, 3, 4, 5, 6, 2, this order does not match the order of the class cluster center features.
In one possible implementation, if the specified type of behavior exists in the video to be detected, the slice sequences in which the specified type of behavior exists can be displayed, together with the behavior class cluster sample of each class cluster. If the specified type of behavior does not exist in the video to be detected, the slice sequences in which the specified type of behavior exists and the slice sequences in which it does not exist can be displayed, together with the behavior class cluster sample of each class cluster.
For example, the compliance detection result of the target behavior sequence may be displayed in a personalized manner: if the target behavior sequence is detected to be compliant, the slice sequences of the compliant behavior are displayed in series in chronological order, and the behavior class cluster samples are displayed in a superimposed manner; if the target behavior sequence is detected to be non-compliant, the compliant slice sequences and the non-compliant slice sequences are displayed in series in chronological order, and the behavior class cluster samples are displayed in a superimposed manner.
By displaying the compliant and non-compliant slice sequences and superimposing the behavior class cluster samples, users are prompted to self-check non-compliance and improve quality, and managers are assisted in summary analysis and quality improvement. Referring to fig. 5E, a schematic diagram of a compliance visualization interface is shown.
In this embodiment, the above-mentioned flow may be implemented by a target detection module, a target tracking module, a time sequence slicing module, a feature extraction module, a feature clustering module, a similarity measurement module, and a visualization module. For example, step 202 is implemented based on the object tracking module, step 203 is implemented based on the time series slicing module, step 204 is implemented based on the feature extraction module, and steps 205-207 are implemented based on the feature clustering module. For another example, the step 402 is implemented based on the target detection module, the step 403 is implemented based on the target tracking module, the step 404 is implemented based on the time sequence slicing module, the step 405 is implemented based on the feature extraction module, the steps 406-407 are implemented based on the similarity measurement module, and the display of the slice sequence and the behavior cluster sample is implemented based on the visualization module.
According to the technical solution above, in the embodiment of the present application, video behavior detection accuracy is high and the detection manner is simple. It is an automatic, general-purpose method for detecting the compliance of video behavior sequences, which improves the universality of video behavior sequence compliance detection, lowers the threshold for using the technique, and allows it to be popularized quickly in various fields. The method avoids detecting each sub-behavior in the behavior sequence separately; it uses a general behavior recognition model for feature extraction, does not need to address behavior recognition problems of a specific scene, and is convenient to apply to many tasks in many scenes. Unsupervised clustering automatically completes the modeling of the key information (class cluster center features, their order, etc.) of any compliant behavior sequence; the clustering algorithm has strong resistance to noise interference, reduces the difficulty of collecting sample data, and makes it convenient for relevant personnel to quickly locate non-compliant segments.
Based on the same application concept as the above method, an embodiment of the present application provides a behavior detection device, as shown in fig. 6, which is a schematic structural diagram of the behavior detection device, where the device may include:
a first obtaining module 61, configured to obtain a video to be detected, where the video to be detected includes a plurality of images to be detected; selecting a plurality of target to-be-detected images of the same target object from the plurality of to-be-detected images, and acquiring a slice sequence of the target object based on the plurality of target to-be-detected images, wherein the slice sequence comprises sub-images intercepted from the plurality of target to-be-detected images based on the target frame position of the target object; a second obtaining module 62, configured to obtain a plurality of behavioral characteristics of the target object according to the slice sequence; a determining module 63, configured to determine, for each behavior feature, a similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to a specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature; and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
The determining module 63 is configured to determine, based on the target cluster center features corresponding to the behavior features, that the specified type of behavior exists in the video to be detected, or that the specified type of behavior does not exist in the video to be detected, specifically: if the target cluster center features corresponding to the behavior features are identical to the cluster center features, and the sequence of the target cluster center features corresponding to the behavior features is matched with the sequence of the cluster center features, determining that the video to be detected has the specified type of behavior; otherwise, determining that the specified type of behavior does not exist in the video to be detected; the sequence of the central characteristics of the target class cluster corresponding to the behavior characteristics is matched with the time sequence of the behavior characteristics; the specified type behavior comprises a plurality of sub-behaviors, the number of the sub-behaviors is the same as that of the cluster center features, the plurality of sub-behaviors are in one-to-one correspondence with the cluster center features, and the sequence of the cluster center features is matched with the occurrence sequence of the plurality of sub-behaviors.
For example, the first obtaining module 61 is specifically configured to: performing target detection on specific targets in the plurality of images to be detected to obtain object positions of the specific targets in the plurality of candidate images to be detected; the candidate to-be-detected image is to-be-detected images with specific targets in the plurality of to-be-detected images, wherein the specific targets comprise at least one target object; performing target tracking on the same target object in the plurality of candidate images to be detected to obtain object positions of the target object in the plurality of target images to be detected; the target to-be-detected image is an image to be detected in which the target object exists in the plurality of candidate to-be-detected images.
Illustratively, the first acquiring module 61 is specifically configured to, when acquiring a slice sequence of the target object based on the plurality of target to-be-detected images: determining a target frame position of the target object based on the object positions of the target object in a plurality of target images to be detected, wherein the target frame position is a maximum circumscribed rectangle of all object positions; intercepting a plurality of sub-images from the plurality of target images to be detected based on the target frame positions; slicing the plurality of sub-images according to unit length to obtain at least one slice sequence; wherein the interval between two adjacent slice sequences is a fixed interval value.
Illustratively, the first obtaining module 61 is further configured to: the method comprises the steps of obtaining a plurality of cluster center features corresponding to the specified type behaviors, wherein the cluster center features are specifically used for: acquiring a plurality of calibration sample images of the specified type of behavior;
selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence comprises sub-images which are intercepted from the plurality of target sample images based on the sample frame positions of the sample object;
Acquiring a plurality of sample features of the sample object according to the sample sequence;
clustering the plurality of sample features to obtain a plurality of class clusters, wherein each class cluster comprises at least one sample feature; and determining the center characteristics of the class clusters based on the sample characteristics in the class clusters aiming at each class cluster so as to obtain the center characteristics of the class clusters corresponding to the plurality of class clusters respectively.
Based on the same application concept as the above method, an embodiment of the present application provides a behavior detection apparatus, as shown in fig. 7, where the behavior detection apparatus may include: a processor 71 and a machine-readable storage medium 72, the machine-readable storage medium 72 storing machine-executable instructions executable by the processor 71; the processor 71 is configured to execute machine executable instructions to implement the steps of:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of target to-be-detected images of the same target object from the plurality of to-be-detected images, and acquiring a slice sequence of the target object based on the plurality of target to-be-detected images, wherein the slice sequence comprises sub-images intercepted from the plurality of target to-be-detected images based on the target frame position of the target object;
Acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
and determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center characteristics corresponding to the behavior characteristics.
Based on the same application concept as the above method, the embodiment of the present application further provides a machine-readable storage medium, where a number of computer instructions are stored, where the computer instructions can implement the behavior detection method disclosed in the above example of the present application when the computer instructions are executed by a processor.
Wherein the machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information, such as executable instructions, data, or the like. For example, a machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., hard drive), a solid state drive, any type of storage disk (e.g., optical disc, DVD, etc.), or a similar storage medium, or a combination thereof.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Moreover, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (9)

1. A method of behavior detection, the method comprising:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of target to-be-detected images of the same target object from the plurality of to-be-detected images, and acquiring a slice sequence of the target object based on the plurality of target to-be-detected images, wherein the slice sequence comprises sub-images intercepted from the plurality of target to-be-detected images based on the target frame position of the target object;
Acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center features corresponding to the behavior features;
the determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center features corresponding to the behavior features includes:
if the target cluster center features corresponding to the behavior features are identical to the cluster center features, and the sequence of the target cluster center features corresponding to the behavior features is matched with the sequence of the cluster center features, determining that the video to be detected has the specified type of behavior;
Otherwise, determining that the specified type of behavior does not exist in the video to be detected;
the sequence of the central characteristics of the target class cluster corresponding to the behavior characteristics is matched with the time sequence of the behavior characteristics; the specified type behavior comprises a plurality of sub-behaviors, the number of the sub-behaviors is the same as that of the cluster center features, the plurality of sub-behaviors are in one-to-one correspondence with the cluster center features, and the sequence of the cluster center features is matched with the occurrence sequence of the plurality of sub-behaviors.
2. The method according to claim 1, wherein the selecting a plurality of target to-be-detected images of the same target object from the plurality of to-be-detected images includes:
performing target detection on specific targets in the plurality of images to be detected to obtain object positions of the specific targets in the plurality of candidate images to be detected; the candidate to-be-detected image is to-be-detected images with specific targets in the plurality of to-be-detected images, wherein the specific targets comprise at least one target object;
performing target tracking on the same target object in the plurality of candidate images to be detected to obtain object positions of the target object in the plurality of target images to be detected; the target to-be-detected image is an image to be detected in which the target object exists in the plurality of candidate to-be-detected images.
3. The method according to claim 2, wherein
the acquiring a slice sequence of the target object based on the plurality of target to-be-detected images includes:
determining a target frame position of the target object based on the object positions of the target object in a plurality of target images to be detected, wherein the target frame position is a maximum circumscribed rectangle of all object positions;
intercepting a plurality of sub-images from the plurality of target images to be detected based on the target frame positions;
slicing the plurality of sub-images according to unit length to obtain at least one slice sequence;
wherein the interval between two adjacent slice sequences is a fixed interval value.
4. The method of claim 1, wherein selecting a target cluster center feature from the plurality of cluster center features based on a similarity of the behavioral feature to each cluster center feature comprises:
determining the maximum similarity based on the similarity between the behavior feature and the central feature of each class cluster;
determining the central characteristic of the class cluster corresponding to the maximum similarity as the central characteristic of the target class cluster;
or determining whether the maximum similarity is greater than a similarity threshold, if so, determining the cluster center feature corresponding to the maximum similarity as the target cluster center feature.
5. The method according to any one of claims 1 to 4, wherein
the method for acquiring the central characteristics of the plurality of class clusters corresponding to the specified type of behavior comprises the following steps:
acquiring a plurality of calibration sample images of the specified type of behavior;
selecting a plurality of target sample images of the same sample object from the plurality of calibration sample images, and acquiring a sample sequence of the sample object based on the plurality of target sample images, wherein the sample sequence comprises sub-images which are intercepted from the plurality of target sample images based on the sample frame positions of the sample object;
acquiring a plurality of sample features of the sample object according to the sample sequence;
clustering the plurality of sample features to obtain a plurality of class clusters, wherein each class cluster comprises at least one sample feature; and determining the center characteristics of the class clusters based on the sample characteristics in the class clusters aiming at each class cluster so as to obtain the center characteristics of the class clusters corresponding to the plurality of class clusters respectively.
6. The method according to claim 5, wherein
each sample feature within a class cluster includes feature values of a plurality of feature dimensions;
the determining the cluster center feature of the cluster based on the sample feature in the cluster includes:
For each feature dimension, determining a target feature value for the feature dimension based on feature values for the feature dimension in all sample features within the class cluster;
and determining a cluster center feature of the cluster based on the target feature values of the feature dimensions.
7. The method according to claim 5, wherein
after the cluster center feature of the cluster is determined based on the sample feature in the cluster, the method further comprises: selecting a sample feature closest to the central feature of the class cluster from all sample features in the class cluster, and determining a sample sequence corresponding to the selected sample feature as a behavior class cluster sample of the class cluster;
the determining that the video to be detected has the specified type of behavior based on the target cluster center features corresponding to the behavior features, or after the video to be detected does not have the specified type of behavior, further includes: and if the video to be detected has the specified type of behavior, displaying a slice sequence with the specified type of behavior, and displaying a behavior class cluster sample of each class cluster.
8. A behavior detection apparatus, the apparatus comprising:
The first acquisition module is used for acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected; selecting a plurality of target to-be-detected images of the same target object from the plurality of to-be-detected images, and acquiring a slice sequence of the target object based on the plurality of target to-be-detected images, wherein the slice sequence comprises sub-images intercepted from the plurality of target to-be-detected images based on the target frame position of the target object;
the second acquisition module is used for acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
the determining module is used for determining the similarity between the behavior characteristics and the central characteristics of each class cluster based on the central characteristics of the class clusters corresponding to the specified type of behavior aiming at each behavior characteristic; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature; determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center features corresponding to the behavior features;
the determining module determines that the specified type of behavior exists in the video to be detected or is specifically used when the specified type of behavior does not exist in the video to be detected based on the target cluster center features corresponding to the behavior features: if the target cluster center features corresponding to the behavior features are identical to the cluster center features, and the sequence of the target cluster center features corresponding to the behavior features is matched with the sequence of the cluster center features, determining that the video to be detected has the specified type of behavior; otherwise, determining that the specified type of behavior does not exist in the video to be detected;
The sequence of the central characteristics of the target class cluster corresponding to the behavior characteristics is matched with the time sequence of the behavior characteristics; the specified type behavior comprises a plurality of sub-behaviors, the number of the sub-behaviors is the same as that of the cluster center features, the plurality of sub-behaviors are in one-to-one correspondence with the cluster center features, and the sequence of the cluster center features is matched with the occurrence sequence of the plurality of sub-behaviors.
9. A behavior detection apparatus, characterized by comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;
the processor is configured to execute machine-executable instructions to perform the steps of:
acquiring a video to be detected, wherein the video to be detected comprises a plurality of images to be detected;
selecting a plurality of target to-be-detected images of the same target object from the plurality of to-be-detected images, and acquiring a slice sequence of the target object based on the plurality of target to-be-detected images, wherein the slice sequence comprises sub-images intercepted from the plurality of target to-be-detected images based on the target frame position of the target object;
Acquiring a plurality of behavior characteristics of the target object according to the slice sequence;
for each behavior feature, determining the similarity between the behavior feature and each cluster center feature based on a plurality of cluster center features corresponding to the specified type of behavior; selecting a target cluster center feature from the plurality of cluster center features based on the similarity of the behavior feature and each cluster center feature;
determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center features corresponding to the behavior features;
the determining that the video to be detected has the specified type of behavior or the video to be detected does not have the specified type of behavior based on the target cluster center features corresponding to the behavior features includes:
if the target cluster center features corresponding to the behavior features are identical to the cluster center features, and the sequence of the target cluster center features corresponding to the behavior features is matched with the sequence of the cluster center features, determining that the video to be detected has the specified type of behavior;
Otherwise, determining that the specified type of behavior does not exist in the video to be detected;
the sequence of the central characteristics of the target class cluster corresponding to the behavior characteristics is matched with the time sequence of the behavior characteristics; the specified type behavior comprises a plurality of sub-behaviors, the number of the sub-behaviors is the same as that of the cluster center features, the plurality of sub-behaviors are in one-to-one correspondence with the cluster center features, and the sequence of the cluster center features is matched with the occurrence sequence of the plurality of sub-behaviors.
CN202011260947.8A 2020-11-12 2020-11-12 Behavior detection method, device and equipment Active CN112380971B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011260947.8A CN112380971B (en) 2020-11-12 2020-11-12 Behavior detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011260947.8A CN112380971B (en) 2020-11-12 2020-11-12 Behavior detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN112380971A CN112380971A (en) 2021-02-19
CN112380971B true CN112380971B (en) 2023-08-25

Family

ID=74583226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011260947.8A Active CN112380971B (en) 2020-11-12 2020-11-12 Behavior detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN112380971B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114724230A (en) * 2022-06-10 2022-07-08 湖北微模式科技发展有限公司 Method and system for identifying identity of signatory


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170000081A1 (en) * 2015-06-30 2017-01-05 Mousera, Inc System and method of automatic classification of animal behaviors
US10831486B2 (en) * 2017-12-14 2020-11-10 Adobe Inc. Automation of sequences of actions

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003067884A1 (en) * 2002-02-06 2003-08-14 Nice Systems Ltd. Method and apparatus for video frame sequence-based object tracking
KR101575857B1 (en) * 2014-12-23 2015-12-08 영남대학교 산학협력단 Method of human action pattern recognition using histogram of motion intensity and direction from multi view
CN108509979A (en) * 2018-02-28 2018-09-07 努比亚技术有限公司 A kind of method for detecting abnormality, server and computer readable storage medium
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
CN110838353A (en) * 2019-10-11 2020-02-25 科大讯飞(苏州)科技有限公司 Action matching method and related product
CN111191498A (en) * 2019-11-07 2020-05-22 腾讯科技(深圳)有限公司 Behavior recognition method and related product
CN111339898A (en) * 2020-02-21 2020-06-26 上海商汤智能科技有限公司 Behavior detection method and apparatus, computer readable storage medium, computer device
CN111767783A (en) * 2020-04-22 2020-10-13 杭州海康威视数字技术股份有限公司 Behavior detection method, behavior detection device, model training method, model training device, electronic equipment and storage medium
CN111860140A (en) * 2020-06-10 2020-10-30 北京迈格威科技有限公司 Target event detection method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Human suspicious behavior recognition based on case-based reasoning in surveillance video; Xia Limin et al.; Journal of Chinese Computer Systems (小型微型计算机系统); full text *

Also Published As

Publication number Publication date
CN112380971A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
Bau et al. Gan dissection: Visualizing and understanding generative adversarial networks
US7307652B2 (en) Method and apparatus for object tracking and detection
CN110705424B (en) Method and device for positioning commodity display position and storage medium
CN111985385B (en) Behavior detection method, device and equipment
US10152123B2 (en) Method and system for detecting objects of interest
CN111209774B (en) Target behavior recognition and display method, device, equipment and readable medium
CN109977824B (en) Article taking and placing identification method, device and equipment
JP6116044B2 (en) Cell behavior analysis apparatus, cell behavior analysis method, and program
Buayai et al. End-to-end automatic berry counting for table grape thinning
CN112380971B (en) Behavior detection method, device and equipment
CN112784672A (en) Computer vision based surgical scene assessment
CN103177266A (en) Intelligent stock pest identification system
CN113468914B (en) Method, device and equipment for determining purity of commodity
CN108460370A (en) A kind of fixed poultry life-information warning device
KR102084683B1 (en) Analysing method for cell image using artificial neural network and image processing apparatus for cell image
CN111402235B (en) Growth recording method and device for colony image, electronic equipment and storage medium
Zhu et al. Egoobjects: A large-scale egocentric dataset for fine-grained object understanding
CN111414995B (en) Detection processing method and device for micro-target colony, electronic equipment and medium
CN112686114A (en) Behavior detection method, device and equipment
CN107844734A (en) Monitoring objective determines method and device, video frequency monitoring method and device
CN109636268A (en) A kind of kinds of goods monitoring system
Collazos et al. Abandoned object detection on controlled scenes using kinect
CN114299388A (en) Article information specifying method, showcase, and storage medium
CN112183333B (en) Human screen interaction method, system and device based on micro-expressions
CN111783627B (en) Commodity stock determining method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant