CN114399711A - Logistics sorting form identification method and device and storage medium - Google Patents

Logistics sorting form identification method and device and storage medium

Info

Publication number
CN114399711A
Authority
CN
China
Prior art keywords
motion
track
acquiring
determining
tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210015524.2A
Other languages
Chinese (zh)
Inventor
陈智勇
郭聪
于伟
王林芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202210015524.2A priority Critical patent/CN114399711A/en
Publication of CN114399711A publication Critical patent/CN114399711A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a logistics sorting form identification method and device and a storage medium. The method comprises the following steps: determining a plurality of tracking key points in a video clip start time image of a video clip; acquiring a plurality of motion tracks of the plurality of tracking key points in the video clip; clustering the plurality of motion tracks to obtain a plurality of track clusters; determining at least one first track cluster, among the plurality of track clusters, that meets monitoring requirements; and acquiring a first characteristic of each first track cluster and judging, according to the first characteristic, whether violent sorting behavior occurs. The method processes the track data of all moving objects in the video picture and is therefore not limited by the number or categories of targets. The final output is a physical quantity with practical significance, which avoids treating the model as a complete black box, improves the interpretability of the model output, and allows personalized configuration for different business parties and different scenes.

Description

Logistics sorting form identification method and device and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for identifying a logistics sorting form and a storage medium.
Background
At present, express parcels are usually sorted manually, and a sorter may handle objects violently, for example by throwing, stepping on or kicking them. To avoid violent sorting, cameras are usually installed at the express sorting site, video data of the sorting process is collected by the cameras, and the video data is processed to identify whether violent sorting behavior exists.
The express sorting scene is complex: it contains various targets such as people, parcels and other moving objects, and their number is indefinite and their categories are diverse. In the related art, identification methods based on target detection and target tracking have difficulty judging all targets of all categories and require a large amount of computation. End-to-end identification methods based on neural networks have low model universality, cannot meet the different judgment standards of different business parties, must be retrained for each business party, and require labeled data that is difficult and costly to obtain.
Disclosure of Invention
The application provides a method and a device for identifying a logistics sorting form and a storage medium, which are used for solving at least one of the above technical problems. The technical scheme of the application is as follows:
according to a first aspect of embodiments of the present application, there is provided a logistics sorting form identification method, including:
determining a plurality of tracking key points in a video clip start time image of a video clip; the tracking key point is an initial tracking point of a pixel in the image at the starting moment of the video clip;
acquiring a plurality of motion tracks of the plurality of tracking key points in the video clip;
clustering the plurality of motion tracks to obtain a plurality of track clusters;
determining at least one first track cluster of the plurality of track clusters that meets monitoring requirements;
and acquiring a first characteristic of each first track cluster, and judging whether violent sorting behavior occurs or not according to the first characteristic.
According to a second aspect of the embodiments of the present application, there is provided a logistics sorting form recognition apparatus, including:
the device comprises a determining module, an acquisition module, a clustering module, a classification module and a judging module, wherein the determining module is used for determining a plurality of tracking key points in a video clip starting moment image of a video clip; the tracking key point is an initial tracking point of a pixel in the image at the starting moment of the video clip;
the acquisition module is used for acquiring a plurality of motion tracks of the plurality of tracking key points in the video clip;
the clustering module is used for clustering the plurality of motion tracks to obtain a plurality of track clusters;
the classification module is used for determining at least one first track cluster which meets the monitoring requirement in the plurality of track clusters;
and the judging module is used for acquiring the first characteristic of each first track cluster and judging whether violent sorting behavior occurs or not according to the first characteristic.
According to a third aspect of embodiments of the present application, there is provided a computer apparatus comprising:
a processor; and
a memory communicatively coupled to the processor; wherein,
the memory stores instructions executable by the processor to enable the processor to perform the method of the embodiment of the first aspect of the present application.
According to a fourth aspect of the embodiments of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for identifying a logistics sorting form according to the embodiments of the first aspect of the present application.
According to a fifth aspect of the embodiments of the present application, there is provided a computer program product, which includes computer instructions, when executed by a processor, for implementing the steps of the method for identifying logistics sorting form according to the embodiments of the first aspect of the present application.
According to the logistics sorting form identification method and device and the storage medium, the video segment start time image of a video segment related to logistics sorting behavior is obtained, tracking key points are extracted from the start time image by dense sampling or another method, and the motion tracks of all tracking key points are then obtained, so that the motion tracks of all moving objects are captured; the motion tracks are then clustered automatically, grouping tracks with similar motion trends and nearby motion positions; finally, the target categories of interest are identified, characteristic data is extracted, and whether violent sorting behavior occurs is judged according to the characteristic data. Because the method tracks the motion tracks of key points, it processes the track data of all moving objects in the video picture and is not limited by the number or categories of targets. Physical characteristics of the clustered target categories are extracted, so the final output is a physical quantity with practical significance, which avoids treating the model as a complete black box, improves the interpretability of the model output, and allows personalized configuration for different business parties and different scenes.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application and are not to be construed as limiting the application.
Fig. 1 is a flowchart illustrating a logistics sort form recognition method according to an exemplary embodiment of the present application.
Fig. 2 is a diagram illustrating tracking keypoints divided in an image at the start time of a video clip according to an exemplary embodiment of the present application.
Fig. 3 is a schematic diagram illustrating tracking of a motion trajectory of a keypoint in the video segment according to an exemplary embodiment of the present application.
FIG. 4 is a schematic diagram illustrating trajectory clustering according to an exemplary embodiment of the present application.
FIG. 5 is a schematic diagram illustrating circumscribing points within a cluster, according to an exemplary embodiment of the present application.
Fig. 6 is a flowchart illustrating a logistics sort format identification method according to another exemplary embodiment of the present application.
Fig. 7 is a flowchart illustrating a logistics sort format identification method according to yet another exemplary embodiment of the present application.
Fig. 8 is a flowchart illustrating a logistics sort format identification method according to yet another exemplary embodiment of the present application.
Fig. 9 is a flowchart illustrating a logistics sort format identification method according to yet another exemplary embodiment of the present application.
Fig. 10 is a flowchart illustrating a method for identifying a logistics sorting pattern according to an exemplary embodiment of the present application.
Fig. 11 is a block diagram illustrating a logistics sorting form recognition apparatus according to an embodiment of the present application.
FIG. 12 is a block diagram illustrating a computer device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
In the description of the present application, "/" indicates an OR relation; for example, A/B may indicate A or B. "And/or" herein merely describes an association between related objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In the related art, the first method determines violent sorting based on target detection and target tracking: a target of interest (such as a person or goods) is first detected in the video image using target detection technology, its position in the subsequent video sequence is then followed using target tracking, and whether the behavior belongs to violent sorting is judged according to the tracking result. The second method predicts violent behavior end to end based on models such as a convolutional neural network (CNN, 2D or 3D convolution) or a recurrent neural network (RNN): an advanced convolutional or recurrent neural network from the field of computer vision is used as the base model to extract the spatio-temporal features of video segments or consecutive video frames, model training is carried out on manually labeled action position data, and whether violent sorting behavior occurs in the time window is then predicted directly.
The first method, which determines violent sorting based on target detection and target tracking, needs to specify the position of the object to be tracked before tracking starts. It can locate where an action occurs in a simple scene (for example, when only one or a few targets of known type appear in the picture); however, in a complex real-world scene the number and categories of targets are uncertain, and this scheme can hardly judge all targets of all categories. In addition, to guarantee detection and tracking quality, the detection model and the tracking model must process every frame in the observation window, which requires a large amount of computation.
The second method, which judges violent sorting based on a convolutional or recurrent neural network model, has the following problems: first, a large amount of manual data labeling is needed, so the labeling cost is high; second, models such as convolutional or recurrent neural networks are trained as a "black box", the training process is complex and long, the training difficulty is high, and the model may fail to converge; third, because the model operates as a black box, once the definition of the violent sorting standard changes (which is common among different business parties), the model must be retrained, so the universality of the model is low, different judgment standards of different business parties cannot be met, and retraining for each business party requires labeled data that is difficult and costly to obtain; fourth, if the position of the violent sorting behavior in the picture also needs to be determined, further position information must be provided and the model trained again.
To solve the above problems, the embodiments of the application provide a method and a device for identifying a logistics sorting form and a storage medium, which improve the universality of the violent sorting determination model for different types of target objects, offer good interpretability, meet the different determination rules of different business parties, and make it convenient to extend the existing capability.
To this end, in the embodiments of the application, the video segment start time image of a video segment related to logistics sorting behavior is first obtained, tracking key points are extracted from the start time image by dense sampling or another method, and the motion tracks of all tracking key points are then obtained, so that the motion tracks of all moving objects are captured; the motion tracks are then clustered automatically, grouping tracks with similar motion trends and nearby motion positions; finally, the target categories of interest are identified, characteristic data is extracted, and whether violent sorting behavior occurs is judged according to the characteristic data.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a logistics sorting form identification method according to an embodiment of the application. This embodiment provides a logistics sorting form identification method whose execution subject may be a computer device such as a server. As shown in fig. 1, the logistics sorting form recognition method may include the following steps.
Step S101, determining a plurality of tracking key points in a video clip start time image of a video clip, wherein the tracking key points are initial tracking points of pixels in the video clip start time image.
In one implementation, after a video segment related to the logistics sorting behavior is acquired, the image at the starting moment of the video segment may be extracted and used as the video clip start time image.
It should be noted that there are many ways to determine the tracking key point in the image at the beginning of the video segment, for example, the image may be divided into a plurality of squares, and the center of each square is used as the tracking key point, or other ways to determine the tracking key point in the image, such as super-pixel (super-pixel) segmentation. Two example implementations of these will be given below:
as an example of one possible implementation, the image at the start time of the video segment is divided into several squares at fixed intervals, and the center of each square is determined as the tracking key point.
For example, the image at the beginning of the video segment is divided into squares at a fixed interval for dense sampling, and the center of each square is defined as an initial tracking key point. The grid division interval may be determined according to specific use conditions such as computing resources; for example, the length and width of each grid may be 20 pixels, as shown in fig. 2, where the dots can be regarded as the tracking key points. To improve the final positioning effect, the operation of extracting tracking key points may be performed on multiple scales at the same time, and the number of scales may be selected according to specific use conditions such as computing resources, for example 8 scales. A minimal sketch of this sampling step is given below.
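A minimal sketch (not taken from the patent text itself) of the dense-sampling step: the start-time image is divided into fixed-interval squares and the square centers become the initial tracking key points; the function name, the 20-pixel default and the scale list are illustrative assumptions.

```python
import numpy as np

def dense_grid_keypoints(image_shape, step=20, scales=(1.0, 0.5)):
    """image_shape: (height, width); returns an (N, 2) float32 array of (x, y) centers."""
    points = []
    for s in scales:
        h, w = int(image_shape[0] * s), int(image_shape[1] * s)
        # centers of step x step squares at this scale
        ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
        # map the centers back to the original resolution
        pts = np.stack([xs.ravel() / s, ys.ravel() / s], axis=1)
        points.append(pts)
    return np.concatenate(points, axis=0).astype(np.float32)
```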
As another example of possible implementation, the video segment start time image is segmented by super-pixels (super-pixel), and a circumscribed center of each super-pixel is determined as the tracking keypoint.
It should be noted that the advantage of super-pixel segmentation is that it separates out the background data so that only the meaningful foreground data needs to be considered; however, the corresponding computational cost is higher, and whether to select this method may depend on specific conditions such as available computing resources.
It should be further noted that the above two examples are provided only to help those skilled in the art understand how the tracking key points may be determined in the video segment start time image and are not specific limitations on how this is done; that is, the application may also determine the tracking key points in other ways, for example by extracting feature points instead of the dense sampling proposed in the above embodiment.
Step S102, obtaining a plurality of motion tracks of the plurality of tracking key points in the video clip.
According to the method and the device, the moving object in the video clip is captured by acquiring the motion track of the tracking key point.
In this embodiment, the obtaining of the plurality of motion trajectories of the plurality of tracking key points in the video segment includes:
and tracking the tracking key points in the video clip by an optical flow method aiming at each tracking key point to acquire the motion trail.
For example, the determined tracking key points may be tracked sequentially through the subsequent frames of the video segment by an optical flow method, as shown in fig. 3, and the motion trajectory is calculated from the optical flow. There are many methods for calculating optical flow, and the present application does not limit which one is used; optionally, the open source computer vision library OpenCV may be employed.
In order to improve the smoothness of the motion trajectory, in a possible implementation manner, a filter such as binary filtering, median filtering, or the like may be further added to the optical flow to perform smoothing processing.
In this way, the position of the tracking key point in each frame of the whole video clip is obtained, forming a motion track; the same processing is performed on all tracking key points to obtain all of their motion tracks.
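A minimal sketch of the tracking step, assuming OpenCV's pyramidal Lucas-Kanade optical flow; the text above only requires "an optical flow method", so this concrete choice and the helper name are assumptions.

```python
import cv2
import numpy as np

def track_keypoints(frames, keypoints):
    """frames: list of BGR frames of the clip; keypoints: (N, 2) float32 start positions."""
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    pts = keypoints.reshape(-1, 1, 2).astype(np.float32)
    alive = np.ones(len(keypoints), dtype=bool)
    tracks = [[tuple(p)] for p in keypoints]              # one motion track per key point
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(
            prev_gray, gray, pts, None, winSize=(21, 21), maxLevel=3)
        alive &= status.ravel().astype(bool)               # stop extending lost points
        for i in np.flatnonzero(alive):
            tracks[i].append(tuple(nxt[i, 0]))             # append the new (x, y) position
        pts, prev_gray = nxt, gray
    return tracks
```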
And step S103, clustering the plurality of motion tracks to obtain a plurality of track clusters.
It can be understood that, in a complex real-world scene, there are many objects in motion, such as moving birds, cats, moving cars, and the like, in addition to express packages. The method and the device can automatically cluster the motion tracks in the video segments through a clustering algorithm. And classifying the motion tracks with close motion trends and close motion positions.
In this embodiment, a trajectory clustering method is used to cluster the motion trajectories to obtain a clustering result.
There are many track clustering methods, and the present application does not limit which one is adopted. Optionally, the track clustering algorithm may be an unsupervised method such as DBSCAN, and the specific clustering algorithm may be chosen according to the use situation.
Here, besides the clustering operation itself, additional grouping operations may also be performed so that points whose movement trends are similar and whose movement positions are close end up in the same group.
For example, as shown in fig. 4, points A, B and C (tracking key points) in the figure have the same movement trend and close movement positions over their whole life cycle, while point D differs from points A, B and C in both movement trend and movement position, so they belong to two different clusters. Each cluster center represents a moving object.
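A minimal sketch of the clustering step, assuming each motion track is summarized by a fixed-length feature (mean position plus mean per-frame displacement) before unsupervised DBSCAN; the text names DBSCAN but leaves the exact track representation and parameters open, so those are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_tracks(tracks, eps=30.0, min_samples=3):
    """tracks: list of (T, 2) point sequences with T >= 2; returns a cluster label per track."""
    feats = []
    for tr in tracks:
        tr = np.asarray(tr, dtype=np.float32)              # (T, 2) positions
        mean_pos = tr.mean(axis=0)                          # where the track lives
        mean_disp = np.diff(tr, axis=0).mean(axis=0)        # overall motion trend
        feats.append(np.concatenate([mean_pos, mean_disp]))
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(np.stack(feats))
    return labels                                           # label -1 marks noise tracks
```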
Step S104, determining at least one first track cluster which meets the monitoring requirement in the plurality of track clusters.
Optionally, after the motion tracks are clustered, moving-object categories that obviously do not belong to the targets of interest are excluded. The target categories that meet the monitoring requirement are thereby identified, and the resulting first track clusters are the monitored categories of interest. In one implementation, the target categories meeting the monitoring requirement can be identified by a classifier; the task type is simple, the required labels are simpler, and the classifier model is easier to train.
The classifier is able to identify whether each track cluster meets the monitoring requirements, which depend on the specific scenario. The classifier that determines whether a specific target appears in the sampled picture can be selected according to practical use, including but not limited to SVM, GBDT, CNN, etc.
Step S105, acquiring a first characteristic of each first track cluster, and judging whether violent sorting behavior occurs or not according to the first characteristic.
In one implementation, the first feature includes an object category, a physical quantity of motion of the object, and location information of occurrence of violent sorting behavior, and the obtaining the first feature of each first trajectory cluster includes:
acquiring a clustering center of each first track cluster, and representing the object type through the clustering center;
acquiring a physical quantity of each first track cluster as a motion physical quantity of the object;
and positioning the occurrence area of each first track cluster to acquire the position information of the occurrence violent sorting behavior.
It is to be understood that since each cluster center represents a moving object, the object class can be represented by the cluster center.
In a possible implementation manner, the physical quantities of each first track cluster are obtained, that is, the physical quantities of each first track cluster over the whole life of the track are calculated, including average speed, maximum speed, acceleration, movement distance, movement angle and the like.
For example, as shown in fig. 5, for each first trajectory cluster the outermost points of the cluster are circumscribed to finally determine the location information of the violent sorting behavior. Besides circumscribing the outermost key points, the location can also be determined from the cluster center point plus the cluster radius, and the like; this is not specifically limited here.
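A minimal sketch of the first-feature extraction: motion physical quantities and a bounding box obtained by circumscribing the outermost points of one track cluster. The field names, the fps parameter and the pixel units are illustrative assumptions, not values from the patent.

```python
import numpy as np

def cluster_first_feature(tracks, fps=25.0):
    """tracks: list of (T, 2) point arrays belonging to one first track cluster."""
    pts = np.concatenate([np.asarray(t, dtype=np.float32) for t in tracks])
    steps = np.concatenate([np.diff(np.asarray(t, dtype=np.float32), axis=0)
                            for t in tracks if len(t) > 1])
    speeds = np.linalg.norm(steps, axis=1) * fps            # pixels per second
    return {
        "avg_speed": float(speeds.mean()),
        "max_speed": float(speeds.max()),
        "distance": float(np.linalg.norm(steps, axis=1).sum() / len(tracks)),
        "angle": float(np.degrees(np.arctan2(steps[:, 1].sum(), steps[:, 0].sum()))),
        # circumscribe the outermost points to locate where the behavior happened
        "bbox": (float(pts[:, 0].min()), float(pts[:, 1].min()),
                 float(pts[:, 0].max()), float(pts[:, 1].max())),
    }
```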
The final output of the method comprises physical quantities with practical significance, so that treating the model as a complete black box is avoided and the interpretability of the model output is improved; in addition, the position of the violent sorting can be predicted without additional position labeling information, which saves labeling cost and reduces the complexity of the model.
When the first-characteristic information is submitted to an upper-layer business system, the upper-layer business systems of different business parties can judge whether the sorting behavior is a violent sorting behavior according to their respective business judgment rules for violent sorting.
In a possible implementation manner, the determining whether a violent sorting behavior occurs according to the first feature includes:
and judging whether violent sorting behaviors occur or not according to whether the first characteristic exceeds a preset range or not.
When the business judgment rule is clear, a specific user can manually set a threshold, and whether violent sorting behavior occurs is judged by checking whether the first characteristic exceeds the preset range.
If the business side does not have an explicit business judgment rule, the first-characteristic information can be sent to an anomaly detection algorithm, which detects the events whose motion tracks show anomalies.
In another possible implementation manner, the determining whether a violent sorting behavior occurs according to the first feature includes:
inputting the first characteristic into a preset abnormal detection model to obtain a detection result;
and judging whether violent sorting behaviors occur or not according to the detection result.
As an example, the anomaly detection algorithm corresponding to the anomaly detection model may be selected from: anomaly detection based on cluster analysis, such as k-means clustering; and anomaly detection based on Deep learning, such as Deep SVDD.
Therefore, when the business judgment rule is not well established or is unclear, the anomaly detection algorithm can be used to detect and report movement events that do not conform to normal operation, as in the sketch below.
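A minimal sketch of the cluster-analysis option (k-means) mentioned above; a Deep SVDD model would replace the scoring step. The inputs are assumed to be the first-feature vectors of trajectory clusters, and the threshold applied to the returned score is left to the business side.

```python
import numpy as np
from sklearn.cluster import KMeans

def anomaly_scores(historical_feats, new_feats, n_clusters=4):
    """Distance of each new feature vector to its nearest k-means centroid."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.asarray(historical_feats))
    dists = km.transform(np.asarray(new_feats))              # distances to all centroids
    return dists.min(axis=1)                                 # larger score = more anomalous
```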
According to the logistics sorting form identification method provided by this embodiment, the video segment start time image of a video segment related to logistics sorting behavior is obtained, tracking key points are extracted from the start time image by dense sampling or another method, and the motion tracks of all tracking key points are obtained, so that the motion tracks of all moving objects are captured; the motion tracks are then clustered automatically, grouping tracks with similar motion trends and nearby motion positions; finally, the target categories of interest are identified, characteristic data is extracted, and whether violent sorting behavior occurs is judged according to the characteristic data. By acquiring tracking key points, the track data of all moving objects in the video picture is processed without being limited by the number or categories of targets, so the variable numbers and categories of targets found in complex real-world scenes can be handled. The final output is a physical quantity with practical significance, which avoids treating the model as a complete black box, improves the interpretability of the model output, and allows personalized configuration for different business parties and different scenes.
Fig. 6 is a flowchart of a logistics sorting form identification method according to another embodiment of the present application, and as shown in fig. 6, the logistics sorting form identification method may include the following steps.
Step S601, carrying out still picture filtering processing on the video clip;
Optionally, the still-picture filtering may use a frame-difference method: the pixel values of two adjacent frames are subtracted from each other to obtain difference values, the difference values of all pixels are summed, and if the sum is greater than a certain threshold the picture is considered to have moved, otherwise it is considered still. The threshold is an empirical value, for example 6.25.
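A minimal sketch of the frame-difference still-picture filter described above; normalizing by the pixel count is an assumption so that one empirical threshold can serve different resolutions.

```python
import cv2
import numpy as np

def frame_has_motion(prev_frame, frame, threshold=6.25):
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray).astype(np.float32)
    score = diff.sum() / diff.size                           # mean absolute pixel change
    return score > threshold                                 # keep only moving pictures
```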
Step S602, determining a plurality of tracking key points in the video clip start time image of the video clip; the tracking key point is an initial tracking point of a pixel in the image at the starting moment of the video clip.
Step S603, obtaining a plurality of motion trajectories of the plurality of tracking key points in the video segment;
step S604, clustering the plurality of motion tracks to obtain a plurality of track clusters;
step S605, determining at least one first track cluster which meets the monitoring requirement in the plurality of track clusters;
step S606, acquiring a first characteristic of each first track cluster, and judging whether violent sorting behavior occurs according to the first characteristic.
It should be noted that, in this embodiment, the implementation process of the above steps S602 to S606 may refer to the description of the implementation process of the above steps S101 to S105, and is not described herein again.
In this embodiment, on the basis of the above embodiment, still picture filtering processing is added to the video clip, relatively still pictures are filtered, and only moving pictures are retained and sent to subsequent flow processing, so as to save computing resources.
Fig. 7 is a flowchart of a logistics sorting form identification method according to another embodiment of the present application, and as shown in fig. 7, the logistics sorting form identification method may include the following steps.
Step S701, determining a plurality of tracking key points in the video clip starting moment image of the video clip; the tracking key point is an initial tracking point of a pixel in the image at the starting moment of the video clip.
Step S702, acquiring a minimum eigenvalue of a covariance matrix of each tracking key point in a neighborhood, and deleting the tracking key points of which the minimum eigenvalue is smaller than a first preset threshold value from the plurality of tracking key points;
it can be understood that in the video picture of the actual scene, not all regions contain useful information, for example, a large white wall, a blue sky, etc., and the tracking key points on these regions do not need to exist, so the tracking key points with less information can be filtered out. For example, the minimum eigenvalue of the covariance matrix (generally 3 × 3) of each tracking keypoint in the neighborhood may be calculated, and if the eigenvalue is smaller than a set threshold, the tracking keypoint is deleted; as an example, the threshold T may be selected with reference to the following formula:
Figure BDA0003460440210000091
wherein the content of the first and second substances,
Figure BDA0003460440210000092
to track the feature value of the keypoint I in the image I, the setting of the threshold needs to be calculated separately for each frame.
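A minimal sketch of the low-information key-point filter: OpenCV's cornerMinEigenVal returns, per pixel, the minimum eigenvalue of the local gradient covariance matrix. Since the exact threshold formula appears in the source only as images, deriving the per-frame threshold from the frame's maximum eigenvalue and a quality factor is an assumption.

```python
import cv2
import numpy as np

def filter_low_information_keypoints(gray, keypoints, block_size=3, quality=0.01):
    """gray: single-channel frame; keypoints: (N, 2) float32 (x, y) positions."""
    min_eig = cv2.cornerMinEigenVal(gray, block_size)
    threshold = quality * float(min_eig.max())               # recomputed for every frame
    xs = np.clip(keypoints[:, 0].astype(int), 0, gray.shape[1] - 1)
    ys = np.clip(keypoints[:, 1].astype(int), 0, gray.shape[0] - 1)
    keep = min_eig[ys, xs] >= threshold
    return keypoints[keep]
```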
By the method, the tracking key points with less information amount are filtered, the tracking key points which are unnecessary to track are eliminated, and the data amount required to be processed in the subsequent process is reduced, so that the computing resources are saved.
Step S703 is to obtain a motion trajectory of the remaining tracking key points in the video segment.
Step S704, clustering the plurality of motion tracks to obtain a plurality of track clusters;
step S705, determining at least one first track cluster which meets the monitoring requirement in the plurality of track clusters;
step S706, acquiring a first characteristic of each first track cluster, and judging whether violent sorting behavior occurs according to the first characteristic.
It should be noted that, in this embodiment, the implementation processes of the step S701 and the step S703 to the step S706 may refer to the description of the implementation processes of the steps S101 to S105, respectively, and are not described herein again.
In this embodiment, based on the above embodiment, the tracking key points with a small information amount are filtered, the tracking key points which are not necessary to be tracked are excluded, and the data amount required to be processed in the subsequent process is reduced, so as to save the computing resources.
Fig. 8 is a flowchart of a logistics sorting form recognition method according to another embodiment of the present application, and as shown in fig. 8, the logistics sorting form recognition method may include the following steps.
Step S801, determining a plurality of tracking key points in the video clip start time image of the video clip, wherein the tracking key points are initial tracking points of pixels in the video clip start time image;
step S802, acquiring a plurality of motion tracks of the plurality of tracking key points in the video clip;
step S803, obtaining a time length of each motion trajectory, and deleting a motion trajectory of which the time length is smaller than a second preset threshold value from among the plurality of motion trajectories;
optionally, the time length is calculated as follows:
in the above embodiment, the motion trajectory of a certain tracking key point is obtained by tracking the position of the tracking key point in the whole video segment; the starting time of the motion trajectory is the time when the tracking key point starts to move, the ending time of the motion trajectory is the ending time of the tracking key point (if the optical flow of the tracking key point does not change in the x and y directions within a period of time, the motion trajectory ends), and the ending time subtracts the starting time, namely the time length of the motion trajectory.
The time length of each motion track is calculated, the motion tracks with short existence time are filtered out by setting the time length threshold of the motion tracks, so that the general sorting behaviors with slight action amplitude are filtered out, and the setting of the time length threshold can be determined according to the specific actual use condition.
Step S804, deleting a motion trajectory of which the position change does not satisfy a preset condition among the plurality of motion trajectories.
In one implementation, the motion trajectory for which the position change does not satisfy the preset condition includes, but is not limited to, a motion trajectory in which the average moving distance in the X, Y two directions is less than a third preset threshold, and/or a motion trajectory in which the moving standard deviation in the X, Y two directions is less than a fourth preset threshold.
In a possible implementation manner, average moving distances of each of the motion trajectories in two directions X, Y may be obtained, and motion trajectories of which the average moving distances are smaller than a third preset threshold value among the plurality of motion trajectories are deleted; and/or acquiring the moving standard deviation of each motion track in X, Y two directions, and deleting the motion track of which the moving standard deviation is smaller than a fourth preset threshold value in the plurality of motion tracks.
For example, the method for obtaining the average moving distance and the moving standard deviation of each motion trajectory in the X, Y two directions is as follows:
the x and y coordinates of the tracking key point at every moment of the whole motion track are summed and averaged to obtain the average x and y coordinates of the key point over the whole track, recorded as mean_x and mean_y and used for the average moving distance;
then mean_x and mean_y are subtracted from the x and y coordinates at each moment of the whole motion track to obtain the differences temp_x and temp_y of that moment relative to the average coordinates; because temp_x and temp_y may be positive or negative, they are squared, the squared values of all temp_x and temp_y over the entire track are summed, and the square root of the sum is taken as the moving standard deviation.
Indexes such as the average moving distance and the moving standard deviation of each motion track in the x and y directions are calculated and used to filter out motion tracks with little position change, thereby filtering out ordinary sorting behavior with small motion amplitude; the filtering thresholds can be set according to the specific actual use conditions.
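A minimal sketch combining the time-length filter and the position-change filter described above; all threshold values and the fps parameter are placeholder assumptions to be tuned for the actual deployment.

```python
import numpy as np

def keep_track(track, fps=25.0, min_seconds=0.5, min_mean_dist=5.0, min_std=2.0):
    tr = np.asarray(track, dtype=np.float32)                 # (T, 2) positions
    if len(tr) / fps < min_seconds:                          # filter short-lived tracks
        return False
    mean_xy = tr.mean(axis=0)                                # mean_x, mean_y
    mean_dist = np.abs(tr - mean_xy).mean(axis=0)            # average offset in x and y
    std_xy = np.sqrt(((tr - mean_xy) ** 2).sum(axis=0))      # moving "standard deviation" as described above
    # the text allows "and/or"; here a track must pass both criteria in at least one direction
    return bool((mean_dist >= min_mean_dist).any() and (std_xy >= min_std).any())
```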
Step S805, clustering remaining motion trajectories among the plurality of motion trajectories to obtain a plurality of trajectory clusters.
The remaining motion trail among the plurality of motion trails can be understood as the motion trail remaining after filtering out the motion trail with little position change and the motion trail with short existence time among the plurality of motion trails.
Optionally, the above step S803 and step S804 do not distinguish the order, that is, step S803 may be executed first and then step S804 is executed, or step S804 may be executed first and then step S803 may be executed, or step S803 and step S804 may be executed at the same time.
Step S806, determining at least one first track cluster which meets the monitoring requirement from the plurality of track clusters;
step S807, obtaining a first feature of each first trajectory cluster, and determining whether a violent sorting behavior occurs according to the first feature.
It should be noted that, in this embodiment, the implementation processes of step S801, step S802, and steps S805 to S807 may refer to the descriptions of the implementation processes of steps S101 to S105, respectively, and are not described herein again.
In the embodiment, on the basis of the above embodiment, the motion trajectory with short existence time is filtered, the general sorting behavior with a slight action amplitude is filtered, and the data amount required to be processed in the subsequent flow is reduced, so as to save the computing resources.
Fig. 9 is a flowchart of a logistics sorting form identification method according to yet another embodiment of the present application, and as shown in fig. 9, the logistics sorting form identification method may include the following steps.
Step S901, determining a plurality of tracking key points in a video segment start time image of a video segment, where the tracking key points are initial tracking points of pixels in the video segment start time image;
step S902, obtaining a plurality of motion tracks of the plurality of tracking key points in the video clip;
step S903, clustering the plurality of motion tracks to obtain a plurality of track clusters;
step S904, extracting a second feature of each track cluster;
in one possible implementation, the extracting the second feature of each track cluster includes:
determining the center of each track cluster, and acquiring a corresponding first motion track;
and collecting peripheral image information of a plurality of positions of the first motion trail, and taking the peripheral image information as the second characteristic.
For example, picture information in the vicinity of the motion trajectory at the start, middle and end times is acquired. The intermediate frame may be obtained by sampling, and the size of the picture may be adjusted according to the actual situation, for example 256×256 or 512×512.
And sending the picture information near the motion track at the sampling start, midway and end moments of each track cluster into a classifier trained in advance, judging whether the specific target class is contained, and only keeping the track clusters meeting the monitoring requirements after filtering by the classifier.
When the classifier model is used for classification, only the picture information at the start time, the end time and a small number of intermediate moments of the track needs to be checked, so the amount of data to be processed is relatively small.
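A minimal sketch of sampling the picture information near a cluster's center track at its start, middle and end moments for a pre-trained classifier; the track format and the `classifier.predict` interface are illustrative assumptions, not APIs defined by the patent.

```python
import cv2
import numpy as np

def sample_track_patches(frames, track, frame_indices, patch=256):
    """track: (T, 2) positions; frame_indices: frame index of each track point."""
    half = patch // 2
    picks = [0, len(track) // 2, len(track) - 1]             # start, middle, end moments
    patches = []
    for k in picks:
        x, y = track[k]
        frame = frames[frame_indices[k]]
        x0 = int(max(min(x - half, frame.shape[1] - patch), 0))
        y0 = int(max(min(y - half, frame.shape[0] - patch), 0))
        crop = frame[y0:y0 + patch, x0:x0 + patch]
        patches.append(cv2.resize(crop, (patch, patch)))
    return np.stack(patches)

# usage sketch: keep the cluster only if the classifier detects the target category
# keep_cluster = classifier.predict(sample_track_patches(frames, track, idxs)).any()
```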
In another possible implementation manner, on the basis of an embodiment in which super-pixel segmentation is adopted for images at the start time of the video segment, the extracting the second feature of each track cluster includes:
acquiring superpixels of each track cluster;
and combining the super pixels to obtain a contour picture, and taking the contour picture as the second feature.
And combining all corresponding super-pixels in each track cluster to obtain a final contour picture of the moving object, and classifying the final contour picture as a feature output classifier.
Step S905, inputting the second features into a classifier to obtain at least one first track cluster meeting the monitoring requirement.
Step S906, acquiring a first characteristic of each first track cluster, and judging whether violent sorting behavior occurs or not according to the first characteristic.
It should be noted that, in this embodiment, the implementation processes of the steps S901 to S903 may respectively refer to the descriptions of the implementation processes of the steps S101 to S103, and the implementation process of the step S906 may respectively refer to the descriptions of the implementation process of the step S105, which are not described herein again.
In this embodiment, on the basis of the above embodiments, an implementation of determining at least one first trajectory cluster that satisfies the monitoring requirement among the plurality of trajectory clusters is provided. From the viewpoint of task type, the classifier adopted in the category confirmation step has a simpler task than the detection and tracking models, the required labels are simpler, and the model is easier to train. Moreover, the input of the classifier only requires the video frames at the start time, the end time and a small number of intermediate moments of the track, or only the contour picture obtained by combining the superpixels, so the required data volume is relatively small.
On the basis of the above embodiments, the present application provides a specific embodiment, and fig. 10 is a flowchart of a logistics sorting form identification method according to a specific embodiment of the present application, and as shown in fig. 10, the logistics sorting form identification method may include the following steps.
Step S1001, performs still picture filtering processing on the video clip. The implementation process of step S1001 can be referred to as the implementation process of step S601.
Step S1002, determining a plurality of tracking key points in a video segment start time image of a video segment, where the tracking key points are initial tracking points of pixels in the video segment start time image. The implementation process of step S1002 can be referred to as the implementation process of step S101.
Step S1003, obtaining a minimum eigenvalue of the covariance matrix of each tracking keypoint in the neighborhood, and deleting the tracking keypoint, of which the minimum eigenvalue is smaller than a first preset threshold, from the plurality of tracking keypoints. The implementation process of this step S1003 can be referred to the implementation process of step S702.
Step S1004, obtaining the motion trajectories of the remaining tracking key points in the video segment. The implementation process of step S1004 can be referred to the implementation process of step S103.
Step S1005, obtaining a time length of each motion trajectory, and deleting a motion trajectory of which the time length is smaller than a second preset threshold value from the plurality of motion trajectories. The implementation process of this step S1005 can be referred to the implementation process of step S803.
Step S1006, deleting the motion trail of which the position change does not meet the preset condition from the plurality of motion trails. The implementation process of step S1006 can be referred to the implementation process of step S804.
Step 1007, clustering the rest motion tracks in the motion tracks to obtain a plurality of track clusters. The implementation process of step S1007 can be referred to as the implementation process of step S103.
Step S1008, extracting second characteristics of each track cluster. The implementation process of step S1008 can be referred to as the implementation process of step S904.
Step S1009, inputting the second feature into a classifier to obtain the at least one first trajectory cluster meeting the monitoring requirement. The implementation process of this step S1009 can be referred to the implementation process of step S905.
Step S1010, obtaining a first characteristic of each first track cluster, wherein the first characteristic comprises a motion physical quantity of an object. The implementation process of step S1010 may refer to the implementation process of step S105 of acquiring the moving physical quantity of the object.
Step S1011, obtaining a first characteristic of each first trajectory cluster, where the first characteristic includes position information of an object that has undergone violent sorting. The implementation process of step S1011 can refer to the implementation process of step S105, which obtains the position information of the object where the violent sorting action occurs.
Optionally, the foregoing steps S1010 and S1011 are not restricted in order, that is, step S1010 may be executed first and then step S1011, or step S1011 may be executed first and then step S1010, or steps S1010 and S1011 may be executed simultaneously.
Step S1012, the first characteristic further includes the category of the object, and it is determined whether a violent sorting action occurs according to the first characteristic. The implementation process of step S1012 can be referred to the implementation process of step S105.
According to the logistics sorting form identification method provided by the above embodiments, the video segment start time image of a video segment related to logistics sorting behavior is obtained, tracking key points are extracted from the start time image by dense sampling or another method, and the motion tracks of all tracking key points are obtained, so that the motion tracks of all moving objects are captured; the motion tracks are then clustered automatically, grouping tracks with similar motion trends and nearby motion positions; finally, the target categories of interest are identified, characteristic data is extracted, and whether violent sorting behavior occurs is judged according to the characteristic data. By acquiring tracking key points, the track data of all moving objects in the video picture is processed without being limited by the number or categories of targets, so the variable numbers and categories of targets in complex real-world scenes can be handled. The final output is a physical quantity with practical significance, which avoids treating the model as a complete black box, improves the interpretability of the model output, and allows personalized configuration for different business parties and different scenes. In addition, the position of the violent sorting can be predicted without additional position labeling information, which saves labeling cost and reduces the complexity of the model. From the viewpoint of task type, the classifier adopted in the category confirmation step has a simpler task than the detection and tracking models, the required labels are simpler, and the model is easier to train; its input only requires the video frames at the start time, the end time and a small number of intermediate moments of the track, so the required data volume is relatively small. Furthermore, the method can filter out static and slightly moving objects so that only moving objects meeting the conditions are sent to subsequent processing, which avoids unnecessary computation, requires less labeled data, is easier to implement, and reduces the data that needs model inference. Finally, an "anomaly detection" mechanism is provided in the judgment step, so that when the business judgment rule is not well established or is unclear, an anomaly detection algorithm can be used to detect and report movement events that do not conform to normal operation.
The method and device are aimed at recognizing violent sorting behavior in a logistics scene, but the application scenes can also be expanded, for example to security scenes, home special-care scenes (the elderly, the disabled, etc.), medical monitoring scenes, and other scenes in which videos or consecutive picture frames are used to monitor whether specific people or objects perform specifically defined events or actions.
Fig. 11 is a block diagram of a structure of a logistics sorting form recognition apparatus according to an embodiment of the present application, where the logistics sorting form recognition apparatus provided in this embodiment can execute a processing procedure provided in the above-mentioned logistics sorting form recognition method embodiment, and as shown in fig. 11, the logistics sorting form recognition apparatus includes a determining module 1101, an obtaining module 1102, a clustering module 1103, a classifying module 1104, and a determining module 1105.
A determining module 1101, configured to determine a plurality of tracking key points in a video segment start time image of a video segment;
an obtaining module 1102, configured to obtain a plurality of motion trajectories of the tracking key points in the video segment;
a clustering module 1103, configured to cluster the plurality of motion tracks to obtain a plurality of track clusters;
a classification module 1104 for determining at least one first trajectory cluster of the plurality of trajectory clusters that meets monitoring requirements;
a determining module 1105, configured to obtain a first feature of each first trajectory cluster, and determine whether a violent sorting behavior occurs according to the first feature.
On the basis of any of the above embodiments, when determining a plurality of tracking key points in an image at a video clip start time of a video clip, the determining module 1101 is configured to:
dividing the image at the starting moment of the video clip into a plurality of squares according to fixed intervals, and determining the center of each square as the tracking key point; or,
and performing superpixel segmentation on the image at the starting moment of the video clip, and determining the circumscribed center of each superpixel as the tracking key point.
On the basis of any of the above embodiments, when acquiring the plurality of motion trajectories of the plurality of tracking key points in the video segment, the obtaining module 1102 is configured to:
and tracking the tracking key points in the video clip by an optical flow method to obtain the motion trail.
On the basis of any of the above embodiments, when determining at least one first trajectory cluster of the plurality of trajectory clusters that meets the monitoring requirement, the classification module 1104 is configured to:
extracting a second feature of each track cluster;
inputting the second features into a classifier to obtain the at least one first track cluster meeting the monitoring requirement.
On the basis of any of the above embodiments, when extracting the second feature of each track cluster, the classification module 1104 is configured to:
determining the center of each track cluster, and acquiring a corresponding first motion track;
and collecting peripheral image information of a plurality of positions of the first motion trail, and taking the peripheral image information as the second characteristic.
On the basis of the embodiment that the determining module 1101 is configured to perform superpixel segmentation on the image at the start time of the video segment, and determine the circumscribed center of each superpixel as the tracking key point, the classifying module 1104, when extracting the second feature of each track cluster, is configured to:
acquiring superpixels of each track cluster;
and combining the super pixels to obtain a contour picture, and taking the contour picture as the second feature.
On the basis of any of the above embodiments, the first characteristics include the category of the object, the physical quantity of the object moving, and the location information of the violent sorting behavior, and the determining module 1105, when obtaining the first characteristic of each first trajectory cluster, is configured to:
acquiring a clustering center of each first track cluster, and representing the object type through the clustering center;
acquiring a physical quantity of each first track cluster as a motion physical quantity of the object;
and positioning the occurrence area of each first track cluster to acquire the position information of where the violent sorting behavior occurs.
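The sketch below gives one hedged interpretation of these three first features: the trajectory closest to the cluster mean stands in for the cluster center, frame-to-frame displacement scaled by an assumed frame rate yields a motion physical quantity (peak speed), and the bounding box of all cluster points localizes the candidate behavior.

```python
import numpy as np

def first_features(cluster, fps=25.0):
    """Compute first features for one qualifying track cluster.

    cluster: list of (T_i, 2) arrays, one per motion trajectory in the cluster.
    fps: assumed frame rate used to turn per-frame displacement into speed.
    """
    centers = np.array([t.mean(axis=0) for t in cluster])
    central_idx = int(np.argmin(np.linalg.norm(centers - centers.mean(axis=0), axis=1)))
    central_traj = cluster[central_idx]                  # stands in for the object / category
    steps = np.linalg.norm(np.diff(central_traj, axis=0), axis=1)
    peak_speed = float(steps.max() * fps) if len(steps) else 0.0   # pixels per second
    all_pts = np.vstack(cluster)
    bbox = (*all_pts.min(axis=0), *all_pts.max(axis=0))  # (x_min, y_min, x_max, y_max)
    return {"central_trajectory": central_traj,
            "peak_speed": peak_speed,
            "location_bbox": bbox}
```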
On the basis of any of the above embodiments, when judging whether violent sorting behavior occurs according to the first feature, the judging module 1105 is configured to:
judging whether violent sorting behavior occurs according to whether the first feature exceeds a preset range; or,
inputting the first feature into a preset anomaly detection model to obtain a detection result;
and judging whether violent sorting behaviors occur or not according to the detection result.
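Both decision paths can be sketched as follows; the speed threshold is illustrative, and IsolationForest from scikit-learn is only an assumed example of such a preset anomaly detection model, not something the application specifies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

SPEED_LIMIT = 800.0  # pixels per second; illustrative preset range

def judge_by_threshold(features):
    """Rule-based path: flag violent sorting when the motion physical quantity
    exceeds the preset range."""
    return features["peak_speed"] > SPEED_LIMIT

def judge_by_model(history_matrix, feature_matrix):
    """Model-based path: fit an anomaly detector on 2-D feature vectors collected
    from normal sorting and score the new clusters; a prediction of -1 marks an anomaly."""
    detector = IsolationForest(contamination=0.05, random_state=0).fit(history_matrix)
    return detector.predict(np.asarray(feature_matrix)) == -1
```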
On the basis of any of the above embodiments, before acquiring the plurality of motion trajectories of the plurality of tracking key points in the video segment, the obtaining module 1102 is further configured to:
acquiring the minimum eigenvalue of the covariance matrix of each tracking key point in the neighborhood;
deleting, from the plurality of tracking key points, the tracking key points whose minimum eigenvalue is smaller than a first preset threshold value, and acquiring the remaining tracking key points among the plurality of tracking key points.
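This filtering step corresponds to a minimum-eigenvalue (Shi-Tomasi style) quality test on the local structure tensor; a minimal sketch using OpenCV's cornerMinEigenVal is given below, with the neighbourhood size and threshold as assumed values.

```python
import cv2
import numpy as np

def filter_weak_keypoints(gray, pts, block_size=5, min_eig_thresh=1e-4):
    """Keep only key points whose neighbourhood covariance matrix has a minimum
    eigenvalue at or above the threshold; points in textureless regions are
    dropped because optical flow cannot track them reliably."""
    eig_map = cv2.cornerMinEigenVal(gray, block_size)   # per-pixel minimum eigenvalue
    coords = pts.reshape(-1, 2).astype(int)
    xs = np.clip(coords[:, 0], 0, gray.shape[1] - 1)
    ys = np.clip(coords[:, 1], 0, gray.shape[0] - 1)
    keep = eig_map[ys, xs] >= min_eig_thresh
    return pts[keep]                                    # remaining tracking key points
```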
On the basis of any of the above embodiments, when clustering the plurality of motion tracks to obtain the plurality of track clusters, the clustering module 1103 is configured to:
acquiring the time length of each motion track;
deleting, from the plurality of motion tracks, the motion tracks whose time length is smaller than a second preset threshold value, and taking the remaining motion tracks as first residual motion tracks;
for the first residual motion tracks, acquiring the average moving distance of each motion track in the X and Y directions, and deleting, from the first residual motion tracks, the motion tracks whose average moving distance is smaller than a third preset threshold value; and/or,
acquiring the moving standard deviation of each motion track in the X and Y directions, and deleting, from the first residual motion tracks, the motion tracks whose moving standard deviation is smaller than a fourth preset threshold value; and,
clustering the motion tracks that remain in the first residual motion tracks after the above deletion to obtain the plurality of track clusters.
In some embodiments of the present application, when clustering the plurality of motion tracks to obtain the plurality of track clusters, the clustering module 1103 is further configured to:
acquiring the average moving distance of each motion track in the X and Y directions, and deleting, from the plurality of motion tracks, the motion tracks whose average moving distance is smaller than a third preset threshold value; and/or,
acquiring the moving standard deviation of each motion track in the X and Y directions, and deleting, from the plurality of motion tracks, the motion tracks whose moving standard deviation is smaller than a fourth preset threshold value; and,
taking the remaining motion tracks among the plurality of motion tracks as second residual motion tracks;
for the second residual motion tracks, acquiring the time length of each motion track;
deleting, from the second residual motion tracks, the motion tracks whose time length is smaller than a second preset threshold value, and clustering the motion tracks that remain to obtain the plurality of track clusters.
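Both filtering orders rest on the same ingredients: a minimum time length, a minimum average X/Y displacement, a minimum positional standard deviation, and a final clustering pass. The sketch below applies them in the first order and clusters the survivors by mean position with DBSCAN; all thresholds and the choice of DBSCAN are assumptions, since no particular clustering algorithm is fixed here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def filter_and_cluster(trajectories, min_len=5, min_mean_disp=1.0,
                       min_std=0.5, eps=20.0, min_samples=3):
    """Drop short or near-static trajectories, then cluster the survivors by
    their mean position. All thresholds are illustrative."""
    kept = []
    for traj in trajectories:
        if len(traj) < min_len:                       # time length too short
            continue
        arr = np.asarray(traj, dtype=float)
        steps = np.abs(np.diff(arr, axis=0))
        if steps.mean(axis=0).max() < min_mean_disp:  # average X/Y movement too small
            continue
        if arr.std(axis=0).max() < min_std:           # X/Y standard deviation too small
            continue
        kept.append(arr)
    if not kept:
        return []
    centers = np.array([t.mean(axis=0) for t in kept])
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(centers)
    return [[kept[i] for i in np.where(labels == k)[0]]
            for k in sorted(set(labels)) if k != -1]  # one list of trajectories per cluster
```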
In some embodiments of the present application, before determining the plurality of tracking key points in the video clip start time image of the video clip, the determining module 1101 is further configured to:
and carrying out still picture filtering processing on the video clip.
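One cheap way to realize still-picture filtering is mean frame differencing, sketched below; the decision threshold is an assumed value, and in practice the check could equally be applied per clip or per sliding frame window.

```python
import cv2
import numpy as np

def has_motion(frames, diff_thresh=2.0):
    """Return True if the clip shows real motion, i.e. the mean absolute
    frame-to-frame grayscale difference exceeds the threshold; still clips
    can then be skipped before key-point extraction."""
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY).astype(np.float32)
    diffs = []
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        diffs.append(float(np.abs(gray - prev).mean()))
        prev = gray
    return bool(diffs) and float(np.mean(diffs)) > diff_thresh
```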
The logistics sorting form recognition device provided by the embodiments of the present application processes the track data of all moving objects in the video picture on the basis of the acquired tracking key points, so it is not limited by the number or type of targets and can handle the indefinite target numbers and indefinite target types found in complex real-world scenes. The final output is a physical quantity with practical significance, which avoids treating the model as a complete black box, improves the interpretability of the model output, and allows personalized configuration for different business parties and different scenes. In addition, the position where violent sorting occurs can be predicted without additional position labeling information, which saves labeling cost and reduces the complexity of the model. In terms of task type, the classifier used in the category confirmation step handles a simpler task than a detection model or a tracking model would, the required label types are simpler, and the model is easier to train. Moreover, the classifier only needs the video frames at the start time, the end time, and a small number of intermediate times of the motion track as input, so the required data volume is relatively small.
Fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device provided in this embodiment can execute the processing procedure provided in the above logistics sorting form identification method embodiments. As shown in fig. 12, the computer device 1200 includes a memory 1201, a processor 1202, and a computer program, where the computer program is stored in the memory 1201 and configured to be executed by the processor 1202 to perform the logistics sorting form identification method described in the above embodiments. In addition, the computer device 1200 may also have a communication interface 1203 for receiving control instructions.
The computer device of the embodiment shown in fig. 12 may be used to implement the technical solution of the above method embodiment, and the implementation principle and technical effect are similar, which are not described herein again.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the logistics sorting form identification method in the foregoing embodiments.
The present application further provides a computer program product, and when instructions in the computer program product are executed by a processor of the computer device 1200, the computer device 1200 is enabled to execute the logistics sorting form identification method according to the embodiment.
In the several embodiments provided in the embodiments of the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
It should also be noted that the exemplary embodiments mentioned in this application describe some methods or systems based on a series of steps or devices. However, the present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the embodiments of the present application, and are not limited thereto; although the embodiments of the present application have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (24)

1. A logistics sorting form recognition method is characterized by comprising the following steps:
determining a plurality of tracking key points in a video clip start time image of a video clip; the tracking key point is an initial tracking point of a pixel in the image at the starting moment of the video clip;
acquiring a plurality of motion tracks of the plurality of tracking key points in the video clip;
clustering the plurality of motion tracks to obtain a plurality of track clusters;
determining at least one first track cluster of the plurality of track clusters that meets monitoring requirements;
and acquiring a first characteristic of each first track cluster, and judging whether violent sorting behavior occurs or not according to the first characteristic.
2. The method of claim 1, wherein determining a plurality of tracking keypoints in a video clip start time image of a video clip comprises:
dividing the image at the starting moment of the video clip into a plurality of squares according to fixed intervals, and determining the center of each square as the tracking key point; or,
and performing superpixel segmentation on the image at the starting moment of the video clip, and determining the circumscribed center of each superpixel as the tracking key point.
3. The method of claim 1, wherein the determining at least one first trajectory cluster of the plurality of trajectory clusters that meets monitoring requirements comprises:
extracting a second feature of each track cluster;
inputting the second features into a classifier to obtain the at least one first track cluster meeting the monitoring requirement.
4. The method of claim 3, wherein said extracting the second feature of each of the trajectory clusters comprises:
determining the center of each track cluster, and acquiring a corresponding first motion track;
and collecting peripheral image information of a plurality of positions of the first motion trail, and taking the peripheral image information as the second characteristic.
5. The method of claim 3, wherein determining a plurality of tracking key points in a video clip start time image of a video clip comprises:
performing superpixel segmentation on the image at the starting moment of the video clip, and determining the circumscribed center of each superpixel as the tracking key point; and,
the extracting of the second feature of each track cluster includes:
acquiring superpixels of each track cluster;
and combining the super pixels to obtain a contour picture, and taking the contour picture as the second feature.
6. The method according to claim 1, wherein the first characteristics include object categories, physical quantities of movement of the objects, and location information of occurrence of violent sorting behavior, and the acquiring the first characteristics of each of the first trajectory clusters includes:
acquiring a clustering center of each first track cluster, and representing the object type through the clustering center;
acquiring a physical quantity of each first track cluster as a motion physical quantity of the object;
and positioning the occurrence area of each first track cluster to acquire the position information of where the violent sorting behavior occurs.
7. The method of claim 1, wherein said determining whether a violent sorting activity has occurred based on said first characteristic comprises:
judging whether violent sorting behaviors occur or not according to whether the first characteristics exceed a preset range or not; or,
inputting the first characteristic into a preset abnormal detection model to obtain a detection result;
and judging whether violent sorting behaviors occur or not according to the detection result.
8. The method of claim 1, wherein said obtaining a plurality of motion trajectories of said plurality of tracking key points in said video segment comprises:
acquiring a minimum eigenvalue of a covariance matrix of each tracking key point in a neighborhood, and deleting the tracking key points of which the minimum eigenvalue is smaller than a first preset threshold value from the plurality of tracking key points;
and acquiring residual tracking key points in the plurality of tracking key points, tracking the residual tracking key points in the video clip by an optical flow method, and acquiring the motion trail.
9. The method according to any one of claims 1-8, wherein said clustering said plurality of motion trajectories to obtain a plurality of trajectory clusters comprises:
acquiring the time length of each motion track;
deleting the motion trail of which the time length is smaller than a second preset threshold value from the plurality of motion trails, and taking the rest motion trail from the plurality of motion trails as a first rest motion trail;
for the first residual motion trail, obtaining the average moving distance of each motion trail in X, Y two directions; deleting the motion trail of which the average moving distance is smaller than a third preset threshold value from the plurality of motion trails, and/or,
acquiring the moving standard deviation of each motion track in X, Y two directions; deleting the motion trail of which the moving standard deviation is smaller than a fourth preset threshold value in the first residual motion trail; and,
and clustering the motion tracks remaining again in the first remaining motion tracks to obtain a plurality of track clusters.
10. The method according to any one of claims 1-8, wherein said clustering said plurality of motion trajectories to obtain a plurality of trajectory clusters comprises:
acquiring the average moving distance of each motion track in X, Y two directions; deleting the motion trail of which the average moving distance is smaller than a third preset threshold value from the plurality of motion trails, and/or,
acquiring the moving standard deviation of each motion track in X, Y two directions; deleting the motion trail of which the moving standard deviation is smaller than a fourth preset threshold value from the plurality of motion trails; and,
taking the rest motion trail in the plurality of motion trails as a second rest motion trail;
acquiring the time length of each motion track aiming at the second residual motion track;
and deleting the motion trail of which the time length is smaller than a second preset threshold value in the second residual motion trail, and clustering the motion trail remaining again in the second residual motion trail to obtain a plurality of track clusters.
11. The method of claim 1 or 2, wherein before determining the plurality of tracking key points in the video segment start time image of the video segment, the method further comprises:
and carrying out still picture filtering processing on the video clip.
12. A logistics sorting form recognition device is characterized by comprising:
the device comprises a determining module, a judging module and a judging module, wherein the determining module is used for determining a plurality of tracking key points in a video clip starting moment image of a video clip; the tracking key point is an initial tracking point of a pixel in the image at the starting moment of the video clip;
the acquisition module is used for acquiring a plurality of motion tracks of the plurality of tracking key points in the video clip;
the clustering module is used for clustering the plurality of motion tracks to obtain a plurality of track clusters;
the classification module is used for determining at least one first track cluster which meets the monitoring requirement in the plurality of track clusters;
and the judging module is used for acquiring the first characteristic of each first track cluster and judging whether violent sorting behavior occurs or not according to the first characteristic.
13. The apparatus of claim 12, wherein the determining module, in determining the plurality of tracking keypoints in the video segment start time image for the video segment, is configured to:
dividing the image at the starting moment of the video clip into a plurality of squares according to fixed intervals, and determining the center of each square as the tracking key point; or,
and performing superpixel segmentation on the image at the starting moment of the video clip, and determining the circumscribed center of each superpixel as the tracking key point.
14. The apparatus of claim 12, wherein the classification module, in determining at least one first track cluster of the plurality of track clusters that meets monitoring requirements, is configured to:
extracting a second feature of each track cluster;
inputting the second features into a classifier to obtain the at least one first track cluster meeting the monitoring requirement.
15. The apparatus of claim 14, wherein the classification module, in extracting the second feature of each of the trajectory clusters, is configured to:
determining the center of each track cluster, and acquiring a corresponding first motion track;
and collecting peripheral image information of a plurality of positions of the first motion trail, and taking the peripheral image information as the second characteristic.
16. The apparatus of claim 14, wherein the determining module, in determining the plurality of tracking keypoints in the video-clip-start-time image for the video clip, is configured to:
performing superpixel segmentation on the image at the starting moment of the video clip, and determining the circumscribed center of each superpixel as the tracking key point; and,
the classification module, when extracting the second feature of each of the trajectory clusters, is configured to:
acquiring superpixels of each track cluster;
and combining the super pixels to obtain a contour picture, and taking the contour picture as the second feature.
17. The apparatus of claim 12, wherein the first characteristics include the object type, the motion physical quantity of the object, and the position information of where the violent sorting behavior occurs, and the judging module, when obtaining the first characteristic of each of the first trajectory clusters, is configured to:
acquiring a clustering center of each first track cluster, and representing the object type through the clustering center;
acquiring a physical quantity of each first track cluster as a motion physical quantity of the object;
and positioning the occurrence area of each first track cluster to acquire the position information of where the violent sorting behavior occurs.
18. The apparatus of claim 12, wherein the judging module, when determining whether a violent sorting activity has occurred based on the first characteristic, is configured to:
judging whether violent sorting behaviors occur or not according to whether the first characteristics exceed a preset range or not; or,
inputting the first characteristic into a preset abnormal detection model to obtain a detection result;
and judging whether violent sorting behaviors occur or not according to the detection result.
19. The apparatus of claim 12, wherein the obtaining module, when obtaining the plurality of motion trajectories of the plurality of tracking key points in the video segment, is further configured to:
acquiring the minimum eigenvalue of the covariance matrix of each tracking key point in the neighborhood;
deleting the tracking key points of which the minimum eigenvalue is smaller than a first preset threshold value from the plurality of tracking key points, acquiring residual tracking key points from the plurality of tracking key points, and tracking the residual tracking key points in the video clip by an optical flow method to acquire the motion trail.
20. The apparatus according to any of claims 12-19, wherein the clustering module, when clustering the plurality of motion trajectories to obtain a plurality of trajectory clusters, is configured to:
acquiring the time length of each motion track;
deleting the motion trail of which the time length is smaller than a second preset threshold value from the plurality of motion trails, and taking the rest motion trail from the plurality of motion trails as a first rest motion trail;
for the first residual motion trail, obtaining the average moving distance of each motion trail in X, Y two directions; deleting the motion trail of which the average moving distance is smaller than a third preset threshold value from the plurality of motion trails, and/or,
acquiring the moving standard deviation of each motion track in X, Y two directions; deleting the motion trail of which the moving standard deviation is smaller than a fourth preset threshold value in the first residual motion trail; and,
and clustering the motion tracks remaining again in the first remaining motion tracks to obtain a plurality of track clusters.
21. The apparatus according to any of claims 12-19, wherein the clustering module, when clustering the plurality of motion trajectories to obtain a plurality of trajectory clusters, is further configured to:
acquiring the average moving distance of each motion track in X, Y two directions; deleting the motion trail of which the average moving distance is smaller than a third preset threshold value from the plurality of motion trails, and/or,
acquiring the moving standard deviation of each motion track in X, Y two directions; deleting the motion trail of which the moving standard deviation is smaller than a fourth preset threshold value from the plurality of motion trails; and,
taking the rest motion trail in the plurality of motion trails as a second rest motion trail;
acquiring the time length of each motion track aiming at the second residual motion track;
and deleting the motion trail of which the time length is smaller than a second preset threshold value in the second residual motion trail, and clustering the motion trail remaining again in the second residual motion trail to obtain a plurality of track clusters.
22. The apparatus of claim 12, wherein the determining module, prior to determining the plurality of tracking keypoints in the video segment start time image of the video segment, is further configured to:
and performing still picture filtering processing on the video clip, and determining a plurality of tracking key points in the video clip starting moment image of the video clip.
23. A computer device, comprising:
a processor; and
a memory communicatively coupled to the processor; wherein,
the memory stores instructions executable by the processor to enable the processor to perform the logistics sorting form identification method according to any one of claims 1 to 11.
24. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the logistics sort form identification method of any one of claims 1 to 11.
CN202210015524.2A 2022-01-07 2022-01-07 Logistics sorting form identification method and device and storage medium Pending CN114399711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210015524.2A CN114399711A (en) 2022-01-07 2022-01-07 Logistics sorting form identification method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210015524.2A CN114399711A (en) 2022-01-07 2022-01-07 Logistics sorting form identification method and device and storage medium

Publications (1)

Publication Number Publication Date
CN114399711A true CN114399711A (en) 2022-04-26

Family

ID=81228212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210015524.2A Pending CN114399711A (en) 2022-01-07 2022-01-07 Logistics sorting form identification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114399711A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114758250A (en) * 2022-06-15 2022-07-15 山东青岛烟草有限公司 Full-specification flexible automatic sorting control method and device based on artificial intelligence
CN117115494B (en) * 2023-10-23 2024-02-06 卡松科技股份有限公司 Lubricating oil impurity pollution detection method and device based on artificial intelligence

Similar Documents

Publication Publication Date Title
Lian et al. Road extraction methods in high-resolution remote sensing images: A comprehensive review
St-Charles et al. A self-adjusting approach to change detection based on background word consensus
Homayounfar et al. Hierarchical recurrent attention networks for structured online maps
AU2014240213B2 (en) System and Method for object re-identification
US20150325046A1 (en) Evaluation of Three-Dimensional Scenes Using Two-Dimensional Representations
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN109816689A (en) A kind of motion target tracking method that multilayer convolution feature adaptively merges
CN104424634A (en) Object tracking method and device
CN114399711A (en) Logistics sorting form identification method and device and storage medium
CN113312973B (en) Gesture recognition key point feature extraction method and system
CN112784673A (en) Computing system for extracting video data
CN111291646A (en) People flow statistical method, device, equipment and storage medium
US20230095533A1 (en) Enriched and discriminative convolutional neural network features for pedestrian re-identification and trajectory modeling
Fan Research and realization of video target detection system based on deep learning
JP2018185724A (en) Device, program and method for tracking object using pixel change processing image
Wang et al. Intrusion detection for high-speed railways based on unsupervised anomaly detection models
CN109002808B (en) Human behavior recognition method and system
CN115953650B (en) Training method and device for feature fusion model
Ramachandra et al. Perceptual metric learning for video anomaly detection
Firouznia et al. Adaptive chaotic sampling particle filter to handle occlusion and fast motion in visual object tracking
Gonçalves et al. Using a convolutional neural network for fingerling counting: A multi-task learning approach
Wang Moving Vehicle Detection and Tracking Based on Video Sequences.
Pajares et al. Fuzzy cognitive maps applied to computer vision tasks
CN113450457B (en) Road reconstruction method, apparatus, computer device and storage medium
Mondal Neuro-probabilistic model for object tracking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination