CN117372924B - Video detection method and device - Google Patents

Video detection method and device

Info

Publication number
CN117372924B
Authority
CN
China
Prior art keywords
image
frame
images
monitoring video
engineering vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311352534.6A
Other languages
Chinese (zh)
Other versions
CN117372924A (en)
Inventor
刘子伟
余家忠
靳志娟
曹润东
梁清华
李飞
刘昱含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Tower Co Ltd
Original Assignee
China Tower Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Tower Co Ltd filed Critical China Tower Co Ltd
Priority to CN202311352534.6A
Publication of CN117372924A
Application granted
Publication of CN117372924B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20112 - Image segmentation details
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 - Subject of image; Context of image processing
    • G06T 2207/30248 - Vehicle exterior or interior
    • G06T 2207/30252 - Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a video detection method and a device, which are applied to the technical field of video detection, wherein the method comprises the steps of obtaining a first monitoring video of a target monitoring position; under the condition that the first monitoring video comprises the target engineering vehicle, extracting 2M reference points of each frame of image of the first monitoring video; matching 2M reference points of two adjacent frames of images to obtain 2K groups of key points of the two adjacent frames of images successfully matched; calculating a basic matrix based on 2K groups of key points successfully matched with two adjacent frames of images; singular Value Decomposition (SVD) is carried out on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of a first monitoring video, so that a rotation matrix and a translation matrix of each basic matrix are obtained; determining position change values of M characteristic parts of the target engineering vehicle based on the rotation matrix and the translation matrix; and based on the position change values of the M characteristic parts, the working state of the target engineering vehicle is judged, so that the accuracy of the video detection method is improved.

Description

Video detection method and device
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a video detection method and apparatus.
Background
In daily life, unregulated operation of engineering vehicles is very common. Without supervision and control, it causes serious harm to the environment and to the safety of personnel. The existing way of monitoring engineering vehicle behavior is to detect it through video, but existing video detection methods generally judge the operation state of an engineering vehicle only from its overall motion trajectory; for example, when the engineering vehicle is recognized to be in a non-driving state, it is simply judged to be idle (not in operation). However, given the complexity of engineering vehicle behavior during construction operations, the detection accuracy of existing video detection methods for engineering vehicles is low.
Disclosure of Invention
The embodiments of the present application provide a video detection method and a video detection device, which are used to solve the problem of low detection accuracy in existing video detection methods.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, an embodiment of the present application provides a video detection method, including:
Acquiring a first monitoring video of a target monitoring position;
Under the condition that the first monitoring video comprises a target engineering vehicle, 2M reference points of each frame of image in continuous N frames of images of the first monitoring video are extracted, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle, and N and M are positive integers;
2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video are matched to obtain 2K groups of key points successfully matched with the two adjacent frames of images in the continuous N frames of images of the first monitoring video, each group of key points comprises a first reference point and a second reference point matched with the first reference point, and the first reference point and the second reference point are corresponding reference points in the two adjacent frames of images respectively;
Calculating a basic matrix based on the 2K groups of key points successfully matched between two adjacent frames of images in the continuous N frames of images of the first monitoring video, wherein K is a positive integer;
Performing Singular Value Decomposition (SVD) on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices;
Determining position change values of M characteristic parts of the target engineering vehicle based on a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices;
and judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts.
Optionally, extracting 2M reference points of each frame of image in the continuous N frames of images of the first surveillance video under the condition that the first surveillance video includes the target engineering vehicle includes:
Under the condition that the first monitoring video comprises a target engineering vehicle, judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video;
And under the condition that the target engineering vehicle is in a non-driving state, extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video.
Optionally, before extracting 2M reference points of each frame of image in the continuous N frames of images of the first surveillance video in the case that the first surveillance video includes the target engineering vehicle, the method further includes:
And carrying out instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video.
Optionally, the performing instance segmentation on each frame of image of the first surveillance video to obtain a tag of each object in the first surveillance video includes:
Dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video for r times according to rows and c times according to columns respectively, A= (r+1) x (c+1), A is a positive integer, and r and c are both non-negative integers;
Respectively calculating the average gray values of the R, G and B channels of each of the A areas of each frame of image of the first monitoring video to obtain H average gray values of each frame of image of the first monitoring video, wherein H=3× (r+1) × (c+1);
Based on the H average gray values of each frame of image of the first monitoring video, obtaining a feature vector of each frame of image of the first monitoring video;
Based on the feature vector of each frame of image of the first monitoring video, clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula;
and labeling the clustered continuous N frames of images of the first monitoring video respectively.
Optionally, after the determining the working state of the target engineering vehicle based on the position change values of the M feature parts, the method further includes:
Judging whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road;
and outputting first warning information under the condition that the center point of the detection frame of the target engineering vehicle is located outside the detection frame of the road in each frame of image of the first monitoring video.
Optionally, after the determining the working state of the target engineering vehicle based on the position change values of the M feature parts, the method further includes:
judging whether a target Euclidean distance is smaller than a preset value or not under the condition that the target engineering vehicle is in an operation state and a person exists in the first monitoring video, wherein the target Euclidean distance is the furthest Euclidean distance between a central point pixel of the target engineering vehicle and a central point pixel of the person in each frame of image of the first monitoring video;
and outputting second alarm information under the condition that the target Euclidean distance is smaller than a preset value.
In a second aspect, an embodiment of the present application further provides a video detection apparatus, including:
the first acquisition module is used for acquiring a first monitoring video of the target monitoring position;
The first extraction module is used for extracting 2M reference points of each frame of image in continuous N frames of images of the first monitoring video under the condition that the first monitoring video comprises a target engineering vehicle, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle;
The first matching module is used for matching 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video to obtain 2K groups of key points successfully matched with the two adjacent frames of images in the continuous N frames of images of the first monitoring video, each group of key points comprises a first reference point and a second reference point matched with the first reference point, and the first reference point and the second reference point are corresponding reference points in the two adjacent frames of images respectively;
the first calculation module is used for calculating a basic matrix based on 2K groups of key points successfully matched with two adjacent frames of images in the continuous N frames of images of the first monitoring image;
The first decomposition module is used for carrying out Singular Value Decomposition (SVD) on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices;
The first determining module is used for determining the position change values of M characteristic parts of the target engineering vehicle based on the rotation matrix and the translation matrix of each basic matrix in the plurality of basic matrices;
And the first judging module is used for judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts.
Optionally, the first extraction module includes:
The first judging unit is used for judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video under the condition that the first monitoring video comprises the target engineering vehicle;
the first extraction unit is used for extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video under the condition that the target engineering vehicle is in a non-driving state.
Optionally, the apparatus further comprises:
The first segmentation module is used for performing instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video.
Optionally, the first segmentation module includes:
The first dividing unit is used for dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video for r times according to rows and c times according to columns respectively, A= (r+1) x (c+1), A is a positive integer, and r and c are both non-negative integers;
A first calculation unit, configured to calculate average gray values of R, G and B channels of each of a regions of each frame of image of the first monitor image, respectively, to obtain H average gray values of each frame of image of the first monitor image, where h=3× (r+1) × (c+1);
The first determining unit is used for obtaining the feature vector of each frame of image of the first monitoring image based on the H average gray values of each frame of image of the first monitoring image;
The first clustering unit is used for clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula based on the feature vector of each frame of image of the first monitoring image;
and the first labeling unit is used for labeling the clustered continuous N frames of images of the first monitoring video respectively.
Optionally, the apparatus further comprises:
The second judging module is used for judging whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road;
the first output module is used for outputting first warning information when the center point of the detection frame of the target engineering vehicle is located outside the detection frame of the road in each frame of image of the first monitoring video.
Optionally, the apparatus further comprises:
The third judging module is used for judging whether the target Euclidean distance is smaller than a preset value or not under the condition that the target engineering vehicle is in an operation state and personnel exist in the first monitoring video, wherein the target Euclidean distance is the furthest Euclidean distance between a central point pixel of the target engineering vehicle and a central point pixel of the personnel in each frame of image of the first monitoring video;
And the second output module is used for outputting second alarm information under the condition that the target Euclidean distance is smaller than a preset value.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the computer program when executed by the processor implements the steps of the video detection method described above.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium having a computer program stored thereon, the computer program implementing the steps of the video detection method described above when executed by a processor.
The video detection method comprises the steps of obtaining a first monitoring video of a target monitoring position; under the condition that the first monitoring video comprises a target engineering vehicle, 2M reference points of each frame of image in continuous N frames of images of the first monitoring video are extracted, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle; 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video are matched to obtain 2K groups of key points successfully matched with the two adjacent frames of images in the continuous N frames of images of the first monitoring video, each group of key points comprises a first reference point and a second reference point matched with the first reference point, and the first reference point and the second reference point are corresponding reference points in the two adjacent frames of images respectively; calculating a basic matrix based on 2K groups of key points successfully matched with two adjacent frames of images in continuous N frames of images of the first monitoring image; performing Singular Value Decomposition (SVD) on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices; determining position change values of M characteristic parts of the target engineering vehicle based on a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices; and judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts. According to the method, the plurality of reference points of the plurality of characteristic parts of the target engineering vehicle in the first monitoring video are extracted, and then the position change values of the plurality of characteristic parts of the target engineering vehicle are calculated based on the rotation matrix and the translation matrix formed by the reference points, so that the actual operation state of the target engineering vehicle is judged, and the accuracy of the video detection method is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is a flowchart of a video detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of image instance segmentation in a video detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a video detection system according to an embodiment of the present application;
FIG. 4 is a second flowchart of a video detection method according to an embodiment of the present application;
Fig. 5 is a block diagram of a video detecting apparatus according to an embodiment of the present application;
fig. 6 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a video detection method. Referring to fig. 1, fig. 1 is a flowchart of a video detection method according to an embodiment of the present application, as shown in fig. 1, including the following steps:
step 101, acquiring a first monitoring video of a target monitoring position;
in this step, the target monitoring position is preferably an accident-prone area, that is, an area where illegal operation of engineering vehicles frequently occurs, and the first monitoring video of the target monitoring position within a certain period of time is acquired.
Step 102, under the condition that a target engineering vehicle is included in the first monitoring video, 2M reference points of each frame of image in continuous N frames of images of the first monitoring video are extracted, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle, and N and M are positive integers;
In this step, the target engineering vehicle may be a vehicle type such as an excavator, a forklift, a mixer truck, or a muck truck. When the first monitoring video is detected to include a target engineering vehicle of one of the above types, M characteristic parts of the target engineering vehicle are first determined; for example, when the target engineering vehicle is an excavator, the bucket, arm, boom, body and the like of the excavator may be selected as its characteristic parts. The detection frame center point and the mask center point of each of the plurality of characteristic parts of the target engineering vehicle are then extracted as reference points.
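For illustration only (this is not part of the original disclosure), a minimal sketch of how the two reference points of one characteristic part could be computed is given below, assuming an instance segmentation model has already produced a bounding box (x1, y1, x2, y2) and a binary mask for that part; the function name and box format are hypothetical.

    import numpy as np

    def reference_points(box, mask):
        # box:  (x1, y1, x2, y2) detection frame of one characteristic part, in pixels
        # mask: 2-D array, non-zero where the part's segmentation mask is
        x1, y1, x2, y2 = box
        box_center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)           # detection frame center point
        ys, xs = np.nonzero(mask)                                  # pixel coordinates inside the mask
        mask_center = (float(xs.mean()), float(ys.mean()))         # mask center point (centroid)
        return box_center, mask_center
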
Step 103, matching 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video to obtain 2K groups of key points of the two adjacent frames of images in the continuous N frames of images of the first monitoring video, wherein each group of key points comprises a first reference point and a second reference point matched with the first reference point, the first reference point and the second reference point are respectively corresponding reference points in the two adjacent frames of images, and K is a positive integer;
In this step, when reference points are extracted from each frame of image, a change in the position of some characteristic parts of the target engineering vehicle may cause those parts to be occluded in the image, so some reference points may be invalid, that is, they may not reflect the actual positions of the characteristic parts. Therefore, the reference points extracted from the two adjacent frames of images are matched, the subsequent operations are performed only on the reference points that are successfully matched, and the unmatched reference points are treated as invalid reference points.
Each group of key points comprises a first reference point and a second reference point matched with the first reference point, wherein the first reference point and the second reference point are corresponding reference points in two adjacent frames of images respectively. For example, the set of key points may include a center point of a detection frame of a bucket of the excavator in the first frame image and a center point of a detection frame of a bucket of the excavator in the second frame image.
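For illustration (the patent does not prescribe a particular matching algorithm), reference points could be matched between adjacent frames simply by characteristic-part identity, keeping only the parts visible in both frames:

    def match_reference_points(points_prev, points_curr):
        # points_prev, points_curr: dicts mapping a characteristic-part label to its
        # (box_center, mask_center) reference points in the two adjacent frames
        matched = []
        for part, refs_prev in points_prev.items():
            if part in points_curr:                              # unmatched parts are treated as invalid
                refs_curr = points_curr[part]
                matched.append((refs_prev[0], refs_curr[0]))     # detection frame centers
                matched.append((refs_prev[1], refs_curr[1]))     # mask centers
        return matched                                           # 2K groups of key points
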
Step 104, calculating a basic matrix based on the 2K groups of key points successfully matched between two adjacent frames of images in the continuous N frames of images of the first monitoring video;
in this step, each group of key points forms a pair of corresponding points between the two adjacent frames, and a basic matrix can be calculated from the plurality of correspondences formed by the 2K groups of key points.
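As an editorial illustration (not part of the original disclosure), the basic matrix of one pair of adjacent frames could be estimated from the matched reference points roughly as follows, assuming the matched points of the two frames are stored as two (2K, 2) arrays of pixel coordinates; the use of OpenCV and of RANSAC is an assumption, not something the patent specifies.

    import cv2
    import numpy as np

    def basic_matrix(pts_prev, pts_curr):
        # pts_prev, pts_curr: (2K, 2) arrays of successfully matched reference points
        pts_prev = np.asarray(pts_prev, dtype=np.float32)
        pts_curr = np.asarray(pts_curr, dtype=np.float32)
        # estimate the fundamental ("basic") matrix; RANSAC tolerates a few bad matches
        F, inlier_mask = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC)
        return F, inlier_mask
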
Step 105, performing singular value decomposition SVD on each of a plurality of basic matrixes obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each of the plurality of basic matrixes;
In this step, a singular value decomposition (Singular Value Decomposition, SVD) is performed on the basis matrix, which may be decomposed into a rotation matrix and a translation matrix. The rotation matrix can reflect the rotation angles of a plurality of characteristic parts of the target engineering vehicle, and the translation matrix can reflect the horizontal displacement change of the plurality of characteristic parts of the target engineering vehicle.
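One common way to realize this step is sketched below, under the assumption that the camera intrinsic matrix K is known: the basic matrix is first converted into an essential matrix, which is then decomposed (cv2.recoverPose performs the SVD-based decomposition and resolves the sign ambiguity). The patent itself only states that SVD is applied to each basic matrix.

    import cv2
    import numpy as np

    def rotation_translation(F, K, pts_prev, pts_curr):
        # F: 3x3 basic (fundamental) matrix of two adjacent frames
        # K: 3x3 camera intrinsic matrix (assumed known)
        E = K.T @ F @ K                                    # essential matrix
        _, R, t, _ = cv2.recoverPose(E, pts_prev, pts_curr, K)
        return R, t                                        # rotation matrix, translation vector (up to scale)
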
Step 106, determining the position change values of M characteristic parts of the target engineering vehicle based on the rotation matrix and the translation matrix of each basic matrix in the plurality of basic matrices;
In the step, continuous N frames of images are analyzed, the overall rotation angle change and the overall horizontal displacement change of M characteristic parts of the target engineering vehicle in the first monitoring video are determined, and the overall position change of the M characteristic parts of the target engineering vehicle is determined.
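As an illustrative sketch (the exact aggregation is not given in the patent), a single position change value could be accumulated from the rotation and translation of every adjacent-frame basic matrix as follows; the weights are assumed parameters.

    import numpy as np

    def position_change_value(rotations, translations, angle_weight=1.0, shift_weight=1.0):
        # rotations:    list of 3x3 rotation matrices, one per pair of adjacent frames
        # translations: list of translation vectors, one per pair of adjacent frames
        total = 0.0
        for R, t in zip(rotations, translations):
            cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)   # rotation angle from trace(R)
            total += angle_weight * np.arccos(cos_theta) + shift_weight * float(np.linalg.norm(t))
        return total
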
Step 107, judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts.
In this step, the working state of the target engineering vehicle is judged based on the position change values of the M characteristic parts. For example, when the position change values of the M characteristic parts are larger than a set value, this indicates that the positions of the M characteristic parts have changed significantly, and it can be determined that the target engineering vehicle is in the working state.
According to the method, the plurality of reference points of the plurality of characteristic parts of the target engineering vehicle in the first monitoring video are extracted, then the position change values of the plurality of characteristic parts of the target engineering vehicle are calculated based on the rotation matrix and the translation matrix formed by the reference points, the method is not limited to the vehicle body position change of the target engineering vehicle, the actual operation state of the target engineering vehicle is judged from the plurality of characteristic parts of the target engineering vehicle, and the accuracy of the video detection method is improved.
Optionally, extracting 2M reference points of each frame of image in the continuous N frames of images of the first surveillance video under the condition that the first surveillance video includes the target engineering vehicle includes:
Under the condition that the first monitoring video comprises a target engineering vehicle, judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video;
And under the condition that the target engineering vehicle is in a non-driving state, extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video.
In the video detection method of the embodiment of the application, before the 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video are extracted, the driving state of the target engineering vehicle may be judged first. Because judging the driving state is a relatively simple process, if the target engineering vehicle is in a driving state it can directly be determined to be in a working state, and there is no need to subsequently extract the plurality of characteristic parts of the target engineering vehicle and perform a series of complex analyses to judge its working state.
Specifically, whether the target engineering vehicle is in a driving state can be judged by calculating the position change of the center point pixel of the target engineering vehicle in the continuous N frames of first images in the first monitoring video; only when the target engineering vehicle is in a non-driving state are the M characteristic parts of the target engineering vehicle further extracted and analyzed, which improves the accuracy with which the operation state of the target engineering vehicle is judged compared with existing video detection methods.
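A minimal sketch of this driving-state pre-check is given below for illustration, assuming the pixel coordinates of the vehicle's detection frame center have already been extracted for the N consecutive first images; the threshold is an assumed parameter.

    import numpy as np

    def is_driving(centers, pixel_threshold=50.0):
        # centers: (N, 2) array of the vehicle's center point pixel in N consecutive first images
        centers = np.asarray(centers, dtype=np.float64)
        displacement = float(np.linalg.norm(centers[-1] - centers[0]))   # overall position change
        return displacement > pixel_threshold
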
Optionally, before extracting 2M reference points of each frame of image in the continuous N frames of images of the first surveillance video in the case that the first surveillance video includes the target engineering vehicle, the method further includes:
And carrying out instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video.
In the video detection method of the embodiment of the application, instance segmentation is adopted to detect each frame of image. Compared with object detection, which locates and identifies a specific object and provides only a bounding box, instance segmentation additionally provides a per-pixel segmentation result for each object. Therefore, by adopting instance segmentation, the video detection method of the embodiment of the application can further improve the accuracy of the video detection result.
Optionally, the performing instance segmentation on each frame of image of the first surveillance video to obtain a tag of each object in the first surveillance video includes:
Dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video for r times according to rows and c times according to columns respectively, A= (r+1) x (c+1), A is a positive integer, and r and c are both non-negative integers;
Respectively calculating the average gray values of the R, G and B channels of each of the A areas of each frame of image of the first monitoring video to obtain H average gray values of each frame of image of the first monitoring video, wherein H=3× (r+1) × (c+1);
Based on the H average gray values of each frame of image of the first monitoring video, obtaining a feature vector of each frame of image of the first monitoring video;
Based on the feature vector of each frame of image of the first monitoring video, clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula;
and labeling the clustered continuous N frames of images of the first monitoring video respectively.
In the video detection method of the embodiment of the present application, referring to fig. 2, in order to avoid large differences between manual labeling results produced in different periods, or even within the same period, during instance segmentation, images with similar scenes need to be clustered. First, each frame of image in the first monitoring video is divided: specifically, each frame of image is divided r times by rows and c times by columns to obtain (r+1) x (c+1) regions. Similarity calculation is then performed on these (r+1) x (c+1) regions: specifically, the average gray values of the R, G and B channels of each of the A regions of each frame of image of the first monitoring video are calculated respectively to obtain H average gray values per frame, and the feature vector of each frame of image of the first monitoring video is then obtained from these H average gray values.
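The feature vector described above can be sketched as follows (an illustration only, assuming each frame is a three-channel image loaded with OpenCV, so the channel order is B, G, R):

    import numpy as np

    def frame_feature_vector(image, r, c):
        # split the frame into (r+1) x (c+1) regions and take the per-channel mean of each region
        feats = []
        for row_block in np.array_split(image, r + 1, axis=0):
            for region in np.array_split(row_block, c + 1, axis=1):
                feats.extend(region.reshape(-1, 3).mean(axis=0))   # three average gray values per region
        return np.asarray(feats)                                    # length H = 3 * (r+1) * (c+1)
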
After the feature vector of each frame of image of the first monitoring video is obtained, the continuous N frames of images of the first monitoring video are clustered using a joint similarity calculation formula, which improves the accuracy of labeling the N frames of images. In addition, for each type of image, the instance segmentation model detection shown in fig. 2 can be called separately for the targets that need to be labeled, pre-labeling and label correction are carried out, and labeling is iterated until the number of iterations is reached or a certain amount of sample data is obtained.
The aforementioned joint similarity is obtained by combining the cosine similarity and the ArcFace similarity: the cosine similarity is given by calculation formula (1), the ArcFace similarity by calculation formula (2), and the joint similarity by calculation formula (3). (Formulas (1) to (3) appear only as images in the original publication and are not reproduced here.)
In formula (3), a quantity derived from the two feature vectors is compared with pi/6 after rounding down; in one case the joint similarity is equivalent to the cosine similarity, and in the other case it is equivalent to the ArcFace similarity.
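Since the exact formulas are not reproduced above, the following sketch should be read only as one plausible reading of such a joint similarity (cosine similarity, an ArcFace-style similarity with an additive angular margin, and a pi/6-based switch); the margin value and the switching rule are assumptions.

    import numpy as np

    def cosine_similarity(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def arcface_similarity(u, v, margin=0.5):
        # ArcFace-style similarity: add an angular margin before taking the cosine
        theta = np.arccos(np.clip(cosine_similarity(u, v), -1.0, 1.0))
        return float(np.cos(theta + margin))

    def joint_similarity(u, v, margin=0.5):
        theta = np.arccos(np.clip(cosine_similarity(u, v), -1.0, 1.0))
        if np.floor(theta / (np.pi / 6)) == 0:       # assumed switch: small angle -> cosine similarity
            return cosine_similarity(u, v)
        return arcface_similarity(u, v, margin)      # otherwise -> ArcFace similarity
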
In the embodiment, the accuracy of image labeling in the image instance segmentation process can be improved by clustering the images.
Optionally, after the determining the working state of the target engineering vehicle based on the position change values of the M feature parts, the method further includes:
Judging whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road;
and outputting first warning information under the condition that the center point of the detection frame of the target engineering vehicle is located outside the detection frame of the road in each frame of image of the first monitoring video.
In the video detection method provided by the embodiment of the application, it can further be detected whether the target engineering vehicle is illegally parked. Specifically, it is judged whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road; if the center point of the detection frame of the target engineering vehicle in an image is located outside the detection frame of the road, first warning information is output as a prompt.
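A minimal sketch of this check is shown below, assuming the road is also detected with a bounding box in each frame; the (x1, y1, x2, y2) box format is an assumption.

    def center_inside_road(vehicle_box, road_box):
        # True if the vehicle's detection frame center lies inside the road's detection frame
        vx = (vehicle_box[0] + vehicle_box[2]) / 2.0
        vy = (vehicle_box[1] + vehicle_box[3]) / 2.0
        x1, y1, x2, y2 = road_box
        return x1 <= vx <= x2 and y1 <= vy <= y2

    # first warning information would be output for frames where this returns False
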
In such an embodiment, the detection of the illegal parking behavior of the engineering vehicle is advantageous for improving road traffic safety.
Optionally, after the determining the working state of the target engineering vehicle based on the position change values of the M feature parts, the method further includes:
judging whether a target Euclidean distance is smaller than a preset value or not under the condition that the target engineering vehicle is in an operation state and a person exists in the first monitoring video, wherein the target Euclidean distance is the furthest Euclidean distance between a central point pixel of the target engineering vehicle and a central point pixel of the person in each frame of image of the first monitoring video;
and outputting second alarm information under the condition that the target Euclidean distance is smaller than a preset value.
In the video detection method provided by the embodiment of the application, when the target engineering vehicle is detected to be in the working state and the existence of the personnel is detected in the first monitoring video, whether the distance between the target engineering vehicle and the personnel is a safe distance needs to be further judged.
Specifically, the Euclidean distance between the center point pixel of the target engineering vehicle and the center point pixel of the person is first calculated for each frame of image of the first monitoring video; the farthest Euclidean distance among the plurality of Euclidean distances corresponding to the plurality of frames of images is then selected and compared with the preset value, and if this farthest Euclidean distance is smaller than the preset value, second warning information is output as a prompt.
The video detection method of the embodiment of the application further judges whether the distance between the target engineering vehicle and the personnel is a safe distance or not so as to determine whether to output an alarm for prompting, thereby being beneficial to improving the safety of the target engineering vehicle in actual operation.
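For illustration, this safety check could be sketched as follows, assuming the per-frame center point pixels of the vehicle and of the person are available; the preset value is an assumed parameter.

    import numpy as np

    def needs_second_warning(vehicle_centers, person_centers, preset_value=200.0):
        # vehicle_centers, person_centers: (N, 2) arrays of per-frame center point pixels
        v = np.asarray(vehicle_centers, dtype=np.float64)
        p = np.asarray(person_centers, dtype=np.float64)
        distances = np.linalg.norm(v - p, axis=1)           # per-frame Euclidean distances
        return float(distances.max()) < preset_value        # farthest distance below the preset value
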
In the video detection method according to the embodiment of the present application, referring to fig. 3 and fig. 4, fig. 3 is a schematic diagram of a video detection system provided by the embodiment of the present application, and the video detection method mentioned in the foregoing embodiment is executed by the video detection system, specifically, a monitoring video of an accident-prone area is collected by a video monitoring device, and by identifying the behavior of an engineering vehicle in the monitoring video, whether there is an illegal behavior is determined, and if there is an illegal behavior, alarm information is sent to prompt to take measures. Fig. 4 is a schematic diagram of a video detection method according to an embodiment of the present application, and the explanation of the specific flow may be referred to the explanation of the foregoing embodiment, which is not repeated here.
The embodiment of the present application further provides a video detection apparatus, referring to fig. 5, the video detection apparatus 500 includes:
A first obtaining module 501, configured to obtain a first monitoring video of a target monitoring position;
A first extracting module 502, configured to extract 2M reference points of each frame of image in N continuous frames of images of the first surveillance video when the first surveillance video includes a target engineering vehicle, where the 2M reference points include a detection frame center point and a mask center point of each of M feature parts of the target engineering vehicle, and N and M are positive integers;
A first matching module 503, configured to match 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video to obtain 2K groups of key points of two adjacent frames of images in the continuous N frames of images of the first monitoring video, where each group of key points includes a first reference point and a second reference point matched with the first reference point, the first reference point and the second reference point are respectively corresponding reference points in two adjacent frames of images, and K is a positive integer;
a first calculation module 504, configured to calculate a base matrix based on 2K sets of key points that are successfully matched between two adjacent frames of images in the continuous N frames of images of the first monitoring image;
A first decomposition module 505, configured to perform singular value decomposition SVD on each of a plurality of base matrices obtained based on continuous N frames of images of the first surveillance video, to obtain a rotation matrix and a translation matrix of each of the plurality of base matrices;
A first determining module 506, configured to determine a position change value of M feature parts of the target engineering vehicle based on a rotation matrix and a translation matrix of each of the plurality of base matrices;
A first judging module 507, configured to judge a working state of the target engineering vehicle based on the position change values of the M feature parts.
Optionally, the first extraction module includes:
The first judging unit is used for judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video under the condition that the first monitoring video comprises the target engineering vehicle;
the first extraction unit is used for extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video under the condition that the target engineering vehicle is in a non-driving state.
Optionally, the apparatus further comprises:
a first segmentation module, configured to perform instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video.
Optionally, the first segmentation module includes:
The first dividing unit is used for dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video for r times according to rows and c times according to columns respectively, A= (r+1) x (c+1), A is a positive integer, and r and c are both non-negative integers;
A first calculation unit, configured to calculate average gray values of R, G and B channels of each of a regions of each frame of image of the first monitor image, respectively, to obtain H average gray values of each frame of image of the first monitor image, where h=3× (r+1) × (c+1);
The first determining unit is used for obtaining the feature vector of each frame of image of the first monitoring image based on the H average gray values of each frame of image of the first monitoring image;
The first clustering unit is used for clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula based on the feature vector of each frame of image of the first monitoring image;
and the first labeling unit is used for labeling the clustered continuous N frames of images of the first monitoring video respectively.
Optionally, the apparatus further comprises:
The second judging module is used for judging whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road;
the first output module is used for outputting first warning information when the center point of the detection frame of the target engineering vehicle is located outside the detection frame of the road in each frame of image of the first monitoring video.
Optionally, the apparatus further comprises:
The third judging module is used for judging whether the target Euclidean distance is smaller than a preset value or not under the condition that the target engineering vehicle is in an operation state and personnel exist in the first monitoring video, wherein the target Euclidean distance is the furthest Euclidean distance between a central point pixel of the target engineering vehicle and a central point pixel of the personnel in each frame of image of the first monitoring video;
And the second output module is used for outputting second alarm information under the condition that the target Euclidean distance is smaller than a preset value.
Referring to fig. 6, fig. 6 is a block diagram of an electronic device according to still another embodiment of the present application, and as shown in fig. 6, the electronic device includes: processor 601, communication interface 602, communication bus 604 and memory 603, wherein processor 601, communication interface 602 and memory 603 accomplish mutual interaction through communication bus 604.
Wherein the memory 603 is used for storing a computer program; a processor 601, configured to obtain a first monitoring video of a target monitoring position; under the condition that the first monitoring video comprises a target engineering vehicle, 2M reference points of each frame of image in continuous N frames of images of the first monitoring video are extracted, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle, and N and M are positive integers; 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video are matched to obtain 2K groups of key points successfully matched with the two adjacent frames of images in the continuous N frames of images of the first monitoring video, each group of key points comprises a first reference point and a second reference point matched with the first reference point, and the first reference point and the second reference point are corresponding reference points in the two adjacent frames of images respectively; calculating a basic matrix based on 2K groups of key points successfully matched with two adjacent frames of images in continuous N frames of images of the first monitoring image, wherein K is a positive integer; performing Singular Value Decomposition (SVD) on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices; determining position change values of M characteristic parts of the target engineering vehicle based on a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices; and judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts.
Optionally, the processor 601 is specifically configured to:
Under the condition that the first monitoring video comprises a target engineering vehicle, judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video;
And under the condition that the target engineering vehicle is in a non-driving state, extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video.
Optionally, before extracting 2M reference points of each frame of image in the continuous N frames of images of the first surveillance video in the case that the first surveillance video includes the target engineering vehicle, the method further includes:
And carrying out instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video.
Optionally, the processor 601 is specifically configured to:
Dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video for r times according to rows and c times according to columns respectively, A= (r+1) x (c+1), A is a positive integer, and r and c are both non-negative integers;
Respectively calculating the average gray values of R, G and B channels of each of the A areas of each frame of image of the first monitoring image to obtain H average gray values of each frame of image of the first monitoring image, wherein H=3× (r+1) × (c+1);
Based on the H average gray values of each frame of image of the first monitoring image, obtaining a feature vector of each frame of image of the first monitoring image;
Based on the feature vector of each frame of image of the first monitoring image, clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula;
and labeling the clustered continuous N frames of images of the first monitoring video respectively.
Optionally, the processor 601 is further configured to:
Judging whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road;
and outputting first warning information under the condition that the center point of the detection frame of the target engineering vehicle is located outside the detection frame of the road in each frame of image of the first monitoring video.
Optionally, the processor 601 is further configured to:
judging whether a target Euclidean distance is smaller than a preset value or not under the condition that the target engineering vehicle is in an operation state and a person exists in the first monitoring video, wherein the target Euclidean distance is the furthest Euclidean distance between a central point pixel of the target engineering vehicle and a central point pixel of the person in each frame of image of the first monitoring video;
and outputting second alarm information under the condition that the target Euclidean distance is smaller than a preset value.
The communication bus 604 mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus 604 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 602 is used for communication between the above-described terminal and other devices.
The memory 603 may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. Optionally, the memory 603 may also be at least one storage device located remotely from the processor 601. The processor 601 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the processes of the video detection method embodiments described above and can achieve the same technical effects; to avoid repetition, details are not described here again. The computer readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.

Claims (8)

1. A method of video detection, the method comprising:
Acquiring a first monitoring video of a target monitoring position;
Under the condition that the first monitoring video comprises a target engineering vehicle, 2M reference points of each frame of image in continuous N frames of images of the first monitoring video are extracted, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle, and N and M are positive integers;
Matching 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video to obtain 2K groups of key points successfully matched between the two adjacent frames of images in the continuous N frames of images of the first monitoring video, wherein each group of key points comprises a first reference point and a second reference point matched with the first reference point, the first reference point and the second reference point are respectively corresponding reference points in the two adjacent frames of images, and K is a positive integer;
Calculating a basic matrix based on the 2K groups of key points successfully matched between the two adjacent frames of images in the continuous N frames of images of the first monitoring video;
Performing Singular Value Decomposition (SVD) on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices;
Determining position change values of M characteristic parts of the target engineering vehicle based on a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices;
Judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts;
And before extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video under the condition that the first monitoring video comprises the target engineering vehicle, the method further comprises the following steps:
Performing instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video;
the performing instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video includes:
Dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video r times by rows and c times by columns respectively, A = (r+1) × (c+1), A is a positive integer, and r and c are both non-negative integers;
Respectively calculating the average gray values of the R, G and B channels of each of the A areas of each frame of image of the first monitoring video to obtain H average gray values of each frame of image of the first monitoring video, wherein H = 3 × (r+1) × (c+1);
Based on the H average gray values of each frame of image of the first monitoring video, obtaining a feature vector of each frame of image of the first monitoring video;
Based on the feature vector of each frame of image of the first monitoring video, clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula;
and labeling the clustered continuous N frames of images of the first monitoring video respectively.
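For illustration only, the core geometric step of claim 1 — matching reference points across two adjacent frames, estimating the basic (fundamental) matrix, and recovering a rotation matrix and a translation matrix by singular value decomposition — could look roughly like the following minimal Python sketch. It is not the patented implementation: the use of OpenCV's findFundamentalMat, the camera intrinsic matrix K, and all function names are assumptions rather than features recited in the claims.

import numpy as np
import cv2

def pose_change_between_frames(pts_prev, pts_next, K):
    # pts_prev, pts_next: (2K, 2) arrays of matched reference points
    # (detection-frame centers and mask centers of the M characteristic
    # parts) in two adjacent frames; K is an assumed 3x3 camera intrinsic
    # matrix (the claim itself only mentions the basic matrix).
    pts_prev = np.asarray(pts_prev, dtype=np.float32)
    pts_next = np.asarray(pts_next, dtype=np.float32)

    # Estimate the basic (fundamental) matrix from the matched key points.
    F, _ = cv2.findFundamentalMat(pts_prev, pts_next, cv2.FM_RANSAC)
    if F is None:
        raise ValueError("not enough matched key points")

    # With intrinsics, the essential matrix E = K^T F K can be decomposed
    # by SVD into a rotation and a translation (up to scale and the usual
    # four-fold ambiguity; sign and cheirality checks are omitted here).
    E = K.T @ F @ K
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0,  0.0, 0.0],
                  [0.0,  0.0, 1.0]])
    R = U @ W @ Vt              # one candidate rotation matrix
    t = U[:, 2].reshape(3, 1)   # translation direction, up to scale
    return R, t

def position_change_value(R, t):
    # One possible scalar summary of how much the characteristic parts
    # moved between the two frames: rotation angle plus translation norm.
    angle = np.degrees(np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)))
    return angle + float(np.linalg.norm(t))

A working-state decision in the spirit of claim 1 would then aggregate position_change_value over the N-1 adjacent-frame pairs and compare the result with a threshold.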
2. The method according to claim 1, wherein the extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video in the case that the first monitoring video includes the target engineering vehicle comprises:
Under the condition that the first monitoring video comprises a target engineering vehicle, judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video;
And under the condition that the target engineering vehicle is in a non-driving state, extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video.
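As a minimal sketch of the driving-state check in claim 2: the claim only speaks of the position change of the vehicle's center-point pixel over the N consecutive frames, so the pixel threshold and the use of the mean inter-frame displacement below are assumptions.

import numpy as np

def is_driving(center_points, pixel_threshold=5.0):
    # center_points: (x, y) detection-frame center pixel of the target
    # engineering vehicle in each of the N consecutive frames.
    centers = np.asarray(center_points, dtype=float)
    if len(centers) < 2:
        return False
    steps = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    # Treat the vehicle as driving if it moves, on average, more than the
    # assumed threshold between adjacent frames.
    return float(steps.mean()) > pixel_threshold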
3. The video detection method according to claim 1, wherein, after the judging of the working state of the target engineering vehicle based on the position change values of the M characteristic parts, the method further comprises:
Judging whether the center point of the detection frame of the target engineering vehicle in each frame of image of the first monitoring video is located within the detection frame of the road;
and outputting first warning information in the case that the center point of the detection frame of the target engineering vehicle is located outside the detection frame of the road in each frame of image of the first monitoring video.
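A sketch of the check in claim 3, under the assumption that both the vehicle and the road are described by axis-aligned detection frames given as (x1, y1, x2, y2) pixel rectangles (this representation and the function names are assumptions, not recited in the claim):

def detection_frame_center(box):
    # box: (x1, y1, x2, y2) in pixels.
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def first_warning_needed(vehicle_box, road_box):
    # True if the vehicle's detection-frame center falls outside the
    # road's detection frame in this frame (the alarm condition of claim 3).
    cx, cy = detection_frame_center(vehicle_box)
    x1, y1, x2, y2 = road_box
    return not (x1 <= cx <= x2 and y1 <= cy <= y2)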
4. The video detection method according to claim 1, wherein, after the judging of the working state of the target engineering vehicle based on the position change values of the M characteristic parts, the method further comprises:
judging whether a target Euclidean distance is smaller than a preset value or not under the condition that the target engineering vehicle is in an operation state and a person exists in the first monitoring video, wherein the target Euclidean distance is the furthest Euclidean distance between a central point pixel of the target engineering vehicle and a central point pixel of the person in each frame of image of the first monitoring video;
and outputting second alarm information under the condition that the target Euclidean distance is smaller than a preset value.
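A sketch of the proximity alarm in claim 4; following the claim wording, the target Euclidean distance is taken as the largest observed vehicle-to-person center distance, while the pixel unit, the per-frame pairing, and the function names are assumptions.

import math

def second_alarm_needed(vehicle_centers, person_centers, preset_value):
    # vehicle_centers, person_centers: per-frame (x, y) center pixels of
    # the target engineering vehicle and the person in the first
    # monitoring video.
    distances = [math.dist(v, p) for v, p in zip(vehicle_centers, person_centers)]
    if not distances:
        return False
    # Alarm when even the furthest observed distance is below the preset value.
    return max(distances) < preset_value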
5. A video detection apparatus, the apparatus comprising:
the first acquisition module is used for acquiring a first monitoring video of the target monitoring position;
the first extraction module is used for extracting 2M reference points of each frame of image in continuous N frames of images of the first monitoring video under the condition that the first monitoring video comprises a target engineering vehicle, wherein the 2M reference points comprise a detection frame center point and a mask center point of each of M characteristic parts of the target engineering vehicle, and N and M are positive integers;
The first matching module is used for matching 2M reference points of two adjacent frames of images in the continuous N frames of images of the first monitoring video to obtain 2K groups of key points successfully matched between the two adjacent frames of images in the continuous N frames of images of the first monitoring video, wherein each group of key points comprises a first reference point and a second reference point matched with the first reference point, the first reference point and the second reference point are respectively corresponding reference points in the two adjacent frames of images, and K is a positive integer;
the first calculation module is used for calculating a basic matrix based on the 2K groups of key points successfully matched between the two adjacent frames of images in the continuous N frames of images of the first monitoring video;
The first decomposition module is used for carrying out Singular Value Decomposition (SVD) on each basic matrix in a plurality of basic matrices obtained based on continuous N frames of images of the first monitoring video to obtain a rotation matrix and a translation matrix of each basic matrix in the plurality of basic matrices;
The first determining module is used for determining the position change values of M characteristic parts of the target engineering vehicle based on the rotation matrix and the translation matrix of each basic matrix in the plurality of basic matrices;
the first judging module is used for judging the working state of the target engineering vehicle based on the position change values of the M characteristic parts;
the apparatus further comprises:
the first segmentation module is used for performing instance segmentation on each frame of image of the first monitoring video to obtain labels of all objects in the first monitoring video;
The first segmentation module includes:
The first dividing unit is used for dividing each frame of image of the first monitoring video into A areas, wherein the A areas are areas obtained by dividing each frame of image of the first monitoring video r times by rows and c times by columns respectively, A = (r+1) × (c+1), A is a positive integer, and r and c are both non-negative integers;
The first calculation unit is used for respectively calculating the average gray values of the R, G and B channels of each of the A areas of each frame of image of the first monitoring video to obtain H average gray values of each frame of image of the first monitoring video, wherein H = 3 × (r+1) × (c+1);
The first determining unit is used for obtaining the feature vector of each frame of image of the first monitoring video based on the H average gray values of each frame of image of the first monitoring video;
The first clustering unit is used for clustering the continuous N frames of images of the first monitoring video by utilizing a joint similarity calculation formula based on the feature vector of each frame of image of the first monitoring video;
and the first labeling unit is used for labeling the clustered continuous N frames of images of the first monitoring video respectively.
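The region-division, feature-vector and clustering units recited in claims 1 and 5 can be sketched as below: each frame is split into A = (r+1) × (c+1) areas, the mean of each area's R, G and B channels forms an H = 3 × (r+1) × (c+1)-dimensional feature vector, and consecutive frames are clustered with a joint similarity measure. The claims do not disclose the joint similarity calculation formula, so the blend of cosine similarity and a normalized distance term used here, as well as the greedy clustering and the threshold, are assumptions.

import numpy as np

def frame_feature_vector(frame_rgb, r, c):
    # frame_rgb: height x width x 3 image. Split it by r row cuts and c
    # column cuts into (r+1)*(c+1) areas and take each area's mean R, G
    # and B values, giving a 3*(r+1)*(c+1)-dimensional feature vector.
    features = []
    for row_block in np.array_split(frame_rgb, r + 1, axis=0):
        for area in np.array_split(row_block, c + 1, axis=1):
            features.extend(area.reshape(-1, 3).mean(axis=0))
    return np.asarray(features, dtype=float)

def joint_similarity(f1, f2):
    # Assumed joint similarity: cosine similarity blended with a
    # normalized Euclidean-distance term.
    cos = float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-9))
    dist = float(np.linalg.norm(f1 - f2)) / (np.linalg.norm(f1) + np.linalg.norm(f2) + 1e-9)
    return 0.5 * cos + 0.5 * (1.0 - dist)

def cluster_frames(feature_vectors, threshold=0.9):
    # Greedy clustering of consecutive frames: a frame joins the current
    # cluster while its joint similarity to the cluster's first frame stays
    # above the threshold; the returned list holds one label per frame.
    labels, current_label, anchor = [], -1, None
    for f in feature_vectors:
        if anchor is None or joint_similarity(anchor, f) < threshold:
            current_label += 1
            anchor = f
        labels.append(current_label)
    return labels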
6. The video detection device of claim 5, wherein the first extraction module comprises:
The first judging unit is used for judging whether the target engineering vehicle is in a running state or not by calculating the position change of a central point pixel of the target engineering vehicle in N continuous frames of first images in the first monitoring video under the condition that the first monitoring video comprises the target engineering vehicle;
the first extraction unit is used for extracting 2M reference points of each frame of image in the continuous N frames of images of the first monitoring video under the condition that the target engineering vehicle is in a non-driving state.
7. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the video detection method according to any one of claims 1 to 4.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the video detection method according to any of claims 1 to 4.
CN202311352534.6A 2023-10-18 2023-10-18 Video detection method and device Active CN117372924B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311352534.6A CN117372924B (en) 2023-10-18 2023-10-18 Video detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311352534.6A CN117372924B (en) 2023-10-18 2023-10-18 Video detection method and device

Publications (2)

Publication Number Publication Date
CN117372924A (en) 2024-01-09
CN117372924B (en) 2024-05-07

Family

ID=89405459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311352534.6A Active CN117372924B (en) 2023-10-18 2023-10-18 Video detection method and device

Country Status (1)

Country Link
CN (1) CN117372924B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20240142611A9 (en) * 2018-10-26 2024-05-02 Fleetmind Seon Solutions Inc. Traffic monitoring and predictive analysis system for use with vehicle stop indicator systems using reaction time processing

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107833249A (en) * 2017-09-29 2018-03-23 南京航空航天大学 A vision-guidance-based method for predicting the attitude of a carrier-based aircraft during landing
WO2020107930A1 (en) * 2018-11-29 2020-06-04 南京人工智能高等研究院有限公司 Camera pose determination method and apparatus, and electronic device
WO2022042776A1 (en) * 2020-08-27 2022-03-03 荣耀终端有限公司 Photographing method and terminal
CN113282088A (en) * 2021-05-21 2021-08-20 潍柴动力股份有限公司 Unmanned driving method, device and equipment of engineering vehicle, storage medium and engineering vehicle
CN114332153A (en) * 2021-12-28 2022-04-12 京东方科技集团股份有限公司 Vehicle speed detection and collision early warning method and electronic equipment
CN114519842A (en) * 2022-02-11 2022-05-20 超级视线科技有限公司 Vehicle matching relation judgment method and device based on high-order video monitoring
CN114898326A (en) * 2022-03-11 2022-08-12 武汉理工大学 Method, system and equipment for detecting reverse running of one-way vehicle based on deep learning
CN116052117A (en) * 2023-01-05 2023-05-02 智道网联科技(北京)有限公司 Pose-based traffic element matching method, equipment and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pose estimation of mobile robots based on planar motion constraints; Xing Kexin; Chen Buhua; Zhang Xuebo; Yu Li; Journal of Zhejiang University of Technology; 2018-04-09 (02); full text *
Initial registration of point clouds using camera pose estimation; Guo Qingda; Quan Yanming; Jiang Changcheng; Chen Jianwu; Optics and Precision Engineering; 2017-06-15 (06); full text *

Also Published As

Publication number Publication date
CN117372924A (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN110472580B (en) Method, device and storage medium for detecting parking stall based on panoramic image
CN110650316A (en) Intelligent patrol and early warning processing method and device, electronic equipment and storage medium
CN110298300B (en) Method for detecting vehicle illegal line pressing
Bedruz et al. Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach
CN112085952A (en) Vehicle data monitoring method and device, computer equipment and storage medium
CN110610137B (en) Method and device for detecting vehicle running state, electronic equipment and storage medium
CN111814776B (en) Image processing method, device, server and storage medium
CN112434566A (en) Passenger flow statistical method and device, electronic equipment and storage medium
CN112541372B (en) Difficult sample screening method and device
CN113297939B (en) Obstacle detection method, obstacle detection system, terminal device and storage medium
CN111507126B (en) Alarm method and device of driving assistance system and electronic equipment
CN117372924B (en) Video detection method and device
CN111402185B (en) Image detection method and device
Amin et al. An automatic number plate recognition of Bangladeshi vehicles
CN115880632A (en) Timeout stay detection method, monitoring device, computer-readable storage medium, and chip
CN116259021A (en) Lane line detection method, storage medium and electronic equipment
CN115131726A (en) Parking space detection method, device, equipment and medium
CN114283361A (en) Method and apparatus for determining status information, storage medium, and electronic apparatus
CN110781710B (en) Target object clustering method and device
CN113591543A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN114627651B (en) Pedestrian protection early warning method and device, electronic equipment and readable storage medium
CN116503695B (en) Training method of target detection model, target detection method and device
CN112183413B (en) Parking space detection method and device, storage medium and vehicle
CN112418218B (en) Target area detection method, device, equipment and storage medium
CN115442668B (en) Target state identification method, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant