CN112200101A - Video monitoring and analyzing method for maritime business based on artificial intelligence - Google Patents

Video monitoring and analyzing method for maritime business based on artificial intelligence Download PDF

Info

Publication number
CN112200101A
CN112200101A CN202011102923.XA
Authority
CN
China
Prior art keywords
frame
identification
tracking
target
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011102923.XA
Other languages
Chinese (zh)
Other versions
CN112200101B (en)
Inventor
赵睿
杜红飞
万为东
李超
王华东
赵志明
许宁
路轩轩
徐顺
张曼霞
王鹏
崔敬涛
顾鹏飞
郎亚辉
王文才
柳小涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Zhonggong Design and Research Institute Group Co.,Ltd.
Original Assignee
Henan Provincial Communication Planning and Design Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Provincial Communication Planning and Design Institute Co Ltd filed Critical Henan Provincial Communication Planning and Design Institute Co Ltd
Priority to CN202011102923.XA priority Critical patent/CN112200101B/en
Publication of CN112200101A publication Critical patent/CN112200101A/en
Application granted granted Critical
Publication of CN112200101B publication Critical patent/CN112200101B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an artificial-intelligence-based video monitoring and analysis method for maritime business, which comprises the following steps: 1, identifying the target objects in the identification area of each frame of a video data source with an identification algorithm; 2, distinguishing and marking the target objects of consecutive frames of the video data source in a buffer area with a marking algorithm, so that the same target object is marked only once and the uniqueness of each identified object is guaranteed; 3, intercepting the identification area from each frame of the video, tracking the identified objects within the buffer area with a tracking algorithm and tracking the trajectory of each target object until it leaves the buffer area, so that targets are not confused when they overlap, occlude each other and separate again during tracking, thereby obtaining the identification results; and 4, recording the position where each tracked target object leaves the identification area and using that position for behavior analysis and statistics of the tracked target object. The invention makes full use of the existing video monitoring equipment on inland waterways, thereby greatly saving equipment replacement costs.

Description

Video monitoring and analyzing method for maritime business based on artificial intelligence
Technical Field
The invention relates to the field of inland waterway monitoring management, in particular to a video monitoring and analyzing method for maritime business based on artificial intelligence.
Background
In recent years, with the increasing numbers of touring ships, freight ships, ferries and river-resource development and operation ships on inland waterways, hundreds of water traffic safety accidents occur every year, causing hundreds of casualties and immeasurable property losses and bringing great challenges to the supervision work of maritime departments.
In order to strengthen navigation control of inland navigable sections, maritime departments currently rely mainly on the Automatic Identification System for ships (AIS) and the inland-river very-high-frequency shore-ship data communication system (VHF) at key docks, supplemented by channel video means such as VTS radar and closed-circuit television monitoring systems (CCTV), to carry out the safety supervision of various ships. The ship Automatic Identification System (AIS), working together with the Global Positioning System (GPS), broadcasts dynamic ship information such as position, speed, rate of turn and course, combined with static ship information such as ship name, call sign, draft and dangerous cargo, to nearby ships and shore stations over a very-high-frequency (VHF) channel, so that nearby ships and shore stations can keep track of the dynamic and static information of all ships on the surrounding water surface and immediately coordinate with each other to take necessary avoidance actions, which greatly helps ship safety.
The inland-river very-high-frequency shore-ship data communication system works in the very-high-frequency (VHF) band and is one of the main communication means of inland-river and offshore radio mobile services; it can carry ship distress, urgency and safety communications as well as daily service communications, and is also an important communication tool for search and rescue operations, coordination and avoidance between ships, and vessel traffic service systems. However, AIS and VTS radar have also exposed many drawbacks in the practical application of electronic cruising and cannot meet the business requirements of intelligent maritime supervision. For example, AIS has signal blind areas, many ships do not switch on AIS or are not equipped with it for various reasons, and the information fusion of AIS and VTS radar is imperfect. The main problem is that AIS lacks the ability to visually observe and control on-site conditions. At the same time, as the business requirements of maritime departments keep rising, early standard-definition video monitoring systems show defects in application: for example, when a ship is speeding, overloaded, turning around at will or overtaking, the original standard-definition cameras cannot provide effective image details, and in particular the ship name cannot be seen clearly, which brings great inconvenience to maritime supervision and law-enforcement personnel.
Disclosure of Invention
The invention aims to provide an artificial-intelligence-based video monitoring and analysis method for maritime business, which realizes ship tracking and monitoring mainly by video means and effectively makes up for the shortcomings of existing ship positioning equipment.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention relates to a maritime business-oriented video monitoring and analyzing method based on artificial intelligence, which comprises the following steps:
step 1, identifying the target objects in the identification area of each frame of a video data source with an identification algorithm, so that complete identification of the target objects can be carried out even in an irregular identification area; namely: reading each frame of the video data source in sequence and setting an identification area on each frame;
step 2, distinguishing and marking the target objects of the preceding and following frames of the video data source in a buffer area through a marking algorithm, completing the non-repeated marking of the same target object, distinguishing new and old target objects and ensuring the uniqueness of the identified objects; namely: deriving a buffer area from the identification area set on each frame by a set reduction ratio, the buffer area coinciding with the center point of the identification area;
step 3, tracking the identified objects in the inner region of the buffer area with a tracking algorithm and tracking the trajectory of each target object until it leaves the buffer area, ensuring that targets are not confused when they overlap, occlude each other and separate again during tracking; namely: intercepting the identification area from each frame of the video and processing it with the identification algorithm to obtain the identification results, specifically: intercepting the recognition area from the video picture with OpenCV, and recognizing the objects in the video frames (sampled at intervals) fed into the model with a trained YOLO recognition model, obtaining a list of recognized objects;
YOLO (You Only Look Once) is an existing object recognition and localization algorithm based on a deep neural network; its biggest characteristic is its high running speed, which allows it to be used in real-time systems. The innovation of YOLO is that it reforms the region-proposal detection framework: the two stages of candidate-region generation and object recognition are merged into one, and predefined candidate regions are used. The whole image is divided into parts, each part being responsible for detecting targets centered on it, and the candidate boxes, localization confidences and class probability vectors of all targets contained in each part are predicted in one pass. With the region-proposal step removed, the structure of YOLO consists of convolution layers, pooling layers and two fully connected layers at the end; the biggest difference is that the final output layer uses a linear function as the activation function to predict the candidate-box positions and object probabilities. The YOLO target detection steps are as follows:
step 3.1, read the P-th frame image, call the Resize function (a function for adjusting picture size) to resize the image, and divide the image into S×S grid cells;
step 3.2, performing feature extraction on the image by using a convolutional neural network;
step 3.3, predicting the position and the class of the target: if the center of a target object falls in a grid cell, that cell is responsible for predicting the target object; each grid cell predicts B candidate boxes, B confidence values and C class probabilities, giving an output tensor of size S×S×(B×5+C), where S×S is the number of grid cells into which the image is divided, B is the number of boxes each cell is responsible for, and C is the number of classes; each grid cell corresponds to B bounding boxes whose width and height may range over the whole image, representing candidate positions, centered on that cell, at which an object may be found; each bounding box corresponds to a score indicating whether an object exists at that position and how accurate the localization is:

$$Score = Pr(Object)\times IOU_{pred}^{truth}$$

each grid cell also corresponds to C class probability values; the class Class_i with the maximum probability is found, and the object (or a part of it) is considered to be contained in that cell; each grid cell therefore corresponds to a (B×5+C)-dimensional vector containing the following information:
1. Pr(Class_i|Object): the probability of each object class, which can be expressed as Pr(Class_i|Object), i.e. the probability that the object present in the grid cell is Class_i;
2. (Center_x, Center_y, width, height): the position information of each candidate box, including the center-point x coordinate, y coordinate, candidate-box width w and candidate-box height h; B candidate boxes together require 4×B numerical values to express their positions;
3. Confidence: the confidence of each candidate box, given by the formula

$$Confidence = Pr(Object)\times IOU_{pred}^{truth}$$

where Confidence is the confidence of the target object, Pr(Object) is the probability that an object exists within the candidate box (as distinguished from Pr(Class_i|Object)), and IOU_{pred}^{truth} reflects how close the predicted candidate box is to the real target box;
4. traverse the scores, exclude objects with low scores and high overlap, and output the predicted objects;
and step 4, recording the position where the tracked target object leaves the identification area and using that position for behavior analysis and statistics of the tracked target object.
The video monitoring equipment in the inland river navigation water area comprises shore-based video monitoring equipment mainly used for ports and docks and video monitoring equipment in cabins, and realizes automatic video monitoring and statistical analysis for maritime business by relying on technologies such as big data, cloud computing, artificial intelligence, machine learning and the like. The specific application fields comprise ship identification, ship tracking, ship running track monitoring, port ship entry and exit statistics, ship personnel behavior analysis and the like.
On the one hand, the invention solves the problem that existing video monitoring equipment only supports remote viewing and cannot complete the related maritime supervision services automatically, while traditional manual means are time-consuming and labor-intensive and cannot handle sudden emergencies in time; on the other hand, it solves the problem of software being bound to hardware: with software and hardware separated, the video monitoring equipment of the existing inland waterway can be fully utilized, greatly saving equipment replacement costs.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a flow chart of the YOLO algorithm flow described in this invention.
FIG. 3 is a flow chart of the KCF filtering algorithm of the present invention.
Fig. 4 is a schematic diagram of setting an identification area on each frame according to the present invention.
Fig. 5 is a schematic diagram of the present invention for setting a buffer in the identification area.
Fig. 6 is a schematic diagram of determining new and old objects in step 3.5 according to the embodiment of the present invention.
FIG. 7 is a diagram illustrating the object crossing the buffer at step 3.6 according to the embodiment of the present invention.
FIG. 8 is a diagram of the trigger buffer for tracking the position of the object in step 3.7 according to the embodiment of the present invention.
FIG. 9 is a diagram illustrating the behavior of determining the tracking object in step 4 according to the embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail with reference to the drawings, which are implemented on the premise of the technical solution of the present invention, and detailed embodiments and specific operation procedures are provided, but the scope of the present invention is not limited to the following embodiments.
The invention relates to a video monitoring and analyzing method for marine business based on artificial intelligence, which realizes the automatic video monitoring and statistical analysis for the marine business by relying on technologies such as big data, cloud computing, artificial intelligence, machine learning and the like. As shown in fig. 1, the steps are as follows:
step 1, identifying the target objects in the identification area of each frame of a video data source with an identification algorithm, so that complete identification of the target objects can be carried out even in an irregular identification area; namely: reading each frame of the video data source in sequence and setting an identification area 1 on each frame, as shown in fig. 4;
step 2, distinguishing and marking the target objects of the preceding and following frames of the video data source in the buffer area 2 through a marking algorithm, completing the non-repeated marking of the same target object, distinguishing new and old target objects and ensuring the uniqueness of the identified objects; namely: a buffer area 2 is derived from the identification area 1 set on each frame by a set reduction ratio (for example, 95%), as shown by the shaded area in fig. 5, and the buffer area 2 coincides with the center point of the identification area 1; the buffer area 2 is set in order to: 1. eliminate the influence of moving objects while the identification area 1 is constantly changing; 2. distinguish new and old objects so that objects can be marked or updated;
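As an illustration of how the buffer area 2 could be derived from the identification area 1, the following is a minimal sketch, not the patented implementation itself; it assumes regions are stored as (x, y, w, h) tuples, and the function name and the 0.95 ratio are illustrative only (matching the example ratio above).

```python
def make_buffer(identification_area, ratio=0.95):
    """Shrink an identification area (x, y, w, h) toward its center.

    The buffer shares the identification area's center point and has its
    width and height scaled by `ratio` (e.g. 95%), as described in step 2.
    """
    x, y, w, h = identification_area
    cx, cy = x + w / 2.0, y + h / 2.0    # center point of the identification area
    bw, bh = w * ratio, h * ratio        # reduced width and height
    return (int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))

# Example: a 1000x600 identification area gives a 950x570 buffer with the same center.
buffer_area = make_buffer((100, 50, 1000, 600))
```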
step 3, intercepting the identification area 1 from each frame of the video and processing it with the identification algorithm to obtain the identification results, namely: intercepting the identification area 1 from the video picture with OpenCV, recognizing the objects in the video frames (sampled at intervals) fed into the model with the trained YOLO recognition model, and obtaining a list of recognized objects; the specific steps are as follows:
step 3.1, as shown in fig. 2, the steps of YOLO target detection are as follows:
step 3.1.1, read the P-th frame image, call the Resize function to resize the image, and divide the image into S×S grid cells;
step 3.1.2, performing feature extraction on the image by using a convolutional neural network;
step 3.1.3, predicting the position and the class of the target: if the center of a target object falls in a grid cell, that cell is responsible for predicting the target object; each grid cell predicts B candidate boxes, B confidence values and C class probabilities, giving an output tensor of size S×S×(B×5+C), where S×S is the number of grid cells into which the image is divided, B is the number of boxes each cell is responsible for, and C is the number of classes; each grid cell corresponds to B bounding boxes whose width and height may range over the whole image, representing candidate positions, centered on that cell, at which an object may be found; each bounding box corresponds to a score indicating whether an object exists at that position and how accurate the localization is:

$$Score = Pr(Object)\times IOU_{pred}^{truth}$$

each grid cell also corresponds to C class probability values; the class Class_i with the maximum probability is found, and the object (or a part of it) is considered to be contained in that cell; each grid cell therefore corresponds to a (B×5+C)-dimensional vector containing the following information:
1. Pr(Class_i|Object): the probability of each object class, i.e. the probability that the object present in the grid cell is Class_i;
2. (Center_x, Center_y, width, height): the position information of each candidate box, including the center-point x coordinate, y coordinate, candidate-box width w and candidate-box height h; B candidate boxes together require 4×B numerical values to express their positions;
3. Confidence: the confidence of each candidate box, given by

$$Confidence = Pr(Object)\times IOU_{pred}^{truth}$$

where Confidence is the confidence of the target object, Pr(Object) is the probability that the target is an object, and IOU_{pred}^{truth} reflects how close the predicted candidate box is to the real target box;
step 3.1.4, traversing the scores, excluding objects with lower scores and higher overlapping degrees, and outputting predicted objects;
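To make the detection step concrete, here is a hedged sketch of cropping the identification area with OpenCV and running a YOLO-style detector on it. The patent does not specify an inference framework; this sketch uses OpenCV's dnn module as one possibility, and the file names "yolo_ships.cfg" / "yolo_ships.weights", the 416-pixel input size, the 0.5 threshold and the function name detect_in_region are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative file names; an actual deployment would load the trained ship model.
net = cv2.dnn.readNetFromDarknet("yolo_ships.cfg", "yolo_ships.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect_in_region(frame, region, conf_threshold=0.5):
    """Crop the identification area from a frame and return detected boxes."""
    x, y, w, h = region
    roi = frame[y:y + h, x:x + w]                      # intercept the identification area
    blob = cv2.dnn.blobFromImage(roi, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)                 # YOLO-style output tensors
    boxes = []
    for out in outputs:
        for det in out:                                # det = [cx, cy, bw, bh, obj_conf, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(det[4] * scores[class_id])   # Pr(Object) x Pr(Class_i|Object)
            if confidence >= conf_threshold:
                cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
                boxes.append((int(cx - bw / 2) + x, int(cy - bh / 2) + y,
                              int(bw), int(bh), class_id, confidence))
    return boxes
```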
step 3.2, loss function of the YOLO algorithm:
the YOLO algorithm treats target detection as a regression problem, uses a mean square error loss function, but uses different weights for different partsA value; firstly distinguishing positioning error and classification error, adopting larger positioning error, namely boundary frame coordinate prediction error, then distinguishing confidence coefficient of boundary frame not containing target and boundary frame containing target, and adopting smaller weight value for former boundary frame
Figure 743067DEST_PATH_IMAGE060
All other weighted values are set to be 1; then, the mean square error is adopted, the mean square error equally treats the boundary boxes with different sizes, the prediction of the width and the height of the network boundary box is changed into the prediction of the square root, namely the predicted value is changed into the prediction of the square root
Figure DEST_PATH_IMAGE061
(ii) a For classification errors it means that there is a mesh of objects to account for the error; the error formula is as follows:
Figure 559844DEST_PATH_IMAGE062
coordinate prediction error:
Figure DEST_PATH_IMAGE063
Figure 142135DEST_PATH_IMAGE064
a weight value when a target object is present;
Figure DEST_PATH_IMAGE065
is an accumulation operation;
Figure 414503DEST_PATH_IMAGE066
as the center point of the candidate frame
Figure DEST_PATH_IMAGE067
The coordinates of the position of the object to be imaged,
Figure 842073DEST_PATH_IMAGE068
as the center point of the candidate frame
Figure DEST_PATH_IMAGE069
Fourier transform of the coordinates;
Figure DEST_PATH_IMAGE071
as the centre of a candidate frame
Figure 778937DEST_PATH_IMAGE072
The coordinates of the position of the object to be imaged,
Figure DEST_PATH_IMAGE073
as the center point of the candidate frame
Figure 407495DEST_PATH_IMAGE074
Squares of coordinate fourier transform values;
Figure DEST_PATH_IMAGE075
is the value of the square of the candidate frame width,
Figure 23284DEST_PATH_IMAGE076
fourier transform square values for the candidate frame widths;
Figure DEST_PATH_IMAGE077
the square value of the candidate box height is,
Figure 254545DEST_PATH_IMAGE078
fourier transform square values of the candidate box heights;
Figure DEST_PATH_IMAGE079
accumulating each prediction frame and each grid in sequence;
Figure 777406DEST_PATH_IMAGE080
representation grid
Figure DEST_PATH_IMAGE081
To (1)
Figure 904762DEST_PATH_IMAGE082
When the object exists, the difference between the coordinate and the Fourier transform is calculated, each prediction frame and each grid are accumulated in sequence, and the result is multiplied by the weight value to obtain the prediction error of the coordinate, namely
Figure DEST_PATH_IMAGE083
Width w, height h error of frame
Figure 273427DEST_PATH_IMAGE084
The same as above;
Figure DEST_PATH_IMAGE085
confidence error of candidate box containing target object:
Figure 42799DEST_PATH_IMAGE086
Figure DEST_PATH_IMAGE087
representation grid
Figure 423096DEST_PATH_IMAGE088
To (1)
Figure DEST_PATH_IMAGE089
There is an object in each of the prediction boxes,
Figure 252512DEST_PATH_IMAGE090
is as follows
Figure DEST_PATH_IMAGE091
An object;
Figure 857542DEST_PATH_IMAGE092
is as follows
Figure DEST_PATH_IMAGE093
The square of the individual subject fourier transform values;
Figure 305972DEST_PATH_IMAGE094
accumulating each prediction frame and each grid in sequence;
when the object exists
Figure DEST_PATH_IMAGE095
Accumulating the difference of the square of the Fourier transform of the target object and each prediction frame and each grid in sequence, and calculating the confidence error of a candidate frame containing the target object;
confidence error for candidate box without target object:
Figure 71934DEST_PATH_IMAGE096
Figure DEST_PATH_IMAGE097
representation grid
Figure 337830DEST_PATH_IMAGE098
To (1)
Figure DEST_PATH_IMAGE099
No object exists in each prediction box;
Figure 149928DEST_PATH_IMAGE100
a weight value when no object is present;
Figure DEST_PATH_IMAGE101
is as follows
Figure 664699DEST_PATH_IMAGE102
The number of the objects is one,
Figure DEST_PATH_IMAGE103
is as follows
Figure 878642DEST_PATH_IMAGE104
Fourier transform values of the individual subjects;
Figure DEST_PATH_IMAGE105
accumulating each prediction frame and each grid in sequence;
Figure 190806DEST_PATH_IMAGE106
class prediction error:
Figure DEST_PATH_IMAGE107
Figure 490200DEST_PATH_IMAGE108
is as follows
Figure DEST_PATH_IMAGE109
Objects exist in the grid;
Figure 670646DEST_PATH_IMAGE110
a probability that the object is of a certain class;
Figure DEST_PATH_IMAGE111
the object is the square of a certain class of probability value Fourier transform;
Figure 614462DEST_PATH_IMAGE112
in order to accumulate over all of the categories,
Figure DEST_PATH_IMAGE113
accumulating for each grid;
by passing
Figure 959512DEST_PATH_IMAGE114
Making a difference with the square of Fourier transform, and then performing accumulation operation to calculate a category prediction error;
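As a small illustration of the coordinate term above, the following NumPy sketch computes it under stated assumptions: hatted quantities are model predictions, the array shapes are illustrative, and the default weight 5.0 is the value used in the original YOLO paper rather than a value fixed by the patent (whose weight values were given only in figures).

```python
import numpy as np

def coord_loss(boxes_true, boxes_pred, obj_mask, lambda_coord=5.0):
    """Coordinate prediction error of a YOLO-style loss.

    boxes_true, boxes_pred: arrays of shape (S*S, B, 4) holding (x, y, w, h);
    obj_mask: array of shape (S*S, B), 1 where a box is responsible for an object.
    lambda_coord is the weight applied when a target object is present.
    """
    xy_err = np.sum((boxes_true[..., :2] - boxes_pred[..., :2]) ** 2, axis=-1)
    wh_err = np.sum((np.sqrt(boxes_true[..., 2:]) - np.sqrt(boxes_pred[..., 2:])) ** 2, axis=-1)
    return lambda_coord * np.sum(obj_mask * (xy_err + wh_err))
```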
step 3.3, training of a YOLO network:
before training, pre-training is first carried out on ImageNet (an image training set); the pre-trained classification model adopts the first 53 convolutional layers, to which 5 pooling layers and fully connected layers are added. When testing the network, the class-specific confidence score of each candidate box is computed as

$$Pr(Class_i \mid Object)\times Pr(Object)\times IOU_{pred}^{truth} = Pr(Class_i)\times IOU_{pred}^{truth}$$

after the class-specific confidence score of each candidate box is obtained, a threshold is set, candidate boxes with low scores are filtered out, and NMS (non-maximum suppression) is applied to the remaining candidate boxes to obtain the final detection result;
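The NMS processing mentioned above can be illustrated with a minimal sketch of standard greedy non-maximum suppression; the box format (x1, y1, x2, y2) and the 0.5 overlap threshold are assumptions for illustration.

```python
def box_iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes that overlap it too much, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if box_iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```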
step 3.4, marking algorithm: the recognized-object list obtained above is processed to determine whether each object is a new object. The determination uses an optimized IOU (intersection over union, i.e. the area of the intersection of two regions divided by the area of their union) to compute the overlap ratio between the region of a newly detected object inside buffer area 2 and the regions in the existing object list; if the overlap ratio is greater than or equal to the threshold, the object is judged not to be new, and if it is less than the threshold, it is judged to be new. The threshold is obtained by comparing, within buffer area 2, the intersection of the new and old objects; a common threshold value is greater than 0.5. The specific algorithm steps are as follows:
suppose the coordinates of the upper-left and lower-right vertices of recognition box A are $(x_{A1}, y_{A1})$ and $(x_{A2}, y_{A2})$, and the coordinates of the upper-left and lower-right vertices of recognition box B are $(x_{B1}, y_{B1})$ and $(x_{B2}, y_{B2})$; for ease of understanding, the recognition boxes are described in an algorithmic language: the coordinates are converted into matrices, and the width and height of the overlap region are calculated as

$$w = \min(x_{A2}, x_{B2}) - \max(x_{A1}, x_{B1}),\qquad h = \min(y_{A2}, y_{B2}) - \max(y_{A1}, y_{B1})$$

if either value is less than 0, the recognition boxes do not intersect; if both values are greater than 0, they are multiplied to give the intersection area, and the IOU is obtained by dividing the intersection area by the union area:

$$IOU = \frac{w\,h}{\mathrm{area}(A) + \mathrm{area}(B) - w\,h}$$

based on the size of the recognized objects, the measured threshold is set to 0.5;
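A hedged sketch of this new-object test follows, reusing the box_iou helper from the NMS sketch above; the 0.5 threshold matches the value stated in this step, while the tracker data structure, the (x1, y1, x2, y2) box format and the function names are illustrative assumptions.

```python
def is_new_object(candidate_box, tracked_boxes, threshold=0.5):
    """True if the candidate overlaps no existing tracked box enough to be the same object."""
    return all(box_iou(candidate_box, old) < threshold for old in tracked_boxes)

def mark_objects(detections, trackers, threshold=0.5):
    """Marking step: create a tracker entry for new objects, otherwise update the matched one."""
    for det in detections:
        match = next((t for t in trackers if box_iou(det, t["box"]) >= threshold), None)
        if match is None:
            trackers.append({"id": len(trackers) + 1, "box": det})   # new object: assign ID, record initial position
        else:
            match["box"] = det                                       # old object: update its position
```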
step 3.5, as shown in fig. 6, judging the new and old objects: if the object is a new object, marking, creating a tracker, recording information such as an ID (identity) and an initial position; if the object is not a new object, updating the initial position in the existing object tracker;
step 3.6, tracking algorithm: as shown in fig. 7, when an object passes through buffer area 2, a tracking algorithm is used to continuously record the position of the object 3; a KCF (Kernelized Correlation Filter) algorithm is used mainly to solve the problem of overlap in multi-object tracking;
KCF is a discriminative tracking method: during tracking it trains a target detector, uses that detector to test whether the predicted position in the next frame is the target, and then uses the new detection result to update the training set and hence the target detector. When the target detector is trained, the target region is generally taken as a positive sample and the regions around the target as negative samples, with regions closer to the target having a higher probability of being positive samples. As shown in fig. 3, the steps are as follows:
step 3.6.1, in the $t$-th frame, sample around the current position $p_t$ of the target and train a regressor, which can compute the response of each small-window sample;
step 3.6.2, in the $(t+1)$-th frame, sample around the previous frame position $p_t$ and judge the response of each sample with the regressor trained above;
step 3.6.3, take the sample with the strongest response as the position $p_{t+1}$ of the target in this frame. The matrix algorithms involved include Fourier-space diagonalization of circulant matrices, ridge regression simplified by Fourier diagonalization, and kernel-space ridge regression. The circulant-matrix Fourier diagonalization equation is:

$$X = F\,\mathrm{diag}(\hat{x})\,F^{H}$$

where $X$ is the circulant matrix generated by the base vector $x$; $\hat{x}$ is the Fourier transform of the base vector $x$; $F$ is the Fourier transform matrix; and the superscript $H$ denotes the conjugate transpose, $F^{H} = (F^{*})^{T}$. In other words, the circulant matrix $X$ is similar to a diagonal matrix.
The ridge regression formula is:

$$w = \left(X^{T}X + \lambda I\right)^{-1}X^{T}y$$

where $w$ is the regression coefficient matrix, $\lambda$ is the regularization strength, $X$ is the feature matrix, and $y$ is the target variable matrix; this is linear least squares with regularization.
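For the tracking step, a minimal sketch using OpenCV's KCF tracker is given below. It is a sketch under assumptions, not the patented implementation: TrackerKCF requires the opencv-contrib build (and on some OpenCV versions is exposed as cv2.legacy.TrackerKCF_create()), and the (x, y, w, h) ROI format is an assumption about how the marked object positions are stored.

```python
import cv2

def track_object(video_path, initial_box):
    """Follow one marked object through a video with the KCF correlation filter.

    initial_box is (x, y, w, h) from the marking step; requires opencv-contrib-python.
    """
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return []
    tracker = cv2.TrackerKCF_create()          # kernelized correlation filter tracker
    tracker.init(frame, initial_box)
    positions = [initial_box]
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        found, box = tracker.update(frame)     # predict the object position in this frame
        if found:
            positions.append(tuple(int(v) for v in box))
    cap.release()
    return positions
```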
step 3.7, as shown in fig. 8, when the position of a tracked object 4 triggers buffer area 2, the object 4 is monitored continuously and its position is checked to see whether it has left buffer area 2;
step 3.8, after the tracked object 4 leaves buffer area 2, the corresponding ID tracker is destroyed; the behavior of the tracked object 4 is then judged from the positional relation between its starting region and its leaving region, classified and counted, output to the screen and stored in the database;
suppose the coordinates of the upper-left and lower-right vertices of recognition box A are $(x_{A1}, y_{A1})$ and $(x_{A2}, y_{A2})$, and the coordinates of the upper-left and lower-right vertices of recognition box B are $(x_{B1}, y_{B1})$ and $(x_{B2}, y_{B2})$; for ease of understanding, the recognition boxes are described in algorithmic logic:
the coordinates are converted into a matrix, and the centroid position of each recognition box is calculated from its coordinates as $\left(\frac{x_1+x_2}{2}, \frac{y_1+y_2}{2}\right)$; the dotted line is the center line of the identification area, and the center-line position $x_{mid}$ can be calculated from the identification area; the influence of the $y$ coordinate is ignored, i.e. only the position of the centroid's $x$ coordinate relative to $x_{mid}$ is judged.
step 4, recording the position where the tracked target object leaves the identification area 1 and using that position for behavior analysis and statistics of the tracked target object; as shown in fig. 9, the rules are as follows (a sketch of this classification is given after the list):
step 4.1, if both the starting centroid and the leaving centroid are on the left side of the dotted line 5, the target object 6 is a turn-back departure;
step 4.2, if the starting centroid is on the left side of the dotted line 5 and the leaving centroid is on the right side, the target object 7 is a departing (outbound) object;
step 4.3, if both the starting centroid and the leaving centroid are on the right side of the dotted line 5, the target object 8 is a turn-back entry;
step 4.4, if the starting centroid is on the right side of the dotted line 5 and the leaving centroid is on the left side, the target object 9 is an entering (inbound) object.
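A minimal sketch of these four rules follows, under the assumption that the centroids and the center line are plain x coordinates in image space (the y coordinate is ignored, as stated in step 3.8); the function and label names are illustrative.

```python
def classify_behavior(start_centroid_x, leave_centroid_x, centerline_x):
    """Classify a tracked object's behavior from where it started and where it left.

    Left of the center line corresponds to the departure side and right to the
    entry side, following steps 4.1-4.4.
    """
    start_left = start_centroid_x < centerline_x
    leave_left = leave_centroid_x < centerline_x
    if start_left and leave_left:
        return "turn-back departure"      # step 4.1
    if start_left and not leave_left:
        return "departure"                # step 4.2
    if not start_left and not leave_left:
        return "turn-back entry"          # step 4.3
    return "entry"                        # step 4.4
```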

Claims (2)

1. A video monitoring and analyzing method for maritime affairs based on artificial intelligence is characterized in that: the method comprises the following steps:
step 1, identifying the target objects in the identification area of each frame of a video data source with an identification algorithm, so that complete identification of the target objects can be carried out even in an irregular identification area; namely: reading each frame of the video data source in sequence and setting an identification area on each frame;
step 2, distinguishing and marking the target objects of the preceding and following frames of the video data source in a buffer area through a marking algorithm, completing the non-repeated marking of the same target object, distinguishing new and old target objects and ensuring the uniqueness of the identified objects; namely: deriving a buffer area from the identification area set on each frame by a set reduction ratio, the buffer area coinciding with the center point of the identification area;
step 3, intercepting the identification area from each frame of the video, tracking the identified objects in the inner region of the buffer area with a tracking algorithm and tracking the trajectory of each target object until it leaves the buffer area, ensuring that targets are not confused when they overlap, occlude each other and separate again during tracking, so as to obtain the identification results;
and step 4, recording the position where the tracked target object leaves the identification area and using that position for behavior analysis and statistics of the tracked target object.
2. The artificial intelligence based maritime service oriented video monitoring and analysis method according to claim 1, wherein: in step 3, tracking the identified object by using a tracking algorithm, comprising the following steps:
step 3.1, reading the P frame image, calling the Resize function to adjust the image size, and dividing the image into S×S grid cells;
step 3.2, performing feature extraction on the image by using a convolutional neural network;
step 3.3, predicting the position and the class of the target: if the center of a target object falls in a certain grid cell, that cell is responsible for predicting the target object; each grid cell predicts B candidate boxes, B confidence values and C class probabilities, giving an output tensor of size S×S×(B×5+C), where S×S is the number of grid cells into which the image is divided, B is the number of boxes each cell is responsible for, and C is the number of classes; each grid cell corresponds to B bounding boxes whose width and height may range over the whole image and which represent candidate positions, centered on that cell, at which an object may be found; each bounding box corresponds to a score indicating whether an object exists at that position and how accurate the localization is:

$$Score = Pr(Object)\times IOU_{pred}^{truth}$$

where Pr(Object) is the probability that the target is an object, and IOU_{pred}^{truth} is the intersection-over-union of the real position and the predicted position;
each grid cell corresponds to C class probability values; the class Class_i corresponding to the maximum probability is found, Pr(Class_i|Object) being the probability that the object is Class_i given that an object occurs in the cell, and the object or a part of the object is considered to be contained in the cell; each grid cell corresponds to a (B×5+C)-dimensional vector whose information is as follows:
1. Pr(Class_i|Object): the probability of each object class, which can be expressed as Pr(Class_i|Object), the probability that the object present in the grid cell is Class_i;
2. (Center_x, Center_y, width, height): the position information of each candidate box, including the center-point x coordinate, y coordinate, candidate-box width w and candidate-box height h; B candidate boxes together require 4×B numerical values to express their positions;
3. Confidence: the confidence of each candidate box; the confidence formula for the candidate box is:

$$Confidence = Pr(Object)\times IOU_{pred}^{truth}$$

where Confidence is the confidence of the target object; Pr(Object) is the probability that the target is an object, i.e. the probability that an object exists within the candidate box, as distinguished from Pr(Class_i|Object); and IOU_{pred}^{truth} is the intersection-over-union of the real position and the predicted position, reflecting how close the predicted candidate box is to the real target box;
4. traversing all scores, excluding the objects with lower scores and higher overlap, and outputting the predicted objects.
CN202011102923.XA 2020-10-15 2020-10-15 Video monitoring and analyzing method for maritime business based on artificial intelligence Active CN112200101B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011102923.XA CN112200101B (en) 2020-10-15 2020-10-15 Video monitoring and analyzing method for maritime business based on artificial intelligence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011102923.XA CN112200101B (en) 2020-10-15 2020-10-15 Video monitoring and analyzing method for maritime business based on artificial intelligence

Publications (2)

Publication Number Publication Date
CN112200101A true CN112200101A (en) 2021-01-08
CN112200101B CN112200101B (en) 2022-10-14

Family

ID=74009065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011102923.XA Active CN112200101B (en) 2020-10-15 2020-10-15 Video monitoring and analyzing method for maritime business based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN112200101B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516093A (en) * 2021-07-27 2021-10-19 浙江大华技术股份有限公司 Marking method and device of identification information, storage medium and electronic device
JPWO2022244062A1 (en) * 2021-05-17 2022-11-24

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515378A (en) * 2009-03-17 2009-08-26 上海普适导航技术有限公司 Informationization management method for vessel entering and leaving port
CN104394507A (en) * 2014-11-13 2015-03-04 厦门雅迅网络股份有限公司 Method and system for solving alarm regional report omission through buffer zone
CN104766064A (en) * 2015-04-13 2015-07-08 郑州天迈科技股份有限公司 Method for recognizing and positioning access station and access field through vehicle-mounted video DVR images
CN105761490A (en) * 2016-04-22 2016-07-13 北京国交信通科技发展有限公司 Method of carrying out early warning on hazardous chemical substance transport vehicle parking in service area
CN107067447A (en) * 2017-01-26 2017-08-18 安徽天盛智能科技有限公司 A kind of integration video frequency monitoring method in large space region
WO2018008893A1 (en) * 2016-07-06 2018-01-11 주식회사 파킹패스 Off-street parking management system using tracking of moving vehicle, and method therefor
CN109684996A (en) * 2018-12-22 2019-04-26 北京工业大学 Real-time vehicle based on video passes in and out recognition methods
CN109785664A (en) * 2019-03-05 2019-05-21 北京悦畅科技有限公司 A kind of statistical method and device of the remaining parking stall quantity in parking lot
CN110991272A (en) * 2019-11-18 2020-04-10 东北大学 Multi-target vehicle track identification method based on video tracking

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101515378A (en) * 2009-03-17 2009-08-26 上海普适导航技术有限公司 Informationization management method for vessel entering and leaving port
CN104394507A (en) * 2014-11-13 2015-03-04 厦门雅迅网络股份有限公司 Method and system for solving alarm regional report omission through buffer zone
CN104766064A (en) * 2015-04-13 2015-07-08 郑州天迈科技股份有限公司 Method for recognizing and positioning access station and access field through vehicle-mounted video DVR images
CN105761490A (en) * 2016-04-22 2016-07-13 北京国交信通科技发展有限公司 Method of carrying out early warning on hazardous chemical substance transport vehicle parking in service area
WO2018008893A1 (en) * 2016-07-06 2018-01-11 주식회사 파킹패스 Off-street parking management system using tracking of moving vehicle, and method therefor
CN107067447A (en) * 2017-01-26 2017-08-18 安徽天盛智能科技有限公司 A kind of integration video frequency monitoring method in large space region
CN109684996A (en) * 2018-12-22 2019-04-26 北京工业大学 Real-time vehicle based on video passes in and out recognition methods
CN109785664A (en) * 2019-03-05 2019-05-21 北京悦畅科技有限公司 A kind of statistical method and device of the remaining parking stall quantity in parking lot
CN110991272A (en) * 2019-11-18 2020-04-10 东北大学 Multi-target vehicle track identification method based on video tracking

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NATALIA WAWRZYNIAK ET AL.: "Vessel Detection and Tracking Method Based on Video Surveillance", MDPI *
XIAN YUNTING ET AL.: "Multi-ship target tracking and traffic statistics based on deep learning", MICROCOMPUTER APPLICATIONS *
WU XINGHUA: "Design and implementation of an intelligent arrival and departure operation system for conventional-speed stations", RAILWAY COMPUTER APPLICATION *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPWO2022244062A1 (en) * 2021-05-17 2022-11-24
WO2022244062A1 (en) * 2021-05-17 2022-11-24 Eizo株式会社 Information processing device, information processing method, and computer program
JP7462113B2 (en) 2021-05-17 2024-04-04 Eizo株式会社 Information processing device, information processing method, and computer program
CN113516093A (en) * 2021-07-27 2021-10-19 浙江大华技术股份有限公司 Marking method and device of identification information, storage medium and electronic device

Also Published As

Publication number Publication date
CN112200101B (en) 2022-10-14

Similar Documents

Publication Publication Date Title
CN105389567B (en) Group abnormality detection method based on dense optical flow histogram
CN110097568A (en) A kind of the video object detection and dividing method based on the double branching networks of space-time
CN106878674A (en) A kind of parking detection method and device based on monitor video
CN106778540B (en) Parking detection is accurately based on the parking event detecting method of background double layer
CN110060508B (en) Automatic ship detection method for inland river bridge area
CN109657541A (en) A kind of ship detecting method in unmanned plane image based on deep learning
CN104318258A (en) Time domain fuzzy and kalman filter-based lane detection method
CN105184271A (en) Automatic vehicle detection method based on deep learning
CN112200101B (en) Video monitoring and analyzing method for maritime business based on artificial intelligence
CN109977897A (en) A kind of ship's particulars based on deep learning recognition methods, application method and system again
CN104881643B (en) A kind of quick remnant object detection method and system
CN110458160A (en) A kind of unmanned boat waterborne target recognizer based on depth-compression neural network
CN110334703B (en) Ship detection and identification method in day and night image
Bloisi et al. Camera based target recognition for maritime awareness
CN113743260B (en) Pedestrian tracking method under condition of dense pedestrian flow of subway platform
CN112819068A (en) Deep learning-based real-time detection method for ship operation violation behaviors
Wu et al. A new multi-sensor fusion approach for integrated ship motion perception in inland waterways
CN113763427B (en) Multi-target tracking method based on coarse-to-fine shielding processing
CN116434159A (en) Traffic flow statistics method based on improved YOLO V7 and Deep-Sort
Zhang et al. A warning framework for avoiding vessel‐bridge and vessel‐vessel collisions based on generative adversarial and dual‐task networks
CN113989487A (en) Fault defect detection method and system for live-action scheduling
CN114565824A (en) Single-stage rotating ship detection method based on full convolution network
CN112861762B (en) Railway crossing abnormal event detection method and system based on generation countermeasure network
CN110188607A (en) A kind of the traffic video object detection method and device of multithreads computing
Bloisi et al. Integrated visual information for maritime surveillance

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: 450046 No.9 Zeyu street, Zhengdong New District, Zhengzhou City, Henan Province

Patentee after: Henan Zhonggong Design and Research Institute Group Co.,Ltd.

Country or region after: China

Address before: 450046 No.9 Zeyu street, Zhengdong New District, Zhengzhou City, Henan Province

Patentee before: HENAN PROVINCIAL COMMUNICATIONS PLANNING & DESIGN INSTITUTE Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address