CN117218636A - Ship board detection and identification method and device based on multilayer semantic fusion - Google Patents

Ship board detection and identification method and device based on multilayer semantic fusion

Info

Publication number: CN117218636A
Application number: CN202311037312.5A
Authority: CN (China)
Prior art keywords: ship, picture, semantic, model, detection
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 陈振盟, 刘伟, 王名孝
Current assignee: Ningbo Shengyang Electronic Technology Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Ningbo Shengyang Electronic Technology Co., Ltd.
Application filed by Ningbo Shengyang Electronic Technology Co., Ltd. (the priority date is an assumption and is not a legal conclusion)
Priority: CN202311037312.5A
Publication: CN117218636A (pending)

Landscapes: Image Analysis (AREA)

Abstract

The invention relates to a ship board detection and identification method and device based on multilayer semantic fusion, comprising the following steps: acquiring ship video captured by a camera; processing the ship video to obtain ship pictures, and detecting and identifying the ship pictures with a trained ship board detection baseline model and a trained ship board identification baseline model to obtain basic pictures; inputting each basic picture into a trained ship board detection model, which processes the basic picture by fusing object semantic features and edge semantic features from 4 different layers to obtain a ship board detection map; and inputting the ship board detection map into a trained ship board identification model, which identifies the detection map by fusing local and global semantic information to obtain the ship board name. The invention solves the prior-art technical problems that ship board backgrounds are complex, deformations are varied, and ship boards are consequently difficult to detect and identify.

Description

Ship board detection and identification method and device based on multilayer semantic fusion
Technical Field
The invention belongs to the technical field of intelligent recognition, and particularly relates to a ship board detection recognition method and device based on multi-layer semantic fusion.
Background
With the rapid development of maritime transportation, marine fishing and aquaculture, cargo ships, fishing vessels and aquaculture vessels have become increasingly numerous, making the supervision of shipping traffic difficult even as the economy thrives. Meanwhile, some vessels hide their passage records by manually switching off the Automatic Identification System (AIS) or deliberately entering erroneous information, thereby evading supervision.
In the prior art, a large number of cameras are built along the coast of a port, and a ship board is identified through a vision-based deep learning algorithm.
The prior art has at least the following problems:
1. In existing shipping scenes, the background of ship plate detection is complex, the scale is affected by shooting distance and changes drastically, and existing methods lack the ability to judge ship plates against complex backgrounds.
2. Because ship plates are mounted at inconsistent positions on the hull and vessel headings vary widely, ship plate shapes deform freely, making deformed ship plates difficult to detect and identify.
3. Uneven illumination, occlusion and similar conditions caused by the shooting angle or shooting environment lead to missing ship plate content, making detection and identification inaccurate.
4. Ship plate fonts and font sizes are not uniform, which further increases the difficulty of detection and identification.
Disclosure of Invention
The invention provides a ship plate detection and identification method, device, equipment and medium based on multi-layer semantic fusion, aiming to solve the following prior-art problems: in shipping scenes the background of ship plate detection is complex and the scale, affected by shooting distance, changes drastically, with no means of judging ship plates against complex backgrounds; ship plates are mounted at inconsistent positions on the hull and vessel headings vary widely, so ship plate shapes deform freely and deformed ship plates are difficult to detect and identify; uneven illumination and occlusion caused by shooting angles or shooting environments lead to missing ship plate content and inaccurate detection and identification; and ship plate fonts and font sizes are not uniform, which increases the difficulty of detection and identification.
The technical scheme for solving the technical problems is as follows: a ship board detection and identification method based on multi-layer semantic fusion comprises the following steps:
s1: acquiring a ship video acquired by a camera;
s2: processing the ship video to obtain a ship picture, wherein the ship picture is a picture containing a ship and obtained from the ship video;
S3: training a ship plate detection baseline model and a ship plate identification baseline model by adopting a pre-labeling method based on the ship picture, and detecting and identifying the ship picture by utilizing the trained ship plate retrieval baseline model and the ship plate identification baseline model to obtain a basic picture;
s4: training a ship plate detection model, and processing the basic picture by using the trained ship plate detection model to obtain a ship plate detection diagram; the trained ship board detection model is based on fusion of object semantic features and edge semantic features of 4 different layers in the basic picture, so that the basic picture is processed;
s5: training a ship plate recognition model, and carrying out recognition processing on the ship plate detection diagram by using the trained ship plate recognition model to obtain a ship plate name, wherein the trained ship plate recognition model is based on fusion of local semantic information and global semantic information of Chinese characters in the ship plate detection diagram, so as to realize recognition processing on the ship plate detection diagram.
The beneficial effects of the invention are as follows: for ship plate detection, fusing 4 features of different scales through adjacent-layer fusion and global fusion effectively handles ship plate detection in complex environments and under severe scale changes. For ship plate recognition, fusing the local semantics of radicals with the global semantic information of single Chinese characters extracts more detailed information and alleviates the recognition problems caused by uneven illumination, occlusion, and varying fonts and font sizes.
On the basis of the technical scheme, the invention can be improved as follows.
Further, in S1, the sources of the ship video are: port access vessel video, dock access vessel video, marine vessel video, and shore berth vessel video.
The beneficial effects of adopting the further scheme are as follows: according to the invention, the ship plate detection and identification are carried out by acquiring the monitoring video of each monitoring place, so that the safety monitoring of the ship is ensured.
Further, the S2 specifically includes: and inputting the ship video into a ship detection model, and extracting a video frame containing the ship from the video stream by using the ship detection model to obtain a ship picture.
The beneficial effects of adopting the further scheme are as follows: according to the invention, video frames are extracted from the ship video stream as ship pictures for subsequent ship plate detection and identification.
Further, in the step S3, training the ship plate detection baseline model and the ship plate identification baseline model by adopting the pre-labeling method specifically comprises the following steps:
s3.1: pre-labeling ship plates in the first number of ship pictures to obtain the first number of first pictures;
s3.2: training an initial ship board detection baseline model and an initial ship board identification baseline model through a first number of first pictures to obtain a first ship board detection baseline model and a first ship board identification baseline model;
S3.3: pre-labeling a second number of ship pictures through a first ship board detection baseline model and a first ship board identification baseline model to obtain a second picture containing first pre-labeling information, and correcting the second picture containing the first pre-labeling information;
s3.4: training the first ship board detection baseline model and the first ship board identification baseline model through the first picture and the corrected second picture to obtain a second ship board detection baseline model and a second ship board identification baseline model;
s3.5: pre-marking a third number of ship pictures through a second ship board detection baseline model and a second ship board identification baseline model to obtain third pictures containing second pre-marking information, and performing false mark deletion on the third pictures containing the second pre-marking information;
s3.6: training the second ship board detection baseline model and the second ship board identification baseline model through the first picture, the corrected second picture and the third picture with the error mark deleted to obtain the ship board detection baseline model and the ship board identification baseline model.
The beneficial effects of adopting the further scheme are as follows: the ship plate of the ship picture is marked through the ship plate detection baseline model and the ship plate identification baseline model, so that a basic picture for ship plate detection and identification is obtained.
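The three-round pre-labeling procedure of S3.1 to S3.6 can be sketched as a simple bootstrap loop. All function names below (`train`, `predict`, `human_fix`) are hypothetical stand-ins for the patent's baseline-model training, model pre-labeling, and manual correction steps; the "models" are toy stubs used only to show the control flow:

```python
# Hedged sketch of the three-round pre-labeling bootstrap (S3.1-S3.6).

def train(model_state, labeled):
    # stand-in for baseline-model training: a "model" here is just the
    # set of images it has been trained on
    return model_state | {img for img, _ in labeled}

def predict(model_state, images):
    # stand-in for model pre-labeling of a new batch of ship pictures
    return [(img, f"label_{img}") for img in images]

def human_fix(prelabeled):
    # stand-in for manual correction / deletion of wrong labels
    return [(img, lab) for img, lab in prelabeled if lab]

def bootstrap(batches):
    # round 1: fully manual labels; later rounds: model pre-labels,
    # humans correct, and the growing dataset retrains the baselines
    model, dataset = set(), []
    for i, batch in enumerate(batches):
        if i == 0:
            labeled = [(img, f"manual_{img}") for img in batch]
        else:
            labeled = human_fix(predict(model, batch))
        dataset += labeled
        model = train(model, dataset)
    return model, dataset
```

Each round widens the labeled pool while shrinking the manual effort per image, which is the point of the patent's staged pre-labeling.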
Further, the training ship board detection model in the above step S4 specifically includes:
s4.1: carrying out data enhancement on the basic picture to obtain a data enhancement picture;
S4.1.1: performing image preprocessing on the basic picture to obtain a preprocessed picture, wherein the image preprocessing comprises rotation, perspective transformation, cropping, flipping and blurring;
s4.1.2: inputting the preprocessed picture into a multi-scale training model, and processing the preprocessed picture by using the dimension calibrated by the multi-scale training model to obtain the preprocessed picture with the calibrated dimension;
s4.1.3: the pixel value of each pixel point of the preprocessed picture with the calibrated size is converted from [0,255] integer data to [ -1,1] floating point data, so as to obtain a numerical value standardized picture;
s4.1.4: generating y_score information, y_thresh information and y_binary information of the numerical value standardized picture by a graph tag information generation method to obtain a data enhancement picture; wherein:
the y_score information is a map with the same size as the numerically standardized picture, in which pixels inside the union of the graphic labels have value 1 and all other pixels have value 0;
the y_thresh information is a map with the same size as the numerically standardized picture: for pixels within a threshold distance of a graphic-label edge, the value is computed from the shortest distance between the pixel and the nearest edge of the labeled graphic, with y_thresh = 0.7 when the shortest distance is 0 and y_thresh = 0.3 when the shortest distance equals the threshold distance; all pixels outside this band have value 0;
the y_binary information is a map with the same size as the numerically standardized picture, in which pixels inside the union of the graphic labels have value 1 and all other pixels have value 0;
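The label-map generation of S4.1.4 can be sketched for a single axis-aligned box label. This is a simplification: the patent labels arbitrary polygons, and `dist_max` below stands in for the unspecified threshold distance; y_thresh falls linearly from 0.7 at the label edge to 0.3 at `dist_max`:

```python
import numpy as np

def dist_to_edge(x, y, box):
    # shortest distance from pixel (x, y) to the rectangle's boundary
    x0, y0, x1, y1 = box
    if x0 <= x <= x1 and y0 <= y <= y1:
        return float(min(x - x0, x1 - x, y - y0, y1 - y))
    dx = max(x0 - x, 0, x - x1)
    dy = max(y0 - y, 0, y - y1)
    return float((dx * dx + dy * dy) ** 0.5)

def label_maps(h, w, box, dist_max=4.0):
    # y_score / y_binary: 1 inside the label, 0 outside (per S4.1.4)
    # y_thresh: 0.7 at the edge, linear down to 0.3 at dist_max, else 0
    y_score = np.zeros((h, w), np.float32)
    y_thresh = np.zeros((h, w), np.float32)
    x0, y0, x1, y1 = box
    for y in range(h):
        for x in range(w):
            if x0 <= x <= x1 and y0 <= y <= y1:
                y_score[y, x] = 1.0
            d = dist_to_edge(x, y, box)
            if d <= dist_max:
                y_thresh[y, x] = 0.7 - 0.4 * d / dist_max
    y_binary = y_score.copy()  # the patent defines y_binary like y_score
    return y_score, y_thresh, y_binary
```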
s4.2: inputting the data enhancement picture into a basic feature extraction network layer, and carrying out feature extraction on the data enhancement picture by utilizing a ResNet34 network of the basic feature extraction network layer to obtain a feature map containing pyramid features with 4 different levels;
s4.3: inputting the feature images containing 4 pyramid features with different levels into a sampling fusion layer, and carrying out up-sampling fusion on the feature images containing 4 pyramid features with different levels by utilizing a DBFPN network of the sampling fusion layer to obtain a ship board detection image;
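The pyramid extraction and fusion of S4.2 and S4.3 can be sketched as follows. A real DBFPN neck also applies 1x1 and 3x3 convolutions around the fusion; this sketch keeps only the upsample-and-concatenate skeleton, and nearest-neighbour upsampling is an assumption:

```python
import numpy as np

def upsample_nearest(x, factor):
    # nearest-neighbour upsampling of a (c, h, w) feature map
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_pyramid(feats):
    # feats: 4 pyramid maps of shape (c, h / 2**i, w / 2**i), i = 0..3,
    # as produced by the 4 stages of a ResNet-style backbone.
    # Upsample all levels to the finest resolution and concatenate,
    # so object semantics (deep levels) and edge semantics (shallow
    # levels) end up in one fused map.
    target_h = feats[0].shape[1]
    ups = [upsample_nearest(f, target_h // f.shape[1]) for f in feats]
    return np.concatenate(ups, axis=0)  # (4c, h, w)
```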
s4.4: inputting the ship board detection graph into a plurality of deconvolution layers in parallel, and processing the ship board detection graph by utilizing the deconvolution layers to obtain a foreground-background classification score, a segmentation boundary threshold value and a binarization segmentation result, wherein the binarization segmentation result is obtained by calculating the foreground-background classification score and the segmentation boundary threshold value, and the calculation formula is shown as follows:
wherein score is foreground-background classification score, thresh is segmentation boundary threshold, and bin is binary segmentation result;
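The calculation formula referenced in S4.4 did not survive extraction. DBNet, on which this detection head is based, computes the approximate binarization as a steep sigmoid of the difference between the classification score and the learned threshold, with amplification factor k = 50; treat the following as a plausible reconstruction, not a verbatim quote of the patent:

```python
import numpy as np

def differentiable_binarization(score, thresh, k=50.0):
    # bin = 1 / (1 + exp(-k * (score - thresh))), DBNet's differentiable
    # stand-in for hard thresholding: ~1 where score > thresh, ~0 below
    return 1.0 / (1.0 + np.exp(-k * (score - thresh)))
```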
S4.5: constructing a binary classification Loss function BCELoss according to the foreground and background classification score, constructing a regression Loss function L1Loss according to the segmentation boundary threshold, constructing a segmentation Loss function DiceLoss according to the binary segmentation result, constructing a final Loss function according to the binary classification Loss function BCELoss, the regression Loss function L1Loss and the segmentation Loss function DiceLoss, and performing iterative training on the ship board detection model through the final Loss function;
the two-class classification loss function BCELoss is shown as follows:
l_bce = y1 * log(x1) + (1 - y1) * log(1 - x1)
where x1 is the foreground score in the foreground-background classification, in the range [0, 1], and y1 is the y_score information;
the regression Loss function L1Loss is shown as follows:
l_l1 = |x2 - y2|
wherein x2 is a segmentation boundary threshold, and y2 is y_thresh information;
the segmentation loss function DiceLoss is shown as follows:
wherein x3 is a binarization segmentation result, more than 0 is a foreground, less than 0 is a background, and y3 is y_binary information;
the final loss function is shown as follows:
l = l_bce + l_l1 + l_dice
where l_bce is the two-class classification loss function BCELoss, l_l1 is the regression Loss function L1Loss, and l_dice is the segmentation loss function DiceLoss.
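A minimal numpy sketch of the three losses in S4.5 and their sum. Two details are assumptions rather than quotes from the patent: l_bce is written with the conventional leading minus sign, and the Dice term uses the standard 1 - 2|X∩Y| / (|X| + |Y|) form, which the extracted text omits:

```python
import numpy as np

def bce_loss(x1, y1, eps=1e-7):
    # binary cross-entropy between foreground score x1 and y_score y1
    x1 = np.clip(x1, eps, 1 - eps)
    return float(np.mean(-(y1 * np.log(x1) + (1 - y1) * np.log(1 - x1))))

def l1_loss(x2, y2):
    # L1 regression between predicted threshold x2 and y_thresh y2
    return float(np.mean(np.abs(x2 - y2)))

def dice_loss(x3, y3, eps=1e-7):
    # Dice loss between binarization result x3 and y_binary y3
    inter = np.sum(x3 * y3)
    return float(1 - 2 * inter / (np.sum(x3) + np.sum(y3) + eps))

def total_loss(score, y_score, thresh, y_thresh, binmap, y_bin):
    # the patent's final loss: l = l_bce + l_l1 + l_dice
    return (bce_loss(score, y_score)
            + l1_loss(thresh, y_thresh)
            + dice_loss(binmap, y_bin))
```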
The beneficial effects of adopting the further scheme are as follows: the invention provides a ship board detection improvement based on DBNet. DBFPN is added to DBNet to fuse object semantic features and edge semantic features from 4 different layers, improving DBNet's adaptability to scale, while learning an adaptive threshold improves its judgment of complex backgrounds. The output foreground-background classification score map and segmentation boundary threshold map are pixel-aligned with the ship board detection map: their height and width match and each has a single channel.
Further, the training ship plate recognition model in the above step S5 specifically includes:
s5.1: inputting the ship board detection map into a feature extraction network layer, carrying out feature extraction on the ship board detection map by utilizing a ResNet50 network of the feature extraction network layer, and outputting a semantic feature map;
s5.2: inputting the semantic feature map into a context semantic fusion layer, and carrying out context semantic fusion on the semantic feature map by utilizing two layers of BiLSTM in the context semantic fusion layer to obtain a context semantic fusion map;
S5.3: inputting the context semantic fusion map into a global semantic layer, which maps it onto Chinese characters to obtain a Chinese character distribution heat map; the central pixel of each Chinese character is obtained by computing the expectation over the heat map, a Chinese character region is constructed according to a normal distribution to obtain the feature distribution of each Chinese character, and a global semantic feature map of each Chinese character is generated;
S5.4: inputting the global semantic feature map of each Chinese character into a local semantic layer; a set of radical feature vectors is constructed in the local semantic layer, and the global semantic feature map together with all the radical feature vectors is used as the input of a self-attention mechanism, with the global semantic feature map as the q vector and the radical feature vectors as the k and v vectors, so that radical information is fused into the character feature through self-attention; the output of the self-attention mechanism is the local semantic feature map of the single Chinese character;
S5.5: transforming the local semantic feature map to the single-Chinese-character level, and superposing the transformed local semantic feature map onto the global semantic feature map to obtain a radical-corrected feature vector;
S5.6: inputting the radical correction feature vector into a linear layer to obtain the prediction results of single Chinese characters, and combining the prediction results of all the single Chinese characters of the ship plate to obtain the name of the ship plate.
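The radical self-attention of S5.4 and S5.5 can be sketched as follows, with the per-character global feature as the query and learned radical vectors as keys and values. The feature dimension and the scaled softmax-attention form are assumptions, not quoted from the patent:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def radical_attention(char_feat, radical_vecs):
    # char_feat: (d,) global semantic feature of one Chinese character (q)
    # radical_vecs: (n_radicals, d) radical embeddings, used as both k and v
    d = char_feat.shape[-1]
    attn = softmax(radical_vecs @ char_feat / np.sqrt(d))  # (n_radicals,)
    local = attn @ radical_vecs                             # (d,) local feature
    # S5.5: superpose the radical-attended local feature onto the global
    # feature to obtain the radical-corrected feature vector
    return char_feat + local
```

The corrected vector then goes through the linear layer of S5.6 to predict the single character.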
The beneficial effects of adopting the further scheme are as follows: the invention provides CRNN++, a ship plate recognition improvement based on CRNN. CRNN is an end-to-end character recognition model with built-in character separation, so no additional character segmentation labels are needed. Based on the characteristics of Chinese characters, the invention improves CRNN by combining the local semantic information of radicals with the global semantic information of whole single characters; fusing local and global semantics strengthens the extraction of character information and yields more robust character recognition.
Further, in S5.4 above, the cross entropy loss between the local semantic feature map of a single Chinese character and the radical-level local semantics of that character is calculated, that is, the cross entropy loss between the radical local sequence feature map of the single Chinese character and its radical decoding sequence; the radical local semantics are trained through this cross entropy loss, helping the backbone of the local semantic layer learn radical information.
The beneficial effects of adopting the further scheme are as follows: according to the invention, the local semantic layer is iterated by calculating the cross entropy loss, so that the local semantic layer is more accurately output.
In a second aspect, the present invention further provides a ship board detection and recognition device based on multi-layer semantic fusion, which comprises:
a video acquisition module, used for acquiring the ship video captured by a camera;
a picture conversion module, used for processing the ship video to obtain ship pictures, each ship picture being a picture containing a ship obtained from the ship video;
a labeling module, used for training a ship plate detection baseline model and a ship plate identification baseline model by a pre-labeling method based on the ship pictures, and for detecting and identifying the ship pictures with the trained baseline models to obtain basic pictures;
a ship plate detection module, used for training a ship plate detection model and processing the basic pictures with the trained model to obtain ship plate detection maps, wherein the trained detection model processes each basic picture by fusing object semantic features and edge semantic features from 4 different layers of the picture;
a ship plate identification module, used for training a ship plate identification model and identifying the ship plate detection maps with the trained model to obtain ship plate names, wherein the trained identification model identifies each detection map by fusing the local and global semantic information of its Chinese characters.
In a third aspect, the present application further provides an electronic device for solving the above technical problem, where the electronic device includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, and when the processor executes the computer program, the processor implements the method for detecting and identifying a ship board based on multi-layer semantic fusion according to the present application.
In a fourth aspect, the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the ship board detection and recognition method based on multi-layer semantic fusion according to the present application.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
Fig. 1 is a schematic flow chart of a ship board detection and identification method based on multi-layer semantic fusion according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a ship board detection and recognition device based on multi-layer semantic fusion according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a ship board detection model according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a ship board recognition model according to an embodiment of the present invention.
Detailed Description
The principles and features of the present invention are described below with examples given for the purpose of illustration only and are not intended to limit the scope of the invention.
The following describes the technical scheme of the present invention and how the technical scheme of the present invention solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
The scheme provided by the embodiment of the invention can be applied to any application scene requiring data storage. It can be executed by any electronic device, for example a user's terminal device, including at least one of the following: smart phone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart TV, and smart in-vehicle device.
The embodiment of the invention provides a possible implementation manner, as shown in fig. 1, a flow diagram of a ship board detection and identification method based on multi-layer semantic fusion is provided, and the scheme can be executed by any electronic device, for example, a terminal device or a terminal device and a server together. For convenience of description, a method provided by an embodiment of the present invention will be described below by taking a server as an execution body, and the method may include the following steps as shown in a flowchart in fig. 1:
s1: acquiring a ship video acquired by a camera;
the sources of ship videos are as follows: port access vessel video, dock access vessel video, marine vessel video, and shore berth vessel video.
The ship license plate detection and identification are carried out by acquiring the monitoring video of each monitoring place, so that the safety monitoring of the ship is ensured.
S2: processing the ship video to obtain a ship picture, wherein the ship picture is a picture containing a ship and obtained from the ship video;
and inputting the ship video into a ship detection model, and extracting a video frame containing the ship from the video stream by using the ship detection model to obtain a ship picture.
According to the invention, video frames are extracted from the ship video stream as ship pictures for subsequent ship plate detection and identification.
S3: training a ship plate detection baseline model and a ship plate identification baseline model by adopting a pre-labeling method based on the ship picture, and detecting and identifying the ship picture by utilizing the trained ship plate retrieval baseline model and the ship plate identification baseline model to obtain a basic picture; training the ship plate detection baseline model and the ship plate identification baseline model by adopting a pre-labeling method comprises the following specific steps:
s3.1: pre-labeling ship plates in the first number of ship pictures to obtain the first number of first pictures;
s3.2: training an initial ship board detection baseline model and an initial ship board identification baseline model through a first number of first pictures to obtain a first ship board detection baseline model and a first ship board identification baseline model;
s3.3: pre-labeling a second number of ship pictures through a first ship board detection baseline model and a first ship board identification baseline model to obtain a second picture containing first pre-labeling information, and correcting the second picture containing the first pre-labeling information;
s3.4: training the first ship board detection baseline model and the first ship board identification baseline model through the first picture and the corrected second picture to obtain a second ship board detection baseline model and a second ship board identification baseline model;
S3.5: pre-marking a third number of ship pictures through a second ship board detection baseline model and a second ship board identification baseline model to obtain third pictures containing second pre-marking information, and performing false mark deletion on the third pictures containing the second pre-marking information;
s3.6: training the second ship board detection baseline model and the second ship board identification baseline model through the first picture, the corrected second picture and the third picture with the error mark deleted to obtain the ship board detection baseline model and the ship board identification baseline model.
The ship plate detection base line model and the ship plate identification base line model are used for marking the ship plate of the ship picture, so that a basic picture for ship plate detection and identification is obtained.
S4: training a ship plate detection model, and processing the basic picture by using the trained ship plate detection model to obtain a ship plate detection diagram; the trained ship board detection model is based on fusion of object semantic features and edge semantic features of 4 different layers in the basic picture, so that the basic picture is processed; the training ship board detection model specifically comprises the following steps:
S4.1: carrying out data enhancement on the basic picture to obtain a data enhancement picture;
S4.1.1: performing image preprocessing on the basic picture to obtain a preprocessed picture, wherein the image preprocessing comprises rotation, perspective transformation, cropping, flipping and blurring;
s4.1.2: inputting the preprocessed picture into a multi-scale training model, and processing the preprocessed picture by using the dimension calibrated by the multi-scale training model to obtain the preprocessed picture with the calibrated dimension;
s4.1.3: the pixel value of each pixel point of the preprocessed picture with the calibrated size is converted from [0,255] integer data to [ -1,1] floating point data, so as to obtain a numerical value standardized picture;
s4.1.4: generating y_score information, y_thresh information and y_binary information of the numerical value standardized picture by a graph tag information generation method to obtain a data enhancement picture; wherein:
the y_score information is a map with the same size as the numerically standardized picture, in which pixels inside the union of the graphic labels have value 1 and all other pixels have value 0;
the y_thresh information is a map with the same size as the numerically standardized picture: for pixels within a threshold distance of a graphic-label edge, the value is computed from the shortest distance between the pixel and the nearest edge of the labeled graphic, with y_thresh = 0.7 when the shortest distance is 0 and y_thresh = 0.3 when the shortest distance equals the threshold distance; all pixels outside this band have value 0;
the y_binary information is a map with the same size as the numerically standardized picture, in which pixels inside the union of the graphic labels have value 1 and all other pixels have value 0;
s4.2: inputting the data enhancement picture into a basic feature extraction network layer, and carrying out feature extraction on the data enhancement picture by utilizing a ResNet34 network of the basic feature extraction network layer to obtain a feature map containing pyramid features with 4 different levels;
s4.3: inputting the feature images containing 4 pyramid features with different levels into a sampling fusion layer, and carrying out up-sampling fusion on the feature images containing 4 pyramid features with different levels by utilizing a DBFPN network of the sampling fusion layer to obtain a ship board detection image;
s4.4: inputting the ship board detection graph into a plurality of deconvolution layers in parallel, and processing the ship board detection graph with these deconvolution layers to obtain a foreground-background classification score, a segmentation boundary threshold and a binarization segmentation result, wherein the binarization segmentation result is calculated from the foreground-background classification score and the segmentation boundary threshold;
here, score denotes the foreground-background classification score, thresh the segmentation boundary threshold, and bin the binarization segmentation result;
S4.5: constructing a binary classification Loss function BCELoss according to the foreground and background classification score, constructing a regression Loss function L1Loss according to the segmentation boundary threshold, constructing a segmentation Loss function DiceLoss according to the binary segmentation result, constructing a final Loss function according to the binary classification Loss function BCELoss, the regression Loss function L1Loss and the segmentation Loss function DiceLoss, and performing iterative training on the ship board detection model through the final Loss function;
the binary classification loss function BCELoss is shown as follows:

l_bce = -[y1*log(x1) + (1-y1)*log(1-x1)]

wherein x1 is the foreground score in the foreground-background classification score, in the range [0,1], and y1 is the y_score information;
the regression Loss function L1Loss is shown as follows:

l_l1 = |x2 - y2|

wherein x2 is the segmentation boundary threshold, and y2 is the y_thresh information;
the segmentation loss function DiceLoss is shown as follows:

l_dice = 1 - 2*Σ(x3*y3) / (Σx3 + Σy3)

wherein x3 is the binarization segmentation result, with values above 0 taken as foreground and values below 0 as background, and y3 is the y_binary information;
the final loss function is shown as follows:

l = l_bce + l_l1 + l_dice

wherein l_bce is the binary classification loss function BCELoss, l_l1 is the regression loss function L1Loss, and l_dice is the segmentation loss function DiceLoss.
The invention provides a ship board detection improvement method based on DBNet. DBFPN is added to DBNet, fusing object semantic features and edge semantic features at 4 different layers, which improves DBNet's adaptability to scale; meanwhile, learning an adaptive threshold improves DBNet's ability to judge complex backgrounds. The output foreground-background classification score and segmentation boundary threshold are pixel-aligned with the ship board detection map, so their height and width are consistent with those of the ship board detection map and their number of channels is 1.
S5: training a ship plate recognition model, and recognizing the ship plate detection map with the trained ship plate recognition model to obtain the ship plate name; the trained ship plate recognition model fuses the local semantic information and global semantic information of the Chinese characters in the ship plate detection map to recognize the ship plate detection map. Training the ship plate recognition model specifically comprises the following steps:
s5.1: inputting the ship board detection map into a feature extraction network layer, carrying out feature extraction on the ship board detection map by utilizing a ResNet50 network of the feature extraction network layer, and outputting a semantic feature map;
s5.2: inputting the semantic feature map into a context semantic fusion layer, and carrying out context semantic fusion on the semantic feature map by utilizing two layers of BiLSTM in the context semantic fusion layer to obtain a context semantic fusion map;
s5.3: inputting the context semantic fusion map into a global semantic layer, and mapping it to Chinese characters with the global semantic layer to obtain a Chinese character distribution heat map; based on the heat map, the central pixel of each Chinese character is obtained by computing the expectation, a Chinese character region is constructed according to a normal distribution to obtain the feature distribution of each Chinese character, and the global semantic feature map of each Chinese character is generated;
S5.4: inputting the global semantic feature map of each Chinese character into a local semantic layer. The local semantic layer holds a number of constructed radical feature vectors; the global semantic feature map and all radical feature vectors serve as the input of a self-attention mechanism, where the global semantic feature map is the q vector and the radical feature vectors are the k and v vectors, so that the radical information is fused in through the self-attention mechanism. The output of the self-attention mechanism is the local semantic feature map of a single Chinese character. The cross entropy loss between the local semantic feature map of the single Chinese character and the radical-level local semantics of that character is then computed, i.e., the cross entropy loss between the radical sequence feature map of the single Chinese character and the radical decoding sequence of the single Chinese character; the radical semantics are trained through this cross entropy loss, assisting the backbone model of the local semantic layer to learn radical information.
The invention iterates the local semantic layer with this cross entropy loss, making the output of the local semantic layer more accurate.
S5.5: transforming the local semantic feature map to the single-Chinese-character level, and superposing the transformed local semantic feature map and the global semantic feature map to obtain a radical correction feature vector;
S5.6: inputting the radical correction feature vector into a linear layer to obtain the prediction results of single Chinese characters, and combining the prediction results of all the single Chinese characters of the ship plate to obtain the name of the ship plate.
The invention provides CRNN++, a ship board identification improvement method based on CRNN. CRNN is an end-to-end character recognition model that segments characters implicitly, so no additional character segmentation labels are needed. According to the characteristics of Chinese characters, the invention improves CRNN by establishing local semantic information of the radicals alongside global semantic information of the whole single character; fusing the local and global semantics further strengthens the extraction of character information and yields more stable character recognition.
In ship plate detection, through adjacent-layer fusion and global fusion of 4 features at different scales, the method can effectively handle ship plate detection under complex environments and severe scale changes. In ship plate recognition, fusing the local semantics of radicals with the global semantic information of single Chinese characters extracts more detail information, improving recognition problems caused by uneven illumination, occlusion and font size.
The following is a supplementary explanation in connection with specific examples:
s1: acquiring a ship video acquired by a camera;
the sources of ship videos are as follows: port access vessel video, dock access vessel video, marine vessel video, and shore berth vessel video.
S2: processing the ship video to obtain a ship picture, wherein the ship picture is a picture containing a ship and obtained from the ship video; the method comprises the following steps: and inputting the ship video into a ship detection model, extracting a video frame containing the ship by the ship detection model according to the video stream, and outputting to obtain a ship picture.
S3: training a ship plate detection baseline model and a ship plate identification baseline model with a pre-labeling method based on the ship pictures, and detecting and identifying the ship pictures with the trained ship plate detection baseline model and ship plate identification baseline model to obtain basic pictures; training the ship plate detection baseline model and the ship plate identification baseline model with the pre-labeling method comprises the following specific steps:
s3.1: pre-marking the ship plates in 700 ship pictures to obtain 700 first pictures; the pre-labeling can be performed manually, so that the accuracy of model training is guaranteed.
S3.2: training an initial ship board detection baseline model and an initial ship board identification baseline model through 700 first pictures to obtain a first ship board detection baseline model and a first ship board identification baseline model;
S3.3: pre-marking 2300 ship pictures through a first ship plate detection baseline model and a first ship plate identification baseline model to obtain a second picture containing first pre-marking information, and correcting the second picture containing the first pre-marking information;
s3.4: training the first ship plate detection baseline model and the first ship plate identification baseline model through the 700 first pictures and the 2300 corrected second pictures to obtain a second ship plate detection baseline model and a second ship plate identification baseline model;
s3.5: pre-labeling 7000 ship pictures through the second ship plate detection baseline model and the second ship plate identification baseline model to obtain third pictures containing second pre-labeling information, and deleting wrong labels from the 7000 third pictures containing the second pre-labeling information;
s3.6: training the second ship board detection baseline model and the second ship board recognition baseline model through 700 first pictures, 2300 corrected second pictures and 7000 third pictures with error marks deleted to obtain the ship board detection baseline model and the ship board recognition baseline model.
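The three training rounds above can be sketched as a bootstrap loop. This is a hypothetical illustration: `train` and `pre_label` are placeholder stand-ins that only track counts, not the patent's actual detection and recognition baseline models.

```python
# Hypothetical sketch of the three pre-labeling rounds (S3.1 to S3.6).
# train() and pre_label() are stand-ins that only track counts; they are
# not the patent's actual detection/recognition baseline models.

def train(model_state, labeled_sets):
    """Stand-in for one round of baseline-model training."""
    return {"rounds": model_state["rounds"] + 1,
            "seen": model_state["seen"] + sum(len(s) for s in labeled_sets)}

def pre_label(model_state, pictures):
    """Stand-in for pre-labeling pictures with the current baseline model."""
    return [{"pic": p, "label": "auto"} for p in pictures]

def bootstrap(pics_700, pics_2300, pics_7000):
    model = {"rounds": 0, "seen": 0}
    first = [{"pic": p, "label": "manual"} for p in pics_700]   # S3.1
    model = train(model, [first])                               # S3.2
    second = pre_label(model, pics_2300)                        # S3.3, then corrected
    model = train(model, [first, second])                       # S3.4
    third = pre_label(model, pics_7000)                         # S3.5, wrong marks deleted
    model = train(model, [first, second, third])                # S3.6
    return model

final = bootstrap(range(700), range(2300), range(7000))
```

Each round retrains on all pictures labeled so far, so the labeled pool grows from 700 to 3000 to 10000.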
S4: training a ship plate detection model, and processing the basic picture by using the trained ship plate detection model to obtain a ship plate detection diagram; the trained ship board detection model is based on fusion of object semantic features and edge semantic features of 4 different layers in the basic picture, so that the basic picture is processed;
As shown in fig. 4, the training ship plate detection model specifically includes:
s4.1: carrying out data enhancement on the basic picture to obtain a data enhancement picture;
s4.1.1: performing image preprocessing on the basic picture to obtain a preprocessed picture, wherein the image preprocessing comprises rotation, perspective transformation, cropping, flipping and blurring;
s4.1.2: inputting the preprocessed picture into a multi-scale training model with calibrated sizes 480, 640 and 960; one of these sizes is randomly selected as the picture size during training, and the preprocessed picture is processed at the size calibrated by the multi-scale training model to obtain a preprocessed picture with a calibrated size;
s4.1.3: the pixel value of each pixel point of the preprocessed picture with the calibrated size is converted from [0,255] integer data to [ -1,1] floating point data, so as to obtain a numerical value standardized picture;
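A minimal sketch of steps S4.1.2 and S4.1.3: picking one of the calibrated sizes at random and normalizing [0, 255] integer pixels to [-1, 1] floats. NumPy is used for illustration, and division by 127.5 is one common way to realize the stated mapping, not necessarily the patent's exact implementation.

```python
import random
import numpy as np

CALIBRATED_SIZES = [480, 640, 960]  # from S4.1.2

def pick_training_size(rng=random):
    # one calibrated size is randomly chosen per training picture
    return rng.choice(CALIBRATED_SIZES)

def normalize(picture_u8):
    # map [0, 255] integer pixels to [-1, 1] floats (S4.1.3)
    return picture_u8.astype(np.float32) / 127.5 - 1.0

out = normalize(np.array([[0, 128, 255]], dtype=np.uint8))
```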
s4.1.4: generating y_score information, y_thresh information and y_binary information of the numerical value standardized picture by a graph tag information generation method to obtain a data enhancement picture; wherein:
the y_score information has the size of the numerical standardized picture; pixels inside the union of the graphic labels have value 1, and pixels outside have value 0;
The y_thresh information has the size of the numerical standardized picture; for each pixel within a threshold distance of the graphic-label edge, the value is computed from the shortest distance from that pixel to the nearest edge of the labeled graphic: when the shortest distance is 0, y_thresh=0.7, and when the shortest distance equals the threshold distance, y_thresh=0.3; apart from this threshold-distance band around the graphic-label edge, the remaining pixels have value 0;
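The y_thresh rule can be sketched as a function of the per-pixel shortest distance to the label edge. The linear interpolation between the endpoints 0.7 (at distance 0) and 0.3 (at the threshold distance) is an assumption; the text only fixes the two endpoint values.

```python
import numpy as np

def thresh_map_from_distance(dist, threshold_distance):
    # 0.7 at distance 0, down to 0.3 at the threshold distance, 0 outside
    # (linear interpolation between the endpoints is assumed)
    inside = dist <= threshold_distance
    y = 0.7 - 0.4 * (dist / threshold_distance)
    return np.where(inside, y, 0.0)

dist = np.array([0.0, 2.0, 4.0, 8.0])   # shortest distances to the label edge
y = thresh_map_from_distance(dist, threshold_distance=4.0)
```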
s4.2: inputting the data enhancement picture into a basic feature extraction network layer, and carrying out feature extraction on the data enhancement picture by utilizing a ResNet34 network of the basic feature extraction network layer to obtain a feature map containing pyramid features with 4 different levels;
the 4 feature sizes are as follows:
C1=(image_h/4,image_w/4,128);
C2=(image_h/8,image_w/8,256);
C3=(image_h/16,image_w/16,512);
C4=(image_h/32,image_w/32,1024);
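The four shapes above follow from downsampling strides of 4, 8, 16 and 32; a small helper can compute them for any input size (the stride and channel lists simply restate C1 to C4):

```python
def pyramid_shapes(image_h, image_w):
    # restate C1..C4: strides 4/8/16/32 with the listed channel counts
    strides = [4, 8, 16, 32]
    channels = [128, 256, 512, 1024]
    return [(image_h // s, image_w // s, c) for s, c in zip(strides, channels)]

shapes = pyramid_shapes(640, 640)
```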
s4.3: inputting the feature maps containing pyramid features at 4 different levels into a sampling fusion layer, and performing up-sampling fusion on them with the DBFPN network of the sampling fusion layer to obtain a ship board detection map;
s4.3.1: adjacent layer fusion: upsampled high-level, high-dimensional data carries better semantic information about whole objects, while low-level, low-dimensional data carries better object edge information. The high-dimensional data is upsampled and then added to and fused with the low-dimensional data to obtain a feature map with multi-dimensional information.
S4.3.2: global fusion: different high-level data attend to different scopes, some local and some global, and the same holds for the bottom-layer edge information. All high-level semantic information is upsampled and fused with the bottom-layer edge information to obtain a multi-level semantic feature map covering both global and local information.
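A minimal NumPy sketch of the two fusion steps, using nearest-neighbor upsampling and elementwise addition on single-channel maps. It assumes the levels have already been projected to a common channel width, which a real DBFPN would do with learned convolutions; the functions here are illustrative, not the patent's network.

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbor 2x upsampling, a stand-in for the network's upsampling
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def adjacent_fuse(high, low):
    # S4.3.1: upsample the higher-level map and add it to the lower-level map
    return upsample2x(high) + low

def global_fuse(levels):
    # S4.3.2: bring every level to the bottom (largest) resolution and sum,
    # mixing all high-level semantics with the bottom-layer edge information
    target = levels[0]
    out = target.copy()
    for lvl in levels[1:]:
        while lvl.shape[0] < target.shape[0]:
            lvl = upsample2x(lvl)
        out = out + lvl
    return out

c1, c2, c3 = np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))
fused = global_fuse([c1, c2, c3])
```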
S4.4: inputting the ship board detection graph into a plurality of deconvolution layers in parallel, and processing the ship board detection graph with these deconvolution layers to obtain a foreground-background classification score, a segmentation boundary threshold and a binarization segmentation result, wherein the binarization segmentation result is calculated from the foreground-background classification score and the segmentation boundary threshold;
here, score denotes the foreground-background classification score, thresh the segmentation boundary threshold, and bin the binarization segmentation result;
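The binarization formula itself is not reproduced in the text. DBNet, which this method builds on, computes a differentiable binarization as a steep sigmoid of the score/threshold difference; the sketch below makes that assumption (k = 50 follows the DBNet paper, not this patent).

```python
import numpy as np

def differentiable_binarize(score, thresh, k=50.0):
    # steep sigmoid of (score - thresh); assumed DBNet-style formulation
    return 1.0 / (1.0 + np.exp(-k * (score - thresh)))

bin_map = differentiable_binarize(np.array([0.9, 0.1]), np.array([0.5, 0.5]))
```

Scores well above the per-pixel threshold map to values near 1, and scores below it to values near 0, while the whole mapping stays differentiable for training.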
s4.5: constructing a binary classification Loss function BCELoss according to the foreground and background classification score, constructing a regression Loss function L1Loss according to the segmentation boundary threshold, constructing a segmentation Loss function DiceLoss according to the binary segmentation result, constructing a final Loss function according to the binary classification Loss function BCELoss, the regression Loss function L1Loss and the segmentation Loss function DiceLoss, and performing iterative training on the ship board detection model through the final Loss function;
The binary classification loss function BCELoss is shown as follows:

l_bce = -[y1*log(x1) + (1-y1)*log(1-x1)]

wherein x1 is the foreground score in the foreground-background classification score, in the range [0,1], and y1 is the y_score information;
the regression Loss function L1Loss is shown as follows:

l_l1 = |x2 - y2|

wherein x2 is the segmentation boundary threshold, and y2 is the y_thresh information;
the segmentation loss function DiceLoss is shown as follows:

l_dice = 1 - 2*Σ(x3*y3) / (Σx3 + Σy3)

wherein x3 is the binarization segmentation result, with values above 0 taken as foreground and values below 0 as background, and y3 is the y_binary information;
the final loss function is shown as follows:

l = l_bce + l_l1 + l_dice

wherein l_bce is the binary classification loss function BCELoss, l_l1 is the regression loss function L1Loss, and l_dice is the segmentation loss function DiceLoss.
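The three losses and their sum can be sketched in NumPy as follows. The Dice loss uses the standard 1 − 2|X∩Y|/(|X|+|Y|) form, an assumption since the patent text does not reproduce that formula; the epsilon terms are added for numerical safety.

```python
import numpy as np

def bce_loss(x1, y1, eps=1e-7):
    # binary classification loss on the foreground score
    x1 = np.clip(x1, eps, 1 - eps)
    return -np.mean(y1 * np.log(x1) + (1 - y1) * np.log(1 - x1))

def l1_loss(x2, y2):
    # regression loss on the segmentation boundary threshold
    return np.mean(np.abs(x2 - y2))

def dice_loss(x3, y3, eps=1e-7):
    # standard Dice loss on the binarization result (assumed form)
    inter = np.sum(x3 * y3)
    return 1.0 - 2.0 * inter / (np.sum(x3) + np.sum(y3) + eps)

def final_loss(score, y_score, thresh, y_thresh, binary, y_binary):
    return (bce_loss(score, y_score)
            + l1_loss(thresh, y_thresh)
            + dice_loss(binary, y_binary))

# near-perfect predictions give a loss close to 0
loss = final_loss(np.array([0.99, 0.01]), np.array([1.0, 0.0]),
                  np.array([0.5]), np.array([0.5]),
                  np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```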
The invention provides a ship board detection improvement method based on DBNet. DBFPN is added to DBNet, fusing object semantic features and edge semantic features at 4 different layers, which improves DBNet's adaptability to scale; meanwhile, learning an adaptive threshold improves DBNet's ability to judge complex backgrounds.
S5: training a ship plate recognition model, and recognizing the ship plate detection map with the trained ship plate recognition model to obtain the ship plate name; the trained ship plate recognition model fuses the local semantic information and global semantic information of the Chinese characters in the ship plate detection map to recognize the ship plate detection map.
As shown in fig. 5 (in fig. 5, n is an indefinite side length scaled in equal proportion, m is a maximum output character number, 359 is a radical number, 3579 is a total number of characters), the training ship plate recognition model specifically is:
s5.1: inputting the ship board detection map into a feature extraction network layer, carrying out feature extraction on the ship board detection map by utilizing a ResNet50 network of the feature extraction network layer, and outputting a semantic feature map;
the Res50 and the underlying Res50 are slightly different, and considering that the number is included in the ship board identification, the width and height of the number is basically 1:2, wherein the width and height of the number 1 is close to 1:4, so that the last two layers of blocks of the Res50 are scaled from the original (2, 2) scaling to the (2, 1), namely, the width is not downsampled, and downsampled only in the height direction.
S5.2: inputting the semantic feature map into a context semantic fusion layer, and carrying out context semantic fusion on the semantic feature map by utilizing two layers of BiLSTM in the context semantic fusion layer to obtain a context semantic fusion map;
s5.3: inputting the context semantic fusion map into a global semantic layer, and mapping it to Chinese characters with the global semantic layer to obtain a Chinese character distribution heat map; based on the heat map, the central pixel of each Chinese character is obtained by computing the expectation, a Chinese character region is constructed according to a normal distribution to obtain the feature distribution of each Chinese character, and the global semantic feature map of each Chinese character is generated;
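Obtaining each character's central pixel through the expectation can be sketched as the expectation of pixel coordinates under the normalized heat map; the tiny 5x5 heat map is an illustrative stand-in.

```python
import numpy as np

def expected_center(heat):
    # expectation of pixel coordinates under the normalized heat map
    p = heat / heat.sum()
    ys, xs = np.indices(heat.shape)
    return float((p * ys).sum()), float((p * xs).sum())

heat = np.zeros((5, 5))
heat[2, 3] = 1.0          # all heat mass at one pixel
cy, cx = expected_center(heat)
```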
S5.4: the local semantic layer is built on 359 trainable radical feature vectors. The global semantic feature map and the 359 radical feature vectors are taken as the input of a self-attention mechanism, where the global semantic feature map serves as the q vector and the radical feature vectors serve as the k and v vectors; the output of the self-attention mechanism is the local semantic feature map of a single Chinese character. The cross entropy loss between the radical sequence feature map of the single Chinese character and the radical decoding sequence of the single Chinese character is then calculated; the radical semantics are trained through this cross entropy loss, which assists the backbone model of the local semantic layer in learning radical information.
X=Transformer(X,Q,Q)
wherein X is the global semantic feature map, Q is the set of 359 radical feature vectors, and the radical label corresponding to the single character serves as the target of the cross entropy loss.
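A single-head scaled dot-product attention sketch of X = Transformer(X, Q, Q), with the global character features as queries and the radical vectors as keys and values. The single-head form, the 16-dimensional features and the random initialization are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def radical_attention(X, Q):
    # queries: global character features X; keys and values: radical vectors Q
    d = X.shape[-1]
    attn = softmax(X @ Q.T / np.sqrt(d), axis=-1)  # (chars, n_radicals)
    return attn @ Q                                # (chars, d)

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 16))    # 4 characters, 16-dim global features
Q = rng.standard_normal((359, 16))  # 359 trainable radical vectors
local = radical_attention(X, Q)
```

Each output row is a convex combination of radical vectors weighted by the character's affinity to each radical, which is how radical information is mixed into the character representation.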
S5.5: semantic fusion layer: transforming the local semantic feature map to the single-Chinese-character level, and superposing it with the global semantic feature map to obtain a radical correction feature vector;
s5.6: inputting the radical correction feature vector into a linear layer to obtain the prediction results of single Chinese characters, and combining the prediction results of all the single Chinese characters of the ship plate to obtain the name of the ship plate.
X=Linear(X)
wherein X is the feature map obtained by fusing the local semantic feature map and the global semantic feature map, and the single-Chinese-character label of the picture serves as the prediction target.
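Steps S5.5 and S5.6 can be sketched as an elementwise superposition followed by a linear layer and a per-character argmax. The tiny charset, weights and two-character example are hypothetical stand-ins, not the patent's 3579-class layer.

```python
import numpy as np

def predict_characters(local_feat, global_feat, W, b, charset):
    fused = local_feat + global_feat       # radical correction feature vector
    logits = fused @ W + b                 # linear layer
    return "".join(charset[i] for i in logits.argmax(axis=-1))

charset = ["A", "B", "C"]                  # hypothetical 3-class charset
local_feat = np.eye(2, 4)                  # 2 characters, 4-dim features
global_feat = np.zeros((2, 4))
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
b = np.zeros(3)
name = predict_characters(local_feat, global_feat, W, b, charset)
```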
Based on the same principle as the method shown in fig. 1, an embodiment of the invention further provides a ship board detection and identification device based on multi-layer semantic fusion; as shown in fig. 2, the device may comprise:
Video acquisition module: used for acquiring the ship video collected by the camera;
Picture conversion module: used for processing the ship video to obtain a ship picture, the ship picture being a picture containing a ship obtained from the ship video;
Labeling module: used for training a ship plate detection baseline model and a ship plate identification baseline model by a pre-labeling method based on the ship picture, and detecting and identifying the ship picture with the trained ship plate detection baseline model and ship plate identification baseline model to obtain a basic picture;
Ship plate detection module: used for training a ship plate detection model and processing the basic picture with the trained ship plate detection model to obtain a ship plate detection map; the trained ship plate detection model fuses object semantic features and edge semantic features at 4 different layers in the basic picture to process the basic picture;
Ship plate identification module: used for training a ship plate recognition model and recognizing the ship plate detection map with the trained ship plate recognition model to obtain the ship plate name; the trained ship plate recognition model fuses the local semantic information and global semantic information of the Chinese characters in the ship plate detection map to recognize the ship plate detection map.
The ship board detection and identification device based on multi-layer semantic fusion according to the embodiments of the present invention may execute the ship board detection and identification method based on multi-layer semantic fusion according to the embodiments of the present invention, and the implementation principle is similar. The actions executed by each module and unit in the device correspond to the steps in the method; for detailed functional descriptions of each module, reference may be made to the description of the corresponding method shown above, which is not repeated here.
The ship board detection and identification device based on multi-layer semantic fusion may be a computer program (comprising program code) running in computer equipment, for example application software; the device can be used to execute the corresponding steps in the methods provided by the embodiments of the invention.
In some embodiments, the ship board detection and identification device based on multi-layer semantic fusion provided by the embodiments of the present invention may be implemented by combining software and hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to perform the ship board detection and identification method based on multi-layer semantic fusion provided by the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may use one or more application specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field programmable gate arrays (FPGA), or other electronic components.
In other embodiments, the ship board detection and identification device based on multi-layer semantic fusion provided by the embodiments of the present invention may be implemented in software. Fig. 2 shows the device stored in a memory, which may be software in the form of a program, plug-in or a series of modules, including the video acquisition module, picture conversion module, labeling module, ship plate detection module and ship plate identification module, for implementing the method provided by the embodiments of the invention.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The name of a module does not, in some cases, constitute a limitation of the module itself.
Based on the same principles as the methods shown in the embodiments of the present invention, there is also provided in the embodiments of the present invention an electronic device, which may include, but is not limited to: a processor and a memory; a memory for storing a computer program; a processor for executing the method according to any of the embodiments of the invention by invoking a computer program.
In an alternative embodiment, there is provided an electronic device, as shown in fig. 3, the electronic device shown in fig. 3 including: a processor and a memory. Wherein the processor is coupled to the memory, such as via a bus. Optionally, the electronic device may further comprise a transceiver, which may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data, etc. It should be noted that, in practical applications, the transceiver is not limited to one, and the structure of the electronic device does not limit the embodiments of the present invention.
The processor may be a CPU (Central Processing Unit), general purpose processor, DSP (Digital Signal Processor), ASIC (Application Specific Integrated Circuit), FPGA (Field Programmable Gate Array) or other programmable logic device, transistor logic device, hardware component, or any combination thereof, and may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functionality, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
A bus may include a path that communicates information between the components. The bus may be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, etc. Buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 3, but this does not mean there is only one bus or one type of bus.
The memory may be, but is not limited to, ROM (Read Only Memory) or another type of static storage device that can store static information and instructions, RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, EEPROM (Electrically Erasable Programmable Read Only Memory), CD-ROM (Compact Disc Read Only Memory) or other optical disk storage, optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory is used for storing application program codes (computer programs) for executing the scheme of the invention, and the execution is controlled by the processor. The processor is configured to execute the application code stored in the memory to implement what is shown in the foregoing method embodiments.
The electronic device shown in fig. 3 is only an example, and should not impose any limitation on the functions and application scope of the embodiment of the present invention.
Embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, which when run on a computer, causes the computer to perform the corresponding method embodiments described above.
According to another aspect of the present invention, there is also provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the implementation of the various embodiments described above.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
It should be appreciated that the flow charts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer readable storage medium according to embodiments of the present invention may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
The above description is merely illustrative of the preferred embodiments of the present invention and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure of the present invention is not limited to the specific combinations of technical features described above, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the disclosure, for example, solutions in which the above features are replaced with (but not limited to) technical features disclosed in the present invention having similar functions.

Claims (10)

1. A ship board detection and identification method based on multi-layer semantic fusion, characterized by comprising the following steps:
S1: acquiring a ship video collected by a camera;
S2: processing the ship video to obtain a ship picture, wherein the ship picture is a picture containing a ship extracted from the ship video;
S3: training a ship board detection baseline model and a ship board identification baseline model with a pre-labeling method based on the ship picture, and detecting and identifying the ship picture with the trained ship board detection baseline model and ship board identification baseline model to obtain a basic picture;
S4: training a ship board detection model, and processing the basic picture with the trained ship board detection model to obtain a ship board detection map; the trained ship board detection model processes the basic picture based on the fusion of object semantic features and edge semantic features at 4 different levels in the basic picture;
S5: training a ship board recognition model, and recognizing the ship board detection map with the trained ship board recognition model to obtain a ship board name, wherein the trained ship board recognition model recognizes the ship board detection map based on the fusion of the local semantic information and global semantic information of the Chinese characters in the ship board detection map.
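Read as a pipeline, the five steps of claim 1 chain together as sketched below. This is purely illustrative and not part of the claims; every function name is a hypothetical stub standing in for the corresponding model:

```python
def to_ship_pictures(video):          # S2: extract frames containing ships
    return [f"pic<{video}>"]

def baseline_filter(pictures):        # S3: baseline detection/identification
    return [f"base<{p}>" for p in pictures]

def plate_detect(pictures):           # S4: multi-level semantic fusion detection
    return [f"det<{p}>" for p in pictures]

def plate_recognize(detections):      # S5: local/global semantic fusion recognition
    return [f"name<{d}>" for d in detections]

def detect_and_identify(video):       # S1-S5 end to end
    return plate_recognize(plate_detect(baseline_filter(to_ship_pictures(video))))
```

The composition order mirrors the claim: each stage consumes only the output of the previous one.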
2. The ship board detection and recognition method based on multi-layer semantic fusion according to claim 1, wherein in S1 the ship video is sourced from: port entry/exit vessel video, dock entry/exit vessel video, maritime vessel video, and shore-berthed vessel video.
3. The ship board detection and recognition method based on multi-layer semantic fusion according to claim 1, wherein S2 specifically comprises: inputting the ship video into a ship detection model, and extracting video frames containing ships from the video stream with the ship detection model to obtain ship pictures.
4. The ship board detection and recognition method based on multi-layer semantic fusion according to claim 1, wherein training the ship board detection baseline model and the ship board identification baseline model with the pre-labeling method in S3 specifically comprises:
S3.1: pre-labeling the ship boards in a first number of ship pictures to obtain a first number of first pictures;
S3.2: training an initial ship board detection baseline model and an initial ship board identification baseline model on the first number of first pictures to obtain a first ship board detection baseline model and a first ship board identification baseline model;
S3.3: pre-labeling a second number of ship pictures with the first ship board detection baseline model and the first ship board identification baseline model to obtain second pictures containing first pre-labeling information, and correcting the second pictures containing the first pre-labeling information;
S3.4: training the first ship board detection baseline model and the first ship board identification baseline model on the first pictures and the corrected second pictures to obtain a second ship board detection baseline model and a second ship board identification baseline model;
S3.5: pre-labeling a third number of ship pictures with the second ship board detection baseline model and the second ship board identification baseline model to obtain third pictures containing second pre-labeling information, and deleting mislabels from the third pictures containing the second pre-labeling information;
S3.6: training the second ship board detection baseline model and the second ship board identification baseline model on the first pictures, the corrected second pictures and the mislabel-deleted third pictures to obtain the ship board detection baseline model and the ship board identification baseline model.
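The three-round bootstrap of claim 4 can be sketched as follows. This is not part of the claims: `train` and `pre_label` are hypothetical toy stand-ins for the real detection/recognition training and inference, and the manual correction and mislabel-deletion steps are reduced to placeholders:

```python
def train(labeled_sets):
    """Toy 'training': the model state is simply the union of labeled pictures seen."""
    seen = set()
    for s in labeled_sets:
        seen |= set(s)
    return {"seen": seen}

def pre_label(model, pictures):
    """Toy 'pre-labeling': attach a pseudo-label to every picture."""
    return {p: f"label_{p}" for p in pictures}

def bootstrap(batch1, batch2, batch3):
    # Round 1 (S3.1-S3.2): manual labels on the first batch, first baseline
    model1 = train([batch1])
    # Round 2 (S3.3-S3.4): pre-label batch 2, correct by hand, retrain
    corrected2 = set(pre_label(model1, batch2))  # stands in for manual correction
    model2 = train([batch1, corrected2])
    # Round 3 (S3.5-S3.6): pre-label batch 3, delete mislabels, final training
    cleaned3 = set(pre_label(model2, batch3))    # stands in for mislabel deletion
    return train([batch1, corrected2, cleaned3])
```

Each round enlarges the labeled set before retraining, which is the point of the iterative pre-labeling scheme: later baselines see all earlier data plus the newly cleaned batch.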
5. The ship board detection and recognition method based on multi-layer semantic fusion according to claim 1, wherein training the ship board detection model in S4 specifically comprises the following steps:
S4.1: performing data enhancement on the basic picture to obtain a data-enhanced picture;
S4.1.1: performing image preprocessing on the basic picture to obtain a preprocessed picture, wherein the image preprocessing comprises rotation, perspective transformation, cropping, flipping and blurring;
S4.1.2: inputting the preprocessed picture into a multi-scale training model, and processing the preprocessed picture at the scale calibrated by the multi-scale training model to obtain a size-calibrated preprocessed picture;
S4.1.3: converting the pixel value of each pixel of the size-calibrated preprocessed picture from integer data in [0, 255] to floating-point data in [-1, 1] to obtain a numerically standardized picture;
S4.1.4: generating the y_score, y_thresh and y_binary information of the numerically standardized picture by a label-map generation method to obtain the data-enhanced picture; wherein:
the y_score map has the size of the numerically standardized picture; pixels inside the union of the label polygons have the value 1 and all other pixels have the value 0;
the y_thresh map has the size of the numerically standardized picture; for pixels within a threshold distance of a label polygon edge, the value is computed from the shortest distance to the nearest labeled edge, with y_thresh = 0.7 when the shortest distance is 0 and y_thresh = 0.3 when the shortest distance equals the threshold distance; all pixels outside this band have the value 0;
the y_binary map has the size of the numerically standardized picture; pixels inside the union of the label polygons have the value 1 and all other pixels have the value 0;
S4.2: inputting the data-enhanced picture into a basic feature extraction network layer, and extracting features from the data-enhanced picture with the ResNet34 network of the basic feature extraction network layer to obtain feature maps containing pyramid features at 4 different levels;
S4.3: inputting the feature maps containing the pyramid features at 4 different levels into a sampling fusion layer, and performing up-sampling fusion on them with the DBFPN network of the sampling fusion layer to obtain a ship board detection map;
S4.4: inputting the ship board detection map into a plurality of parallel deconvolution layers, and processing the ship board detection map with the deconvolution layers to obtain a foreground-background classification score, a segmentation boundary threshold and a binarization segmentation result, wherein the binarization segmentation result is calculated from the foreground-background classification score and the segmentation boundary threshold by the following formula:
wherein score is the foreground-background classification score, thresh is the segmentation boundary threshold, and bin is the binarization segmentation result;
S4.5: constructing a binary classification loss function BCELoss from the foreground-background classification score, a regression loss function L1Loss from the segmentation boundary threshold, and a segmentation loss function DiceLoss from the binarization segmentation result; constructing the final loss function from BCELoss, L1Loss and DiceLoss; and iteratively training the ship board detection model with the final loss function;
the binary classification loss function BCELoss is:
l_bce = -[y1 * log(x1) + (1 - y1) * log(1 - x1)]
wherein x1 is the foreground score in the foreground-background classification score, in the range [0, 1], and y1 is the y_score information;
the regression loss function L1Loss is:
l_l1 = |x2 - y2|
wherein x2 is the segmentation boundary threshold and y2 is the y_thresh information;
the segmentation loss function DiceLoss is shown as follows:
wherein x3 is the binarization segmentation result, with values greater than 0 being foreground and values less than 0 being background, and y3 is the y_binary information;
the final loss function is:
l = l_bce + l_l1 + l_dice
wherein l_bce is the binary classification loss function BCELoss, l_l1 is the regression loss function L1Loss, and l_dice is the segmentation loss function DiceLoss.
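The composition of the three loss terms in claim 5 can be sketched in NumPy as below. This is not the patented implementation: the patent reproduces its BCE and Dice formulas only as images not available in this text, so the standard textbook forms (negative log-likelihood BCE and 2·intersection/sum Dice) are assumed here:

```python
import numpy as np

def bce_loss(score, y_score, eps=1e-7):
    # Binary cross-entropy on the foreground-background classification score
    s = np.clip(score, eps, 1.0 - eps)
    return -np.mean(y_score * np.log(s) + (1.0 - y_score) * np.log(1.0 - s))

def l1_loss(thresh, y_thresh):
    # L1 regression on the segmentation boundary threshold (l_l1 = |x2 - y2|)
    return np.mean(np.abs(thresh - y_thresh))

def dice_loss(bin_pred, y_binary, eps=1e-7):
    # Standard Dice loss on the binarization segmentation result (assumed form)
    inter = np.sum(bin_pred * y_binary)
    return 1.0 - 2.0 * inter / (np.sum(bin_pred) + np.sum(y_binary) + eps)

def final_loss(score, thresh, bin_pred, y_score, y_thresh, y_binary):
    # l = l_bce + l_l1 + l_dice, with unit weights as in the claim
    return (bce_loss(score, y_score)
            + l1_loss(thresh, y_thresh)
            + dice_loss(bin_pred, y_binary))
```

With perfect predictions (score and binarization equal to their targets, threshold equal to y_thresh) the combined loss is near zero, which is a quick sanity check on the composition.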
6. The ship board detection and recognition method based on multi-layer semantic fusion according to claim 1, wherein training the ship board recognition model in S5 specifically comprises:
S5.1: inputting the ship board detection map into a feature extraction network layer, extracting features from the ship board detection map with the ResNet50 network of the feature extraction network layer, and outputting a semantic feature map;
S5.2: inputting the semantic feature map into a context semantic fusion layer, and performing context semantic fusion on the semantic feature map with the two BiLSTM layers of the context semantic fusion layer to obtain a context semantic fusion map;
S5.3: inputting the context semantic fusion map into a global semantic layer, mapping the context semantic fusion map to Chinese characters with the global semantic layer to obtain a Chinese character distribution heat map, obtaining the central pixel of each Chinese character from the heat map by computing its expectation, constructing each Chinese character region according to a normal distribution to obtain the feature distribution of each Chinese character, and generating a global semantic feature map for each Chinese character;
S5.4: inputting the global semantic feature map of each Chinese character into a local semantic layer, constructing a plurality of radical feature vectors in the local semantic layer, and feeding the global semantic feature map and all the radical feature vectors into a self-attention mechanism, wherein the global semantic feature map serves as the q vector and the radical feature vectors serve as the k and v vectors of the self-attention mechanism; the output of the self-attention mechanism is the local semantic feature map of a single Chinese character;
S5.5: transforming the local semantic feature map to the single-Chinese-character level, and superimposing the transformed local semantic feature map on the global semantic feature map to obtain a radical-corrected feature vector;
S5.6: inputting the radical-corrected feature vector into a linear layer to obtain the prediction result for each single Chinese character, and combining the prediction results of all the single Chinese characters of the ship board to obtain the ship board name.
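The q/k/v usage in S5.4 can be illustrated with a single-head scaled dot-product attention. This is a sketch of the mechanism only, not the patented model: the dimensions and radical vectors below are made-up placeholders:

```python
import numpy as np

def radical_attention(q, K, V):
    """q: (d,) global semantic vector of one Chinese character (the query).
    K, V: (n_radicals, d) radical feature vectors used as keys and values.
    Returns the attention output, i.e. the local semantic feature."""
    scores = K @ q / np.sqrt(q.shape[0])   # scaled dot-product scores
    w = np.exp(scores - scores.max())
    w = w / w.sum()                        # softmax over the radicals
    return w @ V                           # weighted mix of radical values
```

When every key scores equally against the query, the softmax weights are uniform and the output degenerates to the mean of the value vectors; in general the character's global feature selects the radicals most similar to it.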
7. The ship board detection and recognition method based on multi-layer semantic fusion according to claim 6, wherein in S5.4 the cross-entropy loss between the local semantic feature map of a single Chinese character and the radical-level local semantics of that character is calculated, the radical local semantics are trained through this cross-entropy loss, and the backbone model of the local semantic layer is thereby assisted in learning radical information.
8. A ship board detection and identification device based on multi-layer semantic fusion, characterized by comprising:
a video acquisition module, configured to acquire a ship video collected by a camera;
a picture conversion module, configured to process the ship video to obtain a ship picture, wherein the ship picture is a picture containing a ship extracted from the ship video;
a labeling module, configured to train a ship board detection baseline model and a ship board identification baseline model with a pre-labeling method based on the ship picture, and to detect and identify the ship picture with the trained ship board detection baseline model and ship board identification baseline model to obtain a basic picture;
a ship board detection module, configured to train a ship board detection model and process the basic picture with the trained ship board detection model to obtain a ship board detection map, wherein the trained ship board detection model processes the basic picture based on the fusion of object semantic features and edge semantic features at 4 different levels in the basic picture;
a ship board recognition module, configured to train a ship board recognition model and recognize the ship board detection map with the trained ship board recognition model to obtain a ship board name, wherein the trained ship board recognition model recognizes the ship board detection map based on the fusion of the local semantic information and global semantic information of the Chinese characters in the ship board detection map.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-7 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-7.
CN202311037312.5A 2023-08-17 2023-08-17 Ship board detection and identification method and device based on multilayer semantic fusion Pending CN117218636A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311037312.5A CN117218636A (en) 2023-08-17 2023-08-17 Ship board detection and identification method and device based on multilayer semantic fusion


Publications (1)

Publication Number Publication Date
CN117218636A true CN117218636A (en) 2023-12-12

Family

ID=89039822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311037312.5A Pending CN117218636A (en) 2023-08-17 2023-08-17 Ship board detection and identification method and device based on multilayer semantic fusion

Country Status (1)

Country Link
CN (1) CN117218636A (en)

Similar Documents

Publication Publication Date Title
CN111782839B (en) Image question-answering method, device, computer equipment and medium
CN107133622B (en) Word segmentation method and device
CN109902622B (en) Character detection and identification method for boarding check information verification
CN110176027B (en) Video target tracking method, device, equipment and storage medium
CN108229303B (en) Detection recognition and training method, device, equipment and medium for detection recognition network
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
US10896357B1 (en) Automatic key/value pair extraction from document images using deep learning
CN111476067A (en) Character recognition method and device for image, electronic equipment and readable storage medium
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN110084172B (en) Character recognition method and device and electronic equipment
US20220019834A1 (en) Automatically predicting text in images
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN113222055B (en) Image classification method and device, electronic equipment and storage medium
CN115210747B (en) Method and system for digital image processing
CN113903022B (en) Text detection method and system based on feature pyramid and attention fusion
CN116311310A (en) Universal form identification method and device combining semantic segmentation and sequence prediction
CN112597918A (en) Text detection method and device, electronic equipment and storage medium
CN113205095A (en) Training model and character detection method and device
CN111460355A (en) Page parsing method and device
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN111178363A (en) Character recognition method and device, electronic equipment and readable storage medium
CN112215266B (en) X-ray image contraband detection method based on small sample learning
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN115861922A (en) Sparse smoke and fire detection method and device, computer equipment and storage medium
CN117218636A (en) Ship board detection and identification method and device based on multilayer semantic fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination