CN115410140A - Image detection method, device, equipment and medium based on marine target - Google Patents

Image detection method, device, equipment and medium based on marine target

Info

Publication number
CN115410140A
Authority
CN
China
Prior art keywords
image
historical
image detection
data set
split
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211359300.XA
Other languages
Chinese (zh)
Inventor
韦一
孟凡彬
张妙藏
武智强
侯春艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
707th Research Institute of CSIC
707th Research Institute of CSIC Jiujiang Branch
Original Assignee
707th Research Institute of CSIC
707th Research Institute of CSIC Jiujiang Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 707th Research Institute of CSIC, 707th Research Institute of CSIC Jiujiang Branch filed Critical 707th Research Institute of CSIC
Priority to CN202211359300.XA priority Critical patent/CN115410140A/en
Publication of CN115410140A publication Critical patent/CN115410140A/en
Pending legal-status Critical Current

Classifications

    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image detection method, device, equipment and medium based on a marine target. Video stream data of the shipborne camera to be detected is acquired, and an original image corresponding to the video stream data is obtained through an OpenCV image extraction technique; the original image is divided into N x N split images; each split image is input into a pre-trained image detection depth neural network model to obtain each split image detection result; and the split image detection results are merged to obtain an image detection result. The technical scheme addresses the problems that marine target detection is unstable, marine target training data are difficult to obtain, and data information is highly redundant: data cleaning and data enhancement processes for marine targets are constructed, improving the stability and reliability of marine target image detection, so that image detection is performed better, user experience is improved, and powerful support is provided for improving the autonomy of intelligent ships.

Description

Image detection method, device, equipment and medium based on marine target
Technical Field
The invention relates to the technical field of image recognition, in particular to an image detection method, device, equipment and medium based on a marine target.
Background
In recent years, with the development of the Internet of Things and artificial intelligence, the intelligence of ships has also improved. Vision sensors such as electro-optical pan-tilt units and cameras play an important role in ship intelligence: they provide the captain with image information of the ship's surroundings and assist in maneuvering. Because the distance between ships is relatively large, sea-surface targets generally occupy only a few pixels in the displayed image, which easily causes visual fatigue for observers.
In the process of implementing the invention, the inventor found the following defects in the prior art: methods that enhance and display target information on the image, such as image target detection algorithms, can provide information support for navigation, reduce the time a crew needs to translate digital information into an understanding of reality, avoid abstract data distracting the crew's attention, and assist the crew in navigating. However, such methods suffer from unstable small-target detection, difficulty in obtaining small-target training data, and generally poor human-computer interaction.
Disclosure of Invention
The invention provides an image detection method, device, equipment and medium based on a marine target, which improve the stability and reliability of marine target image detection and improve user experience.
According to an aspect of the present invention, there is provided a method for detecting an image based on a marine target, including:
acquiring video stream data of a to-be-detected shipborne camera, and acquiring an original image corresponding to the video stream data through an OpenCV image extraction technology;
dividing the original image into N x N split images, wherein N is an integer greater than 1;
inputting each split image into a pre-trained image detection depth neural network model respectively to obtain each split image detection result;
and merging the split image detection results to obtain image detection results.
According to another aspect of the present invention, there is provided an image detecting apparatus based on a marine target, comprising:
the original image determining module is used for acquiring video stream data of a to-be-detected shipborne camera and obtaining an original image corresponding to the video stream data through an OpenCV image extraction technology;
the split image dividing module is used for dividing the original image into N x N split images, wherein N is an integer larger than 1;
the split image detection result determining module is used for respectively inputting each split image into a pre-trained image detection depth neural network model to obtain each split image detection result;
and the image detection result determining module is used for carrying out merging operation on the split image detection results to obtain image detection results.
According to another aspect of the present invention, there is provided an electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the image detection method based on the marine target according to any embodiment of the present invention when executing the computer program.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the image detection method based on the marine target according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiment of the invention, video stream data of the shipborne camera to be detected is acquired, and the original image corresponding to the video stream data is obtained through the OpenCV image extraction technique; the original image is divided into N x N split images; each split image is input into the pre-trained image detection depth neural network model to obtain each split image detection result; and the split image detection results are merged to obtain the image detection result. The technical scheme addresses the problems that marine target detection is unstable, marine target training data are difficult to obtain, and data information is highly redundant: data cleaning and data enhancement processes for marine targets are constructed, improving the stability and reliability of marine target image detection, so that image detection is performed better, user experience is improved, and powerful support is provided for improving the autonomy of intelligent ships.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present invention, nor are they intended to limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an image detection method based on a marine target according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an image detection apparatus based on a marine target according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It is to be understood that the terms "target," "current," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of an image detection method based on a marine target according to an embodiment of the present invention. The embodiment is applicable to image detection of marine targets and to the acquisition of training data. The method may be performed by an image detection apparatus based on a marine target, which may be implemented in the form of hardware and/or software.
Accordingly, as shown in Fig. 1, the method comprises:
s110, video stream data of the to-be-detected shipborne camera are obtained, and an original image corresponding to the video stream data is obtained through an OpenCV image extraction technology.
The video stream data may be a video stream collected by the shipborne camera. The OpenCV image extraction technique extracts multiple images from the video stream data and may acquire original images according to a preset extraction rule. The original image may be an image extracted from the video stream data.
Specifically, the OpenCV library can be used on multiple platforms, for example Linux, Windows, Solaris, and IRIX. The OpenCV image extraction technique can process raster and vector data, that is, browse a raster image and then overlay vector layers on it, and it supports 2D and 3D display with good performance on large data sets.
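As an illustration (not part of the patent text), frame extraction with OpenCV's VideoCapture API might look like the following minimal Python sketch; the sampling interval, function name, and stream URL handling are assumptions:

```python
import cv2

def extract_frames(stream_url: str, every_n: int = 25):
    """Yield one frame out of every `every_n` frames read from a video stream."""
    cap = cv2.VideoCapture(stream_url)  # accepts a file path or an RTSP/HTTP stream URL
    idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:  # stream ended or a read failed
            break
        if idx % every_n == 0:
            yield frame  # a BGR ndarray; this is one "original image"
        idx += 1
    cap.release()
```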
And S120, dividing the original image into N-by-N split images.
Wherein N is an integer greater than 1.
The split image may be an image obtained by splitting the original image. Specifically, assuming N = 2, the original image is divided into 2 x 2, that is, 4 split images. Similarly, for N = 3 the original image is divided into 3 x 3, that is, 9 split images. N may be set by the system or as needed, and is not specifically limited in this embodiment.
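A minimal sketch of the N x N division, assuming the frame height and width are divisible by N (the patent does not specify how remainder pixels are handled):

```python
import numpy as np

def split_image(image: np.ndarray, n: int) -> list:
    """Divide an image into n * n equally sized tiles, returned in row-major order."""
    h, w = image.shape[:2]
    th, tw = h // n, w // n  # tile height and tile width
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(n) for c in range(n)]
```

For N = 2 this returns the 4 tiles of the example above.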
And S130, respectively inputting each split image into a pre-trained image detection depth neural network model to obtain a detection result of each split image.
The image detection depth neural network model can be a model capable of detecting small target images of the split images. The split image detection result may be a detection result obtained by performing small target image detection on each split image through the image detection depth neural network model.
In this embodiment, assuming that an original image is split into 4 split images, the 4 split images need to be respectively input into a pre-trained image detection depth neural network model, so as to obtain 4 split image detection results corresponding to the 4 split images respectively.
And S140, merging the split image detection results to obtain image detection results.
The image detection result may be a detection result that reflects the corresponding original image; specifically, it is obtained by merging the split image detection results.
Continuing the previous example, the 4 split image detection results are merged to obtain the image detection result corresponding to the original image, whereby the marine target in the original image is detected.
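The patent does not spell out the coordinate bookkeeping of the merging operation; one plausible sketch, assuming each tile's detections are (x1, y1, x2, y2, score, cls) boxes in tile-local pixel coordinates and tiles are ordered row-major as in the split sketch above:

```python
def merge_detections(tile_results, n, tile_h, tile_w):
    """Shift per-tile boxes back into original-image coordinates and concatenate them."""
    merged = []
    for idx, boxes in enumerate(tile_results):  # tiles in row-major order
        row, col = divmod(idx, n)
        dx, dy = col * tile_w, row * tile_h     # pixel offset of this tile's origin
        for x1, y1, x2, y2, score, cls in boxes:
            merged.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy, score, cls))
    return merged
```

In practice a non-maximum-suppression pass over the merged boxes would likely also be needed for targets that straddle tile borders, although the patent does not discuss this.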
Optionally, before the step of inputting each split image into the pre-trained image detection depth neural network model to obtain each split image detection result, the method further includes: acquiring historical video stream data of the shipborne camera, and performing video frame extraction on the historical video stream data through the FFMPEG frame extraction library to obtain a historical image data set; performing data cleaning on the historical images in the historical image data set through a bag-of-words model to obtain a historical cleaning image data set; performing data enhancement processing on the historical cleaning images in the historical cleaning image data set to obtain a historical enhanced image data set; and inputting the historical enhanced images in the historical enhanced image data set into an improved YOLOV7-tiny image detection depth neural network, and obtaining a trained image detection depth neural network model when the calculated CIOU loss function value meets the loss function value condition.
The historical video stream data may be video stream data previously acquired through the shipborne camera. FFMPEG frame extraction is used to extract video key frames, extract scene-change frames, sample frames uniformly in time, and extract frames at specified times. The FFMPEG library is a set of open-source software for encoding, decoding, multiplexing and converting audio and video data, and provides very comprehensive audio and video processing functions. FFMPEG supports common audio/video formats and codecs, can read many audio and video formats, and a large amount of software relies on it for reading audio and video.
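For example (an assumption rather than the patent's exact command), uniform temporal sampling with the real ffmpeg command-line tool can be driven from Python as follows; the output naming pattern and rate are illustrative:

```python
import subprocess

def sample_history_frames(video_path: str, out_dir: str, fps: float = 1.0) -> None:
    """Extract frames uniformly in time (here `fps` frames per second) as JPEG files."""
    subprocess.run(
        ["ffmpeg", "-i", video_path,
         "-vf", f"fps={fps}",            # ffmpeg's uniform frame-sampling filter
         f"{out_dir}/frame_%06d.jpg"],   # numbered output images
        check=True,
    )
```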
Specifically, the historical image data set may be a data set composed of historical images. The bag-of-words model is a simplified representation model used in natural language processing and information retrieval. The historical cleaning image data set may be a data set composed of historical cleaning images, i.e., images obtained by performing data cleaning on the historical images. The historical enhanced image data set may be a data set composed of historical enhanced images, i.e., images obtained by performing data enhancement on the historical cleaning images. The improved YOLOV7-tiny image detection depth neural network is a neural network obtained by improving the structure of the YOLOV7-tiny image detection depth neural network: specifically, up-sampling processing is added to the 17th to 20th layers, and the feature map of the 20th layer is feature-fused with the feature map of the 2nd layer.
The CIOU loss function value is a regression localization loss computed from several geometric parameters, such as the overlap area, the center-point distance, the aspect ratio, and the size of the detection boxes.
In this embodiment, a historical image data set is obtained by acquiring historical video stream data of the shipborne camera and performing video frame extraction through the FFMPEG frame extraction technique; the historical images are then data-cleaned and data-enhanced and input into the improved YOLOV7-tiny image detection depth neural network for training, thereby obtaining a trained image detection depth neural network model. When acquired video stream data needs to be detected in real time, it is input directly into the trained image detection depth neural network model for detection, which has the advantage that images of marine targets in the video stream data can be detected accurately and quickly.
Optionally, the performing, by using a bag-of-words model, data cleaning on the historical images in the historical image data set to obtain a historical cleaning image data set specifically includes: respectively calculating bag-of-words model scores between every two historical image pairs in the historical image data set through the bag-of-words model; identifying a target historical image pair whose bag-of-words model score is greater than or equal to a bag-of-words model score threshold, and deleting one target historical image in the target historical image pair; identifying a target historical image pair whose bag-of-words model score is smaller than the bag-of-words model score threshold, and retaining both target historical images in the target historical image pair; and merging the retained target historical images to obtain the historical cleaning image data set.
The historical image pairs may be pairs of historical images in the historical image data set combined two by two. The bag-of-words model score reflects the similarity between a pair of historical images. The bag-of-words model score threshold is a similarity score threshold preset by the system. Specifically, when the bag-of-words model score is greater than or equal to the threshold, the two historical images are highly similar, so one of them is deleted and the other retained. When the score is smaller than the threshold, the two historical images have low similarity, so both are retained.
The benefit of this arrangement is: by calculating the bag-of-words model score between every two historical images and comparing it with the bag-of-words model score threshold, one image of each highly similar pair is deleted and the other retained. This avoids wasting detection time on repeated detection of near-duplicate images, improves the efficiency of marine target detection, and saves time cost.
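A non-authoritative sketch of the cleaning pass; the pairwise loop is an assumption, and bow_score is the similarity function sketched after the scoring formulas below (higher means more similar):

```python
def clean_dataset(images: list, vectors: list, score_threshold: float) -> list:
    """Drop one image of every pair whose bag-of-words similarity meets the threshold.

    `vectors[k]` is the TF-IDF bag-of-words vector of `images[k]`.
    """
    keep = [True] * len(images)
    for i in range(len(images)):
        if not keep[i]:
            continue
        for j in range(i + 1, len(images)):
            if keep[j] and bow_score(vectors[i], vectors[j]) >= score_threshold:
                keep[j] = False  # near-duplicate pair: delete one, retain the other
    return [img for img, kept in zip(images, keep) if kept]
```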
Optionally, the calculating of bag-of-words model scores between every two historical image pairs in the historical image data set includes: according to the formula

$$s(A,B) = 1 - \frac{1}{2}\sum_{w_i \in V_A \cup V_B}\left|\frac{\eta_i(A)}{\lVert\eta(A)\rVert_1} - \frac{\eta_i(B)}{\lVert\eta(B)\rVert_1}\right|, \qquad \eta_i = \mathrm{TF}_i \cdot \mathrm{IDF}_i,$$

respectively calculating the bag-of-words model score s(A,B) between every two historical image pairs in the historical image data set; wherein TF_i is the frequency with which the word w_i appears in a single historical image; IDF_i is the word inverse number (inverse document frequency); η_i is the TF-IDF frequency of the word in the historical image; w_{A_i} is a word in the historical image A; w_{B_i} is a word in the historical image B; V_A is the bag-of-words collection in the historical image A; V_B is the bag-of-words collection in the historical image B; n is the number of occurrences of the bag of words; A is the historical image A; B is the historical image B; and i is the variable parameter corresponding to the word.
In this embodiment, the frequency of occurrence of different words in the history image a may be determined according to the history image a, and a collection of bags of words in the history image a may be obtained. And obtaining a word bag collection in the historical image B in the same way. According to the word bag collection between the historical image A and the historical image B, a word bag model score between the historical image A and the historical image B can be calculated.
Optionally, the calculating of the bag-of-words model score s(A,B) according to the above formula includes: according to the formula

$$\mathrm{TF}_i = \frac{n_{w_i}^{A}}{n_{w_i}}$$

calculating the frequency TF_i with which the word w_i appears in the target single historical image; wherein n_{w_i}^{A} is the number of occurrences of the word w_i in the target single historical image, and n_{w_i} is the number of occurrences of the word w_i in all historical images; and according to the formula

$$\mathrm{IDF}_i = \log\frac{N}{N_{w_i}}$$

calculating the word inverse number IDF_i; wherein N is the total number of historical images and N_{w_i} is the number of historical images in which the word w_i appears.
In the present embodiment, the number of occurrences n_{w_i}^{A} of the word w_i in the target single historical image and its number of occurrences n_{w_i} in all historical images are used to calculate the frequency TF_i of the word in the target single historical image and the word inverse number IDF_i.
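Tying the formulas above together, a minimal Python sketch, assuming each image has already been quantized into visual-word occurrence counts by some feature extractor (the dictionary layout and function names are illustrative):

```python
import math

def tf_idf_vector(counts: dict, corpus_counts: dict,
                  n_images: int, image_freq: dict) -> dict:
    """eta_i = TF_i * IDF_i per visual word, following the formulas above.

    counts:        occurrences of each word in this image
    corpus_counts: occurrences of each word over all historical images
    image_freq:    number of historical images in which each word appears
    """
    return {w: (c / corpus_counts[w]) * math.log(n_images / image_freq[w])
            for w, c in counts.items()}

def bow_score(vec_a: dict, vec_b: dict) -> float:
    """L1-normalized bag-of-words similarity; 1 means identical word distributions."""
    na = sum(abs(v) for v in vec_a.values()) or 1.0
    nb = sum(abs(v) for v in vec_b.values()) or 1.0
    words = set(vec_a) | set(vec_b)
    dist = sum(abs(vec_a.get(w, 0.0) / na - vec_b.get(w, 0.0) / nb) for w in words)
    return 1.0 - 0.5 * dist
```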
Optionally, the performing data enhancement processing on the historical cleaning image in the historical cleaning image data set to obtain a historical enhanced image data set includes: performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through Gaussian blur operation to obtain a first historical enhancement image; performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through brightness adjustment operation to obtain a second historical enhanced image; performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through noise point increasing operation to obtain a third historical enhancement image; performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through copying operation to obtain a fourth historical enhancement image; and merging the historical cleaning image, the first historical enhanced image, the second historical enhanced image, the third historical enhanced image and the fourth historical enhanced image to obtain a historical enhanced image data set.
The first history enhanced image may be a history enhanced image obtained by performing data enhancement processing on the history cleaned image through a gaussian blur operation. The second history enhanced image may be a history enhanced image obtained by performing data enhancement processing on the history cleaned image by adjusting brightness. The third history enhanced image can be a history enhanced image obtained by performing data enhancement processing on the history cleaning image through noise increasing operation. The fourth history enhanced image may be a history enhanced image obtained by subjecting the history cleaned image to data enhancement processing by a copy operation.
In the present embodiment, the first, second, third and fourth historical enhanced images are obtained by subjecting the historical cleaning image to a Gaussian blur operation, a brightness adjustment operation, a noise-adding operation, and a copy operation, respectively. Each historical cleaning image is thus processed into 4 different historical enhanced images, which are merged with the original historical cleaning image, so that four enhanced images are added for every cleaned image and the historical enhanced image data set is correspondingly enlarged.
The advantage of such an arrangement is: by performing data enhancement processing on the historical cleaning images, different types of historical enhanced images can be obtained, so that image detection of marine targets can be performed more accurately and comprehensively.
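A hedged OpenCV/NumPy sketch of the four enhancement operations; the kernel size, brightness offset, and noise standard deviation are illustrative assumptions, not values from the patent:

```python
import cv2
import numpy as np

def augment(image: np.ndarray) -> list:
    """Return the original image plus the four enhanced variants described above."""
    blurred = cv2.GaussianBlur(image, (5, 5), 0)               # Gaussian blur
    brighter = cv2.convertScaleAbs(image, alpha=1.0, beta=40)  # brightness adjustment
    noise = np.random.normal(0, 10, image.shape)               # additive Gaussian noise
    noisy = np.clip(image.astype(np.int16) + noise.astype(np.int16),
                    0, 255).astype(np.uint8)
    copied = image.copy()                                      # plain copy
    return [image, blurred, brighter, noisy, copied]
```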
Optionally, the inputting of the historical enhanced images in the historical enhanced image data set into the improved YOLOV7-tiny image detection depth neural network, and obtaining the trained image detection depth neural network model when the calculated CIOU loss function value satisfies the loss function value condition, includes: inputting the historical enhanced images in the historical enhanced image data set into the improved YOLOV7-tiny image detection depth neural network, wherein the historical enhanced images have a resolution of 1280 x 1280, and the improved YOLOV7-tiny image detection depth neural network is obtained by improving the YOLOV7-tiny image detection depth neural network; respectively adding up-sampling processing to the first feature maps of the 17th to 20th layers in the improved YOLOV7-tiny image detection depth neural network; obtaining a second feature map at the 20th layer of the improved YOLOV7-tiny image detection depth neural network, and adding a processing operation of performing feature fusion between the second feature map and the third feature map corresponding to the 2nd layer;
according to the formula

$$L_{\mathrm{CIOU}} = 1 - \mathrm{IOU} + \frac{\rho^2(c,d)}{C^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1-\mathrm{IOU}) + v},$$

calculating a CIOU loss function value; wherein w is the width of the inference box, h is the height of the inference box, w^{gt} is the width of the real box, h^{gt} is the height of the real box, v is the similarity measurement weight of the width-height ratio between the inference box and the real box, IOU is the intersection-over-union ratio, α is a weight parameter, ρ is the Euclidean distance between the center points c and d of the inference box and the real box, and C is the diagonal length of the minimum enclosing rectangle of the inference box and the real box; and when the calculated CIOU loss function value meets the loss function value condition, obtaining a trained image detection depth neural network model.
In this embodiment, the improved YOLOV7-tiny image detection depth neural network is obtained by improving the YOLOV7-tiny image detection depth neural network. Specifically, the first improvement is to add up-sampling processing to the first feature maps of the 17th to 20th layers; that is, the 17th, 18th, 19th and 20th layers each receive an up-sampled feature map. The second improvement is to obtain a second feature map at the 20th layer of the improved YOLOV7-tiny image detection depth neural network and add a processing operation of performing feature fusion between the second feature map and the third feature map corresponding to the 2nd layer.
In addition, in this embodiment the CIOU loss function value is calculated, and whether the image detection depth neural network model has finished training is determined by comparing the calculated CIOU loss function value with the loss function value condition. When the calculated CIOU loss function value meets the loss function value condition, the model is determined to be trained.
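For reference, a standalone Python sketch of the CIOU computation matching the formula above, with boxes given as (x1, y1, x2, y2) and assumed to have positive width and height; this is an illustrative implementation, not the patent's training code:

```python
import math

def ciou_loss(pred: tuple, gt: tuple) -> float:
    """CIOU loss = 1 - IOU + rho^2 / C^2 + alpha * v for one box pair."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)
    # squared center distance over squared diagonal of the minimum enclosing box
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    c2 = (max(px2, gx2) - min(px1, gx1)) ** 2 + (max(py2, gy2) - min(py1, gy1)) ** 2
    # aspect-ratio consistency term and its weight
    v = (4.0 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1.0 - iou) + v) if v > 0 else 0.0
    return 1.0 - iou + rho2 / c2 + alpha * v
```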
The advantage of such an arrangement is: the original YOLOV7-tiny image detection depth neural network detects targets at only 3 output scales, while the improved YOLOV7-tiny image detection depth neural network detects at 4. Therefore, image detection of marine targets in the collected video stream data can be performed more accurately, and image detection efficiency is improved.
According to the technical scheme of the embodiment of the invention, video stream data of the shipborne camera to be detected is acquired, and the original image corresponding to the video stream data is obtained through the OpenCV image extraction technique; the original image is divided into N x N split images; each split image is input into the pre-trained image detection depth neural network model to obtain each split image detection result; and the split image detection results are merged to obtain the image detection result. The technical scheme addresses the problems that marine target detection is unstable, marine target training data are difficult to obtain, and data information is highly redundant: data cleaning and data enhancement processes for marine targets are constructed, improving the stability and reliability of marine target image detection, so that image detection is performed better, user experience is improved, and powerful support is provided for improving the autonomy of intelligent ships.
Example two
Fig. 2 is a schematic structural diagram of an image detection apparatus based on a marine target according to a second embodiment of the present invention. The image detection apparatus based on the marine target provided by the embodiment of the invention can be realized by software and/or hardware, and can be configured in a server or a terminal device to implement the image detection method based on the marine target in the embodiment of the invention. As shown in Fig. 2, the apparatus includes: an original image determining module 210, a split image dividing module 220, a split image detection result determining module 230, and an image detection result determining module 240.
The original image determining module 210 is configured to obtain video stream data of a to-be-detected shipborne camera, and obtain an original image corresponding to the video stream data through an OpenCV image extraction technology;
a split image dividing module 220, configured to divide the original image into N × N split images, where N is an integer greater than 1;
a split image detection result determining module 230, configured to input each split image into a pre-trained image detection depth neural network model, respectively, to obtain each split image detection result;
and an image detection result determining module 240, configured to perform a merging operation on each split image detection result to obtain an image detection result.
According to the technical scheme of the embodiment of the invention, the video stream data of the to-be-detected shipborne camera is obtained, and the original image corresponding to the video stream data is obtained through the OpenCV image extraction technology; dividing the original image into N x N block split images; inputting each split image into a pre-trained image detection depth neural network model respectively to obtain each split image detection result; and merging the split image detection results to obtain an image detection result. According to the technical scheme, aiming at the problems that the offshore target detection effect is unstable, offshore target training data are difficult to obtain, and the data information redundancy is high, the data cleaning and data enhancing processes of the offshore target are constructed, and the stability and reliability of offshore target image detection are improved, so that the image detection can be better performed, the user experience is improved, and powerful support is provided for improving the degree of autonomy of the intelligent ship.
Optionally, the image detection depth neural network model determining module may specifically include: a historical image data set determining unit, configured to obtain historical video stream data of a ship-borne camera before the split images are input into a pre-trained image detection deep neural network model to obtain detection results of the split images, and perform video frame extraction on the historical video stream data through an FFMPEG frame extraction technical library to obtain a historical image data set; the historical cleaning image data set determining unit is used for carrying out data cleaning on the historical images in the historical image data set through a bag-of-word model to obtain a historical cleaning image data set; the historical enhanced image data set determining unit is used for performing data enhancement processing on the historical cleaning images in the historical cleaning image data set to obtain a historical enhanced image data set; and the image detection depth neural network model determining unit is used for inputting the historical enhancement images in the historical enhancement image data set into the improved YOLOV7-tiny image detection depth neural network, and obtaining the trained image detection depth neural network model when the calculated CIOU loss function value meets the loss function value condition.
Optionally, the historical cleaning image data set determining unit may specifically include: the bag-of-words model score calculating subunit is used for respectively calculating bag-of-words model scores between every two historical image pairs in the historical image data set through a bag-of-words model; the target historical image deleting subunit is used for identifying a target historical image pair of which the bag-of-words model score is greater than or equal to a bag-of-words model score threshold value, and deleting one target historical image in the target historical image pair; the target historical image retaining subunit is used for identifying a target historical image pair of which the bag-of-words model score is smaller than a bag-of-words model score threshold value, and retaining two target historical images in the target historical image pair; and the historical cleaning image data set determining subunit is used for merging the reserved target historical images to obtain the historical cleaning image data set.
Optionally, the bag-of-words model score calculating subunit may be specifically configured to: according to the formula

$$s(A,B) = 1 - \frac{1}{2}\sum_{w_i \in V_A \cup V_B}\left|\frac{\eta_i(A)}{\lVert\eta(A)\rVert_1} - \frac{\eta_i(B)}{\lVert\eta(B)\rVert_1}\right|, \qquad \eta_i = \mathrm{TF}_i \cdot \mathrm{IDF}_i,$$

respectively calculate the bag-of-words model score s(A,B) between every two historical image pairs in the historical image data set; wherein TF_i is the frequency with which the word w_i appears in a single historical image; IDF_i is the word inverse number (inverse document frequency); η_i is the TF-IDF frequency of the word in the historical image; w_{A_i} is a word in the historical image A; w_{B_i} is a word in the historical image B; V_A is the bag-of-words collection in the historical image A; V_B is the bag-of-words collection in the historical image B; n is the number of occurrences of the bag of words; A is the historical image A; B is the historical image B; and i is the variable parameter corresponding to the word.
Optionally, the bag-of-words model score calculating subunit may be further specifically configured to: according to the formula

$$\mathrm{TF}_i = \frac{n_{w_i}^{A}}{n_{w_i}}$$

calculate the frequency TF_i with which the word w_i appears in the target single historical image, wherein n_{w_i}^{A} is the number of occurrences of the word w_i in the target single historical image and n_{w_i} is the number of occurrences of the word w_i in all historical images; and according to the formula

$$\mathrm{IDF}_i = \log\frac{N}{N_{w_i}}$$

calculate the word inverse number IDF_i, wherein N is the total number of historical images and N_{w_i} is the number of historical images in which the word w_i appears.
Optionally, the history enhanced image data set determining unit may be specifically configured to: performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through Gaussian blur operation to obtain a first historical enhancement image; performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through brightness adjustment operation to obtain a second historical enhancement image; performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through noise point increasing operation to obtain a third historical enhancement image; performing data enhancement processing on the historical cleaning image in the historical cleaning image data set through copying operation to obtain a fourth historical enhancement image; and merging the historical cleaning image, the first historical enhanced image, the second historical enhanced image, the third historical enhanced image and the fourth historical enhanced image to obtain a historical enhanced image data set.
Optionally, the image detection depth neural network model determining unit may be specifically configured to: input the historical enhanced images in the historical enhanced image data set into the improved YOLOV7-tiny image detection depth neural network, wherein the historical enhanced images have a resolution of 1280 x 1280 and the improved YOLOV7-tiny image detection depth neural network is obtained by improving the YOLOV7-tiny image detection depth neural network; add up-sampling processing to the first feature maps of the 17th to 20th layers in the improved YOLOV7-tiny image detection depth neural network; obtain a second feature map at the 20th layer of the improved YOLOV7-tiny image detection depth neural network and add a processing operation of performing feature fusion between the second feature map and the third feature map corresponding to the 2nd layer; according to the formula

$$L_{\mathrm{CIOU}} = 1 - \mathrm{IOU} + \frac{\rho^2(c,d)}{C^2} + \alpha v, \qquad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \qquad \alpha = \frac{v}{(1-\mathrm{IOU}) + v},$$

calculate a CIOU loss function value, wherein w is the width of the inference box, h is the height of the inference box, w^{gt} is the width of the real box, h^{gt} is the height of the real box, v is the similarity measurement weight of the width-height ratio between the inference box and the real box, IOU is the intersection-over-union ratio, α is a weight parameter, ρ is the Euclidean distance between the center points c and d of the inference box and the real box, and C is the diagonal length of the minimum enclosing rectangle of the inference box and the real box; and when the calculated CIOU loss function value meets the loss function value condition, obtain a trained image detection depth neural network model.
The image detection device based on the marine target provided by the embodiment of the invention can execute the image detection method based on the marine target provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE III
Fig. 3 shows a schematic structural diagram of an electronic device 10 that can be used to implement a third embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a marine target-based image detection method.
In some embodiments, the marine target-based image detection method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the above-described marine target-based image detection method may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the marine target-based image detection method by any other suitable means (e.g., by means of firmware).
The method comprises the following steps: acquiring video stream data of a to-be-detected shipborne camera, and obtaining an original image corresponding to the video stream data through an OpenCV image extraction technology; dividing the original image into N x N block split images; inputting each split image into a pre-trained image detection depth neural network model to obtain each split image detection result; and merging the split image detection results to obtain image detection results.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Example four
A fourth embodiment of the present invention further provides a computer-readable storage medium containing computer-readable instructions, which when executed by a computer processor, perform a method for image detection based on a marine target, the method including: acquiring video stream data of a to-be-detected shipborne camera, and acquiring an original image corresponding to the video stream data through an OpenCV image extraction technology; dividing the original image into N x N block split images; inputting each split image into a pre-trained image detection depth neural network model respectively to obtain each split image detection result; and merging the split image detection results to obtain image detection results.
Of course, the embodiment of the present invention provides a storage medium containing computer-readable instructions, and the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the image detection method based on marine targets provided in any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the image detection apparatus based on marine targets, the included units and modules are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. An image detection method based on a marine target, characterized by comprising the following steps:
acquiring video stream data of a shipborne camera to be detected, and obtaining an original image corresponding to the video stream data through an OpenCV image extraction technology;
dividing the original image into N × N split images, wherein N is an integer greater than 1;
inputting each split image into a pre-trained image detection deep neural network model respectively to obtain each split image detection result;
and merging the split image detection results to obtain an image detection result.
2. The method according to claim 1, further comprising, before the inputting each split image into a pre-trained image detection deep neural network model respectively to obtain each split image detection result:
acquiring historical video stream data of a shipborne camera, and performing video frame extraction on the historical video stream data through an FFMPEG frame extraction library to obtain a historical image data set;
performing data cleaning on the historical images in the historical image data set through a bag-of-words model to obtain a cleaned historical image data set;
performing data enhancement processing on the cleaned historical images in the cleaned historical image data set to obtain a historical enhanced image data set;
and inputting the historical enhanced images in the historical enhanced image data set into an improved YOLOV7-tiny image detection deep neural network, and obtaining a trained image detection deep neural network model when the calculated CIOU loss function value meets the loss function value condition.
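Claim 2 names an FFMPEG frame extraction library without fixing an API. For illustration only, the frame-extraction step can be sketched in Python by invoking the ffmpeg command-line tool; the file paths and the one-frame-per-second rate are assumptions, not details of the claim:

    import subprocess
    from pathlib import Path

    def extract_frames(video_path, out_dir, fps=1):
        # Extract frames from historical video stream data with ffmpeg,
        # one image per second by default.
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-i", str(video_path),
             "-vf", f"fps={fps}",
             str(Path(out_dir) / "frame_%06d.jpg")],
            check=True,
        )

    extract_frames("historical_stream.mp4", "historical_image_dataset")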
3. The method according to claim 2, wherein the performing data cleaning on the historical images in the historical image data set through a bag-of-words model to obtain the cleaned historical image data set specifically comprises:
calculating a bag-of-words model score between each pair of historical images in the historical image data set through the bag-of-words model;
identifying each target historical image pair whose bag-of-words model score is greater than or equal to a bag-of-words model score threshold, and deleting one target historical image of that pair;
identifying each target historical image pair whose bag-of-words model score is smaller than the bag-of-words model score threshold, and retaining both target historical images of that pair;
and merging the retained target historical images to obtain the cleaned historical image data set.
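For illustration only, a minimal Python sketch of this cleaning rule, assuming a score(a, b) function that implements the bag-of-words model score of claim 4; the greedy pass (each image is kept only if it is not an over-threshold match of an already-kept image) is one possible reading of the pairwise rule:

    def clean_dataset(images, score, threshold):
        # Keep one image of each near-duplicate pair: an image is retained
        # only if its bag-of-words score against every retained image stays
        # below the threshold.
        kept = []
        for img in images:
            if all(score(img, k) < threshold for k in kept):
                kept.append(img)
        return kept  # the cleaned historical image data set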
4. The method according to claim 3, wherein the calculating a bag-of-words model score between each pair of historical images in the historical image data set comprises:
according to the formula

s(v_A, v_B) = 1 - \frac{1}{2} \sum_i \left| \frac{v_A(i)}{\lVert v_A \rVert} - \frac{v_B(i)}{\lVert v_B \rVert} \right|

calculating the bag-of-words model score s(v_A, v_B) between each pair of historical images in the historical image data set;
wherein each entry of a bag-of-words vector is the weight \eta_i = TF(w_i) \cdot IDF(w_i); TF(w_i) is the frequency of occurrence of word w_i in a single historical image; IDF(w_i) is the inverse document frequency of word w_i; \eta_i is the frequency-derived weight of word w_i in the historical images; w_A is a word in historical image A; w_B is a word in historical image B; v_A is the bag-of-words vector of historical image A; v_B is the bag-of-words vector of historical image B; N is the total number of occurrences of the bag-of-words words; A is historical image A; B is historical image B; and i is the index variable corresponding to the word.
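For illustration only, a Python sketch of the score as reconstructed above; the L1 normalisation of the TF-IDF vectors follows common bag-of-words scoring practice and is an assumption rather than a detail fixed by the claim:

    import numpy as np

    def bow_score(v_a, v_b):
        # L1 bag-of-words similarity between two TF-IDF vectors:
        # 1.0 for identical word distributions, 0.0 for disjoint ones.
        a = v_a / np.abs(v_a).sum()
        b = v_b / np.abs(v_b).sum()
        return 1.0 - 0.5 * np.abs(a - b).sum()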
5. The method according to claim 4, wherein the calculating, according to the formula, the bag-of-words model score s(v_A, v_B) between each pair of historical images in the historical image data set comprises:
according to the formula

TF(w_i) = \frac{n_{w_i}^{A}}{n_{w_i}}

calculating the frequency of occurrence TF(w_i) of word w_i in the target single historical image; wherein n_{w_i}^{A} is the number of occurrences of word w_i in the target single historical image, and n_{w_i} is the number of occurrences of word w_i in all the historical images;
and according to the formula

IDF(w_i) = \log \frac{N}{n_{w_i}}

calculating the inverse document frequency IDF(w_i) of word w_i.
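For illustration only, a Python sketch of the TF-IDF weights under the definitions reconstructed above; the guard against zero counts is an added assumption:

    import numpy as np

    def tfidf_weights(counts, image_index):
        # counts: (num_images, vocab_size) array of word occurrence counts.
        # TF follows the claim: occurrences of the word in the target image
        # divided by occurrences of the same word in all images; IDF is a
        # log inverse frequency over the total number of word occurrences N.
        counts = np.asarray(counts, dtype=float)
        n_total = counts.sum()                        # N
        n_word = np.maximum(counts.sum(axis=0), 1.0)  # per-word totals
        tf = counts[image_index] / n_word
        idf = np.log(n_total / n_word)
        return tf * idf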
6. The method according to claim 2, wherein the performing data enhancement processing on the cleaned historical images in the cleaned historical image data set to obtain a historical enhanced image data set comprises:
performing data enhancement processing on the cleaned historical images in the cleaned historical image data set through a Gaussian blur operation to obtain first historical enhanced images;
performing data enhancement processing on the cleaned historical images in the cleaned historical image data set through a brightness adjustment operation to obtain second historical enhanced images;
performing data enhancement processing on the cleaned historical images in the cleaned historical image data set through a noise-addition operation to obtain third historical enhanced images;
performing data enhancement processing on the cleaned historical images in the cleaned historical image data set through a copy operation to obtain fourth historical enhanced images;
and merging the cleaned historical images with the first, second, third and fourth historical enhanced images to obtain the historical enhanced image data set.
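For illustration only, a Python sketch of the four enhancement operations using OpenCV; the kernel size, brightness offset, and noise variance are illustrative values the claim does not specify:

    import cv2
    import numpy as np

    def augment(image, rng=None):
        # Produce the four augmented variants named in claim 6.
        if rng is None:
            rng = np.random.default_rng(0)
        blurred = cv2.GaussianBlur(image, (5, 5), 1.5)             # Gaussian blur
        brighter = cv2.convertScaleAbs(image, alpha=1.0, beta=40)  # brightness shift
        noisy = np.clip(image.astype(float) + rng.normal(0, 10, image.shape),
                        0, 255).astype(np.uint8)                   # added noise
        return [blurred, brighter, noisy, image.copy()]            # plus a plain copy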
7. The method according to claim 2, wherein the inputting the historical enhanced images in the historical enhanced image data set into an improved YOLOV7-tiny image detection deep neural network, and obtaining a trained image detection deep neural network model when the calculated CIOU loss function value meets the loss function value condition comprises:
inputting the historical enhanced images in the historical enhanced image data set into the improved YOLOV7-tiny image detection deep neural network;
wherein each historical enhanced image has a resolution of 1280 × 1280; the improved YOLOV7-tiny image detection deep neural network is obtained by modifying the YOLOV7-tiny image detection deep neural network as follows: an up-sampling operation is added to the first feature map between the 17th layer and the 20th layer; a second feature map is obtained at the 20th layer, and a feature-fusion operation is added that fuses the second feature map with the third feature map corresponding to the 2nd layer;
according to the formula

L_{CIOU} = 1 - IoU + \frac{\rho^2(C, D)}{c^2} + \alpha v, \quad v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2, \quad \alpha = \frac{v}{(1 - IoU) + v}

calculating the CIOU loss function value;
wherein w is the width of the inference box, h is the height of the inference box, w^{gt} is the width of the real box, h^{gt} is the height of the real box, v is the weight measuring the aspect-ratio similarity between the inference box and the real box, IoU is the intersection-over-union, \alpha is a weight parameter, \rho(C, D) is the Euclidean distance between the center points of the inference box and the real box, c is the diagonal length of the minimum enclosing rectangle of the inference box and the real box, C is the inference box, and D is the real box;
and when the calculated CIOU loss function value meets the loss function value condition, obtaining the trained image detection deep neural network model.
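For illustration only, a Python sketch of the CIOU computation above; the (cx, cy, w, h) box encoding and the small epsilon guards against division by zero are assumptions:

    import math

    def ciou_loss(box_c, box_d):
        # CIOU loss between inference box C and real box D,
        # each given as (cx, cy, w, h).
        (cx1, cy1, w1, h1), (cx2, cy2, w2, h2) = box_c, box_d
        # intersection-over-union
        ix = max(0.0, min(cx1 + w1/2, cx2 + w2/2) - max(cx1 - w1/2, cx2 - w2/2))
        iy = max(0.0, min(cy1 + h1/2, cy2 + h2/2) - max(cy1 - h1/2, cy2 - h2/2))
        inter = ix * iy
        iou = inter / (w1 * h1 + w2 * h2 - inter + 1e-9)
        # squared centre distance over squared enclosing-rectangle diagonal
        rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
        cw = max(cx1 + w1/2, cx2 + w2/2) - min(cx1 - w1/2, cx2 - w2/2)
        ch = max(cy1 + h1/2, cy2 + h2/2) - min(cy1 - h1/2, cy2 - h2/2)
        # aspect-ratio similarity term and its weight
        v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
        alpha = v / ((1 - iou) + v + 1e-9)
        return 1 - iou + rho2 / (cw ** 2 + ch ** 2 + 1e-9) + alpha * v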
8. An image detection device based on a marine target, comprising:
an original image determining module, configured to acquire video stream data of a shipborne camera to be detected and obtain an original image corresponding to the video stream data through an OpenCV image extraction technology;
a split image dividing module, configured to divide the original image into N × N split images, wherein N is an integer greater than 1;
a split image detection result determining module, configured to input each split image into a pre-trained image detection deep neural network model respectively to obtain each split image detection result;
and an image detection result determining module, configured to merge the split image detection results to obtain an image detection result.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the image detection method based on a marine target according to any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed, cause a processor to implement the image detection method based on a marine target according to any one of claims 1-7.
CN202211359300.XA 2022-11-02 2022-11-02 Image detection method, device, equipment and medium based on marine target Pending CN115410140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211359300.XA CN115410140A (en) 2022-11-02 2022-11-02 Image detection method, device, equipment and medium based on marine target

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211359300.XA CN115410140A (en) 2022-11-02 2022-11-02 Image detection method, device, equipment and medium based on marine target

Publications (1)

Publication Number Publication Date
CN115410140A true CN115410140A (en) 2022-11-29

Family

ID=84169160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211359300.XA Pending CN115410140A (en) 2022-11-02 2022-11-02 Image detection method, device, equipment and medium based on marine target

Country Status (1)

Country Link
CN (1) CN115410140A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831446A (en) * 2012-08-20 2012-12-19 南京邮电大学 Image appearance based loop closure detecting method in monocular vision SLAM (simultaneous localization and mapping)
CN107527352A (en) * 2017-08-09 2017-12-29 中国电子科技集团公司第五十四研究所 Remote sensing Ship Target contours segmentation and detection method based on deep learning FCN networks
CN112767357A (en) * 2021-01-20 2021-05-07 沈阳建筑大学 Yolov 4-based concrete structure disease detection method
CN114449295A (en) * 2022-01-30 2022-05-06 京东方科技集团股份有限公司 Video processing method and device, electronic equipment and storage medium
CN114821032A (en) * 2022-03-11 2022-07-29 山东大学 Special target abnormal state detection and tracking method based on improved YOLOv5 network
CN114821246A (en) * 2022-06-28 2022-07-29 山东省人工智能研究院 Small target detection method based on multi-level residual error network perception and attention mechanism
CN115049966A (en) * 2022-07-06 2022-09-13 杭州梦视网络科技有限公司 GhostNet-based lightweight YOLO pet identification method
CN115205518A (en) * 2022-05-16 2022-10-18 中国科学院深圳先进技术研究院 Target detection method and system based on YOLO v5s network structure

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AI视觉网奇: "Adding a small-object detection layer to yolov5", HTTPS://BLOG.CSDN.NET/JACKE121/ARTICLE/DETAILS/118714043 *
王凯 (Wang Kai): "Research on Loop Closure Detection in Visual SLAM Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN113657390B (en) Training method of text detection model and text detection method, device and equipment
WO2022227770A1 (en) Method for training target object detection model, target object detection method, and device
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN113033622A (en) Training method, device, equipment and storage medium for cross-modal retrieval model
CN113313083B (en) Text detection method and device
CN112989995B (en) Text detection method and device and electronic equipment
CN115861462B (en) Training method and device for image generation model, electronic equipment and storage medium
CN113378712B (en) Training method of object detection model, image detection method and device thereof
CN116152833B (en) Training method of form restoration model based on image and form restoration method
CN114898111B (en) Pre-training model generation method and device, and target detection method and device
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
CN113947188A (en) Training method of target detection network and vehicle detection method
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113269280B (en) Text detection method and device, electronic equipment and computer readable storage medium
CN113033346B (en) Text detection method and device and electronic equipment
CN113537187A (en) Text recognition method and device, electronic equipment and readable storage medium
CN116758280A (en) Target detection method, device, equipment and storage medium
CN114140320B (en) Image migration method and training method and device of image migration model
CN116259064A (en) Table structure identification method, training method and training device for table structure identification model
CN115410140A (en) Image detection method, device, equipment and medium based on marine target
CN115311403A (en) Deep learning network training method, virtual image generation method and device
CN115019321A (en) Text recognition method, text model training method, text recognition device, text model training equipment and storage medium
CN114863450A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114066790A (en) Training method of image generation model, image generation method, device and equipment
CN114078097A (en) Method and device for acquiring image defogging model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20221129