CN118097089A - Night warehousing robot target detection method and system based on integral network - Google Patents

Night warehousing robot target detection method and system based on integral network

Info

Publication number
CN118097089A
Authority
CN
China
Prior art keywords: image, network, visible light, information, night
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410476865.9A
Other languages
Chinese (zh)
Other versions
CN118097089B (en)
Inventor
林楷栋
杨晓君
周齐
闵海波
施煜锴
程昱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology
Priority to CN202410476865.9A
Publication of CN118097089A
Application granted
Publication of CN118097089B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/10: Image acquisition
    • G06V10/16: Image acquisition using multiple overlapping images; Image stitching
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/10: Image acquisition
    • G06V10/12: Details of acquisition arrangements; Constructional details thereof
    • G06V10/14: Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143: Sensing or illuminating at different wavelengths
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection


Abstract

The invention discloses a night warehousing robot target detection method and system based on an integral network. The method comprises: constructing an integral network through a hidden data pipeline and taking a visible light image and an infrared image as network inputs; performing feature extraction on the visible light image with an encoder-decoder based on a convolutional neural network to achieve image enhancement; aggregating the enhanced image with the infrared image; extracting low-frequency texture information and high-frequency semantic information with a cross-attention module and a detail extraction module; and generating a fused image for target perception. By performing image fusion within the integral network, the invention avoids information entropy loss and oversaturation in the fusion result, improves the visual quality of the fused image, and raises both the accuracy and the efficiency of target detection for night warehousing robots, better meeting the demands of night warehouse operation.

Description

Night warehousing robot target detection method and system based on integral network
Technical Field
The invention relates to the technical field of target detection, and in particular to a night warehousing robot target detection method and system based on an integral network.
Background
In modern warehouse management, using robots for automated operations has become a trend. However, the quality of visible light images is limited by illumination conditions, so the whole scene cannot be described accurately; in dark environments in particular, conventional robot-vision target detection systems often suffer from degraded image quality and inaccurate target recognition. Image fusion with infrared images has therefore become an important way to improve robot target detection in night warehouse operations. Traditional image fusion methods are effective for fusing visible and infrared images but still perform poorly in dark environments. Deep-learning-based fusion methods have made some progress, yet image fusion in dark environments still needs improvement.
In the prior art, the DIVFusion method for fusing infrared and visible light images under dim-light imaging provides a coupled, interactive image fusion network suited to dim-light scenes and addresses dark-environment fusion with a two-stage training scheme. However, it transfers single-scale image data between stages, so its fusion results exhibit oversaturation. EFMN adopts multi-scale feature transfer, which increases the amount of fused information and resolves global overexposure. These methods nevertheless still lose information, and they pay insufficient attention to the visual quality of the fused image.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a night warehousing robot target detection method and system based on an integral network.
The first aspect of the invention provides a night warehousing robot target detection method based on an integral network, which comprises the following steps:
acquiring a visible light image and an infrared image in a dark environment with the vision device of the warehousing robot, constructing an integral network through a hidden data pipeline, and taking the visible light image and the infrared image as network inputs;
performing feature extraction on the visible light image with an encoder-decoder based on a convolutional neural network, separating the illumination component from the reflection component to perform image enhancement, and obtaining an enhanced image;
aggregating the enhanced image with the infrared image, extracting low-frequency texture information and high-frequency semantic information with a cross-attention module and a detail extraction module, and generating a fused image from the low-frequency texture information and the high-frequency semantic information;
and extracting target features and perceiving targets from the fused image, and outputting the target detection result of the warehousing robot.
In this scheme, constructing the integral network through a hidden data pipeline and taking the visible light image and the infrared image as network inputs specifically comprises:
building an integral network through a hidden data pipeline, obtaining a large volume of visible light images and infrared images captured in dark environments or at night as the data source, drawing training data from the data source to generate batch blocks, and training the network on the batch blocks through forward computation, backward computation and parameter updating;
the integral network comprises a data enhancement stage and an image fusion stage that are coupled to each other through a shared feature extraction module; each batch block is divided into several data slices, and the data slices are processed in sequence by the data pipeline;
each stage of the integral network runs in an independent thread; after the computation task of the current stage finishes, a message mechanism generates a message and sends it to the next stage, which resumes its computation task upon receiving the message;
and the main thread is notified through the message mechanism to execute forward computation, backward computation and parameter updating in sequence; after iterative training, the final integral network is output, and the visible light image and infrared image acquired by the warehousing robot's vision device in the dark environment are imported into the integral network as network inputs.
In this scheme, the data enhancement stage specifically includes:
processing the visible light image captured in the dark environment in the data enhancement stage, and extracting the features of the input image with an encoder-decoder based on a convolutional neural network;
introducing a 3D convolutional residual block into the convolutional neural network to convolve the visible light image, adding the convolution output to the input before activation, and setting up two parallel feature extraction branches that use convolution kernels of different sizes to capture features at different scales;
obtaining the multi-scale features of the two branches, concatenating and integrating them, pooling the integrated features, extracting image features with dilated convolutions of different dilation rates, and recovering the features with a decoder;
separating the illumination component and the reflection component of the visible light image according to Retinex theory, suppressing noise by processing the reflection component with local and global response normalization, and introducing an attention mechanism to enhance the illumination component and correct the illumination;
and recombining the enhanced illumination component and reflection component to generate the enhanced visible light image.
In this scheme, the image fusion stage specifically includes:
aggregating the enhanced visible light image and the infrared image as inputs to the image fusion stage, extracting the basic feature information of the visible light and infrared images through a self-attention mechanism in the cross-attention module, solving the structural information of the basic feature information, and extracting low-frequency texture features;
constructing the detail extraction module with a reversible neural network, obtaining the basic feature information of the cross-attention module through feature-data sharing, and extracting global high-frequency semantic information by recursive training on the basis of the basic feature information;
and fusing the low-frequency texture features and the high-frequency semantic information output by the cross-attention module and the detail extraction module to generate the final fused image.
In this scheme, the cross-attention module specifically comprises:
integrating and optimizing the feature information output by the dual-channel encoder with a self-attention mechanism, and guiding the extraction of basic feature information with cross attention;
performing edge detection with a Sobel operator to solve the structural information of the basic feature information and extract low-frequency texture features;
The weight of the visible light image in the cross-attention module, $W_{vi}$, is:

$$W_{vi}=\operatorname{softmax}\left(\frac{Q(F_{vi})\,K(F_{ir})^{\top}}{\sqrt{d}}\right)V\left(F_{vi}\,\|\,\nabla F_{ir}\right)$$

wherein $F_{vi}$ and $F_{ir}$ denote the features of the visible light image and the infrared image respectively, $Q(\cdot)$, $K(\cdot)$ and $V(\cdot)$ denote the query, key-vector and value-vector computations respectively, $\nabla$ denotes the gradient solving operation, $d$ denotes the expanded feature dimension, $\|$ denotes the data-stream link (concatenation), and $^{\top}$ denotes the matrix transpose.
In this scheme, the detail extraction module specifically comprises:
constructing the detail extraction module with a reversible neural network, extracting global detail information by recursive training, and generating high-frequency semantic information;
The high-frequency semantic information of the visible light image, $H_{vi}$, and of the infrared image, $H_{ir}$, are expressed as:

$$H_{vi}=\Phi(F_{vi}),\qquad H_{ir}=\Phi(F_{ir})$$

wherein $\Phi$ denotes the mapping function of the reversible neural network, and $F_{vi}$ and $F_{ir}$ denote the features of the visible light image and the infrared image respectively.
In this scheme, extracting target features and perceiving targets from the fused image and outputting the target detection result of the warehousing robot specifically comprises:
obtaining the fused image and cutting it into non-overlapping image blocks of equal size, extracting the features of the fused image, computing the information entropy of each image block, screening the blocks by information entropy, and retaining the blocks whose entropy exceeds a preset threshold;
constructing reference target images from the historical detection data of warehousing targets in dark environments, and expanding the reference target images with normal images of every category of warehousing target to build a reference target image set;
performing preliminary target detection with the retained image blocks, pre-screening the reference target image set with the preliminary detection result, and constructing reference blocks from the pre-screened image data;
characterizing the similarity between image blocks and reference blocks by their distance, with the Mahalanobis distance as the metric, obtaining the reference blocks whose distance is below a distance threshold, and forming a similar-block set from the reference blocks of each image block;
and obtaining the proportions of background blocks and target blocks in the similar-block set, assigning a target label or background label to each image block according to these proportions, extracting the image blocks with target labels, statistically analysing which warehousing target category occurs most frequently among their reference blocks, and outputting the target detection result of the warehousing robot.
The second aspect of the invention provides a night warehousing robot target detection system based on an integral network, which comprises a memory and a processor, wherein the memory stores, and the processor executes, a program of the night warehousing robot target detection method based on an integral network.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) Through the design of the integral network structure and the feature extraction modules, the invention better fuses the color information of the visible light image with the structural information of the infrared image, generating a fused image with rich texture detail and distinct structural information and thereby improving target detection accuracy;
(2) The contrast-equalized fusion loss makes the fused image visually clearer and better balanced and reduces overexposure, so the warehousing robot recognizes targets more accurately in dark environments;
(3) By adopting a shared feature extraction module and an integral network structure, the invention reduces the number of network layers and parameters and improves computational efficiency, enabling the robot vision system to process large volumes of image data in real time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required by the embodiments are briefly described below. The drawings described below are obviously only some embodiments of the present invention; other drawings may be obtained from them by those skilled in the art without inventive effort.
Fig. 1 shows a flowchart of a night warehousing robot target detection method based on an integral network;
Fig. 2 shows a frame diagram of the integral network in an embodiment of the invention;
Fig. 3 shows a flowchart of the data enhancement stage of an embodiment of the invention;
Fig. 4 shows a flowchart of the image fusion stage of an embodiment of the invention;
Fig. 5 shows a block diagram of a night warehousing robot target detection system based on an integral network.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Fig. 1 shows a flowchart of a night warehousing robot target detection method based on an integral network.
As shown in Fig. 1, the invention provides a night warehousing robot target detection method based on an integral network, which comprises the following steps:
S102, acquiring a visible light image and an infrared image in a dark environment with the vision device of the warehousing robot, constructing an integral network through a hidden data pipeline, and taking the visible light image and the infrared image as network inputs;
S104, extracting features of the visible light image with an encoder-decoder based on a convolutional neural network, separating the illumination component from the reflection component for image enhancement, and obtaining an enhanced image;
S106, aggregating the enhanced image with the infrared image, extracting low-frequency texture information and high-frequency semantic information with a cross-attention module and a detail extraction module, and generating a fused image from them;
S108, extracting target features and perceiving targets from the fused image, and outputting the target detection result of the warehousing robot.
It should be noted that the integral network is constructed through a hidden data pipeline so that the network learns dim-light image enhancement and fusion as a whole, avoiding information entropy loss. A large volume of visible light and infrared images captured in dark environments or at night is obtained as the data source; training data drawn from it are grouped into batch blocks, and the network is trained on these blocks through forward computation, backward computation and parameter updating. As shown in Fig. 2, the integral network comprises a data enhancement stage and an image fusion stage coupled to each other through a shared feature extraction module; each batch block is divided into several data slices that the data pipeline processes in sequence. Every stage of the integral network runs in a separate thread, and the forward and backward computations of successive data slices overlap: once a stage finishes the computation task for the current slice, it hands the result to the next stage for further processing and immediately begins the next adjacent slice. When a data slice is transferred between stages, the data acquired by the later stage are synchronized with the earlier stage; after the computation task of the current stage finishes, a message mechanism generates a message and sends it to the next stage, which resumes its computation upon receiving it. The main thread, notified through the same message mechanism, executes forward computation, backward computation and parameter updating in sequence; after iterative training the final integral network is output, and the visible light and infrared images acquired by the warehousing robot's vision device in the dark environment are imported into it as network inputs.
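As a minimal illustration of the pipeline just described, the Python sketch below runs each stage in its own thread and passes data slices forward through queues acting as the message mechanism. The stage callables, the slice count and the queue-based signalling are illustrative assumptions, not the patent's exact implementation.

    import queue
    import threading

    class PipelineStage(threading.Thread):
        # One stage of the integral network running in an independent thread.
        # A data slice arriving on in_q is the "message" that triggers this
        # stage; the processed slice posted to out_q is the message that
        # wakes the next stage.
        def __init__(self, stage_fn, in_q, out_q):
            super().__init__(daemon=True)
            self.stage_fn, self.in_q, self.out_q = stage_fn, in_q, out_q

        def run(self):
            while True:
                piece = self.in_q.get()
                if piece is None:            # sentinel: propagate shutdown
                    self.out_q.put(None)
                    return
                self.out_q.put(self.stage_fn(piece))

    def run_pipeline(batch, enhance_fn, fuse_fn, n_slices=4):
        # Divide a batch block into data slices and stream them through the
        # data-enhancement and image-fusion stages; the main thread collects
        # the outputs and would then run backward computation and the
        # parameter update.
        def split(seq, n):
            k = max(1, len(seq) // n)
            return [seq[i:i + k] for i in range(0, len(seq), k)]

        q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
        PipelineStage(enhance_fn, q_in, q_mid).start()
        PipelineStage(fuse_fn, q_mid, q_out).start()
        for piece in split(batch, n_slices):
            q_in.put(piece)
        q_in.put(None)
        results = []
        while (out := q_out.get()) is not None:
            results.append(out)
        return results

Because the enhancement stage starts on the next slice while the fusion stage is still processing the previous one, the forward passes of successive data slices overlap, which is the throughput benefit the pipeline is intended to deliver.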
FIG. 3 shows a flow chart of the data enhancement phase of an embodiment of the present invention.
According to an embodiment of the present invention, the data enhancement stage specifically includes:
S302, processing the visible light image captured in the dark environment in the data enhancement stage, and extracting the features of the input image with an encoder-decoder based on a convolutional neural network;
S304, introducing a 3D convolutional residual block into the convolutional neural network to convolve the visible light image, adding the convolution output to the input before activation, and setting up two parallel feature extraction branches that use convolution kernels of different sizes to capture features at different scales;
S306, obtaining the multi-scale features of the two branches, concatenating and integrating them, pooling the integrated features, extracting image features with dilated convolutions of different dilation rates, and recovering the features with a decoder;
S308, separating the illumination component and the reflection component of the visible light image according to Retinex theory, suppressing noise by processing the reflection component with local and global response normalization, and introducing an attention mechanism to enhance the illumination component and correct the illumination;
S310, recombining the enhanced illumination component and reflection component to generate the enhanced visible light image.
It should be noted that 3D convolution performs an additional one-dimensional convolution along the depth axis of the visible light image, extracting richer features, while residual connections keep feature extraction fast and efficient. Dilated (hole) convolution applied to the extracted features enlarges the receptive field without changing the feature-map size, which benefits the detection and recognition of small targets. Retinex theory holds that any image can be decomposed into an illumination image and a reflection image: the reflection image is an intrinsic property of the object and does not change with the environment, whereas the illumination image is strongly environment-dependent, so correcting the illumination image enhances the whole image. An attention mechanism with residual connections enhances the illumination component and strengthens detail in the image; local and global response normalization extract local and global features respectively to remove noise; and the enhanced reflection and illumination components are recombined according to Retinex theory into the enhanced visible light image. Enhanced images with good contrast and detail can thus be produced even under insufficient illumination.
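The sketch below illustrates this data-enhancement stage in PyTorch under stated assumptions: two parallel branches with 3x3 and 5x5 kernels, dilated convolutions, and a Retinex-style split where reflectance is the input divided by the estimated illumination. The layer widths, kernel sizes and the sigmoid attention gate over the illumination map are illustrative choices, not the patent's exact architecture.

    import torch
    import torch.nn as nn

    class MultiScaleEncoder(nn.Module):
        # Two parallel branches with different kernel scales, concatenated,
        # then dilated (hole) convolutions that enlarge the receptive field
        # while keeping the feature-map size unchanged.
        def __init__(self, ch=32):
            super().__init__()
            self.branch3 = nn.Conv2d(3, ch, kernel_size=3, padding=1)
            self.branch5 = nn.Conv2d(3, ch, kernel_size=5, padding=2)
            self.dilated = nn.Sequential(
                nn.Conv2d(2 * ch, ch, 3, padding=2, dilation=2), nn.ReLU(),
                nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.ReLU())

        def forward(self, x):
            f = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
            return self.dilated(f)

    class RetinexEnhance(nn.Module):
        # Estimates an illumination map L, takes reflectance R = I / L
        # (Retinex: I = L * R), lifts L with a simple attention gate, and
        # recombines the two components into the enhanced image.
        def __init__(self, ch=32):
            super().__init__()
            self.enc = MultiScaleEncoder(ch)
            self.to_illum = nn.Conv2d(ch, 1, 3, padding=1)
            self.attn = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1), nn.Sigmoid())

        def forward(self, img):                              # img: (B, 3, H, W) in [0, 1]
            feat = self.enc(img)
            L = torch.sigmoid(self.to_illum(feat)) + 1e-4    # illumination, kept > 0
            R = img / L                                      # reflectance estimate
            L_corrected = L * (1.0 + self.attn(L))           # attention-weighted lift
            return torch.clamp(R * L_corrected, 0.0, 1.0)    # enhanced visible image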
FIG. 4 shows a flow chart of an image fusion phase of an embodiment of the present invention.
According to the embodiment of the invention, the image fusion stage specifically comprises the following steps:
S402, aggregating the enhanced visible light image and the infrared image as inputs to the image fusion stage, extracting the basic feature information of the visible light and infrared images through a self-attention mechanism in the cross-attention module, solving the structural information of the basic feature information, and extracting low-frequency texture features;
S404, constructing the detail extraction module with a reversible neural network, obtaining the basic feature information of the cross-attention module through feature-data sharing, and extracting global high-frequency semantic information by recursive training on the basis of the basic feature information;
S406, fusing the low-frequency texture features and the high-frequency semantic information output by the cross-attention module and the detail extraction module to generate the final fused image.
It should be noted that the enhanced visible light image and the infrared image are processed by a dual-channel encoder; the feature information output by the dual-channel encoder is integrated and optimized through a self-attention mechanism, and the basic feature information is extracted under the guidance of cross attention, which improves the network's extraction quality. Edge detection with a Sobel operator then solves the structural information of the basic feature information and extracts the low-frequency texture features.
The weight of the visible light image in the cross-attention module, $W_{vi}$, is:

$$W_{vi}=\operatorname{softmax}\left(\frac{Q(F_{vi})\,K(F_{ir})^{\top}}{\sqrt{d}}\right)V\left(F_{vi}\,\|\,\nabla F_{ir}\right)$$

wherein $F_{vi}$ and $F_{ir}$ denote the features of the visible light image and the infrared image respectively, $Q(\cdot)$, $K(\cdot)$ and $V(\cdot)$ denote the query, key-vector and value-vector computations respectively, $\nabla$ denotes the gradient solving operation, $d$ denotes the expanded feature dimension, $\|$ denotes the data-stream link (concatenation), and $^{\top}$ denotes the matrix transpose.
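Read as scaled dot-product attention, the weight above can be sketched in PyTorch as follows. The 1x1 convolutions standing in for the Q/K/V computations and the Sobel kernel standing in for the gradient operation are assumptions made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def sobel(x):
        # Per-channel Sobel gradient magnitude: the gradient operation.
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        k = torch.stack([kx, kx.t()]).unsqueeze(1).to(x)     # (2, 1, 3, 3)
        b, c, h, w = x.shape
        g = F.conv2d(x.reshape(-1, 1, h, w), k, padding=1)   # (B*C, 2, H, W)
        return g.reshape(b, c, 2, h, w).pow(2).sum(2).sqrt()

    class CrossAttention(nn.Module):
        # Query from the visible features, key from the infrared features,
        # value from the concatenated stream F_vi || grad(F_ir). Note that
        # the HW x HW attention map is quadratic in the feature-map size,
        # so this form suits small feature maps.
        def __init__(self, ch):
            super().__init__()
            self.q = nn.Conv2d(ch, ch, 1)
            self.k = nn.Conv2d(ch, ch, 1)
            self.v = nn.Conv2d(2 * ch, ch, 1)

        def forward(self, f_vi, f_ir):
            b, c, h, w = f_vi.shape
            q = self.q(f_vi).flatten(2).transpose(1, 2)      # (B, HW, C)
            k = self.k(f_ir).flatten(2)                      # (B, C, HW)
            attn = torch.softmax(q @ k / c ** 0.5, dim=-1)   # (B, HW, HW)
            v_in = torch.cat([f_vi, sobel(f_ir)], dim=1)     # F_vi || grad(F_ir)
            v = self.v(v_in).flatten(2).transpose(1, 2)      # (B, HW, C)
            return (attn @ v).transpose(1, 2).reshape(b, c, h, w)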
A detail extraction module is constructed with a reversible neural network, and global detail information is extracted by recursive training to generate high-frequency semantic information. A reversible neural network has an invertible structure: after input data passes through its forward process, the original input can be recovered exactly through its reverse process, so the input suffers no information loss. Preferably, RevCol (a reversible columnar neural network) is used to decouple low-level information from high-level semantics so that the relevant information can be extracted and exploited. RevCol consists of N sub-networks (columns), each identical in structure and function; adding auxiliary supervision to the earlier columns preserves the mutual information between the features and the input image.
The high-frequency semantic information of the visible light image, $H_{vi}$, and of the infrared image, $H_{ir}$, are expressed as:

$$H_{vi}=\Phi(F_{vi}),\qquad H_{ir}=\Phi(F_{ir})$$

wherein $\Phi$ denotes the mapping function of the reversible neural network, and $F_{vi}$ and $F_{ir}$ denote the features of the visible light image and the infrared image respectively. The cross-attention module and the detail extraction module extract the basic feature information and the high-frequency feature information by sharing feature data, so the network can provide more comprehensive source-image information for visual enhancement and feature fusion.
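A reversible coupling block of the kind such a detail extraction module can stack is sketched below. The additive-coupling form is a standard invertible construction and an assumption here, not necessarily the patent's exact block; RevCol stacks columns of such reversible units, so this is only the smallest illustrative piece.

    import torch
    import torch.nn as nn

    class InvertibleBlock(nn.Module):
        # Splits the channels in two (ch must be even); forward() can be
        # undone exactly by inverse(), so the mapping loses no information.
        def __init__(self, ch):
            super().__init__()
            half = ch // 2
            self.f = nn.Sequential(nn.Conv2d(half, half, 3, padding=1),
                                   nn.ReLU(),
                                   nn.Conv2d(half, half, 3, padding=1))
            self.g = nn.Sequential(nn.Conv2d(half, half, 3, padding=1),
                                   nn.ReLU(),
                                   nn.Conv2d(half, half, 3, padding=1))

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=1)
            y1 = x1 + self.f(x2)
            y2 = x2 + self.g(y1)
            return torch.cat([y1, y2], dim=1)

        def inverse(self, y):
            y1, y2 = y.chunk(2, dim=1)
            x2 = y2 - self.g(y1)
            x1 = y1 - self.f(x2)
            return torch.cat([x1, x2], dim=1)

Because inverse() recovers the input of forward() exactly, stacking these blocks and applying them recursively to $F_{vi}$ and $F_{ir}$ yields a mapping $\Phi$ with no information loss, matching the property claimed for the reversible neural network.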
The outputs of the two modules are fused to generate the final fused image. Using a contrast-equalized fusion loss reduces the influence of locally overexposed data on the fused image, making it better suited to human visual perception: the fused image keeps its structural information while gaining rich texture and good color rendering.
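A hedged sketch of such a contrast-equalized fusion loss is given below: an intensity term, a gradient (texture) term, and an equalization term that pushes down locally overexposed patches. The particular combination of terms, the 7x7 averaging window and the 0.85 exposure threshold are illustrative assumptions rather than the patent's stated loss.

    import torch
    import torch.nn.functional as F

    def fusion_loss(fused, vi, ir, weights=(1.0, 1.0, 0.5)):
        # Intensity: follow the brighter of the two sources pixel-wise.
        l_int = F.l1_loss(fused, torch.max(vi, ir))

        # Texture: match the stronger per-pixel gradient of the two sources.
        def grads(x):
            return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
        (fx, fy), (vx, vy), (ix, iy) = grads(fused), grads(vi), grads(ir)
        l_tex = (F.l1_loss(fx.abs(), torch.max(vx.abs(), ix.abs())) +
                 F.l1_loss(fy.abs(), torch.max(vy.abs(), iy.abs())))

        # Contrast equalization: penalize patches whose local mean brightness
        # exceeds an exposure threshold, suppressing local overexposure.
        local_mean = F.avg_pool2d(fused, kernel_size=7, stride=1, padding=3)
        l_eq = F.relu(local_mean - 0.85).mean()

        w_int, w_tex, w_eq = weights
        return w_int * l_int + w_tex * l_tex + w_eq * l_eq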
It should be noted that the fused image is obtained and cut into non-overlapping image blocks of equal size; the features of the fused image are extracted, the information entropy of each block is computed, the blocks are screened by information entropy, and those whose entropy exceeds a preset threshold are retained. Reference target images are constructed from the historical detection data of warehousing targets in dark environments and expanded with normal images of every category of warehousing target to build a reference target image set. Preliminary target detection is performed with the retained image blocks, the reference target image set is pre-screened with the preliminary detection result, and reference blocks are constructed from the pre-screened image data. The similarity between image blocks and reference blocks is characterized by their distance, with the Mahalanobis distance as the metric; the reference blocks whose distance is below a distance threshold are obtained, and the reference blocks of each image block form its similar-block set. The proportions of background blocks and target blocks in the similar-block set are obtained, a target label or background label is assigned to each image block according to these proportions, the image blocks with target labels are extracted, the warehousing target category that occurs most frequently among their reference blocks is found statistically, and the target detection result of the warehousing robot is output.
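The block screening and similarity matching above can be sketched in NumPy as follows. The 16-pixel block size, the 32-bin histogram, the entropy and distance thresholds, and the dictionary layout of the reference blocks are illustrative assumptions.

    import numpy as np

    def block_entropy(block, bins=32):
        # Shannon information entropy of a block's intensity histogram;
        # the block is assumed to hold values in [0, 1].
        hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    def screen_blocks(img, size=16, h_min=3.0):
        # Cut the fused image into non-overlapping, equally sized blocks
        # and keep those whose entropy exceeds the preset threshold.
        kept = []
        for i in range(0, img.shape[0] - size + 1, size):
            for j in range(0, img.shape[1] - size + 1, size):
                block = img[i:i + size, j:j + size]
                if block_entropy(block) > h_min:
                    kept.append(((i, j), block))
        return kept

    def mahalanobis(x, mu, cov_inv):
        d = x - mu
        return float(np.sqrt(d @ cov_inv @ d))

    def similar_block_set(feat, reference_blocks, cov_inv, d_max=2.5):
        # Reference blocks within the distance threshold form the image
        # block's similar-block set; each entry is assumed to be a dict
        # holding a feature vector and a "target"/"background" label.
        return [r for r in reference_blocks
                if mahalanobis(feat, r["feat"], cov_inv) < d_max]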
The method also includes building a target detection database for the warehousing robot in dark environments: the historical target detection data of the integral network in dark environments are obtained, checked for accuracy, and abnormal detection records are removed. Visible light and infrared images corresponding to different warehousing targets are extracted from the preprocessed historical data, and their basic features (texture, edge and color features) are extracted; matching the basic features of the different warehousing targets yields basic feature subsets that are stored in the target detection database. When a low-illumination visible light image and an infrared image of a target to be detected arrive, the basic feature subsets in the database are first searched with a similarity computation and tested against a similarity threshold: if the threshold is met, the warehousing target category of the basic feature subset with the highest similarity is output; otherwise, target detection and recognition are carried out with the integral network.
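A minimal sketch of that lookup follows, assuming cosine similarity over stored basic-feature vectors and a fall-back to the integral network on a miss; the function names and the 0.9 threshold are hypothetical.

    import numpy as np

    def detect_with_database(query_feat, database, run_integral_network,
                             sim_min=0.9):
        # database: {target_category: [basic-feature vectors]} built from
        # accuracy-checked historical detections in dark environments.
        def cosine(a, b):
            return float(a @ b /
                         (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

        best_category, best_sim = None, -1.0
        for category, feats in database.items():
            for f in feats:
                s = cosine(query_feat, f)
                if s > best_sim:
                    best_category, best_sim = category, s

        if best_sim >= sim_min:                  # similarity threshold met
            return best_category                 # reuse the stored category
        return run_integral_network(query_feat)  # otherwise fall back to the network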
Fig. 5 shows a block diagram of a night warehousing robot target detection system based on an integral network.
The second aspect of the invention provides a night warehousing robot target detection system 5 based on an integral network, the system comprising a memory 51 and a processor 52, wherein the memory 51 stores, and the processor 52 executes, a program of the night warehousing robot target detection method based on an integral network.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only one logical functional division, and other divisions are possible in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the components shown or discussed may be coupled, directly coupled, or communicatively connected to each other through interfaces; the indirect coupling or communication connection between devices or units may be electrical, mechanical, or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware executing program instructions; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes: a removable storage device, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium that can store program code.
Or the above-described integrated units of the invention may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A night warehousing robot target detection method based on an integral network, characterized by comprising the following steps:
acquiring a visible light image and an infrared image in a dark environment with the vision device of the warehousing robot, constructing an integral network through a hidden data pipeline, and taking the visible light image and the infrared image as network inputs;
performing feature extraction on the visible light image with an encoder-decoder based on a convolutional neural network, separating the illumination component from the reflection component to perform image enhancement, and obtaining an enhanced image;
aggregating the enhanced image with the infrared image, extracting low-frequency texture information and high-frequency semantic information with a cross-attention module and a detail extraction module, and generating a fused image from the low-frequency texture information and the high-frequency semantic information;
and extracting target features and perceiving targets from the fused image, and outputting the target detection result of the warehousing robot.
2. The night warehousing robot target detection method based on an integral network according to claim 1, wherein constructing the integral network through a hidden data pipeline and taking the visible light image and the infrared image as network inputs specifically comprises:
building an integral network through a hidden data pipeline, obtaining a large volume of visible light images and infrared images captured in dark environments or at night as the data source, drawing training data from the data source to generate batch blocks, and training the network on the batch blocks through forward computation, backward computation and parameter updating;
the integral network comprising a data enhancement stage and an image fusion stage coupled to each other through a shared feature extraction module, each batch block being divided into several data slices that the data pipeline processes in sequence;
each stage of the integral network running in an independent thread, wherein after the computation task of the current stage finishes, a message mechanism generates a message and sends it to the next stage, which resumes its computation task upon receiving the message;
and notifying the main thread through the message mechanism to execute forward computation, backward computation and parameter updating in sequence, outputting the final integral network after iterative training, and importing the visible light image and infrared image acquired by the warehousing robot's vision device in the dark environment into the integral network as network inputs.
3. The night warehousing robot target detection method based on an integral network according to claim 2, wherein the data enhancement stage specifically comprises:
processing the visible light image captured in the dark environment in the data enhancement stage, and extracting the features of the input image with an encoder-decoder based on a convolutional neural network;
introducing a 3D convolutional residual block into the convolutional neural network to convolve the visible light image, adding the convolution output to the input before activation, and setting up two parallel feature extraction branches that use convolution kernels of different sizes to capture features at different scales;
obtaining the multi-scale features of the two branches, concatenating and integrating them, pooling the integrated features, extracting image features with dilated convolutions of different dilation rates, and recovering the features with a decoder;
separating the illumination component and the reflection component of the visible light image according to Retinex theory, suppressing noise by processing the reflection component with local and global response normalization, and introducing an attention mechanism to enhance the illumination component and correct the illumination;
and recombining the enhanced illumination component and reflection component to generate the enhanced visible light image.
4. The night warehousing robot target detection method based on an integral network according to claim 2, wherein the image fusion stage specifically comprises:
aggregating the enhanced visible light image and the infrared image as inputs to the image fusion stage, extracting the basic feature information of the visible light and infrared images through a self-attention mechanism in the cross-attention module, solving the structural information of the basic feature information, and extracting low-frequency texture features;
constructing the detail extraction module with a reversible neural network, obtaining the basic feature information of the cross-attention module through feature-data sharing, and extracting global high-frequency semantic information by recursive training on the basis of the basic feature information;
and fusing the low-frequency texture features and the high-frequency semantic information output by the cross-attention module and the detail extraction module to generate the final fused image.
5. The night warehousing robot target detection method based on an integral network according to claim 4, wherein the cross-attention module specifically comprises:
integrating and optimizing the feature information output by the dual-channel encoder with a self-attention mechanism, and guiding the extraction of basic feature information with cross attention;
performing edge detection with a Sobel operator to solve the structural information of the basic feature information and extract low-frequency texture features;
the weight of the visible light image in the cross-attention module, $W_{vi}$, being:

$$W_{vi}=\operatorname{softmax}\left(\frac{Q(F_{vi})\,K(F_{ir})^{\top}}{\sqrt{d}}\right)V\left(F_{vi}\,\|\,\nabla F_{ir}\right)$$

wherein $F_{vi}$ and $F_{ir}$ denote the features of the visible light image and the infrared image respectively, $Q(\cdot)$, $K(\cdot)$ and $V(\cdot)$ denote the query, key-vector and value-vector computations respectively, $\nabla$ denotes the gradient solving operation, $d$ denotes the expanded feature dimension, $\|$ denotes the data-stream link (concatenation), and $^{\top}$ denotes the matrix transpose.
6. The night warehousing robot target detection method based on an integral network according to claim 4, wherein the detail extraction module specifically comprises:
constructing the detail extraction module with a reversible neural network, extracting global detail information by recursive training, and generating high-frequency semantic information;
the high-frequency semantic information of the visible light image, $H_{vi}$, and of the infrared image, $H_{ir}$, being expressed as:

$$H_{vi}=\Phi(F_{vi}),\qquad H_{ir}=\Phi(F_{ir})$$

wherein $\Phi$ denotes the mapping function of the reversible neural network, and $F_{vi}$ and $F_{ir}$ denote the features of the visible light image and the infrared image respectively.
7. The night warehousing robot target detection method based on an integral network according to claim 1, wherein extracting target features and perceiving targets from the fused image and outputting the target detection result of the warehousing robot specifically comprises:
obtaining the fused image and cutting it into non-overlapping image blocks of equal size, extracting the features of the fused image, computing the information entropy of each image block, screening the blocks by information entropy, and retaining the blocks whose entropy exceeds a preset threshold;
constructing reference target images from the historical detection data of warehousing targets in dark environments, and expanding the reference target images with normal images of every category of warehousing target to build a reference target image set;
performing preliminary target detection with the retained image blocks, pre-screening the reference target image set with the preliminary detection result, and constructing reference blocks from the pre-screened image data;
characterizing the similarity between image blocks and reference blocks by their distance, with the Mahalanobis distance as the metric, obtaining the reference blocks whose distance is below a distance threshold, and forming a similar-block set from the reference blocks of each image block;
and obtaining the proportions of background blocks and target blocks in the similar-block set, assigning a target label or background label to each image block according to these proportions, extracting the image blocks with target labels, statistically analysing which warehousing target category occurs most frequently among their reference blocks, and outputting the target detection result of the warehousing robot.
8. A night warehousing robot target detection system based on an integral network, the system comprising: a memory and a processor, wherein the memory stores, and the processor executes, a program of the night warehousing robot target detection method based on an integral network according to any one of claims 1-7.
CN202410476865.9A (priority date 2024-04-19, filing date 2024-04-19): Night warehousing robot target detection method and system based on integral network. Status: Active. Granted as CN118097089B.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410476865.9A | 2024-04-19 | 2024-04-19 | Night warehousing robot target detection method and system based on integral network (granted as CN118097089B)


Publications (2)

Publication Number | Publication Date
CN118097089A | 2024-05-28
CN118097089B | 2024-07-02

Family

ID=91157215

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410476865.9A | Night warehousing robot target detection method and system based on integral network | 2024-04-19 | 2024-04-19

Country Status (1)

Country Link
CN (1) CN118097089B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220351345A1 (en) * 2021-05-03 2022-11-03 Microsoft Technology Licensing, Llc Low light and thermal image normalization for advanced fusion
CN116823686A (en) * 2023-04-28 2023-09-29 长春理工大学重庆研究院 Night infrared and visible light image fusion method based on image enhancement
CN117011203A (en) * 2023-06-27 2023-11-07 北京理工大学 Method and device for fusing low-light-level image and infrared image based on image enhancement
CN117315714A (en) * 2023-09-11 2023-12-29 淮阴工学院 Multispectral pedestrian detection method based on cross-modal feature decomposition
CN117391981A (en) * 2023-10-17 2024-01-12 中国石油大学(华东) Infrared and visible light image fusion method based on low-light illumination and self-adaptive constraint

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YONG FENG et al.: "Cross-modality feature fusion for night pedestrian detection", Frontiers in Physics, 26 March 2023 (2023-03-26) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118279571A (en) * 2024-06-03 2024-07-02 中国工程物理研究院流体物理研究所 Infrared dim target detection method, system, terminal and medium

Also Published As

Publication number Publication date
CN118097089B (en) 2024-07-02

Similar Documents

Publication Publication Date Title
CN118097089B (en) Night warehousing robot target detection method and system based on integral network
CN112101190A (en) Remote sensing image classification method, storage medium and computing device
CN114332133B (en) Method and system for distinguishing pneumonia CT image infection areas based on improved CE-Net
CN114511576B (en) Image segmentation method and system of scale self-adaptive feature enhanced deep neural network
CN109726195B (en) Data enhancement method and device
CN112132279B (en) Convolutional neural network model compression method, device, equipment and storage medium
CN113870286B (en) Foreground segmentation method based on multi-level feature and mask fusion
CN112150450A (en) Image tampering detection method and device based on dual-channel U-Net model
CN114861842B (en) Few-sample target detection method and device and electronic equipment
CN117422936B (en) Remote sensing image classification method and system
CN111639230B (en) Similar video screening method, device, equipment and storage medium
CN113642445A (en) Hyperspectral image classification method based on full convolution neural network
Liu et al. Overview of image inpainting and forensic technology
CN115239672A (en) Defect detection method and device, equipment and storage medium
CN113468946A (en) Semantically consistent enhanced training data for traffic light detection
CN111353514A (en) Model training method, image recognition method, device and terminal equipment
CN116468947A (en) Cutter image recognition method, cutter image recognition device, computer equipment and storage medium
CN114155165A (en) Image defogging method based on semi-supervision
CN118115947A (en) Cross-mode pedestrian re-identification method based on random color conversion and multi-scale feature fusion
CN116071625B (en) Training method of deep learning model, target detection method and device
CN115984671A (en) Model online updating method and device, electronic equipment and readable storage medium
Neelima et al. Optimal clustering based outlier detection and cluster center initialization algorithm for effective tone mapping
CN113408528B (en) Quality recognition method and device for commodity image, computing equipment and storage medium
CN113947154B (en) Target detection method, target detection system, electronic equipment and storage medium
CN113239738B (en) Image blurring detection method and blurring detection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant