CN113936220B - Image processing method, storage medium, electronic device, and image processing apparatus - Google Patents


Info

Publication number
CN113936220B
CN113936220B (application CN202111524090.0A)
Authority
CN
China
Prior art keywords
feature map
original image
contour
feature extraction
feature
Prior art date
Legal status
Active
Application number
CN202111524090.0A
Other languages
Chinese (zh)
Other versions
CN113936220A (en)
Inventor
张文俊
孙军欢
冀旭
Current Assignee
Shenzhen Zhixing Technology Co Ltd
Original Assignee
Shenzhen Zhixing Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Zhixing Technology Co Ltd
Priority to CN202111524090.0A
Publication of CN113936220A
Application granted
Publication of CN113936220B
Legal status: Active

Classifications

    • G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Pattern recognition: classification techniques
    • G06N3/045 — Neural network architectures: combinations of networks
    • G06N3/08 — Neural networks: learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to an image processing method, a storage medium, an electronic device, and an image processing apparatus. The method comprises the following steps: inputting an original image into a feature extraction network to obtain a first feature map; inputting the first feature map into an atrous spatial pyramid pooling (ASPP) model and performing atrous convolution parallel sampling at multiple sampling rates to obtain a multi-scale feature map, inputting the multi-scale feature map into a deep feature extraction module for deep feature extraction to obtain a deep feature map, and inputting the first feature map into a shallow feature extraction module for shallow feature extraction to obtain a shallow feature map; inputting the deep feature map and the shallow feature map into a stitching module to be stitched into a second feature map; and performing semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image. In this way, the detection performance of the various possible feature extraction networks is uniformly improved.

Description

Image processing method, storage medium, electronic device, and image processing apparatus
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an image processing method, a storage medium, an electronic device, and an image processing apparatus.
Background
With the development of artificial intelligence, deep learning has advanced rapidly in the field of computer vision and has achieved major breakthroughs in image classification, object detection, image segmentation, and related tasks. Face recognition products based on computer vision are widely deployed at border checkpoints, railway stations, airport halls, and similar venues, where identity verification is achieved by extracting facial features from collected images and performing comparison and retrieval. In industrial applications, such as automatic cargo sorting in logistics centers and port automation, artificial intelligence products based on computer vision enable intelligent automatic detection and judgment of target goods, with corresponding handling, sorting, and packaging operations taken according to the results. In addition, in scrap steel recycling, scrap pieces of complex origin, diverse type, and widely varying material must be graded and handled accordingly, so intelligent automatic detection and judgment of scrap steel can likewise be realized with artificial intelligence products based on computer vision. Compared with traditional manual measurement and inspection, intelligent automatic detection and judgment of target goods or scrap steel offers objective and stable standards, a high degree of informatization, fewer potential safety hazards, and lower labor costs, and is conducive to improving production efficiency and operational safety.
In industrial applications such as the automatic inspection and judgment of goods or scrap steel described above, recurring difficulties include identifying the car region of a vehicle loaded with goods or scrap steel and identifying each individual piece of goods or scrap; in particular, occlusion between different pieces and similarity in color and shape pose challenges to accurate recognition.
Therefore, there is a need for an image processing method, a storage medium, an electronic device, and an image processing apparatus that can be applied to the industrial fields of automatic inspection and judgment of goods or scrap steel described above and can accurately recognize the content and position of targets such as carriers, goods, and individual scrap steel pieces.
Disclosure of Invention
In a first aspect, an embodiment of the present application provides an image processing method. The image processing method includes: inputting an original image into a feature extraction network to obtain a first feature map; inputting the first feature map into an atrous spatial pyramid pooling (ASPP) model and performing atrous convolution parallel sampling at multiple sampling rates to obtain a multi-scale feature map, inputting the multi-scale feature map into a deep feature extraction module for deep feature extraction to obtain a deep feature map, and inputting the first feature map into a shallow feature extraction module for shallow feature extraction to obtain a shallow feature map; inputting the deep feature map and the shallow feature map into a stitching module to be stitched into a second feature map; and performing semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image.
The technical solution described in the first aspect performs better at reducing edge errors and improving edge and contour recognition, reduces the influence that the specific network model, network structure, and model parameters of the feature extraction network have on the final detection result, and can uniformly improve the detection performance of a wide range of possible feature extraction networks; in particular, it copes effectively with scenes containing many small objects or large edge errors.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the ASPP model and the deep feature extraction module together serve as a high-level semantic feature extraction channel configured to obtain high-level semantic features of the first feature map, and the shallow feature extraction module serves as a low-level semantic feature extraction channel configured to obtain low-level semantic features of the first feature map.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the high-level semantic features of the first feature map are used to enhance semantic information about the recognition target in the semantic segmentation result of the original image, and the low-level semantic features of the first feature map are used to enhance boundary information about the recognition target in the semantic segmentation result of the original image.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model, the boundary information recovery model being configured to recover boundary information that is located in the first feature map but not in the multi-scale feature map output by the ASPP model.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the boundary information recovery model is further configured to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the deep feature extraction module performs deep feature extraction according to a first feature extraction requirement and the shallow feature extraction module performs shallow feature extraction according to a second feature extraction requirement, where the first feature extraction requirement targets high-level semantic features and the second targets low-level semantic features.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that at least one of the multiple sampling rates of the ASPP model may be increased to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the precision of the data format used by the feature extraction network and the ASPP model to store weights and gradients may be increased to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the multiple sampling rates of the ASPP model are associated with the network model parameters of the feature extraction network and tuned together to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the original image is subjected to a data enhancement operation, the data enhancement operation including at least one of: random flipping, rotation, flipping combined with rotation, random transformation, random scaling, random cropping, blurring, Gaussian noise addition, and padding.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the image processing method is used to detect scrap steel pieces during transport of a scrap steel set; the semantic segmentation result of the original image includes a semantic segmentation recognition result of the scrap steel set of the original image, which is used to determine at least one piece of associated information of the scrap steel set of the original image, the associated information including at least one of: contour information, category information, source information, coordinate information, area information, and pixel feature information.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the image processing method is used for car region detection, and the semantic segmentation result of the original image includes a semantic segmentation recognition result of the car region contour of the original image.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the ASPP model and the deep feature extraction module together serve as a high-level semantic feature extraction channel configured to obtain the high-level semantic features of the first feature map, the shallow feature extraction module serves as a low-level semantic feature extraction channel configured to obtain the low-level semantic features of the first feature map, the high-level semantic features of the first feature map are used to enhance semantic information about the car region contour in the semantic segmentation result of the original image, and the low-level semantic features of the first feature map are used to enhance boundary information about the car region contour in the semantic segmentation result of the original image.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model, the boundary information recovery model being configured to recover boundary information about the car region contour that is located in the first feature map but not in the multi-scale feature map output by the ASPP model.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that at least one of the multiple sampling rates of the ASPP model may be increased to improve the resolution of the features of the second feature map with respect to the car region contour.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that the image processing method further includes: determining, by a car contour search module, an optimal car contour and the coordinate information of the optimal car contour according to the semantic segmentation recognition result of the car region contour of the original image, where the semantic segmentation recognition result of the car region contour indicates whether each pixel of the original image belongs to the car region.
According to a possible implementation of the first aspect, an embodiment of the present application further provides that determining, by the car contour search module, the optimal car contour and its coordinate information according to the semantic segmentation recognition result of the car region contour of the original image includes: selecting, from all pixels of the original image, the pixels belonging to the car region according to the semantic segmentation recognition result of the car region contour; performing contour calculation on the selected pixels through the car contour search module to obtain a plurality of candidate contours; classifying each of the candidate contours as convex-hull type, concave-hull type, or irregular type; and running an optimal search algorithm, through the car contour search module, over the candidate contours classified as convex-hull type, thereby obtaining the optimal car contour and its coordinates. These four steps are sketched below.
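By way of illustration only, the following Python sketch shows one way these four steps could be realized with OpenCV; the convexity test via polygon approximation and the "largest convex candidate wins" selection rule are illustrative assumptions standing in for the classification and optimal search algorithms, which the present application does not fix to any particular implementation.

```python
import cv2
import numpy as np

def find_best_car_contour(seg_mask: np.ndarray):
    """seg_mask: HxW array with 1 where a pixel is classified as car region."""
    binary = (seg_mask == 1).astype(np.uint8)                # step 1: select car pixels
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # step 2: candidate contours
    convex = [c for c in contours                            # step 3: keep convex-hull type
              if cv2.isContourConvex(cv2.approxPolyDP(c, 5, True))]
    if not convex:
        return None
    best = max(convex, key=cv2.contourArea)                  # step 4: optimal search
    return best, best.reshape(-1, 2)                         # contour and its coordinates
```

In practice the optimal search could weigh position and shape priors rather than area alone; the sketch assumes OpenCV 4.x, where findContours returns two values.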
In a second aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the image processing method according to any implementation of the first aspect.
The technical solution described in the second aspect performs better at reducing edge errors and improving edge and contour recognition, reduces the influence that the specific network model, network structure, and model parameters of the feature extraction network have on the final detection result, and can uniformly improve the detection performance of a wide range of possible feature extraction networks; in particular, it copes effectively with scenes containing many small objects or large edge errors.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor implements the image processing method according to any implementation of the first aspect by executing the executable instructions.
The technical solution described in the third aspect performs better at reducing edge errors and improving edge and contour recognition, reduces the influence that the specific network model, network structure, and model parameters of the feature extraction network have on the final detection result, and can uniformly improve the detection performance of a wide range of possible feature extraction networks; in particular, it copes effectively with scenes containing many small objects or large edge errors.
In a fourth aspect, an embodiment of the present application provides an image processing apparatus. The image processing apparatus includes: a feature extraction network configured to obtain a first feature map from an original image; an ASPP model configured to perform atrous convolution parallel sampling on the first feature map at multiple sampling rates to obtain a multi-scale feature map; a deep feature extraction module configured to perform deep feature extraction on the multi-scale feature map to obtain a deep feature map; a shallow feature extraction module configured to perform shallow feature extraction on the first feature map to obtain a shallow feature map; a stitching module configured to stitch the deep feature map and the shallow feature map to obtain a second feature map; and a semantic segmentation model configured to perform semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image.
The technical solution described in the fourth aspect performs better at reducing edge errors and improving edge and contour recognition, reduces the influence that the specific network model, network structure, and model parameters of the feature extraction network have on the final detection result, and can uniformly improve the detection performance of a wide range of possible feature extraction networks; in particular, it copes effectively with scenes containing many small objects or large edge errors.
According to a possible implementation of the fourth aspect, an embodiment of the present application further provides that the ASPP model and the deep feature extraction module together serve as a high-level semantic feature extraction channel configured to obtain high-level semantic features of the first feature map, the shallow feature extraction module serves as a low-level semantic feature extraction channel configured to obtain low-level semantic features of the first feature map, the high-level semantic features of the first feature map are used to enhance semantic information about the recognition target in the semantic segmentation result of the original image, the low-level semantic features of the first feature map are used to enhance boundary information about the recognition target in the semantic segmentation result of the original image, the low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model, and the boundary information recovery model is used to recover boundary information that is located in the first feature map but not in the multi-scale feature map output by the ASPP model and to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
According to a possible implementation of the fourth aspect, an embodiment of the present application further provides that at least one of the multiple sampling rates of the ASPP model may be increased, or the precision of the data format used by the feature extraction network and the ASPP model to store weights and gradients may be increased, to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
According to a possible implementation of the fourth aspect, an embodiment of the present application further provides that the image processing apparatus is configured to detect scrap steel pieces during transport of a scrap steel set; the semantic segmentation result of the original image includes a semantic segmentation recognition result of the scrap steel set of the original image, which is used to determine at least one piece of associated information of the scrap steel set of the original image, the associated information including at least one of: contour information, category information, source information, coordinate information, area information, and pixel feature information.
According to a possible implementation of the fourth aspect, an embodiment of the present application further provides that the image processing apparatus is configured for car region detection, the semantic segmentation result of the original image includes a semantic segmentation recognition result of the car region contour of the original image, and the image processing apparatus further includes: a car contour search module configured to determine an optimal car contour and the coordinate information of the optimal car contour according to the semantic segmentation recognition result of the car region contour of the original image, where the semantic segmentation recognition result of the car region contour indicates whether each pixel of the original image belongs to the car region.
According to a possible implementation of the fourth aspect, an embodiment of the present application further provides that determining the optimal car contour and its coordinate information according to the semantic segmentation recognition result of the car region contour of the original image includes: selecting, from all pixels of the original image, the pixels belonging to the car region according to the semantic segmentation recognition result of the car region contour; performing contour calculation on the selected pixels through the car contour search module to obtain a plurality of candidate contours; classifying each of the candidate contours as convex-hull type, concave-hull type, or irregular type; and running an optimal search algorithm, through the car contour search module, over the candidate contours classified as convex-hull type, thereby obtaining the optimal car contour and its coordinates.
Drawings
To explain the technical solutions in the embodiments or the background art of the present application, the drawings required for the embodiments or the background art are described below.
Fig. 1 shows a schematic flowchart of an image processing method provided in an embodiment of the present application.
Fig. 2 shows a block diagram of an electronic device used in the image processing method shown in Fig. 1 according to an embodiment of the present application.
Fig. 3 shows a block diagram of an image processing apparatus provided in an embodiment of the present application.
Detailed Description
To solve the technical problem of accurately recognizing the content and position of targets such as carriers, goods, and individual scrap steel pieces, the embodiments of the present application provide an image processing method, a storage medium, an electronic device, and an image processing apparatus. The image processing method includes the following steps: inputting an original image into a feature extraction network to obtain a first feature map; inputting the first feature map into an atrous spatial pyramid pooling (ASPP) model and performing atrous convolution parallel sampling at multiple sampling rates to obtain a multi-scale feature map, inputting the multi-scale feature map into a deep feature extraction module for deep feature extraction to obtain a deep feature map, and inputting the first feature map into a shallow feature extraction module for shallow feature extraction to obtain a shallow feature map; inputting the deep feature map and the shallow feature map into a stitching module to be stitched into a second feature map; and performing semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image. This yields better performance in reducing edge errors and improving edge and contour recognition, reduces the influence of the specific network model, network structure, and model parameters of the feature extraction network on the final detection result, and uniformly improves the detection performance of the various possible feature extraction networks, coping effectively with scenes containing many small objects or large edge errors.
The embodiments of the present application can be applied to, but are not limited to, the following scenarios: industrial automation, cargo sorting in logistics centers, port automation, intelligent automatic inspection and judgment of goods, scrap steel recycling, intelligent automatic inspection and judgment of scrap steel, and any scenario, such as automatic coal sorting, waste recycling, and automatic waste sorting, in which the recognition method and apparatus for intelligent material inspection and judgment can improve production efficiency and reduce labor costs.
The embodiments of the present application may be modified and improved according to specific application environments, and are not limited herein.
To help those skilled in the art better understand the present application, the embodiments of the present application are described below with reference to the accompanying drawings.
Aspects of the present application and the various embodiments and implementations mentioned below involve the concepts of artificial intelligence, machine learning, and neural networks. In general, artificial intelligence (AI) studies the nature of human intelligence and builds intelligent machines that can react in ways similar to human intelligence. Research in applied artificial intelligence includes robotics, speech recognition, natural language processing, image recognition, decision reasoning, human-computer interaction, expert systems, and the like. Machine learning (ML) studies how artificial intelligence systems model or implement human learning behavior, acquire new knowledge or skills, reorganize existing knowledge structures, and improve their own capabilities. Machine learning learns rules from large numbers of samples, data, or experiences through various algorithms in order to recognize new samples or to make decisions and predictions about events. Examples of machine learning algorithms include decision tree learning, Bayesian classification, support vector machines, and clustering algorithms. Deep learning (DL), inspired by the deep structure of the human brain and its hierarchically graded cognitive processes, studies how to feed large amounts of data into complex models and "train" the models to learn how to capture features. Neural networks (NN) can be divided into artificial neural networks (ANN) and spiking neural networks (SNN). An SNN simulates a spiking neuron model of biological neural mechanisms and encodes information as pulses during computation. Currently, ANNs are the most widely used; unless otherwise specified or indicated by context, the term neural network (NN) herein refers to an artificial neural network, i.e., an ANN.
An ANN is an algorithmic mathematical model inspired by the structure of brain neurons and the principles of neural conduction; its network structure processes information by imitating the behavior of animal neural networks. A neural network comprises a large number of interconnected nodes or neurons, sometimes called artificial neurons or perceptrons, inspired by the structure of neurons in the brain. A shallow neural network comprises only an input layer, which receives input signals, and an output layer, which outputs the network's computed results: the input signals are linearly combined and then transformed by an activation function to produce the output layer's result. The complex models used in deep learning are mainly multi-layer neural networks, sometimes called deep neural networks (DNN). In addition to the input and output layers, a multi-layer neural network contains hidden layers; each hidden layer has an arbitrary number of neurons connected as nodes to the nodes of the previous layer, and each neuron can be regarded as a linear combiner that assigns a weight to each connected input value for a weighted linear combination. The activation function is a nonlinear mapping applied after the weighted linear combination of input signals; in a multi-layer network it can be understood as the functional relationship between the output of a neuron in one layer and the input of a neuron in the next. Each hidden layer may have a different activation function; common choices are ReLU, Sigmoid, and Tanh. The network passes each layer's information to the next layer through its structure. Forward propagation computes layer by layer from the input layer to the output layer, repeatedly applying weighted linear combinations and transformations, and finally evaluates a loss function that measures the deviation between the model's prediction and the true value. Back propagation proceeds from the output layer through the hidden layers to the input layer, correcting the network parameters according to the error between the actual and expected outputs. Depending on the composition of their basic layers, DNNs can be classified into convolutional neural networks (CNN), fully connected neural networks (FCN), and recurrent neural networks (RNN). A CNN is composed of convolutional layers, pooling layers, and fully connected layers; an FCN consists of multiple fully connected layers; an RNN consists of fully connected layers with feedback paths and gating operations between layers, also called recurrent layers. Different types of basic layers have different computational characteristics and requirements; for example, in some networks convolutional layers account for a high proportion of the computation, and each convolutional layer involves a large amount of computation. Moreover, the computational parameters of each convolutional layer, such as the convolution kernel size and the input/output feature map sizes, vary widely. The forward step of a single neuron is illustrated by the sketch below.
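As a toy illustration of the forward step just described (not part of the patent), the following Python snippet computes a single neuron's weighted linear combination followed by a ReLU activation:

```python
import numpy as np

def neuron_forward(x, w, b):
    z = np.dot(w, x) + b       # weighted linear combination of the inputs
    return np.maximum(z, 0.0)  # ReLU activation: nonlinear mapping of the sum

out = neuron_forward(x=np.array([0.5, -1.2, 3.0]),
                     w=np.array([0.1, 0.4, -0.2]),
                     b=0.05)   # z = -0.98, so out = max(0, -0.98) = 0.0
```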
Fig. 1 shows a schematic flowchart of an image processing method provided in an embodiment of the present application. As shown in fig. 1, the image processing method includes the following steps.
Step S102: inputting the original image into a feature extraction network to obtain a first feature map.
Step S104: inputting the first feature map into an atrous spatial pyramid pooling (ASPP) model and performing atrous convolution parallel sampling at multiple sampling rates to obtain a multi-scale feature map, then inputting the multi-scale feature map into a deep feature extraction module for deep feature extraction to obtain a deep feature map, and inputting the first feature map into a shallow feature extraction module for shallow feature extraction to obtain a shallow feature map.
Step S106: inputting the deep feature map and the shallow feature map into a stitching module to be stitched into a second feature map.
Step S108: performing semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image. The wiring of these four steps is sketched below.
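The following PyTorch-style sketch (illustrative only; every module name is a hypothetical stand-in, not the patent's reference implementation) shows how steps S102 to S108 wire together, assuming each submodule already produces spatially compatible outputs:

```python
import torch
import torch.nn as nn

class TwoBranchSegmenter(nn.Module):
    def __init__(self, backbone, aspp, deep_head, shallow_head, seg_head):
        super().__init__()
        self.backbone = backbone          # feature extraction network (S102)
        self.aspp = aspp                  # multi-rate atrous sampling (S104)
        self.deep_head = deep_head        # deep feature extraction (S104)
        self.shallow_head = shallow_head  # shallow feature extraction (S104)
        self.seg_head = seg_head          # semantic segmentation (S108)

    def forward(self, image):
        first = self.backbone(image)                # first feature map
        multi_scale = self.aspp(first)              # multi-scale feature map
        deep = self.deep_head(multi_scale)          # deep feature map
        shallow = self.shallow_head(first)          # shallow feature map
        second = torch.cat([deep, shallow], dim=1)  # stitching module (S106)
        return self.seg_head(second)                # per-pixel class scores
```

Any spatial resizing needed before concatenation (see the refinement sketch later in this description) is assumed to happen inside the heads.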
The principle of image semantic segmentation is to classify each pixel of an image and label it with its class, thereby producing a pixel-level prediction for the image. To achieve accurate recognition and detection based on computer vision, the content and position of the target in the image must be accurately recognized and labeled, and other categories may be labeled as well. The better the pixel-level prediction of the image, i.e., the semantic segmentation result of the original image, the more accurately the content and position of the target are recognized, which yields more reliable predictions and a better basis for decision-making and subsequent processing. To obtain a better detection result, the process of deriving the semantic segmentation result of the original image must be improved. Here, the semantic information of the original image can be divided into different levels, such as low-level semantic features (sometimes called bottom-level semantic features, corresponding to the visual layer) and high-level semantic features. Features such as contour, edge, color, texture, and shape are generally called low-level semantic features, while features closer to human visual understanding, such as the objects in an image, are called high-level semantic features. For example, in the semantic information of a face image, the outline of the face, the nose, and the eyes are low-level semantic features, while the face as an object is a high-level semantic feature. Low-level semantic features carry less semantic information but precise target positions; high-level semantic features carry rich semantic information but only coarse target positions. Extracting low-level semantic features therefore better identifies target positions such as edges and key points, but the limited semantic information they carry makes the image content harder to understand; conversely, extracting high-level semantic features better captures the image content, but the coarse target positions hinder position recognition and can introduce large edge errors. Taking the face image above as an example, low-level semantic features better locate the nose and eyes, while high-level semantic features better capture the content of the face. To obtain a better detection result, the process of deriving the semantic segmentation result of the original image must therefore make full use of the different levels of semantic features, both low-level and high-level, in the semantic information of the original image, as detailed below.
In step S102, a first feature map of the original image is obtained through a feature extraction network. The feature extraction network may adopt any suitable network model, network structure, and model parameters, as long as it can extract the basic or fused features of the original image for subsequent processing. For example, the feature extraction network may be a deep convolutional neural network (DCNN) or another neural network with a deep structure and convolutional computation. As another example, it may be an image semantic segmentation model such as U-Net, FCN, SegNet, PSPNet, DeepLab, or one of their extended frameworks. Depending on the network model, network structure, and model parameters used, the feature extraction network may divide the original image into multiple blocks of the same size and perform feature extraction or semantic recognition and labeling within each block; for example, U-Net may divide the original image into blocks of size 256 × 256. In industrial applications such as the automatic inspection of goods or scrap steel mentioned above, however, a scrap piece may lie on the boundary between two adjacent blocks, and the semantic recognition results of the two blocks may conflict: a scrap piece straddling the boundary may be recognized as one class in one block but as another class in the other. Furthermore, edge recognition errors can be introduced when the contour of a vehicle body, or the edge of the car, lies exactly at the junction of two adjacent blocks. Beyond the edge errors caused by block-wise recognition of the original image, the original image may also contain different kinds of objects with high mutual similarity, especially small objects, which place higher demands on edge and contour recognition accuracy than large objects. Different feature extraction networks may behave differently with respect to these edge errors, edge recognition, and contour recognition, and may also behave differently across actual scenes depending on the network model, network structure, or model parameters they adopt. To uniformly improve the detection performance of the various possible feature extraction networks in industrial applications, especially intelligent automatic inspection of scrap steel, the process of deriving the semantic segmentation result of the original image must be improved in a way that is compatible with any feature extraction network.
In step S104, the first feature map is input into the atrous spatial pyramid pooling (ASPP) model, where atrous convolution parallel sampling at multiple sampling rates yields a multi-scale feature map; the multi-scale feature map is then input into the deep feature extraction module for deep feature extraction to obtain a deep feature map, and the first feature map is input into the shallow feature extraction module for shallow feature extraction to obtain a shallow feature map. The ASPP model convolves and samples the given input in parallel with atrous kernels at different sampling rates, which is equivalent to capturing the context of the image at multiple scales. In other words, ASPP fuses multi-scale information using atrous convolutions with different dilation factors, i.e., it operates on image features at different sampling rates using atrous convolutions. In this way, by using the ASPP model or an image semantic segmentation model with an ASPP structure, convolving and sampling in parallel at different rates captures the context of the first feature map at multiple scales, producing a multi-scale feature map with multi-scale sampling information. Because this multi-scale sampling information covers the entire input first feature map, complete edge information is retained, avoiding the edge errors caused by block-wise recognition of the original image by feature extraction networks such as U-Net. In one possible implementation, the ASPP model is a combination of atrous convolution models with multiple sampling rates and model parameters, including for example a 1x1 convolutional layer, a 3x3 convolutional layer with a sampling rate of 16, a 3x3 convolutional layer with a sampling rate of 18, and an image pooling layer; a sketch of such a combination follows below. The first feature map is input into the ASPP model to obtain the multi-scale feature map, which can be expressed as several mask maps and is then input into the deep feature extraction module for deep feature extraction to obtain the deep feature map. Although the multi-scale feature map retains complete edge information, its image features are pixel-level predictions, and the pixel-level predictions of the ASPP model may suffer interference or errors between adjacent pixels; for example, the edges of a car, or objects adjacent to but outside the car, may be hard to distinguish. The underlying reason is the reduced resolution of edge-related features caused by the ASPP model's parallel atrous sampling at different rates. Therefore, the multi-scale feature map output by the ASPP model is fed into the deep feature extraction module to obtain the deep feature map, and the first feature map is fed into the shallow feature extraction module to obtain the shallow feature map, so that the rich edge information in the ASPP model's pixel-level predictions is exploited while the loss of resolution in edge-related features caused by ASPP sampling is compensated through shallow feature extraction. It should be understood that the deep and shallow feature extraction modules may adopt any suitable models and parameters.
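A hedged PyTorch sketch of such an ASPP combination is given below; the 1x1 convolution, the 3x3 atrous convolutions at rates 16 and 18, and the image pooling branch follow the combination named above, while the channel sizes and the final 1x1 projection are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    def __init__(self, in_channels=2048, out_channels=256, rates=(16, 18)):
        super().__init__()
        branches = [nn.Conv2d(in_channels, out_channels, 1, bias=False)]  # 1x1 conv
        for r in rates:  # 3x3 atrous convs; padding=r preserves spatial size
            branches.append(nn.Conv2d(in_channels, out_channels, 3,
                                      padding=r, dilation=r, bias=False))
        self.branches = nn.ModuleList(branches)
        self.image_pool = nn.Sequential(                 # image pooling branch
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False))
        self.project = nn.Conv2d(out_channels * (len(rates) + 2),
                                 out_channels, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        feats.append(pooled)
        return self.project(torch.cat(feats, dim=1))     # multi-scale feature map
```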
For example, the deep feature extraction module may reduce the weight of low-level semantic features through encoding operations such as channel compression and upsampling, while the shallow feature extraction module may better recover the details of the target boundary through a decoding operation; suitable techniques may be adopted for these according to the specific application scenario, and no specific limitation is imposed here. In some embodiments, channel compression may be performed with a 1x1 convolution, and the output of the encoding operation may be upsampled by a factor of 4 so that its resolution matches that of the decoding output; after concatenation, a 3x3 convolution followed by further upsampling may be applied to achieve a refinement effect, as sketched below.
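Continuing the PyTorch sketches above, the following minimal refinement module assumes the DeepLabV3+-style layout suggested here: a 1x1 convolution compresses the shallow branch's channels, the deep branch is upsampled by a factor of 4 to match resolutions, and a 3x3 convolution refines the concatenation. All channel counts are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Refiner(nn.Module):
    def __init__(self, deep_ch=256, shallow_in=256, shallow_ch=48, out_ch=256):
        super().__init__()
        self.compress = nn.Conv2d(shallow_in, shallow_ch, 1, bias=False)  # 1x1 channel compression
        self.refine = nn.Conv2d(deep_ch + shallow_ch, out_ch, 3,
                                padding=1, bias=False)                    # 3x3 refinement conv

    def forward(self, deep, shallow):
        deep = F.interpolate(deep, scale_factor=4, mode='bilinear',
                             align_corners=False)  # 4x upsample to match the shallow resolution
        fused = torch.cat([deep, self.compress(shallow)], dim=1)  # stitching
        return self.refine(fused)                  # refined second feature map
```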
In step S106, the deep feature map and the shallow feature map are input into the stitching module and stitched to obtain the second feature map. As mentioned above, although the multi-scale feature map output by the ASPP model results from parallel atrous sampling at different rates over the entire input first feature map, so that complete edge information is retained, the resolution of edge-related features, i.e., features describing the boundary of the recognition target, is reduced. To address this, the deep feature map is obtained by deep feature extraction on the multi-scale feature map, and the shallow feature map is obtained by shallow feature extraction on the first feature map, so that the rich edge information in the ASPP model's pixel-level predictions is exploited while the loss of resolution is compensated through shallow feature extraction. The shallow features contained in the shallow feature map correspond to the low-level semantic features of the original image: they carry less semantic information, but their precise target positions make them better for locating key points and edges, thereby reducing edge errors. In summary, shallow feature extraction is applied to the first feature map (low-level semantic features: less information but accurate positions), while deep feature extraction is applied to the multi-scale feature map (high-level semantic features: more information but inaccurate positions). The shallow features can restore the resolution of features describing the boundary of the recognition target, such as the edge of a car, while the deep features better identify the content of the target. Finally, the deep feature map and the shallow feature map are stitched by the stitching module to obtain the second feature map. Thanks to this series of operations, the stitched result, i.e., the second feature map, performs better than the first feature map in terms of edge error, edge recognition, and contour recognition. Moreover, whereas the first feature map is constrained by the specific network model, network structure, and model parameters of the feature extraction network, the second feature map, through the operations of steps S104 and S106, uniformly improves the detection performance of the various possible feature extraction networks, coping effectively with scenes containing many small objects or large edge errors.
In step S108, semantic segmentation is performed on the second feature map to obtain the semantic segmentation result of the original image. The semantic segmentation of the second feature map may be based on any suitable semantic segmentation model or confidence analysis model, such as an activation function followed by confidence analysis, to obtain the final prediction.
Referring to steps S102 to S108, the image processing method samples the entire first feature map in parallel with an ASPP model using atrous convolutions at different rates, thereby retaining complete edge information and avoiding the edge errors caused by block-wise recognition of the original image by feature extraction networks such as U-Net; it compensates for the reduced resolution of edge-related features caused by ASPP sampling through shallow feature extraction; and it obtains the second feature map by stitching the deep and shallow feature maps. Compared with the first feature map, the second feature map obtained through these operations is improved as follows: it performs better at reducing edge errors and improving edge and contour recognition, reduces the influence of the specific network model, network structure, and model parameters of the feature extraction network on the final detection result, and uniformly improves the detection performance of the various possible feature extraction networks, especially in scenes with many small objects or large edge errors.
In a possible implementation, the ASPP model and the deep feature extraction module together serve as a high-level semantic feature extraction channel configured to obtain the high-level semantic features of the first feature map, and the shallow feature extraction module serves as a low-level semantic feature extraction channel configured to obtain the low-level semantic features of the first feature map. In some embodiments, the high-level semantic features of the first feature map are used to enhance semantic information about the recognition target in the semantic segmentation result of the original image, and the low-level semantic features are used to enhance boundary information about the recognition target. The two channels thus simultaneously enhance the semantic information and the boundary information of the recognition target, so that the second feature map obtained by stitching the deep and shallow feature maps is much improved. In some embodiments, the low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model, used to recover boundary information that is located in the first feature map but not in the multi-scale feature map output by the ASPP model. Since the multi-scale feature map results from parallel atrous sampling at different rates over the entire first feature map, which reduces the resolution of features describing edge information or the boundary of the recognition target, some boundary information may be absent from the multi-scale feature map yet recoverable from the first feature map. Extracting this boundary information through the low-level semantic feature extraction channel therefore amounts to a boundary information recovery model associated with the ASPP model, restoring the boundary information missing from the ASPP output and improving the detection result. The boundary information recovery model is also used to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
In a possible implementation, the deep feature extraction module performs deep feature extraction according to a first feature extraction requirement, and the shallow feature extraction module performs shallow feature extraction according to a second feature extraction requirement, where the first requirement targets high-level semantic features and the second targets low-level semantic features.
In one possible implementation, at least one of the multiple sampling rates of the ASPP model may be increased to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target. Increasing the sampling rate increases the proportion of context captured, thereby raising the resolution of boundary features; higher resolution reduces edge errors and benefits detection. For example, in car edge recognition, the multi-scale image features of the car contained in the multi-scale feature map output by the ASPP model determine the resolution of the features describing the boundary of the car as the recognition target. In some embodiments, the precision of the data format used by the feature extraction network and the ASPP model to store weights and gradients may be increased to improve the resolution of the boundary features of the second feature map; for example, using a precision higher than FP16, such as FP32, allows the data format to carry more information, which benefits detection. Both adjustments are illustrated in the snippet below.
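The two adjustments just described can be illustrated with the hypothetical ASPP class sketched earlier; the specific rate and channel numbers here are assumptions, not values prescribed by the present application:

```python
# Raise one atrous rate to capture context at a larger proportion.
aspp = ASPP(in_channels=2048, out_channels=256, rates=(16, 24))  # rate 18 raised to 24

# Store weights (and hence gradients) in FP32 rather than FP16.
aspp = aspp.float()  # instead of aspp.half()
```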
In one possible embodiment, the multiple sampling rates of the ASPP model are associated with the network model parameters of the feature extraction network and tuned together to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target. Tuning the ASPP sampling rates jointly with the network model parameters of the feature extraction network can improve the final detection result synergistically, and the tuning can follow the specific characteristics of the feature extraction network, such as recognition accuracy, training feedback, and iteration effects.
In one possible embodiment, the original image is subjected to a data enhancement operation comprising at least one of: random flipping, rotation, combined flipping and rotation, random transformation, random scaling, random cropping, blurring, Gaussian noise addition, and padding, as sketched below. The original image may also be subjected to any other suitable data enhancement or pre-processing operation.
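A hedged sketch of these augmentations with torchvision follows; the concrete parameters (angles, crop size, noise level) are assumptions, and this application does not prescribe a particular library.

import torch
from torchvision import transforms

def add_gaussian_noise(img, std=0.02):
    # torchvision offers no built-in Gaussian-noise transform for tensors,
    # so a small custom function is used here.
    return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                      # random flipping
    transforms.RandomRotation(degrees=15),                  # rotation
    transforms.RandomAffine(degrees=0, scale=(0.8, 1.2)),   # random scaling
    transforms.RandomCrop(512, pad_if_needed=True),         # random cropping + padding
    transforms.GaussianBlur(kernel_size=5),                 # blurring
    transforms.ToTensor(),
    transforms.Lambda(add_gaussian_noise),                  # Gaussian noise addition
])
# augmented = augment(pil_image)   # applied to each original image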
In a possible embodiment, the image processing method is used for material detection during transportation of a scrap steel set. The semantic segmentation recognition result of the original image includes a semantic segmentation recognition result of the scrap steel set of the original image, which is used to determine at least one kind of associated information of the scrap steel set, the associated information including at least one of: contour information, category information, source information, coordinate information, area information, and pixel feature information. This strengthens the detection of scrap pieces and yields rich associated information. The contour information indicates the contour of each scrap piece in the set; it may be the result of matching against several preset contour types, a numerical semantic description (such as side length or curvature), or a generalized semantic description (such as disc-shaped or strip-shaped). The category information indicates which categories of scrap pieces the set contains and how many pieces belong to each category; because this information supports further analysis and extraction of richer information, the associated information generally includes at least the category information. For example, the category information may indicate that the scrap steel set contains 10 train wheels, 20 car bearings, 30 screws, and so on. The source information indicates where a scrap piece comes from, for example a train or a barge. The coordinate information indicates the coordinates of a scrap piece in the image. The area information indicates the identified area of a scrap piece in the image. The pixel feature information indicates the features of all pixels belonging to a scrap piece. It should be understood that richer associated information can be obtained depending on the specific computer vision technique used to produce the semantic segmentation result of the original image; the examples above are illustrative, not limiting. The associated information thus obtained provides a basis for decision making and subsequent processing. A sketch of deriving such information from a per-pixel class mask follows.
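For illustration, a minimal OpenCV/NumPy sketch of deriving several kinds of associated information from a per-pixel class mask follows; the class ids and names are hypothetical examples, not values defined by this application.

import cv2
import numpy as np

CLASS_NAMES = {1: "train wheel", 2: "car bearing", 3: "screw"}  # hypothetical ids

def scrap_info(mask, image):
    # mask: (H, W) array of integer class ids; image: (H, W, 3) original image.
    info = []
    for cls_id, name in CLASS_NAMES.items():
        binary = (mask == cls_id).astype(np.uint8)
        class_pixels = image[mask == cls_id]        # pixel feature information
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:                          # one entry per scrap piece
            info.append({
                "category": name,                   # category information
                "contour": c,                       # contour information
                "bbox": cv2.boundingRect(c),        # coordinate information
                "area": cv2.contourArea(c),         # area information
                "mean_pixel": class_pixels.mean(axis=0),  # per class, simplified
            })
    return info

Category counts then follow directly, e.g. collections.Counter(e["category"] for e in info).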
In one possible implementation, the image processing method is used for car region detection, and the semantic segmentation result of the original image includes a semantic segmentation recognition result of the car region contour of the original image. In this scenario, the ASPP model and the deep feature extraction module together serve as the high-level semantic feature extraction channel configured to obtain high-level semantic features of the first feature map, and the shallow feature extraction module serves as the low-level semantic feature extraction channel configured to obtain low-level semantic features of the first feature map; the high-level semantic features enhance the semantic information about the car region contour in the semantic segmentation result of the original image, and the low-level semantic features enhance the boundary information about the car region contour in that result. The low-level semantic feature extraction channel acts as a boundary information recovery model with respect to the ASPP model, recovering boundary information about the car region contour that is present in the first feature map but absent from the multi-scale feature map output by the ASPP model. In addition, at least one of the plurality of sampling rates of the ASPP model may be increased to improve the resolution of the features of the second feature map with respect to the car region contour. The detection of the car region is thereby enhanced.
In one possible implementation, the image processing method further includes: determining, by a car contour search module, an optimal car contour and the coordinate information of the optimal car contour according to the semantic segmentation recognition result of the car region contour of the original image, where that recognition result indicates whether each pixel of the original image belongs to the car region. In the car region detection scenario, to effectively prevent false recognition outside the car (for example, interference from other cars, or a complete car not being contained in the original image), an automatic car search algorithm may be run on the semantic segmentation recognition result of the car region contour through an additional car contour search module to determine the optimal car contour and its coordinate information. In some embodiments, the automatic car search algorithm, namely determining the optimal car contour and its coordinate information from the semantic segmentation recognition result of the car region contour of the original image, includes: selecting, from all pixels of the original image, the pixels that belong to the car region according to the semantic segmentation recognition result; performing contour calculation on those pixels through the car contour search module to obtain a plurality of candidate contours; classifying each of the candidate contours as convex hull type, concave hull type, or irregular type; and performing an optimal search over the candidate contours classified as convex hull type through the car contour search module to obtain the optimal car contour and its coordinates. Here, a convex-hull-type candidate contour is a contour that, after the pixels identified as belonging to the car region are rendered as a single color patch, satisfies the convex hull definition; concave-hull-type and irregular candidate contours are defined analogously. A convex hull curves outward, like the outside of a circle or sphere; a concave hull curves inward, like the inside of a circle or sphere; an irregular contour satisfies neither definition. By traversing only the candidate contours classified as convex hull type to obtain the optimal car contour and its coordinates, false recognition outside the car can be effectively prevented, and the optimal car contour and its coordinates can still be found by traversal even when the original image does not contain a complete car.
Moreover, a target region may be preset before the convex-hull-type candidate contours are traversed, to better prevent false recognition outside the car, for example by excluding from the search any region clearly far away from the car. A sketch of this search procedure follows.
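A hedged OpenCV sketch of this automatic car search follows; choosing the largest convex candidate as "optimal" and the rectangular target region are assumptions made for the sketch, not details fixed by this application.

import cv2
import numpy as np

def find_best_car_contour(car_mask, target_roi=None):
    # car_mask: (H, W) array, nonzero where a pixel was classified as car region.
    binary = (car_mask > 0).astype(np.uint8)
    if target_roi is not None:                  # preset target region: exclude
        x, y, w, h = target_roi                 # areas clearly far from the car
        roi = np.zeros_like(binary)
        roi[y:y + h, x:x + w] = 1
        binary &= roi
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Classification step: keep only convex-hull-type candidates, testing a
    # polygonal approximation of each raw contour for convexity.
    convex = [c for c in contours if cv2.isContourConvex(
        cv2.approxPolyDP(c, 0.01 * cv2.arcLength(c, True), True))]
    if not convex:
        return None, None                       # no convex candidate found
    best = max(convex, key=cv2.contourArea)     # traverse convex candidates
    return best, cv2.boundingRect(best)         # optimal contour + coordinates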
It is to be understood that the above-described method may be implemented by a corresponding execution body or carrier. In some exemplary embodiments, a non-transitory computer readable storage medium stores computer instructions that, when executed by a processor, implement the above-described method and any of the above-described embodiments, implementations, or combinations thereof. In some example embodiments, an electronic device includes: a processor; a memory for storing processor-executable instructions; wherein the processor implements the above method and any of the above embodiments, implementations, or combinations thereof by executing the executable instructions.
Fig. 2 shows a block diagram of an electronic device used in the image processing method shown in fig. 1 according to an embodiment of the present application. As shown in fig. 2, the electronic device includes a main processor 202, an internal bus 204, a network interface 206, a main memory 208, an auxiliary processor 210 with auxiliary memory 212, and an auxiliary processor 220 with auxiliary memory 222. The main processor 202 is connected to the main memory 208, which stores computer instructions executable by the main processor 202 so that the image processing method shown in fig. 1 can be implemented, including some or all of its steps and any possible combination, sub-combination, replacement, or variation of those steps. The network interface 206 provides network connectivity and transmits and receives data over a network. The internal bus 204 provides internal data interaction among the main processor 202, the network interface 206, the auxiliary processor 210, and the auxiliary processor 220. The auxiliary processor 210 is coupled to the auxiliary memory 212 and provides auxiliary computing power; the auxiliary processor 220 is coupled to the auxiliary memory 222 and likewise provides auxiliary computing power. The auxiliary processors 210 and 220 may provide the same or different auxiliary computing capabilities, including but not limited to capabilities optimized for particular computing requirements, such as parallel processing or tensor computation, and capabilities optimized for particular algorithms or logic structures, such as iterative computation or graph computation. The auxiliary processors 210 and 220 may include one or more processors of a particular type, such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA), so that customized functionality and structure can be provided. In some exemplary embodiments, the electronic device may include no auxiliary processor, only one auxiliary processor, or any number of auxiliary processors, each with a corresponding customized function and structure; this is not specifically limited here, and the two auxiliary processors shown in fig. 2 are for illustration only and should not be construed as limiting. In addition, the main processor 202 may include a single-core or multi-core computing unit to provide the functions and operations necessary for the embodiments of the present application. The main processor 202 and the auxiliary processors (such as the auxiliary processor 210 and the auxiliary processor 220 in fig. 2) may also have different architectures, making the electronic device a heterogeneous system: for example, the main processor 202 may be a general-purpose, instruction-set-based processor such as a CPU, while an auxiliary processor may be a graphics processor (GPU) suited to parallel computation or a dedicated accelerator suited to neural network operations. The auxiliary memories (such as the auxiliary memory 212 and the auxiliary memory 222 shown in fig. 2) may cooperate with their respective auxiliary processors to implement customized functions and structures. Main memory 208 stores the necessary instructions, software, configurations, data, and the like,
to cooperate with main processor 202 to provide the functionality and operations necessary for the embodiments of the present application. In some exemplary embodiments, the electronic device may not include the auxiliary memory, may include only one auxiliary memory, and may further include any number of auxiliary memories, which is not specifically limited herein. The architecture of the two auxiliary memories shown in fig. 2 is illustrative only and should not be construed as limiting. Main memory 208 and possibly secondary memory may include one or more of the following features: volatile, nonvolatile, dynamic, static, readable/writable, read-only, random-access, sequential-access, location-addressability, file-addressability, and content-addressability, and may include random-access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewriteable Compact Disc (CD), a Digital Versatile Disc (DVD), a mass storage media device, or any other form of suitable storage media. The internal bus 204 may include any of a variety of different bus structures or combinations of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. It should be understood that the electronic device shown in fig. 2, the illustrated structure of which does not constitute a specific limitation on the apparatus or system in question, may in some exemplary embodiments include more or fewer components than the specific embodiments and the drawings, or combine certain components, or split certain components, or have a different arrangement of components.
With continued reference to fig. 2, in one possible implementation, the auxiliary processor 210 and/or the auxiliary processor 220 may have a computing architecture custom-designed for the characteristics of neural network computation, i.e., it may be a neural network accelerator, and the electronic device may include any number of such accelerators. In some embodiments, for illustration only, an exemplary neural network accelerator may be: an accelerator with a control-flow-based time-domain computing architecture whose instruction set is customized for neural network algorithms and which centrally controls computing and storage resources; an accelerator with a data-flow-based spatial computing architecture, such as a two-dimensional spatial computing array based on a row-stationary (RS) data flow or a two-dimensional matrix multiplication array using a systolic array; or any neural network accelerator with any other suitable custom-designed computing architecture. A minimal sketch of dispatching the neural network computation to such an auxiliary processor follows.
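As a minimal sketch of such dispatch, the following moves the neural network portion of the method onto an available auxiliary processor; the CPU fallback and the placeholder network are assumptions of the sketch, not requirements of the device described above.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Conv2d(3, 16, kernel_size=3).to(device)  # placeholder network
image = torch.rand(1, 3, 512, 512, device=device)
with torch.no_grad():
    features = model(image)   # executes on the accelerator when one is present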
Fig. 3 shows a block diagram of an image processing apparatus provided in an embodiment of the present application. As shown in fig. 3, the image processing apparatus includes: a feature extraction network 310, configured to obtain a first feature map from an original image; an ASPP model 320, configured to perform parallel atrous convolution sampling on the first feature map at a plurality of sampling rates to obtain a multi-scale feature map; a deep feature extraction module 330, configured to perform deep feature extraction on the multi-scale feature map to obtain a deep feature map; a shallow feature extraction module 340, configured to perform shallow feature extraction on the first feature map to obtain a shallow feature map; a splicing module 350, configured to splice the deep feature map and the shallow feature map to obtain a second feature map; and a semantic segmentation model 360, configured to perform semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image. It should be appreciated that the deep feature extraction module 330 and the shallow feature extraction module 340 may employ any suitable models and parameters. For example, the deep feature extraction module 330 may reduce the weight of low-level semantic features through encoding operations such as channel compression and upsampling, and the shallow feature extraction module 340 may better recover target boundary details through a decoding operation; either may adopt whatever technique suits the specific application scenario, which is not limited here. A sketch wiring these modules end to end follows.
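The following sketch wires the six modules of fig. 3 end to end, reusing the SimpleASPP and TwoChannelDecoder sketches given earlier in this description; the one-layer backbone and all channel sizes are illustrative assumptions only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageProcessingApparatus(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(              # 310: feature extraction network
            nn.Conv2d(3, 256, kernel_size=7, stride=4, padding=3), nn.ReLU())
        self.aspp = SimpleASPP(256, 256)            # 320: ASPP model
        self.deep = nn.Sequential(                  # 330: deep feature extraction
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU())
        self.decoder = TwoChannelDecoder(256, 256)  # 340 + 350: shallow channel + splice
        self.head = nn.Conv2d(256, num_classes, 1)  # 360: semantic segmentation model

    def forward(self, image):
        first = self.backbone(image)                # first feature map
        deep = self.deep(self.aspp(first))          # multi-scale -> deep feature map
        second = self.decoder(deep, first)          # spliced second feature map
        logits = self.head(second)
        return F.interpolate(logits, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)

result = ImageProcessingApparatus()(torch.rand(1, 3, 512, 512))  # (1, 2, 512, 512)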
The image processing apparatus performs parallel atrous convolution sampling over the whole first feature map at different sampling rates through the ASPP model, thereby retaining complete edge information and avoiding the edge errors introduced when a feature extraction network such as U-Net identifies the original image in partitions; it counteracts the reduced resolution of edge-related features caused by ASPP sampling by extracting shallow features; and it obtains the second feature map by splicing the deep feature map and the shallow feature map. The second feature map so obtained improves on the first feature map as follows: it performs better at reducing edge error and at edge and contour recognition, it reduces the influence of the specific network model, network structure, and model parameters of the feature extraction network on the final detection effect, and it uniformly improves the detection effect across a variety of feature extraction networks, coping effectively, in particular, with scenes containing many small objects or large edge errors.
In one possible implementation, the ASPP model 320 and the deep feature extraction module 330 together serve as a high-level semantic feature extraction channel for obtaining high-level semantic features of the first feature map, and the shallow feature extraction module 340 serves as a low-level semantic feature extraction channel for obtaining low-level semantic features of the first feature map. The high-level semantic features of the first feature map are used to enhance the semantic information about the recognition target in the semantic segmentation result of the original image, and the low-level semantic features are used to enhance the boundary information about the recognition target in that result. The low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model 320, used to recover boundary information that is located in the first feature map but not in the multi-scale feature map output by the ASPP model 320. The boundary information recovery model is also used to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
In one possible implementation, at least one of the plurality of sampling rates of the ASPP model 320 may be increased, or the precision of the data format used by the feature extraction network 310 and the ASPP model 320 to store weights and gradients may be increased, to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
In a possible embodiment, the image processing apparatus is used for material detection during transportation of a steel scrap set, the semantic segmentation recognition result of the original image includes a semantic segmentation recognition result of the steel scrap set of the original image, the semantic segmentation recognition result of the steel scrap set of the original image is used for determining at least one piece of related information of the steel scrap set of the original image, and the at least one piece of related information of the steel scrap set of the original image includes at least one of: contour information, category information, source information, coordinate information, area information, pixel feature information.
In one possible implementation, the image processing apparatus is configured to detect a car region, and the semantic segmentation result of the original image includes a semantic segmentation recognition result of the car region contour of the original image; the image processing apparatus further includes a car contour search module (not shown) configured to determine an optimal car contour and the coordinate information of the optimal car contour according to the semantic segmentation recognition result of the car region contour of the original image, where that recognition result indicates whether each pixel of the original image belongs to the car region.
In one possible implementation, determining the optimal car contour and the coordinate information of the optimal car contour according to the semantic segmentation recognition result of the car region contour of the original image includes: selecting, from all pixels of the original image, the pixels that belong to the car region according to the semantic segmentation recognition result of the car region contour; performing contour calculation on those pixels through the car contour search module to obtain a plurality of candidate contours; classifying each of the plurality of candidate contours as convex hull type, concave hull type, or irregular type; and performing an optimal search over the candidate contours classified as convex hull type through the car contour search module to obtain the optimal car contour and its coordinates.
The embodiments provided herein may be implemented in any one or combination of hardware, software, firmware, or solid state logic circuitry, and may be implemented in connection with signal processing, control, and/or application specific circuitry. Particular embodiments of the present application provide an apparatus or device that may include one or more processors (e.g., microprocessors, controllers, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), etc.) that process various computer-executable instructions to control the operation of the apparatus or device. Particular embodiments of the present application provide an apparatus or device that can include a system bus or data transfer system that couples the various components together. A system bus can include any of a variety of different bus structures or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. The devices or apparatuses provided in the embodiments of the present application may be provided separately, or may be part of a system, or may be part of other devices or apparatuses.
Particular embodiments provided herein may include or be combined with computer-readable storage media, such as one or more storage devices capable of providing non-transitory data storage. The computer-readable storage medium/storage device may be configured to store data, programs, and/or instructions that, when executed by a processor of an apparatus or device provided by embodiments of the present application, cause the apparatus or device to perform the operations associated therewith. The computer-readable storage medium/storage device may include one or more of the following features: volatile, non-volatile, dynamic, static, read/write, read-only, random access, sequential access, location addressability, file addressability, and content addressability. In one or more exemplary embodiments, the computer-readable storage medium/storage device may be integrated into a device or apparatus provided in the embodiments of the present application or belong to a common system. The computer-readable storage medium/storage device may include optical, semiconductor, and/or magnetic memory devices, etc., and may also include random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewriteable compact disc (CD), a digital versatile disc (DVD), a mass storage media device, or any other form of suitable storage media.
The above is an implementation manner of the embodiments of the present application. It should be noted that the steps of the method described in the embodiments of the present application may be reordered, combined, or omitted according to actual needs. In the above embodiments, the descriptions of the respective embodiments have their respective emphases; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. It is to be understood that the embodiments of the present application and the structures shown in the drawings do not specifically limit the devices or systems concerned. In other embodiments of the present application, a device or system may include more or fewer components than shown in the specific embodiments and drawings, or combine certain components, or split certain components, or arrange components differently. Those skilled in the art will understand that various modifications and changes may be made in the arrangement, operation, and details of the methods and apparatus described in the specific embodiments without departing from the spirit and scope of the embodiments herein; several improvements and modifications may be made without departing from the principles of the embodiments of the present application, and such improvements and modifications are also considered to be within the scope of the present application.

Claims (20)

1. An image processing method, characterized in that the image processing method comprises:
inputting an original image into a feature extraction network to obtain a first feature map;
inputting the first feature map into an atrous spatial pyramid pooling (ASPP) model and performing parallel atrous convolution sampling at a plurality of sampling rates to obtain a multi-scale feature map, inputting the multi-scale feature map into a deep feature extraction module and performing deep feature extraction to obtain a deep feature map, and inputting the first feature map into a shallow feature extraction module and performing shallow feature extraction to obtain a shallow feature map;
inputting the deep layer feature map and the shallow layer feature map into a splicing module to be spliced to obtain a second feature map;
performing semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image,
wherein the ASPP model and the deep layer feature extraction module are used together as a high-level semantic feature extraction channel and used for acquiring high-level semantic features of the first feature map, the shallow layer feature extraction module is used as a low-level semantic feature extraction channel and used for acquiring low-level semantic features of the first feature map,
wherein the high-level semantic features of the first feature map are used for enhancing semantic information about an identification target in the semantic segmentation result of the original image, the low-level semantic features of the first feature map are used for enhancing boundary information about the identification target in the semantic segmentation result of the original image,
wherein the low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model for recovering boundary information located in the first feature map but not in the multi-scale feature map output by the ASPP model,
wherein the boundary information recovery model is further configured to increase a resolution of features of the second feature map with respect to a boundary of the recognition target.
2. The image processing method according to claim 1, wherein the deep feature extraction module performs deep feature extraction according to a first feature extraction requirement, and the shallow feature extraction module performs shallow feature extraction according to a second feature extraction requirement, wherein the first feature extraction requirement is for high-level semantic features, and the second feature extraction requirement is for low-level semantic features.
3. The image processing method according to claim 1, wherein at least one of the plurality of sampling rates of the ASPP model may be increased to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
4. The image processing method according to claim 3, wherein the precision of the data format used by the feature extraction network and the ASPP model to store the weights and gradients can be increased to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
5. The image processing method of claim 1, wherein the plurality of sampling rates of the ASPP model are associated with network model parameters of the feature extraction network and adjusted together to improve resolution of features of the second feature map with respect to a boundary of an identification target.
6. The image processing method of any of claims 1 to 5, wherein the original image is subjected to a data enhancement operation comprising at least one of: random flipping, rotation, combined flipping and rotation, random transformation, random scaling, random cropping, blurring, Gaussian noise addition, and padding.
7. The image processing method according to any one of claims 1 to 5, wherein the image processing method is used for material detection in a process of transporting a scrap material set, the semantic segmentation recognition result of the original image comprises a semantic segmentation recognition result of the scrap material set of the original image, the semantic segmentation recognition result of the scrap material set of the original image is used for determining at least one kind of relevant information of the scrap material set of the original image, and the at least one kind of relevant information of the scrap material set of the original image comprises at least one of the following: contour information, category information, source information, coordinate information, area information, pixel feature information.
8. The image processing method according to claim 1, wherein the image processing method is used for compartment region detection, and the semantic segmentation result of the original image comprises a semantic segmentation recognition result of a compartment region contour of the original image.
9. The image processing method according to claim 8, wherein the ASPP model and the deep-level feature extraction module are used together as a high-level semantic feature extraction channel and for obtaining high-level semantic features of the first feature map, the shallow-level feature extraction module is used as a low-level semantic feature extraction channel and for obtaining low-level semantic features of the first feature map, the high-level semantic features of the first feature map are used for enhancing semantic information about the contour of the car region in the semantic segmentation result of the original image, and the low-level semantic features of the first feature map are used for enhancing boundary information about the contour of the car region in the semantic segmentation result of the original image.
10. The image processing method according to claim 9, wherein the low-level semantic feature extraction channel is a boundary information restoration model with respect to the ASPP model, the boundary information restoration model being used to restore boundary information about the contour of the car region that is located in the first feature map but not in the multi-scale feature map output by the ASPP model.
11. The image processing method according to claim 10, wherein at least one of the plurality of sampling rates of the ASPP model may be increased to improve the resolution of the features of the second feature map with respect to the compartment region contour.
12. The image processing method according to any one of claims 8 to 11, characterized by further comprising:
determining an optimal compartment contour and coordinate information of the optimal compartment contour according to a semantic segmentation recognition result of the compartment region contour of the original image through a compartment contour searching module, wherein the semantic segmentation recognition result of the compartment region contour of the original image indicates whether each pixel point of the original image belongs to a compartment region.
13. The image processing method according to claim 12, wherein determining, by the car contour search module, the optimal car contour and coordinate information of the optimal car contour according to the semantic segmentation recognition result of the car region contour of the original image comprises:
selecting a plurality of pixel points belonging to a compartment area from all pixel points of the original image according to a semantic segmentation recognition result of the compartment area outline of the original image;
carrying out contour calculation on the plurality of pixel points through the carriage contour searching module to obtain a plurality of candidate contours;
classifying each of the plurality of candidate contours as convex hull type, concave hull type, or irregular type;
and performing an optimal search algorithm, through the carriage contour searching module, on the candidate contours classified as the convex hull type among the plurality of candidate contours, so as to obtain the optimal carriage contour and the coordinates of the optimal carriage contour.
14. A non-transitory computer readable storage medium storing computer instructions which, when executed by a processor, implement the image processing method according to any one of claims 1 to 13.
15. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the image processing method according to any one of claims 1 to 13 by executing the executable instructions.
16. An image processing apparatus characterized by comprising:
the characteristic extraction network is used for obtaining a first characteristic diagram according to the original image;
the ASPP model is used for performing parallel atrous convolution sampling on the first feature map at a plurality of sampling rates to obtain a multi-scale feature map;
the deep feature extraction module is used for carrying out deep feature extraction on the multi-scale feature map to obtain a deep feature map;
the shallow feature extraction module is used for performing shallow feature extraction on the first feature map to obtain a shallow feature map;
the splicing module is used for splicing the deep layer feature map and the shallow layer feature map to obtain a second feature map; and
a semantic segmentation model for performing semantic segmentation on the second feature map to obtain a semantic segmentation result of the original image,
wherein the ASPP model and the deep feature extraction module are used together as a high-level semantic feature extraction channel for acquiring high-level semantic features of the first feature map, the shallow feature extraction module is used as a low-level semantic feature extraction channel for acquiring low-level semantic features of the first feature map, the high-level semantic features of the first feature map are used for enhancing semantic information about a recognition target in the semantic segmentation result of the original image, the low-level semantic features of the first feature map are used for enhancing boundary information about the recognition target in the semantic segmentation result of the original image, the low-level semantic feature extraction channel is a boundary information recovery model associated with the ASPP model and used for recovering boundary information which is located in the first feature map but not in the multi-scale feature map output by the ASPP model, and the boundary information recovery model is further configured to increase the resolution of the features of the second feature map with respect to the boundary of the recognition target.
17. The apparatus according to claim 16, wherein at least one of the plurality of sampling rates of the ASPP model may be increased, or the precision of the data format used by the feature extraction network and the ASPP model to store weights and gradients may be increased, to improve the resolution of the features of the second feature map with respect to the boundary of the recognition target.
18. The image processing device according to claim 16, wherein the image processing device is configured to detect the scrap in a scrap set transportation process, the semantic segmentation result of the original image includes a semantic segmentation recognition result of the scrap set of the original image, the semantic segmentation recognition result of the scrap set of the original image is used to determine at least one piece of related information of the scrap set of the original image, and the at least one piece of related information of the scrap set of the original image includes at least one of: contour information, category information, source information, coordinate information, area information, pixel feature information.
19. The image processing apparatus according to claim 16, wherein the image processing apparatus is configured to detect a car region, and the semantic segmentation result of the original image includes a semantic segmentation recognition result of a car region contour of the original image, and wherein the image processing apparatus further includes:
and the carriage contour searching module is used for determining an optimal carriage contour and coordinate information of the optimal carriage contour according to a semantic segmentation recognition result of the carriage region contour of the original image, wherein the semantic segmentation recognition result of the carriage region contour of the original image indicates whether each pixel point of the original image belongs to a carriage region.
20. The image processing apparatus according to claim 19, wherein determining the optimal car contour and the coordinate information of the optimal car contour from the semantic segmentation recognition result of the car region contour of the original image comprises:
selecting a plurality of pixel points belonging to a compartment area from all pixel points of the original image according to a semantic segmentation recognition result of the compartment area outline of the original image;
carrying out contour calculation on the plurality of pixel points through the carriage contour searching module to obtain a plurality of candidate contours;
classifying each of the plurality of candidate contours as convex hull type, concave hull type, or irregular type;
and performing an optimal search algorithm, through the carriage contour searching module, on the candidate contours classified as the convex hull type among the plurality of candidate contours, so as to obtain the optimal carriage contour and the coordinates of the optimal carriage contour.
CN202111524090.0A 2021-12-14 2021-12-14 Image processing method, storage medium, electronic device, and image processing apparatus Active CN113936220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524090.0A CN113936220B (en) 2021-12-14 2021-12-14 Image processing method, storage medium, electronic device, and image processing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111524090.0A CN113936220B (en) 2021-12-14 2021-12-14 Image processing method, storage medium, electronic device, and image processing apparatus

Publications (2)

Publication Number Publication Date
CN113936220A CN113936220A (en) 2022-01-14
CN113936220B true CN113936220B (en) 2022-03-04

Family

ID=79288869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111524090.0A Active CN113936220B (en) 2021-12-14 2021-12-14 Image processing method, storage medium, electronic device, and image processing apparatus

Country Status (1)

Country Link
CN (1) CN113936220B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100209B (en) * 2022-08-28 2022-11-08 电子科技大学 Camera-based image quality correction method and correction system
CN117132744B (en) * 2023-10-27 2024-02-09 腾讯科技(深圳)有限公司 Virtual scene construction method, device, medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886273A (en) * 2019-02-26 2019-06-14 四川大学华西医院 A kind of CMR classification of image segmentation system
CN110232394A (en) * 2018-03-06 2019-09-13 华南理工大学 A kind of multi-scale image semantic segmentation method
CN111104962A (en) * 2019-11-05 2020-05-05 北京航空航天大学青岛研究院 Semantic segmentation method and device for image, electronic equipment and readable storage medium
CN111369563A (en) * 2020-02-21 2020-07-03 华南理工大学 Semantic segmentation method based on pyramid void convolutional network
CN113344951A (en) * 2021-05-21 2021-09-03 北京工业大学 Liver segment segmentation method based on boundary perception and dual attention guidance

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110617B (en) * 2019-04-22 2021-04-20 腾讯科技(深圳)有限公司 Medical image segmentation method and device, electronic equipment and storage medium
CN110222726A (en) * 2019-05-15 2019-09-10 北京字节跳动网络技术有限公司 Image processing method, device and electronic equipment


Also Published As

Publication number Publication date
CN113936220A (en) 2022-01-14

Similar Documents

Publication Publication Date Title
US10318848B2 (en) Methods for object localization and image classification
Mery Aluminum casting inspection using deep learning: a method based on convolutional neural networks
CN113569667B (en) Inland ship target identification method and system based on lightweight neural network model
CN113936220B (en) Image processing method, storage medium, electronic device, and image processing apparatus
CN114187442A (en) Image processing method, storage medium, electronic device, and image processing apparatus
Sun et al. Fast object detection based on binary deep convolution neural networks
CN113469088A (en) SAR image ship target detection method and system in passive interference scene
Li et al. A survey on deep learning-based panoptic segmentation
CN114092817B (en) Target detection method, storage medium, electronic device, and target detection apparatus
CN111325766B (en) Three-dimensional edge detection method, three-dimensional edge detection device, storage medium and computer equipment
CN113935997B (en) Image processing method, storage medium and image processing device for detecting material
Silva Jr et al. A method for embedding a computer vision application into a wearable device
CN114445620A (en) Target segmentation method for improving Mask R-CNN
CN115019133A (en) Method and system for detecting weak target in image based on self-training and label anti-noise
Yildiz et al. Hybrid image improving and CNN (HIICNN) stacking ensemble method for traffic sign recognition
Gnanapriya et al. A Hybrid Deep Learning Model for Real Time Hand Gestures Recognition.
Ammous et al. Improved YOLOv3-tiny for silhouette detection using regularisation techniques.
Nieto et al. Optimising computer vision based ADAS: vehicle detection case study
CN114067171A (en) Image recognition precision improving method and system for overcoming small data training set
CN114187211A (en) Image processing method and device for optimizing image semantic segmentation result
Chen et al. MSGC-YOLO: An Improved Lightweight Traffic Sign Detection Model under Snow Conditions
CN113936253B (en) Material conveying operation cycle generation method, storage medium, electronic device and device
Saputra et al. Car Classification Based on Image Using Transfer Learning Convolutional Neural Network
CN113963280B (en) Identification method and device for intelligent detection and judgment of material and part and storage medium
CN114170194A (en) Image processing method, storage medium and device for automatic detection of scrap steel parts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant