CN116452813B - Image processing method, system, equipment and medium based on space and semantic information - Google Patents

Image processing method, system, equipment and medium based on space and semantic information

Info

Publication number
CN116452813B
CN116452813B (application CN202310698749.7A)
Authority
CN
China
Prior art keywords
image
information
semantic information
semantic
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310698749.7A
Other languages
Chinese (zh)
Other versions
CN116452813A (en)
Inventor
韩军
马梦圆
黄惠玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Quanzhou Institute of Equipment Manufacturing
Original Assignee
Quanzhou Institute of Equipment Manufacturing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Quanzhou Institute of Equipment Manufacturing filed Critical Quanzhou Institute of Equipment Manufacturing
Priority to CN202310698749.7A priority Critical patent/CN116452813B/en
Publication of CN116452813A publication Critical patent/CN116452813A/en
Application granted granted Critical
Publication of CN116452813B publication Critical patent/CN116452813B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application belongs to the field of computer vision and provides an image processing method, system, device and medium based on spatial and semantic information. The image processing method based on spatial and semantic information comprises the following steps: acquiring a first image, and performing semantic extraction processing on the first image to obtain a second image; extracting semantic information from the second image, and adjusting the details of the first image with the semantic information to obtain a third image; and extracting spatial information from the third image, and performing semantic-information guiding processing on the second image with the spatial information to obtain a fourth image. Through mutual learning and optimization between shallow spatial information and deep semantic information, the application quickly and effectively reduces the noise of shallow features and then guides the deep features to reconstruct spatial information, so that segmentation accuracy is effectively improved and a balance between image processing speed and accuracy is achieved, without an additional auxiliary branch or a complex decoder.

Description

Image processing method, system, equipment and medium based on space and semantic information
Technical Field
The application belongs to the field of computer vision, and particularly relates to an image processing method, system, equipment and medium based on space and semantic information.
Background
Semantic segmentation is an important and widely used task in computer vision whose aim is to accurately predict the label of every pixel in an image. It is a key step toward visual scene understanding and is widely applied in autonomous driving, medical imaging, image generation and other fields.
Deep learning methods dominate the semantic segmentation field, and many representative network models have been proposed. However, these models either achieve high accuracy at a high computational cost or run fast but with low accuracy; likewise, shallow features carry rich detail but also considerable noise, while deep features carry strong semantic information but lose some spatial information.
Existing techniques therefore struggle to satisfy the accuracy and speed requirements of image processing at the same time.
Disclosure of Invention
Embodiments of the application aim to provide an image processing method based on spatial and semantic information, so as to solve the problem that the prior art cannot simultaneously meet the accuracy and speed requirements of image processing.
The embodiment of the application is realized in such a way that an image processing method based on space and semantic information comprises the following steps:
acquiring a first image, and performing semantic extraction processing on the first image to obtain a second image;
extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image;
and extracting space information from the third image, and carrying out semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image.
Another object of an embodiment of the present application is to provide an image processing system based on spatial and semantic information, the image processing system comprising:
the main network is used for acquiring a first image, and carrying out semantic extraction processing on the first image to obtain a second image;
the semantic adjustment detail module is used for extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image;
the detail guiding semantic module is used for extracting space information from the third image, and conducting semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image.
Another object of an embodiment of the application is to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the image processing method based on spatial and semantic information.
Another object of an embodiment of the present application is to provide a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, causing the processor to perform the steps of the image processing method based on spatial and semantic information.
According to the image processing method based on spatial and semantic information provided by the embodiment of the application, mutual learning and optimization between shallow spatial information and deep semantic information quickly and effectively reduces the noise of shallow features and then guides the deep features to reconstruct spatial information, so that segmentation accuracy is effectively improved and a balance between image processing speed and accuracy is achieved, without an additional auxiliary branch or a complex decoder.
Drawings
FIG. 1 is a flow diagram of a method of image processing based on spatial and semantic information provided in one embodiment;
FIG. 2 is a block diagram of the spatial detail and semantic information mutual optimization network (DSMONet) provided in one embodiment;
FIG. 3 is a block diagram of a Mutual Optimization Module (MOM) provided in one embodiment;
FIG. 4 is a graph comparing segmentation accuracy (mIoU) and inference speed (FPS) on a Cityscapes test set, under one embodiment;
FIG. 5 is a block diagram of an image processing system based on spatial and semantic information provided in one embodiment;
FIG. 6 is a block diagram of the internal architecture of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It will be understood that the terms "first," "second," and the like, as used herein, may describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first element may be referred to as a second element, and similarly a second element may be referred to as a first element, without departing from the scope of this disclosure.
As shown in fig. 1, in one embodiment, an image processing method based on spatial and semantic information is provided, where the image processing method includes steps S102 to S106:
step S102, a first image is obtained, and semantic extraction processing is carried out on the first image to obtain a second image.
In this embodiment, the first image is a shallow feature map: its resolution is high and it contains rich spatial detail, but also relatively more noise. The second image is a deep feature map obtained by processing the shallow one; its resolution is relatively low and its semantic information is strong, but spatial information is missing.
Specifically, step S102 is developed in detail as steps S202 to S204:
step S202, obtaining an original image to be processed, and reducing the resolution of the original image by increasing the number of channels of the feature map to obtain the first image.
Step S204, performing feature extraction on the first image in a backbone network, and performing context aggregation to obtain the second image; the resolution of the second image is lower than the resolution of the first image.
As shown in fig. 2, the original image is the image to be processed; over four stages, its resolution is progressively reduced while the number of feature-map channels is increased, producing images 1-4 in turn. The original image and images 1-4 form the backbone network. The backbone selects the lightweight STDCNet, which has 5 stages, each with stride 2; the number of feature-map channels increases while the resolution is reduced to 1/32 of the input image. To obtain features containing global context information, a DAPPM module is appended after the backbone to further extract context information from the low-resolution feature map, yielding the second image. In this embodiment, image 1 is taken as the first image to be optimized and image 4 as the second image; in practice, however, the scheme holds as long as the resolution of the first image is larger than that of the second image, so the first and second images are not limited to these particular backbone feature maps and only the resolution requirement must be satisfied.
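To make the backbone behaviour concrete, the following minimal PyTorch sketch shows how five stride-2 stages reduce the input resolution to 1/32 while increasing the channel count. It is an illustrative toy model only, not the STDCNet implementation; the channel widths (32 to 512) and the layer structure are assumptions.

import torch
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    # Basic Conv-BN-ReLU block used throughout the sketch.
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__(
            nn.Conv2d(c_in, c_out, k, stride=s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.ReLU(inplace=True),
        )

class ToyBackbone(nn.Module):
    # Five stages, each with stride 2: output resolution is 1/32 of the input.
    # Channel widths (32, 64, 128, 256, 512) are illustrative assumptions.
    def __init__(self):
        super().__init__()
        widths = [32, 64, 128, 256, 512]
        stages, c_prev = [], 3
        for c in widths:
            stages.append(ConvBNReLU(c_prev, c, k=3, s=2))
            c_prev = c
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # feats[0] is a high-resolution shallow map (the "first image");
        # feats[-1] is the low-resolution deep map fed to context aggregation.
        return feats

feats = ToyBackbone()(torch.randn(1, 3, 1024, 1024))
print([f.shape[-1] for f in feats])   # 512, 256, 128, 64, 32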
Semantic segmentation extracts the semantic information of deep features, optimizes the extracted features, and then upsamples and outputs the result. Specifically, the semantic information of deep features is extracted through a backbone such as ResNet or STDC. In the segmentation head, a Conv-BN-ReLU operation reduces the number of feature channels to the number of categories, an upsampling operation expands the feature map to the input image size, and the label of each pixel is then predicted with an argmax operation. A cross-entropy loss with online hard example mining, denoted L_ohem, is used to optimize the model. Placing a semantic head at the output of the UAFM produces an additional semantic loss L_sem that better optimizes the whole network, and a BCE boundary loss L_bd is used to highlight boundary regions and enhance the features of small objects. The final loss is a weighted sum of these terms:
L_total = L_ohem + λ1·L_sem + λ2·L_bd
The weighting parameters λ1 and λ2 of the training loss of the spatial detail and semantic information mutual optimization network (DSMONet, Details and Semantic Mutual Optimization NET) are set empirically.
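The following sketch illustrates how the three loss terms described above can be combined during training. The OHEM keep ratio, the construction of the boundary ground truth, and the weighting coefficients lam_sem and lam_bd are placeholders, since the exact values are not reproduced in the text.

import torch
import torch.nn.functional as F

def ohem_cross_entropy(logits, labels, keep_ratio=0.25, ignore_index=255):
    # Cross-entropy with online hard example mining: keep only the hardest pixels.
    ce = F.cross_entropy(logits, labels, ignore_index=ignore_index, reduction="none")
    ce = ce.flatten()
    n_keep = max(1, int(keep_ratio * ce.numel()))
    hard, _ = torch.topk(ce, n_keep)
    return hard.mean()

def total_loss(main_logits, aux_logits, boundary_logits, labels, boundary_gt,
               lam_sem=1.0, lam_bd=1.0):
    # Main OHEM loss + auxiliary semantic loss + boundary BCE loss.
    l_main = ohem_cross_entropy(main_logits, labels)
    l_sem = F.cross_entropy(aux_logits, labels, ignore_index=255)
    l_bd = F.binary_cross_entropy_with_logits(boundary_logits, boundary_gt)
    return l_main + lam_sem * l_sem + lam_bd * l_bd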
Step S104, extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image.
In this embodiment, as shown in fig. 2 and 3, the mutual optimization module (MOM, Mutual Optimization Module) takes two inputs: the feature map S obtained after DAPPM context aggregation (the second image) and the backbone output feature map D (the first image). S carries strong semantic information, while D carries spatial detail information. The core of the MOM is therefore the mutual optimization of S and D and, as shown in fig. 3, it is divided into two parts: one part optimizes the high-resolution feature map by filtering its noise with the edge information of the low-resolution feature map and an edge operator, and is implemented by the semantic adjustment details module (SADM, Semantic Adjustment Details Module); the other part uses the optimized spatial information to guide the deep features to reconstruct the lost spatial information, and is implemented by the details guide semantics module (DGSM, Details Guide Semantics Module).
Specifically, step S104 further includes steps S302 to S304:
step S302, decoupling the second image, and extracting a first edge feature from the second image.
Further, step S302 further includes steps S402-404:
step S402, decoupling the second image, and acquiring a subject feature of the second image through a stream-based body feature representation method.
Step S404, subtracting the body feature from the second image to obtain the first edge feature:
F_edge = S - F_body
where S is the feature map of the second image and F_body is its body feature.
In this embodiment, the SADM first decouples the feature map S, which carries strong semantic information. Following DecoupleSegNet, the second image is decoupled into a body feature F_body and a first edge feature F_edge that satisfy the formula above. The body feature F_body of the feature map S is obtained with a flow-based body feature representation method, and the first edge feature F_edge is then obtained by explicitly subtracting the body feature from S.
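A minimal sketch of this decoupling step is given below. The flow-based body representation of DecoupleSegNet is replaced here by a simple downsample-then-upsample low-pass operation, which is an illustrative stand-in rather than the actual flow module; the subtraction that yields the edge feature follows the formula above.

import torch
import torch.nn.functional as F

def decouple_body_edge(S, down_factor=4):
    # Approximate the flow-based body representation with a simple low-pass
    # operation: downsample then upsample the feature map.
    # This is an illustrative stand-in, not the DecoupleSegNet flow module.
    h, w = S.shape[-2:]
    body = F.interpolate(S, scale_factor=1.0 / down_factor, mode="bilinear",
                         align_corners=False)
    body = F.interpolate(body, size=(h, w), mode="bilinear", align_corners=False)
    # First edge feature: explicitly subtract the body from the original map.
    edge = S - body
    return body, edge

S = torch.randn(1, 128, 64, 64)
F_body, F_edge = decouple_body_edge(S)
print(F_body.shape, F_edge.shape)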
Step S304, extracting a second edge feature from the first image, and fusing the first edge feature and the second edge feature to obtain the third image.
Further, step S304 further includes steps S502-504:
step S502, optimizing the first image by using a laplace operator, and sampling the first image by using transpose convolution to obtain the second edge feature.
And step S504, carrying out feature fusion on the first edge feature and the second edge feature to obtain the third image.
In this embodiment, the high-resolution first image contains more detail information but also considerable noise; the edge information of the feature map is extracted with the Laplacian operator to obtain the second edge feature, enhancing the model's ability to capture detail. A 3×3 Laplacian kernel is selected for this purpose.
The kernel is incorporated into the network through a residual structure with a Laplacian convolution. The first edge feature F_edge of the second image is upsampled with transposed convolution so that the upsampled map has the same size as the optimized first image. The Laplacian-optimized second edge feature and the upsampled first edge feature are then concatenated, and feature fusion through Conv-BN-ReLU yields the optimized high-resolution third image. This process can be expressed as:
D' = γ( concat( Lap(D), Up(F_edge) ) )
where D' is the high-resolution third image, γ is the convolution (Conv-BN-ReLU) layer, concat(·) denotes the cascading (concatenation) operation, Lap(·) denotes edge information extraction with the Laplacian operator, and Up(·) denotes transposed-convolution upsampling.
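The following PyTorch sketch puts these SADM steps together: a residual Laplacian branch on the high-resolution detail map, transposed-convolution upsampling of the low-resolution edge feature, concatenation, and Conv-BN-ReLU fusion. The specific Laplacian kernel, channel widths and feature-map sizes are assumptions for illustration, not the patent's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SADM(nn.Module):
    # Semantic Adjustment of Details Module (sketch).
    def __init__(self, c_detail, c_edge):
        super().__init__()
        # Fixed 3x3 Laplacian kernel applied depthwise; the kernel choice is an
        # assumption, the patent's exact kernel is not reproduced here.
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer("lap_kernel",
                             lap.view(1, 1, 3, 3).repeat(c_detail, 1, 1, 1))
        self.c_detail = c_detail
        # Three stride-2 transposed convolutions instead of a single stride-8 one.
        ups, c = [], c_edge
        for _ in range(3):
            ups += [nn.ConvTranspose2d(c, c_detail, kernel_size=2, stride=2),
                    nn.BatchNorm2d(c_detail), nn.ReLU(inplace=True)]
            c = c_detail
        self.up = nn.Sequential(*ups)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c_detail, c_detail, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_detail), nn.ReLU(inplace=True))

    def forward(self, D, F_edge):
        # Residual Laplacian branch: D + Lap(D) keeps details while sharpening edges.
        lap_D = D + F.conv2d(D, self.lap_kernel, padding=1, groups=self.c_detail)
        # Upsample the low-resolution edge feature to the detail resolution.
        edge_up = self.up(F_edge)
        # Concatenate and fuse with Conv-BN-ReLU to obtain the third image D'.
        return self.fuse(torch.cat([lap_D, edge_up], dim=1))

D = torch.randn(1, 64, 128, 128)      # high-resolution detail feature (assumed size)
F_edge = torch.randn(1, 128, 16, 16)  # low-resolution edge feature from the decoupling
sadm = SADM(c_detail=64, c_edge=128)
print(sadm(D, F_edge).shape)          # torch.Size([1, 64, 128, 128]) -> the "third image"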
In this embodiment, the core of DSMONet is the mutual optimization of the high-resolution and low-resolution feature maps, which involves many upsampling operations. Bilinear interpolation computes a new pixel value by distance-weighted averaging of the four neighboring pixels. It upsamples the feature map quickly, but its smoothing effect may cause loss of detail and blurred edges. Transposed convolution better preserves the details and edge information of the feature map, so this embodiment selects transposed convolution for the upsampling operation.
Further, the computation of each transposed convolution layer can be reduced by stacking several smaller transposed convolution layers. To reduce the amount of computation, this embodiment uses 3 transposed convolution layers, where the number of output channels and the convolution kernel size differ between layers. If 8-fold upsampling is achieved directly with a single transposed convolution, the kernel size is 8, the stride is 8, and the computation of the layer is FLOPs = 67108864·HW. If 8-fold upsampling is achieved with 3 transposed convolution layers, the kernel size of each layer is 2, the stride is 2, and the computation of each transposed convolution layer is 6815744·HW. The computational cost per layer is thus reduced by roughly a factor of 10 compared with directly using a single transposed convolution.
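To make the comparison concrete, the sketch below estimates the multiply-accumulate count of a transposed convolution as k²·C_in·C_out·H_out·W_out. With an assumed 128 input and 128 output channels, a single stride-8 layer reproduces the 67108864·HW figure quoted above; the per-layer figures printed for the three stride-2 layers are illustrative only, since the per-layer channel counts actually used differ and are not given here.

def transposed_conv_flops(k, c_in, c_out, h_out, w_out):
    # Rough multiply-accumulate count of one transposed convolution layer.
    return k * k * c_in * c_out * h_out * w_out

H = W = 1  # express results as multiples of H*W of the low-resolution input

# One-shot 8x upsampling: kernel 8, stride 8, output is 8H x 8W.
single = transposed_conv_flops(8, 128, 128, 8 * H, 8 * W)
print(single)  # 67108864, matching the 67108864*HW figure with 128-channel maps

# Three 2x layers: kernel 2, stride 2 each; per-layer cost assuming 128 channels
# throughout (the patent uses different channel widths per layer, so these
# illustrative values do not equal its quoted 6815744*HW figure).
per_layer = [transposed_conv_flops(2, 128, 128, s * H, s * W) for s in (2, 4, 8)]
print(per_layer)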
And S106, extracting space information from the third image, and carrying out semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image.
In this embodiment, after the shallow detail features have been optimized by the deep semantic information, the optimized detail features are further used to guide the deep features to reconstruct the missing spatial information, i.e. the high-resolution third image is used to guide the construction of a fourth image carrying spatial detail information. Specifically, step S106 includes steps S602 to S604:
step S602, attention operation is adopted on the second image, decoupled main body characteristics are obtained from the second image, and the main body characteristics and the second image after channel attention and space attention processing are added to obtain a fifth image;
step S604, extracting the spatial information from the third image, and multi-scale fusing the spatial information and the fifth image to obtain the fourth image.
In this embodiment, to avoid losing semantic information during this process, additional attention is applied to the second image S before the spatial information is reconstructed, so as to strengthen the correlation between feature channels. As shown in fig. 3, the attention-processed second image is added to the body feature F_body decoupled in the SADM to obtain a fifth image S', which has stronger semantic information than S. The high-resolution third image D' and the fifth image S' then undergo multi-scale feature fusion and, following prior work, spatial attention is used to enhance the feature representation. The overall process can be expressed as:
S' = Att(S) + F_body
F_out = Fuse(D', S')
where S' denotes the fifth image, F_body the body feature decoupled in the SADM, Att(S) the second image after channel attention and spatial attention processing, D' the high-resolution third image, Fuse(·) the multi-scale fusion with spatial attention, and F_out the output fourth image.
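A sketch of the DGSM along these lines is shown below. The channel and spatial attention designs are simple stand-ins (an SE-style channel attention followed by a mean/max spatial attention), and bilinear upsampling is used only to keep the sketch short, whereas the embodiment above prefers transposed-convolution upsampling; channel widths and sizes are assumed.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSpatialAttention(nn.Module):
    # Simple channel (SE-style) then spatial (mean/max) attention; an
    # illustrative stand-in for the attention used in the embodiment.
    def __init__(self, c):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // 4), nn.ReLU(inplace=True),
                                nn.Linear(c // 4, c), nn.Sigmoid())
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3))).unsqueeze(-1).unsqueeze(-1)
        x = x * w
        s = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1)))
        return x * s

class DGSM(nn.Module):
    # Details Guide Semantics Module (sketch): the optimized detail feature D'
    # guides the semantic feature to reconstruct missing spatial information.
    def __init__(self, c_sem, c_detail):
        super().__init__()
        self.att = ChannelSpatialAttention(c_sem)
        self.proj = nn.Conv2d(c_sem, c_detail, 1, bias=False)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * c_detail, c_detail, 3, padding=1, bias=False),
            nn.BatchNorm2d(c_detail), nn.ReLU(inplace=True))

    def forward(self, S, F_body, D_prime):
        # Fifth image: attention-processed S plus the body feature from the SADM.
        S5 = self.att(S) + F_body
        # Upsample to the detail resolution and fuse with the third image D'.
        S5 = F.interpolate(self.proj(S5), size=D_prime.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([D_prime, S5], dim=1))

S = torch.randn(1, 128, 16, 16)         # second image (deep semantic feature)
F_body = torch.randn(1, 128, 16, 16)    # body feature decoupled in the SADM
D_prime = torch.randn(1, 64, 128, 128)  # third image from the SADM
print(DGSM(c_sem=128, c_detail=64)(S, F_body, D_prime).shape)  # the "fourth image"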
The embodiment of the application provides an image processing method based on spatial and semantic information that is applied within the mutual optimization module MOM. The MOM consists of two parts: one optimizes the high-resolution feature map by filtering its noise with the edge information of the low-resolution feature map and an edge operator (SADM); the other guides the deep features to reconstruct the missing spatial information using the optimized spatial information (DGSM). Through mutual learning and optimization between shallow spatial information and deep semantic information, the MOM quickly and effectively reduces the noise of shallow features and then guides the deep features to reconstruct spatial information, effectively improving segmentation accuracy and achieving a balance between image processing speed and accuracy.
In one embodiment, specific experimental details of the application are given, consisting of four parts:
1. data set of experiments
Cityscapes is a large urban street scene dataset. It contains 5000 finely annotated images and 20000 coarsely annotated images with an image resolution of 2048×1024. The finely annotated images are further divided into 2975, 500 and 1525 images for training, validation and testing, respectively. The annotations contain 30 classes, of which only 19 are used for semantic segmentation.
CamVid provides 701 driving scene images, divided into 367, 101 and 233 for training, validation and testing, respectively. The image resolution is 960×720. The annotations provide 32 categories; following the common setup, a subset of 11 categories is used in the experiments of the present application.
2. Implementation details
1) Training settings
The experiments use a warm-up strategy and a learning-rate scheduler to update the learning rate at each iteration, together with data augmentation techniques such as random scaling, random cropping, random horizontal flipping, random color jittering and normalization. Because the same backbone network is used, the pre-training weights provided by PP-LiteSeg [20] are adopted. For the Cityscapes dataset, a batch size of 16, a maximum of 160000 iterations, an initial learning rate of 0.005 and a weight decay of 5e-4 in the optimizer are used. For the CamVid dataset, a batch size of 24, a maximum of 1000 iterations, an initial learning rate of 0.01 and a weight decay of 1e-4 are used. The random scaling ranges for Cityscapes and CamVid are [0.125, 1.5] and [0.5, 2.5], respectively. The crop resolution is 1024×1024 for Cityscapes and 960×720 for CamVid. The network of the present application is implemented with PaddleSeg [30], and all training experiments are performed on an A100 GPU.
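As an illustrative sketch of a per-iteration learning-rate schedule with warm-up (the exact scheduler is not detailed in the text), the function below combines a linear warm-up phase with a polynomial decay; the warm-up length and decay power are assumed values.

def learning_rate(it, base_lr, max_iters, warmup_iters=1000, power=0.9):
    # Linear warm-up followed by polynomial decay, computed per iteration.
    if it < warmup_iters:
        return base_lr * (it + 1) / warmup_iters
    progress = (it - warmup_iters) / max(1, max_iters - warmup_iters)
    return base_lr * (1.0 - progress) ** power

# Example with the Cityscapes settings quoted above (base_lr=0.005, 160000 iterations).
print(learning_rate(0, 0.005, 160000))       # warm-up start
print(learning_rate(80000, 0.005, 160000))   # roughly half-way through training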
2) Inference settings
For a fair comparison, the model is exported to ONNX and executed with TensorRT. During inference, for both Cityscapes and CamVid, the inference model takes the original image as input at a resolution of 960×720. All inference experiments are performed in an environment consisting of an RTX 3090, CUDA 11.2, cuDNN 8.2 and TensorRT 8.1.3. For quantitative evaluation, segmentation accuracy is compared using the mean class-wise intersection-over-union (mIoU), and speed is compared using floating-point operations (FLOPs) and frames per second (FPS).
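For reference, a minimal sketch of the class-wise mIoU metric used in the accuracy comparison is given below; the confusion-matrix formulation is the standard one, not code from the patent.

import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    # Per-class IoU from a confusion matrix, averaged over classes (mIoU).
    mask = gt != ignore_index
    pred, gt = pred[mask], gt[mask]
    cm = np.bincount(num_classes * gt.astype(int) + pred.astype(int),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - inter
    iou = inter / np.maximum(union, 1)
    return iou.mean()

pred = np.random.randint(0, 19, size=(1024, 2048))
gt = np.random.randint(0, 19, size=(1024, 2048))
print(mean_iou(pred, gt, num_classes=19))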
3. Comparison of experimental results with other most advanced methods
In this section, the network of the present application was tested on Cityscapes and CamVid and compared to the most advanced model, further demonstrating the semantic segmentation capabilities of DSMONet.
Table 1 comparison with the most advanced real-time method on the CamVid test set
Table 1 shows the comparison with other methods; the training and inference input resolution is 960×720, similar to other works. DSMONet reaches 76.1% mIoU at 94.3 FPS, a state-of-the-art tradeoff between performance and speed, which further demonstrates the superiority of the method of the application.
With the training and inference setup described above, DSMONet is compared with state-of-the-art models on the Cityscapes dataset. The present application proposes DSMONet-T and DSMONet-B, based on the two backbone versions STDC1 and STDC2. As shown in Table 2, the model information, resolution, segmentation accuracy and inference speed of the various methods are given.
Table 2 comparison with the most advanced real-time method on Cityscapes
Fig. 4 provides a visual comparison of segmentation accuracy and inference speed. The training and validation sets are used to train the model of the present application before the results are uploaded to the official benchmark server. The experimental evaluation shows that, compared with other methods, the proposed DSMONet achieves a state-of-the-art balance between accuracy and speed. DSMONet-T reaches 78.2% mIoU at 78.1 FPS, achieving higher accuracy at a similar inference speed. In addition, DSMONet-B achieves 80.5% mIoU, the best test-set accuracy in Table 2. Compared with DDRNet-23, DSMONet-B is 8.3 FPS faster and 1% higher in mIoU. In the visual segmentation results of DSMONet-B on the Cityscapes validation set, DSMONet captures details better than PP-LiteSeg and STDCNet.
4. Ablation experiments
This section introduces ablation experiments to verify the effectiveness of each component of the method of the present application. They cover the mutual optimization module, the additional losses and the additional training strategies. All experiments in this section evaluate DSMONet-B on the Cityscapes validation set. The baseline model is DSMONet-B without the proposed modules.
1) Efficient mutual optimization module
The mutual optimization module is mainly divided into the SADM and the DGSM. The deep semantic features are decoupled into a body feature F_body and an edge feature F_edge, and D_lap denotes the feature map after optimization with the Laplacian operator. The core component of the SADM is the fusion of F_edge and D_lap to form the optimized shallow feature. An attention mechanism Att is also added before the optimized detail features guide the deep features, to enhance the feature representation. To verify the effectiveness of the module, the application performs a split verification. With the proposed mutual optimization module, DSMONet-B achieves 80.5% mIoU and 44.4 FPS, an increase of 4.4% mIoU over the baseline model. The qualitative comparisons in Table 3 show that as F_edge, D_lap, F_body and Att are added in turn, the results become more consistent with reality, especially for small objects. After the proposed modules are added step by step, the model shows a significant improvement in its ability to capture details, and the vehicle information in the frame is also more complete. In summary, the proposed module is effective for semantic segmentation.
Table 3 ablation experiments of mutual optimization modules
Here, F_edge is the edge feature, D_lap is the edge information extracted with the Laplacian operator, F_body is the body feature, and Att is the attention mechanism.
2) Effective additional loss
According to the structure of DSMONet, additional losses are introduced to facilitate the optimization of the whole network. As can be seen from Table 4, the additional losses are necessary for DSMONet to achieve better performance; in particular, after the additional loss is added, the mIoU increases by 0.7%, which fully justifies the need for the additional loss, while online hard example mining (OHEM) further improves the accuracy.
Table 4 ablation experiments with additional losses and OHEM in DSMONet
3) Effective additional strategy
From the above analysis, to further balance speed and accuracy, the present application uses additional strategies: different upsampling modes and an additional attention mechanism. Ablation experiments are performed on whether bilinear interpolation or transposed convolution is selected for upsampling and on whether the additional attention mechanism is used. The results are shown in Table 5: bilinear interpolation upsampling achieves 79.3% mIoU and 50.7 FPS, while transposed-convolution upsampling achieves 80.5% mIoU and 44.4 FPS. For the lighter model DSMONet-T, the application selects transposed convolution, which improves accuracy effectively, raising the mIoU by 1.2%. To make full use of the feature fusion module, the application combines channel attention and spatial attention, reaching 80.5% mIoU and 44.4 FPS and further balancing speed and accuracy.
Table 5 ablation experiments with additional attention and upsampling methods in DSMONet
As shown in fig. 5, in one embodiment, there is provided an image processing system based on spatial and semantic information, the image processing system comprising:
the backbone network 100 is configured to obtain a first image, and perform semantic extraction processing on the first image to obtain a second image;
the semantic adjustment detail module 200 is configured to extract semantic information from the second image, and perform semantic information detail adjustment on the first image by using the semantic information to obtain a third image;
the detail guiding semantic module 300 is configured to extract spatial information from the third image, and perform semantic information guiding processing on the second image by using the spatial information to obtain a fourth image.
FIG. 6 illustrates an internal block diagram of a computer device in one embodiment. As shown in fig. 6, the computer device includes a processor, a memory, a network interface, an input device and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the image processing method based on spatial and semantic information. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the image processing method based on spatial and semantic information. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device may be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is presented, the computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step S102, a first image is obtained, semantic extraction processing is carried out on the first image, and a second image is obtained;
step S104, extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image;
and S106, extracting space information from the third image, and carrying out semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image.
In one embodiment, a computer readable storage medium is provided, having a computer program stored thereon, which when executed by a processor causes the processor to perform the steps of:
step S102, a first image is obtained, semantic extraction processing is carried out on the first image, and a second image is obtained;
step S104, extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image;
and S106, extracting space information from the third image, and carrying out semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image.
It should be understood that, although the steps in the flowcharts of the embodiments of the present application are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in various embodiments may include multiple sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, nor do the order in which the sub-steps or stages are performed necessarily performed in sequence, but may be performed alternately or alternately with at least a portion of the sub-steps or stages of other steps or other steps.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.
The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the application.

Claims (5)

1. An image processing method based on spatial and semantic information, the image processing method comprising:
acquiring a first image, and performing semantic extraction processing on the first image to obtain a second image;
extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image;
extracting space information from the third image, and carrying out semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image;
the method for extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image comprises the following steps:
decoupling the second image and extracting a first edge feature from the second image;
extracting a second edge feature from the first image, and fusing the first edge feature and the second edge feature to obtain the third image;
the decoupling the second image and extracting a first edge feature from the second image comprises the following steps:
decoupling the second image according to the decoupleSegNet, and acquiring main body characteristics of the second image;
subtracting the main body characteristic from the second image to obtain the first edge characteristic;
the step of extracting a second edge feature from the first image, merging the first edge feature and the second edge feature to obtain the third image comprises the following steps:
optimizing the first image by using a Laplace operator, and sampling the first image by using transpose convolution to obtain the second edge feature;
performing feature fusion on the first edge feature and the second edge feature to obtain the third image;
the step of extracting spatial information from the third image, and performing semantic guidance processing on the second image by using the spatial information comprises the following steps:
performing attention operation on the second image, acquiring decoupled main body characteristics from the second image, and performing addition processing on the main body characteristics and the second image subjected to channel attention and spatial attention processing to obtain a fifth image;
and extracting the spatial information from the third image, and carrying out multi-scale fusion on the spatial information and the fifth image to obtain the fourth image.
2. The method according to claim 1, wherein the semantic extraction process is performed on the first image to obtain a second image, and the method comprises the following steps:
acquiring an original image to be processed, and reducing the resolution of the original image by increasing the number of channels of a feature map to obtain the first image;
performing feature extraction on the first image in a backbone network, and performing context aggregation to obtain the second image; the resolution of the second image is lower than the resolution of the first image.
3. An image processing system based on spatial and semantic information, the image processing system comprising:
the main network is used for acquiring a first image, and carrying out semantic extraction processing on the first image to obtain a second image;
the semantic adjustment detail module is used for extracting semantic information from the second image, and carrying out semantic information detail adjustment on the first image by utilizing the semantic information to obtain a third image;
the detail guiding semantic module is used for extracting space information from the third image, and conducting semantic information guiding processing on the second image by utilizing the space information to obtain a fourth image.
4. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the spatial and semantic information based image processing method according to any of claims 1 or 2.
5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the spatial and semantic information based image processing method according to any one of claims 1 or 2.
CN202310698749.7A 2023-06-14 2023-06-14 Image processing method, system, equipment and medium based on space and semantic information Active CN116452813B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310698749.7A CN116452813B (en) 2023-06-14 2023-06-14 Image processing method, system, equipment and medium based on space and semantic information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310698749.7A CN116452813B (en) 2023-06-14 2023-06-14 Image processing method, system, equipment and medium based on space and semantic information

Publications (2)

Publication Number Publication Date
CN116452813A CN116452813A (en) 2023-07-18
CN116452813B (en) 2023-08-22

Family

ID=87122244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310698749.7A Active CN116452813B (en) 2023-06-14 2023-06-14 Image processing method, system, equipment and medium based on space and semantic information

Country Status (1)

Country Link
CN (1) CN116452813B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549555A (en) * 2022-02-25 2022-05-27 北京科技大学 Human ear image planning and division method based on semantic division network
CN115359372A (en) * 2022-07-25 2022-11-18 成都信息工程大学 Unmanned aerial vehicle video moving object detection method based on optical flow network
CN115546485A (en) * 2022-10-17 2022-12-30 华中科技大学 Construction method of layered self-attention field Jing Yuyi segmentation model
CN116229461A (en) * 2023-01-31 2023-06-06 西南大学 Indoor scene image real-time semantic segmentation method based on multi-scale refinement

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188799B2 (en) * 2018-11-12 2021-11-30 Sony Corporation Semantic segmentation with soft cross-entropy loss

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114549555A (en) * 2022-02-25 2022-05-27 北京科技大学 Human ear image planning and division method based on semantic division network
CN115359372A (en) * 2022-07-25 2022-11-18 成都信息工程大学 Unmanned aerial vehicle video moving object detection method based on optical flow network
CN115546485A (en) * 2022-10-17 2022-12-30 华中科技大学 Construction method of layered self-attention field Jing Yuyi segmentation model
CN116229461A (en) * 2023-01-31 2023-06-06 西南大学 Indoor scene image real-time semantic segmentation method based on multi-scale refinement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Improved DeepLabV3+ algorithm for scene segmentation; Sang Yonglong, Han Jun; Electronics Optics & Control (电光与控制); Vol. 29, No. 3; 47-52 *

Also Published As

Publication number Publication date
CN116452813A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN111047516B (en) Image processing method, image processing device, computer equipment and storage medium
Zhang et al. Swinfir: Revisiting the swinir with fast fourier convolution and improved training for image super-resolution
US20200234447A1 (en) Computer vision system and method
CN101477684B (en) Process for reconstructing human face image super-resolution by position image block
CN110490082B (en) Road scene semantic segmentation method capable of effectively fusing neural network features
US9865036B1 (en) Image super resolution via spare representation of multi-class sequential and joint dictionaries
CN110555433B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN109523470B (en) Depth image super-resolution reconstruction method and system
CN103279933B (en) A kind of single image super resolution ratio reconstruction method based on bilayer model
CN113159143B (en) Infrared and visible light image fusion method and device based on jump connection convolution layer
CN113111835B (en) Semantic segmentation method and device for satellite remote sensing image, electronic equipment and storage medium
CN112651979A (en) Lung X-ray image segmentation method, system, computer equipment and storage medium
CN110826609B (en) Double-current feature fusion image identification method based on reinforcement learning
CN111275034A (en) Method, device, equipment and storage medium for extracting text region from image
CN113674191B (en) Weak light image enhancement method and device based on conditional countermeasure network
CN115731505B (en) Video salient region detection method and device, electronic equipment and storage medium
CN109543685A (en) Image, semantic dividing method, device and computer equipment
CN110689509A (en) Video super-resolution reconstruction method based on cyclic multi-column 3D convolutional network
CN115526777A (en) Blind over-separation network establishing method, blind over-separation method and storage medium
Wu et al. Cross-view panorama image synthesis with progressive attention GANs
Xu et al. Image enhancement algorithm based on generative adversarial network in combination of improved game adversarial loss mechanism
Li et al. NDNet: Spacewise multiscale representation learning via neighbor decoupling for real-time driving scene parsing
CN116452813B (en) Image processing method, system, equipment and medium based on space and semantic information
CN112614108B (en) Method and device for detecting nodules in thyroid ultrasound image based on deep learning
CN113743346A (en) Image recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant