CN115100417A - Image processing method, storage medium, and electronic device


Info

Publication number
CN115100417A
CN115100417A (application CN202210662381.4A)
Authority
CN
China
Prior art keywords
target
feature
image
sample
detection
Prior art date
Legal status (assumed; not a legal conclusion): Pending
Application number
CN202210662381.4A
Other languages
Chinese (zh)
Inventor
孙修宇
姜奕祺
许贤哲
Current Assignee (the listed assignee may be inaccurate)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202210662381.4A
Publication of CN115100417A
Legal status: Pending


Classifications

    • G — PHYSICS › G06 — COMPUTING; CALCULATING OR COUNTING › G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/40 — Extraction of image or video features
    • G06V 10/16 — Image acquisition using multiple overlapping images; image stitching
    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/806 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82 — Recognition using pattern recognition or machine learning with neural networks
    • G06V 20/176 — Terrestrial scenes: urban or other man-made structures
    • G06V 20/188 — Terrestrial scenes: vegetation
    • G06V 2201/07 — Indexing scheme: target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image processing method, a storage medium, and an electronic device. The method comprises the following steps: acquiring a target image, wherein the target image contains a target object; performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object. The invention solves the technical problem of low image-detection accuracy in the related art.

Description

Image processing method, storage medium, and electronic device
Technical Field
The present invention relates to the field of image processing, and in particular, to an image processing method, a storage medium, and an electronic device.
Background
At present, computer vision tasks generally adopt deep learning to detect objects in images. The high complexity of deep learning models leads to low detection efficiency, while low-complexity detection approaches suffer from poor detection accuracy.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the application provides an image processing method, a storage medium, and an electronic device, which are used for at least solving the technical problem of low accuracy of image detection in the related art.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: acquiring a target image, wherein the target image contains a target object; performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: acquiring a target remote sensing image, wherein the target remote sensing image contains a target object; performing multi-scale feature extraction on the target remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: acquiring an agricultural remote sensing image, wherein the agricultural remote sensing image contains crops; performing multi-scale feature extraction on the agricultural remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the crops.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: acquiring a building remote sensing image, wherein the building remote sensing image contains a target building; performing multi-scale feature extraction on the building remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target building.
According to an aspect of an embodiment of the present application, there is provided an image processing method including: a cloud server acquires a target image, wherein the target image contains a target object; the cloud server performs multi-scale feature extraction on the target image to obtain a plurality of first feature maps; the cloud server performs feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and the cloud server detects the at least one second feature map to obtain a detection result of the target object.
In the embodiment of the application, a target image containing a target object is first acquired; multi-scale feature extraction is performed on the target image to obtain a plurality of first feature maps; feature fusion is performed on the plurality of first feature maps using a multi-branch network structure to obtain at least one second feature map, where the multi-branch network structure fuses the first feature maps through a plurality of branches; and the at least one second feature map is detected to obtain a detection result of the target object, thereby improving the accuracy of the detection result. It is easy to note that performing feature fusion on the first feature maps through a plurality of branches improves the accuracy of the resulting second feature maps, and the plurality of branches also reduces the number of parameters in the fusion process, improving fusion efficiency and thereby solving the technical problem of low image-detection accuracy in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention to a proper form. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing an image processing method according to an embodiment of the present application;
fig. 2 is a flowchart of an image processing method according to embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a Giraffe target detector according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a detector according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a multi-branch network structure according to an embodiment of the present application;
FIG. 6 is a schematic diagram of detection by a target detection model according to an embodiment of the present application;
FIG. 7 is a schematic diagram of object detection model training according to an embodiment of the present application;
fig. 8 is a flowchart of an image processing method according to embodiment 2 of the present application;
fig. 9 is a flowchart of an image processing method according to embodiment 3 of the present application;
fig. 10 is a flowchart of an image processing method according to embodiment 4 of the present application;
FIG. 11 is a flowchart of an image processing method according to embodiment 5 of the present application;
fig. 12 is a schematic diagram of an image processing apparatus according to embodiment 6 of the present application;
fig. 13 is a schematic diagram of an image processing apparatus according to embodiment 7 of the present application;
fig. 14 is a schematic diagram of an image processing apparatus according to embodiment 8 of the present application;
fig. 15 is a schematic diagram of an image processing apparatus according to embodiment 9 of the present application;
fig. 16 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
GFPN: Generalized Feature Pyramid Network, a network structure used in the feature-fusion part of a detector to perform feature fusion across different scales.
GFocalV2: Generalized Focal Loss V2, a network structure used in target detection for matching targets with predicted values and for feature detection.
YOLOv1/YOLOv2/YOLOv3/YOLOv4/YOLOv5/YOLOX: a series of image-based object detection methods.
At present, image-based object detection is a basic technology in machine vision tasks and is widely applied in industries such as remote sensing, security, land resources, water conservancy, and retail. Deep-learning-based object detection is currently the mainstream approach, but its computational complexity is often very high, making it difficult to meet practical requirements.
The application provides an image processing method which can improve the accuracy of a detection result while improving the image detection efficiency.
Example 1
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing an image processing method according to an embodiment of the present application. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors (shown as 102a, 102b, …, 102n in the figure), which may include but are not limited to a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the bus), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors and/or other data processing circuitry described above may be generally referred to herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the image processing method in the embodiment of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the image processing method described above. The memory 104 may include high speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one example of a specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
Under the above operating environment, the present application provides an image processing method as shown in fig. 2. Fig. 2 is a flowchart of an image processing method according to embodiment 1 of the present application. As shown in fig. 2, the method includes:
Step S202, a target image is acquired.
Wherein the target image contains the target object.
The target image may be an image including a target object to be detected, wherein the target image may be a remote sensing image acquired by an unmanned aerial vehicle and/or a satellite, and the target image may also be an image obtained by shooting by a shooting device.
The target object may be a specific object to be detected in the target image.
In an agricultural scene, the target image can be an agricultural remote sensing image, and the target object can be a crop to be detected.
In a building scene, the target image can be a building remote sensing image, and the target object can be a building to be detected.
In an alternative embodiment, in order to better process the target image, the acquired target image may be transmitted to a corresponding processing device for processing, for example, directly transmitted to a computer terminal (e.g., a laptop, a personal computer, etc.) of the user for processing, or transmitted to a cloud server through the computer terminal of the user for processing. It should be noted that, since a large amount of computing resources are required for processing the target image, in the embodiment of the present application, the processing device is taken as a cloud server as an example for description.
For example, in order to facilitate the user to upload the target image, an interactive interface may be provided for the user, where the interactive interface includes controls such as "select image", "upload", "image display", and the like, and the user may click the "select image" button to determine the target image that needs to be uploaded, and upload the target image to the cloud server for processing by clicking the "upload" button. In addition, in order to facilitate the user to confirm whether the selected target image is a target image to be processed, the selected target image may be displayed in the "image display" area, and after the user confirms that there is no error, data may be uploaded by clicking the "upload" button.
It should be noted that data interaction can be performed between the client and the cloud server through a specific interface, and the client can transmit the description page of the target object selected by the user into the interface function and use the description page as a parameter of the interface, so as to achieve the purpose of uploading the description page of the target object to the cloud server.
Step S204, performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps.
In an optional embodiment, multi-scale feature extraction may be performed on the target image so as to obtain a plurality of first feature maps of different scales, and the plurality of first feature maps may be fused through a preset feature fusion policy so as to improve detection accuracy of the feature maps.
In another alternative embodiment, a feature extraction layer in a GiraffeDet target detector can be used for performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps, where the scales of the plurality of first feature maps are different. Optionally, the feature extraction layer includes a plurality of scaling layers (scales) of different sizes, and multi-scale feature extraction may be performed on the target image according to the scaling layers to obtain the plurality of first feature maps.
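As a rough illustration of this step (a minimal PyTorch-style sketch under assumed layer sizes, not GiraffeDet's actual feature extraction layer), multi-scale extraction can be realized as a stack of strided stages that each emit one feature map:

```python
import torch
import torch.nn as nn

class MultiScaleExtractor(nn.Module):
    """Stack of strided stages; each stage halves the spatial size and
    emits one feature map, giving first feature maps at several scales."""
    def __init__(self, channels=(3, 32, 64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
            )
            for c_in, c_out in zip(channels[:-1], channels[1:])
        )

    def forward(self, x):
        maps = []
        for stage in self.stages:
            x = stage(x)
            maps.append(x)  # one first feature map per scale
        return maps

extractor = MultiScaleExtractor()
first_feature_maps = extractor(torch.randn(1, 3, 640, 640))
print([f.shape[-1] for f in first_feature_maps])  # [320, 160, 80, 40]
```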
Step S206, the multi-branch network structure is used for carrying out feature fusion on the plurality of first feature maps to obtain at least one second feature map.
The multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches.
The multi-branch network structure may include two or more branches, and the number of the branches may be set by a user, or may be flexibly adjusted according to a requirement.
In an optional embodiment, the multiple first feature maps may be repeatedly feature-fused by using a multi-branch network according to a preset feature-fusion policy to obtain multiple second feature maps, where the feature-fusion policy may be a feature-fusion policy used by GiraffeDet.
FIG. 3 is a schematic diagram of a Giraffe target detector according to an embodiment of the application. As shown in FIG. 3, S1-S5 are a plurality of first feature maps of different scales extracted by the feature extraction layer. The plurality of first feature maps can be fused repeatedly, and the feature maps obtained by fusion are connected in a log2(N) pattern to obtain the feature maps that finally need to be detected. In the fusion part, feature maps of different sizes may be up-sampled or down-sampled in the manner shown in FIG. 3, where a downward arrow indicates down-sampling and an upward arrow indicates up-sampling. The first feature maps of different sizes may then be spliced, and the spliced feature maps input into the corresponding multi-branch network structures; the multi-branch network structures fuse the first feature maps of different sizes to obtain fused features, such as S5_0. The first feature maps of different sizes and the fused features are fused repeatedly, and the fused features are stacked to obtain the final features, such as S5_N, where N represents the number of stacking steps. In FIG. 3, S5_0 is obtained by fusing S4 and S5, S4_0 is obtained by fusing S3, S4 and S5, and S5_1 is obtained by fusing S5, S5_0 and S4_0, where a skip connection exists between S5 and S5_1; skip connections are made according to the log2(N) pattern.
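For illustration, the log2(N) connection pattern described above can be sketched as follows (a hypothetical helper, assuming each layer links back to layers at distances of 1, 2, 4, …; the exact rule used by the patent may differ):

```python
def log2n_link_sources(k):
    """Indices of earlier layers feeding layer k under a log2(N)-style
    link pattern: k-1, k-2, k-4, ... while the index stays non-negative."""
    sources, step = [], 1
    while k - step >= 0:
        sources.append(k - step)
        step *= 2
    return sources

# Example: layer 7 receives skip connections from layers 6, 5, and 3.
print(log2n_link_sources(7))  # [6, 5, 3]
```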
In an alternative embodiment, the framework for detecting the target object in the target image may be backbone (feature extraction network) - neck (fusion of shallow high-resolution detail feature maps with low-resolution semantic feature maps) - head (detector). At present, the computation ratio of backbone : neck : head is generally 2-4 : 1 : 2, in which the share of the fusion part is small, so subsequent detection precision is low. To solve this problem, the ratio is adjusted to 1 : 5 : 1, increasing the computation share of the fusion part, which can further improve detection precision.
The above ratio is a distribution of computation. Assuming a total budget of 100 GFLOPs under the conventional 2 : 1 : 2 split, the backbone part accounts for about 2/5 × 100 = 40 GFLOPs; the specific computation is the standard convolution operation in the neural network.
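As a worked example of this computation split (the helper below is illustrative only, not part of the patent):

```python
def split_budget(total_gflops, ratio):
    """Split a total compute budget across backbone, neck and head
    according to the given ratio."""
    s = float(sum(ratio))
    return tuple(total_gflops * r / s for r in ratio)

# Conventional split 2:1:2 -> backbone gets 2/5 * 100 = 40 GFLOPs.
print(split_budget(100, (2, 1, 2)))  # (40.0, 20.0, 40.0)
# Split used here, 1:5:1 -> the fusion (neck) share rises to ~71 GFLOPs.
print(split_budget(100, (1, 5, 1)))  # (~14.3, ~71.4, ~14.3)
```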
Step S208, detecting the at least one second feature map to obtain a detection result of the target object.
The detection result may be a target detection frame, where the target detection frame is used to label a target object in a target image; the detection result may further include a category of the target object, wherein the category of the target object may be marked beside the target detection box or elsewhere.
In an optional embodiment, at least one second feature map may be obtained by fusing the plurality of first feature maps. Second feature maps of different scales may be detected with different detectors that share the same structure — that is, the length, width, and height corresponding to the different detectors are the same — but whose parameters differ. Because second feature maps of different scales have different precision, detecting each scale with a detector of matching precision further improves the detection effect at that scale, thereby improving its detection accuracy: the parameters of a detector can be adjusted to give it higher precision for detecting a higher-precision second feature map, or lower precision for detecting a lower-precision second feature map. With this arrangement, higher detection precision can be obtained under the same amount of computation.
In an alternative embodiment, GFocalV2 Head (detection algorithm) is used as the base detector structure.
FIG. 4 is a schematic diagram of a detector according to an embodiment of the present application. The left side shows the conventional way of using detectors, in which the same detector is used for feature maps of different scales; the right side shows the way detectors are used in the present application, in which detectors with the same structure but different parameters are used for feature maps of different scales, improving detection accuracy.
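A minimal PyTorch-style sketch of this per-scale design (an illustration only; the head architecture, channel counts, and class count are assumptions, not the patent's GFocalV2 head): one head instance is created per scale, so the structure is shared while the parameters are independent.

```python
import torch
import torch.nn as nn

class ScaleHead(nn.Module):
    """One detection head; the same architecture is reused at every scale."""
    def __init__(self, channels, num_classes):
        super().__init__()
        self.cls = nn.Conv2d(channels, num_classes, 1)  # class logits per location
        self.reg = nn.Conv2d(channels, 4, 1)            # box offsets per location

    def forward(self, x):
        return self.cls(x), self.reg(x)

# Same structure, independent parameters: one head instance per pyramid level.
heads = nn.ModuleList(ScaleHead(256, 80) for _ in range(3))
features = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
outputs = [head(f) for head, f in zip(heads, features)]
```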
In another alternative embodiment, a target image may be acquired, wherein the target image contains a target object; multi-scale feature extraction can be carried out on a target image by utilizing a feature extraction layer in a target detection model to obtain a plurality of first feature maps; the method comprises the steps that a multi-branch network structure in a target detection model is used for carrying out feature fusion on a plurality of first feature graphs to obtain at least one second feature graph, wherein the multi-branch network structure is used for carrying out feature fusion on the plurality of first feature graphs through a plurality of branches; the at least one second feature map may be detected by using a detector in the target detection model, so as to obtain a detection result of the target object.
Through the above steps, a target image containing a target object is first acquired; multi-scale feature extraction is performed on the target image to obtain a plurality of first feature maps; feature fusion is performed on the plurality of first feature maps using a multi-branch network structure to obtain at least one second feature map, where the multi-branch network structure fuses the first feature maps through a plurality of branches; and the at least one second feature map is detected to obtain a detection result of the target object, thereby improving the accuracy of the detection result. It is easy to note that performing feature fusion on the first feature maps through multiple branches improves the accuracy of the resulting second feature maps, and the multiple branches also reduce the number of parameters in the fusion process, improving fusion efficiency and thereby solving the technical problem of low image-detection accuracy in the related art.
In the above embodiments of the present application, the multi-branch network structure includes: the device comprises a first branch and a second branch, wherein the output of the first branch is connected with the output of the second branch.
The first branch may include N convolution blocks composed of 1 × 1 and 3 × 3 convolutional layers, where the N convolution blocks may be connected in sequence; the second branch may include a 1 × 1 convolutional layer. It should be noted that the 3 × 3 convolution in the first branch may be replaced with another convolutional layer, for example 5 × 5, but the present invention is not limited thereto.
In an alternative embodiment, the multi-branch network structure may include: a first convolutional layer, the first branch, and the second branch, where the output of the first convolutional layer is connected with the input of the first branch and the input of the second branch, and the output of the first branch is connected with the output of the second branch.
The first convolutional layer may be 1 × 1.
Fig. 5 is a schematic diagram of a multi-branch network structure according to an embodiment of the present application. As shown in fig. 5, the plurality of first feature maps to be fused may be processed and then merged to obtain a merged feature map; the merged feature map may be input into the first branch and the second branch respectively for processing, obtaining two output feature maps; and the two output feature maps may be spliced to obtain a second feature map.
In another optional embodiment, the multi-branch network structure may further include a plurality of branches, which are not limited herein, wherein the first branch may include a plurality of sub-branches, and the second branch may also include a plurality of sub-branches.
In the above embodiment of the present application, performing feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain at least one second feature map includes: performing channel merging on the plurality of first feature maps to obtain a merged feature map; performing convolution processing on the merged feature map by using the first branch to obtain a first output feature; performing convolution processing on the merged feature map by using the second branch to obtain a second output feature; and performing channel merging on the first output feature and the second output feature to obtain the at least one second feature map.
The first branch may include convolution layers with different convolution kernels; processing the merged feature map with convolution layers of different kernel sizes can reduce the computation of the convolution operation and improve fusion efficiency.
In an alternative embodiment, the first convolution layer may be used to directly perform channel merging on the plurality of first feature maps to obtain the merged feature map. Specifically, a convolution operation may be performed on the plurality of first feature maps by using the first convolution layer to obtain a plurality of third feature maps, so that the sizes of the plurality of first feature maps are unified, facilitating the subsequent merging process. The channels of the plurality of third feature maps may then be merged, achieving the effect of splicing the third feature maps and obtaining the merged feature map. Convolution processing may be performed on the merged feature map by using the first branch to obtain the first output feature, and by using the second branch to obtain the second output feature; the channels of the first output feature and the second output feature may then be merged to obtain the second feature map.
Furthermore, the second feature map may be used as a first feature map, and the multi-branch network may continue to process the plurality of first feature maps to obtain a new second feature map; this step may be repeated multiple times to obtain a plurality of second feature maps.
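Putting the pieces above together, a minimal PyTorch sketch of such a multi-branch fusion block might look as follows (channel counts, block count, and the absence of normalization/activation layers are assumptions; the patent does not specify them):

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """1x1 convolution followed by a 3x3 convolution (the sub-convolution
    layers with different kernels described above)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.conv3(self.conv1(x))

class MultiBranchFusion(nn.Module):
    """Channel-merge the inputs, process them through two branches, and
    concatenate the branch outputs to form a second feature map."""
    def __init__(self, in_channels, mid_channels, num_blocks):
        super().__init__()
        # First convolutional layer (1x1): reduces the merged channels.
        self.reduce = nn.Conv2d(in_channels, mid_channels, 1)
        # First branch: N sequential convolution blocks.
        self.branch1 = nn.Sequential(*[ConvBlock(mid_channels) for _ in range(num_blocks)])
        # Second branch: a single 1x1 convolution (the "third convolution layer").
        self.branch2 = nn.Conv2d(mid_channels, mid_channels, 1)

    def forward(self, feature_maps):
        # Inputs are assumed already resampled to a common spatial size.
        merged = self.reduce(torch.cat(feature_maps, dim=1))  # channel merging
        out1 = self.branch1(merged)   # first output feature
        out2 = self.branch2(merged)   # second output feature
        return torch.cat([out1, out2], dim=1)  # second feature map

fusion = MultiBranchFusion(in_channels=512, mid_channels=128, num_blocks=2)
x = [torch.randn(1, 256, 40, 40), torch.randn(1, 256, 40, 40)]
second_feature_map = fusion(x)  # shape: (1, 256, 40, 40)
```

Feeding the resulting second feature map back in as a first feature map and rerunning the block corresponds to the repeated fusion described above.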
In the above embodiments of the present application, the first branch includes at least one convolution block, wherein each convolution block includes a plurality of sub-convolution layers, and the convolution kernels of the plurality of sub-convolution layers are different.
In an alternative embodiment, the number of convolution blocks may be determined according to the number of input first feature maps: if the number of input first feature maps is N, the number of convolution blocks is N.
The convolution kernels of the sub-convolution layers have different sizes; within a convolution block, the kernel of the earlier sub-convolution layer may be smaller than the kernel of the later sub-convolution layer.
The plurality of sub-convolution layers may be 1 × 1 and 3 × 3, respectively.
The sub-convolution layers included in the different convolution blocks may be the same, or may be different.
In the above embodiment of the present application, performing convolution processing on the merged feature map by using the first branch to obtain the first output feature includes: performing convolution processing on the merged feature map by using at least one convolution block to obtain the first output feature.
The convolution block includes a first sub-convolution layer and a second sub-convolution layer, wherein the first sub-convolution layer may be a convolution layer with a smaller convolution kernel, and the second sub-convolution layer may be a convolution layer with a larger convolution kernel. The first sub-convolution layer may be 1 × 1 and the second sub-convolution layer may be 3 × 3.
In an alternative embodiment, in the case of including one convolution block, the first sub-convolution layer in the convolution block is used to perform a convolution operation on the merged feature map to obtain a processing result, and the processing result may be input into the second sub-convolution layer and subjected to a convolution operation by the second sub-convolution layer to obtain the first output feature.
In the above embodiment of the present application, performing convolution processing on the merged feature map by using the second branch to obtain the second output feature includes: performing convolution processing on the merged feature map by using the third convolution layer to obtain the second output feature.
The third convolution layer may be 1 × 1.
In the above embodiment of the present application, the method further includes: and detecting the target image by using a target detection model to obtain a detection result of the target object, wherein the target detection model is obtained by training based on the target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through a sample detection frame.
In an alternative embodiment, the target image may be input into the target detection model, and the target detection model is used to detect the target image to obtain the detection result of the target object. The main framework of the target detection model can be backbone-neck-head: the backbone performs multi-scale feature extraction on the target image to obtain a plurality of first feature maps; the neck performs feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain at least one second feature map, where the multi-branch network structure fuses the first feature maps through a plurality of branches; and finally the head detects the at least one second feature map to obtain the detection result of the target object.
In an alternative embodiment, the target detection model may be a common detection model, but the sample images used for its training differ from conventionally used sample images: data enhancement is mainly performed on a plurality of sample images through sample detection frames to obtain the target sample images.
In another optional embodiment, the target detection model may include a feature extraction layer, a multi-branch network structure, and a detection layer, where the feature extraction layer is configured to perform multi-scale feature extraction on a target image to obtain a plurality of first feature maps, the multi-branch network structure is configured to fuse the plurality of first feature maps to obtain at least one second feature map, and the detection layer is configured to detect the at least one second feature map to obtain a detection result of the target object. The multi-branch network structure may comprise a first branch and a second branch, wherein an output of the first branch is connected to an output of the second branch.
Fig. 6 is a schematic diagram of detection performed by a target detection model according to an embodiment of the present application, where a target image to be detected may be input into the target detection model, and the target detection model may output a detection result of a target object in the target image, where the detection result is obtained by labeling the target object with a target detection frame.
In the above embodiment of the present application, the method further includes: obtaining a plurality of sample images and sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking target objects in the sample images; determining a preset number of sample detection frames in a plurality of sample images as target detection frames; performing data enhancement on a target object corresponding to the target detection frame to obtain a target sample image; and training the initial detection model by using the target sample image to obtain a target detection model.
In an alternative embodiment, the plurality of sample images may be mixed to obtain a mixed image, and then a preset number of sample detection frames in the plurality of sample detection frames included in the mixed image may be determined as the target detection frame.
In another optional embodiment, a plurality of sample images and the sample detection frames corresponding to them may be randomly selected from a training data set, and data enhancement may be performed on the sample images at the box level. Global data enhancement may first be performed once on the plurality of sample images to obtain a plurality of enhanced sample images; a preset number of sample detection frames are then randomly selected from the enhanced sample images as target detection frames for local data enhancement, and data enhancement is performed on the target objects corresponding to the target detection frames to obtain target sample images. Finally, the initial detection model is trained with the target sample images to obtain the target detection model.
In another optional embodiment, a preset number of sample detection frames may first be randomly selected from the plurality of sample images as target detection frames for local data enhancement, and data enhancement may be performed on the target objects corresponding to the target detection frames to obtain a plurality of enhanced sample images; global data enhancement is then performed once on the plurality of enhanced sample images to obtain the target sample images; finally, the initial detection model is trained with the target sample images to obtain the target detection model.
The global data enhancement may be color changing, rotating, contrast enhancing, random erasing, scaling, cropping, etc. for a plurality of sample images. The local data enhancement may be color changing, rotating, contrast enhancing, random erasing, scaling, etc. of the target object in the target detection frame.
In the above embodiments of the present application, determining a preset number of sample detection frames in a plurality of sample images as target detection frames includes: splicing the plurality of sample images to obtain an initial sample image; mixing the initial sample image and a preset sample image to obtain a mixed image; and determining a preset number of sample detection frames in the mixed image as target detection frames.
In an optional embodiment, a plurality of sample images may be sequentially stitched to obtain an initial sample image; optionally, a plurality of sample images may be spliced to a white background image, the white background image is filled to obtain an initial sample image, and if the white background image is not filled by the plurality of sample images, a part of the sample images may be continuously extracted from the training data to fill the white background image until the white background image is filled. Optionally, a plurality of sample images may be randomly stitched. The plurality of sample images may be stitched in other manners, which is not limited herein.
The preset number may be a fixed value set in advance, or may be determined according to the number of sample detection frames; for example, the preset number may be 30% of the sample detection frames.
In another alternative embodiment, a mosaic module (mosaic) may be used to sequentially stitch a plurality of sample images to obtain an initial sample image, where mosaic is used to enrich the detection background and the detection object in the target image so as to enrich the data set. After the initial sample image is obtained, the initial sample image may be scaled and cropped to obtain a processed initial sample image, and the processed initial sample image and the preset sample image may be mixed by using a mixing module (mixup) to obtain a mixed image, so as to further enhance data, thereby enriching features in the sample image. A preset number of sample detection frames among a plurality of sample detection frames included in the mixed image may be determined as the target detection frame.
Fig. 7 is a schematic diagram of target detection model training according to an embodiment of the present application, where first, a plurality of sample images including sample detection frames are input into a mosaic module, the mosaic module is used to splice the plurality of sample images to obtain an initial sample image, then a mixing module is used to mix the initial sample image with a preset sample image to obtain a mixed image, and finally, a regional level module is used to determine a preset number of sample detection frames in the mixed image as target detection frames, and data enhancement is performed on target objects corresponding to the target detection frames to obtain target sample images; and training the initial detection model by using the target sample image to obtain a target detection model.
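A compact sketch of this training-data pipeline (illustrative only: the 2 × 2 mosaic layout, the mixup blend, the 30% box fraction, and the flip used for local enhancement are assumptions, not the patent's exact implementation):

```python
import random
import numpy as np

def mosaic_2x2(images):
    """Stitch four equally sized sample images into one initial sample image."""
    (a, b), (c, d) = images[:2], images[2:4]
    return np.vstack([np.hstack([a, b]), np.hstack([c, d])])

def mixup(img_a, img_b, alpha=0.5):
    """Blend the initial sample image with a preset sample image."""
    return (alpha * img_a + (1 - alpha) * img_b).astype(img_a.dtype)

def enhance_boxes(image, boxes, fraction=0.3):
    """Box-level enhancement: pick a preset fraction of detection frames
    and locally augment the object inside each (here: horizontal flip)."""
    chosen = random.sample(boxes, max(1, int(len(boxes) * fraction)))
    for (x1, y1, x2, y2) in chosen:
        image[y1:y2, x1:x2] = image[y1:y2, x1:x2][:, ::-1].copy()
    return image

# Build one target sample image from four sample images plus a preset image.
samples = [np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8) for _ in range(4)]
preset = np.random.randint(0, 255, (128, 128, 3), dtype=np.uint8)
mixed = mixup(mosaic_2x2(samples).astype(np.float32), preset.astype(np.float32))
target_sample = enhance_boxes(mixed.astype(np.uint8), [(10, 10, 40, 40), (60, 60, 100, 100)])
```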
In the above embodiment of the present application, the method further includes: outputting the detection result; receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and updating the merged feature map based on the first feedback result.
In an optional embodiment, the detection result may be output and displayed on the user's client. The user may modify the channels in the merged feature map according to the detection result to obtain a first feedback result, so that the merged feature map is updated according to the first feedback result; detection may then be performed on the updated merged feature map to obtain a detection result with higher accuracy.
The neural network structure proposed in the present application can use a lightweight network structure (CSP-Darknet) as the backbone, a CSP-GFPN with a high computation share as the neck, and a scale-wise GFocalV2 as the head. Combined with a learning-based data-enhancement training method, the target object in the target image is detected quickly and accurately, greatly reducing resource usage when the neural network structure is deployed.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that acts and modules are not required to practice the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the image processing method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 8 is a flowchart of an image processing method according to embodiment 2 of the present application, and as shown in fig. 8, the method may include the following steps:
Step S802, acquiring a target remote sensing image.
The target remote sensing image contains a target object.
Step S804, performing multi-scale feature extraction on the target remote sensing image to obtain a plurality of first feature maps.
Step S806, performing feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain at least one second feature map.
The multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches.
Step S808, detecting the at least one second feature map to obtain a detection result of the target object.
In the above embodiments of the present application, the multi-branch network structure includes: the device comprises a first branch and a second branch, wherein the output of the first branch is connected with the output of the second branch.
In the above embodiment of the present application, performing feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain at least one second feature map includes: performing channel merging on the plurality of first feature maps to obtain a merged feature map; performing convolution processing on the merged feature map by using the first branch to obtain a first output feature; performing convolution processing on the merged feature map by using the second branch to obtain a second output feature; and performing channel merging on the first output feature and the second output feature to obtain the at least one second feature map.
In the above embodiments of the present application, the first branch includes at least one convolution block, wherein each convolution block includes a plurality of sub-convolution layers, and the convolution kernels of the plurality of sub-convolution layers are different.
In the above embodiment of the present application, the method further includes: and detecting the target image by using a target detection model to obtain a detection result of the target object, wherein the target detection model is obtained by training based on the target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through a sample detection frame.
In the above embodiment of the present application, the method further includes: obtaining a plurality of sample images and sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking target objects in the sample images; determining a preset number of sample detection frames in a plurality of sample images as target detection frames; performing data enhancement on a target object corresponding to the target detection frame to obtain a target sample image; and training the initial detection model by using the target sample image to obtain a target detection model.
In the above embodiment of the present application, determining a preset number of sample detection frames in a plurality of sample images as target detection frames includes: splicing the plurality of sample images to obtain an initial sample image; mixing the initial sample image and a preset sample image to obtain a mixed image; and determining a preset number of sample detection frames in the mixed image as target detection frames.
In the above embodiment of the present application, the method further includes: outputting the detection result; receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and updating the merged feature map based on the first feedback result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 3
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 9 is a flowchart of an image processing method according to embodiment 3 of the present application, and as shown in fig. 9, the method may include the steps of:
and step S902, acquiring an agricultural remote sensing image.
Wherein the agricultural remote sensing image comprises crops.
And step S904, carrying out multi-scale feature extraction on the agricultural remote sensing image to obtain a plurality of first feature maps.
Step S906, performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map.
The multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches.
Step S908, detecting the at least one second feature map to obtain a detection result of the crops.
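Taken together, steps S902 to S908 compose a simple pipeline. The sketch below strings them together, where backbone, fusion, and head are placeholders for the multi-scale extractor, the multi-branch structure, and the detection head; none of these components are concretely specified by this flow, so the wiring is an assumption.

```python
import torch

def detect_crops(remote_sensing_image, backbone, fusion, head):
    # remote_sensing_image: a CHW tensor of one agricultural remote sensing image.
    x = remote_sensing_image.unsqueeze(0)     # add a batch dimension
    with torch.no_grad():
        first_maps = backbone(x)              # S904: multi-scale feature extraction
        second_map = fusion(first_maps)       # S906: multi-branch feature fusion
        return head(second_map)               # S908: detection result of the crops
```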
In the above embodiments of the present application, the multi-branch network structure includes a first branch and a second branch, wherein the output of the first branch is connected with the output of the second branch.
In the above embodiments of the present application, performing feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain the at least one second feature map includes: performing channel merging on the plurality of first feature maps to obtain a merged feature map; performing convolution processing on the merged feature map by using the first branch to obtain a first output feature; performing convolution processing on the merged feature map by using the second branch to obtain a second output feature; and performing channel merging on the first output feature and the second output feature to obtain the at least one second feature map.
In the above embodiments of the present application, the first branch includes at least one convolution block, wherein each convolution block includes a plurality of sub-convolution layers, and the sub-convolution layers have different convolution kernels.
In the above embodiments of the present application, the method further includes: detecting the agricultural remote sensing image by using a target detection model to obtain the detection result of the crops, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
In the above embodiment of the present application, the method further includes: obtaining a plurality of sample images and sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking crops in the sample images; determining a preset number of sample detection frames in the plurality of sample images as target detection frames; performing data enhancement on crops corresponding to the target detection frame to obtain a target sample image; and training the initial detection model by using the target sample image to obtain a target detection model.
In the above embodiments of the present application, determining a preset number of sample detection frames in a plurality of sample images as target detection frames includes: splicing the plurality of sample images to obtain an initial sample image; mixing the initial sample image and a preset sample image to obtain a mixed image; and determining a preset number of sample detection frames in the mixed image as target detection frames.
In the above embodiments of the present application, the method further includes: outputting the detection result; receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and updating the merged feature map based on the first feedback result.
It should be noted that the preferred embodiments described in the foregoing examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 4
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be executed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the order here.
Fig. 10 is a flowchart of an image processing method according to embodiment 4 of the present application, and as shown in fig. 10, the method may include the following steps:
Step S1002, acquiring a building remote sensing image.
Wherein the building remote sensing image comprises a target building.
Step S1004, performing multi-scale feature extraction on the building remote sensing image to obtain a plurality of first feature maps.
Step S1006, performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map.
The multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches.
Step S1008, detecting the at least one second feature map to obtain a detection result of the target building.
In the above embodiments of the present application, the multi-branch network structure includes a first branch and a second branch, wherein the output of the first branch is connected with the output of the second branch.
In the above embodiments of the present application, performing feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain the at least one second feature map includes: performing channel merging on the plurality of first feature maps to obtain a merged feature map; performing convolution processing on the merged feature map by using the first branch to obtain a first output feature; performing convolution processing on the merged feature map by using the second branch to obtain a second output feature; and performing channel merging on the first output feature and the second output feature to obtain the at least one second feature map.
In the above embodiments of the present application, the first branch includes at least one convolution block, wherein each convolution block includes a plurality of sub-convolution layers, and the sub-convolution layers have different convolution kernels.
In the above embodiments of the present application, the method further includes: detecting the building remote sensing image by using a target detection model to obtain the detection result of the target building, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
In the above embodiment of the present application, the method further includes: obtaining a plurality of sample images and sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking a target building in the sample images; determining a preset number of sample detection frames in the plurality of sample images as target detection frames; performing data enhancement on a target building corresponding to the target detection frame to obtain a target sample image; and training the initial detection model by using the target sample image to obtain a target detection model.
In the above embodiments of the present application, determining a preset number of sample detection frames in a plurality of sample images as target detection frames includes: splicing the plurality of sample images to obtain an initial sample image; mixing the initial sample image and a preset sample image to obtain a mixed image; and determining a preset number of sample detection frames in the mixed image as target detection frames.
In the above embodiments of the present application, the method further includes: outputting the detection result; receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and updating the merged feature map based on the first feedback result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 5
There is also provided, in accordance with an embodiment of the present application, an image processing method embodiment. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be executed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the order here.
Fig. 11 is a flowchart of an image processing method according to embodiment 5 of the present application, and as shown in fig. 11, the method may include the following steps:
Step S1102, the cloud server acquires a target image.
Wherein the target image contains the target object.
Step S1104, the cloud server performs multi-scale feature extraction on the target image to obtain a plurality of first feature maps.
Step S1106, the cloud server performs feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map.
The multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches.
Step S1108, the cloud server detects the at least one second feature map to obtain a detection result of the target object.
In the above embodiments of the present application, the multi-branch network structure includes: a first branch, a second branch, wherein the output of the first branch is connected with the output of the second branch.
In the above embodiments of the present application, the performing, by the cloud server, feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain the at least one second feature map includes: the cloud server performs channel merging on the plurality of first feature maps to obtain a merged feature map; the cloud server performs convolution processing on the merged feature map by using the first branch to obtain a first output feature; the cloud server performs convolution processing on the merged feature map by using the second branch to obtain a second output feature; and the cloud server performs channel merging on the first output feature and the second output feature to obtain the at least one second feature map.
In the above embodiments of the present application, the first branch includes at least one convolution block, wherein each convolution block includes a plurality of sub-convolution layers, and the sub-convolution layers have different convolution kernels.
In the above embodiments of the present application, the method further includes: the cloud server detects the target image by using the target detection model to obtain the detection result of the target object, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
In the above embodiment of the present application, the method further includes: the cloud server acquires a plurality of sample images and sample detection frames corresponding to the sample images, wherein the sample detection frames are used for marking target objects in the sample images; the cloud server determines a preset number of sample detection frames in the plurality of sample images as target detection frames; the cloud server performs data enhancement on a target object corresponding to the target detection frame to obtain a target sample image; and the cloud server trains the initial detection model by using the target sample image to obtain a target detection model.
In the above embodiment of the present application, the determining, by the cloud server, that a preset number of sample detection frames in the plurality of sample images are target detection frames includes: the cloud server splices the plurality of sample images to obtain an initial sample image; the cloud server mixes the initial sample image and a preset sample image to obtain a mixed image; and the cloud server determines a preset number of sample detection frames in the mixed image as target detection frames.
In the above embodiments of the present application, the method further includes: the cloud server outputs the detection result; the cloud server receives a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and the cloud server updates the merged feature map based on the first feedback result.
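Purely as an illustration of the cloud-side deployment, the sketch below exposes the pipeline behind an HTTP endpoint using Flask. The /detect route, the pipeline placeholder, and the JSON shape of the detection result are hypothetical, not part of the present application.

```python
import io
import torch
from flask import Flask, request, jsonify
from PIL import Image
from torchvision.transforms.functional import to_tensor

app = Flask(__name__)

def pipeline(image_tensor):
    # Placeholder for the trained model: multi-scale feature extraction,
    # multi-branch feature fusion, and detection (see the sketches above).
    raise NotImplementedError

@app.post("/detect")
def detect():
    # The cloud server acquires the target image from the request body.
    image = Image.open(io.BytesIO(request.get_data())).convert("RGB")
    with torch.no_grad():
        result = pipeline(to_tensor(image))   # steps S1104 to S1108
    # The detection result is assumed to be JSON-serializable in this sketch.
    return jsonify(result)
```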
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 6
According to an embodiment of the present application, there is also provided an image processing apparatus for implementing the image processing method, and fig. 12 is a schematic diagram of an image processing apparatus according to embodiment 6 of the present application, as shown in fig. 12, the apparatus 1200 includes: an acquisition module 1202, an extraction module 1204, a fusion module 1206, and a detection module 1208.
The acquisition module is used for acquiring a target remote sensing image, wherein the target remote sensing image comprises a target object; the extraction module is used for performing multi-scale feature extraction on the target remote sensing image to obtain a plurality of first feature maps; the fusion module is used for performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; the detection module is used for detecting the at least one second feature map to obtain a detection result of the target object.
It should be noted here that the acquisition module 1202, the extraction module 1204, the fusion module 1206, and the detection module 1208 correspond to steps S802 to S808 in embodiment 2, and the four modules are the same as the corresponding steps in the implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the multi-branch network structure includes a first branch and a second branch, wherein the output of the first branch is connected with the output of the second branch.
In the above embodiments of the present application, the fusion module includes: a merging unit and a first processing unit.
The merging unit is used for merging the channels of the first feature maps to obtain a merged feature map; the first processing unit is used for carrying out convolution processing on the merged feature map by using the first branch to obtain a first output feature; the first processing unit is also used for carrying out convolution processing on the merged feature map by using the second branch to obtain a second output feature; the merging unit is further configured to perform channel merging on the first output feature and the second output feature to obtain at least one second feature map.
In the above embodiments of the present application, the first branch includes at least one convolution block, wherein each convolution block includes a plurality of sub-convolution layers, and the sub-convolution layers have different convolution kernels.
In the above embodiments of the present application, the detection module is further configured to detect a target image by using a target detection model to obtain a detection result of the target object, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
In the above embodiments of the present application, the apparatus further includes: an acquisition module, a determination module, an enhancement module, and a training module.
The acquisition module is used for acquiring a plurality of sample images and sample detection frames corresponding to the sample images, wherein the sample detection frames are used for marking target objects in the sample images; the determining module is used for determining a preset number of sample detection frames in the plurality of sample images as target detection frames; the enhancement module is used for enhancing data of a target object corresponding to the target detection frame to obtain a target sample image; the training module is used for training the initial detection model by using the target sample image to obtain a target detection model.
In the above embodiments of the present application, the determining module includes: a splicing unit, a mixing unit, and a determining unit.
The splicing unit is used for splicing a plurality of sample images to obtain an initial sample image; the mixing unit is used for mixing the initial sample image and the preset sample image to obtain a mixed image; the determining unit is used for determining a preset number of sample detection frames in the mixed image as target detection frames.
In the above embodiments of the present application, the apparatus further includes: an output module, a receiving module, and an updating module.
The output module is used for outputting the detection result; the receiving module is used for receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; the updating module is used for updating the merged feature map based on the first feedback result.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 7
According to an embodiment of the present application, there is also provided an image processing apparatus for implementing the image processing method, and fig. 13 is a schematic diagram of an image processing apparatus according to embodiment 7 of the present application. As shown in fig. 13, the apparatus 1300 includes: an acquisition module 1302, an extraction module 1304, a fusion module 1306, and a detection module 1308.
The acquisition module is used for acquiring an agricultural remote sensing image, wherein the agricultural remote sensing image comprises crops; the extraction module is used for performing multi-scale feature extraction on the agricultural remote sensing image to obtain a plurality of first feature maps; the fusion module is used for performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; the detection module is used for detecting the at least one second feature map to obtain a detection result of the crops.
It should be noted here that the acquisition module 1302, the extraction module 1304, the fusion module 1306, and the detection module 1308 correspond to steps S902 to S908 in embodiment 3, and the four modules are the same as the corresponding steps in the implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1 described above. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 8
According to an embodiment of the present application, there is also provided an image processing apparatus for implementing the image processing method, and fig. 14 is a schematic diagram of an image processing apparatus according to embodiment 8 of the present application, as shown in fig. 14, the apparatus 1400 includes: an acquisition module 1402, an extraction module 1404, a fusion module 1406, and a detection module 1408.
The acquisition module is used for acquiring a building remote sensing image, wherein the building remote sensing image comprises a target building; the extraction module is used for performing multi-scale feature extraction on the building remote sensing image to obtain a plurality of first feature maps; the fusion module is used for performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; the detection module is used for detecting the at least one second feature map to obtain a detection result of the target building.
It should be noted here that the above-mentioned acquisition module 1402, extraction module 1404, fusion module 1406, and detection module 1408 correspond to steps S1002 to S1008 in embodiment 4, and the four modules are the same as the corresponding steps in the implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the detection module is further configured to detect the building remote sensing image by using a target detection model to obtain the detection result of the target building, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 9
According to an embodiment of the present application, there is also provided an image processing apparatus for implementing the image processing method, and fig. 15 is a schematic diagram of an image processing apparatus according to embodiment 9 of the present application. As shown in fig. 15, the apparatus 1500 includes: an acquisition module 1502, an extraction module 1504, a fusion module 1506, and a detection module 1508.
The acquisition module is used for acquiring a target remote sensing image through a cloud server, wherein the target remote sensing image comprises a target object; the extraction module is used for performing multi-scale feature extraction on the target remote sensing image through the cloud server to obtain a plurality of first feature maps; the fusion module is used for performing feature fusion on the plurality of first feature maps by using a multi-branch network structure through the cloud server to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; the detection module is used for detecting the at least one second feature map through the cloud server to obtain a detection result of the target object.
It should be noted here that the acquisition module 1502, the extraction module 1504, the fusion module 1506, and the detection module 1508 correspond to steps S1102 to S1108 in embodiment 5, and the four modules are the same as the corresponding steps in the implementation examples and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 10
An embodiment of the present application further provides an electronic device. The electronic device may be a computer terminal, and the computer terminal may be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the image processing method: acquiring a target image, wherein the target image comprises a target object; performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
Optionally, fig. 16 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 16, the computer terminal A may include: one or more processors (only one is shown) and a memory.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the image processing method and apparatus in the embodiments of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the image processing method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to the computer terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a target image, wherein the target image comprises a target object; performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
Optionally, the processor may further execute the program code of the following steps: performing channel merging on the plurality of first feature maps to obtain a merged feature map; performing convolution processing on the merged feature map by using the first branch to obtain a first output feature; performing convolution processing on the merged feature map by using the second branch to obtain a second output feature; and performing channel merging on the first output feature and the second output feature to obtain at least one second feature map.
Optionally, the processor may further execute the program code of the following steps: detecting the target image by using a target detection model to obtain the detection result of the target object, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
Optionally, the processor may further execute the program code of the following steps: obtaining a plurality of sample images and sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking target objects in the sample images; determining a preset number of sample detection frames in the plurality of sample images as target detection frames; performing data enhancement on a target object corresponding to the target detection frame to obtain a target sample image; and training the initial detection model by using the target sample image to obtain a target detection model.
Optionally, the processor may further execute the program code of the following steps: splicing the plurality of sample images to obtain an initial sample image; mixing the initial sample image and a preset sample image to obtain a mixed image; and determining a preset number of sample detection frames in the mixed image as target detection frames.
Optionally, the processor may further execute the program code of the following steps: outputting the detection result; receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and updating the merged feature map based on the first feedback result.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a target remote sensing image, wherein the target remote sensing image comprises a target object; performing multi-scale feature extraction on the target remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an agricultural remote sensing image, wherein the agricultural remote sensing image comprises crops; performing multi-scale feature extraction on the agricultural remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the crops.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: obtaining a building remote sensing image, wherein the building remote sensing image comprises a target building; performing multi-scale feature extraction on the building remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target building.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: the cloud server acquires a target image, wherein the target image comprises a target object; the cloud server performs multi-scale feature extraction on the target image to obtain a plurality of first feature maps; the cloud server performs feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and the cloud server detects the at least one second feature map to obtain a detection result of the target object.
By adopting the embodiments of the present application, a target image containing a target object is first acquired; multi-scale feature extraction is performed on the target image to obtain a plurality of first feature maps; feature fusion is performed on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure performs the feature fusion through a plurality of branches; and the at least one second feature map is detected to obtain a detection result of the target object, thereby improving the accuracy of the detection result of the target object. It is easy to note that performing feature fusion on the first feature maps through a plurality of branches improves the accuracy of the obtained at least one second feature map, and the plurality of branches also reduce the number of parameters in the fusion process, which improves fusion efficiency, thereby solving the technical problem of low accuracy in detecting images in the related art.
It can be understood by those skilled in the art that the structure shown in fig. 16 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 16 does not limit the structure of the electronic device. For example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 16, or have a different configuration from that shown in fig. 16.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 11
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the image processing method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a target image, wherein the target image comprises a target object; performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
Optionally, the storage medium is further configured to store program code for performing the following steps: performing channel merging on the plurality of first feature maps to obtain a merged feature map; performing convolution processing on the merged feature map by using the first branch to obtain a first output feature; performing convolution processing on the merged feature map by using the second branch to obtain a second output feature; and performing channel merging on the first output feature and the second output feature to obtain at least one second feature map.
Optionally, the storage medium is further configured to store program code for performing the following steps: detecting the target image by using a target detection model to obtain a detection result of the target object, wherein the target detection model is trained based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through sample detection frames.
Optionally, the storage medium is further configured to store program code for performing the following steps: obtaining a plurality of sample images and sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking target objects in the sample images; determining a preset number of sample detection frames in a plurality of sample images as target detection frames; performing data enhancement on a target object corresponding to the target detection frame to obtain a target sample image; and training the initial detection model by using the target sample image to obtain a target detection model.
Optionally, the storage medium is further configured to store program code for performing the following steps: splicing the plurality of sample images to obtain an initial sample image; mixing the initial sample image and a preset sample image to obtain a mixed image; and determining a preset number of sample detection frames in the mixed image as target detection frames.
Optionally, the storage medium is further configured to store program code for performing the following steps: outputting a detection result; receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result; and updating the merged feature map based on the first feedback result.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a target remote sensing image, wherein the target remote sensing image comprises a target object; performing multi-scale feature extraction on the target remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target object.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an agricultural remote sensing image, wherein the agricultural remote sensing image comprises crops; performing multi-scale feature extraction on the agricultural remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the crops.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: obtaining a building remote sensing image, wherein the building remote sensing image comprises a target building; performing multi-scale feature extraction on the building remote sensing image to obtain a plurality of first feature maps; performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and detecting the at least one second feature map to obtain a detection result of the target building.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the cloud server acquires a target image, wherein the target image comprises a target object; the cloud server performs multi-scale feature extraction on the target image to obtain a plurality of first feature maps; the cloud server performs feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches; and the cloud server detects the at least one second feature map to obtain a detection result of the target object.
By adopting the embodiments of the present application, a target image containing a target object is first acquired; multi-scale feature extraction is performed on the target image to obtain a plurality of first feature maps; feature fusion is performed on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure performs the feature fusion through a plurality of branches; and the at least one second feature map is detected to obtain a detection result of the target object, thereby improving the accuracy of the detection result of the target object. It is easy to note that performing feature fusion on the first feature maps through a plurality of branches improves the accuracy of the obtained at least one second feature map, and the plurality of branches also reduce the number of parameters in the fusion process, which improves fusion efficiency, thereby solving the technical problem of low accuracy in detecting images in the related art.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described device embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed coupling or direct coupling or communication connection between each other may be an indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may also be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be construed as the protection scope of the present invention.

Claims (14)

1. An image processing method, comprising:
acquiring a target image, wherein the target image comprises a target object;
performing multi-scale feature extraction on the target image to obtain a plurality of first feature maps;
performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches;
and detecting the at least one second feature map to obtain a detection result of the target object.
2. The method of claim 1, wherein the multi-branch network structure comprises: a first branch and a second branch, wherein an output of the first branch is connected with an output of the second branch.
3. The method of claim 2, wherein performing feature fusion on the plurality of first feature maps by using the multi-branch network structure to obtain the at least one second feature map comprises:
performing channel merging on the plurality of first feature maps to obtain a merged feature map;
performing convolution processing on the merged feature map by using the first branch to obtain a first output feature;
performing convolution processing on the merged feature map by using the second branch to obtain a second output feature;
and performing channel merging on the first output feature and the second output feature to obtain the at least one second feature map.
4. The method of claim 3, wherein the first branch comprises at least one convolution block, wherein each convolution block comprises a plurality of sub-convolution layers, and the sub-convolution layers have different convolution kernels.
5. The method of claim 1, further comprising:
and detecting the target image by using a target detection model to obtain the detection result of the target object, wherein the target detection model is obtained by training based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through a sample detection frame.
6. The method of claim 5, further comprising:
obtaining the plurality of sample images and the sample detection frames corresponding to the plurality of sample images, wherein the sample detection frames are used for marking target objects in the sample images;
determining a preset number of sample detection frames in the plurality of sample images as target detection frames;
performing data enhancement on the target object corresponding to the target detection frame to obtain a target sample image;
and training an initial detection model by using the target sample image to obtain the target detection model.
7. The method of claim 6, wherein determining a preset number of sample detection frames in the plurality of sample images as target detection frames comprises:
splicing the plurality of sample images to obtain an initial sample image;
mixing the initial sample image and a preset sample image to obtain a mixed image;
determining the preset number of sample detection frames in the mixed image as the target detection frames.
8. The method of claim 3, further comprising:
outputting the detection result;
receiving a first feedback result, wherein the first feedback result is obtained by modifying a channel in the merged feature map according to the detection result;
updating the merged feature map based on the first feedback result.
9. An image processing method, comprising:
acquiring a target remote sensing image, wherein the target remote sensing image comprises a target object;
carrying out multi-scale feature extraction on the target remote sensing image to obtain a plurality of first feature maps;
performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches;
and detecting the at least one second feature map to obtain a detection result of the target object.
10. An image processing method, comprising:
obtaining a building remote sensing image, wherein the building remote sensing image comprises a target building;
carrying out multi-scale feature extraction on the building remote sensing image to obtain a plurality of first feature maps;
performing feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches;
and detecting the at least one second feature map to obtain a detection result of the target building.
11. The method of claim 10, further comprising:
and detecting the building remote sensing image by using a target detection model to obtain the detection result of the target building, wherein the target detection model is obtained by training based on a target sample image, and the target sample image is obtained by performing data enhancement on a plurality of sample images through a sample detection frame.
12. An image processing method, comprising:
the method comprises the steps that a cloud server obtains a target image, wherein the target image comprises a target object;
the cloud server performs multi-scale feature extraction on the target image to obtain a plurality of first feature maps;
the cloud server performs feature fusion on the plurality of first feature maps by using a multi-branch network structure to obtain at least one second feature map, wherein the multi-branch network structure is used for performing feature fusion on the plurality of first feature maps through a plurality of branches;
and the cloud server detects the at least one second feature map to obtain a detection result of the target object.
13. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of any one of claims 1 to 12.
14. An electronic device, comprising: a memory and a processor for executing a program stored in the memory, wherein the program when executed performs the method of any one of claims 1 to 12.
CN202210662381.4A 2022-06-13 2022-06-13 Image processing method, storage medium, and electronic device Pending CN115100417A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210662381.4A CN115100417A (en) 2022-06-13 2022-06-13 Image processing method, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210662381.4A CN115100417A (en) 2022-06-13 2022-06-13 Image processing method, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN115100417A true CN115100417A (en) 2022-09-23

Family

ID=83291195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210662381.4A Pending CN115100417A (en) 2022-06-13 2022-06-13 Image processing method, storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN115100417A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739062A (en) * 2020-06-05 2020-10-02 北京航空航天大学 Target detection method and system based on feedback mechanism
CN113158865A (en) * 2021-04-14 2021-07-23 杭州电子科技大学 Wheat ear detection method based on EfficientDet
CN114399643A (en) * 2021-12-13 2022-04-26 阿里巴巴(中国)有限公司 Image processing method, storage medium, and computer terminal
CN114462469A (en) * 2021-12-20 2022-05-10 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIQI JIANG et al.: "GIRAFFEDET: A HEAVY-NECK PARADIGM FOR OBJECT DETECTION", arXiv, 9 February 2022 (2022-02-09), pages 3 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117495837A (en) * 2023-11-17 2024-02-02 哈尔滨工程大学 Intelligent detection method for three-dimensional appearance defects of bearing

Similar Documents

Publication Publication Date Title
AU2018211356B2 (en) Image completion with improved deep neural networks
CN108764039B (en) Neural network, building extraction method of remote sensing image, medium and computing equipment
CN112651475B (en) Two-dimensional code display method, device, equipment and medium
CN111553362A (en) Video processing method, electronic equipment and computer readable storage medium
CN111399831A (en) Page display method and device, storage medium and electronic device
CN113724128A (en) Method for expanding training sample
CN115100417A (en) Image processing method, storage medium, and electronic device
CN117237755A (en) Target detection model training method and device, and image detection method and device
CN114926754A (en) Image detection method, storage medium and processor
CN114359565A (en) Image detection method, storage medium and computer terminal
CN109615620A (en) The recognition methods of compression of images degree, device, equipment and computer readable storage medium
CN113470051B (en) Image segmentation method, computer terminal and storage medium
CN114359676B (en) Method, device and storage medium for training target detection model and constructing sample set
CN112667942A (en) Animation generation method, device and medium
CN114399643A (en) Image processing method, storage medium, and computer terminal
CN115690592A (en) Image processing method and model training method
CN115641397A (en) Method and system for synthesizing and displaying virtual image
CN114998694A (en) Method, apparatus, device, medium and program product for training image processing model
CN115620034A (en) Object tracking method, device, equipment and storage medium
CN114299073A (en) Image segmentation method, image segmentation device, storage medium, and computer program
CN114266723A (en) Image processing method, image processing device, storage medium and computer terminal
CN113568735A (en) Data processing method and system
CN113487480A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN110929866B (en) Training method, device and system of neural network model
US20200126517A1 (en) Image adjustment method, apparatus, device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination