CN113470051B - Image segmentation method, computer terminal and storage medium

Info

Publication number: CN113470051B
Application number: CN202111035749.6A
Authority: CN (China)
Original language: Chinese (zh)
Other versions: CN113470051A
Prior art keywords: result, height, target, image, segmentation
Inventor: 张定乾
Assignee (original and current): Alibaba Damo Institute Hangzhou Technology Co Ltd
Events: application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd; published as CN113470051A; application granted; published as CN113470051B
Legal status: Active

Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T7/00 Image analysis › G06T7/10 Segmentation; Edge detection › G06T7/11 Region-based segmentation
        • G06T7/00 Image analysis › G06T7/60 Analysis of geometric attributes
        • G06T2207/00 Indexing scheme for image analysis or image enhancement › G06T2207/10 Image acquisition modality › G06T2207/10032 Satellite or aerial image; Remote sensing


Abstract

The application discloses an image segmentation method, a computer terminal and a storage medium. The method comprises the following steps: acquiring a target image; performing image segmentation on the target image to obtain an initial segmentation result of the target image; predicting a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image. The method and the device solve the technical problems that, when image segmentation methods in the related art are applied to remote sensing or aerial images, the segmentation results are incomplete and the robustness is low.

Description

Image segmentation method, computer terminal and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to an image segmentation method, a computer terminal, and a storage medium.
Background
In remote sensing image processing, targets vary widely in size, shape and rotation angle, so they are usually extracted with a semantic segmentation method. However, most commonly used image segmentation methods are designed for images acquired by cameras, and because remote sensing images differ substantially from camera images, applying these methods to remote sensing images yields incomplete segmentation results and low robustness.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiments of the application provide an image segmentation method, a computer terminal and a storage medium, so as to at least solve the technical problems of incomplete segmentation results and low robustness when image segmentation methods in the related art are applied to remote sensing or aerial images.
According to a first aspect of the embodiments of the present application, there is provided an image segmentation method, including: acquiring a target image; performing image segmentation on the target image to obtain an initial segmentation result of the target image; predicting a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
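As a minimal sketch, the four steps of the first aspect can be expressed as the following Python skeleton, where seg_model, height_model and refiner are illustrative placeholders (a coarse segmenter, a contour-regression height predictor and a correction module), not names used in the application:

```python
def segment_target_image(target_image, seg_model, height_model, refiner):
    """Illustrative four-step flow of the first aspect."""
    initial_result = seg_model(target_image)    # image segmentation -> initial result
    height_result = height_model(target_image)  # per-pixel shortest distance to the
                                                # target boundary (the height result)
    # correct the initial result with the boundary semantics of the height map
    target_result = refiner(initial_result, height_result)
    return target_result
```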
According to a second aspect of the embodiments of the present application, there is also provided an image segmentation method, including: acquiring a remote sensing image; performing image segmentation on the remote sensing image to obtain an initial segmentation result of the remote sensing image; predicting a height result of the remote sensing image, wherein the height result represents the shortest distance between a pixel point in the remote sensing image and a ground object boundary contained in the remote sensing image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the remote sensing image.
According to a third aspect of the embodiments of the present application, there is also provided an image segmentation method, including: acquiring an aerial image shot by an unmanned aerial vehicle; performing image segmentation on the aerial image to obtain an initial segmentation result of the aerial image; predicting a height result of the aerial image, wherein the height result represents the shortest distance between a pixel point in the aerial image and a ground object boundary contained in the aerial image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the aerial image.
According to a fourth aspect of the embodiments of the present application, there is also provided an image segmentation method, including: receiving, by a cloud server, a target image uploaded by a client; performing, by the cloud server, image segmentation on the target image to obtain an initial segmentation result of the target image; predicting, by the cloud server, a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; correcting, by the cloud server, the initial segmentation result based on the height result to obtain a target segmentation result of the target image; and feeding, by the cloud server, the target segmentation result back to the client.
According to a fifth aspect of the embodiments of the present application, there is also provided an image segmentation method, including: acquiring a target image by calling a first function, wherein the first function comprises a first parameter whose parameter value is the target image; performing image segmentation on the target image to obtain an initial segmentation result of the target image; predicting a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image; and outputting the target segmentation result by calling a second function, wherein the second function comprises a second parameter whose parameter value is the target segmentation result.
According to a sixth aspect of the embodiments of the present application, there is provided an image segmentation apparatus, including: an acquisition module for acquiring a target image; a segmentation module for performing image segmentation on the target image to obtain an initial segmentation result of the target image; a prediction module for predicting a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; and a correction module for correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
According to a seventh aspect of the embodiments of the present application, there is also provided an image segmentation apparatus, including: an acquisition module for acquiring a remote sensing image; a segmentation module for performing image segmentation on the remote sensing image to obtain an initial segmentation result of the remote sensing image; a prediction module for predicting a height result of the remote sensing image, wherein the height result represents the shortest distance between a pixel point in the remote sensing image and a ground object boundary contained in the remote sensing image; and a correction module for correcting the initial segmentation result based on the height result to obtain a target segmentation result of the remote sensing image.
According to an eighth aspect of the embodiments of the present application, there is also provided an image segmentation apparatus, including: an acquisition module for acquiring an aerial image shot by an unmanned aerial vehicle; a segmentation module for performing image segmentation on the aerial image to obtain an initial segmentation result of the aerial image; a prediction module for predicting a height result of the aerial image, wherein the height result represents the shortest distance between a pixel point in the aerial image and a ground object boundary contained in the aerial image; and a correction module for correcting the initial segmentation result based on the height result to obtain a target segmentation result of the aerial image.
According to a ninth aspect of the embodiments of the present application, there is also provided an image segmentation apparatus, including: a receiving module for receiving a target image uploaded by a client; a segmentation module for performing image segmentation on the target image to obtain an initial segmentation result of the target image; a prediction module for predicting a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; a correction module for correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image; and a feedback module for feeding the target segmentation result back to the client.
According to a tenth aspect of the embodiments of the present application, there is also provided an image segmentation apparatus, including: a first calling module for obtaining a target image by calling a first function, wherein the first function comprises a first parameter whose parameter value is the target image; a segmentation module for performing image segmentation on the target image to obtain an initial segmentation result of the target image; a prediction module for predicting a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image; a correction module for correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image; and a second calling module for outputting the target segmentation result by calling a second function, wherein the second function comprises a second parameter whose parameter value is the target segmentation result.
According to an eleventh aspect of the embodiments of the present application, there is also provided an image segmentation method, including: acquiring a building image; performing image segmentation on the building image to obtain an initial segmentation result of the building contained in the building image; predicting a height result of the building image, wherein the height result represents the shortest distance between a pixel point in the building image and the boundary of the building; and correcting the initial segmentation result of the building based on the height result to obtain a target segmentation result of the building.
According to a twelfth aspect of the embodiments of the present application, there is also provided an image segmentation apparatus, including: an acquisition module for acquiring a building image; a segmentation module for performing image segmentation on the building image to obtain an initial segmentation result of the building contained in the building image; a prediction module for predicting a height result of the building image, wherein the height result represents the shortest distance between a pixel point in the building image and the boundary of the building; and a correction module for correcting the initial segmentation result of the building based on the height result to obtain a target segmentation result of the building.
According to the thirteenth aspect of the embodiments of the present application, there is further provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the image segmentation method in any one of the above embodiments.
According to a fourteenth aspect of the embodiments of the present application, there is also provided a computer terminal, including: a processor and a memory, the processor being configured to execute a program stored in the memory, wherein the program is configured to perform the image segmentation method in any of the above embodiments when executed.
In the embodiments of the application, after the target image is acquired, the target image is first segmented to obtain an initial segmentation result, a height result of the target image is predicted, and the initial segmentation result is then corrected based on the height result to obtain the target segmentation result, achieving the purpose of processing images whose targets have a large display size and uniform features. It is easy to note that, because the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the image, it embodies the semantic information of the target boundary. Correcting the initial segmentation result with the height result therefore fully exploits this boundary semantic information, making the target segmentation result more complete and reducing cases where only part of a target is extracted. This achieves the technical effect of improving image segmentation accuracy and solves the technical problems that segmentation results are incomplete and robustness is low when image segmentation methods in the related art are applied to remote sensing or aerial images.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of an image segmentation method according to the prior art;
fig. 2 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing an image segmentation method according to an embodiment of the present application;
FIG. 3 is a flow chart of a first image segmentation method according to an embodiment of the present application;
FIG. 4 is a schematic illustration of an alternative interactive interface according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another alternative interactive interface according to an embodiment of the present application;
FIG. 6 is a schematic illustration of an alternative contour map according to an embodiment of the present application;
FIG. 7 is a flow diagram of an alternative contour regression method according to an embodiment of the present application;
FIG. 8 is a flow chart of an alternative image segmentation method according to an embodiment of the present application;
FIG. 9 is a flow chart of a second image segmentation method according to an embodiment of the present application;
FIG. 10 is a flow chart of a third method of image segmentation according to an embodiment of the present application;
FIG. 11 is a flow chart of a fourth image segmentation method according to an embodiment of the present application;
FIG. 12 is a flow chart of a fifth image segmentation method according to an embodiment of the present application;
FIG. 13 is a diagram of a first image segmentation apparatus according to an embodiment of the present application;
FIG. 14 is a diagram of a second image segmentation apparatus according to an embodiment of the present application;
FIG. 15 is a diagram of a third image segmentation apparatus according to an embodiment of the present application;
FIG. 16 is a flow chart of a sixth image segmentation method according to an embodiment of the present application;
fig. 17 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described below clearly and completely with reference to the drawings; obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
semantic segmentation: it may mean that for each pixel in the image, a category is given to which the pixel belongs, e.g. car, building, etc.
Contour line: in the topographic map, it may be a closed curve formed by connecting adjacent points with equal elevations. In the embodiment of the present application, the connection line between the inside of the polygon of each independent object and the adjacent point with the same shortest distance to the boundary may be referred to.
As shown in fig. 1, the flow of deep-learning-based remote sensing image information extraction is as follows: input the remote sensing image, perform a semantic segmentation task together with other independent subtasks, and fuse the semantic segmentation result with the subtask results to obtain the final result. For example, HRNet (High-Resolution Network) and DecoupledSegNet (Decoupled Segmentation Network) may be adopted, where HRNet works in two stages, obtaining a coarse-grained segmentation result and then refining it into a fine-grained result, and DecoupledSegNet divides the semantic segmentation task into low-frequency interior object-region prediction and high-frequency boundary prediction.
These methods are designed for images acquired by cameras, which have two characteristics: the target display size (number of occupied pixels) is small, and the object features are distinct with rich semantic information. In contrast, targets in remote sensing images have a large display size, uniform features and scarce semantic information. When such methods are applied to remote sensing images, incomplete segmentation, low robustness and similar situations occur.
To solve these problems, the application provides an image segmentation scheme that fully mines the semantic information of the boundary and continuously propagates it from outside to inside, thereby achieving accurate image segmentation.
Example 1
In accordance with an embodiment of the present application, there is provided an image segmentation method. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as one executing a set of computer-executable instructions, and that, while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
The method provided by the embodiment of the application can be executed on a mobile terminal, a computer terminal or a similar computing device. Fig. 2 shows a hardware structure block diagram of a computer terminal (or mobile device) for implementing the image segmentation method. As shown in fig. 2, the computer terminal 10 (or mobile device 10) may include one or more processors (shown as 102a, 102b, …, 102n in the figure), which may include but are not limited to a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the BUS), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 2 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 2, or have a different configuration than shown in fig. 2.
It should be noted that the one or more processors and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the image segmentation method in the embodiment of the present application, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the image segmentation method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
Fig. 2 shows a block diagram of a hardware structure, which may serve as an exemplary block diagram of the computer terminal 10 (or mobile device) and also of the server. In an alternative embodiment, the server may be a locally deployed server or a cloud server, connected to one or more clients via a data network or electronically. The data network connection may be a local area network connection, a wide area network connection, an internet connection, or another type of data network connection.
Under the above operating environment, the present application provides an image segmentation method as shown in fig. 3. Fig. 3 is a flowchart of a first image segmentation method according to an embodiment of the present application. As shown in fig. 3, the method may include the steps of:
step S302, a target image is acquired.
The target image in the above steps may be an image including a target with a larger display size and a single characteristic, and is different from the image acquired by the camera. Optionally, the target image may be a remote sensing image, an aerial image shot by an unmanned aerial vehicle, or a radar image, but is not limited thereto. In different application scenarios, the type of the target contained in the target image is different, for example, in a meteorological application scenario, the target contained in the target image may be a cloud layer; in a water conservancy application scene, the target contained in the target image can be a dam, a river, a lake and the like; in the field of agriculture and forestry scenes, the target contained in the target image can be a land, a greenhouse and the like; in a disaster application scene, the target contained in the target image can be a mountain, a dam or the like; in a city planning application scenario, the target image may include targets such as road networks, plots, buildings, and the like.
In an alternative embodiment, the target image may be captured by a satellite or a drone, transmitted to a server through a network, processed by the server, and displayed to the user; as shown in fig. 4, the target image may be displayed in an image capture area. In another alternative embodiment, the target image may be captured by a satellite or a drone, actively uploaded to a server by a user, and processed by the server; as shown in fig. 5, the user may upload the target image to the server by clicking an "upload image" button in an interactive interface, or by dragging the target image directly into the dashed frame, and the uploaded image may be displayed in the image capture area. The server may be deployed locally or in the cloud.
Step S304, perform image segmentation on the target image to obtain an initial segmentation result of the target image.
In an optional embodiment, an existing image segmentation method may be adopted to segment the target image; the specific implementation is not described here. The resulting segmentation is used as the initial segmentation result, which is incomplete and not highly robust.
Step S306, predict a height result of the target image, wherein the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image.
Since the same target may have different sizes in different target images, the shortest distance between a pixel point and the target boundary differs across images. In the embodiments of the present application, different distances may therefore be represented by different label values, where a smaller label value indicates a smaller shortest distance between the pixel point and the target boundary. The height result in the above step is thus a height label for each pixel point in the target image.
In order to fully mine the semantic information of the target boundary, a contour regression task can be added to predict a plurality of contour lines in the target boundary region, that is, a plurality of closed curves predicted from outside to inside, where pixel points on the same closed curve have the same shortest distance to the target boundary and pixel points on different closed curves have different shortest distances. Accordingly, in the embodiments of the present application, a different label value may be set for each contour line and used as the height label of all pixel points on that contour line; optionally, the outside-to-inside ordering of a contour line may be used as its label. For pixel points not located on any contour line, the label may be set to a fixed value. It should be noted that, because such pixel points do not carry boundary semantic information, their height labels may be set to 0 by default, but this is not limiting; they may also be set according to the shortest distance from those pixel points to the target boundary.
For example, as shown in fig. 6, the remote sensing image includes a plurality of plots, a plurality of closed curves can be predicted for each plot, and the number of the predicted closed curves is different due to the difference in the sizes of the plots. As shown by the dotted lines in fig. 6, four contour lines can be predicted, and the height labels of all the pixel points on the outermost contour line are 1, and the height labels of all the pixel points on the second contour line are 2; the height labels of all pixel points on the third contour line are 3; the height labels of all the pixel points on the innermost contour line are 4, and the height labels of other pixel points are 0.
In an optional embodiment, the target boundary in the target image may be determined manually, and a plurality of closed curves are then obtained by repeatedly shrinking the boundary inward; specifically, the erosion operation of opencv may be invoked. The erosion size may be chosen according to the required prediction efficiency and accuracy: the smaller the erosion size, the higher the prediction accuracy but the lower the efficiency. Optionally, the erosion size is a fixed value of 1, but is not limited thereto. In another optional embodiment, in order to improve both the accuracy and the efficiency of height prediction, a height prediction model for the contour regression task may be trained in advance and used to process the target image to predict the height result. The embodiments of the present application take the height prediction model approach as an example.
Step S308, correct the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
In the embodiments of the application, predicting the height result of the target image continuously propagates the semantic information in the target boundary region from outside to inside, countering the fact that semantic information becomes more ambiguous closer to the central region. The initial segmentation result can therefore be corrected with the boundary semantic information, so that the final target segmentation result is more complete and cases where only part of a target is extracted are greatly reduced.
In an alternative embodiment, after segmenting the target image to obtain the target segmentation result, the server may directly display the result to the user; as shown in fig. 4, the target segmentation result may be displayed in a result feedback area. In another alternative embodiment, the server may feed the target segmentation result back to the user's client through the network, and the client displays it to the user, as shown in fig. 5. Further, after the target segmentation result is displayed, its correctness can be verified through user feedback: if the user considers it incorrect, the user can feed back the correct segmentation result, as shown in fig. 4 and fig. 5, by entering it in the result feedback area and uploading it to the server, so that the server can retrain the model according to the feedback and thereby improve its performance.
For example, taking land parcel segmentation in an agricultural scene: after a land parcel image is collected by a satellite, a drone or radar, it may be sent directly to the server for segmentation, or transmitted to the client, where the user selects the image to be segmented and uploads it to the server. After acquiring the land parcel image, the server may segment it with an existing image segmentation method to obtain an initial segmentation result, which includes the initial area of each land parcel in the image. Meanwhile, a pre-trained height prediction model can process the image to predict its height result, which represents the shortest distance between a pixel point in the image and a land parcel boundary contained in it. Finally, the initial segmentation result can be corrected with the predicted height result to obtain the target segmentation result, which includes the corrected target area of each land parcel, that is, the initial area corrected to the actual area. The server can then display the target segmentation result to the user directly, or send it to the client for display, so that the user sees the actual area of each segmented land parcel; the height prediction model can also be optimized based on the user's feedback, improving the performance of the server.
Similarly, taking building segmentation in a city planning scene: after a building image is acquired by a satellite, a drone or radar, it may be transmitted directly to the server for building segmentation, or transmitted to the client, where the user selects the building image to be segmented and uploads it to the server. After obtaining the building image, the server may segment it with an existing image segmentation method to obtain an initial segmentation result of the buildings contained in the image, which includes the initial area of each building. Meanwhile, a pre-trained height prediction model can process the image to predict its height result, which represents the shortest distance between a pixel point in the image and a building boundary. Finally, the initial segmentation result can be corrected with the predicted height result to obtain the target segmentation result, which includes the corrected target area of each building, that is, the initial area corrected to the actual area. The server can then display the result to the user directly or via the client, so that the user sees the actual area of each segmented building; the height prediction model can likewise be optimized through the user's feedback, improving the performance of the server.
According to the scheme provided by the embodiments of the application, after the target image is acquired, the target image is first segmented to obtain an initial segmentation result, a height result of the target image is predicted, and the initial segmentation result is then corrected based on the height result to obtain the target segmentation result, achieving the purpose of processing images whose targets have a large display size and uniform features. It is easy to note that, because the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the image, it embodies the semantic information of the target boundary; correcting the initial segmentation result with the height result therefore fully exploits this boundary semantic information, making the target segmentation result more complete, reducing cases where only part of a target is extracted, improving image segmentation accuracy, and solving the technical problems that segmentation results are incomplete and robustness is low when image segmentation methods in the related art are applied to remote sensing or aerial images.
In the above embodiment of the present application, the method further includes: obtaining an initial training sample, wherein the initial training sample includes a training image and labeling frames of the training targets contained in the training image; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training a height prediction model with the target training sample, wherein the height prediction model is used for predicting the height result of the target image.
The initial training sample in the above step may be a sample used for training an image segmentation model in the related art; the labeling frame in the sample represents an image segmentation result of the training image and may be obtained by manually labeling the boundary of each training target. To reduce manual labeling cost, labeled images can also be obtained directly from the Internet or from public image sets as initial training samples.
In the embodiments of the application, to ensure that the trained height prediction model can predict the height result of a target image, the height labeling result of the training image may first be generated, and the existing labeling frame and the newly generated height labeling result are then jointly used as the labeling information of the training image, yielding the target training sample finally used to train the height prediction model. The training process of the height prediction model is similar to that of image segmentation models in the related art and is not repeated here.
The existing labeling frame is usually a closed polygon. During training of the height prediction model, a plurality of contour lines can be obtained by repeatedly shrinking the labeling frame inward; specifically, the erosion operation of opencv can be invoked, optionally with an erosion size of 1. In an optional embodiment, the height labels of all pixel points in the training image can be determined from the predicted closed curves and used as the height labeling result of the training image, where for target pixel points located on a closed curve, the number of shrinks of the labeling frame can directly serve as the height label, and for the remaining pixel points a preset fixed value can be used. In another optional embodiment, since the main task of the height prediction model is to predict the contour map of the target image, that is, only the height labels of the target pixel points are needed during training, the number of shrinks of the labeling frame can directly serve as the height label of each target pixel point on a closed curve, and these height labels alone can be used as the height labeling result of the training image.
In the above embodiment of the present application, generating the height labeling result of the training image includes: shrinking the labeling frame multiple times to obtain a plurality of shrunk labeling frames; determining the height of each target pixel point based on the shrink count corresponding to the shrunk labeling frame on which it lies; and obtaining the height labeling result based on the heights of the target pixel points.
In an alternative embodiment, multiple contour lines can be generated by repeatedly shrinking the labeling frame inward, and the height label of the target pixel points on each contour line is then determined. Since the height prediction model only needs to predict the height result of the actually acquired target image, only the height labels of the target pixel points are needed during training; optionally, these height labels can directly serve as the height labeling result.
In the above embodiment of the present application, after each shrink of the labeling frame, the method further includes: determining the number of pixel points contained in the currently shrunk labeling frame; if the number is larger than a preset number, continuing to shrink the labeling frame; and if the number is less than or equal to the preset number, stopping shrinking the labeling frame.
The preset number in the above step may be set according to the actual prediction accuracy and efficiency requirements: the smaller the preset number, the more times the labeling frame is shrunk, the richer the boundary semantic information, and the higher the prediction accuracy, but the lower the efficiency. Optionally, the preset number may be 0, that is, shrinking stops once the shrunk labeling frame no longer contains any pixel point.
For example, consider the contour regression method shown in fig. 6 and fig. 7. First, the polygon labeling frame of a target in the training image is shrunk inward by a distance of 1, giving the outermost contour line, on which all pixel points have height label 1. Shrinking inward again by 1 gives the second contour line, with height label 2; shrinking again gives the third contour line, with height label 3; shrinking once more gives the innermost contour line, with height label 4. At this point the innermost contour line contains no further pixel points, and the shrinking ends. After all targets in the training image are processed in this way, the height labeling result of the whole image is obtained.
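A minimal sketch of this label construction, assuming each training target is available as a binary mask rasterized from its polygon labeling frame; the erosion call is OpenCV's cv2.erode with a 3×3 kernel (erosion size 1), and preset_number is the stopping threshold described above:

```python
import cv2
import numpy as np

def height_label_map(target_mask, preset_number=0):
    """Peel a binary target mask inward with successive 3x3 erosions.
    The ring of pixels removed by the k-th shrink is the k-th contour
    line and gets height label k; pixels never on a contour keep 0."""
    labels = np.zeros(target_mask.shape, dtype=np.int32)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    current = (target_mask > 0).astype(np.uint8)
    k = 0
    while int(current.sum()) > preset_number:   # stop at the preset number
        k += 1
        eroded = cv2.erode(current, kernel, borderValue=0)
        ring = current - eroded                 # contour line of this shrink
        labels[ring > 0] = k
        current = eroded
    return labels
```

For a training image containing several targets, the per-target maps can be merged into a single map, since independent targets do not overlap.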
In the above embodiment of the present application, obtaining the height labeling result based on the heights of the pixel points in the training image includes: taking the ratio of the height of each pixel point to a preset height as the height labeling result.
The preset height in the above step may be the height of the training image, for example 1024.
In an alternative embodiment, as shown in fig. 7, after all targets in the training image have been shrunk, a contour map of the same size as the training image is obtained; dividing it by the preset height normalizes it, yielding the contour map that serves as the height labeling result.
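As a sketch, assuming the integer contour map labels from the shrinking step above and the preset height of 1024 mentioned in the text, the normalization is a single division:

```python
# normalize the contour map by the preset height (assumed: training image height 1024)
normalized_contour_map = labels.astype(np.float32) / 1024.0
```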
In the above embodiments of the present application, modifying the initial segmentation result based on the height result to obtain the target segmentation result of the target image includes: performing feature transformation on the height result to obtain semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain a target segmentation result.
In an alternative embodiment, a 3×3 convolution may be applied to the height result of the target image to perform feature transformation, completing the semantic extraction from the contour map and yielding semantic features; the initial segmentation result is then corrected based on these semantic features to complete the semantic segmentation, thereby obtaining the target segmentation result.
In the above embodiments of the present application, modifying the initial segmentation result based on the semantic features, and obtaining the target segmentation result includes: splicing the semantic features and the initial segmentation result to obtain spliced features; and processing the spliced features to obtain a target segmentation result.
In an optional embodiment, in order to correct the segmentation result, the semantic features extracted from the contour map may be spliced with the initial segmentation result, that is, concatenated along the channel dimension; the resulting spliced features are enhanced semantic features, and convolving them yields the target segmentation result.
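A minimal PyTorch sketch of this correction step, assuming the initial segmentation result is a num_classes-channel logit map and the height result a 1-channel map; the 3×3 convolutions and the channel-wise splice follow the text, while the hidden width and the ReLU are illustrative assumptions:

```python
import torch
import torch.nn as nn

class HeightGuidedCorrection(nn.Module):
    def __init__(self, num_classes, hidden=16):
        super().__init__()
        # 3x3 convolution: semantic extraction from the contour (height) map
        self.height_conv = nn.Conv2d(1, hidden, kernel_size=3, padding=1)
        # convolution over the spliced features -> corrected segmentation
        self.fuse_conv = nn.Conv2d(num_classes + hidden, num_classes,
                                   kernel_size=3, padding=1)

    def forward(self, initial_seg, height_result):
        semantic = torch.relu(self.height_conv(height_result))
        spliced = torch.cat([initial_seg, semantic], dim=1)  # channel-wise splice
        return self.fuse_conv(spliced)                       # target segmentation
```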
In the above embodiment of the present application, after the height result of the target image is predicted, the method further includes: displaying the height result of the target image on the interactive interface; receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
The interactive interface in the above steps may be an interface as shown in fig. 4 and fig. 5, through which the user may view the target segmentation result and the height result, and may adjust the above results on the interface and feed back the results to the server.
Since the accuracy of the height result affects the accuracy of the final target segmentation result, in order to ensure its accuracy the server may, in one optional embodiment, display the height result to the user directly on the interactive interface; in another optional embodiment, the server may send the height result to the client through the network, and the client displays it on the interactive interface. The user can then confirm the height result: if the user confirms it is correct, the correction of the initial segmentation result can proceed directly based on the height result; if the user finds it wrong, the user can correct the height result on the interactive interface and feed the resulting correction result back to the server, where the correction of the initial segmentation result is carried out based on it. The height prediction model can also be optimized according to the correction result, improving the performance of the server.
In the above embodiment of the present application, after the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the target image, the method further includes: determining the category of the target contained in the target segmentation result; determining a target display mode corresponding to the target segmentation result based on the category of the target; and displaying the target segmentation result on the interactive interface according to the target display mode.
The type of the target in the above step may be a specific type of the target in the target image, and may be a type set in advance according to a segmentation requirement. For example, still taking the example of land parcel segmentation in an agricultural scene, the category of land parcel may be the height of land parcel, such as plain, hill, etc. For another example, still taking the example of building division in a city planning scenario as an example, the category of the building may be a specific type of the building, such as a commercial building, a high-rise residence, a library, a sports center, and the like.
The target display mode in the above step may refer to a color of a region where the display target is located, a thickness of a boundary line, a line shape of a boundary line, and the like, but is not limited thereto, and the region color is exemplified in the present application.
In an optional embodiment, in order to enable a user to more clearly and intuitively view the area where the target is located, a specific category of different targets included in the target segmentation result can be determined by using an existing target identification scheme, a corresponding target display mode is further determined, the target display mode is finally displayed for the user to view, and the target segmentation result can be displayed in an interactive interface. Optionally, in order to facilitate the user to determine the categories of different targets more intuitively, the category names of the targets may be displayed in the interactive interface.
For example, still taking land parcel segmentation in the agricultural scene: after the target segmentation result is obtained, that is, the area where each land parcel is located in the image, the category of each land parcel can be determined according to its height, and the display color is determined accordingly; land parcels of different heights can then be marked with different colors, for example plains marked in green and hills in yellow.
For another example, the description is given by taking the building segmentation in the city planning scene as an example, after the target segmentation result is obtained, that is, after the area where each building is located in the building image is obtained, the category of each building can be obtained through identification, and then the color of the display is determined, and finally, different categories of buildings can be marked by using different colors, for example, a residential building is marked by using green, a business building is marked by using yellow, and a sports center is marked by using blue. Further, the category name of each building may be displayed within the area in which the building is located.
A preferred embodiment of the present application is described in detail below with reference to fig. 8, and the method may be performed by a computer terminal or a server. As shown in fig. 8, the method includes the steps of:
step S801, a remote sensing image is acquired.
Step S802, process the remote sensing image with a pre-trained basic network to obtain a preliminary segmentation result and a height result.
Optionally, the basic network may be any convolutional neural network structure capable of performing both the conventional semantic segmentation task and the contour prediction task of this application; the features produced by the basic network can be transformed by a convolution into features of dimension 1, yielding the height result.
It should be noted that during training the basic network may be supervised with a contour map constructed in advance; the construction process of the contour map is shown in fig. 7 and is not repeated here.
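A sketch of the dimension-1 transform and its supervision follows; the 1×1 convolution and the L1 regression loss are assumptions consistent with, but not specified by, the text:

```python
import torch.nn as nn
import torch.nn.functional as F

class HeightHead(nn.Module):
    """Transforms base-network features into a 1-channel height result."""
    def __init__(self, in_channels):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, 1, kernel_size=1)  # features -> dimension 1

    def forward(self, features):
        return self.proj(features)

def contour_regression_loss(height_pred, contour_map):
    # training-time supervision with the pre-built normalized contour map
    return F.l1_loss(height_pred, contour_map)
```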
And step S803, performing feature transformation on the height result by utilizing convolution to complete semantic extraction on the contour map, and obtaining the semantic features of the height result.
Step S804, the semantic features and the preliminary segmentation results are spliced to obtain spliced features, namely enhanced semantic features.
And step S805, performing convolution on the spliced features to obtain a target segmentation result.
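Steps S803 to S805 can be sketched as a small correction head (a sketch assuming a PyTorch implementation with the 3x3 convolution mentioned in the embodiments below; `CorrectionHead` and the hidden width are illustrative assumptions):

```python
import torch
import torch.nn as nn

class CorrectionHead(nn.Module):
    """Refines a preliminary segmentation with the predicted height map."""
    def __init__(self, num_classes: int, hidden: int = 64):
        super().__init__()
        # S803: a 3x3 convolution extracts semantic features from the height map.
        self.height_semantics = nn.Conv2d(1, hidden, kernel_size=3, padding=1)
        # S805: a convolution over the spliced features produces the final logits.
        self.fuse = nn.Conv2d(hidden + num_classes, num_classes, kernel_size=3, padding=1)

    def forward(self, prelim_logits: torch.Tensor, height_map: torch.Tensor) -> torch.Tensor:
        sem = self.height_semantics(height_map)           # S803
        spliced = torch.cat([sem, prelim_logits], dim=1)  # S804: channel-wise splice
        return self.fuse(spliced)                         # S805
```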
Adding the contour regression task gives the propagation of boundary semantic information a well-defined direction and splits semantic segmentation into two steps, rough segmentation and result correction, so that accurate prediction is achieved. This addresses the problems that segmentation results are incomplete, and the algorithm lacks robustness, because the targets in remote sensing images are large and carry little semantic information. With this scheme, the prediction result on remote sensing images is more complete, and cases in which only part of a target is extracted are greatly reduced.
Example 2
There is also provided, in accordance with an embodiment of the present application, an image segmentation method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 9 is a flowchart of a second image segmentation method according to an embodiment of the present application. As shown in fig. 9, the method may include the steps of:
Step S902, a remote sensing image is acquired.
The types of targets contained in a remote sensing image differ across application fields. For example, in the meteorological field the target may be a cloud layer; in the water conservancy field, a dam, a river, a lake, and the like; in the agriculture and forestry field, a land parcel, a greenhouse, and the like; in the disaster field, a mountain, a dam, and the like; and in the urban planning field, a road network, a land parcel, a building, and the like.
In an alternative embodiment, the remote sensing image may be captured by a satellite, transmitted to a server through a network, processed by the server, and then displayed to the user; as shown in fig. 4, the remote sensing image may be displayed in the image capturing area. In another alternative embodiment, the remote sensing image may be captured by a satellite and actively uploaded to the server by the user for processing; as shown in fig. 5, the user may upload the remote sensing image to the server by clicking the "upload image" button in the interactive interface or by dragging the remote sensing image directly into the dashed box, and the uploaded image may be displayed in the image capturing area. The server may be deployed locally or in the cloud.
Step S904, image segmentation is performed on the remote sensing image to obtain an initial segmentation result of the remote sensing image.
In an optional embodiment, the remote sensing image may be segmented with an existing image segmentation method, whose specific implementation is not described here; the resulting segmentation is used as the initial segmentation result, which may be incomplete and is not highly robust.
Step S906, a height result of the remote sensing image is obtained by prediction, where the height result represents the shortest distance between a pixel point in the remote sensing image and a ground object boundary contained in the remote sensing image.
Since the same ground object may have different sizes in different remote sensing images, the shortest distance between a pixel point and the ground object boundary differs across images. Therefore, so that the method provided in this application can be applied to different remote sensing images, in this embodiment different distances can be represented by different label values, where a smaller label value indicates a smaller shortest distance between the pixel point and the ground object boundary; the height result in the above step is thus the set of height labels of the pixel points in the remote sensing image.
In order to fully mine the semantic information of the ground object boundary, a contour regression task can be added to predict a plurality of contour lines in the ground object boundary region, that is, a plurality of closed curves are predicted from outside to inside, where pixel points on the same closed curve share the same shortest distance to the ground object boundary and pixel points on different closed curves have different shortest distances. In this embodiment, different label values can therefore be set for different contour lines, and the label value of a contour line is used as the height label of all pixel points on it; optionally, the outside-to-inside ordering index of a contour line can be used as its label. For pixel points not located on any contour line, the label can be set to a fixed value; it should be noted that, because such pixel points do not carry boundary semantic information, their height labels can default to 0, but this is not limiting, and they can also be set according to the shortest distance from those pixel points to the ground object boundary.
For example, as shown in fig. 6, the remote sensing image contains a plurality of plots; several closed curves can be predicted for each plot, and the number of predicted closed curves varies with the plot size. As shown by the dotted lines in fig. 6, four contour lines can be predicted: the height labels of all pixel points on the outermost contour line are 1, those on the second contour line are 2, those on the third contour line are 3, those on the innermost contour line are 4, and the height labels of all other pixel points are 0.
In an optional embodiment, the ground object boundary in the remote sensing image may be determined manually, and a plurality of closed curves may then be obtained by repeatedly shrinking the boundary inward; concretely, the erosion operation of opencv can be invoked for this. The erosion size can be chosen according to the required prediction efficiency and accuracy: the smaller the erosion size, the higher the prediction accuracy but the lower the efficiency; optionally, the erosion size is a fixed value of 1, but this is not limiting. In another optional embodiment, to improve both the accuracy and the efficiency of height prediction, a height prediction model that performs the contour regression task may be trained in advance, and the remote sensing image is then processed by the height prediction model to predict the height result. The embodiments of this application take the height prediction model as an example.
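The erosion operation referred to here is the morphological erosion provided by opencv; below is a minimal sketch of one shrink step, under the assumption of a binary ground-object mask and an erosion size of 1:

```python
import cv2
import numpy as np

mask = np.zeros((64, 64), np.uint8)
mask[8:56, 8:56] = 1                      # toy ground-object region
kernel = np.ones((3, 3), np.uint8)        # erosion size of 1 in each direction
shrunk = cv2.erode(mask, kernel, iterations=1)
# The pixels removed by this step form one closed curve (one contour line).
contour_line = mask.astype(bool) & ~shrunk.astype(bool)
```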
Step S908, the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the remote sensing image.
In this embodiment, predicting the height result of the remote sensing image propagates the semantic information in the ground object boundary region continuously from outside to inside, which alleviates the problem that semantic information becomes more ambiguous closer to the central region. The initial segmentation result can therefore be corrected with the boundary semantic information, so that the final target segmentation result is more complete and cases in which only part of a target is extracted are greatly reduced.
In an alternative embodiment, after the server performs image segmentation on the remote sensing image to obtain the target segmentation result, the result may be displayed directly to the user; as shown in fig. 4, the target segmentation result may be displayed in the result feedback area. In another optional embodiment, after obtaining the target segmentation result, the server may feed it back over the network to the user's client, which displays it to the user; as shown in fig. 5, the target segmentation result may be displayed in the result feedback area. Further, after the target segmentation result is displayed, its correctness can be verified through user feedback: if the user considers the result incorrect, the correct segmentation result can be fed back. As shown in fig. 4 and fig. 5, the user can provide the correct segmentation result in the result feedback area and upload it to the server, so that the server can retrain the model according to the user feedback and thereby improve its performance.
In the above embodiment of the present application, the method further includes: obtaining an initial training sample, where the initial training sample includes a training image and the labeling frames of the training ground objects contained in the training image; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model with the target training sample.
The initial training sample in the above step may be a sample used to train an image segmentation model in the related art; the labeling frames in the sample represent the image segmentation result of the training image and may be obtained by manually annotating the boundary of each training ground object. To reduce the manual annotation cost, already-labeled images can also be obtained directly from the Internet or from public image sets to serve as initial training samples.
Because the initial training sample contains only the labeling frames representing the image segmentation result, and contains neither the height labeling result of each pixel point nor that of the training ground object boundary, in order for the trained height prediction model to be able to predict the height result of a remote sensing image, in this embodiment the height labeling result of the training image can first be generated, and the existing labeling frames together with the newly generated height labeling result are then used as the labeling information of the training image, yielding the target training sample with which the training of the height prediction model is finally completed. The training process of the height prediction model is similar to that of the image segmentation model in the related art and is not repeated here.
The existing labeling frame is usually a closed polygon, and during the training of the height prediction model a plurality of contour lines can be obtained by repeatedly shrinking the labeling frame inward; concretely, the erosion operation of opencv can be invoked, optionally with an erosion size of 1. In an optional embodiment, the height labels of all pixel points in the training image can be determined from the plurality of predicted closed curves and used together as the height labeling result of the training image: for a target pixel point located on a closed curve, the shrink count of the labeling frame can be used directly as its height label, and for the other pixel points a preset fixed value can be used. In another optional embodiment, since the main task of the height prediction model is to predict the contour map of the remote sensing image, that is, the height labels of the target pixel points are the labeling information actually needed during training, the shrink count of the labeling frame can be used as the height label of each target pixel point on a closed curve, and these height labels alone can be used as the height labeling result of the training image.
In the above embodiment of the present application, generating the height labeling result of the training image includes: shrinking the labeling frame multiple times to obtain a plurality of shrunken labeling frames; determining the heights of the target pixel points based on the shrink counts corresponding to the plurality of shrunken labeling frames, where the target pixel points are located at the positions corresponding to the plurality of shrunken labeling frames; and obtaining the height labeling result based on the heights of the target pixel points.
In an alternative embodiment, a plurality of contour lines can be generated by repeatedly shrinking the labeling frame inward, and the height label of the target pixel points on each contour line can then be determined. Because the height prediction model is used to predict the height result of an actually acquired remote sensing image, only the height labels of the target pixel points are needed during its training; optionally, the height labels of the target pixel points can be used directly as the height labeling result.
In the above embodiment of the present application, after each shrink of the labeling frame, the method further includes: determining the number of pixel points contained in the currently shrunken labeling frame; if the number is larger than a preset number, continuing to shrink the labeling frame; and if the number is less than or equal to the preset number, stopping shrinking the labeling frame.
The preset number in the above step may be set according to the actual prediction accuracy and efficiency requirements: the smaller the preset number, the more times the labeling frame is shrunk, the richer the boundary semantic information, and the higher the prediction accuracy, but the lower the efficiency. Optionally, the preset number may be 0, that is, shrinking stops only when the shrunken labeling frame no longer contains any pixel point and cannot be shrunk further.
For example, consider the contour regression shown in figs. 6 and 7. First, the polygonal labeling frame of a ground object in the training image is shrunk inward by a distance of 1 to obtain the outermost contour line; the height labels of all pixel points on this contour line are 1. Shrinking inward again by 1 gives the second contour line, whose pixel points have height label 2; shrinking again by 1 gives the third contour line, with height label 3; shrinking once more by 1 gives the innermost contour line, with height label 4. At this point the innermost contour line contains no further pixel points, so shrinking ends. After all ground objects in the remote sensing image are processed in this way, the height labeling result of the whole image is obtained.
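A minimal sketch of this labeling procedure follows (assuming numpy/opencv, polygonal labeling frames given as integer vertex arrays, and a preset number of 0; the name `build_height_labels` is illustrative, not from the patent):

```python
import cv2
import numpy as np

def build_height_labels(polygons, image_shape):
    """Rasterizes each polygonal labeling frame and peels it inward,
    assigning the shrink count as the height label of each peeled ring."""
    labels = np.zeros(image_shape, np.int32)
    kernel = np.ones((3, 3), np.uint8)        # shrink distance of 1 per step
    for poly in polygons:
        mask = np.zeros(image_shape, np.uint8)
        cv2.fillPoly(mask, [np.asarray(poly, np.int32)], 1)
        count = 0
        while mask.any():                     # preset number 0: stop when empty
            count += 1
            eroded = cv2.erode(mask, kernel, iterations=1)
            ring = mask.astype(bool) & ~eroded.astype(bool)
            labels[ring] = count              # the k-th contour line gets label k
            mask = eroded
    return labels
```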
In the above embodiment of the present application, obtaining the height labeling result based on the heights of the pixel points in the training image includes: taking the ratio of the height of each pixel point in the training image to a preset height to obtain the height labeling result.
The preset height in the above step may be the height of the training image, for example 1024.
In an alternative embodiment, as shown in fig. 7, after all the ground objects in the training image have been shrunk, a contour map of the same size as the training image is obtained; the contour map is then divided by the preset height for normalization, yielding the height labeling result.
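This normalization is a single element-wise division; a sketch reusing `build_height_labels` from above (the shapes and the preset height of 1024 are the example values mentioned here, not requirements):

```python
import numpy as np

preset_height = 1024                          # e.g., the height of the training image
# contour_map: the integer height-label map of the whole training image,
# e.g., build_height_labels(polygons, (1024, 1024)) from the sketch above
contour_map = np.zeros((1024, 1024), np.int32)
height_labeling_result = contour_map.astype(np.float32) / preset_height
```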
In the above embodiment of the present application, correcting the initial segmentation result based on the height result to obtain the target segmentation result of the remote sensing image includes: performing feature transformation on the height result to obtain the semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In an optional embodiment, a 3x3 convolution can be used to perform feature transformation on the height result of the remote sensing image, completing the semantic extraction from the contour map and yielding the semantic features; the initial segmentation result is then corrected based on these semantic features, completing the semantic segmentation and yielding the target segmentation result.
In the above embodiment of the present application, correcting the initial segmentation result based on the semantic features to obtain the target segmentation result includes: splicing the semantic features with the initial segmentation result to obtain the spliced features; and processing the spliced features to obtain the target segmentation result.
In an optional embodiment, to correct the segmentation result, the semantic features extracted from the contour map can be spliced with the initial segmentation result, that is, concatenated along the channel dimension; the resulting spliced features are the enhanced semantic features, and the target segmentation result is obtained by applying a convolution to them.
In the above embodiment of the present application, after the height result of the remote sensing image is obtained by prediction, the method further includes: displaying the height result of the remote sensing image in the interactive interface; receiving a correction result corresponding to the height result, where the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
The interactive interface in the above steps may be the interface shown in fig. 4 and fig. 5; through it the user can view the target segmentation result and the height result, adjust these results, and feed them back to the server.
Since the accuracy of the height result may affect the accuracy of the final target segmentation result, in order to ensure that the height result is correct, in an optional embodiment the server may display the height result directly to the user, that is, show it in the interactive interface; in another optional embodiment, the server may send the height result over the network to the client, which displays it in the interactive interface for the user. Further, the height result can be confirmed by the user: if the user confirms that it is correct, the correction of the initial segmentation result can proceed directly based on the height result; if the user finds it wrong, the user can correct the height result in the interactive interface to obtain a corresponding correction result and feed it back to the server. The correction of the initial segmentation result can then be based on this correction result, and the height prediction model can also be optimized with it, improving the server's performance.
In the above embodiment of the present application, after the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the remote sensing image, the method further includes: determining the types of the ground objects contained in the target segmentation result; determining the target display mode corresponding to the target segmentation result based on the types of the ground objects; and displaying the target segmentation result in the interactive interface in the target display mode.
The type of a ground object in the above step may be its specific type in the remote sensing image and may be set in advance according to the segmentation requirement. For example, for land parcel segmentation in an agricultural scene, the category of a land parcel may be its terrain height, such as plain or hill. For another example, for building segmentation in an urban planning scene, the category of a building may be its specific type, such as a commercial building, a high-rise residence, a library, or a sports center.
The target display mode in the above step may refer to the color of the region in which the ground object is displayed, the thickness of the boundary line, the line style of the boundary line, and the like, but is not limited thereto; this application takes the region color as an example.
In an optional embodiment, in order to enable the user to view the region where each ground object is located more clearly and intuitively, the specific categories of the different ground objects contained in the target segmentation result can be determined with an existing target identification scheme, the corresponding target display mode is then determined, and the target segmentation result is finally displayed in the interactive interface in that display mode for the user to view. Optionally, to help the user determine the categories of the different ground objects more intuitively, the category name of each ground object may also be displayed in the interactive interface.
Example 3
There is also provided, in accordance with an embodiment of the present application, an image segmentation method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 10 is a flowchart of a third image segmentation method according to an embodiment of the present application. As shown in fig. 10, the method may include the steps of:
Step S1002, an aerial image captured by an unmanned aerial vehicle is acquired.
The types of targets contained in an aerial image differ across application fields. For example, in the water conservancy field the target may be a dam, a river, a lake, and the like; in the agriculture and forestry field, a land parcel, a greenhouse, and the like; in the disaster field, a mountain, a dam, and the like; and in the urban planning field, a road network, a land parcel, a building, and the like.
In an alternative embodiment, the aerial image may be captured by the unmanned aerial vehicle, transmitted to a server through a network, processed by the server, and then displayed to the user; as shown in fig. 4, the aerial image may be displayed in the image capturing area. In another alternative embodiment, the aerial image may be captured by the unmanned aerial vehicle and actively uploaded to the server by the user for processing; as shown in fig. 5, the user may upload the aerial image to the server by clicking the "upload image" button in the interactive interface or by dragging the aerial image directly into the dashed box, and the uploaded image may be displayed in the image capturing area. The server may be deployed locally or in the cloud.
Step S1004, image segmentation is performed on the aerial image to obtain an initial segmentation result of the aerial image.
In an optional embodiment, the aerial image may be segmented with an existing image segmentation method, whose specific implementation is not described here; the resulting segmentation is used as the initial segmentation result, which may be incomplete and is not highly robust.
Step S1006, a height result of the aerial image is obtained by prediction, where the height result represents the shortest distance between a pixel point in the aerial image and a ground object boundary contained in the aerial image.
Since the same ground object may have different sizes in different aerial images, the shortest distance between a pixel point and the ground object boundary differs across images. Therefore, so that the method provided in this application can be applied to different aerial images, in this embodiment different distances can be represented by different label values, where a smaller label value indicates a smaller shortest distance between the pixel point and the ground object boundary; the height result in the above step is thus the set of height labels of the pixel points in the aerial image.
In order to fully mine the semantic information of the ground object boundary, a contour regression task can be added to predict a plurality of contour lines in the ground object boundary region, that is, a plurality of closed curves are predicted from outside to inside, where pixel points on the same closed curve share the same shortest distance to the ground object boundary and pixel points on different closed curves have different shortest distances. In this embodiment, different label values can therefore be set for different contour lines, and the label value of a contour line is used as the height label of all pixel points on it; optionally, the outside-to-inside ordering index of a contour line can be used as its label. For pixel points not located on any contour line, the label can be set to a fixed value; it should be noted that, because such pixel points do not carry boundary semantic information, their height labels can default to 0, but this is not limiting, and they can also be set according to the shortest distance from those pixel points to the ground object boundary.
For example, as shown in fig. 6, the aerial image contains a plurality of plots; several closed curves can be predicted for each plot, and the number of predicted closed curves varies with the plot size. As shown by the dotted lines in fig. 6, four contour lines can be predicted: the height labels of all pixel points on the outermost contour line are 1, those on the second contour line are 2, those on the third contour line are 3, those on the innermost contour line are 4, and the height labels of all other pixel points are 0.
In an optional embodiment, the ground object boundary in the aerial image may be determined manually, and a plurality of closed curves may then be obtained by repeatedly shrinking the boundary inward; concretely, the erosion operation of opencv can be invoked for this. The erosion size can be chosen according to the required prediction efficiency and accuracy: the smaller the erosion size, the higher the prediction accuracy but the lower the efficiency; optionally, the erosion size is a fixed value of 1, but this is not limiting. In another optional embodiment, to improve both the accuracy and the efficiency of height prediction, a height prediction model that performs the contour regression task may be trained in advance, and the aerial image is then processed by the height prediction model to predict the height result. The embodiments of this application take the height prediction model as an example.
Step S1008, the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the aerial image.
In this embodiment, predicting the height result of the aerial image propagates the semantic information in the ground object boundary region continuously from outside to inside, which alleviates the problem that semantic information becomes more ambiguous closer to the central region. The initial segmentation result can therefore be corrected with the boundary semantic information, so that the final target segmentation result is more complete and cases in which only part of a target is extracted are greatly reduced.
In an alternative embodiment, after the server performs image segmentation on the aerial image to obtain the target segmentation result, the result may be displayed directly to the user; as shown in fig. 4, the target segmentation result may be displayed in the result feedback area. In another alternative embodiment, after obtaining the target segmentation result, the server may feed it back over the network to the user's client, which displays it to the user; as shown in fig. 5, the target segmentation result may be displayed in the result feedback area. Further, after the target segmentation result is displayed, its correctness can be verified through user feedback: if the user considers the result incorrect, the correct segmentation result can be fed back. As shown in fig. 4 and fig. 5, the user can provide the correct segmentation result in the result feedback area and upload it to the server, so that the server can retrain the model according to the user feedback and thereby improve its performance.
In the above embodiment of the present application, the method further includes: obtaining an initial training sample, where the initial training sample includes a training image and the labeling frames of the training ground objects contained in the training image; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model with the target training sample.
The initial training sample in the above step may be a sample used to train an image segmentation model in the related art; the labeling frames in the sample represent the image segmentation result of the training image and may be obtained by manually annotating the boundary of each training ground object. To reduce the manual annotation cost, already-labeled images can also be obtained directly from the Internet or from public image sets to serve as initial training samples.
Because the initial training sample contains only the labeling frames representing the image segmentation result, and contains neither the height labeling result of each pixel point nor that of the training ground object boundary, in order for the trained height prediction model to be able to predict the height result of an aerial image, in this embodiment the height labeling result of the training image can first be generated, and the existing labeling frames together with the newly generated height labeling result are then used as the labeling information of the training image, yielding the target training sample with which the training of the height prediction model is finally completed. The training process of the height prediction model is similar to that of the image segmentation model in the related art and is not repeated here.
The existing labeling frame is usually a closed polygon, and during the training of the height prediction model a plurality of contour lines can be obtained by repeatedly shrinking the labeling frame inward; concretely, the erosion operation of opencv can be invoked, optionally with an erosion size of 1. In an optional embodiment, the height labels of all pixel points in the training image can be determined from the plurality of predicted closed curves and used together as the height labeling result of the training image: for a target pixel point located on a closed curve, the shrink count of the labeling frame can be used directly as its height label, and for the other pixel points a preset fixed value can be used. In another optional embodiment, since the main task of the height prediction model is to predict the contour map of the aerial image, that is, the height labels of the target pixel points are the labeling information actually needed during training, the shrink count of the labeling frame can be used as the height label of each target pixel point on a closed curve, and these height labels alone can be used as the height labeling result of the training image.
In the above embodiment of the present application, generating the height labeling result of the training image includes: shrinking the labeling frame multiple times to obtain a plurality of shrunken labeling frames; determining the heights of the target pixel points based on the shrink counts corresponding to the plurality of shrunken labeling frames, where the target pixel points are located at the positions corresponding to the plurality of shrunken labeling frames; and obtaining the height labeling result based on the heights of the target pixel points.
In an alternative embodiment, a plurality of contour lines can be generated by repeatedly shrinking the labeling frame inward, and the height label of the target pixel points on each contour line can then be determined. Because the height prediction model is used to predict the height result of an actually acquired aerial image, only the height labels of the target pixel points are needed during its training; optionally, the height labels of the target pixel points can be used directly as the height labeling result.
In the above embodiment of the present application, after each shrink of the labeling frame, the method further includes: determining the number of pixel points contained in the currently shrunken labeling frame; if the number is larger than a preset number, continuing to shrink the labeling frame; and if the number is less than or equal to the preset number, stopping shrinking the labeling frame.
The preset number in the above step may be set according to the actual prediction accuracy and efficiency requirements: the smaller the preset number, the more times the labeling frame is shrunk, the richer the boundary semantic information, and the higher the prediction accuracy, but the lower the efficiency. Optionally, the preset number may be 0, that is, shrinking stops only when the shrunken labeling frame no longer contains any pixel point and cannot be shrunk further.
For example, consider the contour regression shown in figs. 6 and 7. First, the polygonal labeling frame of a ground object in the training image is shrunk inward by a distance of 1 to obtain the outermost contour line; the height labels of all pixel points on this contour line are 1. Shrinking inward again by 1 gives the second contour line, whose pixel points have height label 2; shrinking again by 1 gives the third contour line, with height label 3; shrinking once more by 1 gives the innermost contour line, with height label 4. At this point the innermost contour line contains no further pixel points, so shrinking ends. After all ground objects in the aerial image are processed in this way, the height labeling result of the whole image is obtained.
In the above embodiment of the present application, obtaining the height labeling result based on the heights of the pixel points in the training image includes: taking the ratio of the height of each pixel point in the training image to a preset height to obtain the height labeling result.
The preset height in the above step may be the height of the training image, for example 1024.
In an alternative embodiment, as shown in fig. 7, after all the ground objects in the training image have been shrunk, a contour map of the same size as the training image is obtained; the contour map is then divided by the preset height for normalization, yielding the height labeling result.
In the above embodiment of the present application, correcting the initial segmentation result based on the height result to obtain the target segmentation result of the aerial image includes: performing feature transformation on the height result to obtain the semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In an alternative embodiment, a 3x3 convolution can be used to perform feature transformation on the height result of the aerial image, completing the semantic extraction from the contour map and yielding the semantic features; the initial segmentation result is then corrected based on these semantic features, completing the semantic segmentation and yielding the target segmentation result.
In the above embodiment of the present application, correcting the initial segmentation result based on the semantic features to obtain the target segmentation result includes: splicing the semantic features with the initial segmentation result to obtain the spliced features; and processing the spliced features to obtain the target segmentation result.
In an optional embodiment, to correct the segmentation result, the semantic features extracted from the contour map can be spliced with the initial segmentation result, that is, concatenated along the channel dimension; the resulting spliced features are the enhanced semantic features, and the target segmentation result is obtained by applying a convolution to them.
In the above embodiment of the present application, after the height result of the aerial image is obtained by prediction, the method further includes: displaying the height result of the aerial image in the interactive interface; receiving a correction result corresponding to the height result, where the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
The interactive interface in the above steps may be the interface shown in fig. 4 and fig. 5; through it the user can view the target segmentation result and the height result, adjust these results, and feed them back to the server.
Since the accuracy of the height result may affect the accuracy of the final target segmentation result, in order to ensure that the height result is correct, in an optional embodiment the server may display the height result directly to the user, that is, show it in the interactive interface; in another optional embodiment, the server may send the height result over the network to the client, which displays it in the interactive interface for the user. Further, the height result can be confirmed by the user: if the user confirms that it is correct, the correction of the initial segmentation result can proceed directly based on the height result; if the user finds it wrong, the user can correct the height result in the interactive interface to obtain a corresponding correction result and feed it back to the server. The correction of the initial segmentation result can then be based on this correction result, and the height prediction model can also be optimized with it, improving the server's performance.
In the above embodiment of the present application, after the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the aerial image, the method further includes: determining the types of the ground objects contained in the target segmentation result; determining the target display mode corresponding to the target segmentation result based on the types of the ground objects; and displaying the target segmentation result in the interactive interface in the target display mode.
The type of a ground object in the above step may be its specific type in the aerial image and may be set in advance according to the segmentation requirement. For example, for land parcel segmentation in an agricultural scene, the category of a land parcel may be its terrain height, such as plain or hill. For another example, for building segmentation in an urban planning scene, the category of a building may be its specific type, such as a commercial building, a high-rise residence, a library, or a sports center.
The target display mode in the above step may refer to the color of the region in which the ground object is displayed, the thickness of the boundary line, the line style of the boundary line, and the like, but is not limited thereto; this application takes the region color as an example.
In an optional embodiment, in order to enable the user to view the region where each ground object is located more clearly and intuitively, the specific categories of the different ground objects contained in the target segmentation result can be determined with an existing target identification scheme, the corresponding target display mode is then determined, and the target segmentation result is finally displayed in the interactive interface in that display mode for the user to view. Optionally, to help the user determine the categories of the different ground objects more intuitively, the category name of each ground object may also be displayed in the interactive interface.
Example 4
There is also provided, in accordance with an embodiment of the present application, an image segmentation method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 11 is a flowchart of a fourth image segmentation method according to an embodiment of the present application. As shown in fig. 11, the method may include the steps of:
In step S1102, the cloud server receives the target image uploaded by the client.
In an alternative embodiment, the target image may be captured by a satellite or an unmanned aerial vehicle and actively uploaded by the user to the cloud server for processing; as shown in fig. 5, the user may upload the target image to the cloud server by clicking the "upload image" button in the interactive interface or by dragging the target image directly into the dashed box, and the uploaded image may be displayed in the image capturing area.
In step S1104, the cloud server performs image segmentation on the target image to obtain an initial segmentation result of the target image.
In step S1106, the cloud server predicts a height result of the target image, where the height result is used to represent the shortest distance between a pixel point in the target image and a target boundary included in the target image.
In step S1108, the cloud server corrects the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
In step S1110, the cloud server feeds back the target segmentation result to the client.
In an optional embodiment, after performing image segmentation on the target image to obtain the target segmentation result, the cloud server may feed the result back over the network to the user's client, which displays it to the user; as shown in fig. 5, the target segmentation result may be displayed in the result feedback area. Further, after the target segmentation result is displayed, its correctness can be verified through user feedback: if the user considers the result incorrect, the correct segmentation result can be fed back in the result feedback area and uploaded to the cloud server, so that the cloud server can retrain the model according to the user feedback and thereby improve its performance.
In the above embodiment of the present application, the cloud server processes the target image with the height prediction model to predict the height result.
In the above embodiment of the present application, the method further includes: the cloud server obtains an initial training sample, where the initial training sample includes a training image and the labeling frames of the training targets contained in the training image; the cloud server generates a height labeling result of the training image; the cloud server generates a target training sample based on the training image and the height labeling result; and the cloud server trains the height prediction model with the target training sample.
In the above embodiment of the present application, the cloud server generating the height labeling result of the training image includes: shrinking the labeling frame multiple times to obtain a plurality of shrunken labeling frames; determining the heights of the target pixel points based on the shrink counts corresponding to the plurality of shrunken labeling frames, where the target pixel points are located at the positions corresponding to the plurality of shrunken labeling frames; and obtaining the height labeling result based on the heights of the target pixel points.
In the above embodiment of the present application, after each shrink of the labeling frame, the method further includes: determining the number of pixel points contained in the currently shrunken labeling frame; if the number is larger than a preset number, continuing to shrink the labeling frame; and if the number is less than or equal to the preset number, stopping shrinking the labeling frame.
In the above embodiment of the present application, obtaining the height labeling result based on the heights of the pixel points in the training image includes: taking the ratio of the height of each pixel point in the training image to a preset height to obtain the height labeling result.
In the above embodiment of the present application, the cloud server correcting the initial segmentation result based on the height result to obtain the target segmentation result of the target image includes: performing feature transformation on the height result to obtain the semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, correcting the initial segmentation result based on the semantic features to obtain the target segmentation result includes: splicing the semantic features with the initial segmentation result to obtain the spliced features; and processing the spliced features to obtain the target segmentation result.
In the above embodiment of the present application, after the height result of the target image is predicted, the method further includes: displaying the height result of the target image on the interactive interface; receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
In the above embodiment of the present application, after the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the target image, the method further includes: determining the category of the target contained in the target segmentation result; determining a target display mode corresponding to the target segmentation result based on the category of the target; and displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 5
There is also provided, in accordance with an embodiment of the present application, an image segmentation method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 12 is a flowchart of a fifth image segmentation method according to an embodiment of the present application. As shown in fig. 12, the method may include the steps of:
Step S1202, a target image is obtained by calling a first function, where the first function includes a first parameter whose parameter value is the target image.
The first function in the above step may be a function for data transmission between the client and the server, specifically used by the client to upload the target image; the parameter value of the first parameter may be the target image itself or a storage address of the target image.
Step S1204, image segmentation is performed on the target image to obtain an initial segmentation result of the target image.
Step S1206, a height result of the target image is obtained by prediction, where the height result represents the shortest distance between a pixel point in the target image and a target boundary contained in the target image.
Step S1208, the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the target image.
Step S1210, the target segmentation result is output by calling a second function, where the second function includes a second parameter whose parameter value is the target segmentation result.
The second function in the above step may be a function for data transmission between the client and the server, specifically used by the server to feed the target segmentation result back to the client; the parameter value of the second parameter may be the target segmentation result itself or a storage address of the target segmentation result.
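An illustrative sketch of such an interface follows (the function bodies below are hypothetical stand-ins, not defined by the patent; a real deployment would wrap them in an HTTP or RPC layer):

```python
import cv2
import numpy as np

def first_function(first_parameter):
    """Obtains the target image; the parameter value may be the image itself
    or a storage address of the image."""
    if isinstance(first_parameter, str):
        return cv2.imread(first_parameter)  # value is a storage address
    return np.asarray(first_parameter)      # value is the image itself

def second_function(second_parameter):
    """Outputs the target segmentation result; returning the value stands in
    for the transfer from the server back to the client."""
    return second_parameter
```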
In the above embodiments of the present application, the height prediction model is used to process the target image, and a height result is obtained through prediction.
In the above embodiment of the present application, the method further includes: obtaining an initial training sample, where the initial training sample includes a training image and the labeling frames of the training targets contained in the training image; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model with the target training sample.
In the above embodiment of the present application, generating the height labeling result of the training image includes: shrinking the labeling frame multiple times to obtain a plurality of shrunken labeling frames; determining the heights of the target pixel points based on the shrink counts corresponding to the plurality of shrunken labeling frames, where the target pixel points are located at the positions corresponding to the plurality of shrunken labeling frames; and obtaining the height labeling result based on the heights of the target pixel points.
In the above embodiment of the present application, after each shrink of the labeling frame, the method further includes: determining the number of pixel points contained in the currently shrunken labeling frame; if the number is larger than a preset number, continuing to shrink the labeling frame; and if the number is less than or equal to the preset number, stopping shrinking the labeling frame.
In the above embodiment of the present application, obtaining the height labeling result based on the heights of the pixel points in the training image includes: taking the ratio of the height of each pixel point in the training image to a preset height to obtain the height labeling result.
In the above embodiment of the present application, correcting the initial segmentation result based on the height result to obtain the target segmentation result of the target image includes: performing feature transformation on the height result to obtain the semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, correcting the initial segmentation result based on the semantic features to obtain the target segmentation result includes: splicing the semantic features with the initial segmentation result to obtain the spliced features; and processing the spliced features to obtain the target segmentation result.
In the above embodiment of the present application, after the height result of the target image is predicted, the method further includes: displaying the height result of the target image on the interactive interface; receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
In the above embodiment of the present application, after the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the target image, the method further includes: determining the category of the target contained in the target segmentation result; determining a target display mode corresponding to the target segmentation result based on the category of the target; and displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the present application is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
Example 6
According to an embodiment of the present application, there is also provided an image segmentation apparatus for implementing the image segmentation method, as shown in fig. 13, the apparatus 1300 includes: an acquisition module 1302, a segmentation module 1304, a prediction module 1306, and a correction module 1308.
The obtaining module 1302 is configured to obtain a target image; the segmentation module 1304 is configured to perform image segmentation on the target image to obtain an initial segmentation result of the target image; the predicting module 1306 is configured to predict a height result of the target image, where the height result is used to represent a shortest distance between a pixel point in the target image and a target boundary included in the target image; the modification module 1308 is configured to modify the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
It should be noted here that the obtaining module 1302, the dividing module 1304, the predicting module 1306 and the correcting module 1308 correspond to steps S302 to S308 in embodiment 1, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the prediction module is further configured to process the target image by using the height prediction model, and predict to obtain a height result.
In the above embodiment of the present application, the apparatus further includes: a result generation module, a sample generation module and a training module.
The acquisition module is further used for obtaining an initial training sample, wherein the initial training sample includes: training images and marking frames of the training targets contained in the training images; the result generation module is used for generating a height labeling result of the training image; the sample generation module is used for generating a target training sample based on the training image and the height labeling result; and the training module is used for training the height prediction model by using the target training sample.
In the above embodiments of the present application, the result generation module includes: a contraction unit, a determination unit and a processing unit.
The contraction unit is used for contracting the marking frame multiple times to obtain a plurality of contracted marking frames; the determination unit is used for determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and the processing unit is used for obtaining the height labeling result based on the height of the target pixel point.
In the above embodiment of the present application, the result generation module further includes: a number determination unit and a stop unit.
The number determination unit is used for determining the number of pixel points contained in the marking frame obtained by the current contraction; the contraction unit is further used for continuing to contract the marking frame if the number is larger than a preset number; and the stop unit is used for stopping contracting the marking frame if the number is less than or equal to the preset number.
In the above embodiment of the present application, the processing unit is further configured to obtain a ratio of the height of the pixel point in the training image to a preset height, and obtain a height labeling result.
In the above embodiments of the present application, the correction module includes: a transformation unit and a correction unit.
The transformation unit is used for performing feature transformation on the height result to obtain semantic features of the height result; and the correction unit is used for correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, the correction unit is further configured to splice the semantic features and the initial segmentation result to obtain spliced features, and process the spliced features to obtain a target segmentation result.
In the above embodiment of the present application, the apparatus further includes a display module and a receiving module. The display module is used for displaying the height result of the target image on the interactive interface; the receiving module is used for receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and the correction module is further used for correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
In the above embodiment of the present application, the apparatus further includes: a category determination module for determining a category of the target contained in the target segmentation result; the mode determining module is used for determining a target display mode corresponding to the target segmentation result based on the category of the target; and the display module is used for displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 1, but are not limited to the schemes provided in example 1.
Example 7
According to an embodiment of the present application, there is also provided an image segmentation apparatus for implementing the image segmentation method, as shown in fig. 13, the apparatus 1300 includes: an acquisition module 1302, a segmentation module 1304, a prediction module 1306, and a correction module 1308.
The obtaining module 1302 is configured to obtain a remote sensing image; the segmentation module 1304 is used for carrying out image segmentation on the remote sensing image to obtain an initial segmentation result of the remote sensing image; the prediction module 1306 is used for predicting a height result of the remote sensing image, wherein the height result is used for representing the shortest distance between a pixel point in the remote sensing image and a ground object boundary contained in the remote sensing image; the modification module 1308 is configured to modify the initial segmentation result based on the height result to obtain a target segmentation result of the remote sensing image.
It should be noted here that the obtaining module 1302, the dividing module 1304, the predicting module 1306 and the correcting module 1308 correspond to steps S902 to S908 in embodiment 2, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 2. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiment of the present application, the prediction module is further configured to process the remote sensing image by using the height prediction model, and predict to obtain a height result.
In the above embodiment of the present application, the apparatus further includes: a result generation module, a sample generation module and a training module.
The acquisition module is further used for obtaining an initial training sample, wherein the initial training sample includes: training images and marking frames of the training ground features contained in the training images; the result generation module is used for generating a height labeling result of the training image; the sample generation module is used for generating a target training sample based on the training image and the height labeling result; and the training module is used for training the height prediction model by using the target training sample.
In the above embodiments of the present application, the result generation module includes: a contraction unit, a determination unit and a processing unit.
The contraction unit is used for contracting the marking frame multiple times to obtain a plurality of contracted marking frames; the determination unit is used for determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and the processing unit is used for obtaining the height labeling result based on the height of the target pixel point.
In the above embodiment of the present application, the result generation module further includes: a number determination unit and a stop unit.
The number determination unit is used for determining the number of pixel points contained in the marking frame obtained by the current contraction; the contraction unit is further used for continuing to contract the marking frame if the number is larger than a preset number; and the stop unit is used for stopping contracting the marking frame if the number is less than or equal to the preset number.
In the above embodiment of the present application, the processing unit is further configured to obtain a ratio of the height of the pixel point in the training image to a preset height, and obtain a height labeling result.
In the above embodiments of the present application, the correction module includes: a transformation unit and a correction unit.
The transformation unit is used for performing feature transformation on the height result to obtain semantic features of the height result; and the correction unit is used for correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, the correction unit is further configured to splice the semantic features and the initial segmentation result to obtain spliced features, and process the spliced features to obtain a target segmentation result.
In the above embodiment of the present application, the apparatus further includes a display module and a receiving module. The display module is used for displaying the height result of the remote sensing image on the interactive interface; the receiving module is used for receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and the correction module is further used for correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
In the above embodiment of the present application, the apparatus further includes: a category determination module for determining the category of the ground feature contained in the target segmentation result; the mode determination module is used for determining a target display mode corresponding to the target segmentation result based on the category of the ground feature; and the display module is further used for displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 2, but are not limited to the schemes provided in example 2.
Example 8
According to an embodiment of the present application, there is also provided an image segmentation apparatus for implementing the image segmentation method, as shown in fig. 13, the apparatus 1300 includes: an acquisition module 1302, a segmentation module 1304, a prediction module 1306, and a correction module 1308.
The acquiring module 1302 is configured to acquire an aerial image captured by an unmanned aerial vehicle; the segmentation module 1304 is configured to perform image segmentation on the aerial image to obtain an initial segmentation result of the aerial image; the prediction module 1306 is configured to predict a height result of the aerial image, where the height result is used to represent a shortest distance between a pixel point in the aerial image and a ground object boundary included in the aerial image; the correcting module 1308 is configured to correct the initial segmentation result based on the height result to obtain a target segmentation result of the aerial image.
It should be noted here that the obtaining module 1302, the dividing module 1304, the predicting module 1306 and the modifying module 1308 correspond to steps S1002 to S1008 in embodiment 3, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 3. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiments of the present application, the prediction module is further configured to process the aerial image by using the height prediction model, and predict to obtain a height result.
In the above embodiment of the present application, the apparatus further includes: a result generation module, a sample generation module and a training module.
The acquisition module is further used for obtaining an initial training sample, wherein the initial training sample includes: training images and marking frames of the training ground features contained in the training images; the result generation module is used for generating a height labeling result of the training image; the sample generation module is used for generating a target training sample based on the training image and the height labeling result; and the training module is used for training the height prediction model by using the target training sample.
In the above embodiments of the present application, the result generation module includes: a contraction unit, a determination unit and a processing unit.
The contraction unit is used for contracting the marking frame multiple times to obtain a plurality of contracted marking frames; the determination unit is used for determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and the processing unit is used for obtaining the height labeling result based on the height of the target pixel point.
In the above embodiment of the present application, the result generation module further includes: a number determination unit and a stop unit.
The number determination unit is used for determining the number of pixel points contained in the marking frame obtained by the current contraction; the contraction unit is further used for continuing to contract the marking frame if the number is larger than a preset number; and the stop unit is used for stopping contracting the marking frame if the number is less than or equal to the preset number.
In the above embodiment of the present application, the processing unit is further configured to obtain a ratio of the height of the pixel point in the training image to a preset height, and obtain a height labeling result.
In the above embodiments of the present application, the correction module includes: a transformation unit and a correction unit.
The transformation unit is used for performing feature transformation on the height result to obtain semantic features of the height result; and the correction unit is used for correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, the correction unit is further configured to splice the semantic features and the initial segmentation result to obtain spliced features, and process the spliced features to obtain a target segmentation result.
In the above embodiment of the present application, the apparatus further includes a display module and a receiving module. The display module is used for displaying the height result of the aerial image on the interactive interface; the receiving module is used for receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and the correction module is further used for correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
In the above embodiment of the present application, the apparatus further includes: a category determination module for determining the category of the ground feature contained in the target segmentation result; the mode determination module is used for determining a target display mode corresponding to the target segmentation result based on the category of the ground feature; and the display module is further used for displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 3, but are not limited to the schemes provided in example 3.
Example 9
According to an embodiment of the present application, there is also provided an image segmentation apparatus for implementing the image segmentation method, where the apparatus is deployed in a cloud server, and as shown in fig. 14, the apparatus 1400 includes: a receiving module 1402, a segmentation module 1404, a prediction module 1406, a modification module 1408, and a feedback module 1410.
The receiving module 1402 is configured to receive a target image uploaded by a client; the segmentation module 1404 is configured to perform image segmentation on the target image to obtain an initial segmentation result of the target image; the predicting module 1406 is configured to predict a height result of the target image, where the height result is used to represent a shortest distance between a pixel point in the target image and a target boundary included in the target image; the correction module 1408 is configured to correct the initial segmentation result based on the height result to obtain a target segmentation result of the target image; the feedback module 1410 is configured to feed back the target segmentation result to the client.
It should be noted here that the receiving module 1402, the segmentation module 1404, the prediction module 1406, the correction module 1408 and the feedback module 1410 correspond to steps S1102 to S1110 in embodiment 4, and the five modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure in embodiment 4. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as part of the apparatus.
In the above embodiments of the present application, the prediction module is further configured to process the target image by using the height prediction model, and predict to obtain a height result.
In the above embodiment of the present application, the apparatus further includes: a result generation module, a sample generation module and a training module.
The acquisition module is further used for obtaining an initial training sample, wherein the initial training sample includes: training images and marking frames of the training targets contained in the training images; the result generation module is used for generating a height labeling result of the training image; the sample generation module is used for generating a target training sample based on the training image and the height labeling result; and the training module is used for training the height prediction model by using the target training sample.
In the above embodiments of the present application, the result generation module includes: a contraction unit, a determination unit and a processing unit.
The contraction unit is used for contracting the marking frame multiple times to obtain a plurality of contracted marking frames; the determination unit is used for determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and the processing unit is used for obtaining the height labeling result based on the height of the target pixel point.
In the above embodiment of the present application, the result generation module further includes: a number determination unit and a stop unit.
The number determination unit is used for determining the number of pixel points contained in the marking frame obtained by the current contraction; the contraction unit is further used for continuing to contract the marking frame if the number is larger than a preset number; and the stop unit is used for stopping contracting the marking frame if the number is less than or equal to the preset number.
In the above embodiment of the present application, the processing unit is further configured to obtain a ratio of the height of the pixel point in the training image to a preset height, and obtain a height labeling result.
In the above embodiments of the present application, the correction module includes: a transformation unit and a correction unit.
The transformation unit is used for performing feature transformation on the height result to obtain semantic features of the height result; and the correction unit is used for correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, the correction unit is further configured to splice the semantic features and the initial segmentation result to obtain spliced features, and process the spliced features to obtain a target segmentation result.
In the above embodiment of the present application, the apparatus further includes a display module, which is used for displaying the height result of the target image on the interactive interface; the receiving module is further used for receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and the correction module is further used for correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
In the above embodiment of the present application, the apparatus further includes: a category determination module for determining a category of the target contained in the target segmentation result; the mode determining module is used for determining a target display mode corresponding to the target segmentation result based on the category of the target; and the display module is used for displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 4, but are not limited to the schemes provided in example 4.
Example 10
According to an embodiment of the present application, there is also provided an image segmentation apparatus for implementing the image segmentation method, as shown in fig. 15, the apparatus 1500 includes: a first calling module 1502, a segmentation module 1504, a prediction module 1506, a modification module 1508, and a second calling module 1510.
The first calling module 1502 is configured to obtain a target image by calling a first function, where the first function includes: a first parameter, wherein a parameter value of the first parameter is a target image; the segmentation module 1504 is used for performing image segmentation on the target image to obtain an initial segmentation result of the target image; the prediction module 1506 is configured to predict a height result of the target image, where the height result is used to represent a shortest distance between a pixel point in the target image and a target boundary included in the target image; the correcting module 1508 is configured to correct the initial segmentation result based on the height result to obtain a target segmentation result of the target image; the second calling module 1510 is configured to output the target segmentation result by calling a second function, where the second function includes: and the parameter value of the second parameter is the target segmentation result.
It should be noted here that the first calling module 1502, the segmentation module 1504, the prediction module 1506, the correction module 1508 and the second calling module 1510 correspond to steps S1202 to S1210 in embodiment 5, and the five modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure in embodiment 5. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as part of the apparatus.
In the above embodiments of the present application, the prediction module is further configured to process the target image by using the height prediction model, and predict to obtain a height result.
In the above embodiment of the present application, the apparatus further includes: a result generation module, a sample generation module and a training module.
The acquisition module is further used for obtaining an initial training sample, wherein the initial training sample includes: training images and marking frames of the training targets contained in the training images; the result generation module is used for generating a height labeling result of the training image; the sample generation module is used for generating a target training sample based on the training image and the height labeling result; and the training module is used for training the height prediction model by using the target training sample.
In the above embodiments of the present application, the result generation module includes: a contraction unit, a determination unit and a processing unit.
The contraction unit is used for contracting the marking frame multiple times to obtain a plurality of contracted marking frames; the determination unit is used for determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and the processing unit is used for obtaining the height labeling result based on the height of the target pixel point.
In the above embodiment of the present application, the result generation module further includes: a number determination unit and a stop unit.
The number determination unit is used for determining the number of pixel points contained in the marking frame obtained by the current contraction; the contraction unit is further used for continuing to contract the marking frame if the number is larger than a preset number; and the stop unit is used for stopping contracting the marking frame if the number is less than or equal to the preset number.
In the above embodiment of the present application, the processing unit is further configured to obtain a ratio of the height of the pixel point in the training image to a preset height, and obtain a height labeling result.
In the above embodiments of the present application, the correction module includes: a transformation unit and a correction unit.
The transformation unit is used for performing feature transformation on the height result to obtain semantic features of the height result; and the correction unit is used for correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, the correction unit is further configured to splice the semantic features and the initial segmentation result to obtain spliced features, and process the spliced features to obtain a target segmentation result.
In the above embodiment of the present application, the apparatus further includes a display module and a receiving module. The display module is used for displaying the height result of the target image on the interactive interface; the receiving module is used for receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and the correction module is further used for correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
In the above embodiment of the present application, the apparatus further includes: a category determination module for determining a category of the target contained in the target segmentation result; the mode determining module is used for determining a target display mode corresponding to the target segmentation result based on the category of the target; and the display module is used for displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred embodiments described in the above examples of the present application are the same as the schemes, application scenarios, and implementation procedures provided in example 5, but are not limited to the schemes provided in example 5.
Example 11
There is also provided, in accordance with an embodiment of the present application, an image segmentation method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 16 is a flowchart of a sixth image segmentation method according to an embodiment of the present application. As shown in fig. 16, the method may include the steps of:
in step S1602, a building image is acquired.
In an alternative embodiment, the building image may be captured by a satellite and transmitted to the server via a network for processing by the server; meanwhile, the building image may be presented to the user, for example displayed in the image capture area as shown in fig. 4. In another alternative embodiment, the building image may be captured by the satellite and actively uploaded to the server by the user for processing by the server; as shown in fig. 5, the user may upload the building image to the server by clicking the "upload image" button in the interactive interface or by dragging the building image directly into the dashed frame, and the uploaded image may be displayed in the image capture area. The server may be deployed locally or in the cloud.
In step S1604, the building image is subjected to image segmentation to obtain an initial segmentation result of the building included in the building image.
In an alternative embodiment, an existing image segmentation method may be used to segment the building image; the specific implementation is not described here again. The obtained segmentation result, which contains an initial area of the building, serves as the initial segmentation result; such a result tends to be incomplete and has low robustness.
And step S1606, predicting a height result of the building image, wherein the height result is used for representing the shortest distance between the pixel point in the building image and the boundary of the building.
Because the same building may have different sizes in different images, the shortest distance between a pixel point and the boundary of the building may differ across images. To make the method provided by the present application applicable to different building images, in the embodiment of the present application different distances may be represented by different tag values, where a smaller tag value indicates a smaller shortest distance between the pixel point and the boundary of the building; the height result in the above step is therefore a height tag of the pixel points in the building image.
In order to fully exploit the semantic information of the building boundary, a contour regression task may be added to predict a plurality of contour lines in the building boundary region, that is, a plurality of closed curves running from outside to inside within the building boundary region, where pixel points on the same closed curve have the same shortest distance to the building boundary and pixel points on different closed curves have different shortest distances. Therefore, in the embodiment of the present application, different label values may be set for different contour lines, and the label value of a contour line is used as the height label of all pixel points on that contour line; optionally, the outside-to-inside ordering value of a contour line may be used as its label. For other pixel points not located on any contour line, the label may be set to a fixed value. It should be noted that, because these pixel points do not carry boundary semantic information, their height labels may be set to 0 by default, but this is not limiting; they may also be set according to the shortest distance from the pixel point to the boundary of the building.
In an alternative embodiment, the building boundary in the building image may be determined manually, and a plurality of closed curves may then be obtained by continuously contracting the building boundary inward; specifically, the erosion operation of OpenCV may be invoked to accomplish this. The erosion step size may be chosen according to the required prediction efficiency and accuracy: the smaller the step size, the higher the prediction accuracy but the lower the efficiency; optionally, the erosion step size is a fixed value of 1, but this is not limiting. In another alternative embodiment, to improve both the accuracy and the efficiency of the height prediction, a height prediction model performing the contour regression task may be trained in advance, and this height prediction model may then be used to process the building image to predict the height result. In the embodiment of the present application, the processing method using the height prediction model is taken as an example for description.
In step S1608, the initial segmentation result is corrected based on the height result to obtain a target segmentation result of the building.
In the embodiment of the present application, by predicting the height result of the building image, the semantic information of the building boundary region is continuously propagated from outside to inside, which alleviates the problem that semantic information becomes increasingly fuzzy from the boundary region toward the central region. The initial segmentation result can therefore be corrected by combining the boundary semantic information, so that the finally obtained target segmentation result is more complete and cases in which only part of a target is extracted are greatly reduced.
In an alternative embodiment, after segmenting the building image to obtain the target segmentation result, the server may directly present the target segmentation result to the user, for example in the result feedback area shown in fig. 4. In another alternative embodiment, after obtaining the target segmentation result, the server may feed it back to the user's client through the network, and the client displays it to the user, for example in the result feedback area shown in fig. 5. Further, after the target segmentation result is displayed, user feedback can be used to verify whether the result is correct: if the user considers the target segmentation result incorrect, the correct segmentation result can be fed back, as shown in fig. 4 and fig. 5, by entering it in the result feedback area and uploading it to the server, so that the server can retrain the model according to the user feedback and thereby improve its performance.
In the above embodiment of the present application, the method further includes: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training buildings contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
The initial training sample in the above step may be a sample used for training an image segmentation model in the related art; the marking frame in the sample represents the image segmentation result of the training image and may be obtained by manually labeling the boundary of each training building in the training image. To reduce manual labeling cost, labeled images may be obtained directly from the Internet or from a public image set and used as initial training samples.
The initial training sample only contains the marking frame representing the image segmentation result; it does not contain a height labeling result relating each pixel point to the boundary of the training building. To ensure that the trained height prediction model can predict the height result of a building image, in the embodiment of the present application the height labeling result of the training image is generated first, and the existing marking frame and the newly generated height labeling result are then jointly used as the labeling information of the training image, yielding the target training sample with which the training of the height prediction model is finally completed. The training process of the height prediction model is similar to that of an image segmentation model in the related art and is not repeated here.
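As a minimal training sketch only, assuming a PyTorch implementation: here `model` stands for any dense-prediction network that maps an image to a one-channel height map, `loader` yields (image, height label) pairs built from the target training samples described above, and the L1 regression loss is an illustrative choice rather than the patent's prescribed objective:

```python
import torch
import torch.nn as nn


def train_height_model(model, loader, epochs=10, lr=1e-4, device="cpu"):
    # Trains the height prediction model on target training samples
    # (training image + generated height labeling result).
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # regression loss on normalized height labels (assumed)
    for _ in range(epochs):
        for image, height_label in loader:
            image, height_label = image.to(device), height_label.to(device)
            optimizer.zero_grad()
            predicted_height = model(image)
            loss = loss_fn(predicted_height, height_label)
            loss.backward()
            optimizer.step()
    return model
```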
The existing marking frame is usually a polygonal closed frame. During the training of the height prediction model, a plurality of contour lines can be obtained by continuously contracting the marking frame inward; specifically, the erosion operation of OpenCV may be invoked, optionally with an erosion size of 1. In an optional embodiment, the height labels of all pixel points in the training image may be determined according to the plurality of closed curves and used as the height labeling result of the training image: for a target pixel point located on a closed curve, the contraction count of the marking frame may be directly used as its height label, while for the other pixel points a preset fixed value may be used. In another optional embodiment, since the main task of the height prediction model is to predict the contour map of the building image, only the height labels of the target pixel points are needed as labeling information during training; therefore, for a target pixel point located on a closed curve, the contraction count of the marking frame may be directly used as its height label, and the height labels of the target pixel points alone may be used as the height labeling result of the training image.
In the above embodiment of the present application, generating the height labeling result of the training image includes: contracting the marking frame multiple times to obtain a plurality of contracted marking frames; determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and obtaining the height labeling result based on the height of the target pixel point.
In an alternative embodiment, a plurality of contour lines may be generated by continuously contracting the marking frame inward multiple times, and the height label of the target pixel point located on each contour line may then be determined. Because the height prediction model is used to predict the height result of an actually acquired building image, only the height labels of the target pixel points are needed during training; optionally, the height labels of the target pixel points may be directly used as the height labeling result.
In the above embodiment of the present application, after each contraction of the marking frame, the method further includes: determining the number of pixel points contained in the marking frame obtained by the current contraction; if the number is larger than a preset number, continuing to contract the marking frame; and if the number is less than or equal to the preset number, stopping contracting the marking frame.
The preset number in the above step may be set according to the actual requirements on prediction accuracy and efficiency: the smaller the preset number, the more times the marking frame is contracted, the richer the boundary semantic information, and the higher the prediction accuracy, but the lower the efficiency. Optionally, the preset number may be 0, that is, contraction stops once the contracted marking frame no longer contains any pixel point.
For example, take the contour regression shown in fig. 6 and fig. 7. First, the polygonal marking frame of a building in the training image is contracted inward by a distance of 1, giving the outermost contour line; all pixel points on this contour line receive height label 1. Contracting inward again by 1 gives the second contour line, whose pixel points receive height label 2; contracting inward again by 1 gives the third contour line, whose pixel points receive height label 3; contracting inward once more by 1 gives the innermost contour line, whose pixel points receive height label 4. At this point the innermost contour line contains no further pixel points, and the contraction ends. After all buildings in the training image are processed in this way, the height labeling result of the whole image is obtained.
In the above embodiment of the present application, obtaining the height labeling result based on the height of the pixel point in the training image includes: obtaining the ratio of the height of the pixel point in the training image to a preset height to obtain the height labeling result.
The preset height in the above step may be the height of the training image, for example 1024.
In an alternative embodiment, after all buildings in the training image have been contracted, a contour map of the same size as the training image is obtained; the contour map is then divided by the preset height for normalization, giving the height labeling result.
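The following is a minimal sketch of this labeling procedure using OpenCV's erosion, following the conventions above: each erosion step peels off one contour line whose pixel points receive the current contraction count as their height label, contraction stops once the remaining pixel count falls to the preset number, and the labels are finally normalized by the preset height. The function and parameter names are illustrative assumptions, not the patent's API:

```python
import cv2
import numpy as np


def generate_height_labels(mask, preset_number=0, preset_height=1024):
    # mask: uint8 binary mask, 1 inside the building's marking frame, 0 elsewhere
    labels = np.zeros(mask.shape, dtype=np.float32)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    current = (mask > 0).astype(np.uint8)
    shrink_count = 0
    while int(current.sum()) > preset_number:
        shrink_count += 1
        eroded = cv2.erode(current, kernel, iterations=1)  # contract inward by 1
        contour_line = (current == 1) & (eroded == 0)      # pixels peeled off this step
        labels[contour_line] = shrink_count                # height label = contraction count
        if eroded.sum() == current.sum():                  # degenerate mask: no progress
            break
        current = eroded
    return labels / float(preset_height)                   # normalize by the preset height
```

With a 3x3 rectangular kernel, one erosion contracts the mask inward by exactly one pixel, matching the fixed erosion size of 1 described above.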
In the above embodiment of the present application, correcting the initial segmentation result based on the height result to obtain the target segmentation result of the building image includes: performing feature transformation on the height result to obtain semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In an alternative embodiment, a 3×3 convolution may be used to perform feature transformation on the height result of the building image, completing semantic extraction from the contour map and obtaining semantic features; the initial segmentation result is then corrected based on these semantic features to accomplish the semantic segmentation and obtain the target segmentation result.
In the above embodiment of the present application, correcting the initial segmentation result based on the semantic features to obtain the target segmentation result includes: splicing the semantic features and the initial segmentation result to obtain spliced features; and processing the spliced features to obtain the target segmentation result.
In an optional embodiment, in order to correct the segmentation result, the semantic features extracted from the contour map may be spliced with the initial segmentation result, that is, concatenated along the channel dimension; the resulting spliced features are enhanced semantic features, and the target segmentation result is obtained by applying a convolution to the spliced features.
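A minimal sketch of this correction step, assuming a PyTorch implementation; the module name, channel sizes, and class count are assumptions rather than the patent's exact network:

```python
import torch
import torch.nn as nn


class HeightGuidedCorrection(nn.Module):
    def __init__(self, seg_channels=2, hidden_channels=64, num_classes=2):
        super().__init__()
        # 3x3 convolution: feature transformation of the height result (contour map)
        self.height_transform = nn.Conv2d(1, hidden_channels, kernel_size=3, padding=1)
        # convolution over the spliced features yields the target segmentation result
        self.fuse = nn.Conv2d(seg_channels + hidden_channels, num_classes,
                              kernel_size=3, padding=1)

    def forward(self, height_result, initial_seg):
        # semantic features extracted from the predicted height result
        semantic = torch.relu(self.height_transform(height_result))
        # splice: concatenate semantic features and the initial segmentation result
        spliced = torch.cat([semantic, initial_seg], dim=1)
        return self.fuse(spliced)
```

For example, with the default two segmentation classes, HeightGuidedCorrection()(torch.rand(1, 1, 256, 256), torch.rand(1, 2, 256, 256)) yields a corrected segmentation map of shape (1, 2, 256, 256).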
In the above embodiment of the present application, after the height result of the building image is predicted, the method further includes: displaying the height result of the building image on the interactive interface; receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
The interactive interface in the above steps may be an interface as shown in fig. 4 and fig. 5, through which the user may view the target segmentation result and the height result, and may adjust the above results on the interface and feed back the results to the server.
Since the accuracy of the height result may affect the accuracy of the final target segmentation result, in order to ensure the accuracy of the height result, in an optional embodiment the server may directly display the height result to the user, that is, display it on the interactive interface; in another optional embodiment, the server may deliver the height result to the client through the network, and the client displays it to the user on the interactive interface. Further, the user can confirm the height result: if the user confirms that it is correct, the correction of the initial segmentation result can proceed directly on the basis of the height result; if the user finds it wrong, the user can correct the height result on the interactive interface to obtain a corresponding correction result and feed it back to the server, so that the correction of the initial segmentation result can be performed based on the correction result, and the height prediction model can additionally be optimized according to the correction result, improving the performance of the server.
In the above embodiment of the present application, after the initial segmentation result is corrected based on the height result to obtain the target segmentation result of the building, the method further includes: determining the category of the building contained in the target segmentation result; determining a target display mode corresponding to the target segmentation result based on the category of the building; and displaying the target segmentation result on the interactive interface according to the target display mode.
The category of the building in the above steps may be a specific type of building, such as a commercial building, a high-rise residence, a library, a sports center, and the like.
The target display mode in the above step may refer to the color of the area where the building is located, the thickness of the boundary line, the line style of the boundary line, and the like, but is not limited thereto; the present application takes the area color as an example.
In an optional embodiment, in order to let the user view the area where each building is located more clearly and intuitively, the specific categories of the different buildings contained in the target segmentation result may be determined using an existing target recognition scheme, the corresponding target display mode may then be determined, and the target segmentation result may finally be displayed in the interactive interface for the user to view. Optionally, to help the user determine the categories of different buildings more intuitively, the category names of the buildings may also be displayed in the interactive interface.
For example, taking building segmentation in a city planning scenario: after the target segmentation result is obtained, that is, after the area occupied by each building in the building image is obtained, the category of each building can be obtained through recognition and the display color determined accordingly, so that buildings of different categories are marked in different colors, for example residential buildings in green, business buildings in yellow, and sports centers in blue. Further, the category name of each building may be displayed within the area in which the building is located.
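For illustration only, a sketch of such category-based coloring; the palette, the category names, and the helper function are assumptions, not values prescribed by the present application:

```python
import numpy as np

# Hypothetical BGR palette; the scheme only requires that different
# building categories are displayed in different colors.
CATEGORY_COLORS = {
    "residential": (0, 255, 0),    # green
    "business": (0, 255, 255),     # yellow
    "sports_center": (255, 0, 0),  # blue
}


def render_target_segmentation(image, regions):
    # image: HxWx3 BGR array; regions: iterable of (mask, category) pairs
    # taken from the target segmentation result.
    canvas = image.copy()
    for mask, category in regions:
        color = CATEGORY_COLORS.get(category, (128, 128, 128))  # gray fallback
        canvas[mask.astype(bool)] = color
    return canvas
```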
Example 12
According to an embodiment of the present application, there is also provided an image segmentation apparatus for implementing the image segmentation method, as shown in fig. 13, the apparatus 1300 includes: an acquisition module 1302, a segmentation module 1304, a prediction module 1306, and a correction module 1308.
The obtaining module 1302 is configured to obtain a building image; the segmentation module 1304 is configured to perform image segmentation on the building image to obtain an initial segmentation result of a building included in the building image; the prediction module 1306 is configured to predict a height result of the building image, where the height result is used to represent a shortest distance between a pixel point in the building image and a boundary of the building; the correcting module 1308 is configured to correct the initial segmentation result based on the height result to obtain a target segmentation result of the building.
It should be noted here that the obtaining module 1302, the dividing module 1304, the predicting module 1306 and the modifying module 1308 correspond to steps S1602 to S1608 in the embodiment 11, and the four modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in the embodiment 11. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
In the above embodiment of the present application, the prediction module is further configured to process the building image by using the height prediction model, and predict to obtain the height result.
In the above embodiment of the present application, the apparatus further includes: a result generation module, a sample generation module and a training module.
The acquisition module is further used for obtaining an initial training sample, wherein the initial training sample includes: training images and marking frames of the training buildings contained in the training images; the result generation module is used for generating a height labeling result of the training image; the sample generation module is used for generating a target training sample based on the training image and the height labeling result; and the training module is used for training the height prediction model by using the target training sample.
In the above embodiments of the present application, the result generation module includes: a contraction unit, a determination unit and a processing unit.
The contraction unit is used for contracting the marking frame multiple times to obtain a plurality of contracted marking frames; the determination unit is used for determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is located at the positions corresponding to the plurality of contracted marking frames; and the processing unit is used for obtaining the height labeling result based on the height of the target pixel point.
In the above embodiment of the present application, the result generation module further includes: a number determination unit and a stop unit.
The number determination unit is used for determining the number of pixel points contained in the marking frame obtained by the current contraction; the contraction unit is further used for continuing to contract the marking frame if the number is larger than a preset number; and the stop unit is used for stopping contracting the marking frame if the number is less than or equal to the preset number.
In the above embodiment of the present application, the processing unit is further configured to obtain a ratio of the height of the pixel point in the training image to a preset height, and obtain a height labeling result.
In the above embodiments of the present application, the correction module includes: a transformation unit and a correction unit.
The transformation unit is used for performing feature transformation on the height result to obtain semantic features of the height result; and the correction unit is used for correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
In the above embodiment of the present application, the correction unit is further configured to splice the semantic features and the initial segmentation result to obtain spliced features, and process the spliced features to obtain a target segmentation result.
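One hedged reading of this transform-splice-process step is the small convolutional module below; the class name and layer sizes are assumptions, not taken from the embodiments:

import torch
import torch.nn as nn

class HeightGuidedCorrection(nn.Module):
    """Fuse the predicted height map with the initial segmentation result."""
    def __init__(self, seg_channels: int = 1, mid_channels: int = 16):
        super().__init__()
        # Feature transformation of the height result into semantic features.
        self.height_transform = nn.Sequential(
            nn.Conv2d(1, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Process the spliced features into the target segmentation result.
        self.fuse = nn.Conv2d(mid_channels + seg_channels, seg_channels, kernel_size=1)

    def forward(self, height: torch.Tensor, seg: torch.Tensor) -> torch.Tensor:
        semantic = self.height_transform(height)       # semantic features of the height result
        spliced = torch.cat([semantic, seg], dim=1)    # splice along the channel axis
        return self.fuse(spliced)                      # corrected target segmentation

Called as corrected = HeightGuidedCorrection()(height_map, initial_logits) on (N, 1, H, W) tensors, this mirrors the splice-then-process description of the correction unit.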
In the above embodiment of the present application, the apparatus further includes: the display module is used for displaying the height result of the building image on the interactive interface; the receiving module is used for receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; the correction module is also used for correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
In the above embodiment of the present application, the apparatus further includes: a category determination module for determining a category of a building included in the target segmentation result; the mode determining module is used for determining a target display mode corresponding to the target segmentation result based on the category of the building; and the display module is used for displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the preferred implementations described in the above embodiment are consistent with the schemes, application scenarios, and implementation procedures provided in embodiment 11, but are not limited to the schemes provided in embodiment 11.
Example 13
The embodiment of the application can provide a computer terminal, and the computer terminal can be any one computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the image segmentation method: acquiring a target image; carrying out image segmentation on the target image to obtain an initial segmentation result of the target image; predicting to obtain a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
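In other words, the terminal's program code can be read as the following minimal pipeline sketch, where seg_model, height_model and correct stand in for the trained segmentation model, the height prediction model and the correction step (all names are illustrative, not taken from the embodiments):

import numpy as np

def segment_with_height(image: np.ndarray, seg_model, height_model, correct) -> np.ndarray:
    """Sketch of the disclosed flow: segment, predict heights, correct."""
    initial = seg_model(image)        # initial segmentation result of the target image
    height = height_model(image)      # per-pixel shortest distance to the target boundary
    return correct(height, initial)   # target segmentation result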
Optionally, fig. 17 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 17, the computer terminal A may include: one or more processors 1702 (only one is shown in the figure), and a memory 1704.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the image segmentation method and apparatus in the embodiments of the present application; the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the image segmentation method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the computer terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a target image; carrying out image segmentation on the target image to obtain an initial segmentation result of the target image; predicting to obtain a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
Optionally, the processor may further execute the program code of the following steps: and processing the target image by using a height prediction model, and predicting to obtain a height result.
Optionally, the processor may further execute the program code of the following steps: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training targets contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
Optionally, the processor may further execute the program code of the following steps: contracting the marking frame for multiple times to obtain a plurality of contracted marking frames; determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is positioned at the position corresponding to the plurality of contracted marking frames; and obtaining a height labeling result based on the height of the target pixel point.
Optionally, the processor may further execute the program code of the following steps: after the marking frame is shrunk for one time, determining the number of pixel points contained in the marking frame obtained by current shrinkage; if the number is larger than the preset number, continuing to shrink the marking frame; and if the number is less than or equal to the preset number, stopping shrinking the marking frame.
Optionally, the processor may further execute the program code of the following steps: and obtaining the ratio of the height of the pixel point in the training image to the preset height to obtain a height labeling result.
Optionally, the processor may further execute the program code of the following steps: performing feature transformation on the height result to obtain semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain a target segmentation result.
Optionally, the processor may further execute the program code of the following steps: splicing the semantic features and the initial segmentation result to obtain spliced features; and processing the spliced features to obtain a target segmentation result.
Optionally, the processor may further execute the program code of the following steps: displaying the height result of the target image on the interactive interface; receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
Optionally, the processor may further execute the program code of the following steps: determining the category of the target contained in the target segmentation result; determining a target display mode corresponding to the target segmentation result based on the category of the target; and displaying the target segmentation result on the interactive interface according to the target display mode.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a remote sensing image; carrying out image segmentation on the remote sensing image to obtain an initial segmentation result of the remote sensing image; predicting to obtain a height result of the remote sensing image, wherein the height result is used for representing the shortest distance between a pixel point in the remote sensing image and a ground object boundary contained in the remote sensing image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the remote sensing image.
Optionally, the processor may further execute the program code of the following steps: and processing the remote sensing image by using the height prediction model, and predicting to obtain a height result.
Optionally, the processor may further execute the program code of the following steps: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training ground features contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring an aerial image shot by an unmanned aerial vehicle; carrying out image segmentation on the aerial image to obtain an initial segmentation result of the aerial image; predicting to obtain a height result of the aerial image, wherein the height result is used for representing the shortest distance between a pixel point in the aerial image and a ground object boundary contained in the aerial image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the aerial image.
Optionally, the processor may further execute the program code of the following steps: and processing the aerial image by using the height prediction model, and predicting to obtain a height result.
Optionally, the processor may further execute the program code of the following steps: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training ground features contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: the cloud server receives a target image uploaded by a client; the cloud server performs image segmentation on the target image to obtain an initial segmentation result of the target image; the cloud server predicts and obtains a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; the cloud server corrects the initial segmentation result based on the height result to obtain a target segmentation result of the target image; and the cloud server feeds the target segmentation result back to the client.
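The embodiments leave the client/server transport unspecified; purely as an illustration, the cloud-server flow could be exposed over HTTP as in the following Flask sketch, in which the /segment endpoint, the multipart field name "image", and the run_pipeline placeholder are all assumptions:

from flask import Flask, request, jsonify
import numpy as np
import cv2

app = Flask(__name__)

def run_pipeline(image: np.ndarray) -> np.ndarray:
    # Placeholder for: initial segmentation -> height prediction -> correction.
    return np.zeros(image.shape[:2], dtype=np.uint8)

@app.route("/segment", methods=["POST"])
def segment():
    # The client uploads the target image as a multipart file field named "image".
    raw = np.frombuffer(request.files["image"].read(), np.uint8)
    image = cv2.imdecode(raw, cv2.IMREAD_COLOR)        # target image from the client
    target = run_pipeline(image)                       # target segmentation result
    return jsonify({"segmentation": target.tolist()})  # fed back to the client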
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a target image by calling a first function, wherein the first function comprises: a first parameter, wherein a parameter value of the first parameter is a target image; carrying out image segmentation on the target image to obtain an initial segmentation result of the target image; predicting to obtain a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image; outputting a target segmentation result by calling a second function, wherein the second function comprises: and the parameter value of the second parameter is the target segmentation result.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a building image; carrying out image segmentation on the building image to obtain an initial segmentation result of the building contained in the building image; predicting to obtain a height result of the building image, wherein the height result is used for representing the shortest distance between a pixel point in the building image and the boundary of the building; and correcting the initial segmentation result of the building based on the height result to obtain a target segmentation result of the building.
Optionally, the processor may further execute the program code of the following steps: determining a category of the building; determining a target display mode corresponding to the building based on the category of the building; and displaying the target segmentation result on the interactive interface according to the target display mode.
By adopting the embodiments of the present application, an image segmentation scheme is provided. The initial segmentation result is corrected by using the predicted height result, where the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; that is, the height result embodies the semantic information of the target boundary. The semantic information of the target boundary is therefore fully considered in the process of correcting the initial segmentation result with the height result, so that the target segmentation result is more complete and the situation in which only part of a target is extracted is reduced. This achieves the technical effect of improving image segmentation accuracy, and solves the technical problems that, when the image segmentation method in the related art is applied to a remote sensing image or an aerial image, the segmentation result is incomplete and the robustness is low.
It can be understood by those skilled in the art that the structure shown in fig. 17 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD; fig. 17 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 17, or have a configuration different from that shown in fig. 17.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 14
Embodiments of the present application also provide a storage medium. Alternatively, in this embodiment, the storage medium may be configured to store program codes executed by the image segmentation method provided in the above embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a target image; carrying out image segmentation on the target image to obtain an initial segmentation result of the target image; predicting to obtain a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image.
Optionally, the storage medium is further configured to store program codes for performing the following steps: and processing the target image by using a height prediction model, and predicting to obtain a height result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training targets contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
Optionally, the storage medium is further configured to store program codes for performing the following steps: contracting the marking frame for multiple times to obtain a plurality of contracted marking frames; determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is positioned at the position corresponding to the plurality of contracted marking frames; and obtaining a height labeling result based on the height of the target pixel point.
Optionally, the storage medium is further configured to store program codes for performing the following steps: after the marking frame is shrunk for one time, determining the number of pixel points contained in the marking frame obtained by current shrinkage; if the number is larger than the preset number, continuing to shrink the marking frame; and if the number is less than or equal to the preset number, stopping shrinking the marking frame.
Optionally, the storage medium is further configured to store program codes for performing the following steps: and obtaining the ratio of the height of the pixel point in the training image to the preset height to obtain a height labeling result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: performing feature transformation on the height result to obtain semantic features of the height result; and correcting the initial segmentation result based on the semantic features to obtain a target segmentation result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: splicing the semantic features and the initial segmentation result to obtain spliced features; and processing the spliced features to obtain a target segmentation result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: displaying the height result of the target image on the interactive interface; receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result; and correcting the initial segmentation result based on the correction result to obtain a target segmentation result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: determining the category of the target contained in the target segmentation result; determining a target display mode corresponding to the target segmentation result based on the category of the target; and displaying the target segmentation result on the interactive interface according to the target display mode.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a remote sensing image; carrying out image segmentation on the remote sensing image to obtain an initial segmentation result of the remote sensing image; predicting to obtain a height result of the remote sensing image, wherein the height result is used for representing the shortest distance between a pixel point in the remote sensing image and a ground object boundary contained in the remote sensing image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the remote sensing image.
Optionally, the storage medium is further configured to store program codes for performing the following steps: and processing the remote sensing image by using the height prediction model, and predicting to obtain a height result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training ground features contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring an aerial image shot by an unmanned aerial vehicle; carrying out image segmentation on the aerial image to obtain an initial segmentation result of the aerial image; predicting to obtain a height result of the aerial image, wherein the height result is used for representing the shortest distance between a pixel point in the aerial image and a ground object boundary contained in the aerial image; and correcting the initial segmentation result based on the height result to obtain a target segmentation result of the aerial image.
Optionally, the storage medium is further configured to store program codes for performing the following steps: and processing the aerial image by using the height prediction model, and predicting to obtain a height result.
Optionally, the storage medium is further configured to store program codes for performing the following steps: obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training ground features contained in the training images; generating a height labeling result of the training image; generating a target training sample based on the training image and the height labeling result; and training the height prediction model by using the target training sample.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the cloud server receives a target image uploaded by a client; the cloud server performs image segmentation on the target image to obtain an initial segmentation result of the target image; the cloud server predicts and obtains a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; the cloud server corrects the initial segmentation result based on the height result to obtain a target segmentation result of the target image; and the cloud server feeds the target segmentation result back to the client.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a target image by calling a first function, wherein the first function comprises: a first parameter, wherein a parameter value of the first parameter is a target image; carrying out image segmentation on the target image to obtain an initial segmentation result of the target image; predicting to obtain a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image; correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image; outputting a target segmentation result by calling a second function, wherein the second function comprises: and the parameter value of the second parameter is the target segmentation result.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a building image; carrying out image segmentation on the building image to obtain an initial segmentation result of the building contained in the building image; predicting to obtain a height result of the building image, wherein the height result is used for representing the shortest distance between a pixel point in the building image and the boundary of the building; and correcting the initial segmentation result of the building based on the height result to obtain a target segmentation result of the building.
Optionally, the storage medium is further configured to store program codes for performing the following steps: determining a category of the building; determining a target display mode corresponding to the building based on the category of the building; and displaying the target segmentation result on the interactive interface according to the target display mode.
It should be noted that the collection, storage, use, processing, transmission, provision, and disclosure of the images involved in the above embodiments of the present application all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be wholly or partly embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (11)

1. An image segmentation method, comprising:
acquiring a target image;
performing image segmentation on the target image to obtain an initial segmentation result of a target contained in the target image;
predicting to obtain a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image;
correcting the initial segmentation result based on the height result to obtain a target segmentation result of the target image;
wherein the height result is the height label of the pixel point, and the predicting to obtain the height result of the target image comprises: predicting a plurality of closed curves from outside to inside in the target boundary region; determining a height label of a pixel point on each closed curve based on the label value of each closed curve; and determining the height labels of the pixel points which are not positioned on the plurality of closed curves as preset values.
2. The method of claim 1, further comprising:
obtaining an initial training sample, wherein the initial training sample comprises: training images and marking frames of training targets contained in the training images;
generating a height labeling result of the training image;
generating a target training sample based on the training image and the height labeling result;
and training a height prediction model by using the target training sample, wherein the height prediction model is used for predicting the height result of the target image.
3. The method of claim 2, wherein generating the height labeling result for the training image comprises:
contracting the marking frame for multiple times to obtain a plurality of contracted marking frames;
determining the height of a target pixel point based on the contraction times corresponding to the plurality of contracted marking frames, wherein the target pixel point is positioned at the position corresponding to the plurality of contracted marking frames;
and obtaining the height labeling result based on the height of the target pixel point.
4. The method of claim 1, wherein modifying the initial segmentation result based on the height result to obtain the target segmentation result for the target image comprises:
performing feature transformation on the height result to obtain semantic features of the height result;
and correcting the initial segmentation result based on the semantic features to obtain the target segmentation result.
5. The method of any one of claims 1 to 4, wherein after predicting the height result of the target image, the method further comprises:
displaying the height result of the target image on an interactive interface;
receiving a correction result corresponding to the height result, wherein the correction result is obtained by correcting the height result;
and correcting the initial segmentation result based on the correction result to obtain the target segmentation result.
6. The method of any one of claims 1 to 4, wherein after the initial segmentation result is modified based on the height result to obtain a target segmentation result for the target image, the method further comprises:
determining the category of the target contained in the target segmentation result;
determining a target display mode corresponding to the target segmentation result based on the category of the target;
and displaying the target segmentation result on an interactive interface according to the target display mode.
7. An image segmentation method, comprising:
acquiring a building image;
carrying out image segmentation on the building image to obtain an initial segmentation result of a building contained in the building image;
predicting a height result of the building image, wherein the height result is used for representing the shortest distance between a pixel point in the building image and the boundary of the building;
correcting the initial segmentation result of the building based on the height result to obtain a target segmentation result of the building;
wherein the height result is the height label of the pixel point, and the predicting of the height result of the building image comprises: predicting a plurality of closed curves from outside to inside in a boundary region of the building; determining a height label of a pixel point on each closed curve based on the label value of each closed curve; and determining the height labels of the pixel points which are not positioned on the plurality of closed curves as preset values.
8. The method of claim 7, wherein after modifying the initial segmentation result of the building based on the height result to obtain a target segmentation result of the building, the method further comprises:
determining a category of the building;
determining a target display mode corresponding to the building based on the category of the building;
and displaying the target segmentation result on an interactive interface according to the target display mode.
9. An image segmentation method, comprising:
the cloud server receives a target image uploaded by a client;
the cloud server performs image segmentation on the target image to obtain an initial segmentation result of a target contained in the target image;
the cloud server predicts a height result of the target image, wherein the height result is used for representing the shortest distance between a pixel point in the target image and a target boundary contained in the target image;
the cloud server corrects the initial segmentation result based on the height result to obtain a target segmentation result of the target image;
the cloud server feeds the target segmentation result back to the client;
the height result is a height label of the pixel point, and the predicting, by the cloud server, the height result of the target image includes: predicting a plurality of closed curves from outside to inside in the target boundary region; determining a height label of a pixel point on each closed curve based on the label value of each closed curve; and determining the height labels of the pixel points which are not positioned on the plurality of closed curves as preset values.
10. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the image segmentation method according to any one of claims 1 to 9.
11. A computer terminal, comprising: a processor and a memory, the processor being configured to execute a program stored in the memory, wherein the program is configured to perform the image segmentation method of any one of claims 1 to 9 when executed.