CN113256650A - Image segmentation method, apparatus, device and medium - Google Patents

Image segmentation method, apparatus, device and medium

Info

Publication number
CN113256650A
CN113256650A (application CN202110520007.6A)
Authority
CN
China
Prior art keywords
image
segmentation
target
text information
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110520007.6A
Other languages
Chinese (zh)
Inventor
朱艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Fanxing Huyu IT Co Ltd
Original Assignee
Guangzhou Fanxing Huyu IT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Fanxing Huyu IT Co Ltd filed Critical Guangzhou Fanxing Huyu IT Co Ltd
Priority to CN202110520007.6A
Publication of CN113256650A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 11/60 Editing figures and text; Combining figures or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an image segmentation method, apparatus, device, and medium, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a target image to be segmented; acquiring text information related to image segmentation in response to an input operation; acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information; and displaying the target image and a dividing line, wherein the dividing line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region outside the segmentation target. By acquiring text information related to image segmentation and segmenting the target image according to the segmentation target indicated by that text information, the embodiment obtains an image segmentation result corresponding to the text information and thereby improves the accuracy of image segmentation.

Description

Image segmentation method, apparatus, device and medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image segmentation method, apparatus, device, and medium.
Background
Image segmentation refers to a process of dividing an image into a plurality of specific regions with unique properties, and is an important preprocessing process in image recognition and computer vision.
In the related art, image segmentation is generally built on theoretical foundations such as cluster analysis and fuzzy set theory. For example, in an image segmentation method based on cluster analysis, pixels in the image space are represented by corresponding points in a feature space, the feature space is partitioned according to how the pixel points aggregate in it, and the partitioned feature space is then mapped back into the original image space to obtain the image segmentation result.
However, theories such as cluster analysis and fuzzy clustering generally require initial values or parameters to be assigned. When these initial values or parameters vary widely, the image segmentation result fluctuates accordingly, so the accuracy of image segmentation is low and the required image segmentation result cannot be obtained.
Disclosure of Invention
The embodiment of the application provides an image segmentation method, device, equipment and medium, which are used for segmenting a target image through text information of a segmentation instruction, so that the accuracy of image segmentation is improved. The technical scheme is as follows:
according to an aspect of the present application, there is provided an image segmentation method, including:
acquiring a target image to be segmented;
acquiring text information related to image segmentation in response to an input operation;
acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information;
and displaying the target image and a dividing line, wherein the dividing line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region outside the segmentation target.
According to an aspect of the present application, there is provided an image segmentation apparatus including:
the acquisition module is used for acquiring a target image to be segmented;
the response module is used for responding to the input operation and acquiring text information related to image segmentation;
the acquisition module is also used for acquiring a segmentation target based on the text information, and the segmentation target is extracted from the text information;
and the display module is used for displaying the target image and a dividing line, the dividing line being used for indicating the boundary between a first image area occupied by the segmentation target on the target image and a second image area other than the segmentation target.
According to an aspect of the present application, there is provided a computer device comprising a processor and a memory, the memory having stored therein at least one program code, the program code being loaded by the processor and performing the image segmentation method as described above.
According to an aspect of the present application, there is provided a computer-readable storage medium having at least one program code stored therein, the program code being loaded and executed by a processor to implement the image segmentation method as described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
by acquiring text information related to image segmentation and segmenting the target image according to the segmentation target indicated by that text information, an image segmentation result corresponding to the text information is obtained; this improves the accuracy of image segmentation and matches the segmentation result to the actual requirement.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be derived from them by those skilled in the art without creative effort.
FIG. 1 is a block diagram of a computer system provided in an exemplary embodiment of the present application;
FIG. 2 is a flow chart of an image segmentation method provided by an exemplary embodiment of the present application;
FIG. 3 is a management interface diagram of an image segmentation method provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of an image segmentation method provided by an exemplary embodiment of the present application;
FIG. 5 is a diagram of interface changes for image segmentation provided by an exemplary embodiment of the present application;
FIG. 6 is a diagram of interface changes for image segmentation provided by an exemplary embodiment of the present application;
FIG. 7 is a diagram of interface changes for image segmentation provided by an exemplary embodiment of the present application;
FIG. 8 is a flowchart of an image segmentation method provided by an exemplary embodiment of the present application;
fig. 9 is a block diagram of an image segmentation apparatus according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
FIG. 1 illustrates a block diagram of a computer system provided in an exemplary embodiment of the present application. The computer system 100 includes: a first terminal 110 and a server 120.
The first terminal 110 includes an application or a web client having an image segmentation function therein. The application may be an image processing program or any of other applications that support image segmentation functionality. The device types of the first terminal 110 include: at least one of a smartphone, a tablet, an e-book reader, an MP3 player, an MP4 player, a laptop, a desktop, a smart television, a smart car.
The first terminal 110 is connected to the server 120 through a wireless network or a wired network.
The server 120 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. Optionally, the server 120 undertakes primary computational work and the terminals undertake secondary computational work; alternatively, the server 120 undertakes the secondary computing work and the terminal undertakes the primary computing work; alternatively, the server 120 and the terminal perform cooperative computing by using a distributed computing architecture.
In some optional embodiments, the first terminal 110 comprises a processor 1101 and a memory 1102.
The processor 1101 includes one or more processing cores, and the processor 1101 executes various functional applications and information processing by running software programs and modules.
The memory 1102 is operable to store at least one instruction for execution by the processor 1101 to perform the various steps of the image segmentation method. The memory 1102 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, including but not limited to: magnetic or optical disks, electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), static random access memory (SRAM), read-only memory (ROM), magnetic memory, flash memory, and programmable read-only memory (PROM).
In conjunction with the above description of the implementation environment, the image segmentation method provided in the embodiment of the present application is described, and an implementation subject of the method is illustrated as the first terminal 110 in fig. 1.
Fig. 2 shows a flowchart of an image segmentation method provided in an exemplary embodiment of the present application. Taking the execution subject of the method as the first terminal 110 in fig. 1 as an example, the method includes the following steps:
step 202: and acquiring a target image to be segmented.
The target image is an image whose image content is to be segmented, and includes, but is not limited to, at least one of the following: photos, paintings, clip art, maps, calligraphy works, handwritten text, faxes, satellite cloud images, movie and television frames, X-ray films, electroencephalograms, electrocardiograms, and the like. For example, the target image is a landscape photograph including a mountain, a river, and several visitors; alternatively, the target image is a video frame including a house, a person, and the person's lines of dialogue.
There are various ways to acquire the target image. For example, the target image is stored on the first terminal 110, and the first terminal 110 reads it from local storage; alternatively, the target image is stored on the server, and the first terminal 110 receives the target image sent by the server; alternatively, the first terminal 110 receives interactive information sent by other terminals and obtains the target image from that information; alternatively, the first terminal 110 obtains the target image through photographing, screenshots, downloading, and the like.
Step 204: in response to an input operation, text information related to image segmentation is acquired.
Specifically, in the management interface for image segmentation, the user enters input information related to image segmentation; this input information describes the image segmentation operation to be performed.
There are many implementations of input operations. Optionally, the input operation includes one of a text input operation and a voice input operation.
In the case that the input operation is a text input operation, step 204 has the following optional implementations: in response to a text input operation, text information related to image segmentation is acquired.
In the case that the input operation is a voice input operation, step 204 has the following optional implementations: acquiring voice information related to image segmentation in response to a voice input operation; text information is determined from the speech information.
As schematically shown in fig. 3, a target image 311 is displayed in the management interface 310 for image segmentation, and an input box 312 is displayed, wherein the input box 312 is used for inputting input information related to image segmentation.
The user inputs text in the input box 312, the text being a character string "clothes in red" for instructing to segment the target image 311, the image segmentation result being clothes in red in the target image 311. From this character string, text information that is "red clothes" can be acquired.
Alternatively, the user inputs voice through the input box 312. After acquiring the voice information, the terminal converts it into the text information "red clothes" based on speech recognition technology; the text information instructs the terminal to segment the target image 311, and the image segmentation result is the red clothes in the target image 311. From the voice input operation, the corresponding voice information is acquired, and the text information related to image segmentation is obtained from that voice information. Speech recognition technology, also called automatic speech recognition, converts the vocabulary content of speech into computer-readable input information.
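As a concrete illustration, the voice branch can be sketched as follows. This is a minimal sketch only: the patent does not name a speech recognition engine, so the open-source SpeechRecognition package and its Google Web Speech backend stand in here as assumptions.

```python
# Minimal sketch of the voice input branch. The SpeechRecognition
# package and its recognize_google backend are assumptions; the
# patent only requires some automatic speech recognition step.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    audio = recognizer.listen(source)          # voice input operation
try:
    # Convert the vocabulary content of the speech into text information.
    text_information = recognizer.recognize_google(audio)
except sr.UnknownValueError:
    text_information = ""                      # speech not intelligible
print(text_information)                        # e.g. "red clothes"
```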
Step 206: a segmentation target is obtained based on the text information.
Illustratively, the segmentation object is extracted from the text information.
The information that can be extracted from the text information may be one piece or several pieces. Optionally, the segmentation target needs to satisfy at least one piece of the extracted information, or all of the extracted information simultaneously. For example, if the text information is "red clothes", the information extracted from it includes "red" and "clothes", and the segmentation target needs to satisfy both the "red" and the "clothes" conditions.
The segmentation target may be obtained through semantic analysis of the text information. Semantic analysis, a logical stage of the compilation process in compiler terms, here refers to mining the deep concepts behind text, pictures, and similar content by various machine learning methods, including text feature extraction. Specifically, the terminal sends the text information related to image segmentation to the server, and the server performs semantic analysis on the received text information and feeds the analysis result back to the terminal.
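For illustration, the extraction of segmentation information from the text can be sketched as below; the fixed keyword vocabulary is an assumption standing in for the trained semantic-analysis model, which the patent leaves to the server.

```python
# Sketch of extracting segmentation information ("red", "clothes")
# from text information such as "red clothes". The keyword lists are
# illustrative assumptions, not the patent's actual model.
COLOR_WORDS = {"red", "green", "white", "black"}
TYPE_WORDS = {"clothes", "trousers", "hat", "airplane"}

def extract_segmentation_info(text_information: str) -> dict:
    tokens = text_information.lower().split()
    return {
        "visual_features": [t for t in tokens if t in COLOR_WORDS],
        "types": [t for t in tokens if t in TYPE_WORDS],
    }

print(extract_segmentation_info("red clothes"))
# {'visual_features': ['red'], 'types': ['clothes']}
```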
Step 208: and displaying the target image and the dividing line.
Illustratively, the dividing line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region outside the segmentation target.
According to the foregoing, after the text information related to image segmentation is acquired, the segmentation target can be obtained, and the target image is segmented based on what the segmentation target indicates. That is, when image content in the target image matches the text information related to image segmentation, that image content is determined as the image segmentation result of the target image.
Specifically, according to the foregoing steps, there may be one or more pieces of text information related to image segmentation. The image content of the target image is compared with the text information related to image segmentation, and where they match, the image segmentation result of the target image is determined.
For example, the image content acquired from the target image includes "red", "green", "white", "black", "clothes", and "trousers". The text information related to image segmentation includes "red" and "clothes". And according to the matching result, determining the image content meeting the requirements of red and clothes simultaneously as an image segmentation result.
Optionally, the boundary of the image segmentation result is marked by a dividing line. Specifically, the dividing line may be a closed curve or a polyline, or may be discontinuous line segments. For example, after the image segmentation result is determined to be "red clothes", a closed curve is displayed in the target image, and the image area enclosed by the curve contains the red clothes.
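A minimal sketch of rendering such a dividing line, assuming the first image region is available as a binary mask and using OpenCV contour extraction (the patent does not prescribe a drawing method):

```python
# Sketch: draw the dividing line as the contour of a binary mask of
# the first image region. OpenCV is an assumed implementation choice.
import cv2
import numpy as np

def draw_dividing_line(target_image: np.ndarray,
                       region_mask: np.ndarray) -> np.ndarray:
    # region_mask: uint8, 255 inside the first image region, 0 outside.
    contours, _ = cv2.findContours(region_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    shown = target_image.copy()
    # The drawn contour is the dividing boundary between the first
    # image region and the second image region.
    cv2.drawContours(shown, contours, -1, color=(0, 255, 0), thickness=2)
    return shown
```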
In summary, according to the image segmentation method provided by the embodiment of the present application, the text information related to image segmentation is obtained, and the target image is segmented according to the segmentation target indicated by the text information, so as to obtain the image segmentation result corresponding to the text information, so that the accuracy of image segmentation is improved, and the image segmentation result is matched with the actual requirement.
Fig. 4 shows a flowchart of an image segmentation method provided in an exemplary embodiment of the present application. Taking the execution subject of the method as the first terminal 110 in fig. 1 as an example, the method includes the following steps:
step 401: and acquiring a target image to be segmented.
The target image is an image whose image content is to be segmented, and includes, but is not limited to, at least one of the following: photos, paintings, clip art, maps, calligraphy works, handwritten text, faxes, satellite cloud images, movie and television frames, X-ray films, electroencephalograms, electrocardiograms, and the like.
Step 402: in response to an input operation, text information related to image segmentation is acquired.
Specifically, in the management interface for image segmentation, the user enters input information related to image segmentation; this input information describes the image segmentation operation to be performed.
There are many implementations of input operations. Optionally, the input operation includes one of a text input operation and a voice input operation.
Illustratively, step 402 is the same as step 204, and may be referred to for further description.
Step 403: a segmentation target is obtained based on the text information.
Illustratively, the segmentation object is extracted from the text information.
The information that can be extracted from the text information may be one piece or several pieces. Therefore, when the segmentation target is acquired, it needs to satisfy at least one piece of the extracted information, or all of the extracted information simultaneously. For example, if the text information is "red clothes", the information extracted from it includes "red" and "clothes", and the segmentation target needs to satisfy both the "red" and the "clothes" conditions.
Illustratively, step 403 may be implemented as:
acquiring segmentation information, wherein the segmentation information is obtained according to semantic analysis of the text information, and the segmentation information comprises at least one of the type and the visual characteristic of a segmentation target;
and determining a segmentation target according to the segmentation information.
The type of the segmentation target refers to the classification of the object to which the segmentation target belongs; the visual features of the segmentation target refer to at least one of its shape, color, and spatial position. Spatial relationship features refer to the mutual spatial positions or relative directional relationships among multiple targets segmented from an image; these relationships can be classified into connection/adjacency relationships, overlap/occlusion relationships, inclusion/containment relationships, and the like. For example, if the segmentation information includes "house", "cuboid", "red", and "overlapping", the segmentation target obtained from it may be a red house composed of overlapping cuboids.
Specifically, in the image segmentation process, to obtain the segmentation target, the segmentation information needs to be obtained first. In an alternative embodiment, after acquiring the text information related to the image segmentation, the first terminal 110 sends the text information to the server; the server performs semantic analysis on the text information, extracts relevant segmentation information from the text information, and sends the segmentation information to the first terminal 110; the first terminal 110 determines a division target according to the division information.
That is, obtaining the segmentation information may be implemented as: sending text information to a server; and receiving the segmentation information sent by the server.
For example, the text information is "red clothes", the first terminal 110 sends the text information to the server, the server performs semantic analysis on the text information, then performs feature extraction, and understands semantics according to features to obtain segmentation information related to image segmentation, including "red" and "clothes"; the first terminal 110 receives the division information transmitted from the server and determines that the division target is "red clothes" based on the division information.
Step 404: and obtaining a semantic segmentation result of the target image.
Illustratively, the semantic segmentation result is used to indicate the different targets on the target image and the image areas occupied by those targets.
The semantic segmentation result is obtained by semantically segmenting the pixel points in the target image. That is, the pixel points in the target image are analyzed, pixels of the same type are grouped into one target, and each target together with the image area it occupies in the target image is taken as the semantic segmentation result.
For example, the target image is a photograph including a mountain, a river, and a plurality of people. Performing semantic segmentation on the target image yields three targets: mountain, river, and people, where the people target comprises all the people in the photo and the image areas they occupy.
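As an illustration, the per-pixel classification described above can be sketched with a pretrained network; DeepLabV3 from torchvision is an assumption, since the patent does not name the segmentation network the server uses.

```python
# Sketch of the semantic segmentation step. The pretrained DeepLabV3
# model is an assumed stand-in for the server-side segmenter.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights="DEFAULT").eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

target_image = Image.open("target_image.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(target_image).unsqueeze(0))["out"][0]
# Per-pixel class labels: pixels sharing a label form the image area
# occupied by one target.
label_map = logits.argmax(0)
```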
In the image segmentation process, after the first terminal 110 acquires the segmentation target, it needs to acquire a semantic segmentation result, and then performs comparison. In an optional embodiment, after acquiring the target image, the first terminal 110 sends the target image to the server; the server performs semantic segmentation on the target image to obtain a semantic segmentation result, and sends the semantic segmentation result to the first terminal 110.
That is, step 404 may be implemented as: sending the target image to a server; and receiving a semantic segmentation result sent by the server.
Step 405: and determining an image area occupied by the object with the highest similarity with the segmentation object in the semantic segmentation result as a first image area of the segmentation object on the target image.
According to the foregoing, the segmentation object is extracted from the text information, and the segmentation object is also a part of the target image. In order to enable the image content corresponding to the segmentation target to be distinguished in the target image, the target image needs to be further processed, the processing result of the target image is compared with the segmentation target, and in the case that the processing result of the target image is matched with the segmentation target, the area where the image content corresponding to the segmentation target is located is determined as the first image area.
Specifically, the first terminal 110 compares the semantic segmentation result with the segmentation target, and determines the image area occupied by the target with the highest similarity as the first image region.
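A sketch of this comparison, assuming both the segmentation target and each segmented target are represented as feature vectors and cosine similarity is the measure (the patent does not fix the similarity metric):

```python
# Sketch: pick the semantic segment most similar to the segmentation
# target. Feature-vector labels and cosine similarity are assumptions.
import numpy as np

def select_first_image_region(target_feature: np.ndarray,
                              segments: list) -> np.ndarray:
    """segments: list of (label_feature, pixel_mask) pairs."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    # The image area of the most similar target becomes the first
    # image region of the segmentation target on the target image.
    _, best_mask = max(segments,
                       key=lambda seg: cosine(target_feature, seg[0]))
    return best_mask
```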
Step 406: and displaying the target image and the dividing line.
Illustratively, the dividing line is used to indicate a dividing boundary between the first image region and the second image region other than the dividing target.
As schematically shown in fig. 5, a target image 511 is displayed in the management interface 510 of image segmentation, and an input box 512 is used for inputting input information related to image segmentation.
The target image 511 includes two characters: a girl wearing red clothes in the upper left corner, and a Santa Claus wearing red clothes and a red hat in the lower right corner (the red parts are shown as black shading in the figure).
The user inputs text in the input box 512, the text being a character string "clothes in red" for instructing to segment the target image 511, the image segmentation result being clothes in red in the target image 511. From this character string, text information that is "red clothes" can be acquired.
The first terminal 110 transmits the text information to the server, the server performs semantic analysis on the text information to obtain that the segmentation information is 'red' and 'clothes', and transmits the segmentation information to the first terminal 110, and the first terminal 110 determines that the segmentation target is 'red clothes' according to the segmentation information.
Subsequently, in the management interface 510, a first image area is displayed, which is displayed as a dot matrix area in the drawing.
Step 407: the first image area is edited in response to an editing operation on the target image.
In the image segmentation process, in order to make the obtained image segmentation result more accurate, the image segmentation result can be adjusted according to the image content of the target image.
Illustratively, the editing operation on the target image includes, but is not limited to, at least one of the following: a single click operation on the target image, a double click operation on the target image, a touch operation on the target image, a drag operation on the target image. For example, the user clicks on a point on the target image, and edits the first image area in response to the clicking operation.
Specifically, there are multiple implementation manners for editing the first image region, and this embodiment provides an alternative implementation manner, which is specifically set forth below.
Step 408: in response to an editing operation on the target image, a candidate image region corresponding to the editing operation is determined.
The candidate image area is determined according to the editing operation and is used for adjusting the range of the first image area. Taking the editing operation as a single-click operation on the target image as an example, step 408 may be implemented as:
responding to the editing operation on the target image, and determining a first pixel point corresponding to the editing operation;
acquiring clustering scores of the first pixel points and second pixel points in adjacent regions, and determining the pixel points corresponding to the clustering scores higher than a preset threshold value as the pixel points of the same type;
and determining a candidate image area according to the same type of pixel points.
Specifically, in response to the click operation on the target image, the first terminal 110 may determine a first pixel point corresponding to the click operation, and send the pixel point to the server. After receiving the first pixel point, the server clusters the first pixel point with a second pixel point in an adjacent region to obtain a clustering score, and sends the clustering score to the first terminal 110. Subsequently, the first terminal 110 determines, according to the plurality of clustering scores, the pixel points corresponding to the clustering scores higher than the preset threshold as the pixel points of the same type, and the regions where the plurality of pixel points of the same type are located are the candidate image regions.
Take clustering by the K-means algorithm as an example. The basic idea of K-means is to cluster around k points in space, assigning each object to the nearest center, and to update the value of each cluster center iteratively until the best clustering result is obtained. Here the first pixel point is a center point in the space, the second pixel points in the adjacent region are classified with the first pixel point as the center, and the clustering score is obtained by iterative computation. Optionally, the clustering score is the color similarity from a second pixel point to the first pixel point: the higher the color similarity, the higher the clustering score.
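A sketch of this scoring in the simple color-similarity form just described; the neighborhood size and threshold values are illustrative assumptions.

```python
# Sketch: score second pixel points in the neighborhood of the clicked
# first pixel point by color similarity; pixels whose score exceeds a
# preset threshold are treated as pixels of the same type. Window size
# and threshold are illustrative assumptions.
import numpy as np

def candidate_region_from_click(image: np.ndarray, y: int, x: int,
                                window: int = 25,
                                threshold: float = 0.9) -> np.ndarray:
    h, w, _ = image.shape
    first_pixel = image[y, x].astype(float)
    mask = np.zeros((h, w), dtype=bool)
    y0, y1 = max(0, y - window), min(h, y + window + 1)
    x0, x1 = max(0, x - window), min(w, x + window + 1)
    patch = image[y0:y1, x0:x1].astype(float)
    distance = np.linalg.norm(patch - first_pixel, axis=-1)
    score = 1.0 - distance / (distance.max() + 1e-8)   # clustering score
    mask[y0:y1, x0:x1] = score > threshold
    return mask   # candidate region formed by same-type pixels
```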
The cluster score can be calculated in various ways, and the embodiment of the application provides the following optional ways: the server respectively calculates the correlation degree of the first pixel point and the second pixel point based on the attention model; and according to the correlation, clustering the first pixel points and the plurality of second pixel points respectively, and calculating to obtain clustering scores.
In other words, the user clicks any position on the target image, which may lie inside or outside the image segmentation result. From the selection point corresponding to the user's click and the remaining points in its adjacent region, the server computes, based on the attention model, the correlation between the selection point and each remaining point; the clustering scores of the selection point against the remaining points are then calculated from these correlations, and the portion with higher clustering scores is determined as the candidate image region.
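The attention-based correlation can likewise be sketched generically, assuming each pixel already carries a feature vector; scaled dot-product attention stands in for the attention model, which the patent does not disclose.

```python
# Sketch of the attention-based correlation between the selected
# point and the remaining points. Scaled dot-product attention is an
# assumption; the patent does not specify the attention model.
import numpy as np

def correlation_degrees(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """query: (d,) feature of the selected point; keys: (n, d) features
    of the remaining points in its adjacent region."""
    logits = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()   # one correlation degree per point
```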
As shown in fig. 6, the first image region obtained based on fig. 5 is the red garment of Santa Claus; it is shown as the dot matrix region in the figure, and its boundary is marked by a thick line. The waistband, buttons, trouser legs, and cuffs of the Santa Claus are white.
The user clicks the waistband, buttons, trouser legs, and cuffs in turn. Based on the first pixel points corresponding to the clicked positions, clustering scores of the second pixel points in the adjacent regions are obtained, the white pixels are determined to be pixel points of the same type, and a candidate image region is determined from them; the candidate image region is shown as the hatched region in the figure.
Step 409: the range of the first image region is adjusted based on the candidate image region.
The position of the candidate image region relative to the first image region falls into three cases: the candidate image region is contained in the first image region, the candidate image region intersects the first image region, or the candidate image region lies outside the first image region.
In the image segmentation process, the range of the first image region is adjusted in two ways: expansion and reduction. Accordingly, in step 409, adjusting the range of the first image region based on the candidate image region has the following optional modes:
when the candidate image region is contained in the first image region, it is removed from the first image region; when the candidate image region lies outside the first image region, it is added to the first image region; and when the candidate image region intersects the first image region, it may be either removed from or added to the first image region.
According to the foregoing, the editing operation on the target image includes, but is not limited to, at least one of the following operations: a single click operation on the target image, a double click operation on the target image, a touch operation on the target image, a drag operation on the target image.
The adjustment of the extent of the first image area is different depending on the type of editing operation. For example, the editing operation is a dragging operation, and the user drags the dividing line to expand or reduce the range of the first image area. For another example, the editing operation is a touch operation, and the user circles the range of the candidate image region in the target image and adjusts the range of the first image region according to the range.
Specifically, the adjustment of the range of the first image region is displayed as a change in the region boundary. Due to the difference in the relative position of the boundary of the candidate image region and the boundary of the first image region, the boundary line of the first image region may be adjusted according to the relative position of the candidate image region and the first image region. Specifically, the following three cases occur:
in the case where the boundary of the candidate image region meets the boundary of the first image region: as shown in fig. 6, the boundary of the candidate image region is a partial boundary of the first image region, the candidate image region is surrounded by the first image region, and a partial boundary where the first image region overlaps with the boundary of the candidate image region is removed.
In the case where the boundary of the candidate image region does not meet the boundary of the first image region: removing the candidate image region from the first image region and displaying a boundary line of the candidate image region when the candidate image region is located in the first image region; when the candidate image area is located outside the first image area, the candidate image area is added to the first image area, and the boundary line of the candidate image area is displayed.
In case the boundary of the candidate image region intersects the boundary of the first image region: displaying a boundary line of the candidate image area when the candidate image area needs to be removed from the first image area; when the candidate image area needs to be added into the first image area, part of the boundary line of the candidate image area in the first image area is removed, and the rest boundary line is displayed and becomes a part of the boundary line of the first image area.
There are various implementation manners for adjusting the range of the first image region based on the candidate image region, and an embodiment of the present application provides an alternative implementation manner: and adjusting the range of the first image area according to the position information of the trigger position of the editing operation in the target image.
That is, there are two alternative implementations of the adjustment of the range of the first image region, step 4101 and step 4102.
Illustratively, steps 4101 and 4102 are alternatives: only one of them is executed for a given editing operation, never both.
Step 4101: in the case where the trigger position of the editing operation belongs to the first image region, the candidate image region is removed from the first image region.
Step 4102: in the case where the trigger position of the editing operation does not belong to the first image region, the candidate image region is added to the first image region.
That is, the adjustment of the range of the first image region differs according to where the trigger position of the editing operation falls in the target image.
In other words, the user may trigger an editing operation at any position on the target image, inside or outside the image segmentation result. Whether the trigger position lies within the image segmentation result is judged from the trigger position of the user's editing operation: when the trigger position belongs to the image segmentation result, the candidate image region is removed from the first image region; when it does not, the candidate image region is added to the first image region.
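With the first image region and the candidate image region kept as boolean masks (an assumed representation), this branch reduces to a few lines:

```python
# Sketch of steps 4101/4102: remove or add the candidate image area
# depending on whether the trigger position lies in the first image
# region. Boolean-mask regions are an assumed representation.
import numpy as np

def adjust_first_image_region(first: np.ndarray, candidate: np.ndarray,
                              y: int, x: int) -> np.ndarray:
    if first[y, x]:                   # trigger position in first region
        return first & ~candidate     # step 4101: remove candidate area
    return first | candidate          # step 4102: add candidate area
```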
As schematically shown in fig. 7, a target image 711 is displayed in the management interface 710 for image segmentation, and an input box 712 is used for inputting input information related to image segmentation.
The target image 711 includes a white airplane whose wings appear gray because of the lighting. The text information entered in the input box 712 is "white airplane"; based on this text information, a first image region is displayed in the management interface 710, shown as the dot matrix region in the figure. Because the wings are gray, the first image region does not include the wing portions.
The user clicks the wing portion of the target image 711. A first pixel point corresponding to the click operation is obtained from the user's click, and the terminal sends it to the server; the server clusters the first pixel point with the second pixel points in the adjacent region and feeds the clustering scores back to the terminal. After obtaining the clustering scores, the terminal determines the pixel points whose clustering scores exceed the preset threshold as pixel points of the same type as the first pixel point, and on this basis determines the wing portion as a candidate image region.
Since the trigger position of the user's click does not belong to the first image region, the terminal, in response to the click, adds the wing portion (i.e., the candidate image region) to the first image region, obtaining the final first image region.
Fig. 8 shows a flowchart of an image segmentation method provided in an exemplary embodiment of the present application, the method including the steps of:
step 801: the first terminal acquires a target image to be segmented.
The target image is an image whose image content is to be segmented, and includes, but is not limited to, at least one of the following: photos, paintings, clip art, maps, calligraphy works, handwritten text, faxes, satellite cloud images, movie and television frames, X-ray films, electroencephalograms, electrocardiograms, and the like.
Step 802: the first terminal responds to the input operation of the segmentation instruction and acquires the text information of the segmentation instruction.
Specifically, in the management interface for image segmentation, the user enters input information related to image segmentation; this input information describes the image segmentation operation to be performed.
There are many implementations of the input operation. Optionally, the input operation includes one of a text input operation and a voice input operation. For example, the user enters the character string "red clothes", which instructs the terminal to segment the target image; the image segmentation result is the red clothes in the target image. From this character string, the text information "red clothes" can be acquired.
Step 803: the first terminal sends text information to the server.
Step 804: and performing semantic analysis on the text information to obtain segmentation information, wherein the segmentation information comprises at least one of the type and the visual characteristic of the segmentation target.
The type of the segmentation target refers to the classification of an object to which the segmentation target belongs; the visual feature of the segmentation object means at least one of a shape, a color, and a spatial position of the segmentation object. For example, the division information includes houses, rectangular parallelepipeds, red, and overlaps, and the division target obtained from the division information may be a red house composed of overlapped rectangular parallelepipeds.
Step 805: the server transmits the splitting information to the first terminal.
Step 806: the first terminal receives the segmentation information.
Step 807: the first terminal determines a segmentation target according to the segmentation information.
For example, the text information is 'red clothes', the terminal sends the text information to the server, the server performs semantic analysis on the text information, then performs feature extraction, understands semantics according to features, and obtains segmentation information related to image segmentation, including 'red' and 'clothes'; the terminal receives the division information transmitted from the server, and determines that the division target is "red clothes" based on the division information.
Step 808: the first terminal transmits the target image to the server.
Step 809: and the server performs semantic segmentation on the target image to obtain a semantic segmentation result.
Illustratively, the semantic segmentation result is used to indicate the different targets on the target image and the image areas occupied by those targets.
Step 810: and the server sends the semantic segmentation result to the first terminal.
The semantic segmentation result is obtained by semantically segmenting the pixel points in the target image. That is, the pixel points in the target image are analyzed, pixels of the same type are grouped into one target, and each target together with the image area it occupies in the target image is taken as the semantic segmentation result.
Step 811: the first terminal receives the semantic segmentation result.
For example, the target image is a photograph including a mountain, a river, and a plurality of people. The server carries out semantic segmentation on the target image to obtain three targets, namely a mountain, a river and people, wherein the people targets comprise all people in the photo and image areas occupied by all people in the photo. The server then sends the three targets to the first terminal.
Step 812: and the first terminal determines an image area occupied by the target with the highest similarity with the segmentation target in the semantic segmentation result as a first image area of the segmentation target on the target image.
The first terminal compares the semantic segmentation result with the segmentation target, and determines the image area occupied by the target with the highest similarity as the first image region.
Step 813: the first terminal displays the target image and the dividing line.
Illustratively, the dividing line is used to indicate a dividing boundary between the first image region and the second image region other than the dividing target.
Step 814: the first terminal edits the first image area in response to a trigger operation on the target image.
In the image segmentation process, in order to make the obtained image segmentation result more accurate, the image segmentation result can be adjusted according to the image content of the target image.
The editing process of the first image area can refer to the foregoing contents, and is not described in detail.
In summary, according to the image segmentation method provided by the embodiment of the present application, the text information related to image segmentation is obtained, and the target image is segmented according to the segmentation target indicated by the text information, so as to obtain the image segmentation result corresponding to the text information, so that the accuracy of image segmentation is improved, and the image segmentation result is matched with the actual requirement.
The following are embodiments of the apparatus of the present application, and for details that are not described in detail in the embodiments of the apparatus, reference may be made to corresponding descriptions in the above method embodiments, and details are not described herein again.
Fig. 9 shows a block diagram of an exemplary image segmentation apparatus, which includes an acquisition module 920, a response module 940, and a display module 960:
an obtaining module 920, configured to obtain a target image to be segmented;
a response module 940, configured to, in response to an input operation, acquire text information related to image segmentation;
the obtaining module 920 is further configured to obtain a segmentation target based on the text information, where the segmentation target is extracted from the text information;
the display module 960 is configured to display the target image and a segmentation line indicating a segmentation boundary of the segmentation target between a first image region on the target image and a second image region other than the segmentation target.
In an optional implementation, the display module 960 is configured to obtain segmentation information, where the segmentation information is obtained according to semantic analysis performed on the text information, and the segmentation information includes at least one of a type and a visual feature of a segmentation target; and determining a segmentation target according to the segmentation information.
In an alternative embodiment, the display module 960 is configured to send text information to the server; and receiving the segmentation information sent by the server.
In an alternative embodiment, the display module 960 is configured to obtain a semantic segmentation result of the target image, where the semantic segmentation result is used to indicate different targets and image areas occupied by the different targets on the target image; determining an image area occupied by a target with the highest similarity with the segmentation target in the semantic segmentation result as a first image area of the segmentation target on the target image; and displaying the target image and the dividing line.
In an alternative embodiment, the display module 960 is configured to send the target image to the server; and receiving a semantic segmentation result sent by the server.
In an alternative embodiment, the image segmentation apparatus further comprises an editing module 980 for editing the first image region in response to an editing operation on the target image.
In an alternative embodiment, the editing module 980 is configured to determine, in response to an editing operation on the target image, a candidate image region corresponding to the editing operation; the range of the first image region is adjusted based on the candidate image region.
In an alternative embodiment, the editing module 980 is configured to remove the candidate image area from the first image area if the trigger position of the editing operation belongs to the first image area; alternatively, in a case where the trigger position of the editing operation does not belong to the first image area, the candidate image area is added to the first image area.
In an optional implementation manner, the editing module 980 is configured to determine, in response to an editing operation on the target image, a first pixel point corresponding to the editing operation; acquiring clustering scores of the first pixel points and second pixel points in adjacent regions, and determining the pixel points corresponding to the clustering scores higher than a preset threshold value as the pixel points of the same type; and determining a candidate image area according to the same type of pixel points.
In an alternative embodiment, the response module 940 is configured to obtain text information related to image segmentation in response to a text input operation.
In an alternative embodiment, the response module 940 is configured to, in response to a voice input operation, obtain voice information related to image segmentation; text information is determined from the speech information.
It should be understood that reference to "a plurality" herein means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method of image segmentation, the method comprising:
acquiring a target image to be segmented;
acquiring text information related to image segmentation in response to an input operation;
acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information;
displaying the target image and a dividing line indicating a dividing boundary of the dividing target between a first image region on the target image and a second image region other than the dividing target.
2. The method of claim 1, wherein the obtaining the segmentation target based on the text information comprises:
acquiring segmentation information, wherein the segmentation information is obtained according to semantic analysis performed on the text information, and the segmentation information comprises at least one of the type and the visual feature of the segmentation target;
and determining the segmentation target according to the segmentation information.
3. The method of claim 2, wherein the obtaining the segmentation information comprises:
sending the text information to a server;
and receiving the segmentation information sent by the server.
4. The method of claim 1, wherein the displaying the target image and the segmentation line comprises:
obtaining a semantic segmentation result of the target image, wherein the semantic segmentation result is used for indicating different targets on the target image and image areas occupied by the different targets;
determining an image area occupied by an object with the highest similarity to the segmentation object in the semantic segmentation result as a first image area of the segmentation object on the target image;
and displaying the target image and the segmentation line.
5. The method of claim 4, wherein obtaining the semantic segmentation result of the target image comprises:
sending the target image to a server;
and receiving the semantic segmentation result sent by the server.
6. The method of any of claims 1 to 5, further comprising:
editing the first image region in response to an editing operation on the target image.
7. The method of claim 6, wherein said editing the first image region in response to an editing operation on the target image comprises:
responding to an editing operation on the target image, and determining a candidate image area corresponding to the editing operation;
adjusting a range of the first image region based on the candidate image region.
8. The method of claim 7, wherein the adjusting the range of the first image region based on the candidate image region comprises:
removing the candidate image area from the first image area if the trigger position of the editing operation belongs to the first image area;
or,
adding the candidate image area to the first image area if the trigger position of the editing operation does not belong to the first image area.
9. The method of claim 7, wherein the determining, in response to the editing operation on the target image, a candidate image region corresponding to the editing operation comprises:
responding to the editing operation on the target image, and determining a first pixel point corresponding to the editing operation;
acquiring clustering scores of the first pixel points and second pixel points in adjacent regions, and determining the pixel points corresponding to the clustering scores higher than a preset threshold value as the pixel points of the same type;
and determining the candidate image area according to the pixel points of the same type.
10. The method according to any one of claims 1 to 5, wherein the acquiring text information related to image segmentation in response to the input operation comprises:
and acquiring the text information related to the image segmentation in response to a text input operation.
11. The method according to any one of claims 1 to 5, wherein the acquiring text information related to image segmentation in response to the input operation comprises:
acquiring voice information related to image segmentation in response to a voice input operation;
and determining the text information according to the voice information.
12. An image segmentation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a target image to be segmented;
the response module is used for responding to the input operation and acquiring text information related to image segmentation;
the acquisition module is further configured to acquire a segmentation target based on the text information, wherein the segmentation target is extracted from the text information;
a display module configured to display the target image and a dividing line indicating a dividing boundary of the division target between a first image region on the target image and a second image region other than the division target.
13. A computer device, characterized in that it comprises a processor and a memory, in which at least one program code is stored, which is loaded and executed by the processor to implement the image segmentation method according to any one of claims 1 to 11.
14. A computer-readable storage medium, having stored therein at least one program code, which is loaded and executed by a processor, to implement the image segmentation method according to any one of claims 1 to 11.
CN202110520007.6A (priority date 2021-05-13, filing date 2021-05-13): Image segmentation method, apparatus, device and medium. Pending. Published as CN113256650A (en).

Priority Applications (1)

CN202110520007.6A (priority and filing date 2021-05-13): Image segmentation method, apparatus, device and medium

Applications Claiming Priority (1)

CN202110520007.6A (priority and filing date 2021-05-13): Image segmentation method, apparatus, device and medium

Publications (1)

CN113256650A, published 2021-08-13

Family

ID=77181636

Family Applications (1)

CN202110520007.6A (pending): Image segmentation method, apparatus, device and medium

Country Status (1)

CN: CN113256650A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140200452A1 (en) * 2013-01-15 2014-07-17 Samsung Electronics Co., Ltd. User interaction based image segmentation apparatus and method
JP2017117030A (en) * 2015-12-22 2017-06-29 キヤノン株式会社 Image processing device
KR101919879B1 (en) * 2017-07-19 2018-11-19 충남대학교산학협력단 Apparatus and method for correcting depth information image based on user's interaction information
CN109726333A (en) * 2019-01-23 2019-05-07 广东小天才科技有限公司 It is a kind of that topic method and private tutor's equipment are searched based on image
CN110751659A (en) * 2019-09-27 2020-02-04 北京小米移动软件有限公司 Image segmentation method and device, terminal and storage medium
CN110930419A (en) * 2020-02-13 2020-03-27 北京海天瑞声科技股份有限公司 Image segmentation method and device, electronic equipment and computer storage medium


Similar Documents

CN110163198B (en) Table identification reconstruction method and device and storage medium
US10657652B2 (en) Image matting using deep learning
CN108229322B (en) Video-based face recognition method and device, electronic equipment and storage medium
CN110163076B (en) Image data processing method and related device
US20190272438A1 (en) Method and apparatus for detecting text
US20140153832A1 (en) Facial expression editing in images based on collections of images
US20210312166A1 (en) System and method for face recognition based on dynamic updating of facial features
WO2018036462A1 (en) Image segmentation method, computer apparatus, and computer storage medium
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN107273895B (en) Method for recognizing and translating real-time text of video stream of head-mounted intelligent device
CN114155543A (en) Neural network training method, document image understanding method, device and equipment
US11574392B2 (en) Automatically merging people and objects from multiple digital images to generate a composite digital image
WO2022089170A1 (en) Caption area identification method and apparatus, and device and storage medium
CN112270745B (en) Image generation method, device, equipment and storage medium
JP7173309B2 (en) LEARNING METHOD, LEARNING PROGRAM AND LEARNING APPARATUS
EP4276754A1 (en) Image processing method and apparatus, device, storage medium, and computer program product
CN114926849A (en) Text detection method, device, equipment and storage medium
CN112883827B (en) Method and device for identifying specified target in image, electronic equipment and storage medium
CN113887375A (en) Text recognition method, device, equipment and storage medium
WO2021179751A1 (en) Image processing method and system
CN113591433A (en) Text typesetting method and device, storage medium and computer equipment
CN117422851A (en) Virtual clothes changing method and device and electronic equipment
CN113705559B (en) Character recognition method and device based on artificial intelligence and electronic equipment
CN113256650A (en) Image segmentation method, apparatus, device and medium
CN111368674B (en) Image recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination