CN113256650B - Image segmentation method, device, equipment and medium

Image segmentation method, device, equipment and medium

Info

Publication number: CN113256650B
Authority: CN (China)
Prior art keywords: image, segmentation, target, text information, acquiring
Legal status: Active (assumed; not a legal conclusion)
Application number: CN202110520007.6A, filed by Guangzhou Fanxing Huyu IT Co Ltd
Priority and filing date: 2021-05-13
Other languages: Chinese (zh)
Other versions: CN113256650A (en), published 2021-08-13
Inventor: Zhu Yi (朱艺)
Current assignee: Guangzhou Fanxing Huyu IT Co Ltd
Original assignee: Guangzhou Fanxing Huyu IT Co Ltd
Application granted; CN113256650B published 2024-06-21

Classifications

    • G06T7/11: Physics; Computing; Image data processing; Image analysis; Segmentation and edge detection; Region-based segmentation
    • G06F18/23213: Physics; Computing; Electric digital data processing; Pattern recognition; Clustering techniques; Non-hierarchical techniques using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
    • G06T11/60: Physics; Computing; Image data processing; 2D [Two Dimensional] image generation; Editing figures and text; Combining figures or text

Abstract

The application discloses an image segmentation method, device, equipment and medium, relating to the field of artificial intelligence. The method comprises the following steps: acquiring a target image to be segmented; acquiring text information related to image segmentation in response to an input operation; acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information; and displaying the target image together with a segmentation line that indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region other than the segmentation target. In this embodiment, text information related to image segmentation is acquired and the target image is segmented according to the segmentation target indicated by that text information, so that an image segmentation result corresponding to the text information is obtained and the accuracy of image segmentation is improved.

Description

Image segmentation method, device, equipment and medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image segmentation method, apparatus, device, and medium.
Background
Image segmentation, the process of dividing an image into a number of specific regions with unique properties, is an important preprocessing step in image recognition and computer vision.
In the related art, image segmentation is generally built on theoretical foundations such as cluster analysis and fuzzy set theory. For example, in an image segmentation method based on cluster analysis, each pixel in image space is represented by a corresponding point in a feature space; the feature space is partitioned according to how those points aggregate, and the partition is then mapped back to the original image space to obtain the segmentation result.
However, methods based on cluster analysis or fuzzy sets generally require initial values or parameters to be assigned. When those initial values or parameters change significantly, the segmentation result fluctuates accordingly, so the accuracy of image segmentation is low and the required segmentation result cannot be obtained.
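As a concrete illustration of this sensitivity, below is a minimal sketch of such a clustering-based segmentation, assuming NumPy and scikit-learn's KMeans (library choices are assumptions, not part of this application); the cluster count and the random seed are exactly the kind of initial values on which the result fluctuates.

```python
# Related-art sketch: segment by clustering pixels in a color feature space.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segment(image: np.ndarray, n_clusters: int = 3, seed: int = 0) -> np.ndarray:
    h, w, c = image.shape
    # Each pixel in image space becomes a point in feature space (its color).
    features = image.reshape(-1, c).astype(float)
    # Partition the feature space; the result depends on n_clusters and seed.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(features)
    # Map the feature-space partition back into image space.
    return labels.reshape(h, w)
```

Changing n_clusters or seed can produce a visibly different partition of the same image, which is the accuracy problem the present application addresses.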
Disclosure of Invention
The embodiments of the application provide an image segmentation method, device, equipment and medium, which segment a target image according to the text information of a segmentation instruction, thereby improving the accuracy of image segmentation. The technical solution is as follows:
According to an aspect of the present application, there is provided an image segmentation method comprising:
Acquiring a target image to be segmented;
acquiring text information related to image segmentation in response to an input operation;
acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information;
displaying the target image and a segmentation line, wherein the segmentation line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region other than the segmentation target.
According to an aspect of the present application, there is provided an image segmentation apparatus including:
The acquisition module is used for acquiring a target image to be segmented;
the response module is used for responding to the input operation and acquiring text information related to image segmentation;
The acquisition module is also used for acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information;
and a display module, configured to display the target image and a segmentation line, wherein the segmentation line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region other than the segmentation target.
According to an aspect of the present application, there is provided a computer device comprising a processor and a memory, the memory storing at least one program code, the program code being loaded by the processor and performing the image segmentation method as described above.
According to an aspect of the present application, there is provided a computer-readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the image segmentation method as described above.
The technical solution provided by the embodiments of the application has at least the following beneficial effects:
By acquiring text information related to image segmentation and segmenting the target image according to the segmentation target indicated by that text information, an image segmentation result corresponding to the text information is obtained; the accuracy of image segmentation is improved, and the segmentation result matches the actual requirement.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a computer system provided by an exemplary embodiment of the present application;
FIG. 2 is a flow chart of an image segmentation method provided by an exemplary embodiment of the present application;
FIG. 3 is a management interface diagram of an image segmentation method provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart of an image segmentation method provided by an exemplary embodiment of the present application;
FIG. 5 is an interface variation diagram of image segmentation provided by an exemplary embodiment of the present application;
FIG. 6 is an interface variation diagram of image segmentation provided by an exemplary embodiment of the present application;
FIG. 7 is an interface variation diagram of image segmentation provided by an exemplary embodiment of the present application;
FIG. 8 is a flowchart of an image segmentation method provided by an exemplary embodiment of the present application;
Fig. 9 is a block diagram of an image segmentation apparatus provided in an exemplary embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
FIG. 1 illustrates a block diagram of a computer system provided in accordance with an exemplary embodiment of the present application. The computer system 100 includes: a first terminal 110 and a server 120.
The first terminal 110 includes an application program or a web page client having an image segmentation function. The application may be an image processing program or any of other applications that support image segmentation functionality. The device types of the first terminal 110 include: at least one of a smart phone, a tablet computer, an electronic book reader, an MP3 player, an MP4 player, a laptop portable computer, a desktop computer, a smart television, and a smart car.
The first terminal 110 is connected to the server 120 through a wireless network or a wired network.
Server 120 includes at least one of a server, a plurality of servers, a cloud computing platform, and a virtualization center. Optionally, the server 120 takes on primary computing work and the terminal takes on secondary computing work; or the server 120 takes on secondary computing work and the terminal takes on primary computing work; or the server 120 and the terminal use a distributed computing architecture for collaborative computing.
In some alternative embodiments, first terminal 110 includes processor 1101 and memory 1102.
The processor 1101 includes one or more processing cores, and the processor 1101 executes various functional applications and information processing by running software programs and modules.
The memory 1102 may be used to store at least one instruction that the processor 1101 is configured to execute to implement the steps of the image segmentation method. The memory 1102 may be implemented by any type of volatile or non-volatile memory device, including but not limited to: magnetic or optical disks, electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), static random-access memory (SRAM), read-only memory (ROM), magnetic memory, flash memory, and programmable read-only memory (PROM).
With the above implementation environment in mind, the image segmentation method of the embodiments of the present application is described below, taking the first terminal 110 in fig. 1 as the execution body.
Fig. 2 is a flowchart of an image segmentation method according to an exemplary embodiment of the present application. Taking the first terminal 110 in fig. 1 as the execution body, the method includes the following steps:
step 202: and acquiring a target image to be segmented.
The target image is an image whose image content is to be segmented. The target image includes, but is not limited to, at least one of the following: photographs, drawings, clip art, maps, calligraphic works, handwritten text, faxes, satellite cloud images, film and television frames, X-ray films, electroencephalograms, electrocardiograms, and the like. For example, the target image is a landscape photograph that includes a mountain, a river, and tourists; or the target image is a frame of a video in which the picture includes a house, a person, and a line of dialogue.
The target image can be acquired in various ways. For example, the target image is stored on the first terminal 110, and the first terminal 110 reads it from local storage; or the target image is stored on the server, and the first terminal 110 receives the target image sent by the server; or the first terminal 110 receives interaction information sent by other terminals and obtains the target image from that information; or the first terminal 110 obtains the target image by photographing, taking a screenshot, downloading, and the like.
Step 204: in response to an input operation, text information related to image segmentation is acquired.
Specifically, in the image segmentation management interface, the user enters input information related to image segmentation; this information describes the operation to be performed for image segmentation.
There are a number of implementations of the input operation. Optionally, the input operation includes one of a text input operation and a voice input operation.
In the case where the input operation is a text input operation, step 204 has the following alternative implementations: in response to the text input operation, text information related to image segmentation is acquired.
In the case where the input operation is a voice input operation, step 204 has the following alternative implementations: acquiring voice information related to image segmentation in response to a voice input operation; text information is determined from the speech information.
As schematically shown in fig. 3, a target image 311 is displayed in the image segmentation management interface 310, together with an input box 312, where the input box 312 is used to input information related to image segmentation.
The user inputs text in the input box 312, the character string "red clothes", indicating that the target image 311 is to be segmented with the red clothes in the target image 311 as the segmentation result. From this string, the text information "red clothes" is acquired.
Alternatively, the user speaks into the input box 312. After the terminal acquires the voice information, it converts the voice information into the text information "red clothes" based on speech recognition technology, again instructing that the target image 311 be segmented with the red clothes as the segmentation result. That is, the voice input operation yields voice information, and the text information related to image segmentation is obtained from that voice information. Speech recognition technology, also called automatic speech recognition, converts the vocabulary content of speech into computer-readable input information.
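A minimal sketch of this voice-to-text step, assuming the third-party SpeechRecognition package; the package, the file name, and the Mandarin language tag are illustrative assumptions rather than part of this application:

```python
import speech_recognition as sr

def voice_to_text(wav_path: str) -> str:
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole audio file
    # Convert the vocabulary content of the speech into text;
    # the Mandarin language tag is an assumption for this example.
    return recognizer.recognize_google(audio, language="zh-CN")

text_information = voice_to_text("segment_command.wav")  # e.g. "red clothes"
```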
Step 206: a segmentation target is acquired based on the text information.
Illustratively, the segmentation target is extracted from the text information.
One or more pieces of information may be extracted from the text information. Optionally, the segmentation target needs to satisfy at least one piece of the extracted information, or all of the extracted information simultaneously. For example, the text information is "red clothes"; the information extracted from it includes "red" and "clothes", and the segmentation target needs to satisfy both the "red" condition and the "clothes" condition.
The segmentation target can be acquired through semantic analysis of the text information. Here, semantic analysis mines the deeper concepts carried by text, pictures, and the like through machine learning methods, including text feature extraction. Specifically, the terminal sends the text information related to image segmentation to the server, the server performs semantic analysis on the received text information, and the analysis result is fed back to the terminal.
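As a toy stand-in for the server-side semantic analysis (the vocabulary tables below are assumptions; a real system would use a trained model), extraction of segmentation-related information from the text might look like:

```python
# Toy lexicon; these tables are assumptions for illustration only.
COLORS = {"red", "green", "white", "black"}
OBJECT_TYPES = {"clothes", "pants", "hat", "airplane", "house"}

def extract_segmentation_info(text: str) -> dict:
    tokens = text.lower().split()
    return {
        "visual": [t for t in tokens if t in COLORS],      # visual characteristics
        "type": [t for t in tokens if t in OBJECT_TYPES],  # type of the segmentation target
    }

print(extract_segmentation_info("red clothes"))
# {'visual': ['red'], 'type': ['clothes']}
```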
Step 208: the target image and the dividing line are displayed.
Illustratively, the segmentation line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region other than the segmentation target.
As described above, once the text information related to image segmentation is acquired, the segmentation target can be obtained, and the target image is segmented according to what the segmentation target indicates. That is, when image content in the target image matches the text information related to image segmentation, that image content is determined as the image segmentation result of the target image.
Specifically, as noted in the foregoing steps, there may be one or more pieces of text information related to image segmentation. The image content of the target image is compared with that information, and where the two match, the image segmentation result of the target image is determined.
For example, the image content acquired from the target image includes "red", "green", "white", "black", "clothes", and "pants", while the text information related to image segmentation includes "red" and "clothes". According to the matching result, the image content that satisfies both "red" and "clothes" is determined as the image segmentation result.
Optionally, the boundary of the image segmentation result is marked by the segmentation line. The segmentation line may be a closed curve, a broken line, or discontinuous line segments. For example, after the image segmentation result is determined to be "red clothes", a closed curve is displayed in the target image, and the image region enclosed by the curve contains the red clothes.
In summary, according to the image segmentation method provided by the embodiment of the application, the text information related to image segmentation is obtained, and the target image is segmented according to the segmentation target indicated by the text information, so that the image segmentation result corresponding to the text information is obtained, the accuracy of image segmentation is improved, and the image segmentation result is matched with the actual requirement.
Fig. 4 shows a flowchart of an image segmentation method according to an exemplary embodiment of the present application. Taking the first terminal 110 in fig. 1 as the execution body, the method includes the following steps:
step 401: and acquiring a target image to be segmented.
The target image is an image whose image content is to be segmented. The target image includes, but is not limited to, at least one of the following: photographs, drawings, clip art, maps, calligraphic works, handwritten text, faxes, satellite cloud images, film and television frames, X-ray films, electroencephalograms, electrocardiograms, and the like.
Step 402: in response to an input operation, text information related to image segmentation is acquired.
Specifically, in the image segmentation management interface, the user enters input information related to image segmentation; this information describes the operation to be performed for image segmentation.
There are a number of implementations of the input operation. Optionally, the input operation includes one of a text input operation and a voice input operation.
Illustratively, step 402 is the same as step 204, and reference is made thereto, and no further description is given.
Step 403: a segmentation target is acquired based on the text information.
Illustratively, the segmentation target is extracted from the text information.
One or more pieces of information may be extracted from the text information. Therefore, when acquiring the segmentation target, at least one piece of the extracted information needs to be satisfied, or all of it simultaneously. For example, the text information is "red clothes"; the information extracted from it includes "red" and "clothes", and the segmentation target needs to satisfy both the "red" condition and the "clothes" condition.
Illustratively, step 403 may be implemented as:
obtaining segmentation information, wherein the segmentation information is obtained according to semantic analysis of text information, and the segmentation information comprises at least one of the type and visual characteristics of a segmentation target;
determining a segmentation target according to the segmentation information.
The type of the segmentation target refers to the object classification of the segmentation target; the visual characteristics of the segmentation target refer to at least one of its shape, color, and spatial position. Spatial relationship features refer to the mutual spatial positions or relative directions of multiple segmented targets in the image; these relationships can be classified into connection/adjacency relationships, overlap/occlusion relationships, inclusion/containment relationships, and the like. For example, if the segmentation information includes house, rectangular parallelepiped, red, and overlapping, the segmentation target obtained from it may be a red house composed of overlapping rectangular parallelepipeds.
Specifically, in the image segmentation process, in order to acquire a segmentation target, segmentation information needs to be obtained first. In an alternative embodiment, the first terminal 110 transmits the text information to the server after acquiring the text information related to the image segmentation; the server performs semantic analysis on the text information, extracts relevant segmentation information therefrom, and transmits the segmentation information to the first terminal 110; the first terminal 110 determines a division target from the division information.
That is, acquiring the division information may be implemented as: sending text information to a server; and receiving the segmentation information sent by the server.
For example, the text information is "red clothes". The first terminal 110 sends the text information to the server; the server performs semantic analysis on the text information, then performs feature extraction and understands the semantics from the features, yielding segmentation information related to image segmentation that includes "red" and "clothes". The first terminal 110 receives the segmentation information sent by the server and determines from it that the segmentation target is "red clothes".
Step 404: and acquiring a semantic segmentation result of the target image.
Illustratively, the semantic segmentation results are used to indicate different targets on the target image and the image areas occupied by the different targets.
The semantic segmentation result is obtained by semantically segmenting the pixel points in the target image. That is, the pixels in the target image are analyzed, pixels of the same type are grouped as one target, and each target together with the image area it occupies in the target image is taken as the semantic segmentation result.
For example, the target image is a photograph including a mountain, a river, and a plurality of people. Semantic segmentation of the target image yields three targets, namely the mountain, the river, and the people, where the people target comprises all the persons in the photograph and the image area they occupy.
In the image segmentation process, after acquiring the segmentation target, the first terminal 110 also acquires the semantic segmentation result and then compares the two. In an alternative embodiment, after acquiring the target image, the first terminal 110 sends it to the server; the server performs semantic segmentation on the target image to obtain the semantic segmentation result and sends it to the first terminal 110.
That is, step 404 may be implemented as: transmitting the target image to a server; and receiving the semantic segmentation result sent by the server.
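A minimal sketch of this terminal-to-server exchange, assuming Python's requests package; the endpoint URL and the response schema are hypothetical placeholders, not an API defined by this application:

```python
import requests

def fetch_semantic_segmentation(image_path: str,
                                server_url: str = "https://example.com/api/segment") -> list:
    with open(image_path, "rb") as f:
        response = requests.post(server_url, files={"image": f}, timeout=30)
    response.raise_for_status()
    # Assumed response schema: one entry per target, each with a text label
    # and a binary mask marking the image area that target occupies.
    return response.json()["targets"]
```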
Step 405: Determining the image area occupied by the target with the highest similarity to the segmentation target in the semantic segmentation result as the first image region of the segmentation target on the target image.
As described above, the segmentation target is extracted from the text information, while also being part of the target image. For the image content corresponding to the segmentation target to be distinguishable in the target image, the target image needs further processing: the processing result of the target image is compared with the segmentation target, and where the two match, the area containing the image content corresponding to the segmentation target is determined as the first image region.
Specifically, the first terminal 110 compares the semantic segmentation result with the segmentation target and determines the image area occupied by the target with the highest similarity as the first image region.
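A minimal sketch of this highest-similarity selection, reusing the hypothetical response schema above; plain string similarity from the standard library stands in for whatever similarity measure the deployed model actually uses:

```python
from difflib import SequenceMatcher

def pick_first_image_region(targets: list, segmentation_target: str):
    # Score each semantic-segmentation target against the segmentation target.
    def similarity(target: dict) -> float:
        return SequenceMatcher(None, target["label"], segmentation_target).ratio()
    best = max(targets, key=similarity)
    return best["mask"]  # the first image region on the target image

# e.g. region = pick_first_image_region(fetch_semantic_segmentation("photo.jpg"), "red clothes")
```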
Step 406: the target image and the dividing line are displayed.
Illustratively, a segmentation line is used to indicate a segmentation boundary between the first image region and a second image region other than the segmentation target.
As schematically shown in fig. 5, a target image 511 is displayed in the image segmentation management interface 510, together with an input box 512, where the input box 512 is used to input information related to image segmentation.
The target image 511 includes two characters: a little girl wearing red clothes in the upper left corner, and Santa Claus wearing red clothes and a red hat in the lower right corner (the red parts are shown as black hatching in the figure).
The user enters text in input box 512, the character string "red clothes", indicating that the target image 511 is to be segmented with the red clothes in the target image 511 as the segmentation result. From this string, the text information "red clothes" is acquired.
The first terminal 110 sends the text information to the server; the server performs semantic analysis on the text information to obtain the segmentation information "red" and "clothes" and sends it to the first terminal 110, and the first terminal 110 determines from the segmentation information that the segmentation target is the red clothes.
Subsequently, the first image region is displayed in the management interface 510, shown in the figure as a dot matrix area.
Step 407: the first image region is edited in response to an editing operation on the target image.
In the image segmentation process, in order to make the obtained image segmentation result more accurate, the image segmentation result can be adjusted according to the image content of the target image.
Illustratively, the editing operation on the target image includes, but is not limited to, at least one of the following operations: a single click operation on the target image, a double click operation on the target image, a touch operation on the target image, a drag operation on the target image. For example, the user clicks at a point on the target image, and edits the first image region in response to the clicking operation.
Specifically, there are various implementations of editing the first image area, and an alternative implementation is provided in the embodiment of the present application, which is specifically described below.
Step 408: in response to an editing operation on a target image, a candidate image area corresponding to the editing operation is determined.
The candidate image area is determined according to the editing operation and is used to adjust the range of the first image region. Taking the editing operation as a click operation on the target image as an example, step 408 may be implemented as:
Responding to the editing operation on the target image, and determining a first pixel point corresponding to the editing operation;
acquiring a clustering score of a first pixel point and a second pixel point in an adjacent area, and determining the pixel points with the clustering score higher than a preset threshold value as the pixel points of the same type;
and determining candidate image areas according to the pixel points of the same type.
Specifically, in response to the click operation on the target image, the first terminal 110 may determine a first pixel point corresponding to the click operation and send the pixel point to the server. After receiving the first pixel point, the server clusters the first pixel point with the second pixel point in the adjacent area to obtain a cluster score, and sends the cluster score to the first terminal 110. Then, the first terminal 110 determines, according to the plurality of cluster scores, the pixel points corresponding to the cluster scores higher than the preset threshold value as the same type of pixel points, and the areas where the plurality of same type of pixel points are located are candidate image areas.
Take clustering with the K-means algorithm as an example. The basic idea of K-means is to cluster around k center points in space, assigning each object to the center closest to it, and to update the center values iteratively until the best clustering result is obtained. Here the first pixel point is the center point in space: with the first pixel point as the center, the second pixel points in the adjacent area are classified, and the cluster score is obtained by iterative calculation. Optionally, the cluster score is the color similarity of a second pixel point to the first pixel point; the higher the color similarity, the higher the cluster score.
The cluster score can be calculated in various ways; the embodiments of the application provide the following optional implementation: the server calculates the correlation between the first pixel point and a second pixel point based on an attention model, clusters the first pixel point with the plurality of second pixel points according to the correlation, and calculates the cluster score.
In other words, the user clicks an arbitrary position on the target image, which may lie inside or outside the image segmentation result. From the selected point corresponding to the user's click operation and the remaining points in its adjacent area, the server computes, based on the attention model, the correlation between the selected point and the remaining points; the cluster scores of the selected point and the remaining points are then calculated from that correlation, and the part with the higher cluster scores is determined as the candidate image area.
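A minimal sketch of the neighborhood clustering triggered by a click, assuming the target image as a NumPy RGB array; color similarity to the clicked pixel stands in for the server's attention-model correlation, and the threshold is an illustrative stand-in for the preset threshold:

```python
import numpy as np
from collections import deque

def grow_candidate_region(image: np.ndarray, seed: tuple, threshold: float = 0.9) -> np.ndarray:
    h, w, _ = image.shape
    seed_color = image[seed].astype(float)     # color of the clicked (first) pixel
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                # "Cluster score": color similarity of a neighboring
                # (second) pixel to the clicked (first) pixel.
                dist = np.linalg.norm(image[ny, nx].astype(float) - seed_color)
                score = 1.0 - dist / (255.0 * 3 ** 0.5)
                if score >= threshold:         # same-type pixel
                    mask[ny, nx] = True
                    queue.append((ny, nx))
    return mask                                # the candidate image area
```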
As schematically shown in fig. 6, taking the red garment of the Santa Claus as an example of the first image area obtained based on fig. 5, the first image area is shown as a dot matrix area in the figure, and the boundary of the first image area is marked with a thick line. Wherein, the waistband, buttons, trouser legs and cuffs of the Santa Claus are white.
The user clicks the waistband, the buttons, the trouser legs, and the cuffs in sequence. From the first pixel points corresponding to the clicked positions, the cluster scores of the second pixel points in the adjacent areas are obtained, the white pixel points are determined to be pixel points of the same type, and the candidate image area is determined from those same-type pixel points; the candidate image area is shown as the oblique-line area in the figure.
Step 409: the range of the first image region is adjusted based on the candidate image region.
The position of the candidate image area in the target image includes the following three cases: the candidate image region is included in the first image region, the candidate image region has an intersection with the first image region, and the candidate image region is not included in the first image region.
In the image segmentation process, the range of the first image region can be adjusted in two ways: expansion and reduction. In step 409, depending on the position of the candidate image area in the target image, adjusting the range of the first image region based on the candidate image area has the following two optional manners:
When the candidate image area is included in the first image region, the candidate image area is removed from the first image region; when the candidate image area is not included in the first image region, the candidate image area is added to the first image region; and when the candidate image area intersects the first image region, the candidate image area may be either removed from or added to the first image region.
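With regions represented as boolean masks (a representational assumption; the application does not fix a data structure), the three position cases can be told apart as follows:

```python
import numpy as np

def candidate_position(first: np.ndarray, candidate: np.ndarray) -> str:
    overlap = first & candidate
    if overlap.sum() == candidate.sum():
        return "contained"      # candidate lies entirely inside the first image region
    if overlap.any():
        return "intersecting"   # partial overlap: removal and addition are both possible
    return "disjoint"           # candidate lies entirely outside the first image region
```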
According to the foregoing, the editing operation on the target image includes, but is not limited to, at least one of the following operations: a single click operation on the target image, a double click operation on the target image, a touch operation on the target image, a drag operation on the target image.
The adjustment of the range of the first image region also differs with the type of editing operation. For example, if the editing operation is a drag operation, the user drags the segmentation line to expand or contract the range of the first image region. As another example, if the editing operation is a touch operation, the user circles the range of the candidate image area on the target image, and the range of the first image region is adjusted accordingly.
Specifically, adjusting the range of the first image region appears as a change of the region boundary. Because the relative positions of the candidate image area's boundary and the first image region's boundary vary, the boundary line of the first image region is adjusted according to those relative positions. Three cases arise:
In the case where the boundary of the candidate image area meets the boundary of the first image region: as shown in fig. 6, the boundary of the candidate image area forms part of the boundary of the first image region and the candidate image area is surrounded by the first image region; the part of the first image region's boundary that coincides with the candidate image area's boundary is removed.
In the case where the boundary of the candidate image region does not meet the boundary of the first image region: when the candidate image area is positioned in the first image area, removing the candidate image area from the first image area, and displaying the boundary line of the candidate image area; when the candidate image area is located outside the first image area, the candidate image area is added to the first image area, and the boundary line of the candidate image area is displayed.
In the case where the boundary of the candidate image area intersects the boundary of the first image area: displaying boundary lines of the candidate image areas when the candidate image areas need to be removed from the first image area; when it is necessary to add the candidate image area to the first image area, a portion of the boundary line of the candidate image area located in the first image area is removed, and the remaining boundary line is displayed, the remaining boundary line becoming a part of the boundary line of the first image area.
There are various implementations of the adjustment of the range of the first image area based on the candidate image area, and an embodiment of the present application provides an alternative implementation: and adjusting the range of the first image area according to the position information of the trigger position of the editing operation in the target image.
That is, adjusting the range of the first image region has two alternative implementations: step 4101 and step 4102.
Illustratively, steps 4101 and 4102 are alternatives; only one of the two is performed, never both at once.
Step 4101: in the case where the trigger position of the editing operation belongs to the first image area, the candidate image area is removed from the first image area.
Step 4102: in the case where the trigger position of the editing operation does not belong to the first image area, the candidate image area is added to the first image area.
That is, the adjustment of the range of the first image area is also different depending on the position of the trigger position of the editing operation in the target image.
In other words, the user triggers the editing operation at an arbitrary position on the target image, which may lie inside or outside the image segmentation result. Whether the trigger position falls within the image segmentation result is judged from the trigger position of the user's editing operation: when the trigger position belongs to the image segmentation result, the candidate image area is removed from the first image region; when it does not, the candidate image area is added to the first image region.
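A minimal sketch of steps 4101 and 4102 under the same boolean-mask assumption: the trigger position alone selects between removal and addition:

```python
import numpy as np

def adjust_first_region(first: np.ndarray, candidate: np.ndarray, trigger: tuple) -> np.ndarray:
    y, x = trigger
    if first[y, x]:                # trigger position inside the first image region
        return first & ~candidate  # step 4101: remove the candidate image area
    return first | candidate      # step 4102: add the candidate image area
```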
As schematically shown in fig. 7, a target image 711 is displayed in the image segmentation management interface 710, together with an input box 712, where the input box 712 is used to input information related to image segmentation.
The target image 711 includes a white airplane whose wings appear gray due to lighting. The text information entered in the input box 712 is "white airplane"; based on this text information, a first image region is displayed in the management interface 710, shown in the figure as a dot matrix area. Specifically, because the wings are gray, the wing portion is not included in the first image region.
The user clicks on the wing portion in the target image 711. According to the clicking operation of the user, a first pixel point corresponding to the clicking operation can be obtained, the terminal sends the first pixel point to the server, the server clusters the first pixel point with a second pixel point in an adjacent area, and the clustering score is fed back to the terminal. After the terminal obtains the clustering score, determining the pixel points corresponding to the clustering score higher than a preset threshold value as the same type of pixel points of the first pixel point, and determining the wing part as a candidate image area based on the same type of pixel points.
Because the trigger position of the clicking operation of the user does not belong to the first image area, in response to the clicking operation of the user, the terminal adds the wing part (i.e. the candidate image area) to the first image area, and the final result of the first image area is obtained.
Fig. 8 shows a flowchart of an image segmentation method according to an exemplary embodiment of the present application, the method comprising the steps of:
step 801: the first terminal acquires a target image to be segmented.
The target image is an image whose image content is to be segmented. The target image includes, but is not limited to, at least one of the following: photographs, drawings, clip art, maps, calligraphic works, handwritten text, faxes, satellite cloud images, film and television frames, X-ray films, electroencephalograms, electrocardiograms, and the like.
Step 802: and the first terminal responds to the input operation of the segmentation instruction and acquires the text information of the segmentation instruction.
Specifically, in the image segmentation management interface, the user enters input information related to image segmentation; this information describes the operation to be performed for image segmentation.
There are a number of implementations of the input operation. Optionally, the input operation includes one of a text input operation and a voice input operation. For example, the user inputs text, the character string "red clothes", indicating that the target image is to be segmented with the red clothes in the target image as the segmentation result. From this string, the text information "red clothes" is acquired.
Step 803: the first terminal sends text information to the server.
Step 804: The server performs semantic analysis on the text information to obtain segmentation information, wherein the segmentation information includes at least one of the type and the visual characteristics of the segmentation target.
The type of the segmentation target refers to the object classification of the segmentation target; the visual characteristics of the segmentation target refer to at least one of its shape, color, and spatial position. For example, if the segmentation information includes house, rectangular parallelepiped, red, and overlapping, the segmentation target obtained from it may be a red house composed of overlapping rectangular parallelepipeds.
Step 805: the server transmits the segmentation information to the first terminal.
Step 806: the first terminal receives the segmentation information.
Step 807: the first terminal determines a segmentation target according to the segmentation information.
For example, the text information is "red clothes". The terminal sends the text information to the server; the server performs semantic analysis on the text information, then performs feature extraction and understands the semantics from the features, obtaining segmentation information related to image segmentation that includes "red" and "clothes". The terminal receives the segmentation information sent by the server and determines from it that the segmentation target is "red clothes".
Step 808: the first terminal transmits the target image to the server.
Step 809: and the server performs semantic segmentation on the target image to obtain a semantic segmentation result.
Illustratively, the semantic segmentation results are used to indicate different targets on the target image and the image areas occupied by the different targets.
Step 810: and the server sends the semantic segmentation result to the first terminal.
The semantic segmentation result is obtained by semantically segmenting the pixels in the target image. That is, the pixels in the target image are analyzed, pixels of the same type are grouped as one target, and each target together with the image area it occupies in the target image is taken as the semantic segmentation result.
Step 811: and the first terminal receives the semantic segmentation result.
For example, the target image is a photograph including a mountain, a river, and a plurality of people. The server performs semantic segmentation on the target image to obtain three targets, namely the mountain, the river, and the people, where the people target comprises all the persons in the photograph and the image area they occupy. The server then sends the three targets to the first terminal.
Step 812: The first terminal determines the image area occupied by the target with the highest similarity to the segmentation target in the semantic segmentation result as the first image region of the segmentation target on the target image.
The first terminal compares the semantic segmentation result with the segmentation target, and determines an image area occupied by the target with the highest similarity as a first image area.
Step 813: The first terminal displays the target image and the segmentation line.
Illustratively, a segmentation line is used to indicate a segmentation boundary between the first image region and a second image region other than the segmentation target.
Step 814: The first terminal edits the first image region in response to an editing operation on the target image.
In the image segmentation process, in order to make the obtained image segmentation result more accurate, the image segmentation result can be adjusted according to the image content of the target image.
The editing process of the first image area may refer to the foregoing, and will not be described in detail.
In summary, according to the image segmentation method provided by the embodiment of the application, the text information related to image segmentation is obtained, and the target image is segmented according to the segmentation target indicated by the text information, so that the image segmentation result corresponding to the text information is obtained, the accuracy of image segmentation is improved, and the image segmentation result is matched with the actual requirement.
The following is an embodiment of the device according to the present application, and details of the embodiment of the device that are not described in detail may be combined with corresponding descriptions in the embodiment of the method described above, which are not described herein again.
Fig. 9 schematically shows a block diagram of an image segmentation apparatus. The apparatus includes an acquisition module 920, a response module 940, and a display module 960, where:
an acquisition module 920, configured to acquire a target image to be segmented;
A response module 940 for acquiring text information related to image segmentation in response to an input operation;
the obtaining module 920 is further configured to obtain a segmentation target based on the text information, where the segmentation target is extracted from the text information;
A display module 960, configured to display the target image and a segmentation line, wherein the segmentation line indicates the boundary between a first image region occupied by the segmentation target on the target image and a second image region other than the segmentation target.
In an alternative embodiment, the display module 960 is configured to obtain segmentation information, where the segmentation information is obtained according to semantic analysis performed on the text information, and the segmentation information includes at least one of a type and a visual characteristic of a segmentation target; determining a segmentation target according to the segmentation information.
In an alternative embodiment, the display module 960 is configured to send text information to a server; and receiving the segmentation information sent by the server.
In an alternative embodiment, the display module 960 is configured to obtain a semantic segmentation result of the target image, where the semantic segmentation result is used to indicate different targets on the target image and image areas occupied by the different targets; determining an image area occupied by a target with highest similarity with a segmentation target in a semantic segmentation result as a first image area of the segmentation target on a target image; the target image and the dividing line are displayed.
In an alternative embodiment, the display module 960 is configured to send the target image to the server; and receiving the semantic segmentation result sent by the server.
In an alternative embodiment, the image segmentation apparatus further includes an editing module 980 for editing the first image region in response to an editing operation on the target image.
In an alternative embodiment, the editing module 980 is configured to determine, in response to an editing operation on the target image, a candidate image area corresponding to the editing operation; the range of the first image region is adjusted based on the candidate image region.
In an alternative embodiment, the editing module 980 is configured to remove the candidate image area from the first image area if the trigger position of the editing operation belongs to the first image area; or in case the trigger position of the editing operation does not belong to the first image area, adding the candidate image area to the first image area.
In an optional implementation manner, the editing module 980 is configured to determine, in response to an editing operation on the target image, a first pixel point corresponding to the editing operation; acquiring a clustering score of a first pixel point and a second pixel point in an adjacent area, and determining the pixel points with the clustering score higher than a preset threshold value as the pixel points of the same type; and determining candidate image areas according to the pixel points of the same type.
In an alternative embodiment, the response module 940 is configured to obtain text information related to the image segmentation in response to a text input operation.
In an alternative embodiment, the response module 940 is configured to obtain voice information related to image segmentation in response to a voice input operation; text information is determined from the speech information.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing describes only preferred embodiments of the present application and is not intended to limit the application; the scope of protection of the application is defined by the appended claims.

Claims (13)

1. An image segmentation method, the method comprising:
Acquiring a target image to be segmented;
Acquiring text information related to image segmentation in response to an input operation, wherein the text information is used to describe the content of the operation for performing the image segmentation;
acquiring a segmentation target based on the text information, wherein the segmentation target is extracted from the text information and satisfies at least one piece of extraction information, the extraction information being extracted from the text information;
Acquiring a semantic segmentation result of the target image, wherein the semantic segmentation result is a result obtained based on semantic segmentation of pixel points in the target image, and the semantic segmentation result is used for indicating different targets on the target image and image areas occupied by the different targets;
determining an image area occupied by the target with the highest similarity to the segmentation target in the semantic segmentation result as a first image region of the segmentation target on the target image;
displaying the target image and a segmentation line, wherein the segmentation line indicates the boundary between the first image region and a second image region on the target image other than the segmentation target.
2. The method of claim 1, wherein the obtaining a segmentation target based on the text information comprises:
Obtaining segmentation information, wherein the segmentation information is obtained according to semantic analysis of the text information, and the segmentation information comprises at least one of the type and visual characteristics of the segmentation target;
And determining the segmentation target according to the segmentation information.
3. The method of claim 2, wherein the obtaining segmentation information comprises:
Sending the text information to a server;
And receiving the segmentation information sent by the server.
4. The method of claim 1, wherein the obtaining the semantic segmentation result of the target image comprises:
transmitting the target image to a server;
and receiving the semantic segmentation result sent by the server.
5. The method according to any one of claims 1 to 4, further comprising:
editing the first image region in response to an editing operation on the target image.
6. The method of claim 5, wherein editing the first image region in response to an editing operation on the target image comprises:
Responding to the editing operation on the target image, and determining a candidate image area corresponding to the editing operation;
and adjusting the range of the first image region based on the candidate image area.
7. The method of claim 6, wherein the adjusting the range of the first image region based on the candidate image area comprises:
removing the candidate image area from the first image region in the case that the trigger position of the editing operation belongs to the first image region;
or alternatively
adding the candidate image area to the first image region in the case that the trigger position of the editing operation does not belong to the first image region.
8. The method of claim 6, wherein the determining, in response to an editing operation on the target image, a candidate image region corresponding to the editing operation comprises:
Responding to the editing operation on the target image, and determining a first pixel point corresponding to the editing operation;
Acquiring cluster scores of the first pixel points and second pixel points in the adjacent areas, and determining the pixel points with the cluster scores higher than a preset threshold value as pixel points of the same type;
And determining the candidate image area according to the pixel points of the same type.
9. The method according to any one of claims 1 to 4, wherein the acquiring text information related to image segmentation in response to an input operation includes:
And acquiring the text information related to the image segmentation in response to a text input operation.
10. The method according to any one of claims 1 to 4, wherein the acquiring text information related to image segmentation in response to an input operation includes:
Acquiring voice information related to image segmentation in response to a voice input operation;
And determining the text information according to the voice information.
11. An image segmentation apparatus, the apparatus comprising:
The acquisition module is used for acquiring a target image to be segmented;
A response module, configured to acquire text information related to image segmentation in response to an input operation, wherein the text information is used to describe the content of the operation for performing the image segmentation;
The acquisition module is further configured to acquire a segmentation target based on the text information, wherein the segmentation target is extracted from the text information and satisfies at least one piece of extraction information, the extraction information being extracted from the text information;
The display module is configured to acquire a semantic segmentation result of the target image, wherein the semantic segmentation result is obtained by semantically segmenting the pixel points in the target image and is used to indicate different targets on the target image and the image areas occupied by those targets; determine the image area occupied by the target with the highest similarity to the segmentation target in the semantic segmentation result as a first image region of the segmentation target on the target image; and display the target image and a segmentation line, wherein the segmentation line indicates the boundary between the first image region and a second image region on the target image other than the segmentation target.
12. A computer device comprising a processor and a memory, wherein the memory has stored therein at least one program code that is loaded and executed by the processor to implement the image segmentation method of any one of claims 1-10.
13. A computer readable storage medium having stored therein at least one program code loaded and executed by a processor to implement the image segmentation method of any one of claims 1 to 10.
CN202110520007.6A 2021-05-13 2021-05-13 Image segmentation method, device, equipment and medium Active CN113256650B (en)

Priority Applications (1)

CN202110520007.6A (priority and filing date 2021-05-13): Image segmentation method, device, equipment and medium

Publications (2)

CN113256650A, published 2021-08-13
CN113256650B, granted 2024-06-21

Family ID: 77181636

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant