CN117372447A - Image processing method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117372447A
Authority
CN
China
Prior art keywords
initial
image
position information
detection frame
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210771290.4A
Other languages
Chinese (zh)
Inventor
黄伟
王利鸣
王晓涛
雷磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202210771290.4A
Publication of CN117372447A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/194 Segmentation involving foreground-background segmentation
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10004 Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium. The method includes: acquiring an initial image, where the initial image contains an object to be processed; detecting the object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image; and processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed. In this way, a plurality of initial detection frames can be processed based on the initial position information to obtain a target detection frame, and the image is processed according to the target detection frame, which facilitates accurate segmentation of a plurality of objects to be processed in the image and effectively improves the accuracy of image segmentation processing.

Description

Image processing method, device, electronic equipment and storage medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to an image processing method, an image processing device, electronic equipment and a storage medium.
Background
Portrait segmentation separates the human-body contour from the image background and returns the segmented binary image.
In the related art, a segmentation model is usually trained to predict a portrait mask directly, or portrait segmentation is fused with human-body key-point detection.
However, in complex scenes with small targets, occlusion, or a large number of targets, direct prediction cannot segment the image accurately, so segmentation accuracy drops; moreover, the method that fuses human key-point detection suits only images containing a single portrait and cannot be applied to scenes where an image contains multiple portraits.
Disclosure of Invention
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present disclosure is to provide an image processing method, an apparatus, an electronic device, a storage medium, and a computer program product that can process a plurality of initial detection frames based on initial position information to obtain a target detection frame and process the image according to the target detection frame, so as to segment a plurality of objects to be processed in the image accurately and effectively improve the accuracy of image segmentation processing.
An image processing method provided by an embodiment of the first aspect of the present disclosure includes: acquiring an initial image, where the initial image contains an object to be processed; detecting the object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image; and processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed.
In one embodiment, processing an initial image according to initial position information and an initial detection frame to obtain a target image corresponding to an object to be processed includes: processing the initial detection frame according to the initial position information to obtain a target detection frame; and processing the initial image according to the target detection frame to obtain a target image corresponding to the object to be processed.
In one embodiment, the number of initial detection frames is a plurality; the method for processing the initial detection frame according to the initial position information to obtain the target detection frame comprises the following steps: determining relative position information between at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames; and processing at least two initial detection frames according to the relative position information to obtain a target detection frame.
In one embodiment, processing at least two initial detection frames according to the relative position information to obtain a target detection frame includes: detecting whether the same image area is contained between at least two initial detection frames according to the relative position information to obtain a detection result; and processing at least two initial detection frames according to the detection result to obtain a target detection frame.
In one embodiment, processing the at least two initial detection frames according to the detection result to obtain the target detection frame includes: if the detection result indicates that the at least two initial detection frames contain the same image area, taking the circumscribed detection frame of the at least two initial detection frames as the target detection frame; and if the detection result indicates that the at least two initial detection frames do not contain the same image area, determining center position information of the at least two initial detection frames and determining the target detection frame according to the center position information.
In one embodiment, determining the target detection frame based on the center position information includes: determining stitching position information of the at least two initial detection frames according to the center position information and the initial position information; performing horizontal stitching on the at least two initial detection frames according to the stitching position information to obtain a stitched detection frame; and acquiring a circumscribed detection frame of the stitched detection frame and taking the circumscribed detection frame as the target detection frame.
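The horizontal stitching described above can be sketched as follows. This is an illustrative sketch only, assuming NumPy H x W x C image arrays and axis-aligned (x1, y1, x2, y2) pixel boxes, neither of which is fixed by the disclosure; the returned offsets let segmentation results on the stitched canvas be mapped back to the original boxes.

```python
import numpy as np

def stitch_crops_horizontally(image, boxes):
    """Crop each detection box from the image and stitch the crops
    side by side.  Returns the stitched canvas and, for each box,
    the x-offset of its crop on the canvas, so segmentation results
    can later be restored to original-image coordinates.

    `image` is assumed to be an H x W x C array; `boxes` are
    (x1, y1, x2, y2) pixel boxes (assumed conventions).
    """
    crops = [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]
    height = max(c.shape[0] for c in crops)
    offsets, columns, x = [], [], 0
    for c in crops:
        # Pad shorter crops at the bottom so every column has one height.
        pad = np.zeros((height - c.shape[0], c.shape[1], image.shape[2]),
                       dtype=image.dtype)
        columns.append(np.vstack([c, pad]))
        offsets.append(x)
        x += c.shape[1]
    return np.hstack(columns), offsets
```

The circumscribed frame of the stitched result is then simply the full canvas extent.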
In one embodiment, processing the initial image according to the target detection frame to obtain a target image corresponding to the object to be processed includes: intercepting a local area image selected by a target detection frame from an initial image; and carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
In one embodiment, the segmentation process is performed on the local area image, including: inputting the local area image into a preset image segmentation model to obtain local image semantics of the local area image; and carrying out segmentation processing on the local area image according to the local image semantics.
In one embodiment, the segmentation processing of the local area image according to the local image semantics includes: determining segmentation position information according to the local image semantics and initial position information of an initial detection frame; and carrying out segmentation processing on the local area image according to the segmentation position information.
In one embodiment, determining segmentation location information based on local image semantics and initial location information of an initial detection frame includes: determining local position information of a local area image corresponding to the local image semantics in an initial image; and determining segmentation position information according to the local position information and the initial position information of the initial detection frame.
According to the image processing method provided by the embodiment of the first aspect of the present disclosure, an initial image containing an object to be processed is acquired; the object to be processed is detected in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image; and the initial image is processed according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed. A plurality of initial detection frames can thus be processed based on the initial position information to obtain a target detection frame, and the image is processed according to the target detection frame, which facilitates accurate segmentation of the plurality of objects to be processed in the image and effectively improves the accuracy of image segmentation processing.
An image processing apparatus according to an embodiment of a second aspect of the present disclosure includes: the device comprises an acquisition module, a display module and a display module, wherein the acquisition module is used for acquiring an initial image, and the initial image comprises: an object to be processed; the detection module is used for detecting an object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image; and the processing module is used for processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed.
In one embodiment, a processing module includes: the first processing sub-module is used for processing the initial detection frame according to the initial position information to obtain a target detection frame; and the second processing sub-module is used for processing the initial image according to the target detection frame to obtain a target image corresponding to the object to be processed.
In one embodiment, the number of initial detection frames is a plurality; the first processing sub-module is specifically configured to: determining relative position information between at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames; and processing at least two initial detection frames according to the relative position information to obtain a target detection frame.
In one embodiment, the first processing sub-module is further configured to: detecting whether the same image area is contained between at least two initial detection frames according to the relative position information to obtain a detection result; and processing at least two initial detection frames according to the detection result to obtain a target detection frame.
In one embodiment, the first processing sub-module is further configured to: if the detection result indicates that the at least two initial detection frames contain the same image area, take the circumscribed detection frame of the at least two initial detection frames as the target detection frame; and if the detection result indicates that the at least two initial detection frames do not contain the same image area, determine center position information of the at least two initial detection frames and determine the target detection frame according to the center position information.
In one embodiment, the first processing sub-module is further configured to: determine stitching position information of the at least two initial detection frames according to the center position information and the initial position information; perform horizontal stitching on the at least two initial detection frames according to the stitching position information to obtain a stitched detection frame; and acquire a circumscribed detection frame of the stitched detection frame as the target detection frame.
In one embodiment, the second processing sub-module is specifically configured to: intercepting a local area image selected by a target detection frame from an initial image; and carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
In one embodiment, the second processing sub-module is further configured to: inputting the local area image into a preset image segmentation model to obtain local image semantics of the local area image; and carrying out segmentation processing on the local area image according to the local image semantics.
In one embodiment, the second processing sub-module is further configured to: determining segmentation position information according to the local image semantics and initial position information of an initial detection frame; and carrying out segmentation processing on the local area image according to the segmentation position information.
In one embodiment, the second processing sub-module is further configured to: determining local position information of a local area image corresponding to the local image semantics in an initial image; and determining segmentation position information according to the local position information and the initial position information of the initial detection frame.
The image processing apparatus provided by the embodiment of the second aspect of the present disclosure acquires an initial image containing an object to be processed, detects the object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image, and processes the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed. A plurality of initial detection frames can thus be processed based on the initial position information to obtain a target detection frame, and the image is processed according to the target detection frame, which facilitates accurate segmentation of the plurality of objects to be processed in the image and effectively improves the accuracy of image segmentation processing.
An embodiment of a third aspect of the present disclosure proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the image processing method proposed in the embodiment of the first aspect of the present disclosure.
An embodiment of a fourth aspect of the present disclosure proposes a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements an image processing method as proposed by an embodiment of the first aspect of the present disclosure.
An embodiment of a fifth aspect of the present disclosure proposes a computer program product which, when executed by a processor, performs an image processing method as proposed by an embodiment of the first aspect of the present disclosure.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of an image processing method according to an embodiment of the disclosure;
fig. 2 is a schematic view of an application scenario in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a portrait segmentation flow in an embodiment of the present disclosure;
fig. 4 is a flowchart of an image processing method according to another embodiment of the present disclosure;
FIG. 5 is a flow chart of an image processing method according to another embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a circumscribed detection frame in an embodiment of the present disclosure;
FIG. 7 is a schematic view of detection frame stitching in an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a detection-based image processing flow in an embodiment of the present disclosure;
fig. 9 is a schematic structural view of an image processing apparatus according to an embodiment of the present disclosure;
fig. 10 is a schematic structural view of an image processing apparatus according to another embodiment of the present disclosure;
fig. 11 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present disclosure and are not to be construed as limiting the present disclosure. On the contrary, the embodiments of the disclosure include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the disclosure.
It should be noted that, the main execution body of the image processing method in this embodiment is an image processing apparatus, and the apparatus may be implemented in software and/or hardware, and the apparatus may be configured in an electronic device, and the electronic device may include, but is not limited to, a terminal, a server, and the like.
As shown in fig. 1, the image processing method includes:
s101: acquiring an initial image, wherein the initial image comprises: an object to be processed.
The image to be subjected to the image segmentation process may be referred to as an initial image, where the initial image includes objects to be processed, and the number of the objects to be processed may be plural.
The object to be processed refers to an object to be segmented in the initial image, and the object to be processed may be a portrait in the initial image or may be another object in the image, which is not limited.
In the embodiment of the disclosure, when the initial image is acquired, an image shot by an intelligent device such as a smartphone may be acquired and taken as the initial image, or an image captured by a monitoring device may be acquired and taken as the initial image.
For example, as shown in fig. 2, fig. 2 is a schematic view of an application scenario in an embodiment of the present disclosure, where the embodiment of the present disclosure may be applied to a terminal image processing scenario, for example, an image including a portrait may be obtained as an initial image in a mobile phone photographing and monitoring snapshot scenario, and target detection and image segmentation processing are performed on the initial image to generate masks of all portraits in the initial image.
In other embodiments, when the initial image is acquired, the image acquisition device may be configured on the image processing device in advance, the image acquisition device may be used to acquire the image under the scene as the initial image, or may acquire the image in the public data set as the initial image, or the image processing device may be configured with a data transmission interface, through which the image transmitted by other electronic devices is received as the initial image, or may acquire the initial image in any other possible manner, which is not limited.
S102: detecting an object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image.
The initial detection frame is an object bounding box obtained after target detection processing is performed on the object to be processed in the initial image, and the initial detection frame encloses the object to be processed.
The initial position information refers to data that can describe the relative position of the initial detection frame in the initial image; for example, it can be the upper-left corner coordinate and the lower-right corner coordinate of the initial detection frame in the initial image coordinate system.
In the embodiment of the disclosure, when detecting an object to be processed in an initial image to obtain an initial detection frame corresponding to the object to be processed, a lightweight target detection model may be trained in advance, the size of the initial image may be adjusted to an image size corresponding to the target detection model, for example, the size of the initial image may be adjusted to 320×320, then the initial image with the adjusted size may be input into the target detection model, and the object to be processed in the initial image is subjected to detection frame extraction processing by using the target detection model, so as to obtain a surrounding frame surrounding the object to be processed in the initial image as an initial detection frame, and the upper left corner coordinate and the lower right corner coordinate of the initial detection frame under the initial image coordinate system are saved as initial position information of the initial detection frame in the initial image.
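Because detection runs on a resized copy of the initial image (for example, 320×320), boxes predicted in model coordinates must be mapped back to the initial image coordinate system before being stored as initial position information. A minimal sketch, assuming simple linear rescaling and an (x1, y1, x2, y2) box format (both assumptions, not fixed by the text):

```python
def rescale_box(box, model_size, orig_w, orig_h):
    """Map a box predicted on the square resized detector input back
    to the original image's coordinate system.  `model_size` is the
    square input size (320 per the text); uniform linear rescaling
    per axis is an assumed detail."""
    sx = orig_w / model_size   # horizontal scale factor
    sy = orig_h / model_size   # vertical scale factor
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```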
In other embodiments, edge detection processing may be performed on the object to be processed to determine whether the detection frame predicted by the target detection model completely encloses the object to be processed, if the extracted detection frame may completely enclose the object to be processed, the extracted detection frame is used as an initial detection frame corresponding to the object to be processed, if the extracted detection frame does not completely enclose the object to be processed, expansion processing may be performed on the extracted detection frame, and the length and width of the detection frame may be expanded by 15% according to the length and width percentage of the detection frame, so as to obtain an expanded detection frame as an initial detection frame corresponding to the object to be processed, and an upper left corner coordinate and a lower right corner coordinate of the expanded detection frame under an initial image coordinate system are obtained as initial position information of the initial detection frame in an initial image.
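The expansion step can be sketched as follows. The 15% figure comes from the text; whether it is applied in total or per side is not specified, so this sketch splits it evenly between the two sides and clamps the result to the image bounds (both assumptions).

```python
def expand_box(box, image_w, image_h, ratio=0.15):
    """Expand an (x1, y1, x2, y2) detection box by `ratio` of its
    width and height (split evenly between the two sides, an assumed
    interpretation of the 15% expansion), clamped to image bounds."""
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * ratio / 2.0   # extra width added on each side
    dh = (y2 - y1) * ratio / 2.0   # extra height added on each side
    return (max(0.0, x1 - dw), max(0.0, y1 - dh),
            min(float(image_w), x2 + dw), min(float(image_h), y2 + dh))
```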
S103: and processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed.
The target image is an image obtained by performing image segmentation processing on an object to be processed in the initial image, and the target image may be, for example, a binary image segmentation mask obtained by performing image semantic segmentation on a portrait in the initial image.
In the embodiment of the disclosure, after detecting the object to be processed in the initial image to obtain the initial detection frame corresponding to the object to be processed and the initial position information of the initial detection frame in the initial image, the initial image may be processed according to the initial position information and the initial detection frame to obtain the target image corresponding to the object to be processed.
In the embodiment of the disclosure, when an initial image is processed according to initial position information and an initial detection frame to obtain a target image corresponding to an object to be processed, an object image area of the object to be processed in the initial image may be obtained according to the initial position information and the initial detection frame, image semantic segmentation processing may be performed on the object image area, the object image area may be input into an image segmentation processing model to perform image semantic segmentation processing, so as to obtain an output result of the image segmentation processing model after the segmentation processing is performed on the object to be processed, and the position of the object to be processed in the initial image is restored according to the initial position information, and the processed image is used as the target image corresponding to the object to be processed.
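The crop-segment-restore step described above can be sketched as follows, with `segment_fn` standing in for the unspecified image segmentation processing model (a hypothetical placeholder, not an API named by the disclosure):

```python
import numpy as np

def segment_in_box(image, box, segment_fn):
    """Run segmentation on the region selected by `box` and restore
    the result to full-image coordinates.  `segment_fn` is a
    placeholder for the image segmentation processing model: it takes
    an H x W x C crop and returns an H x W mask for that crop."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    local_mask = segment_fn(crop)
    # Restore the local mask to its position in the initial image.
    full_mask = np.zeros(image.shape[:2], dtype=local_mask.dtype)
    full_mask[y1:y2, x1:x2] = local_mask
    return full_mask
```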
Optionally, in some embodiments, the initial image is processed according to the initial position information and the initial detection frame to obtain the target image corresponding to the object to be processed, the initial detection frame may be processed according to the initial position information to obtain the target detection frame, and the initial image is processed according to the target detection frame to obtain the target image corresponding to the object to be processed.
The target detection frame is obtained by performing frame selection and splicing processing on the initial detection frame according to the initial position information, and a plurality of objects to be processed can be enclosed in the target detection frame.
In this embodiment of the present disclosure, the number of initial detection frames may be single or multiple. When the initial image is processed according to the initial position information and the initial detection frames to obtain the target image corresponding to the object to be processed, and the target detection model detects that the initial image includes multiple initial detection frames, the multiple initial detection frames may be processed according to the initial position information to obtain the target detection frame. Specifically, whether an intersection relationship exists between the initial detection frames may be analyzed according to the initial position information; the minimum circumscribed frame of the initial detection frames having an intersection relationship is obtained, the initial detection frames having no intersection relationship may be subjected to a stitching process, or a combination of the two strategies may be applied to the multiple detection frames, and the processed detection frame is used as the target detection frame. Then the initial image may be processed according to the target detection frame to obtain the target image corresponding to the object to be processed: the portion of the initial image in the region corresponding to the target detection frame may be input into the image segmentation processing model for image semantic segmentation, and the segmented image is obtained as the target image.
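Both merging strategies, taking the minimum circumscribed frame of intersecting boxes and stitching non-intersecting ones, rest on an overlap test and a circumscribed-box computation, which can be sketched as follows (assuming axis-aligned (x1, y1, x2, y2) boxes, a convention not fixed by the text):

```python
def boxes_intersect(a, b):
    """True if two axis-aligned (x1, y1, x2, y2) boxes share any
    image area (i.e., an intersection relationship exists)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def circumscribed_box(boxes):
    """Minimum axis-aligned box enclosing all of the given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))
```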
For example, as shown in fig. 3, fig. 3 is a schematic diagram of a portrait segmentation flow in an embodiment of the present disclosure, taking the object to be processed in the initial image as a portrait. The initial image may be input into the target detection model to obtain all portrait detection frames, and each portrait detection frame may be expanded by a fixed ratio based on its outer frame size to obtain the initial detection frames. The initial detection frames may then be automatically cropped and stitched with a combined stitching policy according to their number, size, and relative distance to obtain the target detection frame, and the initial image may be cropped according to the target detection frame to obtain the partial image corresponding to the target detection frame. Next, the size of the cropped image may be adjusted to the input image size of the image segmentation processing model, and the cropped image may be input into the model for image segmentation processing. Finally, based on the stitching policy used when obtaining the target detection frame, all portrait segmentation results may be restored onto the initial image, and the segmented portrait mask may be used as the target image.
In other embodiments, when the target detection model detects that the initial image includes an initial detection frame, the object image area corresponding to the initial detection frame may be directly cut out from the initial image according to the initial position information of the initial detection frame, and the object image area is input into the image segmentation processing model to perform image semantic segmentation processing, so as to obtain an output result of the image segmentation processing model after the segmentation processing is performed on the object to be processed, and the position of the object to be processed in the initial image is restored according to the initial position information, and the processed image is used as the target image corresponding to the object to be processed.
In this embodiment, an initial image is acquired, where the initial image includes: the method comprises the steps of detecting an object to be processed in an initial image to obtain an initial detection frame corresponding to the object to be processed, processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed, processing a plurality of initial detection frames based on the initial position information to obtain a target detection frame, processing the image according to the target detection frame, and facilitating accurate segmentation of the plurality of objects to be processed in the image, and effectively improving accuracy of image segmentation processing.
Fig. 4 is a flowchart of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 4, the image processing method includes:
S401: acquiring an initial image, wherein the initial image comprises: an object to be processed.
S402: detecting an object to be processed in the initial image to obtain initial detection frames corresponding to the object to be processed and initial position information of the initial detection frames in the initial image, wherein the number of the initial detection frames is multiple.
For the descriptions of S401 to S402, reference may be made to the above embodiments, and details are not repeated herein.
S403: and determining relative position information between the at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames.
The relative position information refers to data information that may be used to describe the relative position relationship between at least two initial detection frames. The relative position relationship may be, for example, a relative distance between the at least two initial detection frames, or may describe whether an intersection relationship exists between the at least two initial detection frames, and the relative position information may be determined from the initial position information of the at least two initial detection frames.
In the embodiment of the disclosure, when determining the relative position information between at least two initial detection frames, the relative distance between the at least two initial detection frames may be calculated according to the corner coordinates of the initial detection frames contained in the initial position information, and the relative distance is taken as the relative position information between the at least two initial detection frames.
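The relative-distance computation might look like the following sketch; the corner-based Euclidean distance and the `(x1, y1, x2, y2)` box format are assumptions chosen for illustration, not the patent's exact formula:

```python
import math

def relative_distance(box_a, box_b):
    """Euclidean distance between the upper-left corners of two
    (x1, y1, x2, y2) detection boxes, usable as relative position
    information between two initial detection frames."""
    return math.hypot(box_a[0] - box_b[0], box_a[1] - box_b[1])
```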
S404: and processing at least two initial detection frames according to the relative position information to obtain a target detection frame.
After determining the relative position information between the at least two initial detection frames, the embodiment of the disclosure may process the at least two initial detection frames according to the relative position information to obtain the target detection frame.
In the embodiment of the disclosure, when the at least two initial detection frames are processed according to the relative position information, whether an intersection relationship exists between the at least two initial detection frames may be judged according to the relative positions in the relative position information. The minimum circumscribed frame of the initial detection frames having an intersection relationship may be obtained, the detection frames without an intersection relationship may be subjected to splicing processing, or a plurality of detection frames may be processed by a combination of the two strategies, so as to obtain a processed detection frame, and the processed detection frame is used as the target detection frame.
Optionally, in some embodiments, when processing at least two initial detection frames according to the relative position information to obtain the target detection frame, whether the at least two initial detection frames contain the same image area or not may be detected according to the relative position information to obtain a detection result, and the at least two initial detection frames are processed according to the detection result to obtain the target detection frame.
The same image area refers to an overlapped image area of the initial image area surrounded by at least two initial detection frames.
In the embodiment of the disclosure, when processing at least two initial detection frames according to the relative position information to obtain the target detection frame, whether the at least two initial detection frames contain the same image area may be judged according to the relative distance between the at least two initial detection frames in the relative position information, so as to obtain a detection result. The at least two initial detection frames may then be processed according to the detection result: if the detection result indicates that the at least two initial detection frames contain the same image area, the minimum circumscribed detection frame of the at least two initial detection frames is obtained as the target detection frame; if the detection result indicates that the at least two initial detection frames do not contain the same image area, the at least two initial detection frames are subjected to splicing processing, and the detection frame after the splicing processing is used as the target detection frame.
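Under the assumption that each initial detection frame is an `(x1, y1, x2, y2)` tuple, the overlap test and the two processing strategies described above can be sketched as:

```python
def boxes_overlap(a, b):
    """True when two (x1, y1, x2, y2) boxes contain the same image area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def enclosing_box(a, b):
    """Minimum circumscribed box covering both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def target_box_strategy(a, b):
    """Choose between the two strategies above: take the minimum
    circumscribed box of overlapping frames, otherwise mark the
    pair for splicing. (Illustrative; names are hypothetical.)"""
    if boxes_overlap(a, b):
        return ("enclose", enclosing_box(a, b))
    return ("stitch", (a, b))
```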
S405: and intercepting the local area image selected by the target detection frame from the initial image.
The local area image refers to a partial area image selected by a target detection frame in the initial image.
According to the embodiment of the disclosure, after the at least two initial detection frames are processed according to the relative position information to obtain the target detection frame, the partial area image selected by the target detection frame may be cut out from the initial image according to the position data and the size data of the target detection frame, and the cut-out image is used as the local area image selected by the target detection frame.
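Assuming the initial image is a NumPy array indexed as `image[row, column]` and the box is an `(x1, y1, x2, y2)` pixel rectangle, the interception of the local area image can be sketched as:

```python
import numpy as np

def crop_local_region(image, box):
    """Cut out the local area image selected by a (x1, y1, x2, y2)
    target detection frame from the initial image."""
    x1, y1, x2, y2 = box
    return image[y1:y2, x1:x2]
```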
S406: and carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
In the embodiment of the disclosure, when the local area image is segmented, the local area image may be input into the image segmentation processing model, the image segmentation processing model is utilized to perform semantic segmentation processing on the object to be processed included in the local area image, so as to obtain an object segmentation mask image of the object to be processed, which is output by the image segmentation processing model, and the object segmentation mask image is used as a target image corresponding to the object to be processed.
Optionally, in some embodiments, when the local area image is subjected to segmentation processing, the local area image may be input into a preset image segmentation model to obtain local image semantics of the local area image, and the local area image is segmented according to the local image semantics, so that the receptive field of the object to be processed is effectively increased, the accuracy in segmenting the image to be processed is effectively improved, and the image segmentation processing effect is improved.
In the embodiment of the disclosure, when the local area image is segmented, the size of the local area image may be adjusted according to the image input size of the preset image segmentation model, and the adjusted local area image may be input into the preset image segmentation model, which performs semantic segmentation processing on the object to be processed included in the local area image to obtain the local image semantics of the local area image. The local area image may then be segmented according to the local image semantics, and the pixel points in the local area image may be labeled according to the local image semantics to obtain a binary mask image of the object to be processed as the target image.
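A hedged sketch of this resize-segment-binarize step is shown below. `resize_nearest` is a hypothetical stand-in for real preprocessing, and `model` is a placeholder callable returning per-pixel foreground scores rather than the actual preset image segmentation model:

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize to (height, width); a stand-in for
    the library resize a real pipeline would use."""
    th, tw = size
    h, w = img.shape[:2]
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return img[rows][:, cols]

def segment_region(region, model, model_input_size, threshold=0.5):
    """Resize the local area image to the model input size, run the
    placeholder segmentation model, and binarize the scores back at
    the original resolution to get a mask image."""
    h, w = region.shape[:2]
    scores = model(resize_nearest(region, model_input_size))
    return (resize_nearest(scores, (h, w)) > threshold).astype(np.uint8)
```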
In this embodiment, an initial image including an object to be processed is acquired, the object to be processed in the initial image is detected to obtain initial detection frames corresponding to the object to be processed and initial position information of the initial detection frames, and the initial image is processed according to the initial position information and the initial detection frames to obtain a target image corresponding to the object to be processed. Since the initial detection frames are processed based on the initial position information to obtain target detection frames, and the image is processed according to the target detection frames, accurate segmentation of the plurality of objects to be processed in the image is facilitated, and the accuracy of the image segmentation processing is effectively improved. When obtaining the target detection frame, whether the same image area is contained between the at least two initial detection frames is detected according to the relative position information to obtain a detection result, and the at least two initial detection frames are processed according to the detection result, so that initial detection frames with different position relations can be processed respectively, the integrity of the object to be processed in the target detection frame is ensured, and the accuracy of the segmentation processing is effectively improved. In addition, the local area image is input into the preset image segmentation model to obtain the local image semantics, and the local area image is segmented according to the local image semantics, so that the receptive field of the object to be processed is effectively increased, the accuracy in segmenting the image to be processed is effectively improved, and the image segmentation processing effect is improved.
Fig. 5 is a flowchart of an image processing method according to another embodiment of the present disclosure.
As shown in fig. 5, the image processing method includes:
S501: acquiring an initial image, wherein the initial image comprises: an object to be processed.
S502: detecting an object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image.
S503: and determining relative position information between the at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames.
S504: and detecting whether the same image area is contained between at least two initial detection frames according to the relative position information, and obtaining a detection result.
For the descriptions of S501 to S504, reference may be made to the above embodiments, and details are not repeated herein.
S505: and if the detection result indicates that the same image area is contained between the at least two initial detection frames, taking the circumscribed detection frames of the at least two initial detection frames as target detection frames.
The external detection frame is a minimum detection frame which can surround at least two initial detection frames.
In the embodiment of the disclosure, whether the same image area is included between at least two initial detection frames can be detected according to the relative position information, so as to obtain a detection result, if the detection result indicates that the same image area is included between the at least two initial detection frames, a minimum detection frame which can surround the at least two initial detection frames is obtained, and the minimum detection frame is used as an external detection frame of the at least two initial detection frames.
As shown in fig. 6, fig. 6 is a schematic diagram of a circumscribed detection frame in the embodiment of the present disclosure, where 104 represents an initial image, and detection frames 101 and 102 represent the initial detection frames surrounding objects to be processed obtained after target detection processing is performed on the initial image. Whether the initial detection frames 101 and 102 contain the same image area may be detected according to the relative position information; if the detection result indicates that the initial detection frames 101 and 102 contain the same image area, that is, the two initial detection frames intersect, the smallest circumscribed frame surrounding all the initial detection frames is directly acquired as the target detection frame 103, and the initial image area selected by the target detection frame may then be cropped to obtain the local area image 105.
S506: if the detection result indicates that the same image area is not contained between the at least two initial detection frames, center position information of the at least two initial detection frames is determined, and a target detection frame is determined according to the center position information.
The central position information may be central coordinate information of the initial detection frame.
In the embodiment of the disclosure, whether the same image area is contained between the at least two initial detection frames may be detected according to the relative position information to obtain a detection result. If the detection result indicates that the same image area is not contained between the at least two initial detection frames, the center coordinate information of the at least two initial detection frames may be respectively determined and used as the center position information of the at least two initial detection frames. The target detection frame may then be determined according to the center position information: the at least two initial detection frames may be aligned and spliced according to the center position information, and the detection frame after the splicing processing is used as the target detection frame.
Optionally, in some embodiments, when the target detection frame is determined according to the center position information, the splicing position information of the at least two initial detection frames may be determined according to the center position information and the initial position information, the at least two initial detection frames are transversely spliced according to the splicing position information to obtain a spliced detection frame, and the circumscribed detection frame of the spliced detection frame is obtained and used as the target detection frame. In this way, the plurality of initial detection frames may be spliced in order from left to right according to the splicing position information, which facilitates subsequent restoration of the positions of the initial detection frames and effectively improves image processing efficiency.
The splicing position information refers to arrangement sequence information of the initial detection frames when at least two initial detection frames are spliced.
In the embodiment of the disclosure, when the target detection frame is determined according to the central position information, the left-right arrangement sequence information between at least two initial detection frames can be determined according to the central position information and the initial position information, the left-right arrangement sequence information is used as splicing position information, then the at least two initial detection frames can be subjected to transverse splicing processing according to the splicing position information, so as to obtain a spliced detection frame, then the minimum external frame of the spliced detection frame is obtained as an external detection frame, and then the external detection frame is used as the target detection frame.
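The left-to-right center ordering, transverse splicing, and padding to a common height can be sketched as follows; the zero-padding choice and the single-channel image are illustrative assumptions, not details given by the disclosure:

```python
import numpy as np

def stitch_crops(image, boxes):
    """Crop each (x1, y1, x2, y2) box, order the crops left-to-right
    by box centre x (the splicing position information), pad them to
    a common height, and splice horizontally. Returns the spliced
    image and the ordering, which later allows restoring positions."""
    order = sorted(range(len(boxes)),
                   key=lambda i: (boxes[i][0] + boxes[i][2]) / 2)
    crops = [image[b[1]:b[3], b[0]:b[2]] for b in (boxes[i] for i in order)]
    h = max(c.shape[0] for c in crops)
    padded = [np.pad(c, ((0, h - c.shape[0]), (0, 0))) for c in crops]
    return np.concatenate(padded, axis=1), order
```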
For example, as shown in fig. 7, fig. 7 is a schematic view of stitching the detection frames in the embodiment of the present disclosure, where the image 205 is an initial image and the detection frames 203 and 204 represent two non-intersecting initial detection frames. Whether the initial detection frames 203 and 204 contain the same image area may be detected according to the relative position information; if the detection result indicates that the initial detection frames 203 and 204 do not contain the same image area, center alignment and stitching are performed on the initial detection frames 203 and 204 in order from left to right, the minimum circumscribed detection frame of the detection frames after the stitching processing is obtained as the target detection frame, and the local area image 206 is obtained by cropping the initial image area selected by the target detection frame.
S507: and intercepting the local area image selected by the target detection frame from the initial image.
S508: and inputting the local area image into a preset image segmentation model to obtain the local image semantics of the local area image.
For the descriptions of S507 to S508, reference may be made to the above embodiments, and details are not repeated herein.
S509: and determining segmentation position information according to the local image semantics and the initial position information of the initial detection frame.
The segmentation position information refers to segmentation edge information when an object to be processed in an initial image is segmented.
In the embodiment of the disclosure, after the local image semantics of the local area image are obtained by inputting the local area image into the preset image segmentation model, the segmentation position information may be determined according to the local image semantics and the initial position information of the initial detection frame.
In the embodiment of the disclosure, when determining the segmentation position information according to the local image semantics and the initial position information of the initial detection frame, the object edge position information of the object to be processed, which is surrounded by the initial detection frame, in the initial image may be positioned according to the initial position information of the initial detection frame and the local image semantics, and the object edge position information is used as the segmentation position information.
Optionally, in some embodiments, when determining the segmentation position information according to the local image semantics and the initial position information of the initial detection frame, local position information of the local area image corresponding to the local image semantics in the initial image may be determined, and the segmentation position information may be determined according to the local position information and the initial position information of the initial detection frame.
In the embodiment of the disclosure, when determining the segmentation position information according to the local image semantics and the initial position information of the initial detection frame, the local position information of the local area image corresponding to the local image semantics in the initial image may be determined, and the object edge position information of the object to be processed in the initial image is determined according to the local position information and the initial position information of the initial detection frame, and is used as the segmentation position information.
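Restoring a local segmentation result to its position in the initial image can be sketched as follows (assuming a single-channel mask and the `(x1, y1, x2, y2)` box format; the function name is hypothetical):

```python
import numpy as np

def restore_to_initial(initial_shape, local_mask, box):
    """Place a local segmentation mask back at the position the
    (x1, y1, x2, y2) detection frame occupied in the initial image,
    leaving the rest of the mask as background (zero)."""
    x1, y1, x2, y2 = box
    full = np.zeros(initial_shape, dtype=local_mask.dtype)
    full[y1:y2, x1:x2] = local_mask
    return full
```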
S510: and according to the segmentation position information, carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
In the embodiment of the disclosure, after determining the segmentation position information according to the local image semantics and the initial position information of the initial detection frame, the local area image may be subjected to segmentation processing according to the segmentation position information, and pixels of the local image area may be subjected to labeling processing according to the segmentation position information, so as to obtain a binarized object mask image, and the object mask image is used as a target image corresponding to an object to be processed.
For example, as shown in fig. 8, fig. 8 is a schematic diagram of a detection-based image processing flow in the embodiment of the present disclosure. First, whether to perform object detection on the initial image may be selected. If the initial image does not pass through the target detection stage, the initial image may be directly input into the image segmentation model for semantic segmentation to obtain the target image corresponding to the object to be processed in the initial image. If the initial image passes through the target detection stage, object detection is first performed on the initial image to extract a plurality of initial detection frames, then one of a maximum crop frame policy and a crop stitching policy may be automatically selected to process the initial detection frames to obtain the target detection frame, and image segmentation processing may be performed on the initial image according to the target detection frame to generate the corresponding target image.
In this embodiment, an initial image including an object to be processed is acquired, the object to be processed in the initial image is detected to obtain initial detection frames corresponding to the object to be processed and initial position information of the initial detection frames, and the initial image is processed according to the initial position information and the initial detection frames to obtain a target image corresponding to the object to be processed. Since the initial detection frames are processed based on the initial position information to obtain the target detection frames, and the image is processed according to the target detection frames, accurate segmentation of the plurality of objects to be processed in the image is facilitated, and the accuracy of the image segmentation processing is effectively improved. Furthermore, the splicing position information of the at least two initial detection frames is determined according to the center position information and the initial position information, the at least two initial detection frames are transversely spliced according to the splicing position information to obtain a spliced detection frame, and the circumscribed detection frame of the spliced detection frame is obtained and used as the target detection frame, so that the plurality of initial detection frames may be spliced in order from left to right according to the splicing position information, which facilitates subsequent restoration of the positions of the initial detection frames and effectively improves image processing efficiency.
Fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the image processing apparatus 90 includes:
an acquiring module 901, configured to acquire an initial image, where the initial image includes: an object to be processed;
the detection module 902 is configured to detect an object to be processed in the initial image, so as to obtain an initial detection frame corresponding to the object to be processed, and initial position information of the initial detection frame in the initial image;
the processing module 903 is configured to process the initial image according to the initial position information and the initial detection frame, and obtain a target image corresponding to the object to be processed.
In some embodiments of the present disclosure, as shown in fig. 10, fig. 10 is a schematic structural diagram of an image processing apparatus according to another embodiment of the present disclosure, where the processing module 903 includes:
the first processing sub-module 9031 is configured to process the initial detection frame according to the initial position information, so as to obtain a target detection frame; and
the second processing sub-module 9032 is configured to process the initial image according to the target detection frame, and obtain a target image corresponding to the object to be processed.
In some embodiments of the present disclosure, the number of initial detection frames is a plurality;
The first processing sub-module 9031 is specifically configured to:
determining relative position information between at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames;
and processing at least two initial detection frames according to the relative position information to obtain a target detection frame.
In some embodiments of the present disclosure, wherein the first processing sub-module 9031 is further configured to:
detecting whether the same image area is contained between at least two initial detection frames according to the relative position information to obtain a detection result;
and processing at least two initial detection frames according to the detection result to obtain a target detection frame.
In some embodiments of the present disclosure, wherein the first processing sub-module 9031 is further configured to:
if the detection result indicates that the same image area is contained between the at least two initial detection frames, taking the external detection frames of the at least two initial detection frames as target detection frames;
if the detection result indicates that the same image area is not contained between the at least two initial detection frames, center position information of the at least two initial detection frames is determined, and a target detection frame is determined according to the center position information.
In some embodiments of the present disclosure, wherein the first processing sub-module 9031 is further configured to:
Determining splicing position information of at least two initial detection frames according to the central position information and the initial position information;
performing transverse splicing treatment on at least two initial detection frames according to the splicing position information to obtain spliced detection frames;
and acquiring an external detection frame of the spliced detection frame, and taking the external detection frame as a target detection frame.
In some embodiments of the present disclosure, the second processing sub-module 9032 is specifically configured to:
intercepting a local area image selected by a target detection frame from an initial image;
and carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
In some embodiments of the present disclosure, wherein the second processing sub-module 9032 is further configured to:
inputting the local area image into a preset image segmentation model to obtain local image semantics of the local area image;
and carrying out segmentation processing on the local area image according to the local image semantics.
In some embodiments of the present disclosure, wherein the second processing sub-module 9032 is further configured to:
determining segmentation position information according to the local image semantics and initial position information of an initial detection frame;
and carrying out segmentation processing on the local area image according to the segmentation position information.
In some embodiments of the present disclosure, wherein the second processing sub-module 9032 is further configured to:
determining local position information of a local area image corresponding to the local image semantics in an initial image;
and determining segmentation position information according to the local position information and the initial position information of the initial detection frame.
Corresponding to the image processing method provided by the embodiments of fig. 1 to 5, the present disclosure also provides an image processing apparatus, and since the image processing apparatus provided by the embodiments of the present disclosure corresponds to the image processing method provided by the embodiments of fig. 1 to 5, the implementation of the image processing method is also applicable to the image processing apparatus provided by the embodiments of the present disclosure, and will not be described in detail in the embodiments of the present disclosure.
In this embodiment, an initial image including an object to be processed is acquired, the object to be processed in the initial image is detected to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image, and the initial image is processed according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed. Since the plurality of initial detection frames are processed based on the initial position information to obtain a target detection frame, and the image is processed according to the target detection frame, accurate segmentation of the plurality of objects to be processed in the image is facilitated, and the accuracy of the image segmentation processing is effectively improved.
In order to achieve the above embodiments, the present disclosure further proposes an electronic device including: the image processing device comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the image processing method according to the previous embodiment of the disclosure when executing the program.
In order to implement the above-described embodiments, the present disclosure also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an image processing method as proposed in the foregoing embodiments of the present disclosure.
In order to implement the above-described embodiments, the present disclosure also proposes a computer program product which, when executed by an instruction processor in the computer program product, performs an image processing method as proposed by the foregoing embodiments of the present disclosure.
Fig. 11 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure. The electronic device 12 shown in fig. 11 is merely an example and should not be construed to limit the functionality and scope of use of embodiments of the present disclosure in any way.
As shown in fig. 11, the electronic device 12 is in the form of a general purpose computing device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include an Industry Standard Architecture (hereinafter: ISA) bus, a Micro Channel Architecture (hereinafter: MCA) bus, an Enhanced ISA bus, a Video Electronics Standards Association (hereinafter: VESA) local bus, and a Peripheral Component Interconnect (hereinafter: PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter: RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 11, commonly referred to as a "hard disk drive").
Although not shown in fig. 11, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a compact disc read-only memory (Compact Disc Read Only Memory; hereinafter CD-ROM), a digital versatile disc read-only memory (Digital Versatile Disc Read Only Memory; hereinafter DVD-ROM), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods in the embodiments described in this disclosure.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a local area network (Local Area Network; hereinafter: LAN), a wide area network (Wide Area Network; hereinafter: WAN) and/or a public network, such as the Internet, via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, to implement the image processing method mentioned in the foregoing embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
It should be noted that in the description of the present disclosure, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logical functions or steps of the process. Further implementations are included within the scope of the preferred embodiments of the present disclosure, in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, the steps may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the methods of the above-described embodiments may be implemented by a program instructing related hardware, where the program may be stored in a computer-readable storage medium and, when executed, includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present disclosure may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "example," "specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present disclosure, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present disclosure.

Claims (22)

1. An image processing method, comprising:
acquiring an initial image, wherein the initial image comprises an object to be processed;
detecting the object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image;
and processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed.
2. The method according to claim 1, wherein the processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed includes:
processing the initial detection frame according to the initial position information to obtain a target detection frame; and
and processing the initial image according to the target detection frame to obtain a target image corresponding to the object to be processed.
3. The method of claim 2, wherein the number of initial detection frames is a plurality;
the processing the initial detection frame according to the initial position information to obtain a target detection frame includes:
determining relative position information between at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames;
and processing the at least two initial detection frames according to the relative position information to obtain the target detection frame.
4. The method of claim 3, wherein said processing said at least two initial detection frames based on said relative position information to obtain said target detection frame comprises:
detecting whether the same image area is contained between the at least two initial detection frames according to the relative position information to obtain a detection result;
and processing the at least two initial detection frames according to the detection result to obtain the target detection frame.
5. The method of claim 4, wherein processing the at least two initial detection frames according to the detection result to obtain the target detection frame comprises:
if the detection result indicates that the same image area is contained between the at least two initial detection frames, taking an external detection frame of the at least two initial detection frames as the target detection frame;
and if the detection result indicates that the same image area is not contained between the at least two initial detection frames, determining center position information of the at least two initial detection frames, and determining the target detection frame according to the center position information.
6. The method of claim 5, wherein said determining the target detection box from the center position information comprises:
determining splicing position information of the at least two initial detection frames according to the central position information and the initial position information;
performing transverse splicing processing on the at least two initial detection frames according to the splicing position information to obtain a spliced detection frame;
and acquiring an external detection frame of the spliced detection frame, and taking the external detection frame as the target detection frame.
7. The method according to claim 2, wherein the processing the initial image according to the target detection frame to obtain a target image corresponding to the object to be processed includes:
intercepting a local area image selected by the target detection frame from the initial image;
and carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
8. The method of claim 7, wherein the segmenting the local area image comprises:
inputting the local area image into a preset image segmentation model to obtain local image semantics of the local area image;
and carrying out segmentation processing on the local area image according to the local image semantics.
9. The method of claim 8, wherein the segmenting the local region image according to the local image semantics comprises:
determining segmentation position information according to the local image semantics and the initial position information of the initial detection frame;
and carrying out segmentation processing on the local area image according to the segmentation position information.
10. The method of claim 9, wherein determining segmentation location information based on the local image semantics and initial location information of the initial detection frame comprises:
determining local position information of the local area image corresponding to the local image semantics in the initial image;
and determining segmentation position information according to the local position information and the initial position information of the initial detection frame.
11. An image processing apparatus, comprising:
an acquisition module for acquiring an initial image, wherein the initial image comprises an object to be processed;
a detection module for detecting the object to be processed in the initial image to obtain an initial detection frame corresponding to the object to be processed and initial position information of the initial detection frame in the initial image;
and a processing module for processing the initial image according to the initial position information and the initial detection frame to obtain a target image corresponding to the object to be processed.
12. The apparatus of claim 11, wherein the processing module comprises:
the first processing sub-module is used for processing the initial detection frame according to the initial position information to obtain a target detection frame; and
and the second processing sub-module is used for processing the initial image according to the target detection frame to obtain a target image corresponding to the object to be processed.
13. The apparatus of claim 12, wherein the number of initial detection frames is a plurality;
the first processing sub-module is specifically configured to:
determining relative position information between at least two initial detection frames, wherein the relative position information is determined by the initial position information of the at least two initial detection frames;
and processing the at least two initial detection frames according to the relative position information to obtain the target detection frame.
14. The apparatus of claim 13, wherein the first processing sub-module is further to:
detecting whether the same image area is contained between the at least two initial detection frames according to the relative position information to obtain a detection result;
and processing the at least two initial detection frames according to the detection result to obtain the target detection frame.
15. The apparatus of claim 14, wherein the first processing sub-module is further to:
if the detection result indicates that the same image area is contained between the at least two initial detection frames, taking an external detection frame of the at least two initial detection frames as the target detection frame;
and if the detection result indicates that the same image area is not contained between the at least two initial detection frames, determining center position information of the at least two initial detection frames, and determining the target detection frame according to the center position information.
16. The apparatus of claim 15, wherein the first processing sub-module is further to:
determining splicing position information of the at least two initial detection frames according to the central position information and the initial position information;
performing transverse splicing processing on the at least two initial detection frames according to the splicing position information to obtain a spliced detection frame;
and acquiring an external detection frame of the spliced detection frame, and taking the external detection frame as the target detection frame.
17. The apparatus of claim 12, wherein the second processing sub-module is specifically configured to:
intercepting a local area image selected by the target detection frame from the initial image;
and carrying out segmentation processing on the local area image, and taking the segmented image as a target image corresponding to the object to be processed.
18. The apparatus of claim 17, wherein the second processing sub-module is further to:
inputting the local area image into a preset image segmentation model to obtain local image semantics of the local area image;
and carrying out segmentation processing on the local area image according to the local image semantics.
19. The apparatus of claim 18, wherein the second processing sub-module is further to:
determining segmentation position information according to the local image semantics and the initial position information of the initial detection frame;
and carrying out segmentation processing on the local area image according to the segmentation position information.
20. The apparatus of claim 19, wherein the second processing sub-module is further to:
determining local position information of the local area image corresponding to the local image semantics in the initial image;
and determining segmentation position information according to the local position information and the initial position information of the initial detection frame.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the image processing method of any one of claims 1-10.
22. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are for causing the computer to perform the image processing method of any one of claims 1-10.
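For readers of the translated claims, the detection-frame merging logic of claims 3 through 6 and the crop step of claim 7 can be outlined in code. This is a minimal illustrative sketch only, not the patented implementation: the `Box` class and the `overlaps`, `external_box`, `merge_boxes`, and `crop` helpers are hypothetical names, and the transverse splicing of claim 6 is simplified here to ordering the non-overlapping frames by center position and taking their external (circumscribed) detection frame, rather than re-compositing a spliced image.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """An axis-aligned detection frame: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self) -> Tuple[float, float]:
        # Center position information of the detection frame (claims 5-6).
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def overlaps(a: Box, b: Box) -> bool:
    # Claim 4: detect whether two initial frames contain the same image area.
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def external_box(boxes: List[Box]) -> Box:
    # The "external detection frame": the smallest box enclosing all inputs.
    return Box(min(b.x1 for b in boxes), min(b.y1 for b in boxes),
               max(b.x2 for b in boxes), max(b.y2 for b in boxes))


def merge_boxes(a: Box, b: Box) -> Box:
    # Claim 5: overlapping frames collapse to their external detection frame.
    if overlaps(a, b):
        return external_box([a, b])
    # Claim 6 (simplified): order non-overlapping frames by center position
    # before splicing; the spliced result is represented by the external
    # frame of the ordered pair instead of a re-composited image.
    left, right = sorted([a, b], key=lambda box: box.center[0])
    return external_box([left, right])


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    # Claim 7: intercept the local area image selected by the target frame
    # (the image is modeled as a nested list of pixel values for illustration).
    return [row[int(box.x1):int(box.x2)] for row in image[int(box.y1):int(box.y2)]]
```

In a full pipeline, the cropped local area image would then be fed to the segmentation model of claims 8 through 10; restricting segmentation to the merged target frame is what lets the method avoid segmenting the entire initial image.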
CN202210771290.4A 2022-06-30 2022-06-30 Image processing method, device, electronic equipment and storage medium Pending CN117372447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210771290.4A CN117372447A (en) 2022-06-30 2022-06-30 Image processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210771290.4A CN117372447A (en) 2022-06-30 2022-06-30 Image processing method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117372447A true CN117372447A (en) 2024-01-09

Family

ID=89391608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210771290.4A Pending CN117372447A (en) 2022-06-30 2022-06-30 Image processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117372447A (en)

Similar Documents

Publication Publication Date Title
CN110569699B (en) Method and device for carrying out target sampling on picture
US11423633B2 (en) Image processing to detect a rectangular object
US20200118254A1 (en) Video inpainting via confidence-weighted motion estimation
CN110136153B (en) Image processing method, device and storage medium
CN110675407B (en) Image instance segmentation method and device, electronic equipment and storage medium
CN110147465A (en) Image processing method, device, equipment and medium
CN111553923B (en) Image processing method, electronic equipment and computer readable storage medium
JP2022549728A (en) Target detection method and device, electronic device, and storage medium
CN111382647B (en) Picture processing method, device, equipment and storage medium
CN110991310A (en) Portrait detection method, portrait detection device, electronic equipment and computer readable medium
CN113255516A (en) Living body detection method and device and electronic equipment
CN110765799A (en) Client code scanning identification method, device, equipment and storage medium
CN112487974A (en) Video stream multi-person segmentation method, system, chip and medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
CN111898610A (en) Card unfilled corner detection method and device, computer equipment and storage medium
CN113158773B (en) Training method and training device for living body detection model
CN111798422A (en) Checkerboard angular point identification method, device, equipment and storage medium
CN115578386B (en) Parking image generation method and device, electronic equipment and storage medium
US20200167933A1 (en) Image processing apparatus, image processing method, and a non-transitory computer readable storage medium
CN117372447A (en) Image processing method, device, electronic equipment and storage medium
CN113034449B (en) Target detection model training method and device and communication equipment
CN116862920A (en) Portrait segmentation method, device, equipment and medium
CN112308782A (en) Panoramic image splicing method and device, ultrasonic equipment and storage medium
CN114356475A (en) Display processing method, device, equipment and storage medium
CN112288774B (en) Mobile detection method, mobile detection device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination