CN113947771B - Image recognition method, apparatus, device, storage medium, and program product - Google Patents

Image recognition method, apparatus, device, storage medium, and program product

Info

Publication number
CN113947771B
CN113947771B (application CN202111201988.4A)
Authority
CN
China
Prior art keywords
target
image
training
sample
instance segmentation
Prior art date
Legal status
Active
Application number
CN202111201988.4A
Other languages
Chinese (zh)
Other versions
CN113947771A (en)
Inventor
薛松
冯原
辛颖
张滨
李超
王云浩
彭岩
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111201988.4A
Publication of CN113947771A
Application granted
Publication of CN113947771B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The present disclosure provides an image recognition method, apparatus, device, storage medium, and program product, relating to the field of artificial intelligence, and in particular to computer vision and deep learning techniques. One embodiment of the instance segmentation model training method comprises the following steps: acquiring a first training sample, wherein the first training sample is a first sample image in which a first target and a second target are labeled, and in the first sample image the first targets are smaller in number and size than the second targets; segmenting, from the first training sample, an image of a second target associated with a first target and an image of a standalone first target, and labeling the first target to generate a second training sample; and training a deep learning model with the second training sample to obtain a second instance segmentation model. In this embodiment, a first instance segmentation model and the second instance segmentation model are trained to perform instance segmentation, improving the accuracy of instance segmentation for small targets.

Description

Image recognition method, apparatus, device, storage medium, and program product
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to computer vision and deep learning techniques.
Background
In the information age, humans obtain massive numbers of images at every moment, through a variety of means and channels. Many computer vision tasks require intelligent segmentation of an image to fully understand its content and to make analysis of the relationships between image regions easier, which gives instance segmentation of real-world scenes great application value. Object detection and instance segmentation are two different computer vision tasks: object detection requires identifying and locating objects in an image, while instance segmentation further requires labeling the pixels occupied by each object on the basis of object detection.
Currently, a commonly used image recognition method is Mask R-CNN (Region-based Convolutional Neural Network). Mask R-CNN extends Faster R-CNN with a mask-prediction branch: semantic segmentation is performed with convolution and deconvolution to build an end-to-end network, and the ROI-Pooling (Region of Interest Pooling) layer is replaced with ROI-Align (Region of Interest Align). However, Mask R-CNN struggles to achieve accurate instance segmentation for small objects.
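By way of illustration only, the following is a minimal sketch of running an off-the-shelf Mask R-CNN with torchvision; it depicts the baseline discussed above, not the method of this disclosure, and the file name and score threshold are placeholder assumptions.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Off-the-shelf Mask R-CNN with a ResNet-50 FPN backbone.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("scene.jpg").convert("RGB"))  # placeholder path
with torch.no_grad():
    pred = model([image])[0]  # dict with "boxes", "labels", "scores", "masks"

# Keep confident detections only; small objects often score below such a
# threshold, which is the weakness that motivates this disclosure.
keep = pred["scores"] > 0.5
boxes, masks = pred["boxes"][keep], pred["masks"][keep]
```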
Disclosure of Invention
Embodiments of the present disclosure provide an image recognition method and apparatus, a device, a storage medium, and a program product.
In a first aspect, an embodiment of the present disclosure provides an instance segmentation model training method, comprising: acquiring a first training sample, wherein the first training sample is a first sample image in which a first target and a second target are labeled, and in the first sample image the first targets are smaller in number and size than the second targets; segmenting an image of a second target associated with a first target and an image of a standalone first target from the first training sample, and labeling the first target to generate a second training sample; and training a deep learning model with the second training sample to obtain a second instance segmentation model.
In a second aspect, an embodiment of the present disclosure provides an image recognition method, comprising: inputting an image to be segmented into a first instance segmentation model to obtain a bounding box of a standalone first target and a bounding box of a second target; segmenting an image of the second target from the image to be segmented based on the bounding box of the second target; inputting the image of the second target into a second instance segmentation model to obtain a bounding box of a first target associated with the second target, wherein the first instance segmentation model and the second instance segmentation model are trained as described in the first aspect; and performing instance segmentation on the image to be segmented based on the bounding box of the standalone first target, the bounding box of the first target associated with the second target, and the bounding box of the second target, to obtain an instance segmentation result.
In a third aspect, an embodiment of the present disclosure provides an instance segmentation model training apparatus, comprising: an acquisition module configured to acquire a first training sample, wherein the first training sample is a first sample image in which a first target and a second target are labeled, and in the first sample image the first targets are smaller in number and size than the second targets; a generation module configured to segment an image of a second target associated with a first target and an image of a standalone first target from the first training sample, and to label the first target to generate a second training sample; and a first training module configured to train a deep learning model with the second training sample to obtain a second instance segmentation model.
In a fourth aspect, an embodiment of the present disclosure provides an image recognition apparatus, comprising: a first detection module configured to input an image to be segmented into a first instance segmentation model to obtain a bounding box of a standalone first target and a bounding box of a second target; a first segmentation module configured to segment an image of the second target from the image to be segmented based on the bounding box of the second target; a second detection module configured to input the image of the second target into a second instance segmentation model to obtain a bounding box of a first target associated with the second target, wherein the first instance segmentation model and the second instance segmentation model are trained using the apparatus as described in the third aspect; and a second segmentation module configured to perform instance segmentation on the image to be segmented based on the bounding box of the standalone first target, the bounding box of the first target associated with the second target, and the bounding box of the second target, to obtain an instance segmentation result.
In a fifth aspect, an embodiment of the present disclosure proposes an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described in any one of the implementations of the first and second aspects.
In a sixth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as described in any one of the implementations of the first and second aspects.
In a seventh aspect, embodiments of the present disclosure propose a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first and second aspects.
An embodiment of the present disclosure provides an instance segmentation model training method: first, a first training sample is acquired; then, an image of a second target associated with a first target and an image of a standalone first target are segmented from the first training sample, and the first target is labeled to generate a second training sample; finally, a deep learning model is trained with the second training sample to obtain a second instance segmentation model. Performing instance segmentation with the trained second instance segmentation model improves the accuracy of instance segmentation for small targets.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of one embodiment of an instance segmentation model training method according to the present disclosure;
FIG. 2 is a flow chart of yet another embodiment of an instance segmentation model training method according to the present disclosure;
FIG. 3 is a flow chart of another embodiment of an instance segmentation model training method according to the present disclosure;
FIG. 4 is a flow chart of one embodiment of an image recognition method according to the present disclosure;
FIG. 5 is a flow chart of yet another embodiment of an image recognition method according to the present disclosure;
FIG. 6 is an overall system flow diagram of an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of one embodiment of an instance segmentation model training apparatus according to the present disclosure;
FIG. 8 is a schematic structural diagram of one embodiment of an image recognition apparatus according to the present disclosure;
FIG. 9 is a block diagram of an electronic device used to implement an instance segmentation model training method or an image recognition method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates a flow 100 of one embodiment of an instance segmentation model training method according to the present disclosure. The instance segmentation model training method comprises the following steps:
Step 101, a first training sample is acquired.
In this embodiment, the executing body of the instance segmentation model training method may acquire a large number of first training samples.
A first training sample may be a first sample image in which a first target and a second target are labeled. That is, the first sample image contains first targets and second targets, and marking the pixels they occupy yields a first training sample. The first target and the second target are different kinds of targets, and in the first sample image the first targets are smaller in number and size than the second targets. The first target may therefore be called a small target and the second target a large target. For example, in ball-game images (e.g., basketball, volleyball, or football event images), the number of balls is much smaller than the number of players, and a ball is much smaller than a player; the ball is thus the small target and the player the large target.
Step 102, an image of a second target associated with a first target and an image of a standalone first target are segmented from the first training sample, and the first target is labeled to generate a second training sample.
In this embodiment, according to the annotation information of the first sample image, the executing body may segment from it an image of each second target associated with a first target and an image of each standalone first target, and label the first target to generate the second training sample.
A first target and a second target are associated if an intersection exists between them. Because the second target is much larger than the first target, an image of a second target associated with a first target (such as a ball-carrying player), cropped according to the second target's annotation, contains both targets. In addition, an image of a standalone first target (such as a lone ball) can be cropped according to the first target's annotation; a standalone first target has no intersection with any second target, so only the first target appears in that image. In this way, images covering all first targets in the first sample image can be obtained. Labeling the pixels occupied by the first target in each cropped image yields the second training sample.
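As a hedged sketch of this step, the crop-and-relabel logic might look as follows; the annotation format (lists of dicts with a 'box' entry), the box-intersection rule, and the function names are assumptions for illustration, not part of the disclosure.

```python
def boxes_intersect(a, b):
    """Axis-aligned intersection test for [x1, y1, x2, y2] boxes."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def build_second_samples(image, small_anns, large_anns):
    """Crop each large target whose box intersects a small target, plus each
    standalone small target, re-labelling the small target in crop-local
    coordinates. `image` is an HxWx3 array."""
    samples = []
    for big in large_anns:
        x1, y1, x2, y2 = map(int, big["box"])
        for small in small_anns:
            if boxes_intersect(big["box"], small["box"]):
                sx1, sy1, sx2, sy2 = small["box"]
                # shift the small-target box into the crop's coordinate frame
                samples.append((image[y1:y2, x1:x2],
                                [sx1 - x1, sy1 - y1, sx2 - x1, sy2 - y1]))
    for small in small_anns:
        if not any(boxes_intersect(small["box"], b["box"]) for b in large_anns):
            x1, y1, x2, y2 = map(int, small["box"])
            samples.append((image[y1:y2, x1:x2], [0, 0, x2 - x1, y2 - y1]))
    return samples
```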
Step 103, the deep learning model is trained with the second training sample to obtain a second instance segmentation model.
In this embodiment, the executing body may train a deep learning model with the second training sample to obtain the second instance segmentation model, which may be used to detect the first target. Because the model is trained on cropped images in which the first target is prominent, the second instance segmentation model can accurately detect first targets despite their small number and size.
In general, the second instance segmentation model may be obtained by supervised training of the deep learning model with a machine learning method and the second training sample. In practice, the parameters of the deep learning model (e.g., weights and biases) may be initialized with many different small random numbers. Small values keep the network from entering saturation because of excessively large weights, which would cause training to fail, and using different values ensures that the network learns normally. The parameters may be adjusted continuously during training, for example with the BP (Back Propagation) algorithm or the SGD (Stochastic Gradient Descent) algorithm, until a second instance segmentation model accurate enough for small-target detection is obtained.
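A minimal training-loop sketch under the assumptions above (small-random-number initialisation, SGD with backpropagation) follows; it presumes a torchvision-style detection model whose forward pass returns a dict of losses in training mode, and the epoch count and learning rate are illustrative.

```python
import torch
import torch.nn as nn

def init_small_random(m):
    # Small random weights keep the network out of saturation at the start.
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def train_instance_model(model, loader, epochs=12, lr=0.02):
    model.apply(init_small_random)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss = sum(model(images, targets).values())  # sum of loss terms
            optimizer.zero_grad()
            loss.backward()   # backpropagation (BP)
            optimizer.step()  # stochastic gradient descent (SGD) update
    return model
```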
In some embodiments, the second instance segmentation model may use a Swin Transformer as its backbone network and apply the Mask R-CNN algorithm to perform instance segmentation on objects in the image.
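At the configuration level, this combination might be expressed in an MMDetection-style config such as the sketch below; the field names follow MMDetection conventions and can differ between versions, the head settings are omitted, and the Swin-Tiny numbers are one plausible choice rather than anything dictated by the disclosure.

```python
# Sketch of a Mask R-CNN config with a Swin Transformer backbone.
model = dict(
    type='MaskRCNN',
    backbone=dict(
        type='SwinTransformer',
        embed_dims=96,                # Swin-Tiny width
        depths=[2, 2, 6, 2],
        num_heads=[3, 6, 12, 24],
        window_size=7,
        out_indices=(0, 1, 2, 3)),
    neck=dict(
        type='FPN',
        in_channels=[96, 192, 384, 768],  # Swin-T stage output channels
        out_channels=256,
        num_outs=5),
    # rpn_head=..., roi_head=...  as in a standard Mask R-CNN config
)
```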
This embodiment of the present disclosure thus provides an instance segmentation model training method: first, a first training sample is acquired; then, an image of a second target associated with a first target and an image of a standalone first target are segmented from the first training sample, and the first target is labeled to generate a second training sample; finally, a deep learning model is trained with the second training sample to obtain a second instance segmentation model. Performing instance segmentation with the trained second instance segmentation model improves the accuracy of instance segmentation for small targets.
With continued reference to FIG. 2, a flow 200 of yet another embodiment of an instance segmentation model training method in accordance with the present disclosure is shown. The instance segmentation model training method comprises the following steps:
Step 201, a first training sample is acquired.
In this embodiment, the specific operation of step 201 is described in detail in step 101 of the embodiment shown in FIG. 1 and is not repeated here.
Step 202, the deep learning model is trained with the first training sample to obtain a first instance segmentation model.
In this embodiment, the executing body of the instance segmentation model training method may train a deep learning model with the first training sample to obtain the first instance segmentation model. Specifically, a first sample image is taken as input and its annotation information as the expected output, and the deep learning model is trained to obtain the first instance segmentation model, which may be used to detect both the first target and the second target. Because the first targets in the first sample image are smaller in number and size than the second targets, the first instance segmentation model detects first targets less accurately than second targets.
In general, the first instance segmentation model may be obtained by supervised training of the deep learning model with a machine learning method and the first training sample. In practice, the parameters of the deep learning model (e.g., weights and biases) may be initialized with many different small random numbers. Small values keep the network from entering saturation because of excessively large weights, which would cause training to fail, and using different values ensures that the network learns normally. The parameters may be adjusted continuously during training, for example with the BP algorithm or the SGD algorithm, until a first instance segmentation model accurate enough for target detection is obtained.
In some embodiments, the first instance segmentation model may use a Swin Transformer as its backbone network and apply the Mask R-CNN algorithm to perform instance segmentation on objects in the image.
Step 203, an image of a second target associated with a first target and an image of a standalone first target are segmented from the first training sample, and the first target is labeled to generate a second training sample.
Step 204, the deep learning model is trained with the second training sample to obtain a second instance segmentation model.
In this embodiment, the specific operations of steps 203-204 are described in detail in steps 102-103 of the embodiment shown in FIG. 1 and are not repeated here.
As can be seen from FIG. 2, compared with the embodiment corresponding to FIG. 1, the flow 200 of the instance segmentation model training method in this embodiment adds the step of training the first instance segmentation model. The scheme described in this embodiment therefore trains both the first instance segmentation model and the second instance segmentation model to perform instance segmentation, improving the accuracy of instance segmentation.
With further reference to FIG. 3, a flow 300 of another embodiment of an instance segmentation model training method according to the present disclosure is shown. The instance segmentation model training method comprises the following steps:
Step 301, a first target is added to the first sample image.
In this embodiment, the executing body may add a first target to the first sample image.
Typically, the number of first targets in the first sample image is much smaller than the number of second targets, resulting in significant data imbalance. Adding first targets to the first sample image helps balance the numbers of first and second targets. The added first target must not cover any first target or second target originally in the first sample image, so as not to interfere with target detection.
In some embodiments, the executing body may copy the pixel points of a first target in a second sample image onto the first sample image, so that the added first target looks real and natural. The second sample image contains a first target whose pixels are annotated; from this annotation, the mask information, bounding-box information, and category information of the first target can be obtained, and the pixels belonging to the first-target category can be copied onto the first sample image. Before copying, it is necessary to check whether a first target or a second target already exists at the intended position in the first sample image: if so, the copy is not pasted; if not, it may be pasted.
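A hedged sketch of this copy-paste step is given below; the `dst_occupied` boolean mask recording already-annotated pixels is an assumed bookkeeping structure, not something named by the disclosure.

```python
import numpy as np

def paste_small_target(dst_img, dst_occupied, src_img, src_mask, offset):
    """Copy the pixels of a small target (src_mask, from the second sample
    image) into dst_img at `offset`; skip the paste if the destination
    region already holds an annotated target."""
    ys, xs = np.nonzero(src_mask)
    dy, dx = offset
    ty, tx = ys + dy, xs + dx
    h, w = dst_img.shape[:2]
    if ty.min() < 0 or tx.min() < 0 or ty.max() >= h or tx.max() >= w \
            or dst_occupied[ty, tx].any():
        return None  # out of bounds or position occupied: do not paste
    dst_img[ty, tx] = src_img[ys, xs]
    dst_occupied[ty, tx] = True
    new_mask = np.zeros((h, w), dtype=bool)
    new_mask[ty, tx] = True
    return new_mask  # mask annotation for the pasted target
```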
Step 302, a second target associated with a first target is added to the first sample image.
In this embodiment, the executing body may add to the first sample image a second target associated with a first target.
Typically, the number of first targets in the first sample image is much smaller than the number of second targets, resulting in significant data imbalance. Adding second targets associated with first targets to the first sample image helps balance the numbers of first and second targets. The added second target must not cover any first target or second target originally in the first sample image, so as not to interfere with target detection.
In some embodiments, the second sample image contains a first target and a second target whose pixels are annotated. If a first target and a second target in the second sample image have an intersection, the executing body may copy both targets onto the first sample image at the same time, so that the added first and second targets look real and natural. From the annotation information of the second sample image, the mask information, bounding-box information, and category information of the intersecting first and second targets can be obtained, and their pixels can be copied onto the first sample image simultaneously. Before copying, it is necessary to check whether a first target or a second target already exists at the intended position in the first sample image: if so, the copy is not pasted; if not, it may be pasted.
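The associated-pair case can reuse the same mechanics, copying both masks as one unit so the pasted pair keeps its spatial relationship; again a sketch, with the same assumed occupancy-mask convention as the single-target paste above.

```python
import numpy as np

def paste_associated_pair(dst_img, dst_occupied, src_img,
                          small_mask, large_mask, offset):
    """Paste an intersecting small/large target pair in one operation."""
    ys, xs = np.nonzero(small_mask | large_mask)  # union of the two masks
    dy, dx = offset
    ty, tx = ys + dy, xs + dx
    h, w = dst_img.shape[:2]
    if ty.min() < 0 or tx.min() < 0 or ty.max() >= h or tx.max() >= w \
            or dst_occupied[ty, tx].any():
        return False  # would cover an existing target: do not paste
    dst_img[ty, tx] = src_img[ys, xs]
    dst_occupied[ty, tx] = True
    return True
```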
In general, by adding first targets and second targets associated with first targets so that the ratio of first targets to second targets in the first sample image approaches 1:1, the class-imbalance problem can be fundamentally alleviated. For example, the first sample image and the second sample image may be different video frames of a ball game; the ball-to-player ratio may be 1:5 before copying and 6:7 after copying, so that by duplicating balls and ball-carrying players the ratio is adjusted to be approximately 1:1.
Step 303, a first training sample is acquired.
Step 304, the deep learning model is trained with the first training sample to obtain a first instance segmentation model.
Step 305, an image of a second target associated with a first target and an image of a standalone first target are segmented from the first training sample, and the first target is labeled to generate a second training sample.
Step 306, the deep learning model is trained with the second training sample to obtain a second instance segmentation model.
In this embodiment, the specific operations of steps 303-306 are described in detail in steps 201-204 of the embodiment shown in FIG. 2 and are not repeated here.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the flow 300 of the instance segmentation model training method in this embodiment highlights the steps of adding small targets. Small targets are added in two ways, alleviating the scarcity of small targets and balancing the numbers of small targets and large targets.
With further reference to fig. 4, a flow 400 of one embodiment of an image recognition method according to the present disclosure is shown. The image recognition method comprises the following steps:
Step 401, the image to be segmented is input into the first instance segmentation model to obtain bounding boxes of standalone first targets and bounding boxes of second targets.
In this embodiment, the executing body of the image recognition method may acquire an image to be segmented and input it into the first instance segmentation model, obtaining the bounding box of each standalone first target (e.g., a lone ball) and of each second target (e.g., a player).
The image to be segmented contains first targets and second targets. The first instance segmentation model may be trained using an embodiment of the method shown in FIG. 2 or FIG. 3 and may be used to detect both kinds of targets. Performing target detection on the image to be segmented with the first instance segmentation model therefore yields the bounding boxes of the standalone first targets and of the second targets. Here, the second targets detected by the first instance segmentation model include both standalone second targets (e.g., players without a ball) and second targets associated with first targets (e.g., ball-carrying players).
It should be noted that, because the first targets in the image to be segmented are smaller in number and size than the second targets, the first instance segmentation model detects first targets less accurately than second targets.
Step 402, the image of the second target is segmented from the image to be segmented based on the bounding box of the second target.
In this embodiment, the executing body may segment the image of each second target from the image to be segmented based on its bounding box. Because the second targets detected by the first instance segmentation model include both standalone second targets and second targets associated with first targets, the segmented second-target images likewise include images of both kinds.
It should be noted that, because the first instance segmentation model detects second targets with high accuracy, step 402 can accurately segment nearly all second-target images, with low probabilities of missed and false detections.
Step 403, the image of the second target is input into the second instance segmentation model to obtain the bounding box of the first target associated with the second target.
In this embodiment, the executing body may input the image of the second target into the second instance segmentation model to obtain the bounding box of the first target (e.g., the ball carried by a player) associated with that second target.
The second instance segmentation model may be trained using an embodiment of the method shown in FIG. 1 or FIG. 2 and may be used to detect the first target. Performing target detection on the image of the second target with the second instance segmentation model therefore yields the bounding box of the first target associated with the second target.
Because the second instance segmentation model can accurately detect first targets despite their small number and size, step 403 can detect nearly all first targets in the second-target images associated with first targets, with low probabilities of missed and false detections.
Step 404, instance segmentation is performed on the image to be segmented based on the bounding boxes of the standalone first targets, the bounding boxes of the first targets associated with second targets, and the bounding boxes of the second targets, to obtain an instance segmentation result.
In this embodiment, the executing body may perform instance segmentation on the image to be segmented based on the bounding boxes of the standalone first targets, the bounding boxes of the first targets associated with second targets, and the bounding boxes of the second targets, to obtain the instance segmentation result.
The bounding boxes of the standalone first targets and of the second targets are detected with the first instance segmentation model, while the bounding boxes of the first targets associated with second targets are detected with the second instance segmentation model. By combining the two models, nearly all first and second targets in the image to be segmented can be detected accurately. The instance segmentation result may be an image in which the pixels occupied by all first targets and second targets in the image to be segmented are labeled.
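Putting steps 401-404 together, a hedged end-to-end sketch of the two-stage inference might read as follows; the `model_a`/`model_b` call signatures, each returning lists of boxes, are assumptions for illustration.

```python
def two_stage_detection(image, model_a, model_b):
    """model_a detects standalone small targets and all large targets on the
    full image; model_b re-detects small targets on each large-target crop,
    and its boxes are mapped back by adding the crop offset."""
    small_boxes, large_boxes = model_a(image)   # assumed interface
    results = list(small_boxes)                 # standalone small targets
    for (x1, y1, x2, y2) in large_boxes:
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        for (bx1, by1, bx2, by2) in model_b(crop):  # small targets in crop
            results.append((bx1 + x1, by1 + y1, bx2 + x1, by2 + y1))
    return results, large_boxes
```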
The image recognition method provided by this embodiment of the present disclosure combines the first instance segmentation model and the second instance segmentation model to perform instance segmentation, improving the accuracy of instance segmentation.
With further reference to fig. 5, a flow 500 of yet another embodiment of an image recognition method according to the present disclosure is shown. The image recognition method comprises the following steps:
Step 501, the image to be segmented is input into the first instance segmentation model to obtain bounding boxes of standalone first targets and bounding boxes of second targets.
Step 502, the image of the second target is segmented from the image to be segmented based on the bounding box of the second target.
Step 503, the image of the second target is input into the second instance segmentation model to obtain the bounding box of the first target associated with the second target.
In this embodiment, the specific operations of steps 501-503 are described in detail in steps 401-403 in the embodiment shown in fig. 4, and are not described herein.
Step 504, the coordinates of the bounding box of the first target associated with the second target are mapped back to the image to be segmented according to the coordinates of the bounding box of the second target.
In this embodiment, because the image of the second target was cropped from the image to be segmented, the executing body of the image recognition method may map the coordinates of the bounding box of the first target associated with the second target back to the image to be segmented according to the coordinates of the bounding box of the second target, converting them into coordinates within the image to be segmented. The coordinates of the bounding boxes of the standalone first targets, of the first targets associated with second targets, and of the second targets are then all coordinates in the image to be segmented.
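The mapping itself is a simple offset addition; a minimal sketch (function name assumed):

```python
def to_original_coords(box_in_crop, crop_box):
    """Map a box detected on a second-target crop back into the frame of
    the full image to be segmented; crop_box is the second target's
    bounding box in that image."""
    bx1, by1, bx2, by2 = box_in_crop
    cx1, cy1 = crop_box[0], crop_box[1]
    return (bx1 + cx1, by1 + cy1, bx2 + cx1, by2 + cy1)
```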
Step 505, instance segmentation is performed on the image to be segmented based on the coordinates of the bounding boxes of the standalone first targets, of the first targets associated with second targets, and of the second targets, to obtain the instance segmentation result.
In this embodiment, the executing body may perform instance segmentation on the image to be segmented based on the coordinates of the bounding boxes of the standalone first targets, of the first targets associated with second targets, and of the second targets, to obtain the instance segmentation result.
Because all of these coordinates are expressed in the frame of the image to be segmented, the positions of the corresponding bounding boxes in that image can be located directly from the coordinates. Labeling the pixels within the located bounding boxes yields an image in which the pixels of all first targets and second targets are labeled, completing the instance segmentation of the first and second targets.
As can be seen from FIG. 5, compared with the embodiment corresponding to FIG. 4, the flow 500 of the image recognition method in this embodiment highlights the instance segmentation step. By mapping the coordinates of the bounding box of the first target associated with the second target back to the image to be segmented, the scheme described in this embodiment can quickly complete the instance segmentation of the first and second targets in the image to be segmented.
With further reference to FIG. 6, an overall system flow diagram of an embodiment of the present disclosure is shown. Taking a basketball event as an example, as shown in FIG. 6, data acquisition and model training are performed in the training stage to obtain model A and model B, where model A detects basketballs and players and model B detects basketballs. In the testing stage, an image is preprocessed and input into model A, which outputs player masks; player images are then obtained by cropping. Each cropped player image is input into model B, which outputs a basketball mask. Mapping the basketball masks back to the original image yields masks for both basketballs and players.
With further reference to FIG. 7, as an implementation of the methods shown in the preceding figures, the present disclosure provides an embodiment of an instance segmentation model training apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 1, and the apparatus may be applied to various electronic devices.
As shown in FIG. 7, the instance segmentation model training apparatus 700 of this embodiment may include an acquisition module 701, a generation module 702, and a first training module 703. The acquisition module 701 is configured to acquire a first training sample, wherein the first training sample is a first sample image in which a first target and a second target are labeled, and in the first sample image the first targets are smaller in number and size than the second targets; the generation module 702 is configured to segment an image of a second target associated with a first target and an image of a standalone first target from the first training sample, and to label the first target to generate a second training sample; and the first training module 703 is configured to train a deep learning model with the second training sample to obtain a second instance segmentation model.
In this embodiment, within the instance segmentation model training apparatus 700, the specific processing and technical effects of the acquisition module 701, the generation module 702, and the first training module 703 may refer to the descriptions of steps 101-103 in the embodiment corresponding to FIG. 1 and are not repeated here.
In some optional implementations of this embodiment, the instance segmentation model training apparatus 700 further includes a second training module configured to train the deep learning model with the first training sample to obtain a first instance segmentation model.
In some optional implementations of this embodiment, the first instance segmentation model and the second instance segmentation model use a Swin Transformer as the backbone network and apply the Mask R-CNN algorithm to perform instance segmentation on objects in an image.
In some optional implementations of this embodiment, the instance segmentation model training apparatus 700 further includes: a first adding module configured to add a first target to the first sample image; and/or a second adding module configured to add a second target associated with a first target to the first sample image.
In some optional implementations of this embodiment, the added first target and/or second target does not cover a first target or second target already in the first sample image.
In some optional implementations of this embodiment, the first adding module is further configured to copy the pixel points of a first target in a second sample image onto the first sample image.
In some optional implementations of this embodiment, the second adding module is further configured to: if a first target and a second target in the second sample image have an intersection, copy the intersecting first and second targets onto the first sample image at the same time.
With further reference to FIG. 8, as an implementation of the methods shown in the preceding figures, the present disclosure provides an embodiment of an image recognition apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 4, and the apparatus is applicable to various electronic devices.
As shown in FIG. 8, the image recognition apparatus 800 of this embodiment may include a first detection module 801, a first segmentation module 802, a second detection module 803, and a second segmentation module 804. The first detection module 801 is configured to input an image to be segmented into a first instance segmentation model to obtain bounding boxes of standalone first targets and bounding boxes of second targets; the first segmentation module 802 is configured to segment an image of the second target from the image to be segmented based on the bounding box of the second target; the second detection module 803 is configured to input the image of the second target into a second instance segmentation model to obtain the bounding box of the first target associated with the second target, wherein the first instance segmentation model and the second instance segmentation model are trained using the apparatus shown in FIG. 7; and the second segmentation module 804 is configured to perform instance segmentation on the image to be segmented based on the bounding boxes of the standalone first targets, of the first targets associated with second targets, and of the second targets, to obtain an instance segmentation result.
In this embodiment, within the image recognition apparatus 800, the specific processing and technical effects of the first detection module 801, the first segmentation module 802, the second detection module 803, and the second segmentation module 804 may refer to the descriptions of steps 401-404 in the embodiment corresponding to FIG. 4 and are not repeated here.
In some optional implementations of this embodiment, the second segmentation module 804 is further configured to: map the coordinates of the bounding box of the first target associated with the second target back to the image to be segmented according to the coordinates of the bounding box of the second target; and perform instance segmentation on the image to be segmented based on the coordinates of the bounding boxes of the standalone first targets, of the first targets associated with second targets, and of the second targets, to obtain the instance segmentation result.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the user's personal information comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 901 performs the methods and processes described above, such as the instance segmentation model training method or the image recognition method. For example, in some embodiments, the instance segmentation model training method or the image recognition method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the instance segmentation model training method or the image recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the instance segmentation model training method or the image recognition method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions provided by the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. An image recognition method, comprising:
inputting an image to be segmented into a first instance segmentation model to obtain a bounding box of a standalone first target and a bounding box of a second target;
segmenting an image of the second target from the image to be segmented based on the bounding box of the second target;
inputting the image of the second target into a second instance segmentation model to obtain a bounding box of a first target associated with the second target; and
performing instance segmentation on the image to be segmented based on the bounding box of the standalone first target, the bounding box of the first target associated with the second target, and the bounding box of the second target, to obtain an instance segmentation result;
wherein the performing instance segmentation on the image to be segmented based on the bounding box of the standalone first target, the bounding box of the first target associated with the second target, and the bounding box of the second target to obtain the instance segmentation result comprises:
mapping the coordinates of the bounding box of the first target associated with the second target back to the image to be segmented according to the coordinates of the bounding box of the second target; and
performing instance segmentation on the image to be segmented based on the coordinates of the bounding box of the standalone first target, the coordinates of the bounding box of the first target associated with the second target, and the coordinates of the bounding box of the second target, to obtain the instance segmentation result.
2. The method of claim 1, wherein the training of the second instance segmentation model comprises:
acquiring a first training sample, wherein the first training sample is a first sample image in which a first target and a second target are labeled, and in the first sample image the first targets are smaller in number and size than the second targets;
segmenting an image of a second target associated with a first target and an image of a standalone first target from the first training sample, and labeling the first target to generate a second training sample; and
training a deep learning model with the second training sample to obtain the second instance segmentation model.
3. The method of claim 2, wherein the training of the first instance segmentation model comprises:
training the deep learning model with the first training sample to obtain the first instance segmentation model.
4. The method of claim 3, wherein the first instance segmentation model and the second instance segmentation model use a Swin Transformer as a backbone network and apply the Mask R-CNN (Mask Region-based Convolutional Neural Network) algorithm to perform instance segmentation on objects in an image.
5. The method of claim 2, wherein before the acquiring the first training sample, the method further comprises:
adding a first target in the first sample image; and/or
adding a second target associated with the first target in the first sample image.
6. The method of claim 5, wherein the added first target and/or second target does not cover a first target or a second target already in the first sample image.
7. The method of claim 5 or 6, wherein the adding a first target in the first sample image comprises:
copying pixel points of a first target in a second sample image onto the first sample image.
8. The method of claim 5 or 6, wherein the adding a second target associated with a first target in the first sample image comprises:
if a first target and a second target in the second sample image have an intersection, simultaneously copying the first target and the second target having the intersection onto the first sample image.
9. An image recognition apparatus, comprising:
a first detection module configured to input an image to be segmented into a first instance segmentation model to obtain a bounding box of a standalone first target and a bounding box of a second target;
a first segmentation module configured to segment an image of the second target from the image to be segmented based on the bounding box of the second target;
a second detection module configured to input the image of the second target into a second instance segmentation model to obtain a bounding box of a first target associated with the second target; and
a second segmentation module configured to perform instance segmentation on the image to be segmented based on the bounding box of the standalone first target, the bounding box of the first target associated with the second target, and the bounding box of the second target, to obtain an instance segmentation result;
wherein the second segmentation module is further configured to:
map the coordinates of the bounding box of the first target associated with the second target back to the image to be segmented according to the coordinates of the bounding box of the second target; and
perform instance segmentation on the image to be segmented based on the coordinates of the bounding box of the standalone first target, the coordinates of the bounding box of the first target associated with the second target, and the coordinates of the bounding box of the second target, to obtain the instance segmentation result.
10. The apparatus of claim 9, wherein the image recognition apparatus further comprises a training module comprising:
an acquisition sub-module configured to acquire a first training sample, wherein the first training sample is a first sample image annotated with a first target and a second target, and in the first sample image the number and size of the first targets are smaller than those of the second targets;
a generation sub-module configured to segment an image of a second target associated with the first target and an image of an independent first target from the first training sample, and annotate the first target to generate a second training sample;
and a first training sub-module configured to train a deep learning model with the second training sample to obtain the second instance segmentation model.
11. The apparatus of claim 10, wherein the training module further comprises:
a second training sub-module configured to train the deep learning model with the first training sample to obtain the first instance segmentation model.
12. The apparatus of claim 11, wherein the first instance segmentation model and the second instance segmentation model use a Swin Transformer as the backbone network and adopt the Mask R-CNN algorithm to perform instance segmentation on targets in an image.
13. The apparatus of claim 10, wherein the training module further comprises:
a first addition sub-module configured to add a first target to the first sample image; and/or
a second addition sub-module configured to add a second target associated with the first target to the first sample image.
14. The apparatus of claim 13, wherein the added first target and/or second target does not cover the first target or the second target already present in the first sample image.
15. The apparatus of claim 13 or 14, wherein the first addition sub-module is further configured to:
copy the pixels of a first target in a second sample image onto the first sample image.
16. The apparatus of claim 13 or 14, wherein the second addition sub-module is further configured to:
if a first target and a second target in the second sample image intersect, copy the intersecting first target and second target onto the first sample image together.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202111201988.4A 2021-10-15 2021-10-15 Image recognition method, apparatus, device, storage medium, and program product Active CN113947771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111201988.4A CN113947771B (en) 2021-10-15 2021-10-15 Image recognition method, apparatus, device, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN113947771A (en) 2022-01-18
CN113947771B (en) 2023-06-27

Family

ID=79330564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111201988.4A Active CN113947771B (en) 2021-10-15 2021-10-15 Image recognition method, apparatus, device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN113947771B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10424064B2 (en) * 2016-10-18 2019-09-24 Adobe Inc. Instance-level semantic segmentation system
CN108830277B (en) * 2018-04-20 2020-04-21 平安科技(深圳)有限公司 Training method and device of semantic segmentation model, computer equipment and storage medium
CN109740752B (en) * 2018-12-29 2022-01-04 北京市商汤科技开发有限公司 Deep model training method and device, electronic equipment and storage medium
US11544928B2 (en) * 2019-06-17 2023-01-03 The Regents Of The University Of California Athlete style recognition system and method
CN110675407B (en) * 2019-09-17 2022-08-05 北京达佳互联信息技术有限公司 Image instance segmentation method and device, electronic equipment and storage medium
CN111078908B (en) * 2019-11-28 2023-06-09 北京云聚智慧科技有限公司 Method and device for detecting data annotation
CN111461203A (en) * 2020-03-30 2020-07-28 北京百度网讯科技有限公司 Cross-modal processing method and device, electronic equipment and computer storage medium
CN111862097A (en) * 2020-09-24 2020-10-30 常州微亿智造科技有限公司 Data enhancement method and device for micro defect detection rate
CN112508128B (en) * 2020-12-22 2023-07-25 北京百度网讯科技有限公司 Training sample construction method, counting device, electronic equipment and medium
CN113128588B (en) * 2021-04-16 2024-03-26 深圳市腾讯网域计算机网络有限公司 Model training method, device, computer equipment and computer storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740668A (en) * 2018-12-29 2019-05-10 北京市商汤科技开发有限公司 Depth model training method and device, electronic equipment and storage medium
CN112017189A (en) * 2020-10-26 2020-12-01 腾讯科技(深圳)有限公司 Image segmentation method and device, computer equipment and storage medium
CN112966587A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Training method of target detection model, target detection method and related equipment

Also Published As

Publication number Publication date
CN113947771A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
CN108230359B (en) Object detection method and apparatus, training method, electronic device, program, and medium
Zhao et al. A surface defect detection method based on positive samples
WO2018108129A1 (en) Method and apparatus for use in identifying object type, and electronic device
CN113033537B (en) Method, apparatus, device, medium and program product for training a model
US20200356818A1 (en) Logo detection
CN111986178A (en) Product defect detection method and device, electronic equipment and storage medium
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
TWI667621B (en) Face recognition method
WO2022227770A1 (en) Method for training target object detection model, target object detection method, and device
CN113362314B (en) Medical image recognition method, recognition model training method and device
CN114419035B (en) Product identification method, model training device and electronic equipment
CN113642431A (en) Training method and device of target detection model, electronic equipment and storage medium
CN112381183A (en) Target detection method and device, electronic equipment and storage medium
CN111563550A (en) Sperm morphology detection method and device based on image technology
CN113239807B (en) Method and device for training bill identification model and bill identification
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113947771B (en) Image recognition method, apparatus, device, storage medium, and program product
CN114882315B (en) Sample generation method, model training method, device, equipment and medium
CN114972910B (en) Training method and device for image-text recognition model, electronic equipment and storage medium
CN114359932B (en) Text detection method, text recognition method and device
CN113379592B (en) Processing method and device for sensitive area in picture and electronic equipment
CN113610856B (en) Method and device for training image segmentation model and image segmentation
CN113139463B (en) Method, apparatus, device, medium and program product for training a model
CN115359322A (en) Target detection model training method, device, equipment and storage medium
CN113936158A (en) Label matching method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant