CN116168442B - Sample image generation method, model training method and target detection method
- Publication number: CN116168442B (application CN202310403223.1A)
- Authority: CN (China)
- Prior art keywords: image, target, determining, transformation, frame
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/172 - Classification, e.g. identification (G: Physics; G06: Computing, calculating or counting; G06V: Image or video recognition or understanding; G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data; G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians, body parts, e.g. hands; G06V40/16: Human faces, e.g. facial parts, sketches or expressions)
- G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects (G06V10/00: Arrangements for image or video recognition or understanding; G06V10/70: using pattern recognition or machine learning)
- G06N20/00 - Machine learning (G06N: Computing arrangements based on specific computational models)
Abstract
The disclosure provides a sample image generation method, a training method of a deep learning model, a target detection method, a device, electronic equipment, a storage medium and a program product, and relates to the technical field of artificial intelligence, in particular to the technical fields of computer vision and deep learning. The specific implementation scheme is as follows: image segmentation is carried out on the image to be processed to obtain a target detection frame of a target object in the image to be processed; the target detection frame is transformed to obtain a transformation frame; a target image is determined from the image to be processed based on the transformation frame; target recognition is carried out on the target image and a label of the target object is determined, wherein the label comprises a category and a category confidence, and the category confidence is used for indicating the probability that the target object is of the category; and a sample image is generated based on the target image and the label of the target object.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the fields of computer vision technology and deep learning technology, and specifically to a sample image generation method, a training method of a deep learning model, a target detection method, a target detection device, an electronic device, a storage medium, and a program product.
Background
Computer vision technology offers great potential for improving the processing of images. Computer vision is the science and technology of how to make machines "see", i.e., how to use cameras and computers instead of the human eye to identify, track and measure objects. Computer vision technology provides great help for application development in public safety, information security, financial security and driving safety.
Disclosure of Invention
The present disclosure provides a sample image generation method, a training method of a deep learning model, a target detection method, an apparatus, an electronic device, a storage medium, and a program product.
According to an aspect of the present disclosure, there is provided a sample image generating method including: image segmentation is carried out on the image to be processed to obtain a target detection frame of the target object in the image to be processed; transforming the target detection frame to obtain a transformation frame; determining a target image from the images to be processed based on the transformation frame; performing target recognition on the target image, and determining a label of the target object, wherein the label comprises a category and a category confidence, and the category confidence is used for indicating the probability that the target object is the category; and generating a sample image based on the target image and the label of the target object.
According to another aspect of the present disclosure, there is provided a training method of a deep learning model, including: inputting a sample image into a deep learning model to obtain a sample detection result, wherein the sample detection result comprises a sample category and a sample category confidence for a target object in the sample image; and training the deep learning model by using the sample detection result and the label of the sample image; wherein the sample image is generated by the sample image generation method described above.
According to another aspect of the present disclosure, there is provided a target detection method including: inputting an image to be identified into a target detection model to obtain a detection result of an object to be identified in the image to be identified; the target detection model is trained by the training method of the deep learning model.
According to another aspect of the present disclosure, there is provided a sample image generating apparatus including: the segmentation module is used for carrying out image segmentation on the image to be processed to obtain a target detection frame of the target object in the image to be processed; the transformation module is used for transforming the target detection frame to obtain a transformation frame; the determining module is used for determining a target image from the images to be processed based on the transformation frame; the identification module is used for carrying out target identification on the target image and determining a label of the target object, wherein the label comprises a category and category confidence, and the category confidence is used for indicating the probability that the target object is of the category; and a generation module for generating a sample image based on the target image and the label of the target object.
According to another aspect of the present disclosure, there is provided a training apparatus of a deep learning model, including: an input module for inputting the sample image into the deep learning model to obtain a sample detection result, wherein the sample detection result comprises a sample category and a sample category confidence for a target object in the sample image; and a training module for training the deep learning model by using the sample detection result and the label of the sample image; wherein the sample image is generated by the sample image generating apparatus.
According to another aspect of the present disclosure, there is provided an object detection apparatus including: the detection module is used for inputting the image to be identified into the target detection model to obtain a detection result of the object to be identified in the image to be identified; the target detection model is trained by a training device of a deep learning model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as disclosed herein.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as disclosed herein.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as disclosed herein.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which sample image generation methods and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a sample image generation method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow diagram of generating a target image according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a flow diagram for generating a relationship graph in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow chart of a training method of a deep learning model according to an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a target detection method according to an embodiment of the disclosure;
fig. 7 schematically illustrates a block diagram of a sample image generating device according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure;
FIG. 9 schematically illustrates a block diagram of an object detection apparatus according to an embodiment of the disclosure; and
Fig. 10 schematically illustrates a block diagram of an electronic device adapted to implement a sample image generation method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a sample image generation method, a training method of a deep learning model, a target detection method, an apparatus, an electronic device, a storage medium, and a program product.
According to an embodiment of the present disclosure, there is provided a sample image generation method including: and performing image segmentation on the image to be processed to obtain a target detection frame of the target object in the image to be processed. And transforming the target detection frame to obtain a transformation frame. Determining a target image from the image to be processed based on the transformation frame; and carrying out target recognition on the target image, and determining the label of the target object. The tag includes a category and a category confidence, the category confidence being used to indicate a probability that the target object is a category. A sample image is generated based on the target image and the tag of the target object.
In the technical scheme of the disclosure, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing, applying and the like of the personal information of the user all conform to the regulations of related laws and regulations, necessary security measures are adopted, and the public order harmony is not violated.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 1 schematically illustrates an exemplary system architecture to which sample image generation methods and apparatuses may be applied, according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to assist those skilled in the art in understanding the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the sample image generating method and apparatus may be applied may include a terminal device, and the terminal device may implement the sample image generating method and apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and/or social platform software, etc. (as examples only).
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for content browsed by the user using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the sample image generating method provided by the embodiments of the present disclosure may be generally performed by the terminal device 101, 102, or 103. Accordingly, the sample image generating apparatus provided by the embodiments of the present disclosure may also be provided in the terminal device 101, 102, or 103.
Or the sample image generation method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the sample image generating device provided by the embodiments of the present disclosure may be generally provided in the server 105. The sample image generation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the sample image generating apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
For example, the terminal devices 101, 102, 103 may acquire an image to be processed and send it to the server 105. The server 105 performs image segmentation on the image to be processed to obtain a target detection frame of the target object in the image to be processed, transforms the target detection frame to obtain a transformation frame, determines a target image from the image to be processed based on the transformation frame, performs target recognition on the target image to determine the label of the target object, and generates a sample image based on the target image and the label of the target object. Alternatively, these operations may be performed by a server or server cluster capable of communicating with the terminal devices 101, 102, 103 and/or the server 105, which finally generates the sample image, so that a server or server cluster in communication with the server 105 can train the deep learning model with the sample images.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely representative of the operations for the purpose of description, and should not be construed as representing the order of execution of the respective operations. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 schematically illustrates a flowchart of a sample image generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S250.
In operation S210, image segmentation is performed on the image to be processed to obtain a target detection frame of the target object in the image to be processed.
In operation S220, the target detection frame is transformed to obtain a transformed frame.
In operation S230, a target image is determined from the images to be processed based on the transformation frame.
In operation S240, target recognition is performed on the target image, and a tag of the target object is determined. The tag includes a category and a category confidence.
In operation S250, a sample image is generated based on the target image and the tag of the target object.
According to an embodiment of the present disclosure, performing image segmentation on the image to be processed to obtain a target detection frame of the target object in the image to be processed may include: detecting the target object in the image to be processed to obtain the target detection frame. The image to be processed may be processed using a deep learning model to obtain the target detection frame of the target object. The type of the deep learning model is not limited, and may include, for example, R-CNN (Region-based Convolutional Neural Networks, a region-based, two-stage target detection algorithm), Fast R-CNN (an improvement on R-CNN), Faster R-CNN (an improvement on Fast R-CNN), YOLO (You Only Look Once, a single-stage target detection algorithm), and the like. Any model may be used, as long as it can take the image to be processed as input data and produce the target detection frame as output data.
According to an embodiment of the present disclosure, transforming the target detection frame to obtain a transformation frame may include: performing a geometric transformation on the target detection frame to obtain the transformation frame. The geometric transformation may include reducing or enlarging the size of the frame, but is not limited thereto; it may also include deformations such as changing a square frame into a round one. Any means may be used, as long as the target detection frame is changed, for example as sketched below.
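As a concrete illustration of such a geometric transformation, the following is a minimal sketch of scaling a rectangular detection frame about its center point; the (x_min, y_min, x_max, y_max) box format and the helper name `scale_box` are assumptions made for illustration, not definitions from the disclosure.

```python
def scale_box(box, ratio):
    """Scale a detection frame about its center point.

    box: (x_min, y_min, x_max, y_max); ratio > 1 enlarges the frame, ratio < 1 shrinks it.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    half_w = (x_max - x_min) / 2.0 * ratio
    half_h = (y_max - y_min) / 2.0 * ratio
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# Example: enlarge a 100x100 frame by 20% around its center.
print(scale_box((50, 50, 150, 150), 1.2))  # -> (40.0, 40.0, 160.0, 160.0)
```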
According to an embodiment of the present disclosure, determining a target image from an image to be processed based on a transformation box may include: and cutting out the image selected by the transformation frame from the image to be processed as a target image.
According to the embodiment of the disclosure, the target detection frame is a detection frame matched with the target object and can completely frame the target object. The transformation frame is the frame obtained by transforming the target detection frame. For a target image determined based on the transformation frame: if the transformation frame becomes smaller than the target detection frame, the target object in the target image is incomplete; if the transformation frame becomes larger than the target detection frame, the target image includes noise information other than the target object.
According to an embodiment of the present disclosure, performing target recognition on the target image and determining the tag of the target object may include: inputting the target image into a reference target detection model to obtain the tag of the target object. The tag includes the category of the target object and a category confidence; the category confidence indicates the probability that the target object is of that category. The reference target detection model may be a trained model whose detection accuracy reaches a predetermined accuracy threshold. The type of the reference target detection model is not limited and may include, for example, R-CNN (Region-based Convolutional Neural Networks, a region-based, two-stage target detection algorithm), Fast R-CNN (an improvement on R-CNN), Faster R-CNN (an improvement on Fast R-CNN), YOLO (You Only Look Once, a single-stage target detection algorithm), and the like, but is not limited thereto. Any model is possible that can take the target image as input data and produce the category and category confidence as output data.
According to an embodiment of the present disclosure, the target image together with the label of the target object may be taken as a sample image, but this is not limiting. The label may also be marked at a predetermined location of the target image, such as the upper-left or upper-right corner, as long as that location is away from the target object; the target image with the marked label is then taken as the sample image.
According to the embodiment of the disclosure, performing target recognition on the target image and determining the label of the target object enables automatic labeling of the target image. The label of the target object, for example a category confidence that accurately falls between 0 and 1, improves the accuracy of the sample image's label and thereby the quality of the sample image.
According to the embodiment of the disclosure, a new image data enhancement mode is provided, and the diversity and the data volume of a sample image for training can be improved through the transformation of a target detection frame. In addition, the target image is subjected to target recognition, and the label of the target object is determined, so that the converted target image can be marked by referring to the existing knowledge, the precision of the label is improved, and the quality of the sample image is further improved. And training the deep learning model by using the sample image, and improving the generalization capability and the precision of the trained deep learning model.
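Putting the above steps together, the following is a minimal sketch of the overall sample generation flow, assuming NumPy-style image arrays, a `segment_target` helper that returns the target detection frame, a trained `reference_detector` that returns a category and its confidence, and the `scale_box` helper sketched earlier; all of these names are illustrative assumptions, not APIs defined by the disclosure.

```python
def generate_sample(image, segment_target, reference_detector, ratio=1.2):
    """Sketch of the flow: segment -> transform frame -> crop -> recognize -> sample."""
    # 1. Image segmentation: obtain the target detection frame of the target object.
    target_box = segment_target(image)
    # 2. Transform the target detection frame (here: scale it about its center).
    transform_box = scale_box(target_box, ratio)
    # 3. Determine the target image: crop the region selected by the transformation frame.
    h, w = image.shape[:2]
    x_min, y_min, x_max, y_max = [int(round(v)) for v in transform_box]
    x_min, y_min = max(x_min, 0), max(y_min, 0)
    x_max, y_max = min(x_max, w), min(y_max, h)
    target_image = image[y_min:y_max, x_min:x_max]
    # 4. Target recognition: the reference model yields a category and a category confidence.
    category, confidence = reference_detector(target_image)
    # 5. Generate the sample image from the target image and the label of the target object.
    return target_image, {"category": category, "confidence": confidence}
```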
Fig. 3 schematically illustrates a flow diagram of generating a target image according to an embodiment of the disclosure.
As shown in fig. 3, image segmentation is performed on the image to be processed P310 to obtain a target detection frame B310 of the target object T310 in the image to be processed P310. The target detection frame B310 is enlarged to obtain a transformation frame B320. The image selected by the transformation frame B320 is determined from the image to be processed P310 to obtain the target image P320.
According to the embodiment of the disclosure, the sample image generation method provided by the embodiment of the disclosure can realize image data enhancement, so that the diversity of sample images for training is improved.
According to other embodiments of the present disclosure, a sample image may be generated from a target image and a predetermined label. The predetermined tag may refer to the same category as the target object, as well as a preset category confidence. The preset class confidence is, for example, 1 or other fixed value.
According to the embodiment of the disclosure, since the transformation frame is a transformed frame relative to the target detection frame, the target image either includes noise information other than the target object or contains an incomplete target object. When training with such a target image, the true value of the category confidence will be less than 1. Compared with a preset category confidence, the category confidence obtained from target recognition is therefore more flexible and accurate.
According to an embodiment of the present disclosure, for operation S220 as shown in fig. 2, the target detection frame is transformed to obtain a transformation frame, which may include the following operations.
For example, a scaling threshold for the target detection box is determined. And scaling and transforming the target detection frame according to the proportion meeting the scaling threshold value to obtain a transformation frame.
According to embodiments of the present disclosure, the scaling threshold may include a contraction ratio limit and an enlargement ratio limit. Scaling the target detection frame by a ratio that satisfies the scaling threshold may include: performing an enlargement transformation on the target detection frame with an enlargement ratio smaller than or equal to the enlargement ratio limit; or performing a contraction transformation on the target detection frame with a contraction ratio larger than or equal to the contraction ratio limit.
According to the embodiment of the disclosure, when the target detection frame is contracted, the amount of information about the target object in the target image decreases as the contraction ratio decreases, so the category confidence decreases; if too little target information remains, the quality of a sample image generated from that target image does not meet the requirement. Similarly, if the target detection frame is enlarged, the amount of noise information contained in the target image increases as the enlargement ratio grows, which also lowers the category confidence, and again the quality of the sample image generated from the target image does not meet the requirement. Using the scaling threshold as the transformation reference ensures that the scaled transformation frame meets the requirements and thus guarantees the quality of the sample image.
According to embodiments of the present disclosure, determining a scaling threshold for a target detection box may include the following operations.
For example, a relationship graph of the target object is generated. The relationship graph indicates the relationship between category confidence and scale. A scaling threshold is determined based on the relationship graph and a predetermined category confidence threshold.
According to embodiments of the present disclosure, there is a correspondence between category confidence and scale. The larger the magnification ratio, the more noise information is doped in the target image, and the smaller the category confidence is. The larger the contraction ratio, the more information of the target object removed in the target image, and the smaller the category confidence.
According to an embodiment of the present disclosure, a relationship graph with respect to a target object may be generated in advance. The relationship graph may be a two-dimensional graph. The horizontal axis is used to indicate the scale, and the vertical axis is used to indicate the category confidence. The scaling threshold may be determined by correlating the graph with a predetermined category confidence threshold.
According to an embodiment of the present disclosure, the scaling threshold, for example the contraction ratio limit and the enlargement ratio limit, is determined based on the relationship graph and a predetermined category confidence threshold. This way of determining the threshold is simple and effective, as in the sketch below.
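A minimal sketch of reading the scaling limits off a sampled relationship curve follows; the sampled (ratio, confidence) pairs and the 0.5 confidence threshold are illustrative assumptions, not values prescribed by the disclosure.

```python
def scaling_limits(ratios, confidences, confidence_threshold=0.5):
    """Read contraction/enlargement limits off a sampled relationship curve (sketch).

    ratios/confidences: paired samples of the curve, with ratio 1.0 corresponding
    to the original detection frame. Returns (shrink_limit, enlarge_limit): the
    smallest and largest ratios whose category confidence still reaches the threshold.
    """
    admissible = [r for r, c in zip(ratios, confidences) if c >= confidence_threshold]
    if not admissible:
        return None, None
    return min(admissible), max(admissible)


# Example with made-up curve samples (illustrative only).
ratios      = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8]
confidences = [0.2, 0.45, 0.7, 0.95, 0.8, 0.6, 0.45, 0.3]
print(scaling_limits(ratios, confidences, 0.5))  # -> (0.8, 1.4)
```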
According to embodiments of the present disclosure, generating a relationship graph of a target object may include the following operations.
For example, scaling transformation is performed on the target detection frame according to a plurality of predetermined scaling ratios, respectively, to obtain a plurality of predetermined transformation frames corresponding to the predetermined scaling ratios one by one. A plurality of predetermined target images are determined based on the plurality of predetermined transformation boxes. A plurality of predetermined category confidence levels with respect to the target object is determined based on the plurality of predetermined target images. A relationship graph is generated based on the plurality of predetermined category confidence levels and a plurality of predetermined scaling ratios that are in one-to-one correspondence with the plurality of predetermined category confidence levels.
According to embodiments of the present disclosure, the predetermined scaling ratio may be an enlargement ratio or a contraction ratio. Taking a square target detection frame as an example, the predetermined scaling may include scaling in the length direction and the width direction, but is not limited thereto; it may also include contracting or enlarging the frame by different predetermined ratios in the length direction and the width direction. The scaling transformation may be performed while keeping the position of the center point of the target detection frame unchanged.
According to the embodiment of the disclosure, scaling transformation is performed on the target detection frame to obtain a plurality of predetermined transformation frames corresponding one-to-one to the plurality of predetermined scaling ratios. The center points of the predetermined transformation frames may coincide with the center point of the target detection frame, which ensures that the target object remains at the center of each predetermined transformation frame. This is not limiting, however; it is sufficient that each predetermined transformation frame still encloses the center point of the target detection frame.
According to an embodiment of the present disclosure, a predetermined target image is determined from images to be processed based on a predetermined transformation frame. Determining a plurality of predetermined category confidence levels for the target object based on the plurality of predetermined target images may include: for each predetermined target image, the predetermined target image is input into a reference target detection model, resulting in a category for the target object and a predetermined category confidence for that category.
According to an embodiment of the present disclosure, the relationship graph is generated based on the plurality of predetermined category confidences and the plurality of predetermined scaling ratios corresponding to them one to one. For example, the associated pairs of predetermined scaling ratio and predetermined category confidence can be plotted as discrete data points in a two-dimensional graph whose horizontal axis indicates the scaling ratio and whose vertical axis indicates the category confidence; the relationship graph is then generated by smoothly connecting these data points.
According to an embodiment of the present disclosure, the reference target detection model is a model satisfying a predetermined detection accuracy, a predetermined category confidence is acquired through the reference target detection model, and a relationship graph is generated based on the predetermined category confidence. The prior knowledge about the reference target detection model can be fused in the relation graph, so that the generation precision and the effectiveness of the relation graph are improved.
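The following sketch illustrates how such a relationship curve could be sampled for one target object, reusing the assumed `scale_box` and `reference_detector` helpers from the earlier sketches; the fixed list of predetermined scaling ratios is likewise an illustrative assumption.

```python
def build_relationship_curve(image, target_box, reference_detector,
                             predetermined_ratios=(0.6, 0.8, 1.0, 1.2, 1.4, 1.6)):
    """Sample the category-confidence-vs-scaling-ratio relationship (a sketch)."""
    h, w = image.shape[:2]
    curve = []
    for ratio in predetermined_ratios:
        # Scale the target detection frame about its (unchanged) center point.
        x_min, y_min, x_max, y_max = [int(round(v)) for v in scale_box(target_box, ratio)]
        x_min, y_min = max(x_min, 0), max(y_min, 0)
        x_max, y_max = min(x_max, w), min(y_max, h)
        predetermined_image = image[y_min:y_max, x_min:x_max]
        # The reference detection model provides the predetermined category confidence.
        _category, confidence = reference_detector(predetermined_image)
        curve.append((ratio, confidence))
    # Plotting and smoothly connecting these points yields the relationship graph.
    return curve
```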
Fig. 4 schematically illustrates a flow diagram for generating a relationship graph according to an embodiment of the disclosure.
As shown in fig. 4, a plurality of predetermined scaling ratios may be preset and a predetermined scaling table T410 generated. The target detection frame is scaled according to each predetermined scaling ratio, yielding a plurality of predetermined transformation frames corresponding one-to-one to the predetermined scaling ratios. For each predetermined transformation frame, the corresponding predetermined target image is determined from the image to be processed, and that predetermined target image is input into the reference target detection model to obtain the category and the predetermined category confidence of the target object in it. An associated data list T420 of predetermined scaling ratios and predetermined category confidences is generated. Based on the associated data list T420, the corresponding data points are plotted in a two-dimensional coordinate graph whose horizontal axis indicates the scaling ratio and whose vertical axis indicates the category confidence. The data points are smoothly connected to generate the relationship graph P410.
The graph in fig. 4 schematically shows the relationship for the enlargement case.
According to an alternative embodiment of the present disclosure, the target detection frame is subjected to an amplification transformation to obtain a transformation frame. The target image generated based on the transformation frame contains the target object and the area information around the target object, so that the sample image contains the context semantic information around the target object.
According to an example embodiment of the present disclosure, an association function between a scale and a category confidence may be determined based on a plurality of scale and category confidence association data. Based on the association function, a relationship graph of the target object is generated.
According to other embodiments of the present disclosure, the scaling threshold may also be determined based on the association function and a predetermined category confidence threshold.
Compared with determining the scaling threshold from the association function and the predetermined category confidence threshold, determining it from the relationship graph and the predetermined category confidence threshold can also be used when the association function between scaling ratio and category confidence cannot be determined, which broadens the scope of application of the embodiments of the present disclosure.
According to an alternative embodiment of the present disclosure, for the operation S240 shown in fig. 2, performing target recognition on the target image, determining the tag of the target object may include: a scale of the target image is determined. Based on the scaling and relationship graph, a class confidence of the target object is determined. Based on the category confidence, a tag of the target object is determined.
According to an embodiment of the present disclosure, a category confidence corresponding to a scale is determined from a relationship graph based on the scale and the relationship graph. Based on the category confidence and the category, a tag of the target object is determined.
According to the embodiment of the disclosure, the label of the target object is determined by using the method, so that the relation curve graph can be fully utilized, and the processing efficiency is improved.
According to other embodiments of the present disclosure, for operation S240 shown in fig. 2, performing target recognition on the target image, determining a tag of the target object may further include: and inputting the target image into other reference target detection models to obtain an output result. The output results include the category and category confidence of the target object in the target image. And when the output result and the category confidence obtained based on the relation graph are determined to meet the preset condition, the output result is taken as a label of the target object.
According to an embodiment of the present disclosure, the category confidence in the output result and the category confidence obtained from the relationship graph satisfying the predetermined condition may mean that the difference between them is smaller than a preset difference. Such a small difference indicates that the accuracy of the category confidence meets the requirement, and the category confidence in the output result can be used as the label. This is not limiting, however: the category confidence obtained from the relationship graph can equally be used as the label, and the details are not repeated here.
According to embodiments of the present disclosure, if the output result and the category confidence obtained from the relationship graph do not satisfy the predetermined condition, one or both of the two confidences do not meet the accuracy requirement, and the label may be determined in another way, for example by a third reference target detection model.
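As a small illustration of this cross-check, assuming a preset maximum difference of 0.1 (an illustrative value), the two confidence estimates could be compared as follows:

```python
def label_from_cross_check(category, model_confidence, curve_confidence, max_gap=0.1):
    """Cross-check the second model's confidence against the curve-based one (sketch)."""
    if abs(model_confidence - curve_confidence) < max_gap:
        # The two estimates agree closely enough; use the model output as the label.
        return {"category": category, "confidence": model_confidence}
    # Otherwise defer to a further check, e.g. a third reference detection model.
    return None
```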
According to the embodiment of the disclosure, the accuracy of the relation curve graph is determined in an auxiliary mode by utilizing other reference target detection models, so that the image quality of the obtained sample image can be further ensured, the deep learning model is trained based on the sample image provided by the embodiment of the disclosure, and the model performance of the deep learning model is improved.
According to another embodiment of the present disclosure, when an enlargement transformation or a reduction transformation is performed on the target detection frame, operation S220 shown in fig. 2, transforming the target detection frame to obtain a transformation frame, may further include the following operations.
For example, the target detection frame is subjected to shrinkage transformation to obtain an initial transformation frame. An initial target image is determined from the image to be processed based on the initial transformation box. Based on the initial target image, a class confidence of the target object is determined. In the case that the category confidence of the target object is determined to be greater than the predetermined confidence threshold, the initial transformation box is taken as a transformation box.
According to the embodiment of the disclosure, the initial target image can be input into the reference target detection model to obtain the category confidence of the target object. If the category confidence of the target object is greater than the predetermined confidence threshold, the scaling ratio of the initial transformation frame is determined to meet the predetermined requirement, and the noise information contained in the initial target image is smaller than a noise content threshold. The initial transformation frame may then be used as the transformation frame, the initial target image as the target image, and the category confidence of the target object as the label of the target object.
According to the embodiments of the present disclosure, by determining the transformation frame in the above manner, the operation can be simplified while determining the transformation frame and the tag of the target object.
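A minimal sketch of this variant, again reusing the assumed helpers from the earlier sketches and an illustrative confidence threshold, could look as follows:

```python
def transform_with_confidence_check(image, target_box, reference_detector,
                                    ratio, confidence_threshold=0.5):
    """Determine the transformation frame directly via a confidence check (sketch)."""
    h, w = image.shape[:2]
    initial_box = scale_box(target_box, ratio)
    x_min, y_min, x_max, y_max = [int(round(v)) for v in initial_box]
    x_min, y_min = max(x_min, 0), max(y_min, 0)
    x_max, y_max = min(x_max, w), min(y_max, h)
    initial_target_image = image[y_min:y_max, x_min:x_max]
    category, confidence = reference_detector(initial_target_image)
    if confidence > confidence_threshold:
        # Accept the initial transformation frame as the transformation frame and
        # use the recognized category and confidence as the label of the target object.
        return initial_box, {"category": category, "confidence": confidence}
    return None, None
```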
According to another embodiment of the present disclosure, for operation S230 as shown in fig. 2, determining a target image from an image to be processed based on a transformation frame may include: and cutting the image to be processed according to the coordinates of the transformation frame in the image to be processed to obtain an image selected by the transformation frame as a target image. But is not limited thereto. May further include: an initial transformed image is determined from the image to be processed based on the transformation box. And carrying out image data enhancement on the initial transformation image to obtain a target image.
According to an embodiment of the present disclosure, determining an initial transformed image from an image to be processed based on a transformation box may include: cutting the image to be processed according to the coordinates of the transformation frame in the image to be processed to obtain an image selected by the transformation frame as an initial transformation image.
According to an embodiment of the present disclosure, image data enhancement is performed on an initial transformed image to obtain a target image, which may include: performing at least one of the following transformations on the initial transformed image to obtain a target image: pixel level transforms, spatial level transforms, and fusion transforms.
According to an embodiment of the present disclosure, the pixel level transformation may include at least one of: blurring, equalization, color conversion, adding noise, etc. Such as brightness, contrast adjustment, etc.
According to an embodiment of the present disclosure, the spatial level transformation may include at least one of: rotation, affine transformation, distortion, etc. Such as randomly resizing the image, rotating the image, etc.
According to an embodiment of the present disclosure, the fusion transformation may include at least one of: stitching, stitching with fusion, and the like, such as MixUp or Mosaic.
According to the embodiment of the disclosure, the transformation of the initial transformation image is added on the basis of the transformation of the target detection frame, so that the diversity of the sample image can be improved, and the data volume of the sample image can be further expanded.
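The sketch below illustrates one pixel-level, one spatial-level, and one fusion-style transform using OpenCV and NumPy; the specific operations and parameter values are illustrative choices, not the disclosure's prescribed augmentation pipeline.

```python
import cv2
import numpy as np


def augment(initial_image, other_image=None):
    """Apply example pixel-level, spatial-level and fusion transforms (a sketch)."""
    out = initial_image.copy()
    # Pixel-level: blur, adjust brightness/contrast, then add Gaussian noise.
    out = cv2.GaussianBlur(out, (5, 5), 0)
    out = cv2.convertScaleAbs(out, alpha=1.1, beta=10)
    noise = np.random.normal(0, 5, out.shape).astype(np.float32)
    out = np.clip(out.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    # Spatial-level: rotate about the image center.
    h, w = out.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)
    out = cv2.warpAffine(out, rot, (w, h))
    # Fusion: MixUp-style blending with another image resized to the same shape.
    if other_image is not None:
        other = cv2.resize(other_image, (w, h))
        out = cv2.addWeighted(out, 0.7, other, 0.3, 0)
    return out
```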
Fig. 5 schematically illustrates a flowchart of a training method of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 5, the method includes operations S510 to S520.
In operation S510, a sample image is input into the deep learning model, and a sample detection result is obtained. The sample detection result includes a sample category and a sample category confidence for the target object in the sample image.
In operation S520, a deep learning model is trained using the sample detection result and the label of the sample image.
According to an embodiment of the present disclosure, a sample image is generated using the sample image generation method provided by the embodiment of the present disclosure.
According to embodiments of the present disclosure, a sample image may be input into a deep learning model, resulting in a sample detection result, such as a sample class and a sample class confidence of a target object in the sample image.
According to an embodiment of the present disclosure, training the deep learning model using the sample detection result and the label of the sample image may include: inputting the sample detection result and the label of the sample image into a loss function to obtain a loss value; adjusting parameters of the deep learning model based on the loss value until a predetermined training condition is satisfied; and taking the model that satisfies the predetermined training condition as the target model. The predetermined training condition may include the number of parameter-adjustment rounds reaching a preset round threshold, but is not limited thereto; it may also include convergence of the loss value. Any condition that can characterize the accuracy of the target model may be used.
According to the embodiment of the present disclosure, the type of the loss function is not limited. For example, may include a cross entropy loss function.
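A minimal PyTorch-style training-loop sketch for operations S510 and S520 follows; the data-loader format (images, label categories, label confidences), the confidence-weighted cross-entropy, and the Adam optimizer are illustrative assumptions rather than choices prescribed by the disclosure.

```python
import torch
from torch import nn


def train(model, data_loader, epochs=10, lr=1e-3):
    """Sketch: train the deep learning model with sample detection results and labels."""
    criterion = nn.CrossEntropyLoss(reduction="none")
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _epoch in range(epochs):
        for images, categories, confidences in data_loader:
            logits = model(images)                 # sample detection result (class scores)
            loss = criterion(logits, categories)   # per-sample cross-entropy
            loss = (loss * confidences).mean()     # weight each sample by its label confidence
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```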
According to an embodiment of the present disclosure, the sample image is generated using the sample image generation method provided by the embodiments of the present disclosure. The data enhancement mode of transforming the target detection frame yields target images in sufficient quantity, and labeling these target images makes the labels of the target objects accurate and effective. Training with sample images generated in this way therefore produces a target model with high accuracy and strong generalization capability.
Fig. 6 schematically illustrates a flow chart of a target detection method according to an embodiment of the disclosure.
As shown in fig. 6, the method includes operation S610.
In operation S610, the image to be identified is input into the target detection model to obtain a detection result of the object to be identified in the image to be identified.
According to embodiments of the present disclosure, the detection result may include a category of the object to be identified and a category confidence. But is not limited thereto. The detection result may also include a detection box, a category, and a category confidence of the object to be identified.
According to the embodiment of the disclosure, the target detection model is trained by using the training method of the deep learning model provided by the embodiment of the disclosure. The target detection model obtained by training by the training method of the deep learning model provided by the embodiment of the disclosure has good model generalization performance and high recognition precision.
According to an embodiment of the present disclosure, the input data of the target detection model is the image to be identified, and the output data is the detection result. This target detection model matches the training method of the deep learning model shown in fig. 5.
According to other embodiments of the present disclosure, inputting the image to be identified into the target detection model to obtain the detection result of the object to be identified in the image to be identified may further include: performing image segmentation on the image to be identified to obtain a detection frame of the object to be identified in the image to be identified; and inputting the image corresponding to the detection frame into the target detection model to obtain the detection result of the object to be identified in the image to be identified. In this case the input data of the target detection model is the image corresponding to the detection frame, and the output data is the detection result. This target detection model matches the training method of the deep learning model described below, and an inference sketch is given after the next paragraph.
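A minimal inference sketch matching this second variant is shown below; `segment_objects` (returning the detection frames of the objects to identify) and the trained `target_model` (returning a category and a confidence for a crop) are assumed helpers, not APIs defined by the disclosure.

```python
def detect(image, segment_objects, target_model):
    """Sketch: segment the image first, then classify each detection-frame crop."""
    h, w = image.shape[:2]
    results = []
    for box in segment_objects(image):  # detection frames of the objects to identify
        x_min, y_min, x_max, y_max = [int(round(v)) for v in box]
        x_min, y_min = max(x_min, 0), max(y_min, 0)
        x_max, y_max = min(x_max, w), min(y_max, h)
        crop = image[y_min:y_max, x_min:x_max]
        category, confidence = target_model(crop)
        results.append({"box": box, "category": category, "confidence": confidence})
    return results
```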
For example, inputting the sample image into the deep learning model to obtain the sample detection result may include: and carrying out semantic segmentation on the sample image to obtain a sample detection frame of the sample image. And inputting the image corresponding to the sample detection frame into the deep learning model to obtain a sample detection result. Training a deep learning model by using the sample detection result and the label of the sample image.
According to the embodiment of the present disclosure, the way the target detection model is used is not limited; it may be adjusted in accordance with adjustments to the training method of the deep learning model provided by the embodiments of the present disclosure, and is not described in detail here.
Fig. 7 schematically shows a block diagram of a sample image generating device according to an embodiment of the present disclosure.
As shown in fig. 7, the sample image generating apparatus 700 includes: the segmentation module 710, the transformation module 720, the determination module 730, the identification module 740, and the generation module 750.
The segmentation module 710 is configured to perform image segmentation on the image to be processed to obtain a target detection frame of the target object in the image to be processed.
The transformation module 720 is configured to transform the target detection frame to obtain a transformation frame.
A determining module 730, configured to determine a target image from the images to be processed based on the transformation box.
The recognition module 740 is configured to perform target recognition on the target image, and determine a tag of the target object. The tag includes a category and a category confidence, the category confidence being used to indicate a probability that the target object is a category.
A generating module 750 is configured to generate a sample image based on the target image and the tag of the target object.
According to an embodiment of the present disclosure, a transformation module includes: the first determination sub-module and the first scaling sub-module.
And the first determining submodule is used for determining a scaling threshold value of the target detection frame.
And the first scaling submodule is used for scaling and transforming the target detection frame according to the proportion meeting the scaling threshold value to obtain a transformation frame.
According to an embodiment of the present disclosure, the first determination submodule includes: a generating unit and a first determining unit.
And the generating unit is used for generating a relation graph of the target object. The relationship graph indicates the relationship between category confidence and scale.
A first determining unit for determining a scaling threshold based on the relationship graph and a predetermined category confidence threshold.
According to an embodiment of the present disclosure, the generating unit includes: the device comprises a scaling subunit, a first determining subunit, a second determining subunit and a generating subunit.
And the scaling subunit is used for scaling and transforming the target detection frame according to a plurality of preset scaling scales respectively to obtain a plurality of preset transformation frames corresponding to the preset scaling scales one by one.
A first determination subunit configured to determine a plurality of predetermined target images based on the plurality of predetermined transform boxes.
A second determination subunit for determining a plurality of predetermined category confidence levels with respect to the target object based on the plurality of predetermined target images.
And the generation subunit is used for generating a relation graph based on the plurality of preset category confidence degrees and a plurality of preset scaling ratios which are in one-to-one correspondence with the plurality of preset category confidence degrees.
According to an embodiment of the present disclosure, an identification module includes: the second determination sub-module, the third determination sub-module, and the fourth determination sub-module.
And the second determination submodule is used for determining the scaling ratio of the target image.
And a third determination submodule, configured to determine a class confidence of the target object based on the scaling and the relationship graph.
And a fourth determination submodule, configured to determine a tag of the target object based on the category confidence.
According to an embodiment of the present disclosure, a transformation module includes: the second scaling sub-module, the fifth determination sub-module, the sixth determination sub-module, and the seventh determination sub-module.
And the second scaling sub-module is used for scaling and transforming the target detection frame to obtain an initial transformation frame.
And a fifth determining sub-module, configured to determine an initial target image from the images to be processed based on the initial transformation frame.
And a sixth determination submodule, configured to determine a category confidence of the target object based on the initial target image.
And a seventh determining sub-module, configured to take the initial transformation frame as a transformation frame when determining that the class confidence of the target object is greater than a predetermined confidence threshold.
According to an embodiment of the present disclosure, the determining module includes: an eighth determining sub-module and a data enhancement sub-module.
The eighth determining sub-module is configured to determine an initial transformed image from the image to be processed based on the transformation frame.
The data enhancement sub-module is configured to perform image data enhancement on the initial transformed image to obtain the target image.
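The disclosure does not fix a particular enhancement; as one assumption-laden example, the sketch below applies a random horizontal flip and a brightness jitter to the initial transformed image. Both augmentations are illustrative choices only.

```python
# Illustrative sketch only: simple image data enhancement of the initial
# transformed image (random horizontal flip plus brightness jitter).

import numpy as np

def enhance_image(initial_image, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    enhanced = initial_image.astype(np.float32)
    if rng.random() < 0.5:                        # random horizontal flip
        enhanced = enhanced[:, ::-1]
    enhanced = enhanced * rng.uniform(0.8, 1.2)   # brightness jitter
    return np.clip(enhanced, 0, 255).astype(np.uint8)
```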
Fig. 8 schematically shows a block diagram of a training apparatus of a deep learning model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the deep learning model may include: input module 810 and training module 820.
The input module 810 is configured to input the sample image into the deep learning model, and obtain a sample detection result. The sample detection result includes a sample category and a sample category confidence for the target object in the sample image.
The training module 820 is configured to train the deep learning model by using the sample detection result and the label of the sample image.
According to an embodiment of the present disclosure, a sample image is generated using the sample image generating device provided by the embodiment of the present disclosure.
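As a hedged illustration of the training step, the PyTorch sketch below feeds one generated sample image into a classification-style deep learning model and weights the cross-entropy loss by the label's category confidence. The framework choice, the confidence-weighted loss, the model interface and the tensor shapes are assumptions, not the disclosed training procedure.

```python
# Illustrative sketch only: one possible training step on a generated sample.

import torch
import torch.nn.functional as F

def train_step(model, optimizer, sample_image, label):
    """sample_image: float tensor (1, C, H, W); label: {'category': int, 'category_confidence': float}."""
    model.train()
    logits = model(sample_image)                    # sample detection result
    target = torch.tensor([label["category"]])      # sample category
    weight = label["category_confidence"]           # soft weight from the generated label
    loss = weight * F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```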
Fig. 9 schematically illustrates a block diagram of an object detection apparatus according to an embodiment of the present disclosure.
As shown in fig. 9, the object detection apparatus 900 may include: a detection module 910.
The detection module 910 is configured to input an image to be identified into the target detection model, and obtain a detection result of the object to be identified in the image to be identified.
According to the embodiment of the disclosure, the target detection model is trained by using the training device of the deep learning model provided by the embodiment of the disclosure.
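For completeness, a minimal sketch of the detection step under the same assumptions: the trained target detection model maps the image to be identified to per-category scores, and the highest score is read out as the detection result. The softmax read-out and the dictionary output format are assumptions.

```python
# Illustrative sketch only: running the image to be identified through the
# trained model and reading out category and category confidence.

import torch

@torch.no_grad()
def detect(model, image_tensor):
    """image_tensor: float tensor (1, C, H, W), already preprocessed."""
    model.eval()
    logits = model(image_tensor)
    probabilities = torch.softmax(logits, dim=-1)
    confidence, category = probabilities.max(dim=-1)
    return {"category": int(category), "category_confidence": float(confidence)}
```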
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method as in an embodiment of the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements a method according to an embodiment of the present disclosure.
Fig. 10 shows a schematic block diagram of an example electronic device 1000 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 10, the apparatus 1000 includes a computing unit 1001 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for the operation of the device 1000 can also be stored. The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Various components in device 1000 are connected to an input/output (I/O) interface 1005, including: an input unit 1006 such as a keyboard, a mouse, and the like; an output unit 1007 such as various types of displays, speakers, and the like; a storage unit 1008 such as a magnetic disk, an optical disk, or the like; and communication unit 1009 such as a network card, modem, wireless communication transceiver, etc. Communication unit 1009 allows device 1000 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1001 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1001 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1001 performs the respective methods and processes described above, for example, a sample image generation method. For example, in some embodiments, the sample image generation method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1000 via ROM 1002 and/or communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of the sample image generation method described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be configured to perform the sample image generation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (19)
1. A sample image generation method, comprising:
performing image segmentation on an image to be processed to obtain a target detection frame of a target object in the image to be processed;
Transforming the target detection frame to obtain a transformation frame;
Determining a target image from the image to be processed based on the transformation frame;
Performing target recognition on the target image, and determining a label of the target object, wherein the label comprises a category and category confidence, and the category confidence is used for indicating the probability that the target object is of the category; and
Generating a sample image based on the target image and a label of the target object;
wherein the performing target recognition on the target image and determining the label of the target object comprises:
determining a scale of the target image;
determining a category confidence of the target object based on the scale and a relationship graph, wherein the relationship graph indicates a relationship between category confidence and scale; and
determining the label of the target object based on the category confidence.
2. The method of claim 1, wherein the transforming the target detection frame to obtain the transformation frame comprises:
determining a scaling threshold of the target detection frame; and
scaling and transforming the target detection frame according to a proportion that satisfies the scaling threshold, to obtain the transformation frame.
3. The method of claim 2, wherein the determining the scaling threshold of the target detection frame comprises:
generating the relationship graph of the target object; and
The scaling threshold is determined based on the relationship graph and a predetermined category confidence threshold.
4. A method according to claim 3, wherein the generating a relationship graph of the target object comprises:
Scaling and transforming the target detection frame according to a plurality of preset scaling scales respectively to obtain a plurality of preset transformation frames corresponding to the preset scaling scales one by one;
Determining a plurality of predetermined target images based on a plurality of the predetermined transformation boxes;
determining a plurality of predetermined category confidence levels with respect to the target object based on the plurality of predetermined target images;
The relationship graph is generated based on the plurality of predetermined category confidence levels and a plurality of predetermined scaling ratios that are in one-to-one correspondence with the plurality of predetermined category confidence levels.
5. The method according to claim 3, wherein the transforming the target detection frame to obtain the transformation frame comprises:
Scaling transformation is carried out on the target detection frame to obtain an initial transformation frame;
determining an initial target image from the image to be processed based on the initial transformation frame;
Determining a category confidence of the target object based on the initial target image; and
taking the initial transformation frame as the transformation frame when it is determined that the category confidence of the target object is greater than the predetermined category confidence threshold.
6. The method of claim 1, wherein the determining, based on the transformation frame, a target image from the image to be processed comprises:
Determining an initial transformed image from the image to be processed based on the transformation frame; and
performing image data enhancement on the initial transformed image to obtain the target image.
7. A training method of a deep learning model, comprising:
Inputting a sample image into a deep learning model to obtain a sample detection result, wherein the sample detection result comprises a sample category and a sample category confidence degree about a target object in the sample image; and
Training the deep learning model by using the sample detection result and the label of the sample image;
Wherein the sample image is generated using the method according to any one of claims 1 to 6.
8. A target detection method comprising:
Inputting an image to be identified into a target detection model to obtain a detection result of an object to be identified in the image to be identified;
wherein the object detection model is trained using the method of claim 7.
9. A sample image generation apparatus comprising:
The segmentation module is used for carrying out image segmentation on the image to be processed to obtain a target detection frame of a target object in the image to be processed;
the transformation module is used for transforming the target detection frame to obtain a transformation frame;
the determining module is used for determining a target image from the image to be processed based on the transformation frame;
the identification module is used for carrying out target identification on the target image and determining a label of the target object, wherein the label comprises a category and category confidence, and the category confidence is used for indicating the probability that the target object is of the category; and
The generation module is used for generating a sample image based on the target image and the label of the target object;
wherein, the identification module includes:
a second determination sub-module for determining a scale of the target image;
A third determination sub-module for determining a category confidence of the target object based on the scale and a relationship graph, wherein the relationship graph indicates a relationship between category confidence and scale; and
a fourth determining sub-module, configured to determine a label of the target object based on the category confidence.
10. The apparatus of claim 9, wherein the transformation module comprises:
a first determining submodule for determining a scaling threshold of the target detection frame; and
And the first scaling submodule is used for scaling and transforming the target detection frame according to the proportion meeting the scaling threshold value to obtain the transformation frame.
11. The apparatus of claim 10, wherein the first determination submodule comprises:
a generating unit configured to generate the relationship graph of the target object; and
A first determining unit for determining the scaling threshold based on the relationship graph and a predetermined category confidence threshold.
12. The apparatus of claim 11, wherein the generating unit comprises:
The scaling subunit is used for performing scaling transformation on the target detection frame according to a plurality of preset scaling scales respectively to obtain a plurality of preset transformation frames corresponding to the preset scaling scales one by one;
a first determining subunit configured to determine a plurality of predetermined target images based on a plurality of the predetermined transform boxes;
a second determination subunit configured to determine a plurality of predetermined category confidences regarding the target object based on the plurality of predetermined target images;
And the generation subunit is used for generating the relation graph based on the plurality of the preset category confidence degrees and a plurality of preset scaling ratios which are in one-to-one correspondence with the plurality of the preset category confidence degrees.
13. The apparatus of claim 11, wherein the transformation module comprises:
The second scaling sub-module is used for scaling and transforming the target detection frame to obtain an initial transformation frame;
A fifth determining submodule, configured to determine an initial target image from the image to be processed based on the initial transformation frame;
a sixth determining sub-module, configured to determine a category confidence of the target object based on the initial target image; and
A seventh determining sub-module, configured to take the initial transformation frame as the transformation frame when it is determined that the category confidence of the target object is greater than the predetermined category confidence threshold.
14. The apparatus of claim 9, wherein the determining module comprises:
an eighth determining sub-module, configured to determine an initial transformed image from the image to be processed based on the transformation frame; and
a data enhancement sub-module, configured to perform image data enhancement on the initial transformed image to obtain the target image.
15. A training device for a deep learning model, comprising:
the input module is used for inputting a sample image into the deep learning model to obtain a sample detection result, wherein the sample detection result comprises a sample category and a sample category confidence degree about a target object in the sample image; and
The training module is used for training the deep learning model by using the sample detection result and the label of the sample image;
wherein the sample image is generated using an apparatus according to any one of claims 9 to 14.
16. An object detection apparatus comprising:
the detection module is used for inputting the image to be identified into the target detection model to obtain a detection result of the object to be identified in the image to be identified;
wherein the object detection model is trained using the apparatus of claim 15.
17. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 8.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1 to 8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310403223.1A CN116168442B (en) | 2023-04-14 | 2023-04-14 | Sample image generation method, model training method and target detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116168442A (en) | 2023-05-26 |
CN116168442B (en) | 2024-05-07 |
Family
ID=86414869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310403223.1A Active CN116168442B (en) | 2023-04-14 | 2023-04-14 | Sample image generation method, model training method and target detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116168442B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9349076B1 (en) * | 2013-12-20 | 2016-05-24 | Amazon Technologies, Inc. | Template-based target object detection in an image |
CN112052805A (en) * | 2020-09-10 | 2020-12-08 | 深圳数联天下智能科技有限公司 | Face detection frame display method, image processing device, equipment and storage medium |
CN115546089A (en) * | 2021-06-28 | 2022-12-30 | 上海微创卜算子医疗科技有限公司 | Medical image segmentation method, pathological image processing method, device and equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100727390B1 (en) * | 2005-12-26 | 2007-06-12 | 삼성전자주식회사 | Adaptive image size conversion apparatus and method thereof |
US20210081698A1 (en) * | 2018-02-09 | 2021-03-18 | Nano Techgalaxy, Inc. D/B/A Galaxy.Ai | Systems and methods for physical object analysis |
US11182903B2 (en) * | 2019-08-05 | 2021-11-23 | Sony Corporation | Image mask generation using a deep neural network |
Non-Patent Citations (2)
Title |
---|
"Fused Deep Neural Networks for Efficient Pedestrian Detection";Xianzhi Du等;《arXiv:1805.08688v1》;全文 * |
"盲区车辆检测与跟踪算法研究";刘海洋;《硕士电子期刊》;第2018年卷(第07期);全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN116168442A (en) | 2023-05-26 |
Similar Documents
Publication | Title |
---|---|
CN114550177B | Image processing method, text recognition method and device |
CN113033566B | Model training method, recognition method, device, storage medium, and program product |
CN113436100B | Method, apparatus, device, medium, and article for repairing video |
CN113627439B | Text structuring processing method, processing device, electronic equipment and storage medium |
JP7393472B2 | Display scene recognition method, device, electronic device, storage medium and computer program |
CN114792355B | Virtual image generation method and device, electronic equipment and storage medium |
CN112949767A | Sample image increment, image detection model training and image detection method |
CN114663952A | Object classification method, deep learning model training method, device and equipment |
CN115620321B | Table identification method and device, electronic equipment and storage medium |
CN113393468A | Image processing method, model training device and electronic equipment |
US20230005171A1 | Visual positioning method, related apparatus and computer program product |
JP2023119593A | Method and apparatus for recognizing document image, storage medium, and electronic device |
CN115937039A | Data expansion method and device, electronic equipment and readable storage medium |
CN113326766B | Training method and device of text detection model, text detection method and device |
CN114445825A | Character detection method and device, electronic equipment and storage medium |
CN114119990A | Method, apparatus and computer program product for image feature point matching |
CN113902899A | Training method, target detection method, device, electronic device and storage medium |
CN113643260A | Method, apparatus, device, medium and product for detecting image quality |
CN117746125A | Training method and device of image processing model and electronic equipment |
CN114724144B | Text recognition method, training device, training equipment and training medium for model |
CN114445833B | Text recognition method, device, electronic equipment and storage medium |
CN116824609A | Document format detection method and device and electronic equipment |
CN116168442B | Sample image generation method, model training method and target detection method |
CN114677566B | Training method of deep learning model, object recognition method and device |
CN114093006A | Training method, device and equipment of living human face detection model and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |