CN109472264B

CN109472264B - Method and apparatus for generating an object detection model

Info

Publication number: CN109472264B
Application number: CN201811332327.3A
Authority: CN
Inventors: 刘阳
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2018-11-09
Filing date: 2018-11-09
Publication date: 2020-10-27
Anticipated expiration: 2038-11-09
Also published as: CN109472264A

Abstract

The embodiment of the application discloses a method and a device for generating an object detection model. One embodiment of the method comprises: acquiring a sample set, wherein samples in the sample set comprise sample images and an intersection comparison set corresponding to the sample images, and the intersection comparison set comprises an intersection comparison of at least one preset image area of the sample images and an image area where an object to be detected is displayed in the sample images; selecting samples from the sample set, and performing the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model. The embodiment realizes the training of the machine learning method to obtain the object detection model.

Description

Method and apparatus for generating an object detection model

Technical Field

The embodiment of the application relates to the technical field of computers, in particular to a method and a device for generating an object detection model.

Background

Object detection is an important research direction in the field of computer vision. The object detection algorithms proposed at present are generally algorithms based on deep learning. Commonly used object detection algorithms include one-stage detection algorithms such as yolo (young Only Look one), ssd (single Shot multi box detector), and two-stage detection algorithms based on RCNN (regions with conditional Neural Network features), Faster RCNN, etc.

These object detection algorithms are complex and have continuous solution space, so that the computation complexity is high, and therefore, these object detection algorithms are not well suited for some application scenarios with limited computation resources, such as terminal devices (e.g., mobile phone terminals) of an embedded operating system.

Disclosure of Invention

The embodiment of the application provides a method and a device for generating an object detection model.

In a first aspect, an embodiment of the present application provides a method for generating an object detection model, where the method includes: acquiring a sample set, wherein samples in the sample set comprise sample images and an intersection comparison set corresponding to the sample images, and the intersection comparison set comprises an intersection comparison of at least one preset image area of the sample images and an image area where an object to be detected is displayed in the sample images; selecting samples from the sample set, and performing the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model.

In some embodiments, the training step further comprises: and in response to determining that the initial model is not trained completely, adjusting parameters of the initial model according to the loss value, reselecting the sample from the sample set, and continuing to execute the training step by using the adjusted initial model as the initial model.

In some embodiments, the sample set is obtained by: acquiring a sample image set; determining a target position information set and a target size set; determining an image area, which is displayed by the sample image and surrounded by the minimum circumscribed polygon of the object to be detected, as a first image area corresponding to the sample image for the sample image in the sample image set; determining pixel points indicated by the target position information in the target position information set in the sample image as target pixel points to obtain a target pixel point set corresponding to the sample image; for a target pixel point in the target pixel point set, determining an image area with a target size in the target size set, where the target pixel point is located, as a second image area, and obtaining a second image area set corresponding to the target pixel point; and determining the intersection ratio of the second image area in the second image area set corresponding to the target pixel point in the target pixel point set and the first image area respectively to obtain an intersection ratio set corresponding to the sample image, and forming the sample in the sample set by the obtained intersection ratio set and the sample image.

In some embodiments, the shape of the second image region in the second image region set corresponding to the target pixel point in the first image region and the target pixel point in the target pixel point set is determined according to the attribute information of the object to be detected.

In some embodiments, the initial model is a full convolutional network.

In a second aspect, an embodiment of the present application provides an apparatus for generating an object detection model, the apparatus including: the acquisition unit is configured to acquire a sample set, wherein samples in the sample set comprise sample images and an intersection comparison set corresponding to the sample images, and the intersection comparison set comprises an intersection comparison of at least one preset image area of the sample images and an image area, displayed by the sample images, where an object to be detected is located; a training unit configured to select samples from a set of samples and to perform the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model.

In some embodiments, the training unit is further configured to: and in response to determining that the initial model is not trained completely, adjusting parameters of the initial model according to the loss value, reselecting the sample from the sample set, and continuing to execute the training step by using the adjusted initial model as the initial model.

In some embodiments, the initial model is a full convolutional network.

In a third aspect, an embodiment of the present application provides a method for processing an image, including: acquiring an image to be processed; inputting the image to be processed into a pre-trained object detection model, and obtaining an intersection comparison set corresponding to the image to be processed, wherein the object detection model is generated by the method described in any one implementation manner in the first aspect, and the intersection comparison set comprises intersection ratios of at least one preset image area of the image to be processed and an image area where the object to be processed is located and displayed by the image to be processed.

In some embodiments, the method for processing an image described above, further includes: selecting preset image areas as target image areas from preset image areas respectively corresponding to the intersection ratios in the intersection ratio set by using a non-maximum suppression algorithm according to the intersection ratios in the intersection ratio set and the intersection ratio set; processing the image to be processed to highlight the target image area to obtain a processed image to be processed; and displaying the processed image to be processed.

In a fourth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect or the third aspect.

In a fifth aspect, the present application provides a computer-readable medium, on which a computer program is stored, which, when executed by a processor, implements the method as described in any implementation manner of the first aspect or the third aspect.

According to the method and the device for generating the object detection model, the sample set is obtained, wherein the samples in the sample set comprise the sample images and the intersection comparison set corresponding to the sample images, and the intersection comparison set comprises the intersection comparison of at least one preset image area of the sample images and the image area, where the object to be detected is, of the sample images. And then training the initial model by using a machine learning method based on the acquired sample set to obtain an object detection model, so that the position information of the object to be detected displayed in the image can be determined according to the intersection and comparison corresponding to each preset image area of the image obtained by the object detection model.

Drawings

Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for generating an object detection model according to the present application;

FIG. 3 is a schematic diagram of an application scenario of a method for generating an object detection model according to an embodiment of the present application;

FIG. 4 is a flow diagram of yet another embodiment of a method for generating an object detection model according to the present application;

FIG. 5 is a flow diagram of one embodiment of a method for processing an image according to the present application;

FIG. 6 is a schematic block diagram of one embodiment of an apparatus for generating an object detection model according to the present application;

FIG. 7 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.

Detailed Description

The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 shows an exemplary architecture 100 to which embodiments of the present method for generating an object detection model or apparatus for generating an object detection model may be applied.

As shown in fig. 1, the system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The

terminal devices

101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various client applications may be installed on the

terminal devices

101, 102, 103. Such as a web browser application, an image-like application, a search-like application, and the like.

The

terminal apparatuses

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices that support image processing, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the

terminal apparatuses

101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

The server 105 may be a server providing various services, for example a server training an initial model from a sample set sent by the

terminal devices

101, 102, 103.

It should be noted that the sample set for training the initial model may also be directly stored locally in the server 105, and the server 105 may directly extract the locally stored sample set to train the initial model, in which case, the

terminal devices

101, 102, and 103 and the network 104 may not be present.

It should be noted that the method for generating the object detection model provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for generating the object detection model is generally disposed in the server 105.

It should be noted that the

terminal devices

101, 102, and 103 may also train the initial model based on the sample set, in this case, the method for generating the object detection model may also be executed by the

terminal devices

101, 102, and 103, and accordingly, the apparatus for generating the object detection model may also be provided in the

terminal devices

101, 102, and 103. At this point, the exemplary system architecture 100 may not have the server 105 and the network 104.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating an object detection model in accordance with the present application is shown. The method for generating an object detection model comprises the following steps:

step 201, a sample set is obtained, wherein samples in the sample set include a sample image and an intersection comparison set corresponding to the sample image.

In the present embodiment, the sample image may be an arbitrary image showing the object to be detected. The object to be detected can be any detectable object determined according to application requirements. For example, if the object to be detected is a human face, the sample image may be an image in which a human face is displayed.

In this embodiment, the intersection comparison set includes intersection ratios between at least one preset image region of the sample image and image regions where the object to be detected is located and displayed in the sample image. The intersection ratio of two image regions may represent the ratio of the area of the image region corresponding to the intersection of the two image regions to the area of the image region corresponding to the union of the two image regions.

Wherein the at least one preset image area of the sample image may be preset by a technician. The image area where the object to be detected is displayed in the sample image may also be preset by a technician. At least one preset image area corresponding to a sample image in each sample in the sample set can be determined according to the same setting mode.

For any sample image, for example, sixty-four image areas of the sample image can be obtained by performing octal division from the horizontal direction and the vertical direction, and the sixty-four image areas can be used as the preset image areas of the sample image. For example, in addition to sixty-four image areas of the sample image, sixteen image areas of the sample image may be obtained by performing quartering from the horizontal direction and the vertical direction, respectively. Then, eighty total image areas obtained by two times of division can be used as the preset image area of the sample image.

It should be understood that, in practice, the preset image area of the sample image may be specifically set according to the actual application requirements. For example, if the application scene of the face detection is used, the attributes such as the shape and the size of the preset image area of the sample image can be set according to the face proportion of the ordinary person. For another example, in the application scenario of automobile detection, the attributes such as the shape and size of the preset image area of the sample image may be set according to the size of some common automobiles.

The sample set may be obtained by selecting sample images in advance by technicians and calculating an intersection set corresponding to the sample images, or may be obtained by selecting some public data sets as the sample set or adjusting the public data sets to form the sample set.

Step 202, selecting samples from the sample set, and performing the following training steps 2021-2024:

in this embodiment, there are many ways to select samples from the sample set. For example, a preset number of samples may be randomly chosen from a set of samples. As another example, a predetermined number of samples that have not been selected may be selected from the sample set.

Step 2021, inputting the sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image.

In this embodiment, the initial model may be various types of untrained or untrained artificial neural networks, such as a deep learning model. The initial model may also be a model that combines a variety of untrained or untrained artificial neural networks. Specifically, the skilled person can construct the initial model based on the disclosed deep learning framework according to the actual application requirements (e.g. which layers are included, the number of layers per layer, the size of the convolution kernel, etc. as required).

In some optional implementations of this embodiment, the initial model may be set to a full convolutional network.

It should be understood that if the number of samples selected in step 202 is greater than one, the sample images in the selected samples are respectively input into the initial model. Correspondingly, an output union ratio set output by the initial model and respectively corresponding to the sample images in the samples can be obtained.

Step 2022, analyzing the obtained output union ratio set and the union ratio set in the selected samples to determine a loss value.

In this embodiment, the loss value may be used to represent the degree of difference in the output union ratio versus the union ratio set in the set samples. Ideally, the output cross-over ratio set is the same as the cross-over ratio set in the sample. It should be understood that the output union ratio set and the union ratios in the union ratio set in the sample are one-to-one. Assume that the preset image area of a sample image includes A, B, C, D four image areas. Then the output union ratio of the corresponding image area a in the union ratio set corresponds to the union ratio of the corresponding image area a in the union ratio set in the sample, and so on.

In this embodiment, the loss value may be determined in various ways. For example, for any one cross-over ratio in the output cross-over comparison set, the cross-over ratio and the cross-over ratio corresponding to the cross-over ratio in the cross-over comparison set in the sample can be used as one group, so that a plurality of groups of cross-over ratios are obtained. The square of the difference of the two cross-over ratios in each set of cross-over ratios may then be calculated. The average of the intersection of the groups compared to the square of the corresponding difference may then be determined as the loss value.

For another example, the absolute value of the difference between two intersection ratios in each set of intersection ratios may be determined, and then the average value of the absolute values of the differences corresponding to each set of intersection ratios may be determined, and the obtained average value may be used as the target index. Then, a target exponential power of the natural constant may be determined as the loss value.

Step 2023, determining whether the initial model is trained according to the loss value.

In this embodiment, the determination manner for determining whether the initial model is trained can be set by a technician according to the actual application requirements. For example, whether the initial model is trained can be determined by determining whether the loss value is less than a preset loss threshold. And if the loss value is larger than the loss threshold value, determining that the initial model is not trained.

Step 2024, in response to determining that the initial model training is complete, determining the initial model as the object detection model.

In some optional implementations of this embodiment, in response to determining that the initial model is not trained completely, parameters of the initial model may be adjusted according to the loss value, and the samples are reselected from the sample set, and the training step is continuously performed using the adjusted initial model as the initial model.

In the above implementation, parameters of the initial model may be adjusted by using a gradient descent and back propagation algorithm according to the determined loss value, so that the output union ratio set corresponding to the adjusted initial model and the union ratio set in the corresponding sample are as consistent as possible.

In practice, the training process of the initial model usually requires multiple iterative training, and in the training process, various ways of judging whether the initial model is trained can be set. For example, when the initial model is trained for the first time, whether the initial model is trained can be determined according to the magnitude relation between the loss value and the loss threshold value. After the initial model is adjusted in parameters, whether the adjusted initial model is trained or not can be judged according to the sum of loss values corresponding to the initial model under different parameters. For example, whether training is completed may be determined by determining whether a difference between loss values corresponding to the initial model after the parameters are adjusted for a plurality of times is smaller than a preset difference threshold.

It should be noted that, each time a sample is selected from the sample set, one sample may be selected, a plurality of samples (more than one) may be selected, or a predetermined number of samples may be selected. When the number of selected samples is greater than one, a corresponding loss value may be determined for each sample in the above-described implementation. And then, determining the overall loss value according to the loss values respectively corresponding to the samples selected at this time. For example, the sum of the loss values respectively corresponding to the samples selected this time or the maximum loss value thereof may be determined as an overall loss value, and then the overall loss value may be used to adjust the parameters of the model in the training process.

With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the method for generating an object detection model according to the present embodiment. In the application scenario of fig. 3, a sample set 301 is obtained first. Wherein each sample comprises a sample image and an intersection set corresponding to the sample image. And the intersection comparison set in each sample comprises intersection ratios of four image areas obtained by respectively halving the sample image in the horizontal direction and the vertical direction and an area where the displayed object to be detected is located.

Thereafter, a sample 302 may be selected from the sample set 301. Taking sample 302 as an example, sample 302 includes sample image 3021 and a corresponding set of cross-over ratios 3022 for the sample image. The cross-over ratio set 3022 includes four cross-over ratios a1, B1, C1, and D1, which correspond to A, B, C, D four image regions of the sample image 3021, respectively.

Thereafter, the samples 302 may be input to the initial model 303, resulting in an output union ratio set 304 corresponding to the samples 302. The output cross-over ratio set 304 includes four cross-over ratios a1 ', B1', C1 'and D1', which correspond to A, B, C, D four image areas of the sample image 3021, respectively.

The absolute values of the differences of the output union ratios 304 and the corresponding union ratios in the union ratio set 3022 in the samples 302 may then be calculated, respectively. Specifically, the absolute values of the differences of a1 and a1 ', B1 and B1', C1 and C1 ', and D1 and D1' were calculated, respectively. Then, the average of the four absolute values obtained is calculated, and the obtained average is taken as the loss value 305.

Thereafter, it may be determined from the loss value 305 whether the initial model 303 is trained. If the initial model 303 is not trained, the parameters of the initial model 303 may be adjusted by using a gradient descent and back propagation algorithm according to the loss value 305, and the above process may be continuously performed by selecting a sample from the sample set 301 until it is determined that the training of the initial model 303 is completed, and the trained initial model 303 may be determined as the object detection model.

The method provided by the above embodiment of the present application trains the initial model through the sample set to obtain the object detection model. The samples in the sample set comprise sample images and an intersection comparison set corresponding to the sample images, wherein the intersection comparison set comprises intersection ratios of at least one preset image area of the sample images and an image area where the object to be detected is displayed in the sample images. Therefore, the intersection ratio corresponding to each preset image area of the image obtained by the trained object detection model can be used for further judging the position of the object to be detected displayed in the image.

With further reference to FIG. 4, a flow 400 of yet another embodiment of a method for generating an object detection model is shown. The process 400 of the method for generating an object detection model includes the steps of:

step 401, obtaining a sample set through the following steps 4011-:

step 4011, a sample image set is obtained.

In the present embodiment, the sample image may be an arbitrary image showing the object to be detected. The sample image set may be obtained by selecting some sample images by a technician in advance, or some public data sets may be selected as the sample set or adjusted to form the sample set.

Step 4012, a set of target location information and a set of target size are determined.

In this embodiment, the set of target position information and the set of target size may be specified in advance by a technician. The target position information may indicate a position of the pixel point in the image, and the target size may indicate a size of the image region. In practice, the specific information indicated by the target size can be specifically set according to different application scenarios. For example, the target size may be used to indicate length and width information of the image region, may be used to indicate area information of the image region, and the like.

In practice, the set of samples may be determined based on the set of target location information and the set of target sizes. Therefore, the target position information and the target size can be set according to the actual application scenario.

Step 4013, for the sample images in the sample image set, determining an intersection comparison set corresponding to the sample images through the following steps 40131 and 40134:

in this embodiment, an intersection comparison set corresponding to each sample image in the sample image set may be determined, and then each sample image and the intersection comparison set corresponding to the sample image may be used as one sample, so as to obtain the sample set.

Step 40131, determining an image area surrounded by the minimum circumscribed polygon of the object to be detected displayed by the sample image as a first image area corresponding to the sample image.

In this embodiment, the minimum circumscribed polygon of the object to be detected displayed by the sample image may be determined by using various image processing software or various open-source algorithms for determining the minimum circumscribed polygon, so as to determine the image area covered by the minimum circumscribed polygon. Of course, the image area surrounded by the minimum bounding polygon may also be labeled manually.

Step 40132, determining a pixel point indicated by the target position information in the target position information set in the sample image as a target pixel point, and obtaining a target pixel point set corresponding to the sample image.

Step 40133, for a target pixel point in the target pixel point set, determining an image area of the target size in the target size set where the target pixel point is located as a second image area, and obtaining a second image area set corresponding to the target pixel point.

In this embodiment, for each target pixel in the target pixel set corresponding to the sample image, a plurality of image regions with different sizes may be determined at each target pixel based on the target size set. That is, for a target size in the target size set, an image area of the target size may be determined at each target pixel point, respectively.

In practice, when determining the image area based on the target size, each target pixel point may be used as a reference point. For example, image regions with different sizes are determined by taking each target pixel point as a geometric center point or an upper left corner point.

As an example, the target set of pixel points includes two pixel points a and B. The target size set includes S1 and S2. Then, it is possible to determine rectangular image regions having areas S1 and S2 with the pixel point a as the geometric center, and then determine rectangular image regions having areas S1 and S2 with the pixel point B as the geometric center. Therefore, two image areas corresponding to the pixel points A and B respectively can be obtained to form a second image area set corresponding to the pixel points A and B respectively.

In some optional implementation manners of this embodiment, the shapes of the first image region and the second image region in the second image region set corresponding to the target pixel point in the target pixel point set may be determined according to the attribute information of the object to be detected. The attribute information of the object to be detected includes various information such as the shape and size of the object to be detected.

Optionally, the shape of the second image region in the second image region set corresponding to the first image region and the target pixel point in the target pixel point set may also be specified by a technician.

Step 40134, determining an intersection ratio between the second image region in the second image region set corresponding to the target pixel point in the target pixel point set and the first image region, respectively, to obtain an intersection ratio set corresponding to the sample image.

In this embodiment, after obtaining the second image region sets corresponding to the target pixel points, the intersection ratio between the second image regions in the second image region sets and the first image region may be calculated.

Step 4014, combining the sample images in the sample image set with the intersection and proportion set corresponding to the sample images to form a sample set so as to obtain the sample set.

In this embodiment, after the intersection comparison set corresponding to each sample image is obtained, each sample image and the intersection comparison set corresponding to the sample image may be used as one sample, so as to obtain a sample set.

Step 402, selecting a sample from the sample set, and performing the following steps 4021-:

step 4021, inputting the sample images in the selected samples into the initial model to obtain an output intersection comparison set corresponding to the sample images.

And step 4022, analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples, and determining a loss value.

And step 4023, determining whether the initial model is trained according to the loss value.

Step 4024, in response to determining that the initial model training is complete, determining the initial model as the object detection model.

The specific execution process of the above step 402 and the steps 4021-4024 can refer to the related descriptions of the steps 202 and 2021-2024 in the corresponding embodiment of fig. 2, and will not be described herein again.

As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the flow 400 of the method for generating an object detection model in the present embodiment highlights that the image area corresponding to each sample image can be determined based on the target pixel point and the target size set. And determining an intersection comparison set corresponding to each sample image according to the determined image area to obtain a sample set. The method can select the image area in the sample image regularly, and the preset image area can have the image area capable of better displaying the object to be detected by controlling the distribution condition of the target pixel points in the sample image and the size of the image area to be selected.

With further reference to FIG. 5, a flow 500 of yet another embodiment of a method for processing an image is shown. The flow 500 of the method for processing an image comprises the steps of:

step 501, acquiring an image to be processed.

In this embodiment, the executing entity (e.g., server 105 shown in fig. 1) of the method for processing images may retrieve the image to be processed from a local or other storage device using a wired or wireless connection.

Step 502, inputting the image to be processed into a pre-trained object detection model to obtain an intersection comparison set corresponding to the image to be processed.

In this embodiment, the object detection model may be generated by using the method described in the embodiment corresponding to fig. 2 or fig. 4. The intersection comparison set may include intersection ratios of at least one preset image region of the image to be processed and an image region where the object to be detected is displayed in the image to be processed.

It should be understood that the at least one preset image area of the image to be processed is determined in the same way as the at least one preset image area of the sample image in the sample set of the training object detection model.

For example, if the sample images in the sample set of the training object detection model are divided into four parts from the horizontal direction and the vertical direction, sixteen image areas corresponding to the sample images are obtained. When the object detection model obtained by training the sample set is used for processing the image to be processed, the intersection-to-parallel ratio in the obtained intersection-to-parallel ratio set represents the intersection-to-parallel ratio corresponding to sixteen image areas obtained by respectively performing quartering on the image to be processed from the horizontal direction and the vertical direction.

In some optional implementation manners of this embodiment, after the union ratio set corresponding to the image to be processed is obtained, the preset image areas corresponding to the union ratios in the union ratio set may be further selected from the preset image areas corresponding to the union ratios in the union ratio set as the target image areas by using a non-maximum suppression algorithm according to the preset image areas corresponding to the union ratios in the union ratio set and the union ratio set, respectively. The image to be processed may then be further processed to highlight the target image area, resulting in a processed image to be processed. The processed image to be processed may then be displayed.

The processing mode of the image to be processed for highlighting the target image area can be specifically set according to the actual application requirements. For example, the edge of the target image area may be highlighted (e.g., the color of a pixel point included in the edge of the target image area is set to a designated color). For another example, a minimum bounding rectangle of the target image region may be determined, and the determined minimum bounding rectangle may be displayed on the image to be processed using a preset color.

The intersection ratio represents the intersection ratio of the preset image area and the image area where the object to be detected is located. Therefore, the larger the intersection ratio is, the better the intersection ratio can display the object to be detected in the corresponding preset image area. On the basis, the cross-over comparison can be used as the score of the preset image area corresponding to the cross-over comparison, and a preset image area with the highest cross-over comparison in a certain neighborhood is selected from the preset image areas respectively corresponding to the cross-over comparisons by using a non-maximum suppression algorithm. Therefore, it can be considered that the object to be detected is well displayed in the selected target image region, thereby realizing detection of the image to be processed.

Among them, the Non-maximum Suppression (NMS) algorithm is one of algorithms commonly used in the field of object detection. Non-maxima suppression algorithms are well known techniques that are currently widely studied and applied and are not described in detail herein.

The method for processing an image according to the above embodiment of the present application processes an image to be processed by using the generated object detection model, so as to obtain an intersection and comparison set corresponding to the image to be processed. Further, the intersection ratio with a larger intersection ratio than the concentration value may be determined as the image area displaying the object to be detected in the image to be processed. The mode directly utilizes the cross-over ratio as a judgment basis, and the speed of object detection is greatly improved.

With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating an object detection model, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable in various electronic devices.

As shown in fig. 6, the apparatus 600 for generating an object detection model provided in the present embodiment includes an acquisition unit 601 and a training unit 602. The acquiring unit 601 is configured to acquire a sample set, where a sample in the sample set includes a sample image and an intersection comparison set corresponding to the sample image, where the intersection comparison set includes an intersection ratio between at least one preset image area of the sample image and an image area where an object to be detected is located and displayed by the sample image; the training unit 602 is configured to select samples from a sample set and to perform the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model.

In the present embodiment, in the apparatus 600 for generating an object detection model: the specific processing of the obtaining unit 601 and the training unit 602 and the technical effects thereof can refer to the related descriptions of step 201 and step 202 in the corresponding embodiment of fig. 2, which are not repeated herein.

In some optional implementations of this embodiment, the training unit 602 is further configured to: and in response to determining that the initial model is not trained completely, adjusting parameters of the initial model according to the loss value, reselecting the sample from the sample set, and continuing to execute the training step by using the adjusted initial model as the initial model.

In some optional implementations of this embodiment, the sample set is obtained by: acquiring a sample image set; determining a target position information set and a target size set; determining an image area, which is displayed by the sample image and surrounded by the minimum circumscribed polygon of the object to be detected, as a first image area corresponding to the sample image for the sample image in the sample image set; determining pixel points indicated by the target position information in the target position information set in the sample image as target pixel points to obtain a target pixel point set corresponding to the sample image; for a target pixel point in the target pixel point set, determining an image area with a target size in the target size set, where the target pixel point is located, as a second image area, and obtaining a second image area set corresponding to the target pixel point; and determining the intersection ratio of the second image area in the second image area set corresponding to the target pixel point in the target pixel point set and the first image area respectively to obtain an intersection ratio set corresponding to the sample image, and forming the sample in the sample set by the obtained intersection ratio set and the sample image.

In some optional implementation manners of this embodiment, the shape of the second image region in the second image region set corresponding to the target pixel point in the first image region and the target pixel point in the target pixel point set is determined according to the attribute information of the object to be detected.

In some alternative implementations of this embodiment, the initial model is a full convolutional network.

The device provided by the above embodiment of the application obtains a sample set through an obtaining unit, where a sample in the sample set includes a sample image and an intersection comparison set corresponding to the sample image, where the intersection comparison set includes an intersection comparison between at least one preset image area of the sample image and an image area where an object to be detected is displayed in the sample image; the training unit selects samples from the sample set and performs the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; and determining the initial model as an object detection model in response to the determination that the training of the initial model is finished, so that the position information of the object to be detected displayed in the image can be determined according to the intersection ratio corresponding to each preset image area of the image obtained by the object detection model.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program, when executed by a Central Processing Unit (CPU)701, performs the above-described functions defined in the method of the present application.

It should be noted that the computer readable medium of the present application can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and a training unit. Where the names of these units do not in some cases constitute a limitation of the unit itself, for example, the acquisition unit may also be described as a "unit acquiring a sample set".

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a sample set, wherein samples in the sample set comprise sample images and an intersection comparison set corresponding to the sample images, and the intersection comparison set comprises an intersection comparison of at least one preset image area of the sample images and an image area where an object to be detected is displayed in the sample images; selecting samples from the sample set, and performing the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model.

The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A method for generating an object detection model, comprising:

acquiring a sample set, wherein a sample in the sample set comprises a sample image and an intersection comparison set corresponding to the sample image, the intersection comparison set comprises at least one preset image region of the sample image and an intersection comparison of image regions where objects to be detected are located and displayed on the sample image, and the intersection comparison of the preset image region and the image regions where the objects to be detected are located represents a ratio of an area of the image region corresponding to an intersection of the preset image region and the image regions where the objects to be detected are located to an area of the image region corresponding to a union of the preset image region and the image regions where the objects to be detected are located;

selecting samples from the sample set, and performing the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model.

2. The method of claim 1, wherein the training step further comprises:

and in response to determining that the initial model is not trained completely, adjusting parameters of the initial model according to the loss value, reselecting the sample from the sample set, and continuing to execute the training step by using the adjusted initial model as the initial model.

3. The method of claim 1, wherein the sample set is obtained by:

acquiring a sample image set;

determining a target position information set and a target size set;

determining an image area, which is displayed by the sample image and surrounded by the minimum circumscribed polygon of the object to be detected, as a first image area corresponding to the sample image for the sample image in the sample image set; determining pixel points indicated by the target position information in the target position information set in the sample image as target pixel points to obtain a target pixel point set corresponding to the sample image; for the target pixel points in the target pixel point set, determining an image area with a target size in the target size set, where the target pixel points are located, as a second image area, and obtaining a second image area set corresponding to the target pixel points; and determining the intersection ratio of the second image area in the second image area set corresponding to the target pixel point in the target pixel point set and the first image area respectively to obtain an intersection ratio set corresponding to the sample image, and forming the sample in the sample set by the obtained intersection ratio set and the sample image.

4. The method of claim 3, wherein the shape of the second image region in the second image region set corresponding to the first image region and the target pixel point in the target pixel point set is determined according to attribute information of the object to be detected.

5. The method according to one of claims 1 to 4, wherein the initial model is a full convolutional network.

6. An apparatus for generating an object detection model, comprising:

the acquisition unit is configured to acquire a sample set, wherein a sample in the sample set comprises a sample image and an intersection comparison set corresponding to the sample image, the intersection comparison set comprises an intersection comparison of at least one preset image region of the sample image and an image region where an object to be detected is displayed on the sample image, and the intersection comparison of the preset image region and the image region where the object to be detected is located represents a ratio of an area of the image region corresponding to an intersection of the preset image region and the image region where the object to be detected is located to an area of an image region corresponding to a union of the preset image region and the image region where the object to be detected is located;

a training unit configured to select samples from the set of samples and to perform the following training steps: inputting a sample image in the selected sample into the initial model to obtain an output intersection comparison set corresponding to the sample image; analyzing the obtained output intersection comparison set and the intersection comparison set in the selected samples to determine a loss value; determining whether the initial model is trained or not according to the loss value; in response to determining that the initial model training is complete, the initial model is determined to be an object detection model.

7. The apparatus of claim 6, wherein the training unit is further configured to:

8. The apparatus of claim 6, wherein the sample set is obtained by:

acquiring a sample image set;

determining a target position information set and a target size set;

9. The apparatus according to claim 8, wherein a shape of a second image region in the second image region set corresponding to the first image region and a target pixel point in the target pixel point set is determined according to attribute information of an object to be detected.

10. The apparatus according to one of claims 6-9, wherein the initial model is a full convolutional network.

11. A method for processing an image, comprising:

acquiring an image to be processed;

inputting the image to be processed into a pre-trained object detection model to obtain an intersection comparison set corresponding to the image to be processed, wherein the object detection model is generated by the method according to any one of claims 1 to 5, the intersection comparison set comprises an intersection comparison between at least one preset image region of the image to be processed and an image region where the object to be processed is located and displayed by the image to be processed, and the intersection comparison between the preset image region and the image region where the object to be processed is represented as a ratio between an area of an image region corresponding to an intersection of the preset image region and the image region where the object to be detected is located and an area of an image region corresponding to a union of the preset image region and the image region where the object to be detected is located.

12. The method of claim 11, wherein the method further comprises:

selecting preset image areas as target image areas from preset image areas respectively corresponding to the intersection ratios in the intersection ratio set by using a non-maximum suppression algorithm according to the intersection ratios in the intersection ratio set and the intersection ratio set;

processing the image to be processed to highlight the target image area to obtain a processed image to be processed;

and displaying the processed image to be processed.

13. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon;

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5 or 11-12.

14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5 or 11-12.