CN109344864B - Image processing method and device for dense object - Google Patents

Image processing method and device for dense object

Info

Publication number
CN109344864B
CN109344864B
Authority
CN
China
Prior art keywords
data set
neural network
image
image processing
ctpn
Prior art date
Legal status
Active
Application number
CN201810973048.9A
Other languages
Chinese (zh)
Other versions
CN109344864A (en)
Inventor
吴寅初
张默
Current Assignee
Beijing Moshanghua Technology Co ltd
Original Assignee
Beijing Moshanghua Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Moshanghua Technology Co ltd filed Critical Beijing Moshanghua Technology Co ltd
Priority to CN201810973048.9A
Publication of CN109344864A
Application granted
Publication of CN109344864B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method and device for dense objects. The method comprises the following steps: determining an original annotated data set; training a natural scene text detection network model according to the original annotated data set; outputting an object data set through the natural scene text detection network model; training an object detection network model according to the object data set; combining the natural scene text detection network model and the object detection network model to obtain a target neural network; and inputting a picture to be recognized into the target neural network and outputting an image prediction result for the dense objects. The method and device solve the technical problem that dense objects are prone to missed detection or false detection: they detect dense objects accurately and overcome the influence of factors such as illumination and occlusion.

Description

Image processing method and device for dense object
Technical Field
The present application relates to the field of image processing and computer vision, and in particular, to an image processing method and apparatus for dense objects.
Background
Object detection is one of the most important tasks in computer vision. In recent years a variety of detection network models have been developed, from two-stage models built on the Region Proposal Network (RPN), such as Faster R-CNN and Mask R-CNN, to one-stage models such as YOLO and SSD; these are representative object detectors.
The inventors have found that, as application scenarios continue to be explored, the need for dense object detection keeps growing. However, owing to factors such as occlusion and illumination, densely arranged objects are highly prone to missed detection or false detection, which in turn degrades the overall detection result.
Aiming at the problem in the related art that dense objects are prone to missed detection or false detection, no effective solution has been proposed so far.
Disclosure of Invention
The present application mainly aims to provide an image processing method and an image processing device for dense objects, so as to solve the problem that the detection of dense objects easily results in missed detection or false detection.
In order to achieve the above object, according to one aspect of the present application, there is provided an image processing method for a dense object.
The image processing method for dense objects according to the application comprises the following steps: determining an original annotated data set; training a natural scene text detection network model according to the original annotated data set; outputting an object data set through the natural scene text detection network model; training an object detection network model according to the object data set; combining the natural scene text detection network model and the object detection network model to obtain a target neural network; and inputting a picture to be recognized into the target neural network and outputting an image prediction result for the dense objects.
Further, before training the natural scene text detection network model according to the original annotated data set, the method further comprises: performing an image-level preprocessing operation on the pictures in the original annotated data set.
Further, before training the natural scene text detection network model according to the original annotated data set, the method further comprises: performing a pixel-level preprocessing operation on the pictures in the original annotated data set.
Further, after outputting the object data set through the natural scene text detection network model, the method further includes: merging similar categories in the object data set and setting a merging threshold; and performing a cropping operation on the object boxes to obtain a new object data set.
Further, when inputting the picture to be recognized into the target neural network and outputting the image prediction result of the dense objects, the method further comprises any one or more of the following operations: image data preprocessing, image data augmentation, image data normalization, Gaussian-like processing, and image data visualization.
Further, the natural scene text detection network model comprises a CTPN neural network, and the object detection network model comprises a RetinaNet target detector.
In order to achieve the above object, according to another aspect of the present application, there is provided an image processing apparatus for a dense object.
An image processing apparatus for dense objects according to the present application includes: a determining module, configured to determine an original annotated data set; a first model training module, configured to train a natural scene text detection network model according to the original annotated data set; an output module, configured to output an object data set through the natural scene text detection network model; a second model training module, configured to train an object detection network model according to the object data set; a merging module, configured to combine the natural scene text detection network model and the object detection network model to obtain a target neural network; and a prediction module, configured to input a picture to be recognized into the target neural network and output an image prediction result for the dense objects.
Further, the apparatus further comprises: a first preprocessing module, configured to perform an image-level preprocessing operation on the pictures in the original annotated data set.
Further, the apparatus further comprises: a second preprocessing module, configured to perform a pixel-level preprocessing operation on the pictures in the original annotated data set.
Further, the apparatus further comprises: a merging and cropping module, configured to merge similar categories in the object data set and set a merging threshold, and to perform a cropping operation on the object boxes to obtain a new object data set.
In the embodiment of the application, an original annotated data set is determined; a natural scene text detection network model is trained according to the original annotated data set; an object data set is output through the natural scene text detection network model; an object detection network model is trained according to the object data set; and the natural scene text detection network model and the object detection network model are combined to obtain a target neural network. A picture to be recognized is then input into the target neural network and an image prediction result for the dense objects is output, thereby achieving the technical effect of detecting dense objects and solving the technical problem that dense-object detection is prone to missed detection or false detection. By this method, multiple positions of the same picture can share feature information and interact with adjacent dense objects.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to make other features, objects, and advantages of the application more apparent. The drawings and their description illustrate embodiments of the application and do not limit it. In the drawings:
FIG. 1 is a schematic diagram of an image processing method for dense objects according to an embodiment of the application;
FIG. 2 is a schematic diagram of an image processing method for dense objects according to an embodiment of the application;
FIG. 3 is a schematic diagram of an image processing method for dense objects according to an embodiment of the application;
FIG. 4 is a schematic diagram of an image processing method for dense objects according to an embodiment of the application;
FIG. 5 is a schematic diagram of an image processing apparatus for dense objects according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an image processing apparatus for dense objects according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an image processing apparatus for dense objects according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an image processing apparatus for dense objects according to an embodiment of the present application;
FIG. 9 is a process flow diagram of an image processing method for dense objects according to the present application;
FIG. 10 is an actual picture from an original annotated data set in accordance with the present application;
FIG. 11 is a schematic diagram of the visualization of CTPN neural network output results according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the merged results of the CTPN neural network outputs according to an embodiment of the present application;
FIG. 13 is a graphical illustration of visualization of accurate object detection according to the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the present application and its embodiments, and are not used to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
As shown in FIG. 1, the method includes steps S102 to S112 as follows:
Step S102, determining an original annotated data set;
a data set is determined and constructed according to a conventional object recognition labeling method, such as that shown in fig. 10. For the dense object, labeling the dense object in the original image data by using a labeling method in the prior art, and obtaining a result, namely an original labeling data set.
Step S104, training a natural scene text detection network model according to the original annotated data set;
the natural scene text detection network model can detect characters in the dense objects, and integrates context information of pictures by combining text detection, so that all the dense objects can be extracted without omission.
Preferably, in this embodiment, the text-detection network CTPN (Connectionist Text Proposal Network) is trained on the original annotated data.
Step S106, outputting an object data set through the natural scene text detection network model;
all dense objects can be completely preserved in the object dataset. The object data set output by the model can also be subjected to related shearing operation, so that the constructed data set result is more accurate. While also requiring increased associated image-level pre-processing and pixel-level pre-processing. As shown in fig. 11, the result of the natural scene text detection network model is visualized.
It should be noted that the operations that can be performed on the obtained object data set include image data preprocessing, image data augmentation, image data normalization, Gaussian-like processing, image data visualization, and the like; the present application is not limited thereto, and those skilled in the art can select different processing modes according to the actual scene. The merged result of the object data sets output by the natural scene text detection network model is shown in FIG. 12.
Step S108, training an object detection network model according to the object data set;
and training through the object data set to obtain a network model based on object recognition. Through the network model based on object recognition, the extracted objects are accurately classified and positioned, and the condition of missed detection and false detection can be prevented.
Step S110, combining the natural scene text detection network model and the object detection network model to obtain a target neural network;
and combining and splicing the obtained natural scene text detection network model and the object detection network model to obtain an integral target neural network.
Preferably, in the embodiment of the present application, the text-detection CTPN and the object-detection RetinaNet are spliced into a single whole.
It should be noted that there are many ways of combining and splicing to obtain the target neural network, which are not limited in the present application, and those skilled in the art can select different ways to perform splicing and combining according to actual scenarios.
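As one illustration only, here is a minimal Python sketch of one possible splicing strategy, in which the text detection stage proposes regions and the object detection stage classifies crops of them; both model interfaces are assumptions for exposition, not the patent's implementation:

    class DenseObjectPipeline:
        # Hypothetical interfaces: ctpn_model(image) returns candidate boxes as
        # [x1, y1, x2, y2]; retinanet_model(crop) returns (boxes, labels, scores).
        # The image is assumed to be a numpy array of shape (H, W, C).
        def __init__(self, ctpn_model, retinanet_model):
            self.ctpn = ctpn_model
            self.retinanet = retinanet_model

        def predict(self, image):
            results = []
            for x1, y1, x2, y2 in self.ctpn(image):    # stage 1: exhaustive extraction
                crop = image[y1:y2, x1:x2]
                results.append(self.retinanet(crop))   # stage 2: classify and localize
            return results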
Step S112, inputting the picture to be recognized into the target neural network and outputting the image prediction result of the dense objects.
For any input picture, after, for example, normalization and Gaussian-like processing, the picture is input into the target neural network to obtain the network output. The output result is the image prediction result for the dense objects, and it can also be visualized. Visualizing the network output together with the original picture may give a result such as that in FIG. 13.
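A minimal sketch of such a visualization step with OpenCV follows; the prediction format of box, label, and score is an assumed convention:

    import cv2

    def visualize(image, predictions):
        # predictions: iterable of ((x1, y1, x2, y2), label, score),
        # with integer pixel coordinates.
        for (x1, y1, x2, y2), label, score in predictions:
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(image, f"{label} {score:.2f}", (x1, y1 - 4),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
        return image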
From the above description, it can be seen that the following technical effects are achieved by the present application:
in the embodiment of the application, an original marked data set is determined, a natural scene text detection network model is trained according to the original marked data set, an object data set is output through the natural scene text detection network model, and an object detection network model is trained according to the object data set; combining the natural scene text detection network model and the object detection network model to obtain a target neural network; the method achieves the purposes of inputting the picture to be recognized to the target neural network and outputting the image prediction result of the dense object, thereby realizing the technical effect of detecting the dense object and further solving the technical problem that the detection omission or false detection is easy to occur in the detection of the dense object. By the method, a plurality of positions of the same picture can share characteristic information and interact with adjacent dense objects.
According to the embodiment of the present application, as shown in FIG. 2, in step S104, before training the natural scene text detection network model according to the original annotated data set, the method further includes: performing an image-level preprocessing operation on the pictures in the original annotated data set.
The image-level preprocessing operation on the pictures in the original annotated data set may be image-level augmentation of the pictures in the data set, such as de-duplication, flipping, rotation, and cropping.
Preferably, in the embodiment of the present application, image-level data preprocessing such as cropping, rotating, and flipping the images may be completed using PyTorch.
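A minimal sketch of such an image-level augmentation pipeline with torchvision follows; the flip probability, rotation range, and crop size are assumed values, not the patent's settings:

    import torchvision.transforms as T
    from PIL import Image

    image_level_augment = T.Compose([
        T.RandomHorizontalFlip(p=0.5),    # flipping
        T.RandomRotation(degrees=10),     # rotating
        T.RandomResizedCrop(size=600),    # cropping
    ])

    img = Image.open("shelf.jpg")         # hypothetical input picture
    augmented = image_level_augment(img)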
In addition, the images to be input into the natural scene text detection network model need to be normalized.
In addition, a clustering method is needed to obtain a size estimate of the objects to be detected, and the estimation results serve as a basis for the object data set.
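One plausible reading of this clustering step, in the spirit of anchor-size clustering, is k-means over the annotated box dimensions; a sketch with hypothetical data and an assumed cluster count:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical annotations as [x1, y1, x2, y2] boxes.
    boxes = np.array([[10, 20, 60, 90], [15, 25, 70, 100],
                      [200, 40, 260, 130], [210, 45, 268, 138]])
    wh = np.stack([boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]], axis=1)

    # Cluster width/height pairs; each center is a size estimate (k=2 assumed).
    kmeans = KMeans(n_clusters=2, n_init=10).fit(wh)
    print(kmeans.cluster_centers_)        # one (width, height) estimate per cluster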
According to the embodiment of the present application, as shown in FIG. 3, before training the natural scene text detection network model according to the original annotated data set, the method further includes: performing a pixel-level preprocessing operation on the pictures in the original annotated data set.
The pixel-level preprocessing operation on the pictures in the original annotated data set may be pixel-level augmentation of the pictures in the data set, such as brightness, contrast, and Gaussian noise.
Preferably, in the embodiment of the present application, OpenCV may be used to adjust the brightness, contrast, and the like of an image and to add Gaussian noise.
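A minimal sketch of such pixel-level augmentation with OpenCV follows; the gain, bias, and noise level are assumed values:

    import cv2
    import numpy as np

    img = cv2.imread("shelf.jpg")          # hypothetical input picture

    # Brightness/contrast adjustment: out = alpha * img + beta.
    adjusted = cv2.convertScaleAbs(img, alpha=1.2, beta=15)

    # Additive Gaussian noise (sigma = 8 assumed).
    noise = np.random.normal(0, 8.0, img.shape).astype(np.float32)
    noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)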
In addition, before outputting the object data set through the natural scene text detection network model, the following steps are also required:
and extracting batch processing data to normalize the image size, and further completing unified processing of input. And the mean and variance of the data at the pixel level are analyzed, and Gaussian distribution is simulated to perform Gaussian-like processing on the data.
According to an embodiment of the present application, as shown in FIG. 4, after outputting the object data set through the natural scene text detection network model, the method further includes:
Step S402, merging similar categories in the object data set and setting a merging threshold;
Step S404, performing a cropping operation on the object boxes to obtain a new object data set.
Specifically, similar categories need to be merged in the object data set output by the natural scene text detection network model, and a merging threshold is set to obtain the merged result.
Preferably, a new, accurate object data set can be constructed by performing a cropping operation on the object boxes; similar preprocessing and augmentation operations are performed on the data in this accurate object data set, and an object detection model, preferably RetinaNet, is trained on the augmented data set.
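The patent does not spell out the merge rule; one plausible IoU-based sketch of the merging and cropping steps follows (the threshold value is assumed):

    def iou(a, b):
        # Intersection-over-union of two [x1, y1, x2, y2] boxes.
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)

    def merge_boxes(boxes, merge_threshold=0.5):
        # Greedily merge boxes whose IoU exceeds the threshold into their union.
        merged = []
        for box in boxes:
            for i, m in enumerate(merged):
                if iou(box, m) > merge_threshold:
                    merged[i] = [min(box[0], m[0]), min(box[1], m[1]),
                                 max(box[2], m[2]), max(box[3], m[3])]
                    break
            else:
                merged.append(list(box))
        return merged

    def crop_boxes(image, boxes):
        # Crop each merged box out of the image to build the new object data set.
        return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]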
As a preference in this embodiment, inputting the picture to be recognized into the target neural network and outputting the image prediction result of the dense objects further includes any one or more of the following operations:
image data preprocessing, image data augmentation, image data normalization, Gaussian-like processing, and image data visualization.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
According to an embodiment of the present application, there is also provided an apparatus for implementing the above image processing method for dense objects. As shown in FIG. 5, the apparatus includes: a determining module 10, configured to determine an original annotated data set; a first model training module 20, configured to train a natural scene text detection network model according to the original annotated data set; an output module 30, configured to output an object data set through the natural scene text detection network model; a second model training module 40, configured to train an object detection network model according to the object data set; a merging module 50, configured to combine the natural scene text detection network model and the object detection network model to obtain a target neural network; and a prediction module 60, configured to input a picture to be recognized into the target neural network and output an image prediction result for the dense objects.
In the determining module 10 of the embodiment of the present application, a data set is determined and constructed according to a conventional object recognition labeling method, as shown, for example, in FIG. 10. The dense objects in the original image data are labeled using an existing labeling method, and the result is the original annotated data set.
In the first model training module 20 of the embodiment of the present application, the natural scene text detection network model can detect the characters on dense objects, and it integrates the context information of the picture through text detection, thereby ensuring that all dense objects are extracted without omission.
Preferably, in this embodiment, the text-detection network CTPN is trained on the original annotated data.
In the output module 30 of the embodiment of the present application, all dense objects can be completely preserved in the object data set. The object data set output by the model can also be subjected to a cropping operation, so that the constructed data set is more accurate; related image-level preprocessing and pixel-level preprocessing may also be added. FIG. 11 shows a visualization of the output of the natural scene text detection network model.
It should be noted that the operations that can be performed on the obtained object data set include image data preprocessing, image data augmentation, image data normalization, Gaussian-like processing, image data visualization, and the like; the present application is not limited thereto, and those skilled in the art can select different processing modes according to the actual scene. The merged result of the object data sets output by the natural scene text detection network model is shown in FIG. 12.
In the second model training module 40 of the embodiment of the present application, a network model based on object recognition is obtained by training on the object data set. Through this model, the extracted objects are accurately classified and localized, preventing missed detection and false detection.
In the merging module 50 of the embodiment of the present application, the natural scene text detection network model and the object detection network model obtained as described above are combined and spliced to obtain an overall target neural network.
Preferably, in the embodiment of the present application, the text-detection CTPN and the object-detection RetinaNet are spliced into a single whole.
It should be noted that there are many ways of combining and splicing to obtain the target neural network, which are not limited in the present application, and those skilled in the art can select different ways to perform splicing and combining according to actual scenarios.
In the prediction module 60 of the embodiment of the present application, any input picture is, after, for example, normalization and Gaussian-like processing, input into the target neural network to obtain the network output; the output result is the image prediction result for the dense objects, and it can be visualized. Visualizing the network output together with the original picture may give a result such as that in FIG. 13.
According to the embodiment of the present application, preferably, as shown in FIG. 6, the apparatus further includes: a first preprocessing module 70, configured to perform an image-level preprocessing operation on the pictures in the original annotated data set.
In the first preprocessing module 70 of the embodiment of the present application, the image-level preprocessing operation on the pictures in the original annotated data set may be image-level augmentation of the pictures in the data set, such as de-duplication, flipping, rotation, and cropping.
Preferably, in the embodiment of the present application, image-level data preprocessing such as cropping, rotating, and flipping the images may be completed using PyTorch.
In addition, the images to be input into the natural scene text detection network model need to be normalized.
In addition, a clustering method is needed to obtain a size estimate of the objects to be detected, and the estimation results serve as a basis for the object data set.
According to the embodiment of the present application, preferably, as shown in FIG. 6, the apparatus further includes: a second preprocessing module 80, configured to perform a pixel-level preprocessing operation on the pictures in the original annotated data set.
In the second preprocessing module 80 of the embodiment of the present application, the pixel-level preprocessing operation on the pictures in the original annotated data set may be pixel-level augmentation of the pictures in the data set, such as brightness, contrast, and Gaussian noise.
Preferably, in the embodiment of the present application, OpenCV may be used to adjust the brightness, contrast, and the like of an image and to add Gaussian noise.
In addition, before outputting the object data set through the natural scene text detection network model, the following steps are also required:
and extracting batch processing data to normalize the image size, and further completing unified processing of input. And the mean and variance of the data at the pixel level are analyzed, and Gaussian distribution is simulated to perform Gaussian-like processing on the data.
According to the embodiment of the present application, preferably, as shown in FIG. 6, the apparatus further includes: a merging and cropping module 90, configured to merge similar categories in the object data set and set a merging threshold, and to perform a cropping operation on the object boxes to obtain a new object data set.
In the merging and cropping module 90 of the embodiment of the present application, similar categories specifically need to be merged in the object data set output by the natural scene text detection network model, and a merging threshold is set to obtain the merged result.
Preferably, a new, accurate object data set can be constructed by performing a cropping operation on the object boxes; similar preprocessing and augmentation operations are performed on the data in this accurate object data set, and an object detection model, preferably RetinaNet, is trained on the augmented data set.
As a preference in this embodiment, inputting the picture to be recognized into the target neural network and outputting the image prediction result of the dense objects further includes any one or more of the following operations:
image data preprocessing, image data augmentation, image data normalization, Gaussian-like processing, and image data visualization.
As shown in FIG. 9, the implementation principle of the present application is as follows. First, a data set is constructed according to a traditional object recognition labeling method: pictures are input to build the data set, image-level augmentation preprocessing such as de-duplication, flipping, rotation, and cropping is applied to the pictures, and pixel-level augmentation preprocessing such as brightness, contrast, and Gaussian noise is applied as well. The pictures to be input into the network model are then normalized, and a clustering method is used to obtain a size estimate of the objects to be detected; a neural network model based on text detection and a neural network model based on object detection are constructed using the object size estimates. Finally, the data set is divided into a training set, a test set, and a validation set according to a preset ratio, and the processed data are input into the text-detection neural network model for training to obtain an optimized model. A picture to be detected is input, preprocessed in the same way, and sent to the network to obtain the output; the output result is visualized against the original picture.
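As an illustration of the data-set division step, a minimal sketch follows; the 8:1:1 ratio is an assumed value, since the description only specifies a preset proportion:

    import random

    def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
        # Shuffle once, then cut into training, test, and validation sets.
        random.Random(seed).shuffle(samples)
        n = len(samples)
        n_train, n_test = int(ratios[0] * n), int(ratios[1] * n)
        return (samples[:n_train],
                samples[n_train:n_train + n_test],
                samples[n_train + n_test:])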
Specifically, the natural scene text detection network model includes a CTPN neural network, and the object detection network model includes a RetinaNet target detector.
In the embodiment of the application, the text-detection network CTPN is trained on the original annotated data; image-level data preprocessing such as cropping, rotating, and flipping the images is then completed using PyTorch, and OpenCV is used to adjust the brightness, contrast, and the like of the images and to add Gaussian noise.
Batches of data are extracted from the augmented data set and the image sizes are normalized to unify the input. The pixel-level mean and variance of the data are computed, and a Gaussian distribution is fitted to apply Gaussian-like processing to the data. The augmented data set is input into the CTPN network to obtain the output result shown in FIG. 11. Similar categories in the output are merged and a merging threshold is set, giving the merged result shown in FIG. 12. The object boxes shown in FIG. 12 are cropped to construct a new, accurate object data set, and similar preprocessing and augmentation operations are performed on the data in this accurate object data set.
An object detection model, such as RetinaNet, is trained on the augmented data set obtained in this step, and the text-detection CTPN and the object-detection RetinaNet are spliced and combined into a whole. Any input picture is, after normalization and Gaussian-like processing, fed into the network to obtain the network output. The network output is visualized together with the original image to obtain the results shown in FIG. 13.
Through the above operations, in the first stage, the context information of the picture is integrated by means of text detection, all objects are extracted without omission, and an accurate object data set is constructed; in the second stage, the extracted objects are accurately classified and localized through object recognition, reducing the possibility of missed detection and false detection.
It will be apparent to those skilled in the art that the modules or steps of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present application is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. An image processing method for dense objects, comprising:
determining an original annotated data set;
training a CTPN neural network according to the original annotated data set;
outputting an object data set through the CTPN neural network;
training a RetinaNet target detector according to the object data set;
combining the CTPN neural network with the RetinaNet target detector to obtain a target neural network; and
inputting the picture to be recognized into the target neural network, and outputting an image prediction result of the dense objects.
2. The image processing method of claim 1, wherein before training the CTPN neural network according to the original annotated data set, the method further comprises: performing an image-level preprocessing operation on the pictures in the original annotated data set.
3. The image processing method of claim 1 or 2, wherein before training the CTPN neural network according to the original annotated data set, the method further comprises: performing a pixel-level preprocessing operation on the pictures in the original annotated data set.
4. The image processing method of claim 1, further comprising, after outputting the object data set through the CTPN neural network:
merging similar categories in the object data set and setting a merging threshold; and
performing a cropping operation on the object boxes to obtain a new object data set.
5. The image processing method according to claim 1, wherein inputting the picture to be recognized into the target neural network and outputting the image prediction result of the dense objects further comprises any one or more of the following operations:
image data preprocessing, image data augmentation, image data normalization, Gaussian-like processing, and image data visualization.
6. The image processing method of claim 1, wherein the CTPN neural network comprises: a CTPN neural network, and the RetinaNet target detector comprises: a RetinaNet target detector.
7. An image processing apparatus for dense objects, comprising:
the determining module is used for determining an original annotated data set;
the first model training module is used for training the CTPN neural network according to the original annotated data set;
the output module is used for outputting the object data set through the CTPN neural network;
the second model training module is used for training the RetinaNet target detector according to the object data set;
the merging module is used for combining the CTPN neural network with the RetinaNet target detector to obtain a target neural network; and
the prediction module is used for inputting the picture to be recognized into the target neural network and outputting the image prediction result of the dense objects.
8. The image processing apparatus according to claim 7, further comprising: a first preprocessing module, configured to perform an image-level preprocessing operation on the pictures in the original annotated data set.
9. The image processing apparatus according to claim 7, further comprising: a second preprocessing module, configured to perform a pixel-level preprocessing operation on the pictures in the original annotated data set.
10. The image processing apparatus according to claim 7, further comprising: a merging and cropping module, configured to merge similar categories in the object data set and set a merging threshold, and to perform a cropping operation on the object boxes to obtain a new object data set.
CN201810973048.9A 2018-08-24 2018-08-24 Image processing method and device for dense object Active CN109344864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810973048.9A CN109344864B (en) 2018-08-24 2018-08-24 Image processing method and device for dense object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810973048.9A CN109344864B (en) 2018-08-24 2018-08-24 Image processing method and device for dense object

Publications (2)

Publication Number Publication Date
CN109344864A CN109344864A (en) 2019-02-15
CN109344864B true CN109344864B (en) 2021-04-27

Family

ID=65297022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810973048.9A Active CN109344864B (en) 2018-08-24 2018-08-24 Image processing method and device for dense object

Country Status (1)

Country Link
CN (1) CN109344864B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309301B (en) * 2019-05-09 2021-03-19 上海泰豪迈能能源科技有限公司 Enterprise category classification method and device and intelligent terminal
CN110390691B (en) * 2019-06-12 2021-10-08 合肥合工安驰智能科技有限公司 Ore dimension measuring method based on deep learning and application system
CN110807496B (en) * 2019-11-12 2023-06-16 杭州云栖智慧视通科技有限公司 Dense target detection method
CN111339839B (en) * 2020-02-10 2023-10-03 广州众聚智能科技有限公司 Intensive target detection metering method
CN111353555A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Label detection method and device and computer readable storage medium
CN114972170B (en) * 2022-03-31 2024-05-14 华南理工大学 Anti-shielding object detection method based on fisheye camera under dense scene

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107430691A (en) * 2015-01-23 2017-12-01 电子湾有限公司 The article described in identification image

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9911033B1 (en) * 2016-09-05 2018-03-06 International Business Machines Corporation Semi-supervised price tag detection
CN107045641B (en) * 2017-04-26 2020-07-28 广州图匠数据科技有限公司 Goods shelf identification method based on image identification technology

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107430691A (en) * 2015-01-23 2017-12-01 电子湾有限公司 The article described in identification image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a Supermarket Commodity Recognition Method Based on Image Processing; Hao Teng; China Masters' Theses Full-text Database, Information Science and Technology Series; 2015-07-15; full text *

Also Published As

Publication number Publication date
CN109344864A (en) 2019-02-15

Similar Documents

Publication Publication Date Title
CN109344864B (en) Image processing method and device for dense object
TWI744283B (en) Method and device for word segmentation
CN104298982B (en) A kind of character recognition method and device
CN111260666B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN112052186B (en) Target detection method, device, equipment and storage medium
CN110705405A (en) Target labeling method and device
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
EP3734496A1 (en) Image analysis method and apparatus, and electronic device and readable storage medium
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN112418216A (en) Method for detecting characters in complex natural scene image
CN113409296B (en) Core-pulling detection method of winding battery cell, electronic equipment and storage medium
CN112784835B (en) Method and device for identifying authenticity of circular seal, electronic equipment and storage medium
CN112464797A (en) Smoking behavior detection method and device, storage medium and electronic equipment
CN113255516A (en) Living body detection method and device and electronic equipment
Chen et al. Salient object detection: Integrate salient features in the deep learning framework
CN116168351A (en) Inspection method and device for power equipment
Devadethan et al. Face detection and facial feature extraction based on a fusion of knowledge based method and morphological image processing
CN112686122A (en) Human body and shadow detection method, device, electronic device and storage medium
CN114119695A (en) Image annotation method and device and electronic equipment
Das et al. Human face detection in color images using HSV color histogram and WLD
CN112967224A (en) Electronic circuit board detection system, method and medium based on artificial intelligence
CN115546219B (en) Detection plate type generation method, plate card defect detection method, device and product
CN112749696A (en) Text detection method and device
CN112348112B (en) Training method and training device for image recognition model and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant