CN112232390A - Method and system for identifying high-pixel large image - Google Patents

Method and system for identifying high-pixel large image

Info

Publication number
CN112232390A
CN112232390A
Authority
CN
China
Prior art keywords
image
target
module
model
subunit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011047615.1A
Other languages
Chinese (zh)
Other versions
CN112232390B (en)
Inventor
孙宝亮
潘红九
梁宇
赵俊翔
王保录
赵凯南
于喜红
周伟
彭晓
金娜
张运
李萌萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Near Space Vehicles System Engineering
Original Assignee
Beijing Institute of Near Space Vehicles System Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Near Space Vehicles System Engineering filed Critical Beijing Institute of Near Space Vehicles System Engineering
Priority to CN202011047615.1A priority Critical patent/CN112232390B/en
Publication of CN112232390A publication Critical patent/CN112232390A/en
Application granted granted Critical
Publication of CN112232390B publication Critical patent/CN112232390B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for identifying a high-pixel large image. By combining classical algorithms from the fields of target detection and image segmentation, the method and system realize a human-like image identification process that attends to both the global information and the local information of an image, and perform multi-level adaptive information extraction according to the information requirements of different zoom levels, so as to acquire as much information about the image as possible.

Description

Method and system for identifying high-pixel large image
[ technical field ]
The invention relates to image processing, computer vision and target detection, and in particular to a method and a system for identifying large, high-pixel images.
[ background of the invention ]
In modern image processing, many pictures are produced by scanning and stitching with special equipment; such pictures are characterized by large size, high quality, high pixel counts and special image formats. Current deep-learning-based image recognition cannot use these images directly: preprocessing such as format conversion, compression and cropping is usually required to turn them into images the algorithm can call directly. This approach has two limitations. First, details are lost after image compression; for a large image the details are usually the important detection targets, so compression degrades recognition accuracy in practical applications and causes missed detections. Second, although a cropped image retains local detail, the global information of the image is lost, making it difficult to extract the key information in the image.
The retrieved prior-art documents show that deep-learning-based target detection has been widely applied, and that image segmentation techniques built on it can achieve pixel-level segmentation of a target. However, neither target detection nor image segmentation alone can extract the complete information of large, high-pixel images. Developing an image identification method that combines target detection with image segmentation therefore has important significance and practical value for meeting the recognition requirements of such images.
[ summary of the invention ]
The application provides a method and a system for identifying large, high-pixel images. By combining classical algorithms from the fields of target detection and image segmentation, the method and system realize a human-like image identification process that attends to both the global information and the local information of an image, and perform multi-level adaptive information extraction according to the information requirements of different zoom levels, so as to acquire as much information about the image as possible.
The technical scheme adopted by the application is as follows:
A method for identifying a high-pixel large image is applied to a system for identifying high-pixel large images; the system comprises a sample library module, a model training module, a model testing module and a human-like information extraction module, wherein the sample library module is used for completing image acquisition and sample annotation and sending the annotated data to the model training module for model training;
the model training module is used for performing image enhancement processing with image processing algorithms to improve image quality;
the model testing module passes incoming data through the image preprocessing subunit and then processes it with the target detection and identification subunit and the image segmentation and identification subunit respectively;
the human-like information extraction module is connected with the model testing module and is used for realizing human-like identification control, information extraction and combination, and visual output for the sample.
Furthermore, the sample library module uses an annotation tool to label the target positions and masks of the images according to the requirements of the task and to generate annotation files; the sample library module is responsible for generating the sample data set and provides data support for algorithm training and verification.
Further, using the annotation tool to label the target positions and masks of the images and to generate the annotation files according to the task requirements specifically includes the following steps (a minimal sketch follows the list):
step 1, reading the source data or the directory storing the files;
step 2, creating a folder named sample1, putting the image into the sample1 folder, and setting the scaling series N of the image file;
step 3, reading the image in sample1, converting it into pictures of the specified size at each zoom level, and labeling them with an annotation tool such as labelme;
step 4, marking in the annotation tool the rectangular position information of each target contained in the current image, the rectangle being represented by its upper-left corner coordinates, target height and target width;
step 5, marking in the annotation tool the shape mask of the target with a polygon tool or a smearing tool, keeping the mask as close to the true contour of the target as possible;
step 6, labeling the label information of the target, which can be entered either by typing or through a drop-down menu;
and step 7, after all zoom levels of the original image have been labeled, generating annotation files in json and other formats and exiting the image labeling module.
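As a concrete illustration of steps 1-7, the following minimal Python sketch generates the scaled copies of a large image and writes a labelme-style annotation skeleton. The paths, the scaling series N, the tenfold zoom progression and the example coordinates are illustrative assumptions, not values fixed by the patent:

    import os
    import json
    from PIL import Image

    SRC = "source/large_image.tif"   # assumed source path (step 1)
    OUT_DIR = "sample1"              # folder named in step 2
    N = 3                            # assumed scaling series

    os.makedirs(OUT_DIR, exist_ok=True)
    img = Image.open(SRC)

    # Step 3: one scaled copy per zoom level (assumed 1, 1/10, 1/100, ...).
    for level in range(N):
        scale = 10 ** level
        w, h = max(img.width // scale, 1), max(img.height // scale, 1)
        img.resize((w, h)).save(os.path.join(OUT_DIR, f"zoom_{scale}.png"))

    # Steps 4-6: a labelme-style annotation skeleton for one zoom level:
    # a rectangle (upper-left corner; height and width follow from the two
    # corner points) plus a polygon mask and a label per target.
    annotation = {
        "imagePath": "zoom_1.png",
        "shapes": [
            {"label": "target_a",                  # step 6: label information
             "shape_type": "rectangle",
             "points": [[120, 80], [360, 240]]},   # step 4: rectangle position
            {"label": "target_a",
             "shape_type": "polygon",              # step 5: shape mask
             "points": [[130, 90], [350, 95], [340, 230], [125, 210]]},
        ],
    }
    # Step 7: write the json annotation file.
    with open(os.path.join(OUT_DIR, "zoom_1.json"), "w") as f:
        json.dump(annotation, f, indent=2)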
Further, the model training module preprocesses the received image data to obtain preprocessed image data, generates annotation files from the preprocessed image data and the obtained annotation information, and inputs the annotation files in turn to the target detection neural network unit and the image segmentation neural network unit for training.
Furthermore, the annotation file input to the target detection neural network is a target rectangle annotation file labeled with target category and position information; the main purpose of training the target detection neural network is to identify regional features in the image.
Further, the target detection and identification subunit performs coarse target identification or region identification using a deep-neural-network-based target detection technique.
Further, the image segmentation and identification subunit performs fine target identification or target feature information extraction using a deep-neural-network-based image segmentation technique; the two networks are tested for performance separately, and the model predicts the target category, position and image segmentation information.
Furthermore, in the human-like information extraction module, the image conversion unit compresses the input original image of the sample to be identified and converts it into a picture size and format that the model can call, and the generated image and its zoom ratio are input into the model. At this zoom level, the target region is identified first and output in the form of a rectangular box, and the image coordinates and zoom ratio are recorded. The coordinates corresponding to the target region are then located in the original image and the region is cropped; the cropped image is converted into a format, size and zoom ratio that the model can recognize and input into the model to further identify target regions and target feature information, and the coordinates and mask information corresponding to the target regions in the image are acquired. Identification then proceeds to the next zoom level, iterating in this way; the feature information of the target at every zoom level is extracted only after all zoom levels of the target have been traversed.
Further, the workflow of the human-like information extraction module specifically includes the following steps (a minimal sketch follows the list):
step 1, reading in the original image;
step 2, setting a maximum zoom level x, where x = 1 represents the original image, x = 10 represents 1/10 of the original image, and so on; x is the maximum zoom level, i.e. the largest scale at which the image can be input into the model;
step 3, compressing the input image according to the maximum zoom ratio to generate a new image, and sending the generated image to the region detection module;
step 4, performing region detection and feature extraction: detecting suspicious regions, storing the upper-left corner coordinates, height and width of each suspicious region, obtaining the position (upper-left corner coordinates, height and width) of the suspicious region in the original image according to the zoom level x, and storing its feature information (category, position and the like);
step 5, cropping the suspicious region from the original image and setting a maximum zoom level x2, where x2 = 1 represents the cropped image, x2 = 10 represents 1/10 of the cropped image, and so on; x2 is the maximum zoom level at which the cropped image can be input into the model; if x2 = 1, go to step 6, and if x2 > 1, return to step 3 with the cropped image as input;
step 6, screening and integrating the feature information of all levels, such as categories and positions, and outputting all the feature information about the target;
and step 7, visual output: at each zoom level, marking the target region and the pixel-level segmentation result of the target.
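The following minimal Python sketch traces steps 1-7 above. The helpers detect_regions and segment_region stand in for the trained detection and segmentation networks, and the rule for choosing the maximum zoom level is an assumption; the patent names the techniques but not these interfaces:

    from PIL import Image

    def detect_regions(img):
        # Stand-in for the trained region detector (YOLO3-style in the
        # patent); a real model would return [((x, y, w, h), label), ...].
        return []

    def segment_region(img):
        # Stand-in for the trained segmentation network (Mask-RCNN-style).
        return {"mask": None}

    def recognize(image, zoom, results):
        # Step 3: compress the input by the current maximum zoom ratio.
        small = image.resize((max(image.width // zoom, 1),
                              max(image.height // zoom, 1)))
        # Step 4: detect suspicious regions; boxes are (x, y, w, h) in the
        # compressed image and are mapped back through `zoom`.
        for (x, y, w, h), label in detect_regions(small):
            box = (x * zoom, y * zoom, w * zoom, h * zoom)
            results.append({"label": label, "box": box, "zoom": zoom})
            # Step 5: crop the region and choose its own zoom level x2.
            crop = image.crop((box[0], box[1],
                               box[0] + box[2], box[1] + box[3]))
            x2 = max(crop.width // 1024, 1)   # assumed model input budget
            if x2 == 1:
                results.append({"label": label, **segment_region(crop)})
            else:
                recognize(crop, x2, results)  # back to step 3 on the crop
        return results

    original = Image.open("large_image.tif")   # step 1
    x = max(original.width // 1024, 1)         # step 2: assumed rule for x
    all_info = recognize(original, x, [])      # steps 3-5 iterate internally
    print(all_info)                            # step 6: integrated information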
An identification system suitable for the above method: the sample library module is connected with the model training module and consists of a sample acquisition unit and a sample annotation unit; the sample annotation unit comprises a rectangle annotation subunit, a mask annotation subunit and an annotation file generation subunit, the rectangle annotation subunit and the mask annotation subunit both being connected with the annotation file generation subunit;
the model training module is connected with the model testing module and comprises an image preprocessing unit, a target detection neural network unit, an image segmentation neural network unit and a model parameter unit;
the model testing module is connected with the model training module and comprises an image preprocessing subunit, a target detection and identification subunit, an image segmentation and identification subunit, a category and position subunit, an image segmentation subunit and a visual output subunit;
the human-like information extraction module is connected with the model test module.
Through the embodiments of the present application, the following technical effects can be obtained:
1) the image preprocessing method provided by the invention realizes format conversion, compression and cropping of large images while retaining the coordinate information of the original image, so that the algorithm can conveniently trace results back to the original image (a coordinate-mapping sketch follows this list);
2) the method identifies and extracts the regional features of a target region through a deep-learning-based target detection method, maps the target region into the original image, crops the target region out of the original image and sends it on to the identification module to further identify the feature information inside the region; through multiple iterations, individual-level features of the original image can be extracted, and information extraction over the whole original image is thus achieved;
3) the invention realizes image preprocessing and recognition at multiple zoom levels, extracts the information of the image at each zoom level, and realizes a human-like image recognition process. By adopting an identification method that combines YOLO3 and Mask-RCNN, feature identification at different levels is achieved, realizing a human-like identification technique. The method can effectively detect the global information of large, high-pixel images and identify feature regions; at the same time, the target detection method quickly locks onto feature regions, and feature details are further analyzed within the sensitive regions, yielding a whole-to-local human-like identification process for the entire image. The technique is therefore of great significance in engineering practice.
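The traceback in effect 1) amounts to a simple mapping from sub-image coordinates back to original-image coordinates. Below is a small Python sketch of that mapping; the function name and tuple layout are illustrative assumptions rather than the patent's notation:

    def to_original(box, zoom, crop_offset=(0, 0)):
        """box = (x, y, w, h) in the compressed image; zoom = compression
        ratio; crop_offset = upper-left corner of the crop in the original."""
        x, y, w, h = box
        ox, oy = crop_offset
        return (ox + x * zoom, oy + y * zoom, w * zoom, h * zoom)

    # e.g. a box found at (12, 30) in a 1/10-scale crop taken at (5000, 2000):
    print(to_original((12, 30, 40, 25), zoom=10, crop_offset=(5000, 2000)))
    # -> (5120, 2300, 400, 250)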
[ description of the drawings ]
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a schematic diagram of the structure of the identification system of the present invention;
FIG. 2 is a schematic flow chart of data annotation according to the present invention;
FIG. 3 is a schematic diagram of the workflow of the human-like information extraction module according to the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic structural diagram of the identification system of the present invention. The recognition system comprises a sample library module, a model training module, a model testing module and a human-like information extraction module;
the system comprises a model training module, a sample library module, a sample annotation unit and a label file generation subunit, wherein the model training module is connected with the model training module and used for completing image acquisition and sample labeling;
the sample labeling module is used for labeling the target position and the mask of the image according to the requirement of the task by using a labeling tool and generating a labeling file, and the sample labeling module is responsible for generating a sample data set and providing data support for algorithm training and verification;
the generating of the sample by the sample labeling module specifically includes: generating an image format and size which can be called by an algorithm from an original large image according to different zoom levels, labeling target region characteristics and target detail characteristics in the generated image respectively by adopting two modes of rectangular labeling and mask labeling, and establishing a sample library.
FIG. 2 is a schematic flow chart of data annotation according to the present invention, in which the annotation tool is used to label the target position of the image, the mask code and generate the annotation file according to the task requirement, specifically including
Step 1, reading source data or a directory for storing files;
step 2, creating a file named sample1, putting an image into a sample1 file, and setting the scaling series N of the image file;
step 3, reading the image in sample1, converting the image into a picture with a specified size according to the zoom level, and labeling the image by using a labeling tool, such as labelme;
step 4, marking the rectangular position information of the target contained in the current image in a marking tool, and expressing the rectangular position by using the same quantity of the coordinates at the upper left corner, the height of the target and the width of the target;
step 5, in the marking tool, a polygon marking tool is used for marking the shape mask of the target, and the mask is close to the real contour of the target as much as possible;
step 6, labeling the label information of the target, and selecting two modes of typing or pulling down a menu for inputting;
and 7, marking all zoom levels of the original image, generating an image marking annotation json file, and jumping out of the image marking module.
After the sample library is established, the identification method realized by the identification system enters a model training module.
The model training module is connected with the model testing module and performs image enhancement processing with image processing algorithms to improve image quality; it comprises an image preprocessing unit, a target detection neural network unit, an image segmentation neural network unit and a model parameter unit;
the image processing algorithms include Gaussian filtering, frequency-domain filtering, histogram equalization and the like (a brief sketch follows).
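As an illustration of the listed algorithms, here is a short Python sketch using OpenCV and NumPy; the kernel size and the low-pass radius are illustrative choices, not values given by the patent:

    import cv2
    import numpy as np

    gray = cv2.imread("sample1/zoom_1.png", cv2.IMREAD_GRAYSCALE)

    denoised = cv2.GaussianBlur(gray, (5, 5), 0)   # Gaussian filtering
    equalized = cv2.equalizeHist(denoised)         # histogram equalization

    # Frequency-domain filtering: a simple FFT low-pass that keeps a
    # square of low frequencies around the spectrum centre.
    f = np.fft.fftshift(np.fft.fft2(equalized))
    rows, cols = equalized.shape
    cy, cx = rows // 2, cols // 2
    mask = np.zeros((rows, cols))
    mask[cy - 30:cy + 30, cx - 30:cx + 30] = 1
    filtered = np.abs(np.fft.ifft2(np.fft.ifftshift(f * mask)))

    cv2.imwrite("enhanced.png", np.clip(filtered, 0, 255).astype(np.uint8))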
The image preprocessing unit preprocesses the received image data to obtain preprocessed image data, generates annotation files from the preprocessed image data and the annotation information obtained from the sample annotation unit, and inputs the annotation files in turn to the target detection neural network unit (YOLO3) and the image segmentation neural network unit (Mask-RCNN) for training;
the annotation file input to the target detection neural network is a target rectangle annotation file labeled with target category and position information; the main purpose of training the target detection neural network is to identify regional features in the image;
the annotation file input to the image segmentation neural network contains target category and mask information; the purpose of training the image segmentation neural network is to acquire local target feature information at different zoom levels.
During model training the two networks are trained simultaneously. The target detection neural network reads in the preprocessed image and performs candidate feature extraction, ROI (region of interest) generation and bounding-box regression to obtain the position coordinates of the target in the original image; the image segmentation neural network reads in the preprocessed image and performs candidate feature extraction, ROI generation and bounding-box regression to obtain local target feature information. After model training is completed, the model parameter unit stores the model parameters. The two label forms are illustrated in the sketch below.
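The following minimal Python sketch splits one labelme-style annotation file into a rectangle-plus-category record for the detection network and a binary mask image for the segmentation network. The file names, image size and on-disk label layout are assumptions for illustration; the patent does not specify them:

    import json
    from PIL import Image, ImageDraw

    with open("sample1/zoom_1.json") as f:
        ann = json.load(f)

    boxes = []
    mask = Image.new("L", (4000, 3000), 0)   # assumed original picture size
    draw = ImageDraw.Draw(mask)

    for shape in ann["shapes"]:
        if shape["shape_type"] == "rectangle":
            (x1, y1), (x2, y2) = shape["points"]
            # category, upper-left corner, height, width (as in step 4)
            boxes.append((shape["label"], x1, y1, y2 - y1, x2 - x1))
        elif shape["shape_type"] == "polygon":
            draw.polygon([tuple(p) for p in shape["points"]], fill=255)

    mask.save("zoom_1_mask.png")             # label for the segmentation network
    with open("zoom_1_boxes.txt", "w") as f:
        for b in boxes:                      # labels for the detection network
            f.write(" ".join(map(str, b)) + "\n")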
After model training is finished, the process enters the model testing stage.
The model testing module is connected with the model training module and comprises an image preprocessing subunit, a target detection and identification subunit, an image segmentation and identification subunit, a category and position subunit, an image segmentation subunit and a visual output subunit. Data entering the model testing module first passes through the image preprocessing subunit and is then processed by the target detection and identification subunit and the image segmentation and identification subunit. The target detection and identification subunit performs coarse target identification or region identification using a deep-neural-network-based target detection technique; the image segmentation and identification subunit performs fine target identification or target feature information extraction using a deep-neural-network-based image segmentation technique. The two networks are tested for performance separately, and the model predicts the target category, position and image segmentation information.
When the model passes the test, the process enters the human-like information extraction module.
The human-like information extraction module is connected with the model testing module and realizes human-like identification control, information extraction and combination, and visual output for the sample.
In the human-like information extraction module, the image conversion unit compresses the input original image of the sample to be identified and converts it into a picture size and format that the model can call, and the generated image and its zoom ratio are input into the model. At this zoom level, the target region is identified first and output in the form of a rectangular box, and the image coordinates and zoom ratio are recorded. The coordinates corresponding to the target region are then located in the original image and the region is cropped; the cropped image is converted into a format, size and zoom ratio that the model can recognize and input into the model to further identify target regions and target feature information, and the coordinates and mask information corresponding to the target regions in the image are acquired. Identification then proceeds to the next zoom level, iterating in this way; the feature information of the target at every zoom level is extracted only after all zoom levels of the target have been traversed.
FIG. 3 is a schematic diagram of the workflow of the human-like information extraction module according to the present invention. The workflow of the human-like information extraction module specifically comprises the following steps:
step 1, reading in the original image;
step 2, setting a maximum zoom level x, where x = 1 represents the original image, x = 10 represents 1/10 of the original image, and so on; x is the maximum zoom level, i.e. the largest scale at which the image can be input into the model;
step 3, compressing the input image according to the maximum zoom ratio to generate a new image, and sending the generated image to the region detection module;
step 4, performing region detection and feature extraction: detecting suspicious regions, storing the upper-left corner coordinates, height and width of each suspicious region, obtaining the position (upper-left corner coordinates, height and width) of the suspicious region in the original image according to the zoom level x, and storing its feature information (category, position and the like);
step 5, cropping the suspicious region from the original image and setting a maximum zoom level x2, where x2 = 1 represents the cropped image, x2 = 10 represents 1/10 of the cropped image, and so on; x2 is the maximum zoom level at which the cropped image can be input into the model; if x2 = 1, go to step 6, and if x2 > 1, return to step 3 with the cropped image as input;
step 6, screening and integrating the feature information of all levels, such as categories and positions, and outputting all the feature information about the target;
and step 7, visual output: at each zoom level, marking the target region and the pixel-level segmentation result of the target (a short sketch of the visual output follows).
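A short Python sketch of the visual output in step 7 is given below: the detected region is drawn as a rectangle and the pixel-level mask is blended over the image. The box coordinates, file paths and colors are illustrative assumptions:

    import cv2

    img = cv2.imread("sample1/zoom_1.png")
    mask = cv2.imread("zoom_1_mask.png", cv2.IMREAD_GRAYSCALE)

    x, y, w, h = 120, 80, 240, 160                    # a detected target box
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    overlay = img.copy()
    overlay[mask > 0] = (0, 0, 255)                   # pixel-level segmentation
    out = cv2.addWeighted(img, 0.7, overlay, 0.3, 0)  # blend mask over image
    cv2.imwrite("visual_output.png", out)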
This process realizes human-like identification by viewing the image from large to small, zooming step by step and scanning step by step: first the whole image is observed and suspicious regions are locked; then the view zooms in on a suspicious region to continue browsing and search for feature regions within it; once a feature region is locked, the target is searched for within that region and its detail features are examined.
In the invention, a human-like identification process scans the image features step by step; at the output stage the target feature information is integrated and combined with prior knowledge to output the target category, features, position and other information, thereby achieving accurate identification of large images.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for identifying a high-pixel large image, applied to a system for identifying high-pixel large images, the system comprising a sample library module, a model training module, a model testing module and a human-like information extraction module, characterized in that:
the sample library module is used for completing image acquisition and sample annotation and sending the annotated data to the model training module for model training;
the model training module is used for performing image enhancement processing with image processing algorithms to improve image quality;
the model testing module passes incoming data through the image preprocessing subunit and then processes it with the target detection and identification subunit and the image segmentation and identification subunit respectively;
the human-like information extraction module is connected with the model testing module and is used for realizing human-like identification control, information extraction and combination, and visual output for the sample.
2. The identification method of claim 1, wherein the sample library module uses an annotation tool to label the target positions and masks of the images according to the task requirements and to generate annotation files, and the sample library module is responsible for generating the sample data set and provides data support for algorithm training and verification.
3. The identification method of claim 1, wherein using the annotation tool to label the target positions and masks of the images and to generate the annotation files according to the task requirements specifically comprises:
step 1, reading the source data or the directory storing the files;
step 2, creating a folder named sample1, putting the image into the sample1 folder, and setting the scaling series N of the image file;
step 3, reading the image in sample1, converting it into pictures of the specified size at each zoom level, and labeling them with an annotation tool such as labelme;
step 4, marking in the annotation tool the rectangular position information of each target contained in the current image, the rectangle being represented by its upper-left corner coordinates, target height and target width;
step 5, marking in the annotation tool the shape mask of the target with a polygon tool, keeping the mask as close to the true contour of the target as possible;
step 6, labeling the label information of the target, which can be entered either by typing or through a drop-down menu;
and step 7, after all zoom levels of the original image have been labeled, generating the image annotation json file and exiting the image labeling module.
4. The identification method of claim 1, wherein the model training module preprocesses the received image data to obtain preprocessed image data, generates annotation files from the preprocessed image data and the obtained annotation information, and inputs the annotation files in turn to the target detection neural network unit and the image segmentation neural network unit for training.
5. The identification method of claim 4, wherein the annotation file input to the target detection neural network is a target rectangle annotation file labeled with target category and position information, and the main purpose of training the target detection neural network is to identify regional features in the image.
6. The identification method of claim 1, wherein the target detection and identification subunit performs coarse target identification or region identification using a deep-neural-network-based target detection technique.
7. The identification method of claim 1, wherein the image segmentation and identification subunit performs fine target identification or target feature information extraction using a deep-neural-network-based image segmentation technique, the two networks are tested for performance separately, and the model predicts the target category, position and image segmentation information.
8. The identification method of claim 1, wherein in the human-like information extraction module the image conversion unit compresses the input original image of the sample to be identified and converts it into a picture size and format that the model can call, the generated image and its zoom ratio are input into the model, the target region is identified first at this zoom level and output in the form of a rectangular box, the image coordinates and zoom ratio are recorded, the coordinates corresponding to the target region are located in the original image and the region is cropped, the cropped image is converted into a format, size and zoom ratio that the model can recognize and input into the model to further identify target regions and target feature information, the coordinates and mask information corresponding to the target regions in the image are acquired, identification proceeds to the next zoom level, and the process iterates in this way, the feature information of the target at every zoom level being extracted only after all zoom levels of the target have been traversed.
9. The identification method of claim 1, wherein the workflow of the human-like information extraction module specifically comprises the following steps:
step 1, reading in the original image;
step 2, setting a maximum zoom level x, where x = 1 represents the original image, x = 10 represents 1/10 of the original image, and so on; x is the maximum zoom level, i.e. the largest scale at which the image can be input into the model;
step 3, compressing the input image according to the maximum zoom ratio to generate a new image, and sending the generated image to the region detection module;
step 4, performing region detection and feature extraction: detecting suspicious regions, storing the upper-left corner coordinates, height and width of each suspicious region, obtaining the position (upper-left corner coordinates, height and width) of the suspicious region in the original image according to the zoom level x, and storing its feature information (category, position and the like);
step 5, cropping the suspicious region from the original image and setting a maximum zoom level x2, where x2 = 1 represents the cropped image, x2 = 10 represents 1/10 of the cropped image, and so on; x2 is the maximum zoom level at which the cropped image can be input into the model; if x2 = 1, go to step 6, and if x2 > 1, return to step 3 with the cropped image as input;
step 6, screening and integrating the feature information of all levels, such as categories and positions, and outputting all the feature information about the target;
and step 7, visual output: at each zoom level, marking the target region and the pixel-level segmentation result of the target.
10. An identification system suitable for the method of any one of claims 1 to 9, wherein
the sample library module is connected with the model training module and consists of a sample acquisition unit and a sample annotation unit; the sample annotation unit comprises a rectangle annotation subunit, a mask annotation subunit and an annotation file generation subunit, the rectangle annotation subunit and the mask annotation subunit being connected with the annotation file generation subunit;
the model training module is connected with the model testing module and comprises an image preprocessing unit, a target detection neural network unit, an image segmentation neural network unit and a model parameter unit;
the model testing module is connected with the model training module and comprises an image preprocessing subunit, a target detection and identification subunit, an image segmentation and identification subunit, a category and position subunit, an image segmentation subunit and a visual output subunit;
the human-like information extraction module is connected with the model test module.
CN202011047615.1A 2020-09-29 2020-09-29 High-pixel large image identification method and system Active CN112232390B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011047615.1A CN112232390B (en) 2020-09-29 2020-09-29 High-pixel large image identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011047615.1A CN112232390B (en) 2020-09-29 2020-09-29 High-pixel large image identification method and system

Publications (2)

Publication Number Publication Date
CN112232390A (en) 2021-01-15
CN112232390B CN112232390B (en) 2024-03-01

Family

ID=74119476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011047615.1A Active CN112232390B (en) 2020-09-29 2020-09-29 High-pixel large image identification method and system

Country Status (1)

Country Link
CN (1) CN112232390B (en)


Citations (10)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2701098A2 (en) * 2012-08-23 2014-02-26 Xerox Corporation Region refocusing for data-driven object localization
CN106502390A (en) * 2016-10-08 2017-03-15 华南理工大学 A kind of visual human's interactive system and method based on dynamic 3D Handwritten Digit Recognitions
WO2020063516A1 (en) * 2018-09-25 2020-04-02 叠境数字科技(上海)有限公司 Method for real-time rendering of giga-pixel images
CN109584251A (en) * 2018-12-06 2019-04-05 湘潭大学 A kind of tongue body image partition method based on single goal region segmentation
CN109670060A (en) * 2018-12-10 2019-04-23 北京航天泰坦科技股份有限公司 A kind of remote sensing image semi-automation mask method based on deep learning
CN109583425A (en) * 2018-12-21 2019-04-05 西安电子科技大学 A kind of integrated recognition methods of the remote sensing images ship based on deep learning
CN109886170A (en) * 2019-02-01 2019-06-14 长江水利委员会长江科学院 A kind of identification of oncomelania intelligent measurement and statistical system
CN109902758A (en) * 2019-03-11 2019-06-18 重庆邮电大学 The data set scaling method of lane region recognition based on deep learning
CN111275010A (en) * 2020-02-25 2020-06-12 福建师范大学 Pedestrian re-identification method based on computer vision
CN111524138A (en) * 2020-07-06 2020-08-11 湖南国科智瞳科技有限公司 Microscopic image cell identification method and device based on multitask learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MINGHU WU et al., "Object detection based on RGC mask R-CNN", IET Image Processing, vol. 14, no. 8, pages 1502-1508, XP006090670, DOI: 10.1049/iet-ipr.2019.0057 *
WU Pan, "Research and Design of Scaling Methods for Objects of Interest" (in Chinese), China Masters' Theses Full-text Database, Information Science and Technology, no. 09, pages 138-998 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152251A (en) * 2023-04-20 2023-05-23 成都数之联科技股份有限公司 Television backboard detection method, model training method, device, equipment and medium

Also Published As

Publication number Publication date
CN112232390B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
CN110363102B (en) Object identification processing method and device for PDF (Portable document Format) file
CN108229485B (en) Method and apparatus for testing user interface
JP2006059351A (en) Deterioration dictionary generation program, method and device
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
CN110942456B (en) Tamper image detection method, device, equipment and storage medium
CN111461101A (en) Method, device and equipment for identifying work clothes mark and storage medium
CN113936195B (en) Sensitive image recognition model training method and device and electronic equipment
CN115439753B (en) DEM-based steep river bank identification method and system
CN111079730A (en) Method for determining area of sample image in interface image and electronic equipment
Akinbade et al. An adaptive thresholding algorithm-based optical character recognition system for information extraction in complex images
CN114120086A (en) Pavement disease recognition method, image processing model training method, device and electronic equipment
CN112232390A (en) Method and system for identifying high-pixel large image
CN108268904B (en) Picture identification method and device and electronic equipment
Rao et al. MTESSERACT: An Application for Form Recognition in Courier Services
Rani et al. Object Detection in Natural Scene Images Using Thresholding Techniques
CN114708582A (en) AI and RPA-based intelligent electric power data inspection method and device
CN114596442A (en) Image identification method, device, equipment and storage medium
Kumar et al. An Investigation on various Text Information Extraction (TIE) Algorithms for Images
CN114998906B (en) Text detection method, training method and device of model, electronic equipment and medium
Kumar et al. A comparative Analysis of Feature Extraction Algorithms and Deep Learning Techniques for Detection from Natural Images
CN112307908B (en) Video semantic extraction method and device
JP2010092426A (en) Image processing device, image processing method, and program
Shelke A Tabulation method for Character Recognition using Haar wavelet in 2D images
Vu Building extraction from high-resolution satellite image for tsunami early damage estimation
Leena et al. Generating Graph from 2D Flowchart using Region-Based Segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant