CN111754474A - Visibility identification method and device based on image definition

Info

Publication number
CN111754474A
CN111754474A
Authority
CN
China
Prior art keywords
building
image
network structure
visibility
definition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010555211.7A
Other languages
Chinese (zh)
Inventor
周康明
方飞虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202010555211.7A
Publication of CN111754474A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4023 Decimation- or insertion-based scaling, e.g. pixel or line decimation
    • G06T 5/70
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The method acquires building image information under a plurality of conditions and processes the acquired building image information into training data; trains an image definition comparison model according to the training data; processes acquired image information of a target building into data to be input and inputs the data into the image definition comparison model to obtain the ratio of visibility to building distance; and determines the visibility of the target building according to the actual distance of the target building and the ratio of visibility to building distance. An image definition comparison model with good robustness is thereby obtained, and with this model the visibility of buildings can be identified accurately under different illumination and weather conditions.

Description

Visibility identification method and device based on image definition
Technical Field
The present application relates to the field of computers, and in particular to a visibility identification method and device based on image definition.
Background
In the field of aviation weather, visibility is a very important index that determines whether flights can operate normally. At present, visibility is obtained mainly by manual observation and by optical instruments. Manual observation requires professional observers to periodically inspect landmark buildings preset around the observation station and to judge the current visibility from how clearly those buildings can be seen; it consumes considerable manpower and limits how frequently data can be obtained. Optical instruments measure the transmission and scattering coefficients over a section of air and compute the visibility within that section from an optical formula; their detection range is small and they are affected by the local air quality. In recent years, methods have appeared that determine local visibility from camera pictures using machine learning. Some of these techniques use an image classification network to classify the definition of landmark buildings in a picture, estimate the visibility value the picture can support, and combine several camera pictures to obtain the approximate local visibility. However, because camera pictures vary greatly under different illumination and weather conditions, existing machine-learning techniques for judging camera pictures often fail to produce accurate results when they encounter new illumination and weather conditions, so their robustness is poor.
Disclosure of Invention
An object of the present application is to provide a visibility identification method and device based on image definition, which solve the problem that visibility estimation techniques in the prior art, which judge camera pictures by machine learning, often cannot obtain accurate results under new illumination and weather conditions and therefore have poor robustness.
According to one aspect of the application, a visibility recognition method based on image definition is provided, and the method comprises the following steps:
building image information under a plurality of conditions is obtained, and the obtained building image information is processed into training data;
training an image definition comparison model according to the training data;
processing the acquired image information of the target building into data to be input, and inputting the data to be input into the image definition comparison model to obtain the ratio of visibility to building distance;
and determining the visibility of the target building according to the actual distance of the target building and the ratio of the visibility to the building distance.
Further, processing the acquired building image information into training data includes:
performing smoothing processing and channel enhancement operations multiple times on each building image in the obtained building image information to obtain the reorganized images corresponding to each building;
carrying out building definition label marking on each image in the reorganized images corresponding to each building;
and taking the reorganized image corresponding to each building and the corresponding building definition label as training data.
Further, training an image definition comparison model according to the training data includes:
constructing a twin network structure through a convolution layer, a batchnorm layer, a scale layer, a relu layer, an eltwise layer, a pooling layer, a concat layer and a full connection layer;
respectively subtracting a first pixel value, a second pixel value and a third pixel value from RGB channel data of each pixel point of the reorganized image corresponding to each building in the training data to obtain an image with pixel distribution adjusted;
scaling the image with the adjusted pixel distribution to a preset pixel size to obtain a preset image;
and simultaneously inputting the preset image and the corresponding building definition label into the twin network structure to train as an image definition comparison model.
Further, after the preset image and the corresponding building definition label are simultaneously input into the twin network structure, the method includes:
acquiring the class probability output by the last full connection layer of the full connection layers;
calculating a cross entropy loss of the category probability and the building sharpness label;
and judging whether the twin network structure reaches a convergence state according to the cross entropy loss.
Further, the method comprises:
intercepting a current landmark building image from each image in the obtained building image information under a plurality of conditions;
determining the label of the current landmark building image, and taking all the current landmark building images and the corresponding labels as adjustment training data;
adjusting one branch in the twin network structure reaching the convergence state to obtain an adjusted network structure;
inputting the adjusted training data into the adjusted network structure, and outputting a result;
and iteratively updating the network parameters in the adjusted network structure according to the output result until the adjusted network structure reaches a convergence state, so as to obtain an image definition comparison model.
Further, determining the label of the current landmark building image includes:
acquiring query data of the current landmark building image, and determining the definition value of the current landmark building image according to the query data;
and determining the label of the current landmark building image according to the definition value of the current landmark building image.
Further, adjusting one branch of the twin network structure that reaches the convergence state to obtain an adjusted network structure, including:
and fixing the network parameters in one branch of the converged twin network structure, and connecting a full connection layer behind the last full connection layer of the branch to obtain the adjusted network structure.
Further, the output result is a floating point number, and iteratively updating the network parameters in the adjusted network structure according to the output result includes:
calculating a mean square error loss function between the floating point number and a label of the current landmark building image;
and iteratively updating the network parameters in the adjusted network structure according to the mean square error loss function.
According to another aspect of the present application, there is also provided an apparatus for visibility recognition based on image sharpness, the apparatus including:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method as previously described.
According to yet another aspect of the present application, there is also provided a computer readable medium having computer readable instructions stored thereon, the computer readable instructions being executable by a processor to implement the method as described above.
Compared with the prior art, the present application acquires building image information under a plurality of conditions and processes the acquired building image information into training data; trains an image definition comparison model according to the training data; processes acquired image information of a target building into data to be input and inputs the data into the image definition comparison model to obtain the ratio of visibility to building distance; and determines the visibility of the target building according to the actual distance of the target building and the ratio of visibility to building distance. An image definition comparison model with good robustness is thereby obtained, and with this model the visibility of buildings can be identified accurately under different illumination and weather conditions.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 illustrates a flow diagram of a method for visibility recognition based on image sharpness according to one aspect of the present application;
FIG. 2 shows a schematic diagram of a simple twin network structure in an embodiment of the present application;
FIG. 3 is a flow chart illustrating a visibility identification method based on building image sharpness according to an embodiment of the present disclosure;
fig. 4 shows a schematic structural diagram of an apparatus for visibility recognition based on image sharpness according to still another aspect of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change RAM (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Fig. 1 shows a flowchart of a visibility identification method based on image definition, which includes steps S11 to S14.
In step S11, building image information under a plurality of conditions is acquired and processed into training data. The plurality of conditions may be conditions of different visibility and different illumination; image information under different visibility and illumination conditions, preferably RGB image information, may be collected by a camera installed at a high position of the observation station, and the building image information is cropped from that image information. The cropped building image information is then processed into training data suitable for training the model. Accordingly, in step S12, an image definition comparison model is trained according to the training data; that is, the training data processed in step S11 are input into the network model, which is trained as the image definition comparison model, so that the definition of building images can subsequently be estimated with it.
In step S13, the acquired image information of the target building is processed into data to be input, which is input into the image definition comparison model to obtain the ratio of visibility to building distance. After the image definition comparison model has been built, image information of the target building whose definition is to be estimated is acquired and processed to meet the model's input requirements, i.e. the image size, the RGB channel values of the pixels and so on of the data to be input satisfy the model's input format. The model then outputs a numerical value representing the ratio of visibility to building distance; this ratio is the definition value.
In step S14, the visibility of the target building is determined according to the actual distance of the target building and the ratio of visibility to building distance. Here, the actual distance of the target building is the real distance between the building whose visibility is to be estimated and the observation station; this distance is collected and multiplied by the output of step S13, i.e. by the visibility-to-distance ratio, and the product reflects the visibility value that the image of the target building represents.
In an embodiment of the present application, after the visibility of the target building has been determined from its actual distance and the visibility-to-distance ratio, the maximum of the visibilities of all buildings at the same target moment is selected as the visibility at that moment. Here, each building image carries time information, i.e. the moment at which the image was acquired; the image information of all buildings at the same target moment is selected, for example the image information of all buildings acquired by the observation station at 9:10, the visibility of each building is determined with the trained image definition comparison model, and the maximum of the visibilities of all buildings at that moment is taken as the actual visibility value at that moment. Illustratively, the building images collected at 9:10 include an image of building A, an image of building B and an image of building C; the visibilities of buildings A, B and C are determined to be p1, p2 and p3, and the maximum of p1, p2 and p3 is selected as the actual visibility value at 9:10.
In an embodiment of the present application, in step S11, smoothing and channel enhancement operations are performed multiple times on each building image in the acquired building image information to obtain the reorganized images corresponding to each building; building definition label marking is carried out on each image in the reorganized images; and the reorganized images corresponding to each building, together with the corresponding building definition labels, are used as training data. The acquired building image information, for example a downloadable open-source scene classification data set, is first classified, and each classified building image then undergoes multiple rounds of smoothing and channel enhancement. The channel enhancement operation refers to enhancing the dark channel, i.e. the channel formed by the minimum value of each pixel over the three color channels (R, G, B) of the image. For example, the result of a smoothing operation on the original building image A is saved as image A1, the result of a dark-channel enhancement operation on A1 is saved as image A2, smoothing A2 gives A3, dark-channel enhancement of A3 gives A4, and the two operations are applied in turn to the previous image to obtain n images whose definition satisfies A > A1 > A2 > ... > An-1 > An. A pair of pictures Ax and Ay is then selected from the reorganized pictures: if x < y, the first picture is the clearer one and the label of the pair is recorded as 0; if x > y, the second picture is the clearer one and the label of the pair is recorded as 1.
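For illustration only, the alternating smoothing and dark-channel-enhancement sequence and the pair labeling can be sketched as follows. This is a minimal sketch, not the patent's implementation; the use of OpenCV Gaussian smoothing, the 5 × 5 kernel and the blend factor alpha are assumptions.

```python
import cv2
import numpy as np

def dark_channel_enhance(img, alpha=0.5):
    """Blend each pixel toward its dark channel (the per-pixel minimum
    over the three color channels), simulating a hazier image."""
    dark = img.min(axis=2, keepdims=True)                # H x W x 1 dark channel
    return (alpha * img + (1.0 - alpha) * dark).astype(np.uint8)

def build_sequence(img, n=8):
    """Return [A, A1, ..., An]: smoothing and dark-channel enhancement
    applied alternately, so definition decreases with the index."""
    seq = [img]
    for i in range(n):
        if i % 2 == 0:
            nxt = cv2.GaussianBlur(seq[-1], (5, 5), 0)   # smoothing step
        else:
            nxt = dark_channel_enhance(seq[-1])          # dark-channel step
        seq.append(nxt)
    return seq

def make_pairs(seq):
    """Pair up images; label 0 when the first image is the clearer one
    (smaller index in the sequence), label 1 otherwise."""
    return [(seq[x], seq[y], 0 if x < y else 1)
            for x in range(len(seq)) for y in range(len(seq)) if x != y]
```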
In an embodiment of the present application, in step S12, a twin network structure is built from convolutional layers, batchnorm layers, scale layers, relu layers, eltwise layers, pooling layers, a concat layer and fully connected layers; a first pixel value, a second pixel value and a third pixel value are subtracted from the respective RGB channel values of each pixel of the reorganized images corresponding to each building in the training data, giving images with adjusted pixel distribution; the images with adjusted pixel distribution are scaled to a preset pixel size to obtain preset images; and the preset images and the corresponding building definition labels are input into the twin network structure simultaneously to train it as the image definition comparison model. Here, the batchnorm layer is a network layer that normalizes network data, the scale layer is a network layer that scales and shifts network data, the eltwise layer performs a pixel-wise operation (pixel-wise addition in this embodiment), and the concat layer is a network layer that concatenates network data. Preferably, the twin network structure is built from 42 convolutional layers, 42 batchnorm layers, 42 scale layers, 34 relu activation layers, 16 eltwise layers, 4 pooling layers, 1 concat layer and 3 fully connected layers. Fig. 2 shows a simple twin network structure, in which the network layers come in pairs; the parameters shared by each pair of layers, such as the weights and biases of the convolutional layers, are identical in the two branches of the twin network structure, and both copies are updated simultaneously during back-propagation. During training, for the picture pairs and labels prepared in step S11, the first pixel value (e.g. 104), the second pixel value (e.g. 117) and the third pixel value (e.g. 124) are subtracted from the 3 channel values of each pixel of each picture, shifting the pixel distribution from 0-255 to a range roughly centered on zero (about -125 to 150), which is more favorable for network learning; each picture with the pixel values subtracted is scaled to 224 × 224 pixels and input into the twin network together with the building definition label (0 or 1) recorded in step S11.
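A minimal sketch of such a twin network is given below. The patent's layer vocabulary (batchnorm, scale, eltwise, concat) suggests a Caffe-style residual network; PyTorch is used here purely for illustration, and the backbone depth and feature width are placeholder assumptions rather than the 42-convolution structure described above.

```python
import torch
import torch.nn as nn

MEANS = torch.tensor([104.0, 117.0, 124.0]).view(3, 1, 1)

def preprocess(img):
    """img: 3 x 224 x 224 float tensor with values in 0-255. Subtracting
    the per-channel means shifts the pixel distribution to roughly [-125, 150]."""
    return img - MEANS

class Branch(nn.Module):
    """One branch: conv + batchnorm/scale + relu blocks, an eltwise
    (pixel-wise) residual addition, pooling, and a final FC feature layer."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3),
            nn.BatchNorm2d(64),   # batchnorm + scale in Caffe terms
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),
        )
        self.block = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, feat_dim)

    def forward(self, x):
        x = self.stem(x)
        x = torch.relu(x + self.block(x))          # eltwise addition
        return self.fc(self.pool(x).flatten(1))    # the "feat" vector

class TwinNet(nn.Module):
    """Both inputs pass through the same Branch instance, so the paired
    layers share weights and biases and update together in backprop."""
    def __init__(self):
        super().__init__()
        self.branch = Branch()
        self.head = nn.Linear(2 * 256, 2)  # concat of both features -> 2 classes

    def forward(self, a, b):
        feats = torch.cat([self.branch(a), self.branch(b)], dim=1)  # concat layer
        return self.head(feats)
```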
Then, the class probability output by the last fully connected layer is acquired; the cross entropy loss between the class probability and the building definition label is calculated; and whether the twin network structure has reached a convergence state is judged according to the cross entropy loss. Here, the last fully connected layer outputs a two-dimensional class probability, and the cross entropy loss (CE) between the class probability and the building definition label is calculated according to the following formula:
CE = -\sum_{i=1}^{C} T_i \log(S_i)

where C denotes the number of classes of the picture-pair labels (in the above embodiment C = 2), S_i denotes the computed probability that the pair's label is class i (i being 0 or 1), and T_i denotes the actual probability that the pair's label is class i. Back-propagation is computed from the cross entropy loss (CE) and the network parameters are updated iteratively; when CE stabilizes at a value of about 0.1 or below, the twin network structure is considered to have reached a convergence state and the trained image definition comparison model has converged.
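Continuing the sketch above, one training step with the cross entropy loss might look as follows (the SGD optimizer and learning rate are assumptions; the 0.1 convergence threshold follows the text):

```python
import torch
import torch.nn.functional as F

model = TwinNet()                  # twin network from the sketch above
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(img_a, img_b, labels):
    """img_a, img_b: preprocessed N x 3 x 224 x 224 batches;
    labels: N-vector of 0/1 pair labels from the reorganized images."""
    logits = model(img_a, img_b)
    loss = F.cross_entropy(logits, labels)  # CE = -sum_i T_i * log(S_i)
    opt.zero_grad()
    loss.backward()                         # the shared branch updates once for both inputs
    opt.step()
    return loss.item()                      # training stops when this stabilizes near 0.1
```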
In an embodiment of the application, a current landmark building image may be cropped from each image in the building image information acquired under the plurality of conditions; the label of each current landmark building image is determined, and all the current landmark building images together with their labels are used as adjustment training data; one branch of the twin network structure that has reached the convergence state is adjusted to obtain an adjusted network structure; the adjustment training data are input into the adjusted network structure and a result is output; and the network parameters in the adjusted network structure are updated iteratively according to the output result until the adjusted network structure reaches a convergence state, whereby the image definition comparison model is obtained. In other words, collected landmark building data near the observation station are labeled, the feature extraction part of the twin network structure obtained in the above embodiment is adjusted to obtain an adjusted network structure, and the adjusted network structure is trained with the new training data to obtain the final building definition comparison model. Specifically: the building images under the plurality of conditions near the observation station acquired in step S11 are labeled, the landmark building position [x, y, width, height] in each image is marked, and the storage time of each image is recorded when it is acquired; the visibility records issued by the observation station are queried using the storage time, the label of the current landmark building image is determined from those records, and the adjustment training data comprising all current landmark building images and their labels are thereby obtained. The adjustment training data are input into the adjusted network structure, and the network parameters in the adjusted network structure are updated iteratively using the output result until the adjusted network structure converges, finally yielding the image definition comparison model.
In connection with the above embodiment, when the label of the current landmark building image is determined, query data of the current landmark building image may be acquired, and the definition value of the current landmark building image is determined from the query data; the label of the current landmark building image is then determined from that definition value. Here, the visibility records issued by the observation station are queried over the network, and the image data stored at the moment corresponding to each record are retrieved; the landmark building image is cropped from the current picture, and the current visibility value is divided by the actual distance of the building, the quotient being taken as the definition value of the current building image. When the quotient exceeds a preset value it is clamped to that preset value: empirically, the naked eye can no longer distinguish differences in definition once the visibility exceeds about 2.5 times the building distance, so the preset value may be chosen as 3.0. The building images and their corresponding definition labels (0.0-3.0) are input into the adjusted network structure as adjustment training data.
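The label computation just described amounts to a clamped ratio; a small illustrative sketch (the function name and units are hypothetical):

```python
def definition_label(visibility_m, distance_m, cap=3.0):
    """Label = visibility / building distance, clamped to [0.0, cap].
    cap = 3.0 follows the text's empirical 2.5x visibility/distance limit."""
    return max(0.0, min(visibility_m / distance_m, cap))

# Example: a building 2000 m away under a recorded visibility of 5000 m
# gets the label definition_label(5000, 2000) == 2.5
```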
In an embodiment of the present application, the network parameters in one branch of the converged twin network structure may be fixed, and a fully connected layer connected after the last fully connected layer of that branch, so as to obtain the adjusted network structure. Here, one branch of the twin network portion of the image definition comparison model obtained in the above embodiment is taken as the feature extraction module (in fig. 2, for example, the portion from conv1 on the left side to feat); the network parameters in this branch are fixed, and a fully connected layer is then attached behind it, yielding the adjusted network structure.
The obtained adjustment training data are input into the adjusted network structure, which outputs a floating point number as the output result; a mean square error loss function between the floating point number and the label of the current landmark building image is calculated, and the network parameters in the adjusted network structure are updated iteratively according to the mean square error loss function. Here, the building images and their corresponding definition labels serve as the input data: 104, 117 and 124 are subtracted from the 3 channel values of each pixel respectively, the image is scaled to 224 × 224 pixels and fed into the network, the final fully connected layer outputs a floating point number, and the MSE loss function between that value and the label is calculated:
MSE = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2

where y_i denotes the label value of the current i-th image and \hat{y}_i denotes the predicted value output by the final fully connected layer for the i-th image. Back-propagation is computed according to the MSE and the network parameters are updated iteratively; when the MSE stabilizes at a value of about 0.1 or below, the model is considered converged.
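Continuing the earlier PyTorch sketch, the adjustment described above, fixing one branch and regressing the definition value through a new fully connected layer with the MSE loss, might look as follows (the single-output head and optimizer settings are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjustedNet(nn.Module):
    """One branch of the converged twin network with its parameters fixed,
    plus a new fully connected layer regressing the definition value."""
    def __init__(self, trained_branch):
        super().__init__()
        self.branch = trained_branch
        for p in self.branch.parameters():
            p.requires_grad = False           # fix the feature-extraction branch
        self.reg_head = nn.Linear(256, 1)     # new FC after the branch's last FC

    def forward(self, x):
        return self.reg_head(self.branch(x)).squeeze(1)

net = AdjustedNet(model.branch)               # reuse the trained branch
opt = torch.optim.SGD(net.reg_head.parameters(), lr=1e-3)

def finetune_step(imgs, labels):
    """imgs: preprocessed N x 3 x 224 x 224; labels: N floats in 0.0-3.0."""
    loss = F.mse_loss(net(imgs), labels)      # MSE = mean((y - y_hat)^2)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()                        # converged near 0.1 or below
```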
In a specific embodiment of the present application, as shown in fig. 3, a picture taken by a camera is acquired, the preset building area is cropped from the image, the image data are normalized, and the normalized data are input into the image definition comparison model to obtain the ratio a of visibility to building distance; the actual distance of the building is multiplied by a to obtain the building visibility, and the maximum visibility over all buildings is then taken. The normalization of the image data comprises: cropping the landmark building area using the preset building position information. For example, RGB image data of the scene around an airport are collected by a camera, the building area image is cropped according to the acquired building position information, 104, 117 and 124 are subtracted from the 3 channel values of each pixel of the building image respectively, and the image is scaled to 224 × 224 pixels to obtain the normalized data. The normalized data are then input into the image definition comparison model to obtain a float value a representing the ratio of the maximum visibility achievable in the current image to the actual distance of the building; if a is larger than 3.0 it is set to 3.0, and if a is smaller than 0.0 it is set to 0.0. Finally, the actually collected distance d of the building is multiplied by a to obtain the visibility value reflected by the building image, and the maximum of the visibility values of all building images at the same moment is selected as the actual visibility value at that moment.
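Putting the inference steps together, a minimal end-to-end sketch under the same assumptions as the earlier blocks (the building crop boxes and distances are illustrative placeholders):

```python
import cv2
import numpy as np
import torch

# Preset crop boxes [x, y, width, height] and actual distances (m) per
# landmark building; the values are illustrative placeholders.
BUILDINGS = {
    "A": {"box": (100, 200, 320, 240), "distance_m": 1500.0},
    "B": {"box": (700, 180, 280, 260), "distance_m": 2400.0},
}
CHANNEL_MEANS = np.array([104.0, 117.0, 124.0], dtype=np.float32)

def estimate_visibility(frame):
    """Crop each landmark building from the camera frame, normalize it,
    predict the ratio a, clamp a to [0, 3], multiply by the building's
    distance, and return the maximum visibility over all buildings."""
    net.eval()                                   # adjusted network from the sketch above
    values = []
    for info in BUILDINGS.values():
        x, y, w, h = info["box"]
        crop = cv2.resize(frame[y:y+h, x:x+w], (224, 224)).astype(np.float32)
        crop -= CHANNEL_MEANS                    # per-channel mean subtraction
        inp = torch.from_numpy(crop).permute(2, 0, 1).unsqueeze(0)
        with torch.no_grad():
            a = float(net(inp))
        a = min(max(a, 0.0), 3.0)                # clamp the ratio
        values.append(a * info["distance_m"])    # visibility = a * actual distance
    return max(values)                           # max over buildings at this moment
```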
According to the method of the present application, an image definition comparison model with a twin network structure is trained in advance; the output of its feature extraction part is used as a feature reflecting image definition, and the feature extraction network is then fine-tuned, so that an image definition comparison model with high robustness is obtained. Building definition values are obtained with this model, and the visibility value represented by each landmark building is calculated from its definition value and actual distance at each preset position, from which the overall visibility is computed.
In addition, an embodiment of the present application further provides an apparatus for visibility identification based on image sharpness, including: one or more processors; and a memory storing computer readable instructions that, when executed, cause the processor to perform any of the foregoing methods of visibility recognition based on image sharpness.
In addition, the embodiment of the present application further provides a computer readable medium, on which computer readable instructions are stored, where the computer readable instructions are executable by a processor to implement any one of the foregoing methods for visibility recognition based on image definition.
Fig. 4 is a schematic structural diagram of an apparatus for visibility recognition based on image sharpness, where the apparatus includes: the system comprises an acquisition device 11, a training device 12, an input device 13 and a recognition device 14, wherein the acquisition device 11 is used for acquiring building image information under a plurality of conditions and processing the acquired building image information into training data; the training device 12 is used for training an image definition comparison model according to the training data; the input device 13 is used for processing the acquired image information of the target building into data to be input, and inputting the data to be input into the image definition comparison model to obtain the ratio of visibility to building distance; the identification device 14 is used for determining the visibility of the target building according to the actual distance of the target building and the ratio of the visibility to the building distance.
It should be noted that the content executed by the obtaining device 11, the training device 12, the input device 13 and the recognition device 14 is the same as or corresponding to the content in the above steps S11, S12, S13 and S14, and for brevity, the description is omitted here.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (10)

1. A visibility recognition method based on image definition is characterized by comprising the following steps:
building image information under a plurality of conditions is obtained, and the obtained building image information is processed into training data;
training an image definition comparison model according to the training data;
processing the acquired image information of the target building into data to be input, and inputting the data to be input into the image definition comparison model to obtain the ratio of visibility to building distance;
and determining the visibility of the target building according to the actual distance of the target building and the ratio of the visibility to the building distance.
2. The method of claim 1, wherein processing the acquired building image information into training data comprises:
performing smoothing processing and channel enhancement operations multiple times on each building image in the obtained building image information to obtain the reorganized images corresponding to each building;
carrying out building definition label marking on each image in the reorganized images corresponding to each building;
and taking the reorganized image corresponding to each building and the corresponding building definition label as training data.
3. The method of claim 2, wherein training an image sharpness comparison model based on the training data comprises:
constructing a twin network structure through a convolution layer, a batchnorm layer, a scale layer, a relu layer, an eltwise layer, a pooling layer, a concat layer and a full connection layer;
respectively subtracting a first pixel value, a second pixel value and a third pixel value from RGB channel data of each pixel point of the reorganized image corresponding to each building in the training data to obtain an image with pixel distribution adjusted;
scaling the image with the adjusted pixel distribution to a preset pixel size to obtain a preset image;
and simultaneously inputting the preset image and the corresponding building definition label into the twin network structure to train as an image definition comparison model.
4. The method of claim 3, wherein after the preset image and the corresponding building definition label are simultaneously input into the twin network structure, the method comprises:
acquiring the class probability output by the last full connection layer of the full connection layers;
calculating a cross entropy loss of the category probability and the building sharpness label;
and judging whether the twin network structure reaches a convergence state according to the cross entropy loss.
5. The method of claim 4, wherein the method comprises:
intercepting a current landmark building image from each image in the obtained building image information under a plurality of conditions;
determining the label of the current landmark building image, and taking all the current landmark building images and the corresponding labels as adjustment training data;
adjusting one branch in the twin network structure reaching the convergence state to obtain an adjusted network structure;
inputting the adjusted training data into the adjusted network structure, and outputting a result;
and iteratively updating the network parameters in the adjusted network structure according to the output result until the adjusted network structure reaches a convergence state, so as to obtain an image definition comparison model.
6. The method of claim 5, wherein determining the label of the current landmark building image comprises:
acquiring query data of the current landmark building image, and determining the definition value of the current landmark building image according to the query data;
and determining the label of the current landmark building image according to the definition value of the current landmark building image.
7. The method of claim 5, wherein adjusting one branch of the twin network structure that reaches the convergence state to obtain an adjusted network structure comprises:
and fixing the network parameters in one branch of the converged twin network structure, and connecting a full connection layer behind the last full connection layer of the branch to obtain the adjusted network structure.
8. The method of claim 5, wherein the output result is a floating point number, and wherein iteratively updating the network parameters in the adjusted network structure based on the output result comprises:
calculating a mean square error loss function between the floating point number and a label of the current landmark building image;
and iteratively updating the network parameters in the adjusted network structure according to the mean square error loss function.
9. An apparatus for visibility recognition based on image sharpness, the apparatus comprising:
one or more processors; and
a memory storing computer readable instructions that, when executed, cause the processor to perform the operations of the method of any of claims 1 to 8.
10. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of any one of claims 1 to 8.
CN202010555211.7A 2020-06-17 2020-06-17 Visibility identification method and device based on image definition Pending CN111754474A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010555211.7A CN111754474A (en) 2020-06-17 2020-06-17 Visibility identification method and device based on image definition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010555211.7A CN111754474A (en) 2020-06-17 2020-06-17 Visibility identification method and device based on image definition

Publications (1)

Publication Number Publication Date
CN111754474A true CN111754474A (en) 2020-10-09

Family

ID=72675445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010555211.7A Pending CN111754474A (en) 2020-06-17 2020-06-17 Visibility identification method and device based on image definition

Country Status (1)

Country Link
CN (1) CN111754474A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233341A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Image processing method and apparatus, computer readable storage medium, and computer device
CN110136101A (en) * 2019-04-17 2019-08-16 杭州数据点金科技有限公司 A kind of tire X-ray defect detection method compared based on twin distance
CN110427998A (en) * 2019-07-26 2019-11-08 上海商汤智能科技有限公司 Model training, object detection method and device, electronic equipment, storage medium
CN110807757A (en) * 2019-08-14 2020-02-18 腾讯科技(深圳)有限公司 Image quality evaluation method and device based on artificial intelligence and computer equipment
CN110533097A (en) * 2019-08-27 2019-12-03 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device, electronic equipment and storage medium
CN111161218A (en) * 2019-12-10 2020-05-15 核工业北京地质研究院 High-resolution remote sensing image change detection method based on twin convolutional neural network
CN111259957A (en) * 2020-01-15 2020-06-09 上海眼控科技股份有限公司 Visibility monitoring and model training method, device, terminal and medium based on deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330666A (en) * 2020-11-26 2021-02-05 成都数之联科技有限公司 Image processing method, system, device and medium based on improved twin network
CN112330666B (en) * 2020-11-26 2022-04-29 成都数之联科技股份有限公司 Image processing method, system, device and medium based on improved twin network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination