WO2023123869A1 - Visibility value measurement method and apparatus, device, and storage medium - Google Patents

Visibility value measurement method and apparatus, device, and storage medium Download PDF

Info

Publication number
WO2023123869A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
detected
visibility
characterization
visibility value
Application number
PCT/CN2022/097325
Other languages
French (fr)
Chinese (zh)
Inventor
陈康
谭发兵
朱铖恺
武伟
Original Assignee
上海商汤智能科技有限公司
Application filed by 上海商汤智能科技有限公司
Publication of WO2023123869A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis

Definitions

  • The present disclosure relates to the field of artificial intelligence technology, and in particular to a visibility value detection method, apparatus, device, and storage medium.
  • Weather such as fog, smog, and sand-dust lowers visibility in the environment and affects traffic.
  • For example, fog clusters, a common phenomenon, are patches of fog with even lower visibility that form within a larger fog bank under the influence of local micro-climates.
  • When a fog cluster occurs, visibility drops suddenly and sharply, making it difficult to predict or forecast in advance; this is a serious hazard to road traffic safety, especially on expressways, where it can easily cause major accidents. The influence of fog, sand-dust, and similar weather can therefore be judged by identifying the visibility value of an area to be detected, so as to guide traffic. Accordingly, a scheme that can conveniently and accurately determine the visibility value of the area to be detected is needed.
  • According to a first aspect, a visibility value detection method is provided, the method comprising: acquiring an image to be detected; performing feature extraction on the image to be detected to obtain a characterization quantity of the image to be detected; and determining the visibility value of the image to be detected based on a pre-calibrated mapping relationship between characterization quantities and visibility values, together with the characterization quantity of the image to be detected; wherein the characterization quantity reflects the magnitude of the visibility value of the scene contained in the image.
  • According to a second aspect, a visibility value detection device is provided, the device comprising: an acquisition module configured to acquire an image to be detected; a characterization module configured to perform feature extraction on the image to be detected through a pre-trained neural network to obtain the characterization quantity of the image to be detected; and a visibility value determination module configured to determine the visibility value corresponding to the image to be detected based on the pre-calibrated mapping relationship between characterization quantities and visibility values and the characterization quantity of the image to be detected, wherein the characterization quantity reflects the magnitude of the visibility value of the scene contained in the image.
  • According to a third aspect, an electronic device is provided, the electronic device including a processor, a memory, and computer instructions stored in the memory and executable by the processor; when the processor executes the computer instructions, the method of the first aspect can be implemented.
  • According to a fourth aspect, a computer-readable storage medium is provided, on which computer instructions are stored; when the computer instructions are executed, the method of the first aspect is implemented.
  • According to a fifth aspect, a computer program product is provided, the product including a computer program; when the computer program is executed by a processor, the method of the first aspect is implemented.
  • Fig. 1 is a flow chart of a method for detecting a visibility value according to an embodiment of the disclosure.
  • Fig. 2 is a schematic diagram of a detection method for determining a visibility value according to an embodiment of the present disclosure.
  • Fig. 3 is a schematic diagram of an application scenario according to an embodiment of the present disclosure.
  • Fig. 4 is a schematic diagram of a neural network structure according to an embodiment of the disclosure.
  • Fig. 5 is a schematic diagram of the logical structure of a road visibility value detection device according to an embodiment of the disclosure.
  • Fig. 6 is a schematic diagram of the logical structure of an electronic device according to an embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "upon" or "when" or "in response to determining".
  • The concentration of haze or dust in an area to be detected can generally be judged by identifying the visibility value of the area, so as to guide traffic.
  • At present, some approaches measure the visibility value of the area to be detected directly with a visibility meter; this requires deploying dedicated detection equipment in the area, which is costly and difficult to implement.
  • Alternatively, the visibility of the area to be detected can be determined from an image of the area: the image can be input into a pre-trained neural network to predict the visibility of the area to be detected.
  • However, because accurately calibrating the visibility value of a sample image is difficult, it is hard to obtain a large number of images carrying visibility value labels. Sample images used for training are therefore usually labeled only with a visibility value range estimated from human experience, for example less than 50m, less than 100m, or greater than 1000m. As a result, the neural network predicts only a rough visibility value range for the image of the area to be detected and cannot obtain an accurate visibility value; moreover, because the human-estimated visibility ranges are affected by subjective factors, the prediction results are not very accurate.
  • To address this, an embodiment of the present disclosure provides a visibility value detection method in which the mapping relationship between visibility values and a characterization quantity that reflects the visibility value of the scene contained in an image is calibrated in advance.
  • When the visibility value of the scene contained in an image to be detected is to be determined, feature extraction is performed on the image to obtain its characterization quantity, and the visibility value of the scene contained in the image is then determined from the pre-calibrated mapping relationship between characterization quantities and visibility values and the characterization quantity of the image.
  • In this way, the visibility value of the image to be detected can be determined without a dedicated visibility meter.
  • And whereas the related art can only determine a rough visibility value range, the embodiments of the present disclosure obtain a finer-grained visibility value, which facilitates subsequent applications.
  • The methods provided by the embodiments of the present disclosure can be executed by various electronic devices, such as cloud servers and user terminals, and can be used to predict visibility values in weather such as fog clusters, haze, and sand-dust.
  • As shown in Fig. 1, the method for determining the visibility value in the embodiments of the present disclosure may include steps S102 to S106.
  • In step S102, an image to be detected can be acquired. The image to be detected can be an image, collected by an image acquisition device, of an area whose visibility value needs to be detected, and may contain scenes such as fog clusters, sand-dust, and haze.
  • In step S104, feature extraction may be performed on the image to be detected to obtain a characterization quantity representing the magnitude of the visibility value of the scene contained in the image.
  • For the feature extraction, a pre-trained neural network may be used, or other methods may be applied, as long as a characterization quantity reflecting the magnitude of the visibility value of the scene contained in the image to be detected can be determined.
  • For example, the neural network can be trained on sample images so that the trained network accurately extracts features related to the visibility value of an image and outputs a characterization quantity representing the magnitude of the visibility value.
  • The characterization quantity can be any information that represents the magnitude of the visibility value, for example a vector, a matrix, or a specific numeric value.
  • In some embodiments, to facilitate mapping between characterization quantities and visibility values, the characterization quantity may be a one-dimensional scalar.
  • S106: Determine the visibility value of the image to be detected based on the pre-calibrated mapping relationship between characterization quantities and visibility values and the characterization quantity of the image to be detected, where the characterization quantity reflects the magnitude of the visibility value of the scene contained in the image.
  • In step S106, after the characterization quantity of the image to be detected is obtained, it can be mapped to a visibility value based on the pre-calibrated mapping relationship between characterization quantities and visibility values, yielding the visibility value of the scene contained in the image to be detected.
  • The characterization quantity mentioned in the embodiments of the present disclosure reflects the magnitude of the visibility value of the scene contained in an image.
  • When the mapping relationship is calibrated, feature extraction can be performed on images with known visibility values to obtain their characterization quantities, and the mapping relationship constructed from the correspondence between those characterization quantities and the visibility values. Alternatively, since calibrating the accurate visibility value of an image is difficult, feature extraction can be performed on images whose visibility value ranges are known, and the mapping relationship constructed from the distribution ranges of their characterization quantities and their visibility value ranges.
  • In this way, an accurate visibility value of the scene contained in an image can be predicted, and the visibility of the image represented at a finer granularity, which facilitates subsequent applications.
  • In some embodiments, feature extraction may be performed on the image to be detected through a pre-trained neural network to obtain the characterization quantity of the image to be detected.
  • That is, the mapping relationship between characterization quantities and visibility values can be constructed based on the pre-trained neural network; the network is then used to determine the characterization quantity of the image to be detected, and the visibility value of the image is determined from that characterization quantity and the mapping relationship. Because calibrating the accurate visibility value of a sample image is difficult, it is hard to obtain a large number of samples with accurate visibility value labels for training the neural network.
  • So that the trained neural network can accurately extract features from an image and output a characterization quantity reflecting the visibility value of the scene it contains, in some embodiments the network is trained as follows: a sample image pair carrying label information is obtained, where the label indicates the relative magnitude of the visibility values of the two sample images in the pair; the two sample images are input into a preset initial neural network, which outputs a prediction of that relative magnitude; and the network parameters of the initial neural network are adjusted iteratively based on the difference between the prediction and the true relationship indicated by the label, thereby training the network.
  • For example, sample image A and sample image B can be obtained, where the visibility value of sample image A is greater than that of sample image B. The initial neural network outputs the probability that sample image A is the image with the larger visibility value; this is compared with the true probability of that event, the deviation is determined, and the parameters of the initial network are adjusted based on the deviation until convergence, yielding a trained neural network. In this way, a neural network that accurately extracts characterization quantities reflecting the magnitude of an image's visibility value can be trained without obtaining sample images that carry visibility value labels; the corresponding objective is written out below.
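  • Written out explicitly (the patent describes this probability and loss only in words; the scalars $R_A$, $R_B$ and the label $y$ follow the detailed embodiment later in the text, while the sigmoid form of the output layer is a standard pairwise-ranking assumption on our part), the training objective can be sketched as:

$$p = \sigma(R_A - R_B) = \frac{1}{1 + e^{-(R_A - R_B)}}, \qquad \mathcal{L} = -\left[\, y \log p + (1 - y) \log (1 - p) \,\right]$$

where $p$ is the predicted probability that sample image A has the larger visibility value and $y$ is the corresponding true probability (1, or a preset value close to 1).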
  • The preset initial neural network may include a feature extraction network, a characterization quantity determination network, and an output layer.
  • When the relative magnitude of the visibility values is determined, the feature extraction network can first extract the features of the two images in the sample image pair; the characterization quantity determination network then determines the characterization quantity of each of the two images; and finally, based on those characterization quantities, the output layer determines the relative magnitude of the visibility values of the two images.
  • Only if the characterization quantity output by the neural network reflects the visibility value of the scene contained in the image as faithfully as possible will the visibility value obtained by mapping the characterization quantity be accurate.
  • The applicant has therefore optimized the structure of the neural network to improve its performance.
  • In some embodiments, the characterization quantity determination network may include two network branches, a first network branch and a second network branch. When the preset initial neural network outputs its prediction of the relative magnitude of the visibility values of the two sample images in the pair, the first network branch can determine the characterization quantity of the first image and the second network branch the characterization quantity of the second image, and the prediction result is then determined from the two characterization quantities.
  • Processing the features extracted from the two images with two separate network branches, so as to obtain the characterization quantities of the first image and the second image respectively, gives more accurate results than processing the features of both images with a single network branch.
  • In other examples, the characterization quantity determination network may include only one network branch; the features of both images in the sample pair extracted by the feature extraction network are all input into that branch, which determines the characterization quantity of each of the two images in turn.
  • When the two sample images are input into the two network branches, one branch may be randomly selected for the image input first and the other branch for the image input later.
  • Alternatively, the two network branches can correspond to the visibility values of the sample images: for example, the first network branch can be dedicated to feature extraction for the sample image with the larger visibility value, and the second network branch to the sample image with the smaller visibility value.
  • When sample image pairs are constructed, at least two first image sets can be acquired, where all images within each first image set share the same visibility value range and the visibility value ranges of any two first image sets do not overlap. One image is then taken from each of any two first image sets to form a sample image pair. Since the visibility ranges of images from any two different sets differ, a large number of sample image pairs with known relative visibility can be obtained by randomly combining images across the first image sets, providing ample data for training the neural network. A minimal sketch of this pairing scheme follows.
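  • The sketch below illustrates the pairing scheme (the function name and data layout are our assumptions for illustration, not from the patent): image sets are ordered by visibility range, and the label records which image of the pair has the larger visibility.

```python
import random

# Hypothetical layout: image_sets[i] holds the images (e.g. file paths) whose
# visibility falls in the i-th range; ranges are non-overlapping and ordered
# from low to high visibility.
def make_sample_pairs(image_sets, n_pairs, rng=random):
    pairs = []
    for _ in range(n_pairs):
        lo, hi = sorted(rng.sample(range(len(image_sets)), 2))  # two different sets
        img_low = rng.choice(image_sets[lo])    # smaller visibility value
        img_high = rng.choice(image_sets[hi])   # larger visibility value
        pairs.append((img_high, img_low, 1.0))  # y = 1: first image is "larger"
    return pairs
```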
  • Further, the characterization quantity representing the visibility value of the scene contained in an image may be a one-dimensional scalar, that is, a specific numeric value, which facilitates establishing the mapping relationship between characterization quantities and visibility values.
  • When the mapping relationship is established, the specific visibility values of some images can also first be determined with a visibility meter or by other means, and the mapping relationship then built from the characterization quantities and visibility values of those images.
  • For example, a third image set may be obtained containing multiple frames of images with known visibility values; feature extraction is performed on those frames to determine their characterization quantities, and the mapping relationship is constructed from the characterization quantities and visibility values of the frames.
  • To make the calibrated mapping relationship as accurate as possible, the visibility values of the frames in the third image set should be distributed as evenly as possible within a specified visibility range, which can be determined from the approximate range over which the visibility values of images to be detected are distributed. For example, if the visibility values of images to be detected are generally within 1000m, the visibility values and characterization quantities must be calibrated accurately within that range, so the specified visibility range is 0-1000m, and the visibility values of the frames should cover the individual visibility gradients within that range as evenly as possible.
  • In other embodiments, when the mapping relationship between characterization quantities and visibility values is established, at least one second image set can be obtained, each second image set including multiple frames of images, where the images in one second image set share the same visibility value range, for example all in 0-50m, or all in 50-100m.
  • For each second image set, feature extraction can be performed on every frame to obtain its characterization quantity, for example by inputting each frame into the trained neural network and using the network to output the characterization quantity; the distribution range of the characterization quantities of the images in the set is thereby obtained, and the mapping relationship between characterization quantities and visibility values is then determined from the visibility value range and characterization quantity distribution range of each second image set.
  • Each second image set may include a large number of sample images: the more images there are, the more fully their visibility values cover the values within the set's visibility range, and the more accurate the established mapping relationship.
  • For example, consider a second image set whose images have visibility values in the range 0-50m. Since there are a large number of images, they essentially cover every visibility value between 0 and 50m. The characterization quantities of these images can then be output by the neural network; the characterization quantity may be positively or negatively correlated with the visibility value, for example, the greater the visibility, the larger the characterization quantity.
  • The mapping relationship may then be determined from the distribution range of the characterization quantities and the visibility value range of these images. For example, if the characterization quantities are distributed over the range A-B and are positively correlated with visibility, then characterization quantity A corresponds to visibility value 0 and characterization quantity B to visibility value 50, from which the mapping relationship can be established.
  • The above is only a simple example; constructing the mapping relationship in practice may be more complicated. A minimal realization of the simple case is sketched below.
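  • One plausible realization of the simple case above is a linear interpolation between the ends of the calibrated ranges (the patent does not fix the functional form of the mapping, so this is an assumption):

```python
def scalar_to_visibility(r, r_min, r_max, v_min, v_max):
    """Map a characterization scalar r, assumed positively correlated with
    visibility and distributed over [r_min, r_max] for images whose visibility
    lies in [v_min, v_max] metres, to a visibility value."""
    r = min(max(r, r_min), r_max)         # clamp to the calibrated range
    t = (r - r_min) / (r_max - r_min)     # normalize to [0, 1]
    return v_min + t * (v_max - v_min)

# e.g. scalars in [A, B] = [0.0, 20.0] calibrated against 0-50m:
# scalar_to_visibility(10.0, 0.0, 20.0, 0.0, 50.0) -> 25.0
```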
  • In some embodiments, the visibility value ranges of the images in two adjacent second image sets are continuous.
  • For example, with six second image sets, the corresponding visibility value ranges may be (0m, 50m], (50m, 100m], (100m, 200m], (200m, 500m], (500m, 1000m], and (1000m, ∞).
  • That is, the visibility value ranges corresponding to two adjacent second image sets are continuous and separated by a target visibility value.
  • For example, for the two second image sets with visibility ranges (0m, 50m] and (50m, 100m], the sets are separated by the target visibility value 50m.
  • When the mapping relationship is determined from the visibility value range and characterization quantity distribution range corresponding to each second image set, the characterization quantity corresponding to each target visibility value can first be determined, and the mapping relationship then determined from the characterization quantities of the target visibility values. Since a target visibility value is the boundary between two adjacent visibility value ranges, the mapping relationship is determined mainly from the boundaries of the visibility value ranges and the boundaries of the characterization quantity distribution ranges; accurately determining the characterization quantity at each boundary is therefore the key to accurately determining the mapping relationship.
  • For example, for the second image sets with visibility ranges (0m, 50m] and (50m, 100m], the target visibility value 50m is the boundary of the two ranges.
  • Based on the visibility value ranges and characterization quantity distribution ranges of the two adjacent second image sets, the characterization quantity corresponding to the target visibility value 50m can be determined. For example, with a positive correlation between characterization quantity and visibility value, the characterization quantity corresponding to 50m is the largest characterization quantity of the preceding second image set and, at the same time, the smallest characterization quantity of the following second image set.
  • The two characterization quantities of the target visibility value 50m determined from the two second image sets may be inconsistent, so the final characterization quantity of the target visibility value can be obtained based on the characterization quantities determined from the two sets respectively.
  • Specifically, when the characterization quantity of the target visibility value is determined from the visibility value ranges and characterization quantity distribution ranges of two adjacent second image sets, an initial characterization quantity corresponding to the target visibility value can first be determined from those ranges. The initial characterization quantity is then adjusted iteratively; after each adjustment, the images in the two adjacent second image sets are classified using the adjusted characterization quantity, until the accuracy of the classification results over the two sets reaches its maximum, and the adjusted characterization quantity is then used as the characterization quantity corresponding to the target visibility value.
  • For example, suppose that for the second image set with visibility range (0m, 50m] the output characterization values lie between 0 and 20, so that based on this set the characterization quantity corresponding to 50m is preliminarily 20, while for the second image set with visibility range (50m, 100m] the output characterization values lie between 18 and 40, so that based on this set the characterization quantity corresponding to 50m is preliminarily 18.
  • One of the two characterization quantities (18 or 20), or their average 19, can then be taken as the initial characterization quantity of the target visibility value 50m, and the initial characterization quantity is adjusted within a certain range (for example, increased or decreased by a fixed step size).
  • Each time the characterization quantity is adjusted, all images in the two image sets are classified with the adjusted characterization quantity as the boundary, and the accuracy of each classification result is determined.
  • For example, if the adjusted characterization quantity is 19, that is, 50m corresponds to characterization value 19, then under classification by characterization value the images of the preceding sample image set whose characterization values fall in 19-20 are misclassified, from which one classification accuracy can be calculated; similarly, the images of the following sample image set whose characterization values fall in 18-19 are also misclassified, from which another classification accuracy can be calculated.
  • Finally, the characterization quantity at which the classification accuracy over the two sample image sets is highest can be taken as the characterization quantity corresponding to the target visibility value 50m. A similar method can be used for the other target visibility values (for example, 100m, 200m, 500m, and 1000m in the six second image sets above). A sketch of this boundary search follows.
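  • A minimal sketch of the boundary search described above (the search range and step size follow the 18-20 example; the exhaustive grid search is our assumption about how the adjustment is carried out):

```python
def calibrate_boundary(scalars_low, scalars_high, lo=18.0, hi=20.0, step=0.1):
    """Find the characterization value that best separates two adjacent second
    image sets, e.g. visibility (0m, 50m] vs (50m, 100m], assuming the scalar
    is positively correlated with visibility."""
    best_t, best_acc = lo, -1.0
    t = lo
    while t <= hi:
        # lower-set images should fall at or below t, upper-set images above t
        correct = sum(s <= t for s in scalars_low) + sum(s > t for s in scalars_high)
        acc = correct / (len(scalars_low) + len(scalars_high))
        if acc > best_acc:
            best_t, best_acc = t, acc
        t += step
    return best_t
```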
  • In some embodiments, the pre-trained neural network may include a feature extraction network, a first network branch, and a second network branch.
  • Feature extraction can be performed on the image to be detected through the feature extraction network, and the extracted features input into either the first network branch or the second network branch to obtain the characterization quantity of the image; that is, when the characterization quantity of the image to be detected is determined, either one of the two network branches can be used to output it.
  • In other embodiments, the final characterization quantity may instead be obtained by combining the characterization quantities of the image to be detected output by the two network branches.
  • For example, the feature extraction network of the neural network can perform feature extraction on the image to be detected; the extracted features are input into the first network branch to obtain a first characterization quantity and into the second network branch to obtain a second characterization quantity, and the characterization quantity of the image to be detected is obtained from the first and second characterization quantities.
  • For example, the characterization quantity of the image to be detected can be obtained as a weighted combination of the first and second characterization quantities, where the weights can be equal or set differently according to actual requirements.
  • In some embodiments, the image to be detected can be an image of a road area collected by an image acquisition device installed on the road. After the visibility value of the scene contained in the image is determined, the hazard level of the current specific weather (for example, fog, sand-dust, or haze) can also be determined based on the visibility value.
  • A control strategy corresponding to the hazard level is then selected to control traffic on the road. For example, multiple levels for evaluating the degree of fog hazard can be preset, with different levels corresponding to different visibility value ranges, and each hazard level preset with a corresponding control strategy for controlling traffic on the road.
  • The control strategy may include issuing a fog warning to vehicles on the road, prompting vehicles on the road to keep their speed below a certain limit, or closing road entrances and prohibiting vehicles from passing. A sketch of such a preset table follows.
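  • One way such preset levels and strategies could be encoded (the concrete thresholds, level numbers, and strategy texts below are purely illustrative; the patent does not specify them):

```python
# (maximum visibility in metres, hazard level, control strategy) in ascending order
HAZARD_LEVELS = [
    (50,  3, "close road entrances and prohibit passage"),
    (100, 2, "prompt vehicles to keep speed below a set limit"),
    (200, 1, "issue a fog warning to vehicles on the road"),
]

def control_strategy(visibility_m):
    for max_vis, level, strategy in HAZARD_LEVELS:
        if visibility_m <= max_vis:
            return level, strategy
    return 0, "no action required"
```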
  • In some embodiments, the image to be detected may be an image of a road area, and may consist of multiple frames collected at different times by an image acquisition device set on the road.
  • For example, a frame of the road area can be obtained every 30 minutes and its visibility value determined, so as to obtain the trend of the road's visibility value over a period of time; it can thereby be determined whether the fog concentration on the road is gradually increasing or decreasing, and the trend of the fog weather predicted.
  • In some embodiments, the image to be detected may be an image of a road area collected at preset time intervals by an image acquisition device installed on the road; for example, one or more frames are collected every hour, and the visibility value corresponding to the frame or frames is then determined through the neural network to judge whether a specific weather phenomenon (for example, fog clusters, sand-dust, or haze) is currently occurring.
  • For example, it may be judged whether the visibility value corresponding to the frame or frames crosses a preset threshold: if the visibility value of every frame, or of a certain number of frames, is below the preset threshold, it is determined that the specific weather phenomenon is currently occurring in the road area.
  • Alternatively, the average of the visibility values of the frame or frames may be determined, and if the average is below the preset threshold, it is determined that the specific weather phenomenon is currently occurring in the road area.
  • For example, one or more frames of the road area can be collected every 2 hours and the corresponding visibility values predicted by the neural network; if the visibility values of these frames are below the preset threshold, it is determined that a fog cluster has occurred in the road area, and the daily frequency of fog clusters can be determined by counting their total number over a day, as sketched below.
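  • A sketch of this periodic check (the 2-hour cadence is from the text; the 50m threshold and the rule of merging contiguous low-visibility samples into one cluster are assumptions for illustration):

```python
def count_fog_clusters(visibility_by_time, threshold_m=50.0):
    """visibility_by_time: visibility values sampled, e.g., every 2 hours over
    a day. Counts each contiguous run below the threshold as one fog cluster."""
    clusters, in_cluster = 0, False
    for v in visibility_by_time:
        if v < threshold_m and not in_cluster:
            clusters += 1
            in_cluster = True
        elif v >= threshold_m:
            in_cluster = False
    return clusters
```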
  • In the following embodiment, images of a road area are collected by cameras set on the road, and the visibility of the road area contained in the images is predicted based on a neural network.
  • In the related art, the neural network is trained only with sample images carrying visibility value range labels, so its output is only a classification of the image into a visibility value range, and an exact specific value of visibility cannot be obtained.
  • This embodiment provides a method that can detect the specific value of visibility in an image. Figure 3 is a schematic diagram of an application scenario of the method: the method detects the specific value of visibility in a road area, based on which road traffic is then controlled.
  • The method includes a neural network training stage, a stage that calibrates the mapping relationship between the one-dimensional scalar output by the neural network and the specific value of visibility, and a neural network inference stage.
  • In the training stage, a large number of sample images can be collected. There is no need to calibrate the specific visibility value of each sample image; only the category of the visibility value range to which it belongs must be determined. For example, the sample images can be divided into 6 categories, with corresponding visibility value ranges (0m, 50m], (50m, 100m], (100m, 200m], (200m, 500m], (500m, 1000m], and (1000m, ∞). Since only the visibility value range of each image needs to be calibrated, labeling is easy to carry out, and a large number of sample images can thus be acquired.
  • Sample images from any two of the six categories can then be combined pairwise to obtain a large number of sample image pairs, where the relative magnitude of the visibility values of each sample image pair is known.
  • Each sample image pair can be used as a training sample, with the relative magnitude of its visibility values as the label, to train the initial neural network and obtain a trained neural network.
  • The structure of the neural network is shown in Fig. 4; it includes a feature extraction network, a first network branch, a second network branch, and an output layer.
  • The feature extraction network can, for example, be a resnet18 network, which includes 5 convolutional stages (conv1-conv5 in Fig. 4); after conv5, two network branches are attached, namely the first network branch and the second network branch, each containing 3 fully connected layers.
  • In each branch, the output dimension of fully connected layer fc1 is 256 with a PReLU activation; the output dimension of fc2 is 128 with a PReLU activation; and the output dimension of fc3 is 1 with a linear activation. A sketch of this architecture follows.
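  • A PyTorch rendering of the architecture described above (the global pooling step and the 512-dimensional backbone output are our assumptions about how conv5 feeds the branches; the fc dimensions and activations follow the text):

```python
import torch
import torch.nn as nn
import torchvision

def make_branch():
    # fc1 (256, PReLU) -> fc2 (128, PReLU) -> fc3 (1, linear), as in the text
    return nn.Sequential(
        nn.Linear(512, 256), nn.PReLU(),
        nn.Linear(256, 128), nn.PReLU(),
        nn.Linear(128, 1),
    )

class VisibilityNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # keep conv1..conv5 and the global average pooling; drop the classifier
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.branch_a = make_branch()   # first network branch
        self.branch_b = make_branch()   # second network branch

    def forward(self, img_a, img_b):
        fa = torch.flatten(self.features(img_a), 1)   # (N, 512)
        fb = torch.flatten(self.features(img_b), 1)
        return self.branch_a(fa), self.branch_b(fb)   # scalars RA, RB
```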
  • When a sample image pair (sample image A and sample image B) is input into the neural network, features of sample image A are extracted by the feature extraction network, the extracted features are further processed by the first network branch, and a one-dimensional scalar RA is finally output, which reflects the magnitude of the visibility value of the scene contained in sample image A.
  • Likewise, features of sample image B are extracted by the feature extraction network, the extracted features are further processed by the second network branch, and a one-dimensional scalar RB is finally output, which reflects the magnitude of the visibility value of the scene contained in sample image B.
  • The one-dimensional scalar output by the neural network may be positively correlated with the specific value of visibility: the larger the scalar, the greater the specific value of visibility.
  • The one-dimensional scalars RA and RB can then be input into the output layer, which determines from them the probability p that sample image A is the image with the higher specific value of visibility, sample image B being that image with probability 1-p.
  • The loss function can be the cross-entropy loss.
  • Since the visibility value range of each sample image is labeled, the true probability y that sample image A is the image with the larger specific value of visibility is known (y is 1, or a preset value close to 1, such as 0.8), and the true probability for sample image B is 1-y.
  • The parameters of the neural network can be adjusted iteratively based on the cross-entropy loss until the loss function converges, yielding a trained neural network. A single training step is sketched below.
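  • One standard realization of this training step, using the pairwise probability p = σ(RA − RB) written out earlier (the exact form of the output layer is not fixed by the text, so the sigmoid is an assumption; `VisibilityNet` refers to the architecture sketch above):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, img_a, img_b, y=0.8):
    """img_a is the sample image with the larger visibility value; y is its
    true probability of being the 'larger' image (1, or a value close to 1)."""
    ra, rb = model(img_a, img_b)
    logit = ra - rb                     # p = sigmoid(RA - RB)
    target = torch.full_like(logit, y)
    loss = F.binary_cross_entropy_with_logits(logit, target)  # cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```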
  • In the calibration stage, each frame in the six categories above can be input into the neural network, which outputs for each frame a one-dimensional scalar representing the specific value of visibility. Because the number of sample images is large, the output one-dimensional scalars can be considered to essentially cover the specific values of visibility; for example, with 10,000 sample images in the category (0, 50], there is a high probability that every visibility value between 0m and 50m is covered. The mapping relationship is then constructed from the labeled visibility value range and the one-dimensional scalar range in each category, in particular from the one-dimensional scalars corresponding to the boundary values of the visibility value ranges, for example 0m, 50m, 100m, 200m, 500m, and 1000m.
  • Since the one-dimensional scalar is positively correlated with the specific value of visibility, the minimum scalar among the images whose visibility range is (0, 50] corresponds to the boundary value 0m, the maximum scalar in the (0, 50] image set corresponds to 50m, and the minimum scalar in the (50, 100] image set also corresponds to 50m.
  • However, the one-dimensional scalars corresponding to 50m determined from the two classified image sets may not be equal.
  • For example, the one-dimensional scalar corresponding to 50m in the image set with visibility range (0, 50] may be 10, while the one-dimensional scalar corresponding to 50m in the image set with visibility range (50, 100] is 15.
  • In that case, the one-dimensional scalar corresponding to the boundary value 50m can be adjusted iteratively (for example, over one-dimensional scalar values between 10 and 15), and the accuracy of classifying the images of the two categories based on the adjusted scalar determined each time. For example, when the one-dimensional scalar corresponding to the boundary value 50m is 12, classifying all images whose visibility values lie in (0, 50] based on this scalar may give an accuracy of only 80%; similarly, classifying all images whose visibility values lie in (50, 100] based on this scalar may give an accuracy of only 90%. The one-dimensional scalar is then adjusted until the classification accuracy over the two categories of images reaches its maximum, and that scalar is determined as the one-dimensional scalar corresponding to the visibility value 50m.
  • After the one-dimensional scalars corresponding to the boundary values of the visibility value ranges are determined, the mapping relationship can be constructed from them.
  • Of course, in some embodiments it is also possible to first determine the specific values of visibility of multiple frames of images with a visibility meter, then determine the one-dimensional scalars of those frames with the pre-trained neural network, and construct the mapping relationship from the one-dimensional scalars and the specific visibility values of the frames.
  • To make the mapping relationship as accurate as possible, the specific visibility values of the multiple frames should cover the individual visibility gradients as fully as possible.
  • In the inference stage, when the specific value of visibility of an image to be detected is to be determined, the image is acquired and input into the neural network, and its one-dimensional scalar is output by the network (it can be determined using one of the network branches, or taken as the average of the outputs of the two branches). The specific value of visibility of the image is then determined from its one-dimensional scalar and the pre-calibrated mapping relationship, after which traffic on the road can be controlled based on that value. An end-to-end sketch follows.
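  • An end-to-end inference sketch tying the pieces above together (`VisibilityNet` is the architecture sketch above; the piecewise-linear calibration table and its values are assumptions for illustration):

```python
import torch

CALIBRATION = [(0.0, 0.0), (12.0, 50.0), (20.0, 100.0)]  # (scalar, visibility in m)

@torch.no_grad()
def predict_visibility(model, image):
    """image: a single preprocessed frame with batch dimension, shape (1, 3, H, W)."""
    ra, rb = model(image, image)      # run both branches on the same image
    r = 0.5 * (ra + rb).item()        # average the two branch outputs
    r = max(r, CALIBRATION[0][0])     # clamp below the calibrated range
    # piecewise-linear lookup over the calibrated (scalar, visibility) pairs
    for (r0, v0), (r1, v1) in zip(CALIBRATION, CALIBRATION[1:]):
        if r <= r1:
            return v0 + (r - r0) / (r1 - r0) * (v1 - v0)
    return CALIBRATION[-1][1]
```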
  • In this way, the neural network can be trained without calibrating the specific value of visibility of the sample images, and the specific value of visibility of the image to be detected can be determined based on the trained network; compared with determining only a range of visibility values, more accurate visibility results are obtained.
  • Correspondingly, an embodiment of the present disclosure also provides a visibility value detection device.
  • As shown in Fig. 5, the device 50 includes: an acquisition module 51, configured to acquire an image to be detected; a characterization module, configured to perform feature extraction on the image to be detected through the pre-trained neural network to obtain the characterization quantity of the image to be detected; and a visibility value determination module 53, configured to determine the visibility value corresponding to the image to be detected based on the pre-calibrated mapping relationship between characterization quantities and visibility values and the characterization quantity of the image to be detected, where the characterization quantity reflects the magnitude of the visibility value of the scene contained in the image.
  • Correspondingly, an embodiment of the present disclosure also provides an electronic device.
  • The electronic device includes a processor 61, a memory 62, and computer instructions stored in the memory 62 and executable by the processor 61; when the processor executes the computer instructions, the method described in any of the foregoing embodiments can be implemented.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the method described in any one of the foregoing embodiments is implemented.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology.
  • Information may be computer readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer-readable media exclude transitory computer-readable media, such as modulated data signals and carrier waves.
  • A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, desktop, tablet, wearable device, or a combination of any of these devices.
  • The embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
  • In particular, since the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant parts, refer to the corresponding description of the method embodiment.
  • The device embodiments described above are only illustrative: the modules described as separate components may or may not be physically separate, and the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those skilled in the art can understand and implement without creative effort.

Abstract

Provided are a visibility value measurement method and apparatus, a device, and a storage medium. The method comprises: acquiring an image to be measured; performing feature extraction on said image to obtain a characterization quantity of said image; and determining a visibility value corresponding to said image on the basis of a pre-calibrated mapping relationship between the characterization quantity and the visibility value, and the characterization quantity of said image, wherein the characterization quantity can reflect the magnitude of the visibility value of a scene contained in the image.

Description

Visibility value detection method, apparatus, device, and storage medium
Cross-Reference to Related Application
This application claims priority to Chinese patent application No. 202111656430.5, filed on December 30, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular to a visibility value detection method, apparatus, device, and storage medium.
Background
Weather such as fog, smog, and sand-dust lowers visibility in the environment and affects traffic. For example, fog clusters, a common phenomenon, are patches of fog with even lower visibility that form within a larger fog bank under the influence of local micro-climates; when a fog cluster occurs, visibility drops suddenly and sharply, making it difficult to predict or forecast in advance. Fog clusters are a serious hazard to road traffic safety, especially on expressways, where they can easily cause major traffic accidents. The influence of fog, sand-dust, and similar weather can therefore be judged by identifying the visibility value of an area to be detected, so as to guide traffic. Accordingly, a scheme that can conveniently and accurately determine the visibility value of the area to be detected is needed.
发明内容Contents of the invention
根据本公开实施例的第一方面,提供一种能见度值检测方法,所述方法包括:获取待检测图像;对所述待检测图像进行特征提取,得到所述待检测图像的表征量;基于预先标定的表征量与能见度值的映射关系、以及所述待检测图像的表征量,确定所述待检测图像的能见度值;其中,所述表征量能够反映图像中包含的场景的能见度值的大小。According to the first aspect of an embodiment of the present disclosure, there is provided a visibility value detection method, the method comprising: acquiring an image to be detected; performing feature extraction on the image to be detected to obtain a characterization quantity of the image to be detected; The mapping relationship between the calibrated characteristic quantity and the visibility value, and the characteristic quantity of the image to be detected determine the visibility value of the image to be detected; wherein, the characteristic quantity can reflect the magnitude of the visibility value of the scene contained in the image.
根据本公开实施例的第二方面,提供一种能见度值检测装置,所述装置包括:获取模块,用于获取待检测图像;表征模块,用于通过预先训练的神经网络对所述待检测图像进行特征提取,得到所述待检测图像的表征量;能见度值确定模块,用于基于预先标定的表征量与能见度值的映射关系、以及所述待检测图像的表征量确定所述待检测图像对应的能见度值,其中,所述表征量能够反映图像中包含的场景的能见度值的大小。According to the second aspect of the embodiments of the present disclosure, there is provided a visibility value detection device, the device comprising: an acquisition module, used to acquire an image to be detected; a characterization module, used to analyze the image to be detected through a pre-trained neural network performing feature extraction to obtain the characterization quantity of the image to be detected; a visibility value determination module, configured to determine the corresponding The visibility value of , wherein the characterization quantity can reflect the magnitude of the visibility value of the scene contained in the image.
根据本公开实施例的第三方面,提供一种电子设备,所述电子设备包括处理器、存储器、存储在所述存储器可供所述处理器执行的计算机指令,所述处理器执行所述计算机指令时,可实现上述第一方面提及的方法。According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, the electronic device includes a processor, a memory, and computer instructions stored in the memory that can be executed by the processor, and the processor executes the computer Instructions, the method mentioned in the first aspect above can be implemented.
根据本公开实施例的第四方面,提供一种计算机可读存储介质,所述存储介质上存储有计算机指令,所述计算机指令被执行时实现上述第一方面提及的方法。According to a fourth aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium, on which computer instructions are stored, and when the computer instructions are executed, the method mentioned in the above-mentioned first aspect is implemented.
根据本公开实施例的第五方面,提供一种计算机程序产品,该产品包括计算机程序,该计算机程序被处理器执行时实现上述第一方面提及的方法。According to a fifth aspect of the embodiments of the present disclosure, a computer program product is provided, the product includes a computer program, and when the computer program is executed by a processor, the method mentioned in the above first aspect is implemented.
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
附图说明Description of drawings
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。The accompanying drawings here are incorporated into the description and constitute a part of the present description. These drawings show embodiments consistent with the present disclosure, and are used together with the description to explain the technical solution of the present disclosure.
图1是根据本公开实施例的一种能见度值检测方法的流程图。Fig. 1 is a flow chart of a method for detecting a visibility value according to an embodiment of the disclosure.
图2是根据本公开实施例的一种确定能见度值的检测方法的示意图。Fig. 2 is a schematic diagram of a detection method for determining a visibility value according to an embodiment of the present disclosure.
图3是根据本公开实施例的一种应用场景示意图。Fig. 3 is a schematic diagram of an application scenario according to an embodiment of the present disclosure.
图4是根据本公开实施例的一种神经网络结构示意图。Fig. 4 is a schematic diagram of a neural network structure according to an embodiment of the disclosure.
图5是根据本公开实施例的一种道路能见度值检测装置的逻辑结构示意图。Fig. 5 is a schematic diagram of a logic structure of a road visibility detection device according to an embodiment of the disclosure.
图6是根据本公开实施例的一种电子设备的逻辑结构示意图。Fig. 6 is a schematic diagram of a logic structure of an electronic device according to an embodiment of the present disclosure.
具体实施方式Detailed ways
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with aspects of the present disclosure as recited in the appended claims.
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合。The terminology used in the present disclosure is for the purpose of describing particular embodiments only, and is not intended to limit the present disclosure. As used in this disclosure and the appended claims, the singular forms "a", "the", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这 些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。It should be understood that although the terms first, second, third, etc. may be used in the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "at" or "when" or "in response to a determination."
为了使本技术领域的人员更好的理解本公开实施例中的技术方案,并使本公开实施例的上述目的、特征和优点能够更加明显易懂,下面结合附图对本公开实施例中的技术方案作进一步详细的说明。In order to enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above-mentioned purposes, features and advantages of the embodiments of the present disclosure more obvious and understandable, the technical solutions in the embodiments of the present disclosure are described below in conjunction with the accompanying drawings The program is described in further detail.
雾天、雾霾、沙尘等气候会导致环境中的能见度较低,易引发交通事故。通常,可以通过识别待检测区域的能见度值来判定待检测区域中雾霾或沙尘含量的大小,以指导交通出行。目前,在确定待检测区域的能见度大小时,有些方式是直接通过能见度仪检测待检测区域的能见度值,这种方式需要在待检测区域内部署专门的检测设备,成本较高,也不易实现。也有些方式可以基于待检测区域的图像来确定待检测区域的能见度。比如,可以将待检测区域的图像输入至预先训练的神经网络中,以预测待检测区域的能见度。但是,由于标定样本图像准确的能见度值存在较大难度,难以获得大量携带能见度值标签的图像,因而,在标定用于训练神经网络的样本图像时,通常只是基于人为经验估计样本图像中包括的区域的能见度值范围,比如,小于50m、小于100m、大于1000m等,从而神经网络在对待检测区域的图像进行预测时,也仅仅是预测一个粗略的能见度值范围,无法获取精确的能见度值,并且由于通过人为经验估计的样本图像的能见度值范围会受主观因素影响,导致预测结果不太准确。Fog, smog, dust and other climates will lead to low visibility in the environment and easily lead to traffic accidents. Generally, the amount of haze or dust content in the area to be detected can be determined by identifying the visibility value of the area to be detected, so as to guide traffic travel. At present, when determining the visibility of the area to be inspected, some methods are to directly detect the visibility value of the area to be inspected through the visibility meter. This method requires the deployment of special inspection equipment in the area to be inspected, which is costly and difficult to implement. There are also some ways to determine the visibility of the area to be inspected based on the image of the area to be inspected. For example, the image of the area to be detected can be input into a pre-trained neural network to predict the visibility of the area to be detected. However, due to the difficulty in calibrating the exact visibility value of the sample image, it is difficult to obtain a large number of images carrying visibility value labels. Therefore, when calibrating the sample image used to train the neural network, it is usually only estimated based on human experience. The visibility value range of the area, for example, less than 50m, less than 100m, greater than 1000m, etc., so that when the neural network predicts the image of the area to be detected, it only predicts a rough visibility value range, and cannot obtain accurate visibility values, and Because the visibility value range of the sample image estimated by human experience will be affected by subjective factors, the prediction result is not very accurate.
基于此,本公开实施例提供一种能见度值检测方法,可以预先标定能够反映图像包含的场景能见度值大小的表征量与能见度值的映射关系,当要确定待检测图像包含的场景的能见度值时,可以对待检测图像进行特征提取,得到待检测图像的表征量,然后基于预先标定的表征量和能见度值的映射关系,以及待检测图像的表征量确定待检测图像包含的场景的能见度值。Based on this, an embodiment of the present disclosure provides a visibility value detection method, which can pre-calibrate the mapping relationship between the characterization quantity that can reflect the visibility value of the scene contained in the image and the visibility value. When determining the visibility value of the scene contained in the image to be detected , the feature extraction of the image to be detected can be performed to obtain the representation of the image to be detected, and then based on the pre-calibrated mapping relationship between the representation and the visibility value, and the representation of the image to be detected, the visibility value of the scene contained in the image to be detected can be determined.
通过这种方式,无需使用专门的能见度值测试仪,也可以确定待检测图像的能见度值。相比于相关技术中只能确定粗略的能见度值范围,本公开实施例也可以得到更加细粒度的能见度值,便于后续的应用。In this way, the visibility value of the image to be detected can also be determined without using a special visibility value tester. Compared with the related art that can only determine a rough visibility value range, the embodiment of the present disclosure can also obtain a finer-grained visibility value, which is convenient for subsequent applications.
The method provided by the embodiments of the present disclosure can be executed by various electronic devices, for example, a cloud server or a user terminal, and can be used to predict visibility values in weather such as fog patches, haze and dust.
As shown in FIG. 1, the method for determining a visibility value in an embodiment of the present disclosure may include steps S102 to S106.
S102: Acquire an image to be detected.
In step S102, an image to be detected may be acquired. The image to be detected may be an image, collected by an image acquisition device, of an area whose visibility value needs to be detected, and may contain scenes such as fog patches, dust or haze.
S104: Perform feature extraction on the image to be detected to obtain a characterization quantity of the image to be detected.
In step S104, feature extraction may be performed on the image to be detected to obtain a characterization quantity representing the magnitude of the visibility value of the scene contained in the image. The feature extraction may be performed with a pre-trained neural network, or in any other manner, as long as a characterization quantity reflecting the visibility value of the scene contained in the image to be detected can be determined. For example, a neural network may be trained on sample images so that the trained network accurately extracts features related to the visibility value of an image and outputs a characterization quantity representing the magnitude of that value. The characterization quantity may take various forms, for example, a vector, a matrix or a specific numerical value. In some embodiments, to facilitate the mapping between characterization quantities and visibility values, the characterization quantity may be a one-dimensional scalar.
S106: Determine the visibility value of the image to be detected based on a pre-calibrated mapping relationship between characterization quantities and visibility values and on the characterization quantity of the image to be detected, where the characterization quantity reflects the magnitude of the visibility value of the scene contained in the image.
In step S106, after the characterization quantity of the image to be detected is obtained, it may be mapped to a visibility value based on the pre-calibrated mapping relationship between characterization quantities and visibility values, yielding the visibility value of the scene contained in the image to be detected. The characterization quantity mentioned in the embodiments of the present disclosure reflects the magnitude of the visibility value of the scene contained in the image. When calibrating the mapping relationship, feature extraction may be performed on images whose visibility values are known to obtain their characterization quantities, and the mapping relationship may be constructed from the correspondence between these characterization quantities and visibility values. Alternatively, since labeling an image with an accurate visibility value is difficult, feature extraction may be performed on images whose visibility value ranges are known, the distribution of their characterization quantities may be determined, and the mapping relationship may be constructed from the distribution ranges of the characterization quantities and the visibility ranges of these images. There are many ways to calibrate the mapping relationship, as long as it accurately reflects the correspondence between characterization quantities and visibility values; the embodiments of the present disclosure impose no limitation on this.
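Purely as an illustration of how such a calibrated mapping could be applied (this sketch is not part of the disclosed embodiments, and the breakpoint values are hypothetical), piecewise-linear interpolation between calibrated (characterization quantity, visibility value) pairs is one simple choice:

```python
import numpy as np

# Hypothetical calibrated breakpoints: characterization scalar -> visibility (m).
# Real values would come from the calibration procedures described below.
CALIB_SCALARS = np.array([0.0, 12.0, 23.0, 31.0, 44.0, 58.0])
CALIB_VISIBILITY = np.array([0.0, 50.0, 100.0, 200.0, 500.0, 1000.0])

def scalar_to_visibility(s: float) -> float:
    """Map a characterization scalar to a visibility value by piecewise-linear
    interpolation between the calibrated breakpoints."""
    return float(np.interp(s, CALIB_SCALARS, CALIB_VISIBILITY))

print(scalar_to_visibility(17.5))  # 75.0, between the 50 m and 100 m breakpoints
```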
Through the embodiments of the present disclosure, an accurate visibility value of the scene contained in an image can be predicted, characterizing the visibility of the image at a finer granularity and facilitating subsequent applications.
In some embodiments, feature extraction may be performed on the image to be detected through a pre-trained neural network to obtain the characterization quantity of the image to be detected. As shown in FIG. 2, the mapping relationship between characterization quantities and visibility values may be constructed based on the pre-trained neural network; the neural network is then used to determine the characterization quantity of the image to be detected, and the visibility value of the image is determined from that characterization quantity and the mapping relationship. Because it is difficult to label sample images with accurate visibility values, it is hard to obtain a large number of samples carrying accurate visibility-value labels for training. So that the trained neural network can accurately extract features from an image and output a characterization quantity reflecting the visibility value of the scene it contains, in some embodiments the training may proceed as follows: a sample image pair carrying label information is acquired, the label information indicating the relative magnitude of the visibility values of the two sample images in the pair; the two sample images are input into a preset initial neural network, which outputs a prediction of the relative magnitude of their visibility values; and the network parameters of the initial neural network are adjusted continuously based on the difference between this prediction and the ground truth indicated by the label information, until the trained neural network is obtained. For example, sample image A and sample image B may be acquired, where the visibility value of A is greater than that of B. The initial neural network outputs the probability that A is the image with the larger visibility value; this probability is compared with the true probability that A is the image with the larger visibility value to determine a deviation, and the parameters of the initial network are adjusted based on this deviation until convergence, yielding the trained neural network. In this way, a neural network that accurately extracts a characterization quantity reflecting the visibility value of an image can be trained without obtaining sample images carrying visibility-value labels.
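The training described above resembles pairwise learning to rank (as in RankNet). The following is a minimal, non-authoritative sketch of one training step, assuming a model that maps an image to a scalar score and taking the predicted probability as a sigmoid of the score difference; the disclosure does not specify this exact form, so it is an assumption here:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, img_a, img_b, target):
    """One pairwise training step.
    img_a, img_b: batched image tensors.
    target: float in [0, 1], the ground-truth probability that img_a has the
    larger visibility value (e.g. 1.0, or a softened label such as 0.8)."""
    score_a = model(img_a)                 # scalar characterization of image A
    score_b = model(img_b)                 # scalar characterization of image B
    p = torch.sigmoid(score_a - score_b)   # predicted P(A is more visible than B)
    y = torch.full_like(p, target)
    loss = F.binary_cross_entropy(p, y)    # cross-entropy against the label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```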
In some embodiments, the preset initial neural network may include a feature extraction network, a characterization quantity determination network and an output layer. When the preset initial neural network outputs the prediction of the relative magnitude of the visibility values of the two images in a sample image pair, the feature extraction network may first perform feature extraction on the two images; the characterization quantity determination network then determines the respective characterization quantities of the two images based on the features extracted by the feature extraction network; and the output layer finally determines the relative magnitude of the visibility values of the two images based on their respective characterization quantities.
The characterization quantity output by the neural network needs to reflect the visibility value of the scene contained in the image as faithfully as possible, since only then will the visibility value obtained by mapping the characterization quantity be accurate. To make the characterization quantity output by the neural network as accurate as possible, the applicant optimized the structure of the neural network to improve its performance. In some embodiments, the characterization quantity determination network may include two network branches, namely a first network branch and a second network branch. When the preset initial neural network outputs the prediction of the relative magnitude of the visibility values of the two sample images in a pair, the feature extraction network may first perform feature extraction on the first image of the pair, and the extracted features are input into the first network branch, which obtains the characterization quantity of the first image from them; the feature extraction network then performs feature extraction on the second image of the pair, and the extracted features are input into the second network branch, which obtains the characterization quantity of the second image from them; the prediction is then determined from the characterization quantities of the first and second images. After the feature extraction network has extracted features from the two images of the pair, using two network branches to process the two sets of extracted features separately, thereby obtaining the characterization quantities of the first image and the second image respectively, yields more accurate results than using a single network branch for both.
Of course, in some embodiments, the characterization quantity determination network may also include only one network branch; the features extracted by the feature extraction network from both images of the sample image pair are then input into that single branch, which determines the characterization quantity of each of the two images.
In some embodiments, one of the network branches may be selected at random for the sample image input first, and the other branch for the sample image input second. In some embodiments, the two network branches may also correspond to the magnitude of the visibility values of the sample images; for example, the first network branch may be dedicated to feature extraction for sample images with larger visibility values, and the second network branch to feature extraction for sample images with smaller visibility values.
Determining the exact visibility value corresponding to an image is difficult, but determining its approximate visibility range is relatively simple and can be done visually from human experience. Therefore, a large number of images can be collected, the visibility range of each image can be determined, and each image can be given a visibility-range label, for example, 0-50 m, 50-100 m, 100-150 m or greater than 1000 m. Images with the same visibility range form a first image set, so that multiple first image sets carrying visibility-range labels are obtained. When acquiring sample image pairs, at least two first image sets may be acquired, where the images within each first image set share the same visibility range and the visibility ranges of any two first image sets do not overlap. One image is then taken from each of any two first image sets to form a sample image pair. Because the visibility ranges of any two of these image sets differ, randomly combining one image from each of two first image sets yields a large number of sample image pairs with different visibility values for training the neural network.
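A minimal sketch of this pair construction follows; the range-label format and the parsing helper are illustrative assumptions, not taken from the disclosure:

```python
import random

def range_lower_bound(label: str) -> float:
    # Hypothetical helper: parse a range label such as "(0,50]" into 0.0.
    return float(label.strip("(]").split(",")[0])

def build_pairs(image_sets: dict, num_pairs: int):
    """image_sets maps a visibility-range label, e.g. "(0,50]", to a list of
    images in that range. Returns (img_a, img_b, target) tuples, where
    target = 1.0 means img_a has the larger visibility value."""
    pairs = []
    labels = list(image_sets)
    for _ in range(num_pairs):
        lo, hi = sorted(random.sample(labels, 2), key=range_lower_bound)
        img_lo = random.choice(image_sets[lo])
        img_hi = random.choice(image_sets[hi])
        if random.random() < 0.5:          # randomize the input order
            pairs.append((img_hi, img_lo, 1.0))
        else:
            pairs.append((img_lo, img_hi, 0.0))
    return pairs
```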
In some embodiments, the characterization quantity representing the visibility value of the scene contained in an image may be a one-dimensional scalar, for example, a specific numerical value, which makes it convenient to establish the mapping relationship between characterization quantities and visibility values.
In some embodiments, when constructing the mapping relationship between characterization quantities and visibility values, the specific visibility values of some images may first be determined with a visibility meter or by other means, and the mapping relationship may then be constructed from the characterization quantities and visibility values of these images. For example, a third image set may be acquired, containing multiple frames of images whose visibility values are known; feature extraction is performed on these images to determine their characterization quantities, and the mapping relationship is then constructed from the characterization quantities and visibility values of these images.
In some embodiments, when the multiple images of the third image set are used to construct the mapping relationship, the visibility values of these images may be distributed uniformly over a specified visibility range so that the resulting mapping is as accurate as possible. The specified visibility range may be determined from the approximate distribution of the visibility values of the images to be detected. For example, if the visibility values of the images to be detected are mostly within 1000 m, the mapping between characterization quantities and visibility values needs to be accurately calibrated within that range, so the specified visibility range is 0-1000 m; to make the calibrated mapping as accurate as possible, the visibility values of the images in the third image set should be distributed as uniformly as possible within this range, covering each visibility gradient within it.
Of course, determining the visibility value of an image requires dedicated instruments and is rather cumbersome. So that the mapping relationship can be constructed without dedicated instruments, in some embodiments, at least one second image set may be acquired when establishing the mapping relationship between characterization quantities and visibility values. Each second image set contains multiple frames of images, and the images within one second image set share the same visibility range, for example, all 0-50 m, or all 50-100 m. For each second image set, feature extraction is performed on each image to obtain its characterization quantity; for example, each image may be input into the trained neural network, which outputs the characterization quantity of the image, so that the distribution range of the characterization quantities of the images in that second image set is obtained. The mapping relationship between characterization quantities and visibility values is then determined from the visibility range and the characterization quantity distribution range of each second image set. Each second image set may contain a large number of sample images; the more images there are, the more completely their visibility values cover the values within the range, and the more accurate the established mapping relationship is.
For example, suppose a second image set contains a large number of images whose visibility values lie in 0-50 m. Because there are many images, they essentially cover every visibility value between 0 and 50 m. The characterization quantities of these images can then be output by the neural network, where the characterization quantity may be positively or negatively correlated with the visibility value; for example, the larger the visibility value, the larger the characterization quantity. The mapping relationship can then be determined from the distribution range of the characterization quantities of these images and their visibility range. For example, if the characterization quantities are distributed over A to B and the characterization quantity is positively correlated with the visibility value, then characterization quantity A corresponds to a visibility value of 0 and characterization quantity B to a visibility value of 50, from which the mapping relationship can be established. Of course, this is only a simple example; constructing the mapping relationship in practice may be more complicated.
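A minimal sketch of this idea, assuming a positive correlation between the scalar and the visibility value, is given below. It simply anchors each range's bounds to the extremes of that set's scalars and ignores any disagreement at shared boundaries, which the boundary calibration described next resolves; the numbers are hypothetical:

```python
def breakpoints_from_bins(bin_scores, bin_ranges):
    """bin_scores: one list of characterization scalars per second image set,
    ordered by increasing visibility range.
    bin_ranges: matching (low, high) visibility bounds in metres.
    Returns sorted (scalar, visibility) breakpoints."""
    breakpoints = []
    for scores, (low, high) in zip(bin_scores, bin_ranges):
        breakpoints.append((min(scores), low))   # smallest scalar -> lower bound
        breakpoints.append((max(scores), high))  # largest scalar -> upper bound
    return sorted(breakpoints)

# Two hypothetical bins, (0,50] and (50,100]:
print(breakpoints_from_bins([[0.5, 7.9, 19.6], [18.2, 25.0, 39.7]],
                            [(0.0, 50.0), (50.0, 100.0)]))
```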
In some embodiments, there may be multiple second image sets, and after these second image sets are sorted by visibility range, the visibility ranges of the images in any two adjacent second image sets are contiguous. For example, taking six second image sets, the visibility ranges corresponding to them are (0 m, 50 m], (50 m, 100 m], (100 m, 200 m], (200 m, 500 m], (500 m, 1000 m] and (1000 m, ∞) respectively; the visibility ranges of two adjacent second image sets are contiguous and separated by a target visibility value. For example, the two second image sets with ranges (0 m, 50 m] and (50 m, 100 m] are separated by the target visibility value 50 m. So that the mapping relationship can be constructed accurately, when determining the mapping relationship from the visibility range and the characterization quantity distribution range of each second image set, the characterization quantity corresponding to the target visibility value of two adjacent second image sets may first be determined from the visibility ranges and characterization quantity distribution ranges of the images in those two sets, and the mapping relationship is then determined from the characterization quantities of the target visibility values. Since a target visibility value is the boundary between two adjacent visibility ranges, and the mapping relationship is determined mainly from the boundaries of the visibility ranges and the boundaries of the characterization quantity distribution ranges, accurately determining the characterization quantity at each boundary is the key to accurately determining the mapping relationship.
For example, take the two second image sets with visibility ranges (0 m, 50 m] and (50 m, 100 m]; the target visibility value 50 m is the boundary between the two ranges, and the characterization quantity corresponding to it can be determined from either set. Taking a positive correlation between characterization quantity and visibility value as an example, the characterization quantity corresponding to the target visibility value 50 m is the largest characterization quantity in the former second image set and the smallest characterization quantity in the latter. The characterization quantities of the target visibility value 50 m determined from the two sets may be inconsistent, so the final characterization quantity of the target visibility value may be obtained from the characterization quantities determined separately from the two second image sets.
In some embodiments, when determining the characterization quantity of a target visibility value from the visibility ranges and characterization quantity distribution ranges of two adjacent second image sets, an initial characterization quantity corresponding to the target visibility value may first be determined from those ranges. The initial characterization quantity is then adjusted continuously to obtain an adjusted characterization quantity; the images in the two adjacent second image sets are classified based on the adjusted characterization quantity until the accuracy of the classification results for both second image sets reaches its maximum, and the adjusted characterization quantity is then taken as the characterization quantity corresponding to the target visibility value.
For example, take the two second image sets with visibility ranges (0 m, 50 m] and (50 m, 100 m] and assume the visibility value and the characterization quantity are positively correlated. After the images of the former second image set are input into the neural network, the output characterization quantities lie between 0 and 20, so based on the former set the characterization quantity corresponding to 50 m can be preliminarily determined as 20. After the images of the latter second image set are input into the neural network, the output characterization quantities lie between 18 and 40, so based on the latter set the characterization quantity corresponding to 50 m can be preliminarily determined as 18. One of these characterization quantities (18 or 20), or their average 19, may then be taken as the initial characterization quantity of the target visibility value 50 m. This initial characterization quantity is then adjusted within a certain range (for example, increased or decreased by a certain step size); after each adjustment, all images in the two image sets are classified with the adjusted characterization quantity as the dividing point, and the accuracy of each classification result is determined. For example, suppose the adjusted characterization quantity is 19, that is, 50 m corresponds to a characterization quantity of 19. Classifying by this value, the images in the former image set whose characterization quantities lie between 19 and 20 are misclassified, from which a classification accuracy can be computed; likewise, the images in the latter image set whose characterization quantities lie between 18 and 19 are misclassified, from which another classification accuracy can be computed. The characterization quantity at which the classification accuracy of the two image sets is highest may be taken as the characterization quantity corresponding to the target visibility value 50 m. A similar method may be used for the other target visibility values (for example, 100 m, 200 m, 500 m and 1000 m in the six second image sets above).
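A minimal sketch of such a threshold search is given below. It assumes the two scalar distributions overlap (as in the 18 to 20 example above) and combines the two per-set accuracies by averaging them; the disclosure only requires that both accuracies be maximized, so the combination rule is an assumption:

```python
import numpy as np

def calibrate_boundary(scores_low, scores_high, step=0.1):
    """Search for the characterization value that best separates two adjacent
    second image sets (scores_low: lower visibility bin; scores_high: higher)."""
    scores_low = np.asarray(scores_low)
    scores_high = np.asarray(scores_high)
    lo, hi = scores_high.min(), scores_low.max()   # e.g. 18 and 20 above
    best_t, best_acc = None, -1.0
    for t in np.arange(lo, hi + step, step):
        acc_low = (scores_low <= t).mean()    # lower-bin images classified low
        acc_high = (scores_high > t).mean()   # higher-bin images classified high
        acc = (acc_low + acc_high) / 2.0
        if acc > best_acc:
            best_t, best_acc = t, acc
    return float(best_t)
```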
In some embodiments, the pre-trained neural network may include a feature extraction network, a first network branch and a second network branch. When feature extraction is performed on the image to be detected through the pre-trained neural network to obtain its characterization quantity, the feature extraction network may first perform feature extraction on the image to be detected, and the extracted features may be input into either the first network branch or the second network branch to obtain the characterization quantity of the image to be detected; that is, when extracting the characterization quantity of the image to be detected, one of the network branches may be used to output it. In some embodiments, the characterization quantities of the image to be detected output by the two network branches may also be combined to obtain the final characterization quantity. For example, the feature extraction network of the neural network performs feature extraction on the image to be detected; the extracted features are input into the first network branch to obtain a first characterization quantity and into the second network branch to obtain a second characterization quantity, and the characterization quantity of the image to be detected is obtained from the first and second characterization quantities, for example, as their weighted average, where the weights may be equal or set differently according to actual requirements.
In some embodiments, the image to be detected may be an image of a road area collected by an image acquisition device installed along the road. After the visibility value of the scene contained in the image to be detected is determined, the hazard level of the current specific weather (for example, fog, dust or haze) may also be determined from the visibility value, and a control strategy corresponding to the hazard level may be selected to control traffic on the road. For example, multiple levels for rating the hazard of foggy weather may be preset, with different levels corresponding to different visibility ranges, and a corresponding control strategy may be preset for each hazard level to control traffic on the road. For example, the control strategy may include issuing a heavy-fog warning to vehicles on the road, prompting vehicles on the road to keep their speed below a certain value, or closing road junctions and prohibiting traffic.
In some embodiments, the image to be detected may be an image of a road area, and the images to be detected may be multiple frames collected at different times by an image acquisition device installed along the road. After the visibility values corresponding to these frames are determined, the trend of a specific weather condition (for example, fog, dust or haze) may be predicted from the visibility values. For example, one frame of the road area may be acquired every 30 minutes and its corresponding visibility value determined, yielding the trend of the road's visibility values over a period of time, from which it can be predicted whether the fog concentration on the road is gradually increasing or decreasing, thereby predicting the trend of the foggy weather.
In some embodiments, the image to be detected may be an image of a road area, and the images to be detected may be collected by an image acquisition device installed along the road at preset time intervals, for example, one or more frames per hour. The visibility values corresponding to these one or more frames are then determined by the neural network to judge whether a specific weather phenomenon (for example, fog patches, dust or haze) is currently occurring. For example, for each frame it may be judged whether the corresponding visibility value falls below a preset threshold; if the visibility value of every frame, or of a certain number of frames, falls below the preset threshold, it is determined that the specific weather phenomenon is currently occurring in the road area. Alternatively, the average of the visibility values of the one or more frames may be computed, and if the average falls below a preset threshold, it is determined that the specific weather phenomenon is currently occurring in the road area. By counting the total number of occurrences of the specific weather within a target period, the frequency with which it occurs in a given area can be estimated, which is useful for studying the climate patterns of that area.
For example, one or more frames of the road area may be collected every 2 hours and their visibility values predicted by the neural network; if the visibility values of these frames all fall below the preset threshold, it is determined that a fog patch has occurred in the road area, and by counting the total number of fog-patch occurrences within one day, the daily frequency of fog patches can be determined.
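As a minimal sketch of this decision rule (the sampling schedule is handled outside the function, and the 200 m threshold and sample values are hypothetical, since the disclosure leaves the exact rule open):

```python
def weather_event(visibility_values, threshold_m=200.0, min_hits=None):
    """Declare a specific weather event (e.g. a fog patch) when enough of the
    sampled frames have visibility below the threshold.
    min_hits defaults to requiring every frame to be below the threshold."""
    hits = sum(v < threshold_m for v in visibility_values)
    if min_hits is None:
        min_hits = len(visibility_values)
    return hits >= min_hits

# Count events over a day to estimate how often the weather occurs.
daily_batches = [[180.0, 150.0, 210.0], [900.0, 950.0, 880.0]]  # metres, hypothetical
events_today = sum(weather_event(batch) for batch in daily_batches)
print(events_today)  # 0: one frame of the first batch is above 200 m
```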
To further explain the visibility value detection method provided by the embodiments of the present disclosure, it is explained below with reference to a specific embodiment.
In weather such as haze, dust and fog patches, it is usually necessary to detect the visibility of a road area and manage traffic based on that visibility. Typically, images of the road surface area are collected by cameras installed along the road, and the visibility of the road area contained in an image is predicted by a neural network. However, since it is difficult to label the exact visibility value of the scene contained in an image, it is hard to obtain a large number of sample images carrying specific visibility-value labels to train the neural network. At present, neural networks are trained only on sample images carrying visibility-range labels, and the network outputs only a classification of the image into a visibility range; an accurate, specific visibility value cannot be obtained. This embodiment provides a method that can detect the specific visibility value of an image. FIG. 3 is a schematic diagram of an application scenario of the method, which can be used to detect the specific visibility value of a road area and then control road traffic. The method comprises a neural network training stage, the calibration of the mapping relationship between the one-dimensional scalar output by the neural network and specific visibility values, and a neural network inference stage.
(1) Neural network training stage
A large number of sample images can be collected. For each sample image there is no need to label its specific visibility value; only the visibility range category to which it belongs needs to be determined. For example, the sample images may be divided into six categories with corresponding visibility ranges (0 m, 50 m], (50 m, 100 m], (100 m, 200 m], (200 m, 500 m], (500 m, 1000 m] and (1000 m, ∞). Since only the visibility range of an image needs to be labeled, which is easy to do, a large number of sample images can be acquired.
The sample images in any two of the above six categories can then be combined pairwise, yielding a large number of sample image pairs in which the relative magnitude of the visibility values of each pair is known. The sample image pairs are then used as training samples, with the relative magnitude of their visibility values as labels, to train the initial neural network and obtain the trained neural network.
The structure of the neural network is shown in FIG. 4 and includes a feature extraction network, a first network branch, a second network branch and an output layer. The feature extraction network may, for example, be a resnet18 network, which includes five convolutional stages (conv1 to conv5 in FIG. 4); conv5 is followed by the two network branches, namely the first network branch and the second network branch, each containing three fully connected layers. Fully connected layer fc1 has an output dimension of 256 with a prelu activation function; fully connected layer fc2 has an output dimension of 128 with a prelu activation function; and fully connected layer fc3 has an output dimension of 1 with a linear activation function.
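A non-authoritative PyTorch sketch of this structure follows; details the text does not specify (global average pooling after conv5, the 512-dimensional backbone output, input preprocessing) follow common resnet18 practice and are assumptions:

```python
import torch
import torch.nn as nn
import torchvision

class VisibilityRanker(nn.Module):
    """Sketch of the two-branch network described above: a resnet18 backbone
    (conv1 to conv5) followed by two fully connected branches that each map
    the pooled features to a one-dimensional scalar."""
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        # Keep everything up to and including the average pool; drop the fc head.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        def branch():
            return nn.Sequential(
                nn.Linear(512, 256), nn.PReLU(),   # fc1: output dim 256, prelu
                nn.Linear(256, 128), nn.PReLU(),   # fc2: output dim 128, prelu
                nn.Linear(128, 1),                 # fc3: output dim 1, linear
            )
        self.branch_a = branch()
        self.branch_b = branch()

    def forward(self, img_a, img_b):
        fa = self.features(img_a).flatten(1)
        fb = self.features(img_b).flatten(1)
        ra = self.branch_a(fa)      # one-dimensional scalar RA
        rb = self.branch_b(fb)      # one-dimensional scalar RB
        return ra, rb
```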
When a sample image pair (sample image A and sample image B) is input into the neural network, features are extracted from sample image A by the feature extraction network, and the extracted features are further input into the first network branch for feature extraction, finally outputting a one-dimensional scalar RA that reflects the magnitude of the visibility value of the scene contained in sample image A. Features are extracted from sample image B by the feature extraction network, and the extracted features are further input into the second network branch for feature extraction, finally outputting a one-dimensional scalar RB that reflects the magnitude of the visibility value of the scene contained in sample image B. The one-dimensional scalar output by the neural network may be positively correlated with the specific visibility value: the larger the scalar, the larger the specific visibility value.
The one-dimensional scalars RA and RB can then be input into the output layer, which determines from them the probability p that sample image A is the image with the larger specific visibility value, and the probability 1-p that sample image B is. The loss function may be a cross-entropy loss. For example, if the specific visibility value of sample image A is greater than that of sample image B, the true probability that A is the image with the larger visibility value is y (where y is 1, or a preset value close to 1, such as 0.8), and the true probability that B is the image with the larger visibility value is 1-y. The cross-entropy loss between the predicted probability p and the true probability y can then be computed as Cross Entropy(p, y) = -[y*log(p) + (1-y)*log(1-p)]. The parameters of the neural network are adjusted continuously based on the cross-entropy loss until the loss function converges, yielding the trained neural network.
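As a quick numeric check of the formula (using the natural logarithm, as is usual for cross-entropy): with a softened label y = 0.8 and a predicted probability p = 0.7, Cross Entropy(p, y) = -[0.8*log(0.7) + 0.2*log(0.3)] ≈ 0.526, and the loss shrinks toward its minimum as p approaches y.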
(2) Calibration of the mapping between the one-dimensional scalar output by the neural network and specific visibility values
Each image in the above six categories can be input into the neural network, which outputs a one-dimensional scalar representing the specific visibility value corresponding to the image. Because the number of sample images is large, the output scalars can be considered to essentially cover the various specific visibility values; for example, the (0 m, 50 m] category contains 10000 sample images, which with high probability cover every visibility value between 0 m and 50 m. The mapping relationship is then constructed from the labeled visibility range and the one-dimensional scalar range of each category. To construct the mapping accurately, the one-dimensional scalars corresponding to the boundary values of the visibility ranges, for example, 0 m, 50 m, 100 m, 200 m, 500 m and 1000 m, may be determined first. Taking 0 m as an example, since the one-dimensional scalar is positively correlated with the specific visibility value, the minimum one-dimensional scalar among the images whose visibility range is (0 m, 50 m] corresponds to the boundary value 0 m. Taking 50 m as an example, since 50 m is the dividing value between two categories, the maximum one-dimensional scalar in the image set with visibility range (0 m, 50 m] corresponds to 50 m, and the minimum one-dimensional scalar in the image set with visibility range (50 m, 100 m] also corresponds to 50 m. The one-dimensional scalars for 50 m determined from these two image sets may not be equal; for example, the scalar for 50 m from the (0 m, 50 m] image set may be 10 while that from the (50 m, 100 m] image set is 15. To determine the one-dimensional scalar corresponding to the boundary value 50 m accurately, it can be adjusted continuously (for example, taking values between 10 and 15), and the accuracy of classifying the images of the two categories based on the adjusted scalar is determined each time. For example, when the one-dimensional scalar corresponding to the boundary value 50 m is taken as 12, classifying all images whose visibility range is (0 m, 50 m] based on it has an accuracy of only 80%: when 8000 images with visibility range (0 m, 50 m] are classified based on the scalar 12, that is, their labels are removed and they are input into the neural network, 6400 of them are placed in the (0 m, 50 m] range, so the accuracy based on the scalar 12 (corresponding to the boundary value 50 m) is 80%. Similarly, classifying all images whose visibility range is (50 m, 100 m] based on this scalar has an accuracy of only 90%. The one-dimensional scalar corresponding to the boundary value 50 m is then adjusted continuously until the classification accuracy of the images of the two categories reaches its maximum, and that scalar is determined as the one-dimensional scalar corresponding to the visibility value 50 m. Likewise, a similar method can be used to determine the one-dimensional scalars corresponding to the other boundary values of the visibility ranges. After the one-dimensional scalars corresponding to the boundary values of the visibility ranges are determined, the mapping relationship can be constructed from them.
Of course, in some embodiments, the specific visibility values of multiple frames of images may first be determined with a visibility meter, the one-dimensional scalars of these frames may then be determined by the pre-trained neural network, and the mapping relationship may be constructed from the one-dimensional scalars and specific visibility values of these frames. To make the constructed mapping as accurate as possible, the specific visibility values of these frames should cover each visibility gradient as completely as possible.
(3) Neural network inference stage
When the specific visibility value of an image to be detected is to be determined, the image to be detected is acquired and input into the neural network, which outputs the one-dimensional scalar of the image (this may be determined with one network branch, or taken as the average of the outputs of the two branches). The specific visibility value of the image to be detected is then determined from its one-dimensional scalar and the pre-calibrated mapping relationship. After the specific visibility value of the image to be detected is determined, traffic on the road can be controlled based on it.
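A minimal end-to-end inference sketch, reusing the hypothetical VisibilityRanker and scalar_to_visibility helpers sketched earlier (both are illustrative assumptions, and a batch of one image is assumed):

```python
import torch

@torch.no_grad()
def predict_visibility(model, image):
    """image -> one-dimensional scalar -> specific visibility value (m)."""
    model.eval()
    ra, rb = model(image, image)           # run both branches on the same image
    scalar = ((ra + rb) / 2).item()        # average the two branch outputs
    return scalar_to_visibility(scalar)    # apply the pre-calibrated mapping
```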
With the method provided in this embodiment, the neural network can be trained without labeling the specific visibility values of the sample images, and the specific visibility value of the image to be detected can be determined based on the trained neural network. Compared with determining a visibility range, a more precise visibility result can be obtained.
Corresponding to the above method, an embodiment of the present disclosure further provides a visibility value detection apparatus. As shown in FIG. 5, the apparatus 50 includes: an acquisition module 51, configured to acquire an image to be detected; a characterization module 52, configured to perform feature extraction on the image to be detected through a pre-trained neural network to obtain the characterization quantity of the image to be detected; and a visibility value determination module 53, configured to determine the visibility value corresponding to the image to be detected based on a pre-calibrated mapping relationship between characterization quantities and visibility values and on the characterization quantity of the image to be detected, where the characterization quantity reflects the magnitude of the visibility value of the scene contained in the image.
For the specific implementation process by which the apparatus determines the visibility value of the image to be detected, reference may be made to the description in the above method embodiments, which is not repeated here.
In addition, an embodiment of the present disclosure further provides an electronic device. As shown in FIG. 6, the electronic device includes a processor 61, a memory 62, and computer instructions stored in the memory 62 and executable by the processor 61; when the processor executes the computer instructions, the method described in any of the foregoing embodiments can be implemented.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method described in any of the foregoing embodiments is implemented.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. Information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape or magnetic disk storage or other magnetic storage device, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media, such as modulated data signals and carrier waves.
From the description of the above implementations, those skilled in the art can clearly understand that the embodiments of this specification can be implemented by means of software plus a necessary general hardware platform. Based on this understanding, the essence of the technical solutions of the embodiments of this specification, or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the various embodiments, or in certain parts of the embodiments, of this specification.
The systems, apparatuses, modules or units set forth in the above embodiments may be implemented by a computer chip or entity, or by a product with a certain function. A typical implementing device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an e-mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the relevant parts of the description of the method embodiment. The apparatus embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separated, and when implementing the solutions of the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of the embodiment, which can be understood and implemented by those of ordinary skill in the art without creative effort.
The above are merely implementations of the embodiments of this specification. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of this specification, and such improvements and refinements shall also fall within the scope of protection of the embodiments of this specification.

Claims (18)

  1. A visibility value detection method, comprising:
    acquiring an image to be detected;
    performing feature extraction on the image to be detected to obtain a characterization quantity of the image to be detected; and
    determining a visibility value of the image to be detected based on a pre-calibrated mapping relationship between characterization quantities and visibility values and on the characterization quantity of the image to be detected, wherein the characterization quantity is capable of reflecting the magnitude of the visibility value of a scene contained in an image.
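By way of non-limiting illustration, the three steps of claim 1 can be sketched in Python as follows; `score_image` and `to_visibility` are hypothetical stand-ins for the feature-extraction and pre-calibrated-mapping steps detailed in the later claims:

```python
# Minimal sketch of the claim 1 pipeline. Both helpers are hypothetical:
# score_image() would produce the characterization quantity (cf. claims 2-4),
# to_visibility() would apply the pre-calibrated mapping (cf. claims 6-9).
def detect_visibility(image, score_image, to_visibility):
    characterization = score_image(image)    # feature extraction -> scalar
    return to_visibility(characterization)   # calibrated mapping -> metres
```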
  2. The method according to claim 1, wherein performing feature extraction on the image to be detected comprises:
    performing feature extraction on the image to be detected through a pre-trained neural network, wherein the neural network is trained in the following manner:
    acquiring a sample image pair carrying label information, the label information being used to indicate the relative magnitude relationship between the visibility values respectively corresponding to the two frames of images in the sample image pair;
    inputting the two frames of images in the sample image pair into a preset initial neural network;
    outputting, by the initial neural network, a prediction result of the relative magnitude relationship between the visibility values of the two frames of images in the sample image pair; and
    adjusting network parameters of the initial neural network based on a difference between the prediction result and the label information, to obtain the neural network.
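One plausible realization of this pairwise training, sketched with PyTorch under illustrative assumptions (the convolutional backbone, learning rate, and loss margin are not specified by the claim):

```python
# Sketch of the claim 2 training: a scorer maps each image of a pair to a
# scalar characterization quantity, and a margin ranking loss pushes the
# two scores to respect the labelled relative-size relationship.
import torch
import torch.nn as nn

class VisibilityScorer(nn.Module):
    """Maps an image to a one-dimensional characterization quantity."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(            # feature extraction network
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)              # characterization branch

    def forward(self, x):
        return self.head(self.features(x))        # shape: (batch, 1)

model = VisibilityScorer()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.MarginRankingLoss(margin=0.5)      # margin is an assumption

def train_step(img_a, img_b, label):
    """label: tensor of +1 where img_a has the higher visibility, else -1,
    shaped like the scores."""
    score_a, score_b = model(img_a), model(img_b)
    loss = criterion(score_a, score_b, label)     # prediction vs. label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```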
  3. The method according to claim 2, wherein the initial neural network comprises a feature extraction network, a characterization quantity determination network, and an output layer; and
    outputting, by the initial neural network, the prediction result of the relative magnitude relationship between the visibility values of the two frames of images in the sample image pair comprises:
    performing feature extraction on the two frames of images in the sample image pair through the feature extraction network;
    determining, through the characterization quantity determination network, respective characterization quantities of the two frames of images in the sample image pair based on features extracted by the feature extraction network; and
    determining, through the output layer, the relative magnitude relationship between the visibility values of the two frames of images based on the respective characterization quantities of the two frames of images in the sample image pair.
  4. The method according to claim 3, wherein the characterization quantity determination network comprises a first network branch and a second network branch; and
    determining, through the characterization quantity determination network, the respective characterization quantities of the two frames of images in the sample image pair based on the features extracted by the feature extraction network comprises:
    determining, through the first network branch, a characterization quantity of a first image in the sample image pair based on features extracted from the first image; and
    determining, through the second network branch, a characterization quantity of a second image in the sample image pair based on features extracted from the second image.
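The claim 3/4 structure — a shared feature extraction network feeding two characterization branches and an output layer — might look as follows; the RankNet-style sigmoid output layer and the layer widths are assumptions:

```python
# Sketch of the claim 3/4 architecture. The output layer here turns the two
# characterization quantities into the probability that the first image has
# the higher visibility; this particular output layer is an assumption.
import torch
import torch.nn as nn

class PairwiseVisibilityNet(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int = 32):
        super().__init__()
        self.backbone = backbone                # shared feature extraction
        self.branch1 = nn.Linear(feat_dim, 1)   # first network branch
        self.branch2 = nn.Linear(feat_dim, 1)   # second network branch

    def forward(self, img1, img2):
        s1 = self.branch1(self.backbone(img1))  # characterization of image 1
        s2 = self.branch2(self.backbone(img2))  # characterization of image 2
        return torch.sigmoid(s1 - s2)           # output layer: P(img1 > img2)

# Usage with a toy backbone (LazyLinear infers the input width on first call):
net = PairwiseVisibilityNet(nn.Sequential(nn.Flatten(), nn.LazyLinear(32)))
```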
  5. The method according to any one of claims 2-4, wherein the sample image pair is obtained in the following manner:
    acquiring at least two first image sets, wherein, for each first image set of the at least two first image sets, the images in that first image set share the same visibility value range, and the visibility value ranges of any two of the first image sets do not overlap; and
    selecting one frame of image from each of any two of the first image sets to form the sample image pair.
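A minimal sketch of this sampling scheme, with hypothetical visibility bins and file names:

```python
# Sketch of claim 5: images are grouped into sets by visibility range, the
# ranges do not overlap, and a training pair draws one image from each of
# two different sets. Bin boundaries and file names are invented examples.
import random

image_sets = {
    (0, 200): ["fog_heavy_001.jpg", "fog_heavy_002.jpg"],
    (200, 500): ["fog_mid_001.jpg", "fog_mid_002.jpg"],
    (500, 1000): ["clear_001.jpg", "clear_002.jpg"],
}

def sample_pair():
    (range_a, imgs_a), (range_b, imgs_b) = random.sample(
        sorted(image_sets.items()), 2)
    label = 1 if range_a[0] > range_b[0] else -1   # relative-size label
    return random.choice(imgs_a), random.choice(imgs_b), label
```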
  6. The method according to any one of claims 1-5, wherein the characterization quantity is a one-dimensional scalar, and the pre-calibrated mapping relationship between characterization quantities and visibility values is determined in the following manner:
    acquiring at least one second image set, wherein, for each second image set of the at least one second image set, the images in that second image set share the same visibility value range;
    performing feature extraction on each image in each of the at least one second image set, to obtain a characterization quantity of each image in each second image set;
    obtaining a characterization quantity distribution range of the images in each second image set based on the characterization quantities of the images in that second image set; and
    determining the mapping relationship based on the visibility value range of the images in each second image set and on the characterization quantity distribution range.
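One way to realize this calibration, assuming the characterization quantity grows with visibility (the direction, like the numbers below, is an assumption):

```python
# Sketch of claim 6: each second image set contributes its known visibility
# range and the observed range of its characterization quantities; a
# piecewise-linear map is interpolated between the resulting anchor points.
import numpy as np

def calibrate(sets):
    """sets: list of ((v_lo, v_hi), scores) per second image set."""
    anchors = []
    for (v_lo, v_hi), scores in sets:
        s_lo, s_hi = min(scores), max(scores)     # characterization range
        anchors += [(s_lo, v_lo), (s_hi, v_hi)]
    anchors.sort()
    xs, ys = zip(*anchors)
    return lambda s: float(np.interp(s, xs, ys))  # score -> visibility

to_visibility = calibrate([
    ((0, 200), [0.10, 0.25, 0.30]),    # heavy-fog set, toy scores
    ((200, 500), [0.50, 0.62, 0.70]),  # lighter-fog set, toy scores
])
print(to_visibility(0.6))              # -> 350.0 with these toy numbers
```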
  7. The method according to any one of claims 1-5, wherein the characterization quantity is a one-dimensional scalar, and the pre-calibrated mapping relationship between characterization quantities and visibility values is determined in the following manner:
    acquiring a third image set, wherein the visibility value of each image in the third image set is known;
    performing feature extraction on each image in the third image set, to obtain a characterization quantity of each image in the third image set; and
    determining the mapping relationship based on the characterization quantity of each image in the third image set and the visibility value of each image in the third image set.
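Under the same one-dimensional-scalar assumption, the claim 7 variant reduces to fitting a regression from characterization quantity to known visibility; the linear least-squares fit and the numbers here are purely illustrative:

```python
# Sketch of claim 7: per-image visibility values are known, so the mapping
# can be fitted directly. The linear model and the data are assumptions.
import numpy as np

scores = np.array([0.12, 0.35, 0.58, 0.80])          # third-set scores
visibility = np.array([150.0, 320.0, 610.0, 940.0])  # known visibility (m)
slope, intercept = np.polyfit(scores, visibility, 1)

def to_visibility(score: float) -> float:
    return slope * score + intercept                 # fitted mapping

print(round(to_visibility(0.5)))                     # value for a new image
```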
  8. The method according to claim 6, wherein the at least one second image set comprises a plurality of second image sets; and determining the mapping relationship based on the visibility value range of the images in each second image set and on the characterization quantity distribution range comprises:
    determining a characterization quantity corresponding to a target visibility value based on the visibility value ranges of the images in two second image sets whose visibility value ranges are adjacent among the plurality of second image sets, and on the characterization quantity distribution ranges, wherein the visibility value ranges of the images in the two adjacent second image sets are continuous and are separated by the target visibility value; and
    determining the mapping relationship based on the characterization quantity corresponding to the target visibility value.
  9. The method according to claim 8, wherein determining the characterization quantity corresponding to the target visibility value based on the visibility value ranges of the images in the two second image sets whose visibility value ranges are adjacent among the plurality of second image sets and on the characterization quantity distribution ranges comprises:
    determining an initial characterization quantity corresponding to the target visibility value based on the visibility value ranges of the images in the two adjacent second image sets and on the characterization quantity distribution ranges;
    adjusting the initial characterization quantity to obtain an adjusted characterization quantity, and classifying the images in the two second image sets based on the adjusted characterization quantity, until the accuracy of the classification results for both of the two second image sets reaches a maximum; and
    taking the adjusted characterization quantity as the characterization quantity corresponding to the target visibility value.
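The boundary search of claim 9 can be sketched as a sweep over candidate thresholds that maximizes classification accuracy on the two adjacent sets; the sweep strategy, score direction, and numbers are assumptions:

```python
# Sketch of claim 9: candidate characterization thresholds are adjusted
# until the two adjacent second image sets are separated with maximal
# accuracy; the winning threshold maps to the target visibility value.
import numpy as np

def calibrate_threshold(scores_low, scores_high):
    """scores_low / scores_high: characterizations of the lower- and
    higher-visibility sets (assumes the score grows with visibility)."""
    candidates = np.sort(np.concatenate([scores_low, scores_high]))
    def accuracy(t):
        correct = np.sum(scores_low < t) + np.sum(scores_high >= t)
        return correct / (len(scores_low) + len(scores_high))
    return max(candidates, key=accuracy)

threshold = calibrate_threshold(
    np.array([0.11, 0.18, 0.27]),   # e.g. images with visibility 0-200 m
    np.array([0.31, 0.42, 0.55]))   # e.g. images with visibility 200-500 m
# 'threshold' is the characterization corresponding to the 200 m boundary.
```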
  10. The method according to any one of claims 1-9, wherein the characterization quantity of the image to be detected is obtained by performing feature extraction on the image to be detected through a pre-trained neural network, the neural network comprising a feature extraction network, a first network branch, and a second network branch; and
    performing feature extraction on the image to be detected through the pre-trained neural network to obtain the characterization quantity of the image to be detected comprises:
    performing feature extraction on the image to be detected through the feature extraction network to obtain features of the image to be detected, and inputting the extracted features into either one of the first network branch and the second network branch, to obtain the characterization quantity of the image to be detected.
  11. The method according to any one of claims 1-9, wherein the characterization quantity of the image to be detected is obtained by performing feature extraction on the image to be detected through a pre-trained neural network, the neural network comprising a feature extraction network, a first network branch, and a second network branch; and
    performing feature extraction on the image to be detected through the pre-trained neural network to obtain the characterization quantity of the image to be detected comprises:
    performing feature extraction on the image to be detected through the feature extraction network of the neural network to obtain features of the image to be detected, and inputting the extracted features into the first network branch, to obtain a first characterization quantity;
    inputting the extracted features into the second network branch, to obtain a second characterization quantity; and
    obtaining the characterization quantity of the image to be detected based on the first characterization quantity and the second characterization quantity.
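Reusing the `PairwiseVisibilityNet` sketch above, claim 11 inference might combine the two branch outputs by averaging; the averaging itself is an assumption, since the claim only requires that the characterization quantity be obtained from the two:

```python
# Sketch of claim 11 inference: extract features once, score with both
# branches, and combine the first and second characterization quantities.
import torch

@torch.no_grad()
def characterize(model, image):
    feats = model.backbone(image)
    s1 = model.branch1(feats)    # first characterization quantity
    s2 = model.branch2(feats)    # second characterization quantity
    return 0.5 * (s1 + s2)       # combined characterization quantity
```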
  12. The method according to any one of claims 1-11, wherein the image to be detected comprises an image of a road area, and the method further comprises:
    determining a hazard level of a specific climate based on the visibility value of the road area contained in the image to be detected; and
    controlling traffic in the road area according to a control strategy corresponding to the hazard level.
  13. The method according to any one of claims 1-12, wherein the image to be detected comprises multiple frames of images of a road area collected at different times, and the method comprises:
    predicting a change trend of a specific climate in the road area based on a change trend of the visibility value of the road area in the multiple frames of images.
  14. The method according to any one of claims 1-12, wherein the image to be detected comprises images of a road area collected at preset time intervals, and the method comprises:
    determining, in response to the visibility value corresponding to an image exceeding a preset threshold, that a specific climate has occurred in the road area; and
    determining a frequency of occurrence of the specific climate in the road area based on a total number of times the specific climate occurs in the road area within a target time period.
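For claims 12-14 the visibility values feed simple post-processing; a toy sketch of the claim 14 frequency computation follows (the comparison direction shown here — below-threshold visibility counted as a fog event — plus the threshold and window are illustrative assumptions):

```python
# Sketch of claim 14: frames are sampled at preset intervals; each frame
# whose visibility crosses the threshold counts as one occurrence of the
# specific climate, and the total over the window yields a frequency.
def climate_event_frequency(visibility_values, threshold, window_hours):
    """visibility_values: one value per frame sampled at fixed intervals."""
    events = sum(1 for v in visibility_values if v < threshold)
    return events / window_hours       # occurrences per hour

freq = climate_event_frequency(
    [850, 420, 120, 90, 640],          # toy per-frame visibility values (m)
    threshold=200, window_hours=24)
```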
  15. A visibility value detection apparatus, comprising:
    an acquisition module, configured to acquire an image to be detected;
    a characterization module, configured to perform feature extraction on the image to be detected through a pre-trained neural network, to obtain a characterization quantity of the image to be detected; and
    a visibility value determination module, configured to determine a visibility value corresponding to the image to be detected based on a pre-calibrated mapping relationship between characterization quantities and visibility values and on the characterization quantity of the image to be detected, wherein the characterization quantity is capable of reflecting the magnitude of the visibility value of a scene contained in an image.
  16. An electronic device, comprising a processor, a memory, and computer instructions stored in the memory and executable by the processor, wherein the processor, when executing the computer instructions, implements the method according to any one of claims 1-14.
  17. A computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, implement the method according to any one of claims 1-14.
  18. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-14.
PCT/CN2022/097325 2021-12-30 2022-06-07 Visibility value measurement method and apparatus, device, and storage medium WO2023123869A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111656430.5A CN114359211A (en) 2021-12-30 2021-12-30 Visibility value detection method, device, equipment and storage medium
CN202111656430.5 2021-12-30

Publications (1)

Publication Number Publication Date
WO2023123869A1 true WO2023123869A1 (en) 2023-07-06

Family

ID=81106025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/097325 WO2023123869A1 (en) 2021-12-30 2022-06-07 Visibility value measurement method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN114359211A (en)
WO (1) WO2023123869A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114359211A (en) * 2021-12-30 2022-04-15 上海商汤智能科技有限公司 Visibility value detection method, device, equipment and storage medium
CN115116252A (en) * 2022-08-30 2022-09-27 四川九通智路科技有限公司 Method, device and system for guiding vehicle safely

Citations (5)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910378A (en) * 2019-11-29 2020-03-24 南通大学 Bimodal image visibility detection method based on depth fusion network
CN111145120A (en) * 2019-12-26 2020-05-12 上海眼控科技股份有限公司 Visibility detection method and device, computer equipment and storage medium
CN111191629A (en) * 2020-01-07 2020-05-22 中国人民解放军国防科技大学 Multi-target-based image visibility detection method
CN113658275A (en) * 2021-08-23 2021-11-16 深圳市商汤科技有限公司 Visibility value detection method, device, equipment and storage medium
CN114359211A (en) * 2021-12-30 2022-04-15 上海商汤智能科技有限公司 Visibility value detection method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN114359211A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
WO2023123869A1 (en) Visibility value measurement method and apparatus, device, and storage medium
CN106355188A (en) Image detection method and device
CN113688042B (en) Determination method and device of test scene, electronic equipment and readable storage medium
CN104112284B (en) The similarity detection method and equipment of a kind of picture
CN109657697A (en) Classified optimization method based on semi-supervised learning and fine granularity feature learning
CN114819186A (en) Method and device for constructing GBDT model, and prediction method and device
CN103020978A (en) SAR (synthetic aperture radar) image change detection method combining multi-threshold segmentation with fuzzy clustering
CN109961057A (en) A kind of vehicle location preparation method and device
KR102479804B1 (en) Method, device and program for measuring water level, volume, inflow and pollution level using an artificial intelligence model that reflects regional and seasonal characteristics
CN112819821B (en) Cell nucleus image detection method
CN113627229B (en) Target detection method, system, device and computer storage medium
CN111505705B (en) Microseism P wave first arrival pickup method and system based on capsule neural network
CN109283182A (en) A kind of detection method of battery welding point defect, apparatus and system
CN113049500B (en) Water quality detection model training and water quality detection method, electronic equipment and storage medium
CN112149503A (en) Target event detection method and device, electronic equipment and readable medium
CN113658174B (en) Microkernel histology image detection method based on deep learning and image processing algorithm
CN102360434B (en) Target classification method of vehicle and pedestrian in intelligent traffic monitoring
CN108345898A (en) A kind of novel line insulator Condition assessment of insulation method
CN115984543A (en) Target detection algorithm based on infrared and visible light images
CN115439718A (en) Industrial detection method, system and storage medium combining supervised learning and feature matching technology
CN110674887A (en) End-to-end road congestion detection algorithm based on video classification
CN116861262B (en) Perception model training method and device, electronic equipment and storage medium
CN108985350B (en) Method and device for recognizing blurred image based on gradient amplitude sparse characteristic information, computing equipment and storage medium
CN116665153A (en) Road scene segmentation method based on improved deep bv3+ network model
US20220260489A1 (en) System and method for analyzing molecular interactions on living cells using biosensor techniques

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913158

Country of ref document: EP

Kind code of ref document: A1