WO2023120988A1 - Vehicle camera occlusion classification device using deep learning-based object detector and method thereof - Google Patents

Vehicle camera occlusion classification device using deep learning-based object detector and method thereof

Info

Publication number
WO2023120988A1
Authority
WO
WIPO (PCT)
Prior art keywords
box
frame
size
value
deep learning
Prior art date
Application number
PCT/KR2022/017979
Other languages
French (fr)
Korean (ko)
Inventor
한동석
유민우
성재호
Original Assignee
경북대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020220005390A external-priority patent/KR20230095747A/en
Application filed by 경북대학교 산학협력단
Publication of WO2023120988A1 publication Critical patent/WO2023120988A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle

Definitions

  • The present invention relates to a vehicle camera occlusion classification apparatus and method using a deep learning-based object detector, and more particularly, to an apparatus and method that use a deep learning object detector to classify, from a frame of a camera image, whether the camera is occluded.
  • A self-driving vehicle is a vehicle that reaches its destination on its own without the driver operating the steering wheel, accelerator pedal, or brakes; it refers to a smart vehicle incorporating the autonomous navigation technologies already applied to aircraft and ships.
  • Autonomous driving systems and advanced driver assistance systems installed in such vehicles automatically control the driving of the vehicle from a starting point to an end point on the road, or assist the driver, using GPS location information and signals acquired from various sensors on top of road map information, enabling safe driving.
  • An autonomous driving system requires a sensor capable of recognizing surrounding objects and a graphics processing unit in order to recognize and judge, in real time, the driving environment of a vehicle moving at high speed.
  • The sensor measures the distance to surrounding objects and detects dangers, helping to cover all areas without blind spots.
  • The graphics processing unit grasps the surroundings of the vehicle through multiple cameras and analyzes the images so that the vehicle can travel safely.
  • The technical problem to be achieved by the present invention is to provide a vehicle camera occlusion classification apparatus and method using a deep learning-based object detector, which classifies from a frame of a camera image whether the camera is occluded by an object.
  • A vehicle camera occlusion classification apparatus using a deep learning-based object detector for achieving this technical problem includes: an input unit for receiving an original image captured by a camera in frame units; a first feature extractor that reduces the size of an input frame and inputs it to a convolutional neural network (CNN) to extract features of the frame; a second feature extractor that extracts features of objects included in the frame received from the input unit using an object detection algorithm; a calculation unit that mixes the frame features and the object features and inputs them to an artificial neural network (ANN) for computation; and a determination unit that determines whether the camera is occluded based on the computation result.
  • The first feature extractor may reduce the size of the frame to 100x100 using nearest-neighbor interpolation, input it to the convolutional neural network, and unfold the extracted frame features into one dimension.
  • The second feature extractor may extract, for each object, features including the object's location information, class information, and reliability information using a deep learning-based object detection algorithm; calculate a box value for each object from the location information; extract a set number of confidence values in descending order of the confidence values contained in the reliability information; and, using the box values of the objects corresponding to the extracted confidence values, combine the features of those objects into one dimension for the fully-connected (FC) layer operation.
  • The second feature extractor calculates the box value using the x coordinate, the y coordinate, the bounding-box width, and the bounding-box height (relative to the image size) contained in the object's location information; the box value can be calculated by Equation 1 below (published as an equation image), where:
  • BOX is the box value
  • V_size is the size of the box
  • V_ratio is the aspect ratio of the box
  • w_0 is the horizontal pixel count of the detection box
  • w_t is the horizontal pixel count of the image
  • h_0 is the vertical pixel count of the detection box
  • h_t is the vertical pixel count of the image
  • axis_min is the minimum of the detection box's horizontal and vertical pixel counts
  • axis_max is the maximum of the detection box's horizontal and vertical pixel counts
  • λ_size is a size-adjustment constant parameter
  • λ_ratio is a ratio-adjustment constant parameter
  • The determination unit may determine whether the camera is occluded according to the class classification result contained in the computation result, using a softmax function.
  • A vehicle camera occlusion classification method using a deep learning-based object detector includes: receiving an original image captured by a camera in frame units; reducing the size of the input frame and inputting it to a convolutional neural network (CNN) to extract features of the frame; extracting features of objects included in the input frame using an object detection algorithm; mixing the frame features and the object features and inputting them to an artificial neural network (ANN) for computation; and determining whether the camera is occluded according to the computation result.
  • In the frame feature extraction step, the size of the frame is reduced to 100x100 using nearest-neighbor interpolation, and the features extracted by the convolutional neural network are unfolded into one dimension.
  • In the object feature extraction step, features including each object's location information, class information, and reliability information are extracted using a deep learning-based object detection algorithm; a box value is calculated for each object from the location information; a set number of confidence values are extracted in descending order of the confidence values contained in the reliability information; and, using the box values of the objects corresponding to the extracted confidence values, the features of those objects are combined into one dimension for the fully-connected (FC) layer operation.
  • The object feature extraction step calculates the box value using the x coordinate, the y coordinate, the bounding-box width, and the bounding-box height (relative to the image size) contained in the object's location information; the box value can be calculated by Equation 1 below (published as an equation image), where:
  • BOX is the box value
  • V_size is the size of the box
  • V_ratio is the aspect ratio of the box
  • w_0 is the horizontal pixel count of the detection box
  • w_t is the horizontal pixel count of the image
  • h_0 is the vertical pixel count of the detection box
  • h_t is the vertical pixel count of the image
  • axis_min is the minimum of the detection box's horizontal and vertical pixel counts
  • axis_max is the maximum of the detection box's horizontal and vertical pixel counts
  • λ_size is a size-adjustment constant parameter
  • λ_ratio is a ratio-adjustment constant parameter
  • Whether the camera is occluded may be determined according to the class classification result contained in the computation result, using a softmax function.
  • According to the present invention, classifying from a frame of the camera image whether the camera is occluded by an object, using the deep learning object detector, prevents the camera from failing to detect or erroneously detecting objects, thereby reducing the probability of an accident.
  • Because the present invention detects whether the camera sensor is occluded, it can play an important role in autonomous vehicles; it can also be applied to various systems that use camera sensors besides vehicles, so it can be used universally in many fields.
  • FIG. 1 is a block diagram showing a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
  • FIGS. 2 and 3 are exemplary diagrams for explaining the second feature extraction unit of FIG. 1.
  • FIG. 4 is a diagram showing a clear image by way of example in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
  • FIG. 5 is a diagram exemplarily illustrating a blurred image in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
  • FIG. 6 is an example of arranging box values and confidence values calculated in the second feature extraction unit of FIG. 1 .
  • FIG. 7 is an exemplary diagram for explaining an artificial neural network calculation process in the calculation unit of FIG. 1 .
  • FIG. 8 is a flowchart illustrating an operation flow of a vehicle camera occlusion classification method using a deep learning-based object detector according to an embodiment of the present invention.
  • First, a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention will be described with reference to FIGS. 1 to 7.
  • FIG. 1 is a block diagram showing a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
  • As shown in FIG. 1, the vehicle camera occlusion classification apparatus 100 using the deep learning-based object detector includes an input unit 110, a first feature extraction unit 120, a second feature extraction unit 130, a calculation unit 140, and a determination unit 150.
  • First, the input unit 110 receives an original image captured by a camera (not shown) in units of frames.
  • The camera may be a camera used in an autonomous vehicle or the like, or a sensor camera used in various other systems.
  • The first feature extraction unit 120 reduces the size of the frame input through the input unit 110 and inputs it to a convolutional neural network (CNN) to extract features of the frame.
  • In detail, the first feature extraction unit 120 reduces the size of the frame input through the input unit 110 to 100x100 using nearest-neighbor interpolation, inputs it to the convolutional neural network, and unfolds the extracted frame features into one dimension (1-D) for the artificial neural network (ANN) computation, as sketched below.
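A minimal sketch of this preprocessing step, assuming OpenCV and NumPy are available; the CNN itself is left abstract since the patent does not specify its architecture, and the names `extract_frame_features` and `cnn` are hypothetical:

```python
import cv2
import numpy as np

def extract_frame_features(frame: np.ndarray, cnn) -> np.ndarray:
    """Shrink the frame to 100x100 with nearest-neighbor interpolation,
    run it through a CNN, and unfold the resulting feature map into a
    one-dimensional vector for the later ANN stage."""
    small = cv2.resize(frame, (100, 100), interpolation=cv2.INTER_NEAREST)
    feature_map = cnn(small)  # any callable mapping an image to a feature map
    return np.asarray(feature_map).ravel()
```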
  • The second feature extraction unit 130 extracts features of the objects included in the frame received from the input unit 110 using a deep learning-based object detection algorithm.
  • At this time, the second feature extractor 130 extracts, for each object, features including the object's location information, class information, and reliability information.
  • FIGS. 2 and 3 are exemplary diagrams for explaining the second feature extraction unit of FIG. 1.
  • As shown in FIG. 2, the second feature extraction unit 130 inputs the frame received from the input unit 110 to the deep learning-based object detection algorithm and extracts the features of the objects included in the frame, object by object.
  • The extracted object features include, as shown in FIG. 3, the object's location information (x, y, w, h), class information (c), and reliability information (p). Taking FIG. 3 as an example, features can be extracted per object: object 1 (x1, y1, w1, h1, c1, p1), object 2 (x2, y2, w2, h2, c2, p2), and object 3 (x3, y3, w3, h3, c3, p3).
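Purely for illustration, the per-object feature tuple of FIG. 3 could be held in a structure like the following; the `Detection` name and all numeric values are invented for this example:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    x: float   # box x coordinate
    y: float   # box y coordinate
    w: float   # box width
    h: float   # box height
    c: str     # class information
    p: float   # reliability (confidence) information

# the three objects of FIG. 3, with made-up values
objects = [
    Detection(x=120, y=80, w=60, h=40, c="car", p=0.91),
    Detection(x=300, y=90, w=80, h=60, c="truck", p=0.80),
    Detection(x=50, y=200, w=30, h=70, c="person", p=0.45),
]
```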
  • Accordingly, the second feature extractor 130 calculates a box value for each object using the location information (x, y, w, h), extracts a set number of confidence values in descending order of the confidence values contained in the reliability information, and, using the box values of the objects corresponding to the extracted confidence values, combines the features of those objects into one dimension for the fully-connected (FC) layer operation.
  • FIG. 4 is a diagram showing a clear image by way of example in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
  • As shown in FIG. 4, when the frame input from the input unit 110 is a clear image, the second feature extractor 130 can extract features for all objects (objects 1, 2, and 3 in FIG. 4).
  • At this time, a box value (Box_i) is calculated using the location information contained in each extracted object feature, and a confidence value (Conf_i) for the classified class is extracted.
  • The class and confidence values are produced automatically by the deep learning-based object detector. Taking FIG. 4 as an example, if object 1 is classified by the object detector as a truck with a confidence value (Conf_i) of 80%, the probability that object 1 is in fact a truck is high.
  • The second feature extractor 130 then uses the box values of the objects corresponding to the extracted confidence values to combine the features of those objects into one dimension for the FC layer operation.
  • FIG. 5 is a diagram exemplarily illustrating a blurred image in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
  • Unlike FIG. 4, when the frame received from the input unit 110 is an unclear (blurred) image as in FIG. 5, features are extracted for all objects (objects 1, 2, and 3 in FIG. 5) through the deep learning-based object detector in the same way; however, object 3 in the image of FIG. 5 cannot be extracted, so its box value and confidence value come out as 0. The second feature extractor 130 again uses the box values of the objects corresponding to the extracted confidence values to combine the features of those objects into one dimension for the FC layer operation.
  • FIG. 6 is an example of arranging box values and confidence values calculated in the second feature extraction unit of FIG. 1 .
  • As shown in FIG. 6, the second feature extractor 130 sorts the confidence values in descending order and then, using the box values of the objects corresponding to the ten highest confidence values, merges the features of those objects into one dimension for the FC layer operation, as in the sketch below.
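A sketch of this selection step, assuming each detection has already been reduced to a (box value, confidence) pair and that missing detections contribute zeros, as in the blurred-image case above; the interleaved layout is an illustrative choice, not something the patent specifies:

```python
import numpy as np

def combine_object_features(box_values, confidences, top_k=10):
    """Sort detections by confidence in descending order, keep the top_k,
    and lay their (box value, confidence) pairs out as one 1-D vector for
    the FC layer; unused slots stay zero."""
    order = np.argsort(confidences)[::-1][:top_k]
    feat = np.zeros(2 * top_k)
    for slot, idx in enumerate(order):
        feat[2 * slot] = box_values[idx]
        feat[2 * slot + 1] = confidences[idx]
    return feat

# e.g. the FIG. 5 case, where the undetected object contributes zeros
print(combine_object_features([0.42, 0.31, 0.0], [0.91, 0.80, 0.0]))
```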
  • At this time, the second feature extractor 130 calculates the box value using the x coordinate (x), the y coordinate (y), the bounding-box width (w), and the bounding-box height (h) contained in the object's location information; the box value is calculated by Equation 1 below (published as an equation image), where:
  • BOX is the box value
  • V_size is the size of the box
  • V_ratio is the aspect ratio of the box
  • w_0 is the horizontal pixel count of the detection box
  • w_t is the horizontal pixel count of the image
  • h_0 is the vertical pixel count of the detection box
  • h_t is the vertical pixel count of the image
  • axis_min is the minimum of the detection box's horizontal and vertical pixel counts
  • axis_max is the maximum of the detection box's horizontal and vertical pixel counts
  • λ_size is a size-adjustment constant parameter
  • λ_ratio is a ratio-adjustment constant parameter
  • That is, the box value is calculated as the product of V_size, determined by the size of the box, and V_ratio, determined by the aspect ratio of the box.
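The published Equation 1 exists only as an image in this text, so only its product structure and the variable list are certain. A plausible reading consistent with those definitions, offered strictly as an assumption rather than the patented formula (the question-marked equalities are guesses from the variable names alone):

```latex
\mathrm{BOX} \;=\; V_{\mathrm{size}} \cdot V_{\mathrm{ratio}},
\qquad
V_{\mathrm{size}} \;\overset{?}{=}\; \lambda_{\mathrm{size}}\,\frac{w_0\,h_0}{w_t\,h_t},
\qquad
V_{\mathrm{ratio}} \;\overset{?}{=}\; \lambda_{\mathrm{ratio}}\,\frac{\mathrm{axis}_{\min}}{\mathrm{axis}_{\max}}
```

Only the outer product is stated in the text; the normalized-area and min/max aspect terms, scaled by the λ constants, are hedged reconstructions.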
  • The operation unit 140 concatenates the frame features extracted by the first feature extractor 120 with the object features extracted by the second feature extractor 130 and inputs them to the artificial neural network for computation.
  • FIG. 7 is an exemplary diagram for explaining an artificial neural network calculation process in the calculation unit of FIG. 1 .
  • As shown in FIG. 7, the calculation unit 140 mixes the frame features (a) extracted by the first feature extraction unit 120 with the object features (b) extracted by the second feature extraction unit 130, performs the computation using the artificial neural network, and outputs the computation result (Normal or Abnormal), as in the sketch below.
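A minimal NumPy sketch of this mixing and computation, assuming placeholder layer sizes; the patent specifies neither the network dimensions nor the training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def ann_forward(frame_feat, object_feat, w1, b1, w2, b2):
    """Concatenate (mix) the frame and object feature vectors and run a
    small fully-connected network that yields two logits: (Normal, Abnormal)."""
    x = np.concatenate([frame_feat, object_feat])
    hidden = relu(w1 @ x + b1)
    return w2 @ hidden + b2

# toy shapes for illustration only: 64-d frame features, 20-d object features
frame_feat, object_feat = rng.normal(size=64), rng.normal(size=20)
w1, b1 = rng.normal(size=(32, 84)), np.zeros(32)
w2, b2 = rng.normal(size=(2, 32)), np.zeros(2)
logits = ann_forward(frame_feat, object_feat, w1, b1, w2, b2)
```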
  • Finally, the determination unit 150 determines whether the camera is occluded according to the computation result of the operation unit 140.
  • At this time, the determination unit 150 may determine whether the camera is occluded according to the class classification result contained in the computation result, using a softmax function.
  • Here, the softmax function is a multi-dimensional generalization of the logistic function; it is used in multinomial logistic regression and is widely used as the final activation function for obtaining a probability distribution from an artificial neural network. Contrary to its name, it is a smooth approximation not of the max function itself but of the one-hot arg max function, which indicates the argument of the maximum value.
  • It is computed by exponentiating each input value with base e (the base of the natural logarithm) and dividing by the sum of those exponentials.
  • In other words, the softmax function makes the K class scores produced by the artificial neural network interpretable as probabilities.
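That calculation, written out as a short NumPy sketch; the max is subtracted first for numerical stability, which does not change the result:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Exponentiate each input with base e and divide by the sum of the
    exponentials, turning K class scores into a probability distribution."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, -1.0])))  # -> approximately [0.95, 0.05]
```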
  • Accordingly, when the final computation result of the calculation unit 140 has a high probability of being Normal, the determination unit 150 judges the camera to be in a normal state; when it has a high probability of being Abnormal, it judges that occlusion of the camera has occurred.
  • In the embodiment of the present invention, if only the original frame were used to classify camera occlusion, accuracy would drop when the original frame is blurry, as in FIG. 5, even when the camera really is occluded by an object, because the ability to extract information about objects in the frame is poor from the original frame alone.
  • The embodiment therefore feeds the object detection results extracted by the deep learning-based object detector, together with the feature information extracted from the original frame, into the artificial neural network to make the final occlusion judgment, improving judgment accuracy.
  • FIG. 8 is a flowchart illustrating the operation flow of a vehicle camera occlusion classification method using a deep learning-based object detector according to an embodiment of the present invention; the detailed operation of the invention is described with reference to it.
  • According to an embodiment of the present invention, the input unit 110 first receives an original image captured by the camera in units of frames (S10).
  • Next, the first feature extractor 120 reduces the size of the frame received in step S10 and inputs it to the convolutional neural network to extract the frame features (S20).
  • In step S20, the first feature extraction unit 120 reduces the size of the input frame to 100x100 using nearest-neighbor interpolation, inputs it to the convolutional neural network, and unfolds the extracted frame features into one dimension for the artificial neural network computation.
  • Next, the second feature extraction unit 130 extracts the features of the objects included in the frame received in step S10 using the object detection algorithm (S30).
  • In step S30, the second feature extractor 130 extracts, for each object, features including the object's location information, class information, and reliability information using the deep learning-based object detection algorithm.
  • In detail, the second feature extractor 130 calculates a box value for each object using the location information, extracts a set number of confidence values in descending order of the confidence values contained in the reliability information, and, using the box values of the objects corresponding to the extracted confidence values, combines the features of those objects into one dimension for the FC layer operation.
  • At this time, the box value is calculated by Equation 1 above.
  • Next, the calculation unit 140 mixes the frame features extracted in step S20 with the object features extracted in step S30 and inputs them to the artificial neural network for computation (S40).
  • In step S40, the calculation unit 140 performs the computation using the artificial neural network and outputs the computation result (Normal or Abnormal).
  • Finally, the determination unit 150 determines whether the camera is occluded according to the computation result of step S40 (S50).
  • At this time, the determination unit 150 may determine whether the camera is occluded according to the class classification result contained in the computation result, using the softmax function.
  • The determination unit 150 judges the camera to be in a normal state when the final computation result of the calculation unit 140 has a high probability of being Normal, and judges that camera occlusion has occurred when it has a high probability of being Abnormal.
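Tying steps S40 and S50 together, the final decision reduces to comparing the two softmax probabilities; the (Normal, Abnormal) index order here is an assumption for illustration:

```python
def classify_occlusion(probs) -> str:
    """S50: report occlusion when the Abnormal probability dominates."""
    return "Abnormal (occluded)" if probs[1] > probs[0] else "Normal"

# e.g. classify_occlusion(softmax(logits)) using the sketches above
```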
  • Such a vehicle camera occlusion classification apparatus and method using a deep learning-based object detector may be implemented as an application or implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium.
  • the computer readable recording medium may include program instructions, data files, data structures, etc. alone or in combination.
  • Program instructions recorded on the computer-readable recording medium may be those specially designed and configured for the present invention, or those known and usable to those skilled in the art of computer software.
  • Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Examples of program instructions include not only machine language code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.
  • the hardware device may be configured to act as one or more software modules to perform processing according to the present invention.
  • As described above, the vehicle camera occlusion classification apparatus and method using a deep learning-based object detector classify, from a frame of the camera image, whether the camera is occluded by an object using a deep learning object detector; this prevents the camera from failing to detect or erroneously detecting objects, thereby reducing the probability of an accident.
  • Moreover, because they detect whether the camera sensor is occluded, they can play an important role in autonomous vehicles, and they can be applied to various systems that use camera sensors besides vehicles, so they can be used universally in many fields.
  • 100: vehicle camera occlusion classification device, 110: input unit
  • 120: first feature extraction unit, 130: second feature extraction unit
  • 140: calculation unit, 150: determination unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a vehicle camera occlusion classification device using a deep learning-based object detector and a method thereof. The vehicle camera occlusion classification device using the deep learning-based object detector according to the present invention comprises: an input unit which receives a captured original image as input from a camera frame by frame; a first feature extraction unit which extracts features of the input frames by reducing the size of the frames and then inputting same to a convolutional neural network (CNN); a second feature extraction unit which uses an object detection algorithm to extract features of objects included in the frames input from the input unit; a calculation unit which performs a calculation by mixing the features of the frames and the features of the objects and then inputting same to an artificial neural network (ANN); and a determination unit which determines whether the camera is occluded according to the result of the calculation.

Description

Vehicle camera occlusion classification apparatus and method using a deep learning-based object detector
The present invention relates to a vehicle camera occlusion classification apparatus and method using a deep learning-based object detector, and more particularly, to an apparatus and method that use a deep learning object detector to classify, from a frame of a camera image, whether the camera is occluded.
Recently, much research related to autonomous driving, particularly in the automotive field, has been conducted in connection with unmanned autonomous driving systems.
A self-driving vehicle is a vehicle that reaches its destination on its own without the driver operating the steering wheel, accelerator pedal, or brakes; it refers to a smart vehicle incorporating the autonomous navigation technologies already applied to aircraft and ships.
Autonomous driving systems and advanced driver assistance systems installed in such vehicles automatically control the driving of the vehicle from a starting point to an end point on the road, or assist the driver, using GPS location information and signals acquired from various sensors on top of road map information, enabling safe driving.
To perform autonomous driving smoothly in this way, it must be possible to control the movement of the vehicle by collecting and processing various sensor data.
In particular, an autonomous driving system requires a sensor capable of recognizing surrounding objects and a graphics processing unit in order to recognize and judge, in real time, the driving environment of a vehicle moving at high speed.
At this time, the sensor measures the distance to surrounding objects and detects dangers, helping to cover all areas without blind spots, while the graphics processing unit grasps the surroundings of the vehicle through multiple cameras and analyzes the images so that the vehicle can travel safely.
However, when the camera sensor is occluded by an object for any of various reasons, the probability that it fails to detect or erroneously detects objects increases, which can lead to serious accidents.
Therefore, for safe and smooth autonomous driving, it is necessary to develop technology for accurately detecting whether the camera sensor is occluded.
The background technology of the present invention is disclosed in Korean Patent Registration No. 10-2253989 (published May 20, 2021).
The technical problem to be achieved by the present invention is to provide a vehicle camera occlusion classification apparatus and method using a deep learning-based object detector, which classifies from a frame of a camera image whether the camera is occluded by an object.
To achieve this technical problem, a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention includes: an input unit for receiving an original image captured by a camera in frame units; a first feature extractor that reduces the size of an input frame and inputs it to a convolutional neural network (CNN) to extract features of the frame; a second feature extractor that extracts features of objects included in the frame received from the input unit using an object detection algorithm; a calculation unit that mixes the frame features and the object features and inputs them to an artificial neural network (ANN) for computation; and a determination unit that determines whether the camera is occluded based on the computation result.
At this time, the first feature extractor may reduce the size of the frame to 100x100 using nearest-neighbor interpolation, input it to the convolutional neural network, and unfold the extracted frame features into one dimension.
In addition, the second feature extractor may extract, for each object, features including the object's location information, class information, and reliability information using a deep learning-based object detection algorithm; calculate a box value for each object from the location information; extract a set number of confidence values in descending order of the confidence values contained in the reliability information; and, using the box values of the objects corresponding to the extracted confidence values, combine the features of those objects into one dimension for the fully-connected (FC) layer operation.
In addition, the second feature extractor calculates the box value using the x coordinate, the y coordinate, the bounding-box width, and the bounding-box height (relative to the image size) contained in the object's location information; the box value can be calculated by the following formula.
[Equation 1: box-value formula, published as an equation image and not reproduced here]
Here, BOX is the box value, V_size is the size of the box, V_ratio is the aspect ratio of the box, w_0 is the horizontal pixel count of the detection box, w_t is the horizontal pixel count of the image, h_0 is the vertical pixel count of the detection box, h_t is the vertical pixel count of the image, axis_min is the minimum of the detection box's horizontal and vertical pixel counts, axis_max is the maximum of the detection box's horizontal and vertical pixel counts, λ_size is a size-adjustment constant parameter, and λ_ratio is a ratio-adjustment constant parameter.
In addition, the determination unit may determine whether the camera is occluded according to the class classification result contained in the computation result, using a softmax function.
A vehicle camera occlusion classification method using a deep learning-based object detector according to another embodiment of the present invention includes: receiving an original image captured by a camera in frame units; reducing the size of the input frame and inputting it to a convolutional neural network (CNN) to extract features of the frame; extracting features of objects included in the input frame using an object detection algorithm; mixing the frame features and the object features and inputting them to an artificial neural network (ANN) for computation; and determining whether the camera is occluded according to the computation result.
In addition, the frame feature extraction step may reduce the size of the frame to 100x100 using nearest-neighbor interpolation, input it to the convolutional neural network, and unfold the extracted frame features into one dimension.
In addition, the object feature extraction step may include: extracting, for each object, features including the object's location information, class information, and reliability information using a deep learning-based object detection algorithm; calculating a box value for each object from the location information; extracting a set number of confidence values in descending order of the confidence values contained in the reliability information; and, using the box values of the objects corresponding to the extracted confidence values, combining the features of those objects into one dimension for the fully-connected (FC) layer operation.
In addition, the object feature extraction step calculates the box value using the x coordinate, the y coordinate, the bounding-box width, and the bounding-box height (relative to the image size) contained in the object's location information; the box value can be calculated by the following formula.
[Equation 1: box-value formula, published as an equation image and not reproduced here]
Here, BOX is the box value, V_size is the size of the box, V_ratio is the aspect ratio of the box, w_0 is the horizontal pixel count of the detection box, w_t is the horizontal pixel count of the image, h_0 is the vertical pixel count of the detection box, h_t is the vertical pixel count of the image, axis_min is the minimum of the detection box's horizontal and vertical pixel counts, axis_max is the maximum of the detection box's horizontal and vertical pixel counts, λ_size is a size-adjustment constant parameter, and λ_ratio is a ratio-adjustment constant parameter.
In addition, the step of determining whether the camera is occluded may determine occlusion according to the class classification result contained in the computation result, using a softmax function.
As described above, according to the present invention, classifying from a frame of the camera image whether the camera is occluded by an object, using the deep learning object detector, prevents the camera from failing to detect or erroneously detecting objects, thereby reducing the probability of an accident.
In addition, according to the present invention, because it detects whether the camera sensor is occluded, it can play an important role in autonomous vehicles, and it can be applied to various systems that use camera sensors besides vehicles, so it can be used universally in many fields.
FIG. 1 is a block diagram showing a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
FIGS. 2 and 3 are exemplary diagrams for explaining the second feature extraction unit of FIG. 1.
FIG. 4 is a diagram showing, by way of example, a clear image in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
FIG. 5 is a diagram showing, by way of example, a blurred image in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
FIG. 6 is an example of sorting the box values and confidence values calculated by the second feature extraction unit of FIG. 1.
FIG. 7 is an exemplary diagram for explaining the artificial neural network computation process in the calculation unit of FIG. 1.
FIG. 8 is a flowchart illustrating the operation flow of a vehicle camera occlusion classification method using a deep learning-based object detector according to an embodiment of the present invention.
Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings. In this process, the thicknesses of lines and the sizes of components shown in the drawings may be exaggerated for clarity and convenience of description.
In addition, the terms described below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Definitions of these terms should therefore be made based on the content throughout this specification.
Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.
First, a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention will be described with reference to FIGS. 1 to 7.
FIG. 1 is a block diagram showing a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
As shown in FIG. 1, the vehicle camera occlusion classification apparatus 100 using the deep learning-based object detector according to an embodiment of the present invention includes an input unit 110, a first feature extraction unit 120, a second feature extraction unit 130, a calculation unit 140, and a determination unit 150.
First, the input unit 110 receives an original image captured by a camera (not shown) in units of frames.
At this time, the camera may be a camera used in an autonomous vehicle or the like, or a sensor camera used in various other systems.
The first feature extraction unit 120 reduces the size of the frame input through the input unit 110 and inputs it to a convolutional neural network (CNN) to extract features of the frame.
In detail, the first feature extraction unit 120 reduces the size of the frame input through the input unit 110 to 100x100 using nearest-neighbor interpolation, inputs it to the convolutional neural network, and unfolds the extracted frame features into one dimension (1-D) for the artificial neural network (ANN) computation.
At this time, the convolutional neural network in the first feature extractor 120 operates in the same way as a commonly used convolutional neural network, so a detailed description is omitted.
The second feature extraction unit 130 extracts features of the objects included in the frame received from the input unit 110 using a deep learning-based object detection algorithm.
At this time, the second feature extractor 130 extracts, for each object, features including the object's location information, class information, and reliability information.
FIGS. 2 and 3 are exemplary diagrams for explaining the second feature extraction unit of FIG. 1.
As shown in FIG. 2, the second feature extraction unit 130 inputs the frame received from the input unit 110 to the deep learning-based object detection algorithm and extracts the features of the objects included in the frame, object by object.
At this time, the extracted object features include, as shown in FIG. 3, the object's location information (x, y, w, h), class information (c), and reliability information (p). Taking FIG. 3 as an example, the deep learning-based object detection algorithm can extract features per object: object 1 (x1, y1, w1, h1, c1, p1), object 2 (x2, y2, w2, h2, c2, p2), and object 3 (x3, y3, w3, h3, c3, p3).
Accordingly, the second feature extractor 130 calculates a box value for each object using the location information (x, y, w, h), extracts a set number of confidence values in descending order of the confidence values contained in the reliability information, and, using the box values of the objects corresponding to the extracted confidence values, combines the features of those objects into one dimension for the fully-connected (FC) layer operation.
FIG. 4 is a diagram showing, by way of example, a clear image in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
As shown in FIG. 4, when the frame input from the input unit 110 is a clear image, the second feature extractor 130 can extract features for all objects (objects 1, 2, and 3 in FIG. 4). At this time, a box value (Box_i) is calculated using the location information contained in each extracted object feature, and a confidence value (Conf_i) for the classified class is extracted.
At this time, the class and confidence values are extracted automatically by the deep learning-based object detector. Taking FIG. 4 as an example, if object 1 is classified by the object detector as a truck with a confidence value (Conf_i) of 80%, the probability that object 1 is in fact a truck is high.
In addition, the second feature extractor 130 uses the box values of the objects corresponding to the extracted confidence values to combine the features of those objects into one dimension for the FC layer operation.
FIG. 5 is a diagram showing, by way of example, a blurred image in a vehicle camera occlusion classification apparatus using a deep learning-based object detector according to an embodiment of the present invention.
Unlike FIG. 4, when the frame received from the input unit 110 is an unclear (blurred) image as in FIG. 5, features can be extracted for all objects (objects 1, 2, and 3 in FIG. 5) through the deep learning-based object detector in the same way as in FIG. 4. However, object 3 in the image of FIG. 5 cannot be extracted, so its box value and confidence value are extracted as 0.
In addition, the second feature extractor 130 uses the box values of the objects corresponding to the extracted confidence values to combine the features of those objects into one dimension for the FC layer operation.
FIG. 6 is an example of sorting the box values and confidence values calculated by the second feature extraction unit of FIG. 1.
As shown in FIG. 6, the second feature extractor 130 sorts the confidence values in descending order and then, using the box values of the objects corresponding to the ten highest confidence values, merges the features of those objects into one dimension for the FC layer operation.
At this time, the second feature extractor 130 calculates the box value using the x coordinate (x), the y coordinate (y), the bounding-box width (w), and the bounding-box height (h) contained in the object's location information; the box value is calculated by Equation 1 below.
[Equation 1: box-value formula, published as an equation image and not reproduced here]
Here, BOX is the box value, V_size is the size of the box, V_ratio is the aspect ratio of the box, w_0 is the horizontal pixel count of the detection box, w_t is the horizontal pixel count of the image, h_0 is the vertical pixel count of the detection box, h_t is the vertical pixel count of the image, axis_min is the minimum of the detection box's horizontal and vertical pixel counts, axis_max is the maximum of the detection box's horizontal and vertical pixel counts, λ_size is a size-adjustment constant parameter, and λ_ratio is a ratio-adjustment constant parameter.
That is, the box value is calculated as the product of V_size, determined by the size of the box, and V_ratio, determined by the aspect ratio of the box.
The operation unit 140 concatenates the frame features extracted by the first feature extractor 120 with the object features extracted by the second feature extractor 130 and inputs them to the artificial neural network for computation.
FIG. 7 is an exemplary diagram for explaining the artificial neural network computation process in the calculation unit of FIG. 1.
As shown in FIG. 7, the calculation unit 140 mixes the frame features (a) extracted by the first feature extraction unit 120 with the object features (b) extracted by the second feature extraction unit 130, performs the computation using the artificial neural network, and outputs the computation result (Normal or Abnormal).
At this time, the artificial neural network in the calculation unit 140 operates in the same way as a commonly used artificial neural network, so a detailed description is omitted.
Finally, the determination unit 150 determines whether the camera is occluded according to the computation result of the operation unit 140.
At this time, the determination unit 150 may determine whether the camera is occluded according to the class classification result contained in the computation result, using a softmax function.
Here, the softmax function is a multi-dimensional generalization of the logistic function; it is used in multinomial logistic regression and is widely used as the final activation function for obtaining a probability distribution from an artificial neural network. Contrary to its name, it is a smooth approximation not of the max function itself but of the one-hot arg max function, which indicates the argument of the maximum value. It is computed by exponentiating each input value with base e and dividing by the sum of those exponentials.
In other words, the softmax function makes the K class scores produced by the artificial neural network interpretable as probabilities.
Accordingly, when the final computation result of the calculation unit 140 has a high probability of being Normal, the determination unit 150 judges the camera to be in a normal state; when it has a high probability of being Abnormal, it judges that occlusion of the camera has occurred.
In the embodiment of the present invention, if only the original frame were used to classify camera occlusion, accuracy would drop when the original frame is blurry, as in FIG. 5, even when the camera really is occluded by an object, because the ability to extract information about objects in the frame is poor from the original frame alone.
Therefore, in the embodiment of the present invention, the object detection results extracted by the deep learning-based object detector are fed, together with the feature information extracted from the original frame, into the artificial neural network to make the final occlusion judgment, improving judgment accuracy.
A vehicle camera occlusion classification method using a deep learning-based object detector according to an embodiment of the present invention is described below with reference to FIG. 8.
FIG. 8 is a flowchart illustrating the operation of the vehicle camera occlusion classification method using a deep learning-based object detector according to an embodiment of the present invention; the specific operation of the invention is described with reference to it.
According to an embodiment of the present invention, the input unit 110 first receives the original image captured by the camera in units of frames (S10).
Next, the first feature extraction unit 120 reduces the size of the frame received in step S10 and inputs it to the convolutional neural network to extract the features of the frame (S20).
In step S20, the first feature extraction unit 120 reduces the size of the input frame to 100x100 using nearest-neighbor interpolation, inputs the reduced frame to the convolutional neural network, and flattens the extracted frame features into one dimension for the artificial neural network computation.
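A minimal sketch of step S20, under an assumed input resolution and an assumed convolutional stack (the patent does not specify the CNN architecture), might look like this:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    frame = torch.randn(1, 3, 720, 1280)                           # one RGB camera frame
    small = F.interpolate(frame, size=(100, 100), mode="nearest")  # nearest-neighbor resize

    cnn = nn.Sequential(                                           # illustrative layers only
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    )
    frame_feat = cnn(small).flatten(start_dim=1)                   # flatten to one dimension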
Next, the second feature extraction unit 130 extracts the features of the objects included in the frame received in step S10 using an object detection algorithm (S30).
In step S30, the second feature extraction unit 130 uses a deep learning-based object detection algorithm to extract, for each object in the frame, object features comprising location information, class information, and confidence information.
Specifically, the second feature extraction unit 130 calculates a box value for each object from its location information, extracts a set number of confidence values in descending order of the confidence values contained in the confidence information, and, using the box values of the objects corresponding to the extracted confidence values, merges the features of those objects into one dimension for the FC layer computation. The box value is calculated by Equation 1 above.
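The post-processing of step S30 can be sketched as below: keep the set number (K) of most confident detections and flatten their features into a single one-dimensional vector. Since Equation 1 appears only as an image, box_value here is a hypothetical stand-in consistent with the stated variable definitions (V_size tied to the detection box's share of the image, V_ratio to axis_min/axis_max, each scaled by a λ constant); the actual formula may differ.

    def box_value(w0, h0, wt, ht, lam_size=1.0, lam_ratio=1.0):
        # Hypothetical reading of Equation 1 (image not reproduced in the text)
        v_size = lam_size * (w0 * h0) / (wt * ht)        # box area relative to the image
        v_ratio = lam_ratio * min(w0, h0) / max(w0, h0)  # axis_min / axis_max
        return v_size * v_ratio

    # Illustrative detections: (confidence, class_id, x, y, box_w, box_h)
    detections = [
        (0.92, 2, 0.31, 0.55, 120, 80),
        (0.75, 0, 0.66, 0.40, 60, 160),
        (0.41, 5, 0.12, 0.70, 30, 30),
    ]
    K = 2  # the "set number" of confidence values to keep
    top = sorted(detections, key=lambda d: d[0], reverse=True)[:K]
    obj_feat = [v for (conf, cls, x, y, w, h) in top
                for v in (x, y, box_value(w, h, 1280, 720), cls, conf)]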
Next, the calculation unit 140 mixes the frame features extracted in step S20 with the object features extracted in step S30 and inputs the mixture to the artificial neural network for computation (S40).
In step S40, the calculation unit 140 performs the computation with the artificial neural network and outputs the computation result (Normal or Abnormal).
Finally, the determination unit 150 determines whether the camera is occluded according to the computation result of step S40 (S50).
Specifically, in step S50, the determination unit 150 may use the softmax function to determine whether the camera is occluded according to the class classification result contained in the computation result of the calculation unit 140.
More specifically, when the final computation result of the calculation unit 140 has a higher probability of being Normal, the determination unit 150 judges the camera to be in a normal state, and when the probability of being Abnormal is higher, it judges that camera occlusion has occurred.
The vehicle camera occlusion classification apparatus using a deep learning-based object detector and its method described above may be implemented as an application, or in the form of program instructions executable through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may contain program instructions, data files, data structures, and the like, alone or in combination.
The program instructions recorded on the computer-readable recording medium may be ones specially designed and configured for the present invention, or ones known and available to those skilled in the computer software field.
Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention.
As described above, the vehicle camera occlusion classification apparatus and method using a deep learning-based object detector according to an embodiment of the present invention classify, from the frames of the camera image, whether the camera is occluded by an object using a deep learning object detector, thereby preventing the camera from failing to detect objects or detecting them erroneously and reducing the probability of an accident.
Furthermore, according to an embodiment of the present invention, detecting whether the camera sensor is occluded can be of significant value in autonomous vehicles, and the technique can be applied to various systems that use camera sensors besides vehicles, so it can be used broadly across many fields.
The present invention has been described with reference to the embodiments shown in the drawings, but these are merely illustrative, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible therefrom. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the claims below.
[Description of Reference Numerals]
100: vehicle camera occlusion classification apparatus    110: input unit
120: first feature extraction unit    130: second feature extraction unit
140: calculation unit    150: determination unit

Claims (10)

  1. A vehicle camera occlusion classification apparatus using a deep learning-based object detector, comprising:
    an input unit that receives an original image captured by a camera in units of frames;
    a first feature extraction unit that reduces the size of the received frame and inputs it to a convolutional neural network (CNN) to extract features of the frame;
    a second feature extraction unit that extracts features of an object included in the frame received from the input unit using an object detection algorithm;
    a calculation unit that mixes the features of the frame and the features of the object and inputs them to an artificial neural network (ANN) for computation; and
    a determination unit that determines whether the camera is occluded according to the computation result.
  2. The vehicle camera occlusion classification apparatus using a deep learning-based object detector of claim 1, wherein the first feature extraction unit reduces the size of the frame to 100x100 using nearest-neighbor interpolation, inputs the reduced frame to the convolutional neural network, and flattens the extracted frame features into one dimension.
  3. The vehicle camera occlusion classification apparatus using a deep learning-based object detector of claim 1, wherein the second feature extraction unit extracts, for each object, object features comprising location information, class information, and confidence information of the objects in the frame using a deep learning-based object detection algorithm, and
    calculates a box value for each object using the location information, extracts a set number of confidence values in descending order of the confidence values contained in the confidence information, and merges, using the box values of the objects corresponding to the extracted confidence values, the features of those objects into one dimension for a fully-connected (FC) layer computation.
  4. The vehicle camera occlusion classification apparatus using a deep learning-based object detector of claim 3, wherein the second feature extraction unit calculates the box value using an x-coordinate value and a y-coordinate value contained in the location information of the object, a bounding-box width relative to the image size, and a bounding-box height relative to the image size,
    the box value being calculated by the following equation:
    [Equation — Figure PCTKR2022017979-appb-img-000004, not reproduced in the text]
    where BOX is the box value, V_size is the size of the box, V_ratio is the aspect ratio of the box, w_0 is the detection box width in pixels, w_t is the image width in pixels, h_0 is the detection box height in pixels, h_t is the image height in pixels, axis_min is the smaller of the detection box's horizontal and vertical pixel dimensions, axis_max is the larger of the two, λ_size is a size-adjustment constant parameter, and λ_ratio is a ratio-adjustment constant parameter.
  5. The vehicle camera occlusion classification apparatus using a deep learning-based object detector of claim 1, wherein the determination unit uses a softmax function to determine whether the camera is occluded according to a class classification result contained in the computation result.
  6. A vehicle camera occlusion classification method performed by a vehicle camera occlusion classification apparatus using a deep learning-based object detector, the method comprising:
    receiving an original image captured by a camera in units of frames;
    reducing the size of the received frame and inputting it to a convolutional neural network (CNN) to extract features of the frame;
    extracting features of an object included in the received frame using an object detection algorithm;
    mixing the features of the frame and the features of the object and inputting them to an artificial neural network (ANN) for computation; and
    determining whether the camera is occluded according to the computation result.
  7. The vehicle camera occlusion classification method of claim 6, wherein extracting the features of the frame comprises reducing the size of the frame to 100x100 using nearest-neighbor interpolation, inputting the reduced frame to the convolutional neural network, and flattening the extracted frame features into one dimension.
  8. The vehicle camera occlusion classification method of claim 6, wherein extracting the features of the object comprises extracting, for each object, object features comprising location information, class information, and confidence information of the objects in the frame using a deep learning-based object detection algorithm, and further comprises:
    calculating a box value for each object using the location information;
    extracting a set number of confidence values in descending order of the confidence values contained in the confidence information; and
    merging, using the box values of the objects corresponding to the extracted confidence values, the features of those objects into one dimension for a fully-connected (FC) layer computation.
  9. The vehicle camera occlusion classification method of claim 8, wherein extracting the features of the object comprises calculating the box value using an x-coordinate value and a y-coordinate value contained in the location information of the object, a bounding-box width relative to the image size, and a bounding-box height relative to the image size,
    the box value being calculated by the following equation:
    [Equation — Figure PCTKR2022017979-appb-img-000005, not reproduced in the text]
    where BOX is the box value, V_size is the size of the box, V_ratio is the aspect ratio of the box, w_0 is the detection box width in pixels, w_t is the image width in pixels, h_0 is the detection box height in pixels, h_t is the image height in pixels, axis_min is the smaller of the detection box's horizontal and vertical pixel dimensions, axis_max is the larger of the two, λ_size is a size-adjustment constant parameter, and λ_ratio is a ratio-adjustment constant parameter.
  10. The vehicle camera occlusion classification method of claim 6, wherein determining whether the camera is occluded comprises using a softmax function to determine whether the camera is occluded according to a class classification result contained in the computation result.
PCT/KR2022/017979 2021-12-22 2022-11-15 Vehicle camera occlusion classification device using deep learning-based object detector and method thereof WO2023120988A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2021-0184967 2021-12-22
KR20210184967 2021-12-22
KR1020220005390A KR20230095747A (en) 2021-12-22 2022-01-13 Apparatus for classifying occlusion of a vehicle camera using object detector algorithm based on deep learning and method thereof
KR10-2022-0005390 2022-01-13

Publications (1)

Publication Number Publication Date
WO2023120988A1 true WO2023120988A1 (en) 2023-06-29

Family

ID=86902932

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/017979 WO2023120988A1 (en) 2021-12-22 2022-11-15 Vehicle camera occlusion classification device using deep learning-based object detector and method thereof

Country Status (1)

Country Link
WO (1) WO2023120988A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170034226A (en) * 2015-09-18 2017-03-28 삼성전자주식회사 Method and apparatus of object recognition, Method and apparatus of learning for object recognition
KR20190026116A (en) * 2017-09-04 2019-03-13 삼성전자주식회사 Method and apparatus of recognizing object
KR20190047243A (en) * 2017-10-27 2019-05-08 현대자동차주식회사 Apparatus and method for warning contamination of camera lens
KR20200039043A (en) * 2018-09-28 2020-04-16 한국전자통신연구원 Object recognition device and operating method for the same
US20210192745A1 (en) * 2019-12-18 2021-06-24 Clarion Co., Ltd. Technologies for detection of occlusions on a camera


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22911626

Country of ref document: EP

Kind code of ref document: A1