WO2020032506A1 - Visual sensing system and visual sensing method using the same
- Publication number
- WO2020032506A1 (PCT/KR2019/009734)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- feature map
- neural network
- visual
- visual sensing
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/98—Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
- G06V10/993—Evaluation of the quality of the acquired pattern
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- The present invention relates to a visual sensing system and a visual sensing method using the same, and more particularly, to a visual sensing system that, in order to increase the object recognition rate of a deep learning object detection model (e.g., R-CNN, YOLO, SSD), fuses in the temporal visual sensing results produced by a deep learning model for temporal information (e.g., MLP, LSTM, GRU), thereby reducing false recognition in the image and improving accuracy, and to a visual sensing method using the same.
- Deep learning is defined as a set of algorithms that attempts a high level of abstraction (summarizing the key content or functions in a large amount of data) through a combination of several nonlinear transformation techniques, and is applied to object detection and recognition in images, classification, speech recognition, and natural language processing.
- Object detection is a technology that has long been studied in the field of computer vision.
- Various studies have been conducted to raise image recognition and detection performance to the level of human vision.
- However, despite deep learning technologies such as convolutional neural networks, misrecognitions still occur, and the development of technologies to reduce such misrecognitions is urgently needed.
- FIG. 1 is an exemplary view illustrating recognition of objects in an image using a conventional visual sensing method. Even though there are two pedestrians, as shown in (a) of FIG. 1, only one person is recognized as a pedestrian; as shown in (b) of FIG. 1, even when the license plate is detected, another region can also be mistaken for a license plate. In addition, even when a fire occurs near the tree on the left side of the road as shown in (c) of FIG. 1, the fire is not detected at all, and another region is recognized instead.
- The existing visual sensing method has difficulty accounting for various environments (e.g., weather, place), uses only the spatial feature information within a single image, and no technology has been developed for object recognition based on analysis over time.
- the present invention has been made in view of the above-described problems, and an object of the present invention is to improve the accuracy of object recognition by allowing spatial and temporal visual sensing to be performed at the same time.
- both the spatial and temporal visual sensing are configured to share the feature extraction step, thereby reducing the amount of computation for feature extraction and reducing the execution time.
- a visual sensing system for detecting an object in an input image may be provided.
- The visual sensing system may include a feature extractor that generates a feature map of an image using a neural network, a spatial visual sensing unit that estimates the position and name of an object in the image using the generated feature map, a temporal visual sensing unit that detects the object in the image in time sequence based on the generated feature map, and
- a fusion decision unit that determines the object detection result in the image based on the results of the spatial and temporal visual sensing units.
- The neural network of the feature extractor may include a plurality of convolution layers for generating a feature map of the image and pooling layers for sampling the feature map.
- the spatial visual sensing unit of the visual sensing system may include a fully connected layer including at least one hidden layer.
- The visual sensing system according to an embodiment of the present invention may further include a storage unit for storing the feature map generated by the feature extractor, and the temporal visual sensing unit detects the object in the image over time using a recurrent neural network.
- The recurrent neural network may be trained based on the feature map stored in the storage unit.
- The neural network may be trained by inputting images on which screen flip, scaling, and rotation processing has been performed.
- a visual sensing method using a visual sensing system may be provided.
- The visual sensing method may include the steps of: inputting an image; generating a feature map of the image using a neural network; estimating the position and name of an object in the image using the feature map; detecting the object in the image in time sequence based on the feature map; and determining the object detection result in the image based on the estimation result and the detection result.
- The step of estimating the position and name of the object in the image using the feature map and the step of detecting the object in the image in time sequence based on the feature map may be performed simultaneously.
- The neural network of the visual sensing method may include a plurality of convolution layers for generating a feature map of the image and pooling layers for sampling the feature map.
- The step of estimating the position and name of the object in the image using the feature map may be performed using a fully connected layer composed of at least one hidden layer.
- The visual sensing method may further include the step of storing the feature map, generated in the step of generating a feature map of the image using a neural network, in a storage unit of the visual sensing system; the step of detecting the object in the image in time sequence based on the feature map may be performed using a recurrent neural network, and the recurrent neural network may be trained based on the stored feature map.
- the neural network may be trained by inputting an image on which screen flip, scaling, and rotation processing is performed.
- a computer-readable recording medium having recorded thereon a program for implementing the above method may be provided.
- According to the present invention, by performing spatial and temporal visual sensing simultaneously, false recognition in object detection in the image can be reduced.
- In addition, since both spatial and temporal visual sensing share the feature extraction step, the amount of computation for feature extraction and the execution time can be reduced.
- FIG. 1 is an exemplary view illustrating a state of recognizing an object in an image using a conventional visual sensing method.
- FIG. 2 is a block diagram illustrating a visual sensing system according to an exemplary embodiment.
- FIGS. 3 and 4 are exemplary views illustrating a visual sensing system according to an exemplary embodiment.
- FIG. 5 is an exemplary view showing a fire detection result using the visual sensing system according to an embodiment of the present invention.
- FIGS. 6A to 6C are exemplary diagrams illustrating data for training a visual sensing system according to an exemplary embodiment.
- FIGS. 7A and 7B are exemplary views illustrating a forest fire detection result using a visual sensing system according to an exemplary embodiment.
- FIGS. 8 to 10 are flowcharts illustrating a visual sensing method using a visual sensing system according to an exemplary embodiment.
- When any part of the specification is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.
- The terms "...unit", "module", etc. described in the specification mean a unit that processes at least one function or operation, which may be implemented by hardware, software, or a combination of hardware and software.
- When a part of the specification is "connected" to another part, this includes not only being "directly connected" but also being "connected with other elements in between".
- FIG. 1 is an exemplary view illustrating a state of recognizing an object in the image 10 using a conventional visual sensing method.
- various techniques for recognizing and classifying objects in an input image have been developed.
- Detection performance is being raised to the level of human vision by techniques for recognizing objects in images using deep learning algorithms (e.g., convolutional neural networks).
- However, the conventional visual sensing method has difficulty accounting for various environments (e.g., weather, place), uses only the spatial feature information within a single image 10, and technology for object recognition based on analysis over time has not been developed.
- FIG. 2 is a block diagram illustrating a visual sensing system according to an exemplary embodiment.
- The visual sensing system may include a feature extractor 100 that generates a feature map of the image 10 using a neural network, a spatial visual sensing unit 200 that estimates the position and name of an object in the image 10 using the generated feature map, a temporal visual sensing unit 300 that detects the object in the image 10 in time sequence based on the generated feature map, and
- a fusion decision unit 400 that determines the object detection result in the image 10 based on the result estimated by the spatial visual sensing unit 200 and the result detected by the temporal visual sensing unit 300.
- the feature extractor 100 of the visual sensing system may generate a feature map of the input image 10 using various neural network models.
- When a convolutional neural network (CNN) is used, the neural network may include a plurality of convolution layers for generating a feature map of the image 10 and pooling layers for sampling the feature map.
- The feature map generated by the feature extractor 100 may be shared by the spatial visual sensing unit 200 and the temporal visual sensing unit 300, as described below. That is, in the visual sensing system of the present invention, the estimation of the object in the image 10 by the spatial visual sensing unit 200 and the detection of the object in the image 10 by the temporal visual sensing unit 300 may be performed simultaneously.
- For this simultaneous object detection, the feature-map generation step in the feature extractor 100 may be shared by both the spatial visual sensing unit 200 and the temporal visual sensing unit 300. By allowing the feature map to be shared in this way, the amount of computation for feature extraction in the image 10 and the execution time may be reduced.
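The shared feature-extraction pipeline described above can be illustrated with a minimal sketch (a hypothetical numpy implementation; the single convolution kernel, the 2x2 pooling, and both detector heads are illustrative placeholders, not the patent's trained networks):

```python
import numpy as np

def extract_features(image, kernel):
    """Hypothetical single-convolution + 2x2 max-pool feature extractor.

    A real system would use a trained multi-layer CNN; this sketch only
    shows that the feature map is computed once per frame and reused.
    """
    h, w = image.shape
    kh, kw = kernel.shape
    conv = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(conv.shape[0]):
        for j in range(conv.shape[1]):
            conv[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    # 2x2 max pooling with stride 2 (samples the feature map)
    ph, pw = conv.shape[0] // 2, conv.shape[1] // 2
    return conv[:ph * 2, :pw * 2].reshape(ph, 2, pw, 2).max(axis=(1, 3))

def spatial_head(feature_map):
    # stand-in for the fully connected spatial detector (unit 200)
    return feature_map.mean() > 0.5

def temporal_head(feature_maps):
    # stand-in for the recurrent temporal detector (unit 300)
    return np.mean([f.mean() for f in feature_maps]) > 0.5

rng = np.random.default_rng(0)
frames = [rng.random((8, 8)) for _ in range(4)]   # images 10 in time order
kernel = np.ones((3, 3)) / 9.0                    # placeholder conv weights

# The feature map is extracted once per frame and shared by both heads.
shared = [extract_features(f, kernel) for f in frames]
spatial_result = spatial_head(shared[-1])
temporal_result = temporal_head(shared)
```

Because `extract_features` runs once per frame and its output feeds both heads, convolution work is not duplicated, which is the computational saving the patent claims for the shared feature extraction step.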
- In the spatial visual sensing unit 200, the position and name of the object may be estimated in the input image 10 based on the feature map generated by the feature extractor 100.
- The spatial visual sensing unit 200 of the present invention includes a fully connected layer composed of at least one hidden layer, and together with the feature extractor 100 of the present invention may constitute a convolutional neural network.
- Since the neural network of the feature extractor 100 is shared by the spatial visual sensing unit 200 and the temporal visual sensing unit 300, an object in the input single image 10 may be recognized by combining the convolution and pooling layers of the feature extractor 100 with the fully connected layer of the spatial visual sensing unit 200.
- The spatial visual sensing unit 200, combined with the feature extractor 100, may apply not only a convolutional neural network (CNN) model but also spatial object detection models such as R-CNN, Faster R-CNN, and YOLO (You Only Look Once); machine learning models can be applied in various ways depending on the user's purpose.
- In the temporal visual sensing unit 300, objects in the image 10 may be detected in time sequence based on the feature map generated by the feature extractor 100.
- Object detection in the temporal visual sensing unit 300 is performed separately from the detection by the spatial visual sensing unit 200 described above; even when the spatial visual sensing unit 200 cannot estimate the position or name of the object, the object may still be detected by the temporal visual sensing unit 300. As a result, the spatial visual sensing unit 200 and the temporal visual sensing unit 300 may complement each other in recognizing objects in the input image 10.
- The temporal visual sensing unit 300 may detect objects in the image 10 in time sequence in consideration of the entire sequence of inputs.
- Various temporal visual sensing algorithms 310 may be used for this purpose.
- The temporal visual sensing algorithm 310 may include a multi-layer perceptron (MLP), a recurrent neural network (RNN), a long short-term memory (LSTM), and the like.
- In addition, the accuracy of object detection may be improved using algorithms such as voting and ensembling over the results classified by an object detection classification model using the RNN or LSTM.
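One minimal reading of the voting scheme mentioned above is a majority vote over per-frame classification results (a sketch under that assumption; the class labels are hypothetical examples, not labels defined by the patent):

```python
from collections import Counter

def majority_vote(frame_labels):
    """Pick the class label emitted most often across a frame sequence.

    Stands in for the voting/ensemble step applied to the per-frame
    outputs of an RNN/LSTM classification model.
    """
    label, _count = Counter(frame_labels).most_common(1)[0]
    return label

labels = ["smoke", "smoke", "no_fire", "smoke", "flame"]
result = majority_vote(labels)  # "smoke" wins with 3 of 5 votes
```

A weighted vote or an ensemble of differently trained classifiers would follow the same shape, replacing the raw count with per-model confidence scores.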
- As described above, the feature map generated by the feature extractor 100 is shared by the spatial visual sensing unit 200 and the temporal visual sensing unit 300, object detection in the image 10 is performed by each of them, and the detection results may be comprehensively judged by the fusion decision unit 400.
- In the fusion decision unit 400 of the visual sensing system, when neither the spatial visual sensing unit 200 nor the temporal visual sensing unit 300 detects an object, the fusion result is determined as 'no alarm'; when only one of the spatial visual sensing unit 200 and the temporal visual sensing unit 300 detects an object, the fusion result may be determined as 'caution'; and when both the spatial visual sensing unit 200 and the temporal visual sensing unit 300 detect an object, the fusion decision unit 400 may determine the result as an 'alarm'.
- In the determination of the fusion decision unit 400 of the present invention, 'no alarm' means that no object (e.g., pedestrian, fire, etc.) is detected in the input image 10, and 'caution' means that the spatial visual sensing result and the temporal visual sensing result for the input image 10 differ, or that the judgment is ambiguous, so that the judgment must be held or retried.
- 'Alarm' means that both the spatial and temporal visual sensing results are positive, i.e., objects such as pedestrians or fires are detected.
- That is, the fusion decision unit 400 of the present invention makes a fused judgment based on the detection results of the spatial visual sensing unit 200 and the temporal visual sensing unit 300, and the results can be arranged together.
- The prediction threshold of the spatial visual sensing unit 200 can be readjusted and the judgment made again.
- When the combination of the feature extractor 100 and the spatial visual sensing unit 200 of the present invention is applied as a CNN, R-CNN, etc., the prediction threshold may correspond to one of the weight parameters, such as the number of nodes or hidden layers of the neural network.
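The three-level fusion rule described above (no alarm / caution / alarm) can be written directly as a truth table. This sketch assumes each sensing unit reduces to a single detected/not-detected flag:

```python
def fuse(spatial_detected: bool, temporal_detected: bool) -> str:
    """Fusion rule of the fusion decision unit 400:
    neither detector fires -> 'no alarm'
    exactly one fires      -> 'caution' (hold judgment, possibly retry
                              with a readjusted prediction threshold)
    both fire              -> 'alarm'
    """
    if spatial_detected and temporal_detected:
        return "alarm"
    if spatial_detected or temporal_detected:
        return "caution"
    return "no alarm"

result = fuse(spatial_detected=True, temporal_detected=False)  # 'caution'
```

In the 'caution' case the system could lower the spatial unit's prediction threshold and call `fuse` again, matching the readjust-and-retry behavior described above.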
- FIGS. 3 and 4 are exemplary views illustrating a visual sensing system according to an exemplary embodiment.
- A process of detecting a fire from the input images 10 by applying a visual sensing system according to an embodiment of the present disclosure is illustrated.
- Images 10 in time sequence are input, and for each image 10 a feature map may be generated by the feature extractor 100 of the present invention.
- The feature map generated by the feature extractor 100 is shared, being input to both the spatial visual sensing unit 200 and the temporal visual sensing unit 300, so that object detection can be performed by each of them.
- Here, a multilayer perceptron (MLP) algorithm is used as the visual sensing algorithm model of the temporal visual sensing unit 300.
- The final result may be determined based on the detection result of the spatial visual sensing unit 200 and the detection result of the temporal visual sensing unit 300.
- A feature map is formed through the neural network of the feature extractor 100 to detect or recognize an object in the input image 10.
- Object detection or recognition may mean recognizing an area identified as an object in a given image 10 as one of a plurality of pre-classified classes. For example, in the case of fire detection, a flame, spark, smoke, or the like may be the target object in the input image 10.
- Such object detection or recognition may be performed through machine learning or deep learning.
- It may thereby be determined to which of the plurality of classes the input image 10 belongs.
- The image 10 input to the visual sensing system may be an image on which screen flip, scaling, and rotation processing has been performed. That is, by performing the above flip, scaling, and rotation processes on the input image, various environments are taken into account, making robust detection possible with the visual sensing system of the present invention.
- In addition, the data set for neural network training of the feature extractor 100 may further include pre-processed images 10, such as flipped, scaled, or rotated images.
- The flip corresponds to varying the training data through screen switching, such as left-right or upside-down flipping of the images 10 in the existing data set.
- The scaling corresponds to adjusting the size of the existing image 10, and the scale rate may be set in various ways.
- For rotation, the input screen rotated at various angles may be included in the data set.
- Exemplary contents of the input data generation method for generating a feature map in the feature extractor 100 by varying the data set may be summarized as in Table 2 below.
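The flip, scaling, and rotation variations described above can be sketched as follows (a hypothetical numpy augmentation helper; the 2x nearest-neighbour upsample stands in for whatever scale rates the actual data set used, and a real pipeline would use an image library with proper interpolation):

```python
import numpy as np

def augment(image):
    """Generate flipped, scaled, and rotated variants of a training image.

    Each variant would be added to the data set alongside the original,
    so the feature extractor learns under varied viewing conditions.
    """
    return {
        "original": image,
        "flip_lr": np.fliplr(image),               # left-right flip
        "flip_ud": np.flipud(image),               # upside-down flip
        "rot90": np.rot90(image),                  # one of various angles
        "scale_2x": np.kron(image, np.ones((2, 2))),  # nearest-neighbour 2x
    }

img = np.arange(16.0).reshape(4, 4)  # toy 4x4 "image"
aug = augment(img)
```

Applying `augment` to every image in the training set multiplies its size fivefold here; in practice the set of angles and scale rates is chosen per application.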
- The visual sensing system according to an embodiment of the present invention may further include a storage unit for storing the feature map generated by the feature extractor 100; the temporal visual sensing unit 300 detects objects in the image 10 over time using a recurrent neural network, and the recurrent neural network can be trained based on the feature maps stored in the storage unit.
- That is, the temporal visual sensing unit 300 stores the feature maps produced through the neural network of the feature extractor 100 in the storage unit, and the recurrent neural network uses these stored feature maps.
- The neural network of the feature extractor 100 may be trained separately. That is, apart from the execution of the visual sensing system according to an embodiment of the present invention, the neural network structure used for training the feature extractor 100 may also be used jointly by the recurrent neural network of the temporal visual sensing unit 300.
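A minimal sketch of this temporal path follows: stored feature maps are fed, one per timestep, through a vanilla recurrent network whose final hidden state yields a detection score. The weights here are random placeholders standing in for the trained recurrent network (RNN/LSTM) of the temporal visual sensing unit 300:

```python
import numpy as np

def rnn_detect(feature_maps, w_in, w_rec, w_out):
    """Vanilla-RNN pass over a sequence of stored feature maps.

    Each feature map is flattened into one timestep input; the final
    hidden state is projected to a sigmoid detection score in (0, 1).
    """
    h = np.zeros(w_rec.shape[0])
    for fmap in feature_maps:
        x = fmap.ravel()
        h = np.tanh(w_in @ x + w_rec @ h)     # recurrent state update
    return 1.0 / (1.0 + np.exp(-(w_out @ h)))  # sigmoid detection score

rng = np.random.default_rng(42)
fmaps = [rng.random((3, 3)) for _ in range(5)]  # feature maps from storage
hidden = 4
w_in = rng.normal(size=(hidden, 9)) * 0.1       # placeholder weights
w_rec = rng.normal(size=(hidden, hidden)) * 0.1
w_out = rng.normal(size=hidden) * 0.1

score = rnn_detect(fmaps, w_in, w_rec, w_out)
detected = score > 0.5
```

Because the inputs are the feature maps already computed (and stored) by the feature extractor 100, the temporal path adds no extra feature-extraction cost of its own.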
- The visual sensing system of the present invention can be applied to various fields (e.g., vehicle license plate detection, pedestrian detection, CCTV monitoring, defective product inspection, or fire detection) for detecting an object in the input image 10.
- FIG. 5 is an exemplary view illustrating a fire detection result using a visual detection system.
- FIG. 5 (a) shows a state in which a fire is detected only through temporal visual sensing, and
- FIG. 5 (b) is an exemplary figure showing a state in which both spatial and temporal visual sensing are performed by the visual sensing system according to an embodiment of the present invention.
- In FIG. 5 (a), no flame or spark is seen at the top of the vehicle, and only black-gray smoke is observed.
- A fire is detected at the upper left corner, where
- the temporal visual detection result 301, determined as a detection, is shown.
- FIG. 5 (b) shows the vehicle burning; the flame or spark is marked with a blue box, which is the spatial visual detection result 201 detecting the fire.
- The temporal visual detection result also indicates a fire.
- Accordingly, the fusion decision unit 400 of the present invention may make a determination of 'fire alarm' based on the spatial and temporal visual sensing results with respect to FIG. 5 (b).
- FIGS. 6A to 6C are exemplary diagrams illustrating data for training a visual sensing system according to an exemplary embodiment. Training of the visual sensing system of the present invention can be performed using the above-described data sets (train/validation/test data sets). When applied to fire monitoring, various smoke or fire images 10, as shown in FIGS. 6A to 6C, can be utilized as the data set.
- FIGS. 7A and 7B are exemplary views illustrating a forest fire detection result using a visual sensing system according to an exemplary embodiment.
- In FIG. 7A, an image 10 in which clouds appear on the mountainside is shown.
- The spatial visual sensing unit 200 of the present invention detects a smoke area, which is shown in FIG. 7A through the blue box 201.
- However, the temporal visual sensing unit 300 of the present invention determines the detection result in time sequence as 'not detected', and FIG. 7A shows the corresponding result 401 determined as not detected.
- Considering each of the detection results of the spatial visual sensing unit 200 and the temporal visual sensing unit 300, the fusion decision unit 400 may thus determine the result as 'caution'.
- In FIG. 7B, an image 10 in which a fire occurs in a mountain building is shown.
- In the spatial visual sensing unit 200 of the present invention, the fire and the smoke are sensed, and the area where the fire occurred is shown in FIG. 7B.
- In addition, the temporal visual sensing unit 300 of the present invention determines the sensing result in time sequence as 'detected', and FIG. 7B shows a state in which the fusion result 401 is determined as detected. Accordingly, since both the spatial and temporal visual sensing results were judged as detections, the fusion decision unit 400 may fuse the results and determine them as 'fire alarm'.
- FIGS. 8 to 10 are flowcharts illustrating a visual sensing method using a visual sensing system according to an exemplary embodiment.
- a visual sensing method according to an exemplary embodiment may include: receiving an input image 10; generating a feature map of the image 10 using a neural network; estimating the position and name of an object in the image 10 using the feature map; detecting an object in the image 10 in chronological order based on the feature map; and determining the object detection result in the image 10 based on the results of the estimating and detecting steps.
- the step of estimating the position and the name of the object in the image 10 using the feature map and the step of detecting the object in the image 10 in chronological order based on the feature map may be performed at the same time.
- that is, in object detection, the results of the spatial and the temporal visual sensing, obtained in parallel, can be combined.
- by applying the result of the spatial visual sensing (for example, the feature extracted from the image 10) to the temporal visual sensing process, the amount of computation for feature extraction from the image 10 is not increased; rather, there is an advantage that the time required for object detection can be greatly reduced.
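As a sketch of this feature sharing, the toy functions below stand in for the extractor and the two branches (all three are illustrative assumptions, not the patent's actual networks). Note that extraction runs once per frame and both branches reuse the stored result:

```python
import numpy as np

np.random.seed(0)  # deterministic toy data

def extract_features(image):
    """Stand-in for the neural-network feature extractor: a 2x2
    average-pooled copy of the image (the real extractor is a CNN)."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def spatial_branch(feature_map):
    """Toy spatial detector: fires when any feature activation is large."""
    return bool(feature_map.max() > 0.5)

def temporal_branch(feature_maps):
    """Toy temporal detector over the stored feature-map sequence:
    fires when the average activity grows over time."""
    means = [fm.mean() for fm in feature_maps]
    return bool(means[-1] > means[0])

# Extraction runs once per frame; both branches consume the stored maps,
# so adding the temporal branch costs no extra feature extraction.
frames = [np.random.rand(8, 8) * (0.2 + 0.1 * t) for t in range(5)]
stored = [extract_features(f) for f in frames]
spatial_result = spatial_branch(stored[-1])
temporal_result = temporal_branch(stored)
```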
- a neural network of the visual sensing method may include a plurality of convolution layers for generating a feature map of the image 10 and a pooling layer for sampling the feature map.
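A minimal single-channel sketch of the convolution-plus-pooling feature extraction described above; the kernel values and layer sizes are assumptions for illustration, and a real implementation would stack several learned multi-channel layers:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (single channel): the building block of the
    feature-map-generating convolution layers."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Max-pooling layer that down-samples the feature map."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # crop to a multiple of the window
    fm = feature_map[:h, :w]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.rand(10, 10)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # illustrative kernel
fm = max_pool(np.maximum(conv2d(image, edge_kernel), 0))  # conv -> ReLU -> pool
print(fm.shape)  # (4, 4)
```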
- the step of estimating the position and the name of the object in the image 10 using the feature map may be performed using a fully connected layer including at least one hidden layer.
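A minimal sketch of such a fully connected head with one hidden layer; the 4-value bounding box / 2-class output split and the class names 'smoke' and 'fire' are illustrative assumptions, not specified dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected_head(features, w1, b1, w2, b2):
    """One hidden layer followed by an output layer that jointly predicts
    a bounding box (4 values) and class scores (here, 2 classes)."""
    hidden = np.maximum(features @ w1 + b1, 0.0)  # hidden layer with ReLU
    out = hidden @ w2 + b2
    box, class_scores = out[:4], out[4:]
    return box, class_scores

feat = rng.random(16)  # flattened feature map (size assumed)
w1, b1 = rng.standard_normal((16, 8)), np.zeros(8)
w2, b2 = rng.standard_normal((8, 6)), np.zeros(6)
box, scores = fully_connected_head(feat, w1, b1, w2, b2)
label = ["smoke", "fire"][int(np.argmax(scores))]  # the object's 'name'
```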
- the visual sensing method may further include storing, in a storage unit of the visual sensing system, the feature map generated in the step of generating the feature map of the image 10 using the neural network. The step of detecting the object in the image 10 in time sequence is performed using a recurrent neural network, which can be trained based on the stored feature maps.
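A minimal Elman-style sketch of a recurrent pass over the stored per-frame feature maps; the weight shapes, the tanh/sigmoid choices, and the 0.5 detection threshold are assumptions for illustration:

```python
import numpy as np

def rnn_over_feature_maps(feature_maps, w_x, w_h, w_out):
    """Run a simple recurrent network over a sequence of stored feature
    maps and score the final hidden state as a detection probability."""
    h = np.zeros(w_h.shape[0])
    for fm in feature_maps:
        x = fm.ravel()                        # flatten one stored feature map
        h = np.tanh(x @ w_x + h @ w_h)        # recurrent state update
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))  # sigmoid detection score

rng = np.random.default_rng(1)
stored_maps = [rng.random((4, 4)) for _ in range(6)]  # from the storage unit
w_x = rng.standard_normal((16, 8)) * 0.1
w_h = rng.standard_normal((8, 8)) * 0.1
w_out = rng.standard_normal(8)
score = rnn_over_feature_maps(stored_maps, w_x, w_h, w_out)
detected = score > 0.5  # 'detected' / 'not detected' decision
```

In practice an LSTM or GRU would replace the plain tanh update, but the data flow (stored feature maps in, one temporal decision out) is the same.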
- the neural network may be trained by inputting images on which flip, scaling, and rotation processing has been performed.
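A sketch of the described augmentation using NumPy's built-in flip and rotation, with nearest-neighbour upscaling standing in for the scaling step (the variant set and the 2x scale factor are assumptions):

```python
import numpy as np

def augment(image):
    """Generate flipped, scaled, and rotated variants of a training image."""
    return {
        "flip_h": np.fliplr(image),                  # horizontal flip
        "flip_v": np.flipud(image),                  # vertical flip
        "rot90": np.rot90(image),                    # 90-degree rotation
        "scaled": np.kron(image, np.ones((2, 2))),   # 2x nearest-neighbour upscale
    }

img = np.arange(16, dtype=float).reshape(4, 4)
aug = augment(img)
print(aug["scaled"].shape)  # (8, 8)
```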
- a computer-readable recording medium having recorded thereon a program for implementing the above method may be provided.
- the above-described method may be written as a program executable in a computer, and may be implemented in a general-purpose digital computer operating the program using a computer readable medium.
- the structure of the data used in the above-described method can be recorded on the computer-readable medium through various means.
- a recording medium recording an executable computer program or code for performing the various methods of the present invention should not be understood to include transitory media, such as carrier waves or signals.
- the computer-readable medium may include storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.) and optical reading media (e.g., CD-ROM, DVD, etc.).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Quality & Reliability (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
According to one embodiment, the present invention relates to a visual sensing system which may comprise: a feature extraction unit for generating a feature map of an image using a neural network; a spatial visual sensing unit for estimating the location and name of objects in the image using the generated feature map; a temporal visual sensing unit for detecting the objects in the image in chronological order on the basis of the generated feature map; and a fusion determination unit for determining a detection result of the objects in the image on the basis of the result estimated by the spatial visual sensing unit and the result detected by the temporal visual sensing unit.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2018-0091880 | 2018-08-07 | ||
KR1020180091880A KR102104548B1 (ko) | 2018-08-07 | 2018-08-07 | Visual sensing system and visual sensing method using the same |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020032506A1 true WO2020032506A1 (fr) | 2020-02-13 |
Family
ID=69414942
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2019/009734 WO2020032506A1 (fr) | 2019-08-05 | Vision detection system and vision detection method using same |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102104548B1 (fr) |
WO (1) | WO2020032506A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111680632A (zh) * | 2020-06-10 | 2020-09-18 | 深延科技(北京)有限公司 | Smoke and fire detection method and system based on a deep-learning convolutional neural network |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102161516B1 (ko) * | 2020-04-27 | 2020-10-05 | 주식회사 펜타게이트 | Vehicle model classification method using deep-learning-based data augmentation learning |
KR102238401B1 (ko) * | 2020-09-14 | 2021-04-09 | 전주비전대학교산학협력단 | License plate recognition method using data augmentation and a CNN |
KR102369164B1 (ko) * | 2021-06-22 | 2022-03-02 | 주식회사 시큐웍스 | Apparatus and method for fire and security monitoring using machine learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0644376A (ja) * | 1992-05-28 | 1994-02-18 | Matsushita Electric Ind Co Ltd | Image feature extraction device, image recognition method, and image recognition device |
KR20160091709A (ko) * | 2015-01-26 | 2016-08-03 | 창원대학교 산학협력단 | Fire detection system and method using spatio-temporal block features |
KR20170038622A (ko) * | 2015-09-30 | 2017-04-07 | 삼성전자주식회사 | Method and apparatus for segmenting an object from an image |
KR101855057B1 (ko) * | 2018-01-11 | 2018-05-04 | 셔블 테크놀러지(주) | Fire alarm system and method |
KR20180071947A (ko) * | 2016-12-20 | 2018-06-28 | 서울대학교산학협력단 | Image processing apparatus and method |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101415001B1 (ko) | 2012-10-30 | 2014-07-04 | 한국해양과학기술원 | Portable device for nucleic acid extraction or amplification |
- 2018-08-07 KR KR1020180091880A patent/KR102104548B1/ko active IP Right Grant
- 2019-08-05 WO PCT/KR2019/009734 patent/WO2020032506A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0644376A (ja) * | 1992-05-28 | 1994-02-18 | Matsushita Electric Ind Co Ltd | Image feature extraction device, image recognition method, and image recognition device |
KR20160091709A (ko) * | 2015-01-26 | 2016-08-03 | 창원대학교 산학협력단 | Fire detection system and method using spatio-temporal block features |
KR20170038622A (ko) * | 2015-09-30 | 2017-04-07 | 삼성전자주식회사 | Method and apparatus for segmenting an object from an image |
KR20180071947A (ko) * | 2016-12-20 | 2018-06-28 | 서울대학교산학협력단 | Image processing apparatus and method |
KR101855057B1 (ko) * | 2018-01-11 | 2018-05-04 | 셔블 테크놀러지(주) | Fire alarm system and method |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111680632A (zh) * | 2020-06-10 | 2020-09-18 | 深延科技(北京)有限公司 | Smoke and fire detection method and system based on a deep-learning convolutional neural network |
Also Published As
Publication number | Publication date |
---|---|
KR102104548B1 (ko) | 2020-04-24 |
KR20200019280A (ko) | 2020-02-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020032506A1 (fr) | Vision detection system and vision detection method using same | |
CN108596277B (zh) | 一种车辆身份识别方法、装置和存储介质 | |
WO2019107614A1 (fr) | Procédé et système d'inspection de qualité basée sur la vision artificielle utilisant un apprentissage profond dans un processus de fabrication | |
WO2013048159A1 (fr) | Procédé, appareil et support d'enregistrement lisible par ordinateur pour détecter un emplacement d'un point de caractéristique de visage à l'aide d'un algorithme d'apprentissage adaboost | |
WO2020071701A1 (fr) | Procédé et dispositif de détection d'un objet en temps réel au moyen d'un modèle de réseau d'apprentissage profond | |
WO2017164478A1 (fr) | Procédé et appareil de reconnaissance de micro-expressions au moyen d'une analyse d'apprentissage profond d'une dynamique micro-faciale | |
KR101910542B1 (ko) | 객체 검출을 위한 영상분석 서버장치 및 방법 | |
WO2017131263A1 (fr) | Procédé de sélection d'instance hybride utilisant le point voisin le plus proche pour la prédiction de défauts inter-projets | |
JP2022506905A (ja) | 知覚システムを評価するシステム及び方法 | |
WO2016108327A1 (fr) | Procédé de détection de véhicule, structure de base de données pour la détection de véhicule, et procédé de construction de base de données pour détection de véhicule | |
WO2021020866A1 (fr) | Système et procédé d'analyse d'images pour surveillance à distance | |
WO2020246655A1 (fr) | Procédé de reconnaissance de situation et dispositif permettant de le mettre en œuvre | |
CN109344886B (zh) | 基于卷积神经网络的遮挡号牌判别方法 | |
Liu et al. | Anomaly detection in surveillance video using motion direction statistics | |
WO2021153861A1 (fr) | Procédé de détection de multiples objets et appareil associé | |
WO2021100919A1 (fr) | Procédé, programme et système pour déterminer si un comportement anormal se produit, sur la base d'une séquence de comportement | |
KR20180138558A (ko) | 객체 검출을 위한 영상분석 서버장치 및 방법 | |
WO2022055023A1 (fr) | Système de plateforme d'analyse d'image intelligent intégré ido capable de reconnaître des objets intelligents | |
Miller et al. | What’s in the black box? the false negative mechanisms inside object detectors | |
WO2022114895A1 (fr) | Système et procédé de fourniture de service de contenu personnalisé à l'aide d'informations d'image | |
CN109800684B (zh) | 一种视频中对象的确定方法及装置 | |
CN111144260A (zh) | 一种翻越闸机的检测方法、装置及系统 | |
CN111860100B (zh) | 行人数量的确定方法、装置、电子设备及可读存储介质 | |
WO2021125539A1 (fr) | Dispositif, procédé et programme informatique de classification d'objets présents dans une image | |
Mahareek et al. | Detecting anomalies in security cameras with 3DCNN and ConvLSTM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19846200 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19846200 Country of ref document: EP Kind code of ref document: A1 |