WO2020032506A1 - Visual sensing system and visual sensing method using the same - Google Patents

Visual sensing system and visual sensing method using the same (Système de détection de vision et procédé de détection de vision l'utilisant)

Info

Publication number
WO2020032506A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature map
neural network
visual
visual sensing
Prior art date
Application number
PCT/KR2019/009734
Other languages
English (en)
Korean (ko)
Inventor
이준환
김병준
Original Assignee
전북대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 전북대학교산학협력단
Publication of WO2020032506A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/10: Image acquisition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/98: Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993: Evaluation of the quality of the acquired pattern
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Definitions

  • The present invention relates to a visual sensing system and a visual sensing method using the same. More particularly, in order to increase the object recognition rate of a deep learning object detection model (e.g., R-CNN, YOLO, SSD), it relates to a visual sensing system that fuses the model's results with temporal visual sensing results obtained using a deep learning sequence model (e.g., MLP, LSTM, GRU), thereby reducing false recognitions in the image and improving accuracy, and to a visual sensing method using the same.
  • Deep learning is defined as a set of algorithms that attempts a high level of abstraction (summarizing key content or functionality in a large amount of data) through a combination of several nonlinear transformation techniques, and it is applied to object detection and recognition in images, classification, speech recognition, and natural language processing.
  • Object detection is a technology that has long been studied in the field of computer vision.
  • Various studies have been conducted to raise image recognition and detection performance to the level of human vision.
  • However, even with deep learning technologies such as convolutional neural networks, false recognitions still occur, and the development of technologies to reduce such false recognitions is urgently needed.
  • FIG. 1 is an exemplary view illustrating object recognition in an image using a conventional visual sensing method. Even though there are two pedestrians, as shown in (a) of FIG. 1, only one person is recognized as a pedestrian; in the license plate detection case shown in (b) of FIG. 1, another region can be mistaken for a license plate; and even when a fire occurs near the tree on the left side of the road, as shown in (c) of FIG. 1, the fire is not detected at all and a different region is recognized instead.
  • In addition, the existing visual sensing method has difficulty accounting for various environments (e.g., weather, places), uses only the spatial feature information within a single image, and technology for object recognition based on analysis over time has not been developed.
  • The present invention has been made in view of the above-described problems, and an object of the present invention is to improve the accuracy of object recognition by allowing spatial and temporal visual sensing to be performed at the same time.
  • Another object is to configure both the spatial and temporal visual sensing to share the feature extraction step, thereby reducing the amount of computation for feature extraction and the execution time.
  • According to an embodiment of the present invention, a visual sensing system for detecting an object in an input image may be provided.
  • The visual sensing system may include a feature extractor that generates a feature map of an image using a neural network, a spatial visual sensing unit that estimates the position and name of an object in the image using the generated feature map, a temporal visual sensing unit that detects the object in the image in chronological order based on the generated feature map, and a fusion decision unit that determines the object detection result in the image based on the estimation result of the spatial visual sensing unit and the detection result of the temporal visual sensing unit.
  • The neural network of the feature extractor may include a plurality of convolution layers for generating a feature map of the image and a pooling layer for sampling the feature map.
  • the spatial visual sensing unit of the visual sensing system may include a fully connected layer including at least one hidden layer.
  • The visual sensing system according to an embodiment of the present invention may further include a storage unit for storing the feature map generated by the feature extractor, and the temporal visual sensing unit detects objects in the image over time using a recurrent neural network.
  • The recurrent neural network may be trained based on the feature maps stored in the storage unit.
  • The neural network may be trained by inputting images to which screen flipping, scaling, and rotation have been applied.
  • According to another embodiment of the present invention, a visual sensing method using a visual sensing system may be provided.
  • The method may include inputting an image, generating a feature map of the image using a neural network, estimating the position and name of an object in the image using the feature map, detecting the object in the image in chronological order based on the feature map, and determining the object detection result in the image based on the estimation result and the detection result.
  • The step of estimating the position and name of the object in the image using the feature map and the step of detecting the object in the image in chronological order based on the feature map may be performed simultaneously.
  • The neural network of the visual sensing method may include a plurality of convolution layers for generating a feature map of the image and a pooling layer for sampling the feature map.
  • The step of estimating the position and name of the object in the image using the feature map may be performed using a fully connected layer composed of at least one hidden layer.
  • The visual sensing method may further include storing the feature map, generated in the step of generating a feature map of the image using a neural network, in a storage unit of the visual sensing system; the step of detecting the object in the image in chronological order based on the feature map is performed using a recurrent neural network, and the recurrent neural network may be trained based on the stored feature maps.
  • The neural network may be trained by inputting images to which screen flipping, scaling, and rotation have been applied.
  • A computer-readable recording medium having recorded thereon a program for implementing the above method may be provided.
  • According to the present invention, by performing spatial and temporal visual sensing at the same time, false recognitions in object recognition in an image can be reduced.
  • In addition, since both the spatial and temporal visual sensing share the feature extraction step, the amount of computation for feature extraction and the execution time can be reduced.
  • FIG. 1 is an exemplary view illustrating a state of recognizing an object in an image using a conventional visual sensing method.
  • FIG. 2 is a block diagram illustrating a visual sensing system according to an exemplary embodiment.
  • FIGS. 3 and 4 are exemplary views illustrating a visual sensing system according to an exemplary embodiment.
  • FIG. 5 is an exemplary view showing a fire detection result using the visual sensing system according to an embodiment of the present invention.
  • FIGS. 6A to 6C are exemplary diagrams illustrating data for training a visual sensing system according to an exemplary embodiment.
  • FIGS. 7A and 7B are exemplary views illustrating a forest fire detection result using a visual sensing system according to an exemplary embodiment.
  • FIGS. 8 to 10 are flowcharts illustrating a visual sensing method using a visual sensing system according to an exemplary embodiment.
  • When a part of the specification "includes" a certain component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.
  • The terms "unit", "module", etc. used in the specification mean a unit that processes at least one function or operation, which may be implemented by hardware, software, or a combination of the two.
  • When a part of the specification is "connected" to another part, this includes not only being "directly connected" but also being "connected with other elements in between".
  • FIG. 1 is an exemplary view illustrating a state of recognizing an object in the image 10 using a conventional visual sensing method.
  • various techniques for recognizing and classifying objects in an input image have been developed.
  • Detection performance has been raised toward the level of human vision by techniques that recognize objects in images using deep learning algorithms (e.g., convolutional neural networks).
  • However, the conventional visual sensing method has difficulty accounting for various environments (e.g., weather, places), uses only the spatial feature information within a single image 10, and does not analyze changes over time for object recognition.
  • FIG. 2 is a block diagram illustrating a visual sensing system according to an exemplary embodiment.
  • Referring to FIG. 2, the visual sensing system may include a feature extractor 100 that generates a feature map of the image 10 using a neural network, a spatial visual sensing unit 200 that estimates the position and name of objects in the image 10 using the generated feature map, a temporal visual sensing unit 300 that detects objects in the image 10 in chronological order based on the generated feature map, and a fusion decision unit 400 that determines the object detection result in the image 10 based on the result estimated by the spatial visual sensing unit 200 and the result detected by the temporal visual sensing unit 300.
  • The feature extractor 100 of the visual sensing system may generate a feature map of the input image 10 using various neural network models.
  • For example, when a convolutional neural network (CNN) is used, the neural network may include a plurality of convolution layers for generating a feature map of the image 10 and pooling layers for sampling the feature map, as the sketch below illustrates.
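  • For illustration only, the following is a minimal PyTorch sketch of such a convolution-and-pooling feature extractor. The layer counts, channel widths, and input resolution are assumptions chosen for the example; the description only requires a plurality of convolution layers and a pooling layer.

    import torch
    import torch.nn as nn

    class FeatureExtractor(nn.Module):
        """Convolution + pooling backbone producing the shared feature map."""
        def __init__(self, in_channels: int = 3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # pooling layer that samples (downsamples) the feature map
                nn.Conv2d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # The returned feature map is shared by the spatial and temporal sensing units.
            return self.features(x)

    # A 224x224 RGB frame yields a (1, 64, 56, 56) feature map.
    fmap = FeatureExtractor()(torch.randn(1, 3, 224, 224))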
  • The feature map generated by the feature extractor 100 may be shared by the spatial visual sensing unit 200 and the temporal visual sensing unit 300, as described below. That is, in the visual sensing system of the present invention, the estimation of objects in the image 10 by the spatial visual sensing unit 200 and the detection of objects in the image 10 by the temporal visual sensing unit 300 can be performed simultaneously.
  • To enable this simultaneous object detection, the feature map generated by the feature extractor 100 is shared by both the spatial visual sensing unit 200 and the temporal visual sensing unit 300. Because the feature map is shared between the two units, the amount of computation for feature extraction from the image 10 and the execution time can be reduced.
  • In the spatial visual sensing unit 200, the position and name of objects in the input image 10 are estimated based on the feature map generated by the feature extractor 100.
  • The spatial visual sensing unit 200 of the present invention includes a fully connected layer composed of at least one hidden layer and may thus form a convolutional neural network together with the feature extractor 100 of the present invention.
  • That is, since the neural network of the feature extractor 100 is shared by the spatial visual sensing unit 200 and the temporal visual sensing unit 300, objects in a single input image 10 can be recognized by combining the convolution and pooling layers of the feature extractor 100 with the fully connected layer of the spatial visual sensing unit 200, as sketched below.
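  • As a hedged sketch of such a spatial visual sensing head, the fully connected module below flattens the shared feature map and outputs class ("name") scores together with a four-value box ("position"); the hidden size, class count, and box parameterization are illustrative assumptions, not values from the patent.

    import torch
    import torch.nn as nn

    class SpatialSensingHead(nn.Module):
        """Fully connected head estimating the object name (class scores)
        and position (a 4-value box) from the shared feature map."""
        def __init__(self, feat_dim: int = 64 * 56 * 56, num_classes: int = 2):
            super().__init__()
            # One hidden layer; "at least one hidden layer" is all the text requires.
            self.hidden = nn.Sequential(nn.Flatten(), nn.Linear(feat_dim, 256), nn.ReLU())
            self.cls = nn.Linear(256, num_classes)  # object "name" scores
            self.box = nn.Linear(256, 4)            # object "position" (x, y, w, h assumed)

        def forward(self, fmap: torch.Tensor):
            h = self.hidden(fmap)
            return self.cls(h), self.box(h)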
  • The spatial visual sensing unit 200, combined with the feature extractor 100, can apply not only a convolutional neural network (CNN) model but also spatial object detection models such as R-CNN, Faster R-CNN, and YOLO (You Only Look Once); the machine learning model can be chosen in various ways depending on the user's purpose of use.
  • In the temporal visual sensing unit 300, objects in the image 10 may be detected in chronological order based on the feature map generated by the feature extractor 100.
  • Object detection in the temporal visual sensing unit 300 is performed separately from the detection by the spatial visual sensing unit 200 described above, so even when the spatial visual sensing unit 200 cannot estimate the position or name of an object, the object may still be detected by the temporal visual sensing unit 300. As a result, the spatial visual sensing unit 200 and the temporal visual sensing unit 300 complement each other in recognizing objects in the input image 10.
  • The temporal visual sensing unit 300 may detect objects in the image 10 in chronological order, taking the whole input sequence into account.
  • Various temporal visual sensing algorithms 310 may be used for this purpose.
  • For example, the temporal visual sensing algorithm 310 may be a multi-layer perceptron (MLP), a recurrent neural network (RNN), a long short-term memory (LSTM), or the like.
  • In addition, the accuracy of object detection may be improved using algorithms such as voting and ensembling over the results classified by the object detection classification models based on RNNs and LSTMs; a minimal sketch of such a temporal head follows.
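  • The sketch below uses an LSTM, one of the options named above; pooling each frame's feature map to a vector, the layer sizes, and reading the decision from the last time step are assumptions for illustration only.

    import torch
    import torch.nn as nn

    class TemporalSensingHead(nn.Module):
        """Recurrent head classifying a sequence of per-frame feature maps."""
        def __init__(self, feat_channels: int = 64, hidden: int = 128, num_classes: int = 2):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # (C, H, W) -> (C, 1, 1) per frame
            self.lstm = nn.LSTM(feat_channels, hidden, batch_first=True)
            self.cls = nn.Linear(hidden, num_classes)

        def forward(self, fmaps: torch.Tensor):  # fmaps: (B, T, C, H, W)
            b, t, c, h, w = fmaps.shape
            x = self.pool(fmaps.reshape(b * t, c, h, w)).view(b, t, c)
            out, _ = self.lstm(x)
            return self.cls(out[:, -1])  # decision after seeing the whole sequence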
  • As described above, the feature map generated by the feature extractor 100 is shared by the spatial visual sensing unit 200 and the temporal visual sensing unit 300; object detection in the image 10 is performed by each of the two units, and the detection results are comprehensively judged by the fusion decision unit 400.
  • In the fusion decision unit 400 of the visual sensing system, when neither the spatial visual sensing unit 200 nor the temporal visual sensing unit 300 detects anything, the fused result is judged as 'no alarm'; when only one of the two units detects something, the fused result is judged as 'caution'; and when both units detect something, the fusion decision unit 400 judges the result as 'alarm'.
  • In the judgment of the fusion decision unit 400 of the present invention, 'no alarm' means that no object (e.g., pedestrian, fire) is detected in the input image 10, while 'caution' means that the spatial and temporal visual sensing results for the input image 10 differ or that the judgment is ambiguous, so the judgment should be held or re-examined.
  • 'Alarm' means that both the spatial and temporal visual sensing results are positive, i.e., that an object such as a pedestrian or a fire has been detected.
  • That is, the fusion decision unit 400 of the present invention makes a fused judgment based on the detection results of the spatial visual sensing unit 200 and the temporal visual sensing unit 300, arranging the two results together.
  • When the judgment is ambiguous, the prediction threshold of the spatial visual sensing unit 200 can be readjusted and the judgment made again.
  • Here, when the combination of the feature extractor 100 and the spatial visual sensing unit 200 is applied to a CNN, R-CNN, or the like, the prediction threshold may correspond to one of the weighting parameters, the number of nodes, or the number of hidden layers of the neural network; the fusion rule itself is sketched below.
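  • The fusion rule described above is simple enough to state directly in code. The function name and boolean inputs below are illustrative assumptions; only the three-way rule comes from the description.

    def fuse(spatial_detected: bool, temporal_detected: bool) -> str:
        """Fusion rule from the description: both detect -> 'alarm',
        exactly one detects -> 'caution', neither detects -> 'no alarm'."""
        if spatial_detected and temporal_detected:
            return "alarm"
        if spatial_detected or temporal_detected:
            # Ambiguous case: hold judgment; the spatial unit's prediction
            # threshold may be readjusted and the frame judged again.
            return "caution"
        return "no alarm"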
  • FIGS. 3 and 4 are exemplary views illustrating a visual sensing system according to an exemplary embodiment.
  • They illustrate the process of detecting a fire from input images 10 by applying the visual sensing system according to an embodiment of the present disclosure.
  • Images 10 are input in time order, and a feature map may be generated for each image 10 by the feature extractor 100 of the present invention.
  • The feature map generated by the feature extractor 100 is shared and input to both the spatial visual sensing unit 200 and the temporal visual sensing unit 300, so that object detection can be performed in each of them.
  • Here, a multilayer perceptron (MLP) algorithm is used as the sensing algorithm model of the temporal visual sensing unit 300.
  • The final result may then be determined based on the detection result of the spatial visual sensing unit 200 and the detection result of the temporal visual sensing unit 300.
  • A feature map is formed through the neural network of the feature extractor 100 in order to detect or recognize objects in the input image 10.
  • Here, object detection or recognition may mean recognizing a region identified as an object in a given image 10 as one of a plurality of predefined classes. For example, in the case of fire detection, a flame, a spark, or smoke may be the object in the input image 10.
  • Such object detection or recognition may be performed through machine learning or deep learning.
  • In other words, it may be determined to which of the plurality of classes the content of the input image 10 belongs.
  • The image 10 input to the visual sensing system may be an image to which screen flipping, scaling, and rotation have been applied. That is, by performing flipping, scaling, and rotation on the input images and thereby accounting for various environments, robust detection with the visual sensing system of the present invention becomes possible.
  • In addition, the data set for training the neural network of the feature extractor 100 may further include images 10 pre-processed by such screen switching, scaling, or rotation.
  • The flip corresponds to varying the training data through screen switching of the images 10 in the existing data set, such as left-right or upside-down flips.
  • The scaling corresponds to adjusting the size of the existing images 10, and the scale rate may be set in various ways.
  • The rotation may be included in the data set by rotating the input screen at various angles.
  • Exemplary contents regarding the input data generation method, which varies the data set for generating feature maps in the feature extractor 100, may be summarized as shown in Table 2 below; a minimal augmentation sketch follows.
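  • A minimal augmentation sketch with torchvision is shown below; flipping, scaling, and rotation follow the text, while the probabilities, angle range, and scale range are assumptions, not values taken from the patent.

    from torchvision import transforms

    # Illustrative training-time augmentation: flip, rotation, and scaling.
    augment = transforms.Compose([
        transforms.RandomHorizontalFlip(p=0.5),     # left-right screen switching
        transforms.RandomVerticalFlip(p=0.5),       # upside-down screen switching
        transforms.RandomAffine(degrees=30,         # rotation at various angles
                                scale=(0.8, 1.2)),  # scaling at various rates
        transforms.ToTensor(),
    ])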
  • The visual sensing system according to an embodiment of the present invention may further include a storage unit for storing the feature maps generated by the feature extractor 100; the temporal visual sensing unit 300 detects objects in the image 10 over time using a recurrent neural network, and the recurrent neural network can be trained based on the feature maps stored in the storage unit.
  • That is, the temporal visual sensing unit 300 stores the feature maps produced through the neural network of the feature extractor 100 in the storage unit, and the recurrent neural network learns from these stored feature maps.
  • The neural network of the feature extractor 100 may also be trained separately. That is, apart from the execution of the visual sensing system according to an embodiment of the present invention, the neural network structure used for training the feature extractor 100 can also be used jointly by the recurrent neural network of the temporal visual sensing unit 300.
  • The visual sensing system of the present invention can be applied to various fields in which an object is detected in an input image 10, for example vehicle license plate detection, pedestrian detection, CCTV monitoring, defective product inspection, or fire detection.
  • FIG. 5 is an exemplary view illustrating a fire detection result using the visual sensing system.
  • FIG. 5 (a) shows a state in which a fire is detected only through temporal visual sensing, and FIG. 5 (b) shows a state in which both spatial and temporal visual sensing are performed by the visual sensing system according to an embodiment of the present invention.
  • In FIG. 5 (a), no flame is visible at the top of the vehicle and only black-gray smoke is observed; at the upper left corner, the temporal visual sensing result 301, judged to indicate a fire, is displayed.
  • FIG. 5 (b) shows the vehicle burning: the flame is marked with a blue box, the spatial visual sensing result 201 detecting the fire, and the temporal visual sensing result also detects a fire.
  • Accordingly, the fusion decision unit 400 of the present invention may make a judgment of 'fire alarm' for FIG. 5 (b) on the basis of the spatial and temporal visual sensing results.
  • FIGS. 6A to 6C are exemplary diagrams illustrating data for training a visual sensing system according to an exemplary embodiment. The visual sensing system of the present invention can be trained using the above-described data sets (train / validation / test data sets). When applied to fire monitoring, various smoke or fire images 10, as shown in FIGS. 6A to 6C, can be used as the data set.
  • FIGS. 7A and 7B are exemplary views illustrating forest fire detection results using a visual sensing system according to an exemplary embodiment.
  • FIG. 7A shows an image 10 in which clouds appear on the mountainside.
  • The spatial visual sensing unit 200 of the present invention detects the clouds as a smoke area, which is shown in FIG. 7A through the blue box 201.
  • Meanwhile, the temporal visual sensing unit 300 of the present invention judges its detection result as 'not detected' along the time sequence, and FIG. 7A shows the fused result 401 reflecting this.
  • Considering the respective results of the spatial visual sensing unit 200 and the temporal visual sensing unit 300, the fusion decision unit 400 may therefore judge this case as 'caution'.
  • In FIG. 7B, an image 10 in which a fire has broken out in a mountain building is shown.
  • In the spatial visual sensing unit 200 of the present invention, the fire and the smoke are sensed, and the area in which the fire occurred is shown in FIG. 7B.
  • The temporal visual sensing unit 300 of the present invention also judges the sensing result as 'detected' along the time sequence, and FIG. 7B shows the fused result 401 judged as detected. Accordingly, since both the spatial and temporal visual sensing were judged as detections, the fusion decision unit 400 may fuse the results and determine a 'fire alarm'.
  • FIGS. 8 to 10 are flowcharts illustrating a visual sensing method using a visual sensing system according to an exemplary embodiment.
  • The visual sensing method may include inputting an image 10, generating a feature map of the image 10 using a neural network, estimating the position and name of objects in the image 10 using the feature map, detecting objects in the image 10 in chronological order based on the feature map, and determining the object detection result in the image 10 based on the estimation result and the detection result.
  • The step of estimating the position and name of objects in the image 10 using the feature map and the step of detecting objects in the image 10 in chronological order based on the feature map can be performed at the same time.
  • That is, the results of the spatial and temporal visual sensing obtained in parallel can be combined and used for object detection.
  • Because the result of feature extraction (the feature map extracted from the image 10) is applied to the temporal visual sensing process as well, the amount of computation for feature extraction from the image 10 does not increase; rather, the time required for object detection can be greatly reduced, as the sketch below illustrates.
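  • Under the same assumptions as the sketches above (the illustrative FeatureExtractor, SpatialSensingHead, TemporalSensingHead, and fuse defined earlier in this description), the following sketch shows the shared pipeline over a short frame sequence: each frame passes through the backbone exactly once, and the one feature map feeds both sensing paths.

    import torch

    extractor = FeatureExtractor()
    spatial = SpatialSensingHead()
    temporal = TemporalSensingHead()

    frames = torch.randn(8, 3, 224, 224)  # eight frames in time order
    with torch.no_grad():
        # One backbone pass per frame; the resulting feature maps are reused.
        fmaps = torch.stack([extractor(f.unsqueeze(0)).squeeze(0) for f in frames])
        cls_scores, boxes = spatial(fmaps)         # per-frame spatial detection
        seq_scores = temporal(fmaps.unsqueeze(0))  # one temporal decision for the sequence

    # Class index 1 is assumed to mean "object present" (e.g., fire).
    spatial_hit = bool(cls_scores.softmax(-1)[:, 1].max() > 0.5)
    temporal_hit = bool(seq_scores.softmax(-1)[0, 1] > 0.5)
    print(fuse(spatial_hit, temporal_hit))  # 'alarm' / 'caution' / 'no alarm'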
  • The neural network of the visual sensing method may include a plurality of convolution layers for generating a feature map of the image 10 and a pooling layer for sampling the feature map.
  • The step of estimating the position and name of objects in the image 10 using the feature map may be performed using a fully connected layer composed of at least one hidden layer.
  • The visual sensing method may further include storing the feature map, generated in the step of generating a feature map of the image 10 using a neural network, in a storage unit of the visual sensing system.
  • The step of detecting objects in the image 10 in chronological order based on the feature map is performed using a recurrent neural network, and the recurrent neural network can be trained based on the stored feature maps.
  • The neural network may be trained by inputting images to which screen flipping, scaling, and rotation have been applied.
  • A computer-readable recording medium having recorded thereon a program for implementing the above method may be provided.
  • The above-described method may be written as a program executable on a computer and may be implemented on a general-purpose digital computer that runs the program using a computer-readable medium.
  • The structure of the data used in the above-described method can be recorded on the computer-readable medium through various means.
  • A recording medium for recording an executable computer program or code for performing the various methods of the present invention should not be understood to include transitory objects such as carrier waves or signals.
  • The computer-readable medium may include storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk) and optical reading media (e.g., CD-ROM, DVD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

According to one embodiment, the present invention relates to a visual sensing system that may comprise: a feature extraction unit for generating a feature map of an image using a neural network; a spatial visual sensing unit for estimating the location and name of objects in the image using the generated feature map; a temporal visual sensing unit for detecting the objects in the image in chronological order on the basis of the generated feature map; and a fusion determination unit for determining a detection result of the objects in the image on the basis of the result estimated by the spatial visual sensing unit and the result detected by the temporal visual sensing unit.
PCT/KR2019/009734 2018-08-07 2019-08-05 Visual sensing system and visual sensing method using the same WO2020032506A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2018-0091880 2018-08-07
KR1020180091880A KR102104548B1 (ko) 2018-08-07 2018-08-07 Visual sensing system and visual sensing method using the same

Publications (1)

Publication Number Publication Date
WO2020032506A1 (fr) 2020-02-13

Family

ID=69414942

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/009734 WO2020032506A1 (fr) 2018-08-07 2019-08-05 Visual sensing system and visual sensing method using the same

Country Status (2)

Country Link
KR (1) KR102104548B1 (fr)
WO (1) WO2020032506A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680632A (zh) * 2020-06-10 2020-09-18 深延科技(北京)有限公司 Smoke and fire detection method and system based on a deep-learning convolutional neural network

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102161516B1 (ko) * 2020-04-27 2020-10-05 주식회사 펜타게이트 Vehicle type classification method using deep-learning-based data augmentation learning
KR102238401B1 (ko) * 2020-09-14 2021-04-09 전주비전대학교산학협력단 License plate recognition method using data augmentation and a CNN
KR102369164B1 (ko) * 2021-06-22 2022-03-02 주식회사 시큐웍스 Apparatus and method for fire and security monitoring using machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0644376A (ja) * 1992-05-28 1994-02-18 Matsushita Electric Ind Co Ltd Image feature extraction device, image recognition method, and image recognition device
KR20160091709A (ko) * 2015-01-26 2016-08-03 창원대학교 산학협력단 Fire detection system and method using spatio-temporal block features
KR20170038622A (ko) * 2015-09-30 2017-04-07 삼성전자주식회사 Method and apparatus for segmenting an object from an image
KR101855057B1 (ko) * 2018-01-11 2018-05-04 셔블 테크놀러지(주) Fire alarm system and method
KR20180071947A (ko) * 2016-12-20 2018-06-28 서울대학교산학협력단 Image processing apparatus and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101415001B1 (ko) 2012-10-30 2014-07-04 한국해양과학기술원 Portable device for nucleic acid extraction or amplification

Also Published As

Publication number Publication date
KR102104548B1 (ko) 2020-04-24
KR20200019280A (ko) 2020-02-24

Similar Documents

Publication Publication Date Title
WO2020032506A1 (fr) Visual sensing system and visual sensing method using the same
CN108596277B (zh) Vehicle identity recognition method, device, and storage medium
WO2019107614A1 (fr) Machine-vision-based quality inspection method and system using deep learning in a manufacturing process
WO2013048159A1 (fr) Method, apparatus, and computer-readable recording medium for detecting the location of a facial feature point using an AdaBoost learning algorithm
WO2020071701A1 (fr) Method and device for real-time object detection using a deep learning network model
WO2017164478A1 (fr) Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics
KR101910542B1 (ko) Image analysis server apparatus and method for object detection
WO2017131263A1 (fr) Hybrid instance selection method using nearest neighbors for cross-project defect prediction
JP2022506905A (ja) System and method for evaluating a perception system
WO2016108327A1 (fr) Vehicle detection method, database structure for vehicle detection, and database construction method for vehicle detection
WO2021020866A1 (fr) Image analysis system and method for remote monitoring
WO2020246655A1 (fr) Situation recognition method and device for implementing the same
CN109344886B (zh) Occluded license plate discrimination method based on a convolutional neural network
Liu et al. Anomaly detection in surveillance video using motion direction statistics
WO2021153861A1 (fr) Multiple object detection method and apparatus therefor
WO2021100919A1 (fr) Method, program, and system for determining whether abnormal behavior occurs, based on a behavior sequence
KR20180138558A (ko) Image analysis server apparatus and method for object detection
WO2022055023A1 (fr) IoT-integrated intelligent image analysis platform system capable of recognizing smart objects
Miller et al. What’s in the black box? the false negative mechanisms inside object detectors
WO2022114895A1 (fr) System and method for providing a personalized content service using image information
CN109800684B (zh) Method and device for determining objects in a video
CN111144260A (zh) Method, device, and system for detecting climbing over a gate
CN111860100B (zh) Method and device for determining the number of pedestrians, electronic device, and readable storage medium
WO2021125539A1 (fr) Device, method, and computer program for classifying objects present in an image
Mahareek et al. Detecting anomalies in security cameras with 3DCNN and ConvLSTM

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19846200

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19846200

Country of ref document: EP

Kind code of ref document: A1