WO2018101247A1 - Image recognition imaging apparatus - Google Patents

Image recognition imaging apparatus

Info

Publication number
WO2018101247A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
recognition
distance
detection
camera
Prior art date
Application number
PCT/JP2017/042578
Other languages
French (fr)
Japanese (ja)
Inventor
大坪 宏安 (Hiroyasu Otsubo)
石崎 修 (Osamu Ishizaki)
Original Assignee
Maxell, Ltd. (マクセル株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from JP2016231534A
Priority claimed from JP2017052831A
Priority claimed from JP2017146497A
Application filed by Maxell, Ltd. (マクセル株式会社)
Publication of WO2018101247A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B13/00: Burglar, theft or intruder alarms
    • G08B13/18: Actuation by interference with heat, light, or radiation of shorter wavelength; actuation by intruding sources of heat, light, or radiation of shorter wavelength
    • G08B13/189: Actuation by interference with heat, light, or radiation of shorter wavelength using passive radiation detection systems
    • G08B13/194: Actuation by interference with heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems
    • G08B13/196: Actuation by interference with heat, light, or radiation of shorter wavelength using image scanning and comparing systems using television cameras
    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B25/00: Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
    • G: PHYSICS
    • G08: SIGNALLING
    • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B25/00: Alarm systems in which the location of the alarm condition is signalled to a central station, e.g. fire or police telegraphic systems
    • G08B25/01: Alarm systems characterised by the transmission medium
    • G08B25/04: Alarm systems using a single signalling line, e.g. in a closed loop
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Definitions

  • the present invention relates to an image recognition imaging apparatus that performs image recognition by acquiring a two-dimensional image and a distance image.
  • a surveillance camera captures an image and displays it on a monitor; a person either watches the image in real time or stores it and reviews it after an incident occurs.
  • AI (artificial intelligence)
  • a camera that detects/recognizes objects such as people and things is known (for example, see Patent Documents 2 and 3).
  • Such a camera is used as a surveillance camera for the purpose of crime prevention, for example, and issues an alarm when an abnormality is detected by detection / recognition.
  • Deep learning is a technique for learning data characteristics using a neural network having a multilayer structure, and it is known that high-accuracy image recognition is possible by using this.
  • for each point on the photographed subject, a corresponding point is searched for in each of the two captured images, and the difference in position (parallax) between the corresponding points on the two images that correspond to the same point on the subject is determined.
  • the distance is then calculated from this parallax, as sketched below.
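A minimal sketch of this parallax-to-distance relationship under the pinhole model, Z = f * B / d (the function name and numeric values below are illustrative, not taken from the patent):

```python
# Sketch: distance from stereo parallax, Z = f * B / d.
# f: focal length in pixels, B: baseline between the two cameras in meters,
# d: disparity (pixel offset between corresponding points).

def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Return the distance (meters) to a point given its disparity (pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Example: f = 700 px, B = 0.12 m, d = 14 px gives Z = 6.0 m.
print(depth_from_disparity(700.0, 0.12, 14.0))
```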
  • applications include image recognition such as human detection and moving-body detection, and further, danger avoidance when driving a car, automatic driving, detection of a specific person such as a wanted criminal by face recognition, detection of a suspicious person such as an illegal intruder, and the like.
  • subjects on the image, such as people, dogs, cats, horses, and vehicles, can be identified easily and accurately.
  • when arithmetic processing such as the distance detection described above is performed on an image having a large number of pixels, the amount of calculation is enormous. Therefore, when processing a moving image such as video from an in-vehicle camera or a surveillance camera, it takes a long time to process one frame.
  • the calculation time for one frame becomes too long, and subsequent processing may not keep up when a real-time response is required. In this case, it is conceivable to reduce the processing time by first thinning out the pixels of the image to lower its resolution (number of pixels).
  • fish-eye lenses are known as camera lens units (lenses).
  • the fisheye lens is a lens unit that adopts a projection method that is not the central projection method used in a normal wide-angle lens or a telephoto lens.
  • most fisheye lenses adopt an equidistant projection method.
  • some fisheye lenses employ the equisolid-angle projection method, the orthographic projection method, the stereographic projection method, and the like.
  • the fisheye lens is an ultra-wide-angle lens.
  • the angle of view of a fisheye lens is often 180 degrees, but there are lenses with an angle of view of less than 180 degrees and lenses with an angle of view of greater than 180 degrees.
  • a distance calculation device for obtaining the distance to a subject using a stereo camera using such a fisheye lens has been proposed (see Patent Document 4).
  • in a stereo camera using such fisheye lenses, a captured image is converted into a spherical image obtained by projecting it onto a spherical surface, and the distance to each subject is obtained.
  • the 3D sensor can recognize the shape of the object, but the object may not be identified only by the shape read by the 3D sensor.
  • the present invention has been made in view of the above circumstances, and an object thereof is to provide an image recognition imaging apparatus capable of reducing the amount of calculation in image recognition and improving the recognition rate.
  • an image recognition imaging apparatus of the present invention includes: an imaging unit that captures an image; distance image acquisition means for acquiring a distance image in which each pixel in a range corresponding to the imaging range of the image is represented by the distance to the shooting target; distance image recognition object extraction means for extracting a recognition object on the distance image based on the distances in the distance image; recognition object image extraction means for extracting a partial image serving as the recognition object from the image, based on the range of the recognition object on the distance image; and image recognition means for performing image recognition of the partial image and identifying the recognition object.
  • a group of pixels having distance values close to each other and forming a lump can be recognized as an object, that is, a recognition target, on the distance image.
  • an object can be extracted by simple calculation as compared with the case of extracting an object from a two-dimensional color visible image.
  • the distance image basically corresponds to the shooting range of the image, and the position of the distance image corresponding to each position of the image includes distance information.
  • the image capturing range corresponds to the range of the distance image.
  • the image and the distance image are basically taken of the same range, but as long as the correspondence between positions on the image and on the distance image is known, one image may cover a larger range than the other.
  • image recognition is then performed on the partial image that is the range of the object on this image, to identify the object.
  • the amount of calculation can be greatly reduced because the image recognition is performed only for the extracted partial images, compared to the conventional image recognition performed on the entire image.
  • since the object has already been separated from the background on the image, many of the operations for separating the object from the image are unnecessary, and the amount of calculation can be reduced.
  • the separation accuracy can be increased.
  • in image recognition, it is possible to identify a person, a car, or the like as an object attribute based on the shape, color, brightness, etc. of the extracted object.
  • with image recognition using deep machine learning, it is possible to identify not only humans but also men, women, adults, children, etc., and to classify facial features such as the height of the nose, the size of the eyes and mouth, and the color of the eyes.
  • the three-dimensional shape and size of the object can be recognized from the relationship of the distance, and the image can be recognized in consideration of the three-dimensional shape and size of the object. Separation of an object according to a distance from an image and image recognition using data of a three-dimensional shape and size in addition to the image can improve object identification capability and identification accuracy including object detection and identification.
  • the distance image acquisition unit includes a distance measurement unit that measures a distance of each pixel of the distance image.
  • the above-described image and distance image can be obtained from an imaging unit that is a monocular camera capturing the image, and a distance measurement unit, such as a depth sensor or a 3D sensor, that generates the distance image.
  • as the depth sensor or 3D sensor, for example, a TOF (Time Of Flight) type can be used.
  • the distance image acquisition unit obtains a distance image based on parallax between the two imaging units.
  • a distance image can be obtained from a parallax of a so-called stereo camera.
  • the imaging means, the distance image acquisition means, the distance image recognition object extraction means, the recognition object image extraction means, and the image recognition means are preferably provided in one housing.
  • the image recognition imaging apparatus may be connected to an external server so as to be able to perform data communication, and the server may be configured to store data or perform more advanced image recognition processing.
  • surveillance cameras and the like are often used for a long time after installation.
  • the detection / recognition technology used in the system may become obsolete.
  • a suitable detection/recognition algorithm varies depending on the environment, such as the place where the camera is used and the subject of photography, so sufficient detection/recognition may not be possible with the detection/recognition algorithm of the firmware originally provided in the camera before installation.
  • the present invention has been made in view of the above circumstances, and an object thereof is to provide a detection recognition system that can improve, by updating, the detection/recognition performance for detecting features included in an image and recognizing a recognition target set from those features.
  • the detection recognition system of the present invention includes an imaging unit for capturing images, a detection/recognition unit, and a server.
  • the detection / recognition unit includes detection / recognition firmware, detects a feature included in the image from the image acquired by the imaging unit, and recognizes a set recognition target.
  • the detection / recognition firmware can be updated to a new detection / recognition firmware generated by the detection / recognition firmware generation unit.
  • the server includes machine learning means for generating a detection/recognition algorithm by machine learning using images acquired by the imaging means as teacher data, and detection/recognition firmware generation means for generating new detection/recognition firmware for the detection/recognition means from the detection/recognition algorithm.
  • the image pickup means picks up an image. Then, the detection / recognition unit detects a feature included in the image from the image obtained by the image pickup by the image pickup unit, and recognizes the set recognition target.
  • the machine learning means of the server generates a detection/recognition algorithm by machine learning using images acquired by the imaging means as teacher data. The generated detection/recognition algorithm is then converted by the detection/recognition firmware generation means of the server into firmware (detection/recognition firmware) suited to the detection/recognition means, and the detection/recognition firmware of the detection/recognition unit is updated to this newly generated firmware.
  • in this way, a detection/recognition algorithm that detects and recognizes with higher accuracy is generated by machine learning from images obtained by the imaging means, converted into firmware suited to the detection/recognition means, and used to update the detection/recognition firmware of the detection/recognition means, so the detection/recognition performance can be improved (see the sketch below).
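A minimal sketch of this update loop; train_detection_model, convert_to_firmware, and the Camera record are hypothetical placeholders for illustration, as the patent does not specify these APIs:

```python
# Sketch: train on camera images, convert the learned algorithm per camera,
# then update each camera's detection/recognition firmware.
from dataclasses import dataclass

@dataclass
class Camera:
    name: str
    hardware_profile: str  # e.g. "stereo", "infrared", "monocular"
    firmware: bytes = b""

def train_detection_model(images, labels):
    # Placeholder for the machine learning means (e.g. deep learning on
    # camera images tagged "robber"); returns an opaque model object.
    return {"trained_on": len(images), "classes": sorted(set(labels))}

def convert_to_firmware(model, target: str) -> bytes:
    # Placeholder for the firmware generation means: the same learned
    # algorithm is converted per camera (resolution, CPU/GPU, sensor type).
    return repr((model, target)).encode()

def update_cycle(images, labels, cameras):
    model = train_detection_model(images, labels)  # machine learning means
    for cam in cameras:
        cam.firmware = convert_to_firmware(model, cam.hardware_profile)

cams = [Camera("102a", "stereo"), Camera("102b", "infrared")]
update_cycle(["img1.png", "img2.png"], ["robber", "robber"], cams)
print([len(c.firmware) for c in cams])
```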
  • it is preferable that the machine learning means performs machine learning using, as teacher data, images from occasions when the detection/recognition means erroneously recognized the set recognition target.
  • for images from occasions when the recognition target set in the detection/recognition means was erroneously recognized, the machine learning means can learn anew so that the recognition target is not erroneously recognized, generate a detection/recognition algorithm, and generate new detection/recognition firmware, so the detection/recognition performance can be reliably improved.
  • the camera preferably includes the imaging unit and the detection / recognition unit.
  • since the camera can update the detection/recognition firmware of its detection/recognition means to new detection/recognition firmware created as a result of machine learning in the server, the detection/recognition performance of the camera can be improved. Therefore, the detection/recognition performance can be improved easily even for a camera that is already installed.
  • the detection recognition system of the present invention preferably includes a plurality of the cameras in which at least part of the imaging range of the imaging unit and the set recognition target overlap.
  • a predetermined range can be imaged with a plurality of cameras and detected / recognized. Therefore, since the same object and the same phenomenon can be detected / recognized by a plurality of cameras, the accuracy of detection / recognition can be improved.
  • when some of the plurality of cameras recognize the overlapping recognition target but another of the plurality of cameras does not, it is preferable that the machine learning means performs machine learning using the images acquired by that other camera as teacher data.
  • At least one of the plurality of cameras is a camera having a different imaging unit.
  • the detection/recognition performance for detecting features included in an image and recognizing a recognition target set from those features can be improved by updating the detection/recognition firmware.
  • calculating a distance image from each frame of a moving image shot with a stereo camera using fisheye lenses requires an enormous amount of computation, as with a normal lens or even more, and processing time becomes long. Further, if the image formed by the fisheye lens unit on a planar image sensor is detected and output as it is, distortion at the center of the image is small while distortion at the periphery is large. In addition, when monitoring indoors with a surveillance camera, depending on the size of the room, it is efficient to install the stereo camera facing vertically downward so that the floor directly below the center of the ceiling is at the center of the image.
  • an object of the present invention is to provide an object distance detection device capable of calculating a distance image with a fisheye-lens stereo camera at high speed and with high accuracy.
  • an object distance detection device of the present invention includes: a stereo camera including a pair of fisheye cameras each having a fisheye lens unit and an imaging sensor; a distance image calculation unit that calculates a distance image from the images output from the pair of imaging sensors; and a distance image recognition unit that performs image recognition, including identification of a subject, on the distance image. The distance image calculation unit includes a partitioning unit that partitions each of the images captured by the pair of fisheye cameras into a plurality of preset sections;
  • a resolution conversion unit that converts each section of the image to the resolution set for that section;
  • a corresponding point search unit that finds, in each of two images captured substantially simultaneously by the pair of fisheye cameras, corresponding points that correspond to the same point on the photographed subject;
  • and a distance calculation unit that obtains the distance from the stereo camera to a point based on the difference in position between the two corresponding points, found by the corresponding point search unit, that correspond to that same point on the subject.
  • the image captured by the fisheye camera can be divided into a plurality of sections, and the resolution can be changed per section. In an image captured by a fisheye camera, the distortion caused by the fisheye lens is small at the central portion, and it is possible to detect, for example, a face or a person there even without correction.
  • at the peripheral edge of the image, however, distortion is large and the image is distorted and compressed, so it is difficult to recognize unless the distortion is removed.
  • if the resolution of the image is first lowered uniformly in order to facilitate processing, it becomes difficult to perform highly accurate image recognition at the image periphery, where distortion is large.
  • therefore, in the central portion the resolution is lowered to reduce the amount of processing, while the peripheral portion is processed at high resolution without lowering the resolution; in this way, the accuracy of image recognition can be maintained even while the processing speed is improved by lowering resolution. Since the resolution is lowered over the large central area of the image, the overall processing amount can be reduced and the processing speed increased, while the drop in image recognition accuracy caused by lowering resolution is suppressed.
  • the resolution here is, for example, the number of pixels per unit area of the image. When the resolution is lowered, the pixels of the image are thinned out by a well-known method; a process similar to known image reduction processing may be used (see the sketch below).
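A minimal sketch of such per-section resolution conversion, assuming an illustrative layout (one low-resolution central block, full-resolution periphery) and a thinning factor of 2; the patent does not fix the section geometry or scale factors:

```python
# Sketch: partition a fisheye frame into a central block and a peripheral
# strip, thinning out pixels only in the low-distortion center.
import numpy as np

def convert_sections(img: np.ndarray, center_scale: int = 2):
    """Return (center, periphery): subsampled center, full-resolution rest."""
    h, w = img.shape[:2]
    y0, y1 = h // 4, 3 * h // 4
    x0, x1 = w // 4, 3 * w // 4
    center = img[y0:y1:center_scale, x0:x1:center_scale]  # thinned-out pixels
    periphery = img.copy()
    periphery[y0:y1, x0:x1] = 0  # center handled separately at low resolution
    return center, periphery

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
center, periphery = convert_sections(frame)
print(center.shape, periphery.shape)  # (120, 160, 3) (480, 640, 3)
```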
  • the sections of the images of the two fisheye cameras basically cover the same ranges of directions. That is, when sections A, B, and C correspond between the two images, the corresponding point for a point in section A of one image exists, at least away from the section boundaries, in section A of the other image. Therefore, the search for corresponding points is basically performed between the two corresponding sections of the two images.
  • the search for corresponding points is performed, for example, by extracting feature points (singular points) by image recognition in each section of the two images from the pair of fisheye cameras, and finding the feature point in the corresponding section of the other image that corresponds to a feature point in a section of one image;
  • corresponding points are thus determined by basic image recognition.
  • the epipolar geometry described in Patent Document 1 may also be used for this determination; a generic feature-matching sketch follows.
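A sketch of a per-section corresponding-point search using ORB feature matching from OpenCV; the patent only speaks of feature points and epipolar geometry, so the specific detector, matcher, and nfeatures value here are assumptions:

```python
# Sketch: corresponding-point search within one matching section of the two
# fisheye images, using ORB features and brute-force Hamming matching.
import cv2
import numpy as np

def match_section(left_sec: np.ndarray, right_sec: np.ndarray):
    """Return pairs ((x1, y1), (x2, y2)) of corresponding points."""
    to_gray = lambda im: cv2.cvtColor(im, cv2.COLOR_BGR2GRAY) if im.ndim == 3 else im
    orb = cv2.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(to_gray(left_sec), None)
    kp2, des2 = orb.detectAndCompute(to_gray(right_sec), None)
    if des1 is None or des2 is None:
        return []  # no feature points found in this section
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
```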
  • the distance image calculation unit outputs a distance image that is composed of pixels arranged vertically and horizontally, divided into pixel regions each including one or a plurality of pixels, with a color that changes according to the distance from the stereo camera to the corresponding point; in sections where the resolution of the distance image differs, it is preferable that the number of pixels constituting each pixel region differs according to the resolution.
  • a distance image is an image representing distance values, in the same way that a thermograph is an image representing temperature values.
  • the color change that produces such a thermograph-like image is, for example, a change in brightness (luminance), a change in hue, or a combination of both, as sketched below.
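A minimal sketch of such a thermograph-like rendering, assuming an illustrative 10 m normalization range and OpenCV's JET colormap (the patent does not prescribe a particular mapping):

```python
# Sketch: render a distance image like a thermograph by mapping distance to color.
import cv2
import numpy as np

def colorize_distance(dist_m: np.ndarray, max_range_m: float = 10.0) -> np.ndarray:
    """Map per-pixel distances (meters) to a BGR false-color image."""
    norm = np.clip(dist_m / max_range_m, 0.0, 1.0)
    gray = (255 * (1.0 - norm)).astype(np.uint8)  # nearer = brighter
    return cv2.applyColorMap(gray, cv2.COLORMAP_JET)

dist = np.random.uniform(0.5, 10.0, (240, 320)).astype(np.float32)
print(colorize_distance(dist).shape)  # (240, 320, 3)
```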
  • the distance image calculation unit preferably includes a distortion removal unit that removes the distortion caused by the fisheye lens for each of the sections.
  • when the pair of fisheye cameras are arranged in a substantially vertical direction, it is preferable that the resolution conversion unit changes the resolution so that the resolution is higher in the sections at the periphery of the image than in the sections at the center of the image.
  • the area of one section of the central portion with low resolution is larger than the area of one section of the peripheral portion with high resolution.
  • when the pair of fisheye cameras are arranged in a substantially horizontal direction, it is preferable that the resolution conversion unit changes the resolution so that the resolution in the lower sections of the image is higher than in the upper sections.
  • the resolution at the center of the image is preferably lower than that at the bottom of the image.
  • when the installation height of the fisheye camera is lower than the height of a person (the position of the face), a face is more likely to be captured in the upper sections of the image than in the lower sections, so the resolution may be changed so that the resolution of the lower sections is lowered.
  • the object distance detection device of the present invention it is possible to calculate a distance image by a stereo camera using a fisheye lens at high speed and with high accuracy without imposing a heavy load on the arithmetic processing device.
  • image recognition can be performed easily and with high accuracy.
  • FIG. 1 is a block diagram illustrating an image recognition imaging apparatus according to a first embodiment of the present invention; a flowchart illustrates an image recognition method performed by the image recognition imaging apparatus; further figures (including FIG. 4, referenced below) illustrate the image recognition method; and a block diagram shows an image recognition imaging apparatus according to a second embodiment of the invention.
  • FIG. 1 is a block diagram illustrating a detection recognition system according to an embodiment of the present invention; further block diagrams show the camera and the server of the detection recognition system; and a flowchart explains the detection/recognition firmware update method performed by the detection recognition system.
  • FIG. 1 is a block diagram illustrating an object distance detection device according to an embodiment of the present invention; a further block diagram shows the image analysis unit of the object distance detection device; a flowchart shows the processing of the image analysis unit; two figures show the partitioning of the image by the object distance detection device; and panels (a) and (b) of a further figure explain the difference in resolution between sections of the distance image.
  • the image recognition imaging apparatus of the present embodiment is a combination of an image recognition apparatus and a camera mainly used for monitoring, such as a surveillance camera or an in-vehicle camera, and identifies people, cars, and the like within the shooting range.
  • the image recognition imaging apparatus includes: an image sensor 1 as the imaging unit; a 3D sensor 2 as the distance image acquisition unit; an object extraction means 3, serving as the distance image recognition target extraction means, for extracting a recognition target (an object, including a person) from the distance image obtained by the 3D sensor 2; an object image extraction means 4, serving as the recognition target image extraction means, for extracting a partial image serving as the recognition target from the image of the image sensor 1; an image recognition means 5 for performing image recognition of the extracted partial image (object image); a control means 6 for controlling these; and a storage means 7 for storing images, distance images, recognition results, and other data.
  • the control means 6 is connected to an external server 9 (host PC) via a communication network 8 such as the Internet so that data communication is possible.
  • the image sensor 1 is a so-called imaging element (image sensor), and is used as a camera together with a lens that forms the image of the shooting target on the image sensor 1.
  • the 3D sensor 2 is, for example, of the above-described TOF type: it scans ultrashort pulses of an infrared laser across the imaging range, measures the time until the reflected light from an object returns, and multiplies this time by the speed of light to obtain the distance for each pixel in the shooting range (see the sketch below).
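A minimal sketch of this TOF principle; halving the round trip is implied because the measured time covers the path to the object and back (the 40 ns example value is illustrative):

```python
# Sketch of the TOF principle: measured round-trip time times the speed of
# light gives twice the distance, so halve it.
C = 299_792_458.0  # speed of light, m/s

def tof_distance(round_trip_s: float) -> float:
    """Distance (meters) from a measured laser round-trip time (seconds)."""
    return C * round_trip_s / 2.0

print(tof_distance(40e-9))  # a 40 ns round trip is roughly 6 m
```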
  • the resolution of the image sensor 1 and the resolution of the 3D sensor 2 may or may not match; it is only necessary that the positions of each part of the imaging range of the image sensor 1 correspond to the positions of each part of the imaging range of the 3D sensor 2, that is, that it is known where an arbitrary position in the shooting range of the image sensor 1 is located in the shooting range of the 3D sensor 2.
  • the image sensor 1 and the 3D sensor 2 are configured to simultaneously capture an overlapping range as an image and a distance image.
  • the object extraction unit 3 extracts an object from the distance image, that is, the distance information acquired by the 3D sensor 2. A group of neighboring pixels whose distance values are close to each other and that form a substantially contiguous lump is judged to be an object to be extracted, and such a group of pixels with approximate distance values is extracted as one object. In this case, the object can be extracted basically from only the distance of each pixel in the distance image, so an object can be extracted with high accuracy even from a distance image captured only once, for example by an in-vehicle 3D sensor 2, and not only from the output of a surveillance camera or the like that captures the same range constantly or repeatedly.
  • in the case of a 3D sensor 2 that is fixed like a surveillance camera, or whose movement range such as rotation is fixed, the same range is captured constantly or repeatedly; therefore, for each pixel, the distance that does not change for a certain period (or the longest distance observed, if it changes) may be stored as the background distance of that pixel, and a group of pixels whose distance has changed from the background distance may be recognized as an object. In this case, a group of pixels whose distances change over time may be detected as an object. Note that the background and an object can also be separated from temporal change in a two-dimensional image, but in a distance image, a portion whose distance has become shorter than the background can basically be identified as an object (see the sketch below).
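A minimal sketch of this background-distance approach, grouping pixels that have come closer than the stored background into lumps via connected components; the 0.3 m threshold and 50-pixel minimum blob size are illustrative assumptions:

```python
# Sketch: keep pixels that have come closer than the stored background
# distance, then group neighbors into lumps with connected components.
import cv2
import numpy as np

def extract_objects(dist_m: np.ndarray, background_m: np.ndarray,
                    closer_by_m: float = 0.3, min_pixels: int = 50):
    """Return bounding boxes (x, y, w, h) of pixel groups nearer than background."""
    foreground = (background_m - dist_m) > closer_by_m
    n, labels, stats, _ = cv2.connectedComponentsWithStats(
        foreground.astype(np.uint8), connectivity=8)
    boxes = []
    for i in range(1, n):  # label 0 is the background component
        if stats[i, cv2.CC_STAT_AREA] >= min_pixels:
            x, y, w, h = stats[i, :4]
            boxes.append((int(x), int(y), int(w), int(h)))
    return boxes
```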
  • the position of the range of the object is determined on the distance image by the 3D sensor 2.
  • the object image extraction unit 4 converts the range of the above-described object determined on the distance image into a range on a visible two-dimensional image by the image sensor 1 and extracts a partial image within this range. That is, the range of the object extracted on the distance image is assigned to the image, and the partial image that becomes the range of the object is extracted from the image.
  • a coordinate system may be provided for each of the distance image and the image and coordinates on one converted to coordinates on the other, or both may share the same coordinate system; a proportional-mapping sketch follows.
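A minimal sketch of such a position correspondence, assuming the two sensors cover the same range so that simple proportional scaling suffices (a calibrated transform may be needed in practice):

```python
# Sketch: map an object's bounding box from distance-image coordinates to
# 2D-image coordinates by proportional scaling.

def map_box(box, dist_size, img_size):
    """box = (x, y, w, h) on the distance image; sizes are (width, height)."""
    x, y, w, h = box
    sx = img_size[0] / dist_size[0]
    sy = img_size[1] / dist_size[1]
    return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))

# A box on a 320x240 distance image mapped onto a 1920x1080 image.
print(map_box((100, 60, 40, 80), (320, 240), (1920, 1080)))
```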
  • the image recognition means 5 performs image recognition of the partial image, that is, the object extracted from the image as described above. Since the portion that constitutes an object has already been extracted using the distance image, it only remains to recognize whether the extracted partial image is, for example, a person or a car. For this, stored feature points of people, cars, etc. may be compared with feature points detected from the partial image to determine whether the object is a person, a car, or the like. Further, based on an algorithm acquired by deep machine learning, a person may be further classified as a child, an adult, a woman, or a man.
  • for example, OpenCV (the Open Source Computer Vision Library) can be used.
  • the latest OpenCV library includes machine learning functionality, including, for example, a deep learning module, and can identify people, vehicles, and the like; a minimal classification sketch follows.
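A minimal sketch of classifying an already-extracted partial image with OpenCV's dnn module; the model file "classifier.onnx" and the CLASSES list are placeholders, since the patent names no specific network:

```python
# Sketch: classify an already-extracted partial image (object image).
import cv2
import numpy as np

CLASSES = ["person", "car", "other"]  # hypothetical label set

def classify_partial(net, partial_bgr):
    """Return the label with the highest score for the partial image."""
    blob = cv2.dnn.blobFromImage(partial_bgr, scalefactor=1.0 / 255,
                                 size=(224, 224), swapRB=True)
    net.setInput(blob)
    scores = net.forward().flatten()
    return CLASSES[int(np.argmax(scores))]

# net = cv2.dnn.readNetFromONNX("classifier.onnx")  # placeholder model file
# label = classify_partial(net, partial_image)
```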
  • since the area of the image occupied by the object has already been determined using the distance image, it is not necessary to locate the person or car within the image; it is only necessary to identify whether the already-extracted object area is a person, a car, or the like. An operation that processes the entire image to find the portion constituting an object is therefore not required, and the amount of calculation is small.
  • attributes of the object are recognized: the type of object, such as a person or a car, and, in the case of a person, attributes such as adult or child, male or female, race, facial features, and so on are detected.
  • in the case of a car, the model, year, color, etc. are identified as attributes.
  • the three-dimensional shape and size can be recognized from the object's data on the distance image when the object is extracted, and using the three-dimensional shape and size when identifying the attributes of the object makes it easy to distinguish adults from children and small cars from large cars.
  • the control means 6 controls photographing by the image sensor 1 and the 3D sensor 2, object extraction from the distance image by the object extraction means 3, extraction of a partial image from the image by the object image extraction means 4, and image recognition by the image recognition means 5.
  • the control means 6 comprises an arithmetic processing device, and the arithmetic processing device may function as the object extraction means 3, the object image extraction means 4, and the image recognition means 5.
  • the image recognition means 5 may be realized by executing a machine learning model (image recognition algorithm) that has undergone deep machine learning on the arithmetic processing unit.
  • the storage unit 7 is a storage device including a hard disk, a flash memory, and the like, and stores data such as images, distance images, and image recognition results.
  • a distance image b is photographed by the 3D sensor 2 (step S1).
  • a two-dimensional visible (color) image a is simultaneously captured by the image sensor 1.
  • the photographed image a includes an adult man, a car, and a girl.
  • in the distance image b shown in FIG. 4, the difference in distance between pixels is expressed as an image; an image such as that of FIG. 4 is obtained by assigning lightness/darkness or color according to distance.
  • pixels having a predetermined distance or more are represented by white, for example.
  • an object is extracted from the distance image as a group of pixels having a close distance from each other (step S2).
  • pixels at a predetermined distance or more may be treated as the background, and groups of pixels with smaller, mutually approximate distance values may be extracted as objects.
  • the object extracted on the distance image can be represented, as a group of pixels, by the arrangement of the pixel positions within the object's range.
  • here, the distance image portions of an adult man, a car, and a girl are extracted.
  • from the image, a range at the same position as the range of each object extracted in the distance image b of FIG. 4 is extracted as a partial image serving as the object image (step S3).
  • an object on the two-dimensional color image is extracted.
  • here, the object is not separated by identifying its range on the image itself; rather, objects are separated by differences in distance on the distance image, and the range of the object separated on the distance image is merely fitted onto the image. At this point only the position of the object is known; the object has not yet been identified. Since the object is extracted using only the distances on the distance image and the object on the image is extracted based on that position, the amount of calculation is extremely small compared with identifying and extracting the object directly from the image.
  • next, image recognition of the partial image extracted from the image as the object image is performed (step S4).
  • since the object has already been extracted as described above, there is no need in image recognition to process the entire image and extract the object. That is, since it is not necessary to detect and extract an object from the image, image recognition is performed only on the already extracted partial image, and the amount of calculation can be reduced.
  • image recognition can proceed on the assumption that the extracted partial image is one object or a plurality of adjacent objects, for example by treating the outer edge of the extracted portion as the outer edge of the object; this also reduces the amount of calculation in image recognition of the partial image.
  • a person can be identified by well-known person detection, face recognition, or the like.
  • objects other than people, for example cars or bicycles, can also be identified by registering feature points of various objects as part of the algorithm.
  • the three-dimensional shape and size of an object that can be read from a distance image can be used as the feature point of the object, and the recognition accuracy of the object can be improved.
  • the data such as the object attribute as the image recognition result is transmitted to the server 9 (step S5).
  • various data related to the above-described image recognition for example, data such as a distance image used for image recognition, an image, a range of an extracted object, and the like are also sent to the server 9.
  • the server 9 may include the object extraction unit 3, the object image extraction unit 4, and the image recognition unit 5. It is possible to perform more advanced processing by performing image recognition processing on the server side having high computing ability.
  • a system in which a plurality of image sensors 1 and 3D sensors 2 are connected may be used.
  • since the amount of calculation for each image recognition imaging apparatus is reduced, the load on the server is also reduced, and a single high-performance server can serve the system.
  • image recognition in normal times may be performed on the image recognition imaging apparatus side, and image recognition on past data when an incident occurs may be performed on the server. In this case, it is not necessary to save all images and distance images; for example, only the extracted object images (partial images) may be saved, reducing the storage capacity required of the server 9.
  • as described above, the object is extracted on the distance image, and the object image is then extracted based on the position on the image corresponding to the position of the extracted object.
  • the amount of calculation can be reduced by performing image recognition on the object image, and the accuracy of separating the object from the background can be improved by performing object extraction on the distance image, for example, The object detection accuracy can be increased to about 99%.
  • the size can be calculated easily and accurately as an attribute of the object. Based on this size, it becomes easy to determine other attributes of the object, for example whether a person is an adult or a child, or the type of car.
  • in the second embodiment, the distance image acquisition unit has two cameras (camera 1 and camera 2) 10 as imaging units, and a distance image detection means 11 that calculates a distance for each pixel from the parallax between the left and right images captured by these cameras 10 and generates a distance image.
  • the two cameras 10 and the distance image detecting means 11 constitute a stereo camera 12 with a 3D sensor function.
  • the stereo camera 12 with the 3D sensor function can obtain the distance image generated by the distance image detection means 11 and the image photographed by the camera 10. Note that two images with parallax can be obtained by the stereo camera 12, but either one or both may be used.
  • the configuration other than the distance image acquisition unit and the imaging unit is the same as in the image recognition imaging apparatus of the first embodiment described above: the object extraction means 3, the object image extraction means 4, the image recognition means 5, the control means 6, and the storage means 7 are provided, and the apparatus is connected to an external server 9 via a communication network 8.
  • image recognition can be performed by the same method as in the first embodiment, except that a distance image is obtained by a known method from the parallax of a pair of images taken by the stereo camera 12. It is possible to achieve the same operational effects as those of the image recognition and imaging apparatus of the first embodiment.
  • the distance image acquisition means is not limited to a TOF-type 3D sensor or a stereo camera; it may be another type of 3D sensor, as long as it can generate a distance image corresponding to the shooting range of the imaging means.
  • the detection and recognition system is used to notify when a recognition target set from an image captured by a camera is recognized, for example.
  • here, the term 'image' includes both moving images and still images.
  • the detection recognition system 101 includes a plurality of cameras 102, a server 103, and a terminal 104, as shown in FIG.
  • the plurality of cameras 102, the server 103, and the terminal 104 are connected by a wired or wireless network 105.
  • the detection recognition system 101 can be used, for example, to recognize an object photographed by the camera 102 as a suspicious person based on its shape and movement, and to notify the terminal 104 at another location. Further, when a suspicious person is recognized in this way, a terminal, a system management device, or the like held by the administrator 106 of the detection recognition system 101 may be notified.
  • the camera 102 includes an imaging unit 120, a detection / recognition unit 121, a recording unit 122, a communication unit 123, and a control unit 124.
  • the imaging means 120 has a lens and a solid-state image sensor, for example, and acquires an image by imaging.
  • the detection/recognition means 121 includes an arithmetic processing unit and a memory, and performs image recognition. Specifically, under control of the detection/recognition firmware held in the memory of the detection/recognition unit 121, features included in the image captured by the imaging unit 120 are detected, and a recognition target set from these features is recognized. Hereinafter, 'detection/recognition' basically means detecting the features included in the image captured by the imaging means 120 and recognizing the recognition target set from those features, as described here.
  • the recording unit 122 records reference images and other information used for detection/recognition by the detection/recognition unit 121, as well as images and other information (for example, audio) from times of abnormality (for example, when the recognition target set in the detection/recognition unit 121 is recognized).
  • the communication unit 123 communicates with the server 103 via the network 105 to transmit an image and other information at the time of abnormality to the server 103 and to receive a command and detection recognition firmware from the server 103.
  • the communication unit 123 is also connected to the terminal 104 or the terminal of the administrator 106 via the network 105, and transmits an alarm signal or the like to these terminals or the server 103 when an abnormality occurs.
  • the terminal 104 or the terminal of the administrator 106 receives this alarm signal, or receives an instruction to sound an alarm from the server 103 that has received the alarm signal, and sounds the alarm.
  • the control unit 124 includes an arithmetic processing unit and a memory, and controls the imaging unit 120, the detection / recognition unit 121, the recording unit 122, and the communication unit 123. Note that the control unit 124 may share the arithmetic processing unit or the memory with the detection / recognition unit 121.
  • the camera 102 may not be provided with all of the imaging unit 120, the detection / recognition unit 121, the recording unit 122, the communication unit 123, and the control unit 124.
  • the detection recognition system 101 is connected to the camera 102 by wire or wirelessly, and includes a terminal outside the camera 102 that can control the camera 102 and display an image captured by the camera 102.
  • in that case, the detection/recognition means 121, the recording means 122, the communication means 123, and the control means 124 may be provided in the terminal, and images captured by the imaging means 120 provided in the camera 102 may be detected/recognized by the terminal.
  • the camera 102 has the same configuration as that of a general monitoring camera.
  • the imaging unit 120 captures an imaging range corresponding to the set angle of view in accordance with the orientation of the camera 102.
  • the same type of camera may be used, or different types of cameras may be used.
  • the imaging ranges of the respective cameras 102 may overlap or may be completely different.
  • in this example, a total of four cameras 102 of different types are used: two stereo cameras 102a, one infrared camera 102b, and one monocular camera 102c. The imaging ranges of the four cameras 102 are assumed to overlap one another.
  • when a stereo camera 102a capable of calculating distance, size, 3D structure, etc. from parallax is used as the camera 102, these quantities can be obtained directly from the parallax, so detection/recognition can be performed easily even if the camera does not include a high-performance arithmetic processing unit or the like for detection/recognition.
  • when an infrared camera (near-infrared camera or far-infrared camera) is used, near-infrared or far-infrared images can be captured, so things invisible to the human eye can also be detected/recognized, and detection/recognition in dark environments such as at night becomes easy.
  • the type of the camera 102 is not limited to these.
  • a distance image sensor may be used as the camera 102.
  • as the distance image sensor, for example, a TOF (Time Of Flight) sensor can be used.
  • the TOF measures the distance from the time taken for the projected laser to reciprocate to the target.
  • the camera 102 may be one in which the imaging unit 120 captures a single two-dimensional image and detection/recognition is performed from that image; one in which the imaging unit 120 captures two images and the distance, size, 3D structure, and the like are calculated from their parallax and detected/recognized; or one in which the imaging unit 120 captures a 3D distance image using a TOF sensor or the like and detection/recognition is performed from the 3D distance image.
  • the imaging unit 120 may pick up near-infrared or far-infrared images, and perform detection / recognition from these images.
  • one camera 102 may include a plurality of the imaging means 120 described above. In other words, one camera 102 may include, for example, an imaging function of a stereo camera and an infrared camera, and detection / recognition may be performed from an image obtained by these functions.
  • the detection/recognition means 121 recognizes a set recognition target, which may be a specific object (including a person or a thing other than a person) or an abstract phenomenon. In other words, the recognition target may be an object, such as a person (a robber, a thief, an arsonist) or a thing (a handgun), or a phenomenon, such as a crime or a fire.
  • for example, the detection/recognition means 121 may recognize a person as a robber by detecting from the image a person holding a knife or a handgun, or by detecting that person's movements.
  • in the case of a fire, it is conceivable to detect from an image obtained by an infrared camera that the temperature of a certain place is abnormally high, and to recognize that a fire has occurred.
  • when the infrared camera uses far-infrared rays, temperature can be detected; it is then also conceivable to detect a handgun, knife, or other weapon hidden in a clothing pocket from the temperature difference between the weapon and body temperature, and thereby recognize a concealed weapon.
  • since the detection/recognition firmware of the detection/recognition means 121 is generated by machine learning in the machine learning means 130 described later, the detection/recognition means 121 does not necessarily perform recognition in a way that is easy for a person to understand as above. That is, the detection/recognition unit 121 detects features included in the image captured by the imaging unit 120 under control of the detection/recognition firmware, and recognizes the recognition target set from those features.
  • the detection / recognition means 121 may perform detection / recognition using not only images but also sound.
  • the camera 102 includes a voice input unit such as a microphone, and the detection / recognition accuracy can be improved by performing detection / recognition using the voice acquired by the voice input unit.
  • voice may be used in detection / recognition by the server-side detection / recognition means 132 described later.
  • the detection / recognition firmware of the detection / recognition unit 121 is updated by new detection / recognition firmware generated by the machine learning unit 130 and the detection / recognition firmware generation unit 131 described later.
  • the detection / recognition firmware provided in the detection / recognition unit 121 may be generated by the machine learning unit 130 and the detection / recognition firmware generation unit 131, and is detected by another machine learning capable device. / It may be incorporated in the recognition means 121.
  • the detection / recognition means 121 may be initially provided with detection / recognition firmware generated by a method other than machine learning.
  • the setting of the target recognized by the detection/recognition means 121 is included in the detection/recognition firmware.
  • when the detection/recognition firmware is generated by the machine learning means 130 and the detection/recognition firmware generation means 131, the teacher data for machine learning are, for example, a plurality of images showing robbers robbing a convenience store, given to the machine learning means 130 together with the information that these images show robbers (the images are tagged as robbers). Through machine learning, it is then learned where in a given image (teacher data) a robber can be recognized.
  • a detection / recognition algorithm with a high probability of being able to recognize a burglar from an image is generated.
  • the detection/recognition algorithm is converted by the detection/recognition firmware generation means 131, and detection/recognition firmware is generated. The detection/recognition firmware (detection/recognition algorithm) obtained by this learning can recognize where in an image a robber appears; that is, a robber can be said to be set as the recognition target. Note that tagging the images is not always necessary when performing this machine learning.
  • the number of recognition targets (targets recognized by the detection / recognition firmware) set in the detection / recognition firmware is not limited to one, and a plurality of recognition targets may be set.
  • the detection/recognition firmware recognizes a specific target, and when the detection/recognition unit 121 recognizes this specific target, a signal indicating that recognition has occurred (for example, an alarm signal) is output. This signal is transmitted via the communication unit 123 to the server 103, the terminal 104, the terminal of the administrator 106, and the like, which are thereby notified that the set target has been recognized. Alternatively, the signal may be sent only to the server 103; after the server 103 comprehensively judges the information from each camera 102, alarm information such as an e-mail or an instruction to sound an alarm may be sent from the server 103 to the terminal 104 and the like.
  • in this example, the four cameras 102 have overlapping imaging ranges and can simultaneously recognize the overlapping recognition targets set in the detection/recognition firmware. For example, when a robber is set as the overlapping recognition target, a specific robber committing a specific robbery can be recognized simultaneously by the four cameras.
  • the server 103 includes a machine learning means 130, a detection/recognition firmware generation means 131, a server-side detection/recognition means 132, a server-side recording means 133, a server-side communication means 134, and a server-side control means 135.
  • the machine learning unit 130, the detection/recognition firmware generation unit 131, the server-side detection/recognition unit 132, and the server-side control unit 135 each include an arithmetic processing unit and a memory; they may each have their own arithmetic processing unit and memory, or may share them.
  • the machine learning unit 130 performs machine learning such as deep learning to generate a detection / recognition algorithm.
  • the detection / recognition algorithm is an algorithm for recognizing a set recognition target from an image captured by the imaging unit 120 of the camera 102.
  • the detection / recognition firmware generation unit 131 converts the detection / recognition algorithm generated by the machine learning unit 130 into firmware that can be executed by each camera 102, and generates detection / recognition firmware.
  • each camera 102 differs in the image resolution its imaging means 120 can acquire, the performance of the arithmetic processing unit of its detection/recognition means 121, the presence or absence of a GPU (Graphics Processing Unit) for the detection/recognition means 121 and of audio input means such as a microphone, and the camera type (stereo camera, TOF sensor, etc.); accordingly, the firmware that each camera 102 can execute also differs.
  • the detection / recognition firmware generation unit 131 converts a detection / recognition algorithm generated by machine learning into firmware that can be executed by each camera 102, so that a new detection / recognition program can be installed in each camera 102. It becomes.
  • the server-side detection / recognition means 132 performs detection / recognition by comprehensively judging the situation from the images and information of each camera 102.
  • the detection / recognition unit 121 of each camera 102 performs detection / recognition using the image acquired by the imaging unit 120 of the camera 102, but the server-side detection / recognition unit 132 is acquired by a plurality of cameras 102. Detection / recognition is performed using the selected image. Further, when the processing is heavy to be performed by each camera 102, the server-side detection / recognition unit 132 may perform a part of the processing.
  • the detection / recognition firmware of the server side detection / recognition means 132 is provided in the memory of the server side detection / recognition means 132. Further, the detection / recognition firmware of the server-side detection / recognition means 132 can also be updated by the detection / recognition firmware generated by the machine learning means 130 and the detection / recognition firmware generation means 131.
  • the server-side detection/recognition means 132 may determine, from the recognition results of the detection/recognition means 121 of the four cameras 102 (cameras 102a, 102b, and 102c), whether the recognition of the recognition target by each camera 102 is correct, or the probability that it is correct. Based on this determination result, alarm information or the like may be sent to the terminal 104 and the like. For example, when all four cameras report that they recognized the set target (for example, a robber), the server-side detection/recognition unit 132 may determine that the target was recognized correctly and instruct the terminal 104 and the like to sound an alarm.
  • The content of the alarm information may be changed according to the number of cameras that recognized the set target. For example, when all four cameras recognize the set target, the server-side detection/recognition unit 132 determines that the recognition is correct and instructs the terminal 104 to sound a loud alarm; when fewer than all of the cameras recognize it, the unit determines that the recognition may be correct and instructs the terminal 104 to sound a softer alarm.
  • The server-side detection/recognition means 132 also determines misrecognition (recognition errors) of a camera 102 from the recognition results of the detection/recognition means 121 of the plurality of cameras 102. For example, when three of the four cameras 102 report that the set target has been recognized by their detection/recognition means 121 but no such report arrives from the remaining camera 102, it is determined that this one camera 102 has made a recognition error (missed the target). Conversely, when three of the four cameras 102 send no report that the set target has been recognized but one camera 102 does, it may be determined that this one camera 102 has made a recognition error (false detection). Misrecognition of each camera 102 may also be determined by comparing the detection/recognition result of the server-side detection/recognition means 132 with the detection/recognition result of each camera 102 (a sketch of this majority-vote logic follows below).
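  • The majority-vote logic of the preceding items can be illustrated with a short sketch. This is a minimal sketch of one possible implementation, not the patent's own code: the names `CameraReport` and `judge_consensus`, the majority threshold, and the two alarm levels are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class CameraReport:
    camera_id: str
    recognized: bool  # did this camera's detection/recognition firmware report the set target?

def judge_consensus(reports):
    """Aggregate per-camera recognition results, choose an alarm level from the
    number of agreeing cameras, and flag cameras that disagree with the majority
    as misrecognition (recognition error) candidates."""
    n = len(reports)
    hits = sum(r.recognized for r in reports)
    if hits == n:
        alarm = "loud"   # all cameras agree: recognition judged correct
    elif hits > n / 2:
        alarm = "soft"   # majority agrees: recognition may be correct
    else:
        alarm = None     # minority report: possible false detection
    majority = hits > n / 2
    suspects = [r.camera_id for r in reports if r.recognized != majority]
    return alarm, suspects

# Example from the text: cameras 102a and 102b recognize a burglar, 102c does not,
# so 102c is flagged as having possibly missed the target.
alarm, suspects = judge_consensus([
    CameraReport("102a", True),
    CameraReport("102b", True),
    CameraReport("102c", False),
])
```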
  • the server-side recording means 133 records teacher data for machine learning performed by the machine learning means 130 and the like.
  • The server-side communication means 134 communicates with each camera 102 via the network 105: it receives images and other information from each camera 102, transmits commands and detection/recognition firmware to each camera 102, and, in an abnormal state (when the set target is recognized), transmits alarm information to the terminal 104 or the administrator 106.
  • The camera 102 acquires an image with the imaging unit 120 and recognizes (detects/recognizes) the set recognition target under control of the detection/recognition firmware of the detection/recognition unit 121.
  • An image captured when recognition was erroneous is transmitted to the server 103 (step S11).
  • Audio data or the like captured when recognition was erroneous may be transmitted together with that image.
  • the server side detection / recognition means 132 determines whether or not the recognition is wrong from the recognition results of the plurality of cameras 102 as described above.
  • In a system in which a camera 102 that recognizes the set recognition target (for example, a burglar) notifies the server 103 of the recognition, when the camera 102a and the camera 102b notify the server 103 that they have recognized the target but no such notification arrives from the camera 102c, the server-side detection/recognition means 132 determines from these notification results that the camera 102c has misrecognized (failed to recognize) the target.
  • The control means of the server 103 then instructs the camera 102c to transmit to the server 103, as an image at the time of erroneous recognition, the image the camera 102c acquired at the same time as, or around (for example, several seconds to several minutes before and after), the time when the camera 102a and the camera 102b recognized the recognition target. Upon receiving this command, the camera 102c transmits the image at the time of erroneous recognition to the server 103.
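  • On the server side, the request just described might look like the following sketch. `server.send_command` and the command fields are assumed names; the window of several seconds to several minutes follows the text.

```python
from datetime import datetime, timedelta

def request_missed_frames(server, camera_id: str, event_time: datetime,
                          before: timedelta = timedelta(seconds=30),
                          after: timedelta = timedelta(seconds=30)) -> None:
    """Ask a camera that failed to recognize the target to upload, as images at
    the time of erroneous recognition, the frames it captured around the time
    the other cameras recognized the target."""
    server.send_command(camera_id, {
        "type": "upload_images",
        "reason": "misrecognition",
        "from": event_time - before,
        "to": event_time + after,
    })
```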
  • a person may determine whether or not the recognition is wrong.
  • In this case, the detection recognition system 101 includes a terminal having display means for displaying the images captured by the camera 102 and input means such as a pointing device or keyboard. When the camera 102 fails to recognize a burglar, a person checks the images taken by the camera 102 on the display means of this terminal, selects with the input means the image in which the burglar should have been recognized, and this image may then be transmitted to the server 103 as an image at the time of erroneous recognition.
  • The server-side control unit 135 records the image sent from the camera 102 at the time of erroneous recognition in the server-side recording unit 133 as teacher data (training data). Together with this image, the recognition result that the detection/recognition means 121 should have output (for example, that a burglar should have been recognized in the image) is also recorded in the server-side recording means 133 as teacher data.
  • The recognition result that should have been output by the detection/recognition means 121, recorded as teacher data, may be generated by the server 103 or may be sent from the camera 102.
  • When the server-side detection/recognition unit 132 determines from the recognition results of the plurality of cameras 102 whether recognition was erroneous, the server-side detection/recognition unit 132 may create as teacher data the recognition result judged to be correct (the recognition result that the detection/recognition unit 121 should have output), and the server-side control unit 135 may record this teacher data in the server-side recording unit 133.
  • The machine learning unit 130 reads the teacher data recorded in the server-side recording unit 133 (step S12). The machine learning means 130 then extracts feature points by convolution operations from the images at the time of erroneous recognition contained in the read teacher data (step S13), and performs machine learning from the extracted feature point information and the recognition result that the detection/recognition unit 121 should have output (step S14). As a result of the machine learning, a detection/recognition algorithm, a neural network that performs the detection and recognition processing, is generated (step S15). A sketch of these steps follows below.
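  • Steps S12 to S15 can be sketched as follows. The text does not specify the network structure or training framework, so this is a hedged illustration using a small convolutional network in PyTorch; `DetectNet`, the layer sizes, and the optimizer settings are all assumptions.

```python
import torch
import torch.nn as nn

class DetectNet(nn.Module):
    """A stand-in for the detection/recognition algorithm: the convolution
    layers extract feature points (step S13) and the head maps them to the
    recognition result that should be output (step S14)."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, num_classes))

    def forward(self, x):
        return self.head(self.features(x))

def retrain(model: nn.Module, teacher_loader, epochs: int = 10) -> nn.Module:
    """Step S12: `teacher_loader` yields (image at erroneous recognition,
    recognition result that should have been output).  Training adjusts the
    algorithm so the same images are no longer misrecognized (steps S13-S14)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, wanted in teacher_loader:
            opt.zero_grad()
            loss = loss_fn(model(images), wanted)
            loss.backward()
            opt.step()
    return model  # the new detection/recognition algorithm (step S15)
```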
  • Machine learning by the machine learning means 130 is performed so that a detection / recognition algorithm (detection / recognition firmware) is optimized for each camera 102.
  • This is because the cameras 102 may be of different types, and even cameras with exactly the same characteristics may differ in installation location and environment, so the optimum algorithm may differ according to these differences.
  • Based on the original detection/recognition algorithm and the teacher data, the machine learning unit 130 generates a new detection/recognition algorithm that can produce, from the images at the time of erroneous recognition contained in the teacher data, the recognition result for the detection/recognition unit 121 that is also contained in the teacher data.
  • the original detection / recognition algorithm used for machine learning may be recorded in the server-side recording unit 133.
  • Alternatively, the detection/recognition firmware may be transmitted from the camera 102 and converted back into a detection/recognition algorithm for use. That is, the machine learning unit 130 generates a new detection/recognition algorithm from the teacher data and the detection/recognition algorithm used in the detection/recognition firmware of the camera 102 that made the erroneous detection/recognition.
  • the detection / recognition firmware generation means 131 converts the detection / recognition algorithm generated by the machine learning means 130 into detection / recognition firmware that is detection / recognition software for each camera (step S16). That is, the detection / recognition algorithm is converted into software in a format that can be executed by each camera by the detection / recognition firmware generation unit 131.
  • the server-side communication unit 134 transmits detection / recognition firmware, which is detection / recognition software generated by the detection / recognition firmware generation unit 131, to the camera 102 (step S17).
  • the control unit 124 of the camera 102 updates the firmware of the detection / recognition unit 121 to the new detection / recognition firmware.
  • In this way, the detection/recognition firmware of the detection/recognition unit 121 of the camera 102 can be updated to the new detection/recognition firmware generated by the machine learning unit 130 and the detection/recognition firmware generation unit 131 of the server 103 (see the sketch below).
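  • Steps S16 and S17 could be sketched as below. The patent does not say how the per-camera conversion is performed, so treating it as a TorchScript export with optional dynamic quantization for cameras without a GPU is purely an assumption of this sketch, as are `camera_profile` and `server.send_firmware`.

```python
import io
import torch

def build_firmware(model: torch.nn.Module, camera_profile: dict) -> bytes:
    """Step S16: convert the detection/recognition algorithm into a form the
    target camera can execute, taking account of per-camera differences."""
    model.eval()
    if not camera_profile.get("has_gpu", False):
        # shrink the model for cameras with weak arithmetic processing units
        model = torch.quantization.quantize_dynamic(
            model, {torch.nn.Linear}, dtype=torch.qint8)
    buf = io.BytesIO()
    torch.jit.save(torch.jit.script(model), buf)  # self-contained executable graph
    return buf.getvalue()

def push_firmware(server, camera_id: str, blob: bytes) -> None:
    """Step S17: transmit the new detection/recognition firmware; the camera's
    control unit 124 then replaces the firmware of its detection/recognition
    unit 121 with it."""
    server.send_firmware(camera_id, blob)
```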
  • The machine learning by the machine learning unit 130 uses as teacher data the images captured when the detection/recognition unit 121 of the camera 102 erroneously recognized the set recognition target. In machine learning with this teacher data, the detection/recognition algorithm is improved so that the recognition target set for such an image is no longer misrecognized; the detection/recognition performance of the camera 102 can therefore be improved.
  • Because the detection/recognition firmware is generated and updated on the server, highly accurate detection/recognition can be performed even if the computing ability of the camera 102 is not particularly high. Moreover, the camera does not fall behind other cameras in relative performance over the years; rather, its performance can be improved gradually with use.
  • the performance of the camera 102 can be improved so that detection / recognition suitable for the environment in which the camera 102 is used can be performed.
  • Since the machine learns by itself, it becomes possible to recognize the set recognition target even from cues that a person would not notice. For example, instead of giving as training data an image taken while the burglar was actually committing the robbery, an image of the robber taken before the robbery may be given. The resulting algorithm is then not one that recognizes a burglar while a burglary is actually being carried out; rather, it may find, from the behavior of people loitering in or around the convenience store, the characteristics of a person who is likely to commit burglary in the future, and recognize such a person as a burglar (a person likely to become a burglar). Note that since the machine learning means 130 decides by itself which features are actually focused on for recognition, it does not necessarily recognize a person likely to be a robber from behavior alone.
  • In the detection recognition system of the present embodiment, since the four cameras 102 have overlapping imaging ranges, a recognition target set in the detection/recognition firmware can be recognized simultaneously by the four cameras 102 within the overlapping portion. Therefore, even if several of the four cameras 102 cannot recognize the recognition target, the other cameras can, so the likelihood that the recognition target is detected/recognized is increased and the accuracy of detection/recognition of the system as a whole can be increased.
  • the four cameras 102 include cameras with different types of imaging means 120, such as a stereo camera 102a, an infrared camera 102b, and a monocular camera 102c.
  • By including cameras of different types, such as the infrared camera 102b, the detection/recognition accuracy of the system as a whole can be improved compared with the case where only cameras 102 of the same type are used.
  • the plurality of cameras 102 may be installed in places where the imaging ranges are completely different, or may recognize completely different recognition targets.
  • As described above, the server-side detection/recognition means 132 can determine, from the recognition results of the detection/recognition means 121 of the four cameras 102, whether the recognition by each camera 102 is correct, or the probability that it is correct, and can determine misrecognition (recognition errors) of a camera 102. Therefore, the alarm sound can be emitted from the terminal 104 or the like only when the server-side detection/recognition means 132 judges from the detection/recognition results of the individual cameras 102 that the recognition is correct. Further, misrecognition of a camera 102 can be determined automatically, and machine learning can be performed automatically so as to improve the detection/recognition ability of the misrecognizing camera 102.
  • Since the image used as machine-learning teacher data at this time can be the very image that was misrecognized, learning proceeds so that that image is no longer misrecognized. Misrecognition can therefore be determined automatically, and the accuracy of detection/recognition improves the more the camera 102 is used.
  • teacher data may be stored in the recording unit 122 or the server-side recording unit 133, and machine learning may be performed when a certain number or more have been accumulated or when a certain period has elapsed.
  • Machine learning may also be performed using images other than those captured by the imaging unit 120. When the number and quality of teacher data obtainable from the images captured by the imaging unit 120 alone are insufficient, the machine learning effect can be improved by providing other images to the machine learning unit 130.
  • the recognition target recognized by the camera 102 is not limited to that described above, and any recognition target can be used as long as it can be detected / recognized from the image captured by the imaging unit 120.
  • The object distance detection device of the present embodiment uses a fisheye-lens stereo camera as a camera intended mainly for monitoring, such as a surveillance camera or an in-vehicle camera. It is not for outputting a stereoscopic image: it generates a distance image in which each pixel is represented by the distance from the camera to the subject, and enables monitoring operations such as suspicious person detection to be performed automatically by image recognition on the distance image.
  • The object distance detection device includes: a pair of fisheye cameras 211, each having a fisheye lens unit 221, a color filter 222, an image sensor 223, and the like; a pair of image input units 212 for inputting the respective image signals from the image sensors 223 of the pair of fisheye cameras 211; a pair of image signal correction processing units 213 for performing image signal correction processing on the input image signals; an image analysis unit 214 that obtains a distance image from the corrected images; a correction parameter calculation unit 215 that calculates, from the image signals, the parameters required by the image signal correction processing units 213; and a suspicious person detection unit 216 serving as a distance image recognition unit that performs image recognition on the distance image and automatically carries out monitoring operations such as suspicious person detection.
  • The pair of fisheye cameras 211 constitutes a stereo camera, and each camera outputs as an image signal the image formed on its image sensor 223 by the fisheye lens unit 221 through the color filter 222. The image is output as a moving image.
  • The fisheye lens unit 221 is a fisheye lens in that it adopts a projection method other than the central projection method; in this embodiment it is a lens unit adopting the equidistant projection method. The projection method of the fisheye lens unit 221 is not, however, limited to the equidistant projection method: a lens unit of any projection method other than the central projection method described above may be used as the fisheye lens.
  • the angle of view of the fisheye lens unit 221 is 180 degrees, but it may be, for example, an angle of view of about 160 degrees to 200 degrees.
  • The pair of fisheye cameras 211 is arranged adjacent to each other with the optical axes of the fisheye lens units 221 parallel, so that each camera, photographing with an angle of view of 180 degrees, captures the other fisheye lens unit 221 in its image. This makes it possible to use epipolar geometry for the corresponding-point search described later.
  • The output from the image sensor 223 of each fisheye camera 211 is input through the image input unit 212 of the object distance detection apparatus, and the image signal correction processing unit 213 performs color synchronization processing, white balance processing, gamma processing, color matrix processing, luminance matrix processing, color difference/luminance processing, and the like.
  • The color filter 222 may be omitted, in which case processing relating to color need not be performed. Note that, as described later, feature points are extracted from the two images by image recognition and corresponding points are searched for based on those feature points; therefore, if the recognition rate with a color image is higher than with a luminance image (grayscale image), a color image may be generated from the image signal as described above. The parameters based on the image signal that are required by the image signal correction processing unit 213 are calculated from the image signal by the correction parameter calculation unit 215.
  • the image analysis unit 214 receives the image corrected as described above, and an image conversion unit 231 as a distortion removal unit performs image conversion to remove distortion caused by the fisheye lens as necessary.
  • In this embodiment, the equidistant projection image is converted into a central projection image. This distortion removal can be performed by a known method, for example using a known integrated circuit for distortion-removing image conversion; a sketch of the underlying remapping follows below.
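  • For the equidistant projection method, a ray at angle θ from the optical axis lands at image radius r = f·θ, whereas the central projection method places it at r = f·tan θ. The conversion can therefore be done by remapping radii, as in the sketch below (a simplified model assuming the optical axis passes through the image center; real lenses would need additional lens-specific correction terms).

```python
import numpy as np
import cv2  # OpenCV is assumed available for the interpolating remap

def equidistant_to_central(img: np.ndarray, f: float) -> np.ndarray:
    """Convert an equidistant-projection fisheye image into a central-projection
    (pinhole) image with the same focal length f (in pixels)."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    r_c = np.hypot(xs - cx, ys - cy)   # radius in the central-projection output
    theta = np.arctan2(r_c, f)         # incidence angle: r_c = f * tan(theta)
    r_e = f * theta                    # equidistant source radius: r_e = f * theta
    scale = np.where(r_c > 0, r_e / r_c, 1.0)
    map_x = (cx + (xs - cx) * scale).astype(np.float32)
    map_y = (cy + (ys - cy) * scale).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```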
  • Next, the image from which distortion has been removed by the conversion is divided into set partitions by the partitioning/image selection unit 232, which also serves as a resolution conversion unit. This partitioning is not limited to partitioning by straight lines; partitioning by curves is also possible. For example, when an equidistant projection image is used without image conversion, partitioning by curves is preferable.
  • FIG. 13 shows the sections K11 to K33 for a rectangular converted image when the pair of fisheye cameras 211 is installed on a ceiling or the like with the optical axes directed vertically (downward).
  • The area of each partition is large in the central sections K11, K12, and K21, while the peripheral sections K13, K23, K31, K32, and K33 are smaller in area. The resolution is set low in the central sections K11, K12, and K21 and high in the peripheral sections K13, K23, K31, K32, and K33; that is, the resolution of the central sections is lowered, whereas that of the peripheral sections is left high.
  • In the central portion of the image the effective resolution is higher than at the periphery, so even if pixels are thinned out there to reduce the resolution, the influence is smaller than when the resolution is reduced at the periphery. Therefore, the resolution is lowered by thinning out pixels in the central sections K11, K12, and K21 of the image, while pixels are not thinned out in the peripheral sections K13, K23, K31, K32, and K33, whose resolution is therefore not lowered.
  • Alternatively, pixels may be thinned out in all the sections K11 to K33, with the degree of thinning made greater in the central sections K11, K12, and K21 than in the peripheral sections K13, K23, K31, K32, and K33.
  • In addition, the area of the central sections K11, K12, and K21, where the resolution is lowered, is made wider than that of the peripheral sections K13, K23, K31, K32, and K33, where the resolution is not lowered. Since the fisheye image is distorted so that it shrinks toward the periphery, the area of the peripheral sections is made smaller than that of the central sections when partitioning. If the resolution were lowered at the periphery, where the distortion is large, the image would deteriorate greatly, possibly affecting the distance calculation and image recognition adversely; therefore, when distortion is removed after partitioning, it is preferable to thin out more pixels in the central portion than in the peripheral portion.
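  • The selective thinning can be sketched as follows: pixels are dropped only in the central region while the periphery keeps full resolution. A single rectangular center stands in for the central sections K11, K12, and K21; the 60 % extent and the thinning factor of 2 are assumptions of the sketch, not values from the text.

```python
import numpy as np

def thin_center(img: np.ndarray, center_frac: float = 0.6, step: int = 2) -> dict:
    """Partition the image and reduce resolution only in the central partition
    by keeping every `step`-th pixel; peripheral partitions stay untouched."""
    h, w = img.shape[:2]
    y0, y1 = int(h * (1 - center_frac) / 2), int(h * (1 + center_frac) / 2)
    x0, x1 = int(w * (1 - center_frac) / 2), int(w * (1 + center_frac) / 2)
    return {
        "center": img[y0:y1:step, x0:x1:step],  # thinned: lower resolution
        "top":    img[:y0, :],                  # peripheral: full resolution
        "bottom": img[y1:, :],
        "left":   img[y0:y1, :x0],
        "right":  img[y0:y1, x1:],
    }
```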
  • FIG. 14 shows the sections K11 to K42 for a rectangular converted image when, for example, the pair of fisheye cameras 211 is arranged at a position higher than a person's height with the optical axes directed horizontally or obliquely downward. The sections K11 to K33 at the center and lower side are the same as those of the vertically facing fisheye camera 211 described above, and the resolution and area of each of the sections K11 to K33 are set in the same manner.
  • In the upper sections K41 and K42 above the center of the image, what appears is, for example, the sky outdoors or the ceiling indoors; their importance is therefore low, and their resolution is made lower and their area wider than those of the central section K11. The upper center section K41 has the lowest resolution and the largest area, while the upper left and right sections K42 have a higher resolution and a smaller area than the section K41.
  • FIGS. 13 and 14 show examples of partitioning. On the basic principle of lowering the resolution and widening the area of the central partitions while keeping the resolution of the peripheral partitions high and their area small, the resolution and area can be further adjusted according to importance determined by the arrangement of the fisheye camera 211. That is, a fisheye camera 211 with an angle of view of about 180 degrees or more has a wide shooting range, so positions where no person can exist, for example, may enter the shooting range; it is preferable to improve the processing speed by lowering the resolution of such portions.
  • the image selection unit 232 shown in FIG. 11 selects the same sections K11 to K33 for each of the pair of images.
  • Next, the corresponding point selection unit 233, serving as a corresponding point search unit, extracts feature points of each image by image recognition and associates the feature points of the two images with each other as described above. Epipolar geometry can be used here, and the corresponding points to be paired in the two images are determined and selected sequentially. When the selection of corresponding points for a pair of sections is completed, corresponding points are extracted in the next pair of sections (see the sketch below).
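  • For a rectified pair with parallel optical axes, the epipolar line of a point in one image is simply the same row of the other image, so the search reduces to a one-dimensional scan, as in this sketch (block matching with a sum-of-squared-differences cost; the window size and disparity range are assumed values, not from the text).

```python
import numpy as np

def match_along_epipolar(left: np.ndarray, right: np.ndarray,
                         y: int, x: int, win: int = 5, max_disp: int = 64) -> int:
    """Given a feature point (x, y) in the left image of a rectified pair,
    scan the same row of the right image and return the disparity (in pixels)
    of the best-matching block.  The point is assumed to lie far enough from
    the image border for the window to fit."""
    r = win // 2
    patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.float32)
    best_d, best_cost = 0, np.inf
    for d in range(max_disp):
        xr = x - d
        if xr - r < 0:
            break
        cand = right[y - r:y + r + 1, xr - r:xr + r + 1].astype(np.float32)
        cost = float(np.sum((patch - cand) ** 2))  # SSD block-matching cost
        if cost < best_cost:
            best_cost, best_d = cost, d
    return best_d
```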
  • The distance calculation unit 234 obtains the distance from the fisheye cameras 211 to the target point corresponding to each pair of corresponding points, based on the difference between the positions of the corresponding points on the two images and the distance between the pair of fisheye cameras 211. The distance is calculated based on the so-called triangulation method.
  • In contrast to general triangulation, the baseline distance (the distance between the pair of cameras) is much smaller than the distance to the target; in this case the distance to the target point is obtained by dividing the baseline distance by the parallax (in radians).
  • The three-dimensional coordinates of the target point in real space are calculated from the coordinate positions (projection positions) of the corresponding points in the two-dimensional coordinates of each image, the distance between the cameras, and the focal length of the cameras; the distance from the camera to the target point can then be calculated from the three-dimensional coordinate positions of the target point and the camera in real space.
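  • Numerically, the approximation in the text works out as in this sketch: a 10 cm baseline and an angular parallax of 0.02 rad give a distance of about 5 m. The pinhole form Z = B·f/d for pixel disparities is shown alongside for comparison; both are textbook triangulation relations, not code from the patent.

```python
def distance_from_angular_parallax(baseline_m: float, parallax_rad: float) -> float:
    """Distance to the target point when the baseline is much smaller than the
    target distance: distance ≈ baseline / parallax (parallax in radians)."""
    return baseline_m / parallax_rad

d = distance_from_angular_parallax(0.10, 0.02)  # -> 5.0 (meters)

def distance_pinhole(baseline_m: float, focal_px: float, disparity_px: float) -> float:
    """Equivalent central-projection form: Z = B * f / d, with the disparity of
    the corresponding points in pixels and the focal length in pixels."""
    return baseline_m * focal_px / disparity_px
```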
  • The flowchart in FIG. 12 shows the processing in the image analysis unit 214 described above: after images are captured by the pair of fisheye cameras 211 and corrected, they are input to the image analysis unit 214 one frame pair at a time, and a distance image indicating the distance from the fisheye cameras 211 to the target point of each corresponding point, serving as each pixel, is calculated. Note that the flowchart in FIG. 12 illustrates the case where distortion removal is performed after partitioning. As shown in the flowchart, when a corrected image of each frame is input from the pair of fisheye cameras 211, partitioning is performed (step S21).
  • Here, the image is divided into a plurality of partitions and processing to reduce the resolution of a partition (for example, reduction processing) is performed, with the resolution varied from partition to partition. The area of each partition is also adjusted per partition: basically, in the central sections of the image the resolution is made low and the area large, while in the peripheral sections the resolution is higher and the area smaller than in the central sections.
  • In step S22, distortion is removed in each partition.
  • Next, feature points (singular points such as edges) are detected in each image by well-known edge detection.
  • A corresponding point is a point on an image corresponding to a target point in real space; the points in the two images that correspond to the same target point form a pair of corresponding points. If a point in real space appears in both images, a pair of corresponding points exists for it, and it is preferable to search for as many corresponding points as possible using the epipolar geometry method described above.
  • In step S24, the distance from the fisheye cameras 211 to the target point in real space corresponding to each pair of corresponding points is calculated based on the difference in the positions of the corresponding points in the two images and the distance between the fisheye cameras 211.
  • In step S25, a distance image is generated and output, with the distance obtained for each corresponding point as the value of each pixel. A sketch chaining these steps follows below.
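  • Chaining steps S21 to S25 for one frame pair might look like the driver below. The four helpers are assumed to behave like the sketches given earlier (`thin_center` for partitioning, `equidistant_to_central` for distortion removal, a corresponding-point search such as `match_along_epipolar`, and a triangulation function); none of this structure is prescribed by the text.

```python
def frame_to_distance_image(left, right, partition, undistort, find_pairs,
                            to_distance) -> dict:
    """Process one corrected frame pair: partition and thin (step S21),
    remove distortion per partition (step S22), detect features and search
    corresponding points, triangulate each pair (step S24), and collect the
    per-corresponding-point distances that form the distance image (step S25)."""
    left_parts, right_parts = partition(left), partition(right)     # step S21
    depth = {}
    for name, l_part in left_parts.items():
        l_u, r_u = undistort(l_part), undistort(right_parts[name])  # step S22
        for (y, x), disparity in find_pairs(l_u, r_u):  # corresponding points
            depth[(name, y, x)] = to_distance(disparity)            # step S24
    return depth                                                    # step S25
```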
  • In the monitoring apparatus, this distance image is used for monitoring. For example, a person is detected by image recognition on the distance image, or a living thing or an article (a vehicle or the like) other than a person is detected. Further, a person may be identified by comparing the three-dimensional shape of the person detected in the distance image, for example the height, physique, and clothing shape, with a registered person's three-dimensional shape or photograph. Children and adults may also be distinguished, as may ages, and men and women.
  • As described above, in the present embodiment, distance detection is performed by a stereo camera comprising a plurality of fisheye cameras 211 each having an angle of view of, for example, about 180 degrees; the distance to the target point of each object over a wide range of real space can therefore be obtained and output as a distance image.
  • On the other hand, because the angle of view is wide, the amount of information in one frame is large, and the amount of computation, including the image recognition necessary for distance detection, is large; a long time is therefore required for processing each frame, making it difficult to process a moving image.
  • Moreover, away from the central part the image is distorted, and the effective resolution is not necessarily high.
  • In addition, the image of the fisheye camera 211 contains many parts of low importance for the monitoring work, such as the sky, the ceiling, the floor, and the ground. Therefore, by partitioning off the parts of low distortion and high resolution and the parts of low importance, and calculating the distance with the resolution of those parts reduced, the amount of processing can be reduced without greatly lowering the accuracy of the distance calculation, and the processing time can thereby be shortened. This also speeds up the image recognition in the monitoring apparatus that uses the calculated distance image.
  • each pixel region D of the distance image output from the object distance detection device will be described.
  • In the distance image, each point on the image is represented by its distance from the stereo camera, and the distance image is expressed by changes in color shade according to the distance value. The shading may be, for example, a monochrome gradation or a gradation of another color. Alternatively, each point on the image may be represented by a numerical value indicating the distance.
  • FIG. 15A shows a part of the section K11 and FIG. 15B a part of the section K33, as parts of the distance image corresponding to the partitioned image shown in FIG. 13; the sections K11 to K33 are arranged in the same manner as in FIG. 13.
  • As described above, the resolution varies among the sections K11 to K33; however, as shown in FIG. 15, the size of the pixel P (the part delimited by both a double line and a dotted line), which is the minimum unit of the distance image, is the same throughout.
  • the pixel P is, for example, a monitor pixel that displays a distance image.
  • The image is divided into pixel regions D (the parts delimited by double lines), each consisting of one pixel P or a plurality of pixels P. A black-and-white shade corresponding to the distance from the stereo camera to the corresponding point (photographing target) is assigned to each pixel region D, so that the distance image is expressed by the color (shade) corresponding to the distance of each pixel region D.
  • The resolution of the section K11 is lower than that of the section K33: the number of pixels in each pixel region D of the section K11 is 4, whereas the number of pixels in each pixel region D of the section K33 is 2. That is, the pixel region D of the lower-resolution section K11 consists of a larger number of pixels P and has a larger area than the pixel region D of the higher-resolution section K33.
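  • Rendering with variable pixel regions can be sketched as follows: a lower-resolution section simply draws each distance value into a larger block of monitor pixels P. The square region shape and the 10 m normalization are assumptions of the sketch.

```python
import numpy as np

def render_region(canvas: np.ndarray, top: int, left: int, region_px: int,
                  distance: float, max_dist: float = 10.0) -> None:
    """Fill one pixel region D (a block of region_px x region_px monitor
    pixels P) with a gray shade encoding the distance to the corresponding
    point; nearer points are darker, farther points lighter."""
    shade = np.uint8(255 * min(distance, max_dist) / max_dist)
    canvas[top:top + region_px, left:left + region_px] = shade

canvas = np.zeros((480, 640), dtype=np.uint8)
render_region(canvas, 0, 0, region_px=2, distance=3.5)  # region in a lower-resolution section
render_region(canvas, 0, 2, region_px=1, distance=3.4)  # region in a higher-resolution section
```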
  • 1 Image sensor (imaging means)
  • 2 3D sensor (distance image acquisition means, distance measurement means)
  • 3 Object extraction means (distance image recognition object extraction means)
  • 4 Object image extraction means (recognition object image extraction means)
  • 5 Image recognition means
  • 10 Camera (imaging means)
  • 11 Distance image detection means (distance image acquisition means)
  • 12 Stereo camera


Abstract

Provided is an image recognition imaging apparatus capable of reducing the amount of computation in image recognition and improving the recognition rate. This image recognition imaging apparatus is provided with: an image sensor 1 that captures an image; and a 3D sensor 2 that acquires a distance image in which each pixel in the range corresponding to the captured range of the image is indicated by the distance to the object to be imaged. The apparatus is also provided with object extraction means 3 for extracting an object to be recognized in the distance image on the basis of the distances in the distance image; object image extraction means 4 for extracting from the image, on the basis of the range occupied by the object extracted from the distance image, a partial image corresponding to that object; and image recognition means 5 for identifying the extracted object by executing image recognition on the partial image.

Description

Image recognition imaging apparatus
The present invention relates to an image recognition imaging apparatus that acquires a two-dimensional image and a distance image and performs image recognition.
Generally, a surveillance camera captures an image and displays it on a monitor; the image is either watched by a person in real time, or stored and used to confirm an incident after it has occurred. In recent years, meanwhile, with the development of AI (artificial intelligence) technology, it has become possible to automatically detect the presence of a specific person, or the intrusion of a suspicious person into a restricted area, by image recognition.
In automobiles, on the other hand, in order to automate the avoidance of dangers such as collisions, it has been proposed to use a distance image in which each of the vertically and horizontally arranged pixels represents the distance to the photographed target (see, for example, Patent Document 1). This makes it possible to detect the position and movement of obstacles. In this case, a stereo camera having two cameras is used for distance measurement and image acquisition.
As described above, cameras that detect/recognize objects such as people and things are known (see, for example, Patent Documents 2 and 3). Such a camera is used, for example, as a surveillance camera for crime prevention, and issues an alarm when an abnormality is found by detection/recognition.
In recent years, machine learning has also come to be widely used in fields such as image recognition. Deep learning, for example, is a known machine learning technique: it learns the features of data using a multilayer neural network, and it is known that highly accurate image recognition becomes possible by using it.
As described above, with recent improvements in image recognition technology, it has become possible to avoid danger and drive automatically using image recognition of in-vehicle camera video, and to detect known criminals by image recognition of surveillance camera video. It is also conceivable to use, as the in-vehicle camera or surveillance camera, a stereo camera having the above-described pair of cameras: based on the difference in position (parallax) between corresponding points on the two images that correspond to the same target point on the subject, the distance from the stereo camera to the point on the subject corresponding to each image point is detected, a distance image is obtained in which each pixel represents not luminance or color difference but the distance to the subject, and image recognition is performed on this distance image.
In this case, corresponding points corresponding to each point on the photographed subject are searched for in each of the two captured images, and the above-mentioned distance is calculated from the parallax, i.e. the difference in position between corresponding points on the two images that correspond to the same point on the subject. Once the distance image is obtained, image recognition such as person detection by face recognition or moving body detection is performed, enabling, for example, danger avoidance while driving, automatic driving, detection of a wanted person by face recognition, and detection of a suspicious person such as an illegal intruder. In image recognition on a distance image, part of the three-dimensional shape of the photographed object is known from the distance image, so the subject on the image can be identified easily and with high accuracy as a person, dog, cat, horse, vehicle, and so on.
In consideration of image recognition accuracy, it is preferable to use a high-resolution image sensor. However, when arithmetic processing such as the above-described distance detection is performed on an image with a large number of pixels, the amount of calculation becomes enormous, and processing one frame of a moving image such as in-vehicle or surveillance camera video takes a long time. Depending on the computation speed of the integrated circuit performing the processing, the processing time per frame may become too long for subsequent processing to keep up when a real-time response is required. In this case, it is conceivable to shorten the processing time by first thinning out the pixels of the image to lower its resolution (number of pixels).
Fisheye lenses are also known as camera lens units. A fisheye lens is a lens unit adopting a projection method other than the central projection method used in ordinary wide-angle and telephoto lenses; for example, most fisheye lenses adopt the equidistant projection method. Other fisheye lenses adopt the equisolid angle projection method, the orthographic projection method, the stereographic projection method, and so on. On the telephoto side there is little difference in the captured image between the central projection method and the other projection methods, so there is no telephoto fisheye lens; a fisheye lens is an ultra-wide-angle lens. The angle of view of a fisheye lens is generally 180 degrees, but there are also lenses with an angle of view of less than 180 degrees and lenses with an angle of view of more than 180 degrees.
A distance calculation device that obtains the distance to a subject with a stereo camera using such fisheye lenses has been proposed (see Patent Document 4). In this fisheye stereo camera, the images captured by the stereo camera are converted into spherical images projected onto a spherical surface, and the distance to each subject is obtained.
JP 2016-1464 A
JP 2012-208851 A
JP 2010-160743 A
JP 2013-109779 A
Incidentally, recognition of a two-dimensional image captured by a 2D camera is performed on an image in which people and objects in three-dimensional space are projected in two dimensions, so even with AI technology (such as deep machine learning), improvements in processing speed and recognition rate reach a ceiling. That is, in object extraction from a two-dimensional visible image, parts of different colors and different luminances must be recognized as belonging to the same object, so the entire screen must be processed and objects recognized from changes in color, luminance, and so on before they can be extracted. This requires an enormous amount of computation; not only does processing take time, but detection accuracy is limited to about 90% even in image recognition using deep machine learning.
With a 3D camera, on the other hand, knowing the distance makes it relatively easy to recognize whether there is an obstacle with which the host vehicle may collide. That is, it is easy to detect a person or object at a position where a collision could occur as an obstacle. A 3D sensor can recognize the shape of an object, but there are cases where the object cannot be identified from the shape read by the 3D sensor alone.
The present invention has been made in view of the above circumstances, and an object thereof is to provide an image recognition imaging apparatus capable of reducing the amount of computation in image recognition and improving the recognition rate.
In order to solve the above problems, an image recognition imaging apparatus of the present invention includes: imaging means for capturing an image; distance image acquisition means for acquiring a distance image in which each pixel in a range corresponding to the imaging range of the image is represented by the distance to a photographing target; distance image recognition object extraction means for extracting a recognition object on the distance image based on the distances in the distance image; recognition object image extraction means for extracting, from the image, a partial image corresponding to the recognition object, based on the range on the distance image of the recognition object extracted from the distance image; and image recognition means for performing image recognition on the partial image and identifying the recognition object.
According to such a configuration, a group of pixels in the distance image whose distance values are close to one another and which form a cluster can be recognized as an object, i.e. a recognition target, on the distance image. In this case, the object can be extracted with simpler calculation than, for example, when extracting an object from a two-dimensional color visible image.
The distance image basically corresponds to the imaging range of the image, and the position in the distance image corresponding to each position in the image contains distance information. For example, when the image and the distance image have the same number of pixels, the pixel of the distance image corresponding to a pixel at a specific position in the image in which part of an object is photographed expresses, by its color, the distance to that part of the object. The imaging range of the image thus corresponds to the range of the distance image. That is, it is basically preferable that the image and the distance image capture the same range, but as long as the correspondence between positions in the image and the distance image is established, either image may be larger than the other. It is also preferable that they be captured simultaneously. Therefore, a recognition object on the distance image can be assigned onto the image, and the range on the image that is the same as the range of the object extracted on the distance image can be taken as the object on the image.
The partial image forming the range of the object on the image is then subjected to image recognition to identify the object. In this case, compared with performing image recognition on the entire image as in the past, image recognition is performed only on the extracted partial image, so the amount of computation can be greatly reduced. Moreover, since the object has already been separated from the background on the image, many of the computations for separating the object from the image are unnecessary, which further reduces the amount of computation. In addition, since this separation of the object is performed based on distance, the separation accuracy can be high.
In image recognition, a person, an automobile, or the like can be identified as an attribute of the object based on the shape, color, luminance, and so on of the extracted object. Furthermore, by image recognition using deep machine learning, it is possible not only to identify a person as such, but also to distinguish man, woman, adult, child, and so on, and to discriminate facial features such as the height of the nose and the size and color of the eyes and mouth. It is also possible to identify the model and model year of an automobile. If feature point data of a specific person is stored in advance, it can be determined whether a recognition object recognized as a person is that specific person. Moreover, when the object is separated from the distance image, its three-dimensional shape and size can be recognized from the distance relationships, and image recognition can be performed taking the three-dimensional shape and size of the object into account. Separating the object by distance from the distance image, and performing image recognition using three-dimensional shape and size data in addition to the image, can improve the object identification capability and identification accuracy, including object detection and identification.
In the above configuration of the present invention, it is preferable that the distance image acquisition means has distance measurement means for measuring the distance of each pixel of the distance image.
According to such a configuration, the above-described image and distance image can be obtained from imaging means that is a monocular camera capturing the image, and distance measurement means, such as a depth sensor or 3D sensor, generating the distance image. As the depth sensor or 3D sensor, a TOF (Time Of Flight) type sensor, for example, can be used.
In the above configuration of the present invention, it is also preferable that the distance image acquisition means obtains the distance image based on the parallax between two of the imaging means.
According to such a configuration, the distance image can be obtained from the parallax of a so-called stereo camera.
In the above configuration of the present invention, it is also preferable that the imaging means, the distance image acquisition means, the distance image recognition object extraction means, the recognition object image extraction means, and the image recognition means are provided in a single housing.
According to such a configuration, since the processing required for image recognition is simplified, a large arithmetic processing unit with a high processing speed is unnecessary, and the image recognition imaging apparatus can be housed in a relatively small casing such as that of a surveillance camera. The image recognition imaging apparatus may also be connected to an external server for data communication, so that the server can store data and perform more advanced image recognition processing.
Here, surveillance cameras and the like are often used for a long period after installation, but since technology for detecting/recognizing objects from images advances day by day, the detection/recognition technology used in such a camera may become obsolete over long-term use.
Moreover, because the optimum detection/recognition algorithm varies with the environment of the installation site, the photographing target, and so on, the detection/recognition algorithm in the firmware originally provided in the camera before installation may not be able to perform sufficient detection/recognition.
The present invention has been made in view of the above circumstances, and an object thereof is to provide a detection recognition system in which the detection/recognition performance of detecting features contained in an image and recognizing a recognition target set from those features can be improved by updating the firmware for detection/recognition.
In order to achieve the above object, the detection recognition system of the present invention includes imaging means for capturing images, detection/recognition means, and a server. The detection/recognition means includes detection/recognition firmware, detects features contained in an image acquired by the imaging means under control of the detection/recognition firmware, recognizes a set recognition target, and can update the detection/recognition firmware to new detection/recognition firmware generated by detection/recognition firmware generation means. The server includes machine learning means for generating a detection/recognition algorithm by machine learning using images acquired by the imaging means as teacher data, and the detection/recognition firmware generation means for generating the new detection/recognition firmware of the detection/recognition means from the detection/recognition algorithm.
In the detection recognition system of the present invention, the imaging means captures images, and the detection/recognition means detects the features contained in an image obtained by the imaging means under control of the detection/recognition firmware and recognizes the set recognition target. The machine learning means of the server generates a detection/recognition algorithm by machine learning using images acquired by the imaging means as teacher data. The generated detection/recognition algorithm is converted by the detection/recognition firmware generation means of the server into firmware suited to the detection/recognition means (detection/recognition firmware), and the detection/recognition firmware of the detection/recognition means is updated to this new detection/recognition firmware.
Therefore, a detection/recognition algorithm capable of more accurate detection and recognition can be generated by machine learning from the images obtained by the imaging means, converted into firmware suited to the detection/recognition means, and used to update the detection/recognition firmware of the detection/recognition means, so that the detection/recognition performance can be improved.
Further, in the above configuration of the detection recognition system of the present invention, it is preferable that the machine learning means performs machine learning using, as teacher data, images captured when the detection/recognition means erroneously recognized the set recognition target.
According to such a configuration, the machine learning means learns from the images for which the detection/recognition means erred in recognizing the set recognition target, so that the recognition target is not misrecognized again, and can generate a new detection/recognition algorithm and new detection/recognition firmware; the detection/recognition performance can therefore be reliably improved.
Further, in the above configuration of the detection recognition system of the present invention, it is preferable that the system includes at least one camera, and that the camera includes the imaging means and the detection/recognition means.
According to such a configuration, there is no need to provide a terminal or the like having detection/recognition means separately from the camera, so the system as a whole can be made compact. In addition, the camera can update the detection/recognition firmware of its detection/recognition means to new detection/recognition firmware created as a result of machine learning on the server, so the detection/recognition performance of the camera can be improved, easily and even after the camera has been installed.
In the configuration of the detection and recognition system of the present invention, it is preferable to provide a plurality of the cameras whose imaging ranges and set recognition targets at least partially overlap.
According to such a configuration, a given range can be imaged by a plurality of cameras and subjected to detection/recognition. The same object or the same phenomenon can therefore be detected/recognized by a plurality of cameras, improving detection/recognition accuracy.
In the configuration of the detection and recognition system of the present invention, it is preferable that, when some of the plurality of cameras recognize an overlapping recognition target, the machine learning means performs machine learning using, as teacher data, the images in which other cameras of the plurality failed to recognize that overlapping recognition target.
According to such a configuration, when at least one camera recognizes an overlapping recognition target, the images in which other cameras failed to recognize that target can be used as teacher data for machine learning. Images that most likely should have yielded a recognition but did not can thus serve as teacher data, raising the efficiency of the machine learning.
In the configuration of the detection and recognition system of the present invention, it is preferable that at least one of the plurality of cameras differs from the others in its imaging means.
According to such a configuration, even when detection/recognition from an image captured by one type of imaging means is difficult and a camera with that imaging means cannot recognize the recognition target, a camera with a different imaging means may still recognize it. This makes it easy to determine that a recognition target was missed, and the image in which recognition failed can be used as teacher data for machine learning, so that the system learns to recognize targets even from images that are difficult to detect/recognize.
According to the detection and recognition system of the present invention, the detection/recognition performance, in which features contained in an image are detected and the recognition target set from those features is recognized, can be improved by updating the firmware used for detection/recognition.
Here, calculating a distance image from each frame of a moving image shot with a stereo camera using fisheye lenses requires an enormous amount of computation, as much as or more than with ordinary lenses, and the processing time per frame becomes long. Moreover, if the image formed by the fisheye lens unit on the flat image sensor surface is detected and output as-is, distortion is small in the central portion of the image and large at its periphery. When monitoring an indoor space with a surveillance camera, depending on the size of the space, it is efficient to install the stereo camera pointing vertically downward from the ceiling at the center of the space, so that the floor directly below is at the center of the image.
When shooting ahead with an in-vehicle camera, or shooting with a surveillance camera from a corner of the monitored space, it is efficient to install the stereo camera so that, for example, the horizontal direction or a direction obliquely below it is at the center. In these cases as well, the periphery of the image taken with the fisheye lens is far more distorted than its center.
With a surveillance camera pointing vertically down from the ceiling, even if a person is directly below, the face is not captured, only the head, making it difficult to identify the person by face recognition or the like. However, for a person to come directly under the surveillance camera, he or she must move from the periphery of the shooting range toward its center, and the face of the moving person is captured during this movement, so face recognition becomes possible.
With a horizontal outdoor surveillance camera or an in-vehicle camera, the sky occupies the upper part of the image, and there is often no need to identify subjects there by image recognition. When obtaining a distance image from the images of a stereo camera with such fisheye lenses, processing the entire image uniformly is inefficient, and some processing scheme appears necessary to shorten the processing time.
The present invention has been made in view of the above circumstances, and an object thereof is to provide an object distance detection device that, when obtaining a distance image from the images of a stereo camera having fisheye lenses, first lowers the resolution so that it differs depending on the position within the image, and then obtains the distance image.
In order to solve the above problems, an object distance detection device of the present invention comprises:
a stereo camera consisting of a pair of fisheye cameras each having a fisheye lens unit and an imaging sensor;
a distance image calculation unit that calculates a distance image from the images output by the pair of imaging sensors; and
a distance image recognition unit that performs image recognition, including identification of subjects, on the distance image,
wherein the distance image calculation unit comprises:
a partitioning unit that partitions each image captured by the pair of fisheye cameras into a plurality of preset sections;
a resolution conversion unit that converts each section into an image of the resolution set for that section;
a corresponding point search unit that, in each of two images captured substantially simultaneously by the pair of fisheye cameras, finds corresponding points that correspond to the same point on the photographed subject; and
a distance calculation unit that obtains the distance from the stereo camera to each corresponding point based on the difference in position between the two corresponding points, found by the corresponding point search unit, that correspond to the same point on the subject.
According to such a configuration, the image captured by each fisheye camera is divided into a plurality of sections and the resolution can be changed per section. Considering an image captured by a fisheye camera, distortion from the fisheye lens is small in the central portion, where, for example, face detection and person detection are possible as-is. At the periphery, however, distortion is large and the image is distorted and compressed, so image recognition is difficult unless the distortion is removed. If the resolution of the whole image were lowered uniformly at the outset to ease processing, accurate image recognition would become difficult in the heavily distorted periphery.
Therefore, by lowering the resolution in the central sections of the image to reduce the processing load, while keeping the peripheral sections at their original high resolution, the processing speed can be improved without sacrificing recognition accuracy. If the resolution is lowered over the large central area of the image, the overall processing load decreases and processing is accelerated, while the loss of recognition accuracy caused by the lowered resolution is suppressed. Resolution here means, for example, the number of pixels per unit area of the image; lowering the resolution means thinning out the image's pixels by a well-known method, and processing similar to well-known image reduction processing may be used.
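As one way to picture this per-section processing, the following Python/OpenCV sketch downsamples only a central region of a frame while leaving the periphery untouched; the region geometry and the halving factor are illustrative assumptions, not the partitioning the device actually uses.

```python
# Hypothetical per-section resolution reduction: the low-distortion centre
# of the fisheye frame is thinned out, the distorted periphery is kept.
import cv2

def downsample_center(img, center_frac=0.5, scale=0.5):
    h, w = img.shape[:2]
    ch, cw = int(h * center_frac), int(w * center_frac)
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    center = img[y0:y0 + ch, x0:x0 + cw]
    # Reduce the pixel count of the central section only; INTER_AREA acts
    # like the well-known image-reduction (pixel-thinning) process.
    small = cv2.resize(center, (int(cw * scale), int(ch * scale)),
                       interpolation=cv2.INTER_AREA)
    return small  # processed separately from the full-resolution periphery
```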
The sections of the images of the two fisheye cameras are basically partitioned over the same directional ranges. That is, when sections A, B, and C correspond between the two images, a corresponding point in section A of one image has its counterpart in section A of the other image, at least away from section boundaries. The search for corresponding points is therefore basically performed between the two corresponding sections of the two images. The search is performed, for example, by extracting feature points (singular points) for image recognition in each section of the two images and determining, for each feature point in a section of one image, the corresponding feature point in the corresponding section of the other image; corresponding points are thus determined by basic image recognition.
In the search for corresponding points, the epipolar geometry described in Patent Document 1 above may also be used.
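For instance, a corresponding-point search of this kind could be sketched with ORB feature extraction and brute-force descriptor matching in OpenCV, applied section by section; the patent only requires that points on the same subject be matched between the two images, not this particular method.

```python
# Hypothetical corresponding-point search between the matching sections of
# the two fisheye images, using feature points and descriptor matching.
import cv2

def find_corresponding_points(section_left, section_right):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(section_left, None)
    kp2, des2 = orb.detectAndCompute(section_right, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    # Each match pairs a feature point in one camera's section with the
    # point corresponding to the same subject point in the other camera's.
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in matches]
```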
In the configuration of the object distance detection device of the present invention, it is preferable that the distance image calculation unit outputs a distance image that consists of pixels arranged vertically and horizontally and is divided into pixel regions each consisting of one or more pixels, with each pixel region colored in a color that changes according to the distance from the stereo camera to the corresponding point,
and that in sections of the distance image with different resolutions, the number of pixels constituting each pixel region differs according to the resolution.
According to such a configuration, by increasing or decreasing the number of pixels constituting each pixel region according to the resolution of each section, the images of sections with different resolutions can be represented on a single distance image at substantially the same display magnification. Each pixel region is colored with a single color, and in the distance image the color of each pixel region changes according to distance. Just as a thermograph is an image representing temperature values, the distance image is an image representing distance values; the color variation of this thermograph-like image may be, for example, a variation in brightness (luminance), in hue, or in a combination of both.
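The following is a minimal sketch of such distance-to-color mapping, assuming OpenCV's built-in false-color maps are used; the 10 m clipping range is an illustrative assumption.

```python
# Hypothetical colorization of a distance image, analogous to the way a
# thermograph maps temperature values to colors.
import cv2
import numpy as np

def colorize_distance(distance_map, max_dist=10.0):
    # Normalize metres to 0-255, then map to a hue-varying false-color scale.
    norm = np.clip(distance_map / max_dist, 0.0, 1.0)
    gray = (norm * 255).astype(np.uint8)
    return cv2.applyColorMap(gray, cv2.COLORMAP_JET)
```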
In the configuration of the object distance detection device of the present invention, it is preferable that the distance image calculation unit includes a distortion removal unit that removes, for each section, the distortion caused by the fisheye lens.
According to such a configuration, each section can be handled as a two-dimensional image from which distortion has been removed, which makes the search for corresponding points easier. Methods for removing the distortion of images shot through fisheye lenses are well known; IP cores are commercially available that perform distortion removal once the parameters required for each fisheye lens are set, and distortion can be removed by using such a core.
In the configuration of the object distance detection device of the present invention, it is preferable that the pair of fisheye cameras is arranged facing a substantially vertical direction, and that the resolution conversion unit changes the resolution so that the sections at the periphery of the image have a higher resolution than the sections at the center of the image.
Considering the image distortion caused by the fisheye lens as described above, it is preferable to lower the resolution of the central portion of the image, where distortion is small, and not to lower the resolution of the periphery, where distortion is large. Furthermore, with a fisheye camera attached facing downward to a ceiling or some columnar structure, the point directly below is at the center of the image; there, subjects are photographed from directly above and faces are hard to capture, making image recognition difficult, whereas at positions slightly off from directly below, faces are captured and recognition becomes possible. Near the edge of the fisheye camera's roughly 180-degree angle of view, even the front of a face may be captured. Therefore, by lowering the resolution of the central portion and not lowering the resolution of the periphery, both a reduction in processing time and the maintenance of image recognition accuracy can be achieved. In this case, it is preferable that the area of a single low-resolution section at the center be larger than that of a single high-resolution section at the periphery.
In the configuration of the object distance detection device of the present invention, it is preferable that the pair of fisheye cameras is arranged facing a substantially horizontal direction, and that the resolution conversion unit changes the resolution so that the sections at the bottom of the image have a higher resolution than the sections at the top of the image.
According to such a configuration, when the fisheye camera shoots horizontally, the upper part of the image shows the sky outdoors or the ceiling indoors, which is of low importance for purposes such as collision avoidance and autonomous driving of automobiles or the identification of criminals and suspicious persons; lowering the resolution there relative to the lower part of the image shortens processing time while suppressing any loss of recognition accuracy. In particular, when the fisheye camera is mounted high, for example when a surveillance camera is above head height or an in-vehicle camera is mounted high on the vehicle, the lower part of the image becomes more important and its resolution should be maintained; in this case, the resolution of the central portion of the image is preferably lower than that of the lower part. Conversely, if the fisheye camera is installed lower than a person's height (face level), the resolution may be changed so that the sections at the bottom of the image have a lower resolution than the sections at the top, where faces are more likely to be captured.
According to the object distance detection device of the present invention, a distance image can be calculated from a stereo camera using fisheye lenses at high speed and with high accuracy, without placing a heavy load on the arithmetic processing unit.
According to the present invention, image recognition can be performed easily and with high accuracy.
The drawings for the first embodiment of the image recognition imaging apparatus of the present invention include: a block diagram of the image recognition imaging apparatus; a flowchart of the image recognition method performed by the apparatus; two diagrams for explaining that image recognition method; and a block diagram of an image recognition imaging apparatus according to a second embodiment of the present invention.
The drawings for the embodiment of the detection and recognition system of the present invention include: a block diagram of the detection and recognition system; a block diagram of a camera of the system; a block diagram of a server of the system; and a flowchart for explaining the method of updating detection/recognition firmware by the system.
The drawings for the embodiment of the object distance detection device of the present invention include: a block diagram of the object distance detection device; a block diagram of the image analysis unit of the device; a flowchart showing the processing of the image analysis unit; two diagrams showing the image sections used by the device; and diagrams (a) and (b) for explaining the difference in resolution between sections of the distance image.
Embodiments of the present invention will be described below.
The image recognition imaging apparatus of the present embodiment combines an image recognition device with a camera used mainly for monitoring, such as a surveillance camera or an in-vehicle camera, and identifies people, automobiles, and the like within the shooting range.
As shown in FIG. 1, the image recognition imaging apparatus of the present embodiment includes: an image sensor 1 as imaging means; a 3D sensor 2 as distance image acquisition means; object extraction means 3, serving as distance image recognition target extraction means, for extracting objects to be recognized (including people), i.e., recognition targets, from the distance image obtained by the 3D sensor 2; object image extraction means 4, serving as recognition target image extraction means, for extracting from the image of the image sensor 1 the partial image corresponding to the recognition target; image recognition means 5 for performing image recognition on the extracted partial image (object image); control means 6 for controlling these; and storage means 7 for storing data such as images, distance images, and recognition results.
The control means 6 is connected to an external server 9 (host PC) via a communication network 8 such as the Internet so that data communication is possible.
The image sensor 1 is a so-called imaging element and is used as a camera together with a lens that forms an image of the shooting target on the image sensor 1.
The 3D sensor 2 is of the TOF type described above: for example, it scans ultrashort pulses of an infrared laser across the shooting range and measures the time until the light reflected from an object returns; multiplying this time by the speed of light gives the distance for each pixel in the shooting range. The resolution of the image sensor 1 and that of the 3D sensor 2 may or may not match; it suffices that the position of each part of the shooting range of the image sensor 1 corresponds to that of the 3D sensor 2, i.e., that any position in the shooting range of the image sensor 1 can be located within the shooting range of the 3D sensor 2. Basically, the image sensor 1 and the 3D sensor 2 simultaneously capture mutually overlapping ranges as an image and a distance image.
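The TOF relation stated above reduces to simple arithmetic; the sketch below divides the round trip by two to obtain the one-way distance, a step the text leaves implicit.

```python
# Time-of-flight distance: half the round-trip time times the speed of light.
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(round_trip_seconds):
    # The pulse travels to the object and back, hence the factor of 1/2.
    return round_trip_seconds * SPEED_OF_LIGHT / 2.0

print(tof_distance(33.4e-9))  # a ~33 ns round trip is roughly 5 m
```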
The object extraction means 3 extracts objects from the distance image, which is the distance information acquired by the 3D sensor 2. A group of pixels whose distance values are close to one another and which form a roughly contiguous cluster (neighboring pixels) is judged to be an object to be extracted; a cluster of pixels with similar distance values is extracted as one object. Since objects can be extracted essentially from the pixel distances alone, objects can be extracted with high accuracy even from a distance image captured only once, such as one taken by an in-vehicle 3D sensor 2, rather than only from the images of a surveillance camera that constantly or repeatedly shoots the same range. Portions where the pixel distance is at least a predetermined value may be treated as background and excluded from extraction. For a fixed 3D sensor 2 such as a surveillance camera, or a 3D sensor 2 whose movement range (such as rotation) is fixed, the same range is captured constantly or repeatedly; the distance that does not change at each pixel over a certain period (or the longest distance, if it changes) may therefore be stored as that pixel's background distance, and a cluster of pixels whose distance has changed from the background distance may be recognized as an object. Alternatively, a cluster of pixels whose distance changes over time may be detected as an object. Separating background and objects from changes over time is also possible with two-dimensional images, but with a distance image, a portion whose distance has become shorter than the background can essentially always be identified as an object.
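A simplified sketch of this extraction, assuming SciPy is available: pixels nearer than a fixed background threshold are grouped into connected clusters. The threshold values are illustrative, and the within-cluster distance-similarity test described above is omitted for brevity.

```python
# Hypothetical object extraction from a distance image: foreground pixels
# (nearer than the background) are grouped into contiguous clusters.
import numpy as np
from scipy import ndimage

def extract_objects(distance_map, background_dist=8.0, min_pixels=50):
    foreground = distance_map < background_dist  # drop background pixels
    labels, n = ndimage.label(foreground)        # contiguous clusters
    return [np.argwhere(labels == i)             # pixel positions per object
            for i in range(1, n + 1)
            if (labels == i).sum() >= min_pixels]
```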
The object extraction means 3 determines the position of the object's range on the distance image from the 3D sensor 2.
The object image extraction means 4 converts the range of the object determined on the distance image into a range on the visible two-dimensional image from the image sensor 1 and extracts the partial image within that range. That is, the object range extracted on the distance image is mapped onto the image, and the partial image covering the object range is extracted from the image. For example, coordinate systems may be provided for the distance image and the image so that coordinates on the image can be converted into coordinates on the distance image, or the two coordinate systems may be made identical.
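For illustration, if the two sensors were registered by a simple scale factor between their coordinate systems, the mapping could look like the sketch below; a real device would derive this mapping from calibration of the specific sensors.

```python
# Hypothetical mapping of an object's bounding box from distance-image
# coordinates to image coordinates, followed by partial-image extraction.
def depth_bbox_to_image(bbox, depth_shape, image_shape):
    y0, x0, y1, x1 = bbox
    sy = image_shape[0] / depth_shape[0]
    sx = image_shape[1] / depth_shape[1]
    return (int(y0 * sy), int(x0 * sx), int(y1 * sy), int(x1 * sx))

def crop_partial_image(image, bbox_depth, depth_shape):
    y0, x0, y1, x1 = depth_bbox_to_image(bbox_depth, depth_shape, image.shape)
    return image[y0:y1, x0:x1]  # the partial image handed to recognition
```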
The image recognition means 5 performs image recognition on the partial image, i.e., the object on the image extracted as described above. Since the portion constituting the object has already been extracted from the distance image, the task is to recognize whether the extracted partial image is, for example, a person or a car. For this, stored feature points of people and of automobiles may be compared with feature points detected in the partial image to determine whether it is a person, an automobile, and so on. Based on an algorithm obtained by deep machine learning, a person may further be identified as a child or an adult, female or male. In practice, person detection, person tracking, and face recognition can be performed using, for example, OpenCV (Open Source Computer Vision Library), a library of functions related to image recognition, together with recognition by deep machine learning. Recent OpenCV releases include machine learning functionality, including a deep learning module, and can identify people, vehicles, and the like. In the present embodiment, the region of the image constituting the object has already been determined using the distance image, so there is no need to determine the region of the identified person or car; it is only necessary to identify whether the already extracted object region is a person, an automobile, or the like. No computation over the entire image to locate objects is required, so the amount of computation is small. The image recognition recognizes object attributes: the type of object, such as person or car; for a person, attributes such as adult, child, male, female, race, and facial features; and for a car, attributes such as model, year, and color. Furthermore, in the present embodiment, the three-dimensional shape and size of an object can be recognized from the object's data on the distance image at extraction time, and these are used when identifying the object's attributes, making it easy to distinguish, for example, adults from children and small cars from large cars.
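As one concrete possibility for the OpenCV-based recognition mentioned above, the library's built-in HOG person detector can classify an extracted partial image; deep-learning classifiers could equally be substituted, and this sketch is not the specific method the embodiment fixes.

```python
# Hypothetical person check on an extracted partial image using OpenCV's
# standard HOG + linear-SVM people detector.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def is_person(partial_image):
    rects, _ = hog.detectMultiScale(partial_image, winStride=(8, 8))
    return len(rects) > 0  # any detection window -> classify as a person
```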
The control means 6 controls shooting by the image sensor 1 and the 3D sensor 2, object extraction from the distance image by the object extraction means 3, extraction of the partial image from the image by the object image extraction means 4, and image recognition by the image recognition means 5. The control means 6 consists of an arithmetic processing unit, which may also function as the object extraction means 3, the object image extraction means 4, and the image recognition means 5. For example, the image recognition means 5 may be realized by executing, on the arithmetic processing unit, a machine learning model (image recognition algorithm) trained by deep machine learning.
The storage means 7 is a storage device such as a hard disk or flash memory and stores data such as images, distance images, and image recognition results.
Next, an image recognition method performed by the image recognition imaging apparatus will be described with reference to the flowchart of FIG. 2 and to FIGS. 3 and 4.
A distance image b is captured by the 3D sensor 2 (step S1). At the same time, a two-dimensional visible (color) image a is captured by the image sensor 1. For example, as shown in FIG. 3, the captured image a shows an adult man, a car, and a female child. In the distance image b shown in FIG. 4, the difference in distance at each pixel can be expressed as an image by assigning brightness or color to distance. In FIG. 4, pixels at or beyond a predetermined distance are shown in white, for example.
Next, objects are extracted from the distance image as clusters of pixels close in distance to one another (step S2). As described above, pixels at or beyond a predetermined distance may be treated as background, and each cluster of pixels with smaller, similar distance values may be extracted as an object. Even a contiguous cluster of pixels is handled as separate objects if it divides into pixels at clearly different distances. An object extracted on the distance image can be represented, as a cluster of pixels, by the position of each pixel within the object's range. Here, the distance image portions of the adult man, the car, and the female child are extracted.
Next, on the image a of FIG. 3 captured by the image sensor 1, the range at the same position as the object range extracted in the distance image b of FIG. 4 is extracted as a partial image constituting the object image (step S3). This puts the objects on the two-dimensional color image in an extracted state. Objects are not identified and delimited on the image itself; rather, they are separated by distance differences on the distance image, and the separated objects are simply mapped onto the image. Only the positions of the objects on the image are known; what they are has not yet been identified. Since objects are extracted solely by distance on the distance image and the corresponding image regions are extracted based on position, the amount of computation is far smaller than when objects must be identified in order to be extracted.
Next, image recognition is performed on the partial image extracted from the image as the object image (step S4). Since the object has already been extracted as described above, the image recognition does not need to process the entire image to extract objects. That is, there is no need to detect and extract objects from the image; image recognition is performed only on the already extracted partial images, reducing the amount of computation. Moreover, recognition can proceed on the premise that an extracted partial image is one object or several adjacent objects, so, for example, the outer edge of the extracted portion can be taken as the outer edge of the object, which further reduces the computation required for recognizing the partial image. People can be identified by well-known person detection, face recognition, and the like, and objects other than people, for example automobiles and bicycles, can be identified by registering the feature points of various objects as part of the algorithm. In addition, the three-dimensional shape and size of an object, which can be read from the distance image, can be used as object feature points, improving object recognition accuracy.
Data such as the object attributes resulting from image recognition are transmitted to the server 9 (step S5). Various data related to the image recognition, for example the distance image and image used for recognition and the extracted object ranges, are also sent to the server 9. More advanced image recognition may be performed on the server 9 side; in that case, the object extraction means 3, the object image extraction means 4, and the image recognition means 5 may reside on the server 9. Performing the image recognition processing on a server with high computing power enables more advanced processing. When the server 9 performs the image recognition processing, the system may connect a plurality of image sensors 1 and 3D sensors 2; the computation per image recognition imaging apparatus can then also be reduced at the server, so that, for example, a system with many image sensors 1 and 3D sensors 2 can be handled by a single high-performance server. Alternatively, normal image recognition may be performed on the image recognition imaging apparatus side, with the server performing recognition on past data when an incident occurs. In that case it is not necessary to store all images and distance images; for example, only the extracted object images (partial images) may be stored, reducing the storage capacity required of the server 9.
According to such an image recognition imaging apparatus, after objects are extracted from the distance image, object images are extracted based on the positions on the image corresponding to the positions of the extracted objects on the distance image, and image recognition is performed only on the extracted object images, reducing the amount of computation. Performing object extraction on the distance image also improves the accuracy of separating objects from the background; for example, object detection accuracy can be raised to around 99%. Since the distance to an object is known, its size can be calculated easily and accurately as an object attribute, and based on this size, other attributes, for example whether a person is an adult or a child, or the model of a car, become easy to determine.
Next, a second embodiment of the present invention will be described.
As shown in FIG. 5, in the image recognition imaging apparatus of this embodiment, the distance image acquisition means comprises two cameras (camera 1 and camera 2) 10 as imaging means, and distance image detection means 11 that calculates the distance for each pixel from the parallax between the left and right images captured by these cameras 10 and generates a distance image. The two cameras 10 and the distance image detection means 11 constitute a stereo camera 12 with a 3D sensor function, from which both the distance image generated by the distance image detection means 11 and the images captured by the cameras 10 can be obtained. The stereo camera 12 yields two images with parallax; either one or both may be used.
In the second embodiment, the configuration other than the distance image acquisition means and the imaging means is the same as that of the image recognition imaging apparatus of the first embodiment described above: it includes the object extraction means 3, object image extraction means 4, image recognition means 5, control means 6, and storage means 7, and is connected to the external server 9 via the communication network 8.
In the second embodiment as well, image recognition can be performed by the same method as in the first embodiment, except that the distance image is obtained by a well-known method from the parallax of the pair of images captured by the stereo camera 12, and the same operational effects as those of the image recognition imaging apparatus of the first embodiment can be achieved.
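The well-known parallax-to-distance method referred to here is, for a rectified stereo pair, Z = f x B / d, with focal length f in pixels, baseline B in metres, and disparity d in pixels; a minimal sketch follows, with the numbers in the usage line purely illustrative.

```python
# Disparity-to-distance for a rectified stereo pair: Z = f * B / d.
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    if disparity_px <= 0:
        return float("inf")  # no measurable parallax
    return focal_px * baseline_m / disparity_px

print(disparity_to_depth(14, 700, 0.1))  # f=700 px, B=0.1 m, d=14 px -> 5.0 m
```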
The distance image acquisition means is not limited to the TOF type 3D sensor or the stereo camera, but may be another type of 3D sensor as long as it can generate a distance image corresponding to the shooting range of the imaging means. Good.
The detection and recognition system of the present embodiment is used, for example, to issue a notification when a set recognition target is recognized in an image captured by a camera.
In the following, the term "image" basically includes both moving images and still images.
As shown in FIG. 6, the detection and recognition system 101 includes a plurality of cameras 102, a server 103, and a terminal 104, which are connected to one another by a wired or wireless network 105.
The detection and recognition system 101 can be used, for example, with the cameras 102 installed as surveillance cameras inside a building such as a convenience store or outdoors; when a subject photographed by a camera 102 is recognized as a suspicious person from its appearance or movement, a notification is sent to the terminal 104 at another location. When a suspicious person is recognized in this way, a terminal held by the administrator 106 of the detection and recognition system 101, a system management device, or the like may also be notified.
As shown in FIG. 7, the camera 102 includes imaging means 120, detection/recognition means 121, recording means 122, communication means 123, and control means 124.
The imaging means 120 has, for example, a lens and a solid-state imaging element, and acquires images by imaging. The detection/recognition means 121 includes an arithmetic processing unit and memory and performs image recognition. Specifically, under the control of the detection/recognition firmware stored in the memory of the detection/recognition means 121, it detects features contained in the image captured by the imaging means 120 and recognizes the recognition target set from those features. In the following, "detection/recognition" basically refers to this process of detecting features in the image captured by the imaging means 120 and recognizing the set recognition target from those features.
The recording means 122 records reference images and other information for detection/recognition by the detection/recognition means 121, as well as images and other information (for example, audio) at the time of an abnormality (for example, when the detection/recognition means 121 recognizes the set recognition target). The communication means 123 communicates with the server 103 via the network 105, transmitting images and other information at the time of an abnormality to the server 103 and receiving commands and detection/recognition firmware from the server 103. The communication means 123 also connects via the network 105 to the terminal 104 and to the terminal held by the administrator 106, and transmits an alarm signal or the like to these terminals and to the server 103 when an abnormality occurs. The terminal 104 and the administrator's terminal sound an alarm upon receiving this alarm signal, or upon receiving a command to sound an alarm from the server 103 that received the signal.
The control means 124 includes an arithmetic processing unit and memory and controls the imaging means 120, detection/recognition means 121, recording means 122, and communication means 123. The control means 124 may share the arithmetic processing unit or memory with the detection/recognition means 121.
The camera 102 need not include all of the imaging means 120, detection/recognition means 121, recording means 122, communication means 123, and control means 124. For example, the detection and recognition system 101 may include, outside the camera 102, a terminal connected to the camera 102 by wire or wirelessly that can control the camera 102 and display images captured by it; the imaging means 120 may be placed in the camera 102 while the detection/recognition means 121, recording means 122, communication means 123, and control means 124 are provided in that terminal, which then performs detection/recognition on the images captured by the imaging means 120 of the camera 102.
The camera 102 has, for example, a configuration similar to that of a general surveillance camera; for example, the imaging means 120 captures the imaging range corresponding to the set angle of view, according to the orientation of the camera 102. The plurality of cameras 102 in the detection and recognition system 101 may be of the same type or of different types, and their imaging ranges may overlap or be entirely different.
In the present embodiment, four cameras 102 of different types are used: two stereo cameras 102a, one infrared camera 102b, and one monocular camera 102c; the imaging ranges of the four cameras 102 are assumed to overlap one another.
Using a stereo camera 102a as the camera 102 makes it possible to calculate distance, size, 3D structure, and the like from parallax, which reduces the performance required of the arithmetic processing unit used for detection/recognition; detection/recognition can thus be performed easily even if the camera does not have a high-performance arithmetic processing unit.
Using an infrared camera (near-infrared or far-infrared camera) 102b as the camera 102 makes it possible to capture near-infrared or far-infrared images and to detect/recognize things invisible to the human eye. Detection/recognition in dark environments, such as at night, also becomes easier.
The types of camera 102 are not limited to these. For example, a distance image sensor may be used as the camera 102; a TOF (Time Of Flight) sensor, for example, can serve as the distance image sensor. TOF measures distance from the time taken for a projected laser pulse to travel to the target and back.
In other words, the camera 102 may be one whose imaging means 120 captures a single two-dimensional image and performs detection/recognition on it; one whose imaging means 120 captures two images, calculates distance, size, 3D structure, and the like from their parallax, and performs detection/recognition; one whose imaging means 120 captures a 3D distance image with a TOF sensor or the like and performs detection/recognition on that image; or one whose imaging means 120 captures near-infrared or far-infrared images and performs detection/recognition on them. A single camera 102 may also include a plurality of such imaging means 120: for example, one camera 102 may have the imaging functions of both a stereo camera and an infrared camera and perform detection/recognition on the images obtained by both.
The detection/recognition means 121 recognizes the set recognition target, which may be a concrete object (including people and things other than people) or an abstract phenomenon. That is, the recognition target may be an object such as a person (a robber, thief, or arsonist) or a thing (such as a handgun), or a phenomenon such as a crime or a fire.
For example, with "robber" set as the recognition target, when the imaging means 120 of a camera 102 installed in a convenience store captures an image of a person holding a kitchen knife or handgun, the detection/recognition means 121 may detect from the image the person holding the knife or handgun, or detect the person's movements, and recognize that person as a robber. As another example, with "fire" set as the recognition target, it may detect from an image obtained by an infrared camera that the temperature at a certain location is abnormally high and recognize that a fire has broken out. If the infrared camera uses far-infrared light, temperature can be detected, and a handgun, knife, or other weapon concealed in a clothing pocket may be detected by image recognition from the temperature difference between the weapon and body heat. However, since the detection/recognition firmware of the detection/recognition means 121 is generated by machine learning in the machine learning means 130 described later, the detection/recognition means 121 does not necessarily recognize targets in a way that is easy for humans to understand.
In short, the detection/recognition means 121, under the control of the detection/recognition firmware, detects features contained in the image captured by the imaging means 120 and recognizes the recognition target set from those features.
The detection/recognition means 121 may also use audio, not only images, for detection/recognition. For example, if the camera 102 includes audio input means such as a microphone, performing detection/recognition using the audio acquired by this means can improve accuracy. Audio may likewise be used in detection/recognition by the server-side detection/recognition means 132 described later.
The detection/recognition firmware of the detection/recognition means 121 is updated with new detection/recognition firmware generated by the machine learning means 130 and the detection/recognition firmware generation means 131 described later. The initial detection/recognition firmware provided to the detection/recognition means 121 before any update may be generated by the machine learning means 130 and the detection/recognition firmware generation means 131, or may be generated by another device capable of machine learning and incorporated into the detection/recognition means 121. Detection/recognition firmware generated by a method other than machine learning may also be initially provided to the detection/recognition means 121.
The setting of the target to be recognized by the detection/recognition means 121 is assumed to be included in the detection/recognition firmware. For example, when the detection/recognition firmware is generated by the machine learning means 130 and the detection/recognition firmware generation means 131, and the target to be recognized is to be a robber in a convenience store, the machine learning means 130 is given, as teacher data, a plurality of images showing robbers committing robbery in convenience stores together with the information that these images show robbers (that is, the images are tagged as showing a robber). Through machine learning, the system learns which parts of the given images (teacher data) to attend to in order to recognize a robber. As a result of the machine learning, a detection/recognition algorithm with a high probability of recognizing a robber from an image is generated. This detection/recognition algorithm is then converted by the detection/recognition firmware generation means 131 to produce detection/recognition firmware. In other words, the detection/recognition firmware (detection/recognition algorithm) obtained by this learning knows where to look in an image to recognize whether the image contains a robber, so it can be said that a robber is set as its recognition target. Note that tagging the images is not strictly necessary for this machine learning. For example, if only images showing robbers are given as teacher data, it is possible to generate an algorithm that recognizes robbers by generating an algorithm that recognizes images whose features are close to those of the teacher images, even without the information that they depict robbers.
Note that the number of recognition targets set in the detection/recognition firmware (targets the detection/recognition firmware recognizes) is not limited to one; a plurality of targets may be set.
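As a concrete illustration of assembling the tagged and untagged teacher data described above, the following is a minimal sketch in Python. The directory layout, field names, and helper functions are illustrative assumptions, not part of the patent disclosure.

```python
# A minimal sketch of assembling teacher data: tagged images grouped in
# per-tag directories, or an untagged (single-class) image collection.
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

@dataclass
class TeacherSample:
    image_path: Path        # image captured by a camera 102
    tag: Optional[str]      # e.g. "robber"; None for untagged data

def load_tagged_data(root: Path) -> list:
    """Collect tagged images laid out as <root>/<tag>/<image>.jpg."""
    samples = []
    for tag_dir in root.iterdir():
        if tag_dir.is_dir():
            for img in tag_dir.glob("*.jpg"):
                samples.append(TeacherSample(img, tag_dir.name))
    return samples

def load_untagged_data(root: Path) -> list:
    """Every image is implicitly a positive example, as in the
    untagged learning mentioned in the text."""
    return [TeacherSample(p, None) for p in root.glob("*.jpg")]
```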
As described above, the detection/recognition firmware recognizes a specific target, and when the detection/recognition means 121 recognizes that specific target through the detection/recognition firmware, it outputs a signal or the like indicating that recognition has occurred (for example, an alarm signal). This recognition signal is sent via the communication means 123 to the server 103, the terminal 104, the terminal held by the administrator 106, and so on, and these terminals are notified that the set target has been recognized. Alternatively, the recognition signal may be sent only to the server 103; after the server 103 comprehensively judges the information from each camera 102, alarm information, such as an e-mail stating that the recognition target has been recognized or a command to sound an alarm, may be sent from the server 103 to the terminal 104 and the like.
The four cameras 102 have imaging ranges that overlap one another, and for the overlapping portion of the recognition targets set in their detection/recognition firmware, the four cameras 102 can perform recognition simultaneously. That is, if, for example, a robber is set as the shared recognition target, the four cameras can simultaneously recognize a specific robber committing a specific robbery.
As shown in FIG. 8, the server 103 includes machine learning means 130, detection/recognition firmware generation means 131, server-side detection/recognition means 132, server-side recording means 133, server-side communication means 134, and server-side control means 135. The machine learning means 130, the detection/recognition firmware generation means 131, the server-side detection/recognition means 132, and the server-side control means 135 each have an arithmetic processing unit and a memory; they may each have their own arithmetic processing unit or memory, or they may share an arithmetic processing unit or memory.
The machine learning means 130 performs machine learning such as deep learning to generate a detection/recognition algorithm. Here, the detection/recognition algorithm is an algorithm for recognizing the set recognition target from an image captured by the imaging means 120 of a camera 102.
The detection/recognition firmware generation means 131 converts the detection/recognition algorithm generated by the machine learning means 130 into firmware executable by each camera 102, thereby generating the detection/recognition firmware. The cameras 102 differ in the resolution of the images obtainable by the imaging means 120, the performance of the arithmetic processing unit of the detection/recognition means 121, the presence or absence of a GPU (Graphics Processing Unit) for the detection/recognition means 121, the presence or absence of sound input means such as a microphone, the type of camera (stereo camera, TOF sensor, etc.), and so on, so the firmware executable by each camera 102 also differs. By having the detection/recognition firmware generation means 131 convert the machine-learned detection/recognition algorithm into firmware executable by each camera 102, a new detection/recognition program can be installed on each camera 102.
The server-side detection/recognition means 132 performs detection/recognition by comprehensively judging the situation from the images and information of the cameras 102. For example, while the detection/recognition means 121 of each camera 102 performs detection/recognition using the images acquired by that camera's own imaging means 120, the server-side detection/recognition means 132 performs detection/recognition using images acquired by a plurality of cameras 102. When processing is too heavy to perform on each camera 102, the server-side detection/recognition means 132 may also take over part of that processing. The detection/recognition firmware of the server-side detection/recognition means 132 is held in the memory of the server-side detection/recognition means 132, and it too can be updated with detection/recognition firmware generated by the machine learning means 130 and the detection/recognition firmware generation means 131.
From the recognition results of the detection/recognition means 121 of the four cameras 102 (cameras 102a, 102b, 102c), the server-side detection/recognition means 132 may also judge whether each camera's recognition of the target is correct, or judge the probability that each camera's recognition is correct. Based on this judgment, alarm information or the like may be sent to the terminal 104 and so on. For example, when all four cameras report that they have recognized the set target (for example, a robbery), the server-side detection/recognition means 132 may judge the recognition to be correct and command the terminal 104 or the like to sound an alarm. The content of the alarm information may also be varied according to the number of cameras that recognized the set target. For example, if all four cameras recognized the set target, the recognition is judged correct and the server-side detection/recognition means 132 commands the terminal 104 to sound a loud alarm, whereas if three or fewer cameras recognized it, the recognition is judged to be only possibly correct and the server-side detection/recognition means 132 commands the terminal 104 to sound a quiet alarm.
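The alarm decision just described reduces to counting how many cameras reported the set target. The following is a minimal sketch of that logic; the function name, camera ids, and thresholds are illustrative assumptions.

```python
# Decide the alarm level from per-camera recognition reports.
def decide_alarm(reports: dict, total_cameras: int = 4) -> str:
    """reports maps camera id -> whether it recognized the set target."""
    hits = sum(reports.values())
    if hits == total_cameras:
        return "loud_alarm"    # all cameras agree: recognition judged correct
    if hits > 0:
        return "quiet_alarm"   # partial agreement: possibly correct
    return "no_alarm"

# Example: three of four cameras recognized the target.
print(decide_alarm({"102a": True, "102b": True, "102c": True, "102d": False}))
# -> "quiet_alarm"
```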
The server-side detection/recognition means 132 also judges misrecognition (recognition failure) by a camera 102 from the recognition results of the detection/recognition means 121 of the plurality of cameras 102. For example, if three of the four cameras 102 report through their detection/recognition means 121 that they have recognized the set target but one camera 102 does not, it may judge that this one camera 102 has misrecognized (failed to recognize). Conversely, if three of the four cameras 102 report no recognition of the set target but one camera 102 does report recognition, it may judge that this one camera 102 has misrecognized.
The result of detection/recognition by the server-side detection/recognition means 132 may also be compared with the detection/recognition result of each camera 102 to judge misrecognition (recognition failure) by each camera 102.
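The misrecognition judgment above is essentially a majority vote among the cameras. The following is a minimal sketch; the names are illustrative assumptions, not taken from the patent.

```python
# Flag cameras whose result disagrees with the majority of cameras.
def find_outliers(reports: dict) -> list:
    """Return camera ids whose result disagrees with the majority."""
    hits = sum(reports.values())
    majority = hits * 2 > len(reports)   # True if most cameras recognized
    return [cam for cam, hit in reports.items() if hit != majority]

# Example: camera 102c failed to recognize what the other three saw.
print(find_outliers({"102a": True, "102b": True, "102c": False, "102d": True}))
# -> ["102c"]
```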
The server-side recording means 133 records the teacher data and the like for the machine learning performed by the machine learning means 130. The server-side communication means 134 communicates with each camera 102 via the network 105: it receives images and other information from each camera 102, transmits commands and detection/recognition firmware to each camera 102, and transmits alarm information to the terminal 104 and the administrator 106 in abnormal situations (when the set target has been recognized).
Next, a method of updating the detection/recognition firmware of such a detection recognition system 101 will be described with reference to the flowchart of FIG. 9.
A camera 102 acquires an image with the imaging means 120 and performs recognition (detection/recognition) of the set recognition target under the control of the detection/recognition firmware of the detection/recognition means 121. When recognition fails, the image at the time of the failed recognition is transmitted to the server 103 (step S11). Sound data and the like at the time of the failed recognition may be transmitted together with the image.
Whether recognition failed is judged, as described above, by the server-side detection/recognition means 132 from the recognition results of the plurality of cameras 102. For example, in a system in which a camera 102 that recognizes the set recognition target (for example, a robber) notifies the server 103 of the recognition, if cameras 102a and 102b notified the server 103 of recognition but camera 102c did not, the server-side detection/recognition means 132 judges from these notification results that camera 102c failed to recognize. In this case, the control means of the server 103 commands camera 102c to transmit to the server 103, as images at the time of failed recognition, the images that camera 102c acquired at the same time as, or around (for example, several seconds to several minutes before and after), the time at which cameras 102a and 102b acquired the images in which they recognized the target. On receiving this command, camera 102c transmits the images at the time of the failed recognition to the server 103.
Whether recognition failed may also be judged by a person. For example, the detection recognition system 101 may include a terminal having display means for showing images captured by the cameras 102 and input means such as a pointing device or keyboard; when a camera 102 fails to recognize a robber, a person checks the images captured by the camera 102 on the terminal's display means, selects with the terminal's input means the image in which the robber should have been recognized, and that image is transmitted to the server 103 as the image at the time of the failed recognition.
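The server's request to camera 102c amounts to selecting frames within a time window around the moment the other cameras recognized the target. A minimal sketch of that selection follows; the data structure and the window width are illustrative assumptions.

```python
# Select the frames a camera should send as failed-recognition images.
from datetime import datetime, timedelta

def frames_around(frames: dict,
                  t_recognized: datetime,
                  window: timedelta = timedelta(seconds=30)) -> list:
    """frames maps capture time -> encoded frame; return those captured
    within +/- window of the time the other cameras recognized the target."""
    return [img for t, img in sorted(frames.items())
            if abs(t - t_recognized) <= window]
```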
The server-side control means 135 records the failed-recognition images sent from the camera 102 in the server-side recording means 133 as teacher data (training data). Along with recording the failed-recognition images, the recognition result that the detection/recognition means 121 should have produced (for example, that a robber should have been recognized in the image) is also recorded in the server-side recording means 133 as teacher data.
The desired recognition result recorded as this teacher data may be produced by the server 103 or sent from the camera 102. For example, when the server-side detection/recognition means 132 judges from the recognition results of the plurality of cameras 102 whether recognition failed, the server-side detection/recognition means 132 may create the presumably correct recognition result (the result the detection/recognition means 121 should have produced) as teacher data and record this teacher data in the server-side recording means 133. Alternatively, when a person checks the images captured by a camera 102 and judges whether recognition failed, the person may, when selecting on the terminal described above the image in which the robber should have been recognized, also input with the terminal's input means the fact that a robber should have been recognized (that the image shows a robber); this is transmitted to the server 103 together with the failed-recognition image, and the server-side control means 135 may record the transmitted data in the server-side recording means 133 as teacher data.
The machine learning means 130 reads the teacher data recorded in the server-side recording means 133 (step S12). The machine learning means 130 then extracts feature points by convolution operations from the failed-recognition images included in the read teacher data (step S13). The machine learning means 130 performs machine learning from the extracted feature points and the recognition result that the detection/recognition means 121 should have produced (step S14). As a result of the machine learning, a detection/recognition algorithm, a neural network that performs the detection/recognition processing, is generated (step S15).
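The following is a minimal sketch, assuming PyTorch, of steps S12 to S15: a small convolutional network trained on failed-recognition images paired with the desired results. The architecture, data shapes, and hyperparameters are illustrative assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class DetectRecognizeNet(nn.Module):
    def __init__(self, num_targets: int = 2):
        super().__init__()
        self.features = nn.Sequential(        # step S13: convolutional feature extraction
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, num_targets)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def train(model, loader, epochs: int = 10):
    opt = torch.optim.Adam(model.parameters())
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):                    # step S14: learn from teacher data
        for images, desired_labels in loader:  # failed images + desired results
            opt.zero_grad()
            loss_fn(model(images), desired_labels).backward()
            opt.step()
    return model    # step S15: the new detection/recognition algorithm
```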
The machine learning by the machine learning means 130 is performed so that the detection/recognition algorithm (detection/recognition firmware) is optimized for each camera 102. The cameras 102 may differ in type, and even cameras with identical characteristics may differ in installation location and operating environment, so the optimal algorithm can differ because of these differences. Based on the original detection/recognition algorithm and the teacher data, the machine learning means 130 generates a new detection/recognition algorithm capable of producing, from the failed-recognition images in the teacher data, the recognition result in the teacher data that the detection/recognition means 121 should have produced. The original detection/recognition algorithm used for the machine learning may be kept in the server-side recording means 133, or the detection/recognition firmware may be transmitted from the camera 102 and converted back into a detection/recognition algorithm for use. In other words, the machine learning means 130 generates a new detection/recognition algorithm from the teacher data and the detection/recognition algorithm used in the detection/recognition firmware of the camera 102 whose detection/recognition failed.
The detection/recognition firmware generation means 131 converts the detection/recognition algorithm generated by the machine learning means 130 into detection/recognition firmware, the detection/recognition software for each camera (step S16). That is, the detection/recognition algorithm is converted by the detection/recognition firmware generation means 131 into software in a format executable by each camera.
The server-side communication means 134 transmits the detection/recognition firmware generated by the detection/recognition firmware generation means 131 to the camera 102 (step S17). When the camera 102 receives the detection/recognition firmware, the control means 124 of the camera 102 updates the firmware of the detection/recognition means 121 to the new detection/recognition firmware.
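As a rough illustration of the per-camera conversion in step S16, the sketch below packages the same trained algorithm differently depending on a camera's capabilities. The capability fields, the quantization choice, and the packaging format are illustrative assumptions; real firmware generation would be toolchain-specific.

```python
import json
from dataclasses import dataclass

@dataclass
class CameraProfile:
    camera_id: str
    has_gpu: bool
    input_resolution: tuple   # (width, height) expected by the camera

def export_firmware(weights: list, profile: CameraProfile) -> bytes:
    """Convert the trained algorithm into a camera-specific package."""
    # Cameras without a GPU get coarsely quantized weights to suit a
    # weaker arithmetic processing unit; GPU cameras keep full precision.
    payload = weights if profile.has_gpu else [round(w, 2) for w in weights]
    header = {"camera": profile.camera_id,
              "resolution": list(profile.input_resolution),
              "precision": "float" if profile.has_gpu else "quantized"}
    return (json.dumps(header) + "\n" + json.dumps(payload)).encode()
```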
According to the detection recognition system of the present embodiment, the detection/recognition firmware of the detection/recognition means 121 of a camera 102 can be updated to the new detection/recognition firmware generated by the machine learning means 130 and the detection/recognition firmware generation means 131 of the server 103.
Because the machine learning by the machine learning means 130 uses as teacher data the images for which the detection/recognition means 121 of the camera 102 failed to recognize the set recognition target, machine learning with this teacher data improves the detection/recognition algorithm so that recognition of the set target is no longer missed for those images. The detection/recognition performance of the camera 102 can therefore be improved.
Furthermore, since the machine learning is performed on the server 103 and the camera 102 need only execute the detection/recognition firmware generated by the server 103, the detection/recognition firmware can be updated to enable highly accurate detection/recognition even if the computing power of the camera 102 is not particularly high. Moreover, rather than becoming relatively lower-performing than other cameras as the years pass, a camera can gradually improve in performance with use. It also becomes possible to keep improving the performance of a camera 102 so that its detection/recognition suits the environment in which it is used.
Also, because the machine learns on its own, it becomes possible to recognize the set recognition target even in cases a person could not notice. For example, by giving as training data not images of a robber actually committing robbery but images of the robber from before the robbery actually took place, it becomes possible to generate a detection/recognition algorithm that, rather than recognizing a robber during an actual robbery, finds the characteristics of people highly likely to commit robbery in the future from, for example, the behavior of people loitering in or around a convenience store, and recognizes such a person as a robber (a person highly likely to become one). Note that since the machine learning means 130 determines which features are actually attended to for recognition, the algorithm will not necessarily recognize likely robbers from their behavior.
Also, according to the detection recognition system of the present embodiment, since the imaging ranges of the four cameras 102 overlap one another, the overlapping portion of the recognition targets set in the detection/recognition firmware can be recognized by the four cameras 102 simultaneously. Therefore, even if some of the four cameras 102 cannot recognize the target, the other cameras can, which raises the likelihood that detection/recognition succeeds and improves the detection/recognition accuracy of the system as a whole.
The four cameras 102 also include cameras with different types of imaging means 120: the stereo camera 102a, the infrared camera 102b, and the monocular camera 102c. Therefore, even when detection/recognition is difficult for the stereo camera 102a, for example, the infrared camera 102b may still manage it, improving the detection/recognition accuracy of the system as a whole compared with using cameras 102 all of the same type.
Note that the plurality of cameras 102 may be installed in places with entirely different imaging ranges, or may recognize entirely different recognition targets.
Also, from the recognition results of the detection/recognition means 121 of the four cameras 102, the server-side detection/recognition means 132 can judge whether each camera's recognition of the target is correct, judge the probability that each camera's recognition is correct, and judge misrecognition (recognition failure) by a camera 102. It is therefore possible, for example, to have the terminal 104 or the like emit an alarm sound only when the server-side detection/recognition means 132 judges, from the detection/recognition results of the individual cameras 102, that the recognition is correct.
Misrecognition by a camera 102 can also be judged automatically, and machine learning can be run automatically to improve the detection/recognition ability of the camera 102 that misrecognized. Since the images misrecognized at that time can be used as the teacher data for this machine learning, the system can learn not to fail again on those very images. Misrecognition can thus be judged automatically, and the accuracy of detection/recognition can be raised as the cameras 102 are used.
The timing of the machine learning may be adjusted as needed. For example, teacher data may be accumulated in the recording means 122 or the server-side recording means 133, and machine learning may be performed when a certain amount has accumulated or when a certain period has elapsed.
The machine learning may also be performed using images other than those captured by the imaging means 120. When the number or quality of teacher data is insufficient with only the images captured by the imaging means 120, the effect of the machine learning can be improved by giving the machine learning means 130 other images.
The recognition targets recognized by the cameras 102 are not limited to those described above; any target that can be detected/recognized from the images captured by the imaging means 120 will do.
Next, an embodiment of the object distance detection device of the present invention will be described.
The object distance detection device of the present embodiment uses a fisheye-lens stereo camera as a camera mainly for monitoring, such as a surveillance camera or an in-vehicle camera. It is not intended to output stereoscopic video; rather, it generates a distance image in which each pixel represents the distance from the camera to the subject, and enables monitoring tasks such as detecting suspicious persons to be performed automatically by image recognition on the distance image.
As shown in FIG. 10, the object distance detection device includes: a pair of fisheye cameras 211, each having a fisheye lens unit 221, a color filter 222, an imaging sensor 223, and the like; a pair of image input units 212 that receive the image signals from the imaging sensors 223 of the pair of fisheye cameras 211; a pair of image signal correction processing units 213 that perform image signal correction processing to remove the fisheye-lens distortion of the input images; a correction parameter calculation unit 215 that derives from the image signals the correction parameters used in the image signal correction processing; an image analysis unit (distance image calculation unit) 214 that obtains a distance image from the corrected image signals from which the fisheye-lens distortion has been removed; and a suspicious person detection unit 216, serving as a distance image recognition unit, that performs image recognition on the distance image generated by the image analysis unit 214 and automatically carries out monitoring tasks such as detecting suspicious persons.
The pair of fisheye cameras 211 constitutes a stereo camera, and each outputs as an image signal the image formed by the fisheye lens unit 221 on the imaging sensor 223 through the color filter 222. The images are output as video.
The fisheye lens unit 221 is a fisheye lens in that it adopts a projection method other than the central projection method; in the present embodiment it is a lens unit adopting the equidistant projection method. The projection method of the fisheye lens unit 221 is not limited to equidistant projection, and any projection method other than central projection may be adopted; for example, a lens unit of any projection method other than the central projection method may be used as the fisheye lens. In the present embodiment the angle of view of the fisheye lens unit 221 is 180 degrees, but it may be, for example, from about 160 degrees to about 200 degrees.
The pair of fisheye cameras 211 are, for example, arranged side by side with the optical axes of their fisheye lens units 221 parallel, each with an angle of view of 180 degrees, so that each captures the other fisheye lens unit 221 in its image. This makes it possible to use epipolar geometry for the corresponding-point search described later.
The output from the imaging sensor 223 of each fisheye camera 211 is input through the image input unit 212 of the object distance detection device, and the image signal correction processing unit 213 performs color synchronization (demosaicing) processing, white balance processing, gamma processing, color matrix processing, luminance matrix processing, color difference/luminance processing, and the like. In the present embodiment a color image is not necessarily output to a monitor or the like, so, for example, the color filter 222 may be omitted and color-related processing need not be performed. As described later, feature points are extracted from the two images by image recognition and corresponding points are searched for based on the feature points; so if, for image recognition, a color image gives a higher recognition rate than a luminance (grayscale) image, a color image may be generated from the image signal as described above. The parameters based on the image signal required by the image signal correction processing unit 213 are calculated from the image signal by the correction parameter calculation unit 215.
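As a small illustration of two of the correction steps named above, the following is a minimal sketch, assuming NumPy, of gray-world white balance and gamma processing; the gain scheme and gamma value are illustrative assumptions.

```python
import numpy as np

def white_balance(img: np.ndarray) -> np.ndarray:
    """Gray-world white balance on an RGB image with values in [0, 1]."""
    gains = img.mean() / img.reshape(-1, 3).mean(axis=0)  # per-channel gains
    return np.clip(img * gains, 0.0, 1.0)

def gamma_correct(img: np.ndarray, gamma: float = 2.2) -> np.ndarray:
    """Encode linear sensor values for display-referred processing."""
    return np.power(img, 1.0 / gamma)
```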
As shown in FIG. 11, the image analysis unit 214 receives the images corrected as described above, and an image conversion unit 231 serving as a distortion removal unit performs, as needed, image conversion that removes the fisheye-lens distortion; for example, converting an equidistant projection image into a central projection image. Distortion removal can be performed by well-known methods; for example, a known integrated circuit for distortion-removal image conversion is used. Next, an image selection unit 232 serving as a partitioning unit and resolution conversion unit performs partitioning, dividing the distortion-removed converted image into set partitions.
This partitioning is not limited to partitioning by straight lines and may be partitioning by curves. For example, when an equidistant projection image is used without image conversion, partitioning by curves is preferable.
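The equidistant-to-central conversion mentioned above can be done by inverse mapping: an equidistant fisheye places a ray at radius r = f*theta, while central (perspective) projection places it at r = f*tan(theta). The following is a minimal sketch, assuming NumPy, with nearest-neighbor sampling; the shared focal length handling is an illustrative simplification.

```python
import numpy as np

def equidistant_to_perspective(src: np.ndarray, f: float) -> np.ndarray:
    """Resample an equidistant fisheye image into a perspective image."""
    h, w = src.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dx, dy = xs - cx, ys - cy
    r_persp = np.hypot(dx, dy)          # radius of each output pixel
    theta = np.arctan(r_persp / f)      # incidence angle: r_persp = f*tan(theta)
    r_fish = f * theta                  # equidistant source radius: r = f*theta
    scale = np.ones_like(r_persp)
    np.divide(r_fish, r_persp, out=scale, where=r_persp > 0)
    src_x = np.clip(np.round(cx + dx * scale).astype(int), 0, w - 1)
    src_y = np.clip(np.round(cy + dy * scale).astype(int), 0, h - 1)
    return src[src_y, src_x]
```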
Here, FIG. 13 shows the partitions K11 to K33 when the converted image is rectangular and the pair of fisheye cameras 211 is mounted, for example, on a ceiling with the optical axes pointing vertically (downward). As for the area of each partition, the central partitions K11, K12, and K21 are large, while the peripheral partitions K13, K23, K31, K32, and K33 are small. As for resolution, the central partitions K11, K12, and K21 have low resolution, while the peripheral partitions K13, K23, K31, K32, and K33 have high resolution. For example, by giving every partition the same pixel count, 400 x 200, the central partitions K11, K12, and K21 end up with low resolution while the peripheral partitions K13, K23, K31, and K32 end up with high resolution.
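The per-partition resolution reduction can be as simple as decimating the central blocks while leaving the peripheral blocks intact. A minimal sketch, assuming NumPy, follows; the block geometry and decimation factor are illustrative assumptions.

```python
import numpy as np

def decimate_partition(block: np.ndarray, is_central: bool) -> np.ndarray:
    """Drop every other pixel in central partitions; keep peripheral ones."""
    return block[::2, ::2] if is_central else block

# Example: a 400x400 central partition becomes 200x200.
central = np.zeros((400, 400), dtype=np.uint8)
print(decimate_partition(central, is_central=True).shape)  # (200, 200)
```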
When distortion has been corrected by image conversion before partitioning, the parts of the image periphery that were compressed by the distortion have been stretched out, so after the conversion the central part of the image has higher resolution than the periphery; thinning out pixels to lower the resolution in the central part therefore has less impact than lowering the resolution at the periphery. Accordingly, pixels are thinned out in the central partitions K11, K12, and K21 to lower their resolution, while pixels are not thinned out in the peripheral partitions K13, K23, K31, K32, and K33, whose resolution is thus not lowered. Alternatively, pixels may be thinned out in all the partitions K11 to K33, with more thinning in the central partitions K11, K12, and K21 than in the peripheral partitions K13, K23, K31, K32, and K33. To shorten processing time, the area of the central partitions K11, K12, and K21 whose resolution is lowered is made larger than that of the peripheral partitions K13, K23, K31, K32, and K33 whose resolution is not lowered. When distortion-removing correction is instead performed after the image has been divided into partitions, the image is distorted so as to be compressed at the periphery, and to correct this distortion the area of the peripheral partitions is enlarged; therefore, at partitioning time the peripheral partitions are given a smaller area than the central part. In that case, since distortion is removed after partitioning, lowering the resolution at the heavily distorted periphery would greatly degrade the image and could adversely affect distance calculation and image recognition. Therefore, even when distortion is removed after partitioning, it is preferable to thin out more pixels in the central part than at the periphery.
FIG. 14 shows the partitions K11 to K42 when the converted image is rectangular and the pair of fisheye cameras 211 is mounted, for example, at a position higher than a person's height with the optical axes pointing horizontally or obliquely downward from horizontal. The central and lower partitions K11 to K33 are the same as those of the vertically oriented fisheye cameras 211 described above, with the same resolution and area settings for each of the partitions K11 to K33.
In contrast, the upper partitions K41 and K42 above the center of the image capture, for example, the sky outdoors or the ceiling indoors, so they are of low importance and have lower resolution and larger area than the central partition K11. Among the upper partitions, the upper central partition K41 has the lowest resolution and the largest area, whereas the upper left and right partitions K42 have higher resolution and smaller area than partition K41.
FIGS. 13 and 14 show examples of partitioning. The basic principle is to lower the resolution and enlarge the area of the central partitions based on the image distortion produced by the fisheye lens unit 221 of the fisheye camera 211, and, by comparison, to raise the resolution and reduce the area at the periphery; the resolution and area can then be adjusted according to importance based on the placement of the fisheye camera 211. That is, a fisheye camera 211 with an angle of view of about 180 degrees or more has a wide imaging range, so, for example, positions where no person could be present may fall within the imaging range, and for a surveillance camera that monitors people it is preferable to lower the resolution of such parts to improve processing speed.
The image selection unit 232 shown in FIG. 11 selects the same partitions K11 to K33 in each of the pair of images. When a pair of partitions has been selected, the corresponding point selection unit 233, serving as the corresponding point search unit, extracts feature points from each image by image recognition and associates the feature points between the images, as described above. Epipolar geometry can be used here, and pairs of corresponding points in the two images are determined and selected in sequence. When the selection of corresponding points for one pair of partitions is finished, corresponding points are extracted in the next pair of partitions. When corresponding points have been extracted in all pairs of partitions, the distance calculation unit 234 then obtains the distance from the fisheye cameras 211 to the object point corresponding to each pair of corresponding points, from the difference in the positions of the corresponding points on the images and the distance between the pair of fisheye cameras 211. This is distance calculation based on so-called triangulation; when parallax is used and, unlike in general triangulation, the baseline distance (the distance between the pair of cameras) is very small compared with the distance to the object, so that the difference in distance from each point (each fisheye camera 211) to the object point does not matter, the distance to the object point is obtained by dividing the baseline distance by the parallax (in radians).
The three-dimensional coordinates of an object point in real space can also be calculated from the coordinate positions (projection positions) of its corresponding points on the two-dimensional coordinates of each image, the distance between the cameras, and the focal length of the cameras; and the distance from a camera to the object point can be calculated from the three-dimensional coordinate positions of the object point and the camera in real space.
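The small-angle formula stated above, distance = baseline / parallax with the parallax in radians, is illustrated in the minimal sketch below; the numerical values are illustrative assumptions.

```python
import math

def distance_from_parallax(baseline_m: float, parallax_rad: float) -> float:
    """Valid when the baseline is much smaller than the object distance."""
    return baseline_m / parallax_rad

# Example: 10 cm baseline, corresponding points 0.5 degrees apart.
print(distance_from_parallax(0.10, math.radians(0.5)))  # about 11.5 m
```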
The flowchart of FIG. 12 shows the processing in the image analysis unit 214 described above: after the pair of fisheye cameras 211 captures images and the images are corrected, the image analysis unit 214 takes as input a pair of one-frame images and calculates a distance image indicating, for each corresponding point serving as a pixel, the distance from the fisheye cameras 211 to the object point. The flowchart of FIG. 12 shows the case in which distortion-removal processing is performed after partitioning. As shown in the flowchart, when corrected images, one frame from each of the pair of fisheye cameras 211, are input, partitioning is performed (step S21). In partitioning, the image is divided into a plurality of partitions and processing to lower the resolution of each partition (for example, reduction processing) is performed, with the resolution changed per partition. The area of each partition is also adjusted per partition. Basically, resolution is made low and area large in the central partitions of the image, while at the image periphery the partition resolution is made higher and the partition area smaller than in the center.
After partitioning, distortion is removed in each partition (step S22). An already established method is used to remove the fisheye-lens distortion of the image.
Next, feature points (singular points) to serve as corresponding points are extracted in each partition (step S23). As feature points, singular points such as edges are detected by well-known edge detection, for example. A corresponding point is a point on an image corresponding to an object point in real space; points in the two images corresponding to the same object point are corresponding points, and if each point in real space appears in both images, corresponding points exist. It is preferable to search for many corresponding points using methods such as the epipolar geometry described above.
Next, the distance from the fisheye cameras 211 to the object point in real space corresponding to each pair of corresponding points is calculated based on the difference in the positions of the corresponding points on the images and the distance between the fisheye cameras 211 (step S24). Then a distance image in which the obtained distance of each corresponding point is the value of each pixel is generated and output (step S25).
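The following is a minimal sketch, assuming OpenCV, of steps S23 and S24 for one pair of partitions: feature (singular) point detection, matching between the two images, and per-match distance from the angular disparity. ORB is used here as a stand-in for the unspecified feature detector, and the pixel-to-radian conversion factor and baseline are illustrative assumptions.

```python
import cv2
import numpy as np

def partition_distances(left: np.ndarray, right: np.ndarray,
                        baseline_m: float, rad_per_px: float) -> list:
    orb = cv2.ORB_create()                  # step S23: feature (singular) points
    kp_l, des_l = orb.detectAndCompute(left, None)
    kp_r, des_r = orb.detectAndCompute(right, None)
    if des_l is None or des_r is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    distances = []
    for m in matcher.match(des_l, des_r):   # corresponding point pairs
        dx = kp_l[m.queryIdx].pt[0] - kp_r[m.trainIdx].pt[0]
        parallax = abs(dx) * rad_per_px     # disparity converted to radians
        if parallax > 0:
            distances.append(baseline_m / parallax)  # step S24
    return distances
```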
This distance image is then used for monitoring. Image recognition is performed on the distance image to detect people, or to detect living things and objects (vehicles and the like) other than people. For people, a person may be identified from the registered three-dimensional shape of a person's face or from photographs, or children may be distinguished from adults, ages distinguished, or men distinguished from women from the three-dimensional shape of the person detected in the distance image, for example height, build, and the shape of clothing. Shapes for each model of vehicle, such as automobiles and motorcycles, may also be registered so that vehicles are detected and the vehicle model identified.
According to the object distance detection device of the present embodiment, distance detection is performed by a stereo camera comprising a plurality of fisheye cameras 211 with an angle of view of, for example, about 180 degrees, so the distance to each object point in real space can be obtained and a distance image output. However, because the angle of view is wide, the amount of information in one frame is large and the processing amount of arithmetic operations, including the image recognition needed for distance detection, is large; depending on the capability of the arithmetic processing circuit, the processing may take a long time, the processing time per frame becomes long, and video processing becomes difficult. There are also many object points in the fisheye camera 211 image for which distance can be calculated. And in the image periphery, where the amount of information (object points) is greater than in the center, the image is more distorted than in the center and the resolution is not necessarily high. Meanwhile, the fisheye camera 211 image contains many parts of low importance for monitoring tasks, such as the sky, the ceiling, the floor, and the ground. Therefore, by partitioning off the parts with small distortion and high resolution and the parts of low importance, and calculating distance with the resolution of those parts lowered, the processing amount can be reduced and the processing time shortened without greatly lowering the accuracy of the distance calculation. This allows image recognition in a monitoring device using distance images obtained by a stereo camera having a plurality of fisheye cameras 211 to proceed smoothly.
Here, the pixel regions D of the distance image output from the object distance detection device will be described. In the distance image, each point on the image is represented by its distance from the stereo camera; in the present embodiment the distance image is represented by changes in color shading according to the distance value. The color change may be, for example, a black-and-white gradation or a gradation of another color. When the distance image is recognized mechanically, each point on the image may instead be represented by a numerical value indicating the distance described above. As parts of the distance image corresponding to the partitioned image shown in FIG. 13, FIG. 15(a) shows part of partition K11 and FIG. 15(b) shows part of partition K33. The partitions K11 to K33 are arranged on one distance image in the same way as in FIG. 13, and the resolution differs from partition to partition. However, as shown in FIG. 15, the size of the pixel P (the parts delimited by both double and dotted lines), the minimum unit on the distance image, is the same throughout. The pixel P is, for example, a pixel of the monitor that displays the distance image.
In the distance image, the image is divided into pixel regions D (the parts delimited by double lines), and each pixel region D consists of one pixel P or a plurality of pixels P. Each pixel region D is given, for example, a black-and-white shade according to the distance from the stereo camera to the corresponding point (the imaged object), so the distance image is represented by the color (shade) corresponding to the distance of each pixel region D. Here, corresponding to the resolution of partition K11 being lower than that of partition K33, each pixel region D in partition K11 has four pixels whereas each pixel region D in partition K33 has two; the pixel regions D of the low-resolution partition K11 have more pixels P and a larger area than the pixel regions D of the high-resolution partition K33. Within one distance image there is no need to change the size of the minimum-unit pixel from partition to partition; differences in resolution can be accommodated by changing the number of pixels P in a pixel region D, so that, for example, images of partitions with different resolutions can be displayed on one monitor at approximately the same display magnification.
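The rendering just described, one distance value painted over a block of monitor pixels whose size depends on the partition's resolution, can be sketched as follows. This is a minimal sketch assuming NumPy; the square block shape, the shade mapping, and the maximum range are illustrative assumptions.

```python
import numpy as np

def render_region(canvas: np.ndarray, top: int, left: int,
                  region_px: int, distance_m: float, max_m: float = 20.0):
    """Paint one pixel region D as a region_px x region_px block of
    monitor pixels P, shaded by distance (near = dark, far = light)."""
    shade = np.uint8(255 * min(distance_m, max_m) / max_m)
    canvas[top:top + region_px, left:left + region_px] = shade

canvas = np.zeros((8, 8), dtype=np.uint8)
render_region(canvas, 0, 0, region_px=4, distance_m=5.0)   # coarse region (low-res partition)
render_region(canvas, 0, 4, region_px=2, distance_m=12.0)  # fine region (high-res partition)
```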
1  Image sensor (imaging means)
2  3D sensor (distance image acquisition means, distance measurement means)
3  Object extraction means (distance image recognition object extraction means)
4  Object image extraction means (recognition object image extraction means)
5  Image recognition means
10 Camera (imaging means)
11 Distance image detection means (distance image acquisition means)
12 Stereo camera (distance image acquisition means)

Claims (4)

  1.  An image recognition imaging apparatus comprising:
      imaging means for capturing an image;
      distance image acquisition means for acquiring a distance image in which each pixel of a range corresponding to the imaging range of the image is represented by a distance to a photographed target;
      distance image recognition object extraction means for extracting a recognition object on the distance image from the distance image on the basis of distance;
      recognition object image extraction means for extracting, from the image, a partial image constituting the recognition object on the basis of the range on the distance image of the recognition object extracted from the distance image; and
      image recognition means for performing image recognition on the partial image and identifying the recognition object.
  2.  The image recognition imaging apparatus according to claim 1, wherein the distance image acquisition means includes distance measurement means for measuring the distance of each pixel of the distance image.
  3.  The image recognition imaging apparatus according to claim 1, wherein the distance image acquisition means obtains the distance image on the basis of the parallax between two of the imaging means.
  4.  The image recognition imaging apparatus according to any one of claims 1 to 3, wherein the imaging means, the distance image acquisition means, the distance image recognition object extraction means, the recognition object image extraction means, and the image recognition means are provided in a single housing.
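Read as a data flow, claim 1 can be sketched roughly as below; the distance band, the connected-component grouping, and the classifier are placeholders chosen for the example, with only the overall order of operations taken from the claim:

```python
import numpy as np

def extract_by_distance(distance_image, d_near=0.5, d_far=10.0):
    """Distance image recognition object extraction: keep pixels whose
    distance lies in an assumed band of interest and return bounding boxes
    of 4-connected regions as (y0, x0, y1, x1)."""
    mask = (distance_image >= d_near) & (distance_image <= d_far)
    visited = np.zeros_like(mask, dtype=bool)
    boxes = []
    for y, x in zip(*np.nonzero(mask)):
        if visited[y, x]:
            continue
        stack, ys, xs = [(y, x)], [], []
        visited[y, x] = True
        while stack:                            # flood fill one region
            cy, cx = stack.pop()
            ys.append(cy)
            xs.append(cx)
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not visited[ny, nx]):
                    visited[ny, nx] = True
                    stack.append((ny, nx))
        boxes.append((min(ys), min(xs), max(ys) + 1, max(xs) + 1))
    return boxes

def recognize_objects(image, distance_image, classify):
    """Claim 1 as a pipeline: extract recognition objects from the distance
    image, crop the matching partial images from the captured image, and run
    image recognition only on those crops."""
    results = []
    for y0, x0, y1, x1 in extract_by_distance(distance_image):
        partial = image[y0:y1, x0:x1]     # recognition object image extraction
        results.append(classify(partial)) # image recognition means
    return results
```

A real implementation would replace `classify` with, for example, a trained neural-network recognizer, and would typically filter or merge the extracted boxes before recognition; the benefit claimed is that recognition runs only on the partial images rather than on the whole frame.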
PCT/JP2017/042578 2016-11-29 2017-11-28 Image recognition imaging apparatus WO2018101247A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
JP2016-231534 2016-11-29
JP2016231534A JP7162412B2 (en) 2016-11-29 2016-11-29 detection recognition system
JP2017-052831 2017-03-17
JP2017052831A JP2018156408A (en) 2017-03-17 2017-03-17 Image recognizing and capturing apparatus
JP2017146497A JP6860445B2 (en) 2017-07-28 2017-07-28 Object distance detector
JP2017-146497 2017-07-28

Publications (1)

Publication Number Publication Date
WO2018101247A1 true WO2018101247A1 (en) 2018-06-07

Family

ID=62242183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/042578 WO2018101247A1 (en) 2016-11-29 2017-11-28 Image recognition imaging apparatus

Country Status (1)

Country Link
WO (1) WO2018101247A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003028635A (en) * 2001-07-16 2003-01-29 Honda Motor Co Ltd Image range finder
JP2003044995A (en) * 2001-07-26 2003-02-14 Nissan Motor Co Ltd Device and method for discriminating body kind
JP2011039833A (en) * 2009-08-12 2011-02-24 Fujitsu Ltd Vehicle detector, vehicle detection program, and vehicle detection method
JP2015520433A (en) * 2012-03-26 2015-07-16 ティーケー ホールディングス インコーポレーテッド Range-cue object segmentation system and method
JP2015125760A (en) * 2013-12-27 2015-07-06 日立建機株式会社 Mine work machine
JP2015195018A (en) * 2014-03-18 2015-11-05 株式会社リコー Image processor, image processing method, operation support system, and program
JP2017052498A (en) * 2015-09-11 2017-03-16 株式会社リコー Image processing device, object recognition apparatus, equipment control system, image processing method, and program

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180240249A1 (en) * 2017-02-23 2018-08-23 Hitachi, Ltd. Image Recognition System
US10636161B2 (en) * 2017-02-23 2020-04-28 Hitachi, Ltd. Image recognition system
JP2021532648A (en) * 2018-07-31 2021-11-25 ウェイモ エルエルシー Hybrid time-of-flight imager module
JP7321246B2 (en) 2018-07-31 2023-08-04 ウェイモ エルエルシー Hybrid time-of-flight imager module
CN111047637A (en) * 2018-10-12 2020-04-21 富华科精密工业(深圳)有限公司 Monocular distance measuring device
CN111047637B (en) * 2018-10-12 2023-06-27 深圳富联富桂精密工业有限公司 Monocular distance measuring device
WO2020116195A1 (en) * 2018-12-07 2020-06-11 ソニーセミコンダクタソリューションズ株式会社 Information processing device, information processing method, program, mobile body control device, and mobile body
JP7320001B2 (en) 2018-12-07 2023-08-02 ソニーセミコンダクタソリューションズ株式会社 Information processing device, information processing method, program, mobile body control device, and mobile body
CN113168691A (en) * 2018-12-07 2021-07-23 索尼半导体解决方案公司 Information processing device, information processing method, program, mobile body control device, and mobile body
JPWO2020116195A1 (en) * 2018-12-07 2021-10-21 ソニーセミコンダクタソリューションズ株式会社 Information processing device, information processing method, program, mobile control device, and mobile
EP3893497A4 (en) * 2018-12-07 2022-04-27 Sony Semiconductor Solutions Corporation Information processing device, information processing method, and program
WO2020152851A1 (en) * 2019-01-25 2020-07-30 株式会社 テクノミライ Digital search security system, method, and program
JP2021026698A (en) * 2019-08-08 2021-02-22 株式会社ティファナ ドットコム Crime prevention device, automatic dispenser or information providing device and program
JP7250264B2 (en) 2019-08-08 2023-04-03 株式会社ティファナ ドットコム Security device, vending machine or information providing device, and program
WO2021102911A1 (en) * 2019-11-29 2021-06-03 深圳市大疆创新科技有限公司 Image detection method, image detection device, and storage medium
US11410406B2 (en) 2019-12-23 2022-08-09 Yokogawa Electric Corporation Delivery server, method and storage medium
JP7259732B2 (en) 2019-12-23 2023-04-18 横河電機株式会社 Distribution server, method and program
CN113099171A (en) * 2019-12-23 2021-07-09 横河电机株式会社 Distribution server, method and recording medium
JP2021100202A (en) * 2019-12-23 2021-07-01 横河電機株式会社 Distribution server, method, and program
CN113099171B (en) * 2019-12-23 2024-06-21 横河电机株式会社 Distribution server, method and recording medium
JPWO2021199099A1 (en) * 2020-03-30 2021-10-07
JP7279851B2 (en) 2020-03-30 2023-05-23 日本電気株式会社 Control device, control system, control method and control program
WO2021199099A1 (en) * 2020-03-30 2021-10-07 日本電気株式会社 Control device, control system, and control method
WO2023228810A1 (en) * 2022-05-24 2023-11-30 村田機械株式会社 Article recognition system and article recognition device

Similar Documents

Publication Publication Date Title
WO2018101247A1 (en) Image recognition imaging apparatus
CN107240124B (en) Cross-lens multi-target tracking method and device based on space-time constraint
CN108111818B (en) Moving target actively perceive method and apparatus based on multiple-camera collaboration
EP3648448B1 (en) Target feature extraction method and device, and application system
CN109887040B (en) Moving target active sensing method and system for video monitoring
US10503966B1 (en) Binocular pedestrian detection system having dual-stream deep learning neural network and the methods of using the same
US9165190B2 (en) 3D human pose and shape modeling
KR101337060B1 (en) Imaging processing device and imaging processing method
CN104902246B (en) Video monitoring method and device
JP6554169B2 (en) Object recognition device and object recognition system
KR101530255B1 (en) Cctv system having auto tracking function of moving target
CN104966062B (en) Video monitoring method and device
JP5127531B2 (en) Image monitoring device
CN111753609A (en) Target identification method and device and camera
US20140049600A1 (en) Method and system for improving surveillance of ptz cameras
JP2018156408A (en) Image recognizing and capturing apparatus
JP2018088157A (en) Detection recognizing system
US20220366570A1 (en) Object tracking device and object tracking method
CN114905512B (en) Panoramic tracking and obstacle avoidance method and system for intelligent inspection robot
CN117689881B (en) Casting object tracking method based on event camera and CMOS camera
JP6860445B2 (en) Object distance detector
JP7074174B2 (en) Discriminator learning device, discriminator learning method and computer program
JP2008165595A (en) Obstacle detection method, obstacle detection device, and obstacle detection system
CN109460077B (en) Automatic tracking method, automatic tracking equipment and automatic tracking system
CN112257617A (en) Multi-modal target recognition method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17876273

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17876273

Country of ref document: EP

Kind code of ref document: A1