EP4283576A1 - Object detection system and a method of use thereof - Google Patents
- Publication number
- EP4283576A1 (application EP23275082.8A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- camera
- vehicle
- view
- communication
- camera means
- Prior art date
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R1/00—Optical viewing arrangements; Real-time viewing arrangements for drivers or passengers using optical image capturing systems, e.g. cameras or video systems specially adapted for use in or on vehicles
- B60R1/20—Real-time viewing arrangements for drivers or passengers using optical image capturing systems, e.g. cameras or video systems specially adapted for use in or on vehicles
- B60R1/22—Real-time viewing arrangements for drivers or passengers using optical image capturing systems, e.g. cameras or video systems specially adapted for use in or on vehicles for viewing an area outside the vehicle, e.g. the exterior of the vehicle
- B60R1/23—Real-time viewing arrangements for drivers or passengers using optical image capturing systems, e.g. cameras or video systems specially adapted for use in or on vehicles for viewing an area outside the vehicle, e.g. the exterior of the vehicle with a predetermined field of view
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R2300/00—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle
- B60R2300/10—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle characterised by the type of camera system used
- B60R2300/105—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle characterised by the type of camera system used using multiple cameras
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R2300/00—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle
- B60R2300/30—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle characterised by the type of image processing
- B60R2300/307—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle characterised by the type of image processing virtually distinguishing relevant parts of a scene from the background of the scene
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60R—VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
- B60R2300/00—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle
- B60R2300/80—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle characterised by the intended use of the viewing arrangement
- B60R2300/8093—Details of viewing arrangements using cameras and displays, specially adapted for use in a vehicle characterised by the intended use of the viewing arrangement for obstacle warning
Definitions
- the invention to which this application relates is an object detection system and a method of use thereof.
- the present invention also relates to a vehicle comprising such an object detection apparatus.
- Pedestrian detection systems are employed in particular in road vehicles as an aid to alert the driver to a nearby pedestrian who is within a certain proximity of the vehicle and may ultimately cross the path of the vehicle.
- a number of sensors are provided, located around the vehicle, to detect people within a predefined distance. The onboard computer of the vehicle can then use this data to alert the driver of any close-by objects. Such sensors may also be used to aid in the parking of the vehicle.
- FR2958774 (A1) discloses a system and method for detecting an object around a lorry using a stereoscopic camera to acquire 2D stereoscopic images and subsequently create a 3D disparity map from those images. The object can then be detected from that disparity map and classified accordingly. The 3D disparity map is projected on a vertical plane and a search for any object which detaches from the ground plane is carried out, ultimately detecting and highlighting any relevant objects, such as humans.
- the system of FR2958774 (A1) serves to identify features which arise from the ground in the disparity map, as the ground is identified as the linear plane beneath the vehicle. Each potential object of interest is inspected by the software of the stereoscopic camera to identify whether or not it is a pedestrian.
- Another example would be instances where the pedestrian is on a raised walkway which gradually descends to become part of the ground plane. Only once the pedestrian is on the ground plane, and is consequently identified as a detachment therefrom, would the system detect them as a relevant object of interest and subsequently notify the driver of their presence. If the fork lift truck (FLT) is travelling at speed and the driver is unsighted as a consequence of the size and shape of the load being carried, the alert/notification may arrive too late to avoid incident or accident.
- Other examples may include scenarios where a pedestrian is partially obscured or obstructed from camera view as they are, at the time of detection, behind an obstacle such as a stack of pallets.
- the pedestrian may be entirely missed as the software would only initially recognize and classify the detected object as a "box" or other such item, and not a human.
- the reliance purely on stereo cameras and the creation of 3D disparity maps to detect pedestrians in potentially hazardous locations is therefore flawed and requires improvement.
- an object detection system, said system including an apparatus having:
- said first camera means are provided to classify the type of object which has been or is detected.
- said first camera means is provided to detect humans and/or objects within its field of view.
- said system is located with or forms part of a vehicle.
- said vehicle is an industrial vehicle.
- said vehicle is a fork lift truck (FLT).
- the system includes computing means, provided in communication with said first, second and third camera means.
- said computing means may be provided in the apparatus.
- said computing means are arranged to receive and process visual data obtained by the first camera means to discern the nature and type of a detected object and subsequently classify it.
- said computing means includes an open-source algorithm to process and classify said visual data.
- said open-source algorithm is an algorithm known as "You Only Look Once" (YOLO).
- said computing means is arranged to determine if a detected object is an object of interest.
- an object of interest may be defined to be a human or specific object, whereas if boxes, pallets, road cones etc. are detected, these may not be deemed to be objects of interest.
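As an illustrative sketch of the classification step above, the following filters generic detector output down to user-defined "objects of interest". The class labels and the `OBJECTS_OF_INTEREST` set are assumptions for illustration; the patent does not specify them.

```python
# Illustrative sketch only: the labels and the set of "objects of interest"
# below are assumptions, not taken from the patent.
OBJECTS_OF_INTEREST = {"person", "animal"}

def filter_detections(detections):
    """Keep only detections whose class label is user-defined as 'of interest'.

    `detections` is a list of (label, confidence, bbox) tuples, a generic
    stand-in for the output of a detector such as YOLO.
    """
    return [d for d in detections if d[0] in OBJECTS_OF_INTEREST]

detections = [
    ("person", 0.91, (120, 40, 60, 180)),
    ("box", 0.88, (300, 200, 80, 80)),
    ("pallet", 0.75, (10, 220, 200, 60)),
]
print(filter_detections(detections))  # only the "person" detection survives
```

Boxes, pallets and road cones are thus detected and classified but never flagged, matching the behaviour described above.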
- the field of view of the first, second and third camera means spans up to 160°.
- the depth of view or range of detection of the first, second and third camera means is up to 10m.
- the depth of view or range of detection is up to 8m.
- the field and depth of view may be a user-defined area or region.
- the angle of the field of view may be narrowed and the depth of view and/or detection may consequently be increased, thereby enabling detection at a greater distance, which is particularly useful when travelling at speed, but within a narrower or more concentrated field of view.
- the computing means are further arranged to receive and process data obtained by said second and third camera means, and resolve and process the depth/distance and angle of a detected object.
- said first, second and third camera means are arranged linearly with respect to one another and in the same horizontal plane as one another.
- said second and third camera means are provided as separate and distinct camera means, in communication with one another via the computing means.
- said first, second and third camera means are located together in a first camera head.
- the first camera head has a field of view spanning up to 160°.
- a second camera head comprising further, equivalent first, second and third camera means therein.
- the second camera head is directed in a substantially opposing direction to that of the first camera head.
- the second camera head has a field of view spanning up to 160°.
- said detection system comprises first and second camera heads, each comprising first, second and third camera means and directed in first and second opposing directions, having a combined field of view of up to 320°.
- the first and second camera heads may be provided directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another.
- the combined field of view may be less than 320°.
- said first and second camera heads are positioned at an angle of at least 20° with respect to one another.
- said first and second camera heads are positioned at an angle relative to one another such that their respective fields of view contact or overlap with one another.
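The combined coverage of two angled heads can be sketched as the union of two angular intervals on a circle. A minimal sampling-based sketch, assuming each head covers a symmetric 160° arc about its pointing direction:

```python
def coverage(center1_deg, center2_deg, fov_deg=160.0):
    """Total angular coverage (degrees) of two camera heads.

    Each head covers [center - fov/2, center + fov/2). Sampling at
    0.5-degree steps measures the union without interval bookkeeping.
    Illustrative sketch only; the patent states the figures, not a method.
    """
    step = 0.5
    covered = set()
    for c in (center1_deg, center2_deg):
        a = c - fov_deg / 2
        while a < c + fov_deg / 2:
            covered.add(round((a % 360) / step))
            a += step
    return len(covered) * step

# Back-to-back heads (180 degrees apart): the maximum combined 320 degrees.
print(coverage(0, 180))   # 320.0
# Heads 140 degrees apart: the fields of view overlap by 20 degrees,
# so the combined field of view drops below 320 degrees.
print(coverage(0, 140))   # 300.0
```

This reproduces the statements above: back-to-back heads give up to 320°, while angling them so the fields of view overlap reduces the combined figure.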
- said first camera means is provided to act as an object detection camera.
- said second and third camera means are provided to act as left and right stereo cameras, respectively.
- said first camera means may be further arranged to detect and recognize speed limit signs.
- said computing means may be arranged to process information from said signs and communicate the same to a user, in use.
- said information may be displayed on display means associated with the system, in use.
- the system further includes data storage means.
- said data storage means are located with the apparatus.
- communication means are provided associated with the system.
- said communication means enable the communication of data stored on data storage means associated with the system and/or real-time data obtained by the camera means to be transferred/communicated to a remote location.
- said first camera means are further arranged to record and save visual data of the detection and occurrence of an object to said data storage means.
- the system further includes an accelerometer.
- said accelerometer is in communication with said computing means. Further typically, said accelerometer is arranged to enable the system to adjust the field and depth of view of the first, second and third camera means according to the speed and movement of the apparatus, in use.
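The speed-dependent adjustment described above can be sketched as a simple trade-off between angle and range. The scaling constants below are illustrative assumptions; the patent only states that narrowing the angle permits detection at a greater distance, within the 160°/10 m maxima.

```python
def adaptive_view(speed_m_s, base_fov=160.0, base_range=8.0, max_range=10.0):
    """Narrow the field of view and extend the detection range with speed.

    Illustrative sketch: the linear scaling and the 0.5 m-per-(m/s) factor
    are assumptions, not values from the patent.
    """
    # Extend the range linearly with speed, capped at the hardware maximum.
    rng = min(max_range, base_range + 0.5 * speed_m_s)
    # Narrow the field of view in proportion to the extra range used.
    fov = base_fov * (base_range / rng)
    return fov, rng

print(adaptive_view(0.0))   # (160.0, 8.0) -- stationary: widest view
print(adaptive_view(4.0))   # (128.0, 10.0) -- at speed: narrower, longer view
```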
- said computing means is arranged to analyse each frame of visual data obtained by the first camera means to detect the occurrence of an object, in use. This consequently enables a higher level of accurate identification, in particular for partially obscured objects, and is not reliant on or restricted to a "ground-up" approach to detection, instead employing a whole-image analysis approach.
- said computing means includes machine-learning software, provided so as to ensure the detection and classification of detected objects is continuously learned and improved on.
- said software includes deep neural network image classification software.
- system further includes notification and/or alert means, arranged to provide a visual and/or audio alert and/or notification on detection of an object deemed to be an object of interest.
- display means may be provided associated with the system.
- said display means may be arranged to provide a visual representation of the field of view of the first and/or second and third camera means.
- said display means may be arranged to provide an indication of the location and proximity of a detected object.
- said computing means further includes image resizing and reshaping software.
- image resizing and reshaping software is arranged to enable visual data obtained from the first, second and third camera means to be aligned and overlaid, consequently enabling the provision of higher frame rate video data.
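A minimal sketch of the resizing/alignment step: scale every camera frame to a common resolution so the three streams can be overlaid pixel-for-pixel. Nearest-neighbour resampling is an illustrative stand-in; a production system would likely use an optimised library routine.

```python
import numpy as np

def resize_nn(frame, out_h, out_w):
    """Nearest-neighbour resize via integer index mapping (illustrative)."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows[:, None], cols]

def align_frames(frames):
    """Resize all camera frames to the smallest common height and width
    so the visual data can be aligned and overlaid."""
    h = min(f.shape[0] for f in frames)
    w = min(f.shape[1] for f in frames)
    return [resize_nn(f, h, w) for f in frames]
```

Usage: passing one 4x4 frame and one 2x2 frame to `align_frames` returns two 2x2 frames ready to be overlaid.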
- said computing means further includes software arranged to, in real time, convert raw visual/video data acquired and stored on data storage means to MPEG-4 Part 14 (MP4) file format.
- said raw visual/video data is obtained in H.264 video coding format.
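A real-time H.264-to-MP4 conversion is commonly a container rewrap rather than a re-encode. A hedged sketch using the ffmpeg command-line tool (one plausible implementation; the patent does not name a tool):

```python
def h264_to_mp4_cmd(raw_path, mp4_path):
    """Build an ffmpeg command rewrapping a raw H.264 elementary stream
    into an MP4 container. '-c copy' copies the stream without re-encoding,
    which is what makes real-time conversion cheap."""
    return ["ffmpeg", "-y", "-i", raw_path, "-c", "copy", mp4_path]

# Execute with subprocess.run(h264_to_mp4_cmd("clip.h264", "clip.mp4"),
# check=True) when ffmpeg is available on PATH.
print(" ".join(h264_to_mp4_cmd("clip.h264", "clip.mp4")))
```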
- an apparatus having:
- the first and second camera heads each have a field of view spanning up to 160°, and a combined field of view of up to 320°.
- the first and second camera heads may be provided directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another.
- the combined field of view may be less than 320°.
- said first and second camera heads are positioned at an angle of at least 20° with respect to one another.
- said first and second camera heads are positioned at an angle relative to one another such that their respective fields of view contact or overlap with one another.
- a vehicle including an object detection system as described above provided thereon or therewith.
- said vehicle is an industrial vehicle.
- said vehicle is a fork lift truck (FLT).
- said vehicle includes first and second camera heads.
- said first and second camera heads are arranged to be directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another.
- a third camera head may be included, comprising at least first camera means.
- said third camera head is located to provide a view from the vehicle in a direction which may otherwise be obscured when the vehicle is carrying a load.
- the vehicle is an FLT carrying a plurality of pallets.
- the view towards the front of the vehicle may be obscured or blocked to the driver.
- when FLTs are loaded, they should be driven in "reverse", providing the driver with a clear view of the surroundings.
- forward movement may be required, generally when travelling up an incline.
- said third camera head is thus provided as an "impaired vision" camera head.
- said third camera head and associated camera means are arranged to activate only when the vehicle is moving in the direction in which the driver's view is obscured or blocked.
- the vehicle may be provided with a first camera head, comprising first camera means, and second and third camera means in communication with one another, and a second camera head arranged to act as an impaired vision "camera head", comprising at least first camera means.
- notification and/or alert means are provided with the vehicle, in communication with the object detection system and arranged to notify and/or alert a driver of the vehicle of a detection of an object, in use.
- said notification and/or alert means are provided in an interior of the vehicle and/or are arranged to be directed towards the driver, in use.
- said notification and/or alert means are arranged to activate in a directional manner to signal to the driver the approximate direction and location of the detected object. That is to say, if for example, notification means are provided in the form of audio speakers in four corners of the cabin of an FLT, and an object of interest is detected forward and right of the vehicle, the front right speaker may activate to signal to the driver that the detected object is in that general direction.
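The four-corner speaker example above amounts to mapping a detected object's bearing to a quadrant. An illustrative sketch, assuming a bearing convention of 0° straight ahead and positive clockwise (neither convention nor quadrant boundaries are specified by the patent):

```python
def pick_speaker(bearing_deg):
    """Map a detected object's bearing to one of four cabin speakers.

    bearing_deg: 0 = straight ahead, positive clockwise (to the right).
    Quadrant boundaries here are an illustrative assumption.
    """
    b = bearing_deg % 360
    if b < 90:
        return "front-right"
    if b < 180:
        return "rear-right"
    if b < 270:
        return "rear-left"
    return "front-left"

print(pick_speaker(30))    # front-right: object ahead and to the right
print(pick_speaker(-45))   # front-left
```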
- further notification and/or alert means may be provided associated with the vehicle, in communication with the object detection system and arranged to notify and/or alert the detected object of the vehicle's presence, in use.
- said further notification and/or alert means are provided on an exterior of the vehicle and/or are arranged to be directed outwardly of the vehicle, in use.
- the system includes an accelerometer, arranged to detect the speed and direction of movement of the vehicle, in use.
- the vehicle may include an accelerometer provided in communication with the object detection system.
- said accelerometer is arranged to communicate real time data of the speed and direction of movement of the vehicle to the computing means, enabling the same to assess the relative position, location and direction of movement of the detected object, in use.
- said vehicle includes display means provided therein, in communication with or forming part of the object detection system.
- said display means are arranged to provide a visual representation of the field of view of the first and/or second and third camera means, and provide an indication of the location and proximity of a detected object.
- the system further includes data storage means.
- said data storage means are located with the apparatus.
- communication means are provided associated with the system.
- said communication means enable the communication of data stored on data storage means and/or real-time data obtained by the camera means to be transferred/communicated to a remote location.
- various data and information may be collected by the system in relation to the usage of the vehicle, stored and transferred to a remote location, for example a centrally located server, for subsequent review and analysis.
- data may include any or any combination of: logging of driver habits; logging of vehicle habits; shift monitoring; fleet management; and/or vehicle "not in motion" monitoring.
- a method of detecting and ascertaining the distance and position of an object using an object detection system including the steps of:
- computing means are provided with the system.
- said computing means receives and processes visual data obtained from the first camera means, correcting any image distortion therein, discerning the nature and type of a detected object and subsequently classifying it.
- processing of the visual data is achieved by use of an open-source algorithm known as "You Only Look Once" (YOLO).
- the computing means determines whether the detected object or objects is/are objects of interest.
- said object of interest may be predefined with the computing means as humans or objects.
- said computing means is used to extract visual data from the second and third camera means.
- image distortion in visual data from the second camera means and image distortion in visual data from the third camera means are corrected by the computing means.
- visual data from the second camera means and visual data from the third camera means are subsequently remapped to correspond with the visual data obtained by the first camera means.
- said first, second and third camera means are located together in a first camera head.
- a second camera head comprising further, equivalent first, second and third camera means therein, is provided similarly to detect the occurrence of an object and resolve the relative distance and angle of said object.
- the second camera head is directed in an opposing direction to that of the first camera head, and each of said first and second camera heads have a field of view spanning up to 160°, and a combined field of view of up to 320°.
- if a detected object is determined to be an object of interest, the computing means subsequently creates a 3D disparity map to discern relative depth and angle information of the detected object.
- the present invention therefore in effect uses computer stereo vision, wherein the second camera means acts as a left stereo camera and the third camera means acts as a right stereo camera, resolving the distance and relative location of an object detected by the first camera means and deemed of interest.
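Resolving distance and relative location from the left/right stereo pair follows the standard pinhole stereo relation Z = f·B/d. A minimal sketch; the focal length, baseline and principal point below are illustrative values, not figures from the patent:

```python
import math

def locate(disparity_px, x_px, focal_px=700.0, baseline_m=0.12, cx_px=640.0):
    """Recover depth and bearing of a detected object from stereo disparity.

    Z = focal * baseline / disparity (standard stereo triangulation);
    the camera parameters here are illustrative assumptions.
    """
    z = focal_px * baseline_m / disparity_px   # depth along the camera axis, metres
    x = z * (x_px - cx_px) / focal_px          # lateral offset, metres
    angle = math.degrees(math.atan2(x, z))     # bearing from the centreline
    return z, angle

z, ang = locate(disparity_px=21.0, x_px=640.0)
print(round(z, 2), round(ang, 1))   # 4.0 0.0 -- object 4 m dead ahead
```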
- the first camera means utilizes whole image analysis to detect and identify the nature and type of object, which has significant advantages over similar systems in the prior art as the entire image is analysed, ensuring that even partially obscured or raised objects of interest are detected.
- said computing means employs machine-learning software, ensuring the detection and classification of detected objects is continuously learned and improved on.
- said software includes deep neural network image classification software.
- footage of said incident may be recorded and stored on data storage means provided or associated with the system.
- the recorded footage is set to begin a predetermined time period before the incident and cease a predetermined time period after the incident.
- said time period may be up to 20 seconds prior and after the incident.
- said time period may be 10 seconds prior and after the incident.
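Recording footage from before an incident requires a rolling buffer that is continuously overwritten until an incident is flagged. A sketch of that mechanism with an assumed frame rate (the patent specifies the time window, not the implementation):

```python
from collections import deque

FPS = 10        # illustrative frame rate (not specified by the patent)
WINDOW_S = 10   # seconds retained before and after an incident

class IncidentRecorder:
    """Rolling pre/post buffer: frames are silently discarded until an
    incident is flagged, whereupon the saved clip spans WINDOW_S seconds
    before and after the flag."""

    def __init__(self):
        self.buffer = deque(maxlen=FPS * WINDOW_S)  # pre-incident footage
        self.post_remaining = 0
        self.clip = None

    def push(self, frame, incident=False):
        """Feed one frame; returns the finished clip when recording stops."""
        if incident and self.post_remaining == 0 and self.clip is None:
            self.clip = list(self.buffer)            # capture the pre-roll
            self.post_remaining = FPS * WINDOW_S
        self.buffer.append(frame)
        if self.post_remaining:
            self.clip.append(frame)
            self.post_remaining -= 1
            if self.post_remaining == 0:
                done, self.clip = self.clip, None
                return done
        return None
```

Feeding numbered frames with an incident at frame 150 yields a clip of frames 50 through 249: ten seconds either side of the flag at 10 fps.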
- an incident may be defined as an object of interest being detected within a predetermined distance and/or arc/angle within the field of view of the camera means of the system.
- said predetermined distance and/or arc/angle within the field of view may be user-defined.
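The incident definition above reduces to a threshold test on distance and bearing. A sketch with illustrative user-defined defaults (the patent leaves both values to the user):

```python
def is_incident(distance_m, bearing_deg, max_distance_m=5.0, half_arc_deg=45.0):
    """An 'incident': an object of interest inside a predetermined distance
    and arc of the field of view. The 5 m / 90-degree-arc defaults are
    illustrative, user-configurable values, not figures from the patent."""
    return distance_m <= max_distance_m and abs(bearing_deg) <= half_arc_deg

print(is_incident(3.2, 10.0))   # True: close and near the centreline
print(is_incident(7.5, 10.0))   # False: outside the distance threshold
```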
- the computing means in real time, converts raw visual/video data acquired and stored on data storage means to MPEG-4 Part 14 (MP4) file format.
- said raw visual/video data is obtained in H.264 video coding format and subsequently converted to MP4 file format.
- said computing means includes image resizing and reshaping software, aligning and overlaying visual data obtained from the first, second and third camera means, thereby enabling the provision of higher frame rate video data.
- the system is configured to work in conjunction with an accelerometer provided therewith, wherein the computing means adjusts the field and depth of view according to the speed and direction of movement at which the system is travelling.
- notification and/or alert means are provided which activate to notify and/or alert a person to the detection of the object.
- said person is a driver of a vehicle with which the system is located.
- further notification and/or alert means are provided which activate to notify and/or alert the detected object of interest to the presence of a vehicle with which the system is located.
- display means are provided associated with the system, and the computing means sends to the display means a visual representation of the field of view of the first and/or second and third camera means.
- said display means provides an indication of the location and proximity of a detected object of interest.
- said first camera means may further detect and recognize speed limit signs, and said computing means processes information from said signs to communicate the same to a user.
- said information is displayed on display means associated with the system.
- Referring to FIG. 1, there is generally illustrated a schematic of an object detection system 1, which includes primarily a camera apparatus 3.
- the camera apparatus includes three cameras: a first, object camera 5 which is provided to detect the occurrence of an object within its field of view, and second and third cameras 7, 9 which are separate and distinct from one another though in communication with one another via computing means in the form of a central processing unit (CPU) 11 provided associated with the camera apparatus 3 as part of the system 1.
- the CPU 11 may be integrated into the camera apparatus 3 or be provided as a separate body connected to the apparatus 3.
- the second 7 and third 9 cameras act as left and right stereo cameras and are provided to resolve the distance and angle of the detected object relative to the camera apparatus 3.
- the three cameras 5, 7, 9 are provided within the camera apparatus 3 in a linear arrangement and in the same horizontal plane with respect to one another.
- the CPU 11 receives and processes the visual data collected by the camera 5 and goes on to determine the nature and type of the detected object, and subsequently classify it.
- the CPU 11 includes an open-source algorithm stored thereon, known as "You Only Look Once" (YOLO), which effectively conducts a whole-image analysis of the visual data to detect and resolve an object. It is then determined whether or not the detected object is an "object of interest".
- the software employed by the CPU 11 identifies what the detected object is, and a previously user-defined set of objects may be classified as "objects of interest", for example, humans, animals, or other specifically defined objects and the like. If the detected object falls within this category, it is deemed "of interest". This ensures that in instances where boxes, pallets, road cones etc. are detected, these are not flagged as objects of interest.
- this whole-image approach offers a higher level of accurate identification than object detection software packages such as those which employ a ground-up analysis. This is particularly the case for partially obscured objects, which may not be detected if they are obstructed at ground level, for example. Further, this approach also differs fundamentally from the use of Lidar or Radar approaches.
- Each of the second and third cameras 7, 9 obtain visual data and the CPU 11 extracts this data. Image distortion in the visual data from each camera is corrected by the CPU 11 software and subsequently remapped to correspond with the visual data acquired from the first camera 5. Consequently, with the second and third cameras 7, 9 acting as left and right stereo cameras, if a detected object is determined to be an object of interest, the CPU 11 takes the extracted data from the second and third cameras 7, 9 and creates a 3D disparity map to discern the relative depth and angle information of the detected object.
- the CPU 11 also includes image resizing and reshaping software, which enables visual/video data obtained from the first, second and third cameras 5, 7, 9 to be aligned and overlaid, consequently enabling the provision of higher frame rate video data.
- the CPU 11 further includes machine-learning software, ensuring the detection and classification of detected objects is continuously learned and improved on.
- the analysis software typically utilizes a deep neural network image classifier.
- the system 1 itself is typically provided for use with or incorporated into a vehicle.
- the system is utilized in industrial vehicles such as fork lift trucks (FLTs), as depicted in Figure 2, which illustrates the FLT 13, the field of view 15 of the system 1 on the FLT, and an object of interest in the form of a person 17 to be detected.
- the field of view 15 of the cameras 5, 7, 9 typically spans up to 160° in angle, and the depth or range of view may be up to 10m. In some preferred embodiments, the depth or range of view may be limited to 8m. Within the maximum possible ranges, the actual field of view in some embodiments may be a specific user-defined region.
- after detecting an object, the system 1 determines whether or not it meets the criteria to be classified as an object of interest 17, while also computing the exact distance and position of the object 17. If it is determined to be of interest, the 3D disparity map is subsequently created to pinpoint the location and track the object.
- the visual/video data obtained and the disparity maps which are subsequently created are stored on computer data storage means 19, which may be provided in various well-known forms.
- the cameras 5, 7, 9 may be located in a single camera head or apparatus 3, having a field of view spanning up to 160°.
- a second camera head or apparatus 3', including further, equivalent cameras, 5', 7', 9' may also be provided associated with the system 1, also having a field of view spanning up to 160°.
- the second camera head 3' is arranged such that it is directed in a substantially opposing direction from that of the first camera head 3 - essentially placing the two heads 3, 3' back-to-back, or in some embodiments, at a slight angle with respect to one another. This consequently may provide a detection system 1 having a combined field of view of up to 320°, and is shown in one example in Figure 3.
- the camera heads 3, 3' can be placed at an angle with respect to one another, and preferably at least 20°, such that their respective fields of view contact or overlap with one another. This can be particularly advantageous for industrial vehicles and in particular an FLT 13 wherein on most occasions, the vehicle will be moving around carrying a load which more often than not will be obstructing at least a part of the driver's view.
- the provision of a detection system 1 in the vehicle 13 having dual camera heads 3, 3' ensures as wide a coverage and detection as possible.
- an accelerometer 21 may be provided as part of the system 1 or with the FLT 13 and subsequently connected to the system 1. Linking an accelerometer with the system 1 enables the CPU 11 to take into account the speed and direction of movement of the FLT 13 as it travels, and accordingly adjust the field of view 15 of the system 1 to accommodate the movement of the FLT 13. For example, if the FLT 13 increases the speed at which it is travelling, the system 1 will automatically scan and detect objects at a greater distance in order to increase safety and ultimately be able to provide adequate notification to a driver of the FLT 13 in sufficient time. Such notifications or alerts may be provided via the provision of notification or alert means with the system 1, or fixed in the vehicle and connected to the system 1.
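One simple way to realise the speed-dependent adjustment described above is to extend the scanned distance by a fixed look-ahead time. The scaling rule and its parameters below are illustrative assumptions; the text specifies only that faster travel should extend the detection distance, up to the stated 10m maximum.

```python
def scan_range_m(speed_mps, base_range_m=4.0, seconds_ahead=2.0, max_range_m=10.0):
    """Detection range grown with the vehicle speed reported by the
    accelerometer 21, so the driver can be alerted in sufficient time.

    The rule (base range plus a fixed look-ahead time) and all default
    values are assumptions for illustration only.
    """
    return min(max_range_m, base_range_m + seconds_ahead * speed_mps)
```

At a standstill the system scans the base 4m region; at 2m/s the range doubles to 8m, and the 10m ceiling is reached at higher speeds.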
- the FLT 13 may be fitted with a number of interior speakers 23 or other audio means which, as an object of interest 17 is detected, may emit an audio alert to direct the driver's attention to the presence of the object 17.
- the speakers 23 can then be arranged to be directional, which is to say that once the system has detected and ascertained the precise location of the object 17, for example forward and right of the vehicle, the front right speaker may activate to signal to the driver that the detected object 17 is in that general direction.
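The directional alert reduces to mapping the object's bearing onto one of the cabin speakers. The four-corner layout and quadrant boundaries in this sketch are assumptions made for illustration; the text specifies only that the speaker nearest the object's general direction activates.

```python
def speaker_for_bearing(bearing_deg):
    """Map an object's bearing (degrees clockwise from straight ahead)
    to one of four assumed corner speakers in the cabin.

    Hypothetical quadrant boundaries; a real installation would use the
    actual speaker positions.
    """
    bearing_deg %= 360
    if bearing_deg < 90:
        return "front-right"
    if bearing_deg < 180:
        return "rear-right"
    if bearing_deg < 270:
        return "rear-left"
    return "front-left"
```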
- a display screen 25 may also be provided which displays a visual representation of the field of view 15 as seen by cameras 5, 7, 9 - or rather, the composite view as processed by the CPU 11.
- a graphical or plan view display of the relevant area/field of view may instead be presented.
- the detected object 17 can be clearly highlighted on the screen 25 and so the driver of the vehicle 13 will be clearly notified of their presence and location.
- the display screen 25 may also provide additional information as resolved by the CPU 11 such as the exact distance of the object 17 from the vehicle 13, and the relative direction of movement.
- an additional speaker or speakers 27 may be provided located on the exterior of the vehicle 13. This speaker may be provided to alert the detected object 17 itself, notifying them of the presence of the vehicle 13.
- the system 1 may further include communication means incorporated therewith, enabling it to communicate with a remote, third-party location.
- data which is acquired and stored on the data storage means 19 and/or real-time data obtained by the cameras 5, 7, 9 can be transferred/communicated to the remote location, for example, a central server. Consequently, various data and information may be collected by the system 1 in relation to the usage of the vehicle 13, stored and transferred to a remote location for subsequent review and analysis.
- data may include any or any combination of: logging of driver habits; logging of vehicle habits; shift monitoring; fleet management; and/or vehicle "not in motion" monitoring.
- Driver habit information may include driver reference identification, logging the telemetry data of the driver, and determining whether a driver is more susceptible to near-misses/collisions than others.
- Vehicle habit information may include various vehicle telemetry data (acceleration, deceleration, pitch, roll, yaw), determining whether a particular vehicle is more susceptible to near-misses / collisions than others, and also monitoring any "not in motion" states, i.e., when the vehicle is in a live state but not in motion (idling). Time stamps may be included to monitor driver shift patterns. All this data may be utilised to effectively manage/maintain a fleet of vehicles.
- when an object 17 is detected by the system 1 and is deemed to be "of interest" according to the predetermined user parameters, and an incident occurs, that is to say, the detected object 17 comes within a predetermined distance of the vehicle 13 and encroaches into a defined "critical alert region" 31, footage of the incident is recorded and stored on the data storage medium 19, which can be downloaded/transferred for review etc. as required.
- the recorded footage which is subsequently stored can be set to begin a predetermined time period before the actual occurrence of the incident or event. For example, the stored footage may be set to begin up to 20 seconds prior to the incident and end up to 20 seconds after the incident.
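Capturing footage from before the incident requires a continuously maintained pre-event window. The sketch below keeps only frame-count bookkeeping (real frames and encoding are elided); the 20-second figures come from the text, while the frame rate and class structure are illustrative assumptions.

```python
from collections import deque

class IncidentRecorder:
    """Keep a rolling window of recent frames so that, when an incident
    is flagged, footage from up to `pre_s` seconds before the event is
    saved together with the following `post_s` seconds.

    Illustrative sketch: the pre/post durations follow the text; the
    30 fps default and the API are assumptions.
    """

    def __init__(self, fps=30, pre_s=20, post_s=20):
        self.post_frames = post_s * fps
        self.pre_buffer = deque(maxlen=pre_s * fps)  # rolling pre-event window
        self.recording = None   # frames captured once an incident starts
        self.target = 0         # total clip length to collect
        self.clips = []         # finished pre+post clips

    def push(self, frame, incident=False):
        if incident and self.recording is None:
            # Snapshot the pre-incident window and start the post window.
            self.recording = list(self.pre_buffer)
            self.target = len(self.recording) + self.post_frames
        if self.recording is not None:
            self.recording.append(frame)
            if len(self.recording) >= self.target:
                self.clips.append(self.recording)
                self.recording = None
        self.pre_buffer.append(frame)
```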
- the original footage which is captured by the cameras will be in its raw format and is generally obtained in H.264 video coding format.
- the CPU 11 is provided with further software which enables it, in real time as the footage is being captured, to convert the raw visual/video data from H.264 video coding format to an MPEG-4 Part 14 (MP4) file format.
- This is achieved by converting each frame as it is captured by the object detection camera 5 from a standard RGB (3 channel colour) image into a YUV422 (1 channel image with colour data encoded) image.
- This YUV422 image is added into a rolling buffer of frames which contains up to 10 seconds of still image data (circa 300 frames). From the rolling buffer, each frame is converted into a compressed H.264 stream and passed through the system's "stream to video container" algorithm.
- This algorithm scans the incoming video data stream and performs live modifications of the raw stream to make it ready for encapsulating within an MP4 video container. As the raw data passes through the algorithm, pertinent information relating to the header and footer information of the output MP4 file is gathered. At the end of the rolling buffer collection, the output video file is finalised and ready for exporting within 20-50 milliseconds of the completion of the proximity event/incident. Conversion to a more widely used and accessed MP4 file format means that the download/transfer and viewing of the footage is straightforward, and the footage can be played on most types of devices.
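The first step of the pipeline above, the RGB-to-YUV conversion, can be sketched per pixel. The patent does not state which YUV matrix is used, so the standard full-range BT.601 coefficients are assumed here; the YUV422 chroma subsampling (one U/V pair shared by two pixels) is noted but not shown.

```python
def rgb_to_yuv(r, g, b):
    """Full-range BT.601 conversion of one RGB pixel to Y'UV, the first
    step of the described capture pipeline (RGB frame -> YUV422 image
    -> compressed H.264 stream -> MP4 container).

    The BT.601 matrix is an assumption; the disclosure says only that a
    3-channel RGB image becomes a 1-channel image with encoded colour.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128
    return round(y), round(u), round(v)
```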
- Figure 4 illustrates a simplified flow diagram broadly highlighting the process the system 1 moves through when scanning for and detecting objects and discerning whether or not they are "of interest”.
- An advantage of the system is that the object detection carried out by the first camera 5 is run simultaneously with the depth analysis data obtained by the second and third cameras 7, 9. If a detected object 17 is subsequently determined to be "of interest", then the 3D disparity map is created and the output sent to the display screen 25 and/or speakers 23, 27.
- the unique approach of the present invention is to analyse and identify any object of interest in the 2D plane. This involves an initial system sequence to remove any lens distortion from the images captured by the camera by remapping the location of each individual pixel in the image from its original position to a corrected flat perspective image position. This ensures: a stable 3D stereoscopic image provided by both stereo cameras suitable for creating a true disparity map spanning in excess of 160° horizontally; an object detector image without any image distortion to increase detection accuracy; and a mechanism to ensure that the spatial characteristics of both the object of interest and 3D cameras can be accurately aligned to each other.
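The per-pixel remapping step can be illustrated with a single-term radial model. This is a deliberate simplification: a real 160° wide-angle lens requires a fuller distortion model, and the coefficient would come from calibration, so both the model and the values below are assumptions.

```python
def remap_point(xd, yd, k1, cx, cy):
    """Remap one captured pixel toward its flat-perspective position
    using a one-coefficient radial model x' = x * (1 + k1 * r^2).

    Minimal sketch of the per-pixel remapping described in the text;
    the model, coefficient k1 and principal point (cx, cy) are
    illustrative assumptions, not calibration data.
    """
    x, y = xd - cx, yd - cy          # centre the coordinates
    r2 = x * x + y * y               # squared radial distance
    scale = 1.0 + k1 * r2
    return cx + x * scale, cy + y * scale
```

Applying such a remap to every pixel yields the corrected image from which both the object detector and the stereo disparity map work, keeping the two aligned as the text requires.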
- the analysis of the complete image is achieved by using a Deep Neural Network Image Classifier to inspect each image for the presence of complete and/or partial representations of the chosen object or objects of interest for detection.
- the system 1 analyses each image to gain detailed information on the object of interest 17 to determine the need for further computational measures for calculating the distance of the detected object of interest from the camera apparatus 3 or vehicle 13.
- once an object 17 has been identified as requiring 3D positional data, only then does the system create a 3D disparity depth map. As the object camera 5 and depth cameras 7, 9 are aligned to each other, the segmentation of the depth region that covers the detected object 17 is interrogated to determine the distance from the object to the camera apparatus 3.
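For an aligned stereo pair, interrogating the disparity map reduces to the standard triangulation relation Z = f·B/d. The focal length and baseline in the example below are illustrative values, not taken from the patent.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Distance of a detected object from an aligned stereo pair:
    Z = focal_length * baseline / disparity.

    Standard stereo triangulation; the parameter values used in any
    example are assumptions, as the patent gives no camera intrinsics.
    """
    if disparity_px <= 0:
        raise ValueError("zero or negative disparity: no valid match")
    return focal_px * baseline_m / disparity_px
```

For instance, with an assumed 700px focal length and 6cm baseline, a 42px disparity places the object 1m from the apparatus.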
- the system 1 subsequently provides the operator with a graphical display detailing the proximity of the object of interest in relation to the camera within up to a 160° operational field of view 15.
- the system 1 will record the event for future review. Any critical alert will trigger a recording 20 seconds prior to and after the event, giving accurate information as to how and why each event took place.
- the event recording will be by direct stream conversion of the incoming still images into a compressed video format, which is then in turn remuxed in real time into an exportable video file format.
- the system 1 is designed to "teach out" objects that are not objects of interest or that cause confusion within the detection of the object of interest, aided by the incorporation of the YOLO software. Due to the diverse and expandable nature of the deep neural network image classifier utilised in this invention, objects of interest selected for detection can vary vastly in size, orientation, colour and presentation. In the case of humans as the desired object for detection, the system 1 is able to detect independently of stance, height, build, clothing and clothing colour, even when the person is partially obscured by other objects, and in low/high contrast environments. As both the object detector and 3D analysis operate in parallel, the system 1 is able to operate at an increased frame rate compared to a series process.
- the higher frame rate enables the system 1 to generate more images per second for analysis to achieve quicker detection rates. This decreases the response time when alerting the driver of the vehicle 13 of an object of interest 17 being detected.
- This high frame rate means the system 1 can analyse for a detection every 33 milliseconds, which allows for multi-frame detection and therefore greater accuracy of detection whilst still being able to report detections in "real time”.
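Multi-frame detection at a 33ms frame interval can be sketched as a simple persistence check: an alert is raised only once a detection has held for several consecutive frames. The three-frame threshold below is an assumed debouncing policy, shown only to illustrate why a higher frame rate shortens the response time for a given level of confirmation.

```python
class MultiFrameDetector:
    """Report a detection only after it persists for `needed`
    consecutive frames, trading a small latency (needed * 33 ms at
    roughly 30 fps) for fewer spurious alerts.

    The three-frame default is an illustrative policy, not a figure
    from the disclosure.
    """

    def __init__(self, needed=3):
        self.needed = needed
        self.streak = 0  # consecutive frames with a detection

    def update(self, detected_this_frame):
        self.streak = self.streak + 1 if detected_this_frame else 0
        return self.streak >= self.needed
```

At 30 fps, three confirming frames still report within about 100 milliseconds, which remains "real time" for the driver alerts described above.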
Abstract
The present invention provides an object detection system. The system includes an apparatus having first camera means provided to detect the occurrence of an object within a field of view of the first camera means, and second and third camera means in communication with one another, provided to resolve the distance and angle of the detected object relative to the apparatus. The present invention also provides a vehicle including such an object detection system.
Description
- The invention to which this application relates is an object detection system and a method of use thereof. The present invention also relates to a vehicle comprising such an object detection apparatus.
- Pedestrian detection systems are employed in particular in road vehicles as an aid to alert the driver of a nearby pedestrian who is within a certain proximity of the vehicle and may ultimately cross the path of the vehicle. Essentially, a number of computer sensors are provided located around the vehicle to detect people within a predefined distance. The onboard computer of the vehicle can then use this data to alert the driver of any close-by objects. Such sensors may also be used to aid in the parking of the vehicle.
- Other, more advanced systems exist wherein the precise distance of a detected object may be discerned.
- FR2958774 (A1) discloses one such system, which detects objects from the ground up, identifying them as detachments from the ground plane. Using this ground-up approach can be useful in some circumstances; however, problems do exist with such systems, in particular when used by heavy goods vehicles (HGVs) or other industrial vehicles. For example, the nature of industrial vehicles is such that they may be required to carry loads of varying shapes and sizes, and in the case of fork lift trucks (FLTs), the loads being carried/held may further be done so at any given height depending on what is required at that time. Consequently, if for instance a pedestrian is situated on a raised walkway above the ground on which the vehicle is located (for example, raised by even just 0.5m), the detection methodology employed by the system and method of FR2958774 (A1) may fail to detect them.
- Another example would be instances where the pedestrian is on a raised walkway which gradually descends to become part of the ground plane. Only as the pedestrian locates on the ground plane, and is consequently identified as a detachment therefrom, would the system detect them as a relevant object of interest and subsequently notify the driver of their presence. If the FLT is travelling at speed and the driver is unsighted as a consequence of the size and shape of the load being carried, the alert/notification may arrive too late to avoid incident or accident. Other examples may include scenarios where a pedestrian is partially obscured or obstructed from camera view because they are, at the time of detection, behind an obstacle such as a stack of pallets. Utilizing the ground-up approach of FR2958774 (A1), such a pedestrian would similarly go undetected.
- It is therefore an aim of the present invention to provide an improved system for detecting an object which overcomes the aforementioned problems associated with the prior art.
- It is a further aim of the present invention to provide a vehicle fitted with an improved system for detecting an object which overcomes the aforementioned problems associated with the prior art.
- It is yet a further aim of the present invention to provide an improved method of detecting an object which overcomes the aforementioned problems associated with the prior art.
- According to a first aspect of the invention there is provided an object detection system, said system including an apparatus having:
- first camera means provided to detect the occurrence of an object within a field of view of said first camera means; and
- second and third camera means in communication with one another, provided to resolve the distance and angle of the detected object relative to the apparatus.
- Typically, said first camera means are provided to classify the type of object which has been or is detected. Preferably, said first camera means is provided to detect humans and/or objects within its field of view.
- Preferably, said system is located with or forms part of a vehicle. Typically, said vehicle is an industrial vehicle. Further typically, said vehicle is a fork lift truck (FLT).
- In one embodiment, the system includes computing means, provided in communication with said first, second and third camera means. In one embodiment, said computing means may be provided in the apparatus. Typically, said computing means are arranged to receive and process visual data obtained by the first camera means to discern the nature and type of a detected object and subsequently classify it. Typically, said computing means includes an open-source algorithm to process and classify said visual data. Preferably, said open-source algorithm is an algorithm known as "You Only Look Once" (YOLO).
- Preferably, said computing means is arranged to determine if a detected object is an object of interest. For example, an object of interest may be defined to be a human or specific object, whereas if boxes, pallets, road cones etc. are detected, these may not be deemed to be objects of interest.
- Typically, the field of view of the first, second and third camera means spans up to 160°. Typically, the depth of view or range of detection of the first, second and third camera means is up to 10m. Preferably, the depth of view or range of detection is up to 8m.
- In one embodiment, the field and depth of view may be a user-defined area or region. For example, in some embodiments, the angle of the field of view may be narrowed and the depth of view and/or detection may consequently be increased, thereby enabling detection at a greater distance, which is particularly useful when travelling at speed, but within a more narrow or concentrated field of view.
- In one embodiment, the computing means are further arranged to receive and process data obtained by said second and third camera means, and resolve and process the depth/distance and angle of a detected object.
- Typically, said first, second and third camera means are arranged linearly with respect to one another and in the same horizontal plane as one another.
- Typically, said second and third camera means are provided as separate and distinct camera means, in communication with one another via the computing means.
- In one embodiment, said first, second and third camera means are located together in a first camera head. Typically, the first camera head has a field of view spanning up to 160°.
- In one embodiment, a second camera head, comprising further, equivalent first, second and third camera means therein, is provided. Preferably, the second camera head is directed in a substantially opposing direction to that of the first camera head. Typically, the second camera head has a field of view spanning up to 160°.
- Thus, in some embodiments, said detection system comprises first and second camera heads, each comprising first, second and third camera means and directed in first and second opposing directions, having a combined field of view of up to 320°.
- In some embodiments, the first and second camera heads may be provided directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another. Thus, typically, the combined field of view may be less than 320°.
- In one embodiment, said first and second camera heads are positioned at an angle of at least 20° with respect to one another. Typically, said first and second camera heads are positioned at an angle relative to one another such that their respective fields of view contact or overlap with one another.
- Preferably, said first camera means is provided to act as an object detection camera, and said second and third camera means are provided to act as left and right stereo cameras, respectively.
- In one embodiment, said first camera means may be further arranged to detect and recognize speed limit signs. Typically, said computing means may be arranged to process information from said signs and communicate the same to a user, in use. Preferably, said information may be displayed on display means associated with the system, in use.
- Typically, the system further includes data storage means. In one embodiment, said data storage means are located with the apparatus.
- In one embodiment, communication means are provided associated with the system. Typically, said communication means enable the communication of data stored on data storage means associated with the system and/or real-time data obtained by the camera means to be transferred/communicated to a remote location.
- In one embodiment, said first camera means are further arranged to record and save visual data of the detection and occurrence of an object to said data storage means.
- In one embodiment, the system further includes an accelerometer. Typically, said accelerometer is in communication with said computing means. Further typically, said accelerometer is arranged to enable the system to adjust the field and depth of view of the first, second and third camera means according to the speed and movement of the apparatus, in use.
- In one embodiment, said computing means is arranged to analyse each frame of visual data obtained by the first camera means to detect the occurrence of an object, in use. This consequently enables a higher level of accurate identification, in particular for partially obscured objects, and is not reliant on or restricted to a "ground-up" approach to detection, instead employing a whole image analysis approach.
- Typically, said computing means includes machine-learning software, provided so as to ensure the detection and classification of detected objects is continuously learned and improved on. Typically, said software includes deep neural network image classification software.
- In one embodiment, the system further includes notification and/or alert means, arranged to provide a visual and/or audio alert and/or notification on detection of an object deemed to be an object of interest.
- In one embodiment, display means may be provided associated with the system. Typically, said display means may be arranged to provide a visual representation of the field of view of the first and/or second and third camera means. Further typically, said display means may be arranged to provide an indication of the location and proximity of a detected object.
- In one embodiment, said computing means further includes image resizing and reshaping software. Typically, such software is arranged to enable visual data obtained from the first, second and third camera means to be aligned and overlaid, consequently enabling the provision of higher frame rate video data.
- In one embodiment, said computing means further includes software arranged to, in real time, convert raw visual/video data acquired and stored on data storage means to MPEG-4 Part 14 (MP4) file format. Typically, said raw visual/video data is obtained in H.264 video coding format.
- In another aspect of the present invention, there is provided an object detection system, said system including an apparatus having:
- a first camera head directed in a first direction;
- a second camera head directed in a second, substantially opposing direction to that of the first camera head;
- wherein the first and second camera heads each comprise first camera means provided to detect the occurrence of an object within a field of view of said first camera means; and
- second and third camera means in communication with one another, provided to resolve the distance and angle of the detected object relative to the apparatus.
- Typically, the first and second camera heads each have a field of view spanning up to 160°, and a combined field of view of up to 320°.
- In some embodiments, the first and second camera heads may be provided directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another. Thus, typically, the combined field of view may be less than 320°.
- In one embodiment, said first and second camera heads are positioned at an angle of at least 20° with respect to one another. Typically, said first and second camera heads are positioned at an angle relative to one another such that their respective fields of view contact or overlap with one another.
- In another aspect of the present invention, there is provided a vehicle including an object detection system as described above provided thereon or therewith.
- Typically, said vehicle is an industrial vehicle. Preferably, said vehicle is a fork lift truck (FLT).
- In some embodiments, said vehicle includes first and second camera heads. Typically, said first and second camera heads are arranged to be directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another.
- In some embodiments, a third camera head may be included, comprising at least first camera means. Typically, said third camera head is located to provide a view from the vehicle in a direction which may otherwise be obscured when the vehicle is carrying a load. For example, where the vehicle is an FLT carrying a plurality of pallets, the view towards the front of the vehicle may be obscured or blocked to the driver. For safety, when FLTs are loaded, they should be driven in "reverse", providing the driver with a clear view of the surroundings. However, in some circumstances, forward movement may be required, generally when travelling up an incline. Typically, said third camera head is thus provided as an "impaired vision" camera head. In some embodiments, said third camera head and associated camera means are arranged to activate only when the vehicle is moving in the direction in which the driver's view is obscured or blocked.
- In other embodiments of the invention, the vehicle may be provided with a first camera head, comprising first camera means, and second and third camera means in communication with one another, and a second camera head arranged to act as an "impaired vision" camera head, comprising at least first camera means.
- In one embodiment, notification and/or alert means are provided with the vehicle, in communication with the object detection system and arranged to notify and/or alert a driver of the vehicle of a detection of an object, in use. Typically, said notification and/or alert means are provided in an interior of the vehicle and/or are arranged to be directed towards the driver, in use.
- Typically, said notification and/or alert means are arranged to activate in a directional manner to signal to the driver the approximate direction and location of the detected object. That is to say, if for example, notification means are provided in the form of audio speakers in four corners of the cabin of an FLT, and an object of interest is detected forward and right of the vehicle, the front right speaker may activate to signal to the driver that the detected object is in that general direction.
- In one embodiment, further notification and/or alert means may be provided associated with the vehicle, in communication with the object detection system and arranged to notify and/or alert the detected object of the vehicle's presence, in use. Typically, said further notification and/or alert means are provided on an exterior of the vehicle and/or are arranged to be directed outwardly of the vehicle, in use.
- In one embodiment, the system includes an accelerometer, arranged to detect the speed and direction of movement of the vehicle, in use.
- In another embodiment, the vehicle may include an accelerometer provided in communication with the object detection system.
- Typically, said accelerometer is arranged to communicate real time data of the speed and direction of movement of the vehicle to the computing means, enabling the same to assess the relative position, location and direction of movement of the detected object, in use.
- In one embodiment, said vehicle includes display means provided therein, in communication with or forming part of the object detection system. Typically, said display means are arranged to provide a visual representation of the field of view of the first and/or second and third camera means, and provide an indication of the location and proximity of a detected object.
- Typically, the system further includes data storage means. In one embodiment, said data storage means are located with the apparatus.
- In one embodiment, communication means are provided associated with the system. Typically, said communication means enable the communication of data stored on data storage means and/or real-time data obtained by the camera means to be transferred/communicated to a remote location.
- Thus, in some embodiments, various data and information may be collected by the system in relation to the usage of the vehicle, stored and transferred to a remote location, for example a centrally located server, for subsequent review and analysis. Such data may include any or any combination of: logging of driver habits; logging of vehicle habits; shift monitoring; fleet management; and/or vehicle "not in motion" monitoring.
- In another aspect of the present invention, there is provided a method of detecting and ascertaining the distance and position of an object using an object detection system, said method including the steps of:
- detecting the occurrence of an object with first camera means within a field of view thereof;
- utilizing second and third camera means, in communication with one another to resolve the distance and angle of the detected object, relative to the system.
- In one embodiment, computing means are provided with the system. Typically, said computing means receives and processes visual data obtained from the first camera means, correcting any image distortion therein, discerning the nature and type of a detected object and subsequently classifying it. Typically, the processing of the visual data is achieved by use of an open-source algorithm known as "You Only Look Once" (YOLO).
- Typically, after processing the visual data and classifying any object or objects detected, the computing means determines whether the detected object or objects is/are objects of interest. Preferably, said object of interest may be predefined with the computing means as humans or objects.
- In one embodiment, said computing means is used to extract visual data from the second and third camera means. Typically, image distortion in visual data from the second camera means and image distortion in visual data from the third camera means are corrected by the computing means. Further typically, visual data from the second camera means and visual data from the third camera means are subsequently remapped to correspond with the visual data obtained by the first camera means.
- In one embodiment, said first, second and third camera means are located together in a first camera head. Typically, a second camera head, comprising further, equivalent first, second and third camera means therein, is provided similarly to detect the occurrence of an object and resolve the relative distance and angle of said object.
- Preferably, the second camera head is directed in an opposing direction to that of the first camera head, and each of said first and second camera heads have a field of view spanning up to 160°, and a combined field of view of up to 320°.
- In one embodiment, if an object detected is determined to be an object of interest, the computing means subsequently creates a 3D disparity map to discern relative depth and angle information of the detected object. The present invention therefore in effect uses computer stereo vision, wherein the second camera means acts as a left stereo camera and the third camera means acts as a right stereo camera, resolving the distance and relative location of an object detected by the first camera means and deemed of interest. The first camera means utilizes whole image analysis to detect and identify the nature and type of object, which has significant advantages over similar systems in the prior art as the entire image is analysed, ensuring that even partially obscured or raised objects of interest are detected.
- Typically, said computing means employs machine-learning software, ensuring the detection and classification of detected objects is continuously learned and improved on. Typically, said software includes deep neural network image classification software.
- In one embodiment, if an object of interest is detected and a predetermined incident occurs in relation to said object, footage of said incident may be recorded and stored on data storage means provided or associated with the system. Typically, the recorded footage is set to begin a predetermined time period before the incident and cease a predetermined time period after the incident. Typically, said time period may be up to 20 seconds prior and after the incident. Preferably, said time period may be 10 seconds prior and after the incident.
- In one embodiment, an incident may be defined as an object of interest being detected within a predetermined distance and/or arc/angle within the field of view of the camera means of the system. Typically, said predetermined distance and/or arc/angle within the field of view may be user-defined.
- In one embodiment, the computing means in real time, converts raw visual/video data acquired and stored on data storage means to MPEG-4 Part 14 (MP4) file format. Typically, said raw visual/video data is obtained in H.264 video coding format and subsequently converted to MP4 file format.
- Typically, said computing means includes image resizing and reshaping software, aligning and overlaying visual data obtained from the first, second and third camera means, thereby enabling the provision of higher frame rate video data.
- In one embodiment, the system is configured to work in conjunction with an accelerometer provided therewith, wherein the computing means adjusts the field and depth of view according to the speed and direction of movement at which the system is travelling.
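One way such speed-dependent scaling might work is to grow the scan/alert distance with the vehicle's reaction and braking distances; a hedged sketch, where the base range, reaction time and deceleration are assumed values, not figures from the specification:

```python
def scan_distance_m(speed_mps, base_m=3.0, reaction_s=1.5, brake_decel_mps2=2.0):
    """Alert range = standing base range + distance covered during driver
    reaction + braking distance (v^2 / 2a)."""
    return base_m + speed_mps * reaction_s + speed_mps ** 2 / (2 * brake_decel_mps2)
```

At standstill the range stays at the base value; at 2 m/s it roughly doubles under these assumptions.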
- In one embodiment, upon detection of an object of interest, notification and/or alert means are provided which activate to notify and/or alert a person to the detection of the object. Typically, said person is a driver of a vehicle with which the system is located.
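Where several cabin speakers are fitted (the embodiments below describe one in each corner of the driver cabin), the alert can be made directional by mapping the resolved bearing of the object to the nearest corner; a minimal sketch, with the bearing convention (0° straight ahead, positive clockwise) assumed:

```python
def select_speaker(bearing_deg):
    """Pick the cabin-corner speaker closest to the object's direction.
    0 deg = straight ahead, positive = to the right of the vehicle."""
    b = (bearing_deg + 180) % 360 - 180   # normalise to (-180, 180]
    fore = "front" if -90 <= b <= 90 else "rear"
    side = "right" if b >= 0 else "left"
    return f"{fore}-{side}"
```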
- In one embodiment, upon detection of an object of interest, further notification and/or alert means are provided which activate to notify and/or alert the detected object of interest to the presence of a vehicle with which the system is located.
- In one embodiment, display means are provided associated with the system, and the computing means sends to the display means a visual representation of the field of view of the first and/or second and third camera means. Typically, said display means provides an indication of the location and proximity of a detected object of interest.
- In one embodiment, said first camera means may further detect and recognize speed limit signs, and said computing means processes information from said signs to communicate the same to a user. Preferably, said information is displayed on display means associated with the system.
- Embodiments of the present invention will now be described with reference to the accompanying figures, wherein:
-
Figure 1 illustrates a schematic of the component parts of an object detection system, in accordance with an embodiment of the present invention; -
Figure 2 illustrates a plan view schematic of a vehicle having an object detection system located therewith and its field of view, in accordance with an embodiment of the present invention; -
Figure 3 illustrates a plan view schematic of a vehicle having an object detection system located therewith and its field of view, in accordance with another embodiment of the present invention; and -
Figure 4 illustrates a simplified flow diagram of a method of detecting an object using an object detection system, in accordance with an embodiment of the present invention. - Referring now to the figures, in
Figure 1 there is generally illustrated a schematic of an object detection system 1, which primarily includes a camera apparatus 3. The camera apparatus includes three cameras: a first, object camera 5, which is provided to detect the occurrence of an object within its field of view, and second and third cameras 7, 9. A central processing unit (CPU) 11 is provided in communication with the camera apparatus 3 as part of the system 1. The CPU 11 may be integrated into the camera apparatus 3 or be provided as a separate body connected to the apparatus 3. The second 7 and third 9 cameras act as left and right stereo cameras and are provided to resolve the distance and angle of the detected object relative to the camera apparatus 3. The three cameras 5, 7, 9 are provided on the camera apparatus 3 in a linear arrangement and in the same horizontal plane with respect to one another. After the first camera 5 has detected an object, the CPU 11 receives and processes the visual data collected by the camera 5 and goes on to determine the nature and type of the detected object, and subsequently classify it. The CPU 11 includes an open-source algorithm stored thereon, known as "You Only Look Once" (YOLO), which effectively conducts a whole-image analysis of the visual data to detect and resolve an object. Subsequently, it is then determined whether or not the detected object is an "object of interest". That is to say, the software employed by the CPU 11 identifies what the detected object is, and a previously user-defined set of objects may be classified as "objects of interest", for example, humans, animals, or other specifically defined objects and the like. If the detected object falls within this category, it is deemed "of interest". This ensures that in instances where boxes, pallets, road cones etc. are detected, these are not flagged as objects of interest. The whole-image analysis performed by the CPU 11 on the visual/video data collected by the first camera 5 has distinct advantages over other object detection software packages, such as those which employ a ground-up analysis.
This is particularly the case for partially obscured objects, which may not be detected if they are obstructed at ground level, for example. Further, this approach also differs fundamentally from the use of Lidar or Radar approaches. - Each of the second and third cameras 7, 9 also collects visual data, and the CPU 11 extracts this data. Image distortion in the visual data from each camera is corrected by the CPU 11 software and subsequently remapped to correspond with the visual data acquired from the first camera 5. Consequently, with the second and third cameras 7, 9 acting as left and right stereo cameras, the CPU 11 takes the extracted data from the second and third cameras 7, 9 and creates a 3D disparity map to resolve the relative depth and angle of the detected object. The CPU 11 also includes image resizing and reshaping software, which enables visual/video data obtained from the first, second and third cameras 5, 7, 9 to be aligned and overlaid, thereby enabling the provision of higher frame rate video data. The CPU 11 further includes machine-learning software, ensuring the detection and classification of detected objects is continuously learned and improved on. The analysis software typically utilizes a deep neural network image classifier. - The system 1 itself is typically provided for use with or incorporated into a vehicle. Generally, the system is utilized in industrial vehicles such as fork lift trucks (FLTs), as depicted in
Figure 2, which illustrates the FLT 13, the field of view 15 of the system 1 on the FLT, and an object of interest in the form of a person 17 to be detected. When an object enters the field of view 15 of the cameras 5, 7, 9, the system 1, after detecting it, determines whether or not it meets the criteria to be classified as an object of interest 17, while also computing the exact distance and position of the object 17. If it is determined to be of interest, then the 3D disparity map is subsequently created to pinpoint the location and track the object. The visual/video data obtained and the disparity maps which are subsequently created are stored on computer data storage means 19, which may be provided in various well-known forms. - In a preferred embodiment of the present invention, the
cameras 5, 7, 9 are provided together in a first camera head or apparatus 3, having a field of view spanning up to 160°. A second camera head or apparatus 3', including further, equivalent cameras 5', 7', 9', may also be provided associated with the system 1, also having a field of view spanning up to 160°. The second camera head 3' is arranged such that it is directed in a substantially opposing direction from that of the first camera head 3, essentially placing the two heads 3, 3' back-to-back, or in some embodiments, at a slight angle with respect to one another. This consequently may provide a detection system 1 having a combined field of view of up to 320°, as shown in one example in Figure 3. The camera heads 3, 3' can be placed at an angle with respect to one another, preferably of at least 20°, such that their respective fields of view contact or overlap with one another. This can be particularly advantageous for industrial vehicles, and in particular an FLT 13, wherein on most occasions the vehicle will be moving around carrying a load which, more often than not, will be obstructing at least a part of the driver's view. The provision of a detection system 1 in the vehicle 13 having dual camera heads 3, 3' ensures as wide a coverage and detection as possible. - In some preferred embodiments of the invention, an
accelerometer 21 may be provided as part of the system 1 or with the FLT 13 and subsequently connected to the system 1. Linking an accelerometer with the system 1 enables the CPU 11 to take into account the speed and direction of movement of the FLT 13 as it travels, and accordingly adjust the field of view 15 of the system 1 to accommodate the movement of the FLT 13. For example, if the FLT 13 increases the speed at which it is travelling, the system 1 will automatically scan and detect objects at a greater distance in order to increase safety and ultimately be able to provide adequate notification to a driver of the FLT 13 in sufficient time. Such notifications or alerts may be provided via the provision of notification or alert means with the system 1, or fixed in the vehicle and connected to the system 1. For example, the FLT 13 may be fitted with a number of interior speakers 23 or other audio means which, as an object of interest 17 is detected, may emit an audio alert to direct the driver's attention to the presence of the object 17. There may be provided a single interior speaker 23 or, in some embodiments, multiple speakers 23 may be provided in the vehicle, located in, for example, each corner of the driver cabin. The speakers 23 can then be arranged to be directional, which is to say that once the system has detected and ascertained the precise location of the object 17, for example forward and right of the vehicle, the front right speaker may activate to signal to the driver that the detected object 17 is in that general direction. - As a further aid for the driver of the vehicle, a
display screen 25 may also be provided, which displays a visual representation of the field of view 15 as seen by the cameras 5, 7, 9 and processed and sent by the CPU 11. Alternatively, a graphical or plan view display of the relevant area/field of view may instead be presented. The detected object 17 can be clearly highlighted on the screen 25, and so the driver of the vehicle 13 will be clearly notified of its presence and location. The display screen 25 may also provide additional information as resolved by the CPU 11, such as the exact distance of the object 17 from the vehicle 13, and its relative direction of movement. In some embodiments, an additional speaker or speakers 27 may be provided located on the exterior of the vehicle 13. This speaker may be provided to act as an alert for the detected object 17 itself, to notify them as to the presence of the vehicle 13. - The system 1 may further include communication means incorporated therewith, enabling it to communicate with a remote, third-party location. In particular, data which is acquired and stored on the data storage means 19 and/or real-time data obtained by the
cameras 5, 7, 9 may be collected from the vehicle 13, stored and transferred to a remote location for subsequent review and analysis. Such data may include any or any combination of: logging of driver habits; logging of vehicle habits; shift monitoring; fleet management; and/or vehicle "not in motion" monitoring. Driver habit information may include driver reference identification and logging of the telemetry data of the driver, determining whether a driver is more susceptible to near-misses/collisions than others. Vehicle habit information may include various vehicle telemetry data (acceleration, deceleration, pitch, roll, yaw), determining whether a particular vehicle is more susceptible to near-misses/collisions than others, and also monitoring any "not in motion" states, i.e., when the vehicle is in a live state but not in motion (idling). Time stamps may be included to monitor driver shift patterns. All this data may be utilised to effectively manage/maintain a fleet of vehicles. - Thus, in use, if an
object 17 is detected by the system 1 and is deemed to be "of interest" according to the predetermined user parameters, and an incident occurs, that is to say, the object 17 is detected and comes within a predetermined distance of the vehicle 13 and encroaches into a defined "critical alert region" 31, footage of the incident is recorded and stored on the data storage medium 19, which can be downloaded/transferred for review etc. as required. In order to ensure the whole incident is captured, the recorded footage which is subsequently stored can be set to begin a predetermined time period before the actual occurrence of the incident or event. For example, the stored footage may be set to begin up to 20 seconds prior to the incident and end up to 20 seconds after the incident. The original footage which is captured by the cameras will be in its raw format and is generally obtained in H.264 video coding format. The CPU 11 is provided with further software which enables it, in real time as the footage is being captured, to convert the raw visual/video data from H.264 video coding format to an MPEG-4 Part 14 (MP4) file format. This is achieved by converting each frame as it is captured by the object detection camera 5 from a standard RGB (3-channel colour) image into a YUV422 (1-channel image with colour data encoded) image. This YUV422 image is added into a rolling buffer of frames which contains up to 10 seconds of still image data (circa 300 frames). From the rolling buffer, each frame is converted into a compressed H.264 stream and passed through the system's "stream to video container" algorithm. This algorithm scans the incoming video data stream and performs live modifications of the raw stream to make it ready for encapsulating within an MP4 video container. As the raw data passes through the algorithm, pertinent information relating to the header and footer information of the output MP4 file is gathered.
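The pre/post capture described here is, in effect, a rolling (ring) buffer with a post-trigger countdown. A simplified Python sketch of that control flow, with frame conversion and H.264/MP4 muxing omitted; the frame rate and window lengths follow the figures in the text (circa 300 frames for 10 seconds):

```python
from collections import deque

FPS = 30            # circa 300 frames per 10 s window
PRE_S, POST_S = 10, 10

class IncidentRecorder:
    """Keep a rolling pre-roll buffer; on an incident trigger, capture
    frames until the post-roll window has elapsed, then emit the clip."""
    def __init__(self):
        self.pre = deque(maxlen=PRE_S * FPS)  # oldest frames fall off
        self.clip = None
        self.post_left = 0

    def trigger(self):
        """An incident occurred: freeze the pre-roll, start the post-roll."""
        self.clip = list(self.pre)
        self.post_left = POST_S * FPS

    def push(self, frame):
        """Feed one captured frame; returns the finished clip when the
        post-roll completes, otherwise None."""
        if self.post_left > 0:
            self.clip.append(frame)
            self.post_left -= 1
            if self.post_left == 0:
                return self.clip          # ready for muxing into MP4
        else:
            self.pre.append(frame)
        return None
```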
At the end of the rolling buffer collection, the output video file is finalised and ready for exporting within 20-50 milliseconds of the completion of the proximity event/incident. Conversion to a more widely used and accessed MP4 file format means that the download/transfer and viewing of the footage is straightforward, and it can be played on most types of devices. Figure 4 illustrates a simplified flow diagram broadly highlighting the process the system 1 moves through when scanning for and detecting objects and discerning whether or not they are "of interest". An advantage of the system is that the object detection carried out by the first camera 5 is run simultaneously with the depth analysis carried out by the second and third cameras 7, 9. If an object 17 is subsequently determined to be "of interest", then the 3D disparity map is created and the output sent to the display screen 25 and/or speakers 23, 27. - The unique approach of the present invention is to analyse and identify any object of interest in the 2D plane. This involves an initial system sequence to remove any lens distortion from the images captured by the camera by remapping the location of each individual pixel in the image from its original position to a corrected flat-perspective image position. This ensures: a stable 3D stereoscopic image provided by both stereo cameras suitable for creating a true disparity map spanning in excess of 160° horizontally; an object detector image without any image distortion, to increase detection accuracy; and a mechanism to ensure that the spatial characteristics of both the object of interest and 3D cameras can be accurately aligned to each other.
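The per-pixel remapping that removes lens distortion can be illustrated with a one-parameter radial model, in which each pixel is scaled away from the principal point by 1 + k1·r². This is a hypothetical sketch; real systems calibrate several distortion coefficients per camera:

```python
def undistort_point(x, y, cx, cy, f, k1):
    """Remap a pixel from its distorted position toward the ideal
    flat-perspective position using a single radial coefficient k1.
    (cx, cy) is the principal point, f the focal length in pixels."""
    xn, yn = (x - cx) / f, (y - cy) / f        # normalised image coordinates
    scale = 1.0 + k1 * (xn * xn + yn * yn)     # radial correction factor
    return cx + (x - cx) * scale, cy + (y - cy) * scale
```

Applying such a map to every pixel of the left and right images (and the object-camera image) gives the stable, aligned views needed for a true disparity map.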
- Secondly, the analysis of the complete image is achieved by using a Deep Neural Network Image Classifier to inspect each image for the presence of complete and/or partial representations of the chosen object or objects of interest for detection. The system 1 analyses each image to gain detailed information on the object of
interest 17 to determine the need for further computational measures for calculating the distance of the detected object of interest from the camera apparatus 3 or vehicle 13. When an object 17 has been identified as requiring 3D positional data, only then does the system create a 3D disparity depth map. As the object camera 5 and depth cameras 7, 9 are spatially aligned, the object 17 is interrogated to determine the distance from the object to the camera apparatus 3. With the full positional data available for all objects of interest detected within the image, the system 1 subsequently provides the operator with a graphical display detailing the proximity of the object of interest in relation to the camera, within up to a 160° operational field of view 15. In the event of an object of interest being detected within the user-definable "critical alert region" 31, the system 1 will record the event for future review. Any critical alert will trigger a recording 20 seconds prior to and post the event, giving accurate information as to how and why each event took place. The event recording will be by direct stream conversion of the incoming still images into a compressed video format, which is then in turn remuxed in real time into an exportable video file format. - The system 1 is designed to "teach out" objects that are not objects of interest or that cause confusion within the detection of the object of interest, aided by the incorporation of the YOLO software. Due to the diverse and expandable nature of the deep neural network image classifier utilised in this invention, objects of interest selected for detection can vary vastly in size, orientation, colour and presentation. In the case of humans as the desired object for detection, the system 1 is able to detect independently of stance, height, build, clothing and clothing colour, even when partially obscured by other objects, and in low/high contrast environments.
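Interrogating a detected object for distance can be done by sampling the disparity map inside its detection bounding box and taking a robust statistic. A minimal sketch, assuming the disparity map is a row-major list of lists and using the standard Z = f·B/d relation; the calibration values in the usage below are illustrative:

```python
from statistics import median

def object_distance_m(disparity_map, box, focal_px, baseline_m):
    """Distance to a detected object: take the median disparity inside its
    bounding box (the median rejects background pixels leaking into the
    box), then convert via Z = f * B / d."""
    x0, y0, x1, y1 = box
    vals = [disparity_map[r][c]
            for r in range(y0, y1) for c in range(x0, x1)
            if disparity_map[r][c] > 0]        # ignore unmatched pixels
    return focal_px * baseline_m / median(vals)
```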
As both the object detector and 3D analysis operate in parallel, the system 1 is able to operate at an increased frame rate compared to a series process. The higher frame rate enables the system 1 to generate more images per second for analysis, achieving quicker detection rates. This decreases the response time when alerting the driver of the vehicle 13 of an object of interest 17 being detected. This high frame rate means the system 1 can analyse for a detection every 33 milliseconds, which allows for multi-frame detection and therefore greater accuracy of detection whilst still being able to report detections in "real time".
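The 33-millisecond figure is simply the frame interval at 30 frames per second. A small sketch of the real-time budget check and the latency cost of multi-frame confirmation; the confirmation count is an assumption, not a value from the specification:

```python
FPS = 30
FRAME_INTERVAL_MS = 1000.0 / FPS   # ~33.3 ms per detection cycle at 30 fps

def keeps_realtime(detect_ms):
    """The pipeline reports in real time when one detection cycle
    fits within a single frame interval."""
    return detect_ms <= FRAME_INTERVAL_MS

def confirmation_latency_ms(n_frames):
    """Requiring a detection on n consecutive frames adds n frame
    intervals of latency before an alert is raised."""
    return n_frames * FRAME_INTERVAL_MS
```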
Claims (15)
- An object detection system, said system including an apparatus having:first camera means provided to detect the occurrence of an object within a field of view of said first camera means; andsecond and third camera means in communication with one another, provided to resolve the distance and angle of the detected object relative to the apparatus.
- A system according to claim 1, wherein said first camera means is provided to act as an object detection camera, and said second and third camera means are provided to act as left and right stereo cameras, respectively.
- A system according to claim 1, wherein the system includes computing means, provided in communication with said first, second and third camera means, said computing means arranged to receive and process visual data obtained by the first camera means to discern the nature and type of a detected object and subsequently classify it.
- A system according to claim 3, wherein the computing means are further arranged to receive and process data obtained by said second and third camera means, and resolve and process the depth/distance and angle of a detected object.
- A system according to claim 1, wherein said first, second and third camera means are located together in a first camera head, and a second camera head, comprising further, equivalent first, second and third camera means therein, is provided, the second camera head being directed in a substantially opposing direction to that of the first camera head.
- A system according to claim 5, wherein the first and second camera heads are provided directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads contact or overlap with one another.
- A system according to claim 1, wherein communication means are provided associated with the system, said communication means arranged to enable the communication of data stored on data storage means associated with the system and/or real-time data obtained by the camera means to be transferred/communicated to a remote location.
- A system according to claim 1, wherein the system further includes an accelerometer, said accelerometer in communication with said computing means and arranged to enable the system to adjust the field and depth of view of the first, second and third camera means according to the speed and movement of the apparatus, in use.
- A system according to claim 1, wherein the system further includes notification and/or alert means, arranged to provide a visual and/or audio alert and/or notification on detection of an object deemed to be an object of interest.
- A system according to claim 1, wherein display means are provided associated with the system, arranged to provide a visual representation of the field of view of the first and/or second and third camera means, and provide an indication of the location and proximity of a detected object.
- A vehicle including an object detection system according to any of claims 1-10 provided thereon or therewith.
- A vehicle according to claim 11, wherein said vehicle includes first and second camera heads, said first and second camera heads arranged to be directed in substantially opposing directions, at an angle relative to one another, such that the respective fields of view of the first and second camera heads may contact or overlap with one another.
- A vehicle according to claim 12, wherein a third camera head is included, comprising at least first camera means, said third camera head being located to provide a view from the vehicle in a direction which may otherwise be obscured when the vehicle is carrying a load.
- A vehicle according to claim 11, wherein the vehicle is provided with a first camera head, comprising first camera means, and second and third camera means in communication with one another, and a second camera head arranged to act as an impaired vision "camera head", comprising at least first camera means.
- A vehicle according to claim 11, wherein notification and/or alert means are provided with the vehicle, in communication with the object detection system and arranged to notify and/or alert a driver of the vehicle of a detection of an object, in use, said notification and/or alert means provided in an interior of the vehicle and/or are arranged to be directed towards the driver, in use.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB2207744.0A GB202207744D0 (en) | 2022-05-26 | 2022-05-26 | Object detection system and a method of use thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4283576A1 true EP4283576A1 (en) | 2023-11-29 |
Family
ID=82324015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP23275082.8A Pending EP4283576A1 (en) | 2022-05-26 | 2023-05-25 | Object detection system and a method of use thereof |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4283576A1 (en) |
GB (2) | GB202207744D0 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1835439A1 (en) * | 2006-03-14 | 2007-09-19 | MobilEye Technologies, Ltd. | Systems and methods for detecting pedestrians in the vicinity of a powered industrial vehicle |
FR2958774A1 (en) | 2010-04-08 | 2011-10-14 | Arcure Sa | Method for detecting object e.g. obstacle, around lorry from stereoscopic camera, involves classifying object from one image, and positioning object in space around vehicle by projection of object in focal plane of stereoscopic camera |
US11155209B2 (en) * | 2019-08-22 | 2021-10-26 | Micron Technology, Inc. | Virtual mirror with automatic zoom based on vehicle sensors |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11392133B2 (en) * | 2017-06-06 | 2022-07-19 | Plusai, Inc. | Method and system for object centric stereo in autonomous driving vehicles |
DE112020000821T5 (en) * | 2019-02-14 | 2021-11-04 | Jonathan Abramson | VEHICLE NAVIGATION SYSTEMS AND PROCEDURES |
US20200410705A1 (en) * | 2019-04-17 | 2020-12-31 | XRSpace CO., LTD. | System and method for processing image related to depth |
-
2022
- 2022-05-26 GB GBGB2207744.0A patent/GB202207744D0/en not_active Ceased
-
2023
- 2023-05-25 GB GB2307846.2A patent/GB2621012A/en active Pending
- 2023-05-25 EP EP23275082.8A patent/EP4283576A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
GB202307846D0 (en) | 2023-07-12 |
GB2621012A (en) | 2024-01-31 |
GB202207744D0 (en) | 2022-07-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110163904B (en) | Object labeling method, movement control method, device, equipment and storage medium | |
US10878288B2 (en) | Database construction system for machine-learning | |
US9235990B2 (en) | Vehicle periphery monitoring device | |
US11669972B2 (en) | Geometry-aware instance segmentation in stereo image capture processes | |
CN112800860B (en) | High-speed object scattering detection method and system with coordination of event camera and visual camera | |
JP5922257B2 (en) | Vehicle periphery monitoring device | |
CN113947946B (en) | Port area traffic safety monitoring method based on Internet of vehicles V2X and video fusion | |
CN102792314A (en) | Cross traffic collision alert system | |
CN113240939B (en) | Vehicle early warning method, device, equipment and storage medium | |
JP4967758B2 (en) | Object movement detection method and detection apparatus | |
CN114119955A (en) | Method and device for detecting potential dangerous target | |
CN115083088A (en) | Railway perimeter intrusion early warning method | |
EP3380368B1 (en) | Object detection system and method thereof | |
EP4283576A1 (en) | Object detection system and a method of use thereof | |
KR101793156B1 (en) | System and method for preventing a vehicle accitdent using traffic lights | |
CN113435224A (en) | Method and device for acquiring 3D information of vehicle | |
KR102407202B1 (en) | Apparatus and method for intelligently analyzing video | |
JP4176558B2 (en) | Vehicle periphery display device | |
EP4177694A1 (en) | Obstacle detection device and obstacle detection method | |
US20210056356A1 (en) | Automated system for determining performance of vehicular vision systems | |
US20110080495A1 (en) | Method and camera system for the generation of images for the transmission to an external control unit | |
JPH0979821A (en) | Obstacle recognizing device | |
US20240137473A1 (en) | System and method to efficiently perform data analytics on vehicle sensor data | |
US20240236278A9 (en) | System and method to efficiently perform data analytics on vehicle sensor data | |
CN113496514B (en) | Data processing method, monitoring system, electronic equipment and display equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20240528 |