CN111566660A - System and method for detecting number of persons in vehicle - Google Patents

System and method for detecting number of persons in vehicle

Info

Publication number
CN111566660A
CN111566660A
Authority
CN
China
Prior art keywords
vehicle
human
detecting
people
human subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201880081102.7A
Other languages
Chinese (zh)
Inventor
沈海峰
赵元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Publication of CN111566660A

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 40/00 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W 40/08 - Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60K - ARRANGEMENT OR MOUNTING OF PROPULSION UNITS OR OF TRANSMISSIONS IN VEHICLES; ARRANGEMENT OR MOUNTING OF PLURAL DIVERSE PRIME-MOVERS IN VEHICLES; AUXILIARY DRIVES FOR VEHICLES; INSTRUMENTATION OR DASHBOARDS FOR VEHICLES; ARRANGEMENTS IN CONNECTION WITH COOLING, AIR INTAKE, GAS EXHAUST OR FUEL SUPPLY OF PROPULSION UNITS IN VEHICLES
    • B60K 28/00 - Safety devices for propulsion-unit control, specially adapted for, or arranged in, vehicles, e.g. preventing fuel supply or ignition in the event of potentially dangerous conditions
    • B60K 28/08 - Safety devices for propulsion-unit control, specially adapted for, or arranged in, vehicles, e.g. preventing fuel supply or ignition in the event of potentially dangerous conditions responsive to conditions relating to the cargo, e.g. overload
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60Q - ARRANGEMENT OF SIGNALLING OR LIGHTING DEVICES, THE MOUNTING OR SUPPORTING THEREOF OR CIRCUITS THEREFOR, FOR VEHICLES IN GENERAL
    • B60Q 9/00 - Arrangement or adaptation of signal devices not provided for in one of main groups B60Q 1/00 - B60Q 7/00, e.g. haptic signalling
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/254 - Fusion techniques of classification results, e.g. of results related to same input data
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/50 - Context or environment of the image
    • G06V 20/59 - Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
    • G06V 20/593 - Recognising seat occupancy
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition
    • G06V 30/24 - Character recognition characterised by the processing or recognition method
    • G06V 30/248 - Character recognition characterised by the processing or recognition method involving plural approaches, e.g. verification by template match; Resolving confusion among similar patterns, e.g. "O" versus "Q"
    • G06V 30/2504 - Coarse or fine approaches, e.g. resolution of ambiguities or multiscale approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2420/00 - Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W 2420/40 - Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W 2420/403 - Image sensing, e.g. optical camera
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 2540/00 - Input parameters relating to occupants
    • B60W 2540/049 - Number of occupants
    • G - PHYSICS
    • G07 - CHECKING-DEVICES
    • G07B - TICKET-ISSUING APPARATUS; FARE-REGISTERING APPARATUS; FRANKING APPARATUS
    • G07B 15/00 - Arrangements or apparatus for collecting fares, tolls or entrance fees at one or more control points
    • G07B 15/06 - Arrangements for road pricing or congestion charging of vehicles or vehicle users, e.g. automatic toll systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Transportation (AREA)
  • Mathematical Physics (AREA)
  • Automation & Control Theory (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

A system for determining the number of passengers in a vehicle (100). The system includes at least one camera (110) configured to capture at least one image in the vehicle. The system also includes a controller (120) in communication with the at least one camera (110). The controller (120) is configured to detect at least two human objects (410) from the image, detect one or more vehicle occupants in each human object (410), and determine a number of people based on the detected vehicle occupants.

Description

System and method for detecting number of persons in vehicle
Technical Field
The present application relates to a system and method for detecting the number of people in a vehicle, and more particularly, to a system and method for automatically detecting the number of people in a vehicle based on images acquired within the vehicle.
Background
Online ride-hailing platforms (e.g., DiDi™) provide ride-sharing services to passengers by dispatching transportation service vehicles (e.g., taxis, private cars, etc.). Certain situations may result in high demand for ride-sharing services, such as peak hours, severe weather conditions, or the period before/after a large social gathering, making it difficult to find a transportation service vehicle. To reduce the waiting time for a ride, drivers and passengers sharing a ride may be tempted to overload the service vehicle.
However, overloading a vehicle raises safety issues and can lead to accidents. For example, when a vehicle carries more passengers than it is designed for, the extra weight may affect vehicle steering. In addition, passengers in excess of the vehicle's capacity do not have their own seats or seat belts and may suffer more serious injuries in an accident. Therefore, it is very important to detect an overloaded vehicle and stop it before any accident occurs.
Existing vehicle overload detection methods rely on checkpoints where vehicles are manually screened and the people inside are counted. However, manual screening can inspect only the vehicles that pass a checkpoint and cannot effectively cover all vehicles on the road. In addition, the cost of setting up checkpoints and hiring inspectors is inevitably high. The screening may also slow down traffic and cause congestion.
Embodiments of the present application address the above-mentioned problems by automatically detecting the number of people in a vehicle using images captured by at least one camera in the vehicle.
Disclosure of Invention
Embodiments of the present application provide a system for determining the number of passengers in a vehicle. The system includes at least one camera configured to capture at least one image in the vehicle. The system also includes a controller in communication with the at least one camera. The controller is configured to detect at least two human objects from the image, detect one or more vehicle occupants in each human object, and determine the number of people based on the detected vehicle occupants.
Embodiments of the present application also provide a method for determining a number of passengers in a vehicle. The method includes capturing at least one image in the vehicle by at least one camera. The method also includes detecting, by the processor, at least two human objects from the image, and detecting, by the processor, one or more vehicle occupants in each human object. The method also includes determining, by the processor, a number of people based on the detected vehicle occupants.
Embodiments of the present application also provide a non-transitory computer-readable medium storing a set of instructions. When executed by at least one processor of an electronic device, the set of instructions causes the electronic device to perform a method for determining the number of passengers in a vehicle. The method includes capturing at least one image in the vehicle. The method also includes detecting at least two human objects from the image and detecting one or more vehicle occupants in each human object. The method also includes determining the number of people based on the detected vehicle occupants.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
Fig. 1 is a schematic view of an exemplary vehicle interior equipped with a people number detection system according to an embodiment of the present application.
FIG. 2 is a block diagram of an exemplary controller shown in accordance with an embodiment of the present application.
FIG. 3 is a data flow diagram of an exemplary processor in the controller shown in FIG. 2, shown in accordance with an embodiment of the present application.
FIG. 4 is a data flow diagram of the exemplary coarse people number estimation unit of FIG. 3, shown according to an embodiment of the application.
FIG. 5 is a data flow diagram of the exemplary fine people number estimation unit of FIG. 3, shown according to an embodiment of the application.
FIG. 6 is a flow chart of an exemplary method for determining the number of people in a vehicle, shown according to an embodiment of the application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
Fig. 1 is a schematic diagram of an exemplary vehicle 100 equipped with a people number detection system according to an embodiment of the present application. Consistent with some embodiments, the vehicle 100 may be configured to be operated by a driver occupying the vehicle, remotely controlled, and/or operated autonomously. Contemplated vehicles 100 may be electric vehicles, fuel cell vehicles, hybrid vehicles, or conventional internal combustion engine vehicles. The vehicle 100 may have a body of any body type, such as a sports car, a coupe, a sedan, a pick-up truck, a station wagon, a Sport Utility Vehicle (SUV), a minivan, or a conversion van.
As shown in fig. 1, the interior of the vehicle 100 surrounded by the body may include one or more rows of seats for accommodating people within the vehicle. For example, the front row of seats may house the driver 102 and a passenger (not shown). The rear seats 106 may accommodate one or more passengers, such as the passenger 104. The vehicle 100 may include more than two rows of seats to accommodate more passengers. In some embodiments, armrests or cup holders may be mounted between the seats. For example, a cup holder can hold a water bottle 108.
The vehicle 100 may be designed to accommodate a limited number of passengers, which is referred to as the vehicle capacity. For example, a sports car may have a capacity of 2-4, a compact vehicle or sedan may have a capacity of 4-5, an SUV may have a capacity of 5-7, and a minivan may have a capacity of 7-8. If the vehicle 100 carries more passengers than its designed capacity, it is overloaded. In some embodiments, the vehicle 100 may be equipped with a people number detection system to automatically determine the number of people in the vehicle in order to detect an overload condition.
As shown in fig. 1, the people number detection system includes, among other things, at least one camera 110 and a controller 120. The camera 110 may be mounted or otherwise installed in the vehicle 100. In some embodiments, the camera 110 may be mounted on the dashboard, above the windshield, on the ceiling, in a corner, or the like. In some embodiments, the camera 110 may be integrated in a mobile device, such as a mobile phone, tablet, or Global Positioning System (GPS) navigation device mounted on the dashboard of the vehicle 100. In some embodiments, the camera 110 may be configured to capture images within the vehicle 100 while the vehicle 100 is completing a service trip. Consistent with the present application, the camera 110 may be a digital still camera or a digital video camera for taking pictures or videos of the interior of the vehicle 100. The images may capture various objects within the vehicle 100, such as the driver 102, the passenger 104, the empty seat 106, and the water bottle 108.
In some embodiments, multiple cameras 110 may be mounted at different locations within the vehicle 100 and take interior photographs from different perspectives. The camera 110 may continuously capture images as the vehicle 100 travels toward the destination. Each image captured at a particular point in time is referred to as an image frame. For example, the camera 110 may record a video consisting of a plurality of image frames captured at a plurality of points in time.
Returning to fig. 1, in some embodiments, the camera 110 may communicate with the controller 120. In some embodiments, the controller 120 may be an onboard controller of the vehicle 100, such as an electronic control unit or a vehicle information controller. In some embodiments, the controller 120 may be part of a local physical server, a cloud server (as shown in fig. 1), a virtual server, a distributed server, or any other suitable computing device. The controller 120 may communicate with the camera 110 and/or other components of the vehicle 100 via a network, such as a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, a satellite communication network, and/or a local or short-range wireless network (e.g., Bluetooth™).
Consistent with the present application, the controller 120 may be responsible for processing images captured by the camera 110 and detecting the number of people in the vehicle based on those images. In some embodiments, the controller 120 may identify human objects, such as the driver 102 and one or more passengers 104, using various image processing methods. For example, the controller 120 may perform image segmentation and object classification methods to identify human objects and determine a coarse number of people based on them. Depending on the viewing angle of the camera 110 and the seating positions of the passengers, one passenger may be completely or partially hidden in the image by another passenger sitting in front of him or her. Thus, a detected human object may sometimes contain more than one passenger. In some embodiments, the controller 120 may further detect one or more vehicle occupants in each human object and determine a fine number of people based on the total vehicle occupants detected in the vehicle 100. For example, if two human objects are detected, one including one occupant and the other including two, the fine number of people is three. In some embodiments, the controller 120 may compare the determined number of people to the capacity of the vehicle 100 to detect an overload condition.
For example, fig. 2 is a block diagram of an exemplary controller 120 shown in accordance with an embodiment of the present application. Consistent with the present application, controller 120 may receive image data 203 from one or more cameras 110. In some embodiments, image data 203 may include a two-dimensional (2D) image or a three-dimensional (3D) image. In some embodiments, when multiple cameras 110 are installed at different locations within vehicle 100, image data 203 may include image data captured from different perspectives.
The controller 120 may determine a coarse number of people based on the human objects detected from the image data 203, and a fine number of people based on the vehicle occupants detected within those human objects. The number of people may then be used to detect an overload condition in the vehicle 100. In some embodiments, as shown in fig. 2, the controller 120 includes a communication interface 202, a processor 204, a memory 206, and a storage 208. In some embodiments, the controller 120 comprises different modules in a single device, such as an Integrated Circuit (IC) chip (implemented as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA)), or separate devices with dedicated functions. In some embodiments, one or more components of the controller 120 may be located in the cloud, in a single location (such as within the vehicle 100 or within a mobile device), or in distributed locations. The components of the controller 120 may be in an integrated device or distributed in different locations while communicating with each other via a network (not shown).
The communication interface 202 may send data to and receive data from components such as the camera 110 via a communication cable, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods. In some embodiments, the communication interface 202 can be an Integrated Services Digital Network (ISDN) card, cable modem, satellite modem, or modem to provide a data communication connection. As another example, the communication interface 202 may be a Local Area Network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented by the communication interface 202. In such an implementation, the communication interface 202 may send and receive electrical, electromagnetic, or optical signals that carry digital data streams representing various types of information over a network.
Consistent with some embodiments, the communication interface 202 may receive the image data 203 captured by the camera 110. The communication interface 202 may also provide the received data to the storage 208 for storage or to the processor 204 for processing.
The processor 204 may comprise any suitable type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. The processor 204 may be configured as a separate processor module dedicated to detecting the number of people in the vehicle based on image data captured by the camera 110. Alternatively, the processor 204 may be configured as a shared processor module that also performs other functions.
As shown in FIG. 2, the processor 204 includes a plurality of modules, such as a coarse people number estimation unit 210, a fine people number estimation unit 212, a people number determination unit 214, and the like. In some embodiments, the processor 204 may additionally include an overload detection unit 216. These modules (and any corresponding sub-modules or sub-units) may be hardware units (e.g., portions of an integrated circuit) of the processor 204 designed for use with other components, or software units implemented by the processor 204 through executing at least a portion of a program. The program may be stored on a computer-readable medium and, when executed by the processor 204, may perform one or more functions. Although FIG. 2 shows units 210-216 all within one processor 204, it is contemplated that these units may be distributed among multiple processors located near or remote from each other.
Fig. 3 is a data flow diagram 300 for the processor 204 in the controller 120 shown in fig. 2, shown in accordance with an embodiment of the present application. As shown in fig. 3, the coarse people number estimation unit 210 may receive the image data 203 from the communication interface 202 and be configured to determine a coarse number of people based on human objects detected from the image data 203. The fine people number estimation unit 212 may further detect one or more vehicle occupants in each human object detected by the coarse people number estimation unit 210, and determine a fine number of people based on the total number of vehicle occupants detected. The coarse number of people and the fine number of people may be provided to the people number determination unit 214, which determines a final number of people 302 based on them.
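By way of illustration only, the following Python sketch summarizes this data flow. The helper callables `detect_human_objects` and `count_occupants` are hypothetical stand-ins for the trained models described below; the application does not prescribe any particular implementation.

```python
from typing import Callable, Sequence

def count_people(
    image: object,
    detect_human_objects: Callable[[object], Sequence[object]],
    count_occupants: Callable[[object], int],
) -> int:
    """Two-stage head count mirroring units 210, 212, and 214 in FIG. 3."""
    # Coarse stage (unit 210): each detected human object counts as one person.
    human_objects = detect_human_objects(image)
    coarse = len(human_objects)
    # Fine stage (unit 212): a human object may hide more than one occupant.
    fine = sum(count_occupants(obj) for obj in human_objects)
    # Determination (unit 214): take the larger estimate as the final count.
    return max(coarse, fine)

# Toy usage with stub detectors: two human objects, one hiding two occupants.
stub_objects = ["object_I", "object_II"]
print(count_people(
    image=None,
    detect_human_objects=lambda img: stub_objects,
    count_occupants=lambda obj: 2 if obj == "object_II" else 1,
))  # -> 3
```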
In some embodiments, image segmentation and object detection methods may be applied by the coarse people number estimation unit 210 to identify human objects. For example, fig. 4 is a data flow diagram 400 of the exemplary coarse people number estimation unit 210 of fig. 3, shown in accordance with an embodiment of the application. As shown in fig. 4, the coarse people number estimation unit 210 may further include an object segmentation unit 402 and a human object detection unit 404. The object segmentation unit 402 may receive the image data 203 from the communication interface 202 and apply segmentation to the image data 203 to identify objects from the image. The objects identified by image segmentation may include various objects within the vehicle 100, such as human objects, empty seats, bags, safety belts, bottles or cups placed in cup holders, and other objects mounted in or brought into the vehicle 100. In some embodiments, the object segmentation unit 402 may apply the object segmentation model 406 to perform image segmentation. The object segmentation model 406 may be a machine learning model, such as a convolutional neural network (CNN) model, trained using training images and the objects labeled in those images.
The human object detection unit 404 may then detect human objects among the recognized objects using the object detection model 408. In some embodiments, the object detection model 408 may be a machine learning model, such as a CNN model, trained using training images and the object types labeled in those images. For example, the training images may be labeled with the known objects (e.g., human body, seat, water bottle, etc.) depicted therein. In some embodiments, a human object may be identified by determining its contour information.
In some alternative embodiments, the object segmentation unit 402 and the human object detection unit 404 may be switched in order such that object detection is performed prior to human object segmentation. For example, the human object detection unit 404 may determine a boundary region containing a human object from the image data 203, e.g. by applying the object detection model 408. The bounding region may be any suitable shape, such as rectangular, square, circular, oval, diamond, and the like. The object segmentation unit 402 may then apply an object segmentation model 406 to segment each boundary region to identify human objects therein.
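A minimal sketch of this detection-first variant is given below, assuming hypothetical stand-ins `detect_bounding_regions` and `segment_region` for the object detection model 408 and the object segmentation model 406.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) bounding region

def coarse_estimate(
    image: object,
    detect_bounding_regions: Callable[[object], Sequence[Box]],
    segment_region: Callable[[object, Box], object],
) -> Tuple[List[object], int]:
    """Detection-first coarse estimation (units 404, then 402, in FIG. 4)."""
    regions = detect_bounding_regions(image)        # object detection model 408
    humans = [segment_region(image, box) for box in regions]  # model 406
    return humans, len(humans)    # human objects 410 and coarse count 412
```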
The coarse people number estimation unit 210 may provide two outputs: the detected human objects 410 and a coarse number of people 412. In some embodiments, the coarse number of people 412 is the number of human objects 410 detected. The human objects 410 may be received and used by the fine people number estimation unit 212 to further determine a fine number of people. The people number determination unit 214 may receive the coarse number of people 412 to determine the final number of people.
In some embodiments, the fine people number estimation unit 212 may apply head detection and/or skeleton keypoint detection to detect one or more vehicle occupants in each human object. For example, fig. 5 is a data flow diagram 500 of the exemplary fine people number estimation unit 212 of fig. 3, shown in accordance with an embodiment of the application. As shown in fig. 5, the fine people number estimation unit 212 may further include a head detection unit 502, a skeleton detection unit 504, and a fusion unit 510. In some embodiments, the fine people number estimation unit 212 may include only one of the head detection unit 502 and the skeleton detection unit 504, and the fusion unit 510 may be omitted.
The head detection unit 502 and the skeleton detection unit 504 may each receive the human objects 410 from the coarse people number estimation unit 210 and further detect one or more vehicle occupants in each human object. In some embodiments, the processing by the head detection unit 502 and the skeleton detection unit 504 may be performed in parallel. The head detection unit 502 may apply a head detection model 506 to detect human heads. The head detection model 506 may be a machine learning model, such as a CNN model, trained using training images and the human heads labeled in those images. In some embodiments, the fine people number estimation unit 212 may use the total number of heads detected across all human objects as the fine number of people. For example, if two heads are detected in human object I and another two heads are detected in human object II, the fine number of people is determined to be four.
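The head-based fine count can be illustrated with a short sketch; `detect_heads` is a hypothetical stand-in for the head detection model 506.

```python
from typing import Callable, Iterable, Sequence

def fine_count_by_heads(
    human_objects: Iterable[object],
    detect_heads: Callable[[object], Sequence[object]],
) -> int:
    """Fine count: total number of heads detected across all human objects."""
    return sum(len(detect_heads(obj)) for obj in human_objects)

# Example from the text: two heads in object I and two in object II -> 4.
heads = {"I": ["h1", "h2"], "II": ["h3", "h4"]}
print(fine_count_by_heads(["I", "II"], lambda obj: heads[obj]))  # -> 4
```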
The skeleton detection unit 504 may apply a skeleton detection module 508 to detect human skeletons in each human object. Unlike the head detection model 506, which focuses on human head features, the skeleton detection unit 504 focuses on key points of human skeletons to distinguish different skeletons. The skeleton detection module 508 may be a machine learning model, such as a CNN model, trained using training images and the human skeletons labeled in those images. The human skeletal structure may be defined by a number of key points, such as the head, neck, shoulders, wrists, legs, feet, arms, and hands. These key points may be marked in the training images. In some cases, skeleton detection may be more accurate than head detection for detecting different passengers within a human object. For example, if the head of a passenger behind the driver is completely invisible in the image, the head detection method cannot determine whether there is another passenger behind the driver. However, as long as some key skeleton points of that passenger are visible in the image, the skeleton detection method is able to identify him or her as a distinct passenger.
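As a rough illustration of counting occupants from skeletons, the sketch below counts a skeleton once enough of its key points are visible; the key-point list and the `min_visible` threshold are illustrative assumptions, not values specified in the application.

```python
from typing import Dict, Iterable, Optional, Tuple

KEY_POINTS = ["head", "neck", "left_shoulder", "right_shoulder",
              "left_wrist", "right_wrist", "left_hip", "right_hip"]

# A skeleton maps each keypoint name to image coordinates, or None if hidden.
Skeleton = Dict[str, Optional[Tuple[float, float]]]

def count_skeletons(skeletons: Iterable[Skeleton], min_visible: int = 3) -> int:
    """Count a skeleton as one occupant once enough keypoints are visible.

    A passenger hidden behind the driver is still counted if a few
    keypoints (e.g., a shoulder or wrist) appear in the image.
    """
    return sum(
        1
        for s in skeletons
        if sum(1 for k in KEY_POINTS if s.get(k) is not None) >= min_visible
    )

driver = {"head": (120.0, 80.0), "neck": (120.0, 95.0),
          "left_shoulder": (100.0, 110.0)}
hidden_passenger = {"left_shoulder": (180.0, 100.0),
                    "left_wrist": (195.0, 140.0), "left_hip": (185.0, 160.0)}
print(count_skeletons([driver, hidden_passenger]))  # -> 2
```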
In some embodiments, head detection and skeleton detection may be performed simultaneously, as shown in fig. 5, to further improve detection accuracy. The detection results from the head detection unit 502 and the skeleton detection unit 504 may be provided to a fusion unit 510, which fuses the detection results to provide a final passenger detection. In some embodiments, the fusion unit 510 may perform an OR operation on the two detection results. That is, if one detection method returns two passengers in a human object and the other detection method returns one occupant in the same human object, the fusion unit 510 will take two as the result. In some other embodiments, the head detection model 506 and the skeleton detection module 508 may be jointly trained and applied by the fusion unit 510 to detect passengers. The fusion unit 510 outputs the fine number of people 512 to the people number determination unit 214.
Referring back to fig. 3, the people number determination unit 214 determines the final number of people 302 based on the coarse number of people 412 and the fine number of people 512. In some embodiments, the people number determination unit 214 may perform a maximum operation. For example, if the coarse number of people 412 is a and the fine number of people 512 is b, the final number of people 302 may be determined as c = max(a, b).
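The per-object OR fusion and the final maximum operation can be illustrated together in a short sketch; the counts used are the worked examples from the preceding paragraphs.

```python
from typing import Sequence

def fuse_per_object(head_counts: Sequence[int],
                    skeleton_counts: Sequence[int]) -> int:
    """'OR' fusion (unit 510): per human object, keep the larger detection."""
    return sum(max(h, s) for h, s in zip(head_counts, skeleton_counts))

# One method sees 2 passengers in an object, the other sees 1 -> fusion keeps 2.
fine = fuse_per_object(head_counts=[2, 1], skeleton_counts=[1, 1])  # b = 3
coarse = 2                                   # a = 2 detected human objects
final = max(coarse, fine)                    # c = max(a, b) = 3
print(final)
```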
In some embodiments, the processor 204 may repeatedly execute the process of data flow diagram 300 to identify the number of people or detect any change in the number of people in the vehicle. If the number of people is detected based on image data acquired at a single point in time or within a short period, the detection result may be unreliable. For example, the passenger 104 may occasionally bend down to pick up an item from the floor and thus be entirely missing from the image data 203. Thus, the processor 204 may periodically repeat the people detection to identify the final number of people and reduce the likelihood of under-counting. In some embodiments, the processor 204 may generate control signals to cause the camera 110 to acquire more images over a relatively long period of time (e.g., 10, 20, or 30 seconds). Alternatively, if the camera 110 captures video containing multiple image frames, the processor 204 may sample the image frames over such a period. The processor 204 may repeat the detection process performed by units 210-214 for each image frame. The people number determination unit 214 may confirm the final number of people if the same number is consistently detected across the sampled image frames. If the number of people changes over time, the people number determination unit 214 may query vehicle operation information, such as vehicle stops, door openings, weight changes, etc., to determine whether the change is caused by passengers getting on or off.
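A minimal sketch of this temporal confirmation is shown below; the sampling window size is an assumed parameter, since the text only suggests sampling frames over a period such as 10, 20, or 30 seconds.

```python
from typing import Optional, Sequence

def confirm_count(frame_counts: Sequence[int], window: int = 5) -> Optional[int]:
    """Confirm the count only when it is stable over the last sampled frames."""
    recent = list(frame_counts)[-window:]
    if len(recent) == window and len(set(recent)) == 1:
        return recent[0]   # same count on every sampled frame: confirm it
    return None            # unstable: keep sampling or check vehicle events

print(confirm_count([4, 3, 4, 4, 4, 4, 4]))  # -> 4 (stable over last 5 frames)
print(confirm_count([4, 4, 3, 4, 4]))        # -> None (a passenger bent down?)
```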
Referring back to fig. 2, the overload detection unit 216 may detect an overload condition by comparing the final number of people to a threshold. In some embodiments, the threshold may be predetermined as the vehicle capacity. For example, if five passengers are detected in a 4-passenger compact vehicle, an overload condition is detected. Upon detection, the processor 204 may generate a control signal to trigger an alarm and send the control signal to the terminal 230 via the communication interface 202. In some embodiments, the terminal 230 may be a driver terminal or a passenger terminal, such as a smartphone, PDA, wearable device, or the like. For example, the driver/passenger terminal may have a ride-sharing application installed for transportation services. The overload condition may be notified to the driver/passenger via the terminal 230 to prompt the driver/passenger to end the overload condition. In some embodiments, the control signal may cause the terminal 230 to generate a warning notification, such as a pop-up window on a display screen of the terminal 230, a beep, a vibration, or an audio alert.
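The threshold comparison and alert can be illustrated as follows; `notify` is a hypothetical callback standing in for the control signal sent to the terminal 230 via the communication interface 202.

```python
from typing import Callable

def check_overload(final_count: int, capacity: int,
                   notify: Callable[[str], None]) -> bool:
    """Compare the final count with the vehicle capacity (the threshold)."""
    if final_count > capacity:
        notify(f"Overload detected: {final_count} occupants, "
               f"capacity {capacity}")
        return True
    return False

# e.g., five passengers detected in a 4-passenger compact vehicle.
check_overload(5, 4, notify=print)  # prints the warning, returns True
```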
In some embodiments, the terminal 230 may be a regulation module of a service platform, or a server/controller of a police department. In some embodiments, the control signal may trigger a telephone call to the terminal 230 to report the overload condition. In some other embodiments, the control signal may trigger a data transmission to the terminal 230, including, for example, vehicle registration information, driver information, passenger information, vehicle location, and the final number of people. The terminal 230 may intervene to request that the driver/passenger stop the overload condition immediately. For example, a police department may dispatch a police officer near the vehicle's location to pursue and stop the vehicle 100.
The memory 206 and storage 208 may comprise any suitable type of mass storage provided to store any type of information that the processor 204 may need to operate. The memory 206 and storage 208 may be volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of storage devices or tangible (i.e., non-transitory) computer-readable media, including but not limited to ROM, flash memory, dynamic RAM, and static RAM. The memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by the processor 204 to perform the image data processing and people number detection disclosed herein. For example, the memory 206 and/or storage 208 may be configured to store a program that may be executed by the processor 204 to determine the number of people in the vehicle 100 and detect an overload condition based on that number.
The memory 206 and/or storage 208 may be further configured to store information and data used by the processor 204. For example, the memory 206 and/or storage 208 may be configured to store various types of data (e.g., image data 203) captured by the camera 110 and data related to camera settings. The memory 206 and/or storage 208 may also store intermediate data, such as the human objects, head and skeleton features, and the like. The memory 206 and/or storage 208 may further store the various learning models used by the processor 204, such as the object segmentation model 406, the object detection model 408, the head detection model 506, and the skeleton detection module 508. The various types of data may be stored permanently, removed periodically, or discarded immediately after each data frame is processed.
FIG. 6 is a flow chart of an exemplary method 600 for determining the number of people in a vehicle, shown in accordance with an embodiment of the present application. In some embodiments, the method 600 may be implemented by the controller 120, which includes, among other things, the processor 204. However, the method 600 is not limited to this exemplary embodiment. The method 600 may include steps S602-S618 as described below. It should be understood that some steps may be optional in performing the method provided herein. Further, some steps may be performed simultaneously, or in a different order than shown in fig. 6.
In step S602, the camera 110 captures image data 203 of at least one object in the vehicle 100 while the vehicle 100 is completing a service trip. In some embodiments, multiple cameras 110 may be mounted at various locations within the vehicle 100 and may capture image data simultaneously from different angles. For example, the camera 110 may be a rear-facing camera mounted on the dashboard of the vehicle 100, or embedded in a GPS navigation device or cellular phone mounted on the dashboard. In some embodiments, the objects may include a driver (e.g., driver 102), one or more passengers (e.g., passenger 104), an empty seat (e.g., empty seat 106), a seat belt, and any other items installed in or brought into the vehicle 100 (e.g., water bottle 108).
Camera 110 may be configured to capture image data 203 continuously or at particular points in time. For example, the camera 110 may be a video camera configured to capture video containing a plurality of image frames. In some embodiments, image data 203 may include 2D images and/or 3D images. The image data 203 captured by the camera 110 may be sent to the controller 120, for example, via a network.
In step S604, the controller 120 identifies objects from an image within the image data 203 using the object segmentation model 406. The objects identified by image segmentation may include various objects within the vehicle 100, such as human objects, empty seats, bags, safety belts, bottles or cups placed in cup holders, and other objects mounted in or brought into the vehicle 100. The object segmentation model 406 may be trained using training images and the objects labeled in them.
In step S606, the controller 120 may identify human objects among the objects detected in step S604 using the object detection model 408. The object detection model 408 may be trained using training object images and the objects labeled in these images. In some embodiments, a human object may be identified by determining its contour information.
In some embodiments, step S604 and step S606 may be switched in order. That is, the controller 120 may first perform object detection using the object detection model 408 to determine boundary regions containing human objects, and then segment each boundary region using the object segmentation model 406 to identify human objects. In step S608, the controller 120 determines a rough number of persons based on the human subject detected in step S606.
In step S610, the controller 120 detects heads in each human object using the head detection model 506. The head detection model 506 may be trained using training images and the human heads labeled in them. In step S612, the controller 120 detects skeleton key points in each human object using the skeleton detection module 508. The skeleton detection module 508 may be trained using training images and the human skeleton key points labeled in them. In some embodiments, the controller 120 may perform steps S610 and S612 in parallel to obtain both head detection and skeleton detection results. In some embodiments, one of steps S610 and S612 may be optional and omitted from method 600.
In step S614, the controller 120 determines the fine number of people in the vehicle. In some embodiments, the controller 120 may use the total number of heads detected across all human objects as the fine number of people. In some other embodiments, the controller 120 may use the total number of distinct human skeletal structures detected across all human objects as the fine number of people. In other embodiments, the detection results from steps S610 and S612 may be fused to determine a final passenger detection. For example, the controller 120 may perform an OR operation on the two detection results.
In step S616, the controller 120 may compare the final number of people with a preset threshold. For example, the threshold may be set to the vehicle capacity. If the number of people exceeds the threshold (S616: YES), a vehicle overload condition is detected and the method 600 proceeds to step S618 to generate an alert. Otherwise (S616: NO), the method 600 returns to step S602 to continue capturing images within the vehicle 100 and then repeats steps S604-S616 to determine whether the vehicle 100 is overloaded. In some embodiments, an overload may be confirmed if the overload condition detected at step S616 persists across multiple image frames captured by the camera 110.
In step S618, the controller 120 generates a control signal to trigger an alarm and transmits the control signal to the terminal 230. In some embodiments, the terminal 230 may be a driver terminal or a passenger terminal for a ride-sharing service. Through the terminal 230, the driver or passengers in the vehicle 100 may be notified of the overload condition and prompted to stop it. For example, the control signal may cause the terminal 230 to generate a warning notification, such as a pop-up window on a display screen of the terminal 230, a beep, a vibration, or an audio alert. In some embodiments, if the condition still exists after the alert, the controller 120 may further generate a control signal to trigger an alarm to other terminals 230, such as a service platform or a police department. In some embodiments, the control signal may trigger a telephone call or a data transmission to the terminal 230. For example, the data transmission may include vehicle registration information, driver information, passenger information, vehicle location, and the final number of people in the vehicle.
Another aspect of the application relates to a non-transitory computer-readable medium storing instructions that, when executed, cause one or more processors to perform a method as described above. The computer-readable medium includes volatile or nonvolatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage device. For example, a computer-readable medium as in the present application may be a storage device or a storage module having stored thereon computer instructions. In some embodiments, the computer readable medium may be a disk or flash drive having computer instructions stored thereon.
It will be apparent that various modifications and variations can be made in the system and related methods of the present application by those of ordinary skill in the art. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the system and associated method of the present application.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (20)

1. A system for automatically determining the number of passengers in a vehicle, comprising:
at least one camera configured to capture at least one image in the vehicle; and
a controller in communication with the at least one camera and configured to:
detect at least two human objects from the image;
detect one or more vehicle occupants in each human object; and
determine the number of people based on the detected vehicle occupants.
2. The system of claim 1, wherein to detect the human objects, the controller is configured to:
determine a boundary region containing a human object from the image; and
segment the boundary region to detect the human object.
3. The system of claim 1, wherein to detect the human objects, the controller is configured to:
segment the at least one image to identify objects; and
detect a human object among the objects based on an object detection model.
4. The system of claim 1, wherein to detect the one or more vehicle occupants in each human object, the controller is configured to detect at least one head in the human object based on a head detection model.
5. The system of claim 4, wherein the number of people is a total number of heads detected in the human objects.
6. The system of claim 1, wherein to detect the one or more vehicle occupants in each human object, the controller is configured to:
detect key skeleton points in the human object based on a skeleton detection model; and
map the key skeleton points to the one or more vehicle occupants.
7. The system of claim 1, wherein the controller is further configured to determine a coarse number of people in the vehicle based on the detected human objects.
8. The system of claim 1, wherein the controller is further configured to:
detect an overload condition by comparing the number of people to a capacity of the vehicle; and
issue an alarm upon detection of the overload condition.
9. The system of claim 1, wherein the controller is further configured to detect a passenger getting-on or getting-off event based on a change in the number of people over time.
10. A method for automatically determining the number of passengers in a vehicle, comprising:
capturing at least one image in the vehicle by at least one camera;
detecting, by a processor, at least two human objects from the image;
detecting, by the processor, one or more vehicle occupants in each human object; and
determining, by the processor, the number of people based on the detected vehicle occupants.
11. The method of claim 10, wherein detecting the human objects comprises:
determining a boundary region containing the human object from the image; and
segmenting the boundary region to detect the human object.
12. The method of claim 10, wherein detecting the human objects comprises:
segmenting the at least one image to identify objects; and
detecting a human object among the objects based on an object detection model.
13. The method of claim 10, wherein detecting the one or more vehicle occupants in each human object comprises detecting at least one head in the human object based on a head detection model.
14. The method of claim 13, wherein the number of people is a total number of heads detected in the human objects.
15. The method of claim 10, wherein detecting the one or more vehicle occupants in each human object comprises:
detecting key skeleton points in the human object based on a skeleton detection model; and
mapping the key skeleton points to the one or more vehicle occupants.
16. The method of claim 10, further comprising determining a coarse number of people in the vehicle based on the detected human objects.
17. The method of claim 10, further comprising:
detecting an overload condition by comparing the number of people to a capacity of the vehicle; and
issuing an alarm upon detection of the overload condition.
18. The method of claim 10, further comprising detecting a passenger getting-on or getting-off event based on a change in the number of people over time.
19. A non-transitory computer-readable medium storing a set of instructions that, when executed by at least one processor of an electronic device, cause the electronic device to perform a method for automatically determining a number of passengers in a vehicle, comprising:
capturing at least one image in the vehicle;
detecting at least two human objects from the image;
detecting one or more vehicle occupants in each human object; and
determining the number of people based on the detected vehicle occupants.
20. The non-transitory computer-readable medium of claim 19, wherein the method further comprises:
detecting an overload condition by comparing the number of people to a capacity of the vehicle; and
issuing an alarm when the overload condition is detected.
CN201880081102.7A 2018-11-28 2018-11-28 System and method for detecting number of persons in vehicle Pending CN111566660A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/117856 WO2020107251A1 (en) 2018-11-28 2018-11-28 System and method for detecting in-vehicle headcount

Publications (1)

Publication Number Publication Date
CN111566660A (en) 2020-08-21

Family

ID=70854258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880081102.7A Pending CN111566660A (en) 2018-11-28 2018-11-28 System and method for detecting number of persons in vehicle

Country Status (2)

Country Link
CN (1) CN111566660A (en)
WO (1) WO2020107251A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114407735A (en) * 2022-02-17 2022-04-29 芜湖雄狮汽车科技有限公司 Control method and device for automobile cabin, vehicle and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931644A (en) * 2020-08-10 2020-11-13 济南博观智能科技有限公司 Method, system and equipment for detecting number of people on vehicle and readable storage medium
CN112188702A (en) * 2020-09-30 2021-01-05 中车青岛四方机车车辆股份有限公司 Control method, control device and control system for lighting equipment of railway vehicle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003046843A (en) * 2001-08-02 2003-02-14 Omron Corp Imaging apparatus, and human body detection system in vehicle
CN103280108A (en) * 2013-05-20 2013-09-04 中国人民解放军国防科学技术大学 Passenger car safety pre-warning system based on visual perception and car networking
US20140327752A1 (en) * 2013-05-01 2014-11-06 Nissan North America, Inc. Vehicle occupancy detection system
CN108416254A (en) * 2018-01-17 2018-08-17 上海鹰觉科技有限公司 A kind of statistical system and method for stream of people's Activity recognition and demographics

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770642A (en) * 2008-12-26 2010-07-07 深圳先进技术研究院 Method and system for counting number of people in car
US9934442B2 (en) * 2013-10-09 2018-04-03 Nec Corporation Passenger counting device, passenger counting method, and program recording medium
US9544679B2 (en) * 2014-12-08 2017-01-10 Harman International Industries, Inc. Adjusting speakers using facial recognition
CN204547917U (en) * 2015-04-15 2015-08-12 郭帅 The overcrowding detection alarm system of a kind of car

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003046843A (en) * 2001-08-02 2003-02-14 Omron Corp Imaging apparatus, and human body detection system in vehicle
US20140327752A1 (en) * 2013-05-01 2014-11-06 Nissan North America, Inc. Vehicle occupancy detection system
CN103280108A (en) * 2013-05-20 2013-09-04 中国人民解放军国防科学技术大学 Passenger car safety pre-warning system based on visual perception and car networking
CN108416254A (en) * 2018-01-17 2018-08-17 上海鹰觉科技有限公司 A kind of statistical system and method for stream of people's Activity recognition and demographics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Junze et al., "Automatic human fall detection based on Kinect skeleton tracking," Journal of Shanghai Jiao Tong University *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114407735A (en) * 2022-02-17 2022-04-29 芜湖雄狮汽车科技有限公司 Control method and device for automobile cabin, vehicle and storage medium

Also Published As

Publication number Publication date
WO2020107251A1 (en) 2020-06-04

Similar Documents

Publication Publication Date Title
US10112585B1 (en) Vehicle cleanliness detection systems and methods
US8606492B1 (en) Driver log generation
US11615545B2 (en) System and method for detecting in-vehicle conflicts
US20240166144A1 (en) Interior Camera System for a Self Driving Car
US9491420B2 (en) Vehicle security with accident notification and embedded driver analytics
CN111566660A (en) System and method for detecting number of persons in vehicle
US20180307926A1 (en) Stain and Trash Detection Systems and Methods
CN104776849A (en) Vehicle positioning device and method
CN111368612B (en) Overguard detection system, personnel detection method and electronic equipment
CN111615721A (en) Pick-up service based on identification between vehicle and passenger
CN110941982A (en) Riding behavior evaluation device, riding behavior evaluation system, riding behavior evaluation method, and storage medium
US11270525B2 (en) Automated vehicle occupancy detection
CN114475511B (en) Vision-based airbag activation
CN113537117B (en) Vehicle-mounted legacy monitoring and alarming method and 5G system thereof
AU2018102204A4 (en) System and method for detecting in-vehicle conflicts
CN114898544A (en) Network-based traffic connection method and system
JP7070827B2 (en) Driving evaluation device, in-vehicle device, driving evaluation system equipped with these, driving evaluation method, and driving evaluation program
CN110475694A (en) Controller of vehicle, control method for vehicle and program
US20230343111A1 (en) Computer implemented method, computer system and non-transitory computer readable medium for detecting an occupancy of a seat in a vehicle cabin
WO2023097125A1 (en) Systems and methods for automatic camera calibration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination