WO2020058215A1

WO2020058215A1 - Video surveillance system installed on board a rail vehicle

Info

Publication number: WO2020058215A1
Application number: PCT/EP2019/074762
Authority: WO
Inventors: Andrei Stoian; Johann MENEUR
Original assignee: Thales
Priority date: 2018-09-18
Filing date: 2019-09-17
Publication date: 2020-03-26
Also published as: EP3853762A1; CL2021000647A1; SG11202102532UA; FR3086085A1; FR3086085B1

Abstract

The invention concerns a video surveillance system (1) installed on board a rail vehicle, comprising a plurality of video surveillance cameras (4_1, 4_i, 4_N) installed on board the rail vehicle, each camera (4_1, 4_i, 4_N) having a spatial position and an associated field of view, each camera being capable of capturing a video stream made up of a succession of digital images, the system (1) further comprising an on-board electronic calculation device (10) comprising at least one calculation acceleration device (14). Each camera is connected to the on-board electronic calculation device and is capable of transmitting captured video streams to the electronic calculation device, the electronic calculation device (10) being capable of obtaining a breakdown of the plurality of video surveillance cameras into a sequence of groups of cameras (G_1, ..., G_N) to be processed successively over time, on a circular repetition basis. The device comprises modules that are capable, for each group of cameras, of obtaining (20) video streams, applying (22) a digital image analysis in order to calculate values of descriptors, and applying (24) at least one situation analysis algorithm.

Description

CCTV system on board a rail vehicle

The present invention relates to a video surveillance system on board a railway vehicle, and to an associated video surveillance method.

The invention relates to the general field of automated video surveillance, based on video streams from video surveillance cameras, and in particular to the application of video surveillance in the railway sector.

In this area, specific constraints arise, linked in particular to railway standards, vibrations caused by the running of a vehicle (train or metro for example), the space available, and the limitation in electricity consumption.

The algorithms for processing and analyzing video streams in order to analyze and detect situations on board a vehicle, for example abnormal or aggressive behavior, falling passengers, or overcrowding of a railway vehicle, are increasingly complex and demanding in computing power and energy. Such known algorithms are currently run on computers located in a ground processing center, which have sufficient computing capacity, on video streams transmitted from video surveillance cameras on board a rail vehicle. Processing in a ground data center is limited by the bandwidth of the communication network that routes the video streams to the computers. This induces a time delay, so it is difficult to obtain situation analysis results in real time, which can be critical, especially if a detected situation poses a risk to passenger safety.

There are also video surveillance cameras, or CCTV cameras (from the English "closed-circuit television"), integrating calculation processors and adapted to implement image analysis processing. However, each camera is only able to process the images of the video streams it acquires, and therefore it is not possible in this case to perform a situation analysis by correlating information from several cameras. In addition, cameras compatible with railway standards have limited computing power, and therefore can only implement simple processing, for example algorithms for detecting movement in successive images of a video stream. Such processing is insufficient for an analysis of the situation of people and objects present in a part of a railway vehicle, and therefore to detect possible risk situations.

The object of the invention is to remedy the drawbacks of the state of the art.

To this end, the invention provides a video surveillance system on board a rail vehicle, comprising a plurality of video surveillance cameras on board the rail vehicle, each camera having a spatial position and an associated field of view, each camera being adapted to capture a video stream composed of a succession of digital images. This system also comprises an on-board electronic calculation device comprising at least one calculation acceleration device. Each camera is connected to the on-board electronic calculation device and is adapted to transmit captured video streams to said electronic calculation device, the electronic calculation device being adapted to obtain a division of the plurality of video surveillance cameras into a series of groups of cameras to be processed successively over time, according to a circular repetition, and for each group of cameras processed successively:

- obtain video streams from said group of cameras, each video stream containing at least one digital image,

- apply an analysis of digital images to calculate values of descriptors characteristic of objects present in the or each image of each video stream originating from a camera of said group of cameras,

- apply at least one situation analysis algorithm using the descriptor values calculated for at least one image of each video stream of said group of cameras processed to obtain a situation analysis result in the field of view of the cameras of said group of cameras.

Advantageously, the system of the invention makes it possible to carry out an automatic analysis of video streams captured by groups of cameras, the groups of cameras being processed successively, with a circular repetition, by the same on-board electronic calculation device comprising a calculation accelerator. Thus, the railway constraints are respected, and the repeated processing of the video streams by groups of cameras allows automated, near real-time video surveillance of the whole railway vehicle.

The video surveillance system according to the invention may also have one or more of the characteristics below, taken independently or in any technically conceivable combination.

The digital image analysis implements, in each digital image of a video stream, a detection of people and/or a skeleton extraction associated with a person present in the vehicle.

The values of characteristic descriptors include spatial positions, in a frame of reference associated with the digital image, of joints associated with a person present in the scene.

The system is on board a rail vehicle comprising a plurality of cars, each camera group being formed of cameras installed on board a car of the vehicle.

The system is adapted to implement several different situation analysis algorithms depending on a state of the railway vehicle.

The situation analysis algorithm implements an abnormal event detection, and in the event of positive detection for a current group of cameras, the system is adapted to continue processing images of video streams from the current group of cameras.

The system is further adapted to raise an alert after applying the situation analysis.

The system includes a timing module adapted to select a next group of CCTV cameras to be processed, based on a situation analysis result.

According to another aspect, the invention relates to a video surveillance method implemented by a video surveillance system as briefly described above. This method includes the steps of:

obtaining a division of the plurality of video surveillance cameras into a series of groups of cameras to be processed successively over time, according to a circular repetition, and for each group of cameras processed successively:

a) obtaining video streams from said group of cameras, each video stream containing at least one digital image,

b) applying a digital image analysis to calculate descriptor values characteristic of objects present in the or each image of each video stream originating from a camera of said group of cameras,

c) applying at least one situation analysis algorithm using the descriptor values calculated for at least one image of each video stream of said group of cameras processed to obtain a situation analysis result in the field of view of the cameras of said group of cameras.

The method provides the same advantages as the video surveillance system briefly described above.

In one embodiment, the method includes a step of selecting a next group of cameras to be processed.

In one embodiment, the situation analysis algorithm implements an abnormal event detection, and in the event of a positive detection of an abnormal event for a current group of cameras, the method repeats steps a), b) and c) for the current group of cameras.

In one embodiment, an alert is also raised in the event of positive detection of an abnormal event.

According to another aspect, the invention also relates to a computer program comprising software instructions which, when executed by a programmable electronic device, implement a video surveillance method as briefly described above.

Other characteristics and advantages of the invention will emerge from the description given below, by way of indication and in no way limiting, with reference to the appended figures, among which:

- Figure 1 is a schematic representation of an on-board video surveillance system according to an embodiment of the invention;

- Figure 2 is a schematic illustration of a first embodiment of a circular repetition of processing by groups of cameras;

- Figure 3 is a schematic illustration of a second embodiment of a circular repetition of processing by groups of cameras;

- Figure 4 is a block diagram of the main steps of an automatic video surveillance method according to one embodiment.

The invention will be described below in its application in a railway video surveillance system, on board a railway vehicle, for example a train or a metro.

FIG. 1 schematically illustrates functional blocks of a video surveillance system 1 in one embodiment, the video surveillance system being carried on a rail vehicle 2, for example a train.

The video surveillance system 1 comprises a plurality of on-board cameras 4_1, ..., 4_i, ..., 4_N, the number N being an integer chosen according to various constraints, for example the number N being the number of cars of the rail vehicle 2.

For example, in one embodiment, the vehicle 2 has 3 cars, and 2 video surveillance cameras are positioned in each car, for example integrated into the ceiling or the side walls of the car and adapted to capture images with an associated field of view, the field of view being oriented towards the interior of the car. Preferably, the system includes 2 to 8 cameras per car of a railway vehicle, their spatial positions being chosen so that their combined fields of view cover the interior space of the railway car.

For example, in one embodiment, each camera is adapted to acquire video of resolution 1080 × 720 pixels.

In the video surveillance system 1, the cameras are grouped into groups of cameras denoted G_1 to G_L, forming a series of groups of cameras. For example, L = 3 groups of cameras are distinguished in the example of FIG. 1. The order of the groups of cameras in this series is chosen, for example, according to the order of the railway cars.

The video surveillance system 1 also includes a programmable electronic device 10 adapted to perform calculations, on board the vehicle 2, simply called calculation device 10 below.

Each video surveillance camera 4 comprises a communication module, suitable for implementing a given communication protocol, for example the IP protocol (Internet Protocol), and is connected to the calculation device 10 by a communication link 8, for example a wired connection.

The calculation device 10 also includes communication modules 9_1, ..., 9_i, ..., 9_N enabling a communication connection to be made with each on-board camera 4.

Thus, the device 10 is capable of acquiring video streams captured by each of the on-board video surveillance cameras, substantially in real time.

In one embodiment, each video stream composed of a succession of digital images is coded in a compressed format, so as to reduce the amount of binary data to be transmitted to the computing device 10. For example, the H264 compression format is used. The calculation device 10 is suitable for decoding video streams in H264 format.

The computing device 10 comprises one or more central processing units (CPUs) 12 having an operating frequency of at least 1.4 GHz. It also comprises at least one calculation acceleration device 14, adapted in hardware to carry out image processing calculations, for example convolutions, in an optimized manner. The calculation acceleration device 14 is for example a GPU (“Graphics Processing Unit”), a TPU (“Tensor Processing Unit”), a DSP (“Digital Signal Processor”) or an FPGA (“Field Programmable Gate Array”). Preferably, the calculation acceleration device 14 is capable of performing at least one trillion floating-point operations per second while consuming less than 15 W.

The calculation device 10 also includes a memory 16 capable of storing digital data, in particular software instructions, configuration files and parameter values used by software that implements a video surveillance method as described when executed by the processors 12 and the calculation acceleration device(s) 14.

The video surveillance method of the invention is implemented, in one embodiment, by modules 20, 22, 24, 26 of software instructions.

The module 20 for obtaining video streams makes it possible to obtain, over a time interval, video streams, preferably time synchronized, originating from all the video surveillance cameras of the same group of video surveillance cameras. Each video stream includes at least one digital image acquired at a time point.

The image analysis module 22 implements one or more image processing algorithms that calculate characteristic values of objects present in each image. The term "objects" is to be understood here in the broad sense and includes inanimate objects, people and animals. For example, the algorithms are a person detection algorithm and a skeleton detection algorithm, as described in more detail below.

The situation analysis module 24 implements one or more situation analysis algorithms as a function of the calculated characteristic values of objects, over several images of all the video streams coming from the video surveillance cameras of the group being processed.

The timing module 26 selects the next group of video surveillance cameras, so that all the groups of cameras are selected successively with a circular repetition: after the last group of cameras in the series of camera groups has been processed, the first group of cameras in the series is selected again. In other words, the video streams supplied by the video surveillance cameras are processed by groups of video streams, in a carousel fashion.
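The carousel behavior of the timing module can be sketched in a few lines. This is a minimal illustration only, assuming the groups are held in an ordered list; the group labels and the helper name `carousel` are hypothetical and do not come from the description.

```python
from itertools import cycle, islice


def carousel(groups):
    """Return an endless iterator over the camera groups, in series order.

    After the last group has been yielded, iteration resumes with the
    first group (circular repetition).
    """
    return cycle(groups)


# Hypothetical labels: one group per car for a 3-car vehicle.
groups = ["G1", "G2", "G3"]

# Take the first seven selections made by the carousel.
first_seven = list(islice(carousel(groups), 7))
print(first_seven)
```

`itertools.cycle` is used here because it expresses the circular repetition directly; a modulo index over the list would work equally well.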

The rate of rotation between successive groups of cameras providing video streams to be processed is chosen, and varies for example depending on the situation analysis to be carried out by the module 24.

For example, for an estimation of passenger density, the analysis of a single image per camera video stream is sufficient, while for an event detection, for example the detection of an assault, it is necessary to process a number of images, for example 48, of each video stream of a group of cameras before moving on to the next group of cameras.
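As a rough sketch of how the number of images per visit could depend on the situation analysis, the figures above (1 image for passenger density, 48 images for event detection) can be held in a small table. The dictionary keys and the function name `images_to_process` are assumptions made for illustration.

```python
# Hypothetical table: analysis name -> images taken from each video stream
# during one visit to a camera group (values are the examples given above).
FRAMES_PER_VISIT = {
    "passenger_density": 1,   # a single image per camera is sufficient
    "event_detection": 48,    # e.g. assault detection needs an image sequence
}


def images_to_process(analysis: str, cameras_in_group: int) -> int:
    """Total number of images analysed during one visit to a group."""
    return FRAMES_PER_VISIT[analysis] * cameras_in_group


# With 2 cameras per group, as in the 3-car example of the description:
density_load = images_to_process("passenger_density", 2)
event_load = images_to_process("event_detection", 2)
print(density_load, event_load)
```

The ratio between the two loads gives a feel for why the rotation rate between groups varies with the analysis being run.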

FIG. 2 schematically illustrates a first embodiment of the processing of two groups of cameras G_1 and G_2, each comprising two separate cameras and providing two video streams, by time diagrams 32, 34, 36, 38 represented in parallel. FIG. 2 schematically illustrates the temporal sequencing of the processing of the respective video streams by the processor 12 and the calculation acceleration device 14 of the calculation device 10.

Thus, at the instant denoted t_0, a first image I_{1,1} of the first video stream V_{1,1} of the first group of cameras G_1 is acquired, as well as a first image I_{1,2} of the second video stream V_{2,1} of the first group of cameras G_1. Then, at a time instant t_1, an analysis of the image I_{1,1} is carried out, and at the time instant t_2, an analysis of the image I_{1,2} is carried out. For example, the analysis processing is a skeleton detection.

At a later time t_3, the results of the preceding analyses are merged to obtain descriptor values characteristic of the objects present in the scene captured by the two cameras of the group G_1, these characteristic descriptor values being stored.

Next, the method processes the video streams from the second group of cameras G_2.

Thus, at the instant denoted t_4, a first image J_{1,1} of the first video stream V_{1,2} of the second group of cameras G_2 is acquired, as well as a first image J_{1,2} of the second video stream V_{2,2} of the second group of cameras G_2. Then, at a time instant t_5, an analysis of the image J_{1,1} is carried out, and at the time instant t_6, an analysis of the image J_{1,2} is carried out. For example, the analysis processing is a skeleton detection.

At a later time t_7, the results of the previous analyses are merged to obtain descriptor values characteristic of the objects present in the scene captured by the two cameras of the second group of cameras G_2, these characteristic descriptor values being stored.

Then, the method again processes the first group of cameras G_1, from a time instant t_8 at which second images of the first and second video streams of the first group of cameras G_1 are obtained, and so on.

In this example, only one image is acquired and analyzed per group of cameras before moving on to the next group.

After performing M processing sequences for the same group of cameras, M being equal to 4 in this example, a situation analysis is performed at an instant t_a. For example, it is an integration of the previously stored results, to count the people present in the joint field of view of a group of cameras. In a practical case, this amounts to counting the number of people per rail car.
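The integration of stored results after M passes can be sketched as follows. The description does not say how the stored results are integrated; taking the low median of the per-pass people counts is an assumption chosen here for illustration, as are the class and method names.

```python
from collections import defaultdict
from statistics import median_low

M = 4  # passes over the same group before the situation analysis (FIG. 2 example)


class CountIntegrator:
    """Stores one people count per pass and integrates them every M passes."""

    def __init__(self, passes: int = M):
        self.passes = passes
        self.stored = defaultdict(list)  # group label -> counts stored so far

    def add_pass(self, group: str, people_count: int):
        """Store the count of one pass.

        Returns the integrated count once M passes are stored for the group
        (and clears the store), otherwise returns None.
        """
        self.stored[group].append(people_count)
        if len(self.stored[group]) < self.passes:
            return None
        counts = self.stored.pop(group)
        # Assumption: the median damps a one-off miscount in a single pass.
        return median_low(counts)


integrator = CountIntegrator()
results = [integrator.add_pass("G1", c) for c in (5, 6, 5, 7)]
print(results)
```

Only the fourth pass yields a value; the first three return `None` because the situation analysis has not yet been triggered for that group.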

FIG. 3 schematically illustrates a second embodiment of the processing of two groups of cameras G_1 and G_2, each comprising two separate cameras and providing two video streams, by time diagrams 42, 44, 46, 48 represented in parallel. FIG. 3 schematically illustrates the temporal sequencing of the processing of the respective video streams by the processor 12 and the calculation acceleration device 14 of the calculation device 10.

Unlike the embodiment illustrated in FIG. 2, in this second embodiment, a series of 5 images of the first and second video streams from the first group of cameras G_1 are acquired and processed before the processing of the first and second video streams of the second group of cameras G_2. For example, in this second embodiment, the processing of each image extracted from a video stream consists in carrying out a skeleton detection, and the joint processing of the descriptor values consists in detecting an abnormal event concerning people, for example a fall or an assault.

FIG. 3 illustrates the processing of the first group of cameras up to the implementation of the situation analysis at time t'_a, making it possible to detect abnormal behavior.

In this second embodiment, the acquisition of images of the first and second video streams from the first group of cameras continues in parallel with the situation analysis. If an abnormal event is detected, the image analyses and the situation analysis continue, and alarms are raised following the positive detection. In this case, the processing timing of the video streams coming from the groups of cameras of the series is dynamically modified to take the detected abnormal situation into account.

For example, in one embodiment, the processing of the video streams of the first group is continued until a negative detection of an abnormal event, and the next group of cameras is then processed.
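The dynamic timing described above, where the current group keeps being processed while an abnormal event is detected, can be sketched as a small scheduling loop. The callback `detect` stands in for the whole acquisition, image analysis and situation analysis chain; it and the function name `schedule` are hypothetical.

```python
def schedule(groups, detect, max_steps):
    """Return the order in which camera groups are visited.

    detect(group, step) -> bool is a hypothetical callback that returns True
    when the situation analysis reports an abnormal event for that group.
    The carousel only advances to the next group after a negative detection.
    """
    visited = []
    index = 0
    for step in range(max_steps):
        group = groups[index]
        visited.append(group)
        if not detect(group, step):
            # Negative detection: move on, wrapping around at the end.
            index = (index + 1) % len(groups)
    return visited


# Illustrative scenario: an abnormal event is seen in G1 during steps 0 and 1.
order = schedule(["G1", "G2"], lambda g, s: g == "G1" and s < 2, max_steps=6)
print(order)
```

In this scenario the first group is visited three times in a row before the carousel resumes its normal alternation.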

In the example illustrated in FIG. 3, no abnormal event is detected, and therefore the acquisition and processing of the images of the first and second video streams of the second group of cameras G_2 is carried out from the time instant marking the end of the processing of the first group of cameras G_1. It should be noted that in one embodiment, the acquisition of images is carried out in parallel with the situation analysis when the situation analysis does not occupy all the computing resources of the computing device 10.

FIG. 4 is a block diagram of the main steps of a video surveillance method according to one embodiment, implemented by a computing device 10.

The method comprises a first step 50 of obtaining a division into a series of groups of cameras {G_1, ..., G_L} to be processed, from among all of the N cameras connected to the calculation device 10. For example, the division is obtained from a configuration file. As a variant, the division is computed as a function of the spatial position of each camera with respect to a frame of reference associated with the rail vehicle.
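The variant in which the division is computed from camera positions can be sketched as below. The sketch assumes a one-dimensional frame of reference (distance in metres from the front of the vehicle) and cars of equal length; both are simplifying assumptions, and the camera identifiers and the function name `split_by_car` are hypothetical.

```python
from collections import defaultdict


def split_by_car(camera_positions, car_length_m):
    """Group cameras by the car they are installed in.

    camera_positions maps a camera id to its longitudinal position in metres,
    measured from the front of the vehicle (assumed frame of reference).
    Returns the groups ordered from the first car to the last.
    """
    groups = defaultdict(list)
    ordered = sorted(camera_positions.items(), key=lambda item: item[1])
    for cam_id, position in ordered:
        car_index = int(position // car_length_m)
        groups[car_index].append(cam_id)
    return [groups[car] for car in sorted(groups)]


# Illustrative layout: 3 cars of 20 m with 2 cameras each.
positions = {"cam1": 5.0, "cam2": 15.0, "cam3": 25.0,
             "cam4": 35.0, "cam5": 45.0, "cam6": 55.0}
series = split_by_car(positions, car_length_m=20.0)
print(series)
```

A configuration file, as mentioned first in the text, would simply list the same groups explicitly instead of deriving them.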

Preferably, the cameras of a given group of cameras have joint fields of view, that is to say fields of view that intersect.

Each group of cameras provides video streams, synchronized or almost synchronized, acquired over a given time interval, to be processed before proceeding to the processing of the video streams supplied by the following group of cameras.

In the embodiment more particularly described above, it is considered that the video streams acquired by a group of cameras are synchronized, that is to say that the images of the respective video streams of each camera of the group of cameras are acquired at predetermined time instants.

The method comprises a selection 52 of a group of cameras to be processed, called the current camera group, then an acquisition 54 of synchronized video streams from the current camera group.

Analysis of digital images 56 is performed on the acquired images.

In one embodiment, the acquisition steps 54 and analysis steps 56 follow one another on a plurality of synchronized images extracted from the video streams coming from the cameras of the current camera group, as illustrated in FIG. 3.

In one embodiment, the analysis step 56 implements a skeleton extraction algorithm. Salient image points corresponding to joints are detected, each such point having an associated spatial position, expressed by coordinates (x, y) in the reference frame of the digital image. Each detected joint has an associated joint type, for example: eyes, ears, neck, shoulders, elbows, wrists, hips, knees and ankles. Based on the type of each detected joint, the joints are linked to form a skeleton representative of a person. For example, the algorithm proposed in the article “Realtime multi-person 2D pose estimation using part affinity fields” by Cao et al., Conference on Computer Vision and Pattern Recognition (CVPR) 2017, is used.

Of course, any other method making it possible to form a skeleton representative of a person from points associated with the joints known to a person skilled in the art can be used.

The spatial positions of the joints and their links form descriptors characteristic of objects present in the analyzed digital image.
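One possible in-memory representation of such descriptors is sketched below. It only illustrates the data involved (typed joints with image coordinates, and links between them) and is not the data structure of the cited algorithm; all class, field and joint names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Joint:
    kind: str   # joint type, e.g. "neck" or "left_shoulder" (hypothetical names)
    x: float    # column in the digital image reference frame
    y: float    # row in the digital image reference frame


@dataclass
class Skeleton:
    joints: dict = field(default_factory=dict)  # joint type -> Joint
    links: list = field(default_factory=list)   # pairs of linked joint types

    def add_joint(self, joint: Joint) -> None:
        self.joints[joint.kind] = joint

    def link(self, a: str, b: str) -> None:
        """Link two joints, ignoring links to joints that were not detected."""
        if a in self.joints and b in self.joints:
            self.links.append((a, b))

    def descriptor(self) -> dict:
        """Flatten the skeleton into plain values that can be stored per image."""
        return {
            "joints": {k: (j.x, j.y) for k, j in self.joints.items()},
            "links": list(self.links),
        }


person = Skeleton()
person.add_joint(Joint("neck", 320.0, 110.0))
person.add_joint(Joint("left_shoulder", 290.0, 125.0))
person.link("neck", "left_shoulder")
person.link("left_shoulder", "left_elbow")  # elbow not detected: link skipped
print(person.descriptor())
```

Storing plain tuples and lists in `descriptor()` keeps the stored values independent of the classes used during analysis.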

The descriptor values are stored for each image analyzed.

Next, a situation analysis step 58 is carried out using a fusion of the descriptor values calculated and stored.

For example, in step 58, an event detection algorithm (e.g. detection of a fall or an assault) is implemented as a function of the trajectories of the joints of the different people over a plurality of images.
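To make the idea of reasoning on joint trajectories concrete, here is a deliberately simple heuristic. It is not the detection algorithm of the invention, which the description does not specify: it merely flags a track in which the neck moves down in the image by a large fraction of the person's initial torso height. The threshold `drop_ratio`, the joint names and the function name are all assumptions.

```python
def looks_like_fall(track, drop_ratio=0.8):
    """Flag a possible fall from one person's joint trajectory.

    track is a list of per-image dicts holding the (x, y) image positions of
    the 'neck' and 'hip' joints. The image y axis points downwards, so a fall
    shows up as an increase of the neck's y coordinate.
    """
    if len(track) < 2:
        return False
    first, last = track[0], track[-1]
    torso = abs(first["hip"][1] - first["neck"][1])
    if torso == 0:
        return False
    neck_drop = last["neck"][1] - first["neck"][1]
    # Assumption: a drop of at least drop_ratio torso heights counts as a fall.
    return neck_drop >= drop_ratio * torso


standing = [{"neck": (300, 100), "hip": (300, 200)} for _ in range(5)]
falling = [
    {"neck": (300, 100), "hip": (300, 200)},
    {"neck": (310, 150), "hip": (305, 210)},
    {"neck": (330, 190), "hip": (310, 215)},
]
print(looks_like_fall(standing), looks_like_fall(falling))
```

A practical detector would need to cope with missed joints, people sitting down and camera perspective, which this sketch ignores.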

According to a variant, in step 58, an algorithm is used to estimate the occupancy rate (also called passenger density) of the car of the railway vehicle observed by the current group of cameras, which can be performed on a single digital image of a video stream.

According to another variant, in step 58, an intrusion detection or presence detection algorithm is implemented.

Preferably, the selection of the algorithm implemented in step 58 is carried out according to a state of the train, for example: in service, at the end of service, at the depot.

For example, a configuration file is associated with each state of the train, and stored in memory 16 of the calculation device 10.

Indeed, it is useful to implement an occupancy rate estimation or an event detection when the train is in service, a detection of the presence of people at the end of service when the train goes to the depot, and a presence or intrusion detection when the train is at the depot.
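The state-dependent selection described above can be sketched as a lookup table. In the description this association is held in per-state configuration files; the enum, the algorithm names and the function name below are hypothetical stand-ins.

```python
from enum import Enum


class TrainState(Enum):
    IN_SERVICE = "in service"
    END_OF_SERVICE = "end of service"
    AT_DEPOT = "at depot"


# Hypothetical table mirroring the examples given in the text.
ANALYSES_BY_STATE = {
    TrainState.IN_SERVICE: ["occupancy_estimation", "event_detection"],
    TrainState.END_OF_SERVICE: ["presence_detection"],
    TrainState.AT_DEPOT: ["presence_detection", "intrusion_detection"],
}


def select_analyses(state: TrainState) -> list:
    """Return the candidate situation analyses for the given train state."""
    return ANALYSES_BY_STATE[state]


depot_analyses = select_analyses(TrainState.AT_DEPOT)
print(depot_analyses)
```

Each entry lists candidates; which of them actually runs in step 58 would still be a configuration choice.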

For certain situation analyses, for example in the event of a positive event detection in step 58, steps 54 to 58 are repeated for the current group of cameras.

In the event of a negative event detection, or if the situation analysis step implements another situation analysis that does not require particular vigilance, step 58 is followed by a step 60 of selecting the next group of cameras to be processed in the series of camera groups. The result of the situation analysis is transmitted in step 62 to a client application.

In addition, in the event of a positive detection, an alert is raised in step 62; for example, an alert is sent to a control center or to the train driver. Thus, advantageously, when an incident is detected, it can be handled rapidly.

Advantageously, the video surveillance system and the video surveillance method described make it possible to carry out one or more situation analyses on board a railway vehicle, taking into account images captured by several cameras. Thanks to the processing by groups of cameras, it is possible to carry out situation analyses for the entire railway vehicle, in a manner compatible with railway constraints.

Claims

1.- Video surveillance system on board a rail vehicle, comprising a plurality of cameras (4_1, 4_i, 4_N) for video surveillance on board the rail vehicle (2), each camera (4_1, 4_i, 4_N) having a spatial position and an associated field of view, each camera being adapted to capture a video stream composed of a succession of digital images, the system (1) further comprising an on-board electronic computing device (10), characterized in that:

- said on-board electronic calculation device (10) comprises at least one calculation acceleration device (14),

and in that each camera (4_1, 4_i, 4_N) is connected to the on-board electronic computing device (10) and is adapted to transmit captured video streams to said electronic computing device (10),

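the electronic calculation device (10) being adapted to obtain a division of the plurality of video surveillance cameras into a series of groups of cameras to be processed successively over time, according to a circular repetition, and for each group of cameras processed successively: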

a) obtain (20) video streams from said group of cameras, each video stream containing at least one digital image,

b) apply (22) an analysis of digital images to calculate values of descriptors characteristic of objects present in the or each image of each video stream originating from a camera of said group of cameras,

c) apply (24) at least one situation analysis algorithm using the descriptor values calculated for at least one image of each video stream of said group of cameras processed to obtain a situation analysis result in the field of view of the cameras of said group of cameras.

2.- Video surveillance system according to claim 1, in which said analysis of digital images implements, in each digital image of a video stream, a detection of persons and/or a skeleton extraction associated with a person present in the vehicle.

3.- Video surveillance system according to claim 2, in which the values of characteristic descriptors include spatial positions, in a frame of reference associated with the digital image, of joints associated with a person present in the scene.

4.- Video surveillance system according to any one of claims 1 to 3, on board a rail vehicle comprising a plurality of cars, each camera group being formed of cameras installed on board a car of the vehicle.

5.- System according to any one of claims 1 to 4, adapted to implement several different situation analysis algorithms depending on a state of the rail vehicle.

6.- System according to any one of claims 1 to 5, in which the situation analysis algorithm implements an abnormal event detection, and in the event of positive detection for a current group of cameras, the system is suitable for continuing to process video stream images from the current camera group.

7.- System according to any one of claims 1 to 6, further adapted to raise an alert after applying the situation analysis.

8.- System according to any one of claims 1 to 7, comprising a timing module (26) adapted to select a next group of video surveillance cameras to be processed, according to a situation analysis result.

9.- Video surveillance method implemented by a video surveillance system in accordance with one of claims 1 to 8,

comprising steps consisting in:

obtaining (50) a division of the plurality of video surveillance cameras into a series of groups of cameras to be processed successively over time, according to a circular repetition, and for each group of cameras processed successively:

a) obtaining (54) video streams from said group of cameras, each video stream containing at least one digital image,

b) applying (56) an analysis of digital images to calculate values of descriptors characteristic of objects present in the or each image of each video stream originating from a camera of said group of cameras,

c) applying (58) at least one situation analysis algorithm using the descriptor values calculated for at least one image of each video stream of said group of cameras processed to obtain a situation analysis result in the field of view of the cameras of said group of cameras.

10.- Method according to claim 9, further comprising a step of selecting a next group of cameras to be processed.

11.- Computer program comprising software instructions which, when executed by a programmable electronic device, implement a video surveillance method according to claim 9.