GB2602357A - Method and apparatus for processing a video stream - Google Patents


Info

Publication number
GB2602357A
Authority
GB
United Kingdom
Prior art keywords
extracted
video
content
camera
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2100318.1A
Other versions
GB202100318D0 (en)
Inventor
Balzer Arnim
Whitby Mike
Soghbatyan Tigran
Kavoukis Anastasios
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seechange Technologies Ltd
Original Assignee
Seechange Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seechange Technologies Ltd filed Critical Seechange Technologies Ltd
Publication of GB202100318D0
Priority to PCT/GB2021/052470 (WO2022136818A1)
Publication of GB2602357A
Pending legal-status Critical Current


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/4223 Cameras
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A method and associated apparatus of processing a video stream comprising, at a first device configured to access a first camera, i) receiving a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera, 102; ii) extracting one or more of the video frames from the video stream; iii) for each of the extracted video frames, determining whether to either analyse the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content, or transmit the extracted video frame to a second device for analysis of the content of the extracted video frame by the second device, 106; and iv) performing one of the analysis, 108, and the transmission, 110, in dependence on the determination. Also disclosed is a method and associated apparatus for processing a video stream at the second device and a system comprising the first and second device.

Description

METHOD AND APPARATUS FOR PROCESSING A VIDEO STREAM
Technical Field
The present invention relates to a method and apparatus for processing a video stream.
Background
The analysis of video content to determine characteristics of a scene represented by the content is known. Such characteristics may include features, objects, or higher-level inferences of the scene represented by the content. For example, feature detection models may be applied to digital images to detect features, for example to detect lines, edges, ridges, corners, blobs, textures, shapes, gradients, regions, boundaries, surfaces, volumes, colours and/or shadings. Object recognition may be achieved, for example, by applying a process of comparing stored representations of objects of interest to detected features and applying a matching rule for determining a match between the stored representations and the features. Such object recognition may utilise a data store of pre-specified objects when trying to identify an object represented by an image. For example, an object recognition model may group a set of detected features as a candidate object in a given scene and refer to the data store of pre-specified objects in order to identify the object. The detected features and/or objects may be used to infer information about the scene represented by content of the video stream, such as spatial models, lists of objects, tracking of objects through space, estimation of the motion of objects, detection of events, and/or recognition of gestures.
Such analysis of video content has many uses. As one example, video from cameras installed in a shop may be analysed to determine a list of items that a particular shopper has placed into their basket. This list of items may be used to charge the shopper for the items, for example as compared to needing to scan the items at a checkout. However, there are practical challenges to implementing such analysis of video content. For example, such analysis of video content can be relatively computationally intensive. Moreover, it can be advantageous for certain applications that such analysis be conducted in real-time or near real-time and/or for a relatively high video frame rate when needed, for example to ensure that the analysis does not miss an item being placed in a shopper's basket. This can compound the computational demands. Large computational power can be provided by the use of high-performance CPU/GPUs or cloud computing platforms, but the use of such may also have drawbacks, for example in terms of cost.
Summary
According to a first aspect of the present invention, there is provided a method of processing a video stream, the method comprising, at a first device configured to access a first camera: receiving a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extracting one or more of the video frames from the video stream; determining, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content, or (b) transmit the extracted video frame to a second device for analysis of the content of the extracted video frame by the second device, the second device being configured to access a second camera; and performing one of the analysis and the transmission in dependence on the determination.
According to a second aspect of the present invention, there is provided a method of processing a video stream, the method comprising, at a second device configured to access a second camera: receiving, from a first device configured to access a first camera, one or more video frames the received one or more video frames having been extracted by the first device from a video stream received by the first device from the first camera, the video stream comprising a temporal sequence of said video frames, the content of each of the one or more video frames representing a scene captured by the first camera; and for each of the received one or more video frames, analysing the content of the video frame to determine one or more characteristics of the scene represented by the content.
According to a third aspect of the present invention, there is provided a method of processing a video stream, the method comprising: receiving, at a first device configured to access a first camera, a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extracting, at the first device, one or more of the video frames from the video stream; determining, at the first device, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content, or (b) transmit the extracted video frame to a second device for analysis of the content of the extracted video frame by the second device, the second device being configured to access a second camera; if it is determined to analyse the extracted video frame at the first device, then analysing the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content; and if it is determined to transmit the extracted video frame to the second device, then transmitting the extracted video frame to the second device and analysing the content of the extracted video frame at the second device to determine one or more characteristics of the scene represented by the content.
According to a fourth aspect of the present invention, there is provided a device for processing a video stream, the device being configured to access a first camera, the device comprising a processing unit configured to: receive, via a first interface, a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extract one or more of the video frames from the video stream; determine, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content, or (b) transmit, via a second interface, the extracted video frame to another device for analysis of the content of the extracted video frame by the other device, the other device being configured to access a second camera; and; perform one of the analysis and the transmission in dependence on the determination.
According to a fifth aspect of the present invention, there is provided a device for processing a video stream, the device being configured to access a second camera, the device comprising a processing unit configured to: receive, via an interface, from another device, one or more video frames, the other device being configured to access a first camera, the received one or more video frames having been extracted by the other device from a video stream received by the other device from the first camera, the video stream comprising a temporal sequence of said video frames, the content of each video frame representing a scene captured by the first camera; and for each of the received one or more video frames, analyse the content of the video frame to determine one or more characteristics of the scene represented by the content.
According to a sixth aspect of the present invention, there is provided a system comprising a first device and a second device connected by a network, the first device being configured to access a first camera, the second device being configured to access a second camera; wherein the first device comprises a first processing unit configured to: receive, via a first interface, a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extract one or more of the video frames from the video stream; determine, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame to determine one or more characteristics of the scene represented by the content, or (b) transmit, via a second interface, the extracted video frame to the second device for analysis of the content of the extracted video frame by the second device; and perform one of the analysis and the transmission in dependence on the determination; and wherein the second device comprises a second processing unit configured to: receive, from the first device, via a third interface, one or more of the extracted video frames transmitted to the second device by the first device; and for each of the one or more extracted video frames received by the second device, analyse the content of the video frame to determine one or more characteristics of the scene represented by the content.
Further features and advantages of the invention will become apparent from the following description of preferred embodiments of the invention, given by way of example only, which is made with reference to the accompanying drawings.
Brief Description of the Drawings
Figure 1 is a flow diagram illustrating schematically a method according to an example; Figure 2 is a flow diagram illustrating schematically a method according to an example; Figure 3 is a schematic diagram illustrating a system according to an example; Figure 4 is a schematic diagram illustrating a device according to an example; and Figure 5 is a schematic diagram illustrating an apparatus according to an example.
Detailed Description
Referring to Figure 1, there is illustrated a method of processing a video stream at a first device, according to an example.
Referring briefly to Figure 3, there is illustrated an example system 300 in which examples of the method illustrated in Figure 1 may be implemented.
The system 300 comprises a first device 304, a second device 306, and a third device 308.
The first device 304 is configured to access a first camera 314a. That is, the first device 304 is associated with and configured to process video streams produced by the first camera 314a. In this example, the first device is also configured to access a second camera 314b. The first device 304 is configured to receive video streams from the first camera 314a and the second camera 314b.
The second device 306 is configured to access a third camera 316a. That is, the second device 306 is associated with and configured to process video streams produced by the third camera 316a. The second device 306 is configured to receive video streams from the third camera 316a and the second camera 314b.
The first device 304 and the second device 306 are part of and connected over a first network 310. In this example the first network 310 is a local network 310, such as a Local Area Network (LAN) 310. The local network 310 is, in turn, connected to a wider computer network 302, for example, a Wide Area Network (WAN) 302, for example the internet 302.
The third device 308 is also connected to and part of the first network 310. The third device 308 as illustrated in Figure 3 is not connected to any camera but is configured such that it may at some later time be connected to an additional camera (not shown) in order to process video streams produced by that camera (not shown). In this sense, the third device 308 may be configured to access the further camera whether or not that camera or other device is presently connected to the third device 308.
In the illustrated example, the local network 310 and/or the devices 304, 306, 308 are located at the edge or extremity of the wider computer network 302 such as the internet 302. This is for example as compared to being located within the wider computer network 302. As such, the local network 310 and/or the devices may be said to be at the network edge. As described in more detail below, each device 304, 306, 308 analyses content of video frames extracted from video streams produced by the camera(s) with which it is associated to determine one or more characteristics of a scene represented by the content. For example, this analysis may comprise the device 304, 306, 308 applying a trained machine learning or artificial intelligence model to the content, for example to detect one or more features and/or recognise one or more objects of the scene represented by the content. As such, in these examples, the devices 304, 306, 308 may be said to provide analysis, such as trained machine learning or artificial intelligence analysis, of the content of the extracted video frames at the network edge. This is for example as compared to providing such analysis in the wider computer network 302, for example at a cloud computing platform. Providing such analysis at the network edge may allow advantages such as relatively lower latency, routing overheads, and bandwidth costs.
In some examples, the devices 304, 306, 308, as well as the cameras 314a, 314b, 316a that the devices 304, 306, 308 are configured to access, are part of and connected over the first, local, network 310. In these examples, the first camera 314a and the second camera 314b may transmit video streams produced thereby to the first device 304 over the local network 310, and the third camera 316a may transmit the video stream produced thereby to the second device 306 over the local network 310.
In other examples, one or more of the devices 304, 306, 308 may provide a gateway between a network comprising the camera(s) 314a, 314b, 316a associated with the device 304, 306, 308 and the first, local, network 310.
For example, the first device 304 may be a first gateway 304 between a second network 311 comprising the first camera 314a and the second camera 314b, and the first network 310. In this case, the first gateway 304 may be configured to receive video streams from the first camera 314a and the second camera 314b over the second network 311. A gateway may be generally defined as a network node that connects one discrete network to another discrete network. For example, a gateway may provide interoperability between different communication protocols used by the discrete networks. In this sense, the first gateway 304 may connect the second network 311 and the first network 310 which may be discrete from the second network 311. In particular, the first gateway 304 may act to connect the second network 311 comprising the first camera 314a and/or the second camera 314b and over which video streams are transmitted to the first gateway 304, to the local network 310 and/or the wider network 302 beyond.
As another example, the second device 306 may be a second gateway 306 between a third network 313 and the first, local, network 310. In these examples, the third network 313 comprises the third camera 316a. The second gateway 306 is connected to the third camera 316a and is configured to receive a video stream from the third camera 316a over the third network 313. The second gateway 306 connects the third network 313 to the first network 310, which may be discrete from the third network 313. In particular, the second gateway 306 may act to connect the third network 313 comprising the third camera 316a and over which video streams are transmitted to the second gateway 306, to the local network 310 and/or the wider network 302 beyond.
As another example, the third device 308 may be a third gateway 308 connected to the first network 310. As mentioned above, in the illustrated example the third device 308 is not connected to any camera but may be configured such that it may at some later time be connected to an additional camera (not shown) over a fourth network (not shown) in order to provide gateway services for the additional camera, similarly to as described above for the first gateway 304 and/or the second gateway 306. In this sense, the third gateway 308 may be configured as a gateway between the first network 310 and the fourth network (not shown) whether or not a camera or other device is presently connected to the fourth network (not shown).
Returning again to Figure 1, the method is performed at a device configured to access a camera. For conciseness, the method illustrated in Figure 1 will be described as being performed by the first device 304 configured to access the first camera 314a but it will be appreciated that the method may be performed at any such device configured to access a camera, for example the second device 306 of Figure 3.
The method comprises, in step 102, receiving a video stream from the first camera 314a, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera 314a.
For example, the first camera 314a may be configured to capture digital images of a scene, apply a timestamp to each image, and stream the images (i.e. transmit the images as they are captured) to the first device 304. In this sense, the captured and transmitted digital images constitute a temporal sequence of frames of a video stream. As mentioned above, in some examples the first camera 314a may transmit the video stream to the first device 304 over a second network 311, and in other examples the second network 311 may in effect be part of the first, local, network 310 and the first camera 314a may transmit the video stream to the first device 304 over the first, local, network 310.
In some examples, each video frame of the stream may comprise a header and the content. For example, the header may comprise the timestamp, information on the camera 314a that captured the video frame, a stream identifier and/or a sequence number of the video frame. The content of the video frame comprises the image data captured by the camera 314a, and represents the scene captured by the camera 314a. For example, the content of the video frame may comprise pixel values for the digital image captured by the camera 314a.
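By way of a non-limiting illustration only, the following Python sketch shows one way such a frame structure might be represented in practice; the class and field names are assumptions introduced for this example and are not taken from the application.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FrameHeader:
    timestamp: float      # timestamp applied by the camera to the captured image
    camera_id: str        # information identifying the camera that captured the frame
    stream_id: str        # identifier of the video stream
    sequence_number: int  # position of the frame within the stream


@dataclass
class VideoFrame:
    header: FrameHeader
    content: np.ndarray   # pixel values of the captured digital image, e.g. H x W x 3


frame = VideoFrame(
    header=FrameHeader(timestamp=1609459200.0, camera_id="camera-314a",
                       stream_id="aisle-3", sequence_number=42),
    content=np.zeros((720, 1280, 3), dtype=np.uint8),
)
```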
The scene captured by the camera 314a may be, for example, an aisle of a shop.
In this case, the content of each video frame may represent the scene of the aisle of the shop at a moment in time. Taken together, the content of each frame of the temporal sequence of video frames may represent the scene of the aisle of the shop over a period of time, for example represent a person walking through the aisle or interacting with objects on a shelf of the aisle of the shop.
In any case, in step 102, the video stream is received at the first device 304 from the camera 314a.
The method comprises, in step 104, extracting one or more of the video frames from the video stream. For example, the one or more video frames may be extracted by a frame grabber of the first device 304.
In some examples, all of the video frames of the video stream may be extracted from the video stream.
In other examples, less than all of the video frames of the video stream may be extracted from the video stream. For example, only a certain proportion of the video frames of the video stream may be extracted. For example, if the video stream has a frame rate of 20 frames per second (fps), and the video frames are extracted from the video stream at a rate of 10 fps, then half of the video frames of the video stream will be extracted. The extraction in step 104 may comprise extracting one video frame for every predetermined number of video frames of the video stream received, for example, extracting one frame out of every two frames received. The proportion of frames extracted from the video stream may be set or determined, for example, based on a desired frame rate for analysed frames, i.e. frames the content of which is to be analysed. For example, it may be determined that an analysis of the content of the frames at a frame rate of 10 fps is desired in order to reliably detect actions of the scene represented by the content, and hence it may be determined that frames of the video stream are to be extracted from the video stream at a rate of 10 fps.
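The proportional extraction described above can be illustrated with a short, hedged sketch; the function name and the use of a simple index step are illustrative assumptions rather than an implementation of the claimed extraction.

```python
def extract_frames(frames, stream_fps, target_fps):
    """Yield roughly target_fps frames per second from a stream_fps video stream."""
    step = max(1, round(stream_fps / target_fps))
    for index, frame in enumerate(frames):
        if index % step == 0:
            yield frame


# With a 20 fps stream and a desired analysis rate of 10 fps, every second frame is kept.
extracted = list(extract_frames(range(20), stream_fps=20, target_fps=10))
assert len(extracted) == 10
```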
The method comprises, in step 106, determining, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device 304 to determine one or more characteristics of the scene represented by the content, or (b) transmit the extracted video frame to another device 306, 308 for analysis of the content of the extracted video frame by the other device 306, 308, the other device 306, 308 being configured to access another camera 316a. For example, the other device may be the second device 306 or the third device 308 described with reference to Figure 3.
The method comprises performing one of the analysis (step 108) and the transmission (step 110) in dependence on the determination. That is, if it is determined in step 106 for a given extracted video frame to analyse the content of the extracted video frame at the first device 304 to determine one or more characteristics of the scene represented by the content (i.e. the route labelled 'analyse' in Figure 1), then the method moves to step 108 in which, for the given extracted video frame, the content of the extracted video frame is analysed at the first device 304 to determine one or more characteristics of the scene represented by the content. However, if it is determined in step 106 for a given extracted video frame to transmit the extracted video frame to another device 306, 308 for analysis of the content of the extracted video frame by the other device 306, 308 (i.e. the route labelled 'transmit' in Figure 1) then the method moves to step 110 in which, for the given extracted video frame, the extracted video frame is transmitted to the other device 306, 308 for analysis of the content of the extracted video frame by the other device 306, 308.
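The following sketch illustrates, in outline only, how steps 106, 108 and 110 might be arranged as a per-frame dispatch loop; the decision, analysis and transmission functions are stand-ins introduced for this example, not an implementation of the claimed method.

```python
def process_extracted_frames(frames, should_analyse_locally, analyse, transmit):
    """Step 106: decide per frame; step 108: local analysis; step 110: transmission."""
    for frame in frames:
        if should_analyse_locally(frame):
            yield ("local", analyse(frame))
        else:
            transmit(frame)
            yield ("remote", None)


# Tiny demonstration with stand-in functions: even-numbered frames are analysed locally.
results = list(process_extracted_frames(
    frames=range(4),
    should_analyse_locally=lambda f: f % 2 == 0,
    analyse=lambda f: {"frame": f, "characteristics": []},
    transmit=lambda f: None,
))
assert [kind for kind, _ in results] == ["local", "remote", "local", "remote"]
```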
Providing that extracted video frames can be either analysed at the first device 304 or transmitted to another device 306, 308 for analysis may help allow that the extracted frames can be analysed in real-time or near real-time and/or at a relatively high frame rate when needed, but without requiring each device 304, 306, 308 to have a powerful CPU/GPU and/or without the need to use cloud computing platforms. For example, this may allow that each device 304, 306, 308 may be provided by relatively inexpensive hardware, for example a Raspberry Pi® or NVIDIA® Jetson Nano™ Developer Kit processing device.
As an example, if it is determined at the first device 304 that, at a given time, the processing speed or capacity at the first device 304 is sufficient to analyse the extracted frames in real-time or near real-time at a specified desired frame rate, then all of the frames extracted at the first device 304 may be analysed at the first device 304. However, if it is determined at the first device 304 that, at a given time, the processing speed or capacity at the first device 304 will not be sufficient to analyse all of the extracted frames in real-time or near real-time at the specified desired frame rate, the first device 304 may determine to transmit some of the extracted frames to the second device 306 (or third device 308) for analysis instead by the second device 306 (or third device 308, respectively). This may help ensure that extracted frames can still be analysed in real-time or near real-time at the specified desired frame rate but without having to include a relatively powerful and hence expensive CPU/GPU at the first device 304 and/or without needing to rely on relatively expensive cloud computing platforms.
As described above, the first device 304 and the other device 306, 308 may be part of a local network 310. In other words, the first network 310 may be a local network 310 such as a LAN 310. The transmission of the extracted video frame to the other device 306, 308 in step 110 may be over, for example solely over, the local network 310, for example the LAN 310. The network over which the extracted frames are transmitted being a local network 310 may allow that the extracted video frames, which can be of a relatively large size, can be transmitted quickly, efficiently and/or at relatively low cost, to the other devices 306, 308. This may be as compared to, for example, transmitting extracted video frames over a wider network 302 such as the internet 302 to a cloud computing platform for analysis, which may involve relatively high latency, routing overheads, and bandwidth costs.
The determination in step 106 may be based on one or more of a variety of factors. Some example factors are outlined in the following.
In some examples, the determination in step 106 of whether to analyse a given extracted frame at the first device 304 or transmit the extracted frame for analysis by another device 306, 308 may be based on an estimate of whether the analysis of the content of the extracted frame at the first device 304 would occur at a frame rate greater than or equal to a given specified rate at which the content of extracted frames is to be analysed. For example, if it is determined or estimated that the first device 304 can or will analyse the extracted frame at a frame rate greater than or equal to the given specified rate at which extracted frames are to be analysed (i.e. at a high enough frame rate to match or better the desired analysis rate) then the extracted frame may be analysed at the first device 304. Analysis at the first device 304 may be preferable as this avoids overheads associated with transmitting the extracted frame to another device 306, 308. However, if it is determined or estimated that the first device 304 can or will be able to analyse the extracted frame only at a frame rate lower than the given specified rate (i.e. cannot or will not be able to analyse the extracted frame at a high enough frame rate to match or better the desired analysis rate), or otherwise, then the extracted frame may be transmitted to the other device 306, 308 for analysis.
It may be desirable to analyse the content of frames of the video stream at a given specified frame rate, for example to ensure that events of the scene represented by the video stream are not missed. However, in order to provide analysis results for the content of the extracted frames in real-time or near real-time, it may be desirable that the rate at which the extracted frames are analysed is not lower than the rate at which the extracted frames are sent for analysis. If the analysis rate does fall below the rate at which frames are sent for analysis, a back-log of extracted frames to be analysed may develop, and the analysis results may not be provided in real-time or near real-time. Basing the determination in step 106 on an estimate of whether the analysis frame rate at the first device 304 is or will be greater than or equal to the specified frame rate may help prevent a back-log occurring at the first device 304 and hence may help the analysis results to be provided in real-time or near real-time, while minimising transmission of extracted frames to other devices 306, 308 and hence minimising overheads associated with such transmission.
In some examples, the determination in step 106 may be based on one or more of: a rate at which the video frames are extracted from the video stream; the specified rate at which each extracted frame is to be analysed; and a maximum rate at which the first device 304 is able to analyse the content of the extracted frame. For example, it may be specified that the rate at which the video frames are extracted from the video stream and/or the rate at which the extracted frames are to be analysed is a certain number of frames per second, e.g. 9 fps. It may be known or estimated that the maximum rate at which the first device 304 is able to analyse the content of the extracted frames is lower than the specified rate, for example 6 fps. The determination to transmit a given extracted frame to the other device 306, 308 may be based at least in part on a determination that the specified rate is higher than the maximum rate.
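As an illustration of the rate comparison described above, the sketch below derives the fraction of extracted frames that would need to be transmitted elsewhere; the formula is an assumption made for this example rather than one given in the application.

```python
def fraction_to_offload(specified_fps, device_max_fps):
    """Fraction of extracted frames that cannot be analysed locally at the specified rate."""
    if device_max_fps >= specified_fps:
        return 0.0
    return 1.0 - device_max_fps / specified_fps


# With the figures used above (9 fps specified, 6 fps achievable locally), one third of
# the extracted frames would need to be transmitted to another device for analysis.
assert abs(fraction_to_offload(9, 6) - 1 / 3) < 1e-9
```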
In some examples, the determination in step 106 may be based at least in part on a processing load parameter of the first device 304. For example, the processing load parameter may indicate a current or expected processing load of a processing unit at the first device 304. For example, the expected processing load may be based on a processing task queue of the processing unit. The determination to transmit the extracted frame to the other device 306, 308 may be in response to a determination that the processing load parameter is larger than a threshold value.
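A minimal sketch of a queue-based load parameter compared against a threshold is given below; the queue contents, maximum queue length and threshold value are all illustrative assumptions.

```python
from collections import deque


def expected_load(task_queue, max_queue_length=32):
    """Expected processing load expressed as the fill level of the pending-task queue."""
    return len(task_queue) / max_queue_length


def should_transmit_by_load(task_queue, threshold=0.75):
    """Transmit the frame when the load parameter exceeds the threshold value."""
    return expected_load(task_queue) > threshold


pending_tasks = deque(range(30))  # 30 analysis tasks already queued at the first device
assert should_transmit_by_load(pending_tasks) is True
```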
In some examples, the determination in step 106 may be based at least in part on an identifier of the video stream from which the video frame is extracted. For example, the identifier may be obtained from the header of the extracted video frame of the stream. It may be known, for example pre-programmed, that video frames from the identified video stream are such that the first device 304 will be unable to analyse content of the frames at the specified rate. For example, the video frames from the identified video stream may be particularly large in size. The determination to transmit a given extracted frame to the other device 306, 308 may be based at least in part on a determination that the identifier of the video stream matches a predefined identifier.
In some examples, the determination in step 106 may be based on the position of the extracted frame in a sequence of extracted video frames to be analysed. For example, the extraction in step 104 may comprise extracting a sequence of the video frames from the video stream, and the determination in step 106 may comprise: for a first video frame in the sequence, determining that the video frame is to be analysed at the first device 304; and for a second video frame in the sequence, determining that the video frame is to be transmitted to the other device 306, 308 for analysis by the other device 306, 308. The determination for each of the video frames in the sequence may follow a predetermined pattern that is repeated for successive sequences of video frames extracted from the video stream. As an example, a sequence of extracted video frames may comprise three successive frames, and if the given extracted frame is first or second in the sequence then the given frame may be analysed at the first device 304, whereas if the given frame is third in the sequence then the given frame may be transmitted to the other device 306, 308 for analysis. This pattern may be repeated for successive sequences of extracted video frames. The pattern may be based on one or more distribution rules. For example, a distribution rule may specify that one extracted frame is to be transmitted to the other device or devices 306, 308 for analysis for every given number of extracted frames analysed at the first device 304. As another example, a distribution rule may specify that the first device 304 and the other device or devices 306, 308 each analyse the content of an equal number of extracted frames.
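A repeating distribution pattern of the kind described above might be sketched as follows; the "two local, one remote" pattern simply mirrors the three-frame example given in the preceding paragraph.

```python
from itertools import cycle


def pattern_decisions(pattern=("local", "local", "remote")):
    """Endlessly repeat a per-sequence decision pattern over successive extracted frames."""
    return cycle(pattern)


decisions = pattern_decisions()
first_six = [next(decisions) for _ in range(6)]
assert first_six == ["local", "local", "remote", "local", "local", "remote"]
```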
In some examples, the determination in step 106 may be based at least in part on a processing load parameter of the other device or devices 306, 308. For example, the other device or devices 306, 308 may periodically transmit a processing load parameter to the first device 304. In some examples, the determination in step 106 may be based at least in part on a comparison of the processing load parameter of the first device 304 with the processing load parameter of the other device or devices 306, 308. For example, the determination in step 106 to transmit the extracted frame to the other device 306, 308 for analysis may be based at least in part on a determination that the processing load parameter of the other device 306 is lower than the processing load parameter of the first device 304.
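The comparison of load parameters might be sketched as below; the device names and load values are illustrative assumptions.

```python
def choose_less_loaded_device(local_load, reported_loads):
    """Return the least-loaded other device if it is less loaded than this device, else None."""
    if not reported_loads:
        return None
    device, load = min(reported_loads.items(), key=lambda item: item[1])
    return device if load < local_load else None


assert choose_less_loaded_device(0.9, {"device-306": 0.4, "device-308": 0.6}) == "device-306"
assert choose_less_loaded_device(0.2, {"device-306": 0.4, "device-308": 0.6}) is None
```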
It will be appreciated that the determination in step 106 may be based, alternatively or additionally, on other example parameters, for example parameters indicative of the ability of the device 304 and/or the other device or devices 306, 308 to perform the analysis of the content of the extracted frame.
In some examples, the determination in step 106 may be based at least in part on an operational state of an analysis engine of the device 304 that would perform the analysis in step 108 on the extracted frame. For example, if it is determined that the analysis engine of the device 304 that would perform the analysis in step 108 on the extracted frame is failing or has failed or is non-responsive, then it may be determined in step 106 to transmit the extracted frame to the other device 306, 308 for analysis as in step 110. For example, the extracted frame may be transmitted as in step 110 to another device 306, 308 for which it is determined that the analysis engine that would perform the analysis at the other device 306, 308 is operational. This may provide for system redundancy.
In some examples, the determination in step 106 may be based on the results of a pre-analysis of the content of the extracted video frame at the first device 304.
For example, the pre-analysis may be performed on the content of the extracted video frame at the first device 304, which pre-analysis is relatively computationally inexpensive and/or fast compared to the analysis performed in step 108 by the first device 304 or the analysis performed by the other device 306, 308 as a result of the extracted video frame being transmitted to the other device 306, 308 in step 110. For example, the pre-analysis may comprise applying a relatively fast and/or relatively computationally inexpensive AI model to the extracted frame. In some examples, the results of the pre-analysis may be used to determine, in step 106, whether to perform the analysis in step 108 on the extracted frame or transmit the extracted frame to another device 306, 308 as in step 110.
In some examples, the analysis performed by the first device 304 in step 108 may be to determine, for example specialised to determine, a first set of characteristics, whereas the analysis performed by the other device 306, 308 to which the frame would be transmitted in step 110 may be to determine, for example specialised to determine, a second set of characteristics. For example, the analysis performed by the first device 304 may be specialised to determine characteristics of cars, whereas the analysis performed by the other device 306, 308 may be specialised to determine characteristics of people. The pre-analysis may determine in a relatively quick and/or computationally inexpensive way a type of scene represented by the content, and the determination in step 106 may be made based on the determined type of scene. For example, if it is determined from the pre-analysis that the scene includes or primarily includes cars, it may be determined to perform the analysis in step 108 at the first device 304, whereas if it is determined from the pre-analysis that the scene includes or primarily includes people, it may be determined to transmit the extracted frame as in step 110 for the analysis to be performed by the other device 306, 308.
As another example, the pre-analysis may comprise an estimation of the time the analysis that would be performed by the first device in step 108 may take to complete. The pre-analysis may determine in a relatively quick and/or computationally inexpensive way an estimate of the time the analysis that would be performed by the first device in step 108 may take to complete, and the determination in step 106 may be made based on the estimated time. For example, the pre-analysis may be applied to the content of the extracted video frame to determine a type of content of the extracted video frame, and based on the determined type of content an estimate of the time the analysis that is to be performed by the first device in step 108 may take to complete may be made. For example, if the pre-analysis determines the content of the image is relatively complex, for example is estimated to include many objects to be analysed, it may be estimated that the analysis that is to be performed by the first device in step 108 may take a relatively long time to complete. The estimated time may be used to determine, in step 106, whether to perform the analysis in step 108 on the extracted frame or transmit the extracted frame to another device 306, 308 as in step 110. For example, if the estimated time is larger than a predefined threshold, it may be determined to transmit the extracted frame to another device 306, 308 as in step 110.
As another example, the determination in step 106 may be based on the estimated time and the specified rate at which each extracted frame is to be analysed. For example, if the estimated time corresponds to an analysis rate (i.e. the reciprocal of the estimated time) that is greater than or equal to the specified rate at which each extracted frame is to be analysed, then it may be determined in step 106 to perform the analysis in step 108 on the extracted frame, whereas if the estimated time corresponds to an analysis rate that is less than the specified rate at which each extracted frame is to be analysed, then it may be determined in step 106 to transmit the extracted frame to another device 306, 308 as in step 110.
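The time-estimate route described above might be sketched as follows; the per-object timing figures used to form the estimate are assumptions for illustration only.

```python
def estimate_analysis_time(num_candidate_objects, base_seconds=0.05, seconds_per_object=0.02):
    """Cheap pre-analysis estimate of full-analysis time from a count of candidate objects."""
    return base_seconds + seconds_per_object * num_candidate_objects


def should_transmit_by_estimate(num_candidate_objects, specified_fps):
    """Transmit when the estimated achievable analysis rate falls below the specified rate."""
    estimated_rate = 1.0 / estimate_analysis_time(num_candidate_objects)
    return estimated_rate < specified_fps


# A simple frame (2 candidate objects) can be analysed locally at 10 fps; a busy one cannot.
assert should_transmit_by_estimate(2, specified_fps=10) is False
assert should_transmit_by_estimate(20, specified_fps=10) is True
```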
In any case, the determination in step 106 is made.
In some examples, the first device 304 may perform a discovery process on the local network 310 to identify each of the other devices 306, 308 on the local network 310. The discovery process may include discovering a network address, a processing capacity, an availability of the other devices 306, 308 for analysing content of extracted video frames transmitted from the first device 304, an operational status of the other devices 306, 308, and/or a type of analysis performed on video frames by the other devices 306, 308. For example, the processing capacity may be indicated by a processing load parameter as described above. Alternatively or additionally, the other devices 306, 308 may periodically broadcast or transmit such information to the first device 304.
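The information gathered by such a discovery process might be held in a simple registry, sketched below; the addresses, capacities, loads and analysis types shown are illustrative values, not details taken from the application.

```python
from dataclasses import dataclass


@dataclass
class PeerDevice:
    address: str        # network address on the local network
    max_fps: float      # processing capacity expressed as a maximum analysis rate
    load: float         # most recently reported processing load parameter
    available: bool     # availability for analysing frames transmitted to it
    analysis_type: str  # type of analysis the device performs on video frames


registry = {
    "device-306": PeerDevice("192.168.1.12", max_fps=8.0, load=0.3, available=True,
                             analysis_type="cars"),
    "device-308": PeerDevice("192.168.1.13", max_fps=6.0, load=0.7, available=True,
                             analysis_type="people"),
}

available_peers = [name for name, peer in registry.items() if peer.available]
assert available_peers == ["device-306", "device-308"]
```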
In some examples, the transmission in step 110 may comprise determining to which one of a plurality of other devices 306, 308 the extracted video frame is to be transmitted.
In some examples, the determination of to which other device 306, 308 the extracted frame is to be transmitted may be based on a processing load parameter of each of the plurality of other devices 306, 308. For example, the extracted frame may be transmitted to that other device 306, 308 that has the lowest processing load parameter.
In some examples, the determination of to which other device 306, 308 the extracted frame is to be transmitted may be based on a position of the video frame in a sequence of video frames to be analysed. For example, there may be a sequence of extracted frames to be analysed, and if the given extracted frame is first in the sequence then the given frame may be analysed at the device 304, if the given frame is second in the sequence then the given frame may be transmitted to a first of the other devices 306 for analysis, and if the given frame is third in the sequence then the given frame may be transmitted to a second of the other devices 308 for analysis. This pattern may be repeated for successive sequences of extracted video frames. This may be referred to as a 'round robin' distribution of the analysis of the extracted frames. The pattern may be based on one or more distribution rules, for example as described above.
The number of other devices 306, 308 to which the extracted frame may be transmitted by the device 304 may be scaled in dependence on the specified frame rate at which each extracted frame is to be analysed, the maximum rate at which each other device 306, 308 is able to analyse the content of an extracted frame and/or the processing load parameter of each of the other devices 306, 308. For example, the higher the specified frame rate, the lower the maximum rate at which each other device 306, 308 is able to analyse the content, and/or the higher the processing load parameter of each of the other devices 306, 308, then the larger the number of other devices 306, 308 to which the extracted frame may be transmitted (e.g. the larger the number of frames included in a given sequence and the larger the number of other devices 306, 308 included in the distribution pattern).
In some examples, the determination of to which other device 306, 308 the extracted frame is to be transmitted may be based on a type of analysis performed on video frames by the other devices 306, 308. For example, the second device 306 may be configured to perform a first type of analysis to determine, for example, a specialised first set of characteristics, and the third device 308 may be configured to perform a second type of analysis to determine, for example, a specialised second set of characteristics. For example, the analysis performed by the second device 306 may be specialised to determine characteristics of cars, whereas the analysis performed by the third device 308 may be specialised to determine characteristics of people. If the content of the extracted frame is determined to be suited to the analysis performed by the second device 306, for example the scene represented by its content involves cars, then it may be determined to transmit the extracted video frame to the second device 306, whereas for example if the extracted frame is determined to be suited to the analysis performed by the third device 308, for example the scene represented by its content involves people, then it may be determined to transmit the extracted video frame to the third device 308. The type of scene represented by the content of the extracted frame may be determined, for example, from the results of a pre-analysis performed on the content, for example as described above.
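Routing by analysis type might be sketched as a simple lookup, as below; the scene labels, device names and default choice are illustrative assumptions, with the specialisations matching the cars/people example given above.

```python
SPECIALISATIONS = {"cars": "device-306", "people": "device-308"}


def route_by_scene_type(scene_type, default_device="device-306"):
    """Select the target device whose analysis is specialised for the given scene type."""
    return SPECIALISATIONS.get(scene_type, default_device)


assert route_by_scene_type("cars") == "device-306"
assert route_by_scene_type("people") == "device-308"
```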
As mentioned, the method comprises, in step 108, at the first device 304, analysing the content of the extracted video frame to determine one or more characteristics of the scene represented by the content.
In some examples, the analysis in step 108 may comprise applying a trained model to the content of the extracted video frame to infer the one or more characteristics of the scene represented by the content. For example, the model may be trained to detect one or more features of the scene, or to recognise one or more objects of the scene represented by the content of the video frame. For example, the model may be or comprise a trained artificial intelligence and/or machine learning computational algorithm.
In some examples, the analysis in step 108 may comprise feature detection, object recognition, and/or higher-level inferences. Accordingly, in some examples, the one or more characteristics of the scene represented by the content may be a detected feature, a recognised object, and/or a higher-level inference of the scene. For example, a feature detection model may be applied to the frame to detect features of the scene, for example to detect lines, edges, ridges, corners, blobs, textures, shapes, gradients, regions, boundaries, surfaces, volumes, colours and/or shadings. Object recognition may be achieved, for example, by applying a process of comparing stored representations of objects of interest to detected features and applying a matching rule for determining a match between the stored representations and the features. Such object recognition may utilise a data store of pre-specified objects when trying to identify an object represented by an image. For example, an object recognition model may group a set of detected features as a candidate object in a given scene and refer to the data store of pre-specified objects in order to identify the object. The detected features and/or objects may be used to infer information about the scene represented by content of one or more frames of the video stream, such as spatial models, lists of objects, tracking of objects through space, estimation of the motion of objects, detection of events, and/or recognition of gestures.
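A hedged sketch of this analysis stage is given below; the stand-in model callable and its output format are placeholders introduced for illustration, not the API of any particular library or of the application itself.

```python
import numpy as np


def analyse_content(content, model):
    """Apply a trained detection/recognition model to the frame content (step 108)."""
    detections = model(content)
    return {
        "objects": [d["label"] for d in detections],
        "num_objects": len(detections),
    }


# Stand-in "model" that always reports one recognised object, used only as a placeholder.
fake_model = lambda image: [{"label": "person", "score": 0.92, "box": (0, 0, 10, 10)}]
characteristics = analyse_content(np.zeros((720, 1280, 3), dtype=np.uint8), fake_model)
assert characteristics == {"objects": ["person"], "num_objects": 1}
```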
In some examples, where the analysis is performed on the extracted video frame at the first device 304 as in step 108, the method may comprise, at the first device 304, transmitting the determined one or more characteristics to a server (not shown). For example, the server (not shown) may be part of the wide area network 302 of Figure 3. The server (not shown) may collect and organise the characteristics determined by and transmitted from each of the devices 304, 306, 308. The server (not shown) may implement monitoring or inference functions based on the transmitted characteristics, for example object tracking over different cameras.
In some examples, where the analysis is performed on the extracted video frame at the first device 304 as in step 108, the method may comprise, at the first device 304, storing the one or more characteristics in a memory at the first device 304. For example, the characteristics determined for one extracted frame may be combined with the characteristics determined for another extracted frame at the first device 304, for example to determine one or more high-level inferences such as the movement of an object through the scene.
In some examples, where the transmission of the extracted video frame to the other device 306, 308 is performed as in step 110, the method may comprise, at the first device 304, receiving, from the other device 306, 308, one or more characteristics of the scene represented by the content of the video frame determined by analysing the content of the video frame at the other device 306, 308. In some examples, the method may comprise storing the received one or more characteristics in a memory at the first device 304. In some examples, the method may comprise combining the one or more characteristics received from the other device 306, 308 with one or more characteristics of a scene represented by the content of another video frame extracted from the video stream by the first device 304 and determined by the first device 304 by analysing the content of the other video frame. Similarly to as mentioned above, the characteristics of multiple frames of a video stream may be combined to determine, at the first device 304, one or more high-level inferences such as the movement of an object through the scene. The results of the determination, for example the high-level inferences, may be transmitted by the first device 304 to the server (not shown).
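The combination of per-frame characteristics into a higher-level inference such as movement through the scene might be sketched as follows; the data structure, field names and coordinates are illustrative assumptions.

```python
def object_track(per_frame_characteristics):
    """Order per-frame (timestamp, position) observations into a track through the scene."""
    ordered = sorted(per_frame_characteristics, key=lambda c: c["timestamp"])
    return [(c["timestamp"], c["position"]) for c in ordered]


observations = [
    {"timestamp": 2.0, "position": (140, 80)},  # characteristics received from another device
    {"timestamp": 1.0, "position": (120, 80)},  # characteristics determined locally
]
assert object_track(observations) == [(1.0, (120, 80)), (2.0, (140, 80))]
```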
Transmitting the one or more determined characteristics and/or the high-level inferences to the server, for example as compared to transmitting all of the video frames (which may be relatively large) to the server for analysis, may help lower the communication overheads, latency, and bandwidth costs associated with communicating over the wide area network 302.
In some examples, receiving the one or more characteristics from the other device 306, 308 may comprise receiving, from the other device 306, 308, in combination with the one or more characteristics, the transmitted video frame. This may help provide that the first device 304 can still perform tasks with the video frame without having to store the video frame whilst it is being analysed at another device 306, 308. For example, the first device 304 may use the received frame, with or without the accompanying determined one or more characteristics, to provide a preview of the video to a user. Reducing the need to store the video frame at the first device 304 whilst it is being analysed at another device 306, 308 may help reduce the storage burden at the first device 304, thereby helping to allow the first device 304 to be provided by a low-cost device.
Referring to Figure 2, there is illustrated a method of processing a video stream at a device, according to another example. For conciseness, the method illustrated in Figure 2 will be described as being performed by the second device 306 configured to access the third camera 316a, but it will be appreciated that the method may be performed at any such device, for example the first device 304 or the third device 308 of Figure 3.
The method comprises, in step 202, receiving, from a first device 304 configured to access a first camera 314a, one or more video frames, the received one or more video frames having been extracted by the first device 304 from a video stream received by the first device 304 from the first camera 314a, the video stream comprising a temporal sequence of said video frames, the content of each of the one or more video frames representing a scene captured by the first camera 314a.
For example, the one or more video frames received at the second device 306 in step 202 may comprise the extracted video frame transmitted by the first device 304 in step 110 of the method described above with reference to Figure 1.
The method comprises, in step 204, for each of the received one or more video frames, analysing the content of the video frame to determine one or more characteristics of the scene represented by the content.
For example, the analysis in step 204 may be or comprise any one or combination of the examples of the analysis in step 108 of the method described with reference to Figure 1. In some examples, the analysis performed in step 204 of Figure 2 may be the same as the analysis performed in step 108 of Figure 1. That is, the analysis performed by the second device 306 on the content of the received video frame may be the same as the analysis that would otherwise have been applied to the video frame at the first device 304 if it had been analysed by the first device 304. In some examples, an indication of the analysis or type of analysis that the second device 306 is to apply to the frame may be transmitted by the first device 304 to the second device 306 along with the extracted frame. For example, an analysis identifier may be written by the first device 304 into the header of the extracted frame, and the second device 306 may determine the analysis to apply to the frame based on the analysis identifier.
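Selecting the analysis from an identifier carried in the frame header might be sketched as below; the identifiers and the analysis functions are placeholders introduced for this example.

```python
ANALYSES = {
    "cars-v1": lambda content: {"type": "cars", "count": 0},
    "people-v1": lambda content: {"type": "people", "count": 0},
}


def analyse_received_frame(header, content):
    """Apply the analysis named by the identifier the first device wrote into the header."""
    analysis = ANALYSES[header["analysis_id"]]
    return analysis(content)


result = analyse_received_frame({"analysis_id": "people-v1"}, content=None)
assert result["type"] == "people"
```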
In some examples, the method may comprise publishing the determined one or more characteristics to a server (not shown). For example, similarly to as described above, the server (not shown) may be part of the wide area network 302 of Figure 3.
The server (not shown) may collect and organise the characteristics determined by and transmitted from each of the devices 304, 306, 308. The server (not shown) may implement monitoring or inference functions based on the transmitted characteristics. In some examples, the one or more characteristics may be published to the server (not shown), for example rather than being transmitted back to the first device 304, in cases where analysis across frames is not required. For example, the server may wish to know the number of people represented in the content of each extracted frame. This may not require an analysis across frames. In this case, the second device 306 may simply publish the characteristics (i.e. the number of people represented) for the frame to the server (not shown). The server (not shown) may log the characteristic for each extracted frame of the video stream.
In some examples, the method may comprise transmitting the determined one or more characteristics to the first device 304, for example over the first network 310. For example, the first device 304 may be configured to perform not only an analysis of individual frames, but also an analysis of characteristics across multiple frames in the video stream. The one or more characteristics of the frame analysed at the second device 306 may therefore be transmitted to the first device 304 so that the first device 304 can combine the characteristics with one or more characteristics of the content of a frame analysed at the first device 304. For example, as mentioned above, the characteristics of multiple frames of a video stream may be combined to determine, at the first device 304, one or more high-level inferences such as the tracking of an object through the scene.
In some examples, the transmitting the determined one or more characteristics to the first device 304 may comprise transmitting, to the first device 304, in combination with the determined one or more characteristics, the received video frame. As mentioned above, the first device 304 may use the received frame to perform one or more tasks with the frame, for example to provide a preview of the video to a user of the first device 304.
It will be appreciated that the second device 306 may also be capable of analysing content of video frames extracted from a video stream from the second camera 316a with which the second device 306 is associated. Accordingly, in some examples, the method may comprise, at the second device 306, receiving a further video stream from the second camera 316a, the further video stream comprising a temporal sequence of further video frames, the content of each further video frame representing a scene captured by the second camera 316a; extracting one or more of the further video frames from the further video stream; and for each of the extracted further video frames, analysing the content of the extracted further video frame to determine one or more characteristics of the scene represented by the content.
It will also be appreciated that the method performed at the first device 304 described above with reference to Figure 1 may be combined with the method performed at the second device 306 described above with reference to Figure 2, to provide an overall method of processing a video stream. Accordingly, such an overall method comprises receiving, at the first device 304 configured to access a first camera 314a, a video stream from the first camera 314a, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera 314a; extracting, at the first device 304, one or more of the video frames from the video stream; determining, at the first device 304, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device 304 to determine one or more characteristics of the scene represented by the content, or (b) transmit the extracted video frame to a second device 306 for analysis of the content of the extracted video frame by the second device 306, the second device 306 being configured to access a second camera 316a; if it is determined to analyse the extracted video frame at the first device 304, then analysing the content of the extracted video frame at the first device 304 to determine one or more characteristics of the scene represented by the content; and if it is determined to transmit the extracted video frame to the second device 306, then transmitting the extracted video frame to the second device 306 and analysing the content of the extracted video frame at the second device 306 to determine one or more characteristics of the scene represented by the content.
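A minimal sketch of the per-frame determination in such an overall method is given below. It assumes the decision is based on a processing-load parameter (one of the example criteria) and uses an in-process stand-in for the second device 306; the load threshold and helper names are illustrative assumptions.

```python
# Minimal sketch of the combined method; the placeholder analysis, the
# in-process stand-in for the second device and the threshold are assumptions.
from typing import Optional


def analyse_content(frame: bytes) -> dict:
    # Placeholder analysis; in practice a trained model would run here.
    return {"people": 0, "frame_size": len(frame)}


class SecondDevice:
    """Stand-in for the second device 306 reachable over the local network."""

    def analyse(self, frame: bytes) -> dict:
        return analyse_content(frame)


def process_frame_at_first_device(frame: bytes,
                                  processing_load: float,
                                  second_device: SecondDevice,
                                  load_threshold: float = 0.8) -> Optional[dict]:
    # Determination step: here based on a processing-load parameter; other
    # examples in the description use frame rates or a repeating pattern.
    if processing_load < load_threshold:
        return analyse_content(frame)       # option (a): analyse at the first device
    return second_device.analyse(frame)     # option (b): transmit for analysis
```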
Referring to Figure 4, there is shown a schematic diagram illustrating the functional blocks of a device 422 according to an example. The device 422 may be used as any one of the devices 304, 306, 308 described above with reference to Figures 1 to 3. For example, the device 422 may, like the first device 304 described above, be configured to access the first camera 314a. As another example, the device 422 may, like the second device 306 described above, be configured to access the second camera 316a. Similarly, the device 422 or functional components thereof may be configured to perform a method according to any of the examples described above with reference to Figures 1 to 3.
As illustrated, the example device 422 comprises the following functional components: an extraction block 414, a configuration block 460, a model block 440, an inference block 450, a preview block 430, a plugin block 418, a GPU block 420, a logging and metrics block 470, and a log and metrics source 476.
The extraction block 414 comprises a frame grabber 416. The frame grabber 416 is communicatively connected to one or more cameras 410, 412. The configuration block 460 comprises a configuration proxy 462 and a configurator 464. The model block 440 comprises a model proxy 448, a downloader 444, a model storage 446, and an engine 442. The inference block 450 comprises a publisher 454, an inference proxy 456, a ruler 452, and an alerter 458. The preview block 430 comprises an authenticator 432 and a previewer 434. The logging and metrics block 470 comprises a logging and metrics proxy 472 and a log and metrics gatherer 474.
The authenticator 432 and frame grabber 416 are communicatively connected to a local network or intranet 402. For example, the local network 402 may be the same as or similar to the local network 310 described above with reference to Figure 3. The authenticator 432 is also communicatively connected to a cloud backend (not shown) configured to verify authentication requests sent by the authenticator 432. The model proxy 448 is communicatively connected to a model warehouse 404. The inference proxy 456 is communicatively connected to an inference cloud 406. The configuration proxy 462 is communicatively connected to a configuration cloud 408. The logging and metrics proxy 472 is communicatively connected to a shared services cloud 402. For example, the model warehouse 404, the inference cloud 406, the configuration cloud 408 and/or the shared services cloud 402 may be located in a wide area network such as the internet, for example the wide area network 302 described above with reference to Figure 3. Communication of components of the device 422 with the cloud(s) may be via secure cloud communication using Transport Layer Security (TLS) protocol and one or more authentication protocols. Similarly, the communication between the frame grabber 416 and/or the authenticator 432 over the local network 402, for example with other devices (not shown in Figure 4), may be encrypted, for example using Transport Layer Security (TLS) protocol and one or more authentication protocols.
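For illustration only, a TLS-protected transmission of an extracted frame over the local network might be sketched as follows; the peer hostname, port and certificate file are assumptions and do not reflect any particular deployment.

```python
# Hedged sketch of TLS-protected transmission over the local network;
# hostnames, port and the CA certificate path are assumptions.
import socket
import ssl


def send_frame_over_tls(frame_bytes: bytes,
                        peer_host: str = "second-device.local",
                        peer_port: int = 8443,
                        ca_file: str = "local-ca.pem") -> None:
    # Verify the peer against a locally trusted certificate authority.
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((peer_host, peer_port)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=peer_host) as tls_sock:
            tls_sock.sendall(frame_bytes)   # encrypted transfer of the extracted frame
```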
The frame grabber 416 is configured to receive a video stream from one or more cameras 410, 412, and extract or pull one or more of the video frames from the video stream. The frame grabber 416 is configured to determine for each extracted frame whether to either send the frame to the engine 442 for analysis at the device 422 or to transmit the frame to another device over the local network 402. The determination may be based on any one of the examples described with reference to step 106 and/or step 110 of Figure 1. The frame grabber 416 is configured to send the extracted frame to the engine 442 or to transmit the frame to another device over the local network 402 in dependence on the determination. The frame grabber 416 obtains access to the GPU 420 via the plugin 418.
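A sketch of the frame grabber 416 behaviour is given below, assuming OpenCV is used for capture; the extraction interval, the alternating local/remote pattern and the dispatch callables are illustrative assumptions only.

```python
# Illustrative frame-grabber sketch; OpenCV capture is an assumption, as are
# the extraction interval and the alternating dispatch pattern.
import cv2


def should_analyse_locally(extracted_index: int) -> bool:
    # Example repeating pattern: alternate between local analysis and off-loading.
    return extracted_index % 2 == 0


def grab_frames(stream_url: str, extract_every_n: int, analyse_locally, transmit_remote):
    capture = cv2.VideoCapture(stream_url)
    extracted = 0
    index = 0
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:
            break
        if index % extract_every_n == 0:            # extract/pull a frame from the stream
            if should_analyse_locally(extracted):   # determination step
                analyse_locally(frame)              # send to the engine 442
            else:
                transmit_remote(frame)              # transmit over the local network 402
            extracted += 1
        index += 1
    capture.release()
```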
The configurator 464 is configured to obtain (for example pull) from the configuration cloud 408, via the configuration proxy 462, configuration information for the device 422. The configurator 464 is configured to publish the configuration information to other components of the device 422, for example, the previewer 434, the downloader 444, the engine 442, the publisher 454, the ruler 452 and the alerter 458. In particular, the configuration information may be used to configure the basis on which the frame grabber 416 determines to send an extracted frame to the engine 442 or to another device (not shown in Figure 4). For example, the configuration information may be used to configure the rate according to which the frame grabber 416 extracts frames from the video stream and/or the specified rate at which the extracted frames are to be analysed. As another example, the configuration information may be used to configure the sequence according to which the frame grabber 416 is to transmit extracted frames to the engine 442 or another device for analysis. As another example, the configuration information may be used to configure the identifiers of the video streams that are to be processed by the device 422 and/or the identifiers of the video streams for which the frame grabber 416 is to transmit at least some of the frames extracted therefrom to another device or devices for analysis. As another example, the configuration information may be used to configure the one or more other devices to which the frame grabber 416 may transmit an extracted frame. In some examples, the configuration information may be used to configure the type of analysis that the engine 442 is to apply to the content of the video frames.
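The configuration items listed above might, for illustration only, be represented by a structure such as the following; the field names and default values are assumptions and do not reflect the actual configuration schema used by the configurator 464.

```python
# Illustrative configuration structure; all field names and defaults are
# assumptions for the purpose of this sketch.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeviceConfig:
    stream_ids: List[str] = field(default_factory=lambda: ["cam-314a"])          # streams to process
    extraction_rate_fps: float = 5.0          # rate at which frames are extracted from the stream
    analysis_rate_fps: float = 5.0            # specified rate at which extracted frames are to be analysed
    dispatch_pattern: List[str] = field(default_factory=lambda: ["local", "remote"])  # repeating sequence
    offload_peers: List[str] = field(default_factory=lambda: ["second-device.local"]) # candidate devices
    analysis_type: str = "person_count"       # type of analysis the engine is to apply
```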
The downloader 444 is configured to obtain (for example pull) models (i.e. computational models according to which the engine 442 may analyse the extracted video frames) from the model warehouse 404, via the model proxy 448, and store the obtained models in the model storage 446. The model proxy 448 is configured to provide encryption and authentication for the communication of the downloader with the model warehouse 404. Accordingly, the model proxy 448 ensures that communication with the model warehouse 404 is encrypted (e.g. using TLS) and authenticated, while providing the downloader 444 with an internal endpoint from which to obtain the models. The engine 442 is configured to load one or more models from the model storage 446. The engine 442 is configured to analyse the content of the extracted video frame using one or more of the models to determine one or more characteristics of the scene represented by the content. For example, the analysis and/or determined one or more characteristics of any of the examples described with reference to Figures 1 to 3 may be applied by the engine 442. The engine 442 is configured to publish the determined one or more characteristics to the publisher 454, the ruler 452, and the previewer 434. The engine 442 is also configured to receive one or more determined characteristics and/or frames from other devices via the local network 402, for example for frames extracted by the frame grabber 416 but transmitted to another device via the local network 402 for analysis. For example, as also described above with reference to Figures 1 to 3, the engine 442 may utilise this information when performing a higher-level analysis over several frames, for example. The engine 442 obtains access to the GPU 420 via the plugin 418.
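A sketch of the downloader/engine interaction is given below. The use of ONNX Runtime, the warehouse URL and the model naming are assumptions made for illustration; the description does not specify a model format or inference framework.

```python
# Hedged sketch: the HTTPS warehouse URL, the ONNX model format and the use
# of ONNX Runtime are assumptions for illustration only.
import pathlib
import urllib.request

import onnxruntime as ort

MODEL_STORAGE = pathlib.Path("model_storage")


def download_model(name: str, warehouse_url: str = "https://model-warehouse.example") -> pathlib.Path:
    """Pull a model once from the warehouse and cache it in local model storage."""
    MODEL_STORAGE.mkdir(exist_ok=True)
    target = MODEL_STORAGE / f"{name}.onnx"
    if not target.exists():
        urllib.request.urlretrieve(f"{warehouse_url}/{name}.onnx", str(target))
    return target


def analyse_with_model(model_path: pathlib.Path, frame_tensor) -> list:
    """Load a stored model and run it over one extracted frame."""
    session = ort.InferenceSession(str(model_path))           # engine loads the model
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: frame_tensor})      # characteristics of the scene
```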
The publisher 454 is configured to receive the one or more characteristics determined and published by the engine 442 and push the one or more characteristics to the inference cloud 406 via the inference proxy 456. Similarly to as described above with reference to Figures 1 to 3, the inference cloud 406 may use the one or more characteristics for a higher-level purpose, such as tracking an object across multiple cameras.
The ruler 452 is configured to receive the one or more characteristics determined and published by the engine 442 and apply one or more rules to the characteristics. The application of the rules to the characteristics may trigger one or more events. The ruler 452 is configured to publish such triggered events to the inference proxy 456 to be provided to the inference cloud 406 and to the alerter 458. The alerter 458 is configured to issue an alert or notification in response to the triggered event. For example, the one or more characteristics may be that a particular person is identified as present in the scene, the ruler 452 may implement a rule that a security event is triggered when the particular person is identified, and the alerter 458 may issue a local alert indicating that the particular person has been identified.
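The ruler/alerter interaction might, for illustration, be sketched as follows; the rule, the event name and the characteristic keys are assumptions and not part of the original disclosure.

```python
# Hedged sketch of rule evaluation and alerting; rule logic, event names and
# characteristic keys are assumptions.
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


def security_rule(characteristics: dict) -> bool:
    # Trigger when a particular person is identified as present in the scene.
    return "person_of_interest" in characteristics.get("identified_people", [])


def apply_rules(characteristics: dict, rules: Dict[str, Rule]) -> List[str]:
    """Return the names of events triggered by the characteristics of one frame."""
    return [name for name, rule in rules.items() if rule(characteristics)]


def alert(events: List[str]) -> None:
    for event in events:
        print(f"ALERT: {event}")   # stand-in for a local alert or notification


# Example usage with a single security rule applied to one frame's characteristics.
triggered = apply_rules({"identified_people": ["person_of_interest"]},
                        {"security_event": security_rule})
alert(triggered)
```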
The previewer 434 is configured to receive the video stream, the extracted video frames, and/or the one or more characteristics determined by the engine 442, to provide a preview of, for example, the raw video stream, the raw extracted video frames, and/or a video stream and/or video frames that have been annotated based on the determined one or more characteristics. For example, if the one or more characteristics are that a person is recognised in the scene, a marker may be applied to the image to indicate the location of the recognised person in the image. The previewer 434 is configured to provide the preview to the authenticator 432 via which a user device of the local network 402 may be authenticated to access and view the preview.
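A sketch of such annotation for the previewer 434 is given below, assuming the detections are supplied as pixel bounding boxes and OpenCV is used to draw the markers; both are assumptions for illustration.

```python
# Hedged sketch: detections as (x, y, w, h, label) tuples and OpenCV drawing
# are assumptions; the description does not prescribe an annotation format.
import cv2
import numpy as np


def annotate_frame(frame: np.ndarray, detections: list) -> np.ndarray:
    """Draw a marker at the location of each recognised person in the image."""
    annotated = frame.copy()
    for (x, y, w, h, label) in detections:
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(annotated, label, (x, max(y - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return annotated
```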
The log and metrics gatherer 474 is configured to gather logs and metrics of the operation of the components of the device 422. For example, the log and metrics gatherer 474 may extract logs and metrics from the log and metrics source 476. The log and metrics source 476 may, in some examples, represent the underlying operating system of the device 422, which may be accessed to extract logs and metrics of the operating system. For example, the log and metrics gatherer 474 may gather information on a processing load parameter of the GPU 420. The log and metrics gatherer 474 is configured to push the logs and metrics to the shared services cloud 402 via the log and metrics proxy 472. The shared services cloud 402 may also be in communication with the authenticator 432, the inference proxy 456 and the config proxy 462 in order to gather information from these elements or provide information or services to these elements, as required.
In some examples, the engine 442 may also be configured to receive one or more frames extracted by another device and transmitted to the device 422 over the local network 402 for analysis of the content of the frame by the engine 442. In some examples, the engine 442 is configured to publish the one or more characteristics determined for such a frame to the inference cloud via the inference block 450. In other examples, the engine 442 is configured to transmit the one or more characteristics determined for such a frame back to the other device via the local network 402.
Similarly to as described above with reference to Figures 1 to 3, the device 422 described with reference to Figure 4 may provide that frames extracted from a video stream can be analysed in real-time or near real-time and/or at a relatively high frame rate when needed, but without requiring a powerful CPU/GPU 420 and/or without the need to use cloud computing platforms to undertake the analysis. For example, the device 422 may be provided by a relatively inexpensive device, for example a Raspberry Pi® or NVIDIA® Jetson Nano™ Developer Kit processing device. Further, the transmission of the extracted video frame over the local network or intranet 402 may allow the extracted video frames, which can be of a relatively large size, to be transmitted quickly, efficiently and/or at relatively low cost to another device for analysis. This may be as compared to, for example, transmitting extracted video frames over a wider network 302 such as the internet 302 to a cloud computing platform for analysis, which may involve relatively high latency, routing overheads, and bandwidth costs.
Referring to Figure 5, there is illustrated a device 550 for processing a video stream, according to an example. The device 550 comprises a processing unit 552, a memory 554, an input interface 556 and an output interface 558. The device 550 may be, for example, a Raspberry Pi® or NVIDIA® Jetson Nano™ Developer Kit processing device. The device 550 may be used as any one of the devices 304, 306, 308, 422 described above with reference to Figures 1 to 4. The device 550 may be configured to perform the method according to any of the examples described above with reference to Figures 1 to 3, and/or implement the functional blocks according to any of the examples described above with reference to Figure 4. In some examples, the memory 554 stores a computer program comprising instructions which, when executed by the processing unit 552, cause the processing unit 552 to perform the method according to any of the examples described above with reference to Figures 1 to 3, and/or implement the functional blocks according to any of the examples described above with reference to Figure 4.
In some examples, the device 550 is configured to access a first camera 314a, and the processing unit 552 is configured to: receive, via the input interface 556, a video stream from the first camera 314a, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera 314a; extract one or more of the video frames from the video stream; determine, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the device 550 to determine one or more characteristics of the scene represented by the content, or (b) transmit, via the output interface 558, the extracted video frame to another device 306, 308 for analysis of the content of the extracted video frame by the other device 306, 308, the other device being configured to access a second camera 316a; and perform one of the analysis and the transmission in dependence on the determination.
In some examples, the device 550 is configured to access a second camera 316a, and the processing unit 552 is configured to: receive, via the input interface 556, from another device 304 one or more video frames, the other device 304 being configured to access a first camera 314a, the received one or more video frames having been extracted by the other device 304 from a video stream received by the other device 304 from the first camera 314a, the video stream comprising a temporal sequence of said video frames, the content of each video frame representing a scene captured by the first camera 314a; and for each of the received one or more video frames, analyse the content of the video frame to determine one or more characteristics of the scene represented by the content. The processing unit 552 may be configured to output the determined one or more characteristics via the output interface 558, for example to a server (not shown) or back to the other device 304.
Each of the first device 304, the second device 306 and/or the third device 308 described with reference to Figures 1 to 3 may be provided by such a device 550 as described with reference to Figure 5. Accordingly, it will be appreciated that, in some examples, there is provided a system 300 comprising the first device 304 and a second device 306 connected by a network 310, the first device 304 being configured to access a first camera 314a, the second device 306 being configured to access a second camera 316a, wherein the first device 304 comprises a first processing unit configured to: receive, via a first interface, a video stream from the first camera 314a, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera 314a; extract one or more of the video frames from the video stream; determine, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame to determine one or more characteristics of the scene represented by the content, or (b) transmit, via a second interface, the extracted video frame over the network 310 to the second device 306 for analysis of the content of the extracted video frame by the second device 306; and perform one of the analysis and the transmission in dependence on the determination; and wherein the second device 306 comprises a second processing unit configured to: receive, from the first device 304, via a third interface, one or more of the extracted video frames transmitted to the second device 306 by the first device 304 over the network 310; and for each of the one or more extracted video frames received by the second device 306, analyse the content of the video frame to determine one or more characteristics of the scene represented by the content. The second processing unit may be configured to output the determined one or more characteristics via a fourth interface, for example to a server (not shown) or back to the first device 304. The network 310 may be a local network 310, such as the LAN 310 described above with reference to Figures 1 to 3.
The above examples are to be understood as illustrative examples of the invention. It is to be understood that any feature described in relation to any one example may be used alone, or in combination with other features described, and may also be used in combination with one or more features of any other of the examples, or any combination of any other of the examples. Furthermore, equivalents and modifications not described above may also be employed without departing from the scope of the invention, which is defined in the accompanying claims.

Claims (23)

  1. A method of processing a video stream, the method comprising, at a first device configured to access a first camera: receiving a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extracting one or more of the video frames from the video stream; determining, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content, or (b) transmit the extracted video frame to a second device for analysis of the content of the extracted video frame by the second device, the second device being configured to access a second camera; and performing one of the analysis and the transmission in dependence on the determination.
  2. The method of claim 1, wherein the first device and the second device are part of a local network and the transmission of the extracted video frame is over the local network.
  3. The method according to claim 1 or claim 2, wherein the determination is based on an estimate of whether the analysis of the content of the extracted video frame at the first device would occur at a frame rate greater than or equal to a specified rate at which extracted frames are to be analysed.
  4. The method according to any one of claims 1 to 3, wherein the determination is based on one or more of: a rate at which the video frames are extracted from the video stream; a specified rate at which each extracted frame is to be analysed; a maximum rate at which the first device is able to analyse the content of the extracted frame; a position of the video frame in a sequence of video frames to be analysed; a processing load parameter indicative of a processing load at the first device; an identifier of the video stream from which the video frame is extracted; and a result of a pre-analysis of the content of the extracted video frame.
  5. The method according to any one of claim 1 to claim 4, wherein the analysis comprises applying a trained model to the content of the extracted video frame to detect one or more features and/or recognise one or more objects of the scene represented by the content of the extracted video frame.
  6. The method according to any one of claim 1 to claim 5, wherein the method further comprises, at the first device: where the analysis is performed on the extracted video frame at the first device, transmitting the determined one or more characteristics to a server.
  7. The method according to any one of claim 1 to claim 6, wherein the method further comprises, at the first device: where the transmission of the extracted video frame to the second device is performed, receiving, from the second device, one or more characteristics of the scene represented by the content of the video frame determined by analysing the content of the video frame at the second device.
  8. The method according to claim 7, wherein the method further comprises, at the first device: combining the one or more characteristics received from the second device with one or more characteristics of a scene represented by the content of another video frame extracted from the video stream by the first device and determined by the first device by analysing the content of the other video frame.
  9. The method according to claim 7 or claim 8, wherein receiving the one or more characteristics from the second device comprises receiving, from the second device, in combination with the one or more characteristics, the transmitted video frame.
  10. The method of any of claims 1 to 9, wherein the extraction comprises extracting a sequence of the video frames from the video stream, and wherein the determination comprises: for a first video frame in the sequence, determining that the video frame is to be analysed at the first device; and for a second video frame in the sequence, determining that the video frame is to be transmitted to the second device for analysis by the second device.
  11. The method of claim 10, wherein the determination for each of the video frames in the sequence follows a predetermined pattern that is repeated for successive sequences of video frames extracted from the video stream.
  12. The method according to any one of claim 1 to claim 11, wherein the transmission of the extracted video frame comprises determining to which one of a plurality of devices, including the second device, the extracted video frame is to be transmitted, each one of the plurality of devices being configured to access a respective camera.
  13. The method according to any one of claim 1 to claim 12, wherein the first camera, the first device, and the second device are part of the same network.
  14. The method according to any one of claim 1 to claim 12, wherein the first device is a first gateway between a first network and a second network discrete from the first network, the first network comprising the first camera, the second network comprising the second device.
  15. The method according to claim 14, wherein the second device is a second gateway between a third network and the second network, the third network comprising the second camera.
  16. A method of processing a video stream, the method comprising, at a second device configured to access a second camera: receiving, from a first device configured to access a first camera, one or more video frames, the received one or more video frames having been extracted by the first device from a video stream received by the first device from the first camera, the video stream comprising a temporal sequence of said video frames, the content of each of the one or more video frames representing a scene captured by the first camera; and for each of the received one or more video frames, analysing the content of the video frame to determine one or more characteristics of the scene represented by the content.
  17. The method according to claim 16, wherein the method further comprises, at the second device: transmitting the determined one or more characteristics to the first device; or publishing the determined one or more characteristics to a server.
  18. The method according to claim 17, wherein transmitting the one or more characteristics to the first device comprises transmitting, to the first device, in combination with the one or more characteristics, the received video frame.
  19. The method according to any one of claim 16 to claim 18, wherein the method further comprises, at the second device: receiving a further video stream from the second camera, the further video stream comprising a temporal sequence of further video frames, the content of each further video frame representing a scene captured by the second camera; extracting one or more of the further video frames from the further video stream; and for each of the extracted further video frames, analysing the content of the extracted further video frame to determine one or more characteristics of the scene represented by the content.
  20. A method of processing a video stream, the method comprising: receiving, at a first device configured to access a first camera, a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extracting, at the first device, one or more of the video frames from the video stream; determining, at the first device, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content, or (b) transmit the extracted video frame to a second device for analysis of the content of the extracted video frame by the second device, the second device being configured to access a second camera; if it is determined to analyse the extracted video frame at the first device, then analysing the content of the extracted video frame at the first device to determine one or more characteristics of the scene represented by the content; and if it is determined to transmit the extracted video frame to the second device, then transmitting the extracted video frame to the second device and analysing the content of the extracted video frame at the second device to determine one or more characteristics of the scene represented by the content.
  21. A device for processing a video stream, the device being configured to access a first camera, the device comprising a processing unit configured to: receive, via a first interface, a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extract one or more of the video frames from the video stream; determine, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame at the device to determine one or more characteristics of the scene represented by the content, or (b) transmit, via a second interface, the extracted video frame to another device for analysis of the content of the extracted video frame by the other device, the other device being configured to access a second camera; and perform one of the analysis and the transmission in dependence on the determination.
  22. A device for processing a video stream, the device being configured to access a second camera, the device comprising a processing unit configured to: receive, via an interface, from another device, one or more video frames, the other device being configured to access a first camera, the received one or more video frames having been extracted by the other device from a video stream received by the other device from the first camera, the video stream comprising a temporal sequence of said video frames, the content of each video frame representing a scene captured by the first camera; and for each of the received one or more video frames, analyse the content of the video frame to determine one or more characteristics of the scene represented by the content.
  23. A system comprising a first device and a second device connected by a network, the first device being configured to access a first camera, the second device being configured to access a second camera; wherein the first device comprises a first processing unit configured to: receive, via a first interface, a video stream from the first camera, the video stream comprising a temporal sequence of video frames, the content of each video frame representing a scene captured by the first camera; extract one or more of the video frames from the video stream; determine, for each of the extracted video frames, whether to either: (a) analyse the content of the extracted video frame to determine one or more characteristics of the scene represented by the content, or (b) transmit, via a second interface, the extracted video frame to the second device for analysis of the content of the extracted video frame by the second device; and perform one of the analysis and the transmission in dependence on the determination; and wherein the second device comprises a second processing unit configured to: receive, from the first device, via a third interface, one or more of the extracted video frames transmitted to the second device by the first device; and for each of the one or more extracted video frames received by the second device, analyse the content of the video frame to determine one or more characteristics of the scene represented by the content.
GB2100318.1A 2020-12-23 2021-01-11 Method and apparatus for processing a video stream Pending GB2602357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/GB2021/052470 WO2022136818A1 (en) 2020-12-23 2021-09-23 Method and apparatus for processing a video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP20386065 2020-12-23

Publications (2)

Publication Number Publication Date
GB202100318D0 GB202100318D0 (en) 2021-02-24
GB2602357A true GB2602357A (en) 2022-06-29

Family

ID=74187111

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2100318.1A Pending GB2602357A (en) 2020-12-23 2021-01-11 Method and apparatus for processing a video stream

Country Status (1)

Country Link
GB (1) GB2602357A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006010910A1 (en) * 2004-07-27 2006-02-02 2020 Imaging Limited Apparatus and method for capturing and transmitting images of a scene
US20140043485A1 (en) * 2012-08-10 2014-02-13 Logitech Europe S.A. Wireless video camera and connection methods including multiple video streams

Also Published As

Publication number Publication date
GB202100318D0 (en) 2021-02-24

Similar Documents

Publication Publication Date Title
CN110115015B (en) System and method for detecting unknown IoT devices by monitoring their behavior
CN108076019B (en) Abnormal flow detection method and device based on flow mirror image
JP5213105B2 (en) Video network system and video data management method
JP5693162B2 (en) Image processing system, imaging apparatus, image processing apparatus, control method therefor, and program
CN111464485A (en) Encrypted proxy flow detection method and device
CN113364752B (en) Flow abnormity detection method, detection equipment and computer readable storage medium
KR20200033091A (en) An apparatus for anomaly detecting of network based on artificial intelligent and method thereof, and system
Gargees et al. Incident-supporting visual cloud computing utilizing software-defined networking
CN112001274B (en) Crowd density determining method, device, storage medium and processor
Zhang et al. Blockchain-based collaborative edge intelligence for trustworthy and real-time video surveillance
US20160357762A1 (en) Smart View Selection In A Cloud Video Service
US20150213056A1 (en) Line rate visual analytics on edge devices
JP2019062527A (en) Real-time object re-identification in multi-camera system using edge computing
CN113452676B (en) Detector distribution method and Internet of things detection system
US20190356564A1 (en) Mode determining apparatus, method, network system, and program
Nikitha et al. High Resolution Image Reconstruction with Smart Camera Network
CN113596158A (en) Scene-based algorithm configuration method and device
US11164007B2 (en) Method and system for detecting the owner of an abandoned object from a surveillance video
CN116232777B (en) DDoS attack detection and defense method based on statistical measure in SDN-IIOT and related equipment
Wen et al. Cloud-computing-based framework for multi-camera topology inference in smart city sensing system
GB2602357A (en) Method and apparatus for processing a video stream
WO2022136818A1 (en) Method and apparatus for processing a video stream
Ramisetty et al. Dynamic computation off-loading and control based on occlusion detection in drone video analytics
Zhang et al. End-to-end latency optimization of multi-view 3D reconstruction for disaster response
Zhang et al. Crossvision: Real-time on-camera video analysis via common roi load balancing