WO2024083537A1 - Method and system for optically tracking moving objects

Info

Publication number
WO2024083537A1
Authority
WO
WIPO (PCT)
Application number
PCT/EP2023/077799
Other languages
French (fr)
Inventor
Stein NORHEIM
Original Assignee
Topgolf Sweden Ab
Application filed by Topgolf Sweden Ab
Publication of WO2024083537A1

Classifications

    • G06T7/215 Motion-based segmentation
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/254 Analysis of motion involving subtraction of images
    • A63B24/0021 Tracking a path or terminating locations
    • A63B2024/0028 Tracking the path of an object, e.g. a ball inside a soccer pitch
    • A63B2024/0034 Tracking the path of an object, e.g. a ball inside a soccer pitch, during flight
    • A63B69/3658 Means associated with the ball for indicating or measuring, e.g. speed, direction
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/30224 Ball; Puck
    • G06T2207/30241 Trajectory

Definitions

  • The present invention relates to a method and a system for optically tracking moving objects.
  • Known methods track moving objects using computer vision, with one or more cameras depicting a space where the moving objects exist.
  • The tracking may be performed by first identifying an object as one image pixel, or a set of adjacent pixels, that deviates from a local background. Such deviating pixels are together denoted a "blob". Once a number of blobs have been detected in several image frames, possible tracked object paths are identified by interconnecting identified blobs in subsequent frames.
  • The blob generation in each individual frame potentially results in very many false positive blobs, in other words identified blobs that do not really correspond to an existing moving object. This may be due to noise, shifting lighting conditions and non-tracked objects occurring in the field of view of the camera in question.
  • The detection of possible tracked object paths normally results in a reduction of such false positives, for instance based on filtering away of physically or statistically implausible paths. Due to the large number of false positive blob detections, however, even if most of the false positives are filtered away in the tracked paths detection step, the blob detection itself is associated with heavy memory and processor load and may therefore constitute a bottleneck for the object tracking even if high-performance hardware is used.
  • The various embodiments described herein solve one or more of the above described problems and provide techniques for tracking the paths of moving objects using less memory and/or processing power compared to conventional object tracking techniques.
  • The invention can be embodied as a method for tracking moving objects, comprising the steps of: obtaining a series of digital images I_t at consecutive times t, the digital images I_t representing optical input from a three-dimensional space within a field of view of a digital camera, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_{x,y}, said digital images comprising corresponding pixel values i_{x,y,t}, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images I_t; for two or more of said pixel values i_{x,y,t}, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_{x,y,t} in question and a predicted pixel value μ̂_{x,y,t}, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where the predicted pixel value μ̂_{x,y,t} is calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, Z being a number selected such that Z² is an integer such that 10 ≤ Z² ≤ 20; for pixel values i_{x,y,t} for which said first value is higher than said second value, storing in a computer memory information indicating that the pixel value i_{x,y,t} is part of a detected blob; and correlating, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
  • In some embodiments, said inequality is (i_{x,y,t} − μ̂_{x,y,t})² > Z²·σ²_{x,y,t}, where μ̂_{x,y,t} is said predicted pixel value and where σ_{x,y,t} is an estimated standard deviation with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question.
  • In some embodiments, the method further comprises storing, in said computer memory, for individual ones of said pixels p_{x,y} and for a number n ≤ N, the sums S_{x,y,t} = Σ_{τ=t−n..t−1} i_{x,y,τ} and Q_{x,y,t} = Σ_{τ=t−n..t−1} i²_{x,y,τ}; and, for individual ones of said pixel values i_{x,y,t}, determining said inequality as (n·i_{x,y,t} − S_{x,y,t})² > Z²·(n·Q_{x,y,t} − S²_{x,y,t}).
  • In some embodiments, S_{x,y,t}, Q_{x,y,t}, or both are calculated recursively, whereby a value for a pixel value i_{x,y,t} is calculated using a previously stored value S_{x,y,t−1}, Q_{x,y,t−1}, or both, for the same pixel p_{x,y} but at an immediately preceding time t−1.
  • In some embodiments, the method further comprises storing in said computer memory S_{x,y,t} and Q_{x,y,t} in combination as a single datatype comprising 12 bytes or less per pixel p_{x,y}.
  • In some embodiments, the method further comprises storing in said computer memory, for a particular digital image I_t, a pixmap having, for each pixel p_{x,y}, said information indicating that the pixel value i_{x,y,t} is part of a detected blob.
  • In some embodiments, said information indicating that the pixel value i_{x,y,t} is part of a detected blob is indicated in a single bit for each pixel p_{x,y}.
  • In some embodiments, said pixmap also comprises, for each pixel p_{x,y}, a value indicating an expected pixel value i_{x,y,t} for that pixel p_{x,y}.
  • In some embodiments, said value indicating an expected pixel value i_{x,y,t} for the pixel p_{x,y} in question is provided by storing the predicted pixel value μ̂_{x,y,t} as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t}, the estimated variance or standard deviation σ_{x,y,t}, or both is or are calculated based on a set of n historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where 10 ≤ n ≤ 300.
  • In some embodiments, a number n of previous images I_t considered for the estimation of an estimated variance or standard deviation σ_{x,y,t} of the second value is selected to be a power of 2.
  • In some embodiments, said pixel values i_{x,y,t} have a depth across one or several channels of between 8 and 48 bits.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t} is determined based on an estimated projected future mean pixel value, in turn determined based on historic pixel values i_{x,y,t} for a sampled set of pixels p_{x,y} in said images I_t.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t} is determined as μ̂_{x,y,t} = α·ν_{x,y,t} + β, where α and β are constants determined so as to minimize Σ_{j,k} (α·ν_{j,k,t} + β − i_{j,k,t})², where ν_{j,k,t} is said estimated projected future mean pixel value for the pixel p_{j,k} in question, and where j and k are iterated over a test set of pixels.
  • In some embodiments, said test set of pixels contains between 1% and 25% of the total set of pixels p_{x,y} in the image I_t.
  • In some embodiments, said test set of pixels is geometrically evenly distributed across the total set of pixels p_{x,y} in the image I_t.
  • In some embodiments, the method further comprises determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value μ̂_{x,y,t} according to any one of claims 14-18 until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
  • In some embodiments, the method further comprises, for said pixel values i_{x,y,t} for which said first value is higher than said second value, only storing said information indicating that the pixel value i_{x,y,t} is part of a detected blob in case also the following inequality holds: B·(i_{x,y,t} − μ̂_{x,y,t})² > μ̂_{x,y,t}, where i_{x,y,t} is the pixel value in question, where μ̂_{x,y,t} is the predicted pixel value and where B is an integer such that B > 100.
  • In some embodiments, the method further comprises using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
  • In some embodiments, the objects are golf balls.
  • The invention can be embodied as a method for tracking moving objects, the method comprising: obtaining a series of digital images I from a digital camera, the digital images I representing optical input from a three-dimensional space within a field of view of the digital camera over time, each of the digital images I having pixels p_{x,y} with corresponding pixel values i_{x,y}; performing, at a computer, image segmentation on each image of the series of digital images I using a statistical model of background for the optical input to detect blobs, wherein performing the image segmentation comprises, for each of two or more pixel values i_{x,y,t} in the image, determining an inequality result using a current pixel value i_{x,y,t} for a pixel p_{x,y} in a current image I_t, and first S_{x,y,t} and second Q_{x,y,t} values of the statistical model for the pixel p_{x,y} […]
  • The invention can also be embodied as a system for tracking moving objects, the system comprising a digital camera, a digital image analyzer and a moving object tracker: the digital camera being arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images I_t at consecutive times t, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_{x,y}, said digital images comprising corresponding pixel values i_{x,y,t}, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images I_t; the digital image analyzer being configured to, for two or more of said pixel values i_{x,y,t}, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_{x,y,t} in question and a predicted pixel value μ̂_{x,y,t}, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where the predicted pixel value μ̂_{x,y,t} is calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, and where Z is selected such that Z² is an integer such that 10 ≤ Z² ≤ 20; the digital image analyzer being configured to, for pixel values i_{x,y,t} for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value i_{x,y,t} is part of a detected blob; and the moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
  • The invention can also be embodied as a computer software product configured to, when executing: receive a series of digital images I_t from a digital camera, the digital camera being arranged to represent optical input from a three-dimensional space to produce said digital images I_t at consecutive times t, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_{x,y}, said digital images comprising corresponding pixel values i_{x,y,t}, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images I_t; for two or more of said pixel values i_{x,y,t}, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_{x,y,t} in question and a predicted pixel value μ̂_{x,y,t}, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where the predicted pixel value μ̂_{x,y,t} is calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, and where Z is selected such that Z² is an integer such that 10 ≤ Z² ≤ 20; for pixel values i_{x,y,t} for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value i_{x,y,t} is part of a detected blob; and correlate, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
  • The computer software product can be implemented by a non-transitory computer-readable medium encoding instructions that cause one or more hardware processors, located in at least one of the computer hardware devices in the system, to perform the digital image processing and the object tracking.
  • Figure 1 is an overview of a system 100 configured to perform a method of the type illustrated in Figure 3;
  • Figure 2 is a simplified illustration of a data processing apparatus;
  • Figure 3 shows a general flowchart for optically tracking moving target objects;
  • Figure 4 is a flowchart of a method performed by the system 100 shown in Figure 1;
  • Figure 5 is an overview illustrating a noise model of a type described herein;
  • Figure 6 shows an image frame illustrating a noise model;
  • Figure 7 illustrates an example of clustering of pixels into blobs; and
  • Figure 8 illustrates intensities for a pixel during a sudden exposure change event.
  • A system 100 can comprise one or several digital cameras 110, each being arranged to represent optical input from a three-dimensional space 111 within a field of view of the digital camera 110, to produce digital images of such moving target objects 120, the objects travelling through the space 111 hence being represented by the digital camera 110 in consecutive digital images.
  • Such representation by the digital camera 110 will herein be denoted a "depiction", for brevity.
  • The digital camera 110 is arranged to not move in relation to the space 111 during production of the series of digital images I_t.
  • The digital camera 110 may be fixed in relation to said space 111, or, in case it is movable, it is kept still during the production of the series of digital images I_t.
  • Hence, the same part of the space 111 is depicted each time by the digital camera 110, and the digital camera 110 is arranged to produce digital images I_t having a corresponding set of pixels p_{x,y}, so that said produced digital images I_t comprise corresponding pixel values i_{x,y,t}.
  • Here, "x" and "y" denote coordinates in an image coordinate system, whereas "t" denotes time.
  • That the pixel values i_{x,y,t} of two or more different images I_t "correspond" to each other means that individual pixels p_{x,y} measure light entering the camera 110 from the same, or substantially the same, light cone in all of the images I_t in question. It is realized that the camera 110 may move slightly, due to wind, thermal expansion and so forth, between images I_t, but that there is substantial correspondence between pixels p_{x,y} even in cases where such noise-inducing slight movement is present. There can be at least 50% overlap between the light cones of any one same pixel p_{x,y} of the camera 110 between any two consecutive images I_t. There may also be cases where the camera 110 is movable, such as pivotable. In such cases an image transformation can be applied to a captured image so as to bring its pixels p_{x,y} into correspondence with pixels of a previous or future captured image.
  • In some embodiments, the system 100 comprises more than one digital camera 110.
  • Several such digital cameras 110 can be arranged to depict the same space 111, consequently tracking the same moving target object(s) 120 through said space 111.
  • The several digital cameras 110 can be used to construct a stereoscopic view of the respective tracked path of each target object 120.
  • The digital camera 110 is arranged to produce a series of consecutive images I_t at different points in time. Such images may also be denoted image "frames".
  • In some embodiments, the digital camera 110 is a digital video camera, arranged to produce a digital moving film comprising or being constituted by such consecutive digital image frames.
  • The system 100 comprises a digital image analyzer 130, configured to analyze digital images received directly from the digital camera 110, or received from the digital camera 110 via an intermediate system, in the same or a processed (re-formatted, compressed, filtered, etc.) form. The analysis performed by the digital image analyzer 130 can take place in the digital domain.
  • The digital image analyzer 130 may also be denoted a "blob detector".
  • The system 100 further comprises an object tracker 140, configured to track said moving target objects 120 across several of said digital images, based on information provided from the digital image analyzer 130.
  • The analysis performed by the object tracker 140 can also take place in the digital domain.
  • The system 100 is configured to track target objects 120 in the form of sports objects in flight, such as balls in flight, for instance baseballs or golf balls.
  • In some embodiments, the system 100 is used at a golf practice range, such as a driving range having a plurality of bays for hitting golf balls that are to be tracked using the system 100.
  • The system 100 can be installed at an individual golf range bay, or at a golf tee, and configured to track golf balls being struck from said bay or tee.
  • The system 100 can also be a portable system, configured to be positioned at a location from which it can track said moving target objects 120. It is realized that the monitored "space" mentioned above will, in each of these and other cases, be a space through which sport balls are expected to move.
  • The digital image analyzer 130 and the object tracker 140 constitute examples of such computers.
  • The digital image analyzer 130 and the object tracker 140 can be provided as software functions executing on one and the same computer.
  • The one or several digital cameras 110 can also be configured to perform digital image processing, and then also constitute examples of such computers.
  • In some embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented as software functions configured to execute on hardware of one or several digital cameras 110.
  • In other embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented on standalone or combined hardware platforms, such as on a computer server.
  • The one or several digital cameras 110, the digital image analyzer 130 and the object tracker 140 are configured to communicate digitally, either via computer-internal communication paths, such as via a computer bus, or via computer-external wired and/or wireless communication paths, such as via a network 10 (e.g., the Internet).
  • The camera(s) 110 and the digital image analyzer 130 can communicate via a direct, wired digital communication route, which is not over the network 10.
  • The digital image analyzer 130 and the object tracker 140 may communicate with each other over the network 10 (e.g., a conventional Internet connection).
  • A "computer" can include a server computer, a client computer, a personal computer, embedded programmable circuitry, or special purpose logic circuitry. Such computers can be connected with one or more other computers through a network, such as the internet 10, or via any suitable peer-to-peer connection for digital communications, such as a Bluetooth® connection.
  • Each computer can include various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including various programs that operate, for instance, as the digital image analyzer 130 program and/or the object tracker 140 program. Other examples include a digital image preprocessing and/or compressing program.
  • The number of software modules used can vary from one implementation to another and from one such computer to another.
  • Each of said programs can be implemented in embedded firmware and/or as software modules that are distributed on one or more data processing apparatus connected by one or more computer networks or other suitable communication networks.
  • Figure 2 illustrates an example of such a computer, being a data processing apparatus 300 that can include hardware or firmware devices including one or more hardware processors 312, one or more additional devices 314, a non-transitory computer readable medium 316, a communication interface 318, and one or more user interface devices 320.
  • The processor 312 is capable of processing instructions for execution within the data processing apparatus 300, such as instructions stored on the non-transitory computer readable medium 316, which can include a storage device such as one of the additional devices 314.
  • The processor 312 is a single or multi-core processor, or two or more central processing units (CPUs).
  • The data processing apparatus 300 uses its communication interface 318 to communicate with one or more other computers 390, for example, over the network 380.
  • The processes described can be run in parallel, concurrently, or serially, on a single or multi-core computing machine, and/or on a computer cluster/cloud, etc.
  • The data processing apparatus 300 includes various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including a program 330 that constitutes the digital image analyzer 130 described herein, configured to perform the method steps performed by such digital image analyzer 130.
  • The program 330 can also constitute the object tracker 140 described herein, configured to perform the method steps performed by such object tracker 140.
  • Examples of user interface devices 320 include a display, a touchscreen display, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse.
  • The user interface device(s) need not be local device(s) 320, but can be remote from the data processing apparatus 300, e.g., user interface device(s) 390 accessible via one or more communication network(s) 380.
  • The user interface device 320 can also be in the form of a standalone device having a screen, such as a conventional smartphone being connected to the system 100 via a configuration or setup step.
  • The data processing apparatus 300 can store instructions that implement operations as described in this document, for example, on the non-transitory computer readable medium 316, which can include one or more additional devices 314, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, a tape device, and a solid state memory device (e.g., a RAM drive, a Flash memory or an EEPROM).
  • The instructions that implement the operations described in this document can be downloaded to the non-transitory computer readable medium 316 over the network 380 from one or more computers 390 (e.g., from the cloud), and in some implementations, the RAM drive is a volatile memory device to which the instructions are downloaded each time the computer is turned on.
  • The system 100 is configured to perform a method according to one or more embodiments for optically tracking moving target objects 120.
  • The present invention can furthermore be embodied as a computer software product, configured to perform said method when executing on computer hardware of the type described herein.
  • The computer software product can hence be deployed as a part of the system 100 so as to provide the functionality required to perform the present method.
  • Both said system 100 and said computer software product are hence configured to track moving target objects 120 moving through said space 111 in relation to one or several digital cameras 110, by comprising or embodying the above-mentioned digital image analyzer 130 and object tracker 140, in turn being configured to perform the corresponding method steps described herein.
  • Figure 3 illustrates a general flowchart for tracking moving target objects 120 based on digital image information received from one or several digital cameras 110.
  • Image segmentation is the process of separating an image into different regions, representing target objects within it.
  • The background may in general be changing and noisy, and is in many cases quite unpredictable.
  • For a golf ball 120, for instance, when the ball is far away from the digital camera 110 depicting it, the ball may be as small as one single pixel p_{x,y} in the digital image frame produced by the digital camera 110.
  • Such a method may result in a very large number of false positives, such as about 99.9% false positives.
  • A subsequent motion tracking analysis can sort out the vast majority of all false positives, such as by only keeping blobs that seem to obey Newton's laws of motion between consecutive digital image frames I_t.
  • The noise model step is used to suppress noise in the image frames, with the purpose of lowering the number of detected blobs in the subsequent blob aggregation step.
  • The noise model analyzes a plurality of pixels p_{x,y}, such as every pixel p_{x,y}, in said image frames I_t, and is therefore at risk of becoming a major bottleneck.
  • These calculations, which aim to identify pixel values that do not conform to a detected statistical pattern, in order to identify outliers, can be handled by high-performance GPUs (Graphics Processing Units), but performance may still prove to be a problem.
  • The approach described herein has turned out to drastically reduce the computational power required per pixel p_{x,y} in a system 100 for tracking moving target objects 120. This reduction can be exploited in the form of simpler hardware, lower power consumption or a larger incoming image bitrate.
  • In a first step S1, the method starts.
  • In a step S2, a number Z is selected such that Z² is an integer.
  • The number Z can be selected such that Z² is an integer such that 10 ≤ Z² ≤ 20. It is noted that Z may be a non-integer, as long as Z² is an integer value.
  • This step S2 may be performed ahead of time, such as during a system 100 design process or a system 100 calibration step.
  • The space 111 is depicted using the digital camera 110 to produce a series of digital images I_t at consecutive times t.
  • The space 111 can be depicted using the digital camera 110 to produce a series of N digital images I_t at consecutive times t.
  • The procedure can also be a continuous or semi-continuous procedure, wherein the digital camera 110 will continue to produce digital images I_t at consecutive times t for as long as the procedure is ongoing.
  • In that case, the number of digital images N will grow by 1 for each captured frame.
  • The series of digital images I_t at consecutive times t may be seen as a stream of digital images, captured much like a digital video stream.
  • Then, an inequality is determined, involving comparing a first value to a second value.
  • The first value is calculated based on the square of the difference between the pixel value i_{x,y,t} in question and a calculated predicted pixel value μ̂_{x,y,t} for that pixel p_{x,y}.
  • The second value is calculated based on a product of, on the one hand, the square of the selected number Z, this square then being an integer value, and, on the other hand, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question.
  • The second value can be calculated based on said estimated variance or a square of the estimated standard deviation.
  • The predicted pixel value μ̂_{x,y,t} is also calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, in other words using information from image frames I_{t−Δt} captured by the camera 110 at points in time prior to the time t.
  • The predicted pixel value μ̂_{x,y,t} can be calculated based on the same, or a different, set of historic pixel values i_{x,y,{t−n..t−1}} as the estimated variance or standard deviation σ_{x,y,t}.
  • Here, n denotes the number of historic pixel values i_{x,y,τ} considered by the noise model, counting backwards from the currently considered image frame.
  • This notation hence assumes that the same consecutive pixel values i_{x,y,τ}, up to the presently considered image frame, are used to calculate both the first and the second value, but it is realized that any suitable contiguous or non-contiguous, same or different, intervals of pixel values i_{x,y,t} can be used to calculate the first and the second value, respectively.
  • The equations and expressions disclosed and discussed herein are provided as illustrative examples, and it is realized that in practical embodiments they can be tailored to specific needs. This can include, for instance, the introduction of various constant factors and scaling factors; additional intermediate calculation steps, such as filtering steps; and so forth.
  • In some embodiments, said inequality may be written as:
    (i_{x,y,t} − μ̂_{x,y,t})² > Z²·σ²_{x,y,t}    (6)
    where μ̂_{x,y,t} is said predicted pixel value and where σ_{x,y,t} is an estimated standard deviation with respect to said historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question.
  • The presently described noise model can be configured to, for each pixel p_{x,y}, estimate a moving average and standard deviation based on the last n image frames, and then to use these metrics to decide whether the pixel value i_{x,y,t} in the same image location in the new frame deviates from the expected value more than an allowed limit.
  • This model can be designed to assume that any pixel in the background of the considered image I_t has intrinsic Gaussian noise, as long as the background only contains features that are assumed to be static in the first approximation.
  • A normal distribution can then be used to establish a suitable confidence interval. For instance, if a Z score of 3.464 is used, it can be seen that 99.95% of all samples with no significant differences from the background fall within the corresponding confidence interval. Therefore, a pixel p_{x,y} with signal value i_{x,y,t} at time t is considered to have a significant difference from the background if: (i_{x,y,t} − μ̂_{x,y,t})² > Z²·σ²_{x,y,t}.
  • The corrected (unbiased) standard deviation would be a mathematically more correct choice, i.e. a more accurate estimate of σ would result from dividing by n−1 rather than by n. However, for the present purposes this is not significant, since the limit used is a multiple of the standard deviation that may be freely selected. By selecting the number n of previous image frames considered for the estimation of the standard deviation in the second value (used in evaluating said inequality) to be a power of 2 (e.g. 16, 32, 64, ...), computationally efficient multiplications and divisions can be obtained at a very low cost, by using shifting operations. Both points are illustrated in the short sketch below.
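  • The following sketch (not part of the patent; values chosen for illustration) checks the Gaussian coverage of a Z score of √12 ≈ 3.464 and shows the shift-based division enabled by a power-of-two n:

```python
import math

# Coverage of |i - mu| <= Z*sigma under a Gaussian background assumption:
# P(|X - mu| <= Z*sigma) = erf(Z / sqrt(2)).
Z = math.sqrt(12)                      # Z^2 = 12, an integer with 10 <= Z^2 <= 20
coverage = math.erf(Z / math.sqrt(2))
print(f"Z = {Z:.3f}, coverage = {coverage:.5f}")   # ~0.99949, i.e. ~99.95%

# With n a power of two (here n = 32 = 2**5), division by n is a right shift:
n_shift = 5
S = 64660                              # example: sum of the last 32 pixel values
mean = S >> n_shift                    # integer S / 32 without a division
print(mean)                            # 2020
```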
  • Equation (8) depends on knowledge of the sum S and the squared sum Q of the last n observations of the pixel value i_{x,y,t} in question:
    S_{x,y,t} = Σ_{τ=t−n..t−1} i_{x,y,τ} and Q_{x,y,t} = Σ_{τ=t−n..t−1} i²_{x,y,τ},
    which can be updated recursively as:
    S_{x,y,t+1} = S_{x,y,t} + i_{x,y,t} − i_{x,y,t−n}    (14)
    Q_{x,y,t+1} = Q_{x,y,t} + i²_{x,y,t} − i²_{x,y,t−n}    (15)
  • Equations (14) and (15) are then the full calculations required to update the noise model.
  • A straightforward implementation would require only 3 (int) additions, 1 (int) subtraction and 1 (int) multiplication per pixel, which makes it very computationally efficient.
  • These calculations can be accelerated by use of SIMD instruction sets such as AVX2 (on x86_64) or NEON (on aarch64), or they can be run on a GPU or even implemented on an FPGA. A minimal sketch of such an implementation is given below.
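  • The following NumPy sketch (function and variable names are illustrative, not taken from the patent) applies the recursive updates (14)-(15) and evaluates the blob test in the integer-only form obtained by multiplying inequality (6) through by n²:

```python
import numpy as np

def update_sums(S, Q, newest, oldest):
    # Eqs. (14)-(15): roll the per-pixel window sums forward by one frame.
    # S and Q are int64 arrays; newest/oldest are the incoming and expiring frames.
    i_new = newest.astype(np.int64)
    i_old = oldest.astype(np.int64)
    S += i_new - i_old
    Q += i_new * i_new - i_old * i_old
    return S, Q

def blob_mask(frame, S, Q, n, Z2=12):
    # Integer-only blob test: (n*i - S)^2 > Z^2 * (n*Q - S^2), which is
    # (i - S/n)^2 > Z^2 * (Q/n - (S/n)^2) with both sides scaled by n^2.
    i = frame.astype(np.int64)
    return (n * i - S) ** 2 > Z2 * (n * Q - S * S)
```

A full implementation would also keep the last n frames in a ring buffer, so that the oldest frame is available for subtraction when the window rolls forward.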
  • Here, n can be as low as 32, or even as low as 16 or 10.
  • The n frames considered at each point in time are the n latest frames captured and provided by the camera 110.
  • The n frames can together cover a time period of between 0.1 s and 10 s, such as between 0.5 s and 2 s, of captured video.
  • The number of considered frames n can be relatively close to a frame rate used by the digital camera 110.
  • The noise model may then be required to store two integers per pixel p_{x,y}, in addition to keeping the actual image frames in memory for at least as many frames I_t as the length of the window size n.
  • An additional single-precision float may be required per pixel to store the estimated variance, if the calculation described in equation (19), below, is used.
  • The pixel values i_{x,y,t} have a bit depth across one or several channels of between 8 and 48 bits, such as a single channel (for instance a gray channel) of 8 or 16 bit depth, or three channels (such as RGB) of 16 or 24 bit depth.
  • The pixel values i_{x,y,t} can be transformed into a single channel (such as a grayscale channel) before processing of the pixel values i_{x,y,t} by the digital image analyzer 130.
  • Alternatively, only one such channel, out of several available channels, can be used for the analysis.
  • Several channels can also be analyzed separately and in parallel, so that a pixel that is detected to be a blob in at least one such analyzed channel is determined to be a blob at any point in time.
  • The transformed pixel values i_{x,y,t} can have a bit depth of at least 8 bits, and in some embodiments at the most 24 bits, such as at the most 16 bits.
  • A bit depth of 12 bits has proven to strike a reasonable balance between speed, memory requirements and output quality.
  • The data from the camera 110 can be transformed (down-sampled) before processing by the digital image analyzer 130.
  • The number of bits required can be found for S as D + log₂(n) and for Q as 2D + log₂(n), where D is the bit depth of one single considered channel. A worked example is given below.
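  • For instance (values chosen purely for illustration), a 12-bit channel and a window of 64 frames give:

```python
import math

def sum_widths(D, n):
    # Bits needed for the window sums: S needs D + log2(n), Q needs 2*D + log2(n).
    return D + int(math.log2(n)), 2 * D + int(math.log2(n))

print(sum_widths(12, 64))  # (18, 30): each sum fits in a uint32, 8 bytes per pixel
```

This is consistent with storing S and Q together in a single datatype of 12 bytes or less per pixel, as described above.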
  • The method can comprise a step in which the noise model is updated and stored in computer memory, as a collection of updated noise model information (S and Q) with respect to individual pixels p_{x,y} for which blob detection is to be performed.
  • This noise model can hence be updated and stored for each pixel p_{x,y} in the image.
  • This storing, for each analyzed pixel value i_{x,y,t} (such as for all pixels p_{x,y} in the image I_t), of updated values for S_{x,y,t} and Q_{x,y,t} in combination as a single datatype, constitutes an example of the "noise model" described herein.
  • The noise model is updated for each analyzed digital image frame I_t, such as for each individual image frame I_t in the set of consecutive image frames I_t produced and provided by the (each) digital camera 110.
  • In a step S4, for pixel values i_{x,y,t} for which said first value is found to be higher than said second value, information is stored in said computer memory, the information indicating that the pixel value i_{x,y,t} is part of a detected blob.
  • This storing can take place in a generated pixmap, in other words a data structure having such indicating information for each pixel p_{x,y}.
  • The information, for each pixel p_{x,y}, that it belongs or does not belong to a blob for that image frame I_t can be stored very computationally efficiently, since it can be stored as a single binary bit.
  • One way of implementing such a pixmap in practice is to use a "noise map" of the general type that will be described in the following, where the pixmap also comprises, for each pixel p_{x,y}, a value indicating an expected pixel value i_{x,y,t} for that pixel p_{x,y}.
  • The noise model established as described above can be used to generate such a noise map, which for every pixel position p_{x,y} provides information about whether or not that particular pixel value i_{x,y,t} in the new frame I_t was outside of the allowed limits (that is, whether (6) or (8) was true).
  • Additionally, the noise map can store an expected signal value for each pixel p_{x,y} at time t, such as based on the calculations performed in the determination of the noise model. The expected signal value is useful in downstream calculations, such as in a subsequent blob aggregation step, and so it is computationally efficient to establish and store this information already at this point.
  • Figure 6 illustrates the noise model after being updated based on the information of a most recently available image frame I_t, and in particular how the frame I_t relates to the values of S_{x,y} and Q_{x,y} for that time t.
  • The noise map then requires 16 bits per pixel p_{x,y} to store. This information can be stored in a single two-byte datatype (such as a uint16).
  • The information indicating whether or not the pixel p_{x,y} corresponding to each noise map entry is a blob pixel can be stored in the form of one single bit out of the total number of stored bits for the pixel p_{x,y} in question in the noise map.
  • In some embodiments, the most significant bit in the datatype used to store noise map data for each pixel p_{x,y} indicates whether the pixel value i_{x,y,t} in question is outside the blob generating limits. The lower 15 bits can then encode the expected (average) pixel value signal, scaled to 15-bit precision and stored in fixed-point representation. It is noted that this expected pixel value signal corresponds to the above-discussed predicted pixel value μ̂_{x,y,t}.
  • The value in the noise map indicating an expected pixel value i_{x,y,t} for the pixel p_{x,y} can be achieved by transforming (if necessary) the predicted pixel value to a grayscale bit depth of 15 bits.
  • The pixmap hence, for each pixel, at least or only contains information on 1) whether that pixel is part of a blob and 2) the predicted pixel value for that pixel. A sketch of such an entry layout is given below.
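  • The following sketch illustrates one possible packing of such a 16-bit entry; the split into 12 integer and 3 fractional bits matches a 12-bit channel, but is an assumption made for this example:

```python
BLOB_FLAG = 0x8000   # most significant bit: pixel value is outside the limits
FRAC_BITS = 3        # 12 integer + 3 fractional bits = 15 bits of fixed point

def pack_entry(is_blob, predicted):
    # Pack the blob flag and the fixed-point prediction into one 16-bit value.
    fixed = min(int(round(predicted * (1 << FRAC_BITS))), 0x7FFF)
    return (BLOB_FLAG if is_blob else 0) | fixed

def unpack_entry(entry):
    return bool(entry & BLOB_FLAG), (entry & 0x7FFF) / (1 << FRAC_BITS)

print(unpack_entry(pack_entry(True, 2020.625)))  # (True, 2020.625)
```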
  • In some embodiments, the prediction is simply the arithmetic mean of the previous n frames, but an alternative method, described later on, can be used to predict the value when the recent frames have large changes in capture parameters such as shutter time or gain.
  • The stored noise model incorporates all available information from image frames I_t received by the digital image analyzer 130 from the camera 110. In other words, it can use n consecutive or non-consecutive image frames I_t, up until a most recently received image frame I_t, to calculate values for Q_{x,y} and S_{x,y}.
  • The estimated projection (predicted pixel value μ̂_{x,y,t}) data stored for each pixel p_{x,y} in the noise map can be updated only using a second-to-most recently received image frame, i.e. I_{t−1}.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t} is determined as (or at least based on) an estimated projected future mean pixel value ν_{x,y,t}, in turn determined based on historic pixel values i_{x,y,t} for a sampled set of pixels p_{x,y} in said sequence of image frames I_t.
  • The determination of α and β can take place in any per se conventional manner, which is well within the reach of the skilled person; one such option is sketched below.
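  • For instance, an ordinary least-squares fit over the test set can be used (an illustrative sketch; the patent does not prescribe this exact routine):

```python
import numpy as np

def fit_alpha_beta(means, observed):
    # Least-squares fit of observed ~ alpha * means + beta, where `means` holds
    # the noise-model means for the test-set pixels p_{j,k} and `observed` holds
    # the current pixel values i_{j,k,t} at the same positions.
    m = np.asarray(means, dtype=np.float64)
    o = np.asarray(observed, dtype=np.float64)
    A = np.stack([m, np.ones_like(m)], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, o, rcond=None)
    return alpha, beta
```

With stable exposure, α ≈ 1 and β ≈ 0; larger deviations indicate a shift in capture parameters and trigger the alternative prediction and variance update described below.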
  • Here, ν_{j,k,t} can be an estimated historic mean with respect to pixel values i_{j,k,t} for the pixel p_{j,k} in question.
  • The above-described pure variance-based noise model has proven to give good results in a wide range of environments. However, if the light conditions in the image change too quickly, the noise map will be flooded with outliers at first. In the image frames I_t that follow upon such changed light conditions, the standard deviation estimate will be inflated, which instead leads to some degree of blindness until the noise model stabilizes again.
  • The suitability of different variants of the presently described method can also vary depending on the camera 110 hardware used. For instance, exposure and gain can be more or less coarse for different types of cameras, and aperture changes can be performed more or less quickly.
  • Here, j and k may represent a sample or test set of pixels p_{x,y}, such as a set of pixels p_{x,y} at evenly (geometrically) distributed pixel positions in the image frame I_t.
  • Hence, pixels p_{x,y} from different positions in the same image frame I_t are considered, and such pixels p_{x,y} are compared with their corresponding positions in the noise model data.
  • In some embodiments, said test set of pixels p_{x,y} can contain at least 0.1%, such as at least 1%, such as at least 10%, of the total set of pixels p_{x,y} in the image I_t. In some embodiments, said test set of pixels p_{x,y} can contain at most 80%, such as at most 50%, such as at most 25%, such as at most 10%, of the total set of pixels p_{x,y} in the image I_t. The test set of pixels p_{x,y} can be geometrically evenly distributed across the total set of pixels p_{x,y} in the image I_t.
  • The set can form a uniform sparse pattern extending across the entire image I_t, or extending across at least 50% of the image I_t; or the set can form a sparsely but evenly distributed set of vertical and/or horizontal full or broken lines distributed across the entire image I_t, or at least 50% of the image I_t.
  • Pixels that are overexposed are not included in the test set. This can be determined by comparing the pixel values to a known threshold value, often provided by the sensor manufacturer. If it is not known, the threshold value can easily be established experimentally.
  • The variance estimate needs to be updated as well. It is unfortunately not feasible to use the value from (4), since it will be inflated by the exposure change that is already compensated for by using μ̃_{x,y,t} as explained above. Instead, it is updated by weighing in the current squared deviation:
    σ²_{x,y,t} = (1 − f)·σ²_{x,y,t−1} + f·(i_{x,y,t} − μ̃_{x,y,t})²    (19)
    where f is the factor that decides how much weight should be given to this deviation compared to the existing value. The higher f, the faster the noise model will adapt to fluctuations. A sketch of this update is given below.
  • This variant of the noise model requires σ²_{x,y,t} to be stored in an array, typically with one single-precision float (32 bits) per pixel p_{x,y}.
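  • The update can be written compactly as follows (a sketch under the naming assumptions used in the examples above; f = 0.1 is illustrative):

```python
import numpy as np

def update_variance(var_prev, frame, mu_tilde, f=0.1):
    # Eq. (19): weigh the current squared deviation from the exposure-compensated
    # prediction mu~ into the running per-pixel variance (float32 arrays).
    dev2 = (frame.astype(np.float32) - mu_tilde) ** 2
    return (1.0 - f) * var_prev + f * dev2
```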
  • The pure variance noise model stores (indirectly, by storing S and Q, which allow for calculation of σ_{x,y,t} as described above) the estimated variance σ²_{x,y,t} for each pixel p_{x,y} when run.
  • σ²_{x,y,t} is updated using (19). Since the definition is recursive, the variance of this pixel in the previous frame will either be calculated from S and Q or from the previous iteration's calculation of (19) for this pixel.
  • In the illustrated example, the window size n = 32, which means that during the first 32 frames, the model is still being initialized. Once 32 frames have been processed, the model contains sufficient information to make predictions of the expected mean μ̂_{x,y,t} and variance σ²_{x,y,t}.
  • The line AVG shows the rolling average of the last 32 frames, which is the predictor determined according to (5).
  • The true signal value fluctuates around 2021 from the start until frame #60, where there is a sudden change in exposure time.
  • The exposure times used can be provided as a part of the frames' metadata. If the exposure time in the new frame differs significantly from the exposure times of the recent frames, the method described in connection with (17)-(19) should be used, since the levels have shifted and the model will be contaminated while this is happening.
  • In that case, the variance update method according to (19) is put into use.
  • When processing frame 60, the method first transforms the average value μ̂_{x,y,t} to μ̃_{x,y,t} using the linear mapping. It calculates the new pixel value's i_{x,y,t} deviation from μ̃_{x,y,t} and decides whether it is outside the limits, according to (20). If this is the first frame where the exposure change was noticed, the variance of the previous frame is used.
  • S and Q can continue to be updated as above, and can be used in order for the model to stabilize on the new level. Once the point is reached where α ≈ 1 and β ≈ 0, the average and variance are considered to be stable again, and the method can go back to the usual way of calculating the variance.
  • Then, blobs are generated based on the blob-allocated pixel values i_{x,y,t}.
  • Blob generation is the process of iterating over the individual pixels p_{x,y} in a generated noise map, filtering out false positives and forming blobs from connected clusters of outliers. While it is important that the noise map generation is efficient, more computation per pixel p_{x,y} can be afforded in the blob generation, as long as it is known that the pixel value in question i_{x,y,t} indeed overstepped the threshold in the noise map generation. Whereas setting the limits based on the mean and sample standard deviation of the recent pixel values i_{x,y,t} works well in most cases, one notable problematic issue arises when parts of the image I_t become overexposed.
  • In that case, the signal value tends to saturate at some value close to the upper limit of the range, and since the affected pixel values i_{x,y,t} as a result stop fluctuating over time, the standard deviation also becomes zero, which in turn means that even the slightest change would lead to blobs being generated.
  • In a step S5, the following inequality can therefore be used in the blob generation step as an anti-saturation filter:
    B·(i_{x,y,t} − μ̂_{x,y,t})² > μ̂_{x,y,t}    (24)
    where μ̂_{x,y,t} is the noise model's prediction for the pixel value i_{x,y,t}. If the deviation is less than this, the pixel value i_{x,y,t} is discarded as a non-blob pixel despite it overstepping the initial limits set up by the noise model.
  • Here, B is a positive number that controls the filtering limit. Since any number for B that gives the appropriate filtering effect can be selected, one can decide to pick an integer value. In some embodiments, B is at least 10, such as at least 50, such as at least 100. In some embodiments, B is at the most 10000, such as at the most 1000.
  • The noise model that the currently considered noise map was based on is already lost when arriving at the blob generation step.
  • The noise model data is overwritten in the computer memory in each iteration of the method. However, since μ̂_{x,y,t} was saved (with 15-bit precision in the above example) in the noise map itself, this value can be used instead when calculating (24). If the other terms are appropriately scaled (using fixed-point arithmetic), (24) can also be calculated using only integer math. A sketch of this filter is given below.
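  • In its simplest floating-point form, the filter can look as follows (an illustrative sketch; B = 100 is one value in the stated range):

```python
def passes_saturation_filter(i_xyt, mu_hat, B=100):
    # Eq. (24): keep a candidate blob pixel only if B * (i - mu)^2 > mu.
    # Saturated pixels have a large, frozen prediction mu and a near-zero
    # deviation, so they fail this test and are discarded as non-blob pixels.
    return B * (i_xyt - mu_hat) ** 2 > mu_hat
```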
  • Then, pixel values i_{x,y,t} overstepping the noise model limits (as described above across expressions (1)-(24)) are grouped together into multi-pixel blobs. This can be done using the per se well-known Hoshen-Kopelman algorithm, which is a raster-scan method to form such pixel groups that runs in linear time. During the first pass, it runs through all pixel values i_{x,y,t}.
  • If a pixel value i_{x,y,t} oversteps a limit and it has a neighboring pixel value i_{x±1,y±1,t} that belongs to a blob, it will be added to that same blob. If it has multiple neighboring blob-classified pixel values i_{x±1,y±1,t}, these will be joined into one single blob, and the pixel value i_{x,y,t} is added to the group. Finally, if there are no neighboring blobs, the pixel value i_{x,y,t} will be registered as a new blob. For each blob, a number of metrics can be aggregated, providing different options for estimating the center of the blob. One possibility is to use the absolute modulus of the noise model deviations, taking the blob center as the deviation-weighted mean of the member pixel coordinates, x_c = Σ|i_{x,y,t} − μ̂_{x,y,t}|·x / Σ|i_{x,y,t} − μ̂_{x,y,t}| (and correspondingly for y_c); another option is to weight the coordinates by their squared deviations, x_c = Σ(i_{x,y,t} − μ̂_{x,y,t})²·x / Σ(i_{x,y,t} − μ̂_{x,y,t})² (and correspondingly for y_c), the sums being taken over the pixels of the blob. A labeling sketch is given below.
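  • The following is a compact union-find rendering of such a raster scan (a generic Hoshen-Kopelman sketch, not necessarily the patent's exact implementation); it labels 4-connected outlier pixels in linear time:

```python
import numpy as np

def label_blobs(mask):
    # Hoshen-Kopelman style labeling of a boolean outlier mask.
    # Returns an int array of blob labels (0 = background).
    labels = np.zeros(mask.shape, dtype=np.int32)
    parent = [0]  # parent[k] is the representative of label k; 0 is background

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    next_label = 1
    H, W = mask.shape
    for y in range(H):
        for x in range(W):
            if not mask[y, x]:
                continue
            left = find(labels[y, x - 1]) if x > 0 and labels[y, x - 1] else 0
            up = find(labels[y - 1, x]) if y > 0 and labels[y - 1, x] else 0
            if left and up:
                labels[y, x] = min(left, up)
                parent[max(left, up)] = min(left, up)  # merge the two blobs
            elif left or up:
                labels[y, x] = left or up
            else:
                labels[y, x] = next_label  # register a new blob
                parent.append(next_label)
                next_label += 1
    # Second pass: flatten label equivalences to their representatives.
    for y in range(H):
        for x in range(W):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```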
  • Figure 7 illustrates an exemplifying clustering of four different detected blobs 1-4, based on individual pixel values i_{x,y,t} found to fulfill the criteria for being considered as part of blobs at time t.
  • Thereafter, detected blobs are correlated across said time-ordered series of digital images I_t to determine paths of moving objects through said space.
  • Such correlation can, for instance, use linear interpolation and/or implied Newtonian laws of motion as a filtering mechanism, so as to purge blobs not moving in ways that are plausible given a reasonable model of the types of objects being tracked; a toy example of such a plausibility check follows below.
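  • For instance, a chain of three blob centers in consecutive frames can be kept only if the acceleration it implies is physically reasonable (a toy sketch; the threshold and names are invented for the example):

```python
def plausible_chain(p0, p1, p2, dt, max_accel=50.0):
    # Reject blob chains whose implied acceleration (in pixels/s^2) is too
    # large to correspond to a real tracked object.
    v1 = ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)
    v2 = ((p2[0] - p1[0]) / dt, (p2[1] - p1[1]) / dt)
    ax, ay = (v2[0] - v1[0]) / dt, (v2[1] - v1[1]) / dt
    return (ax * ax + ay * ay) ** 0.5 <= max_accel
```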
  • Tracking information available from the available cameras 110 and any other sensors can be combined to determine one or several 3-dimensional target object 120 tracks through the space 111. This can, for instance, take place using stereoscopic techniques, which are well-known in themselves.
  • One or several determined 2D and/or 3D target object 120 tracks can be output to an external system, and/or graphically displayed on a display of a track-monitoring device. For instance, such displayed information can be used by a golfer using the system 100 to gain knowledge of the properties of a newly hit golf stroke.
  • The user (such as a golfer) may be presented with a visual 2D or 3D representation, on a computer display screen, of the track of a golf ball just hit, as detected using the method and system described above, against a graphical representation of a virtual golf practice range or similar.
  • This will provide feedback to the golfer that can be used to make decisions regarding various parts of the golf swing.
  • The track may also be part of a virtual experience, in which a golfer may for instance play a virtual golf hole, and the detected and displayed track is represented as a golf shot in said virtual experience. It is specifically noted that the amount of data necessary to process for achieving such tracks is substantial.
  • The invention also relates to the system 100 as such, comprising the digital camera 110, the digital image analyzer 130 and the moving object tracker 140.
  • The digital camera 110 is then arranged to depict the space 111 to produce the series of digital images I_t as described above.
  • The digital image analyzer 130 is configured to determine said inequality for the pixel values i_{x,y,t} as described above, and to store in the computer memory information indicating that one or several pixel values i_{x,y,t} are part of a detected blob.
  • The moving object tracker 140 is configured to correlate detected blobs across said series of digital images I_t as described above.
  • The invention also relates to the computer software product as such.
  • The computer software product is then configured to, when executing on suitable hardware as described above, embody the digital image analyzer 130 and the moving object tracker 140. As such, it is configured to receive a series of digital images I_t from the digital camera 110, and to perform the above-described method steps performed by the digital image analyzer 130 and the moving object tracker 140.
  • The digital frames I_t can be provided as a continuous or semi-continuous stream of frames from the digital camera 110 (and a set of the n most recent considered frames can be analyzed for each frame or set of frames received), or the entire set of N images can be received as one big batch and analyzed thereafter.
  • The computer software product can execute on a computer belonging to the system 100, and can as such constitute part of the system 100.
  • The generated blob data can be used in various ways in addition to the object tracking.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Image Analysis (AREA)

Abstract

Methods, systems, and apparatus, including computer program products, for tracking moving objects include depicting a space using a digital camera to produce a series of digital images (I_t); for two or more pixel values, determining an inequality comparing a first value to a second value, the first value being calculated based on the square of the difference between a pixel value (i_{x,y,t}) and a predicted pixel value (μ̂_{x,y,t}), the second value being calculated based on a product of the square of a number Z and an estimated variance for historic pixel values; storing in a computer memory information indicating that the pixel value (i_{x,y,t}) is part of a detected blob; and correlating detected blobs across said series of digital images (I_t) to determine paths of moving objects.

Description

Method and system for optically tracking moving objects
BACKGROUND OF THE INVENTION
The present invention relates to a method and a system for optically tracking moving objects.
Known methods track moving objects using computer vision, with one or more cameras depicting a space where the moving objects exist. The tracking may be performed by first identifying an object as one image pixel, or a set of adjacent pixels, that deviates from a local background. Such deviating pixels are together denoted a "blob". Once a number of blobs have been detected in several image frames, possible tracked object paths are identified by interconnecting identified blobs in subsequent frames.
One example of such a method is given in US 20220051420 A1.
The blob generation in each individual frame potentially results in very many false positive blobs, in other words identified blobs that do not really correspond to an existing moving object. This may be due to noise, shifting lighting conditions and non-tracked objects occurring in the field of view of the camera in question.
The detection of possible tracked object paths normally results in a reduction of such false positives, for instance based on filtering away of physically or statistically implausible paths. Due to the large number of false positive blob detections, however, even if most of the false positives are filtered away in the tracked paths detection step, the blob detection itself is associated with heavy memory and processor load and may therefore constitute a bottleneck for the object tracking even if high-performance hardware is used.
Moreover, as the performance of digital cameras increases, pixel data output from such cameras increases correspondingly. In order to achieve accurate tracking of moving objects, it is desired to use as accurate and precise image information as possible. In order to avoid too many non-detected blobs (false negatives), leading to potentially missed tracked object paths, it is normally preferred to accept a relatively large share of false positive blob detections.
SUMMARY OF THE INVENTION
The various embodiments described herein solve one or more of the above-described problems and provide techniques for tracking the paths of moving objects using less memory and/or processing power compared to conventional object tracking techniques.
Hence, the invention can be embodied as a method for tracking moving objects, comprising the steps of: obtaining a series of digital images It at consecutive times t, the digital images It representing optical input from a three-dimensional space within a field of view of the digital camera, the digital camera being arranged to produce said digital images It having a corresponding set of pixels px,y, said digital images comprising corresponding pixel values ix,y,t, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images (It); for two or more of said pixel values ix,y,t, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value ix,y,t in question and a predicted pixel value μ̂x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where the predicted pixel value μ̂x,y,t is calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, Z being a number selected such that Z² is an integer with 10 < Z² < 20; for pixel values ix,y,t for which said first value is higher than said second value, storing in a computer memory information indicating that the pixel value ix,y,t is part of a detected blob; and correlating, based on the information stored in the computer memory, detected blobs across said series of digital images It to determine paths of moving objects through said three-dimensional space.
In some embodiments, said inequality is
$$\left| i_{x,y,t} - \hat{\mu}_{x,y,t} \right| > Z\,\sigma_{x,y,t}$$
where
$$\hat{\mu}_{x,y,t} = \frac{S_{x,y,t}}{n} = \frac{1}{n}\sum_{k=t-n}^{t-1} i_{x,y,k}$$
is said predicted pixel value, and where σx,y,t is an estimated standard deviation with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question.
In some embodiments, the method further comprises storing, in said computer memory, for individual ones of said pixels px,y and for a number n < N, the sums
$$S_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}, \qquad Q_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}^2$$
and, for individual ones of said pixel values ix,y,t, determining said inequality as
$$\left(n\,i_{x,y,t} - S_{x,y,t}\right)^2 > Z^2\left(n\,Q_{x,y,t} - S_{x,y,t}^2\right)$$
In some embodiments, Sx,y,t, Qx,y,t, or both, are calculated recursively, whereby a value for a pixel value ix,y,t is calculated using a previously stored value Sx,y,t-1, Qx,y,t-1, or both, for the same pixel px,y but at an immediately preceding time t−1.
In some embodiments, Sx,y,t is calculated as
$$S_{x,y,t} = S_{x,y,t-1} + i_{x,y,t-1} - i_{x,y,t-n-1}$$
and Qx,y,t is calculated as
$$Q_{x,y,t} = Q_{x,y,t-1} + i_{x,y,t-1}^2 - i_{x,y,t-n-1}^2$$
In some embodiments, the method further comprises storing in said computer memory Sx,y,t and Qx,y,t in combination as a single datatype comprising 12 bytes or less per pixel px,y.
In some embodiments, the method further comprises storing in said computer memory, for a particular digital image It, a pixmap having, for each pixel px,y, said information indicating that the pixel value ix,y,t is part of a detected blob.
In some embodiments, said information indicating that the pixel value ix,y,t is part of a detected blob is indicated by a single bit for each pixel px,y.
In some embodiments, said pixmap also comprises, for each pixel px,y, a value indicating an expected pixel value ix,y,t for that pixel px,y.
In some embodiments, said value indicating an expected pixel value ix,y,t for the pixel px,y in question is provided by storing the predicted pixel value (μ̂x,y,t) as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
In some embodiments, the predicted pixel value μ̂x,y,t, the estimated variance or standard deviation σx,y,t, or both, is or are calculated based on a set of n historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where 10 < n < 300.
In some embodiments, a number n of previous images It considered for the estimation of an estimated variance or a standard deviation σx,y,t of the second value is selected to be a power of 2.
In some embodiments, said pixel values ix,y,t have a depth across one or several channels of between 8 and 48 bits.
In some embodiments, the predicted pixel value μ̂x,y,t is determined based on an estimated projected future mean pixel value μx,y,t, in turn determined based on historic pixel values ix,y,t for a sampled set of pixels px,y in said images It.
In some embodiments, the predicted pixel value μ̂x,y,t is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize
$$\sum_{j,k}\left(i_{j,k,t} - \left(\alpha\,\mu_{j,k,t} + \beta\right)\right)^2$$
where μj,k,t is said estimated projected future mean pixel value for the pixel pj,k in question, and where j and k are iterated over a test set of pixels.
In some embodiments, μj,k,t is an estimated historic mean with respect to pixel values ix,y,t for the pixel pj,k in question.
In some embodiments, said test set of pixels contains between 1% and 25% of the total set of pixels px,y in the image It.
In some embodiments, said test set of pixels is geometrically evenly distributed across the total set of pixels px,y in the image It.
In some embodiments, the estimated standard deviation σx,y,t is determined according to
$$\sigma_{x,y,t} = \sqrt{\frac{Q_{x,y,t}}{n} - \left(\frac{S_{x,y,t}}{n}\right)^2}$$
In some embodiments, the method further comprises determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value μ̂x,y,t according to any one of claims 14-18 until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
In some embodiments, the method further comprises, for said pixel values ix,y,t for which said first value is higher than said second value, only storing said information indicating that the pixel value ix,y,t is part of a detected blob in case also the following inequality holds: B·(ix,y,t − μ̂x,y,t)² > μ̂x,y,t, where ix,y,t is the pixel value in question, where μ̂x,y,t is the predicted pixel value and where B is an integer such that B > 100.
In some embodiments, the method further comprises using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
In some embodiments, the objects are golf balls.
The invention can be embodied as a method for tracking moving objects, the method comprising: obtaining a series of digital images I from a digital camera, the digital images I representing optical input from a three-dimensional space within a field of view of the digital camera over time, each of the digital images I having pixels px,y with corresponding pixel values ix,y; performing, at a computer, image segmentation on each image of the series of digital images I using a statistical model of background for the optical input to detect blobs, wherein performing the image segmentation comprises, for each of two or more pixel values ix,y,t in the image, determining an inequality result using a current pixel value ix,y,t for a pixel px,y in a current image It, first Sx,y,t and second Qx,y,t values of the statistical model for the pixel px,y, and a confidence level value Z², wherein the first Sx,y,t and second Qx,y,t values are calculated based on historic pixel values ix,y from images from the series of digital images I before the current image It, each of the current pixel value ix,y,t, the first Sx,y,t and second Qx,y,t values, and the confidence level value Z² are stored as integer type data in a memory of the computer, and the determining uses integer operations in the computer, and storing, in the memory of the computer, information indicating that the current pixel value ix,y,t for the image pixel in the current image It is part of a detected blob in response to the inequality result; and using the stored information to correlate detected blobs across the series of digital images I to determine paths of moving objects through the three-dimensional space within the field of view of the digital camera.
Moreover, the invention can also be embodied as a system for tracking moving objects, the system comprising a digital camera, a digital image analyzer and a moving object tracker, the digital camera being arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images It at consecutive times t, the digital camera being arranged to produce said digital images It having a corresponding set of pixels px,y, said digital images comprising corresponding pixel values ix,y,t, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images (It); the digital image analyzer being configured to, for two or more of said pixel values ix,y,t, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value ix,y,t in question and a predicted pixel value μ̂x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where the predicted pixel value μ̂x,y,t is calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, and where Z is selected such that Z² is an integer with 10 < Z² < 20; the digital image analyzer being configured to, for pixel values ix,y,t for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value ix,y,t is part of a detected blob; and the moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images It to determine paths of moving objects through said three-dimensional space.
Furthermore, the invention can also be embodied as a computer software product configured to, when executing, receive a series of digital images It from a digital camera, the digital camera being arranged to represent optical input from a three-dimensional space to produce said digital images It at consecutive times t, the digital camera being arranged to produce said digital images It having a corresponding set of pixels px,y, said digital images comprising corresponding pixel values ix,y,t, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images (It); for two or more of said pixel values ix,y,t, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value ix,y,t in question and a predicted pixel value μ̂x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where the predicted pixel value μ̂x,y,t is calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, and where Z is selected such that Z² is an integer with 10 < Z² < 20; for pixel values ix,y,t for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value ix,y,t is part of a detected blob; and correlate, based on the information stored in the computer memory, detected blobs across said series of digital images It to determine paths of moving objects through said three-dimensional space. The computer software product can be implemented by a non-transitory computer-readable medium encoding instructions that cause one or more hardware processors located in at least one of the computer hardware devices in the system to perform the digital image processing and the object tracking.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, the invention will be described in detail, with reference to exemplifying embodiments of the invention and to the enclosed drawings, wherein:
Figure 1 is an overview of a system 100 configured to perform a method of the type illustrated in Figure 3;
Figure 2 is a simplified illustration of a data processing apparatus;
Figure 3 shows a general flowchart for optically tracking moving target objects;
Figure 4 is a flowchart of a method performed by the system 100 shown in Figure 1;
Figure 5 is an overview illustrating a noise model of a type described herein;
Figure 6 shows an image frame illustrating a noise model;
Figure 7 illustrates an example of clustering of pixels into blobs; and
Figure 8 illustrates intensities for a pixel during a sudden exposure change event.
All figures share the same reference numerals for same and corresponding parts.
DETAILED DESCRIPTION
With reference to Figure 1, the method relates to a method for tracking moving target objects 120. Generally, a system 100 can comprise one or several digital cameras 110, each being arranged to represent optical input from a three-dimensional space 111 within a field of view of the digital camera 110, to produce digital images of such moving target objects 120, the objects travelling through a space 111 hence being represented by the digital camera 110 in consecutive digital images. Such representation by the digital camera 110 will herein be denoted a "depiction", for brevity. The digital camera 110 is arranged to not move in relation to the space 111 during production of the series of digital images (It). For instance, the digital camera 110 may be fixed in relation to said space 111, or, in case it is movable, it is kept still during the production of the series of digital images (It). Hence, the same part of the space 111 is depicted each time by the digital camera 110, and the digital camera 110 is arranged to produce digital images It having a corresponding set of pixels px,y, and so that said produced digital images It comprise corresponding pixel values ix,y,t. "x" and "y" denote coordinates in an image coordinate system, whereas "t" denotes time.
That the pixel values ix,y,t of two or more different images It "correspond" to each other means that individual pixels px,y measure light entering the camera 110 from the same, or substantially the same, light cone in all of the images It in question. It is realized that the camera 110 may move slightly, due to wind, thermal expansion and so forth, between images It, but that there is substantial correspondence between pixels px,y even in cases where such noise-inducing slight movement is present. There can be at least 50% overlap between light cones of any one same pixel px,y of the camera 110 between any two consecutive images It. There may also be cases where the camera 110 is movable, such as pivotable. In such cases an image transformation can be applied to a captured image so as to bring its pixels px,y into correspondence with pixels of a previous or future captured image.
In case the system 100 comprises more than one digital camera 110, several such digital cameras 110 can be arranged to depict the same space 111 and consequently track the same moving target object(s) 120 through said space 111. In such cases, the several digital cameras 110 can be used to construct a stereoscopic view of the respective tracked path of each target object 120.
As mentioned, the digital camera 110 is arranged to produce a series of consecutive images It, at different points in time. Such images may also be denoted image "frames". In some embodiments, the digital camera 110 is a digital video camera, arranged to produce a digital moving film comprising or being constituted by such consecutive digital image frames. As is illustrated in Figure 1, the system 100 comprises a digital image analyzer 130, configured to analyze digital images received directly from the digital camera 110, or received from the digital camera 110 via an intermediate system, in same or processed (re-formatted, compressed, filtered, etc.) form. The analysis performed by the digital image analyzer 130 can take place in the digital domain. The digital image analyzer 130 may also be denoted a "blob detector".
The system 100 further comprises an object tracker 140, configured to track said moving target objects 120 across several of said digital images, based on information provided from the digital image analyzer 130. The analysis performed by the object tracker 140 can also take place in the digital domain.
In example embodiments, the system 100 is configured to track target objects 120 in the form of sports objects in flight, such as balls in flight, for instance baseballs or golf balls in flight. In some embodiments, the system 100 is used at a golf practice range, such as a driving range having a plurality of bays for hitting golf balls that are to be tracked using the system 100. In other cases, the system 100 can be installed at an individual golf range bay, or at a golf tee, and configured to track golf balls being struck from said bay or tee. The system 100 can also be a portable system 100, configured to be positioned at a location from which it can track said moving target objects 120. It is realized that the monitored "space" mentioned above will, in each of these and other cases, be a space through which sport balls are expected to move.
Various types of computers can be used in the system 100. The digital image analyzer 130 and the object tracker 140 constitute examples of such computers. In some cases, the digital image analyzer 130 and the object tracker 140 can be provided as software functions executing on one and the same computer. The one or several digital cameras 110 can also be configured to perform digital image processing, and then also constitute examples of such computers. In some embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented as software functions configured to execute on hardware of one or several digital cameras 110. In other embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented on standalone or combined hardware platforms, such as on a computer server.
The one or several digital cameras 110, the digital image analyzer 130 and the object tracker 140 are configured to communicate digitally, either via computer-internal communication paths, such as via a computer bus, or via computer-external wired and/or wireless communication paths, such as via a network 10 (e.g., the Internet). In implementations that need substantial communications bandwidth, the camera(s) 110 and the digital image analyzer 130 can communicate via a direct, wired digital communication route, which is not over the network 10. On the other hand, the digital image analyzer 130 and the object tracker 140 may communicate with each other over the network 10 (e.g., a conventional Internet connection).
The essential elements of a computer, in general, are a processor for performing instructions and one or more memory devices for storing instructions and data. As used herein, a "computer" can include a server computer, a client computer, a personal computer, embedded programmable circuitry, or a special purpose logic circuitry. Such computers can be connected with one or more other computers through a network, such as the internet 10, or via any suitable peer-to-peer connection for digital communications, such as a Bluetooth® connection.
Each computer can include various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including various programs that operate, for instance, as the digital image analyzer 130 program and/or the object tracker 140 program. Other examples include a digital image preprocessing and/or compressing program. The number of software modules used can vary from one implementation to another and from one such computer to another. Each of said programs can be implemented in embedded firmware and/or as software modules that are distributed on one or more data processing apparatus connected by one or more computer networks or other suitable communication networks.
Figure 2 illustrates an example of such a computer, being a data processing apparatus 300 that can include hardware or firmware devices including one or more hardware processors 312, one or more additional devices 314, a non-transitory computer readable medium 316, a communication interface 318, and one or more user interface devices 320. The processor 312 is capable of processing instructions for execution within the data processing apparatus 300, such as instructions stored on the non-transitory computer readable medium 316, which can include a storage device such as one of the additional devices 314. In some implementations, the processor 312 is a single or multi-core processor, or two or more central processing units (CPUs). The data processing apparatus 300 uses its communication interface 318 to communicate with one or more other computers 390, for example, over the network 380. Thus, in various implementations, the processes described can be run in parallel, concurrently, or serially, on a single or multi-core computing machine, and/or on a computer cluster/cloud, etc.
The data processing apparatus 300 includes various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including a program 330 that constitutes the digital image analyzer 130 described herein, configured to perform the method steps performed by such digital image analyzer 130. The program 330 can also constitute the object tracker 140 described herein, configured to perform the method steps performed by such object tracker 140.
Examples of user interface devices 320 include a display, a touchscreen display, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse. Moreover, the user interface device(s) need not be local device(s) 320, but can be remote from the data processing apparatus 300, e.g., user interface device(s) 390 accessible via one or more communication network(s) 380. The user interface device 320 can also be in the form of a standalone device having a screen, such as a conventional smartphone being connected to the system 100 via a configuration or setup step. The data processing apparatus 300 can store instructions that implement operations as described in this document, for example, on the non-transitory computer readable medium 316, which can include one or more additional devices 314, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, a tape device, and a solid state memory device (e.g., a RAM drive, a Flash memory or an EEPROM). Moreover, the instructions that implement the operations described in this document can be downloaded to the non-transitory computer readable medium 316 over the network 380 from one or more computers 390 (e.g., from the cloud), and in some implementations, the RAM drive is a volatile memory device to which the instructions are downloaded each time the computer is turned on.
It is realized that the described computer hardware can be physical hardware, virtual hardware or any combination thereof.
As mentioned, the system 100 is configured to perform a method according to one or more embodiments for optically tracking moving target objects 120.
The present invention can furthermore be embodied as a computer software product, configured to perform said method when executing on computer hardware of the type described herein. The computer software product can hence be deployed as a part of the system 100 so as to provide the functionality required to perform the present method.
Both said system 100 and said computer software product are hence configured to track moving target objects 120 moving through said space 111 in relation to one or several digital cameras 110, by comprising or embodying the above-mentioned digital image analyzer 130 and object tracker 140, in turn being configured to perform the corresponding method steps described herein.
In general, everything that is said in relation to the presently described method is equally applicable to the system 100 and to the computer software product described herein, and vice versa.
Figure 3 illustrates a general flowchart for tracking moving target objects 120 based on digital image information received from one or several digital cameras 110.
In computer vision, "image segmentation" is the process of separating an image into different regions, representing target objects within it. Generally, it is desirable to distinguish potential moving target objects from a background. The background may in general be changing and noisy, and is in many cases quite unpredictable. In the example of a golf ball, for instance, when such a ball 120 is far away from the digital camera 110 depicting it, the ball may be as small as one single pixel px,y in the digital image frame produced by the digital camera 110.
For these reasons, it is in general not possible to separate out a foreground object 120 from a background based only on a detected shape in relation to an expected shape of the target object 120. Instead, it is proposed to set up a statistical model of the background (in the following denoted a "noise model"), and to identify pixels px,y that by a probability measure deviate from an expected value with more than a threshold value, based on this model. Adjacent pixels px,y in the detected digital image that deviate from the expected value in accordance with the model are grouped together into a "blob" of pixels px,y ("blob aggregation").
Such a method may result in a very large number of false positives, such as about 99.9% false positives. However, a subsequent motion tracking analysis can sort out the vast majority of all false positives, such as only keeping blobs that seem to obey Newton's laws of motion between consecutive digital image frames It.
The noise model step, as depicted in Figure 3, is used to suppress noise in the image frames, with the purpose of lowering the number of detected blobs in the subsequent blob aggregation step. The noise model analyzes a plurality of pixels px,y, such as every pixel px,y, in said image frames It, and is therefore at risk of becoming a major bottleneck. These calculations, aiming to identify noise that does not conform to a detected statistical pattern in order to identify outliers, can be handled by high-performance GPUs (Graphics Processing Units), but performance may still prove to be a problem. The approach described herein has turned out to drastically reduce the computational power required per pixel px,y in a moving target object 120 tracking system 100. This reduction can be exploited by using simpler hardware, lowering power consumption or allowing a larger incoming image bitrate.
Turning now to Figure 4, a method according to one or more embodiments is illustrated.
In a first step S1, the method starts.
In a subsequent step S2, a number Z is selected such that Z² is an integer. The number Z can be selected such that Z² is an integer with 10 < Z² < 20. It is noted that Z may be a non-integer, as long as Z² is an integer value. This step S2 may be performed ahead of time, such as during a system 100 design process or a system 100 calibration step.
In a subsequent step S3, the space 111 is depicted using the digital camera 110 to produce a series of digital images It at consecutive times t. The space 111 can be depicted using the digital camera 110 to produce a series of N digital images It at consecutive times t. However, it is realized that the procedure can also be a continuous or semi-continuous procedure, wherein the digital camera 110 will continue to produce digital images It at consecutive times t so long as the procedure is ongoing. Hence, in this case the number of digital images N will grow by 1 for each captured frame. In either case, the series of digital images It at consecutive times t may be seen as a stream of digital images captured much like a digital video stream.
In a subsequent step S4, for two or more (e.g., several) of said pixel values ix,y,t, an inequality is determined, involving comparing a first value to a second value.
The first value is calculated based on the square of the difference between the pixel value ix,y,t in question and a calculated predicted pixel value μ̂x,y,t for that pixel px,y. The second value is calculated based on a product of, on the one hand, the square of the selected number Z, this square then being an integer value, and, on the other hand, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question. Concretely, the second value can be calculated based on said estimated variance or a square of the estimated standard deviation, i.e. as Z²·σ²x,y,t.
The predicted pixel value μ̂x,y,t is also calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, in other words using information from image frames captured by the camera 110 at points in time prior to the time t. The predicted pixel value μ̂x,y,t can be calculated based on the same, or different, set of historic pixel values ix,y,{t-n,t-1} as the estimated variance or standard deviation σx,y,t.
In the notation used herein, "n" denotes the number of historic pixel values ix,y,t considered by the noise model, counting backwards from the currently considered image frame. This notation hence assumes that the same consecutive pixel values ix,y,t, up to the presently considered image frame, are used to calculate both the first and the second value, but it is realized that any suitable contiguous or non-contiguous, same or different, intervals of pixel values ix,y,t can be used to calculate the first and the second value, respectively.
In general, the equations and expressions disclosed and discussed herein are provided as illustrative examples, and it is realized that in practical embodiments they can be tailored to specific needs. This can include, for instance, the introduction of various constant factors and scaling factors; additional intermediate calculation steps, such as filtering steps; and so forth.
In some embodiments, said inequality may be written as:
$$\left| i_{x,y,t} - \hat{\mu}_{x,y,t} \right| > Z\,\sigma_{x,y,t}$$
where μ̂x,y,t is said predicted pixel value and where σx,y,t is an estimated standard deviation with respect to said historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question. In general, the presently described noise model can be configured to, for each pixel px,y, estimate a moving average and standard deviation based on the last n image frames, and then to use these metrics to decide whether the pixel value ix,y,t in the same image location in the new frame deviates from the expected value more than an allowed limit.
This model can be designed to assume that any pixel in the background of the considered image It has an intrinsic Gaussian noise, as long as the background only contains features that are assumed to be static in the first approximation. A normal distribution can be used to establish a suitable confidence interval. For instance, if a Z score of 3.464 is used, it can be seen that 99.95% of all samples with no significant differences from the background fall within the corresponding confidence interval. Therefore, a pixel px,y with signal value ix,y at time t is considered to have a significant difference from the background if:
$$\left| i_{x,y,t} - \mu_{x,y,t} \right| > Z\,\sigma_{x,y,t} \qquad (1)$$
Here, k iterates over the previous n frames. The limit is based on the (uncorrected) standard deviation:
$$\sigma_{x,y,t} = \sqrt{\frac{1}{n}\sum_{k=t-n}^{t-1}\left(i_{x,y,k} - \mu_{x,y,t}\right)^2} \qquad (2)$$
The corrected (unbiased) standard deviation would be a mathematically more correct choice, i.e. a more accurate estimate of σ would result from dividing by n−1 rather than by n. However, for the present purposes this is not significant, since the limit used is a multiple of the standard deviation that may be freely selected. Selecting the number n of previous image frames considered for the estimation of the standard deviation in the second value (used in evaluating said inequality) to be a power of 2 (e.g. 16, 32, 64, ...), we can get computationally efficient multiplications and divisions at a very low cost, by using shifting operations. When processing an image frame It, pixel values ix,y from frames k ∈ [t−n, t−1] are used. A variant of the formula for computing the standard deviation that allows for it to be computed in a single pass is the following:
$$\sigma_{x,y,t} = \sqrt{\frac{1}{n}\sum_{k=t-n}^{t-1} i_{x,y,k}^2 - \mu_{x,y,t}^2} \qquad (3)$$
Setting $S_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}$ and $Q_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}^2$, the variance can be rewritten as:
$$\sigma_{x,y,t}^2 = \frac{Q_{x,y,t}}{n} - \left(\frac{S_{x,y,t}}{n}\right)^2 \qquad (4)$$
Here the expression for the estimate of the mean is also provided:
$$\mu_{x,y,t} = \frac{S_{x,y,t}}{n} = \frac{1}{n}\sum_{k=t-n}^{t-1} i_{x,y,k} \qquad (5)$$
Revisiting (1), it is safe to square both sides, since both the left-hand and right-hand sides of this equation are non-negative:
$$\left(i_{x,y,t} - \mu_{x,y,t}\right)^2 > Z^2\,\sigma_{x,y,t}^2 \qquad (6)$$
Combining (4) and (6) yields:
$$\left(i_{x,y,t} - \frac{S_{x,y,t}}{n}\right)^2 > Z^2\left(\frac{Q_{x,y,t}}{n} - \frac{S_{x,y,t}^2}{n^2}\right) \qquad (7)$$
This is equivalent to:
$$\left(n\,i_{x,y,t} - S_{x,y,t}\right)^2 > Z^2\left(n\,Q_{x,y,t} - S_{x,y,t}^2\right) \qquad (8)$$
It is noted that n < N. Hence, the above-discussed inequality can be expressed as (8), with Sx,y,t and Qx,y,t as defined above.
Since ix,y,t, n, Sx,y,t and Qx,y,t are all integers, and since Z can be picked to produce an appropriate or desired number of false positives, the entire calculation can be done using only integer numbers. This means that the calculations can be performed without any loss of precision due to floating point truncation errors. Also, integer operations are typically faster than their floating-point counterparts.
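By way of illustration, the integer-only test (8) could be sketched as follows (a minimal numpy sketch; the function name, array types and default values are assumptions, not taken from the patent):

```python
import numpy as np

def blob_mask(frame, S, Q, n=64, z_squared=12):
    """Evaluate inequality (8) per pixel using integer math only.

    frame: current pixel values i_x,y,t as an integer array.
    S, Q:  running sum and sum of squares over the last n frames.
    Returns a boolean mask of pixels deviating from the background.
    """
    i = frame.astype(np.int64)               # widen to avoid overflow
    S64 = S.astype(np.int64)
    Q64 = Q.astype(np.int64)
    lhs = (n * i - S64) ** 2                 # first value, scaled by n**2
    rhs = z_squared * (n * Q64 - S64 * S64)  # second value, same scale
    return lhs > rhs
```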
In the following table, various outcomes for different selected values of Z are shown:

Z²    Z        P (number of false positives, ppm)
12    3.46410  532.1
13    3.60555  311.6
14    3.75166  175.7
15    3.87298  107.6
16    4.00000  63.4
Equation (8) depends on knowledge of the sum S and the squared sum Q of the last n observations of the pixel value ix,y,t in question:
$$S_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k} \qquad (9)$$
$$Q_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}^2 \qquad (10)$$
While it would be possible to calculate the statistics using (9) and (10) directly, for each pixel value ix,y,t of each frame It, it is much more computationally efficient to use a recursive definition, where in every step the new frame It is added to the noise model, and the frame It−n from n frames back is removed:
$$S_{x,y,t+1} = S_{x,y,t} + i_{x,y,t} - i_{x,y,t-n} \qquad (11)$$
$$Q_{x,y,t+1} = Q_{x,y,t} + i_{x,y,t}^2 - i_{x,y,t-n}^2 \qquad (12)$$
Updating Qx,y,t requires two multiplications to generate the squares. However, since Qx,y,t involves a difference of squares, it can be reduced to one single multiplication if rewritten as follows. With
$$z_+ = i_{x,y,t}, \qquad z_- = i_{x,y,t-n} \qquad (13)$$
the updates become:
$$S_{x,y,t+1} = S_{x,y,t} + z_+ - z_- \qquad (14)$$
$$Q_{x,y,t+1} = Q_{x,y,t} + \left(z_+ - z_-\right)\left(z_+ + z_-\right) \qquad (15)$$
(14) and (15) are then the full calculations required to update the noise model. A straightforward implementation would require only 3 (int) additions, 1 (int) subtraction and 1 (int) multiplication per pixel, which makes it very computationally efficient. Furthermore, these calculations can be accelerated by use of SIMD instruction sets such as AVX2 (on x86_64) or NEON (on aarch64), or they can be run on a GPU or even implemented on an FPGA.
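As an illustration of this update step, the following is a minimal vectorized sketch (class and attribute names are hypothetical; a ring buffer is assumed to hold the last n frames, matching the requirement of keeping the frames in memory discussed below):

```python
import numpy as np

class NoiseModel:
    """Running sum S and sum of squares Q over the last n frames."""

    def __init__(self, height, width, n=64):
        self.n = n
        self.ring = np.zeros((n, height, width), dtype=np.uint16)
        self.idx = 0  # position of the frame from n frames back
        self.S = np.zeros((height, width), dtype=np.uint32)
        self.Q = np.zeros((height, width), dtype=np.uint64)

    def update(self, frame):
        """Add the new frame and drop the frame from n frames back,
        per (14)-(15): one multiplication per pixel."""
        z_plus = frame.astype(np.int64)
        z_minus = self.ring[self.idx].astype(np.int64)
        diff = z_plus - z_minus
        self.S = (self.S.astype(np.int64) + diff).astype(np.uint32)
        self.Q = (self.Q.astype(np.int64)
                  + diff * (z_plus + z_minus)).astype(np.uint64)
        self.ring[self.idx] = frame  # overwrite the dropped frame
        self.idx = (self.idx + 1) % self.n
```

During the first n frames the ring buffer still contains zeros, corresponding to the initialization period discussed in connection with Figure 8.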
The calculations performed to update the noise model between consecutive image frames It are conceptually illustrated in Figure 5, showing how the "New frame" It is added to the Noise Model, and a frame It−n from n frames back, such as the last frame in the currently considered "Queue of frames in model", is removed. As has been described above, this can be performed efficiently by considering the individual pixel values ix,y,t and ix,y,t−n, calculating the values of z₊ and z₋.
From the above it is clear that 0 ≤ S ≤ 2ᵇ·n and 0 ≤ Q ≤ 2²ᵇ·n, where b is the bit depth of the input pixel value ix,y,t data. In some embodiments, n is not more than 300, or not more than 256, or not more than 128, such as not more than 64 = 2⁶ (the averaging is not performed over more than 64 consecutive image frames It). In some embodiments, n can be as low as 32, or even as low as 16 or even 10. In some embodiments, the n frames considered at each point in time are the n latest frames captured and provided by the camera 110. In this case, the n frames can together cover a time period of between 0.1 s and 10 s, such as between 0.5 s and 2 s, of captured video. In other words, the number of considered frames n can be relatively close to a frame rate used by the digital camera 110. The noise model may then be required to store two integers per pixel px,y, in addition to keeping the actual image frames in memory for at least as many frames It as the length of the window size n. Furthermore, an additional single-precision float may be required per pixel to store the estimated variance if the calculation (as described in equation (19), below) is used.
In some embodiments, the pixel values ix,y,t have a bit depth across one or several channels of between 8 and 48 bits, such as a single channel (for instance a gray channel) of 8 or 16 bit depth or three channels (such as RGB) of 16 or 24 bit depth.
In case the camera 110 provides pixel value ix,y,t information across several color channels, the pixel values ix,y,t can be transformed into a single channel (such as a gray scale channel) before processing of the pixel values ix,y,t by the digital image analyzer 130. Alternatively, only one such channel, out of several available channels, can be used for the analysis. Further alternatively, several channels can be analyzed separately and in parallel, so that a pixel that is detected to be a blob in at least one such analyzed channel is determined to be a blob at any point in time.
The transformed pixel values ix,y,t can have a bit depth of at least 8 bits, and in some embodiments at the most 24 bits, such as at the most 16 bits. A bit depth of 12 bits has proven to strike a reasonable balance between speed, memory requirements and output quality. In case input data has a higher bit depth than required, the data from the camera 110 can be transformed (down-sampled) before processing by the digital image analyzer 130.
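As a sketch of such a transform (the channel weights, integer shifts and names are assumptions chosen for illustration, not prescribed by the method):

```python
import numpy as np

def to_gray12(rgb16):
    """Collapse a 3-channel 16-bit image to one 12-bit gray channel."""
    r = rgb16[..., 0].astype(np.uint32)
    g = rgb16[..., 1].astype(np.uint32)
    b = rgb16[..., 2].astype(np.uint32)
    gray16 = (r + 2 * g + b) >> 2   # cheap, shift-only luma approximation
    return (gray16 >> 4).astype(np.uint16)  # down-sample 16 -> 12 bits
```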
More generally, the number of bits required can be found for S as D + log₂(n) and for Q as 2D + log₂(n), where D is the bit depth for one single considered channel.
The following table shows the required storage space for S and Q depending on the used pixel value ix,y,t bit depth when n = 64:

Pixel bit depth   S required bits   Q required bits
8                 14 (uint16)       22 (uint32)
10                16 (uint16)       26 (uint32)
12                18 (uint32)       30 (uint32)
16                22 (uint32)       38 (uint64)
In general, the method can comprise a step in which the noise model is updated and stored in computer memory, as a collection of updated noise model information (S and Q) with respect to individual pixels px,y for which blob detection is to be performed. This noise map can hence be updated and stored for each pixel px,y in the image.
Using the above-explained calculations, it is possible to store, in said computer memory, updated values for Sx,y,t and Qx,y,t in combination as a single datatype (such as a single structure, record or tuple), the datatype comprising 12 bytes or less, or 10 bytes or less, or even 8 bytes or less, per pixel px,y. This storing, for each analyzed pixel value ix,y,t (such as for all pixels px,y in the image It), of updated values for Sx,y,t and Qx,y,t in combination as a single datatype, constitutes an example of the "noise model" described herein. Hence, the noise model is updated for each analyzed digital image frame It, such as for each individual image frame It in the set of consecutive image frames It produced and provided by the (each) digital camera 110.
In the same step S4, for pixel values ix,y,t for which said first value is found to be higher than said second value, information is stored in said computer memory, the information indicating that the pixel value ix,y,t is part of a detected blob.
This storing can take place in a generated pixmap, in other words a data structure having such indicating information for each pixel px,y. The information for each pixel px,y that it belongs or does not belong to a blob for that image frame It can be stored very efficiently, since it can be stored as a single binary bit. One way of implementing such a pixmap in practice is to use a "noise map" of the general type that will be described in the following, where the pixmap also comprises, for each pixel px,y, a value indicating an expected pixel value ix,y,t for that pixel px,y.
Hence, for each frame, the noise model established as described above can be used to generate such a noise map, that for every pixel position px,y provides information about whether or not that particular pixel value ix,y,t in the new frame It was outside of the allowed limits (that is, if (6) or (8) was true). In addition, the noise map can store an expected signal value for each pixel px,y at time t, such as based on the calculations performed in the determination of the noise model. The expected signal value is useful in downstream calculations, such as in a subsequent blob aggregation step, and so it is computationally efficient to establish and store this information already at this point.
Figure 6 illustrates the noise model after being updated based on the information of a most recently available image frame It, and in particular how the frame It relates to the values of Sx,y and Qx,y for that time t.
Even though it would be possible to first emit the noise map for each new image frame It arriving at the digital image analyzer 130, and only thereafter to update the noise model in the digital image analyzer 130, both of them can be done in one go, without unloading or overwriting the information in memory between said calculations. Hence, the (each) new image frame It is loaded into the CPU memory; and z₊, z₋, Sx,y,t and Qx,y,t are calculated for each pixel px,y, as the case may be, before the loaded data is unloaded or overwritten in the CPU memory. The advantage achieved then is to avoid memory access becoming a bottleneck. Once the penalty of loading the data into the CPU has been paid, all the necessary calculations are performed before unloading or overwriting the data in the CPU memory.
In the following example, the noise map requires 16 bits per pixel px,y to store. This information can be stored in a single two-byte datatype (such as a uint16). The information indicating whether or not the pixel px,y corresponding to each noise map entry is a blob pixel can be stored in the form of one single bit out of the total number of stored bits for the pixel px,y in question in the noise map. In some embodiments, the most significant bit in the datatype used to store noise map data for each pixel px,y, such as the most significant bit in the exemplifying two-byte structure, indicates whether the pixel value ix,y,t in question is outside the blob generating limits. Then, the lower 15 bits can encode the expected (average) pixel value ix,y signal, scaled to 15 bits precision, and can be stored in fixed-point representation. It is noted that this expected pixel value ix,y signal corresponds to the above-discussed predicted pixel value μ̂x,y,t. In other words, the value in the noise map indicating an expected pixel value ix,y,t for the pixel px,y can be achieved by transforming (if necessary) the predicted pixel value μ̂x,y,t to a grayscale bit depth of 15 bits.
In one example, the encoding is according to the following, for performance reasons: First, the expected signal is scaled to 15 bits (0..32767). If n = 32 and the input pixel depth is 12 bits, this means that S uses 17 bits for each pixel value ix,y,t. A simple shift operation will divide this number by 4, which puts it in the 15 bit range. Secondly, if the pixel value ix,y,t is within the limits given in equation (6) (or as given in its reformulated form (8)), all bits are negated. A noise map consumer can therefore iterate through the pixels px,y of the noise map data and ignore all entries that have the most significant bit set to 1.
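A sketch of this packing might look as follows (assuming n = 32 and 12-bit input as in the example above, so that S uses 17 bits; the names are hypothetical):

```python
import numpy as np

def pack_noise_map(S, is_blob):
    """Pack the noise map into one uint16 per pixel.

    The expected value is scaled to 15 bits by a shift (a 17-bit sum
    shifted right by 2 lands in 0..32767, so the most significant bit
    starts as 0). For pixels within the limits (non-blob), all bits
    are negated, setting the most significant bit to 1 so that
    consumers can skip those entries.
    """
    expected15 = (S >> 2).astype(np.uint16)  # 17-bit sum -> 15-bit value
    return np.where(is_blob, expected15, ~expected15).astype(np.uint16)
```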
It should be noted that the pixmap for each pixel at least, or only, contains information on 1) whether that pixel is part of a blob and 2) the predicted pixel value for that pixel. In this case, the prediction is simply the arithmetic mean of the previous n frames, but we will later on describe an alternative method to predict the value to be used when the recent frames have large changes in capture parameters such as shutter time or gain.
In some embodiments, the stored noise model incorporates all available information from image frames It received by the digital image analyzer 130 from the camera 110. In other words, it can use n consecutive or non-consecutive image frames It up until a most recently received image frame It to calculate values for Qx,y and Sx,y. On the other hand, the estimated projection (predicted pixel value μ̂x,y,t) data stored for each pixel px,y in the noise map can be updated only using a second-to-most recently received image frame It, i.e. not using a most recently received image frame It that contains the pixel values ix,y,t to be assessed with respect to blob classification. In practice, this may mean that the previous values for Sx,y,t, before being updated using the most recently received pixel values ix,y,t, can be used to calculate the (transformed) predicted data which is then stored in the pixmap.
In the above example, the predicted pixel value μ̂x,y,t is determined as (or at least based on) an estimated projected future mean pixel value μx,y,t, in turn determined based on historic pixel values ix,y,t for a sampled set of pixels px,y in said sequence of image frames It.
In embodiments that will be described in more detail in the following, the predicted pixel value μ̂x,y,t is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize the expression
$$\sum_{j,k}\left(i_{j,k,t} - \left(\alpha\,\mu_{j,k,t} + \beta\right)\right)^2 \qquad (16)$$
where μj,k,t is said estimated projected future mean pixel value for the pixel pj,k in question, and where j and k are iterated over a test set of pixels px,y in the image frame It. The determination of α and β can take place in any per se conventional manner, which is well within the reach of the skilled person. As is the case for the above-described noise model, in some embodiments, μj,k,t can be an estimated historic mean with respect to pixel values ix,y,t for the pixel pj,k in question.
The above-described pure variance based noise model has proven to give good results in a wide range of environments. However, if the light conditions in the image change too quickly, the noise map will be flooded with outliers at first. In the image frames It that follow upon such changed light conditions, the standard deviation estimate will be inflated, which instead leads to some degree of blindness until the noise model stabilizes again. The suitability of different variants of the presently described method can also vary depending on the camera 110 hardware used. For instance, exposure and gain can be more or less coarse for different types of cameras, and aperture changes can be performed more or less quickly.
It is then proposed to estimate a linear mapping between the average intensity value in the noise model and the pixel intensities in the new frame. That is, find values for variables α and β that minimize (16).
In (16), j may represent a sample or test set of pixels px,y, such as a set of pixels px,y at geometrically evenly distributed pixel positions in the image frame It.
To be clear, when establishing the coefficients α and β, pixels px,y from different positions in the same image frame It are considered, and such pixels px,y are compared with their corresponding positions in the noise model data.
In some embodiments, said test set of pixels px,y can contain at least 0.1%, such as at least 1%, such as at least 10%, of the total set of pixels px,y in the image It. In some embodiments, said test set of pixels px,y can contain at most 80%, such as at most 50%, such as at most 25%, such as at most 10%, of the total set of pixels px,y in the image It. The test set of pixels px,y can be geometrically evenly distributed across the total set of pixels px,y in the image It. For instance, the set can form a uniform sparse pattern extending across the entire image It, or extending across at least 50% of the image It; or the set can form a sparsely but evenly distributed set of vertical and/or horizontal full or broken lines distributed across the entire image It, or at least 50% of the image It. In some embodiments, pixels that are overexposed are not included in the test set. This can be determined by comparing the pixel values to a known threshold value, often provided by the sensor manufacturer. If it is not known, the threshold value can easily be established experimentally.
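For illustration, a least-squares fit of α and β over such a sparse test set could be sketched as follows (the stride, the saturation threshold and all names are assumptions):

```python
import numpy as np

def fit_exposure_mapping(frame, mu, stride=16, saturation=4000):
    """Fit frame ~= alpha * mu + beta over an evenly spaced test set,
    excluding overexposed pixels from the fit."""
    x = mu[::stride, ::stride].ravel().astype(np.float64)
    y = frame[::stride, ::stride].ravel().astype(np.float64)
    keep = y < saturation                    # drop saturated samples
    A = np.stack([x[keep], np.ones(int(keep.sum()))], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, y[keep], rcond=None)
    return alpha, beta
```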
Next, the equation for checking the limits (that is, (6) above or an equivalent formulation of this equation), is updated according to the following:
$$\left(i_{x,y,t} - \hat{\mu}_{x,y,t}\right)^2 > Z^2\,\sigma_{x,y,t}^2 \qquad (17)$$
where the prediction is the linearly transformed rolling average:
$$\hat{\mu}_{x,y,t} = \alpha\,\mu_{x,y,t} + \beta \qquad (18)$$
Also, since the variance σ²x,y,t changes over time, the variance estimate needs to be updated as well. It is unfortunately not feasible to use the value from (4), since it will be inflated by the exposure change that is already compensated for by using μ̂x,y,t as explained above. Instead, it is updated by weighing in the current squared deviation:
$$\sigma_{x,y,t+1}^2 = (1 - f)\,\sigma_{x,y,t}^2 + f\left(i_{x,y,t} - \hat{\mu}_{x,y,t}\right)^2 \qquad (19)$$
where f (0 < f < 1) is the factor that decides how much weight should be given to this deviation compared to the existing value. The higher f, the faster the noise model will adapt to fluctuations.
These combined, together with observing that if we apply a scaling factor α to the input, the variance can be scaled appropriately, yield the updated check:
$$\left(i_{x,y,t} - \hat{\mu}_{x,y,t}\right)^2 > Z^2\,\alpha^2\,\sigma_{x,y,t}^2 \qquad (20)$$
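Taken together, a per-frame check during an exposure change could be sketched as below (a sketch under the equations as reconstructed above; the value of f and all names are assumptions):

```python
import numpy as np

def check_with_mapping(frame, mu, var, alpha, beta,
                       z_squared=12, f=1.0 / 16):
    """Limit check (20) plus variance update (19)."""
    mu_hat = alpha * mu + beta                        # (17)-(18)
    dev_sq = (frame.astype(np.float64) - mu_hat) ** 2
    outside = dev_sq > z_squared * alpha**2 * var     # (20)
    var_next = (1.0 - f) * var + f * dev_sq           # (19)
    return outside, var_next
```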
This variant of the noise model requires σx,y,t to be stored in an array, typically with one single-precision float (32 bits) per pixel px,y. As comparison, the pure variance noise model stores (indirectly, by storing S and Q that allow for calculation of σx,y,t as described above) the estimated variance σx,y,t for each pixel px,y when run. When this linear mapping model is used, σx,y,t is updated using (19). Since the definition is recursive, the variance of this pixel in the previous frame will either be calculated from S and Q or from the previous iteration's calculation of (19) for this pixel. The sums Sx,y,t and square-sums Qx,y,t still need to be updated for every image frame It, in order to have the numbers available as soon as the step effect has passed. To illustrate the case when the predicted pixel value is determined as μ̂x,y,t = α·μx,y,t + β, an example will now be provided as illustrated in Figure 8. In the chart shown therein, the Y axis shows pixel intensity ix,y,t for one particular pixel px,y in a sequence of consecutive digital image frames It. The X axis depicts frame numbers. The window size n = 32, which means that during the first 32 frames, the model is still being initialized. Once 32 frames have been processed, the model contains sufficient information to make predictions of expected mean μx,y,t and variance σ²x,y,t. The line AVG shows the rolling average of the last 32 frames, which is the predictor determined according to (5).
As can be seen in the graph, the true signal value fluctuates around 2021 from the start until frame #60, where there is a sudden change in exposure time. The exposure times used can be provided as a part of the frame metadata. If the exposure time in the new frame differs significantly from the exposure times of the recent frames, the method described in connection with (17)-(19) should be used, since the levels have shifted and the model will be contaminated while this is happening.
As can also be seen in the graph, it takes 32 frames for μx,y,t to fully stabilize on the new level. Until that point is reached, μx,y,t is not a particularly good predictor, since it is lagging behind. In order to compensate for this, the rolling average goes through a linear transformation according to (18). This outcome is shown as "Adj AVG" in the graph. It can clearly be seen that this corresponds much better to the pixel values.
Similarly, as can be seen in the following table (corresponding to the graph in Figure 8), the variance σ²x,y,t is somewhere around 250 before the exposure change, whereas it gets inflated all the way up to 4200 while the model is adapting. This is why the variance update method according to (19) is put into use. When processing frame 60, the method first transforms the average value μx,y,t to μ̂x,y,t using the linear mapping. It calculates the new pixel value's ix,y,t deviation from μ̂x,y,t and decides whether it is outside the limits, according to (20). If this is the first frame where the exposure change was noticed, the variance σ²x,y,t−1 of the previous frame is used. This is initially based on S and Q (as determined according to the above), but is useful since S and Q still only include pixel values ix,y,t from before the exposure change. Finally, the σ²x,y,t+1 to be used for the next frame is calculated according to (19). Below, # = Frame number; PV = Pixel Value; AVG = rolling average (AVG in Figure 8); AAVG = adjusted average (Adj AVG in Figure 8).
[Table corresponding to the graph in Figure 8: frame number (#), Pixel Value (PV), rolling average (AVG) and adjusted average (AAVG) for frames around the exposure change at frame #60.]
S and Q can continue to be updated as above, and can be used in order for the model to stabilize on the new level. Once the point is reached where α ≈ 1 and β ≈ 0, the average and variance are considered to be stable again, and the model can go back to the usual way of calculating the variance.
Once the information about blob-allocated pixel values ix,y,t has been updated (and the noise map has also been updated), blobs are generated in a subsequent step S6, based on the blob-allocated pixel values ix,y,t.
Blob generation is the process of iterating over the individual pixels px,y in a generated noise map, filtering out false positives and forming blobs from connected clusters of outliers. While it is important that the noise map generation is efficient, more computation per pixel px,y can be afforded in the blob generation, as long as it is known that the pixel value in question ix,y,t indeed overstepped the threshold in the noise map generation. Whereas setting the limits based on mean and sample standard deviation of the recent pixel values ix,y,t works well in most cases, one notable problematic issue arises when parts of the image It become overexposed. In this case, the signal value tends to be saturated on some value close to the upper limits of the range, and since the affected pixel values ix,y,t as a result stop fluctuating over time, the standard deviation also becomes zero, which in turn means that even the slightest change would lead to blobs being generated.
To address this issue, one can add an additional minimum required deviation, in a step S5, used in the blob generation step as an anti-saturation filter:

|ix,y,t − μ̂x,y,t| > q·√(μ̂x,y,t) (21)

where μ̂x,y,t is the noise model's prediction for the pixel value ix,y,t. If the deviation is less than this, the pixel value ix,y,t is discarded as a non-blob pixel despite it overstepping the initial limits set up by the noise model.
Since square roots are computationally expensive, it is better to use:

(ix,y,t − μ̂x,y,t)² > q²·μ̂x,y,t (22)
q is > 0 but ≪ 1, implying that q² is even smaller. Define:

B = 1/q² (23)
B is a positive number that controls the filtering limit. Since any number for B that gives the appropriate filtering effect can be selected, one can decide to pick an integer value. In some embodiments, B is at least 10, such as at least 50, such as at least 100. In some embodiments, B is at the most 10000, such as at the most 1000.
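As a purely illustrative numeric choice (the source does not fix a specific value): picking q = 1/16 = 0.0625 gives, via (23), B = 1/q² = 256, an integer within the disclosed range of 10 to 10000; being a power of two, it also keeps the multiplication in the condition below cheap on most hardware.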
Then, the condition can be rewritten as:

B·(ix,y,t − μ̂x,y,t)² > μ̂x,y,t (24)
Since the noise map and the noise model were updated in the same step, the noise model that the currently considered noise map was based on is already lost when arriving at the blob generation step; the noise model data is overwritten in the computer memory on each iteration of the method. However, since the predicted pixel value μ̂x,y,t was saved (with 15 bits of precision in the above example) in the noise map itself, this value can be used instead when calculating (24). If the other terms are appropriately scaled (using fixed-point arithmetic), (24) can also be calculated using integer math only.
After a possible such anti-saturation filtering step, pixel values ix,y,t overstepping the noise model limits (as described above across expressions (1)-(24)) are grouped together into multi-pixel blobs. This can be done using the per se well-known Hoshen-Kopelman algorithm, which is a raster-scan method to form such pixel groups that runs in linear time. During the first pass, it runs through all pixel values ix,y,t. If a pixel value ix,y,t oversteps a limit and has a neighboring pixel value ix±1,y±1,t that belongs to a blob, it will be added to that same blob. If it has multiple neighboring blob-classified pixel values ix±1,y±1,t, these will be joined into one single blob, and the pixel value ix,y,t is added to the group. Finally, if there are no neighboring blobs, the pixel value ix,y,t will be registered as a new blob. For each blob, the metrics in the table below can be aggregated. This provides different options for estimating the center of the blob. One possibility is to use the absolute modulus of the noise model deviations:
x̄₁ = Σp |ip − μ̂p|·xp / Σp |ip − μ̂p|,   ȳ₁ = Σp |ip − μ̂p|·yp / Σp |ip − μ̂p|
and another option is to weight the coordinates by their squared deviations:
x̄₂ = Σp [ip − μ̂p]²·xp / Σp [ip − μ̂p]²,   ȳ₂ = Σp [ip − μ̂p]²·yp / Σp [ip − μ̂p]²
Short | Name | Type | Description
Xw1 | weightedXSum | uint32 | Σp |ip − μ̂p|·xp : X coordinates of blob weighted by the deviation from the noise model.
Yw1 | weightedYSum | uint32 | Σp |ip − μ̂p|·yp : Y coordinates of blob weighted by the deviation from the noise model.
W1 | weightSum | uint32 | Σp |ip − μ̂p| : sum of all the absolute deviations from the noise model.
Xw2 | sqWeightedXSum | uint32 | Σp [ip − μ̂p]²·xp : X coordinates of blob weighted by the squared deviation from the noise model.
Yw2 | sqWeightedYSum | uint32 | Σp [ip − μ̂p]²·yp : Y coordinates of blob weighted by the squared deviation from the noise model.
W2 | sqWeightSum | uint32 | Σp [ip − μ̂p]² : sum of all the squared deviations from the noise model.
Experimental data so far indicates that using (x̄₁, ȳ₁) when the blob is small (blob size in pixels not larger than 16), using the squared-weighted (x̄₂, ȳ₂) option for larger blobs (number of pixels in blob > 32), and interpolating between these for medium-sized blobs achieves a good stereo matching. Figure 7 illustrates an exemplifying clustering of four different detected blobs 1-4 based on individual pixel values ix,y,t found to fulfill the criteria for being considered as part of blobs at time t.
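The following Python sketch combines a union-find based labeling pass in the spirit of the Hoshen-Kopelman algorithm with the two weighted-centroid options and the size-based blending described above. The blend thresholds 16 and 32 pixels come from the text; the linear interpolation between the two centroids, and all names and data layouts, are illustrative assumptions rather than the patented implementation.

    import numpy as np

    def find(parent, a):
        # Path-halving find for the union-find structure.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def label_blobs(mask):
        # Raster-scan labeling: each outlier pixel is joined with its
        # already-visited neighbours (west, north-west, north, north-east).
        h, w = mask.shape
        labels = np.zeros((h, w), dtype=np.int32)
        parent = [0]  # index 0 unused; blob labels start at 1
        for y in range(h):
            for x in range(w):
                if not mask[y, x]:
                    continue
                roots = []
                for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx]:
                        roots.append(find(parent, labels[ny, nx]))
                if not roots:
                    parent.append(len(parent))   # register a new blob
                    labels[y, x] = len(parent) - 1
                else:
                    root = min(roots)
                    labels[y, x] = root
                    for r in roots:              # join all touching blobs
                        parent[r] = root
        for y in range(h):                       # second pass: flatten labels
            for x in range(w):
                if labels[y, x]:
                    labels[y, x] = find(parent, labels[y, x])
        return labels

    def blob_center(xs, ys, dev):
        # dev holds the per-pixel deviations i_p - mu_hat_p for one blob.
        w1 = np.abs(dev).astype(np.float64)          # absolute-deviation weights
        w2 = dev.astype(np.float64) ** 2             # squared-deviation weights
        c1 = (np.sum(w1 * xs) / np.sum(w1), np.sum(w1 * ys) / np.sum(w1))
        c2 = (np.sum(w2 * xs) / np.sum(w2), np.sum(w2 * ys) / np.sum(w2))
        n = xs.size
        if n <= 16:
            return c1                                # small blobs: absolute weighting
        if n > 32:
            return c2                                # large blobs: squared weighting
        t = (n - 16) / 16.0                          # medium blobs: linear blend
        return (c1[0] + t * (c2[0] - c1[0]), c1[1] + t * (c2[1] - c1[1]))

Note that the description leaves the interpolation scheme for medium-sized blobs open; the linear blend above is just one workable choice.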
In a subsequent method step S7, performed by the target object tracker 140, detected blobs are correlated across said time-ordered series of digital images lt to determine paths of moving objects through said space. Such correlation can, for instance, use linear interpolation and/or implied Newtonian laws of motion as a filtering mechanism, so as to purge blobs not moving in ways that are plausible given a reasonable model of the types of objects being tracked.
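As one hedged illustration of such a plausibility filter (the concrete criterion is not spelled out in the source), three time-ordered blob centers can be tested against near-uniform motion between consecutive frames, rejecting candidate correspondences whose residual is too large; the tolerance value and function name are assumptions.

    import numpy as np

    def plausible_track(p0, p1, p2, tol_px=3.0):
        # p0, p1, p2: 2D blob centers from three consecutive frames.
        # Linearly extrapolate the motion p0 -> p1 one frame onward; for
        # physically plausible objects the residual at p2 (which absorbs
        # the per-frame acceleration term) stays small.
        p0, p1, p2 = (np.asarray(p) for p in (p0, p1, p2))
        predicted = p1 + (p1 - p0)
        return np.linalg.norm(p2 - predicted) <= tol_px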
In case several cameras 110 are used, or in case one or several cameras 110 are used together with another type of target object 120 sensor, tracking information from the available cameras 110 and any other sensors can be combined to determine one or several 3-dimensional target object 120 tracks through the space 111. This can, for instance, take place using stereoscopic techniques that are well-known in themselves.
In a subsequent step S8, one or several determined 2D and/or 3D target object 120 tracks can be output to an external system, and/or graphically displayed on a display of a track-monitoring device. For instance, such displayed information can be used by a golfer using the system 100 to gain knowledge of the properties of a newly hit golf shot.
In concrete examples, the user (such as a golfer) may be presented with a visual 2D or 3D representation, on a computer display screen, of the track of a golf ball just hit, as detected using the method and system described above, against a graphical representation of a virtual golf practice range or similar. This will provide feedback to the golfer that can be used to make decisions regarding various parts of the golf swing. The track may also be part of a virtual experience, in which a golfer may for instance play a virtual golf hole, with the detected and displayed track represented as a golf shot in said virtual experience. It is specifically noted that the amount of data necessary to process for achieving such tracks is substantial. For instance, at an updating frequency of 100 images per second and using a 10 Mpixel camera, 1 billion pixel values per second need to be processed and assessed with respect to blob status. This analysis may take place in a depicted space 111 that can include trees and other fine-granular objects displaying rapidly shifting light conditions, as well as rapidly shifting general light conditions due to clouds, and so forth. Using the systems and techniques described herein, it is possible to process the data in essentially real time, e.g., such that the track can be determined and output while the object is still in the air.
In a subsequent step S9, the method ends.
As mentioned above, the invention also relates to the system 100 as such, comprising the digital camera 110, the digital image analyzer 130 and the moving object tracker 140.
The digital camera 110 is then arranged to depict the space 111 to produce the series of digital images lt as described above. The digital image analyzer 130 is configured to determine said inequality for the pixel values ix,y,t as described above, and to store in the computer memory information indicating that one or several pixel values ix,y,t are part of a detected blob. The moving object tracker 140 is configured to correlate detected blobs across said series of digital images lt as described above.
As also mentioned, the invention also relates to the computer software product as such. The computer software product is then configured to, when executing on suitable hardware as described above, embody the digital image analyzer 130 and the moving object tracker 140. As such, it is configured to receive a series of digital images lt from the digital camera 110, and to perform the above-described method steps performed by the digital image analyzer 130 and the moving object tracker 140. For instance, the digital frames lt can be provided as a continuous or semi-continuous stream of frames from the digital camera 110 (and a set of n most recent considered frames can be analyzed for each frame or set of frames received), or the entire set of N images can be received as one big batch and analyzed thereafter. The computer software product can execute on a computer belonging to the system 100, and can as such constitute part of the system 100.
Above, a number of embodiments have been described. However, it is apparent to the skilled person that many modifications can be made to the disclosed embodiments without departing from the basic idea of the invention.
For instance, many additional data processing, filtering, transformation and similar steps can be taken, in addition to the ones described herein.
The generated blob data can be used in various ways in addition to the object tracking.
In general, everything which is said in relation to the method is equally applicable to the system and to the computer software product, and vice versa.
Hence, the invention is not limited to the described embodiments, but can be varied within the scope of the enclosed claims.


C L A I M S
1. A method for tracking moving objects, the method comprising: obtaining, from a digital camera, a series of digital images (lt) at consecutive times (t), the series of digital images (lt) representing optical input from a three-dimensional space within a field of view of the digital camera, the digital camera being arranged to produce said series of digital images (lt) having a corresponding set of pixels, said series of digital images comprising corresponding pixel values, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images (lt); for two or more of said pixel values, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between a pixel value (ix,y,t), of a pixel (px,y) in question, and a predicted pixel value (μ̂x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σx,y,t) with respect to historic pixel values for the pixel (px,y) in question, wherein the predicted pixel value (μ̂x,y,t) is calculated based on the historic pixel values for the pixel (px,y) in question, the inequality being determined as (ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t for individual ones of said pixel values, and, for a number n, Sx,y,t = Σ(k=0..n−1) ix,y,t−k and Qx,y,t = Σ(k=0..n−1) (ix,y,t−k)², and, for individual ones of said pixels, Sx,y,t and Qx,y,t are stored in a computer memory; for pixel values for which said first value is higher than said second value, storing in the computer memory information indicating that the pixel value (ix,y,t) is part of a detected blob; and correlating, based on the information stored in the computer memory, detected blobs across said series of digital images (lt) to determine paths of moving objects through said three-dimensional space.
2. The method according to claim 1, wherein said inequality is (ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t, where μ̂x,y,t is said predicted pixel value and where σx,y,t is an estimated standard deviation with respect to the historic pixel values (ix,y,{t−n,t−1}) for the pixel (px,y) in question.
3. The method according to claim 1, wherein Z is a number selected such that Z² is an integer such that 10 < Z² < 20.
4. The method according to claim 1, wherein Sx,y,t, Qx,y,t, or both, are calculated recursively, whereby a calculated value for the pixel value (ix,y,t) is calculated using a previously stored calculated value of Sx,y,t, Qx,y,t, or both, for the pixel (px,y) but at an immediately preceding time (t-1).
5. The method according to claim 4, wherein Sx,y,t is calculated as Sx,y,t = Sx,y,t−1 + ix,y,t − ix,y,t−n, and Qx,y,t is calculated as Qx,y,t = Qx,y,t−1 + (ix,y,t)² − (ix,y,t−n)².
6. The method according to any of claims 1-5, wherein the method comprises: storing in said computer memory Sx,y,t and Qx,y,t in combination as a single datatype comprising 12 bytes or less per pixel (px,y).
7. The method according to any of claims 1-5, wherein the method comprises: storing in said computer memory, for a particular digital image, a pixmap comprising, for each pixel, said information indicating that the pixel value (ix,y,t) is part of a detected blob.
8. The method according to claim 7, wherein said information indicating that the pixel value (ix,y,t) is part of a detected blob is indicated in a single bit for each pixel (px,y).
9. The method according to claim 7, wherein said pixmap comprises, for each pixel, a value indicating an expected pixel value for that pixel.
10. The method according to claim 9, wherein said value indicating an expected pixel value for the pixel is achieved by storing a predicted pixel value as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
11. The method according to any of claims 1-5, wherein the predicted pixel value (μ̂x,y,t), the estimated variance or standard deviation (σx,y,t), or both, is or are calculated based on a set of n historic pixel values (ix,y,{t−n,t−1}) for the pixel (px,y) in question, where 10 < n < 300.
12. The method according to any of claims 1-5, wherein a number n of previous images considered for the estimation of the estimated variance or standard deviation (σx,y,t) of the second value is selected to be a power of 2.
13. The method according to any of claims 1-5, wherein said pixel values have a depth across one or several channels of between 8 and 48 bits.
14. The method according to any of claims 1-5, wherein the predicted pixel value (μ̂x,y,t) is determined based on an estimated projected future mean pixel value (μx,y,t), in turn determined based on a numerical relationship between historic pixel values and current pixel values for a sampled set of pixels (px,y) in said series of digital images (lt).
15. The method according to claim 14, wherein the predicted pixel value (μ̂x,y,t) is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize Σj,k [ij,k,t − (α·μj,k,t + β)]², where μj,k,t is said estimated projected future mean pixel value for a pixel (pj,k) in question, and where j and k are iterated over a test set of pixels.
16. The method according to claim 15, wherein μx,y,t is an estimated historic mean with respect to pixel values for the pixel (px,y) in question.
17. The method according to claim 15, wherein said test set of pixels contains between 1% and 25% of a total set of pixels in a given image.
18. The method according to claim 17, wherein said test set of pixels is geometrically evenly distributed across the total set of pixels in the given image.
19. The method according to claim 15, wherein the method comprises: determining the estimated standard deviation (σx,y,t) according to σx,y,t = √(Qx,y,t/n − (Sx,y,t/n)²).
20. The method according to claim 15, wherein the method comprises: determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value (μ̂x,y,t) as μ̂x,y,t = α·μx,y,t + β until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
21. The method according to claim 1, wherein the method comprises: for said pixel values for which said first value is higher than said second value, only storing said information indicating that the pixel value (ix,y,t) is part of a detected blob in case also the following inequality holds: B·[ix,y,t − μ̂x,y,t]² > μ̂x,y,t, where ix,y,t is the pixel value in question, where μ̂x,y,t is the predicted pixel value and where B is an integer such that B > 100.
22. The method according to any of claims 1-5 or 21, wherein the method comprises: using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
23. The method according to any of claims 1-5 or 21, wherein the moving objects are golf balls.
24. System for tracking moving objects, the system comprising: a digital camera arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images (lt) at consecutive times (t), the digital camera (110) being arranged to produce said series of digital images (lt) having a corresponding set of pixels, said series of digital images comprising corresponding pixel values, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images (lt); a computer having an associated computer memory, the computer being configured to run a digital image analyzer; the digital image analyzer being configured to, for two or more of said pixel values, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between a pixel value (ix,y,t), of a pixel (px,y) in question, and a predicted pixel value (μ̂x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σx,y,t) with respect to historic pixel values for the pixel (px,y) in question, wherein the predicted pixel value (μ̂x,y,t) is calculated based on the historic pixel values for the pixel (px,y) in question, the inequality being determined as (ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t for individual ones of said pixel values, and, for a number n, Sx,y,t = Σ(k=0..n−1) ix,y,t−k and Qx,y,t = Σ(k=0..n−1) (ix,y,t−k)², and, for individual ones of said pixels, Sx,y,t and Qx,y,t are stored in the computer memory; the digital image analyzer being configured to, for pixel values for which said first value is higher than said second value, store in the computer memory information indicating that the pixel value (ix,y,t) is part of a detected blob; and a moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images (lt) to determine paths of moving objects through said three-dimensional space.
25. The system of claim 24, wherein the digital image analyzer is configured to perform operations in accordance with any of claims 2-23.
26. A non-transitory computer-readable medium encoding a computer program product configured to perform operations in accordance with any of claims 1-23.
PCT/EP2023/077799 2022-10-17 2023-10-06 Method and system for optically tracking moving objects WO2024083537A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE2230331A SE2230331A1 (en) 2022-10-17 2022-10-17 Method and system for optically tracking moving objects
SE2230331-7 2022-10-17

Publications (1)

Publication Number Publication Date
WO2024083537A1 true WO2024083537A1 (en) 2024-04-25

Family

ID=88372203

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/077799 WO2024083537A1 (en) 2022-10-17 2023-10-06 Method and system for optically tracking moving objects

Country Status (2)

Country Link
SE (1) SE2230331A1 (en)
WO (1) WO2024083537A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
US20220051420A1 (en) 2020-08-14 2022-02-17 Topgolf Sweden Ab Motion Based Pre-Processing of Two-Dimensional Image Data Prior to Three-Dimensional Object Tracking With Virtual Time Synchronization

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4144377B2 (en) * 2003-02-28 2008-09-03 ソニー株式会社 Image processing apparatus and method, recording medium, and program
US7940961B2 (en) * 2007-12-20 2011-05-10 The United States Of America As Represented By The Secretary Of The Navy Method for enhancing ground-based detection of a moving object
US11138442B2 (en) * 2015-06-01 2021-10-05 Placemeter, Inc. Robust, adaptive and efficient object detection, classification and tracking
US20160379074A1 (en) * 2015-06-25 2016-12-29 Appropolis Inc. System and a method for tracking mobile objects using cameras and tag devices
CA2934102A1 (en) * 2015-06-25 2016-12-25 Appropolis Inc. A system and a method for tracking mobile objects using cameras and tag devices
WO2018063914A1 (en) * 2016-09-29 2018-04-05 Animantis, Llc Methods and apparatus for assessing immune system activity and therapeutic efficacy
US20180144476A1 (en) * 2016-11-23 2018-05-24 Qualcomm Incorporated Cascaded-time-scale background modeling
US10803598B2 (en) * 2017-06-21 2020-10-13 Pankaj Chaurasia Ball detection and tracking device, system and method
US11004209B2 (en) * 2017-10-26 2021-05-11 Qualcomm Incorporated Methods and systems for applying complex object detection in a video analytics system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BOYER M ET AL: "Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors", PARALLEL&DISTRIBUTED PROCESSING, 2009. IPDPS 2009. IEEE INTERNATIONAL SYMPOSIUM ON, IEEE, PISCATAWAY, NJ, USA, 23 May 2009 (2009-05-23), pages 1 - 12, XP031487429, ISBN: 978-1-4244-3751-1 *
PICCARDI M.: "Background subtraction techniques: a review", 2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (IEEE CAT. NO.04CH37583), 1 January 2004 (2004-01-01), pages 3099 - 3104, XP093122204, ISBN: 978-0-7803-8567-2, DOI: 10.1109/ICSMC.2004.1400815 *
TADESSE MISIKER ET AL: "High performance automatic target recognition", AFRICON 2015, IEEE, 14 September 2015 (2015-09-14), pages 1 - 5, XP032813578, DOI: 10.1109/AFRCON.2015.7331961 *

Also Published As

Publication number Publication date
SE2230331A1 (en) 2024-04-18
