WO2024083537A1 - Method and system for optically tracking moving objects

Info

Publication number
WO2024083537A1
Authority
WO
WIPO (PCT)
Application number
PCT/EP2023/077799
Other languages
French (fr)
Inventor
Stein NORHEIM
Original Assignee
Topgolf Sweden Ab
Application filed by Topgolf Sweden Ab
Publication of WO2024083537A1

Classifications

    • G06T7/215 Motion-based segmentation
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/254 Analysis of motion involving subtraction of images
    • A63B24/0021 Tracking a path or terminating locations
    • A63B2024/0028 Tracking the path of an object, e.g. a ball inside a soccer pitch
    • A63B2024/0034 Tracking the path of an object, e.g. a ball inside a soccer pitch, during flight
    • A63B69/3658 Means associated with the ball for indicating or measuring, e.g. speed, direction
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/30224 Ball; Puck
    • G06T2207/30241 Trajectory

Definitions

  • The present invention relates to a method and a system for optically tracking moving objects.
  • Known methods track moving objects using computer vision, with one or more cameras depicting a space where the moving objects exist.
  • The tracking may be performed by first identifying an object as one image pixel, or a set of adjacent pixels, that deviates from a local background. Such deviating pixels are together denoted a "blob". Once a number of blobs have been detected in several image frames, possible tracked object paths are identified by interconnecting identified blobs in subsequent frames.
  • The blob generation in each individual frame potentially results in very many false positive blobs, in other words identified blobs that do not really correspond to an existing moving object. This may be due to noise, shifting lighting conditions and non-tracked objects occurring in the field of view of the camera in question.
  • The detection of possible tracked object paths normally results in a reduction of such false positives, for instance based on filtering away of physically or statistically implausible paths. Due to the large number of false positive blob detections, however, even if most of the false positives are filtered away in the tracked paths detection step, the blob detection itself is associated with heavy memory and processor load and may therefore constitute a bottleneck for the object tracking even if high-performance hardware is used.
  • The various embodiments described herein solve one or more of the above described problems and provide techniques for tracking the paths of moving objects using less memory and/or processing power compared to conventional object tracking techniques.
  • The invention can be embodied as a method for tracking moving objects, comprising the steps of: obtaining a series of digital images I_t at consecutive times t, the digital images I_t representing optical input from a three-dimensional space within a field of view of a digital camera, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_{x,y}, said digital images comprising corresponding pixel values i_{x,y,t}, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images I_t; for two or more of said pixel values i_{x,y,t}, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_{x,y,t} in question and a predicted pixel value μ̂_{x,y,t}, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where the predicted pixel value μ̂_{x,y,t} is calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, Z being a number selected such that Z² is an integer such that 10 ≤ Z² ≤ 20; for pixel values i_{x,y,t} for which said first value is higher than said second value, storing in a computer memory information indicating that the pixel value i_{x,y,t} is part of a detected blob; and correlating, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
  • In some embodiments, said inequality is (i_{x,y,t} − μ̂_{x,y,t})² > Z²·σ²_{x,y,t}, where μ̂_{x,y,t} is said predicted pixel value and where σ_{x,y,t} is an estimated standard deviation with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question.
  • In some embodiments, the method further comprises storing, in said computer memory, for individual ones of said pixels p_{x,y} and for a number n ≤ N, the sums S_{x,y,t} = Σ_{τ=t−n..t−1} i_{x,y,τ} and Q_{x,y,t} = Σ_{τ=t−n..t−1} i²_{x,y,τ}; and, for individual ones of said pixel values i_{x,y,t}, determining said inequality as (n·i_{x,y,t} − S_{x,y,t})² > Z²·(n·Q_{x,y,t} − S²_{x,y,t}).
  • In some embodiments, S_{x,y,t}, Q_{x,y,t}, or both are calculated recursively, whereby a value for a pixel value i_{x,y,t} is calculated using a previously stored value S_{x,y,t−1}, Q_{x,y,t−1}, or both, for the same pixel p_{x,y} but at an immediately preceding time t−1.
  • In some embodiments, the method further comprises storing in said computer memory S_{x,y,t} and Q_{x,y,t} in combination as a single datatype comprising 12 bytes or less per pixel p_{x,y}.
  • In some embodiments, the method further comprises storing in said computer memory, for a particular digital image I_t, a pixmap having, for each pixel p_{x,y}, said information indicating that the pixel value i_{x,y,t} is part of a detected blob.
  • In some embodiments, said information indicating that the pixel value i_{x,y,t} is part of a detected blob is indicated in a single bit for each pixel p_{x,y}.
  • In some embodiments, said pixmap also comprises, for each pixel p_{x,y}, a value indicating an expected pixel value i_{x,y,t} for that pixel p_{x,y}.
  • In some embodiments, said value indicating an expected pixel value i_{x,y,t} for the pixel p_{x,y} in question is provided by storing the predicted pixel value μ̂_{x,y,t} as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t}, the estimated variance or standard deviation σ_{x,y,t}, or both is or are calculated based on a set of n historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where 10 ≤ n ≤ 300.
  • In some embodiments, a number n of previous images I_t considered for the estimation of an estimated variance or standard deviation σ_{x,y,t} of the second value is selected to be a power of 2.
  • In some embodiments, said pixel values i_{x,y,t} have a depth across one or several channels of between 8 and 48 bits.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t} is determined based on an estimated projected future mean pixel value, in turn determined based on historic pixel values i_{x,y,t} for a sampled set of pixels p_{x,y} in said images I_t.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t} is determined as μ̂_{x,y,t} = α·ν_{x,y,t} + β, where α and β are constants determined so as to minimize Σ_{j,k} (α·ν_{j,k,t} + β − i_{j,k,t})², where ν_{j,k,t} is said estimated projected future mean pixel value for the pixel p_{j,k} in question, and where j and k are iterated over a test set of pixels.
  • In some embodiments, said test set of pixels contains between 1% and 25% of the total set of pixels p_{x,y} in the image I_t.
  • In some embodiments, said test set of pixels is geometrically evenly distributed across the total set of pixels p_{x,y} in the image I_t.
  • In some embodiments, the method further comprises determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value μ̂_{x,y,t} according to any one of claims 14-18 until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
  • In some embodiments, the method further comprises, for said pixel values i_{x,y,t} for which said first value is higher than said second value, only storing said information indicating that the pixel value i_{x,y,t} is part of a detected blob in case also the following inequality holds: B·(i_{x,y,t} − μ̂_{x,y,t})² > μ̂_{x,y,t}, where i_{x,y,t} is the pixel value in question, where μ̂_{x,y,t} is the predicted pixel value and where B is an integer such that B > 100.
  • In some embodiments, the method further comprises using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
  • In some embodiments, the objects are golf balls.
  • The invention can be embodied as a method for tracking moving objects, the method comprising: obtaining a series of digital images I from a digital camera, the digital images I representing optical input from a three-dimensional space within a field of view of the digital camera over time, each of the digital images I having pixels p_{x,y} with corresponding pixel values i_{x,y}; performing, at a computer, image segmentation on each image of the series of digital images I using a statistical model of background for the optical input to detect blobs, wherein performing the image segmentation comprises, for each of two or more pixel values i_{x,y,t} in the image, determining an inequality result using a current pixel value i_{x,y,t} for a pixel p_{x,y} in a current image I_t, and first S_{x,y,t} and second Q_{x,y,t} values of the statistical model for the pixel p_{x,y} […]
  • The invention can also be embodied as a system for tracking moving objects, the system comprising a digital camera, a digital image analyzer and a moving object tracker: the digital camera being arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images I_t at consecutive times t, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_{x,y}, said digital images comprising corresponding pixel values i_{x,y,t}, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images I_t; the digital image analyzer being configured to, for two or more of said pixel values i_{x,y,t}, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_{x,y,t} in question and a predicted pixel value μ̂_{x,y,t}, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where the predicted pixel value μ̂_{x,y,t} is calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, and where Z is selected such that Z² is an integer such that 10 ≤ Z² ≤ 20; the digital image analyzer being configured to, for pixel values i_{x,y,t} for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value i_{x,y,t} is part of a detected blob; and the moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
  • The invention can also be embodied as a computer software product configured to, when executing: receive a series of digital images I_t from a digital camera, the digital camera being arranged to represent optical input from a three-dimensional space to produce said digital images I_t at consecutive times t, the digital camera being arranged to produce said digital images I_t having a corresponding set of pixels p_{x,y}, said digital images comprising corresponding pixel values i_{x,y,t}, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images I_t; for two or more of said pixel values i_{x,y,t}, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value i_{x,y,t} in question and a predicted pixel value μ̂_{x,y,t}, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, where the predicted pixel value μ̂_{x,y,t} is calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, and where Z is selected such that Z² is an integer such that 10 ≤ Z² ≤ 20; for pixel values i_{x,y,t} for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value i_{x,y,t} is part of a detected blob; and correlate, based on the information stored in the computer memory, detected blobs across said series of digital images I_t to determine paths of moving objects through said three-dimensional space.
  • The computer software product can be implemented by a non-transitory computer-readable medium encoding instructions that cause one or more hardware processors, located in at least one of the computer hardware devices in the system, to perform the digital image processing and the object tracking.
  • Figure 1 is an overview of a system 100 configured to perform a method of the type illustrated in Figure 3;
  • Figure 2 is a simplified illustration of a data processing apparatus;
  • Figure 3 shows a general flowchart for optically tracking moving target objects;
  • Figure 4 is a flowchart of a method performed by the system 100 shown in Figure 1;
  • Figure 5 is an overview illustrating a noise model of a type described herein;
  • Figure 6 shows an image frame illustrating a noise model;
  • Figure 7 illustrates an example of clustering of pixels into blobs; and
  • Figure 8 illustrates intensities for a pixel during a sudden exposure change event.
  • A system 100 can comprise one or several digital cameras 110, each being arranged to represent optical input from a three-dimensional space 111 within a field of view of the digital camera 110, to produce digital images of such moving target objects 120, the objects travelling through the space 111 hence being represented by the digital camera 110 in consecutive digital images.
  • Such representation by the digital camera 110 will herein be denoted a "depiction", for brevity.
  • The digital camera 110 is arranged to not move in relation to the space 111 during production of the series of digital images I_t.
  • The digital camera 110 may be fixed in relation to said space 111, or, in case it is movable, it is kept still during the production of the series of digital images I_t.
  • Hence, the same part of the space 111 is depicted each time by the digital camera 110, and the digital camera 110 is arranged to produce digital images I_t having a corresponding set of pixels p_{x,y}, so that said produced digital images I_t comprise corresponding pixel values i_{x,y,t}.
  • Here, "x" and "y" denote coordinates in an image coordinate system, whereas "t" denotes time.
  • That the pixel values i_{x,y,t} of two or more different images I_t "correspond" to each other means that individual pixels p_{x,y} measure light entering the camera 110 from the same, or substantially the same, light cone in all of the images I_t in question. It is realized that the camera 110 may move slightly, due to wind, thermal expansion and so forth, between images I_t, but that there is substantial correspondence between pixels p_{x,y} even in cases where such noise-inducing slight movement is present. There can be at least 50% overlap between the light cones of any one same pixel p_{x,y} of the camera 110 between any two consecutive images I_t. There may also be cases where the camera 110 is movable, such as pivotable. In such cases an image transformation can be applied to a captured image so as to bring its pixels p_{x,y} into correspondence with pixels of a previous or future captured image.
  • In some embodiments, the system 100 comprises more than one digital camera 110.
  • Several such digital cameras 110 can be arranged to depict the same space 111, consequently tracking the same moving target object(s) 120 through said space 111.
  • The several digital cameras 110 can be used to construct a stereoscopic view of the respective tracked path of each target object 120.
  • The digital camera 110 is arranged to produce a series of consecutive images I_t at different points in time. Such images may also be denoted image "frames".
  • In some embodiments, the digital camera 110 is a digital video camera, arranged to produce a digital moving film comprising or being constituted by such consecutive digital image frames.
  • The system 100 comprises a digital image analyzer 130, configured to analyze digital images received directly from the digital camera 110, or received from the digital camera 110 via an intermediate system, in the same or a processed (re-formatted, compressed, filtered, etc.) form. The analysis performed by the digital image analyzer 130 can take place in the digital domain.
  • The digital image analyzer 130 may also be denoted a "blob detector".
  • The system 100 further comprises an object tracker 140, configured to track said moving target objects 120 across several of said digital images, based on information provided from the digital image analyzer 130.
  • The analysis performed by the object tracker 140 can also take place in the digital domain.
  • The system 100 is configured to track target objects 120 in the form of sports objects in flight, such as balls in flight, for instance baseballs or golf balls.
  • In some embodiments, the system 100 is used at a golf practice range, such as a driving range having a plurality of bays for hitting golf balls that are to be tracked using the system 100.
  • The system 100 can be installed at an individual golf range bay, or at a golf tee, and configured to track golf balls being struck from said bay or tee.
  • The system 100 can also be a portable system, configured to be positioned at a location from which it can track said moving target objects 120. It is realized that the monitored "space" mentioned above will, in each of these and other cases, be a space through which sport balls are expected to move.
  • The digital image analyzer 130 and the object tracker 140 constitute examples of such computers.
  • The digital image analyzer 130 and the object tracker 140 can be provided as software functions executing on one and the same computer.
  • The one or several digital cameras 110 can also be configured to perform digital image processing, and then also constitute examples of such computers.
  • In some embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented as software functions configured to execute on hardware of one or several digital cameras 110.
  • In other embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented on standalone or combined hardware platforms, such as on a computer server.
  • The one or several digital cameras 110, the digital image analyzer 130 and the object tracker 140 are configured to communicate digitally, either via computer-internal communication paths, such as via a computer bus, or via computer-external wired and/or wireless communication paths, such as via a network 10 (e.g., the Internet).
  • The camera(s) 110 and the digital image analyzer 130 can communicate via a direct, wired digital communication route, which is not over the network 10.
  • The digital image analyzer 130 and the object tracker 140 may communicate with each other over the network 10 (e.g., a conventional Internet connection).
  • A "computer" can include a server computer, a client computer, a personal computer, embedded programmable circuitry, or special purpose logic circuitry. Such computers can be connected with one or more other computers through a network, such as the internet 10, or via any suitable peer-to-peer connection for digital communications, such as a Bluetooth® connection.
  • Each computer can include various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including various programs that operate, for instance, as the digital image analyzer 130 program and/or the object tracker 140 program. Other examples include a digital image preprocessing and/or compressing program.
  • The number of software modules used can vary from one implementation to another and from one such computer to another.
  • Each of said programs can be implemented in embedded firmware and/or as software modules that are distributed on one or more data processing apparatus connected by one or more computer networks or other suitable communication networks.
  • Figure 2 illustrates an example of such a computer, being a data processing apparatus 300 that can include hardware or firmware devices including one or more hardware processors 312, one or more additional devices 314, a non-transitory computer readable medium 316, a communication interface 318, and one or more user interface devices 320.
  • The processor 312 is capable of processing instructions for execution within the data processing apparatus 300, such as instructions stored on the non-transitory computer readable medium 316, which can include a storage device such as one of the additional devices 314.
  • The processor 312 is a single or multi-core processor, or two or more central processing units (CPUs).
  • The data processing apparatus 300 uses its communication interface 318 to communicate with one or more other computers 390, for example, over the network 380.
  • The processes described can be run in parallel, concurrently, or serially, on a single or multi-core computing machine, and/or on a computer cluster/cloud, etc.
  • The data processing apparatus 300 includes various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including a program 330 that constitutes the digital image analyzer 130 described herein, configured to perform the method steps performed by such digital image analyzer 130.
  • The program 330 can also constitute the object tracker 140 described herein, configured to perform the method steps performed by such object tracker 140.
  • Examples of user interface devices 320 include a display, a touchscreen display, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse.
  • The user interface device(s) need not be local device(s) 320, but can be remote from the data processing apparatus 300, e.g., user interface device(s) 390 accessible via one or more communication network(s) 380.
  • The user interface device 320 can also be in the form of a standalone device having a screen, such as a conventional smartphone being connected to the system 100 via a configuration or setup step.
  • The data processing apparatus 300 can store instructions that implement operations as described in this document, for example, on the non-transitory computer readable medium 316, which can include one or more additional devices 314, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, a tape device, and a solid state memory device (e.g., a RAM drive, a Flash memory or an EEPROM).
  • The instructions that implement the operations described in this document can be downloaded to the non-transitory computer readable medium 316 over the network 380 from one or more computers 390 (e.g., from the cloud), and in some implementations, the RAM drive is a volatile memory device to which the instructions are downloaded each time the computer is turned on.
  • The system 100 is configured to perform a method according to one or more embodiments for optically tracking moving target objects 120.
  • The present invention can furthermore be embodied as a computer software product, configured to perform said method when executing on computer hardware of the type described herein.
  • The computer software product can hence be deployed as a part of the system 100 so as to provide the functionality required to perform the present method.
  • Both said system 100 and said computer software product are hence configured to track moving target objects 120 moving through said space 111 in relation to one or several digital cameras 110, by comprising or embodying the above-mentioned digital image analyzer 130 and object tracker 140, in turn being configured to perform the corresponding method steps described herein.
  • Figure 3 illustrates a general flowchart for tracking moving target objects 120 based on digital image information received from one or several digital cameras 110.
  • Image segmentation is the process of separating an image into different regions, representing target objects within it.
  • The background may in general be changing and noisy, and is in many cases quite unpredictable.
  • For a golf ball 120, for instance, when the ball is far away from the digital camera 110 depicting it, the ball may be as small as one single pixel p_{x,y} in the digital image frame produced by the digital camera 110.
  • Such a method may result in a very large number of false positives, such as about 99.9% false positives.
  • A subsequent motion tracking analysis can sort out the vast majority of all false positives, such as by only keeping blobs that seem to obey Newton's laws of motion between consecutive digital image frames I_t.
  • The noise model step is used to suppress noise in the image frames, with the purpose of lowering the number of detected blobs in the subsequent blob aggregation step.
  • The noise model analyzes a plurality of pixels p_{x,y}, such as every pixel p_{x,y}, in said image frames I_t, and is therefore at risk of becoming a major bottleneck.
  • These calculations, which aim to identify pixel values that do not conform to a detected statistical pattern, in order to identify outliers, can be handled by high-performance GPUs (Graphics Processing Units), but performance may still prove to be a problem.
  • The approach described herein has turned out to drastically reduce the computational power required per pixel p_{x,y} in a system 100 for tracking moving target objects 120. This reduction can be exploited in the form of simpler hardware, lower power consumption or a larger incoming image bitrate.
  • In a first step S1, the method starts.
  • In a step S2, a number Z is selected such that Z² is an integer.
  • The number Z can be selected such that Z² is an integer such that 10 ≤ Z² ≤ 20. It is noted that Z may be a non-integer, as long as Z² is an integer value.
  • This step S2 may be performed ahead of time, such as during a system 100 design process or a system 100 calibration step.
  • The space 111 is depicted using the digital camera 110 to produce a series of digital images I_t at consecutive times t.
  • The space 111 can be depicted using the digital camera 110 to produce a series of N digital images I_t at consecutive times t.
  • The procedure can also be a continuous or semi-continuous procedure, wherein the digital camera 110 will continue to produce digital images I_t at consecutive times t for as long as the procedure is ongoing.
  • In that case, the number of digital images N will grow by 1 for each captured frame.
  • The series of digital images I_t at consecutive times t may be seen as a stream of digital images, captured much like a digital video stream.
  • Then, an inequality is determined, involving comparing a first value to a second value.
  • The first value is calculated based on the square of the difference between the pixel value i_{x,y,t} in question and a calculated predicted pixel value μ̂_{x,y,t} for that pixel p_{x,y}.
  • The second value is calculated based on a product of, on the one hand, the square of the selected number Z, this square then being an integer value, and, on the other hand, an estimated variance or standard deviation σ_{x,y,t} with respect to historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question.
  • The second value can be calculated based on said estimated variance or a square of the estimated standard deviation.
  • The predicted pixel value μ̂_{x,y,t} is also calculated based on historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question, in other words using information from image frames I_{t−Δt} captured by the camera 110 at points in time prior to the time t.
  • The predicted pixel value μ̂_{x,y,t} can be calculated based on the same, or a different, set of historic pixel values i_{x,y,{t−n..t−1}} as the estimated variance or standard deviation σ_{x,y,t}.
  • Here, n denotes the number of historic pixel values i_{x,y,τ} considered by the noise model, counting backwards from the currently considered image frame.
  • This notation hence assumes that the same consecutive pixel values i_{x,y,τ}, up to the presently considered image frame, are used to calculate both the first and the second value, but it is realized that any suitable contiguous or non-contiguous, same or different, intervals of pixel values i_{x,y,t} can be used to calculate the first and the second value, respectively.
  • The equations and expressions disclosed and discussed herein are provided as illustrative examples, and it is realized that in practical embodiments they can be tailored to specific needs. This can include, for instance, the introduction of various constant factors and scaling factors; additional intermediate calculation steps, such as filtering steps; and so forth.
  • In some embodiments, said inequality may be written as:
    (i_{x,y,t} − μ̂_{x,y,t})² > Z²·σ²_{x,y,t}    (6)
    where μ̂_{x,y,t} is said predicted pixel value and where σ_{x,y,t} is an estimated standard deviation with respect to said historic pixel values i_{x,y,{t−n..t−1}} for the pixel p_{x,y} in question.
  • The presently described noise model can be configured to, for each pixel p_{x,y}, estimate a moving average and standard deviation based on the last n image frames, and then to use these metrics to decide whether the pixel value i_{x,y,t} in the same image location in the new frame deviates from the expected value more than an allowed limit.
  • This model can be designed to assume that any pixel in the background of the considered image I_t has intrinsic Gaussian noise, as long as the background only contains features that are assumed to be static in the first approximation.
  • A normal distribution can then be used to establish a suitable confidence interval. For instance, if a Z score of 3.464 is used, it can be seen that 99.95% of all samples with no significant differences from the background fall within the corresponding confidence interval. Therefore, a pixel p_{x,y} with signal value i_{x,y,t} at time t is considered to have a significant difference from the background if: (i_{x,y,t} − μ̂_{x,y,t})² > Z²·σ²_{x,y,t}.
  • The corrected (unbiased) standard deviation would be a mathematically more correct choice, i.e. a more accurate estimate of σ would result from dividing by n−1 rather than by n. However, for the present purposes this is not significant, since the limit used is a multiple of the standard deviation that may be freely selected. By selecting the number n of previous image frames considered for the estimation of the standard deviation in the second value (used in evaluating said inequality) to be a power of 2 (e.g. 16, 32, 64, ...), computationally efficient multiplications and divisions can be obtained at a very low cost, by using shifting operations. Both points are illustrated in the short sketch below.
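  • The following sketch (not part of the patent; values chosen for illustration) checks the Gaussian coverage of a Z score of √12 ≈ 3.464 and shows the shift-based division enabled by a power-of-two n:

```python
import math

# Coverage of |i - mu| <= Z*sigma under a Gaussian background assumption:
# P(|X - mu| <= Z*sigma) = erf(Z / sqrt(2)).
Z = math.sqrt(12)                      # Z^2 = 12, an integer with 10 <= Z^2 <= 20
coverage = math.erf(Z / math.sqrt(2))
print(f"Z = {Z:.3f}, coverage = {coverage:.5f}")   # ~0.99949, i.e. ~99.95%

# With n a power of two (here n = 32 = 2**5), division by n is a right shift:
n_shift = 5
S = 64660                              # example: sum of the last 32 pixel values
mean = S >> n_shift                    # integer S / 32 without a division
print(mean)                            # 2020
```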
  • Equation (8) depends on knowledge of the sum S and the squared sum Q of the last n observations of the pixel value i_{x,y,t} in question:
    S_{x,y,t} = Σ_{τ=t−n..t−1} i_{x,y,τ} and Q_{x,y,t} = Σ_{τ=t−n..t−1} i²_{x,y,τ},
    which can be updated recursively as:
    S_{x,y,t+1} = S_{x,y,t} + i_{x,y,t} − i_{x,y,t−n}    (14)
    Q_{x,y,t+1} = Q_{x,y,t} + i²_{x,y,t} − i²_{x,y,t−n}    (15)
  • Equations (14) and (15) are then the full calculations required to update the noise model.
  • A straightforward implementation would require only 3 (int) additions, 1 (int) subtraction and 1 (int) multiplication per pixel, which makes it very computationally efficient.
  • These calculations can be accelerated by use of SIMD instruction sets such as AVX2 (on x86_64) or NEON (on aarch64), or they can be run on a GPU or even implemented on an FPGA. A minimal sketch of such an implementation is given below.
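  • The following NumPy sketch (function and variable names are illustrative, not taken from the patent) applies the recursive updates (14)-(15) and evaluates the blob test in the integer-only form obtained by multiplying inequality (6) through by n²:

```python
import numpy as np

def update_sums(S, Q, newest, oldest):
    # Eqs. (14)-(15): roll the per-pixel window sums forward by one frame.
    # S and Q are int64 arrays; newest/oldest are the incoming and expiring frames.
    i_new = newest.astype(np.int64)
    i_old = oldest.astype(np.int64)
    S += i_new - i_old
    Q += i_new * i_new - i_old * i_old
    return S, Q

def blob_mask(frame, S, Q, n, Z2=12):
    # Integer-only blob test: (n*i - S)^2 > Z^2 * (n*Q - S^2), which is
    # (i - S/n)^2 > Z^2 * (Q/n - (S/n)^2) with both sides scaled by n^2.
    i = frame.astype(np.int64)
    return (n * i - S) ** 2 > Z2 * (n * Q - S * S)
```

A full implementation would also keep the last n frames in a ring buffer, so that the oldest frame is available for subtraction when the window rolls forward.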
  • Here, n can be as low as 32, or even as low as 16 or 10.
  • The n frames considered at each point in time are the n latest frames captured and provided by the camera 110.
  • The n frames can together cover a time period of between 0.1 s and 10 s, such as between 0.5 s and 2 s, of captured video.
  • The number of considered frames n can be relatively close to a frame rate used by the digital camera 110.
  • The noise model may then be required to store two integers per pixel p_{x,y}, in addition to keeping the actual image frames in memory for at least as many frames I_t as the length of the window size n.
  • An additional single-precision float may be required per pixel to store the estimated variance, if the calculation described in equation (19), below, is used.
  • The pixel values i_{x,y,t} have a bit depth across one or several channels of between 8 and 48 bits, such as a single channel (for instance a gray channel) of 8 or 16 bit depth, or three channels (such as RGB) of 16 or 24 bit depth.
  • The pixel values i_{x,y,t} can be transformed into a single channel (such as a grayscale channel) before processing of the pixel values i_{x,y,t} by the digital image analyzer 130.
  • Alternatively, only one such channel, out of several available channels, can be used for the analysis.
  • Several channels can also be analyzed separately and in parallel, so that a pixel that is detected to be a blob in at least one such analyzed channel is determined to be a blob at any point in time.
  • The transformed pixel values i_{x,y,t} can have a bit depth of at least 8 bits, and in some embodiments at the most 24 bits, such as at the most 16 bits.
  • A bit depth of 12 bits has proven to strike a reasonable balance between speed, memory requirements and output quality.
  • The data from the camera 110 can be transformed (down-sampled) before processing by the digital image analyzer 130.
  • The number of bits required can be found for S as D + log₂(n) and for Q as 2D + log₂(n), where D is the bit depth of one single considered channel. A worked example is given below.
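  • For instance (values chosen purely for illustration), a 12-bit channel and a window of 64 frames give:

```python
import math

def sum_widths(D, n):
    # Bits needed for the window sums: S needs D + log2(n), Q needs 2*D + log2(n).
    return D + int(math.log2(n)), 2 * D + int(math.log2(n))

print(sum_widths(12, 64))  # (18, 30): each sum fits in a uint32, 8 bytes per pixel
```

This is consistent with storing S and Q together in a single datatype of 12 bytes or less per pixel, as described above.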
  • The method can comprise a step in which the noise model is updated and stored in computer memory, as a collection of updated noise model information (S and Q) with respect to individual pixels p_{x,y} for which blob detection is to be performed.
  • This noise model can hence be updated and stored for each pixel p_{x,y} in the image.
  • This storing, for each analyzed pixel value i_{x,y,t} (such as for all pixels p_{x,y} in the image I_t), of updated values for S_{x,y,t} and Q_{x,y,t} in combination as a single datatype, constitutes an example of the "noise model" described herein.
  • The noise model is updated for each analyzed digital image frame I_t, such as for each individual image frame I_t in the set of consecutive image frames I_t produced and provided by the (each) digital camera 110.
  • In a step S4, for pixel values i_{x,y,t} for which said first value is found to be higher than said second value, information is stored in said computer memory, the information indicating that the pixel value i_{x,y,t} is part of a detected blob.
  • This storing can take place in a generated pixmap, in other words a data structure having such indicating information for each pixel p_{x,y}.
  • The information, for each pixel p_{x,y}, that it belongs or does not belong to a blob for that image frame I_t can be stored very computationally efficiently, since it can be stored as a single binary bit.
  • One way of implementing such a pixmap in practice is to use a "noise map" of the general type that will be described in the following, where the pixmap also comprises, for each pixel p_{x,y}, a value indicating an expected pixel value i_{x,y,t} for that pixel p_{x,y}.
  • The noise model established as described above can be used to generate such a noise map, which for every pixel position p_{x,y} provides information about whether or not that particular pixel value i_{x,y,t} in the new frame I_t was outside of the allowed limits (that is, whether (6) or (8) was true).
  • Additionally, the noise map can store an expected signal value for each pixel p_{x,y} at time t, such as based on the calculations performed in the determination of the noise model. The expected signal value is useful in downstream calculations, such as in a subsequent blob aggregation step, and so it is computationally efficient to establish and store this information already at this point.
  • Figure 6 illustrates the noise model after being updated based on the information of a most recently available image frame I_t, and in particular how the frame I_t relates to the values of S_{x,y} and Q_{x,y} for that time t.
  • The noise map then requires 16 bits per pixel p_{x,y} to store. This information can be stored in a single two-byte datatype (such as a uint16).
  • The information indicating whether or not the pixel p_{x,y} corresponding to each noise map entry is a blob pixel can be stored in the form of one single bit out of the total number of stored bits for the pixel p_{x,y} in question in the noise map.
  • In some embodiments, the most significant bit in the datatype used to store noise map data for each pixel p_{x,y} indicates whether the pixel value i_{x,y,t} in question is outside the blob generating limits. The lower 15 bits can then encode the expected (average) pixel value signal, scaled to 15-bit precision and stored in fixed-point representation. It is noted that this expected pixel value signal corresponds to the above-discussed predicted pixel value μ̂_{x,y,t}.
  • The value in the noise map indicating an expected pixel value i_{x,y,t} for the pixel p_{x,y} can be achieved by transforming (if necessary) the predicted pixel value to a grayscale bit depth of 15 bits.
  • The pixmap hence, for each pixel, at least or only contains information on 1) whether that pixel is part of a blob and 2) the predicted pixel value for that pixel. A sketch of such an entry layout is given below.
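  • The following sketch illustrates one possible packing of such a 16-bit entry; the split into 12 integer and 3 fractional bits matches a 12-bit channel, but is an assumption made for this example:

```python
BLOB_FLAG = 0x8000   # most significant bit: pixel value is outside the limits
FRAC_BITS = 3        # 12 integer + 3 fractional bits = 15 bits of fixed point

def pack_entry(is_blob, predicted):
    # Pack the blob flag and the fixed-point prediction into one 16-bit value.
    fixed = min(int(round(predicted * (1 << FRAC_BITS))), 0x7FFF)
    return (BLOB_FLAG if is_blob else 0) | fixed

def unpack_entry(entry):
    return bool(entry & BLOB_FLAG), (entry & 0x7FFF) / (1 << FRAC_BITS)

print(unpack_entry(pack_entry(True, 2020.625)))  # (True, 2020.625)
```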
  • In some embodiments, the prediction is simply the arithmetic mean of the previous n frames, but an alternative method, described later on, can be used to predict the value when the recent frames have large changes in capture parameters such as shutter time or gain.
  • The stored noise model incorporates all available information from image frames I_t received by the digital image analyzer 130 from the camera 110. In other words, it can use n consecutive or non-consecutive image frames I_t, up until a most recently received image frame I_t, to calculate values for Q_{x,y} and S_{x,y}.
  • The estimated projection (predicted pixel value μ̂_{x,y,t}) data stored for each pixel p_{x,y} in the noise map can be updated only using a second-to-most recently received image frame, i.e. I_{t−1}.
  • In some embodiments, the predicted pixel value μ̂_{x,y,t} is determined as (or at least based on) an estimated projected future mean pixel value ν_{x,y,t}, in turn determined based on historic pixel values i_{x,y,t} for a sampled set of pixels p_{x,y} in said sequence of image frames I_t.
  • The determination of α and β can take place in any per se conventional manner, which is well within the reach of the skilled person; one such option is sketched below.
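  • For instance, an ordinary least-squares fit over the test set can be used (an illustrative sketch; the patent does not prescribe this exact routine):

```python
import numpy as np

def fit_alpha_beta(means, observed):
    # Least-squares fit of observed ~ alpha * means + beta, where `means` holds
    # the noise-model means for the test-set pixels p_{j,k} and `observed` holds
    # the current pixel values i_{j,k,t} at the same positions.
    m = np.asarray(means, dtype=np.float64)
    o = np.asarray(observed, dtype=np.float64)
    A = np.stack([m, np.ones_like(m)], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, o, rcond=None)
    return alpha, beta
```

With stable exposure, α ≈ 1 and β ≈ 0; larger deviations indicate a shift in capture parameters and trigger the alternative prediction and variance update described below.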
  • Here, ν_{j,k,t} can be an estimated historic mean with respect to pixel values i_{j,k,t} for the pixel p_{j,k} in question.
  • The above-described pure variance-based noise model has proven to give good results in a wide range of environments. However, if the light conditions in the image change too quickly, the noise map will be flooded with outliers at first. In the image frames I_t that follow upon such changed light conditions, the standard deviation estimate will be inflated, which instead leads to some degree of blindness until the noise model stabilizes again.
  • The suitability of different variants of the presently described method can also vary depending on the camera 110 hardware used. For instance, exposure and gain can be more or less coarse for different types of cameras, and aperture changes can be performed more or less quickly.
  • Here, j and k may represent a sample or test set of pixels p_{x,y}, such as a set of pixels p_{x,y} at evenly (geometrically) distributed pixel positions in the image frame I_t.
  • Hence, pixels p_{x,y} from different positions in the same image frame I_t are considered, and such pixels p_{x,y} are compared with their corresponding positions in the noise model data.
  • In some embodiments, said test set of pixels p_{x,y} can contain at least 0.1%, such as at least 1%, such as at least 10%, of the total set of pixels p_{x,y} in the image I_t. In some embodiments, said test set of pixels p_{x,y} can contain at most 80%, such as at most 50%, such as at most 25%, such as at most 10%, of the total set of pixels p_{x,y} in the image I_t. The test set of pixels p_{x,y} can be geometrically evenly distributed across the total set of pixels p_{x,y} in the image I_t.
  • The set can form a uniform sparse pattern extending across the entire image I_t, or extending across at least 50% of the image I_t; or the set can form a sparsely but evenly distributed set of vertical and/or horizontal full or broken lines distributed across the entire image I_t, or at least 50% of the image I_t.
  • Pixels that are overexposed are not included in the test set. This can be determined by comparing the pixel values to a known threshold value, often provided by the sensor manufacturer. If it is not known, the threshold value can easily be established experimentally.
  • The variance estimate needs to be updated as well. It is unfortunately not feasible to use the value from (4), since it will be inflated by the exposure change that is already compensated for by using μ̃_{x,y,t} as explained above. Instead, it is updated by weighing in the current squared deviation:
    σ²_{x,y,t} = (1 − f)·σ²_{x,y,t−1} + f·(i_{x,y,t} − μ̃_{x,y,t})²    (19)
    where f is the factor that decides how much weight should be given to this deviation compared to the existing value. The higher f, the faster the noise model will adapt to fluctuations. A sketch of this update is given below.
  • This variant of the noise model requires σ²_{x,y,t} to be stored in an array, typically with one single-precision float (32 bits) per pixel p_{x,y}.
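  • The update can be written compactly as follows (a sketch under the naming assumptions used in the examples above; f = 0.1 is illustrative):

```python
import numpy as np

def update_variance(var_prev, frame, mu_tilde, f=0.1):
    # Eq. (19): weigh the current squared deviation from the exposure-compensated
    # prediction mu~ into the running per-pixel variance (float32 arrays).
    dev2 = (frame.astype(np.float32) - mu_tilde) ** 2
    return (1.0 - f) * var_prev + f * dev2
```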
  • The pure variance noise model stores (indirectly, by storing S and Q, which allow for calculation of σ_{x,y,t} as described above) the estimated variance σ²_{x,y,t} for each pixel p_{x,y} when run.
  • σ²_{x,y,t} is updated using (19). Since the definition is recursive, the variance of this pixel in the previous frame will either be calculated from S and Q or from the previous iteration's calculation of (19) for this pixel.
  • In the illustrated example, the window size n = 32, which means that during the first 32 frames, the model is still being initialized. Once 32 frames have been processed, the model contains sufficient information to make predictions of the expected mean μ̂_{x,y,t} and variance σ²_{x,y,t}.
  • The line AVG shows the rolling average of the last 32 frames, which is the predictor determined according to (5).
  • The true signal value fluctuates around 2021 from the start until frame #60, where there is a sudden change in exposure time.
  • The exposure times used can be provided as a part of the frames' metadata. If the exposure time in the new frame differs significantly from the exposure times of the recent frames, the method described in connection with (17)-(19) should be used, since the levels have shifted and the model will be contaminated while this is happening.
  • In that case, the variance update method according to (19) is put into use.
  • When processing frame 60, the method first transforms the average value μ̂_{x,y,t} to μ̃_{x,y,t} using the linear mapping. It calculates the new pixel value's i_{x,y,t} deviation from μ̃_{x,y,t} and decides whether it is outside the limits, according to (20). If this is the first frame where the exposure change was noticed, the variance of the previous frame is used.
  • S and Q can continue to be updated as above, and can be used in order for the model to stabilize on the new level. Once the point is reached where α ≈ 1 and β ≈ 0, the average and variance are considered to be stable again, and the method can go back to the usual way of calculating the variance.
  • Then, blobs are generated based on the blob-allocated pixel values i_{x,y,t}.
  • Blob generation is the process of iterating over the individual pixels p_{x,y} in a generated noise map, filtering out false positives and forming blobs from connected clusters of outliers. While it is important that the noise map generation is efficient, more computation per pixel p_{x,y} can be afforded in the blob generation, as long as it is known that the pixel value in question i_{x,y,t} indeed overstepped the threshold in the noise map generation. Whereas setting the limits based on the mean and sample standard deviation of the recent pixel values i_{x,y,t} works well in most cases, one notable problematic issue arises when parts of the image I_t become overexposed.
  • In that case, the signal value tends to saturate at some value close to the upper limit of the range, and since the affected pixel values i_{x,y,t} as a result stop fluctuating over time, the standard deviation also becomes zero, which in turn means that even the slightest change would lead to blobs being generated.
  • In a step S5, the following inequality can therefore be used in the blob generation step as an anti-saturation filter:
    B·(i_{x,y,t} − μ̂_{x,y,t})² > μ̂_{x,y,t}    (24)
    where μ̂_{x,y,t} is the noise model's prediction for the pixel value i_{x,y,t}. If the deviation is less than this, the pixel value i_{x,y,t} is discarded as a non-blob pixel despite it overstepping the initial limits set up by the noise model.
  • Here, B is a positive number that controls the filtering limit. Since any number for B that gives the appropriate filtering effect can be selected, one can decide to pick an integer value. In some embodiments, B is at least 10, such as at least 50, such as at least 100. In some embodiments, B is at the most 10000, such as at the most 1000.
  • The noise model that the currently considered noise map was based on is already lost when arriving at the blob generation step.
  • The noise model data is overwritten in the computer memory in each iteration of the method. However, since μ̂_{x,y,t} was saved (with 15-bit precision in the above example) in the noise map itself, this value can be used instead when calculating (24). If the other terms are appropriately scaled (using fixed-point arithmetic), (24) can also be calculated using only integer math. A sketch of this filter is given below.
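  • In its simplest floating-point form, the filter can look as follows (an illustrative sketch; B = 100 is one value in the stated range):

```python
def passes_saturation_filter(i_xyt, mu_hat, B=100):
    # Eq. (24): keep a candidate blob pixel only if B * (i - mu)^2 > mu.
    # Saturated pixels have a large, frozen prediction mu and a near-zero
    # deviation, so they fail this test and are discarded as non-blob pixels.
    return B * (i_xyt - mu_hat) ** 2 > mu_hat
```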
  • Then, pixel values i_{x,y,t} overstepping the noise model limits (as described above across expressions (1)-(24)) are grouped together into multi-pixel blobs. This can be done using the per se well-known Hoshen-Kopelman algorithm, which is a raster-scan method to form such pixel groups that runs in linear time. During the first pass, it runs through all pixel values i_{x,y,t}.
  • If a pixel value i_{x,y,t} oversteps a limit and it has a neighboring pixel value i_{x±1,y±1,t} that belongs to a blob, it will be added to that same blob. If it has multiple neighboring blob-classified pixel values i_{x±1,y±1,t}, these will be joined into one single blob, and the pixel value i_{x,y,t} is added to the group. Finally, if there are no neighboring blobs, the pixel value i_{x,y,t} will be registered as a new blob. For each blob, a number of metrics can be aggregated, providing different options for estimating the center of the blob. One possibility is to use the absolute modulus of the noise model deviations, taking the blob center as the deviation-weighted mean of the member pixel coordinates, x_c = Σ|i_{x,y,t} − μ̂_{x,y,t}|·x / Σ|i_{x,y,t} − μ̂_{x,y,t}| (and correspondingly for y_c); another option is to weight the coordinates by their squared deviations, x_c = Σ(i_{x,y,t} − μ̂_{x,y,t})²·x / Σ(i_{x,y,t} − μ̂_{x,y,t})² (and correspondingly for y_c), the sums being taken over the pixels of the blob. A labeling sketch is given below.
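  • The following is a compact union-find rendering of such a raster scan (a generic Hoshen-Kopelman sketch, not necessarily the patent's exact implementation); it labels 4-connected outlier pixels in linear time:

```python
import numpy as np

def label_blobs(mask):
    # Hoshen-Kopelman style labeling of a boolean outlier mask.
    # Returns an int array of blob labels (0 = background).
    labels = np.zeros(mask.shape, dtype=np.int32)
    parent = [0]  # parent[k] is the representative of label k; 0 is background

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    next_label = 1
    H, W = mask.shape
    for y in range(H):
        for x in range(W):
            if not mask[y, x]:
                continue
            left = find(labels[y, x - 1]) if x > 0 and labels[y, x - 1] else 0
            up = find(labels[y - 1, x]) if y > 0 and labels[y - 1, x] else 0
            if left and up:
                labels[y, x] = min(left, up)
                parent[max(left, up)] = min(left, up)  # merge the two blobs
            elif left or up:
                labels[y, x] = left or up
            else:
                labels[y, x] = next_label  # register a new blob
                parent.append(next_label)
                next_label += 1
    # Second pass: flatten label equivalences to their representatives.
    for y in range(H):
        for x in range(W):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```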
  • Figure 7 illustrates an exemplifying clustering of four different detected blobs 1-4, based on individual pixel values i_{x,y,t} found to fulfill the criteria for being considered as part of blobs at time t.
  • Thereafter, detected blobs are correlated across said time-ordered series of digital images I_t to determine paths of moving objects through said space.
  • Such correlation can, for instance, use linear interpolation and/or implied Newtonian laws of motion as a filtering mechanism, so as to purge blobs not moving in ways that are plausible given a reasonable model of the types of objects being tracked; a toy example of such a plausibility check follows below.
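  • For instance, a chain of three blob centers in consecutive frames can be kept only if the acceleration it implies is physically reasonable (a toy sketch; the threshold and names are invented for the example):

```python
def plausible_chain(p0, p1, p2, dt, max_accel=50.0):
    # Reject blob chains whose implied acceleration (in pixels/s^2) is too
    # large to correspond to a real tracked object.
    v1 = ((p1[0] - p0[0]) / dt, (p1[1] - p0[1]) / dt)
    v2 = ((p2[0] - p1[0]) / dt, (p2[1] - p1[1]) / dt)
    ax, ay = (v2[0] - v1[0]) / dt, (v2[1] - v1[1]) / dt
    return (ax * ax + ay * ay) ** 0.5 <= max_accel
```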
  • Tracking information available from the available cameras 110 and any other sensors can be combined to determine one or several 3-dimensional target object 120 tracks through the space 111. This can, for instance, take place using stereoscopic techniques, which are well-known in themselves.
  • One or several determined 2D and/or 3D target object 120 tracks can be output to an external system, and/or graphically displayed on a display of a track-monitoring device. For instance, such displayed information can be used by a golfer using the system 100 to gain knowledge of the properties of a newly hit golf stroke.
  • The user (such as a golfer) may be presented with a visual 2D or 3D representation, on a computer display screen, of the track of a golf ball just hit, as detected using the method and system described above, against a graphical representation of a virtual golf practice range or similar.
  • This will provide feedback to the golfer that can be used to make decisions regarding various parts of the golf swing.
  • The track may also be part of a virtual experience, in which a golfer may for instance play a virtual golf hole, and the detected and displayed track is represented as a golf shot in said virtual experience. It is specifically noted that the amount of data necessary to process for achieving such tracks is substantial.
  • The invention also relates to the system 100 as such, comprising the digital camera 110, the digital image analyzer 130 and the moving object tracker 140.
  • The digital camera 110 is then arranged to depict the space 111 to produce the series of digital images I_t as described above.
  • The digital image analyzer 130 is configured to determine said inequality for the pixel values i_{x,y,t} as described above, and to store in the computer memory information indicating that one or several pixel values i_{x,y,t} are part of a detected blob.
  • The moving object tracker 140 is configured to correlate detected blobs across said series of digital images I_t as described above.
  • The invention also relates to the computer software product as such.
  • The computer software product is then configured to, when executing on suitable hardware as described above, embody the digital image analyzer 130 and the moving object tracker 140. As such, it is configured to receive a series of digital images I_t from the digital camera 110, and to perform the above-described method steps performed by the digital image analyzer 130 and the moving object tracker 140.
  • The digital frames I_t can be provided as a continuous or semi-continuous stream of frames from the digital camera 110 (and a set of the n most recent considered frames can be analyzed for each frame or set of frames received), or the entire set of N images can be received as one big batch and analyzed thereafter.
  • The computer software product can execute on a computer belonging to the system 100, and can as such constitute part of the system 100.
  • The generated blob data can be used in various ways in addition to the object tracking.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Image Analysis (AREA)

Abstract

Methods, systems, and apparatus, including computer program products, for tracking moving objects include depicting a space using a digital camera to produce a series of digital images (I_t); for two or more pixel values, determining an inequality comparing a first value to a second value, the first value being calculated based on the square of the difference between a pixel value (i_{x,y,t}) and a predicted pixel value (μ̂_{x,y,t}), the second value being calculated based on a product of the square of a number Z and an estimated variance for historic pixel values; storing in a computer memory information indicating that the pixel value (i_{x,y,t}) is part of a detected blob; and correlating detected blobs across said series of digital images (I_t) to determine paths of moving objects.

Description

Method and system for optically tracking moving objects
BACKGROUND OF THE INVENTION
The present invention relates to a method and a system for optically tracking moving objects.
Known methods track moving objects using computer vision, with one or more cameras depicting a space where the moving objects exist. The tracking may be performed by first identifying an object as one image pixel, or a set of adjacent pixels, that deviates from a local background. Such deviating pixels are together denoted a "blob". Once a number of blobs have been detected in several image frames, possible tracked object paths are identified by interconnecting identified blobs in subsequent frames.
One example of such a method is given in US 20220051420 A1.
The blob generation in each individual frame potentially results in very many false positive blobs, in other words identified blobs that do not really correspond to an existing moving object. This may be due to noise, shifting lighting conditions and non-tracked objects occurring in the field of view of the camera in question.
The detection of possible tracked object paths normally results in a reduction of such false positives, for instance based on filtering away of physically or statistically implausible paths. Due to the large number of false positive blob detections, however, even if most of the false positives are filtered away in the tracked paths detection step, the blob detection itself is associated with heavy memory and processor load and may therefore constitute a bottleneck for the object tracking even if high-performance hardware is used.
Moreover, as the performance of digital cameras increases, pixel data output from such cameras increases correspondingly. In order to achieve accurate tracking of moving objects, it is desired to use as accurate and precise image information as possible. In order to avoid too many non-detected blobs (false negatives), leading to potentially missed tracked object paths, it is normally preferred to accept a relatively large share of false positive blob detections.
SUMMARY OF THE INVENTION
The various embodiments described herein solve one or more of the above-described problems and provide techniques for tracking the paths of moving objects using less memory and/or processing power compared to conventional object tracking techniques.
Hence, the invention can be embodied as a method for tracking moving objects, comprising the steps of: obtaining a series of digital images It at consecutive times t, the digital images It representing optical input from a three-dimensional space within a field of view of the digital camera, the digital camera being arranged to produce said digital images It having a corresponding set of pixels px,y, said digital images comprising corresponding pixel values ix,y,t, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images (It); for two or more of said pixel values ix,y,t, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value ix,y,t in question and a predicted pixel value μ̂x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where the predicted pixel value μ̂x,y,t is calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, Z being a number selected such that Z² is an integer with 10 < Z² < 20; for pixel values ix,y,t for which said first value is higher than said second value, storing in a computer memory information indicating that the pixel value ix,y,t is part of a detected blob; and correlating, based on the information stored in the computer memory, detected blobs across said series of digital images It to determine paths of moving objects through said three-dimensional space.
In some embodiments, said inequality is
$$\left| i_{x,y,t} - \hat{\mu}_{x,y,t} \right| > Z\,\sigma_{x,y,t}$$
where
$$\hat{\mu}_{x,y,t} = \frac{S_{x,y,t}}{n} = \frac{1}{n}\sum_{k=t-n}^{t-1} i_{x,y,k}$$
is said predicted pixel value, and where σx,y,t is an estimated standard deviation with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question.
In some embodiments, the method further comprises storing, in said computer memory, for individual ones of said pixels px,y and for a number n < N, the sums
$$S_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}, \qquad Q_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}^2$$
and, for individual ones of said pixel values ix,y,t, determining said inequality as
$$\left(n\,i_{x,y,t} - S_{x,y,t}\right)^2 > Z^2\left(n\,Q_{x,y,t} - S_{x,y,t}^2\right)$$
In some embodiments, Sx,y,t, Qx,y,t, or both, are calculated recursively, whereby a value for a pixel value ix,y,t is calculated using a previously stored value Sx,y,t-1, Qx,y,t-1, or both, for the same pixel px,y but at an immediately preceding time t−1.
In some embodiments, Sx,y,t is calculated as
$$S_{x,y,t} = S_{x,y,t-1} + i_{x,y,t-1} - i_{x,y,t-n-1}$$
and Qx,y,t is calculated as
$$Q_{x,y,t} = Q_{x,y,t-1} + i_{x,y,t-1}^2 - i_{x,y,t-n-1}^2$$
In some embodiments, the method further comprises storing in said computer memory Sx,y,t and Qx,y,t in combination as a single datatype comprising 12 bytes or less per pixel px,y.
In some embodiments, the method further comprises storing in said computer memory, for a particular digital image It, a pixmap having, for each pixel px,y, said information indicating that the pixel value ix,y,t is part of a detected blob.
In some embodiments, said information indicating that the pixel value ix,y,t is part of a detected blob is indicated by a single bit for each pixel px,y.
In some embodiments, said pixmap also comprises, for each pixel px,y, a value indicating an expected pixel value ix,y,t for that pixel px,y.
In some embodiments, said value indicating an expected pixel value ix,y,t for the pixel px,y in question is provided by storing the predicted pixel value (μ̂x,y,t) as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
In some embodiments, the predicted pixel value μ̂x,y,t, the estimated variance or standard deviation σx,y,t, or both, is or are calculated based on a set of n historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where 10 < n < 300.
In some embodiments, a number n of previous images It considered for the estimation of an estimated variance or a standard deviation σx,y,t of the second value is selected to be a power of 2.
In some embodiments, said pixel values ix,y,t have a depth across one or several channels of between 8 and 48 bits.
In some embodiments, the predicted pixel value μ̂x,y,t is determined based on an estimated projected future mean pixel value μx,y,t, in turn determined based on historic pixel values ix,y,t for a sampled set of pixels px,y in said images It.
In some embodiments, the predicted pixel value μ̂x,y,t is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize
$$\sum_{j,k}\left(i_{j,k,t} - \left(\alpha\,\mu_{j,k,t} + \beta\right)\right)^2$$
where μj,k,t is said estimated projected future mean pixel value for the pixel pj,k in question, and where j and k are iterated over a test set of pixels.
In some embodiments, μj,k,t is an estimated historic mean with respect to pixel values ix,y,t for the pixel pj,k in question.
In some embodiments, said test set of pixels contains between 1% and 25% of the total set of pixels px,y in the image It.
In some embodiments, said test set of pixels is geometrically evenly distributed across the total set of pixels px,y in the image It.
In some embodiments, the estimated standard deviation σx,y,t is determined according to
$$\sigma_{x,y,t} = \sqrt{\frac{Q_{x,y,t}}{n} - \left(\frac{S_{x,y,t}}{n}\right)^2}$$
In some embodiments, the method further comprises determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value μ̂x,y,t according to any one of claims 14-18 until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
In some embodiments, the method further comprises, for said pixel values ix,y,t for which said first value is higher than said second value, only storing said information indicating that the pixel value ix,y,t is part of a detected blob in case also the following inequality holds: B·(ix,y,t − μ̂x,y,t)² > μ̂x,y,t, where ix,y,t is the pixel value in question, where μ̂x,y,t is the predicted pixel value and where B is an integer such that B > 100.
In some embodiments, the method further comprises using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
In some embodiments, the objects are golf balls.
The invention can be embodied as a method for tracking moving objects, the method comprising: obtaining a series of digital images I from a digital camera, the digital images I representing optical input from a three-dimensional space within a field of view of the digital camera over time, each of the digital images I having pixels px,y with corresponding pixel values ix,y; performing, at a computer, image segmentation on each image of the series of digital images I using a statistical model of background for the optical input to detect blobs, wherein performing the image segmentation comprises, for each of two or more pixel values ix,y,t in the image, determining an inequality result using a current pixel value ix,y,t for a pixel px,y in a current image It, first Sx,y,t and second Qx,y,t values of the statistical model for the pixel px,y, and a confidence level value Z², wherein the first Sx,y,t and second Qx,y,t values are calculated based on historic pixel values ix,y from images from the series of digital images I before the current image It, each of the current pixel value ix,y,t, the first Sx,y,t and second Qx,y,t values, and the confidence level value Z² are stored as integer type data in a memory of the computer, and the determining uses integer operations in the computer, and storing, in the memory of the computer, information indicating that the current pixel value ix,y,t for the image pixel in the current image It is part of a detected blob in response to the inequality result; and using the stored information to correlate detected blobs across the series of digital images I to determine paths of moving objects through the three-dimensional space within the field of view of the digital camera.
Moreover, the invention can also be embodied as a system for tracking moving objects, the system comprising a digital camera, a digital image analyzer and a moving object tracker, the digital camera being arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images It at consecutive times t, the digital camera being arranged to produce said digital images It having a corresponding set of pixels px,y, said digital images comprising corresponding pixel values ix,y,t, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images (It); the digital image analyzer being configured to, for two or more of said pixel values ix,y,t, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value ix,y,t in question and a predicted pixel value μ̂x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where the predicted pixel value μ̂x,y,t is calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, and where Z is selected such that Z² is an integer with 10 < Z² < 20; the digital image analyzer being configured to, for pixel values ix,y,t for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value ix,y,t is part of a detected blob; and the moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images It to determine paths of moving objects through said three-dimensional space.
Furthermore, the invention can also be embodied as a computer software product configured to, when executing, receive a series of digital images It from a digital camera, the digital camera being arranged to represent optical input from a three-dimensional space to produce said digital images It at consecutive times t, the digital camera being arranged to produce said digital images It having a corresponding set of pixels px,y, said digital images comprising corresponding pixel values ix,y,t, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images (It); for two or more of said pixel values ix,y,t, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between the pixel value ix,y,t in question and a predicted pixel value μ̂x,y,t, the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, where the predicted pixel value μ̂x,y,t is calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, and where Z is selected such that Z² is an integer with 10 < Z² < 20; for pixel values ix,y,t for which said first value is higher than said second value, store in a computer memory information indicating that the pixel value ix,y,t is part of a detected blob; and correlate, based on the information stored in the computer memory, detected blobs across said series of digital images It to determine paths of moving objects through said three-dimensional space. The computer software product can be implemented by a non-transitory computer-readable medium encoding instructions that cause one or more hardware processors located in at least one of the computer hardware devices in the system to perform the digital image processing and the object tracking.
BRIEF DESCRIPTION OF THE DRAWINGS
In the following, the invention will be described in detail, with reference to exemplifying embodiments of the invention and to the enclosed drawings, wherein:
Figure 1 is an overview of a system 100 configured to perform a method of the type illustrated in Figure 3;
Figure 2 is a simplified illustration of a data processing apparatus;
Figure 3 shows a general flowchart for optically tracking moving target objects;
Figure 4 is a flowchart of a method performed by the system 100 shown in Figure 1;
Figure 5 is an overview illustrating a noise model of a type described herein;
Figure 6 shows an image frame illustrating a noise model;
Figure 7 illustrates an example of clustering of pixels into blobs; and
Figure 8 illustrates intensities for a pixel during a sudden exposure change event.
All figures share the same reference numerals for same and corresponding parts.
DETAILED DESCRIPTION
With reference to Figure 1, the method relates to a method for tracking moving target objects 120. Generally, a system 100 can comprise one or several digital cameras 110, each being arranged to represent optical input from a three-dimensional space 111 within a field of view of the digital camera 110, to produce digital images of such moving target objects 120, the objects travelling through a space 111 hence being represented by the digital camera 110 in consecutive digital images. Such representation by the digital camera 110 will herein be denoted a "depiction", for brevity. The digital camera 110 is arranged to not move in relation to the space 111 during production of the series of digital images (It). For instance, the digital camera 110 may be fixed in relation to said space 111, or, in case it is movable, it is kept still during the production of the series of digital images (It). Hence, the same part of the space 111 is depicted each time by the digital camera 110, and the digital camera 110 is arranged to produce digital images It having a corresponding set of pixels px,y, and so that said produced digital images It comprise corresponding pixel values ix,y,t. "x" and "y" denote coordinates in an image coordinate system, whereas "t" denotes time.
That the pixel values ix,y,t of two or more different images It "correspond" to each other means that individual pixels px,y measure light entering the camera 110 from the same, or substantially the same, light cone in all of the images It in question. It is realized that the camera 110 may move slightly, due to wind, thermal expansion and so forth, between images It, but that there is substantial correspondence between pixels px,y even in cases where such noise-inducing slight movement is present. There can be at least 50% overlap between light cones of any one same pixel px,y of the camera 110 between any two consecutive images It. There may also be cases where the camera 110 is movable, such as pivotable. In such cases an image transformation can be applied to a captured image so as to bring its pixels px,y into correspondence with pixels of a previous or future captured image.
In case the system 100 comprises more than one digital camera 110, several such digital cameras 110 can be arranged to depict the same space 111 and consequently track the same moving target object(s) 120 through said space 111. In such cases, the several digital cameras 110 can be used to construct a stereoscopic view of the respective tracked path of each target object 120.
As mentioned, the digital camera 110 is arranged to produce a series of consecutive images It, at different points in time. Such images may also be denoted image "frames". In some embodiments, the digital camera 110 is a digital video camera, arranged to produce a digital moving film comprising or being constituted by such consecutive digital image frames. As is illustrated in Figure 1, the system 100 comprises a digital image analyzer 130, configured to analyze digital images received directly from the digital camera 110, or received from the digital camera 110 via an intermediate system, in same or processed (re-formatted, compressed, filtered, etc.) form. The analysis performed by the digital image analyzer 130 can take place in the digital domain. The digital image analyzer 130 may also be denoted a "blob detector".
The system 100 further comprises an object tracker 140, configured to track said moving target objects 120 across several of said digital images, based on information provided from the digital image analyzer 130. The analysis performed by the object tracker 140 can also take place in the digital domain.
In example embodiments, the system 100 is configured to track target objects 120 in the form of sports objects in flight, such as balls in flight, for instance baseballs or golf balls in flight. In some embodiments, the system 100 is used at a golf practice range, such as a driving range having a plurality of bays for hitting golf balls that are to be tracked using the system 100. In other cases, the system 100 can be installed at an individual golf range bay, or at a golf tee, and configured to track golf balls being struck from said bay or tee. The system 100 can also be a portable system 100, configured to be positioned at a location from which it can track said moving target objects 120. It is realized that the monitored "space" mentioned above will, in each of these and other cases, be a space through which sport balls are expected to move.
Various types of computers can be used in the system 100. The digital image analyzer 130 and the object tracker 140 constitute examples of such computers. In some cases, the digital image analyzer 130 and the object tracker 140 can be provided as software functions executing on one and the same computer. The one or several digital cameras 110 can also be configured to perform digital image processing, and then also constitute examples of such computers. In some embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented as software functions configured to execute on hardware of one or several digital cameras 110. In other embodiments, the digital image analyzer 130 and/or the object tracker 140 is or are implemented on standalone or combined hardware platforms, such as on a computer server.
The one or several digital cameras 110, the digital image analyzer 130 and the object tracker 140 are configured to communicate digitally, either via computer-internal communication paths, such as via a computer bus, or via computer-external wired and/or wireless communication paths, such as via a network 10 (e.g., the Internet). In implementations that need substantial communications bandwidth, the camera(s) 110 and the digital image analyzer 130 can communicate via a direct, wired digital communication route, which is not over the network 10. On the other hand, the digital image analyzer 130 and the object tracker 140 may communicate with each other over the network 10 (e.g., a conventional Internet connection).
The essential elements of a computer, in general, are a processor for performing instructions and one or more memory devices for storing instructions and data. As used herein, a "computer" can include a server computer, a client computer, a personal computer, embedded programmable circuitry, or a special purpose logic circuitry. Such computers can be connected with one or more other computers through a network, such as the internet 10, or via any suitable peer-to-peer connection for digital communications, such as a Bluetooth® connection.
Each computer can include various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including various programs that operate, for instance, as the digital image analyzer 130 program and/or the object tracker 140 program. Other examples include a digital image preprocessing and/or compressing program. The number of software modules used can vary from one implementation to another and from one such computer to another. Each of said programs can be implemented in embedded firmware and/or as software modules that are distributed on one or more data processing apparatus connected by one or more computer networks or other suitable communication networks.
Figure 2 illustrates an example of such a computer, being a data processing apparatus 300 that can include hardware or firmware devices including one or more hardware processors 312, one or more additional devices 314, a non-transitory computer readable medium 316, a communication interface 318, and one or more user interface devices 320. The processor 312 is capable of processing instructions for execution within the data processing apparatus 300, such as instructions stored on the non-transitory computer readable medium 316, which can include a storage device such as one of the additional devices 314. In some implementations, the processor 312 is a single or multi-core processor, or two or more central processing units (CPUs). The data processing apparatus 300 uses its communication interface 318 to communicate with one or more other computers 390, for example, over the network 380. Thus, in various implementations, the processes described can be run in parallel, concurrently, or serially, on a single or multi-core computing machine, and/or on a computer cluster/cloud, etc.
The data processing apparatus 300 includes various software modules, which can be distributed between an applications layer and an operating system. These can include executable and/or interpretable software programs or libraries, including a program 330 that constitutes the digital image analyzer 130 described herein, configured to perform the method steps performed by such digital image analyzer 130. The program 330 can also constitute the object tracker 140 described herein, configured to perform the method steps performed by such object tracker 140.
Examples of user interface devices 320 include a display, a touchscreen display, a speaker, a microphone, a tactile feedback device, a keyboard, and a mouse. Moreover, the user interface device(s) need not be local device(s) 320, but can be remote from the data processing apparatus 300, e.g., user interface device(s) 390 accessible via one or more communication network(s) 380. The user interface device 320 can also be in the form of a standalone device having a screen, such as a conventional smartphone being connected to the system 100 via a configuration or setup step. The data processing apparatus 300 can store instructions that implement operations as described in this document, for example, on the non-transitory computer readable medium 316, which can include one or more additional devices 314, for example, one or more of a floppy disk device, a hard disk device, an optical disk device, a tape device, and a solid state memory device (e.g., a RAM drive, a Flash memory or an EEPROM). Moreover, the instructions that implement the operations described in this document can be downloaded to the non-transitory computer readable medium 316 over the network 380 from one or more computers 390 (e.g., from the cloud), and in some implementations, the RAM drive is a volatile memory device to which the instructions are downloaded each time the computer is turned on.
It is realized that the described computer hardware can be physical hardware, virtual hardware or any combination thereof.
As mentioned, the system 100 is configured to perform a method according to one or more embodiments for optically tracking moving target objects 120.
The present invention can furthermore be embodied as a computer software product, configured to perform said method when executing on computer hardware of the type described herein. The computer software product can hence be deployed as a part of the system 100 so as to provide the functionality required to perform the present method.
Both said system 100 and said computer software product are hence configured to track moving target objects 120 moving through said space 111 in relation to one or several digital cameras 110, by comprising or embodying the above-mentioned digital image analyzer 130 and object tracker 140, in turn being configured to perform the corresponding method steps described herein.
In general, everything that is said in relation to the presently described method is equally applicable to the system 100 and to the computer software product described herein, and vice versa.
Figure 3 illustrates a general flowchart for tracking moving target objects 120 based on digital image information received from one or several digital cameras 110.
In computer vision, "image segmentation" is the process of separating an image into different regions, representing target objects within it. Generally, it is desirable to distinguish potential moving target objects from a background. The background may in general be changing and noisy, and is in many cases quite unpredictable. In the example of a golf ball, for instance, when such a ball 120 is far away from the digital camera 110 depicting it, the ball may be as small as one single pixel px,y in the digital image frame produced by the digital camera 110.
For these reasons, it is in general not possible to separate out a foreground object 120 from a background based only on a detected shape in relation to an expected shape of the target object 120. Instead, it is proposed to set up a statistical model of the background (in the following denoted a "noise model"), and to identify pixels px,y that by a probability measure deviate from an expected value with more than a threshold value, based on this model. Adjacent pixels px,y in the detected digital image that deviate from the expected value in accordance with the model are grouped together into a "blob" of pixels px,y ("blob aggregation").
Such a method may result in a very large number of false positives, such as about 99.9% false positives. However, a subsequent motion tracking analysis can sort out the vast majority of all false positives, such as only keeping blobs that seem to obey Newton's laws of motion between consecutive digital image frames It.
The noise model step, as depicted in Figure 3, is used to suppress noise in the image frames, with the purpose of lowering the number of detected blobs in the subsequent blob aggregation step. The noise model analyzes a plurality of pixels px,y, such as every pixel px,y, in said image frames It, and is therefore at risk of becoming a major bottleneck. These calculations, aiming to identify noise that does not conform to a detected statistical pattern in order to identify outliers, can be handled by high-performance GPUs (Graphics Processing Units), but performance may still prove to be a problem. The approach described herein has turned out to drastically reduce the computational power required per pixel px,y in a moving target object 120 tracking system 100. This reduction can be exploited by using simpler hardware, lowering power consumption or allowing a larger incoming image bitrate.
Turning now to Figure 4, a method according to one or more embodiments is illustrated.
In a first step S1, the method starts.
In a subsequent step S2, a number Z is selected such that Z² is an integer. The number Z can be selected such that Z² is an integer with 10 < Z² < 20. It is noted that Z may be a non-integer, as long as Z² is an integer value. This step S2 may be performed ahead of time, such as during a system 100 design process or a system 100 calibration step.
In a subsequent step S3, the space 111 is depicted using the digital camera 110 to produce a series of digital images It at consecutive times t. The space 111 can be depicted using the digital camera 110 to produce a series of N digital images It at consecutive times t. However, it is realized that the procedure can also be a continuous or semi-continuous procedure, wherein the digital camera 110 will continue to produce digital images It at consecutive times t so long as the procedure is ongoing. Hence, in this case the number of digital images N will grow by 1 for each captured frame. In either case, the series of digital images It at consecutive times t may be seen as a stream of digital images captured much like a digital video stream.
In a subsequent step S4, for two or more (e.g., several) of said pixel values ix,y,t, an inequality is determined, involving comparing a first value to a second value.
The first value is calculated based on the square of the difference between the pixel value ix,y,t in question and a calculated predicted pixel value μ̂x,y,t for that pixel px,y. The second value is calculated based on a product of, on the one hand, the square of the selected number Z, this square then being an integer value, and, on the other hand, an estimated variance or standard deviation σx,y,t with respect to historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question. Concretely, the second value can be calculated based on said estimated variance or a square of the estimated standard deviation, i.e. as Z²·σ²x,y,t.
The predicted pixel value μ̂x,y,t is also calculated based on historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question, in other words using information from image frames captured by the camera 110 at points in time prior to the time t. The predicted pixel value μ̂x,y,t can be calculated based on the same, or different, set of historic pixel values ix,y,{t-n,t-1} as the estimated variance or standard deviation σx,y,t.
In the notation used herein, "n" denotes the number of historic pixel values ix,y,t considered by the noise model, counting backwards from the currently considered image frame. This notation hence assumes that the same consecutive pixel values ix,y,t, up to the presently considered image frame, are used to calculate both the first and the second value, but it is realized that any suitable contiguous or non-contiguous, same or different, intervals of pixel values ix,y,t can be used to calculate the first and the second value, respectively.
In general, the equations and expressions disclosed and discussed herein are provided as illustrative examples, and it is realized that in practical embodiments they can be tailored to specific needs. This can include, for instance, the introduction of various constant factors and scaling factors; additional intermediate calculation steps, such as filtering steps; and so forth.
In some embodiments, said inequality may be written as:
$$\left| i_{x,y,t} - \hat{\mu}_{x,y,t} \right| > Z\,\sigma_{x,y,t}$$
where μ̂x,y,t is said predicted pixel value and where σx,y,t is an estimated standard deviation with respect to said historic pixel values ix,y,{t-n,t-1} for the pixel px,y in question. In general, the presently described noise model can be configured to, for each pixel px,y, estimate a moving average and standard deviation based on the last n image frames, and then to use these metrics to decide whether the pixel value ix,y,t in the same image location in the new frame deviates from the expected value more than an allowed limit.
This model can be designed to assume that any pixel in the background of the considered image It has an intrinsic Gaussian noise, as long as the background only contains features that are assumed to be static in the first approximation. A normal distribution can be used to establish a suitable confidence interval. For instance, if a Z score of 3.464 is used, it can be seen that 99.95% of all samples with no significant differences from the background fall within the corresponding confidence interval. Therefore, a pixel px,y with signal value ix,y at time t is considered to have a significant difference from the background if:
$$\left| i_{x,y,t} - \mu_{x,y,t} \right| > Z\,\sigma_{x,y,t} \qquad (1)$$
Here, k iterates over the previous n frames. The limit is based on the (uncorrected) standard deviation:
$$\sigma_{x,y,t} = \sqrt{\frac{1}{n}\sum_{k=t-n}^{t-1}\left(i_{x,y,k} - \mu_{x,y,t}\right)^2} \qquad (2)$$
The corrected (unbiased) standard deviation would be a mathematically more correct choice, i.e. a more accurate estimate of σ would result from dividing by n−1 rather than by n. However, for the present purposes this is not significant, since the limit used is a multiple of the standard deviation that may be freely selected. Selecting the number n of previous image frames considered for the estimation of the standard deviation in the second value (used in evaluating said inequality) to be a power of 2 (e.g. 16, 32, 64, ...), we can get computationally efficient multiplications and divisions at a very low cost, by using shifting operations. When processing an image frame It, pixel values ix,y from frames k ∈ [t−n, t−1] are used. A variant of the formula for computing the standard deviation that allows for it to be computed in a single pass is the following:
$$\sigma_{x,y,t} = \sqrt{\frac{1}{n}\sum_{k=t-n}^{t-1} i_{x,y,k}^2 - \mu_{x,y,t}^2} \qquad (3)$$
Setting $S_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}$ and $Q_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}^2$, the variance can be rewritten as:
$$\sigma_{x,y,t}^2 = \frac{Q_{x,y,t}}{n} - \left(\frac{S_{x,y,t}}{n}\right)^2 \qquad (4)$$
Here the expression for the estimate of the mean is also provided:
$$\mu_{x,y,t} = \frac{S_{x,y,t}}{n} = \frac{1}{n}\sum_{k=t-n}^{t-1} i_{x,y,k} \qquad (5)$$
Revisiting (1), it is safe to square both sides, since both the left-hand and right-hand sides of this equation are non-negative:
$$\left(i_{x,y,t} - \mu_{x,y,t}\right)^2 > Z^2\,\sigma_{x,y,t}^2 \qquad (6)$$
Combining (4) and (6) yields:
$$\left(i_{x,y,t} - \frac{S_{x,y,t}}{n}\right)^2 > Z^2\left(\frac{Q_{x,y,t}}{n} - \frac{S_{x,y,t}^2}{n^2}\right) \qquad (7)$$
This is equivalent to:
$$\left(n\,i_{x,y,t} - S_{x,y,t}\right)^2 > Z^2\left(n\,Q_{x,y,t} - S_{x,y,t}^2\right) \qquad (8)$$
It is noted that n < N. Hence, the above-discussed inequality can be expressed as (8), with Sx,y,t and Qx,y,t as defined above.
Since ix,y,t, n, Sx,y,t and Qx,y,t are all integers, and since Z can be picked to produce an appropriate or desired number of false positives, the entire calculation can be done using only integer numbers. This means that the calculations can be performed without any loss of precision due to floating point truncation errors. Also, integer operations are typically faster than their floating-point counterparts.
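By way of illustration, the integer-only test (8) could be sketched as follows (a minimal numpy sketch; the function name, array types and default values are assumptions, not taken from the patent):

```python
import numpy as np

def blob_mask(frame, S, Q, n=64, z_squared=12):
    """Evaluate inequality (8) per pixel using integer math only.

    frame: current pixel values i_x,y,t as an integer array.
    S, Q:  running sum and sum of squares over the last n frames.
    Returns a boolean mask of pixels deviating from the background.
    """
    i = frame.astype(np.int64)               # widen to avoid overflow
    S64 = S.astype(np.int64)
    Q64 = Q.astype(np.int64)
    lhs = (n * i - S64) ** 2                 # first value, scaled by n**2
    rhs = z_squared * (n * Q64 - S64 * S64)  # second value, same scale
    return lhs > rhs
```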
In the following table, various outcomes for different selected values of Z are shown:

Z²    Z        P (number of false positives, ppm)
12    3.46410  532.1
13    3.60555  311.6
14    3.75166  175.7
15    3.87298  107.6
16    4.00000  63.4
Equation (8) depends on knowledge of the sum S and the squared sum Q of the last n observations of the pixel value ix,y,t in question:
$$S_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k} \qquad (9)$$
$$Q_{x,y,t} = \sum_{k=t-n}^{t-1} i_{x,y,k}^2 \qquad (10)$$
While it would be possible to calculate the statistics using (9) and (10) directly, for each pixel value ix,y,t of each frame It, it is much more computationally efficient to use a recursive definition, where in every step the new frame It is added to the noise model, and the frame It−n from n frames back is removed:
$$S_{x,y,t+1} = S_{x,y,t} + i_{x,y,t} - i_{x,y,t-n} \qquad (11)$$
$$Q_{x,y,t+1} = Q_{x,y,t} + i_{x,y,t}^2 - i_{x,y,t-n}^2 \qquad (12)$$
Updating Qx,y,t requires two multiplications to generate the squares. However, since Qx,y,t involves a difference of squares, it can be reduced to one single multiplication if rewritten as follows. With
$$z_+ = i_{x,y,t}, \qquad z_- = i_{x,y,t-n} \qquad (13)$$
the updates become:
$$S_{x,y,t+1} = S_{x,y,t} + z_+ - z_- \qquad (14)$$
$$Q_{x,y,t+1} = Q_{x,y,t} + \left(z_+ - z_-\right)\left(z_+ + z_-\right) \qquad (15)$$
(14) and (15) are then the full calculations required to update the noise model. A straightforward implementation would require only 3 (int) additions, 1 (int) subtraction and 1 (int) multiplication per pixel, which makes it very computationally efficient. Furthermore, these calculations can be accelerated by use of SIMD instruction sets such as AVX2 (on x86_64) or NEON (on aarch64), or they can be run on a GPU or even implemented on an FPGA.
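As an illustration of this update step, the following is a minimal vectorized sketch (class and attribute names are hypothetical; a ring buffer is assumed to hold the last n frames, matching the requirement of keeping the frames in memory discussed below):

```python
import numpy as np

class NoiseModel:
    """Running sum S and sum of squares Q over the last n frames."""

    def __init__(self, height, width, n=64):
        self.n = n
        self.ring = np.zeros((n, height, width), dtype=np.uint16)
        self.idx = 0  # position of the frame from n frames back
        self.S = np.zeros((height, width), dtype=np.uint32)
        self.Q = np.zeros((height, width), dtype=np.uint64)

    def update(self, frame):
        """Add the new frame and drop the frame from n frames back,
        per (14)-(15): one multiplication per pixel."""
        z_plus = frame.astype(np.int64)
        z_minus = self.ring[self.idx].astype(np.int64)
        diff = z_plus - z_minus
        self.S = (self.S.astype(np.int64) + diff).astype(np.uint32)
        self.Q = (self.Q.astype(np.int64)
                  + diff * (z_plus + z_minus)).astype(np.uint64)
        self.ring[self.idx] = frame  # overwrite the dropped frame
        self.idx = (self.idx + 1) % self.n
```

During the first n frames the ring buffer still contains zeros, corresponding to the initialization period discussed in connection with Figure 8.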
The calculations performed to update the noise model between consecutive image frames It are conceptually illustrated in Figure 5, showing how the "New frame" It is added to the Noise Model, and a frame It−n from n frames back, such as the last frame in the currently considered "Queue of frames in model", is removed. As has been described above, this can be performed efficiently by considering the individual pixel values ix,y,t and ix,y,t−n, calculating the values of z₊ and z₋.
From the above it is clear that 0 ≤ S ≤ 2ᵇ·n and 0 ≤ Q ≤ 2²ᵇ·n, where b is the bit depth of the input pixel value ix,y,t data. In some embodiments, n is not more than 300, or not more than 256, or not more than 128, such as not more than 64 = 2⁶ (the averaging is not performed over more than 64 consecutive image frames It). In some embodiments, n can be as low as 32, or even as low as 16 or even 10. In some embodiments, the n frames considered at each point in time are the n latest frames captured and provided by the camera 110. In this case, the n frames can together cover a time period of between 0.1 s and 10 s, such as between 0.5 s and 2 s, of captured video. In other words, the number of considered frames n can be relatively close to a frame rate used by the digital camera 110. The noise model may then be required to store two integers per pixel px,y, in addition to keeping the actual image frames in memory for at least as many frames It as the length of the window size n. Furthermore, an additional single-precision float may be required per pixel to store the estimated variance if the calculation (as described in equation (19), below) is used.
In some embodiments, the pixel values ix,y,t have a bit depth across one or several channels of between 8 and 48 bits, such as a single channel (for instance a gray channel) of 8 or 16 bit depth or three channels (such as RGB) of 16 or 24 bit depth.
In case the camera 110 provides pixel value ix,y,t information across several color channels, the pixel values ix,y,t can be transformed into a single channel (such as a gray scale channel) before processing of the pixel values ix,y,t by the digital image analyzer 130. Alternatively, only one such channel, out of several available channels, can be used for the analysis. Further alternatively, several channels can be analyzed separately and in parallel, so that a pixel that is detected to be a blob in at least one such analyzed channel is determined to be a blob at any point in time.
The transformed pixel values ix,y,t can have a bit depth of at least 8 bits, and in some embodiments at the most 24 bits, such as at the most 16 bits. A bit depth of 12 bits has proven to strike a reasonable balance between speed, memory requirements and output quality. In case input data has a higher bit depth than required, the data from the camera 110 can be transformed (down-sampled) before processing by the digital image analyzer 130.
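As a sketch of such a transform (the channel weights, integer shifts and names are assumptions chosen for illustration, not prescribed by the method):

```python
import numpy as np

def to_gray12(rgb16):
    """Collapse a 3-channel 16-bit image to one 12-bit gray channel."""
    r = rgb16[..., 0].astype(np.uint32)
    g = rgb16[..., 1].astype(np.uint32)
    b = rgb16[..., 2].astype(np.uint32)
    gray16 = (r + 2 * g + b) >> 2   # cheap, shift-only luma approximation
    return (gray16 >> 4).astype(np.uint16)  # down-sample 16 -> 12 bits
```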
More generally, the number of bits required can be found for S as D + log₂(n) and for Q as 2D + log₂(n), where D is the bit depth for one single considered channel.
The following table shows the required storage space for S and Q depending on the used pixel value ix,y,t bit depth when n = 64:

Pixel bit depth   S required bits   Q required bits
8                 14 (uint16)       22 (uint32)
10                16 (uint16)       26 (uint32)
12                18 (uint32)       30 (uint32)
16                22 (uint32)       38 (uint64)
In general, the method can comprise a step in which the noise model is updated and stored in computer memory, as a collection of updated noise model information (S and Q) with respect to individual pixels px,y for which blob detection is to be performed. This noise map can hence be updated and stored for each pixel px,y in the image.
Using the above-explained calculations, it is possible to store, in said computer memory, updated values for Sx,y,t and Qx,y,t in combination as a single datatype (such as a single structure, record or tuple), the datatype comprising 12 bytes or less, or 10 bytes or less, or even 8 bytes or less, per pixel px,y. This storing, for each analyzed pixel value ix,y,t (such as for all pixels px,y in the image It), of updated values for Sx,y,t and Qx,y,t in combination as a single datatype, constitutes an example of the "noise model" described herein. Hence, the noise model is updated for each analyzed digital image frame It, such as for each individual image frame It in the set of consecutive image frames It produced and provided by the (each) digital camera 110.
In the same step S4, for pixel values ix,y,t for which said first value is found to be higher than said second value, information is stored in said computer memory, the information indicating that the pixel value ix,y,t is part of a detected blob.
This storing can take place in a generated pixmap, in other words a data structure having such indicating information for each pixel px,y. The information for each pixel px,y that it belongs or does not belong to a blob for that image frame It can be stored very efficiently, since it can be stored as a single binary bit. One way of implementing such a pixmap in practice is to use a "noise map" of the general type that will be described in the following, where the pixmap also comprises, for each pixel px,y, a value indicating an expected pixel value ix,y,t for that pixel px,y.
Hence, for each frame, the noise model established as described above can be used to generate such a noise map, that for every pixel position px,y provides information about whether or not that particular pixel value ix,y,t in the new frame It was outside of the allowed limits (that is, if (6) or (8) was true). In addition, the noise map can store an expected signal value for each pixel px,y at time t, such as based on the calculations performed in the determination of the noise model. The expected signal value is useful in downstream calculations, such as in a subsequent blob aggregation step, and so it is computationally efficient to establish and store this information already at this point.
Figure 6 illustrates the noise model after being updated based on the information of a most recently available image frame It, and in particular how the frame It relates to the values of Sx,y and Qx,y for that time t.
Even though it would be possible to first emit the noise map for each new image frame It arriving at the digital image analyzer 130, and only thereafter to update the noise model in the digital image analyzer 130, both of them can be done in one go, without unloading or overwriting the information in memory between said calculations. Hence, the (each) new image frame It is loaded into the CPU memory; and z₊, z₋, Sx,y,t and Qx,y,t are calculated for each pixel px,y, as the case may be, before the loaded data is unloaded or overwritten in the CPU memory. The advantage achieved then is to avoid memory access becoming a bottleneck. Once the penalty of loading the data into the CPU has been paid, all the necessary calculations are performed before unloading or overwriting the data in the CPU memory.
In the following example, the noise map requires 16 bits per pixel px,y to store. This information can be stored in a single two-byte datatype (such as a uint16). The information indicating whether or not the pixel px,y corresponding to each noise map entry is a blob pixel can be stored in the form of one single bit out of the total number of stored bits for the pixel px,y in question in the noise map. In some embodiments, the most significant bit in the datatype used to store noise map data for each pixel px,y, such as the most significant bit in the exemplifying two-byte structure, indicates whether the pixel value ix,y,t in question is outside the blob generating limits. Then, the lower 15 bits can encode the expected (average) pixel value ix,y signal, scaled to 15 bits precision, and can be stored in fixed-point representation. It is noted that this expected pixel value ix,y signal corresponds to the above-discussed predicted pixel value μ̂x,y,t. In other words, the value in the noise map indicating an expected pixel value ix,y,t for the pixel px,y can be achieved by transforming (if necessary) the predicted pixel value μ̂x,y,t to a grayscale bit depth of 15 bits.
In one example, the encoding is according to the following, for performance reasons: First, the expected signal is scaled to 15 bits (0..32767). If n = 32 and the input pixel depth is 12 bits, this means that S uses 17 bits for each pixel value ix,y,t. A simple shift operation will divide this number by 4, which puts it in the 15 bit range. Secondly, if the pixel value ix,y,t is within the limits given in equation (6) (or as given in its reformulated form (8)), all bits are negated. A noise map consumer can therefore iterate through the pixels px,y of the noise map data and ignore all entries that have the most significant bit set to 1.
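A sketch of this packing might look as follows (assuming n = 32 and 12-bit input as in the example above, so that S uses 17 bits; the names are hypothetical):

```python
import numpy as np

def pack_noise_map(S, is_blob):
    """Pack the noise map into one uint16 per pixel.

    The expected value is scaled to 15 bits by a shift (a 17-bit sum
    shifted right by 2 lands in 0..32767, so the most significant bit
    starts as 0). For pixels within the limits (non-blob), all bits
    are negated, setting the most significant bit to 1 so that
    consumers can skip those entries.
    """
    expected15 = (S >> 2).astype(np.uint16)  # 17-bit sum -> 15-bit value
    return np.where(is_blob, expected15, ~expected15).astype(np.uint16)
```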
It should be noted that the pixmap for each pixel at least, or only, contains information on 1) whether that pixel is part of a blob and 2) the predicted pixel value for that pixel. In this case, the prediction is simply the arithmetic mean of the previous n frames, but we will later on describe an alternative method to predict the value to be used when the recent frames have large changes in capture parameters such as shutter time or gain.
In some embodiments, the stored noise model incorporates all available information from image frames It received by the digital image analyzer 130 from the camera 110. In other words, it can use n consecutive or non-consecutive image frames It up until a most recently received image frame It to calculate values for Qx,y and Sx,y. On the other hand, the estimated projection (predicted pixel value μ̂x,y,t) data stored for each pixel px,y in the noise map can be updated only using a second-to-most recently received image frame It, i.e. not using a most recently received image frame It that contains the pixel values ix,y,t to be assessed with respect to blob classification. In practice, this may mean that the previous values for Sx,y,t, before being updated using the most recently received pixel values ix,y,t, can be used to calculate the (transformed) predicted data which is then stored in the pixmap.
In the above example, the predicted pixel value μ̂x,y,t is determined as (or at least based on) an estimated projected future mean pixel value μx,y,t, in turn determined based on historic pixel values ix,y,t for a sampled set of pixels px,y in said sequence of image frames It.
In embodiments that will be described in more detail in the following, the predicted pixel value μ̂x,y,t is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize the expression
$$\sum_{j,k}\left(i_{j,k,t} - \left(\alpha\,\mu_{j,k,t} + \beta\right)\right)^2 \qquad (16)$$
where μj,k,t is said estimated projected future mean pixel value for the pixel pj,k in question, and where j and k are iterated over a test set of pixels px,y in the image frame It. The determination of α and β can take place in any per se conventional manner, which is well within the reach of the skilled person. As is the case for the above-described noise model, in some embodiments, μj,k,t can be an estimated historic mean with respect to pixel values ix,y,t for the pixel pj,k in question.
The above-described pure variance based noise model has proven to give good results in a wide range of environments. However, if the light conditions in the image change too quickly, the noise map will be flooded with outliers at first. In the image frames It that follow upon such changed light conditions, the standard deviation estimate will be inflated, which instead leads to some degree of blindness until the noise model stabilizes again. The suitability of different variants of the presently described method can also vary depending on the camera 110 hardware used. For instance, exposure and gain can be more or less coarse for different types of cameras, and aperture changes can be performed more or less quickly.
It is then proposed to estimate a linear mapping between the average intensity value in the noise model and the pixel intensities in the new frame. That is, find values for variables α and β that minimize (16).
In (16), j may represent a sample or test set of pixels px,y, such as a set of pixels px,y at geometrically evenly distributed pixel positions in the image frame It.
To be clear, when establishing the coefficients α and β, pixels px,y from different positions in the same image frame It are considered, and such pixels px,y are compared with their corresponding positions in the noise model data.
In some embodiments, said test set of pixels px,y can contain at least 0.1%, such as at least 1%, such as at least 10%, of the total set of pixels px,y in the image It. In some embodiments, said test set of pixels px,y can contain at most 80%, such as at most 50%, such as at most 25%, such as at most 10%, of the total set of pixels px,y in the image It. The test set of pixels px,y can be geometrically evenly distributed across the total set of pixels px,y in the image It. For instance, the set can form a uniform sparse pattern extending across the entire image It, or extending across at least 50% of the image It; or the set can form a sparsely but evenly distributed set of vertical and/or horizontal full or broken lines distributed across the entire image It, or at least 50% of the image It. In some embodiments, pixels that are overexposed are not included in the test set. This can be determined by comparing the pixel values to a known threshold value, often provided by the sensor manufacturer. If it is not known, the threshold value can easily be established experimentally.
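For illustration, a least-squares fit of α and β over such a sparse test set could be sketched as follows (the stride, the saturation threshold and all names are assumptions):

```python
import numpy as np

def fit_exposure_mapping(frame, mu, stride=16, saturation=4000):
    """Fit frame ~= alpha * mu + beta over an evenly spaced test set,
    excluding overexposed pixels from the fit."""
    x = mu[::stride, ::stride].ravel().astype(np.float64)
    y = frame[::stride, ::stride].ravel().astype(np.float64)
    keep = y < saturation                    # drop saturated samples
    A = np.stack([x[keep], np.ones(int(keep.sum()))], axis=1)
    (alpha, beta), *_ = np.linalg.lstsq(A, y[keep], rcond=None)
    return alpha, beta
```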
Next, the equation for checking the limits (that is, (6) above or an equivalent formulation of this equation), is updated according to the following:
$$\left(i_{x,y,t} - \hat{\mu}_{x,y,t}\right)^2 > Z^2\,\sigma_{x,y,t}^2 \qquad (17)$$
where the prediction is the linearly transformed rolling average:
$$\hat{\mu}_{x,y,t} = \alpha\,\mu_{x,y,t} + \beta \qquad (18)$$
Also, since the variance σ²x,y,t changes over time, the variance estimate needs to be updated as well. It is unfortunately not feasible to use the value from (4), since it will be inflated by the exposure change that is already compensated for by using μ̂x,y,t as explained above. Instead, it is updated by weighing in the current squared deviation:
$$\sigma_{x,y,t+1}^2 = (1 - f)\,\sigma_{x,y,t}^2 + f\left(i_{x,y,t} - \hat{\mu}_{x,y,t}\right)^2 \qquad (19)$$
where f (0 < f < 1) is the factor that decides how much weight should be given to this deviation compared to the existing value. The higher f, the faster the noise model will adapt to fluctuations.
These combined, together with observing that if we apply a scaling factor α to the input, the variance can be scaled appropriately, yield the updated check:
$$\left(i_{x,y,t} - \hat{\mu}_{x,y,t}\right)^2 > Z^2\,\alpha^2\,\sigma_{x,y,t}^2 \qquad (20)$$
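Taken together, a per-frame check during an exposure change could be sketched as below (a sketch under the equations as reconstructed above; the value of f and all names are assumptions):

```python
import numpy as np

def check_with_mapping(frame, mu, var, alpha, beta,
                       z_squared=12, f=1.0 / 16):
    """Limit check (20) plus variance update (19)."""
    mu_hat = alpha * mu + beta                        # (17)-(18)
    dev_sq = (frame.astype(np.float64) - mu_hat) ** 2
    outside = dev_sq > z_squared * alpha**2 * var     # (20)
    var_next = (1.0 - f) * var + f * dev_sq           # (19)
    return outside, var_next
```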
This variant of the noise model requires σx,y,t to be stored in an array, typically with one single-precision float (32 bits) per pixel px,y. As comparison, the pure variance noise model stores (indirectly, by storing S and Q that allow for calculation of σx,y,t as described above) the estimated variance σx,y,t for each pixel px,y when run. When this linear mapping model is used, σx,y,t is updated using (19). Since the definition is recursive, the variance of this pixel in the previous frame will either be calculated from S and Q or from the previous iteration's calculation of (19) for this pixel. The sums Sx,y,t and square-sums Qx,y,t still need to be updated for every image frame It, in order to have the numbers available as soon as the step effect has passed. To illustrate the case when the predicted pixel value is determined as μ̂x,y,t = α·μx,y,t + β, an example will now be provided as illustrated in Figure 8. In the chart shown therein, the Y axis shows pixel intensity ix,y,t for one particular pixel px,y in a sequence of consecutive digital image frames It. The X axis depicts frame numbers. The window size n = 32, which means that during the first 32 frames, the model is still being initialized. Once 32 frames have been processed, the model contains sufficient information to make predictions of expected mean μx,y,t and variance σ²x,y,t. The line AVG shows the rolling average of the last 32 frames, which is the predictor determined according to (5).
As can be seen in the graph, the true signal value fluctuates around 2021 from the start until frame #60, where there is a sudden change in exposure time. The exposure times used can be provided as a part of the frame metadata. If the exposure time in the new frame differs significantly from the exposure times of the recent frames, the method described in connection with (17)-(19) should be used, since the levels have shifted and the model will be contaminated while this is happening.
As can also be seen in the graph, it takes 32 frames for μx,y,t to fully stabilize on the new level. Until that point is reached, μx,y,t is not a particularly good predictor, since it is lagging behind. In order to compensate for this, the rolling average goes through a linear transformation according to (18). This outcome is shown as "Adj AVG" in the graph. It can clearly be seen that this corresponds much better to the pixel values.
Similarly, as can be seen in the following table (corresponding to the graph in Figure 8), the variance σ²x,y,t is somewhere around 250 before the exposure change, whereas it gets inflated all the way up to 4200 while the model is adapting. This is why the variance update method according to (19) is put into use. When processing frame 60, the method first transforms the average value μx,y,t to μ̂x,y,t using the linear mapping. It calculates the new pixel value's ix,y,t deviation from μ̂x,y,t and decides whether it is outside the limits, according to (20). If this is the first frame where the exposure change was noticed, the variance σ²x,y,t−1 of the previous frame is used. This is initially based on S and Q (as determined according to the above), but is useful since S and Q still only include pixel values ix,y,t from before the exposure change. Finally, the σ²x,y,t+1 to be used for the next frame is calculated according to (19). Below, # = Frame number; PV = Pixel Value; AVG = rolling average (AVG in Figure 8); AAVG = adjusted average (Adj AVG in Figure 8).
[Table corresponding to the graph in Figure 8: frame number (#), Pixel Value (PV), rolling average (AVG) and adjusted average (AAVG) for frames around the exposure change at frame #60.]
S and Q can continue to be updated as above, and can be used in order for the model to stabilize on the new level. Once the point is reached where α ≈ 1 and β ≈ 0, the average and variance are considered to be stable again, and the model can go back to the usual way of calculating the variance.
Once the information about blob-allocated pixel values ix,y,t has been updated (and the noise map has also been updated), blobs are generated in a subsequent step S6, based on the blob-allocated pixel values ix,y,t.
Blob generation is the process of iterating over the individual pixels px,y in a generated noise map, filtering out false positives and forming blobs from connected clusters of outliers. While it is important that the noise map generation is efficient, more computation per pixel px,y can be afforded in the blob generation, as long as it is known that the pixel value in question ix,y,t indeed overstepped the threshold in the noise map generation. Whereas setting the limits based on mean and sample standard deviation of the recent pixel values ix,y,t works well in most cases, one notable problematic issue arises when parts of the image It become overexposed. In this case, the signal value tends to be saturated on some value close to the upper limits of the range, and since the affected pixel values ix,y,t as a result stop fluctuating over time, the standard deviation also becomes zero, which in turn means that even the slightest change would lead to blobs being generated.
To address this issue, one can add an additional minimum required deviation, in a step S5, used in the blob generation step as an anti-saturation filter:

|ix,y,t − μ̂x,y,t| > q·√(μ̂x,y,t) (21)

where μ̂x,y,t is the noise model's prediction for the pixel value ix,y,t. If the deviation is less than this, the pixel value ix,y,t is discarded as a non-blob pixel despite it overstepping the initial limits set up by the noise model.
Since square roots are computationally expensive, it is better to use:

(ix,y,t − μ̂x,y,t)² > q²·μ̂x,y,t (22)
q is > 0 but ≪ 1, implying that q² is even smaller. Define:

B = 1/q² (23)
B is a positive number that controls the filtering limit. Since any number for B that gives the appropriate filtering effect can be selected, one can decide to pick an integer value. In some embodiments, B is at least 10, such as at least 50, such as at least 100. In some embodiments, B is at the most 10000, such as at the most 1000.
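As a purely illustrative numeric choice (the source does not fix a specific value): picking q = 1/16 = 0.0625 gives, via (23), B = 1/q² = 256, an integer within the disclosed range of 10 to 10000; being a power of two, it also keeps the multiplication in the condition below cheap on most hardware.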
Then, the condition can be rewritten as:

B·(ix,y,t − μ̂x,y,t)² > μ̂x,y,t (24)
Since the noise map and the noise model were updated in the same step, the noise model that the currently considered noise map was based on is already lost when arriving at the blob generation step; the noise model data is overwritten in the computer memory on each iteration of the method. However, since the predicted pixel value μ̂x,y,t was saved (with 15 bits of precision in the above example) in the noise map itself, this value can be used instead when calculating (24). If the other terms are appropriately scaled (using fixed-point arithmetic), (24) can also be calculated using integer math only.
After a possible such anti-saturation filtering step, pixel values ix,y,t overstepping the noise model limits (as described above across expressions (1)-(24)) are grouped together into multi-pixel blobs. This can be done using the per se well-known Hoshen-Kopelman algorithm, which is a raster-scan method to form such pixel groups that runs in linear time. During the first pass, it runs through all pixel values ix,y,t. If a pixel value ix,y,t oversteps a limit and has a neighboring pixel value ix±1,y±1,t that belongs to a blob, it will be added to that same blob. If it has multiple neighboring blob-classified pixel values ix±1,y±1,t, these will be joined into one single blob, and the pixel value ix,y,t is added to the group. Finally, if there are no neighboring blobs, the pixel value ix,y,t will be registered as a new blob. For each blob, the metrics in the table below can be aggregated. This provides different options for estimating the center of the blob. One possibility is to use the absolute modulus of the noise model deviations:
x̄₁ = Σp |ip − μ̂p|·xp / Σp |ip − μ̂p|,   ȳ₁ = Σp |ip − μ̂p|·yp / Σp |ip − μ̂p|
and another option is to weight the coordinates by their squared deviations:
x̄₂ = Σp [ip − μ̂p]²·xp / Σp [ip − μ̂p]²,   ȳ₂ = Σp [ip − μ̂p]²·yp / Σp [ip − μ̂p]²
Short | Name | Type | Description
Xw1 | weightedXSum | uint32 | Σp |ip − μ̂p|·xp : X coordinates of blob weighted by the deviation from the noise model.
Yw1 | weightedYSum | uint32 | Σp |ip − μ̂p|·yp : Y coordinates of blob weighted by the deviation from the noise model.
W1 | weightSum | uint32 | Σp |ip − μ̂p| : sum of all the absolute deviations from the noise model.
Xw2 | sqWeightedXSum | uint32 | Σp [ip − μ̂p]²·xp : X coordinates of blob weighted by the squared deviation from the noise model.
Yw2 | sqWeightedYSum | uint32 | Σp [ip − μ̂p]²·yp : Y coordinates of blob weighted by the squared deviation from the noise model.
W2 | sqWeightSum | uint32 | Σp [ip − μ̂p]² : sum of all the squared deviations from the noise model.
Experimental data so far indicates that using (x̄₁, ȳ₁) when the blob is small (blob size in pixels not larger than 16), using the squared-weighted (x̄₂, ȳ₂) option for larger blobs (number of pixels in blob > 32), and interpolating between these for medium-sized blobs achieves a good stereo matching. Figure 7 illustrates an exemplifying clustering of four different detected blobs 1-4 based on individual pixel values ix,y,t found to fulfill the criteria for being considered as part of blobs at time t.
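The following Python sketch combines a union-find based labeling pass in the spirit of the Hoshen-Kopelman algorithm with the two weighted-centroid options and the size-based blending described above. The blend thresholds 16 and 32 pixels come from the text; the linear interpolation between the two centroids, and all names and data layouts, are illustrative assumptions rather than the patented implementation.

    import numpy as np

    def find(parent, a):
        # Path-halving find for the union-find structure.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def label_blobs(mask):
        # Raster-scan labeling: each outlier pixel is joined with its
        # already-visited neighbours (west, north-west, north, north-east).
        h, w = mask.shape
        labels = np.zeros((h, w), dtype=np.int32)
        parent = [0]  # index 0 unused; blob labels start at 1
        for y in range(h):
            for x in range(w):
                if not mask[y, x]:
                    continue
                roots = []
                for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx]:
                        roots.append(find(parent, labels[ny, nx]))
                if not roots:
                    parent.append(len(parent))   # register a new blob
                    labels[y, x] = len(parent) - 1
                else:
                    root = min(roots)
                    labels[y, x] = root
                    for r in roots:              # join all touching blobs
                        parent[r] = root
        for y in range(h):                       # second pass: flatten labels
            for x in range(w):
                if labels[y, x]:
                    labels[y, x] = find(parent, labels[y, x])
        return labels

    def blob_center(xs, ys, dev):
        # dev holds the per-pixel deviations i_p - mu_hat_p for one blob.
        w1 = np.abs(dev).astype(np.float64)          # absolute-deviation weights
        w2 = dev.astype(np.float64) ** 2             # squared-deviation weights
        c1 = (np.sum(w1 * xs) / np.sum(w1), np.sum(w1 * ys) / np.sum(w1))
        c2 = (np.sum(w2 * xs) / np.sum(w2), np.sum(w2 * ys) / np.sum(w2))
        n = xs.size
        if n <= 16:
            return c1                                # small blobs: absolute weighting
        if n > 32:
            return c2                                # large blobs: squared weighting
        t = (n - 16) / 16.0                          # medium blobs: linear blend
        return (c1[0] + t * (c2[0] - c1[0]), c1[1] + t * (c2[1] - c1[1]))

Note that the description leaves the interpolation scheme for medium-sized blobs open; the linear blend above is just one workable choice.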
In a subsequent method step S7, performed by the target object tracker 140, detected blobs are correlated across said time-ordered series of digital images lt to determine paths of moving objects through said space. Such correlation can, for instance, use linear interpolation and/or implied Newtonian laws of motion as a filtering mechanism, so as to purge blobs not moving in ways that are plausible given a reasonable model of the types of objects being tracked.
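As one hedged illustration of such a plausibility filter (the concrete criterion is not spelled out in the source), three time-ordered blob centers can be tested against near-uniform motion between consecutive frames, rejecting candidate correspondences whose residual is too large; the tolerance value and function name are assumptions.

    import numpy as np

    def plausible_track(p0, p1, p2, tol_px=3.0):
        # p0, p1, p2: 2D blob centers from three consecutive frames.
        # Linearly extrapolate the motion p0 -> p1 one frame onward; for
        # physically plausible objects the residual at p2 (which absorbs
        # the per-frame acceleration term) stays small.
        p0, p1, p2 = (np.asarray(p) for p in (p0, p1, p2))
        predicted = p1 + (p1 - p0)
        return np.linalg.norm(p2 - predicted) <= tol_px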
In case several cameras 110 are used, or in case one or several cameras 110 are used together with another type of target object 120 sensor, tracking information from the available cameras 110 and any other sensors can be combined to determine one or several 3-dimensional target object 120 tracks through the space 111. This can, for instance, take place using stereoscopic techniques that are well-known in themselves.
In a subsequent step S8, one or several determined 2D and/or 3D target object 120 tracks can be output to an external system, and/or graphically displayed on a display of a track-monitoring device. For instance, such displayed information can be used by a golfer using the system 100 to gain knowledge of the properties of a newly hit golf shot.
In concrete examples, the user (such as a golfer) may be presented with a visual 2D or 3D representation, on a computer display screen, of the track of a golf ball just hit, as detected using the method and system described above, against a graphical representation of a virtual golf practice range or similar. This will provide feedback to the golfer that can be used to make decisions regarding various parts of the golf swing. The track may also be part of a virtual experience, in which a golfer may for instance play a virtual golf hole, with the detected and displayed track represented as a golf shot in said virtual experience. It is specifically noted that the amount of data necessary to process for achieving such tracks is substantial. For instance, at an updating frequency of 100 images per second and using a 10 Mpixel camera, 1 billion pixel values per second need to be processed and assessed with respect to blob status. This analysis may take place in a depicted space 111 that can include trees and other fine-granular objects displaying rapidly shifting light conditions, as well as rapidly shifting general light conditions due to clouds, and so forth. Using the systems and techniques described herein, it is possible to process the data in essentially real time, e.g., such that the track can be determined and output while the object is still in the air.
In a subsequent step S9, the method ends.
As mentioned above, the invention also relates to the system 100 as such, comprising the digital camera 110, the digital image analyzer 130 and the moving object tracker 140.
The digital camera 110 is then arranged to depict the space 111 to produce the series of digital images lt as described above. The digital image analyzer 130 is configured to determine said inequality for the pixel values ix,y,t as described above, and to store in the computer memory information indicating that one or several pixel values ix,y,t are part of a detected blob. The moving object tracker 140 is configured to correlate detected blobs across said series of digital images lt as described above.
As also mentioned, the invention also relates to the computer software product as such. The computer software product is then configured to, when executing on suitable hardware as described above, embody the digital image analyzer 130 and the moving object tracker 140. As such, it is configured to receive a series of digital images lt from the digital camera 110, and to perform the above-described method steps performed by the digital image analyzer 130 and the moving object tracker 140. For instance, the digital frames lt can be provided as a continuous or semi-continuous stream of frames from the digital camera 110 (and a set of n most recent considered frames can be analyzed for each frame or set of frames received), or the entire set of N images can be received as one big batch and analyzed thereafter. The computer software product can execute on a computer belonging to the system 100, and can as such constitute part of the system 100.
Above, a number of embodiments have been described. However, it is apparent to the skilled person that many modifications can be made to the disclosed embodiments without departing from the basic idea of the invention.
For instance, many additional data processing, filtering, transformation and similar steps can be taken, in addition to the ones described herein.
The generated blob data can be used in various ways in addition to the object tracking.
In general, everything which is said in relation to the method is equally applicable to the system and to the computer software product, and vice versa.
Hence, the invention is not limited to the described embodiments, but can be varied within the scope of the enclosed claims.


C L A I M S
1. A method for tracking moving objects, the method comprising: obtaining, from a digital camera, a series of digital images (lt) at consecutive times (t), the series of digital images (lt) representing optical input from a three-dimensional space within a field of view of the digital camera, the digital camera being arranged to produce said series of digital images (lt) having a corresponding set of pixels, said series of digital images comprising corresponding pixel values, the digital camera not moving in relation to said three-dimensional space during production of said series of digital images (lt); for two or more of said pixel values, determining an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between a pixel value (ix,y,t), of a pixel (px,y) in question, and a predicted pixel value (μ̂x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σx,y,t) with respect to historic pixel values for the pixel (px,y) in question, wherein the predicted pixel value (μ̂x,y,t) is calculated based on the historic pixel values for the pixel (px,y) in question, the inequality being determined as (ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t for individual ones of said pixel values, and, for a number n, Sx,y,t = Σ(k=0..n−1) ix,y,t−k and Qx,y,t = Σ(k=0..n−1) (ix,y,t−k)², and, for individual ones of said pixels, Sx,y,t and Qx,y,t are stored in a computer memory; for pixel values for which said first value is higher than said second value, storing in the computer memory information indicating that the pixel value (ix,y,t) is part of a detected blob; and correlating, based on the information stored in the computer memory, detected blobs across said series of digital images (lt) to determine paths of moving objects through said three-dimensional space.
2. The method according to claim 1, wherein said inequality is (ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t, where μ̂x,y,t is said predicted pixel value and where σx,y,t is an estimated standard deviation with respect to the historic pixel values (ix,y,{t−n,t−1}) for the pixel (px,y) in question.
3. The method according to claim 1, wherein Z is a number selected such that Z² is an integer such that 10 < Z² < 20.
4. The method according to claim 1, wherein Sx,y,t, Qx,y,t, or both, are calculated recursively, whereby a calculated value for the pixel value (ix,y,t) is calculated using a previously stored calculated value of Sx,y,t, Qx,y,t, or both, for the pixel (px,y) but at an immediately preceding time (t-1).
5. The method according to claim 4, wherein Sx,y,t is calculated as Sx,y,t = Sx,y,t−1 + ix,y,t − ix,y,t−n, and Qx,y,t is calculated as Qx,y,t = Qx,y,t−1 + (ix,y,t)² − (ix,y,t−n)².
6. The method according to any of claims 1-5, wherein the method comprises: storing in said computer memory Sx,y,t and Qx,y,t in combination as a single datatype comprising 12 bytes or less per pixel (px,y).
7. The method according to any of claims 1-5, wherein the method comprises: storing in said computer memory, for a particular digital image, a pixmap comprising, for each pixel, said information indicating that the pixel value (ix,y,t) is part of a detected blob.
8. The method according to claim 7, wherein said information indicating that the pixel value (ix,y,t) is part of a detected blob is indicated in a single bit for each pixel (px,y).
9. The method according to claim 7, wherein said pixmap comprises, for each pixel, a value indicating an expected pixel value for that pixel.
10. The method according to claim 9, wherein said value indicating an expected pixel value for the pixel is achieved by storing a predicted pixel value as a fixed-point fractional number, using a total of 15 bits for the integer and fractional parts.
11. The method according to any of claims 1-5, wherein the predicted pixel value (μ̂x,y,t), the estimated variance or standard deviation (σx,y,t), or both, is or are calculated based on a set of n historic pixel values (ix,y,{t−n,t−1}) for the pixel (px,y) in question, where 10 < n < 300.
12. The method according to any of claims 1-5, wherein a number n of previous images considered for the estimation of the estimated variance or standard deviation (σx,y,t) of the second value is selected to be a power of 2.
13. The method according to any of claims 1-5, wherein said pixel values have a depth across one or several channels of between 8 and 48 bits.
14. The method according to any of claims 1-5, wherein the predicted pixel value (μ̂x,y,t) is determined based on an estimated projected future mean pixel value (μx,y,t), in turn determined based on a numerical relationship between historic pixel values and current pixel values for a sampled set of pixels (px,y) in said series of digital images (lt).
15. The method according to claim 14, wherein the predicted pixel value (μ̂x,y,t) is determined as μ̂x,y,t = α·μx,y,t + β, where α and β are constants determined so as to minimize Σj,k [ij,k,t − (α·μj,k,t + β)]², where μj,k,t is said estimated projected future mean pixel value for a pixel (pj,k) in question, and where j and k are iterated over a test set of pixels.
16. The method according to claim 15, wherein μx,y,t is an estimated historic mean with respect to pixel values for the pixel (px,y) in question.
17. The method according to claim 15, wherein said test set of pixels contains between 1% and 25% of a total set of pixels in a given image.
18. The method according to claim 17, wherein said test set of pixels is geometrically evenly distributed across the total set of pixels in the given image.
19. The method according to claim 15, wherein the method comprises: determining the estimated standard deviation (σx,y,t) according to σx,y,t = √(Qx,y,t/n − (Sx,y,t/n)²).
20. The method according to claim 15, wherein the method comprises: determining that at least one is true of α being further away from 1 than a first threshold value and β being further away from 0 than a second threshold value; and determining the predicted pixel value (μ̂x,y,t) as μ̂x,y,t = α·μx,y,t + β until it is determined that α is no longer further away from 1 than the first threshold value and β is no longer further away from 0 than the second threshold value.
21. The method according to claim 1, wherein the method comprises: for said pixel values for which said first value is higher than said second value, only storing said information indicating that the pixel value (ix,y,t) is part of a detected blob in case also the following inequality holds: B·[ix,y,t − μ̂x,y,t]² > μ̂x,y,t, where ix,y,t is the pixel value in question, where μ̂x,y,t is the predicted pixel value and where B is an integer such that B > 100.
22. The method according to any of claims 1-5 or 21, wherein the method comprises: using a Hoshen-Kopelman algorithm to group together individual adjacent pixels determined to be part of a same blob.
23. The method according to any of claims 1-5 or 21, wherein the moving objects are golf balls.
24. System for tracking moving objects, the system comprising: a digital camera arranged to represent optical input from a three-dimensional space within a field of view of the digital camera to produce a series of digital images (lt) at consecutive times (t), the digital camera (110) being arranged to produce said series of digital images (lt) having a corresponding set of pixels, said series of digital images comprising corresponding pixel values, the digital camera being arranged to not move in relation to said three-dimensional space during production of said series of digital images (lt); a computer having an associated computer memory, the computer being configured to run a digital image analyzer; the digital image analyzer being configured to, for two or more of said pixel values, determine an inequality comparing a first value to a second value, the first value being calculated as, or based on, the square of the difference between a pixel value (ix,y,t), of a pixel (px,y) in question, and a predicted pixel value (μ̂x,y,t), the second value being calculated as, or based on, a product of, firstly, the square of a number Z and, secondly, an estimated variance or standard deviation (σx,y,t) with respect to historic pixel values for the pixel (px,y) in question, wherein the predicted pixel value (μ̂x,y,t) is calculated based on the historic pixel values for the pixel (px,y) in question, the inequality being determined as (ix,y,t − μ̂x,y,t)² > Z²·σ²x,y,t for individual ones of said pixel values, and, for a number n, Sx,y,t = Σ(k=0..n−1) ix,y,t−k and Qx,y,t = Σ(k=0..n−1) (ix,y,t−k)², and, for individual ones of said pixels, Sx,y,t and Qx,y,t are stored in the computer memory; the digital image analyzer being configured to, for pixel values for which said first value is higher than said second value, store in the computer memory information indicating that the pixel value (ix,y,t) is part of a detected blob; and a moving object tracker being configured to correlate, based on the information stored in the computer memory, detected blobs across said series of digital images (lt) to determine paths of moving objects through said three-dimensional space.
25. The system of claim 24, wherein the digital image analyzer is configured to perform operations in accordance with any of claims 2-23.
26. A non-transitory computer-readable medium encoding a computer program product configured to perform operations in accordance with any of claims 1-23.
PCT/EP2023/077799 2022-10-17 2023-10-06 Method and system for optically tracking moving objects WO2024083537A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE2230331A SE2230331A1 (en) 2022-10-17 2022-10-17 Method and system for optically tracking moving objects
SE2230331-7 2022-10-17

Publications (1)

Publication Number Publication Date
WO2024083537A1 true WO2024083537A1 (en) 2024-04-25

Family

ID=88372203

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/077799 WO2024083537A1 (en) 2022-10-17 2023-10-06 Method and system for optically tracking moving objects

Country Status (2)

Country Link
SE (1) SE2230331A1 (en)
WO (1) WO2024083537A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
US20220051420A1 (en) 2020-08-14 2022-02-17 Topgolf Sweden Ab Motion Based Pre-Processing of Two-Dimensional Image Data Prior to Three-Dimensional Object Tracking With Virtual Time Synchronization

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4144377B2 (en) * 2003-02-28 2008-09-03 ソニー株式会社 Image processing apparatus and method, recording medium, and program
US7940961B2 (en) * 2007-12-20 2011-05-10 The United States Of America As Represented By The Secretary Of The Navy Method for enhancing ground-based detection of a moving object
US11138442B2 (en) * 2015-06-01 2021-10-05 Placemeter, Inc. Robust, adaptive and efficient object detection, classification and tracking
US20160379074A1 (en) * 2015-06-25 2016-12-29 Appropolis Inc. System and a method for tracking mobile objects using cameras and tag devices
CA2934102A1 (en) * 2015-06-25 2016-12-25 Appropolis Inc. A system and a method for tracking mobile objects using cameras and tag devices
WO2018063914A1 (en) * 2016-09-29 2018-04-05 Animantis, Llc Methods and apparatus for assessing immune system activity and therapeutic efficacy
US20180144476A1 (en) * 2016-11-23 2018-05-24 Qualcomm Incorporated Cascaded-time-scale background modeling
US10803598B2 (en) * 2017-06-21 2020-10-13 Pankaj Chaurasia Ball detection and tracking device, system and method
US11004209B2 (en) * 2017-10-26 2021-05-11 Qualcomm Incorporated Methods and systems for applying complex object detection in a video analytics system


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BOYER M ET AL: "Accelerating leukocyte tracking using CUDA: A case study in leveraging manycore coprocessors", PARALLEL&DISTRIBUTED PROCESSING, 2009. IPDPS 2009. IEEE INTERNATIONAL SYMPOSIUM ON, IEEE, PISCATAWAY, NJ, USA, 23 May 2009 (2009-05-23), pages 1 - 12, XP031487429, ISBN: 978-1-4244-3751-1 *
PICCARDI M.: "Background subtraction techniques: a review", 2004 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (IEEE CAT. NO.04CH37583), 1 January 2004 (2004-01-01), pages 3099 - 3104, XP093122204, ISBN: 978-0-7803-8567-2, DOI: 10.1109/ICSMC.2004.1400815 *
TADESSE MISIKER ET AL: "High performance automatic target recognition", AFRICON 2015, IEEE, 14 September 2015 (2015-09-14), pages 1 - 5, XP032813578, DOI: 10.1109/AFRCON.2015.7331961 *

Also Published As

Publication number Publication date
SE2230331A1 (en) 2024-04-18
