US11023761B2 - Accurate ROI extraction aided by object tracking - Google Patents

Accurate ROI extraction aided by object tracking

Info

Publication number
US11023761B2
US11023761B2 (application US16/161,412 / US201816161412A)
Authority
US
United States
Prior art keywords
region
frames
interest
locations
merged
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
US16/161,412
Other versions
US20190138833A1 (en)
Inventor
Weihua Xiong
Guangbin Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eaglesens Systems Corp
Original Assignee
Eaglesens Systems Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eaglesens Systems Corp filed Critical Eaglesens Systems Corp
Priority to US16/161,412
Assigned to EagleSens Systems Corporation. Assignment of assignors' interest (see document for details). Assignors: XIONG, WEIHUA; ZHANG, GUANGBIN
Publication of US20190138833A1
Application granted
Publication of US11023761B2
Legal status: Active

Classifications

    • G06K9/3233
    • G06K9/00771
    • G06K9/342
    • G06K9/6298
    • G06K2009/3291
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/187: Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G06T2207/10016: Video; Image sequence
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30196: Human being; Person
    • G06T2207/30232: Surveillance
    • G06T2207/30241: Trajectory
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V10/72: Data preparation, e.g. statistical preprocessing of image or video features
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • This disclosure generally relates to image data processing methods based on artificial intelligence and image sensors configured to perform the same.
  • A region of interest (ROI) is a subset of image pixels identified for a particular purpose. This concept is commonly used in image and vision related applications. Normally, several objects, and their locations in the image, are needed from a single scene. For example, in surveillance systems, the system typically concentrates on several specific subjects, such as vehicle license plates, faces, etc., at the same time.
  • one aspect disclosed features an image data processing method comprising: receiving frame image data of N frames, where N>1; detecting a region of interest in one of the N frames; tracking locations of the region of interest in at least one of the N frames; and providing a merged location of the region of interest based on the locations of the region of interest in the N frames.
  • Embodiments of the method may include one or more of the following features. Some embodiments comprise providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1; providing respective statistical data for each of the T merged locations; and providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations.
  • the statistical data for each of the T merged locations comprises at least one of: a number of the frames in which the region of interest appeared; and a percentage of the frames in which the region of interest appeared.
  • Some embodiments comprise receiving a previously-detected location for the region of interest; and providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations. Some embodiments comprise receiving trained parameters of feature descriptions; detecting at least one potential ROI in a first frame in the sequence; and detecting the region of interest based on the at least one detected potential ROI. Some embodiments comprise tracking the locations of the region of interest based on the at least one detected potential ROI.
  • an image sensor comprising: an image input unit configured to receive, from the image sensor, frame image data of N frames, wherein N>1; a detect unit configured to detect a region of interest in one of the N frames; a track unit configured to track locations of the region of interest in at least one of the N frames; and an analysis unit configured to provide a merged location of the region of interest based on the locations of the region of interest in the N frames.
  • Embodiments of the image sensor may include one or more of the following features.
  • the analysis unit is further configured to: provide T of the merged locations of the region of interest for T respective groups of N frames, wherein T>1; provide respective statistical data for each of the T merged locations; and provide a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations.
  • the statistical data for each of the T merged locations comprises at least one of: a number of the frames in which the region of interest appeared; and a percentage of the frames in which the region of interest appeared.
  • Some embodiments comprise an ROI input unit configured to receive a previously-detected location for the region of interest; wherein the analysis unit is further configured to provide the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations.
  • Some embodiments comprise receiving trained parameters of feature descriptions; wherein the detect unit is further configured to detect at least one potential ROI in a first frame in the sequence based on the trained parameters of feature descriptions; and wherein the detect unit is further configured to detect the region of interest based on the at least one detected potential ROI.
  • the track unit is further configured to: track the locations of the region of interest based on the at least one detected potential ROI.
  • one aspect disclosed features a non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor, the machine-readable storage medium comprising instructions to cause the hardware processor to perform an image data processing method, the method comprising: receiving frame image data of N frames, where N>1; detecting a region of interest in one of the N frames; tracking locations of the region of interest in at least one of the N frames; and providing a merged location of the region of interest based on the locations of the region of interest in the N frames.
  • Embodiments of the medium may include one or more of the following features.
  • the method includes providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1; providing respective statistical data for each of the T merged locations; and providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations.
  • the statistical data for each of the T merged locations comprises at least one of: a number of the frames in which the region of interest appeared; and a percentage of the frames in which the region of interest appeared.
  • the method includes receiving a previously-detected location for the region of interest; and providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations.
  • the method includes receiving trained parameters of feature descriptions; detecting at least one potential ROI in a first frame in the sequence; and detecting the region of interest based on the at least one detected potential ROI.
  • the method includes tracking the locations of the region of interest based on the at least one detected potential ROI.
  • FIG. 1 illustrates an exemplary system for communicating video frame image data captured by an artificial intelligence (AI) based image sensor according to various embodiments.
  • FIG. 2 shows a conventional process for ROI detection from a single image.
  • FIG. 3 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments.
  • FIG. 4 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments.
  • FIG. 5 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of N frames according to various embodiments.
  • FIG. 6 presents an example of face ROI extraction without tracking for a sequence of frames.
  • FIG. 7 presents an example of face ROI extraction with tracking according to various embodiments.
  • FIG. 8 illustrates a flowchart of an exemplary image data processing method according to various embodiments.
  • FIG. 9 illustrates a block diagram of an exemplary computer system 900 to implement one or more functionalities of the AI-based image sensor according to various embodiments.
  • One or more of the various embodiments of the present disclosure is directed to detecting regions of interest (ROIs) for objects using multiple image frames.
  • this invention relates to determining reliable regions of interest in image sequences using vision processing.
  • the invention analyzes outputs from a sequence of frames, for example in a video mode, rather than from a single image, to determine whether an extracted ROI serves its required purpose, thereby reducing false detections, improving accuracy, and lowering transfer bandwidth.
  • This invention presents a novel method that considers ROI object tracking and detection in an integrated framework in order to reduce false results. With the help of object tracking, object detection becomes more stable.
  • the inspiration for the invention follows the human visual system: if one wants to identify an object in a scene, one may have some difficulty determining exactly what it is from a brief glance; however, if one can stare at it for a while, the identification of the object becomes much more accurate.
  • one or more ROIs are determined by employing an artificial intelligence (AI) based image recognition technique referred to herein as ‘accurate ROI extraction aided by object tracking.’
  • an image sensor configured to carry out an AI-based image recognition may be mounted on a post near a traffic signal, a pedestrian crossing for a roadway, or the like.
  • Image data obtained from the image sensor may be transmitted to a local system, and further to a cloud system, for further image processing.
  • the key contents of the image data may include valuable information, such as the identities of people, vehicles, and the like.
  • FIG. 1 illustrates an exemplary system 100 for communicating video frame image data captured by an artificial intelligence (AI) based image sensor according to various embodiments.
  • the system 100 includes an artificial intelligence (AI) based image sensor 102 , a local system 104 , and a cloud system 106 .
  • the AI-based image sensor 102 is configured to obtain original video frame image data from the real world and carry out AI-based image data processing.
  • the AI-based image sensor 102 is configured to obtain original video frame image data from the image sensor array, and to pre-process the obtained data to extract key information. Through this pre-processing, the AI-based image sensor 102 may reduce the original video frame data to a lower-bandwidth data stream that can be transferred through the first data link 108.
  • in extracting key information, the AI-based image sensor 102 is configured to determine which parts of the original video frame data contain key image data that needs to be kept, and which parts contain non-key image data that may be compressed to reduce the overall data bandwidth. More detail of the AI-based image data processing will be described below.
  • the AI-based image sensor 102 is formed as a chip on which an image sensor array is disposed.
  • the AI-based image sensor 102 also includes an image signal processor (ISP) on the chip to carry out the AI-based image data processing.
  • the AI-based image sensor 102 may be mounted on a post to capture surrounding images thereof.
  • the output image data from the AI-based image sensor 102 may be in either raw format or an ISP-processed format, such as YUV or Motion-JPEG.
  • the output image data from the AI-based image sensor 102 is transmitted through the first data link 108 to a local data processing unit 110 in the local system 104 .
  • the first data link 108 may be a wired link or a wireless link, and the wireless link may be any applicable wireless data communication link, such as WiFi, Bluetooth, ZigBee, etc.
  • the local system 104 represents a computing system disposed proximate to the AI-based image sensor 102 and configured to perform additional image data processing for various applicable purposes.
  • the local system 104 may be a computing system configured to perform an autonomous operation of operating roadway signals for pedestrians and vehicles based on output image data from the AI-based image sensor 102 .
  • the local data processing unit 110 is implemented as a field-programmable gate array (FPGA), a graphics processing unit (GPU), a tensor processing unit (TPU), a network processing unit (NPU), and/or a central processing unit (CPU).
  • the AI-based image sensor 102 may be manufactured using a mixed-signal silicon process, e.g., 90 nm mixed-signal process, which supports both digital MOSFET and analog MOSFET as sensor elements of the AI-based image sensor 102 .
  • the local data processing unit 110 may be manufactured using digital MOSFET.
  • a highly advanced silicon process, e.g., a 14 nm process, may be employed to achieve high performance. Therefore, in some embodiments, it may be preferable to dispose the ISP in the local system 104 rather than to use an on-chip ISP within the AI-based image sensor 102.
  • the local system 104 may also include an optional local storage device 112 for storing image data processed by the local data processing unit 110 .
  • the bandwidth of the first data link 108 and/or the processing power of the local data processing unit 110 is typically limited. As a result, the resolution and frame rate of the AI-based image sensor 102 that can be effectively utilized may be largely limited in many applications.
  • Output image data of the local system 104 is transmitted through a second data link 114 to the cloud system 106 .
  • the cloud system 106 represents a computing system disposed separately from the local system 104 and the AI-based image sensor 102 and configured to perform additional image data processing for various applicable purposes.
  • the cloud system 106 may be a server computing system configured to perform data analysis of operations by the local system 104 and/or image data obtained from the local system 104 .
  • the data analysis may include traffic analysis, monitoring of vehicles, humans, animals, etc.
  • the cloud system 106 includes a cloud data processing unit 116 and an optional cloud storage device 118 .
  • the cloud data processing unit 116 has more processing power than the local data processing unit 110, and the optional cloud storage device 118 has a larger storage capacity than the optional local storage device 112.
  • the bandwidth of the second data link 114 may be significantly limited in comparison to the processing power of the local data processing unit 110 .
  • FIG. 2 shows a conventional process for ROI detection from a single image.
  • a processing unit 200 acts as a training machine that determines fundamental features of an object of interest.
  • the input to the training machine is a large bundle of positive examples and negative examples, while the outputs 210 are trained parameters of feature descriptions that can differentiate the positive examples from negative ones.
  • the training machine may implement any machine learning strategy, including Support Vector Machine (SVM), Adaboost, Convolutional Neural Network (CNN), or others.
  • the output data 210 of processing unit 200 is passed through data link 110 to inference processing unit 220.
  • a block size is defined for image processing.
  • the inference processing unit 220 accepts a single image, applies the parameters of feature descriptions to the image on every block centered at each pixel, and predicts the ROI regions on predict unit 230 .
  • FIG. 3 illustrates an exemplary data flow 300 in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments.
  • the AI-based image data processing device includes an inference unit 310 , a track unit 320 , a detect unit 330 , an analysis unit 340 , and a vote unit 350 .
  • Each of the inference unit 310 , track unit 320 , detect unit 330 , analysis unit 340 , and vote unit 350 may be configured by a specifically configured circuitry and/or a software-based computer system such as described below with reference to FIG. 9 .
  • the inference unit 310 receives the trained parameters of feature descriptions 210. Instead of directly reporting ROI locations on each frame, the inference unit 310 initially detects potential objects on the first frame in the sequence.
  • the inference unit 310 may implement any ROI detector, including SVM, Adaboost, CNN, and others.
  • the track unit 320 tracks the locations of the detected potential objects.
  • the track unit 320 may implement any tracking method, including, for example, Block Correlation, Minimal Average Difference, Maximal Entropy, or others.
  • the detect unit 330 detects object ROIs in every Nth frame.
  • the detect unit 330 may implement any ROI detector.
  • the analysis unit 340 analyzes the detected and tracked ROI locations every Nth frame, and maintains statistical data for the ROIs.
  • the vote unit 350 disregards false ROI locations and reports correct ROI locations.
  • FIG. 4 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments.
  • the AI-based image data processing device includes an input unit 400 , a detect unit 410 , a detect and track unit 420 , and an analysis unit 430 .
  • Each of input unit 400 , detect unit 410 , detect and track unit 420 , and analysis unit 430 may be configured by a specifically configured circuitry and/or a software-based computer system such as described below with reference to FIG. 9 .
  • the input unit 400 receives a sequence of frames of image data, for example such as consecutive image frames generated in an image sensor operating in a video mode.
  • the input unit 400 assigns each frame a frame index, beginning with 0.
  • the detect unit 410 processes the first frame to detect potential ROI locations. These locations are fed into detect and track unit 420 for tracking the locations in the following frames. Besides tracking, detect and track unit 420 also detects ROIs at every Nth frame in the sequence, and merges the locations of the detected ROIs with the respective tracked locations.
  • FIG. 5 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of N frames according to various embodiments.
  • the AI-based image data processing device includes an image input unit 500 , a detect unit 510 , a track unit 520 , an analysis unit 530 , and a ROI input unit 540 .
  • Each of image input unit 500 , detect unit 510 , track unit 520 , analysis unit 530 , and ROI input unit 540 may be configured by a specifically configured circuitry and/or a software-based computer system such as described below with reference to FIG. 9 .
  • the image input unit 500 receives a sequence of N frames of image data, for example such as consecutive image frames generated in an image sensor operating in a video mode.
  • the image input unit 500 assigns each frame a frame index, beginning with 0.
  • the image input unit 500 provides the frames and frame indexes to both detect unit 510 and track unit 520 .
  • the ROI input unit 540 provides previous ROI locations to track unit 520 .
  • the previous ROI locations may include ROIs extracted from the first frame in the sequence, which may be provided by the inference unit 310 of FIG. 3.
  • the previous ROI locations may include ROIs extracted from the previous sequence of N frames, which may be provided by the analysis unit 340 of FIG. 3 , by the detect and track unit 420 of FIG. 4 , or by the analysis unit 530 of FIG. 5 .
  • the detect unit 510 detects object ROIs in every Nth frame.
  • the detect unit 510 may implement any ROI detector.
  • the track unit 520 tracks the locations of the ROIs in every frame in the sequence.
  • the track unit 520 may implement any tracking method.
  • the analysis unit 530 merges the detected ROI locations generated by the detect unit 510 , and the tracked ROI locations generated by the track unit 520 . Based on this data, the analysis unit 530 creates ROI locations for new objects, adjusts ROI locations for existing objects, and generates and updates statistical data for all objects.
  • FIG. 6 presents an example of conventional face ROI extraction without tracking for a sequence of frames.
  • the system applies an ROI detection method on Frame 0, and outputs the results, which include not only a true face detection, but also a false detection of an arm.
  • the system does not consider the interrelation among consecutive frames, but instead works on each frame independently. Therefore, the system generates many false detections, including more detections of the arm, resulting in wasted transfer bandwidth.
  • FIG. 7 presents an example of face ROI extraction with tracking according to various embodiments.
  • the interrelation between consecutive frames is taken into account, as described elsewhere in this disclosure.
  • tracked ROIs are shown in yellow boxes, and detected ROIs are shown in red boxes.
  • the system applies an ROI extraction method on initial Frame 0, and detects two ROIs.
  • One is a face ROI, and the other is an arm ROI.
  • the system does not provide the results immediately.
  • the locations of these two detected objects, the face and the arm, are tracked.
  • the system detects ROIs, not on every frame, but instead on every Nth frame, and updates statistical data about detection and tracking for each object.
  • N is set to be 3.
  • This procedure repeats T times, which is set to be 3 in this example.
  • the ROIs detected in Frame 0 are tracked.
  • the system performs ROI detection and tracking.
  • the face ROI is shown in the red box to indicate it has been detected.
  • the statistics are updated to show that the face ROI has been detected once, and the arm ROI has been detected once.
  • the ROIs detected previously are tracked.
  • the system performs ROI detection and tracking. If one object ROI appears often enough in T extractions, its ROI is labeled TRUE; otherwise, the ROI will be labeled FALSE.
  • the face ROI is detected 7 times, for a 78% detection rate.
  • the arm ROI is detected only 3 times, for a 33% detection rate.
  • a true detection threshold may be set at 5 times, or 56%. Using this threshold, the face ROI is labeled TRUE, while the arm ROI is labeled FALSE.
  • In Frame 9, only the face ROI is shown in a red box to indicate a true detection. So only one ROI is produced, with higher accuracy and less required bandwidth.
  • FIG. 8 illustrates a flowchart 800 of an exemplary image data processing method according to various embodiments.
  • the exemplary method may be implemented in various environments including, for example, the functional units of the AI-based image sensor illustrated in FIG. 1 .
  • the operations of the exemplary method presented below are intended to be illustrative. Depending on the implementation, the exemplary method may include additional, fewer, or alternative steps performed in various orders or in parallel.
  • this flowchart illustrates blocks (and potentially decision points) organized in a fashion that is conducive to understanding. It should be recognized, however, that the blocks can be reorganized for parallel execution, reordered, and modified (changed, removed, or augmented), where circumstances permit.
  • the flowchart 800 starts at block 802 , with receiving frame image data of N frames, where N>1.
  • the frame image data may be received from an image sensor.
  • the frames may be consecutive images in a video, for example such as a video produced by an image sensor in video mode.
  • the image input unit 500 of FIG. 5 receives the frames from an image sensor array of an AI-based image sensor.
  • the flowchart 800 continues to block 804 , with detecting a region of interest in one of the N frames.
  • the detect unit 510 of FIG. 5 detects the region of interest.
  • Some embodiments comprise receiving trained parameters of feature descriptions, detecting at least one potential ROI in a first frame in the sequence, and detecting the potential ROI in every Nth frame in the following sequence.
  • the flowchart 800 continues to block 806 , with tracking locations of the region of interest in at least one of the N frames.
  • the track unit 520 of FIG. 5 tracks the locations in every frame.
  • the flowchart 800 continues to block 808 , with providing a merged location of the region of interest based on the locations of the region of interest in the N frames.
  • the analysis unit 530 of FIG. 5 provides the merged location of the region of interest.
  • Some embodiments comprise receiving a previously-detected location for the region of interest, and providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations.
  • Some embodiments comprise tracking the locations of the region of interest based on the at least one detected potential ROI.
  • the flowchart 800 continues to block 810 , with providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1.
  • the analysis unit 530 of FIG. 5 provides the T merged locations of the region of interest for the T respective groups of N frames.
  • the flowchart 800 continues to block 812 , with providing respective statistical data for each of the T merged locations.
  • the analysis unit 530 of FIG. 5 provides the respective statistical data for each of the T merged locations.
  • the statistical data for each of the T merged locations includes a number of the frames in which the region of interest appeared, a percentage of the frames in which the region of interest appeared, or both, and may include other statistical measures as well.
  • the flowchart 800 continues to block 814 , with providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations.
  • the analysis unit 530 of FIG. 5 provides the final location of the region of interest.
  • FIG. 9 illustrates a block diagram of an exemplary computer system 900 to implement one or more functionalities of the AI-based image sensor according to various embodiments.
  • the system 900 may correspond to one or more of the functional units described above with reference to FIGS. 3-5, such as the detect unit 510, the track unit 520, and the analysis unit 530 illustrated in FIG. 5.
  • the computer system 900 includes a bus 902 or other communication mechanism for communicating information, and one or more hardware processors 904 coupled with bus 902 for processing information.
  • Hardware processor(s) 904 may be, for example, one or more general purpose microprocessors.
  • the computer system 900 also includes a main memory 906 , such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 902 for storing information and instructions to be executed by processor 904 .
  • Main memory 906 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904 .
  • Such instructions, when stored in storage media accessible to processor 904, render computer system 900 into a special-purpose machine that is customized to perform the operations specified in the instructions.
  • the computer system 900 further includes a read only memory (ROM) 908 or other static storage device coupled to bus 902 for storing static information and instructions for processor 904 .
  • a storage device 910 such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 902 for storing information and instructions.
  • the computer system 900 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 900 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 900 in response to processor(s) 904 executing one or more sequences of one or more instructions contained in main memory 906 . Such instructions may be read into main memory 906 from another storage medium, such as storage device 910 . Execution of the sequences of instructions contained in main memory 906 causes processor(s) 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
  • the main memory 906 , the ROM 908 , and/or the storage 910 may include non-transitory storage media.
  • The term ‘non-transitory media,’ and similar terms, as used herein, refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media.
  • Non-volatile media includes, for example, optical or magnetic disks, such as storage device 910 .
  • Volatile media includes dynamic memory, such as main memory 906 .
  • non-transitory media include, for example, a floppy disk, a flexible disk, hard disk, solid state drive, magnetic tape, or any other magnetic data storage medium, a CD-ROM, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
  • the computer system 900 also includes a communication interface 918 coupled to bus 902 .
  • Communication interface 918 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks.
  • communication interface 918 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line.
  • ISDN integrated services digital network
  • communication interface 918 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN).
  • LAN local area network
  • Wireless links may also be implemented.
  • communication interface 918 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • the computer system 900 can send messages and receive data, including program code, through the network(s), network link and communication interface 918 .
  • a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 918 .
  • the received code may be executed by processor 904 as it is received, and/or stored in storage device 910 , or other non-volatile storage for later execution.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

An image data processing method includes receiving frame image data of N frames, where N>1, detecting a region of interest in one of the N frames, tracking locations of the region of interest in at least one of the N frames, and providing a merged location of the region of interest based on the locations of the region of interest in the N frames. Some embodiments include providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1, providing respective statistical data for each of the T merged locations, and providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations.

Description

CROSS REFERENCE TO RELATED APPLICATION
The present application claims priority to U.S. Provisional Patent Application No. 62/582,306, filed Nov. 6, 2017, entitled “Accurate ROI Extraction Aided by Object Tracking,” the entire content of which is incorporated by reference herein.
FIELD OF THE INVENTION
This disclosure generally relates to image data processing methods based on artificial intelligence and image sensors configured to perform the same.
BACKGROUND
A region of interest (ROI) is a subset of image pixels that are identified for a particular purpose. This concept is commonly used in image and vision related applications. Normally, several objects, and their locations in the image, are needed from a single scene. For example, in surveillance systems, the system typically concentrates on several specific subjects, such as vehicle license plates, faces, etc., at the same time.
Many ROI extraction methods have been proposed. Recently, many machine learning approaches have been proposed, including Support Vector Machine (SVM), Adaboost, and Convolutional Neural Network (CNN). However, all of these methods extract ROIs from a single frame (image) and therefore inevitably produce false detections.
SUMMARY
In general, one aspect disclosed features an image data processing method comprising: receiving frame image data of N frames, where N>1; detecting a region of interest in one of the N frames; tracking locations of the region of interest in at least one of the N frames; and providing a merged location of the region of interest based on the locations of the region of interest in the N frames.
Embodiments of the method may include one or more of the following features. Some embodiments comprise providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1; providing respective statistical data for each of the T merged locations; and providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations. In some embodiments, the statistical data for each of the T merged locations comprises at least one of: a number of the frames in which the region of interest appeared; and a percentage of the frames in which the region of interest appeared. Some embodiments comprise receiving a previously-detected location for the region of interest; and providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations. Some embodiments comprise receiving trained parameters of feature descriptions; detecting at least one potential ROI in a first frame in the sequence; and detecting the region of interest based on the at least one detected potential ROI. Some embodiments comprise tracking the locations of the region of interest based on the at least one detected potential ROI.
In general, one aspect disclosed features an image sensor comprising: an image input unit configured to receive, from the image sensor, frame image data of N frames, wherein N>1; a detect unit configured to detect a region of interest in one of the N frames; a track unit configured to track locations of the region of interest in at least one of the N frames; and an analysis unit configured to provide a merged location of the region of interest based on the locations of the region of interest in the N frames.
Embodiments of the image sensor may include one or more of the following features. In some embodiments, the analysis unit is further configured to: provide T of the merged locations of the region of interest for T respective groups of N frames, wherein T>1; provide respective statistical data for each of the T merged locations; and provide a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations. In some embodiments, the statistical data for each of the T merged locations comprises at least one of: a number of the frames in which the region of interest appeared; and a percentage of the frames in which the region of interest appeared. Some embodiments comprise an ROI input unit configured to receive a previously-detected location for the region of interest; wherein the analysis unit is further configured to provide the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations. Some embodiments comprise receiving trained parameters of feature descriptions; wherein the detect unit is further configured to detect at least one potential ROI in a first frame in the sequence based on the trained parameters of feature descriptions; and wherein the detect unit is further configured to detect the region of interest based on the at least one detected potential ROI. In some embodiments, the track unit is further configured to: track the locations of the region of interest based on the at least one detected potential ROI.
In general, one aspect disclosed features a non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor, the machine-readable storage medium comprising instructions to cause the hardware processor to perform an image data processing method, the method comprising: receiving frame image data of N frames, where N>1; detecting a region of interest in one of the N frames; tracking locations of the region of interest in at least one of the N frames; and providing a merged location of the region of interest based on the locations of the region of interest in the N frames.
Embodiments of the medium may include one or more of the following features. In some embodiments, the method includes providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1; providing respective statistical data for each of the T merged locations; and providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations. In some embodiments, the statistical data for each of the T merged locations comprises at least one of: a number of the frames in which the region of interest appeared; and a percentage of the frames in which the region of interest appeared. In some embodiments, the method includes receiving a previously-detected location for the region of interest; and providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations. In some embodiments, the method includes receiving trained parameters of feature descriptions; detecting at least one potential ROI in a first frame in the sequence; and detecting the region of interest based on the at least one detected potential ROI. In some embodiments, the method includes tracking the locations of the region of interest based on the at least one detected potential ROI.
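The method recited above can be made concrete with a short, hedged sketch in Python. The Box and MergedLocation types, the averaging merge rule, and the appearance threshold used in final_location are illustrative assumptions only; the disclosure deliberately leaves the exact merge and decision rules open.

```python
# A minimal sketch of the summarized data flow, under assumed types and merge rules.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Box:
    x: int   # top-left corner, in pixels
    y: int
    w: int
    h: int

@dataclass
class MergedLocation:
    box: Optional[Box]   # merged location of the ROI over one group of N frames
    appearances: int     # number of frames in which the ROI appeared
    percentage: float    # fraction of the N frames in which the ROI appeared

def merge_group(per_frame_boxes: List[Optional[Box]]) -> MergedLocation:
    """Merge the locations of one ROI over a group of N frames (None = not found)."""
    hits = [b for b in per_frame_boxes if b is not None]
    n = len(per_frame_boxes)
    if not hits:
        return MergedLocation(None, 0, 0.0)
    # Simple averaging stands in for the merge rule, which the disclosure leaves open.
    avg = Box(
        x=sum(b.x for b in hits) // len(hits),
        y=sum(b.y for b in hits) // len(hits),
        w=sum(b.w for b in hits) // len(hits),
        h=sum(b.h for b in hits) // len(hits),
    )
    return MergedLocation(avg, len(hits), len(hits) / n)

def final_location(merged: List[MergedLocation], min_total: int) -> Optional[Box]:
    """Derive a final ROI location from T merged locations and their statistics."""
    total = sum(m.appearances for m in merged)
    if total < min_total:
        return None                     # the ROI is voted FALSE and dropped
    # Report the most recent non-empty merged location as the final location.
    for m in reversed(merged):
        if m.box is not None:
            return m.box
    return None
```

Calling merge_group once per group of N frames, then passing the resulting T MergedLocation records to final_location, mirrors the two-stage aggregation recited above.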
BRIEF DESCRIPTION OF THE DRAWINGS
Certain features of various embodiments of the present technology are set forth with particularity in the appended claims. A better understanding of the features and advantages of the technology will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings of which:
FIG. 1 illustrates an exemplary system for communicating video frame image data captured by an artificial intelligence (AI) based image sensor according to various embodiments.
FIG. 2 shows a conventional process for ROI detection from a single image.
FIG. 3 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments.
FIG. 4 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments.
FIG. 5 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of N frames according to various embodiments.
FIG. 6 presents an example of face ROI extraction without tracking for a sequence of frames.
FIG. 7 presents an example of face ROI extraction with tracking according to various embodiments.
FIG. 8 illustrates a flowchart of an exemplary image data processing method according to various embodiments.
FIG. 9 illustrates a block diagram of an exemplary computer system 900 to implement one or more functionalities of the AI-based image sensor according to various embodiments.
DETAILED DESCRIPTION
One or more of the various embodiments of the present disclosure is directed to detecting regions of interest (ROIs) for objects using multiple image frames. In particular, this invention relates to determining reliable regions of interest in image sequences using vision processing. The invention analyzes outputs from a sequence of frames, for example in a video mode, rather than from a single image, to determine whether an extracted ROI serves its required purpose, thereby reducing false detections, improving accuracy, and lowering transfer bandwidth.
This invention presents a novel method that considers ROI object tracking and detection in an integrated framework in order to reduce false results. With the help of object tracking, object detection becomes more stable. The inspiration for the invention follows the human visual system: if one wants to identify an object in a scene, one may have some difficulty determining exactly what it is from a brief glance; however, if one can stare at it for a while, the identification of the object becomes much more accurate.
According to some embodiments, one or more ROIs are determined by employing an artificial intelligence (AI) based image recognition technique referred to herein as ‘accurate ROI extraction aided by object tracking.’ According to this technique, it is possible to obtain frame image data that can be transmitted through a data communication link of a narrow bandwidth, while maintaining specificity of key contents of the image data. In some embodiments, an image sensor configured to carry out an AI-based image recognition may be mounted on a post near a traffic signal, a pedestrian crossing for a roadway, or the like. Image data obtained from the image sensor may be transmitted to a local system, and further to a cloud system, for further image processing. When the image sensor is mounted as described, the key contents of the image data may include valuable information, such as the identities of people, vehicles, and the like.
FIG. 1 illustrates an exemplary system 100 for communicating video frame image data captured by an artificial intelligence (AI) based image sensor according to various embodiments. In FIG. 1, the system 100 includes an artificial intelligence (AI) based image sensor 102, a local system 104, and a cloud system 106.
The AI-based image sensor 102 is configured to obtain original video frame image data from the real world and carry out AI-based image data processing. In some embodiments, the AI-based image sensor 102 is configured to obtain original video frame image data from the image sensor array, and to pre-process the obtained data to extract key information. Through this pre-processing, the AI-based image sensor 102 may reduce the original video frame data to a lower-bandwidth data stream that can be transferred through the first data link 108. In some embodiments, in extracting key information, the AI-based image sensor 102 is configured to determine which parts of the original video frame data contain key image data that needs to be kept, and which parts contain non-key image data that may be compressed to reduce the overall data bandwidth. More detail of the AI-based image data processing will be described below.
In some embodiments, the AI-based image sensor 102 is formed as a chip on which an image sensor array is disposed. In a specific implementation, the AI-based image sensor 102 also includes an image signal processor (ISP) on the chip to carry out the AI-based image data processing. In a specific implementation, the AI-based image sensor 102 may be mounted on a post to capture surrounding images thereof. The output image data from the AI-based image sensor 102 may be in either raw format or an ISP-processed format, such as YUV or Motion-JPEG. The output image data from the AI-based image sensor 102 is transmitted through the first data link 108 to a local data processing unit 110 in the local system 104. The first data link 108 may be a wired link or a wireless link, and the wireless link may be any applicable wireless data communication link, such as WiFi, Bluetooth, ZigBee, etc.
The local system 104 represents a computing system disposed proximate to the AI-based image sensor 102 and configured to perform additional image data processing for various applicable purposes. For example, when the AI-based image sensor 102 is mounted on a post to capture images of surrounding environments, the local system 104 may be a computing system configured to perform an autonomous operation of operating roadway signals for pedestrians and vehicles based on output image data from the AI-based image sensor 102. In some embodiments, the local data processing unit 110 is implemented as a field-programmable gate array (FPGA), a graphics processing unit (GPU), a tensor processing unit (TPU), a network processing unit (NPU), and/or a central processing unit (CPU).
In some embodiments, the AI-based image sensor 102 may be manufactured using a mixed-signal silicon process, e.g., a 90 nm mixed-signal process, which supports both digital MOSFET and analog MOSFET as sensor elements of the AI-based image sensor 102. In contrast, the local data processing unit 110 may be manufactured using digital MOSFET. For that reason, a highly advanced silicon process, e.g., a 14 nm process, may be employed to achieve high performance. Therefore, in some embodiments, it may be preferable to dispose the ISP in the local system 104 rather than to use an on-chip ISP within the AI-based image sensor 102.
The local system 104 may also include an optional local storage device 112 for storing image data processed by the local data processing unit 110. The bandwidth of the first data link 108 and/or the processing power of the local data processing unit 110 is typically limited. As a result, the resolution and frame rate of the AI-based image sensor 102 that can be effectively utilized may be largely limited in many applications. Output image data of the local system 104 is transmitted through a second data link 114 to the cloud system 106.
The cloud system 106 represents a computing system disposed separately from the local system 104 and the AI-based image sensor 102 and configured to perform additional image data processing for various applicable purposes. For example, when the local system 104 is mounted on a post to capture images of surrounding environments, the cloud system 106 may be a server computing system configured to perform data analysis of operations by the local system 104 and/or image data obtained from the local system 104. The data analysis may include traffic analysis, monitoring of vehicles, humans, animals, etc. The cloud system 106 includes a cloud data processing unit 116 and an optional cloud storage device 118. In some embodiments, the cloud data processing unit 116 has more processing power than the local data processing unit 110, and the optional cloud storage device 118 has a larger storage capacity than the optional local storage device 112. In a specific implementation, the bandwidth of the second data link 114 may be significantly limited in comparison to the processing power of the local data processing unit 110.
FIG. 2 shows a conventional process for ROI detection from a single image. A processing unit 200 acts as a training machine that determines fundamental features of an object of interest. The input to the training machine is a large bundle of positive examples and negative examples, while the outputs 210 are trained parameters of feature descriptions that can differentiate the positive examples from negative ones. The training machine may implement any machine learning strategy, including Support Vector Machine (SVM), Adaboost, Convolutional Neural Network (CNN), or others.
The output data 210 of processing unit 200 is passed through data link 110 to inference processing unit 220. A block size is defined for image processing. The inference processing unit 220 accepts a single image, applies the parameters of feature descriptions to the image on every block centered at each pixel, and predicts the ROI regions on predict unit 230.
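For contrast with the tracking-aided flow described next, a block-wise single-image detector in the spirit of FIG. 2 might look roughly like the sketch below. The score_block callable stands in for the trained parameters of feature descriptions produced by the training machine 200, and the block size, stride, and threshold are assumed values, not ones taken from the patent.

```python
import numpy as np

def detect_rois_single_image(image: np.ndarray, score_block, block: int = 32,
                             stride: int = 8, threshold: float = 0.5):
    """Apply trained feature parameters block by block over one image (FIG. 2 style).

    score_block(patch) -> float is a stand-in for the trained detector; every
    block whose score exceeds the threshold is reported as an ROI, with no use
    of any other frame.
    """
    h, w = image.shape[:2]
    rois = []
    for y in range(0, h - block + 1, stride):
        for x in range(0, w - block + 1, stride):
            patch = image[y:y + block, x:x + block]
            if score_block(patch) > threshold:
                rois.append((x, y, block, block))   # (x, y, width, height)
    return rois
```

Because every frame is scored independently, a texture that merely resembles the target (the arm in FIG. 6) is reported again and again; the following figures add tracking to suppress exactly that failure mode.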
FIG. 3 illustrates an exemplary data flow 300 in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments. The AI-based image data processing device includes an inference unit 310, a track unit 320, a detect unit 330, an analysis unit 340, and a vote unit 350. Each of the inference unit 310, track unit 320, detect unit 330, analysis unit 340, and vote unit 350 may be configured by a specifically configured circuitry and/or a software-based computer system such as described below with reference to FIG. 9.
The inference unit 310 receives the trained parameters of feature descriptions 210. Instead of directly reporting ROI locations on each frame, the inference unit 310 initially detects potential objects on the first frame in the sequence. The inference unit 310 may implement any ROI detector, including SVM, Adaboost, CNN, and others. The track unit 320 tracks the locations of the detected potential objects. The track unit 320 may implement any tracking method, including, for example, Block Correlation, Minimal Average Difference, Maximal Entropy, or others. The detect unit 330 detects object ROIs in every Nth frame. The detect unit 330 may implement any ROI detector. The analysis unit 340 analyzes the detected and tracked ROI locations every Nth frame, and maintains statistical data for the ROIs. The vote unit 350 disregards false ROI locations and reports correct ROI locations.
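Read as code, the FIG. 3 decomposition might be organized as in the sketch below, with one section per unit (inference 310, track 320, detect 330, analysis 340, vote 350). The detector and tracker callables, the simple overlap test, and the 50% vote threshold are assumptions made for illustration; the patent does not prescribe them.

```python
def _overlaps(a, b):
    """True if two (x, y, w, h) boxes intersect at all."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

class RoiPipeline:
    """Skeleton of the FIG. 3 flow: infer on the first frame, track every frame,
    detect every Nth frame, accumulate statistics, then vote."""

    def __init__(self, detector, tracker, n: int, vote_threshold: float = 0.5):
        self.detector = detector      # ROI detector, e.g. SVM, Adaboost, or CNN based
        self.tracker = tracker        # tracker, e.g. block correlation based
        self.n = n
        self.vote_threshold = vote_threshold
        self.objects = []             # each: {"box", "detections", "extractions"}

    def run(self, frames):
        # Inference unit 310: detect potential objects on the first frame only.
        self.objects = [{"box": b, "detections": 1, "extractions": 1}
                        for b in self.detector(frames[0])]
        for idx, frame in enumerate(frames[1:], start=1):
            # Track unit 320: update every potential object's location on every frame.
            for obj in self.objects:
                obj["box"] = self.tracker(frame, obj["box"])
            if idx % self.n == 0:
                # Detect unit 330 and analysis unit 340: re-detect every Nth frame
                # and update each object's detection statistics.
                detected = self.detector(frame)
                for obj in self.objects:
                    obj["extractions"] += 1
                    if any(_overlaps(obj["box"], d) for d in detected):
                        obj["detections"] += 1
        # Vote unit 350: keep ROIs detected in enough extractions, drop the rest.
        return [obj["box"] for obj in self.objects
                if obj["detections"] / obj["extractions"] >= self.vote_threshold]
```

In this reading, a spurious detection on the first frame is still tracked, but it fails the final vote unless later detections keep confirming it, which is the behavior contrasted in FIGS. 6 and 7.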
FIG. 4 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of frames according to various embodiments. The AI-based image data processing device includes an input unit 400, a detect unit 410, a detect and track unit 420, and an analysis unit 430. Each of input unit 400, detect unit 410, detect and track unit 420, and analysis unit 430 may be configured by a specifically configured circuitry and/or a software-based computer system such as described below with reference to FIG. 9.
The input unit 400 receives a sequence of frames of image data, such as consecutive image frames generated by an image sensor operating in a video mode. The input unit 400 assigns each frame a frame index, beginning with 0. The detect unit 410 processes the first frame to detect potential ROI locations. These locations are fed into the detect and track unit 420, which tracks them in the following frames. Besides tracking, the detect and track unit 420 also detects ROIs at every Nth frame in the sequence, and merges the locations of the detected ROIs with the respective tracked locations. The detect and track unit 420 also generates and updates detection statistics for each potential object. After the Nth frame, the detect and track unit 420 provides the statistics and the merged ROI locations to the analysis unit 430. Normally, the detect and track unit 420 repeats this process T times. When the Mth frame is reached, where M=T×N, the analysis unit 430 disregards false detections and reports correct ROI locations based on the statistics.
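The overall loop described for FIG. 4 can be summarized by the following Python sketch, assuming detect and track helpers such as those sketched above and a merge_rois helper such as the one sketched after the description of FIG. 5 below. The hit-count threshold is an illustrative stand-in for the statistical analysis performed by the analysis unit 430, not the patent's exact criterion.

def roi_extraction_with_tracking(frames, detect, track, N=3, T=3, min_hits=2):
    """Detect on frame 0, track every frame, re-detect every Nth frame,
    and report after M = T*N frames only the ROIs seen often enough."""
    rois = detect(frames[0])          # frame index 0: initial detection
    hits = [1] * len(rois)            # per-object detection statistics
    for idx in range(1, T * N + 1):
        rois = [track(frames[idx - 1], frames[idx], r) for r in rois]
        if idx % N == 0:              # detect at every Nth frame
            detected = detect(frames[idx])
            rois, hits = merge_rois(rois, hits, detected)
    # at frame M = T*N, disregard likely false detections
    return [r for r, h in zip(rois, hits) if h >= min_hits]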
FIG. 5 illustrates an exemplary data flow in an AI-based image data processing device for ROI detection with tracking from a sequence of N frames according to various embodiments. The AI-based image data processing device includes an image input unit 500, a detect unit 510, a track unit 520, an analysis unit 530, and a ROI input unit 540. Each of image input unit 500, detect unit 510, track unit 520, analysis unit 530, and ROI input unit 540 may be configured by a specifically configured circuitry and/or a software-based computer system such as described below with reference to FIG. 9.
The image input unit 500 receives a sequence of N frames of image data, such as consecutive image frames generated by an image sensor operating in a video mode. The image input unit 500 assigns each frame a frame index, beginning with 0. The image input unit 500 provides the frames and frame indexes to both the detect unit 510 and the track unit 520. The ROI input unit 540 provides previous ROI locations to the track unit 520. The previous ROI locations may include ROIs extracted from the first frame in the sequence, which may be provided by the inference unit 310 of FIG. 3. The previous ROI locations may also include ROIs extracted from the previous sequence of N frames, which may be provided by the analysis unit 340 of FIG. 3, by the detect and track unit 420 of FIG. 4, or by the analysis unit 530 of FIG. 5.
The detect unit 510 detects object ROIs in every Nth frame. The detect unit 510 may implement any ROI detector. The track unit 520 tracks the locations of the ROIs in every frame in the sequence. The track unit 520 may implement any tracking method. At the Nth frame in the sequence, the analysis unit 530 merges the detected ROI locations generated by the detect unit 510, and the tracked ROI locations generated by the track unit 520. Based on this data, the analysis unit 530 creates ROI locations for new objects, adjusts ROI locations for existing objects, and generates and updates statistical data for all objects.
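One plausible way to perform the merge described above is overlap (intersection-over-union) matching between detected and tracked boxes, as in the following Python sketch. The IoU threshold and the choice to adopt the freshly detected location for matched objects are assumptions made for illustration, not requirements of the embodiments.

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def merge_rois(tracked, hits, detected, iou_thresh=0.5):
    """Merge detected ROI locations with tracked ROI locations: a detection
    that overlaps a tracked ROI adjusts that ROI's location and counts as a
    detection hit; an unmatched detection becomes a new potential object."""
    merged, counts = list(tracked), list(hits)
    for d in detected:
        best = max(range(len(merged)), key=lambda i: iou(merged[i], d), default=None)
        if best is not None and iou(merged[best], d) >= iou_thresh:
            merged[best] = d       # adjust location of an existing object
            counts[best] += 1      # update its detection statistics
        else:
            merged.append(d)       # create a new object ROI
            counts.append(1)
    return merged, counts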
FIG. 6 presents an example of conventional face ROI extraction without tracking for a sequence of frames. In this example, the system applies an ROI detection method on Frame 0 and outputs the results, which include not only a true face detection but also a false detection of an arm. In the following frames, the system does not consider the interrelation among consecutive frames, but instead works on each frame independently. As a result, the system generates many false detections, including further detections of the arm, which wastes transfer bandwidth.
FIG. 7 presents an example of face ROI extraction with tracking according to various embodiments. In this example, the interrelation between consecutive frames is taken into account, as described elsewhere in this disclosure. Here tracked ROIs are shown in yellow boxes, and detected ROIs are shown in red boxes.
Referring to FIG. 7, the system applies an ROI extraction method on initial Frame 0 and detects two ROIs: one is a face ROI, and the other is an arm ROI. However, the system does not report the results immediately. In the following frames, the locations of these two detected objects, the face and the arm, are tracked. The system then detects ROIs not on every frame, but on every Nth frame, and updates statistical data about detection and tracking for each object. In this example, N is set to 3. This procedure repeats T times, where T is set to 3 in this example.
In the first sequence of three frames (T=1), the ROIs detected in Frame 0 are tracked in Frames 1 and 2. In Frame 3, the system performs ROI detection and tracking; the face ROI is shown in a red box to indicate it has been detected. The statistics are updated to show that the face ROI has been detected once, and the arm ROI has been detected once.
In the second sequence of three frames (T=2), the ROIs detected previously are tracked in Frames 4 and 5. In Frame 6, the system performs ROI detection and tracking; both the face ROI and the arm ROI are shown in red boxes to indicate they have been detected. The statistics are updated to show that the face ROI has been detected twice, and the arm ROI has been detected once.
In the third sequence of three frames (T=3), the ROIs detected previously are tracked in Frames 7 and 8. In Frame 9, the system performs ROI detection and tracking. If an object ROI appears often enough across the T extractions, it is labeled TRUE; otherwise, it is labeled FALSE. In this example, over the 9 consecutive frames, the face ROI is detected 7 times, for a 78% detection rate, while the arm ROI is detected only 3 times, for a 33% detection rate. A true detection threshold may be set at 5 detections, or 56%. Using this threshold, the face ROI is labeled TRUE, while the arm ROI is labeled FALSE. In Frame 9, only the face ROI is shown in a red box to indicate a true detection. As a result, only one ROI is reported, with higher accuracy and lower required bandwidth.
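The TRUE/FALSE vote in this example reduces to a simple threshold on the detection statistics, as in the following Python sketch using the numbers above; the dictionary-based interface is illustrative only.

def label_rois(detection_counts, threshold=5):
    """Label each ROI TRUE if it was detected at least `threshold` times
    over the observed frames, otherwise FALSE."""
    return {name: ("TRUE" if count >= threshold else "FALSE")
            for name, count in detection_counts.items()}

# Face seen 7 of 9 frames (~78%), arm seen 3 of 9 (~33%); threshold 5 (~56%)
print(label_rois({"face": 7, "arm": 3}))   # {'face': 'TRUE', 'arm': 'FALSE'}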
FIG. 8 illustrates a flowchart 800 of an exemplary image data processing method according to various embodiments. The exemplary method may be implemented in various environments including, for example, the functional units of the AI-based image sensor illustrated in FIG. 1. The operations of the exemplary method presented below are intended to be illustrative. Depending on the implementation, the exemplary method may include additional, fewer, or alternative steps performed in various orders or in parallel. Also, the flowchart illustrates blocks (and potentially decision points) organized in a fashion that is conducive to understanding. It should be recognized, however, that the blocks can be reorganized for parallel execution, reordered, and modified (changed, removed, or augmented) where circumstances permit.
The flowchart 800 starts at block 802, with receiving frame image data of N frames, where N>1. The frame image data may be received from an image sensor. The frames may be consecutive images in a video, such as a video produced by an image sensor in video mode. In a specific implementation, the image input unit 500 of FIG. 5 receives the frames from an image sensor array of an AI-based image sensor.
The flowchart 800 continues to block 804, with detecting a region of interest in one of the N frames. In a specific implementation, the detect unit 510 of FIG. 5 detects the region of interest. Some embodiments comprise receiving trained parameters of feature descriptions, detecting at least one potential ROI in a first frame in the sequence, and detecting the potential ROI in every Nth frame in the following sequence.
The flowchart 800 continues to block 806, with tracking locations of the region of interest in at least one of the N frames. In a specific implementation, the track unit 520 of FIG. 5 tracks the locations in every frame.
The flowchart 800 continues to block 808, with providing a merged location of the region of interest based on the locations of the region of interest in the N frames. In a specific implementation, the analysis unit 530 of FIG. 5 provides the merged location of the region of interest. Some embodiments comprise receiving a previously-detected location for the region of interest, and providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations. Some embodiments comprise tracking the locations of the region of interest based on the at least one detected potential ROI.
The flowchart 800 continues to block 810, with providing T of the merged locations of the region of interest for T respective groups of N frames, where T>1. In a specific implementation, the analysis unit 530 of FIG. 5 provides the T merged locations of the region of interest for the T respective groups of N frames.
The flowchart 800 continues to block 812, with providing respective statistical data for each of the T merged locations. In a specific implementation, the analysis unit 530 of FIG. 5 provides the respective statistical data for each of the T merged locations. The statistical data for each of the T merged locations may include a number of the frames in which the region of interest appeared, a percentage of the frames in which the region of interest appeared, or both, and may include other statistical measures.
The flowchart 800 continues to block 814, with providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations. In a specific implementation, the analysis unit 530 of FIG. 5 provides the final location of the region of interest.
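A minimal sketch of blocks 810-814 is given below, assuming the statistical data is an appearance count for each group of N frames and that the final location is taken from the most recent merged location when the overall appearance rate clears a threshold. Both assumptions are illustrative, not the only way the analysis unit 530 may combine the data.

def final_roi_location(merged_locations, appear_counts, N, min_rate=0.5):
    """From T merged locations and their statistics, report a final ROI
    location only if the region of interest appeared often enough."""
    T = len(merged_locations)
    if T == 0:
        return None
    rate = sum(appear_counts) / float(T * N)   # fraction of frames with the ROI
    return merged_locations[-1] if rate >= min_rate else None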
FIG. 9 illustrates a block diagram of an exemplary computer system 900 that may implement one or more functionalities of the AI-based image sensor according to various embodiments. In some embodiments, the system 900 may implement one or more of the functional units described above with reference to FIGS. 3-5. The computer system 900 includes a bus 902 or other communication mechanism for communicating information, and one or more hardware processors 904 coupled with bus 902 for processing information. Hardware processor(s) 904 may be, for example, one or more general purpose microprocessors.
The computer system 900 also includes a main memory 906, such as a random access memory (RAM), cache and/or other dynamic storage devices, coupled to bus 902 for storing information and instructions to be executed by processor 904. Main memory 906 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 904. Such instructions, when stored in storage media accessible to processor 904, render computer system 900 into a special-purpose machine that is customized to perform the operations specified in the instructions. The computer system 900 further includes a read only memory (ROM) 908 or other static storage device coupled to bus 902 for storing static information and instructions for processor 904. A storage device 910, such as a magnetic disk, optical disk, or USB thumb drive (Flash drive), etc., is provided and coupled to bus 902 for storing information and instructions.
The computer system 900 may implement the techniques described herein using customized hard-wired logic, one or more ASICs or FPGAs, firmware and/or program logic which in combination with the computer system causes or programs computer system 900 to be a special-purpose machine. According to one embodiment, the techniques herein are performed by computer system 900 in response to processor(s) 904 executing one or more sequences of one or more instructions contained in main memory 906. Such instructions may be read into main memory 906 from another storage medium, such as storage device 910. Execution of the sequences of instructions contained in main memory 906 causes processor(s) 904 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions.
The main memory 906, the ROM 908, and/or the storage 910 may include non-transitory storage media. The term ‘non-transitory media,’ and similar terms, as used herein refers to any media that store data and/or instructions that cause a machine to operate in a specific fashion. Such non-transitory media may comprise non-volatile media and/or volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 910. Volatile media includes dynamic memory, such as main memory 906. Common forms of non-transitory media include, for example, a floppy disk, a flexible disk, a hard disk, a solid state drive, magnetic tape or any other magnetic data storage medium, a CD-ROM or any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, NVRAM, any other memory chip or cartridge, and networked versions of the same.
The computer system 900 also includes a communication interface 918 coupled to bus 902. Communication interface 918 provides a two-way data communication coupling to one or more network links that are connected to one or more local networks. For example, communication interface 918 may be an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 918 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN (or a WAN component to communicate with a WAN). Wireless links may also be implemented. In any such implementation, communication interface 918 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
The computer system 900 can send messages and receive data, including program code, through the network(s), network link and communication interface 918. In the Internet example, a server might transmit a requested code for an application program through the Internet, the ISP, the local network and the communication interface 918.
The received code may be executed by processor 904 as it is received, and/or stored in storage device 910, or other non-volatile storage for later execution.
Each of the processes, methods, and algorithms described in the preceding sections may be embodied in, and fully or partially automated by, code modules executed by one or more computer systems or computer processors comprising computer hardware. The processes and algorithms may be implemented partially or wholly in application-specific circuitry.
The various features and processes described above may be used independently of one another, or may be combined in various ways. All possible combinations and sub-combinations are intended to fall within the scope of this disclosure. In addition, certain method or process blocks may be omitted in some implementations. The methods and processes described herein are also not limited to any particular sequence, and the blocks or states relating thereto can be performed in other sequences that are appropriate. For example, described blocks or states may be performed in an order other than that specifically disclosed, or multiple blocks or states may be combined in a single block or state. The example blocks or states may be performed in serial, in parallel, or in some other manner. Blocks or states may be added to or removed from the disclosed example embodiments. The example systems and components described herein may be configured differently than described. For example, elements may be added to, removed from, or rearranged compared to the disclosed example embodiments.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Although an overview of the subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the subject matter may be referred to herein, individually or collectively, by the term ‘invention’ merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or concept if more than one is, in fact, disclosed.
The Detailed Description is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Claims (12)

The invention claimed is:
1. A computer-implemented image data processing method comprising:
receiving frame image data of N frames from an image sensor, where N>1;
detecting a region of interest in one of the N frames;
tracking locations of the region of interest in a plurality of the N frames;
providing T merged locations of the region of interest for T respective groups of N frames, wherein each of the T merged locations is based on the locations of the region of interest in the respective group of the N frames by merging the locations of the region of interest, wherein T>1;
providing respective statistical data for each of the T merged locations; and
providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations;
wherein the statistical data for each of the T merged locations comprises at least one of:
a number of the N frames in the respective group of the N frames in which the region of interest appeared; and
a percentage of the N frames in the respective group of the N frames in which the region of interest appeared.
2. The computer-implemented image data processing method of claim 1, further comprising:
receiving a previously-detected location for the region of interest; and
providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations.
3. The computer-implemented image data processing method of claim 1, further comprising:
receiving trained parameters of feature descriptions;
detecting at least one potential region of interest (ROI) in a first frame in one of the groups of the N frames based on the trained parameters of feature descriptions; and
detecting the at least one detected potential ROI in every Nth frame in the one of the groups of the N frames.
4. The computer-implemented image data processing method of claim 3, further comprising:
tracking the locations of the region of interest based on the at least one detected potential ROI.
5. An apparatus comprising:
an image sensor;
an image input circuit configured to receive, from the image sensor, frame image data of N frames, wherein N>1;
a detect circuit configured to detect a region of interest in one of the N frames;
a track circuit configured to track locations of the region of interest in a plurality of the N frames; and
an analysis circuit configured to:
provide T merged locations of the region of interest for T respective groups of N frames, wherein each of the T merged locations is based on the locations of the region of interest in the respective group of the N frames by merging the locations of the region of interest, wherein T>1,
provide respective statistical data for each of the T merged locations, and
provide a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations;
wherein the statistical data for each of the T merged locations comprises at least one of:
a number of the N frames in the respective group of the N frames in which the region of interest appeared, and
a percentage of the N frames in the respective group of the N frames in which the region of interest appeared.
6. The apparatus of claim 5, further comprising:
an ROI input circuit configured to receive a previously-detected location for the region of interest;
wherein the analysis circuit is further configured to provide the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations.
7. The apparatus of claim 5,
wherein the detect circuit is further configured to receive trained parameters of feature descriptions;
wherein the detect circuit is further configured to detect at least one potential region of interest (ROI) in a first frame in one of the groups of the N frames based on the trained parameters of feature descriptions; and
wherein the detect circuit is further configured to detect the at least one detected potential ROI in every Nth frame in the one of the groups of the N frames.
8. The apparatus of claim 5, wherein the track circuit is further configured to:
track the locations of the region of interest based on the at least one detected potential ROI.
9. A non-transitory machine-readable storage medium encoded with instructions executable by a hardware processor, the machine-readable storage medium comprising instructions to cause the hardware processor to perform an image data processing method, the method comprising:
receiving, from an image sensor, frame image data of N frames, where N>1;
detecting a region of interest in one of the N frames;
tracking locations of the region of interest in a plurality of the N frames;
providing T merged locations of the region of interest for T respective groups of N frames, wherein each of the T merged locations is based on the locations of the region of interest in the respective group of the N frames by merging the locations of the region of interest, wherein T>1;
providing respective statistical data for each of the T merged locations; and
providing a final location of the region of interest based on the T merged locations and the statistical data for the T merged locations;
wherein the statistical data for each of the T merged locations comprises at least one of:
a number of the N frames in the respective group of the N frames in which the region of interest appeared, and
a percentage of the N frames in the respective group of the N frames in which the region of interest appeared.
10. The medium of claim 9, the method further comprising:
receiving a previously-detected location for the region of interest; and
providing the final location of the region of interest based on the previously-detected location, the T merged locations, and the statistical data for the T merged locations.
11. The medium of claim 9, the method further comprising:
receiving trained parameters of feature descriptions;
detecting at least one potential region of interest (ROI) in a first frame in one of the groups of the N frames based on the trained parameters of feature descriptions; and
detecting the at least one potential ROI in every Nth frame in the one of the groups of the N frames.
12. The medium of claim 11, the method further comprising:
tracking the locations of the region of interest based on the at least one detected potential ROI.