US20210004575A1 - Quantized transition change detection for activity recognition - Google Patents
- Publication number
- US20210004575A1 US16/458,288 US201916458288A
- Authority
- US
- United States
- Prior art keywords
- image frame
- classes
- sequence
- class
- classifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G06K9/00335—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G06K9/00362—
-
- G06K9/00711—
-
- G06K9/6262—
-
- G06K9/6267—
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/771—Feature selection, e.g. selecting representative features from a multi-dimensional feature space
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/778—Active pattern-learning, e.g. online learning of image or video features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
Description
- The present disclosure relates generally to artificial intelligence, and more specifically, to human activity recognition from a video stream and symbolic processing.
- With advancement in technology, recognition of human physical activities is gaining tremendous importance. The recognition of human physical activities contributes to various applications, such as surveillance of a retail store check-out process involving a self-checkout (SCO) system, which allows buyers to complete the purchasing process by themselves. Another example is assistance in video surveillance by detecting unfair activities by shoplifters, such as theft, and thereby alerting personnel employed in the shop to prevent the theft. Moreover, recognition of human physical activities is employed in intelligent driver assistance systems, assisted living systems for humans in need, video games, physiotherapy, and so forth. Furthermore, recognition of human physical activities is actively used in the fields of sports, the military, medicine, robotics, and so forth.
- Human physical activities represent the building blocks of most process modelling. However, as human behaviour is unpredictable, the recognition of such human physical activities in a diverse environment is a difficult task. The human physical activity is typically decomposable into a set of basic actions involving various human body parts, such as hands, feet, face, and so forth. Moreover, the set of basic actions associated with the human physical activity are spanned over a plurality of time intervals. Recognition tasks of such activities face the problem of summarizing the overall sequence of actions over a variable time interval.
- Conventional human physical activity recognition techniques are inefficient at recognizing human physical activities because each human body has a different structure, shape, skin colour, and so forth. Also, the time frame of a human activity varies considerably depending on the subject and, possibly, on other environmental conditions. Moreover, not all basic body-part movements are related to the purpose of the considered activity. Therefore, the activity recognition process faces two major problems: the time variation of actions, and the variation of the physical trajectories of the human body parts involved in the activity.
- Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the recognition of human physical activities, and to provide a system and method that reduce the influence of time variation and of the variety of body-part movements in activity recognition using a recurrent neural network.
- The present disclosure seeks to provide a system for recognizing human activity from a video stream and a method thereof.
- According to an aspect of the present disclosure, there is provided a system for recognizing human activity from a video stream captured by an imaging device. The system includes a memory to store one or more instructions, and a processor communicatively coupled to the memory. The system includes a classifier communicatively coupled to the imaging device and configured to classify an image frame of the video stream into one or more classes of a set of pre-defined classes, wherein the image frame is classified based on user action in a region of interest of the image frame, and to generate a class probability vector for the image frame based on the classification, wherein the class probability vector includes a set of probabilities of classification of the image frame in each pre-defined class. The system further includes a data filtering and binarization module configured to filter and binarize each probability value of the class probability vector based on a pre-defined probability threshold value. The system further includes a compressed word composition module configured to determine one or more transitions of one or more classes in one or more consecutive image frames of the video stream, based on the corresponding binarized probability vectors, and to generate a sequence of compressed words based on the determined one or more transitions in the one or more consecutive image frames. The system further includes a sequence dependent classifier configured to extract one or more user actions by analyzing the sequence of compressed words, and to recognize human activity therefrom.
- According to another aspect of the present disclosure, there is provided a method for recognizing human activity from a video stream. The method includes classifying, by a classifier, an image frame of the video stream into one or more classes of a set of pre-defined classes, wherein the image frame is classified based on user action in a region of interest of the image frame. The method further includes generating a class probability vector for the image frame based on the classification, wherein the class probability vector includes a set of probabilities of classification of the image frame in each pre-defined class. The method furthermore includes binarizing each probability value of the class probability vector based on a pre-defined probability threshold value. The method furthermore includes determining one or more transitions of one or more classes in one or more consecutive image frames of the video stream, based on the corresponding binarized probability vectors. The method furthermore includes generating a sequence of compressed words based on the determined one or more transitions in the one or more consecutive image frames. The method furthermore includes extracting one or more user actions by analyzing the sequence of compressed words, and recognizing human activity therefrom.
- According to yet another aspect of the present disclosure, there is provided a computer programmable product for recognizing human activity from a video stream, the computer programmable product comprising a set of instructions. The set of instructions, when executed by a processor, causes the processor to classify an image frame of the video stream into one or more classes of a set of pre-defined classes, wherein the image frame is classified based on user action in a region of interest of the image frame, generate a class probability vector for the image frame based on the classification, wherein the class probability vector includes a set of probabilities of classification of the image frame in each pre-defined class, binarize each probability value of the class probability vector based on a pre-defined probability threshold value, determine one or more transitions of one or more classes in one or more consecutive image frames of the video stream, based on the corresponding binarized probability vectors, generate a sequence of compressed words based on the determined one or more transitions in the one or more consecutive image frames, extract one or more user actions by analyzing the sequence of compressed words, and recognize human activity therefrom.
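- For orientation, the stages summarized above can be chained as in the following minimal Python sketch; the callables passed in are hypothetical placeholders for the frame classifier, the word composition step, and the sequence dependent classifier described in this disclosure, and the threshold value is an assumption used only for illustration:

```python
# Minimal sketch of the described pipeline, for illustration only.
# classify_frame, compose_words, and classify_sequence are hypothetical placeholders.
from typing import Callable, List, Sequence

THRESHOLD = 0.2  # pre-defined probability threshold; the exact value is an assumption


def binarize(probs: Sequence[float], threshold: float = THRESHOLD) -> List[int]:
    """Quantize a class probability vector to one 0/1 value per class."""
    return [1 if p >= threshold else 0 for p in probs]


def recognize_activity(
    frames: Sequence,                                        # image frames of the video stream
    classify_frame: Callable[[object], List[float]],         # frame classifier (e.g. a CNN)
    compose_words: Callable[[List[List[int]]], List[str]],   # class-transition to letter encoder
    classify_sequence: Callable[[List[str]], str],           # sequence dependent classifier
) -> str:
    """Classify frames, binarize the probability vectors, compose words, recognize activity."""
    binarized = [binarize(classify_frame(frame)) for frame in frames]
    return classify_sequence(compose_words(binarized))
```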
- The present disclosure seeks to provide a system for recognizing human activity from a video stream. Such a system enables efficient and reliable recognition of human activities from the video stream.
- It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
- The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
- Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
- FIG. 1 illustrates an environment, wherein various embodiments of the present disclosure can be practiced;
- FIG. 2 illustrates the activity recognition system for recognizing one or more human actions and activities in the video stream captured by the imaging device of FIG. 1, in accordance with an embodiment of the present disclosure; and
- FIG. 3 is a flowchart illustrating a method for recognizing human activity from a video stream, in accordance with an embodiment of the present disclosure.
- In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
- The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
- FIG. 1 illustrates an environment 100, wherein various embodiments of the present disclosure can be practiced. The environment 100 includes an imaging device 101, an activity recognition system 102, and a computing device 103, communicatively coupled to each other through a communication network 104. The communication network 104 may be any suitable wired network, wireless network, a combination of these, or any other conventional network, without limiting the scope of the present disclosure. A few examples include a Local Area Network (LAN), a wireless LAN connection, an Internet connection, a point-to-point connection, or other network connections and combinations thereof.
- The imaging device 101 is configured to capture a video stream. In an embodiment of the present disclosure, the imaging device 101 is configured to capture one or more videos of a retail check-out process including a self-checkout (SCO) system. Optionally, the imaging device 101 includes, but is not limited to, an Internet protocol (IP) camera, a Pan-Tilt-Zoom (PTZ) camera, a thermal image camera, or an infrared camera.
- The activity recognition system 102 is configured to recognize human actions and human activities in the video stream captured by the imaging device 101.
- The activity recognition system 102 includes a central processing unit (CPU) 106, an operation panel 108, and a memory 110. The CPU 106 is a processor, computer, microcontroller, or other circuitry that controls the operations of various components such as the operation panel 108 and the memory 110. The CPU 106 may execute software, firmware, and/or other instructions, for example, that are stored on a volatile or non-volatile memory, such as the memory 110, or otherwise provided to the CPU 106. The CPU 106 may be connected to the operation panel 108 and the memory 110 through wired or wireless connections, such as one or more system buses, cables, or other interfaces. In an embodiment of the present disclosure, the CPU 106 may include custom Graphics Processing Unit (GPU) server software to provide real-time object detection and prediction for all cameras on a local network.
- The operation panel 108 may be a user interface for the activity recognition system 102 and may take the form of a physical keypad or touchscreen. The operation panel 108 may receive inputs from one or more users relating to selected functions, preferences, and/or authentication, and may provide and/or receive inputs visually and/or audibly.
- The memory 110, in addition to storing instructions and/or data for use by the CPU 106 in managing operation of the activity recognition system 102, may also include user information associated with one or more users of the system. For example, the user information may include authentication information (e.g. username/password pairs), user preferences, and other user-specific information. The CPU 106 may access this data to assist in providing control functions (e.g. transmitting and/or receiving one or more control signals) related to operation of the operation panel 108 and the memory 110.
- The imaging device 101 and the activity recognition system 102 may be controlled/operated by the computing device 103. Examples of the computing device 103 include a smartphone, a personal computer, a laptop, and the like. The computing device 103 enables the user/operator to view and save the videos captured by the imaging device 101, and to access the videos/images processed by the activity recognition system 102. The computing device 103 may execute a mobile application of the activity recognition system 102 so as to enable a user to access and process the video stream captured by the imaging device 101.
- In an embodiment, the imaging device 101, the activity recognition system 102, and the computing device 103 may be integrated in a single device, where the single device is a portable smartphone having a built-in camera and a display.
- FIG. 2 illustrates the activity recognition system 102 for recognizing one or more human actions and activities in the video stream captured by the imaging device 101, in accordance with an embodiment of the present disclosure.
- The activity recognition system 102 includes the CPU 106, which includes a classifier 202 operable to analyze each frame of the video stream to determine at least one action region of interest, wherein the at least one region of interest comprises at least one object. The action region of interest refers to a rectangular area in each frame of the video stream where the at least one object is seen and one or more actions take place. In an example, the at least one object may be a person, or objects such as clothing items, groceries, or a wallet, and the one or more actions may include a person taking a wallet out of their pocket, the person walking in a queue, the person swiping a credit card, and the like. Each action can be used as a building block for process model extraction, wherein a process can be expressed as a chain of actions.
- In an embodiment of the present disclosure, the
classifier 202 may be an algorithm-based classifier such as a convolutional neural network (CNN) trained to classify an image frame of the video of the SCO scan area (scanning action region of interest) in classes such as hand, object in hand, object, body part, empty scanner. The criteria for classification of an image frame in each class has been mentioned below: - Hand—The image frame shows human hand(s).
- Object in hand—The image frame shows an object in a hand of the user.
- Object—The image frame shows only object
- Body part—The image frame shows a human body part
- Empty scanner—The image frame shows only the empty scanner
- The CNN as referred herein is defined as trained deep artificial neural networks that is used primarily to classify the at least one object in the at least one region of interest. Notably, they are algorithms that can identify faces, individuals, street signs, and the like. The term “neural network” as used herein can include a highly interconnected network of processing elements, each optionally associated with a local memory. In an example, the neural network may be a Kohonen map, a multi-layer perceptron, and so forth. Furthermore, the processing elements of the neural networks can be “artificial neural units”, “artificial neurons,” “neural units,” “neurons,” “nodes,” and the like. Moreover, the neuron can receive data from an input or one or more other neurons, process the data, and send processed data to an output or yet one or more other neurons. The neural network or one or more neurons thereof can be generated in either hardware, software, or a combination of hardware and software, and the neural network can be subsequently trained. It will be appreciated that the convolutional neural network (CNN) consists of an input layer, a plurality of hidden layers and an output layer. Moreover, the plurality of hidden layers of the convolutional neural network typically consist of convolutional layers, pooling layers, fully connected layers and normalization layers. Optionally, a Visual Geometry Group 19 (VGG 19) model is used as a convolutional neural network architecture. The VGG 19 model is configured to classify the at least one object in the frame of the video stream into classes. It will be appreciated that hidden layers comprise a plurality of sets of convolution layers.
- In operation, the
- In operation, the classifier 202 receives and classifies an image frame of the video stream of the SCO scan area (scanning action region of interest) into classes such as hand, object in hand, object, body part, or empty scanner, based on the content of the image frame. In an embodiment of the present disclosure, the classifier 202 analyses each image frame statically and, for each image frame, outputs a class probability vector Pv having one component for each considered class, such that Pv={PHand, PHandObject, PObject, PBodyPart, PEmptyScanner}
PHandObject=Probability of the image frame to be classified in class ‘object in hand’
PObject=Probability of the image frame to be classified in class ‘object’
PBodyPart=Probability of the image frame to be classified in class ‘body part”
PEmptyScanner=Probability of the image frame to be classified in class ‘empty scanner” - In an example, the
classifier 202 generates six probability vectors Pv1 till Pv6 for six consecutive image frames in five classes, in a format given below. -
P v1={0.0,0.0,0.0,0.0,1.0} -
P v2={0.0,0.0,0.28,0.0,0.72} -
P v3={0.0,0.0,0.26,0.0,0.74} -
P v4={0.0,0.0,0.19,0.0,0.81} -
P v5={0.0,0.0,0.29,0.0,0.71} P v6={0.0,0.45,0.14,0.0,0.41} - The
CPU 106 further includes a quantizedsignature generation module 204 for generating a quantized signature for each scan action determined by theclassifier 202. A scan action is a user action performed for scanning an item in a scanning zone of a self-check out (SCO) terminal. - The quantized
signature generation module 204 includes a data filtering andbinarization module 205, a silentinterval detection module 206, and a compressedword composition module 207. - The data filtering and
binarization module 205 is configured to apply a filter on the class probability vectors generated by theclassifier 202 to minimize errors by theclassifier 202. A classifier error appears if theclassifier 202 classifies a continuous movement on the scanner using a single class for the entire sequence except one isolated frame. In such case, the isolated frame may be wrongly classified. - Below is an example output of probability vectors from the
classifier 202 for six consecutive image frames of the video stream, wherein the six consecutive image frames cover a continuous movement over the scanner. For an image frame n, each probability vector Pvn includes the probabilities of classification of the image frame in each of the five classes "hand", "object in hand", "object", "body part", and "empty scanner".
P v1={0.0,0.0,0.28,0.0,0.72} -
P v2={0.0,0.0,0.28,0.0,0.72} -
P v3={0.0,0.0,0.01,0.27,0.72} -
P v4={0.0,0.0,0.28,0.0,0.72} -
P v5={0.0,0.0,0.28,0.0,0.72} -
P v6={0.0,0.0,0.28,0.0,0.72} - It can be clearly seen that the probability vector Pv3 of the third image frame of the video sequence is different, which means that there is an error in the classification of the third image frame by the
classifier 202. The data filtering andbinarization module 205 rectifies the error in the classification of the third image frame based on the information that the six frames cover substantially similar information. In an embodiment of the present disclosure, the data filtering andbinarization module 205 rectifies the error by removing the erroneous frame. - The data filtering and
- The data filtering and binarization module 205 is then configured to binarize the filtered values of the probability vectors using a heuristic threshold value, such that each component of a probability vector is assigned a value "1" if it is equal to or greater than the heuristic threshold value, and "0" otherwise.
-
P vf1={0.0,0.0,0.0,0.0,1.0}
P vf2={0.0,0.0,0.28,0.0,0.72} -
P vf3={0.0,0.0,0.26,0.0,0.74} -
P vf4={0.0,0.0,0.39,0.0,0.71} -
P vf5={0.0,0.45,0.14,0.0,0.41} - and corresponding binarized probability vectors Pvb may be represented as below:
-
P vb1={0 0 0 0 1} -
P vb2={0 0 1 0 1} -
P vb3={0 0 1 0 1} -
P vb4={0 0 1 0 1} -
P vb5={0 1 0 0 1} - Each binarized probability vector Pvb is thus a binarized string of a series of binary numbers, that can be used to determine transitions of classes in consecutive frames. For example, in the first image frame, the binary value corresponding to class ‘object’ is ‘0’, and in the second image frame, the binary value corresponding to class ‘object’ is ‘1’, which means that there is clearly a transition of class from the first to second image frame. Similarly, in the fourth image frame, the binary value corresponding to class ‘object in hand’ is ‘0’, and the binary value corresponding to class ‘object’ is ‘1’. In the fifth frame, the binary value for ‘object in hand’ changes to ‘1’, and the binary value for ‘object’ changes to ‘0’. This clearly indicates that the user has kept the object in their hand during transition from fourth to fifth frame. Thus, the binarized/quantized probability vectors provide information about transition of classes in consecutive image frames.
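- The thresholding that produces these binarized vectors can be sketched as follows; this is an illustration of the stated rule with the threshold of 0.2, not the disclosed implementation, and the five-component form of Pvf1 is an assumption:

```python
# Reproducing the binarized vectors above from the filtered ones (threshold 0.2).
# Pvf1 is assumed to be {0.0, 0.0, 0.0, 0.0, 1.0}; the source shows a truncated vector.
filtered = [
    [0.0, 0.0, 0.00, 0.0, 1.00],   # Pvf1 (assumed five components)
    [0.0, 0.0, 0.28, 0.0, 0.72],   # Pvf2
    [0.0, 0.0, 0.26, 0.0, 0.74],   # Pvf3
    [0.0, 0.0, 0.39, 0.0, 0.71],   # Pvf4
    [0.0, 0.45, 0.14, 0.0, 0.41],  # Pvf5
]


def binarize(vec, threshold=0.2):
    return [1 if p >= threshold else 0 for p in vec]


binarized = [binarize(v) for v in filtered]
# -> [[0,0,0,0,1], [0,0,1,0,1], [0,0,1,0,1], [0,0,1,0,1], [0,1,0,0,1]]
```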
- The silent
- The silent interval detection module 206 is configured to detect one or more silent intervals in the video stream. In an embodiment of the present disclosure, during a silent interval, no activity is detected in the scanning zone for at least a threshold time duration. In an example, the threshold time duration may be set to 0.5 s, and a time interval of more than 0.5 s is marked as a 'silent interval' when the binary value of the class "empty scanner" of the corresponding image frames remains '1' during the entire time interval.
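- A minimal sketch of such a silent-interval detector is given below; the frame rate and the position of the "empty scanner" component as the last vector entry are assumptions of the sketch:

```python
# Sketch of the silent-interval rule: the 'empty scanner' bit stays 1 for more than 0.5 s.
# The frame rate and the 'empty scanner' position (last component) are assumptions.
from typing import List, Tuple


def silent_intervals(binarized: List[List[int]],
                     fps: float = 25.0,
                     min_duration_s: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) index pairs of detected silent intervals."""
    intervals, start = [], None
    sentinel = [[0] * len(binarized[0])] if binarized else []
    for i, vec in enumerate(binarized + sentinel):   # sentinel closes any open run
        if vec[-1] == 1:                             # 'empty scanner' active
            start = i if start is None else start
        else:
            if start is not None and (i - start) / fps > min_duration_s:
                intervals.append((start, i - 1))
            start = None
    return intervals
```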
- The compressed word composition module 207 is configured to generate a sequence of compressed words based on the binarized strings generated by the data filtering and binarization module 205. The compressed words are generated based on the transitions of classes from '1' to '0' and from '0' to '1' in consecutive image frames.
- Thus, the alphabet for five classes: ‘hand’, ‘object in hand’, ‘object’, ‘body part’, and ‘empty scanner’, contains the following letters:
- classHand up:H down:h
classHandObject up:Q down:q
classObject up:O down:o
classBodyPart up: B down: b
classEmptyScanner up: E down: e - In an embodiment of the present disclosure, two adjacent words are separated by at least one frame classified as “empty scanner”. This could represent or not a silent interval depending on the length of consecutive ‘1’ ‘empty scanner’ values.
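- The sketch below illustrates how such letters can be emitted from the "0->1" and "1->0" transitions of the binarized vectors; the grouping of letters into words separated by "empty scanner" frames is omitted for brevity, and the class order (hand, object in hand, object, body part, empty scanner) is an assumption of the sketch:

```python
# Sketch of the transition-to-letter encoding using the 10-letter alphabet above.
# Grouping letters into words separated by "empty scanner" frames is omitted for brevity.
UP   = {0: "H", 1: "Q", 2: "O", 3: "B", 4: "E"}   # "0 -> 1" transitions (beginning letters)
DOWN = {0: "h", 1: "q", 2: "o", 3: "b", 4: "e"}   # "1 -> 0" transitions (ending letters)


def transitions_to_letters(binarized):
    letters = []
    for prev, cur in zip(binarized, binarized[1:]):
        for cls, (p, c) in enumerate(zip(prev, cur)):
            if p == 0 and c == 1:
                letters.append(UP[cls])
            elif p == 1 and c == 0:
                letters.append(DOWN[cls])
    return "".join(letters)


# For the binarized example vectors Pvb1..Pvb5 above, this yields "OQo".
```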
- An example of quantized output generated by the compressed
word composition module 207 is represented below: - Silence
- OoE
- Silence
- OQoOqBobE
- Silence
- The sequence
dependent classifier 208 is configured to receive the quantized output from the compressed word composition module 207, and to extract one or more scan actions from the continuous sequence of transitions represented as alphabet letters. The sequence dependent classifier 208 includes a machine-learning-based engine, which, as used herein, relates to an engine that can study algorithms and statistical models and use them to perform a specific task effectively without explicit instructions, relying on patterns and inference instead. Examples of the sequence dependent classifier 208 include a recurrent neural network (RNN), a K-nearest-neighbours (KNN) algorithm, a support vector machine (SVM) algorithm, and so forth. - The sequence
dependent classifier 208 analyzes the sequence of compressed words to recognize the human activity from the video stream. The sequence of compressed words is analyzed in order to determine the various transitions of the classes in the region of interest; this determination of the class transitions leads to the recognition of the human activity from the video stream. The sequence dependent classifier 208 recognizes transitions of the binarized input signal which suggest basic actions.
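- The disclosure names an RNN, KNN, or SVM as possible sequence dependent classifiers; the following character-level LSTM is one hedged sketch of how the letter sequences could be scored, with the framework, the embedding and hidden sizes, and the action labels chosen purely for illustration:

```python
# Minimal character-level LSTM sketch for classifying compressed words into actions.
# The architecture, sizes, and action labels here are illustrative assumptions only.
import torch
import torch.nn as nn

ALPHABET = "HhQqOoBbEe"                     # 2*N letters for N = 5 classes
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}


class WordSequenceClassifier(nn.Module):
    def __init__(self, num_actions: int, embed_dim: int = 16, hidden_dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(len(ALPHABET), embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_actions)

    def forward(self, word: str) -> torch.Tensor:
        idx = torch.tensor([[CHAR_TO_IDX[c] for c in word]])   # shape (1, sequence_length)
        _, (h_n, _) = self.lstm(self.embed(idx))
        return self.head(h_n[-1])                              # logits over the assumed actions


# Example: score the word "OQoOqBobE" against two assumed labels such as scan / no-scan.
model = WordSequenceClassifier(num_actions=2)
logits = model("OQoOqBobE")
```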
- Thus, the quantized signature generation module 204 provides a quantization process for the input signals coming from the classifier 202 observing a region of interest where an activity takes place. The method for transition quantization aims to reduce the influence of time variation and of the variety of body-part movements in activity recognition using the sequence dependent classifier 208.
FIG. 3 is a flowchart illustrating amethod 300 for recognizing human activity from a video stream, in accordance with an embodiment of the present disclosure. Some steps may be discussed with respect to the system as shown inFIG. 2 . - At
step 302, an image frame of the video stream is classified by a classifier into one or more classes of a set of pre-defined classes, wherein the image frame is classified based on user action in a region of interest of the image frame. In an embodiment of the present disclosure, the classifier is a convolutional neural network. In another embodiment of the present disclosure, the set of pre-defined classes for a self-checkout (SCO) scanning zone includes classes such as hand, object in hand, object, body part, and empty scanner. - At
step 304, a class probability vector is generated for the image frame based on the classification, wherein the class probability vector includes a set of probabilities of classification of the image frame in each pre-defined class. In an example, a class probability vector Pv is represented by: -
P v ={P Hand ,P HandObject ,P Object ,P BodyPart ,P EmptyScanner} - Where PHand=Probability of the image frame to be classified in class ‘hand’
PHandObject=Probability of the image frame to be classified in class ‘object in hand’
PObject=Probability of the image frame to be classified in class ‘object’
PBodyPart=Probability of the image frame to be classified in class ‘body part”
PEmptyScanner=Probability of the image frame to be classified in class ‘empty scanner” - At
step 306, each probability value of the class probability vector is binarized based on a pre-defined probability threshold value. In an example, each component of a probability vector is assigned a value “1” if it is equal to or greater than the heuristic threshold value, else “0”. - At
- At step 308, one or more transitions of one or more classes are determined in one or more consecutive image frames of the video stream, based on the corresponding binarized probability vectors. For example, if in a first image frame the binary value corresponding to the class 'object' is '0', and in the second image frame the binary value corresponding to the class 'object' is '1', then a transition of that class has occurred from the first to the second image frame.
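- The following hedged sketch illustrates step 308 for a single pair of consecutive binarized vectors; the '+'/'-' markers used to denote the two transition directions are an illustrative choice, not notation from the disclosure.

```python
# Hedged sketch of step 308: detecting per-class transitions between two
# consecutive binarized vectors. "+" marks a 0->1 transition, "-" a 1->0 one.
def transitions(prev, curr):
    """Compare two binarized class vectors from consecutive frames."""
    changes = {}
    for cls in prev:
        if prev[cls] == 0 and curr[cls] == 1:
            changes[cls] = "+"   # class appeared in the region of interest
        elif prev[cls] == 1 and curr[cls] == 0:
            changes[cls] = "-"   # class disappeared from the region of interest
    return changes

frame_1 = {"hand": 0, "object_in_hand": 0, "object": 0,
           "body_part": 0, "empty_scanner": 1}
frame_2 = {"hand": 1, "object_in_hand": 1, "object": 0,
           "body_part": 0, "empty_scanner": 0}
print(transitions(frame_1, frame_2))
# {'hand': '+', 'object_in_hand': '+', 'empty_scanner': '-'}
```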
- At step 310, a sequence of compressed words is generated based on the determined one or more transitions in the one or more consecutive image frames. The compressed words are generated based on the transitions of classes from '1' to '0' and from '0' to '1' in consecutive image frames. In an embodiment of the present disclosure, a compressed word is formed from the letters of an alphabet containing a number of letters equal to twice the number of pre-defined classes. Further, consecutive compressed words of the sequence are separated by at least one frame of non-activity. In an example, if the number of classes is 5, then each word is composed from an alphabet of 10 letters in total. For each class, a '0->1' transition generates a specific 'beginning' letter (e.g. 'O' for the class Object), while a '1->0' transition generates an 'ending' letter (e.g. 'o' for the class Object).
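- The sketch below illustrates step 310 under stated assumptions: only the letters 'O'/'o' for the class Object are given in the disclosure, so the remaining letter assignments are invented for the example, and a frame step with no transitions is treated, for simplicity, as the frame of non-activity that closes the current word.

```python
# Hedged sketch of step 310: turning per-frame transitions into letters and
# grouping them into compressed words. Only 'O'/'o' for the class 'object'
# comes from the disclosure; the other letter assignments are assumptions.
BEGIN = {"hand": "H", "object_in_hand": "Q", "object": "O",
         "body_part": "B", "empty_scanner": "E"}
END = {cls: letter.lower() for cls, letter in BEGIN.items()}  # 10 letters total

def compressed_words(binarized_frames):
    """Emit one compressed word per burst of activity in a frame sequence."""
    words, current = [], ""
    prev = binarized_frames[0]
    for curr in binarized_frames[1:]:
        letters = ""
        for cls in prev:
            if prev[cls] == 0 and curr[cls] == 1:
                letters += BEGIN[cls]   # 0 -> 1: "beginning" letter
            elif prev[cls] == 1 and curr[cls] == 0:
                letters += END[cls]     # 1 -> 0: "ending" letter
        if letters:
            current += letters
        elif current:                   # simplified non-activity rule closes the word
            words.append(current)
            current = ""
        prev = curr
    if current:
        words.append(current)
    return words

# Example with three frames: empty scanner -> object appears -> object removed.
f0 = {"hand": 0, "object_in_hand": 0, "object": 0, "body_part": 0, "empty_scanner": 1}
f1 = {"hand": 0, "object_in_hand": 0, "object": 1, "body_part": 0, "empty_scanner": 0}
f2 = dict(f0)
print(compressed_words([f0, f1, f2]))  # ['OeoE'] under the assumed letter map
```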
- At step 312, one or more user actions are extracted based on an analysis of the sequence of compressed words by a sequence dependent classifier. The one or more user actions may be used to recognize human activity in the SCO scan area (the scanning action region of interest), and the recognition results may be transmitted to a user computing device. In some embodiments, the user computing device may be configured to store or display the recognition results. In an embodiment of the present disclosure, the sequence dependent classifier is a recurrent neural network.
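- As a toy stand-in for the sequence dependent classifier of step 312 (which in the disclosure is a trained model such as a recurrent neural network), the sketch below extracts scan actions from a word sequence with a single hand-written rule; the rule and the letter meanings beyond 'O'/'o' for the class Object are illustrative only.

```python
# Hedged sketch of step 312 using a hand-written rule instead of a trained
# sequence dependent classifier: a compressed word is counted as a scan
# action if an object both appeared ('O') and disappeared ('o') within it.
def extract_scan_actions(words):
    """Return the compressed words that look like complete scan gestures."""
    return [w for w in words if "O" in w and "o" in w]

sequence = ["OoE", "Bb", "OQoOqBobE"]
print(extract_scan_actions(sequence))  # ['OoE', 'OQoOqBobE']
```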
- The present disclosure also relates to software products recorded on machine-readable non-transient data storage media, wherein the software products are executable upon computing hardware to implement methods of recognizing human activity from a video stream.
- Modifications to embodiments of the invention described in the foregoing are possible without departing from the scope of the invention as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "consisting of", "have", "is" used to describe and claim the present invention are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. Numerals included within parentheses in the accompanying claims are intended to assist understanding of the claims and should not be construed in any way to limit subject matter claimed by these claims.
Claims (19)
Priority Applications (11)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/458,288 US10902247B1 (en) | 2019-07-01 | 2019-07-01 | Quantized transition change detection for activity recognition |
| EP20726560.4A EP3994603A1 (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| JP2021578060A JP7285973B2 (en) | 2019-07-01 | 2020-05-12 | Quantized Transition Change Detection for Activity Recognition |
| PCT/IB2020/054488 WO2021001702A1 (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| CA3141958A CA3141958A1 (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| CN202080046269.7A CN114008693B (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| KR1020227001315A KR102783240B1 (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| MX2021015584A MX2021015584A (en) | 2019-07-01 | 2020-05-12 | QUANTIFIED TRANSITION CHANGE DETECTION FOR ACTIVITY RECOGNITION. |
| AU2020298842A AU2020298842B2 (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| BR112021024260A BR112021024260A2 (en) | 2019-07-01 | 2020-05-12 | Quantized transition change detection for activity recognition |
| CONC2021/0016435A CO2021016435A2 (en) | 2019-07-01 | 2021-12-01 | Quantized transition change detection for activity recognition |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/458,288 US10902247B1 (en) | 2019-07-01 | 2019-07-01 | Quantized transition change detection for activity recognition |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20210004575A1 true US20210004575A1 (en) | 2021-01-07 |
| US10902247B1 US10902247B1 (en) | 2021-01-26 |
Family
ID=70740723
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/458,288 Active 2039-10-10 US10902247B1 (en) | 2019-07-01 | 2019-07-01 | Quantized transition change detection for activity recognition |
Country Status (11)
| Country | Link |
|---|---|
| US (1) | US10902247B1 (en) |
| EP (1) | EP3994603A1 (en) |
| JP (1) | JP7285973B2 (en) |
| KR (1) | KR102783240B1 (en) |
| CN (1) | CN114008693B (en) |
| AU (1) | AU2020298842B2 (en) |
| BR (1) | BR112021024260A2 (en) |
| CA (1) | CA3141958A1 (en) |
| CO (1) | CO2021016435A2 (en) |
| MX (1) | MX2021015584A (en) |
| WO (1) | WO2021001702A1 (en) |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8405531B2 (en) * | 2010-08-31 | 2013-03-26 | Mitsubishi Electric Research Laboratories, Inc. | Method for determining compressed state sequences |
| US10223580B2 (en) | 2013-03-26 | 2019-03-05 | Disney Enterprises, Inc. | Methods and systems for action recognition using poselet keyframes |
| CN104766038B (en) * | 2014-01-02 | 2018-05-18 | 株式会社理光 | The recognition methods of palm opening and closing movement and device |
| US9524426B2 (en) * | 2014-03-19 | 2016-12-20 | GM Global Technology Operations LLC | Multi-view human detection using semi-exhaustive search |
| US20150294143A1 (en) * | 2014-04-10 | 2015-10-15 | GM Global Technology Operations LLC | Vision based monitoring system for activity sequency validation |
| KR101720781B1 (en) * | 2015-12-21 | 2017-03-28 | 고려대학교 산학협력단 | Apparatus and method for prediction of abnormal behavior of object |
| EP3321844B1 (en) * | 2016-11-14 | 2021-04-14 | Axis AB | Action recognition in a video sequence |
- 2019
  - 2019-07-01 US US16/458,288 patent/US10902247B1/en active Active
- 2020
  - 2020-05-12 WO PCT/IB2020/054488 patent/WO2021001702A1/en not_active Ceased
  - 2020-05-12 CN CN202080046269.7A patent/CN114008693B/en active Active
  - 2020-05-12 AU AU2020298842A patent/AU2020298842B2/en active Active
  - 2020-05-12 CA CA3141958A patent/CA3141958A1/en active Pending
  - 2020-05-12 MX MX2021015584A patent/MX2021015584A/en unknown
  - 2020-05-12 EP EP20726560.4A patent/EP3994603A1/en active Pending
  - 2020-05-12 KR KR1020227001315A patent/KR102783240B1/en active Active
  - 2020-05-12 JP JP2021578060A patent/JP7285973B2/en active Active
  - 2020-05-12 BR BR112021024260A patent/BR112021024260A2/en not_active Application Discontinuation
- 2021
  - 2021-12-01 CO CONC2021/0016435A patent/CO2021016435A2/en unknown
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20120320199A1 (en) * | 2011-06-06 | 2012-12-20 | Malay Kundu | Notification system and methods for use in retail environments |
| US20130051662A1 (en) * | 2011-08-26 | 2013-02-28 | Canon Kabushiki Kaisha | Learning apparatus, method for controlling learning apparatus, detection apparatus, method for controlling detection apparatus and storage medium |
| US20150294192A1 (en) * | 2014-04-10 | 2015-10-15 | Disney Enterprises, Inc. | Multi-level framework for object detection |
| US20150379497A1 (en) * | 2014-06-27 | 2015-12-31 | Miguel Florez | System, device, and method for self-checkout shopping |
| US20200302029A1 (en) * | 2016-03-30 | 2020-09-24 | Covenant Eyes, Inc. | Applications, Systems and Methods to Monitor, Filter and/or Alter Output of a Computing Device |
| US20180096567A1 (en) * | 2016-09-18 | 2018-04-05 | Stoplift, Inc. | Non-Scan Loss Verification at Self-Checkout Terminal |
| US20200258067A1 (en) * | 2019-02-11 | 2020-08-13 | Everseen Limited | System and method for operating an sco surface area of a retail store |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US11776319B2 (en) * | 2020-07-14 | 2023-10-03 | Fotonation Limited | Methods and systems to predict activity in a sequence of images |
| US12249184B2 (en) | 2020-07-14 | 2025-03-11 | Tobii Technologies Limited | Methods and systems to predict activity in a sequence of images |
| WO2023013879A1 (en) * | 2021-08-03 | 2023-02-09 | Samsung Electronics Co., Ltd. | A method and system for tracking at least one action of the user(s) for overcoming occlusion |
Also Published As
| Publication number | Publication date |
|---|---|
| BR112021024260A2 (en) | 2022-01-11 |
| KR102783240B1 (en) | 2025-03-17 |
| AU2020298842B2 (en) | 2023-08-17 |
| US10902247B1 (en) | 2021-01-26 |
| CO2021016435A2 (en) | 2021-12-10 |
| JP7285973B2 (en) | 2023-06-02 |
| MX2021015584A (en) | 2022-01-31 |
| WO2021001702A1 (en) | 2021-01-07 |
| CA3141958A1 (en) | 2021-01-07 |
| KR20220017506A (en) | 2022-02-11 |
| EP3994603A1 (en) | 2022-05-11 |
| CN114008693B (en) | 2025-07-25 |
| CN114008693A (en) | 2022-02-01 |
| JP2022540069A (en) | 2022-09-14 |
| AU2020298842A1 (en) | 2021-12-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Ehatisham-Ul-Haq et al. | C2FHAR: Coarse-to-fine human activity recognition with behavioral context modeling using smart inertial sensors | |
| Jyoti et al. | Expression empowered residen network for facial action unit detection | |
| Tereikovska et al. | Recognition of emotions by facial Geometry using a capsule neural network | |
| US20210232855A1 (en) | Movement state recognition model training device, movement state recognition device, methods and programs therefor | |
| US10902247B1 (en) | Quantized transition change detection for activity recognition | |
| KR102736894B1 (en) | Systems and methods for determining actions performed by objects within an image | |
| Sharif et al. | Human gait recognition using deep learning: A comprehensive review | |
| Angel et al. | Faster Region Convolutional Neural Network (FRCNN) Based Facial Emotion Recognition. | |
| Ezzeldin et al. | Survey on multimodal complex human activity recognition | |
| Kadhim et al. | Enhanced dynamic hand gesture recognition for finger disabilities using deep learning and an optimized Otsu threshold method | |
| Gaikwad et al. | Fusion of vision based features for human activity recognition | |
| Huynh-The et al. | Visualizing inertial data for wearable sensor based daily life activity recognition using convolutional neural network | |
| Ansari et al. | Using postural data and recurrent learning to monitor shoplifting activities in megastores | |
| Jo et al. | Facial Emotion Recognition Using Canny Edge Detection Operator and Histogram of Oriented Gradients | |
| Hashem et al. | Human gait identification system based on transfer learning | |
| Pandey et al. | An efficient algorithm for sign language recognition | |
| Ansari et al. | Decoding Human Activities: Algorithms, Frameworks, and Challenges in Recognition Systems | |
| Moharkan | Comprehensive Survey on Body Language Decoder | |
| Chandragiri et al. | Recognizing human actions in video using motion history image and deep learning | |
| Sánchez-González | by Computer Vision | |
| Merikapudi et al. | Domain human recognition techniques using deep learning | |
| Harish et al. | Emotion Recognition Model Based on Visual Cues and Explainable AI Using Facial Expression Video | |
| Dash et al. | 16 usage of convolutional neural networks in real‐time facial emotion detection | |
| Ninh | Human activity recognition based on IMU sensors using a combination of convolutional neural network and multi-head attention | |
| Sheril Angel et al. | Faster Region Convolutional Neural Network (FRCNN) Based Facial Emotion Recognition. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: EVERSEEN LIMITED, IRELAND Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PESCARU, DAN;CERNAZANU-GLAVAN, COSMIN;GUI, VASILE;REEL/FRAME:049649/0796 Effective date: 20190604 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |