CN111652043A - Object state identification method and device, image acquisition equipment and storage medium - Google Patents

Object state identification method and device, image acquisition equipment and storage medium

Info

Publication number
CN111652043A
CN111652043A (application number CN202010295399.6A)
Authority
CN
China
Prior art keywords
image
state
tracking
sequence
image sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010295399.6A
Other languages
Chinese (zh)
Inventor
谢存煌
杨蒙昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202010295399.6A priority Critical patent/CN111652043A/en
Publication of CN111652043A publication Critical patent/CN111652043A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; scene-specific elements
    • G06V20/40 Scenes; scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

The application discloses an object state identification method and device, an image acquisition device, and a storage medium. The method includes: acquiring video frames for analysis; determining a plurality of objects from the video frames and obtaining a tracking image sequence of each object; determining the state response image sequence contained in the tracking image sequence, and selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation; and performing state recognition on the images to be distinguished to obtain a state recognition result for each object. By quickly selecting an object's state response image sequence from the video frames and then, based on temporal correlation, selecting several key frames for state recognition, the method saves computing resources and improves both computing efficiency and recognition accuracy.

Description

Object state identification method and device, image acquisition equipment and storage medium
Technical Field
The application relates to the field of image recognition, and in particular to an object state recognition method and device, an image acquisition device, and a storage medium.
Background
With the rapid development of science and technology, many retail and service venues have become fully unmanned, such as unmanned stores and self-service restaurants. These smart stores save a great deal of labor and bring great convenience to people's lives, but the absence of direct human management poses significant challenges to their operation and safety. In a self-service restaurant, kitchen hygiene is a major concern for both merchants and customers, yet merchants cannot monitor the behavior of kitchen staff in real time during actual work; unsupervised staff may smoke or make phone calls. In an unmanned store, for example, a fire might not be discovered in time, which is undoubtedly a safety hazard for merchants and customers alike.
To address such situations, smart cameras are usually installed in smart stores, which raise an alarm when an abnormal situation is detected. However, existing smart cameras have many shortcomings: their detection mode is limited, and only the current frame captured by the camera is analyzed.
Disclosure of Invention
In view of the above, the present application provides an object state recognition method and device, an image acquisition device, and a storage medium that overcome, or at least partially solve, the above problems.
According to an aspect of the present application, there is provided an object state recognition method, including:
acquiring a video frame for analysis;
determining a plurality of objects according to the video frame, and obtaining a tracking image sequence of each object;
determining a state response image sequence contained in the tracking image sequence, and selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation;
and performing state recognition on the image to be distinguished to obtain a state recognition result of each object.
Optionally, in the method, determining a plurality of objects according to the video frame, and obtaining a tracking image sequence of each object includes:
generating a detection image for each video frame by reducing the image size;
determining a first object region included in the detection image;
cutting out a second object area corresponding to the first object area from the video frame as a tracking image;
and determining the object corresponding to each tracking image according to the tracking features extracted from the tracking images, thereby obtaining the tracking image sequence of each object.
Optionally, in the above method, determining a state response image sequence included in the tracking image sequence includes:
determining the state response image sequence contained in the tracking image sequence when the tracking image sequence matches a preset state identification interval.
Optionally, in the above method, determining a state response image sequence included in the tracking image sequence includes:
performing state recognition on each tracking image in the tracking image sequence based on a coarse state recognition model, and determining the state response images among the tracking images according to the state recognition results.
Optionally, in the method, selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation includes:
dividing the state response image sequence into a plurality of subsequences according to the frame sequence interval;
determining the discrimination weight of each subsequence according to a preset rule;
and determining the images to be distinguished according to the discrimination weight and the response confidence of each state response image in the subsequence.
Optionally, in the method, selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation includes:
calculating the discrimination weight of each state response image using a Gaussian distribution function and the response confidence of the state response image;
and selecting the images to be distinguished from the state response image sequence based on the discrimination weights.
Optionally, in the method, performing state recognition on the image to be distinguished to obtain a state recognition result of each object includes:
performing state recognition on the images to be distinguished through a fine state recognition model built into the image acquisition device, to obtain the state recognition result of each object;
or alternatively,
uploading the images to be distinguished to a server, so that the server performs state recognition on them, and receiving the state recognition result of each object returned by the server.
According to another aspect of the present application, there is provided an object state recognition apparatus including:
an acquisition unit configured to acquire a video frame for analysis;
a data processing unit, configured to determine a plurality of objects according to the video frames and obtain a tracking image sequence of each object; and configured to determine the state response image sequence contained in the tracking image sequence and select several frames from the state response image sequence as images to be distinguished based on temporal correlation;
and the identification unit is used for carrying out state identification on the image to be distinguished to obtain the state identification result of each object.
Optionally, in the apparatus, the data processing unit is configured to generate a detection image for each video frame by reducing the image size; determine a first object region included in the detection image; cut out a second object region corresponding to the first object region from the video frame as a tracking image; and determine the object corresponding to each tracking image according to the tracking features extracted from the tracking images, thereby obtaining the tracking image sequence of each object.
Optionally, in the above apparatus, the data processing unit is configured to determine a state response image sequence included in the tracking image sequence when the tracking image sequence matches a preset state identification interval.
Optionally, in the apparatus, the data processing unit is configured to perform state recognition on each tracking image in the tracking image sequence based on a coarse state recognition model, and to determine the state response images among the tracking images according to the state recognition results.
Optionally, in the apparatus, the data processing unit is configured to divide the state response image sequence into a plurality of sub-sequences according to a frame sequence interval; determining the discrimination weight of each subsequence according to a preset rule; and determining the image to be distinguished according to the distinguishing weight and the response confidence of each state response image in the subsequence.
Optionally, in the apparatus, the data processing unit is configured to calculate the discrimination weight of each state response image using a Gaussian distribution function and the response confidence of the state response image; and to select the images to be distinguished from the state response image sequence based on the discrimination weights.
Optionally, in the apparatus, the identification unit is configured to perform state recognition on the images to be distinguished through a fine state recognition model built into the image acquisition device, so as to obtain the state recognition result of each object; or to upload the images to be distinguished to a server, so that the server performs state recognition on them, and to receive the state recognition result of each object returned by the server.
According to still another aspect of the present application, an image acquisition device is provided, including: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform any of the methods above.
According to yet another aspect of the application, a computer-readable storage medium is provided, which stores one or more programs that, when executed by a processor, implement any of the methods above.
According to the technical solution of the present application, video frames for analysis are acquired; a plurality of objects are determined from the video frames, and a tracking image sequence of each object is obtained; the state response image sequence contained in the tracking image sequence is determined, and several frames are selected from it as images to be distinguished based on temporal correlation; and state recognition is performed on the images to be distinguished to obtain the state recognition result of each object. By quickly selecting an object's state response image sequence from the video frames and then selecting several key frames for state recognition based on temporal correlation, the method saves computing resources and improves both computing efficiency and recognition accuracy.
The foregoing is only an overview of the technical solution of the present application. To make the technical means of the present application clearer so that it can be implemented according to the content of this description, and to make the above and other objects, features, and advantages of the present application easier to understand, the detailed description of the present application is given below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of an object state identification method according to an embodiment of the present application;
FIG. 2 shows a schematic flow diagram of an object state identification method according to another embodiment of the present application;
FIG. 3 shows a schematic structural diagram of an object state recognition apparatus according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of an image acquisition device according to an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a schematic flowchart of an object state identification method according to an embodiment of the present application, where the method includes:
step S110, a video frame for analysis is acquired.
At present, more and more smart stores are appearing in people's daily lives, such as unmanned shopping malls and self-service restaurants without service staff. Smart stores bring convenience to people's lives, but they also carry potential risks: customers in an unmanned shopping mall may behave inappropriately, for example by damaging items, and kitchen staff in a self-service restaurant may behave inappropriately, for example by smoking or making phone calls, all of which challenge merchants' management.
First, video frames for analysis are acquired. This can be realized by the acquisition module of a camera, which captures a video segment for analysis; the video segment is composed of a sequence of still images, and each still image is called a video frame.
It should be noted that the loss of one or several frames because the camera is accidentally bumped or stalls does not affect the implementation of the present application; in such a case the lost frames can simply be ignored.
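By way of illustration, the following is a minimal sketch of this acquisition step, assuming OpenCV (cv2) as the capture library; the application itself does not prescribe a specific library. Unreadable frames are simply skipped, consistent with the note above.

```python
import cv2

def acquire_frames(source=0, max_frames=200):
    """Yield up to max_frames video frames from a camera index or video path."""
    cap = cv2.VideoCapture(source)
    grabbed, misses = 0, 0
    while grabbed < max_frames and misses < 30:
        ok, frame = cap.read()
        if not ok:
            misses += 1          # a lost frame is simply ignored
            continue
        misses = 0
        grabbed += 1
        yield frame              # one still image, i.e. one video frame
    cap.release()
```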
Step S120, determining a plurality of objects according to the video frame, and obtaining a tracking image sequence of each object.
A plurality of objects are determined in the video frames to be analyzed; an object may be a natural person, a physical item, or even the environment. Multiple objects may be determined and analyzed simultaneously; for example, if 3 people appear in a video frame, 3 objects are determined.
After an object is determined, a tracking image sequence of the object is obtained through visual object tracking. Each frame in the tracking image sequence contains at least part of the object. The tracking image sequence may consist of one or several video frames (each either a complete video frame or a region cropped from it that corresponds to the object). When the sequence consists of several video frames, those frames may be temporally continuous or discontinuous. For example, suppose the analyzed object is a person who stands beside a table, suddenly squats down, and then stands up again: several frames captured by the camera contain the person, the person then disappears for a frame or two, and the person reappears in subsequent frames. During visual object tracking, the frames containing the person can be assembled in recorded time order to form the person's tracking image sequence, omitting the one or two frames in which the person was crouching out of view below the table.
Visual object tracking is an important direction in computer vision, and the tracking algorithm in this application can be single-object or multi-object tracking. The single-object tracking task is to predict the size and position of a target in subsequent frames, given its size and position in the initial frame of a video sequence. The main task of Multiple Object Tracking (MOT) is, given an image sequence, to find the moving objects in it, to match the moving objects across frames one to one, and then to give the motion trajectories of the different objects. Single-object tracking can be realized through appearance modeling or motion modeling of the target, so as to handle illumination changes, deformation, occlusion, and similar problems. Compared with single-object tracking, multi-object tracking can track several analyzed objects simultaneously, but beyond the problems of single-object tracking it may also face frequent occlusion, unknown track start and end times, very small targets, similar appearance, interaction between targets, and low frame rates, so multi-object tracking is more complicated.
The visual object tracking method may adopt any one or more of the existing methods, including but not limited to generative model methods and discriminative model methods. A generative model method models the target region in the current frame and finds the region most similar to the model in the next frame as the predicted position; examples include Kalman filtering and particle filtering. For instance, if it is known from the current frame that the target region is 80% red and 20% green, a search algorithm looks throughout the next frame for the region that best matches that color ratio.
The classical architecture of discriminative model methods combines image features with machine learning: in the current frame, the target region is taken as a positive sample and the background as negative samples, a classifier is trained with a machine learning method, and in the next frame the trained classifier finds the optimal region. Classic discriminative methods include the Struck (structured output tracking) algorithm and TLD (Tracking-Learning-Detection). Correlation filter (CF) tracking is currently an active research area; correlation filter tracking algorithms include, but are not limited to, CSK, KCF/DCF, and CN, and because they compute quickly and are highly accurate, this application recommends them as the preferred method.
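By way of illustration of the recommended correlation filter approach, the sketch below tracks one object with OpenCV's KCF tracker. It assumes opencv-contrib-python is installed; depending on the OpenCV version, the factory function lives at the top level or under cv2.legacy. The application does not mandate this particular library.

```python
import cv2

def track_object(frames, init_bbox):
    """Track one object through a list of frames with a KCF correlation filter.

    init_bbox is (x, y, w, h) in the first frame; frames where tracking
    fails yield None, which the caller may skip when assembling the
    tracking image sequence.
    """
    factory = getattr(cv2, "TrackerKCF_create", None) \
        or cv2.legacy.TrackerKCF_create
    tracker = factory()
    tracker.init(frames[0], init_bbox)
    boxes = [init_bbox]
    for frame in frames[1:]:
        ok, bbox = tracker.update(frame)
        boxes.append(tuple(int(v) for v in bbox) if ok else None)
    return boxes
```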
Step S130, determining a state response image sequence contained in the tracking image sequence, and selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation.
Each tracking image sequence corresponds to one object to be analyzed, and each frame in it contains at least part of that object. A preliminary state detection is performed on the object. If the object is a natural person and it is confirmed that the object shows no abnormal behavior, the state of the object corresponding to the tracking image sequence is confirmed as normal, and the tracking image sequence needs no further processing. When the object is confirmed to show abnormal behavior, such as damaging items or smoking, the frame or frames of the tracking image sequence in which the object has a certain specific motion feature are taken as the state response image sequence.
The abnormal behavior can be determined with one or more prior-art methods. For example, whether the behavior of the object in the image is abnormal can be determined by preset abnormal-behavior rules, which characterize the association between the behavior of the analyzed object and specific behaviors. Specifically, behavior feature recognition is performed on the object's behavior; if the behavior is associated with a specific behavior, it is considered abnormal. Specific behaviors include, but are not limited to, smoking, making a phone call, and stealing items. Further, behavior feature recognition matches the action features in the object's behavior against the action features of specific behaviors, and the result determines whether the object's behavior is associated with a specific behavior. For example, if a specific behavior includes the action feature of bringing a hand to the mouth, and a certain object's behavior contains that action feature, the object's behavior is considered associated with that specific behavior and the object's state is considered abnormal. To save computing resources and improve computing efficiency, the application recommends a lightweight algorithm that only preliminarily identifies the object's state: it merely judges whether the state of the object in the tracking image sequence, or in each of its frames, is normal or abnormal, without determining what the abnormal behavior specifically is.
After the state response image sequence of an object is determined, several frames are selected from it as images to be distinguished, based on temporal correlation, for the final state recognition. In state recognition, temporal correlation is an information feature that cannot be ignored; using temporal context information can bring a large gain to state recognition and thus markedly improve its accuracy.
In this embodiment, temporal correlation is used to screen out, from the state response image sequence, one or more key frames in which the abnormal action features are especially prominent, and these serve as the images to be distinguished. This saves computing resources and improves computing efficiency, and it can also markedly improve recognition accuracy. For example, for a smoking behavior of the analyzed object, the single frame or several consecutive frames in which the object's hand is at the mouth can be screened out as the images to be distinguished.
The temporal correlation analysis method can be one of, or a combination of, several prior-art methods. For example, an extracted Histogram of Oriented Gradients (HOG) can be used as input, higher-level abstract features obtained by training a Deep Belief Network (DBN), and the trained DBN used to identify the human body region; finally, the temporal correlation of the region's centroid is used to judge whether the analyzed object exhibits abnormal behavior. Specifically, the acceleration of the centroid between two frames can be calculated; if the change in acceleration is larger than a set threshold, it is very likely that the analyzed object exhibits abnormal behavior in those two frames, and the two frames are taken as images to be distinguished. For example, a kitchen worker in a self-service restaurant may spit on the floor; spitting is accompanied by a bending motion, during which the position of the worker's centroid changes rapidly.
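As one hedged reading of the centroid criterion just described, the sketch below flags the frames where the centroid's second difference (its acceleration) exceeds a threshold; the threshold value is an assumption of this sketch and would have to be tuned per scene.

```python
import numpy as np

def centroid_acceleration_flags(centroids, threshold=15.0):
    """Flag frames where the tracked region's centroid accelerates sharply.

    centroids: (N, 2) array of per-frame (x, y) centroids of the human region.
    threshold: example value in pixels/frame^2, to be tuned per deployment.
    Returns a boolean array of length N marking candidate images to be
    distinguished.
    """
    c = np.asarray(centroids, dtype=float)
    velocity = np.diff(c, axis=0)             # motion between consecutive frames
    acceleration = np.diff(velocity, axis=0)  # change of motion, i.e. acceleration
    magnitude = np.linalg.norm(acceleration, axis=1)
    flags = np.zeros(len(c), dtype=bool)
    flags[2:] = magnitude > threshold         # frame i judged against i-1 and i-2
    return flags
```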
Step S140, performing state recognition on the images to be distinguished to obtain the state recognition result of each object.
As mentioned above, after the images to be distinguished are determined, fine attribute recognition may be performed on them; the method may follow the approach used above for determining the state response image sequence contained in the tracking image sequence, but with stricter accuracy requirements. The state of each object may be normal or abnormal. An alarm system can be set up in advance to emit a loud warning sound when an abnormal situation is encountered; it can also be connected to an intelligent fire-extinguishing device, so that when a fire occurs the extinguishing device can be started to put it out in time.
The method shown in FIG. 1 quickly selects the state response image sequence of each object from the video frames and then selects several key frames for state recognition based on temporal correlation.
In an embodiment of the present application, in the method, determining a plurality of objects according to the video frames and obtaining a tracking image sequence of each object includes: generating a detection image for each video frame by reducing the image size; determining a first object region included in the detection image; cutting out a second object region corresponding to the first object region from the video frame as a tracking image; and determining the object corresponding to each tracking image according to the tracking features extracted from the tracking images, thereby obtaining the tracking image sequence of each object.
For example, if the monitoring camera's acquisition module obtains 720p video frame data, each 720p video frame is first reduced to a small 300 × 300 pixel image that serves as the detection image for that video frame.
If the object is a natural person, the first object region may be a rectangular region containing the person, or a human-shaped region obtained with human-shape detection means. After the first object region is obtained, a corresponding second object region can be cut out of the video frame corresponding to the detection image, based on the first object region's information. The second object region contains the same object as the first object region; it is rectangular where the first object region is rectangular, and human-shaped where the first object region is human-shaped. The second object region is the region of interest (ROI) of this embodiment and serves as the tracking image. If several objects to be analyzed, such as several natural persons, exist in the same detection image, all object regions corresponding to all of the objects are determined.
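A minimal sketch of this reduce-detect-crop flow is given below, assuming OpenCV for resizing; the `detector` callable is a hypothetical stand-in for whatever object detector is used, since the application does not name one.

```python
import cv2

DET_SIZE = (300, 300)  # detection-image size from the example above

def detect_and_crop(frame, detector):
    """Detect on a reduced image, then cut matching regions from the
    full-resolution frame as tracking images.

    detector(small) is assumed to return first object regions as
    (x, y, w, h) boxes in detection-image coordinates.
    """
    h, w = frame.shape[:2]
    small = cv2.resize(frame, DET_SIZE)         # detection image
    sx, sy = w / DET_SIZE[0], h / DET_SIZE[1]   # scale back to the full frame
    tracking_images = []
    for (x, y, bw, bh) in detector(small):
        x0, y0 = int(x * sx), int(y * sy)       # second object region corners
        x1, y1 = int((x + bw) * sx), int((y + bh) * sy)
        tracking_images.append(frame[y0:y1, x0:x1].copy())
    return tracking_images
```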
After the tracking images are confirmed, the tracking features of the object in each tracking image are extracted to determine the object corresponding to each tracking image, and the tracking images of each object are collected in recorded time order to form that object's tracking image sequence. If one tracking image frame contains several objects, the same frame appears in the tracking image sequences of the different objects. Tracking feature extraction may use one or more prior-art techniques, including but not limited to the Benchmark and CSK algorithms.
This method of object confirmation greatly reduces the amount of data to be processed and can quickly and accurately lock onto multiple objects to be analyzed.
In one embodiment of the present application, in the above method, determining the state response image sequence contained in the tracking image sequence includes: determining the state response image sequence contained in the tracking image sequence when the tracking image sequence matches a preset state identification interval.
A given action of the analyzed object is usually not completed instantaneously, so the tracking image sequence is usually not one frame but several, a dozen, or even dozens of frames. A state identification interval can therefore be preset for different application scenarios, determined from the frame rate of the image acquisition device. If the frame rate is 20 frames/s, the state identification interval can be set to 10 frames, i.e. a tracking image sequence duration of 0.5 s. A tracking image sequence shorter than the state identification interval is too brief for an abnormal action or behavior to be carried out, so it can be directly judged to contain no state response image sequence.
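A sketch of this screening step, using the example values above (the helper names are this sketch's own):

```python
STATE_ID_INTERVAL = 10  # frames: 0.5 s at 20 frames/s, as in the example above

def may_contain_state_response(tracking_sequence):
    """A sequence shorter than the state identification interval is too brief
    for an abnormal action, so it is judged to contain no state response images."""
    return len(tracking_sequence) >= STATE_ID_INTERVAL

def screen_sequences(tracking_sequences):
    # reject tracking image sequences that do not match the preset interval
    return [s for s in tracking_sequences if may_contain_state_response(s)]
```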
By rejecting tracking image sequences shorter than the state identification interval, sequences containing no state response image sequence can be screened out quickly, which greatly reduces the amount of computation and markedly improves computing efficiency.
In one embodiment of the present application, in the above method, determining the state response image sequence contained in the tracking image sequence includes: performing state recognition on each tracking image in the tracking image sequence based on a coarse state recognition model, and determining the state response images among the tracking images according to the state recognition results.
When determining the state response image sequence from the tracking image sequence, in order to further increase the calculation speed, this embodiment recommends using a coarse state recognition model. The coarse state recognition model is still a state recognition method known in the art, and can for example be implemented with a lightweight deep neural network, but its precision is lower than that of fine attribute recognition: it can only recognize whether the state of the analyzed object is normal or abnormal, not what the abnormal behavior specifically is, such as smoking or spitting.
The coarse state recognition model in this embodiment further improves computing efficiency and reduces the workload of the computing device.
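The application leaves the coarse model open beyond "lightweight deep neural network". As one possible sketch, the code below wraps a MobileNetV2 backbone with a two-class (normal/abnormal) head; the backbone choice and the PyTorch framing are assumptions of this example, not part of the application.

```python
import torch.nn as nn
from torchvision import models

class CoarseStateClassifier(nn.Module):
    """Binary normal/abnormal classifier standing in for the coarse model.

    MobileNetV2 is only one possible lightweight backbone (an assumption
    of this sketch); the application requires nothing more specific.
    """
    def __init__(self):
        super().__init__()
        backbone = models.mobilenet_v2(weights=None)
        backbone.classifier[1] = nn.Linear(backbone.last_channel, 2)
        self.net = backbone

    def forward(self, x):                  # x: (B, 3, H, W) tracking images
        return self.net(x).softmax(dim=1)  # column 1 = response confidence
```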
In an embodiment of the present application, in the method, selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation includes: dividing the state response image sequence into a plurality of subsequences according to a frame sequence interval; determining the discrimination weight of each subsequence according to a preset rule; and determining the images to be distinguished according to the discrimination weight and the response confidence of each state response image in the subsequence.
In this embodiment, one or more frames are selected from the state response image sequence as the images to be distinguished for the next judgment. First, the state response image sequence is divided into a plurality of subsequences. A frame sequence interval can be preset, for example 5 frames, and the state response image sequence divided into subsequences of 5 frames each; if the number of images in the state response image sequence is not a multiple of 5, the remaining frames form one subsequence. The division can also be performed according to the results of the coarse state recognition model. Suppose the model judges each frame of the state response image sequence as abnormal or normal, represented by 1 and 0 respectively, giving the sequence ID_p1: 000010001101111. Partitioning by these coarse judgments yields 3 segments: the 5th frame is one subsequence, the 9th to 10th frames are a second, and the 12th to 15th frames are a third. Further, to ensure robustness, a tolerance parameter q (q ≥ 1) can be added during partitioning, allowing breakpoints of up to q frames inside a subsequence; with such a tolerance, the sequence ID_p1 splits into two subsequences: the 5th frame, and the 9th to 15th frames.
The discrimination weight of each subsequence is determined according to a preset rule, which can be set specifically for different scenes. Taking the second division above as an example, weights can be assigned based on subsequence length: the first segment has weight 1/7, the second segment 2/7, and the third segment 4/7.
After the weights are assigned, each frame in each subsequence is weighted accordingly, and the optimal frames are then selected in combination with the response confidence of each state response image. The optimal frame is the frame with the largest weighted value whose object state in the image is abnormal, or the several frames whose weighted values exceed a certain threshold; this frame or these frames are determined as the images to be distinguished.
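The partition-weight-select procedure just described can be sketched as follows; the function names and the top_k parameter are this sketch's own, and with no tolerance (q = 0) the weights reproduce the 1/7, 2/7, 4/7 of the example above.

```python
def split_with_tolerance(flags, q=1):
    """Split a 0/1 coarse-recognition sequence into abnormal subsequences,
    tolerating gaps of up to q normal frames inside a subsequence.
    Indices are 0-based here; the text above counts frames from 1."""
    subsequences = []
    for i, flag in enumerate(flags):
        if not flag:
            continue
        if subsequences and i - subsequences[-1][-1] - 1 <= q:
            subsequences[-1].append(i)
        else:
            subsequences.append([i])
    return subsequences

def pick_images_to_distinguish(flags, confidences, q=1, top_k=1):
    """Weight each subsequence by its share of abnormal frames, score each
    frame as weight * response confidence, and keep the top_k frames."""
    subs = split_with_tolerance(flags, q)
    total = sum(len(s) for s in subs)
    scores = {}
    for s in subs:
        weight = len(s) / total          # 1/7, 2/7, 4/7 in the q = 0 example
        for i in s:
            scores[i] = weight * confidences[i]
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# flags = [0,0,0,0,1,0,0,0,1,1,0,1,1,1,1]  (sequence ID_p1): with q = 1 the
# subsequences are frame 5 and frames 9-15 (1-based), as in the example above.
```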
In an embodiment of the present application, in the method, selecting several frames from the state response image sequence as images to be distinguished based on temporal correlation includes: calculating the discrimination weight of each state response image using a Gaussian distribution function and the response confidence of the state response image; and selecting the images to be distinguished from the state response image sequence based on the discrimination weights.
In this embodiment, a Gaussian distribution function is used to calculate the discrimination weight of each state response image. The Gaussian distribution is also called the normal distribution; if a random variable follows it with location parameter μ and scale parameter σ, its probability density function is formula (1):
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)    (1)
In this implementation, a Gaussian distribution g with mean (location parameter) μ and variance (scale parameter) σ is selected as the weight function, and a confidence interval [-a, a] is obtained through calculation. For each frame F in the state response image sequence, a weighted score over the range [max(F − a, 0), min(F + a, F + N)] is calculated with the Gaussian function as the coefficients and used as the discrimination weight of that frame. The frame with the largest discrimination weight, or the several frames whose weights exceed a certain threshold, are then selected as the images to be distinguished.
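A sketch of this Gaussian weighting follows; μ, σ and a are example values, since the application leaves them to be chosen, and the windowing is this sketch's reading of the range above.

```python
import numpy as np

def gaussian_discrimination_weights(confidences, mu=0.0, sigma=2.0, a=3):
    """Weight each state response image by Gaussian-weighted confidences
    over its neighbourhood; the frame with the largest weight (or frames
    above a threshold) become the images to be distinguished."""
    conf = np.asarray(confidences, dtype=float)
    n = len(conf)
    offsets = np.arange(-a, a + 1)
    g = np.exp(-((offsets - mu) ** 2) / (2 * sigma ** 2)) \
        / (sigma * np.sqrt(2 * np.pi))       # formula (1) sampled at the offsets
    weights = np.empty(n)
    for f in range(n):
        lo, hi = max(f - a, 0), min(f + a, n - 1)
        k = g[lo - f + a : hi - f + a + 1]   # coefficients that fall in range
        weights[f] = float(np.dot(k, conf[lo:hi + 1]))
    return weights
```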
Either of the two implementations above can be used to select several frames from the state response image sequence as images to be distinguished, and the choice can be made according to the hardware and the requirements of the scene.
In an embodiment of the present application, in the method, performing state recognition on the images to be distinguished to obtain the state recognition result of each object includes: performing state recognition on the images to be distinguished through a fine state recognition model built into the image acquisition device to obtain the state recognition result of each object; or uploading the images to be distinguished to a server, so that the server performs state recognition on them, and receiving the state recognition result of each object returned by the server.
In this application, all of the means for implementing the method can be integrated inside the image acquisition device, specifically inside a smart camera. In that case the smart camera is expensive to manufacture and, constrained by its size and build quality, the processing capability of the computing device integrated into it is limited, so the camera's computing speed and processing capacity are limited.
Alternatively, the image acquisition device in this application can be connected to a server by radio frequency; when fine attribute recognition is performed, the images to be distinguished are uploaded to the server, so that the server performs state recognition on them, and the state recognition results of the objects returned by the server are received.
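A minimal sketch of the upload path, assuming HTTP transport via the requests library; the endpoint URL, payload shape and JSON reply are hypothetical, since the application only states that the images are uploaded and the results returned.

```python
import cv2
import requests

SERVER_URL = "http://example.com/state-recognition"  # hypothetical endpoint

def recognize_remotely(image, server_url=SERVER_URL):
    """Upload an image to be distinguished; return the server's result."""
    ok, jpeg = cv2.imencode(".jpg", image)   # compress before upload
    if not ok:
        raise ValueError("image could not be encoded")
    resp = requests.post(
        server_url,
        files={"image": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"object": "A", "state": "abnormal"}
```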
The foregoing embodiments may be implemented individually or in combination. Specifically, FIG. 2 shows a flowchart of an object state identification method according to another embodiment of the present application.
First, video frames for analysis are acquired, and the size of each image in the video frames is reduced to generate a detection image for each video frame. A first object region included in the detection image is determined; a second object region corresponding to the first object region is cut out of the video frame as a tracking image; and object A corresponding to the tracking image is determined from the tracking features extracted from the tracking image, thereby obtaining a tracking image sequence of object A.
Whether the tracking image sequence matches the preset state identification interval is then judged; if not, the tracking image sequence is not processed, and if so, the next processing step is carried out.
For a tracking image sequence that matches the preset state identification interval, whether the state in object A's tracking image sequence is abnormal is judged on the basis of the coarse state recognition model; if there is no abnormality, no processing is needed. If an abnormality exists, the state response image sequence of object A is divided into a plurality of subsequences, discrimination weights are assigned to the subsequences, and the images to be distinguished are determined according to the discrimination weights and the response confidence of each state response image in the subsequences.
After the images to be distinguished are determined, they are uploaded to a server, so that the server performs state recognition on them, and the state recognition result of object A returned by the server is received.
FIG. 3 shows an object state recognition apparatus according to an embodiment of the present application. As shown in FIG. 3, the object state recognition apparatus 300 includes:
an obtaining unit 310 is configured to obtain a video frame for analysis.
At present, more and more smart stores are appearing in people's daily lives, such as unmanned shopping malls and self-service restaurants without service staff. Smart stores bring convenience to people's lives, but they also carry potential risks: customers in an unmanned shopping mall may behave inappropriately, for example by damaging items, and kitchen staff in a self-service restaurant may behave inappropriately, for example by smoking or making phone calls, all of which challenge merchants' management.

First, video frames for analysis are acquired. This can be realized by the acquisition module of a camera, which captures a video segment for analysis; the video segment is composed of a sequence of still images, and each still image is called a video frame.

It should be noted that the loss of one or several frames because the camera is accidentally bumped or stalls does not affect the implementation of the present application; in such a case the lost frames can simply be ignored. A data processing unit 320 is configured to determine a plurality of objects according to the video frames and obtain a tracking image sequence of each object, and to determine the state response image sequence contained in the tracking image sequence and select several frames from it as images to be distinguished based on temporal correlation.
A plurality of objects are determined in the video frames to be analyzed; an object may be a natural person, a physical item, or even the environment. Multiple objects may be determined and analyzed simultaneously; for example, if 3 people appear in a video frame, 3 objects are determined.

After an object is determined, a tracking image sequence of the object is obtained through visual object tracking. Each frame in the tracking image sequence contains at least part of the object. The tracking image sequence may consist of one or several video frames (each either a complete video frame or a region cropped from it that corresponds to the object). When the sequence consists of several video frames, those frames may be temporally continuous or discontinuous. For example, suppose the analyzed object is a person who stands beside a table, suddenly squats down, and then stands up again: several frames captured by the camera contain the person, the person then disappears for a frame or two, and the person reappears in subsequent frames. During visual object tracking, the frames containing the person can be assembled in recorded time order to form the person's tracking image sequence, omitting the one or two frames in which the person was crouching out of view below the table.

Visual object tracking is an important direction in computer vision, and the tracking algorithm in this application can be single-object or multi-object tracking. The single-object tracking task is to predict the size and position of a target in subsequent frames, given its size and position in the initial frame of a video sequence. The main task of Multiple Object Tracking (MOT) is, given an image sequence, to find the moving objects in it, to match the moving objects across frames one to one, and then to give the motion trajectories of the different objects. Single-object tracking can be realized through appearance modeling or motion modeling of the target, so as to handle illumination changes, deformation, occlusion, and similar problems. Compared with single-object tracking, multi-object tracking can track several analyzed objects simultaneously, but beyond the problems of single-object tracking it may also face frequent occlusion, unknown track start and end times, very small targets, similar appearance, interaction between targets, and low frame rates, so multi-object tracking is more complicated.

The visual object tracking method may adopt any one or more of the existing methods, including but not limited to generative model methods and discriminative model methods. A generative model method models the target region in the current frame and finds the region most similar to the model in the next frame as the predicted position; examples include Kalman filtering and particle filtering. For instance, if it is known from the current frame that the target region is 80% red and 20% green, a search algorithm looks throughout the next frame for the region that best matches that color ratio.

The classical architecture of discriminative model methods combines image features with machine learning: in the current frame, the target region is taken as a positive sample and the background as negative samples, a classifier is trained with a machine learning method, and in the next frame the trained classifier finds the optimal region. Classic discriminative methods include the Struck (structured output tracking) algorithm and TLD (Tracking-Learning-Detection). Correlation filter (CF) tracking is currently an active research area; correlation filter tracking algorithms include, but are not limited to, CSK, KCF/DCF, and CN, and because they compute quickly and are highly accurate, this application recommends them as the preferred method.
Each tracking image sequence corresponds to one object to be analyzed, and each frame in it contains at least part of that object. A preliminary state detection is performed on the object. If the object is a natural person and it is confirmed that the object shows no abnormal behavior, the state of the object corresponding to the tracking image sequence is confirmed as normal, and the tracking image sequence needs no further processing. When the object is confirmed to show abnormal behavior, such as damaging items or smoking, the frame or frames of the tracking image sequence in which the object has a certain specific motion feature are taken as the state response image sequence.

The abnormal behavior can be determined with one or more prior-art methods. For example, whether the behavior of the object in the image is abnormal can be determined by preset abnormal-behavior rules, which characterize the association between the behavior of the analyzed object and specific behaviors. Specifically, behavior feature recognition is performed on the object's behavior; if the behavior is associated with a specific behavior, it is considered abnormal. Specific behaviors include, but are not limited to, smoking, making a phone call, and stealing items. Further, behavior feature recognition matches the action features in the object's behavior against the action features of specific behaviors, and the result determines whether the object's behavior is associated with a specific behavior. For example, if a specific behavior includes the action feature of bringing a hand to the mouth, and a certain object's behavior contains that action feature, the object's behavior is considered associated with that specific behavior and the object's state is considered abnormal. To save computing resources and improve computing efficiency, the application recommends a lightweight algorithm that only preliminarily identifies the object's state: it merely judges whether the state of the object in the tracking image sequence, or in each of its frames, is normal or abnormal, without determining what the abnormal behavior specifically is.

After the state response image sequence of an object is determined, several frames are selected from it as images to be distinguished, based on temporal correlation, for the final state recognition. In state recognition, temporal correlation is an information feature that cannot be ignored; using temporal context information can bring a large gain to state recognition and thus markedly improve its accuracy.

In this embodiment, temporal correlation is used to screen out, from the state response image sequence, one or more key frames in which the abnormal action features are especially prominent, and these serve as the images to be distinguished. This saves computing resources and improves computing efficiency, and it can also markedly improve recognition accuracy. For example, for a smoking behavior of the analyzed object, the single frame or several consecutive frames in which the object's hand is at the mouth can be screened out as the images to be distinguished.

The temporal correlation analysis method can be one of, or a combination of, several prior-art methods. For example, an extracted Histogram of Oriented Gradients (HOG) can be used as input, higher-level abstract features obtained by training a Deep Belief Network (DBN), and the trained DBN used to identify the human body region; finally, the temporal correlation of the region's centroid is used to judge whether the analyzed object exhibits abnormal behavior. Specifically, the acceleration of the centroid between two frames can be calculated; if the change in acceleration is larger than a set threshold, it is very likely that the analyzed object exhibits abnormal behavior in those two frames, and the two frames are taken as images to be distinguished. For example, a kitchen worker in a self-service restaurant may spit on the floor; spitting is accompanied by a bending motion, during which the position of the worker's centroid changes rapidly.
The identifying unit 330 is configured to perform state identification on the image to be distinguished to obtain a state identification result of each object.
As mentioned above, after the images to be distinguished are determined, fine attribute recognition may be performed on them; the method may follow the approach used above for determining the state response image sequence contained in the tracking image sequence, but with stricter accuracy requirements. The state of each object may be normal or abnormal. An alarm system can be set up in advance to emit a loud warning sound when an abnormal situation is encountered; it can also be connected to an intelligent fire-extinguishing device, so that when a fire occurs the extinguishing device can be started to put it out in time.
In an embodiment of the present application, in the above apparatus, the data processing unit 320 is configured to generate a detection image for each video frame by reducing the image size; determine a first object region included in the detection image; cut out a second object region corresponding to the first object region from the video frame as a tracking image; and determine the object corresponding to each tracking image according to the tracking features extracted from the tracking images, thereby obtaining the tracking image sequence of each object.
In an embodiment of the present application, in the above apparatus, the data processing unit 320 is configured to determine the state response image sequence included in the tracking image sequence if the tracking image sequence matches a preset state identification interval.
In an embodiment of the present application, in the above apparatus, the data processing unit 320 is configured to perform state recognition on each tracking image in the tracking image sequence based on the coarse state recognition model, and to determine the state response images among the tracking images according to the state recognition results.
In an embodiment of the present application, in the above apparatus, the data processing unit 320 is configured to divide the state response image sequence into a plurality of subsequences according to a frame sequence interval; determining the discrimination weight of each subsequence according to a preset rule; and determining the image to be distinguished according to the distinguishing weight and the response confidence of each state response image in the subsequence.
In an embodiment of the present application, in the above apparatus, the data processing unit 320 is configured to calculate the discrimination weight of each state response image using a Gaussian distribution function and the response confidence of the state response image, and to select the images to be distinguished from the state response image sequence based on the discrimination weights.
In an embodiment of the present application, in the above apparatus, the identifying unit 330 is configured to perform state recognition on the images to be distinguished through a fine state recognition model built into the image acquisition device to obtain the state recognition result of each object; or to upload the images to be distinguished to a server, so that the server performs state recognition on them, and to receive the state recognition result of each object returned by the server.
It should be noted that the object state identification devices in the foregoing embodiments can be used to execute the object state identification methods in the corresponding foregoing embodiments, so a detailed description is omitted here.
According to the technical solution of the present application, video frames for analysis are acquired; a plurality of objects are determined from the video frames, and a tracking image sequence of each object is obtained; the state response image sequence contained in the tracking image sequence is determined, and several frames are selected from it as images to be distinguished based on temporal correlation; and state recognition is performed on the images to be distinguished to obtain the state recognition result of each object. By quickly selecting an object's state response image sequence from the video frames and then selecting several key frames for state recognition based on temporal correlation, the method saves computing resources and improves both computing efficiency and recognition accuracy.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an object state recognition apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 4 shows a schematic structural diagram of an image acquisition apparatus according to an embodiment of the present application. The image acquisition apparatus 400 comprises a processor 410 and a memory 420 arranged to store computer-executable instructions (computer-readable program code). The memory 420 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 420 has a storage space 430 storing computer-readable program code 431 for performing any of the method steps described above. For example, the storage space 430 may include respective pieces of computer-readable program code 431 for implementing the various steps of the above method. The computer-readable program code 431 can be read from, or written to, one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such a computer program product is typically a computer-readable storage medium as described with reference to fig. 5.

Fig. 5 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 500 stores computer-readable program code 431 for performing the steps of the method according to the present application, which is readable by the processor 410 of the image acquisition apparatus 400. When executed by the image acquisition apparatus 400, the computer-readable program code 431 causes the image acquisition apparatus 400 to perform the steps of the method described above; in particular, the computer-readable program code 431 stored by the computer-readable storage medium may perform the method shown in any of the embodiments described above. The computer-readable program code 431 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (10)

1. An object state identification method, comprising:
acquiring a video frame for analysis;
determining a plurality of objects according to the video frame, and obtaining a tracking image sequence of each object;
determining a state response image sequence contained in the tracking image sequence, and selecting a plurality of frames of images from the state response image sequence as images to be distinguished based on time-series correlation;
and performing state recognition on the image to be distinguished to obtain a state recognition result of each object.
2. The method of claim 1, wherein the determining a plurality of objects according to the video frame and obtaining a tracking image sequence of each object comprises:
generating a detection image of each video frame by means of image size reduction;
determining a first object region included in the detection image;
cutting out a second object area corresponding to the first object area from the video frame as a tracking image;
and determining the object corresponding to each tracking image according to the tracking features extracted from the tracking images, thereby obtaining the tracking image sequence of each object.
3. The method of claim 1, wherein determining the sequence of status response images contained in the sequence of tracking images comprises:
and under the condition that the tracking image sequence is matched with a preset state identification interval, determining a state response image sequence contained in the tracking image sequence.
4. The method of claim 1, wherein determining the sequence of status response images contained in the sequence of tracking images comprises:
and respectively performing state recognition on each tracking image in the tracking image sequence based on a state coarse recognition model, and determining the state response images among the tracking images according to the state recognition results.
5. The method according to claim 1, wherein the selecting a plurality of frames of images from the state response image sequence as images to be distinguished based on time-series correlation comprises:
dividing the state response image sequence into a plurality of subsequences according to a frame sequence interval;
determining the discrimination weight of each subsequence according to a preset rule;
and determining the images to be distinguished according to the discrimination weights and the response confidence of each state response image in the subsequences.
6. The method according to claim 1, wherein the selecting a plurality of frames of images from the state response image sequence as images to be distinguished based on time-series correlation comprises:
calculating the discrimination weight of each state response image by applying a Gaussian distribution function to the response confidence of the state response image;
and selecting the images to be distinguished from the state response image sequence based on the discrimination weights.
7. The method according to any one of claims 1 to 6, wherein the performing state recognition on the image to be distinguished to obtain a state recognition result of each object comprises:
carrying out state recognition on the image to be distinguished through a state fine recognition model built in the image acquisition equipment to obtain a state recognition result of each object;
or,
uploading the image to be distinguished to a server so that the server performs state recognition on the image to be distinguished, and receiving the state recognition result of each object returned by the server.
8. An object state recognition apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire a video frame for analysis;
the data processing unit is used for determining a plurality of objects according to the video frame and obtaining a tracking image sequence of each object, and for determining a state response image sequence contained in the tracking image sequence and selecting a plurality of frames of images from the state response image sequence as images to be distinguished based on time-series correlation;
and the identification unit is used for carrying out state identification on the image to be distinguished to obtain the state identification result of each object.
9. An image acquisition device, wherein the image acquisition device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-7.
CN202010295399.6A 2020-04-15 2020-04-15 Object state identification method and device, image acquisition equipment and storage medium Pending CN111652043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010295399.6A CN111652043A (en) 2020-04-15 2020-04-15 Object state identification method and device, image acquisition equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010295399.6A CN111652043A (en) 2020-04-15 2020-04-15 Object state identification method and device, image acquisition equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111652043A true CN111652043A (en) 2020-09-11

Family

ID=72348033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010295399.6A Pending CN111652043A (en) 2020-04-15 2020-04-15 Object state identification method and device, image acquisition equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111652043A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102204263A (en) * 2008-11-03 2011-09-28 微软公司 Converting 2D video into stereo video
US20160012597A1 (en) * 2014-07-09 2016-01-14 Nant Holdings Ip, Llc Feature trackability ranking, systems and methods
CN104915655A (en) * 2015-06-15 2015-09-16 西安电子科技大学 Multi-path monitor video management method and device
US20180137892A1 (en) * 2016-11-16 2018-05-17 Adobe Systems Incorporated Robust tracking of objects in videos
CN108230352A (en) * 2017-01-24 2018-06-29 北京市商汤科技开发有限公司 Detection method, device and the electronic equipment of target object
CN107978051A (en) * 2017-12-15 2018-05-01 亿城通智能科技(大冶)有限公司 A kind of access control system and method based on recognition of face
CN110874910A (en) * 2018-08-31 2020-03-10 杭州海康威视数字技术股份有限公司 Road surface alarm method, device, electronic equipment and readable storage medium
CN110245556A (en) * 2019-05-06 2019-09-17 深圳耄耋看护科技有限公司 A kind of detection method, device, system and the storage medium of the state of leaving home of being in
CN110390262A (en) * 2019-06-14 2019-10-29 平安科技(深圳)有限公司 Video analysis method, apparatus, server and storage medium
CN110930434A (en) * 2019-11-21 2020-03-27 腾讯科技(深圳)有限公司 Target object tracking method and device, storage medium and computer equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112423021A (en) * 2020-11-18 2021-02-26 北京有竹居网络技术有限公司 Video processing method and device, readable medium and electronic equipment
US11922597B2 (en) 2020-11-18 2024-03-05 Beijing Youzhuju Network Technology Co., Ltd. Video processing method and apparatus, readable medium, and electronic device
US20220188537A1 (en) * 2020-12-14 2022-06-16 Shenzhen Horizon Robotics Technology Co., Ltd. Behavior recognition method and apparatus, medium, and electronic device
US11816930B2 (en) * 2020-12-14 2023-11-14 Shenzhen Horizon Robotics Technology Co., Ltd. Behavior recognition method and apparatus, medium, and electronic device
CN112765399A (en) * 2020-12-25 2021-05-07 联想(北京)有限公司 Video data processing method and electronic equipment
CN112906513A (en) * 2021-02-03 2021-06-04 拉扎斯网络科技(上海)有限公司 Dining resource information processing method, device and equipment
CN113111839A (en) * 2021-04-25 2021-07-13 上海商汤智能科技有限公司 Behavior recognition method and device, equipment and storage medium
WO2022227490A1 (en) * 2021-04-25 2022-11-03 上海商汤智能科技有限公司 Behavior recognition method and apparatus, device, storage medium, computer program, and program product

Similar Documents

Publication Publication Date Title
CN111652043A (en) Object state identification method and device, image acquisition equipment and storage medium
De Geest et al. Online action detection
CN109154976B (en) System and method for training object classifier through machine learning
US20200250435A1 (en) Activity recognition method and system
US11295139B2 (en) Human presence detection in edge devices
JP6148480B2 (en) Image processing apparatus and image processing method
WO2016147770A1 (en) Monitoring system and monitoring method
Barmpoutis et al. Smoke detection using spatio-temporal analysis, motion modeling and dynamic texture recognition
CN111415461A (en) Article identification method and system and electronic equipment
Andersson et al. Fusion of acoustic and optical sensor data for automatic fight detection in urban environments
CN113065474B (en) Behavior recognition method and device and computer equipment
Dubuisson et al. A survey of datasets for visual tracking
CN110648352A (en) Abnormal event detection method and device and electronic equipment
Das et al. Violence detection from videos using hog features
CN112733629A (en) Abnormal behavior judgment method, device, equipment and storage medium
CN114783037B (en) Object re-recognition method, object re-recognition apparatus, and computer-readable storage medium
CN110175553B (en) Method and device for establishing feature library based on gait recognition and face recognition
CN110516572B (en) Method for identifying sports event video clip, electronic equipment and storage medium
Guo et al. Enhanced camera-based individual pig detection and tracking for smart pig farms
CN111814690B (en) Target re-identification method, device and computer readable storage medium
JP7438690B2 (en) Information processing device, image recognition method, and learning model generation method
CN114360182A (en) Intelligent alarm method, device, equipment and storage medium
CN115337649A (en) External hanging detection model training method, external hanging detection method and external hanging detection device
CN113837066A (en) Behavior recognition method and device, electronic equipment and computer storage medium
CN114299388A (en) Article information specifying method, showcase, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200911