CN112818933A - Target object identification processing method, device, equipment and medium

Target object identification processing method, device, equipment and medium

Info

Publication number
CN112818933A
Authority
CN
China
Prior art keywords
image
target object
target
video stream
resolution
Legal status
Pending
Application number
CN202110221208.6A
Other languages
Chinese (zh)
Inventor
赵珂
赵代平
李展鹏
孙德乾
胡超凡
孔祥晖
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202110221208.6A
Publication of CN112818933A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification provide a target object identification processing method, apparatus, device, and medium. The method includes: acquiring a first image to be identified; determining a target image area including a target object in the first image according to position information of the target object in a second image, where the first image and the second image correspond to the same shooting scene and the resolution of the second image is lower than that of the first image; and performing identification processing on the target object based on the target image area. Because the low-resolution image is used for full-image detection, the speed of position detection is improved, while using the high-resolution image for identifying attributes, states, and the like ensures the accuracy and precision of the identification result.

Description

Target object identification processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, a device, and a medium for identifying and processing a target object.
Background
With the development of imaging technology, most image acquisition devices can capture images at relatively high resolutions. During image processing, it is usually necessary to perform full-image detection to determine the position of the target object in the image, and then to further process the image region corresponding to the target object, which consumes a large amount of computing resources and takes a long time. There is therefore a need for a solution that improves the process of recognizing a target object in an image.
Disclosure of Invention
The embodiments of the present disclosure provide a target object identification processing method, apparatus, device, and medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a method of target object identification processing, the method including:
acquiring a first image to be identified;
determining a target image area including a target object in a first image according to position information of the target object in a second image, wherein the first image and the second image correspond to the same shooting scene, and the resolution of the second image is lower than that of the first image;
and identifying the target object based on the target image area.
In some embodiments, the first image and the second image are captured at a time interval not exceeding a preset threshold.
In some embodiments, the first image is obtained from a video frame of a first video stream, the second image is obtained by downsampling the video frame of the first video stream, or the second image is obtained from a video frame of a second video stream.
In some embodiments, the second image is captured simultaneously with or obtained by down-sampling a specified video frame in the first video stream, and the first image includes a plurality of video frames in the first video stream whose inter-frame spacing from the specified video frame is smaller than a preset threshold.
In some embodiments, the first video stream and the second video stream are acquired through video channels with different resolutions of the same camera device, or
The first video stream and the second video stream are acquired through two camera devices with different resolutions respectively.
In some embodiments, the method further comprises:
in response to the target object not being identified in the target image region, updating the second image based on a newly captured video frame in the first video stream or a newly captured video frame in the second video stream to update the position information.
In some embodiments, the determining, in the first image, a target image region including the target object according to the position information of the target object in the second image includes:
in response to the coordinate systems of the first image and the second image being different, determining a mapping relationship between the coordinate system of the first image and the coordinate system of the second image;
determining a target image area including the target object in the first image based on the mapping relationship and the position information.
In some embodiments, the identifying the target object based on the target image region includes:
extracting the features of the target image area;
and identifying attribute information and/or state information of the target object based on the extracted features.
In some embodiments, the method further comprises:
generating a report describing the state of the target object in a specific scene based on the attribute information and/or the state information, and displaying the report to a user.
In some embodiments, the position information includes position information of boundary points of an object frame corresponding to the target object, and the determining, in the first image, a target image region including the target object according to the position information of the target object in the second image includes:
determining a plurality of pixel points in the first image based on the position information of the boundary points and a scaling, the scaling being determined based on a resolution of the first image and a resolution of the second image;
and determining the target image area in the first image by taking the plurality of pixel points as boundary points.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for identifying and processing a target object, the apparatus including:
the acquisition module is used for acquiring a first image to be identified;
the processing module is used for determining a target image area comprising a target object in a first image according to position information of the target object in a second image, wherein the first image and the second image correspond to the same shooting scene, and the resolution of the second image is lower than that of the first image;
and the identification module is used for identifying the target object based on the target image area.
In some embodiments, the first image and the second image are captured at a time interval not exceeding a preset threshold.
In some embodiments, the first image is obtained from a video frame of a first video stream, the second image is obtained by downsampling the video frame of the first video stream, or the second image is obtained from a video frame of a second video stream.
In some embodiments, the second image is captured simultaneously with or obtained by down-sampling a specified video frame in the first video stream, and the first image includes a plurality of video frames in the first video stream whose inter-frame spacing from the specified video frame is smaller than a preset threshold.
In some embodiments, the first video stream and the second video stream are acquired through video channels with different resolutions of the same camera device, or
The first video stream and the second video stream are acquired through two camera devices with different resolutions respectively.
In some embodiments, the apparatus is further configured to:
in response to the target object not being identified in the target image region, updating the second image based on a newly captured video frame in the first video stream or a newly captured video frame in the second video stream to update the position information.
In some embodiments, the processing module, when determining the target image region including the target object in the first image according to the position information of the target object in the second image, is specifically configured to:
in response to the coordinate systems of the first image and the second image being different, determining a mapping relationship between the coordinate system of the first image and the coordinate system of the second image;
determining a target image area including the target object in the first image based on the mapping relationship and the position information.
In some embodiments, the identification module, when configured to perform identification processing on the target object based on the target image area, is specifically configured to:
extracting the features of the target image area;
and identifying attribute information and/or state information of the target object based on the extracted features.
In some embodiments, the apparatus is further configured to:
generating a report describing the state of the target object in a specific scene based on the attribute information and/or the state information.
In some embodiments, the position information includes position information of boundary points of an object frame corresponding to the target object, and the processing module is configured to, when determining, in the first image, a target image region including the target object according to the position information of the target object in the second image, specifically:
determining a plurality of pixel points in the first image based on the position information of the boundary points and a scaling, the scaling being determined based on a resolution of the first image and a resolution of the second image;
and determining the target image area in the first image by taking the plurality of pixel points as boundary points.
According to a third aspect of embodiments of the present disclosure, an electronic device is provided, where the electronic device includes a processor, a memory, and a computer program stored in the memory and executable by the processor, and when the processor executes the computer program, the electronic device implements the method according to any one of the first aspect mentioned above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium storing computer instructions, which when executed, implement the method of the first aspect.
In the embodiments of the present disclosure, when identifying a target object, images of two resolutions of the same scene may be used. The low-resolution image is used for full-image detection to determine the position of the target object in the low-resolution image; an image region corresponding to the target object is then determined in the high-resolution image based on that position, and this region is used for subsequent identification processing. Because detecting the position of the target object requires full-image detection, the low-resolution image can be used for this step to increase the detection speed, while the high-resolution image is used for identifying refined information such as attributes and states, which ensures the precision and accuracy of the identification result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of a process of identifying a target object according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a target object recognition process according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of an application scenario of an embodiment of the present disclosure.
Fig. 4 is a schematic logical structure diagram of an apparatus for performing recognition processing on a target object according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a logical structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining", depending on the context.
In order to make the technical solutions in the embodiments of the present disclosure better understood and make the above objects, features and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
In many service scenarios, such as offline education, video conferencing, and security, a high-resolution camera or other imaging device can be used to capture images or videos of a target object, whose attributes or states are then identified: for example, recognizing whether a face is a target face, recognizing the expression, age, and gender associated with a face, or recognizing the motion and posture of a body for subsequent applications. Before identifying the attribute or state of the target object, the whole image may be detected to determine the position of the target object in the image, and the image region corresponding to that position is then processed for identification. For example, in face recognition, the position of a face in an image may first be determined, e.g., by determining a face frame containing the face, and subsequent recognition processing is then performed on the image region corresponding to the face frame to determine identity information, gender information, expression, posture, and the like. Because determining the position of the target object requires detecting the whole image, a high-resolution image consumes a large amount of computing resources and takes a long time, which affects the overall speed of target object identification.
Based on this, the embodiments of the present disclosure provide a method for identifying a target object. When identifying a target object, images of two resolutions of the same scene may be used: the low-resolution image is used for full-image detection to determine the position of the target object in the low-resolution image, an image region corresponding to the target object is then determined in the high-resolution image based on that position, and subsequent identification processing is performed using that image region in the high-resolution image. Of course, in practical applications, images of more than two resolutions may be used; for example, a low-resolution image may be used for full-image detection, and images of different resolutions may be used to identify different attributes or states according to different identification requirements. In the embodiments of the present disclosure, the scheme is illustrated with images of two resolutions as an example, but this does not limit the ways in which the technical solution may be used.
The method provided in the embodiments of the present disclosure may be executed by any electronic device with sufficient computing capability, for example, a server, a notebook computer, a mobile phone, a tablet, or a camera device. For example, in some embodiments, the step of identifying the target object in the image or video may be performed after the video or image is directly captured by the camera. In some embodiments, the step of identifying the target object in the image or video may also be performed by the server after receiving the image or video captured by the camera. The server may be an independent server or a server cluster, and the embodiments of the present disclosure are not limited thereto.
The target object in the embodiment of the present disclosure may be various objects whose attribute information or state information needs to be identified, for example, a person, an animal, or a scene, or a local part of the person, the animal, or the scene, for example, a face, a limb, or the like of the person.
The following describes a method for identifying and processing a target object according to an embodiment of the present disclosure with reference to fig. 1 and fig. 2, where fig. 1 is a processing flow chart of the method, and fig. 2 is a schematic processing procedure diagram of the method, and specifically includes the following steps:
s102, acquiring a first image to be identified;
s104, determining a target image area including a target object in a first image according to position information of the target object in a second image, wherein the first image and the second image correspond to the same shooting scene, and the resolution of the second image is lower than that of the first image;
and S106, identifying the target object based on the target image area.
For the same shooting scene, two images of different resolutions corresponding to that scene can be acquired; hereinafter, the higher-resolution image is referred to as the first image and the lower-resolution image as the second image.
When detecting the position of the target object, the requirement on image resolution is low, so full-image detection can be performed on the low-resolution second image to determine the position information of the target object in the second image. The position information of the target object in the second image may be the pixel coordinates of each pixel corresponding to the target object, the pixel coordinates of the contour points of the target object, or the coordinates of an object frame containing the target object, for example, the pixel coordinates of the boundary points or four vertices of a rectangular frame containing the target object, or the pixel coordinates of the center point of the rectangular frame together with its side lengths. The position information may be any information usable to determine the position of the target object in the second image; the embodiments of the present disclosure are not limited in this respect.
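As a non-limiting illustration of these interchangeable representations, the following Python sketch converts between a (center point, side lengths) description of a rectangular object frame and its corner coordinates; the function names are hypothetical and not part of this disclosure.

```python
def corners_from_center(cx, cy, w, h):
    # (center point, side lengths) -> (x_min, y_min, x_max, y_max)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def center_from_corners(x_min, y_min, x_max, y_max):
    # (corner coordinates) -> (center point, side lengths)
    return ((x_min + x_max) / 2, (y_min + y_max) / 2,
            x_max - x_min, y_max - y_min)
```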
When the first image to be identified is acquired, a target image region including the target object may be determined in it according to the position information of the target object in the second image. The first image may be a higher-resolution image acquired by a high-definition camera; because of its higher resolution, a more accurate result can be obtained when it is used for identification processing of the target object. The first image and the second image may be captured simultaneously for the same shooting scene, or at a short interval, to ensure that the scenes in the two frames are close and the position of the target object changes little. The position of the target object in the first image may be determined from its position information in the second image, and a target image region including the target object is determined in the first image for subsequent recognition processing. The target image region may consist only of the pixels corresponding to the target object, or may also include some background pixels, for example the region framed by an object frame corresponding to the target object, which contains the target object together with part of the background.
After the target image area is determined in the first image, the target object may be identified according to the target image area, for example, the target image area may be input into a pre-trained model to identify detail information such as attribute information and state information of the target object. For example, the target object may be a face, a face frame may be determined in the first image based on a position of the face in the second image, and then an image area corresponding to the face frame is input into the face recognition model to perform various recognition processes such as face comparison, facial expression recognition, skin color recognition, gender recognition, age recognition, and the like. Alternatively, the target object may be a limb, and an image area including the limb in the first image may be input into the model to recognize information such as a motion and a posture of the limb.
According to this method, two images of different resolutions are acquired for the same shooting scene. The low-resolution image is used for full-image detection to determine the position information of the target object in it; an image region including the target object is then determined in the high-resolution image from that position, and the target object is identified using that region of the high-resolution image. Performing full-image detection on the low-resolution image reduces the computing resources consumed by position detection and increases the detection speed, while using the high-resolution region containing the target object to identify its attributes, state, and other detailed information improves the precision and accuracy of the detection result, so the target object can be identified both quickly and accurately.
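As a non-limiting illustration of this two-resolution flow, the following sketch uses OpenCV's bundled Haar face detector as a stand-in for the position-detection model; the file names are placeholders, and the recognition step is a hypothetical function.

```python
import cv2

# OpenCV's bundled Haar face detector stands in for the position-detection model.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

# Placeholder file names for two frames of the same scene captured at (nearly)
# the same moment at different resolutions.
first_image = cv2.imread("scene_high_res.png")   # used for recognition
second_image = cv2.imread("scene_low_res.png")   # used for full-image detection

# Scaling ratio between the two resolutions (assumed equal in both directions).
scale = first_image.shape[0] / second_image.shape[0]

gray = cv2.cvtColor(second_image, cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray):
    # Map the detected box into the high-resolution image and crop the
    # target image region for subsequent recognition processing.
    region = first_image[int(y * scale):int((y + h) * scale),
                         int(x * scale):int((x + w) * scale)]
    # recognize(region)  # hypothetical attribute/state recognition step
```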
Since the position of the target object in the first image is obtained by mapping its position in the second image, it must be ensured that the position of the target object does not change between the capture of the two images, or changes so little that the change is negligible, so that the target image region determined in the first image from the position information of the target object in the second image is reasonably accurate.
In some embodiments, the time interval between capturing the first image and the second image is not greater than a preset time threshold, which may be set according to the actual application scenario. In some scenarios the position of the target object barely changes, for example in an offline classroom or meeting, where the positions of students and participants are essentially fixed, so the preset time threshold may be set relatively large. In other scenarios the target object may be moving and its position may change in real time, so the preset time threshold should be set small, to ensure that the target object is still included in the target image region determined after mapping its position in the second image to the first image.
In some scenarios, information such as the attributes and states of the target object can be identified from a video stream of the target object captured over a period of time. For example, in an online education scenario, a video of a student in class can be captured, and the student's learning state can be determined by recognizing and analyzing the student's expressions, postures, and actions in the video. Thus, in some embodiments, the first image may be obtained from a video frame of a first video stream, and the second image may be obtained by downsampling that video frame. For example, a video stream of the target object may be acquired by a camera device; after a frame of the first image to be recognized is taken from the video stream, it may be downsampled to obtain a lower-resolution second image (the downsampling factor may be set according to actual requirements), and the target image region is then determined in the first image according to the position information of the target object in the second image. Each frame of the first image could be downsampled to produce its own second image for determining the target image region; however, in a scene where the position of the target object changes little, the second image obtained from one frame may be reused by subsequent frames of the first image to determine the target object's position in those frames, improving processing efficiency.
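A minimal sketch of producing the second image by downsampling a video frame of the first video stream, assuming OpenCV; the downsampling factor of 4 is an illustrative value, not one prescribed by this disclosure.

```python
import cv2

frame = cv2.imread("first_stream_frame.png")  # a video frame of the first video stream
# INTER_AREA is the usual interpolation choice when shrinking an image.
second_image = cv2.resize(frame, (frame.shape[1] // 4, frame.shape[0] // 4),
                          interpolation=cv2.INTER_AREA)
```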
In some embodiments, a low-resolution second video stream may be acquired at the same time as the high-resolution first video stream, and the second image may then be obtained from video frames of the second video stream. The second image may be the frame of the second video stream captured simultaneously with the first image, or a frame captured within a short time interval of the first image.
In some embodiments, the first video stream and the second video stream may be captured through different resolution video channels of the same camera device. At present, many camera devices can support simultaneous acquisition of multiple channels of videos or images, where the resolutions of the videos or images acquired by different channels at the same time are different, and the contents of the images or videos are completely the same.
In some embodiments, the first video stream and the second video stream may also be captured by two cameras with different resolutions. For example, for the same scene, two cameras with different resolutions may be used to simultaneously acquire video streams, a video frame acquired by a low-resolution camera is used for position detection of a target object, and a video frame acquired by a high-resolution camera is used for identification of the target object.
In some scenes, because the position of the target object changes little, the position information determined from a single frame of the second image may be used to determine the position of the target object in multiple frames of the first image; these should be consecutive or closely adjacent frames, to ensure that the target object's position changes little across them. Thus, in some embodiments, the second image may be a frame captured simultaneously with a specified video frame in the first video stream, or an image obtained by downsampling that specified video frame; in this case the first image may be any of multiple video frames in the first video stream whose inter-frame distance from the specified video frame is smaller than a preset threshold. The position information determined from one frame of the second image can thus be reused across multiple frames of the first image, improving efficiency. The inter-frame distance threshold may be, for example, 5 or 10 frames; the positions of the target object in such frames tend to be close, so the region containing the target object in each of them can be determined from the position information of the target object in the second image. These frames may be captured consecutively or non-consecutively; the embodiments of the present disclosure are not limited in this respect.
In some embodiments, after the position of the target object is determined from one frame of the second image, it may be reused for several subsequent first images. Because the position of the target object may change, if it changes substantially, the target image region determined in the first image from the previously determined position information may no longer contain the target object. Thus, in some embodiments, if the target object is not identified in the target image region, indicating that its position has changed significantly and that the position determined in the first image from the old position information is no longer accurate, the second image may be updated based on a newly captured video frame in the first video stream or in the second video stream, and the position information of the target object may be re-determined from the updated second image for use with subsequent first images. For example, if the second image is obtained by down-sampling the first image, the latest video frame in the first video stream may be down-sampled to obtain a new second image, and the position information of the target object in the new second image is determined for locating the target image region in subsequent frames of the first video stream. If the second image is obtained from the second video stream, the latest frame of the second video stream may be taken as the new second image, and the position information of the target object in it is determined likewise.
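The reuse-and-refresh logic described above can be sketched as follows; `detect_target`, `crop_by_box`, and `recognize` are hypothetical helpers standing in for the detection, mapping, and recognition steps, and the 5-frame reuse limit is an illustrative value.

```python
import cv2

def downsample(frame, factor=4):
    # Produce the low-resolution second image from a first-stream frame.
    return cv2.resize(frame, (frame.shape[1] // factor, frame.shape[0] // factor),
                      interpolation=cv2.INTER_AREA)

MAX_REUSE = 5                                     # illustrative inter-frame threshold
cap = cv2.VideoCapture("first_video_stream.mp4")  # placeholder source
box, box_age = None, 0                            # cached position info and its age

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if box is None or box_age >= MAX_REUSE:
        box, box_age = detect_target(downsample(frame)), 0  # refresh position info
    region = crop_by_box(frame, box, scale=4)     # map box into the high-res frame
    if recognize(region) is None:                 # target not identified in the region:
        box, box_age = detect_target(downsample(frame)), 0  # update the second image
        region = crop_by_box(frame, box, scale=4)
    box_age += 1
```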
In some scenarios, if the first image and the second image are acquired by the same camera, or the second image is obtained by down-sampling the first image, the coordinate systems of the two images are consistent. In that case, when determining the target image region of the first image from the position information of the target object in the second image, the region can be determined directly from the position information and the resolutions of the two images, without converting between coordinate systems.
In some embodiments, the position information of the target object determined in the second image may be the pixel coordinates of the pixels corresponding to the target object. In this case, when determining the target image region of the first image from the position information of the target object in the second image, a scaling ratio may be determined from the resolutions of the first and second images, and the pixel coordinates of the corresponding pixels in the first image may then be determined from the scaling ratio and the pixel coordinates in the second image, yielding the target image region. For example, if the pixel coordinates of a pixel of the target object in the second image are P2 and the coordinates of the corresponding pixels in the first image are P1, then P1 may be determined from the scaling ratio and P2. For example, if the second image is 100 × 100 and the first image is 200 × 200, the scaling ratio of the first image relative to the second image in both the row and column directions is 2; assuming the coordinates of a pixel of the target object in the second image are (1, 1), the corresponding pixels in the first image are (1, 1), (1, 2), (2, 1) and (2, 2). In this way, the coordinates of every pixel of the target object in the first image can be determined.
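The pixel-coordinate mapping in this example can be written as a small function; this sketch assumes 1-indexed coordinates and an integer scaling ratio.

```python
def pixels_in_first_image(p2, scale):
    # Map a 1-indexed pixel (row, col) of the second image to the block of
    # pixels it covers in the first image, for an integer scaling ratio.
    r, c = p2
    return sorted((r * scale - dr, c * scale - dc)
                  for dr in range(scale) for dc in range(scale))

# Reproduces the worked example above (100 x 100 -> 200 x 200, scale 2):
print(pixels_in_first_image((1, 1), 2))   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```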
In some scenarios, to improve processing efficiency, the position information of the target object determined in the second image may be the position information of a frame containing the target object. The frame may be rectangular or have some other regular or irregular shape. When determining the target image region in the first image, the position of the frame may be mapped directly into the first image. For example, in some embodiments, the position information of the target object determined in the second image may be the position information of the boundary points of a rectangular frame containing the target object, where the boundary points may be the four vertices of the rectangular frame. When determining the target image region in the first image from this position information, a scaling ratio may be determined from the resolutions of the first and second images, a plurality of pixel points may then be determined in the first image from the boundary-point positions and the scaling ratio, and a rectangular region, i.e., the target image region, is determined in the first image with these pixel points as its boundary points. For example, if the second image is 100 × 100, the first image is 200 × 200, the scaling ratio in both directions is therefore 2, and the four vertices of the rectangular frame containing the target object in the second image are (1, 1), (1, 2), (2, 1), (2, 2), then the four vertices of the corresponding rectangular frame determined in the first image are (2, 2), (2, 4), (4, 2), (4, 4).
Of course, because the position information from one frame of the second image may be used for multiple frames of the first image in some scenes, and the position of the target object may change across those frames, the rectangular frame determined in the second image may be enlarged after it is determined, to ensure that the target image region determined from the position information still contains the target object: even if the target object moves, the frame determined in the first image from the enlarged rectangle can still contain the complete target object. Alternatively, in some embodiments, after the rectangular frame in the first image is determined from the rectangular frame containing the target object in the second image, the frame in the first image may be enlarged to ensure that the target object lies within it.
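A sketch of mapping a rectangular frame between the two resolutions and enlarging it by a relative margin, as described above; the margin value is illustrative, and the (x_min, y_min, x_max, y_max) coordinate convention is an assumption made here.

```python
def map_and_expand_box(box, scale, margin, width, height):
    # box: (x_min, y_min, x_max, y_max) in second-image coordinates.
    # Scale into first-image coordinates, enlarge by a relative margin so the
    # target stays inside even if it has moved slightly, then clamp to the
    # bounds of the first image.
    x0, y0, x1, y1 = (v * scale for v in box)
    dx, dy = (x1 - x0) * margin, (y1 - y0) * margin
    return (max(0, int(x0 - dx)), max(0, int(y0 - dy)),
            min(width, int(x1 + dx)), min(height, int(y1 + dy)))

# E.g. a box detected on a 100 x 100 second image, mapped into a 200 x 200
# first image with a 20% margin on each side.
print(map_and_expand_box((20, 30, 60, 80), 2, 0.2, 200, 200))  # (24, 40, 136, 180)
```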
In some embodiments, the first image and the second image are captured by two cameras with different resolutions, so their coordinate systems differ. When determining the target image region of the first image from the position information of the target object in the second image, the mapping relationship between the coordinate systems of the two images may be determined first, and the target region is then determined based on this mapping relationship and the position information. For example, the two images may be mapped into the same coordinate system, and the target image region may then be determined in the first image using the methods described above. The relative pose of the two cameras may be calibrated in advance, the mapping between the two image coordinate systems determined from that relative pose, and the coordinate systems unified on that basis: the coordinate system of the second image may be mapped to that of the first image, or vice versa, or both may be mapped to a third coordinate system, after which the position of the target object can be mapped.
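For the two-camera case, and under the simplifying assumption of a roughly planar scene, the mapping between coordinate systems can be represented by a homography estimated from calibrated point correspondences; the following sketch uses OpenCV, with placeholder correspondence values (a full relative-pose calibration would be used in practice).

```python
import cv2
import numpy as np

# Point correspondences between the two cameras, e.g. obtained from a one-off
# calibration; the coordinate values below are placeholders.
pts_second = np.float32([[10, 10], [90, 10], [90, 70], [10, 70]])
pts_first = np.float32([[40, 35], [360, 30], [365, 275], [45, 280]])
H, _ = cv2.findHomography(pts_second, pts_first)

def to_first_image(point):
    # Map a point from the second image's coordinate system into the first's.
    src = np.float32([[point]])                  # shape (1, 1, 2), as OpenCV expects
    return cv2.perspectiveTransform(src, H)[0, 0]

corner = to_first_image((10, 10))                # maps close to (40, 35)
```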
In some embodiments, when performing identification processing on the target object based on the target image region, features may first be extracted from the target image region, and one or more of the attribute information and state information of the target object may then be identified based on the extracted features.
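A sketch of this feature-extraction-then-recognition step; the disclosure does not prescribe a particular network, so a pretrained ResNet-18 trunk and a hypothetical linear expression head (with an assumed 7-class label set) are used for illustration.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pretrained trunk for feature extraction; the final fc layer is replaced so
# the network outputs the 512-d feature vector itself.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Hypothetical attribute head: e.g. 7 expression classes (an assumed label set).
expression_head = torch.nn.Linear(512, 7)

preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor()])

def recognize(region_bgr):
    # region_bgr: the target image region cropped from the high-resolution
    # image (an OpenCV-style BGR uint8 array); convert to RGB for the model.
    x = preprocess(region_bgr[:, :, ::-1].copy()).unsqueeze(0)
    with torch.no_grad():
        features = backbone(x)                              # feature extraction
        return expression_head(features).argmax(dim=1).item()  # attribute recognition
```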
In some embodiments, after the attribute information and/or state information of the target object is determined, a report describing the state of the target object in a specific scene may be generated from that information and displayed to the user. The specific scene is one in which a user needs to observe the state of a target object through a camera device, such as students in class or taking an examination, or workers at their workstations. Taking offline education as an example, video streams of students in class or during an examination can be collected by the camera device; the students' expressions, postures, actions, and other information are then recognized, and based on this it is determined whether a student is dozing in class, cheating in an examination, and so on. A report describing the student's state in the class or examination scene is then generated, which a teacher or parent can view to learn how the student behaved.
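Report generation can be as simple as aggregating the per-frame state labels produced by the recognition step; the state names and report fields below are illustrative only.

```python
from collections import Counter

def build_report(student_id, per_frame_states):
    # per_frame_states: state labels produced by the recognition step for each
    # analyzed frame, e.g. ["attentive", "attentive", "dozing", ...].
    counts = Counter(per_frame_states)
    total = len(per_frame_states)
    return {
        "student": student_id,
        "frames_analyzed": total,
        "state_ratios": {state: n / total for state, n in counts.items()},
    }

print(build_report("s001", ["attentive", "attentive", "dozing"]))
```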
In order to further explain the identification processing method of the target object provided by the embodiment of the present disclosure, the following is explained in detail with reference to a specific embodiment.
At present, offline education scenarios usually use a camera device to collect video of students in class or during examinations, and recognize and analyze the students' states, expressions, actions, and postures in the video to determine their learning states, such as whether they doze in class or cheat in an examination. To save computing resources and improve detection efficiency, as shown in fig. 3, a camera device supporting simultaneous acquisition of multi-channel video may be used to capture video of students in a classroom. The device captures two video streams at the same time: a low-resolution stream (e.g. 1080P) and a high-resolution stream (e.g. 4K). When identifying students in an image frame of the high-resolution stream, a high-resolution image 1 and a low-resolution image 1 captured at the same moment are taken from the two streams, and face position detection is performed on low-resolution image 1 to determine face frame 0.
A scaling ratio is then determined from the resolutions of high-resolution image 1 and low-resolution image 1, face frame 0 is mapped into high-resolution image 1 according to this ratio to obtain face frame 1 containing the face, and the region of face frame 1 is cropped from high-resolution image 1 and fed into subsequent models to identify attribute or state information such as face identity, expression, and action.
Face positions could be determined for each high-resolution frame using a low-resolution image captured at the same moment; however, since student positions change little, the face position determined from one low-resolution frame can be reused by subsequent high-resolution frames. For example, after face frame 1 is determined on high-resolution image 1 from face frame 0 on low-resolution image 1, face frame 0 can further be used to determine face frame 2, face frame 3, and face frame 4 in high-resolution image 2, high-resolution image 3, and high-resolution image 4, for subsequent identification of attribute or state information. To ensure that a face frame determined in a high-resolution image from face frame 0 still contains the face, the inter-frame distance between these high-resolution images and high-resolution image 1 is preferably kept within a certain range, for example no more than 5 or 10 frames. In addition, to avoid an incomplete face appearing in a face frame determined from face frame 0 because the face has moved, each face frame determined in a high-resolution image (such as face frame 1, face frame 2, and face frame 3) may be enlarged, and the image region within the enlarged frame is then cropped for subsequent face recognition.
Of course, if the face cannot be identified within a face frame determined in a high-resolution image, the face position has changed substantially, and face positions determined in high-resolution images from the face frame on low-resolution image 1 may no longer be accurate. In that case, the latest low-resolution frame may be re-acquired from the low-resolution video stream and a new face frame determined from it, for locating faces in subsequent high-resolution images.
While recognizing the students' faces, their limb information can also be recognized, and their actions and states analyzed from it. After the attribute and state information of each student in the high-resolution video stream has been recognized, a report describing each student's class or examination state can be generated from the recognized information, recording, for example, whether the student slept in class or cheated in an examination. The teacher's teaching quality can also be analyzed from the student state reports.
By using two video streams of different resolutions to identify attribute or state information such as facial expressions, postures, and body actions, the identification speed is increased while the accuracy of the identification result is preserved.
Accordingly, an embodiment of the present disclosure provides an apparatus for identifying and processing a target object, as shown in fig. 4, where the apparatus 40 includes:
an obtaining module 41, configured to obtain a first image to be identified;
a processing module 42, configured to determine a target image area including a target object in a first image according to position information of the target object in a second image, where the first image and the second image correspond to a same shooting scene, and a resolution of the second image is lower than a resolution of the first image;
and an identifying module 43, configured to perform identification processing on the target object based on the target image area.
In some embodiments, the first image and the second image are captured at a time interval not exceeding a preset threshold.
In some embodiments, the first image is obtained from a video frame of a first video stream, the second image is obtained by downsampling the video frame of the first video stream, or the second image is obtained from a video frame of a second video stream.
In some embodiments, the second image is captured simultaneously with or obtained by down-sampling a specified video frame in the first video stream, and the first image includes a plurality of video frames in the first video stream whose inter-frame spacing from the specified video frame is smaller than a preset threshold.
In some embodiments, the first video stream and the second video stream are acquired through video channels with different resolutions of the same camera device, or
The first video stream and the second video stream are acquired through two camera devices with different resolutions respectively.
In some embodiments, the apparatus is further configured to:
in response to the target object not being identified in the target image region, updating the second image based on a newly captured video frame in the first video stream or a newly captured video frame in the second video stream to update the position information.
In some embodiments, the processing module, when determining the target image region including the target object in the first image according to the position information of the target object in the second image, is specifically configured to:
in response to the coordinate systems of the first image and the second image being different, determining a mapping relationship between the coordinate system of the first image and the coordinate system of the second image;
determining a target image area including the target object in the first image based on the mapping relationship and the position information.
In some embodiments, the identification module, when configured to perform identification processing on the target object based on the target image area, is specifically configured to:
extracting the features of the target image area;
and identifying attribute information and/or state information of the target object based on the extracted features.
In some embodiments, the apparatus is further configured to:
generating a report describing the state of the target object in a specific scene based on the attribute information and/or the state information.
In some embodiments, the position information includes position information of boundary points of an object frame corresponding to the target object, and the processing module is configured to, when determining, in the first image, a target image region including the target object according to the position information of the target object in the second image, specifically:
determining a plurality of pixel points in the first image based on the position information of the boundary points and a scaling, the scaling being determined based on a resolution of the first image and a resolution of the second image;
and determining the target image area in the first image by taking the plurality of pixel points as boundary points.
Further, an electronic device is provided in an embodiment of the present disclosure, as shown in fig. 5, the electronic device 50 includes a processor 51, a memory 52, and a computer program stored in the memory 52 and executable by the processor 51, and when the processor 51 executes the computer program, the method in any one of the above embodiments is implemented.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method of any of the foregoing embodiments.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
From the above description of the embodiments, it is clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the embodiments of this specification, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, such as a ROM/RAM, magnetic disk, or optical disc, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of this specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus embodiment is substantially similar to the method embodiment, its description is relatively brief, and reference may be made to the corresponding parts of the method embodiment. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and when implementing the embodiments of the present disclosure, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's scheme. Those of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing describes only specific embodiments of the present disclosure. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principles of the embodiments of the present disclosure, and such modifications and improvements shall also fall within the protection scope of the embodiments of the present disclosure.

Claims (13)

1. A target object identification processing method is characterized by comprising the following steps:
acquiring a first image to be identified;
determining a target image area including a target object in a first image according to position information of the target object in a second image, wherein the first image and the second image correspond to the same shooting scene, and the resolution of the second image is lower than that of the first image;
and identifying the target object based on the target image area.
2. The method of claim 1, wherein the first image and the second image are captured at a time interval not exceeding a preset threshold.
3. The method according to claim 1 or 2, wherein the first image is obtained from a video frame of a first video stream, and the second image is obtained by downsampling the video frame of the first video stream, or the second image is obtained from a video frame of a second video stream.
4. The method of claim 3, wherein the second image is captured simultaneously with, or obtained by downsampling, a specified video frame in the first video stream, and wherein the first image comprises a plurality of video frames, in the first video stream, whose inter-frame distance from the specified video frame is smaller than a preset threshold.
5. The method according to claim 3 or 4, wherein the first video stream and the second video stream are captured by video channels of different resolutions of the same camera device, or
The first video stream and the second video stream are acquired through two camera devices with different resolutions respectively.
6. The method according to any one of claims 3-5, further comprising:
in response to the target object not being identified in the target image area, updating the second image based on a newly captured video frame in the first video stream or a newly captured video frame in the second video stream, so as to update the position information.
7. The method of claim 1, wherein the determining a target image area including a target object in the first image according to the position information of the target object in the second image comprises:
in response to the coordinate systems of the first image and the second image being different, determining a mapping relationship between the coordinate system of the first image and the coordinate system of the second image;
determining a target image area including the target object in the first image based on the mapping relationship and the position information.
8. The method according to any one of claims 1-7, wherein the identifying the target object based on the target image area comprises:
extracting the features of the target image area;
and identifying attribute information and/or state information of the target object based on the extracted features.
9. The method of claim 8, further comprising:
and generating a report for describing the state of the target object in a specific scene based on the attribute information and/or the state information.
10. The method according to any one of claims 1 to 9, wherein the position information includes position information of a boundary point of an object frame corresponding to the target object, and the determining a target image area including the target object in the first image according to the position information of the target object in the second image includes:
determining a plurality of pixel points in the first image based on the position information of the boundary points and a scaling, the scaling being determined based on a resolution of the first image and a resolution of the second image;
and determining the target image area in the first image by taking the plurality of pixel points as boundary points.
11. An apparatus for recognizing and processing a target object, the apparatus comprising:
the acquisition module is used for acquiring a first image to be identified;
the processing module is used for determining a target image area comprising a target object in a first image according to position information of the target object in a second image, wherein the first image and the second image correspond to the same shooting scene, and the resolution of the second image is lower than that of the first image;
and the identification module is used for identifying the target object based on the target image area.
12. An electronic device, comprising a processor, a memory, and a computer program stored in the memory for execution by the processor, wherein the processor, when executing the computer program, implements the method of any of claims 1-10.
13. A computer-readable storage medium having computer instructions stored thereon which, when executed, implement the method of any of claims 1-10.
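To make the claimed flow concrete, the following is a minimal illustrative sketch (in Python with OpenCV) of the method of claims 1, 3, 6 and 10: the target is detected on the low-resolution second image, the resulting object frame is scaled by a ratio determined from the two resolutions, and recognition runs only on the cropped high-resolution target image area. This is a sketch under stated assumptions, not the patented implementation: detect_target and recognize_target are hypothetical placeholders for whatever detection and recognition models a system would use, and the downsampling factor of 4 is an arbitrary illustrative choice not taken from the disclosure.

import cv2

DOWNSAMPLE_FACTOR = 4  # arbitrary illustrative choice, not from the disclosure

def detect_target(second_image):
    # Hypothetical placeholder for a low-resolution detector; a real system
    # would return an (x1, y1, x2, y2) object frame, or None if no target.
    return None

def recognize_target(target_area):
    # Hypothetical placeholder for feature extraction and attribute/state
    # recognition on the cropped high-resolution area (claim 8).
    return {}

def make_second_image(first_image):
    # Second image: same shooting scene at a lower resolution, obtained here
    # by downsampling a video frame of the first video stream (claim 3).
    h, w = first_image.shape[:2]
    return cv2.resize(first_image,
                      (w // DOWNSAMPLE_FACTOR, h // DOWNSAMPLE_FACTOR),
                      interpolation=cv2.INTER_AREA)

def map_box_to_first_image(box, low_shape, high_shape):
    # Claim 10: scale each boundary point of the object frame by a ratio
    # determined from the resolutions of the first and second images.
    sy = high_shape[0] / low_shape[0]
    sx = high_shape[1] / low_shape[1]
    x1, y1, x2, y2 = box
    return int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy)

def process(first_image):
    second_image = make_second_image(first_image)
    box = detect_target(second_image)
    if box is None:
        # Claim 6: refresh the second image from a newly captured video
        # frame to update the position information, then retry.
        return None
    x1, y1, x2, y2 = map_box_to_first_image(box, second_image.shape,
                                            first_image.shape)
    target_area = first_image[y1:y2, x1:x2]  # target image area (claim 1)
    return recognize_target(target_area)

The point of the split is economy: detection runs on the cheap low-resolution image, while the more expensive recognition of claim 8 touches only the cropped region of the high-resolution image rather than the full frame.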
CN202110221208.6A 2021-02-26 2021-02-26 Target object identification processing method, device, equipment and medium Pending CN112818933A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110221208.6A CN112818933A (en) 2021-02-26 2021-02-26 Target object identification processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN112818933A (en) 2021-05-18

Family

ID=75862341

Country Status (1)

Country Link
CN (1) CN112818933A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458198A (en) * 2019-07-10 2019-11-15 哈尔滨工业大学(深圳) Multiresolution target identification method and device
CN110619626A (en) * 2019-08-30 2019-12-27 北京都是科技有限公司 Image processing apparatus, system, method and device
CN111597953A (en) * 2020-05-12 2020-08-28 杭州宇泛智能科技有限公司 Multi-path image processing method and device and electronic equipment

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220166923A1 (en) * 2019-08-14 2022-05-26 Fujifilm Corporation Imaging apparatus, operation method of imaging apparatus, and program
US11689816B2 (en) * 2019-08-14 2023-06-27 Fujifilm Corporation Imaging apparatus, operation method of imaging apparatus, and program
CN113673461A (en) * 2021-08-26 2021-11-19 深圳随锐云网科技有限公司 Method and device for realizing selection of human face and human figure region based on 4K + AI
CN113673461B (en) * 2021-08-26 2024-03-26 深圳随锐云网科技有限公司 Method and device for realizing face and human shape area selection based on 4K+AI
CN114067477A (en) * 2021-11-04 2022-02-18 极视角(上海)科技有限公司 Mask recognition entrance guard for AI image recognition and use method
DE102022200833A1 (en) 2022-01-26 2023-07-27 Robert Bosch Gesellschaft mit beschränkter Haftung Surveillance arrangement, method for registering surveillance cameras and analysis modules, computer program and storage medium
WO2023144182A1 (en) 2022-01-26 2023-08-03 Robert Bosch Gmbh Monitoring assembly, method for registering monitoring cameras and analysis modules, computer program, and storage medium
CN114430500A (en) * 2022-04-02 2022-05-03 深圳酷源数联科技有限公司 Video plug-flow method with real-time target detection, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210518