WO2021008252A1 - Method and apparatus for recognizing position of person in image, computer device and storage medium - Google Patents


Info

Publication number
WO2021008252A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
recognized
human body
video
video image
Prior art date
Application number
PCT/CN2020/093608
Other languages
French (fr)
Chinese (zh)
Inventor
石磊
王健宗
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2021008252A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G06T7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Definitions

  • This application relates to the field of computer technology, in particular to a method, device, computer equipment and storage medium for identifying the position of a person in an image.
  • A method for identifying the position of a person in an image includes:
  • when the image type of the video image to be recognized is a color image, recognizing the human body key points in the video image to be recognized through the human body posture model obtained by training, and determining the position information of the person in the video image to be recognized based on the recognized human body key points;
  • when the image type is a night vision image, recognizing the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training.
  • a device for recognizing the position of a person in an image comprising:
  • the preprocessing module is used to obtain the surveillance video file to be identified and preprocess it to obtain the video image to be identified;
  • the determining module is used to determine the image type of the video image to be recognized;
  • the recognition module is used to, when the image type is a color image, recognize the human body key points in the video image to be recognized through the human body posture model obtained by training, and determine the position information of the person in the video image to be recognized based on the recognized human body key points;
  • the recognition module is also used to, when the image type is a night vision image, recognize the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training.
  • a computer device includes a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, the following steps are implemented:
  • when the image type of the video image to be recognized is a color image, recognizing the human body key points in the video image to be recognized through the human body posture model obtained by training, and determining the position information of the person in the video image to be recognized based on the recognized human body key points;
  • when the image type is a night vision image, recognizing the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training.
  • A computer-readable storage medium has a computer program stored thereon; when the computer program is executed by a processor, the above method for recognizing the position of a person in an image is implemented.
  • The surveillance video file to be identified is preprocessed to obtain the video image to be identified, thereby facilitating the subsequent video content recognition.
  • When the image type is a color image, the human body posture model obtained through training recognizes the human body key points in the video image to be recognized, and the position information of the person in the video image to be recognized is determined based on the recognized key points.
  • When the image type is a night vision image, the lightweight target detection model obtained through training recognizes the position information of the person in the video image to be recognized. This ensures that each type of video image is recognized by the best-matching recognition model, which improves recognition accuracy. Detecting the position of the person with the recognition model suited to each image type also replaces the old manual viewing method, enabling automatic, rapid recognition of surveillance video content and improving work efficiency.
  • FIG. 1 is an application scene diagram of the method for recognizing the position of a person in an image in an embodiment
  • FIG. 2 is a schematic flowchart of a method for recognizing the position of a person in an image in an embodiment
  • FIG. 3 is a schematic flowchart of the step of determining the type of video image in an embodiment
  • FIG. 4 is a schematic flowchart of a method for recognizing the position of a person in an image in another embodiment
  • FIG. 5 is a structural block diagram of an apparatus for recognizing the position of a person in an image in an embodiment
  • FIG. 6 is an internal structure diagram of a computer device in an embodiment.
  • the method for identifying the position of a person in an image provided by this application can be applied to the application environment as shown in FIG. 1.
  • the monitoring device 102 communicates with the server 104 through the network.
  • the server 104 obtains the surveillance video file to be identified sent by the surveillance device 102, and the server 104 preprocesses the surveillance video file to obtain the video image to be identified.
  • the server 104 determines the image type of the video image to be recognized. When the image type is a color image, the server 104 recognizes the key points of the human body in the image to be recognized through the human body posture model obtained through training, and determines the position information of the person in the video image to be recognized based on the recognized key points of the human body.
  • the server 104 recognizes the position information of the person in the video image to be recognized through the lightweight target detection model obtained through training.
  • the monitoring device 102 can be, but is not limited to, various cameras, personal computers with cameras, notebook computers, smart phones, tablet computers, and portable wearable devices, etc.
  • The server 104 can be implemented as an independent server or as a server cluster composed of multiple servers.
  • a method for recognizing the position of a person in an image is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step S202 Obtain a surveillance video file to be identified, and perform preprocessing on the surveillance video file to be identified to obtain a video image to be identified.
  • The surveillance video file to be identified is a file containing surveillance video collected by the monitoring equipment. It can be understood that the surveillance video file to be identified includes, but is not limited to, surveillance video collected by the monitoring equipment and sent to the server; it can also be a video file sent by other terminal devices with a transmission function that communicate with the server. That is, the surveillance video file obtained by the server can come either from the monitoring device or from other terminal devices. Preprocessing means that the surveillance video file to be identified is decoded to obtain the corresponding surveillance video, the surveillance video is segmented to obtain the video images to be identified, and the video images undergo technical processing such as grayscale adjustment, denoising and sharpening, that is, adjustments that improve image quality and reduce noise to ensure image clarity and quality.
  • the user can issue a character position recognition instruction through the monitoring device, and select the to-be-recognized monitoring video that needs to be recognized.
  • When the surveillance equipment receives the person position recognition instruction issued by the user, it obtains the surveillance video to be identified selected by the user, compresses and encapsulates it into a surveillance video file, sends the file to the corresponding server, and sends a person position recognition request to that server.
  • After the server receives the person position recognition request, it decodes and restores the surveillance video file corresponding to the request to obtain the surveillance video to be recognized, and then preprocesses that video to obtain the video image to be recognized.
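The grayscale adjustment, denoising and sharpening steps mentioned above can be sketched with plain numpy. This is a minimal illustration, not the implementation described in the application; the function names and the choices of a 3x3 mean filter for denoising and an unsharp mask for sharpening are assumptions.

```python
import numpy as np

def denoise_mean3(img):
    """Simple denoising step: 3x3 mean filter (edges padded by reflection)."""
    padded = np.pad(img.astype(np.float64), 1, mode="reflect")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += padded[1 + dy : 1 + dy + img.shape[0],
                          1 + dx : 1 + dx + img.shape[1]]
    return out / 9.0

def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the difference between the image and its blur."""
    blurred = denoise_mean3(img)
    return np.clip(img + amount * (img - blurred), 0, 255)
```

In a real pipeline these operations would more likely be done with OpenCV (e.g. blur and filter functions) on each decoded frame; the numpy version only shows the arithmetic.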
  • Step S204 Determine the image type of the video image to be recognized.
  • After the server preprocesses the surveillance video file to be recognized to obtain the corresponding video image to be recognized, it determines whether the video image is a night vision image or a color image by examining the pixel values in the video image.
  • Step S206 When the image type is a color image, recognize the human body key points in the video image to be recognized through the human body posture model obtained by training, and determine the position information of the person in the video image to be recognized based on the recognized human body key points.
  • the human pose model is an openpose model.
  • The openpose model is a pose detection framework used to detect human joints, such as key points at the neck, shoulders and elbows, and to link those key points to obtain the human body posture.
  • the openpose model includes a pre-network layer and a dual-branch multi-level CNN network (Convolutional Neural Networks, convolutional neural network).
  • The front network layer is a modified VGG-19 network (VGG: Visual Geometry Group network), comprising ten two-dimensional convolutional layers and rectified linear unit layers in series, with three pooling layers inserted in between.
  • The VGG-19 module includes four blocks: block1, block2 and block4 each contain two convolutional layers and two rectified linear units, block3 contains four convolutional layers and four rectified linear units, and three pooling layers are placed between the blocks.
  • The dual-branch multi-level CNN network includes the confidence network and the correlation vector field network (i.e., the part affinity fields).
  • The openpose model is called as the recognition model for the video image to be recognized: the video image is input into the openpose model, which recognizes it to obtain the human body key points, and the position of the person is then obtained from those key points.
  • Step S208 When the image type is a night vision image, the lightweight target detection model obtained through training is used to identify the position information of the person in the video image to be identified.
  • the lightweight target detection model is ssdlite (Single Shot Detector-Lite, a lightweight single-shot detector) model.
  • The ssdlite model is a target detection framework used to identify whether a target is present in an image.
  • In this embodiment, the original loss function of the ssdlite model is replaced with focal loss.
  • the openpose model is used to detect color images
  • the ssdlite model is used to detect night vision images.
  • After obtaining the video image to be recognized, the server determines its type; if the type is a night vision image, the ssdlite model is called as the recognition model for the video image, and the ssdlite model is subsequently used to recognize the video image and identify the position of the person.
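The focal-loss substitution mentioned above can be illustrated for the binary case. This is a hedged sketch: the alpha=0.25 and gamma=2 defaults follow the common focal-loss formulation, not values stated in this application.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy examples so training focuses
    on hard ones. p: predicted probability of the positive class,
    y: ground-truth label (0 or 1). Both are numpy arrays."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

The (1 - p_t)^gamma factor is what distinguishes this from plain cross-entropy: a confident correct prediction contributes almost nothing, while hard misclassified examples dominate the loss.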
  • the surveillance video file to be identified is preprocessed to obtain the video image to be identified, thereby facilitating the subsequent processing of video content identification.
  • After the image type of the video image to be recognized is determined, the corresponding recognition model is called according to the image type; that is, when the image type is a color image, the human body posture model obtained through training recognizes the human body key points in the video image to be recognized, and the position information of the person in the video image to be recognized is determined based on the recognized key points.
  • When the image type is a night vision image, the lightweight target detection model obtained through training recognizes the position information of the person in the video image to be recognized.
  • step S204 determining the image type of the video image to be recognized includes the following steps:
  • Step S302 Obtain the three-channel pixel value of each pixel in the video image to be identified.
  • A pixel is the smallest unit of an image: one of the small squares that make up the image, each with a definite position and an assigned color value.
  • The colors and positions of these small squares determine the appearance of the image.
  • The pixel value is the color value corresponding to the pixel, and the image type can be determined from the pixel values.
  • Image types include night vision images and color images.
  • the three-channel pixel value is the RGB pixel value
  • The RGB pixel value is the color value that determines the displayed color of the image; R, G and B denote the red, green and blue components respectively.
  • When the server determines whether the video image is a night vision image or a color image according to the pixel values of the image, it first obtains the RGB pixel values corresponding to all pixels in the image.
  • Step S304 Perform difference calculation based on the pixel values of the three channels, and select the value with the largest difference as the pixel difference.
  • The difference calculation is performed on the R, G and B components.
  • That is, any two of the R, G and B values are subtracted from each other, and the largest of the resulting differences is selected as the pixel difference value for that pixel. For example, taking pixel 1 as an example, the RGB value corresponding to pixel 1 is obtained.
  • Each of R, G and B has a corresponding component value.
  • The specific component value depends on the image; in general, each RGB component value lies between 0 and 255. The component values corresponding to R, G and B are obtained respectively, and the pairwise differences of the three component values are then computed.
  • Step S306 Determine the image type of the video image to be recognized according to the preset value and the pixel difference value.
  • the preset value is a preset reference pixel value used to determine whether the video image is a color image or a night vision image.
  • In this embodiment, the preset value is 10. Specifically, after the pixel difference value corresponding to a pixel is obtained, it is compared with the preset value 10: if the pixel difference is greater than 10, the video image to be recognized is determined to be a color image; if it is less than or equal to 10, the video image is determined to be a night vision image.
  • In this way, the image type of the video image to be recognized is determined from its pixel values, ensuring that the recognition model that best matches the video image can be called for subsequent recognition according to its image type, which improves the recognition accuracy.
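Steps S302 to S306 can be sketched as follows. This is a minimal numpy illustration: the application describes a per-pixel comparison against the preset value 10, and aggregating by the maximum per-pixel difference here is an assumption.

```python
import numpy as np

# Threshold taken from the embodiment described above.
PRESET_VALUE = 10

def image_type(image):
    """Classify an H x W x 3 RGB image as 'color' or 'night_vision'.

    For each pixel, take the largest pairwise difference between the R, G
    and B components; in a night-vision (near-grayscale) image the three
    channels are almost equal, so these differences stay small.
    """
    img = image.astype(np.int32)  # avoid uint8 wraparound on subtraction
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    diff = np.maximum.reduce([np.abs(r - g), np.abs(g - b), np.abs(r - b)])
    # Aggregate by the largest per-pixel difference (an assumption; the
    # text describes the comparison pixel by pixel).
    return "color" if diff.max() > PRESET_VALUE else "night_vision"
```

A grayscale night-vision frame has R = G = B at every pixel, so every pairwise difference is zero and the frame falls below the threshold.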
  • In another embodiment, step S204 of determining the image type of the video image to be recognized includes: obtaining the acquisition-mode adjustment time of the monitoring device corresponding to the surveillance video file to be identified and the shooting time corresponding to the video image to be identified, and determining the image type of the video image to be recognized according to the acquisition-mode adjustment time.
  • the monitoring device has two modes, including a color collection mode and a night vision black and white collection mode.
  • When the monitoring device collects surveillance video in low light, the quality of the color video it collects degrades.
  • In that case, the monitoring device can automatically switch from the color collection mode to the night vision black-and-white mode to collect night vision black-and-white surveillance video. Therefore, when determining the content of the video image to be identified, the acquisition-mode adjustment time of the corresponding monitoring device is obtained, that is, the time at which the device switched from the color collection mode to the night vision black-and-white mode, thereby determining when the monitoring device adjusted its mode.
  • The shooting time corresponding to the video image to be recognized is then obtained; it can be read from the video information.
  • If the shooting time is before the acquisition-mode adjustment time, the video image to be recognized can be determined to be a color image; if the shooting time is after the acquisition-mode adjustment time, the video image to be recognized can be determined to be a night vision image.
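The time-based variant of step S204 reduces to a single comparison. This is a sketch; the function name and the classification of frames shot exactly at the adjustment time are assumptions.

```python
from datetime import datetime

def image_type_by_time(shooting_time, mode_adjust_time):
    """Classify a frame by when it was shot relative to the moment the
    camera switched from the color mode to the night-vision mode.
    Frames shot before the switch are color; frames shot at or after it
    are treated as night vision (the boundary case is an assumption)."""
    return "color" if shooting_time < mode_adjust_time else "night_vision"

# Example: a camera that switched modes at 19:00.
switch = datetime(2020, 1, 1, 19, 0)
```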
  • In one embodiment, recognizing the human body key points in the video image to be recognized through the human body posture model obtained by training, and determining the position information of the person based on the recognized key points, specifically includes: using the front network layer of the human posture model to perform feature extraction on the video image to be recognized to obtain the corresponding feature map; using the confidence network layer of the human posture model to extract the human body key points from the feature map and obtain the key-point confidence map corresponding to those key points; using the correlation vector network layer of the human posture model to extract from the feature map the degree of association of the human body key points; and determining the position information of the person in the video image to be recognized according to the key-point confidence map and the degrees of association of the key points.
  • When the video image to be recognized is a color image, it is first input into the front network layer of the human posture model, which performs feature-extraction operations such as convolution and pooling to obtain the feature map corresponding to the video image.
  • The feature map is then input into the dual-branch multi-level CNN network: the confidence network branch yields each human body key point and the corresponding key-point confidence map, while the correlation vector field network branch yields the degree of association of each key point. The position information of the person in the video image to be recognized is then determined according to the key-point confidence map and the degrees of association.
  • In this way, the human body key points in the video image to be recognized can be found, and the effective connections between them can be obtained from the degrees of association; that is, the position of the person can be determined through the key-point confidence map and the degrees of association.
  • In one embodiment, determining the position information of the person in the video image to be recognized according to the key-point confidence map and the degrees of association of the human body key points includes: connecting the human body key points on the key-point confidence map according to their degrees of association and calculating the key-point contour; obtaining the minimum circumscribed rectangle from the key-point contour, that is, the rectangle of smallest area that contains the contour; and determining the position information of the person in the video image to be recognized according to the minimum circumscribed rectangle.
  • The key-point contour is the irregular shape that frames the human body key points.
  • The minimum circumscribed rectangle is the smallest rectangle that frames the entire key-point contour.
  • Specifically, the opencv tool performs the calculation based on the key-point confidence map and the degrees of association.
  • The human body key points on the key-point confidence map are connected according to their degrees of association to obtain the posture of the human body.
  • The opencv tool is then used to calculate the key-point contour.
  • From the contour, the minimum circumscribed rectangle is obtained.
  • The area enclosed by the minimum circumscribed rectangle is the position of the person, and the position coordinates of that rectangle are the person position information.
  • If the obtained minimum circumscribed rectangle deviates from a regular rectangle, that is, it is an irregular rectangle, it is corrected to a regular one, so that the finally obtained minimum circumscribed rectangle is a regular rectangle.
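After correction to a regular (axis-aligned) rectangle, the bounding step amounts to taking the extremes of the key-point coordinates. With opencv, cv2.boundingRect on the key-point contour would give an equivalent result; this numpy-only version is an illustrative assumption.

```python
import numpy as np

def person_bbox(keypoints):
    """Axis-aligned bounding rectangle of detected human body key points.

    keypoints: iterable of (x, y) coordinates. Returns (xmin, ymin,
    xmax, ymax): the 'regular' minimum rectangle enclosing all key
    points, used as the person's position information.
    """
    pts = np.asarray(keypoints, dtype=np.float64)
    xmin, ymin = pts.min(axis=0)
    xmax, ymax = pts.max(axis=0)
    return xmin, ymin, xmax, ymax
```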
  • In one embodiment, another method for recognizing the position of a person in an image is provided; that is, after the position information of the person in the video image to be recognized is obtained, the method further includes the following steps:
  • Step S210 Generate video information corresponding to the person position information.
  • Step S212 Write the video information into the corresponding log.
  • the preset target includes but is not limited to the human body, and can also be other objects, which are preset according to actual needs.
  • Logs refer to files used to record video information.
  • a surveillance video is taken as an example, and the requirement for identifying the surveillance video is to identify a human body appearing in the surveillance video. Therefore, the human body is taken as the preset target in this embodiment.
  • the video information includes whether the video image includes a preset target, which surveillance video file the video image comes from, and the coordinate position of the preset target in the video image.
  • the source of the video image and the coordinate position of the person in the video image are obtained, and then packaged into a file to obtain the generated video information.
  • After the video information is generated, it is written into the corresponding log.
  • In this way, the video information recorded in the log file can be used to learn the video content of all surveillance video files.
  • the recognition model is a pre-trained network model, that is, the human pose model and the lightweight target detection model are pre-trained models used for character position recognition.
  • Training the human body posture model and the lightweight target detection model specifically includes: acquiring historical surveillance videos of the monitoring equipment; extracting color image samples and night vision image samples from the historical surveillance videos, annotating the human body key points in the color image samples and the position coordinates of the human body in the night vision image samples to obtain annotated color images and annotated night vision images; adjusting the sizes of the annotated color images and annotated night vision images to obtain training color images and training night vision images; mapping the human body key points in the training color images to the key points marked in the annotated color images and training the human body posture model with the mapped training color images; and mapping the position coordinates in the training night vision images to the coordinates marked in the annotated night vision images and training the lightweight target detection model with the mapped training night vision images.
  • the recognition model is trained based on the training images obtained from the historical surveillance video, so that the recognition model can fully learn the surveillance scene, and the subsequent identification of surveillance video content is more accurate. Since the recognition model includes two models, openpose and ssdlite, which recognize different types of images, the historical surveillance video obtained should include color video and night vision video.
  • the annotation software is used to mark the coordinates of the key points of the human body in the color image to obtain the labeled color image.
  • The labeling software includes, but is not limited to, the labelme annotation software.
  • The color images are annotated with the human body key points.
  • The number of human body key points can vary, for example 9, 14, 16, 17 or 18. To achieve more accurate recognition, this embodiment preferably marks the coordinates of 18 key points.
  • The 18 key points are: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, left eye, right eye, left ear and right ear.
  • the night vision image is directly marked with the coordinates of the position of the person, and the marked night vision image is obtained.
  • the coordinates can be expressed as (minimum x-coordinate, minimum y-coordinate, maximum x-coordinate, and maximum y-coordinate), that is, (xmin, ymin, xmax, ymax).
  • Since the openpose and ssdlite models process images differently, the input image sizes they accept are different. Therefore, after the human body key points are annotated on the color image, the annotated color image needs to be scaled to 432×368, and the annotated night vision image to 300×300; the scaled annotated color images and annotated night vision images are used as the training color images and training night vision images.
  • After scaling, the annotated coordinates are mapped back to the targets annotated before scaling, that is, the training color images and training night vision images are mapped to their corresponding annotations.
  • The training color images and training night vision images are then input into the corresponding models for training; the established mapping relationship allows the correct coordinates to be learned during model training.
  • The color images are input into the openpose model for training.
  • The night vision images are input into the ssdlite model for training.
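The coordinate mapping between annotations and scaled training images amounts to a per-axis rescaling. The helper name and the example frame size are assumptions; the 432×368 and 300×300 target sizes are from the embodiment above.

```python
def map_coords(coords, orig_size, target_size):
    """Map annotated (x, y) coordinates from the original image to the
    resized training image using the horizontal and vertical scale ratios.

    coords: list of (x, y) pairs; orig_size, target_size: (width, height).
    """
    sx = target_size[0] / orig_size[0]
    sy = target_size[1] / orig_size[1]
    return [(x * sx, y * sy) for x, y in coords]

# Key points annotated on a hypothetical 1920x1080 frame, mapped to the
# 432x368 input size used for the openpose model:
openpose_pts = map_coords([(960, 540)], (1920, 1080), (432, 368))

# A person box's corners mapped to the 300x300 ssdlite input:
ssd_box = map_coords([(100, 200), (500, 900)], (1920, 1080), (300, 300))
```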
  • a device for identifying the position of a person in an image including: a preprocessing module 502, a determination module 504, and an identification module 506, wherein:
  • the preprocessing module 502 is used to obtain the surveillance video file to be identified, and preprocess the surveillance video file to be identified to obtain the video image to be identified.
  • the determining module 504 is used to determine the image type of the video image to be recognized.
  • The recognition module 506 is used to, when the image type is a color image, recognize the human body key points in the video image to be recognized through the human body posture model obtained by training, and determine the position information of the person in the video image to be recognized based on the recognized human body key points.
  • The recognition module 506 is also used to, when the image type is a night vision image, recognize the position information of the person in the video image to be recognized through the lightweight target detection model obtained through training.
  • The determining module 504 is further configured to: obtain the three-channel pixel value of each pixel in the video image to be identified; perform difference calculation based on the three-channel pixel values and select the value with the largest difference as the pixel difference value; and determine the image type of the video image to be recognized according to the preset value and the pixel difference value.
  • The determining module 504 is further configured to: obtain the acquisition-mode adjustment time of the monitoring device corresponding to the surveillance video file to be identified and the shooting time corresponding to the video image to be identified; and determine the image type of the video image to be identified according to the acquisition-mode adjustment time.
  • The recognition module 506 is further configured to: use the front network layer of the human posture model to perform feature extraction on the video image to be recognized to obtain the corresponding feature map; use the confidence network layer of the human posture model to extract the human body key points from the feature map and obtain the corresponding key-point confidence map; use the correlation vector network layer of the human posture model to extract from the feature map the degree of association of each human body key point; and determine the position information of the person in the video image to be recognized according to the key-point confidence map and the degrees of association of the key points.
• the recognition module 508 is further configured to connect the key points of the human body on the key point confidence map according to their degrees of association and calculate a key point contour; obtain a minimum circumscribed rectangle according to the key point contour, the minimum circumscribed rectangle being the rectangle of smallest area that encloses the key point contour; and determine the position information of the person in the video image to be recognized according to the minimum circumscribed rectangle.
• the apparatus for recognizing the position of a person in an image further includes a generating module, configured to generate video information corresponding to the position information of the person and write the video information into a corresponding log.
• the apparatus for recognizing the position of a person in an image further includes a training module, configured to acquire historical surveillance videos of the monitoring device; extract color image samples and night vision image samples from the historical surveillance videos, annotate the key points of the human body in the color image samples and the position coordinates of the human body in the night vision image samples, obtaining annotated color images and annotated night vision images; adjust the sizes of the annotated color images and annotated night vision images to obtain training color images and training night vision images; map the key points of the human body in the training color images to the key points annotated in the annotated color images, and train the human body posture model using the mapped training color images; and map the position coordinates in the training night vision images to the position coordinates annotated in the annotated night vision images, and train the lightweight target detection model using the mapped training night vision images.
• the various modules in the above apparatus for recognizing the position of a person in an image can be implemented in whole or in part by software, by hardware, or by a combination thereof.
• the foregoing modules may be embedded, in hardware form, in or independent of the processor of the computer device, or may be stored in software form in the memory of the computer device, so that the processor can call and execute the operations corresponding to the foregoing modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 6.
• the computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
• the database of the computer device is used to store data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
• the computer program, when executed by the processor, implements a method for recognizing the position of a person in an image.
• FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied.
• a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
• when the image type is a color image, recognizing the key points of the human body in the video image to be recognized through the human body posture model obtained by training, and determining the position information of the person in the video image to be recognized based on the recognized key points of the human body;
• when the image type is a night vision image, recognizing the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training.
  • the processor further implements the following steps when executing the computer program:
  • the processor further implements the following steps when executing the computer program:
  • the processor further implements the following steps when executing the computer program:
  • the processor further implements the following steps when executing the computer program:
• connecting the key points of the human body on the key point confidence map according to their degrees of association, and calculating a key point contour; obtaining a minimum circumscribed rectangle according to the key point contour, the minimum circumscribed rectangle being the rectangle of smallest area that encloses the key point contour; and determining the position information of the person in the video image to be recognized according to the minimum circumscribed rectangle.
  • the processor further implements the following steps when executing the computer program:
  • the processor further implements the following steps when executing the computer program:
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
• when the image type is a color image, recognizing the key points of the human body in the video image to be recognized through the human body posture model obtained by training, and determining the position information of the person in the video image to be recognized based on the recognized key points of the human body;
• when the image type is a night vision image, recognizing the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training.
  • the computer program further implements the following steps when being executed by the processor:
  • the computer program further implements the following steps when being executed by the processor:
  • the computer program further implements the following steps when being executed by the processor:
  • the computer program further implements the following steps when being executed by the processor:
• connecting the key points of the human body on the key point confidence map according to their degrees of association, and calculating a key point contour; obtaining a minimum circumscribed rectangle according to the key point contour, the minimum circumscribed rectangle being the rectangle of smallest area that encloses the key point contour; and determining the position information of the person in the video image to be recognized according to the minimum circumscribed rectangle.
  • the computer program further implements the following steps when being executed by the processor:
  • the computer program further implements the following steps when being executed by the processor:
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
• RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), Rambus dynamic RAM (RDRAM), and so on.

Abstract

The present application relates to a method and an apparatus for recognizing the position of a person in an image on the basis of a neural network, a computer device and a storage medium. The method comprises: acquiring a surveillance video file to be recognized, and preprocessing the surveillance video file to obtain a video image to be recognized; determining the image type of the video image; when the image type is a color image, recognizing key human body points in the video image by means of a human body posture model obtained by training, and determining position information of a person in the video image on the basis of the recognized key human body points; and when the image type is a night vision image, recognizing the position information of the person in the video image by means of a lightweight object detection model obtained by training. The method improves work efficiency.

Description

Method, Apparatus, Computer Device and Storage Medium for Recognizing the Position of a Person in an Image

Under the Paris Convention, this application claims priority to the Chinese patent application No. CN201910628940.8, filed on July 12, 2019 and titled "Method, Apparatus, Computer Device and Storage Medium for Recognizing the Position of a Person in an Image", the entire content of which is incorporated herein by reference.
Technical Field

This application relates to the field of computer technology, and in particular to a method, apparatus, computer device and storage medium for recognizing the position of a person in an image.
Background

Driven by the needs of the social economy and production safety, video surveillance equipment has been deployed ever more widely in fields such as safe cities, smart transportation and security engineering, and in recent years video surveillance has been developing toward high definition, networking and intelligence. However, with the widespread use of surveillance video, the massive number of cameras produces an ever-growing volume of video data, and locating a target requires querying this data. The inventors realized that existing query methods rely mainly on manual viewing and manual retrieval, resulting in a low degree of automation in video content monitoring and slow query efficiency.
Technical Problem

In view of the above technical problems, it is necessary to provide a method, apparatus, computer device and storage medium for recognizing the position of a person in an image that can improve efficiency.
Technical Solution

A method for recognizing the position of a person in an image, the method comprising:

acquiring a surveillance video file to be recognized, and preprocessing the surveillance video file to be recognized to obtain a video image to be recognized;

determining the image type of the video image to be recognized;

when the image type is a color image, recognizing key points of the human body in the video image to be recognized through a human body posture model obtained by training, and determining position information of a person in the video image to be recognized based on the recognized key points of the human body; and

when the image type is a night vision image, recognizing the position information of the person in the video image to be recognized through a lightweight target detection model obtained by training.
An apparatus for recognizing the position of a person in an image, the apparatus comprising:

a preprocessing module, configured to acquire a surveillance video file to be recognized and preprocess it to obtain a video image to be recognized;

a determining module, configured to determine the image type of the video image to be recognized; and

a recognition module, configured to, when the image type is a color image, recognize key points of the human body in the video image to be recognized through a human body posture model obtained by training, and determine position information of a person in the video image to be recognized based on the recognized key points of the human body;

the recognition module is further configured to, when the image type is a night vision image, recognize the position information of the person in the video image to be recognized through a lightweight target detection model obtained by training.
A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:

acquiring a surveillance video file to be recognized, and preprocessing the surveillance video file to be recognized to obtain a video image to be recognized;

determining the image type of the video image to be recognized;

when the image type is a color image, recognizing key points of the human body in the video image to be recognized through a human body posture model obtained by training, and determining position information of a person in the video image to be recognized based on the recognized key points of the human body; and

when the image type is a night vision image, recognizing the position information of the person in the video image to be recognized through a lightweight target detection model obtained by training.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the above method for recognizing the position of a person in an image.
Beneficial Effects

In the above method, apparatus, computer device and storage medium for recognizing the position of a person in an image, after the surveillance video file to be recognized is acquired, it is preprocessed to obtain the video image to be recognized, which facilitates the subsequent recognition of the video content. After the image type of the video image to be recognized is determined, the corresponding recognition model is called according to the image type: when the image type is a color image, the human body posture model obtained by training recognizes the key points of the human body in the video image to be recognized, and the position information of the person is determined based on the recognized key points; when the image type is a night vision image, the lightweight target detection model obtained by training recognizes the position information of the person in the video image to be recognized. This ensures that each type of video image is recognized by the best-matching recognition model, improving recognition accuracy. Moreover, detecting the positions of persons in video images with different recognition models dispenses with the old method of manual viewing, enables automatic and rapid recognition of surveillance video content, and improves work efficiency.
Brief Description of the Drawings

FIG. 1 is an application scene diagram of a method for recognizing the position of a person in an image in an embodiment;

FIG. 2 is a schematic flowchart of a method for recognizing the position of a person in an image in an embodiment;

FIG. 3 is a schematic flowchart of the step of determining the type of a video image in an embodiment;

FIG. 4 is a schematic flowchart of a method for recognizing the position of a person in an image in another embodiment;

FIG. 5 is a structural block diagram of an apparatus for recognizing the position of a person in an image in an embodiment;

FIG. 6 is a diagram of the internal structure of a computer device in an embodiment.
Embodiments of the Invention

To make the purpose, technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the application and are not intended to limit it.
The method for recognizing the position of a person in an image provided by this application can be applied in the application environment shown in FIG. 1, in which a monitoring device 102 communicates with a server 104 through a network. The server 104 acquires the surveillance video file to be recognized sent by the monitoring device 102 and preprocesses it to obtain the video image to be recognized. The server 104 determines the image type of the video image to be recognized. When the image type is a color image, the server 104 recognizes the key points of the human body in the video image to be recognized through the human body posture model obtained by training, and determines the position information of the person based on the recognized key points. When the image type is a night vision image, the server 104 recognizes the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training. The monitoring device 102 may be, but is not limited to, any of various cameras, or a personal computer, notebook computer, smartphone, tablet computer or portable wearable device equipped with a camera; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a method for recognizing the position of a person in an image is provided. Taking its application to the server in FIG. 1 as an example, the method includes the following steps:

Step S202: acquire a surveillance video file to be recognized, and preprocess the surveillance video file to be recognized to obtain a video image to be recognized.
Here, the surveillance video file to be recognized refers to a file containing surveillance video collected by a monitoring device. It may come from the monitoring device itself, or from any other terminal device with a transmission function that communicates with the server; that is, the surveillance video file acquired by the server can originate either from the monitoring device or from a video file sent by another terminal device. Preprocessing means decoding the surveillance video file to obtain the corresponding surveillance video, segmenting that video to obtain the video images to be recognized, and applying grayscale adjustment, denoising, sharpening and similar processing to each image, i.e. improving image quality and suppressing noise to ensure the clarity and quality of the image.
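Of the preprocessing steps named above, sharpening is the easiest to illustrate in isolation. The sketch below applies a 3x3 sharpening kernel to a small grayscale image represented as nested lists; the specific kernel is an assumption for illustration (the patent does not name one), and border pixels are left unchanged for simplicity.

```python
def sharpen(image):
    """Apply a common 3x3 sharpening kernel to a 2-D grayscale image.

    image is a list of rows of integer intensities. Border pixels are
    copied through unchanged; results are clamped to [0, 255].
    The kernel choice is an illustrative assumption, not from the patent.
    """
    kernel = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # start from a copy so borders survive
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel[j][i] * image[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = max(0, min(255, acc))
    return out
```

On a uniform region the kernel leaves values unchanged (5 - 4 = 1 total weight), while isolated bright pixels are amplified, which is the intended edge-enhancing behavior.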
Specifically, a user can issue a person position recognition instruction through the monitoring device and select the surveillance video to be recognized. Upon receiving the instruction, the monitoring device compresses and encapsulates the selected surveillance video into a corresponding surveillance video file, sends the file to the server, and sends the server a person position recognition request. After receiving the request, the server decodes the surveillance video file corresponding to the request to restore the surveillance video to be recognized, and then preprocesses it to obtain the video images to be recognized.
Step S204: determine the image type of the video image to be recognized.

Specifically, after the server preprocesses the surveillance video file to be recognized and obtains the corresponding video image, it determines whether that image is a night vision image or a color image based on the pixel values in the image.
Step S206: when the image type is a color image, recognize the key points of the human body in the video image to be recognized through the human body posture model obtained by training, and determine the position information of the person in the video image to be recognized based on the recognized key points.

Here, the human body posture model is an openpose model, a pose detection framework used to detect the joints of the human body, such as key points at the neck, shoulders and elbows, and to link the key points to obtain the human body posture. The openpose model consists of a front network layer and a dual-branch, multi-stage CNN (Convolutional Neural Network). The front network is a VGG-19 network modified from the VGG network (Visual Geometry Group Network), comprising ten two-dimensional convolutional layers and rectified linear unit layers connected in series, with three pooling layers inserted among them. That is, the VGG-19 module comprises four blocks: block1, block2 and block4 each contain two convolutional layers and two rectified linear units, block3 contains four convolutional layers and four rectified linear units, and the three pooling layers lie between the blocks. The dual-branch, multi-stage CNN comprises a confidence network and an association vector field network.

Specifically, after the server determines the type of the video image to be recognized, if the type is a color image, the openpose model is called as the recognition model for this image. The video image to be recognized is input into the openpose model, which recognizes the key points of the human body in the image, and the position of the person is then obtained from those key points.
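As a minimal illustration of turning detected key points into a person position, the sketch below derives a bounding rectangle from a set of key point coordinates. Note that this computes an axis-aligned rectangle as a simplification; the minimum circumscribed rectangle described in the claims is the rectangle of smallest area enclosing the key point contour, which may in general be rotated.

```python
def bounding_rectangle(keypoints):
    """Smallest axis-aligned rectangle (x_min, y_min, x_max, y_max)
    enclosing all (x, y) key points.

    Simplification: the patent's minimum circumscribed rectangle may be
    rotated to minimize area; an axis-aligned box is used here for clarity.
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```

The four returned coordinates serve directly as the person's position information in image coordinates.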
Step S208: when the image type is a night vision image, recognize the position information of the person in the video image to be recognized through the lightweight target detection model obtained by training.

Here, the lightweight target detection model is an ssdlite (Single Shot Detector-Lite) model, a target detection framework used to identify whether a target is present. In this embodiment, in order to improve the accuracy of the model, the original loss function of the ssdlite model is replaced with the focal loss. Moreover, since it is difficult to detect the individual key points of a human body posture in night vision images, in this embodiment the openpose model is used to detect color images and the ssdlite model is used to detect night vision images.

Specifically, after the server determines the type of the video image to be recognized, if the type is a night vision image, the ssdlite model is called as the recognition model for this image, and the ssdlite model is subsequently used to recognize the position of the person in it.
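The patent states that ssdlite's original loss is replaced with the focal loss but gives no formula. For reference, a binary focal loss in its commonly published form is sketched below; the alpha and gamma defaults are taken from the focal-loss literature, not from the patent.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class, y the true label
    (0 or 1). alpha=0.25 and gamma=2.0 are assumed defaults from the
    focal-loss literature; the patent does not specify values.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The (1 - p_t)^gamma factor down-weights easy, well-classified examples, which is why the focal loss helps with the extreme foreground/background imbalance typical of single-shot detectors.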
In the above method for recognizing the position of a person in an image, after the surveillance video file to be recognized is acquired, it is preprocessed to obtain the video image to be recognized, which facilitates the subsequent processing for video content recognition. After the image type of the video image to be recognized is determined, the corresponding recognition model is called according to that type: when the image type is a color image, the human body posture model obtained by training recognizes the key points of the human body in the image and the position information of the person is determined based on the recognized key points; when the image type is a night vision image, the lightweight target detection model obtained by training recognizes the position information of the person in the image. This ensures that each type of video image is recognized by the best-matching recognition model, improving recognition accuracy. Detecting the positions of persons with different recognition models also dispenses with the old method of manual viewing, enabling automatic and rapid recognition of surveillance video content and improving work efficiency.
In an embodiment, as shown in FIG. 3, step S204, determining the image type of the video image to be recognized, includes the following steps:

Step S302: obtain the three-channel pixel value of each pixel in the video image to be recognized.
Here, a pixel is one of the small squares that make up an image, i.e. the smallest unit in the image. Each square has a definite position and an assigned color value, and together the colors and positions of the squares determine how the image appears. The pixel value is the color value of the pixel, and the image type can be determined from the pixel values. Image types include night vision images and color images. The three-channel pixel value is the RGB pixel value, i.e. the color value that determines the displayed color of the image, where R, G and B are red, green and blue respectively. Specifically, when the server determines from the pixel values whether the video image is a night vision image or a color image, it first obtains the RGB pixel values of all pixels in the image.
Step S304: perform difference calculations based on the three-channel pixel values, and select the largest difference as the pixel difference value.

Specifically, after the three-channel (RGB) pixel value of each pixel is obtained, differences between the channels are calculated: any two of the R, G and B values are subtracted, and the largest of the resulting differences is taken as the pixel difference value of that pixel. Taking pixel 1 as an example, its R, G and B component values are obtained (what these values are depends on the specific image; RGB component values generally lie between 0 and 255), and the three component values are subtracted from one another pairwise. This amounts to computing the absolute values of R-G, R-B and G-B; since R-B and B-R have the same magnitude and differ only in sign, and the sign is irrelevant for pixels, taking absolute values reduces the number of calculation steps. That is, the maximum of |R-G|, |R-B| and |G-B| is selected as the pixel difference value of pixel 1.
Step S306: Determine the image type of the video image to be recognized according to the preset value and the pixel difference value.
The preset value is a preset reference pixel value used to determine whether a video image is a color image or a night vision image. In this embodiment, the preset value is 10. Specifically, after the pixel difference value of a pixel is obtained, it is compared with the preset value 10. If the pixel difference value is greater than 10, the video image to be recognized is determined to be a color image; if it is less than or equal to 10, the video image to be recognized is determined to be a night vision image. In this embodiment, the image type of the video image to be recognized is determined from its pixel values, ensuring that the recognition model best matching the video image can subsequently be invoked according to the image type, which improves recognition accuracy.
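Steps S304-S306 can be sketched as follows. This is a minimal illustration assuming the frame is given as an iterable of (R, G, B) tuples; the text does not specify how per-pixel results are aggregated over the frame, so here any pixel exceeding the preset value of 10 marks the frame as color:

```python
def pixel_difference(r, g, b):
    # Largest pairwise absolute difference between the three channels.
    # On a night vision (near-grayscale) frame R, G and B are almost
    # equal, so this value stays small.
    return max(abs(r - g), abs(r - b), abs(g - b))

def classify_image(pixels, preset=10):
    # pixels: iterable of (R, G, B) tuples for the frame.
    # Aggregation rule (any pixel above the threshold => color) is an
    # illustrative assumption, not fixed by the embodiment.
    for r, g, b in pixels:
        if pixel_difference(r, g, b) > preset:
            return "color"
    return "night_vision"
```

For example, `classify_image([(100, 98, 101)])` yields `"night_vision"`, while a frame containing a pixel such as `(120, 60, 30)` yields `"color"`.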
In another embodiment, step S204 of determining the image type of the video image to be recognized includes: obtaining the acquisition-mode adjustment time of the monitoring device corresponding to the surveillance video file to be recognized, and obtaining the shooting time corresponding to the video image to be recognized; and determining the image type of the video image to be recognized according to the acquisition-mode adjustment time.
Specifically, the monitoring device has two modes: a color acquisition mode and a night vision black-and-white acquisition mode. When the monitoring device captures surveillance video, the quality of color video captured in low light suffers. To guarantee the quality of the surveillance video, the monitoring device can automatically switch from the color acquisition mode to the night vision black-and-white mode when the light is low, thereby capturing black-and-white night vision surveillance video. Therefore, when the image type of the video image to be recognized is determined, the acquisition-mode adjustment time of the monitoring device corresponding to the surveillance video file to be recognized is obtained, that is, the time at which the device switched from the color acquisition mode to the night vision black-and-white mode. The shooting time of the video image to be recognized is then obtained; it is available from the video information. By comparing the shooting time with the acquisition-mode adjustment time, the video image to be recognized is determined to be a color image when the shooting time is before the adjustment time, and a night vision image when the shooting time is after the adjustment time.

In one embodiment, when the image type is a color image, recognizing the human body key points in the image to be recognized through the trained human body posture model and determining the person position information in the video image to be recognized based on the recognized key points specifically includes: performing feature extraction on the video image to be recognized with the front network layer of the human body posture model to obtain the feature map corresponding to the video image; extracting the human body key points of the human body in the video image from the feature map with the confidence network layer of the human body posture model to obtain the key point confidence map corresponding to the human body key points; extracting the association degree of each human body key point in the video image from the feature map with the association-degree vector network layer of the human body posture model; and determining the person position information of the video image to be recognized according to the key point confidence map and the association degrees of the human body key points.
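The mode-switch-time comparison described above can be sketched as a minimal illustration; the `datetime` inputs and the helper name are assumptions, since the text only specifies the comparison itself:

```python
from datetime import datetime

def classify_by_mode_switch(shooting_time, mode_switch_time):
    # mode_switch_time: when the device switched from the color
    # acquisition mode to the night vision black-and-white mode.
    # Frames shot before the switch are color images; frames shot
    # after it are night vision images.
    if shooting_time < mode_switch_time:
        return "color"
    return "night_vision"
```

For instance, with a switch at 19:30, a frame shot at noon is classified as `"color"` and a frame shot at 22:00 as `"night_vision"`.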
Specifically, when the video image to be recognized is a color image, the video image is first input into the front network of the human body posture model, and the front network layer performs feature extraction operations such as convolution and pooling on the video image to obtain the corresponding feature map. The feature map is then input into a dual-branch, multi-stage CNN: the confidence network branch yields each human body key point and the corresponding key point confidence map, and the association-degree vector field network branch yields the association degree of each human body key point; the person position information of the video image to be recognized is then determined according to the key point confidence map and the association degrees of the human body key points. The key point confidence map locates the human body key points of the persons in the video image, and the association degrees give the valid connections between the key points, so the person positions can be determined from the confidence map and the association degrees together.
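The role of the association degrees can be illustrated with a toy pairing step over the dual-branch output. This is not the actual matching procedure of the posture model; the data layout (candidate lists and a score dictionary) is invented purely for illustration:

```python
def connect_limbs(candidates_a, candidates_b, association):
    # candidates_a / candidates_b: (x, y) key point candidates for two
    # body parts (e.g. all detected necks and all detected right
    # shoulders in a multi-person frame), taken from the confidence maps.
    # association[(i, j)]: association score between candidate i of part
    # A and candidate j of part B, from the association branch.
    # Greedily keep the highest-scoring pairs, using each candidate once,
    # so key points of different persons are not cross-connected.
    pairs = sorted(association.items(), key=lambda kv: kv[1], reverse=True)
    used_a, used_b, limbs = set(), set(), []
    for (i, j), score in pairs:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            limbs.append((candidates_a[i], candidates_b[j], score))
    return limbs
```

With two persons side by side, the high intra-person scores pair each neck with its own shoulder, while the low cross-person scores are discarded.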
In one embodiment, determining the person position information of the video image to be recognized according to the key point confidence map and the association degrees of the human body key points includes: connecting the human body key points on the key point confidence map according to their association degrees and calculating a key point contour; obtaining a circumscribed minimum rectangle according to the key point contour, the circumscribed minimum rectangle being the rectangle of smallest area that encloses the key point contour; and determining the person position information in the video image to be recognized according to the circumscribed minimum rectangle.

The key point contour is the irregular shape that frames the human body key points, and the circumscribed minimum rectangle is the smallest rectangle that frames the entire key point contour. Specifically, the OpenCV toolkit is used to perform the calculation from the key point confidence map and the association degrees. First, the human body key points on the confidence map are connected according to their association degrees to obtain the posture of the human body. The key point contour is then computed with OpenCV, and the circumscribed minimum rectangle is obtained from the contour; the region inside the rectangle is where the person is located, and the position coordinates of the rectangle constitute the person position information. If the obtained circumscribed minimum rectangle deviates from a regular rectangle, that is, it is an irregular (rotated) rectangle, it is corrected to a regular rectangle, so that the final circumscribed minimum rectangle is regular.
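The final "regular rectangle" described above is equivalent to the axis-aligned bounding box of the key points. A minimal sketch in plain Python (the embodiment itself uses OpenCV, where one would typically call `cv2.boundingRect` on the contour):

```python
def person_bounding_box(keypoints):
    # keypoints: list of (x, y) human body key points of one person,
    # obtained after connecting the confidence-map key points.
    # Returns the regular circumscribed rectangle as
    # (xmin, ymin, xmax, ymax) person position information.
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return min(xs), min(ys), max(xs), max(ys)
```

This matches the (xmin, ymin, xmax, ymax) coordinate convention used for annotation later in the description.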
In one embodiment, as shown in FIG. 4, another method for recognizing the position of a person in an image is provided; that is, after the position information of the person to be recognized is obtained, the method further includes the following steps:

Step S210: Generate video information corresponding to the person position information.

Step S212: Write the video information into the corresponding log.
The preset target includes but is not limited to the human body; it may also be another object, preset according to actual needs. The log is a document used to record video information. Specifically, this embodiment takes surveillance video as an example, where the purpose of recognizing the surveillance video is to identify the human bodies appearing in it; the human body is therefore taken as the preset target in this embodiment. When the video content is obtained through recognition and detection, corresponding video information is generated based on the detected person position information. The video information includes whether the video image contains the preset target, which surveillance video file the video image comes from, the coordinate position of the preset target in the video image, and so on. In other words, the source of the video image and the coordinate positions of the persons in it are obtained and packaged into one file to produce the generated video information. After the video information is generated, it is written into the corresponding log; when the surveillance video content needs to be reviewed later, the log file can be called directly, and the video content of all surveillance video files can be learned from the video information recorded in it.
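Steps S210-S212 can be sketched as follows. The record fields and the JSON-lines log layout are illustrative assumptions, since the text lists the contents of the video information but does not fix a concrete format:

```python
import json

def make_video_info(source_file, boxes):
    # boxes: list of (xmin, ymin, xmax, ymax) person positions detected
    # in one video image; an empty list means the preset target did not
    # appear in the image.
    return {
        "source_file": source_file,   # which surveillance video file
        "has_target": bool(boxes),    # whether the preset target appears
        "positions": list(boxes),     # coordinates of each person
    }

def append_to_log(log_path, info):
    # One JSON record per line, appended to the corresponding log file.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(info) + "\n")
```

A reviewer can then reconstruct the content of every surveillance video file by reading the records back from the log.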
In one embodiment, the recognition models are pre-trained network models; that is, the human body posture model and the lightweight target detection model are trained in advance for person position recognition. Training the human body posture model and the lightweight target detection model specifically includes: obtaining historical surveillance video of the monitoring device; extracting color image samples and night vision image samples from the historical surveillance video, annotating the human body key points of the human bodies in the color image samples and the position coordinates of the human bodies in the night vision image samples, to obtain annotated color images and annotated night vision images; resizing the annotated color images and annotated night vision images respectively, to obtain training color images and training night vision images; mapping the human body key points in the training color images to the key points annotated in the annotated color images, and training the human body posture model with the mapped training color images; and mapping the position coordinates in the training night vision images to the coordinates annotated in the annotated night vision images, and training the lightweight target detection model with the mapped training night vision images.
Specifically, training the recognition models on images obtained from historical surveillance video allows the models to learn the surveillance scene thoroughly, making subsequent recognition of surveillance video content more accurate. Since the recognition models comprise the OpenPose and SSDLite models, which recognize different types of images, the obtained historical surveillance video should include both color video and night vision video.
After the historical surveillance video is obtained, FFmpeg is used to extract video images that meet the training requirements from it, that is, color image samples and night vision image samples that contain human bodies. After these samples are obtained, annotation software is used to annotate the coordinates of the human body key points of the persons in the color images, yielding the annotated color images. The annotation software includes but is not limited to labelme. The color images are annotated with human body key points, of which there may generally be 9, 14, 16, 17, 18 or another number; to achieve more accurate recognition, this embodiment preferably annotates the coordinates of 18 key points: nose, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, right hip, right knee, right ankle, left hip, left knee, left ankle, left eye, right eye, left ear and right ear. The night vision images are annotated directly with person position coordinates, yielding the annotated night vision images. The coordinates can be expressed as (minimum x-coordinate, minimum y-coordinate, maximum x-coordinate, maximum y-coordinate), that is, (xmin, ymin, xmax, ymax).
Since the OpenPose and SSDLite models process images differently, the input image sizes they accept differ. Therefore, after the color images are annotated with human body key points, the annotated color images need to be scaled to 432×368, while the annotated night vision images are scaled to 300×300; the scaled annotated color images and annotated night vision images serve as the training color images and training night vision images. Because the annotated coordinate positions change after scaling, the scaled annotation coordinates are mapped to the annotations before scaling; that is, the training color images and training night vision images are mapped to their corresponding annotated color images and annotated night vision images before being input into the corresponding models for training, so that through this mapping relationship the models can learn the correct coordinates during training. The color images are input into the OpenPose model for training, and the night vision images are input into the SSDLite model for training.
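The coordinate mapping after resizing can be sketched as a simple linear rescaling; the helper name is an assumption, and the sizes come from the embodiment (432×368 for the color input, 300×300 for the night vision input):

```python
def scale_annotation(box, orig_size, target_size):
    # box: (xmin, ymin, xmax, ymax) annotated on the original image.
    # orig_size / target_size: (width, height) before and after
    # resizing, e.g. target_size = (300, 300) for the SSDLite night
    # vision input or (432, 368) for the OpenPose color input.
    sx = target_size[0] / orig_size[0]
    sy = target_size[1] / orig_size[1]
    xmin, ymin, xmax, ymax = box
    return (xmin * sx, ymin * sy, xmax * sx, ymax * sy)
```

A box (60, 30, 120, 90) on a 600×300 frame scaled to 300×300 becomes (30, 30, 60, 90), so the annotation still lands on the same image content after resizing.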
It should be understood that although the steps in the flowcharts of FIGS. 2-3 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; their execution order is likewise not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, an apparatus for recognizing the position of a person in an image is provided, including a preprocessing module 502, a determining module 504 and a recognition module 506, wherein:

The preprocessing module 502 is configured to obtain the surveillance video file to be recognized and preprocess it to obtain the video image to be recognized.

The determining module 504 is configured to determine the image type of the video image to be recognized.

The recognition module 506 is configured to, when the image type is a color image, recognize the human body key points in the video image to be recognized through the trained human body posture model, and determine the person position information in the video image to be recognized based on the recognized human body key points.

The recognition module 506 is further configured to, when the image type is a night vision image, recognize the person position information in the video image to be recognized through the trained lightweight target detection model.
In one embodiment, the determining module 504 is further configured to obtain the three-channel pixel value of each pixel in the video image to be recognized; perform difference calculations based on the three-channel pixel values and select the largest difference as the pixel difference value; and determine the image type of the video image to be recognized according to the preset value and the pixel difference value.

In one embodiment, the determining module 504 is further configured to obtain the acquisition-mode adjustment time of the monitoring device corresponding to the surveillance video file to be recognized and the shooting time corresponding to the video image to be recognized, and determine the image type of the video image to be recognized according to the acquisition-mode adjustment time.

In one embodiment, the recognition module 506 is further configured to perform feature extraction on the video image to be recognized with the front network layer of the human body posture model to obtain the corresponding feature map; extract the human body key points of the human body in the video image from the feature map with the confidence network layer of the human body posture model to obtain the key point confidence map corresponding to the human body key points; extract the association degree of each human body key point in the video image from the feature map with the association-degree vector network layer of the human body posture model; and determine the person position information of the video image to be recognized according to the key point confidence map and the association degrees of the human body key points.

In one embodiment, the recognition module 506 is further configured to connect the human body key points on the key point confidence map according to their association degrees and calculate the key point contour; obtain the circumscribed minimum rectangle according to the key point contour, the circumscribed minimum rectangle being the rectangle of smallest area that encloses the key point contour; and determine the person position information in the video image to be recognized according to the circumscribed minimum rectangle.

In one embodiment, the apparatus for recognizing the position of a person in an image further includes a generating module configured to generate video information corresponding to the person position information and write the video information into the corresponding log.

In one embodiment, the apparatus for recognizing the position of a person in an image further includes a training module configured to: obtain historical surveillance video of the monitoring device; extract color image samples and night vision image samples from the historical surveillance video, annotate the human body key points of the human bodies in the color image samples and the position coordinates of the human bodies in the night vision image samples, to obtain annotated color images and annotated night vision images; resize the annotated color images and annotated night vision images respectively, to obtain training color images and training night vision images; map the human body key points in the training color images to the key points annotated in the annotated color images and train the human body posture model with the mapped training color images; and map the position coordinates in the training night vision images to the coordinates annotated in the annotated night vision images and train the lightweight target detection model with the mapped training night vision images.
For the specific limitations of the apparatus for recognizing the position of a person in an image, reference may be made to the above limitations of the method for recognizing the position of a person in an image, which are not repeated here. Each module in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of the processor of a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, a method for recognizing the position of a person in an image is implemented.

Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:

obtaining the surveillance video file to be recognized, and preprocessing the surveillance video file to be recognized to obtain the video image to be recognized;

determining the image type of the video image to be recognized;

when the image type is a color image, recognizing the human body key points in the video image to be recognized through the trained human body posture model, and determining the person position information in the video image to be recognized based on the recognized human body key points; and

when the image type is a night vision image, recognizing the person position information in the video image to be recognized through the trained lightweight target detection model.

In one embodiment, the processor further implements the following steps when executing the computer program:

obtaining the three-channel pixel value of each pixel in the video image to be recognized; performing difference calculations based on the three-channel pixel values and selecting the largest difference as the pixel difference value; and determining the image type of the video image to be recognized according to the preset value and the pixel difference value.

In one embodiment, the processor further implements the following steps when executing the computer program:

obtaining the acquisition-mode adjustment time of the monitoring device corresponding to the surveillance video file to be recognized and the shooting time corresponding to the video image to be recognized; and determining the image type of the video image to be recognized according to the acquisition-mode adjustment time.
In one embodiment, the processor further implements the following steps when executing the computer program:

performing feature extraction on the video image to be recognized with the front network layer of the human body posture model to obtain the feature map corresponding to the video image to be recognized; extracting the human body key points of the human body in the video image to be recognized from the feature map with the confidence network layer of the human body posture model to obtain the key point confidence map corresponding to the human body key points; extracting the association degree of each human body key point in the video image to be recognized from the feature map with the association-degree vector network layer of the human body posture model; and determining the person position information of the video image to be recognized according to the key point confidence map and the association degrees of the human body key points.

In one embodiment, the processor further implements the following steps when executing the computer program:

connecting the human body key points on the key point confidence map according to their association degrees and calculating the key point contour; obtaining the circumscribed minimum rectangle according to the key point contour, the circumscribed minimum rectangle being the rectangle of smallest area that encloses the key point contour; and determining the person position information in the video image to be recognized according to the circumscribed minimum rectangle.

In one embodiment, the processor further implements the following steps when executing the computer program:

generating video information corresponding to the person position information, and writing the video information into the corresponding log.

In one embodiment, the processor further implements the following steps when executing the computer program:

obtaining historical surveillance video of the monitoring device; extracting color image samples and night vision image samples from the historical surveillance video, annotating the human body key points of the human bodies in the color image samples and the position coordinates of the human bodies in the night vision image samples, to obtain annotated color images and annotated night vision images; resizing the annotated color images and annotated night vision images respectively, to obtain training color images and training night vision images; mapping the human body key points in the training color images to the key points annotated in the annotated color images, and training the human body posture model with the mapped training color images; and mapping the position coordinates in the training night vision images to the coordinates annotated in the annotated night vision images, and training the lightweight target detection model with the mapped training night vision images.
In an embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the following steps:
Acquiring a surveillance video file to be recognized, and preprocessing the surveillance video file to be recognized to obtain a video image to be recognized;
Determining the image type of the video image to be recognized;
When the image type is a color image, recognizing human body key points in the video image to be recognized through a trained human body pose model, and determining person position information in the video image to be recognized based on the recognized human body key points;
When the image type is a night-vision image, recognizing the person position information in the video image to be recognized through a trained lightweight target detection model.
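The branching described in these steps can be sketched as below. The helper names (`classify_image_type`, `pose_model_locate`, `light_detector_locate`) are illustrative placeholders standing in for the trained models, not functions named in this application.

```python
# Hypothetical sketch of the dispatch flow: each preprocessed frame is routed
# to the model matching its image type. Helper names are illustrative only.

def recognize_person_positions(frames, classify_image_type,
                               pose_model_locate, light_detector_locate):
    """Route each video image to the pose model (color) or detector (night vision)."""
    positions = []
    for frame in frames:
        image_type = classify_image_type(frame)
        if image_type == "color":
            # Color frames: keypoint-based human body pose model.
            positions.append(pose_model_locate(frame))
        else:
            # Night-vision frames: lightweight target detection model.
            positions.append(light_detector_locate(frame))
    return positions
```

The three callables are passed in rather than hard-coded, since the application describes the models abstractly.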
In an embodiment, the computer program, when executed by the processor, further implements the following steps:
Acquiring the three-channel pixel values of each pixel in the video image to be recognized; computing channel differences from the three-channel pixel values and selecting the largest difference as the pixel difference value; and determining the image type of the video image to be recognized according to a preset value and the pixel difference value.
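A minimal sketch of the channel-difference test above, under the assumption that night-vision frames are near-grayscale (the three channels of each pixel are almost equal, so the largest channel difference stays small). The threshold value is an illustrative assumption, not a figure from this application.

```python
def classify_by_channel_difference(pixels, preset=30):
    """Classify a frame as color or night-vision from its RGB channel spread.

    `pixels` is an iterable of (r, g, b) tuples. For each pixel, the largest
    pairwise channel difference is taken; the frame-wide maximum is compared
    with the preset value. `preset=30` is a hypothetical threshold.
    """
    max_diff = 0
    for r, g, b in pixels:
        diff = max(r, g, b) - min(r, g, b)  # largest channel difference
        if diff > max_diff:
            max_diff = diff
    return "color" if max_diff > preset else "night_vision"
```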
In an embodiment, the computer program, when executed by the processor, further implements the following steps:
Acquiring the acquisition mode adjustment time of the surveillance device corresponding to the surveillance video file to be recognized, and acquiring the shooting time of the video image to be recognized; and determining the image type of the video image to be recognized according to the acquisition mode adjustment time.
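The schedule-based test can be sketched as follows. The switch times and the midnight-wrapping window are hypothetical examples; in practice the acquisition mode adjustment time would come from the surveillance device's configuration.

```python
from datetime import time

def classify_by_capture_mode(shot_time,
                             night_start=time(19, 0), night_end=time(6, 0)):
    """Classify a frame by comparing its shooting time with the device's
    (hypothetical) mode-adjustment schedule; the night window wraps midnight."""
    if night_start <= shot_time or shot_time < night_end:
        return "night_vision"
    return "color"
```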
In an embodiment, the computer program, when executed by the processor, further implements the following steps:
Performing feature extraction on the video image to be recognized with the front network layer of the human body pose model to obtain a feature map corresponding to the video image to be recognized; extracting, with the confidence network layer of the human body pose model, the human body key points in the video image to be recognized from the feature map, to obtain a keypoint confidence map corresponding to the human body key points; extracting, with the association-degree vector network layer of the human body pose model, the association degree of each of the human body key points from the feature map; and determining the person position information of the video image to be recognized according to the keypoint confidence map and the association degrees of the human body key points.
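A simplified sketch of decoding keypoints from the confidence maps described above: it takes one peak per map and omits the association-degree (part-affinity) grouping that a real multi-person decoder would need. The part names and threshold are illustrative assumptions.

```python
def keypoints_from_confidence_maps(confidence_maps, threshold=0.5):
    """Take the peak of each confidence map as that body part's location.

    `confidence_maps` maps a part name (e.g. "nose") to a 2-D list of scores,
    as produced by the confidence network layer; peaks below `threshold` are
    discarded. Grouping peaks into individuals via association degrees is
    omitted in this sketch.
    """
    keypoints = {}
    for part, grid in confidence_maps.items():
        best_score, best_xy = -1.0, None
        for y, row in enumerate(grid):
            for x, score in enumerate(row):
                if score > best_score:
                    best_score, best_xy = score, (x, y)
        if best_score >= threshold:
            keypoints[part] = best_xy
    return keypoints
```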
In an embodiment, the computer program, when executed by the processor, further implements the following steps:
According to the association degrees of the human body key points, connecting the human body key points on the keypoint confidence map and computing a keypoint contour; obtaining the minimum circumscribed rectangle of the keypoint contour, the minimum circumscribed rectangle being the rectangle of smallest area that encloses the keypoint contour; and determining the person position information in the video image to be recognized according to the minimum circumscribed rectangle.
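As a simplification of the minimum-circumscribed-rectangle step, the sketch below computes an axis-aligned bounding box over the connected keypoints; a true minimum-area (possibly rotated) rectangle would come from something like OpenCV's `cv2.minAreaRect`, which this application does not name.

```python
def bounding_box(keypoints):
    """Axis-aligned bounding box around a set of (x, y) keypoints.

    A stand-in for the minimum circumscribed rectangle: the axis-aligned box
    is often sufficient for reporting a person's position, though it is not
    the minimum-area rectangle when the body is tilted.
    Returns (x_min, y_min, x_max, y_max).
    """
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    return (min(xs), min(ys), max(xs), max(ys))
```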
In an embodiment, the computer program, when executed by the processor, further implements the following steps:
Generating video information corresponding to the person position information, and writing the video information into a corresponding log.
In an embodiment, the computer program, when executed by the processor, further implements the following steps:
Acquiring historical surveillance video of a surveillance device; extracting color image samples and night-vision image samples from the historical surveillance video, annotating human body key points on the human bodies in the color image samples and annotating position coordinates of the human bodies in the night-vision image samples, to obtain annotated color images and annotated night-vision images; resizing the annotated color images and the annotated night-vision images respectively to obtain training color images and training night-vision images; mapping the human body key points in the training color images to the human body key points annotated in the annotated color images, and training the human body pose model with the mapped training color images; and mapping the position coordinates in the training night-vision images to the position coordinates annotated in the annotated night-vision images, and training the lightweight target detection model with the mapped training night-vision images.
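The resize-and-map step in this training pipeline amounts to rescaling each annotated coordinate by the same factors used to resize the frame, so labels stay aligned with the resized pixels. A sketch, with illustrative frame sizes:

```python
def rescale_keypoints(keypoints, orig_size, train_size):
    """Map annotated (x, y) coordinates from the original frame to the
    resized training frame. `orig_size` and `train_size` are (width, height);
    the example sizes used in the test are hypothetical."""
    sx = train_size[0] / orig_size[0]
    sy = train_size[1] / orig_size[1]
    return [(x * sx, y * sy) for x, y in keypoints]
```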
A person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium or in a volatile computer-readable storage medium, and when executed may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For conciseness, not all possible combinations are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of this application, all of which fall within its protection scope. Therefore, the protection scope of this application patent shall be subject to the appended claims.

Claims (20)

  1. A method for recognizing a position of a person in an image, the method comprising:
    acquiring a surveillance video file to be recognized, and preprocessing the surveillance video file to be recognized to obtain a video image to be recognized;
    determining an image type of the video image to be recognized;
    when the image type is a color image, recognizing human body key points in the video image to be recognized through a trained human body pose model, and determining person position information in the video image to be recognized based on the recognized human body key points; and
    when the image type is a night-vision image, recognizing the person position information in the video image to be recognized through a trained lightweight target detection model.
  2. The method according to claim 1, wherein determining the image type of the video image to be recognized comprises:
    acquiring three-channel pixel values of each pixel in the video image to be recognized;
    computing channel differences from the three-channel pixel values, and selecting the largest difference as a pixel difference value; and
    determining the image type of the video image to be recognized according to a preset value and the pixel difference value.
  3. The method according to claim 1, wherein determining the image type of the video image to be recognized comprises:
    acquiring an acquisition mode adjustment time of the surveillance device corresponding to the surveillance video file to be recognized, and acquiring a shooting time of the video image to be recognized; and
    determining the image type of the video image to be recognized according to the acquisition mode adjustment time.
  4. The method according to claim 1, wherein, when the image type is a color image, recognizing the human body key points in the video image to be recognized through the trained human body pose model, and determining the person position information in the video image to be recognized based on the recognized human body key points, comprises:
    performing feature extraction on the video image to be recognized with a front network layer of the human body pose model to obtain a feature map corresponding to the video image to be recognized;
    extracting, with a confidence network layer of the human body pose model, the human body key points in the video image to be recognized from the feature map, to obtain a keypoint confidence map corresponding to the human body key points;
    extracting, with an association-degree vector network layer of the human body pose model, an association degree of each of the human body key points from the feature map; and
    determining the person position information of the video image to be recognized according to the keypoint confidence map and the association degrees of the human body key points.
  5. The method according to claim 4, wherein determining the person position information of the video image to be recognized according to the keypoint confidence map and the association degrees of the human body key points comprises:
    connecting, according to the association degrees of the human body key points, the human body key points on the keypoint confidence map, and computing a keypoint contour;
    obtaining a minimum circumscribed rectangle of the keypoint contour, the minimum circumscribed rectangle being the rectangle of smallest area that encloses the keypoint contour; and
    determining the person position information in the video image to be recognized according to the minimum circumscribed rectangle.
  6. The method according to claim 1, further comprising, after obtaining the person position information:
    generating video information corresponding to the person position information; and
    writing the video information into a corresponding log.
  7. The method according to claim 1, further comprising, before acquiring the surveillance video file to be recognized, training the human body pose model and the lightweight target detection model, the training comprising:
    acquiring historical surveillance video of a surveillance device;
    extracting color image samples and night-vision image samples from the historical surveillance video, annotating human body key points on the human bodies in the color image samples, and annotating position coordinates of the human bodies in the night-vision image samples, to obtain annotated color images and annotated night-vision images;
    resizing the annotated color images and the annotated night-vision images respectively to obtain training color images and training night-vision images;
    mapping the human body key points in the training color images to the human body key points annotated in the annotated color images, and training the human body pose model with the mapped training color images; and
    mapping the position coordinates in the training night-vision images to the position coordinates annotated in the annotated night-vision images, and training the lightweight target detection model with the mapped training night-vision images.
  8. An apparatus for recognizing a position of a person in an image, the apparatus comprising:
    a preprocessing module, configured to acquire a surveillance video file to be recognized and preprocess the surveillance video file to be recognized to obtain a video image to be recognized;
    a determining module, configured to determine an image type of the video image to be recognized; and
    a recognition module, configured to, when the image type is a color image, recognize human body key points in the video image to be recognized through a trained human body pose model, and determine person position information in the video image to be recognized based on the recognized human body key points;
    wherein the recognition module is further configured to, when the image type is a night-vision image, recognize the person position information in the video image to be recognized through a trained lightweight target detection model.
  9. The apparatus according to claim 8, wherein determining the image type of the video image to be recognized comprises:
    acquiring three-channel pixel values of each pixel in the video image to be recognized;
    computing channel differences from the three-channel pixel values, and selecting the largest difference as a pixel difference value; and
    determining the image type of the video image to be recognized according to a preset value and the pixel difference value.
  10. The apparatus according to claim 8, wherein determining the image type of the video image to be recognized comprises:
    acquiring an acquisition mode adjustment time of the surveillance device corresponding to the surveillance video file to be recognized, and acquiring a shooting time of the video image to be recognized; and
    determining the image type of the video image to be recognized according to the acquisition mode adjustment time.
  11. The apparatus according to claim 8, wherein, when the image type is a color image, recognizing the human body key points in the video image to be recognized through the trained human body pose model, and determining the person position information in the video image to be recognized based on the recognized human body key points, comprises:
    performing feature extraction on the video image to be recognized with a front network layer of the human body pose model to obtain a feature map corresponding to the video image to be recognized;
    extracting, with a confidence network layer of the human body pose model, the human body key points in the video image to be recognized from the feature map, to obtain a keypoint confidence map corresponding to the human body key points;
    extracting, with an association-degree vector network layer of the human body pose model, an association degree of each of the human body key points from the feature map; and
    determining the person position information of the video image to be recognized according to the keypoint confidence map and the association degrees of the human body key points.
  12. The apparatus according to claim 11, wherein determining the person position information of the video image to be recognized according to the keypoint confidence map and the association degrees of the human body key points comprises:
    connecting, according to the association degrees of the human body key points, the human body key points on the keypoint confidence map, and computing a keypoint contour;
    obtaining a minimum circumscribed rectangle of the keypoint contour, the minimum circumscribed rectangle being the rectangle of smallest area that encloses the keypoint contour; and
    determining the person position information in the video image to be recognized according to the minimum circumscribed rectangle.
  13. The apparatus according to claim 8, wherein, after the person position information is obtained, the apparatus further:
    generates video information corresponding to the person position information; and
    writes the video information into a corresponding log.
  14. The apparatus according to claim 8, wherein, before the surveillance video file to be recognized is acquired, the preprocessing module is further configured to train the human body pose model and the lightweight target detection model, the training comprising:
    acquiring historical surveillance video of a surveillance device;
    extracting color image samples and night-vision image samples from the historical surveillance video, annotating human body key points on the human bodies in the color image samples, and annotating position coordinates of the human bodies in the night-vision image samples, to obtain annotated color images and annotated night-vision images;
    resizing the annotated color images and the annotated night-vision images respectively to obtain training color images and training night-vision images;
    mapping the human body key points in the training color images to the human body key points annotated in the annotated color images, and training the human body pose model with the mapped training color images; and
    mapping the position coordinates in the training night-vision images to the position coordinates annotated in the annotated night-vision images, and training the lightweight target detection model with the mapped training night-vision images.
  15. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
    acquiring a surveillance video file to be recognized, and preprocessing the surveillance video file to be recognized to obtain a video image to be recognized;
    determining an image type of the video image to be recognized;
    when the image type is a color image, recognizing human body key points in the video image to be recognized through a trained human body pose model, and determining person position information in the video image to be recognized based on the recognized human body key points; and
    when the image type is a night-vision image, recognizing the person position information in the video image to be recognized through a trained lightweight target detection model.
  16. The computer device according to claim 15, wherein determining the image type of the video image to be recognized comprises:
    acquiring three-channel pixel values of each pixel in the video image to be recognized;
    computing channel differences from the three-channel pixel values, and selecting the largest difference as a pixel difference value; and
    determining the image type of the video image to be recognized according to a preset value and the pixel difference value.
  17. The computer device according to claim 15, wherein determining the image type of the video image to be recognized comprises:
    acquiring an acquisition mode adjustment time of the surveillance device corresponding to the surveillance video file to be recognized, and acquiring a shooting time of the video image to be recognized; and
    determining the image type of the video image to be recognized according to the acquisition mode adjustment time.
  18. The computer device according to claim 15, wherein, when the image type is a color image, recognizing the human body key points in the video image to be recognized through the trained human body pose model, and determining the person position information in the video image to be recognized based on the recognized human body key points, comprises:
    performing feature extraction on the video image to be recognized with a front network layer of the human body pose model to obtain a feature map corresponding to the video image to be recognized;
    extracting, with a confidence network layer of the human body pose model, the human body key points in the video image to be recognized from the feature map, to obtain a keypoint confidence map corresponding to the human body key points;
    extracting, with an association-degree vector network layer of the human body pose model, an association degree of each of the human body key points from the feature map; and
    determining the person position information of the video image to be recognized according to the keypoint confidence map and the association degrees of the human body key points.
  19. The computer device according to claim 15, wherein, after the person position information is obtained, the following steps are further implemented:
    generating video information corresponding to the person position information; and
    writing the video information into a corresponding log.
  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/093608 2019-07-12 2020-05-30 Method and apparatus for recognizing position of person in image, computer device and storage medium WO2021008252A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910628940.8A CN110502986A (en) 2019-07-12 2019-07-12 Identify character positions method, apparatus, computer equipment and storage medium in image
CN201910628940.8 2019-07-12

Publications (1)

Publication Number Publication Date
WO2021008252A1 true WO2021008252A1 (en) 2021-01-21

Family

ID=68586137

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/093608 WO2021008252A1 (en) 2019-07-12 2020-05-30 Method and apparatus for recognizing position of person in image, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110502986A (en)
WO (1) WO2021008252A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112818908A (en) * 2021-02-22 2021-05-18 Oppo广东移动通信有限公司 Key point detection method, device, terminal and storage medium
CN112861689A (en) * 2021-02-01 2021-05-28 上海依图网络科技有限公司 Searching method and device of coordinate recognition model based on NAS technology
CN112990057A (en) * 2021-03-26 2021-06-18 北京易华录信息技术股份有限公司 Human body posture recognition method and device and electronic equipment
CN113141518A (en) * 2021-04-20 2021-07-20 北京安博盛赢教育科技有限责任公司 Control method and control device for video frame images in live classroom
CN113326773A (en) * 2021-05-28 2021-08-31 北京百度网讯科技有限公司 Recognition model training method, recognition method, device, equipment and storage medium
CN113807342A (en) * 2021-09-17 2021-12-17 广东电网有限责任公司 Method and related device for acquiring equipment information based on image
CN113873196A (en) * 2021-03-08 2021-12-31 南通市第一人民医院 Method and system for improving infection prevention and control management quality

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502986A (en) * 2019-07-12 2019-11-26 平安科技(深圳)有限公司 Identify character positions method, apparatus, computer equipment and storage medium in image
CN111178323B (en) * 2020-01-10 2023-08-29 北京百度网讯科技有限公司 Group behavior recognition method, device, equipment and storage medium based on video
CN111222486B (en) * 2020-01-15 2022-11-04 腾讯科技(深圳)有限公司 Training method, device and equipment for hand gesture recognition model and storage medium
CN111476729B (en) * 2020-03-31 2023-06-09 北京三快在线科技有限公司 Target identification method and device
CN111753643A (en) * 2020-05-09 2020-10-09 北京迈格威科技有限公司 Character posture recognition method and device, computer equipment and storage medium
CN112418135A (en) * 2020-12-01 2021-02-26 深圳市优必选科技股份有限公司 Human behavior recognition method and device, computer equipment and readable storage medium
CN113221832B (en) * 2021-05-31 2023-07-11 常州纺织服装职业技术学院 Human body identification method and system based on three-dimensional human body data
CN117354494B (en) * 2023-12-05 2024-02-23 天津华来科技股份有限公司 Testing method for night vision switching performance of intelligent camera

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573111A (en) * 2015-02-03 2015-04-29 中国人民解放军国防科学技术大学 Method for structured storage and pre-retrieval of pedestrian data in surveillance videos
US20150194034A1 (en) * 2014-01-03 2015-07-09 Nebulys Technologies, Inc. Systems and methods for detecting and/or responding to incapacitated person using video motion analytics
CN108829233A (en) * 2018-04-26 2018-11-16 深圳市深晓科技有限公司 A kind of exchange method and device
CN109961014A (en) * 2019-02-25 2019-07-02 中国科学院重庆绿色智能技术研究院 A kind of coal mine conveying belt danger zone monitoring method and system
CN110502986A (en) * 2019-07-12 2019-11-26 平安科技(深圳)有限公司 Identify character positions method, apparatus, computer equipment and storage medium in image

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101587622B (en) * 2009-06-18 2012-09-05 Li Qiuhua Forest fire detection and identification method and apparatus based on intelligent video image analysis
CN105005766B (en) * 2015-07-01 2018-06-01 Shenzhen Maikelong Electronics Co., Ltd. Body color recognition method
CN106027931B (en) * 2016-04-14 2018-03-16 Ping An Technology (Shenzhen) Co., Ltd. Video recording method and server
CN107844744A (en) * 2017-10-09 2018-03-27 Ping An Technology (Shenzhen) Co., Ltd. Face recognition method and device combining depth information, and storage medium
CN109740513B (en) * 2018-12-29 2020-11-27 Qingdao Pico Technology Co., Ltd. Action behavior analysis method and device
CN109886139A (en) * 2019-01-28 2019-06-14 Ping An Technology (Shenzhen) Co., Ltd. Human body detection model generation method, sewage outlet abnormality detection method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang, Ali et al.: "Video-based 24-hour vehicle detection method for traffic security-access monitoring", Journal of Hefei University of Technology, vol. 35, no. 3, 31 March 2012, ISSN 1008-3634 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861689A (en) * 2021-02-01 2021-05-28 Shanghai Yitu Network Technology Co., Ltd. Searching method and device of coordinate recognition model based on NAS technology
CN112818908A (en) * 2021-02-22 2021-05-18 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Key point detection method, device, terminal and storage medium
CN113873196A (en) * 2021-03-08 2021-12-31 Nantong First People's Hospital Method and system for improving infection prevention and control management quality
CN112990057A (en) * 2021-03-26 2021-06-18 Beijing E-hualu Information Technology Co., Ltd. Human body posture recognition method and device and electronic equipment
CN113141518A (en) * 2021-04-20 2021-07-20 Beijing Anbo Shengying Education Technology Co., Ltd. Control method and control device for video frame images in live classroom
CN113326773A (en) * 2021-05-28 2021-08-31 Beijing Baidu Netcom Science and Technology Co., Ltd. Recognition model training method, recognition method, device, equipment and storage medium
CN113807342A (en) * 2021-09-17 2021-12-17 Guangdong Power Grid Co., Ltd. Method and related device for acquiring equipment information based on image

Also Published As

Publication number Publication date
CN110502986A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
WO2021008252A1 (en) Method and apparatus for recognizing position of person in image, computer device and storage medium
CN109359575B (en) Face detection method, service processing method, device, terminal and medium
Liu et al. Learning deep models for face anti-spoofing: Binary or auxiliary supervision
WO2021047232A1 (en) Interaction behavior recognition method, apparatus, computer device, and storage medium
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN106203242B (en) Similar image identification method and equipment
WO2021052375A1 (en) Target image generation method, apparatus, server and storage medium
WO2019071664A1 (en) Human face recognition method and apparatus combined with depth information, and storage medium
CN108335331B (en) Binocular vision positioning method and equipment for steel coil
WO2021017882A1 (en) Image coordinate system conversion method and apparatus, device and storage medium
US20070009159A1 (en) Image recognition system and method using holistic Harr-like feature matching
WO2021012370A1 (en) Pupil radius detection method and apparatus, computer device and storage medium
US11682231B2 (en) Living body detection method and device
JP2002216129A (en) Face area detector, its method and computer readable recording medium
WO2019033570A1 (en) Lip movement analysis method, apparatus and storage medium
US11315360B2 (en) Live facial recognition system and method
CN111461036B (en) Real-time pedestrian detection method using background modeling to enhance data
CN112232323A (en) Face verification method and device, computer equipment and storage medium
CN113298158A (en) Data detection method, device, equipment and storage medium
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN111582155A (en) Living body detection method, living body detection device, computer equipment and storage medium
Potje et al. Extracting deformation-aware local features by learning to deform
CN109002776B (en) Face recognition method, system, computer device and computer-readable storage medium
CN111881841B (en) Face detection and recognition method based on binocular vision
JP5213778B2 (en) Facial recognition device and facial organ feature point identification method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20840875

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20840875

Country of ref document: EP

Kind code of ref document: A1