WO2022206744A1 - Information association method, system, apparatus, server and storage medium - Google Patents

Information association method, system, apparatus, server and storage medium

Info

Publication number
WO2022206744A1
Authority
WO
WIPO (PCT)
Prior art keywords
image, target user, package, face, video
Prior art date
Application number
PCT/CN2022/083610
Other languages
English (en)
French (fr)
Inventor
张俊力 (Zhang Junli)
唐政 (Tang Zheng)
陈韬 (Chen Tao)
Original Assignee
杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Publication of WO2022206744A1

Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V 20/41 — Scenes; Scene-specific elements in video content: higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T 7/80 — Image analysis: analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V 10/462 — Extraction of image or video features: salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 20/46 — Scenes; Scene-specific elements in video content: extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 40/103 — Human or animal bodies: static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V 40/161 — Human faces, e.g. facial parts, sketches or expressions: detection; localisation; normalisation
    • G06T 2207/10016 — Image acquisition modality: video; image sequence
    • G06T 2207/10116 — Image acquisition modality: X-ray image
    • G06T 2207/30196 — Subject of image: human being; person

Definitions

  • The embodiments of the present application relate to the field of security supervision, and in particular to an information association method, system, apparatus, server, and storage medium.
  • Associating the image of a package with the face image of the user who placed it on the security inspection machine has broad application prospects in fields such as dangerous-goods alarms, tracing of problem packages, and control of key groups of people. How to perform such information association has therefore become an urgent problem to be solved.
  • The related art proposes an information association method in which users queue up to enter a package storage area and place their packages in storage trays. Two face capture machines need to be installed one above the other: the face capture machine at the top faces the user's face and is used to capture face images, while the face capture machine at the bottom faces the body and is used to detect the moment at which the user's hands push the package into storage. In this way, the face image of the user who placed the package is associated with the visible light package image and the X-ray package image. However, the above solution places relatively high requirements on the site and needs a relatively complex hardware structure, such as a tray conveying channel, making it difficult to deploy.
  • The embodiments of the present application provide an information association method, system, apparatus, server, and storage medium, which can solve the problems of complex hardware structure and difficult deployment in the related art. The technical solution is as follows.
  • In one aspect, an information association method is provided. A binocular camera is deployed above the security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. The method includes:
  • detecting and tracking a target user through the first video collected by the binocular camera, and determining the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video;
  • based on the user ID and those three-dimensional coordinates, determining the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package;
  • based on the package release time, determining, through the security inspection machine, the X-ray package image of the package in the X-ray detection area;
  • based on the three-dimensional coordinates of the human skeleton key points, determining the face image of the target user through the face capture machine; and
  • associating the visible light package image, the X-ray package image, and the face image.
  • In another aspect, an information association system is provided. The information association system includes a server, a binocular camera deployed above the security inspection machine, and a face capture machine deployed above the X-ray detection area of the security inspection machine, wherein:
  • the binocular camera is configured to collect the first video including the target user;
  • the server is configured to detect and track the target user through the first video, determine the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video; based on the user ID and those three-dimensional coordinates, determine the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package; based on the package release time, determine through the security inspection machine the X-ray package image of the package in the X-ray detection area; and, based on the three-dimensional coordinates of the human skeleton key points, determine the face image of the target user through the face capture machine; and
  • the face capture machine is configured to collect the face image of the target user.
  • In another aspect, an information association apparatus is provided. A binocular camera is deployed above the security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. The apparatus includes:
  • a detection and tracking module, configured to detect and track the target user through the first video collected by the binocular camera, and to determine the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video;
  • a first determining module, configured to determine, based on the user ID and those three-dimensional coordinates, the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package;
  • a second determining module, configured to determine, through the security inspection machine based on the package release time, the X-ray package image of the package in the X-ray detection area;
  • a third determining module, configured to determine, through the face capture machine based on the three-dimensional coordinates of the human skeleton key points, the face image of the target user; and
  • an association module, configured to associate the visible light package image, the X-ray package image, and the face image.
  • In another aspect, a server is provided, including a processor, a communication interface, a memory, and a communication bus. The processor, the communication interface, and the memory communicate with each other through the communication bus; the memory is used to store a computer program, and the processor is configured to execute the program stored in the memory so as to implement the steps of the above information association method.
  • In another aspect, a computer-readable storage medium is provided, in which a computer program is stored; when the computer program is executed by a processor, the steps of the above information association method are implemented.
  • In another aspect, a computer program product comprising instructions is provided, which, when run on a computer, causes the computer to perform the steps of the above information association method.
  • In another aspect, a computer program containing instructions is provided; when the computer program is run on a computer, the computer executes the steps of the above information association method.
  • With the technical solutions provided by the embodiments of the present application, hardware devices such as a binocular camera, a face capture machine, and a security inspection machine can be used to associate the visible light package image, the X-ray package image, and the face image of the target user; the hardware environment is simple to build and the equipment requirements are modest.
  • FIG. 1 is a schematic structural diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a top view of an implementation environment provided by an embodiment of the present application.
  • FIG. 3 is a top view of an implementation environment provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of an information association method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the principle of calculating a disparity map provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an information association system provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an information association device provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an implementation environment according to an exemplary embodiment.
  • The implementation environment includes a security inspection machine 101, a binocular camera 102, a face capture machine 103, and a server 104. The security inspection machine 101, the binocular camera 102, and the face capture machine 103 are each communicatively connected to the server 104; each connection is wired or wireless, which is not limited in this embodiment of the present application.
  • The security inspection machine 101 is an electronic device that completes inspection by feeding the inspected package into an X-ray inspection channel by means of a conveyor belt. When the package enters the X-ray inspection channel, it blocks a package detection sensor, which generates a detection signal. The detection signal is sent to the controller of the security inspection machine 101, which generates an X-ray trigger signal and sends it to the X-ray source of the security inspection machine 101 to trigger the X-ray source to emit X-rays. The X-rays pass through the inspected package on the conveyor belt, are partially absorbed by it, and strike a dual-energy semiconductor detector installed in the X-ray inspection channel. The dual-energy semiconductor detector converts the X-rays into electrical signals, which are processed by a processor into an X-ray package image.
  • The binocular camera 102 is disposed above the security inspection machine 101, and its shooting field of view includes the area where the security inspection machine 101 is located and the walking path of users near the security inspection machine 101. In FIG. 2, the camera's top-view area is the shooting field of view of the binocular camera 102, and the pedestrian path represents the user's walking path near the security inspection machine 101.
  • The face capture machine 103 is deployed above the X-ray detection area of the security inspection machine 101, facing the personnel security inspection channel, and is used to capture frontal face images of users. For example, the face capture machine 103 is deployed above the X-ray detection area and close to its left side, so that when a user passes through the personnel security inspection channel, the face capture machine 103 can capture a frontal face image of the user. The shooting field of view of the face capture machine 103 includes the area where the personnel security inspection channel in FIG. 3 is located.
  • The server 104 is a single server or a server cluster composed of multiple servers; it can also be a cloud computing service center.
  • In an implementation, the two cameras of the binocular camera 102 capture video within their field of view, obtaining the first video and the second video, and send both to the server 104. The server 104 detects and tracks the target user based on the first video and the second video, thereby determining the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video, and from these determines the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package.
  • Alternatively, the binocular camera 102 itself detects and tracks the target user based on the first video and the second video, thereby determining the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video; based on these, it determines the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package, and then sends the package release time and the visible light package image to the server 104.
  • When the binocular camera 102 determines the user ID and the three-dimensional coordinates itself, it also needs to send the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system to the server 104, so that the server 104 can control the face capture machine 103 to capture the face image of the target user.
  • Based on the package release time and the speed at which the security inspection machine 101 conveys packages, the server 104 determines the time at which the package placed by the target user is in the X-ray detection area and triggers the security inspection machine 101 to capture an X-ray package image. Alternatively, the server 104 can send the package release time of the target user to the security inspection machine 101, which determines, based on the package release time and its own conveying speed, the time at which the package placed by the target user is in the X-ray detection area, captures the X-ray package image, and sends it to the server 104.
  • After the server 104 obtains the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, it determines the face image of the target user based on those three-dimensional coordinates and the face images captured by the face capture machine 103. Alternatively, the server 104 can send the three-dimensional coordinates to the face capture machine 103, which determines the face image of the target user based on those coordinates and the face images it has captured, and then sends the target user's face image to the server 104.
  • After the server 104 obtains the visible light package image, the X-ray package image, and the target user's face image, it can associate the three.
  • Here, the visible light package image is an image captured under visible light, and the X-ray package image is an image captured under X-rays. The human skeleton key points include joint points such as the top of the head, the shoulders, the elbows, and the wrists.
  • The above content only lists some implementations; part of the processing may be performed by the server 104 and another part by the corresponding device, and the various implementations can be combined arbitrarily. This is not restricted in the embodiments of the present application.
  • In the above description, the server 104 handles the communication between the devices, thereby realizing the information association. However, the information association method provided by the embodiments of the present application can also work without the server 104 relaying the communication. For example, the security inspection machine 101 can communicate directly with the binocular camera 102, and the binocular camera 102 with the face capture machine 103. The binocular camera 102 determines the package release time and the visible light package image and sends the package release time to the security inspection machine 101; the security inspection machine 101 determines the X-ray package image based on the package release time; and the face capture machine 103 determines the face image of the target user. The binocular camera 102 then associates the visible light package image, the X-ray package image, and the face image of the target user.
  • Optionally, the binocular camera 102 can instead send the visible light package image to the server 104, the security inspection machine 101 can send the X-ray package image to the server 104, and the face capture machine 103 can send the target user's face image to the server 104, with the server 104 associating the visible light package image, the X-ray package image, and the face image of the target user.
  • FIG. 4 is a flowchart of an information association method provided by an embodiment of the present application, illustrated by taking its application to a server as an example. A binocular camera is deployed above the security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. The method includes the following steps.
  • Step 401: The server detects and tracks the target user through the first video collected by the binocular camera, determining the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video.
  • That is, the server obtains the first video collected by the binocular camera and, through it, detects and tracks the target user, determining both the user ID of the target user and the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system. The first video is one of the two videos collected by the binocular camera, and the target user is a user included in its video frame images.
  • In an implementation, the server determines a depth image corresponding to each video frame image in the first video based on the first and second videos collected by the binocular camera; detects and tracks the target user based on the first video, determining the user ID of the target user and the pixel coordinates of the target user's human skeleton key points in the video frame images of the first video; and, based on the depth image corresponding to each video frame image and those pixel coordinates, determines the three-dimensional coordinates of the key points in the binocular camera coordinate system.
  • The two cameras of the binocular camera shoot the same scene at the same time, and the depth image corresponding to each video frame image in the first video is determined in the same way, so a single video frame image of the first video is taken as an example below. Since a depth image is determined from a left and a right video frame image of the same scene captured at the same moment, for convenience the video frame images captured at the same moment in the first and second videos are called the first video frame image and the second video frame image, respectively; that is, they are obtained by the two cameras of the binocular camera simultaneously shooting the same scene.
  • The server determines the depth image corresponding to the first video frame image as follows: a first disparity map is determined from the first and second video frame images, and the depth value corresponding to each pixel in the first disparity map is computed according to the following formula (1), thereby obtaining the depth image corresponding to the first video frame image:

    depth = f × baseline / disp    (1)

  • where depth is the depth value of the pixel in the depth image corresponding to the first video frame image; f is the normalized focal length, that is, the focal length in the intrinsic matrix of the binocular camera; baseline is the distance between the optical centers of the two cameras of the binocular camera, also known as the baseline distance; and disp is the disparity value of the pixel in the first disparity map.
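  • The following is a minimal sketch of formula (1) applied to a whole disparity map; the function name and the zero-disparity handling are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def disparity_to_depth(disp: np.ndarray, f: float, baseline: float) -> np.ndarray:
    """Formula (1): depth = f * baseline / disp, applied elementwise.

    disp     -- disparity map in pixels, same size as the first video frame image
    f        -- normalized focal length from the binocular camera's intrinsic matrix
    baseline -- distance between the optical centers of the two cameras
    """
    depth = np.zeros_like(disp, dtype=np.float64)
    valid = disp > 0                       # zero disparity: no match found, leave depth at 0
    depth[valid] = f * baseline / disp[valid]
    return depth
```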
  • The server determines the first disparity map from the first and second video frame images as follows: each pixel in the second video frame image is matched against the pixels with the same Y coordinate in the first video frame image, and the difference between the abscissas of every two matched pixels is computed; this difference is the disparity value of the two pixels. The disparity value is then taken as the pixel value at the corresponding position, so that a disparity image of the same size as the first video frame image is obtained.
  • FIG. 5 is a schematic diagram of the principle of calculating a disparity map according to an embodiment of the present application. The left image in FIG. 5 is the first video frame image, the right image is the second video frame image, and each small square may be regarded as a pixel.
  • Consider a pixel A in the second video frame image for which the matching pixel in the first video frame image is to be found. First, a W×H pixel matrix is formed with pixel A as the central pixel; for example, a 9×9 pixel matrix A can be formed. For each candidate pixel on the same row of the first video frame image, a matching calculation is performed: a 9×9 pixel matrix of the same size is formed with the candidate pixel as the central pixel, as shown by the dotted-line box in FIG. 5. The pixel difference between each element of pixel matrix A and the element at the corresponding position of the candidate's pixel matrix is calculated, and these differences are summed to obtain a pixel difference sum. Matching pixel A against every candidate in this way yields multiple pixel difference sums; the smallest one is selected, and its corresponding pixel is determined as the matching point of pixel A. Assuming the matching point of pixel A in the first video frame image is pixel B, the difference between the abscissas of pixel A and pixel B is computed and used as the disparity value of the two pixels, and it is written into the disparity map at the coordinate position that pixel B occupies in the first video frame image. A sketch of this procedure follows.
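  • Below is a minimal brute-force sketch of the block matching described above, under assumed conventions: rectified grayscale images, a 9×9 window (half = 4), and a bounded non-negative search offset; for simplicity the disparity is stored at the position of the second-image pixel rather than at the matched pixel's position.

```python
import numpy as np

def sad_block_match(first: np.ndarray, second: np.ndarray,
                    half: int = 4, max_disp: int = 64) -> np.ndarray:
    """Sum-of-absolute-differences block matching along the same image row.

    For each pixel A of the second video frame image, a (2*half+1)-square
    window is compared against candidate windows at the same y-coordinate in
    the first video frame image; the candidate with the smallest pixel
    difference sum is the match, and the difference of abscissas is the
    disparity value.
    """
    h, w = second.shape
    disp = np.zeros((h, w), dtype=np.float32)
    F, S = first.astype(np.int32), second.astype(np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            win_a = S[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_dx = None, 0
            for dx in range(max_disp + 1):          # assumed non-negative offset
                xb = x + dx
                if xb + half >= w:
                    break
                win_b = F[y - half:y + half + 1, xb - half:xb + half + 1]
                cost = int(np.abs(win_a - win_b).sum())   # pixel difference sum
                if best_cost is None or cost < best_cost:
                    best_cost, best_dx = cost, dx
            disp[y, x] = best_dx                    # difference of abscissas
    return disp
```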
  • There are various methods for detecting and tracking the target user in the first video collected by the binocular camera, for example methods based on deep learning; this embodiment of the present application neither limits them nor describes them in detail. The user ID of the target user is assigned during the detection and tracking process: when a new user is detected in the first video, a new user ID is generated as that user's ID.
  • Based on the depth image corresponding to the first video frame image and the coordinates of a human skeleton key point of the target user in the first video frame image, the server determines the key point's three-dimensional coordinates in the binocular camera coordinate system as follows: it obtains the intrinsic matrix of the binocular camera and multiplies the key point's coordinates (x, y) in the first video frame image by the inverse of the intrinsic matrix, obtaining the coordinates (x', y') in the binocular camera coordinate system. The depth value corresponding to (x, y) is then read from the depth image corresponding to the first video frame image, namely the depth value of the pixel whose coordinates in the depth image are (x, y). Taking this depth value as z and combining it with (x', y') yields the three-dimensional coordinates (x', y', z).
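  • A minimal sketch of this back-projection under the standard pinhole model (X = z · K⁻¹ [u, v, 1]ᵀ), which matches the description above up to the homogeneous scale factor; the names are illustrative.

```python
import numpy as np

def backproject(u: float, v: float, depth_map: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift a skeleton key point (u, v) of the first video frame image to 3D
    coordinates in the binocular camera coordinate system.

    K is the binocular camera's 3x3 intrinsic matrix; the depth value z is read
    from the depth image at the key point's pixel coordinates.
    """
    z = float(depth_map[int(v), int(u)])            # depth value at (x, y)
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # multiply by the inverse intrinsic matrix
    return z * ray                                  # (x', y', z) in camera coordinates
```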
  • The above takes the first video as the example for determining the depth image, detecting and tracking the target user, and determining the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system; the second video can equally be used for these purposes, which is not limited in this embodiment of the present application.
  • By incorporating the depth image, the embodiment of the present application determines the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system, which improves the accuracy of detecting and tracking the target user compared with using two-dimensional coordinates.
  • Step 402: Based on the user ID of the target user and the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server determines the package release time at which the target user places a package on the security inspection machine, and the visible light package image of that package.
  • That is, the server determines the package release time based on the three-dimensional coordinates of the key points, and obtains the visible light package image of the package from the first video based on the user ID of the target user and the package release time.
  • In an implementation, the server determines the package release time as follows: based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system, when the positional relationship between the key points and the package placement area satisfies the first package placement condition, the target user is determined to be in the package release state, and the moment at which the target user is in that state is determined as the package release time.
  • The first package placement condition means that one or more of the target user's human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1.
  • That is, the server determines from the three-dimensional coordinates whether one or more of the key points are located in the package placement area. If one or more key points are located in the area in N consecutive video frame images, the positional relationship is deemed to satisfy the first package placement condition, and the package release time is determined. For example, if one or more key points are located in the package placement area in the (i+1)-th through (i+N)-th video frame images, the server determines that the target user is in the package release state and takes the shooting moment of the (i+N)-th video frame image as the package release time.
  • The package placement area is an expansion of the conveyor belt area of the security inspection machine; in FIG. 2, area T is the package placement area obtained by expanding the conveyor belt area. Since the positions of the binocular camera and the security inspection machine are fixed after deployment, the server can obtain the three-dimensional coordinates of the package placement area in the binocular camera coordinate system in advance. When determining the package release time, the server compares the three-dimensional coordinates of the target user's human skeleton key points with those of the package placement area to decide whether one or more key points lie in the area, as in the sketch below.
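  • A minimal sketch of the first package placement condition, assuming the package placement area is an axis-aligned box in the binocular camera coordinate system and an illustrative N = 5 (the patent only requires N > 1).

```python
import numpy as np

def in_region(p: np.ndarray, region_min: np.ndarray, region_max: np.ndarray) -> bool:
    """True if 3D key point p lies inside the package placement area."""
    return bool(np.all(p >= region_min) and np.all(p <= region_max))

def first_condition_frame(keypoints_per_frame, region_min, region_max, N: int = 5):
    """Return the index of the frame at which one or more skeleton key points
    have been inside the package placement area for N consecutive video frame
    images (the package release time is that frame's shooting moment), or None.

    keypoints_per_frame -- list of (K, 3) arrays of key point coordinates.
    """
    streak = 0
    for i, kps in enumerate(keypoints_per_frame):
        if any(in_region(p, region_min, region_max) for p in kps):
            streak += 1
            if streak >= N:
                return i
        else:
            streak = 0
    return None
```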
  • In another implementation, the server determines the package release time as follows: based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system, when the change in the target user's movements satisfies the second package placement condition, the target user is determined to be in the package release state, and the moment at which the target user is in that state is determined as the package release time.
  • The second package placement condition means that one or more of the target user's human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than an amplitude threshold, where M is an integer greater than 1. Alternatively, the second package placement condition refers to the target user's movements showing the change trend of picking up a package and then putting it down.
  • That is, the server determines from the three-dimensional coordinates whether one or more of the key points fluctuate in M consecutive video frame images. If one or more key points fluctuate in M consecutive video frame images with an amplitude greater than the amplitude threshold, the change in the target user's movements is deemed to satisfy the second package placement condition, and the package release time is determined. For example, if the condition is met in the (i+1)-th through (i+M)-th video frame images, the server determines that the target user is in the package release state and takes the shooting moment of the (i+M)-th video frame image as the package release time.
  • The first video includes multiple video frame images, and the positions of the target user's human skeleton key points change with the shooting moments of those images; their three-dimensional coordinates in the binocular camera coordinate system therefore change as well. Based on these three-dimensional coordinates, the server can determine whether the key points fluctuate and can determine the target user's movement trend.
  • When the target user is not placing a package, the key points essentially show no ups and downs, whereas while the target user places a package they usually do. Therefore, in this embodiment of the present application, the distance between the three-dimensional coordinates of the same human skeleton key point in two adjacent video frame images can be computed, yielding one distance per key point. If every one of these distances is smaller than a distance threshold, the key points are considered to have no fluctuation in the later video frame image; otherwise, they are considered to fluctuate, and the maximum of the distances is taken as the fluctuation amplitude in the later video frame image, as in the sketch below.
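  • A minimal sketch of the per-frame fluctuation test just described; the threshold value and array layout are assumptions.

```python
import numpy as np

def fluctuation(prev_kps: np.ndarray, cur_kps: np.ndarray, dist_thresh: float):
    """Fluctuation test between two adjacent video frame images.

    prev_kps, cur_kps -- (K, 3) arrays: 3D coordinates of the same K skeleton
    key points in the earlier and later frame. Returns (fluctuated, amplitude):
    no fluctuation if every per-key-point distance is below dist_thresh,
    otherwise the maximum distance is the fluctuation amplitude.
    """
    dists = np.linalg.norm(cur_kps - prev_kps, axis=1)  # one distance per key point
    if np.all(dists < dist_thresh):
        return False, 0.0
    return True, float(dists.max())
```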
  • Likewise, when the target user is not placing a package, the target user essentially neither picks up nor puts down a package, whereas during placement the target user usually picks the package up and puts it down, and the movement trend is correspondingly that of picking up a package and putting it down. A user's action can usually be determined from the positions of the user's skeleton key points, for example those on the arm. Therefore, in this embodiment of the present application, the change trend of the target user's movements can be determined from the three-dimensional coordinates of the key points in the binocular camera coordinate system, by means of deep learning or the like; the specific implementation is not described in detail here.
  • In an implementation, the server obtains the visible light package image of the package from the first video based on the package release time as follows: the server obtains from the first video the video frame image whose shooting moment is the package release time, determines from it, based on the user ID of the target user, an image area including the target user and the package being placed, and obtains the visible light package image from that area.
  • In the process of detecting and tracking the target user based on the first video, the server can identify the target user in the video frame images of the first video and assign the user ID; and the video frame image shot at the package release time includes not only the target user but also the package being placed. Therefore, after obtaining that video frame image, the server can determine the visible light package image from it based on the user ID of the target user.
  • In another implementation, a package release time period may be determined from the package release time, for example a preset duration before and after it, and the video frame images within this period are obtained from the first video; among them, the video frame images containing the user ID of the target user are used as the visible light package images of the target user. For example, if the package release time is 17:31:29 and the package release time period is 17:31:28-17:31:30, the video frame images with timestamps from 17:31:28 to 17:31:30 are selected from the first video, and those among them containing the user ID of the target user are used as the visible light package images of the target user.
  • The above takes the first video as the example of how the server obtains the visible light package image; the image may equally be obtained from the second video, which is not limited in this embodiment of the present application. A sketch of the time-window variant follows.
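  • A minimal sketch of the time-window variant, assuming timestamps in seconds, a one-second margin on each side (matching the 17:31:28-17:31:30 example above), and per-frame tracking results given as a set of user IDs.

```python
def visible_light_package_images(frames, release_time: float,
                                 target_uid: str, margin: float = 1.0):
    """Select the visible light package images of the target user.

    frames -- iterable of (timestamp, frame, user_ids_in_frame) triples.
    Keeps frames whose timestamps fall inside the package release time period
    [release_time - margin, release_time + margin] and whose tracking results
    contain the target user's ID.
    """
    lo, hi = release_time - margin, release_time + margin
    return [frame for ts, frame, uids in frames
            if lo <= ts <= hi and target_uid in uids]
```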
  • Step 403: Based on the package release time, the server determines, through the security inspection machine, the X-ray package image of the package in the X-ray detection area.
  • Taking the package release time as the starting time point and using the conveying speed at which the security inspection machine transports packages, the server determines the time at which the package placed by the target user is in the X-ray detection area of the security inspection machine, obtaining the X-ray detection time; based on the X-ray detection time, the X-ray package image of the package in the X-ray detection area is determined through the security inspection machine.
  • Since the conveyor belt of the security inspection machine moves at a fixed, uniform speed and the distance from the center point of the conveyor belt to the X-ray detection area is fixed, dividing this distance by the conveying speed gives a first duration; adding the first duration to the package release time gives the X-ray detection time.
  • In an implementation, the server sends the X-ray detection time to the security inspection machine, which collects the image of the package at that time, thereby obtaining the X-ray package image. Alternatively, the server may determine, from the X-ray package images collected by the security inspection machine, the one whose collection time equals the X-ray detection time or differs from it by less than a specified threshold, as the X-ray package image of the package in the X-ray detection area. A sketch follows.
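  • A minimal sketch of both steps, assuming timestamps in seconds and X-ray captures given as (collection_time, image) pairs; the tolerance value is illustrative.

```python
def xray_detection_time(release_time: float, belt_center_to_xray: float,
                        belt_speed: float) -> float:
    """X-ray detection time = package release time + first duration, where the
    first duration is the belt-center-to-detection-area distance divided by
    the conveying speed."""
    return release_time + belt_center_to_xray / belt_speed

def match_xray_image(xray_captures, t_detect: float, tol: float = 0.5):
    """Pick the X-ray package image whose collection time equals the X-ray
    detection time or differs from it by less than the threshold tol."""
    candidates = [c for c in xray_captures if abs(c[0] - t_detect) < tol]
    return min(candidates, key=lambda c: abs(c[0] - t_detect)) if candidates else None
```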
  • Since the binocular camera is deployed vertically above the security inspection machine to capture the visible light package image, the viewing angles of the visible light package image and the X-ray package image are basically the same.
  • Step 404: Based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server determines the face image of the target user through the face capture machine.
  • That is, the server converts those three-dimensional coordinates into the face capture image coordinate system and, based on the resulting coordinates of the key points in the face capture image coordinate system, determines the face image of the target user through the face capture machine.
  • The server determines the coordinates of the key points in the face capture image coordinate system as follows: it obtains the rotation matrix and translation matrix from the binocular camera coordinate system to the face capture machine coordinate system, together with the intrinsic matrix of the face capture machine; it then applies the rotation matrix and translation matrix to the three-dimensional coordinates of the key points in the binocular camera coordinate system, obtaining their three-dimensional coordinates in the face capture machine coordinate system, which the intrinsic matrix projects into the face capture image coordinate system.
  • The rotation matrix and translation matrix from the binocular camera coordinate system to the face capture machine coordinate system need to be calibrated in advance. The calibration process includes: placing a 14×11 black-and-white checkerboard in the common field of view of the binocular camera and the face capture machine, and computing the rotation matrices R1 and R0 and the offset matrices T1 and T0 of the binocular camera and the face capture machine, respectively, relative to the world coordinate system of the checkerboard. Based on R1, R0, T1, and T0, the rotation matrix R and offset matrix T from the binocular camera coordinate system to the face capture machine coordinate system are calculated according to the following formulas (2) and (3):

    R = R0 · R1⁻¹    (2)
    T = T0 − R0 · R1⁻¹ · T1    (3)
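  • Formulas (2) and (3) follow from composing the rigid transforms: with X_binoc = R1·Xw + T1 and X_face = R0·Xw + T0, eliminating the world point Xw gives X_face = R·X_binoc + T. A minimal sketch, including the projection into the face capture image coordinate system, with illustrative names:

```python
import numpy as np

def binoc_to_facecam_extrinsics(R1, T1, R0, T0):
    """Compose formulas (2) and (3). For a rotation matrix, R1^-1 == R1.T."""
    R = R0 @ R1.T          # formula (2): R = R0 * R1^-1
    T = T0 - R @ T1        # formula (3): T = T0 - R0 * R1^-1 * T1
    return R, T

def project_to_face_image(X_binoc, R, T, K_face):
    """Transform a skeleton key point from the binocular camera coordinate
    system into the face capture machine coordinate system, then project it
    with the face capture machine's intrinsic matrix K_face."""
    X_face = R @ X_binoc + T
    uvw = K_face @ X_face
    return uvw[:2] / uvw[2]   # coordinates in the face capture image coordinate system
```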
  • Based on the coordinates of the key points in the face capture image coordinate system, the server determines the face image of the target user through the face capture machine. The implementation process includes: from the coordinates of the target user's human skeleton key points in the face capture image coordinate system, the server selects the coordinates of the head and shoulder key points; based on these, it predicts the region of the target user's face in the face capture image coordinate system, obtaining the predicted face region of the target user; and by comparing the predicted face region with the real face regions in the images captured by the face capture machine, the face image of the target user is determined.
  • The real face regions in an image are the face regions the image contains, for example the face regions determined by performing face recognition on the image.
  • That is, the server converts the target user's human skeleton key points in the video frame images of the first video from the binocular camera coordinate system into the face capture image coordinate system, predicts the target user's face region in that coordinate system from the head and shoulder key points, compares the predicted face region with each real face region in the images captured by the face capture machine, and determines the real face region that overlaps the predicted face region with the largest overlapping area as the face image of the target user.
  • For example, suppose the predicted region of the target user's face in the face capture image coordinate system is area 1, and an image captured by the face capture machine includes three real face regions: area 2, area 3, and area 4. Area 1 overlaps area 3 and also overlaps area 4, but the overlap with area 3 is the largest, so the real face region corresponding to area 3 is determined as the face image of the target user, as in the sketch below.
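  • A minimal sketch of the largest-overlap matching, assuming face regions are axis-aligned boxes given as (x1, y1, x2, y2) in the face capture image coordinate system.

```python
def overlap_area(a, b) -> float:
    """Intersection area of two axis-aligned boxes (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def match_face(predicted_box, real_face_boxes):
    """Return the real face region that overlaps the predicted face region
    with the largest overlapping area (area 3 in the example above), or None
    if no real face region overlaps the prediction."""
    best, best_area = None, 0.0
    for box in real_face_boxes:
        area = overlap_area(predicted_box, box)
        if area > best_area:
            best, best_area = box, area
    return best
```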
  • Multiple video frame images of the first video may include the target user's human skeleton key points, so multiple face regions of the target user can be predicted from them; likewise, multiple images captured by the face capture machine may include the target user's real face region. The server can therefore determine face images of the target user from the multiple predicted face regions in the face capture image coordinate system and the real face regions in the multiple captured images, obtaining multiple face images of the target user, and then determine an optimal face image from these face images.
  • Since the first video includes multiple video frame images, the face capture machine likewise captures multiple images, and the binocular camera and the face capture machine share a common shooting field of view, the target user may appear both in the video frame images of the first video and in the images captured by the face capture machine. The server can determine one face image of the target user from the images collected by the binocular camera and the face capture machine at the same moment, and can also determine multiple face images of the target user from images collected at multiple moments.
  • For example, suppose the images collected at the same moment by the binocular camera and the face capture machine are image 1 and image 2, respectively. The server converts the three-dimensional coordinates of the target user's human skeleton key points in image 1 from the binocular camera coordinate system into the face capture image coordinate system, predicts the target user's face region based on the coordinates of the head and shoulder key points of image 1 in the face capture image coordinate system, and determines the face image of the target user from image 2.
  • There are various methods for determining the optimal face image from the multiple face images, such as scoring the face images and selecting the one with the highest score as the optimal face image; the embodiments of the present application do not limit the method. One possible scoring criterion is sketched below.
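  • The patent leaves the scoring method open; the sketch below uses image sharpness (variance of the Laplacian) purely as an illustrative stand-in for a face-quality score.

```python
import cv2

def best_face_image(face_images):
    """Return the face image with the highest score, here scored by sharpness."""
    def score(img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()   # higher variance = sharper
    return max(face_images, key=score)
```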
  • In the above solution the face capture machine is monocular, but a binocular camera can also be used in practical applications. When the face capture machine is a binocular camera, the three-dimensional coordinates of each real face region in the images it captures can be determined in the face capture machine coordinate system. In that case, the implementation process of determining the face image of the target user through the face capture machine includes: converting the three-dimensional coordinates of the target user's human skeleton key points in the video frame images of the first video from the binocular camera coordinate system into the face capture machine coordinate system, obtaining the key points' three-dimensional coordinates in the face capture machine coordinate system; and, based on the three-dimensional coordinates of the head and shoulder key points in the face capture machine coordinate system, determining the face image of the target user through the face capture machine.
  • That is, when the face capture machine is a binocular camera, three-dimensional real face regions can be obtained from the images it captures; the target user's three-dimensional face region in the face capture machine coordinate system is then predicted from the three-dimensional coordinates of the head and shoulder key points, obtaining the target user's three-dimensional predicted face region, and the two are compared.
  • In this way, the binocular face capture machine locates the face image of the target user through three-dimensional coordinates, which improves the accuracy of determining the face image: with spatial position coordinates, the face image of the target user can be determined more accurately.
  • Step 405: The server associates the visible light package image, the X-ray package image, and the face image of the target user.
  • After the server determines the visible light package image, the X-ray package image, and the face image of the target user, it can associate the three. In an implementation, when the server determines the visible light package image of the target user, it associates the user ID of the target user with the visible light package image to obtain a first association relationship; when it determines the X-ray package image, it associates the visible light package image with the X-ray package image to obtain a second association relationship; and when it determines the face image of the target user, it associates the user ID of the target user with the face image to obtain a third association relationship. Through the first, second, and third association relationships, the visible light package image, the X-ray package image, and the face image of the target user are associated.
  • Optionally, the server may also determine the optimal face image from the multiple face images of the target user; in that case, when associating the images, the server associates the visible light package image, the X-ray package image, and the optimal face image of the target user.
  • After the association, the relationship among the three is stored, so that users can conveniently view, manage, and trace problems later; a sketch of one possible record layout follows.
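  • A minimal sketch of storing the three association relationships keyed by the target user's ID; all field and function names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class AssociationRecord:
    user_id: str
    visible_light_images: list = field(default_factory=list)  # first association: user ID <-> visible light image
    xray_image: object = None                                  # second association: visible light <-> X-ray image
    face_image: object = None                                  # third association: user ID <-> face image

def associate(store: dict, user_id: str, visible, xray, face) -> None:
    """Record the three-way association so it can be viewed and traced later."""
    rec = store.setdefault(user_id, AssociationRecord(user_id))
    rec.visible_light_images.append(visible)
    rec.xray_image = xray
    rec.face_image = face
```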
  • In the embodiments of the present application, hardware devices such as a binocular camera, a face capture machine, and a security inspection machine suffice to associate the visible light package image, the X-ray package image, and the face image of the target user; the hardware environment is simple to build and the equipment requirements are modest. Moreover, introducing depth images enables precise positioning and tracking of target users without requiring their cooperation, essentially unaffected by the flow of people. In addition, associating the visible light package image and the X-ray package image with the optimal face image of the target user facilitates later applications such as face comparison.
  • FIG. 6 is a schematic structural diagram of an information association system provided by an embodiment of the present application. The information association system includes a server 601, a binocular camera 602 deployed above the security inspection machine, and a face capture machine 603 deployed above the X-ray detection area of the security inspection machine, wherein:
  • the binocular camera 602 is configured to collect the first video including the target user;
  • the server 601 is configured to detect and track the target user through the first video, determine the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video; based on the user ID and those three-dimensional coordinates, determine the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package; based on the package release time, determine through the security inspection machine the X-ray package image of the package in the X-ray detection area; and, based on the three-dimensional coordinates of the human skeleton key points, determine the face image of the target user through the face capture machine 603; and
  • the face capture machine 603 is configured to collect the face image of the target user.
  • the server 601 detecting and tracking the target user through the first video collected by the binocular camera 602, and determining the user ID of the target user and the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, may include: determining, based on the first video and a second video collected by the binocular camera 602, the depth images corresponding to the video frame images in the first video; detecting and tracking the target user based on the first video, and determining the user ID of the target user and the coordinates of the human skeleton key points in the video frame images of the first video; and determining, based on the depth images corresponding to the video frame images in the first video and the coordinates of the human skeleton key points in those video frame images, the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system.
  • the server 601 determining the package release time at which the target user places the package on the security inspection machine and the visible light package image of the package may include: determining the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video; and obtaining the visible light package image of the package from the first video based on the user identifier and the package release time.
  • the server 601 determining the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video may include: based on those three-dimensional coordinates, when it is determined that the positional relationship between the human skeleton key points and the package placement area satisfies a first package placement condition, determining that the target user is in the package-release state, and determining the moment when the target user is in the package-release state as the package release time; or, based on those three-dimensional coordinates, when it is determined that the change in the target user's actions satisfies a second package placement condition, determining that the target user is in the package-release state, and determining the moment when the target user is in the package-release state as the package release time.
  • the first package placement condition means that one or more of the human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1.
  • the second package placement condition means that one or more of the human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than an amplitude threshold, where M is an integer greater than 1; alternatively, the second package placement condition means that the trend of the target user's actions is from picking up the package to putting down the package.
  • the server 601 determining, through the security inspection machine, the X-ray package image of the package in the X-ray detection area based on the package release time may include: taking the package release time as the starting time point, determining, according to the conveying speed at which the security inspection machine conveys packages, the time at which the package placed by the target user is in the X-ray detection area, to obtain an X-ray detection time; and determining, through the security inspection machine and based on the X-ray detection time, the X-ray package image of the package in the X-ray detection area.
  • the server 601 determining the face image of the target user through the face capture machine 603 may include: converting the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into the face capture image coordinate system, where the face capture image coordinate system refers to the coordinate system of the images captured by the face capture machine 603; and determining the face image of the target user through the face capture machine 603 based on the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video.
  • the server 601 determining the face image of the target user through the face capture machine 603 based on the coordinates of the human skeleton key points in the face capture image coordinate system may include: selecting, from those coordinates, the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system; predicting, based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system, the region of the target user's face in the face capture image coordinate system, to obtain the predicted face region of the target user; and determining the face image of the target user based on the predicted face region of the target user and each real face region in the images captured by the face capture machine 603.
  • multiple video frame images of the first video include the human skeleton key points, and multiple images captured by the face capture machine 603 include the real face region of the target user;
  • the server 601 determining the face image of the target user based on the predicted face region of the target user and each real face region in the images captured by the face capture machine 603 includes: determining multiple face images of the target user from the multiple images captured by the face capture machine 603 based on multiple predicted face regions of the target user and each real face region in those images, where the multiple predicted face regions refer to the face regions predicted from the multiple video frame images; and determining the optimal face image from the multiple face images;
  • the server 601 associating the visible light package image, the X-ray package image and the face image includes: associating the visible light package image, the X-ray package image and the optimal face image.
  • a binocular camera, a face capture machine, a security inspection machine and the like are sufficient to realize the association of the visible light package image, the X-ray package image and the face image of the target user; the hardware environment is simple to build and the equipment requirements are modest.
  • the introduction of depth images enables precise positioning and tracking of target users without user cooperation, essentially unaffected by foot traffic.
  • the visible light package image, the X-ray package image and the optimal face image of the target user are associated, which facilitates later applications such as face comparison.
  • FIG. 7 is a schematic structural diagram of an information association apparatus provided by an embodiment of the present application.
  • the information association apparatus may be implemented by software, hardware, or a combination of the two as part or all of a server.
  • a binocular camera is deployed above the security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine.
  • the apparatus includes: a detection and tracking module 701, a first determination module 702, a second determination module 703, a third determination module 704 and an association module 705.
  • the detection and tracking module 701 is configured to detect and track the target user through the first video collected by the binocular camera, and determine the user ID of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video;
  • the first determination module 702 is configured to determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package;
  • the second determination module 703 is configured to determine, through the security inspection machine and based on the package release time, the X-ray package image of the package in the X-ray detection area;
  • the third determination module 704 is configured to determine the face image of the target user through the face capture machine based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video;
  • the association module 705 is configured to associate the visible light package image, the X-ray package image and the face image.
  • the detection and tracking module 701 includes:
  • the first determination submodule is configured to determine the depth image corresponding to the video frame image in the first video based on the first video and the second video collected by the binocular camera;
  • the second determination submodule is configured to detect and track the target user based on the first video, and determine the user identification of the target user and the coordinates of the key points of the human skeleton in the video frame image of the first video;
  • the third determination submodule is configured to determine, based on the depth images corresponding to the video frame images in the first video and the coordinates of the human skeleton key points in the video frame images of the first video, the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system.
  • the first determining module 702 includes:
  • the fourth determination submodule is configured to determine the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video;
  • the obtaining submodule is configured to obtain the visible light package image of the package from the first video based on the user identification and the package release time.
  • the fourth determination submodule is specifically configured to: based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that the positional relationship between the human skeleton key points and the package placement area satisfies the first package placement condition, determine that the target user is in the package-release state, and determine the moment when the target user is in the package-release state as the package release time; or, based on those three-dimensional coordinates, when it is determined that the change in the target user's actions satisfies the second package placement condition, determine that the target user is in the package-release state, and determine the moment when the target user is in the package-release state as the package release time.
  • the first package placement condition means that one or more of the human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1;
  • the second package placement condition means that one or more of the human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than the amplitude threshold, where M is an integer greater than 1; alternatively, the second package placement condition means that the trend of the target user's actions is from picking up the package to putting down the package.
  • the second determining module 703 includes:
  • the fifth determination submodule is configured to take the package release time as the starting time point and determine, according to the conveying speed at which the security inspection machine conveys packages, the time at which the package placed by the target user is in the X-ray detection area, to obtain the X-ray detection time;
  • the sixth determination sub-module is configured to determine the X-ray package image of the package in the X-ray detection area through the security inspection machine based on the X-ray detection time.
  • the third determining module 704 includes:
  • the conversion submodule is configured to convert the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into the face capture image coordinate system, where the face capture image coordinate system refers to the coordinate system of the images captured by the face capture machine;
  • the seventh determination submodule is configured to determine the face image of the target user through the face capture machine based on the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video.
  • the seventh determination submodule includes:
  • the first determination unit is configured to select, from the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video, the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system;
  • the prediction unit is configured to predict, based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system, the region of the target user's face in the face capture image coordinate system, to obtain the predicted face region of the target user;
  • the second determination unit is configured to determine the face image of the target user based on the predicted face area of the target user and each real face area in the image captured by the face capture machine.
  • the multiple video frame images of the first video include key points of the human skeleton of the target user, and the multiple images captured by the face capture machine include the real face area of the target user;
  • the third determination unit is specifically configured to:
  • determine multiple face images of the target user from the multiple images captured by the face capture machine based on multiple predicted face regions of the target user and each real face region in the multiple images, where the multiple predicted face regions refer to the face regions predicted from multiple video frame images of the first video; and determine the optimal face image from the multiple face images;
  • the association module 705 is specifically configured to: associate the visible light package image, the X-ray package image and the optimal face image.
  • a binocular camera, a face capture machine, a security inspection machine and the like are sufficient to realize the association of the visible light package image, the X-ray package image and the face image of the target user; the hardware environment is simple to build and the equipment requirements are modest.
  • the introduction of depth images enables precise positioning and tracking of target users without user cooperation, essentially unaffected by foot traffic.
  • the visible light package image, the X-ray package image and the optimal face image of the target user are associated, which facilitates later applications such as face comparison.
  • it should be noted that when the information association apparatus provided in the above embodiments associates information, the division into the above functional modules is only used as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above.
  • the information association apparatus provided by the above embodiments and the information association method embodiments belong to the same concept; for its specific implementation process, see the method embodiments, which will not be repeated here.
  • FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • The server 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read-only memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801.
  • The server 800 also includes a basic input/output system (I/O system) 806 that facilitates the transfer of information between the various components within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814 and other program modules 815.
  • The basic input/output system 806 includes a display 808 for displaying information and input devices 809, such as a mouse and keyboard, for user input of information. The display 808 and the input devices 809 are both connected to the central processing unit 801 through an input/output controller 810 connected to the system bus 805.
  • The basic input/output system 806 may also include the input/output controller 810 for receiving and processing input from various other devices such as a keyboard, mouse, or electronic stylus. Similarly, the input/output controller 810 also provides output to a display screen, printer, or other type of output device.
  • The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805.
  • The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • Computer-readable media can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), flash memory or other solid-state storage technologies; CD-ROM, DVD or other optical storage; and magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices.
  • the server 800 may also be connected, through a network such as the Internet, to a remote computer on the network for operation. That is, the server 800 may be connected to a network 812 through a network interface unit 811 connected to the system bus 805, or the network interface unit 811 may be used to connect to other types of networks or remote computer systems (not shown).
  • the above-mentioned memory also includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
  • a computer-readable storage medium is also provided, and a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the information association method in the above-mentioned embodiments are implemented.
  • the computer-readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, may be a non-transitory storage medium.
  • a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform the steps of the information association method described above.
  • a computer program comprising instructions is provided; when the computer program is run on a computer, it causes the computer to perform the steps of the information association method described above.
  • references herein to "at least one" mean one or more, and "multiple" means two or more.
  • "/" means "or"; for example, A/B can mean A or B;
  • "and/or" in this document merely describes an association relationship between associated objects, indicating that three kinds of relationships may exist; for example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone.
  • words such as "first" and "second" are used to distinguish identical or similar items with basically the same function and effect. Those skilled in the art can understand that the words "first", "second" and the like do not limit the quantity or execution order, and that items labeled "first", "second" and the like are not necessarily different.


Abstract

The embodiments of this application disclose an information association method, system, apparatus, server and storage medium, belonging to the field of security monitoring. The method includes: detecting and tracking a target user through a first video collected by a binocular camera, and determining a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user; determining, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system, a package release time and a visible light package image; determining, based on the package release time and through a security inspection machine, an X-ray package image of the package in an X-ray detection area; determining, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system, a face image of the target user through a face capture machine; and associating the visible light package image, the X-ray package image and the face image. In the embodiments of this application, information association can be achieved with simple devices such as a binocular camera, a face capture machine and a security inspection machine, and the hardware environment is simple to build.

Description

Information association method, system, apparatus, server and storage medium
This application claims priority to Chinese patent application No. 202110336567.6, filed with the China National Intellectual Property Administration on March 29, 2021 and entitled "Information association method, apparatus, server and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of security monitoring, and in particular to an information association method, system, apparatus, server and storage medium.
Background
In scenarios where a security inspection machine inspects packages, associating a package image with the face image of the user who placed the package on the security inspection machine gives the associated information important application prospects in fields such as dangerous goods alarming, problem package tracing, and key crowd management. How to associate such information has therefore become an urgent problem to be solved.
The related art proposes an information association method in which users must queue in order to enter a package placement area and place packages in storage trays. Two face capture machines must also be installed one above the other: the upper one faces the user's face and captures face images, while the lower one faces the human body and is used to detect the action of the user pushing a storage tray onto the conveyor belt of the security inspection machine with both hands, as well as to capture a visible light package image of a package before it enters the security inspection machine. Afterwards, the moment at which the lower face capture machine detects the above action is taken as the starting moment and, combined with the conveyor belt speed, the X-ray package image of the storage tray arriving under the X-ray detector inside the security inspection machine is determined. Through synchronization of the upper and lower face capture machines, the face image of the user who placed the package, the visible light package image and the X-ray package image are associated.
However, the above solution imposes high requirements on the site, requires the deployment of relatively complex hardware structures such as tray conveying channels, and is difficult to deploy.
Summary
The embodiments of this application provide an information association method, system, apparatus, server and storage medium, which can solve the problems of complex hardware structure and difficult deployment in the related art. The technical solutions are as follows:
In one aspect, an information association method is provided. A binocular camera is deployed above a security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. The method includes:
detecting and tracking a target user through a first video collected by the binocular camera, and determining a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video; determining, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package; determining, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area; determining, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine; and associating the visible light package image, the X-ray package image and the face image.
In another aspect, an information association system is provided. The information association system includes: a server, a binocular camera deployed above a security inspection machine, and a face capture machine deployed above the X-ray detection area of the security inspection machine; wherein,
the binocular camera is configured to collect a first video containing a target user;
the server is configured to detect and track the target user through the first video, determine a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video; determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package; determine, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area; and determine, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine;
the face capture machine is configured to collect the face image of the target user.
In another aspect, an information association apparatus is provided. A binocular camera is deployed above a security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. The apparatus includes:
a detection and tracking module, configured to detect and track a target user through a first video collected by the binocular camera, and determine a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video;
a first determination module, configured to determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package;
a second determination module, configured to determine, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area;
a third determination module, configured to determine, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine;
an association module, configured to associate the visible light package image, the X-ray package image and the face image.
In another aspect, a server is provided. The server includes a processor, a communication interface, a memory and a communication bus; the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used to store a computer program; and the processor is used to execute the program stored in the memory to implement the steps of the above information association method.
In another aspect, a computer-readable storage medium is provided. A computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the above information association method are implemented.
In another aspect, a computer program product containing instructions is provided, which, when run on a computer, causes the computer to perform the steps of the information association method described above.
In another aspect, a computer program containing instructions is provided; when the computer program runs on a computer, it causes the computer to perform the steps of the information association method described above.
The technical solutions provided by the embodiments of this application can bring at least the following beneficial effects:
The embodiments of this application can realize the association of the visible light package image, the X-ray package image and the target user's face image with hardware devices such as a binocular camera, a face capture machine and a security inspection machine; the hardware environment is simple to build and the equipment requirements are modest.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of this application and of the prior art more clearly, the drawings required by the embodiments and the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic structural diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a top view of an implementation environment provided by an embodiment of the present application;
FIG. 3 is a top view of an implementation environment provided by an embodiment of the present application;
FIG. 4 is a flowchart of an information association method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the principle of computing a disparity map provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an information association system provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an information association apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. Obviously, the described embodiments are only some of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
Before the information association method provided by the embodiments of this application is explained in detail, the application scenario and implementation environment provided by the embodiments of this application are introduced.
Please refer to FIG. 1, which is a schematic diagram of an implementation environment shown according to an exemplary embodiment. The implementation environment includes a security inspection machine 101, a binocular camera 102, a face capture machine 103 and a server 104; the security inspection machine 101, the binocular camera 102 and the face capture machine 103 are all communicatively connected to the server 104. The communication connection is wired or wireless, which is not limited by the embodiments of this application.
The security inspection machine 101 is an electronic device that completes inspection by feeding the package under inspection into an X-ray inspection channel by means of a conveyor belt. When a package enters the X-ray inspection channel, it blocks a package detection sensor, thereby generating a detection signal. The detection signal is sent to the controller of the security inspection machine 101, which generates an X-ray trigger signal and sends it to the X-ray source of the security inspection machine 101 to trigger the X-ray source to emit X-rays. The X-rays pass through the package under inspection on the conveyor belt, are absorbed by the package, and strike a dual-energy semiconductor detector installed in the X-ray inspection channel. The dual-energy semiconductor detector converts the X-rays into electrical signals, which are processed by the processor into an X-ray package image.
The binocular camera 102 is deployed above the security inspection machine 101, and the shooting field of view of the binocular camera 102 includes the area where the security inspection machine 101 is located and the walking path of users near the security inspection machine 101. For example, please refer to FIG. 2: the camera's overhead area in FIG. 2 is the shooting field of view of the binocular camera 102, and the pedestrian path represents the walking path of users near the security inspection machine 101.
The face capture machine 103 is deployed above the X-ray detection area of the security inspection machine 101, facing the personnel security inspection channel, and is used to capture frontal face images of users. For example, please refer to FIG. 3: the face capture machine 103 is deployed above the X-ray detection area of the security inspection machine 101 and close to the left side of the X-ray detection area, so that when a user passes through the personnel security inspection channel, the face capture machine 103 can capture the user's frontal face image. In other words, the shooting field of view of the face capture machine 103 includes the area of the personnel security inspection channel in FIG. 3.
The server 104 is one server or a server cluster composed of multiple servers; it may of course also be a cloud computing service center.
In the process of associating information, in one implementation, the two lenses of the binocular camera 102 collect video within the shooting field of view to obtain a first video and a second video, and send the first video and the second video to the server 104. Based on the first video and the second video, the server 104 detects and tracks the target user, thereby determining the user identifier of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video; and, based on the user identifier of the target user and those three-dimensional coordinates, determines the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package.
Of course, in another implementation, after capturing the first video and the second video, the binocular camera 102 itself detects and tracks the target user based on the two videos, determines the user identifier of the target user and the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, and, based on these, determines the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package. The binocular camera 102 then sends the package release time and the visible light package image to the server 104. Meanwhile, after the binocular camera 102 has determined the user identifier of the target user and those three-dimensional coordinates, in order to facilitate the face capture machine 103 capturing the target user's face image, the binocular camera 102 also needs to send the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video to the server 104, so that the server 104 controls the face capture machine 103 to capture the target user's face image.
In either of the above two implementations, on the one hand, after the server 104 obtains the package release time and the visible light package image, the server 104 determines, based on the package release time and the conveying speed at which the security inspection machine 101 conveys packages, the time at which the package placed by the target user is in the X-ray detection area, and then triggers the security inspection machine 101 to capture the X-ray package image. Of course, the server 104 may also send the target user's package release time to the security inspection machine 101, and the security inspection machine 101 determines, based on the package release time and its own conveying speed, the time at which the package placed by the target user is in the X-ray detection area, captures the X-ray package image, and sends the X-ray package image to the server 104.
On the other hand, after the server 104 obtains the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server 104 determines the target user's face image based on those three-dimensional coordinates and the individual face images captured by the face capture machine 103. Of course, the server 104 may also send those three-dimensional coordinates to the face capture machine 103, and the face capture machine 103 determines the target user's face image based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system and the face images it has captured. The face capture machine 103 then sends the target user's face image to the server 104.
After the server 104 obtains the target user's visible light package image, X-ray package image and face image, the server 104 can associate the visible light package image, the X-ray package image and the target user's face image.
It should be noted that the visible light package image refers to an image of the package captured under visible light, and the X-ray package image refers to an image of the package captured under X-rays. Human skeleton key points include joint points of the human body such as the top of the head, the shoulders, the elbows and the wrists.
In addition, the above content lists only some implementations. In practical applications, part of the above processing may be performed by the server 104 and another part by the corresponding device, and the above implementations may be combined arbitrarily, which is not limited by the embodiments of this application.
In addition, the above implementation environment realizes the communication between the devices, and thus the information association, through the server 104. In some cases, the information association method provided by the embodiments of this application may also be implemented without the server 104 handling the communication between the devices. In such an implementation environment, the security inspection machine 101 can be communicatively connected to the binocular camera 102, and the binocular camera 102 can also be communicatively connected to the face capture machine 103. The binocular camera 102 is used to determine the package release time and the visible light package image and to send the package release time to the security inspection machine 101; the security inspection machine 101 is used to determine the X-ray package image based on the package release time; and the face capture machine 103 is used to determine the target user's face image. Finally, the binocular camera 102 associates the visible light package image, the X-ray package image and the target user's face image. Of course, the binocular camera 102 may also send the visible light package image to the server 104, the security inspection machine 101 may send the X-ray package image to the server 104, and the face capture machine 103 may send the target user's face image to the server 104, with the server 104 associating the visible light package image, the X-ray package image and the target user's face image.
Based on the above description, there are many ways to combine the processing steps of information association. Taking one of them as an example, the information association method provided by the embodiments of this application is explained in detail below.
FIG. 4 is a flowchart of an information association method provided by an embodiment of the present application, described by taking application to a server as an example. In this method, a binocular camera is deployed above the security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. Please refer to FIG. 4; the method includes the following steps.
Step 401: The server detects and tracks the target user through the first video collected by the binocular camera, and determines the user identifier of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video.
That is, the server can obtain the first video collected by the binocular camera, detect and track the target user through the first video to determine the user identifier of the target user, and detect and track the target user through the first video to determine the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video. The first video is one of the two videos collected by the binocular camera. The target user is a user contained in the video frame images of the first video.
In some embodiments, the server determines, based on the first video and a second video collected by the binocular camera, the depth image corresponding to each video frame image in the first video. Based on the first video, it detects and tracks the target user and determines the user identifier of the target user and the coordinates of the target user's human skeleton key points in the video frame images of the first video. Based on the depth images corresponding to the video frame images in the first video and the coordinates of the target user's human skeleton key points in those video frame images, it determines the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video.
Since the two lenses of the binocular camera shoot the same scene at the same time, and the depth image corresponding to each video frame image in the first video is determined in the same way, the following takes one video frame image in the first video as an example to introduce how the server determines the corresponding depth image. Since a depth image is usually determined from the left and right video frame images of the same scene captured at the same time, for ease of description the left and right video frame images captured at the same time in the first video and the second video are called the first video frame image and the second video frame image respectively. That is, the first video frame image and the second video frame image are respectively captured by the two lenses of the binocular camera for the same scene at the same time.
The server determines the depth image corresponding to the first video frame image as follows: determine a first disparity map from the first video frame image and the second video frame image and, based on the first disparity map, determine the depth value corresponding to each pixel in the first disparity map according to the following formula (1), thereby obtaining the depth image corresponding to the first video frame image.
depth=(f*baseline)/disp    (1)
In the above formula (1), depth refers to the depth value of a pixel in the depth image corresponding to the first video frame image; f refers to the normalized focal length, that is, the focal length in the intrinsic matrix of the binocular camera; baseline refers to the distance between the optical centers of the two lenses of the binocular camera, also called the baseline distance; and disp refers to the disparity value of the pixel in the first disparity map.
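To make formula (1) concrete, the following is a minimal Python sketch of converting a disparity map into a depth image; the function and variable names (disparity_to_depth, focal_px, baseline_m) and the handling of zero-disparity pixels are illustrative assumptions rather than part of the original text.

    import numpy as np

    def disparity_to_depth(disparity, focal_px, baseline_m):
        # Apply depth = (f * baseline) / disp element-wise.
        # disparity : HxW array of disparity values, in pixels
        # focal_px  : normalized focal length from the binocular camera's
        #             intrinsic matrix, in pixels
        # baseline_m: distance between the optical centers of the two
        #             lenses (the baseline distance)
        depth = np.zeros_like(disparity, dtype=np.float64)
        valid = disparity > 0          # zero disparity means no usable match
        depth[valid] = (focal_px * baseline_m) / disparity[valid]
        return depth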
As an example, the server determines the first disparity map from the first video frame image and the second video frame image as follows: match the pixels in the second video frame image with the pixels at the same Y coordinate in the first video frame image, and compute the difference between the horizontal coordinates of each pair of matched pixels; this horizontal coordinate difference is the disparity value between the two pixels. The disparity value is taken as the pixel value corresponding to that pixel in the first video frame image, thereby obtaining a disparity image of the same size as the first video frame image.
FIG. 5 is a schematic diagram of the principle of computing a disparity map shown in an embodiment of this application. Suppose the left image in FIG. 5 is the first video frame image and the right image is the second video frame image. For ease of explanation, each small square in FIG. 5 can be regarded as one pixel. For pixel A in the second video frame image, when searching the first video frame image for the matching pixel of pixel A, that is, determining the pixel that matches pixel A: first, take pixel A as the central pixel to form a W×H pixel matrix, for example a 9×9 pixel matrix. Then, determine the pixels in the first video frame image that have the same Y coordinate as the central pixel, that is, obtain the row of pixels in the first video frame image with the same Y coordinate as the central pixel, as shown by the solid-line box in the left image of FIG. 5. When matching the central pixel against the pixels in this row one by one, compute the pixel difference between each pixel in the pixel matrix of the central pixel and the pixel at the corresponding position in the first video frame image, and sum the computed pixel differences to obtain a pixel difference sum. That is, as shown by the dashed box in FIG. 5, suppose a 9×9 pixel matrix A is formed with pixel A as the central pixel. When pixel A is matched against pixel B in the first video frame image, a 9×9 pixel matrix B of the same size is formed with pixel B as the central pixel, as shown by the dashed box in the right image of FIG. 5. Then, compute the pixel difference between each pixel in pixel matrix A and the pixel at the corresponding position in pixel matrix B, and sum the pixel differences to obtain a pixel difference sum. The other pixels in the first video frame image with the same Y coordinate as pixel A are matched against pixel A by the same method, finally yielding multiple pixel difference sums. The smallest pixel difference sum is selected from them, and the pixel corresponding to the smallest pixel difference sum is determined as the matching point of pixel A. Suppose the matching point of pixel A in the first video frame image is pixel B. Compute the difference between the horizontal coordinates of pixel A and pixel B, take this difference as the disparity value between the two pixels, and take this disparity value as the pixel value of pixel B in a disparity map of the same size as the first video frame image, where pixel B in the disparity map is the pixel whose position in the disparity map is the same as the coordinate position of pixel B in the first video frame image.
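As a rough sketch of the row-wise block matching just described (window size, search range and border handling are illustrative choices, not specified by the text beyond the 9×9 example):

    import numpy as np

    def match_disparity(first_img, second_img, x, y, win=4, max_disp=64):
        # For pixel (x, y) of the second video frame image, compare its
        # (2*win+1) x (2*win+1) pixel matrix (9x9 for win=4, as in the
        # FIG. 5 example) against candidate positions on the same Y row
        # of the first video frame image, using the sum of absolute
        # pixel differences. The horizontal offset of the best match is
        # returned as the disparity value. Assumes (x, y) lies at least
        # `win` pixels inside both images.
        h, w = second_img.shape
        patch = second_img[y-win:y+win+1, x-win:x+win+1].astype(np.int32)
        best_sum, best_dx = None, 0
        for dx in range(max_disp):                 # candidates on the same row
            xm = x + dx                            # matching column in the first image
            if xm + win >= w:
                break
            cand = first_img[y-win:y+win+1, xm-win:xm+win+1].astype(np.int32)
            diff_sum = np.abs(patch - cand).sum()  # pixel difference sum for the block
            if best_sum is None or diff_sum < best_sum:
                best_sum, best_dx = diff_sum, dx
        return best_dx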
There are multiple methods for detecting and tracking the target user in the first video, that is, multiple methods for detecting and tracking the target user through the first video collected by the binocular camera; for example, deep learning may be used to detect the target user's human skeleton key points in each video frame image of the first video. The embodiments of this application do not limit this and do not elaborate on it. It should be noted, however, that the target user's user identifier is allocated to the target user during the process of detecting and tracking the target user in the first video. For example, when the server detects the target user's human skeleton key points in a video frame image of the first video for the first time, it generates a new user identifier as the target user's user identifier.
Since the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system are determined in the same way for each video frame image of the first video, the following again takes the first video frame image as an example to introduce how the server determines the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system.
Based on the depth image corresponding to the first video frame image and the coordinates of the target user's human skeleton key points in the first video frame image, the server determines the three-dimensional coordinates of those key points in the binocular camera coordinate system as follows: obtain the intrinsic matrix of the binocular camera, and multiply the coordinates (x, y) of a human skeleton key point of the target user in the first video frame image by the inverse of the binocular camera's intrinsic matrix to obtain the coordinates (x', y') of the key point in the binocular camera coordinate system. Then obtain the depth value corresponding to the coordinates (x, y) from the depth image corresponding to the first video frame image, where the depth value corresponding to coordinates (x, y) is the depth value of the pixel with pixel coordinates (x, y) in the depth image. Take the obtained depth value as z and combine it with the coordinates (x', y') to obtain the three-dimensional coordinates (x', y', z) of the target user's human skeleton key point in the binocular camera coordinate system in the first video frame image.
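A short sketch of this step (names are illustrative; following the literal steps of the text, z is appended to (x', y') rather than used to rescale them, which a conventional pinhole back-projection would additionally do):

    import numpy as np

    def keypoint_to_camera_coords(x, y, depth_image, K):
        # Multiply the pixel coordinates (x, y) by the inverse of the
        # binocular camera's intrinsic matrix K to get (x', y'), read
        # the depth value of pixel (x, y) from the depth image as z,
        # and combine them into (x', y', z).
        xn, yn, _ = np.linalg.inv(K) @ np.array([x, y, 1.0])
        z = float(depth_image[int(y), int(x)])
        return np.array([xn, yn, z])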
It should be noted that the above uses the first video as an example for determining the depth images, detecting and tracking the target user, and determining the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system. Of course, the depth images may also be determined, the target user detected and tracked, and the three-dimensional coordinates determined based on the second video, which is not limited by the embodiments of this application.
By combining depth images to determine the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system, the embodiments of this application improve the accuracy of detecting and tracking the target user compared with two-dimensional coordinates.
Step 402: Based on the target user's user identifier and the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server determines the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package.
In some embodiments, the server determines the package release time based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, and obtains the visible light package image of the package from the first video based on the target user's user identifier and the package release time.
As an example, the server determines the package release time as follows: based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that the positional relationship between the target user's human skeleton key points and the package placement area satisfies the first package placement condition, determine that the target user is in the package-release state, and determine the moment when the target user is in the package-release state as the package release time.
The first package placement condition means that one or more of the target user's human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1.
That is, based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server determines whether one or more of the target user's human skeleton key points are located in the package placement area. If one or more of the target user's human skeleton key points are located in the package placement area in N consecutive video frame images, it is determined that the positional relationship between the target user's human skeleton key points and the package placement area satisfies the first package placement condition, and the package release time is determined.
For example, if one or more of the target user's human skeleton key points in the i-th video frame image of the first video are located in the package placement area and remain in the package placement area in the following N consecutive video frame images, the server determines that the target user is in the package-release state and determines the capture time of the (i+N)-th video frame image as the package release time.
The package placement area is obtained by expanding the conveyor belt area of the security inspection machine outward. For example, please refer to FIG. 2: area T is the package placement area, obtained by expanding the conveyor belt area of the security inspection machine outward. Once the positions of the security inspection machine and the binocular camera are fixed, the server can obtain the three-dimensional coordinates of the package placement area in the binocular camera coordinate system. In this way, when determining the package release time, the server compares the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video with the three-dimensional coordinates of the package placement area in the binocular camera coordinate system, and can thereby determine whether one or more of the target user's human skeleton key points are located in the package placement area.
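A minimal sketch of checking the first package placement condition over a tracked sequence (the per-frame boolean representation and the run-length formulation are assumptions made for illustration):

    def first_condition_release_frame(in_area_flags, n):
        # in_area_flags: one boolean per video frame image, True when one
        # or more of the target user's skeleton key points lie inside the
        # package placement area (obtained by comparing their 3D
        # coordinates with the area's extent in the binocular camera
        # coordinate system). Returns the index of the frame whose
        # capture time is taken as the package release time, i.e. the
        # end of the first run of n consecutive in-area frames, or None.
        run = 0
        for i, inside in enumerate(in_area_flags):
            run = run + 1 if inside else 0
            if run >= n:
                return i
        return None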
As another example, the server determines the package release time as follows: based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that the change in the target user's actions satisfies the second package placement condition, determine that the target user is in the package-release state, and determine the moment when the target user is in the package-release state as the package release time.
The second package placement condition means that one or more of the target user's human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than an amplitude threshold, where M is an integer greater than 1. Alternatively, the second package placement condition means that the trend of the target user's actions is from picking up the package to putting down the package.
That is, based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server determines whether one or more of the target user's human skeleton key points fluctuate in M consecutive video frame images. If one or more of the target user's human skeleton key points fluctuate in M consecutive video frame images and the fluctuation amplitude is greater than the amplitude threshold, it is determined that the change in the target user's actions satisfies the second package placement condition, and the package release time is determined.
For example, if one or more of the target user's human skeleton key points in the i-th video frame image of the first video fluctuate, the fluctuations persist in the following M consecutive video frame images, and the fluctuation amplitudes of the target user's human skeleton key points in these M video frame images are all greater than the amplitude threshold, then the server determines that the target user is in the package-release state and determines the capture time of the (i+M)-th video frame image as the package release time.
It should be noted that the first video includes multiple video frame images, and the positions of the target user's human skeleton key points change as the capture times of these video frame images change; accordingly, the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system also change as the capture times change. Therefore, the server can determine, based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system, whether the target user's human skeleton key points fluctuate, and determine the trend of the target user's actions.
When the target user is not placing a package, the target user's human skeleton key points basically do not fluctuate, but while the target user is placing a package, the key points usually do fluctuate. Therefore, in the embodiments of this application, the distance between the three-dimensional coordinates, in the binocular camera coordinate system, of the same human skeleton key point of the target user in two adjacent video frame images can be determined, yielding multiple distances corresponding one-to-one to the multiple human skeleton key points. If each of these distances is smaller than a distance threshold, it is considered that the target user's human skeleton key points in the later video frame image do not fluctuate. If any of these distances is greater than the distance threshold, it is considered that the target user's human skeleton key points in the later video frame image fluctuate, and the largest of the distances is determined as the fluctuation amplitude of the target user's human skeleton key points in the later video frame image.
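The per-frame fluctuation test above can be sketched as follows (the array layout is an assumption):

    import numpy as np

    def fluctuation_amplitude(prev_kps, cur_kps, dist_threshold):
        # prev_kps, cur_kps: (num_keypoints, 3) arrays holding the 3D
        # coordinates, in the binocular camera coordinate system, of the
        # same skeleton key points in two adjacent video frame images.
        # Returns None when every per-keypoint distance is below the
        # distance threshold (no fluctuation); otherwise returns the
        # maximum distance as the fluctuation amplitude of the later frame.
        dists = np.linalg.norm(cur_kps - prev_kps, axis=1)
        if np.all(dists < dist_threshold):
            return None
        return float(dists.max())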
When the target user is not placing a package, the target user basically does not perform the actions of picking up and putting down a package, but while placing a package, the target user usually performs these actions, and the trend of the actions is usually from picking up the package to putting down the package. Moreover, the target user's actions can usually be determined from the positions of the target user's human skeleton key points, for example the positions of the skeleton key points on the arms. Therefore, in the embodiments of this application, the trend of the target user's actions can be determined from the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system, using methods such as deep learning. The specific implementation is not elaborated in the embodiments of this application.
The server obtains the visible light package image of the package from the first video based on the package release time as follows: the server obtains, from the first video, the video frame image whose capture time is the package release time; based on the target user's user identifier, determines from the obtained video frame image the image region including the target user and the package the target user is placing; and obtains the target user's visible light package image from that image region.
Based on the above description, in the process of detecting and tracking the target user based on the first video, the server can identify the target user in the video frame images of the first video and can also allocate the user identifier to the target user; moreover, the video frame image of the first video whose capture time is the package release time includes not only the target user but also the package the target user is placing. Therefore, after the server obtains from the first video the video frame image captured at the package release time, it can determine the target user's visible light package image from the obtained video frame image based on the target user's user identifier.
Optionally, after the package release time is determined, the target user's package release period may first be determined based on the package release time, for example by taking the period of a preset duration before and after the package release time as the package release period; then the video frame images within the package release period in the first video are obtained, and from the obtained video frame images, the video frame image containing the target user's user identifier is further determined as the target user's visible light package image.
For example, if the package release time is 17:31:29, the package release period is 17:31:28–17:31:30. Multiple video frame images with timestamps within 17:31:28–17:31:30 are then selected from the first video, and from the selected video frame images, the video frame image containing the target user's user identifier is taken as the target user's visible light package image.
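A sketch of this frame selection by timestamp window (the tuple layout and the one-second margin are illustrative, matching the example above):

    def frames_in_release_window(frames, release_time, margin_s=1.0):
        # frames: iterable of (timestamp_s, frame_image, user_ids), where
        # user_ids are the identifiers detected in that frame. Returns the
        # frames whose timestamps fall within the package release period,
        # i.e. within the preset margin before and after the release time;
        # among these, frames containing the target user's identifier then
        # serve as candidates for the visible light package image.
        lo, hi = release_time - margin_s, release_time + margin_s
        return [(t, img, ids) for (t, img, ids) in frames if lo <= t <= hi]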
It should be noted that the above uses the first video as an example to introduce how the server obtains the visible light package image; in practical applications, the visible light package image may also be obtained from the second video, which is not limited by the embodiments of this application.
Step 403: Based on the package release time, the server determines, through the security inspection machine, the X-ray package image of the package in the X-ray detection area.
In some embodiments, taking the package release time as the starting time point, the time at which the package placed by the target user is in the X-ray detection area of the security inspection machine is determined according to the conveying speed at which the security inspection machine conveys packages, yielding the X-ray detection time. Based on the X-ray detection time, the X-ray package image of the package in the X-ray detection area of the security inspection machine is determined through the security inspection machine.
Normally, the speed of the conveyor belt of the security inspection machine is fixed and uniform, and the distance between the center point of the conveyor belt and the X-ray detection area is fixed. Therefore, the distance between the center point of the conveyor belt and the X-ray detection area can be divided by the conveying speed at which the security inspection machine conveys packages to obtain a first duration. The first duration is then added to the package release time to obtain the X-ray detection time. Next, the server sends the X-ray detection time to the security inspection machine, and the security inspection machine captures an image of the package at the X-ray detection time, thereby obtaining the X-ray package image. Alternatively, the server may determine, from the X-ray package images collected by the security inspection machine, the X-ray package image whose capture time coincides with the X-ray detection time or differs from it by less than a specified threshold, as the X-ray package image of the package in the X-ray detection area.
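A compact sketch of the timing computation and the capture-time matching (names and the tolerance value are illustrative assumptions):

    def xray_detection_time(release_time, belt_center_to_xray_m, belt_speed_mps):
        # first duration = fixed distance from the belt's center point to
        # the X-ray detection area, divided by the conveying speed;
        # X-ray detection time = package release time + first duration.
        return release_time + belt_center_to_xray_m / belt_speed_mps

    def pick_xray_image(xray_images, detect_time, tolerance_s=0.5):
        # xray_images: iterable of (capture_time_s, image). Returns the
        # X-ray package image whose capture time coincides with the X-ray
        # detection time or differs from it by less than the threshold.
        candidates = [(abs(t - detect_time), img) for (t, img) in xray_images
                      if abs(t - detect_time) < tolerance_s]
        return min(candidates, key=lambda c: c[0])[1] if candidates else None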
In some embodiments, the binocular camera is deployed vertically above the security inspection machine and is used to capture the visible light package image. In this way, the viewing angles of the visible light package image and the X-ray package image are basically the same, which facilitates comparison and confirmation in cases of dangerous goods alarms, problem package tracing and key crowd management, thereby improving efficiency.
Step 404: Based on the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the server determines the target user's face image through the face capture machine.
In some embodiments, the server converts the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into the face capture image coordinate system, and determines the target user's face image through the face capture machine based on the coordinates of the target user's human skeleton key points in the face capture image coordinate system.
The server determines the coordinates of the target user's human skeleton key points in the face capture image coordinate system as follows: the server obtains the rotation matrix and translation matrix from the binocular camera coordinate system to the face capture machine coordinate system, as well as the intrinsic matrix of the face capture machine. Then, the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video are multiplied by the obtained rotation matrix and translation matrix to obtain the three-dimensional coordinates of the target user's human skeleton key points in the face capture machine coordinate system. After that, the three-dimensional coordinates of the target user's human skeleton key points in the face capture machine coordinate system are multiplied by the intrinsic matrix of the face capture machine to obtain the coordinates of the target user's human skeleton key points in the face capture image coordinate system.
The rotation matrix and translation matrix from the binocular camera coordinate system to the face capture machine coordinate system need to be calibrated in advance. The calibration process includes: placing a 14×11 black-and-white checkerboard in the common field of view of the binocular camera and the face capture machine, and computing the rotation matrices R1, R0 and translation matrices T1, T0 of the binocular camera and the face capture machine relative to the world coordinate system of the checkerboard. Based on the rotation matrices R1, R0 and the translation matrices T1, T0, the rotation matrix R and translation matrix T from the binocular camera coordinate system to the face capture machine coordinate system are computed according to the following formulas (2) and (3):
R = R0 · R1⁻¹    (2)
T = T0 − R0 · R1⁻¹ · T1    (3)
In the above formulas (2) and (3), R1⁻¹ refers to the inverse matrix of the rotation matrix R1.
Based on the coordinates of the target user's human skeleton key points in the face capture image coordinate system in the video frame images of the first video, the server determines the target user's face image through the face capture machine as follows: from those coordinates, the server selects the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system. Based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system, it predicts the region of the target user's face in the face capture image coordinate system, obtaining the target user's predicted face region. Based on the target user's predicted face region and each real face region in the images captured by the face capture machine, it determines the target user's face image. The real face region in each image refers to the face region contained in that image, for example a face region determined by performing face recognition on the image.
That is, the server converts the target user's human skeleton key points in the video frame images of the first video from the binocular camera coordinate system into the face capture image coordinate system, obtaining the coordinates of the target user's human skeleton key points in the face capture image coordinate system. Then, based on the coordinates of the head and shoulder key points among the target user's human skeleton key points in the face capture image coordinate system, it predicts the region of the target user's face in the face capture image coordinate system, obtaining the target user's predicted face region. The target user's predicted face region is compared with each real face region in the images captured by the face capture machine, and the real face region that overlaps with the predicted face region with the largest overlap area is determined as the target user's face image.
For example, based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system in a video frame image of the first video, the region of the target user's face in the face capture image coordinate system is predicted to be region 1. The image captured by the face capture machine includes three real face regions: region 2, region 3 and region 4. Region 1 overlaps with region 3, and region 1 also overlaps with region 4, but region 3 has the largest overlap area with region 1; in this case, the real face region corresponding to region 3 is determined as the target user's face image.
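A minimal sketch of the overlap comparison in the example above (representing boxes as (x1, y1, x2, y2) tuples is an illustrative convention):

    def pick_face_by_overlap(predicted_box, real_boxes):
        # Compare the target user's predicted face region against every
        # real face region in the image captured by the face capture
        # machine, and return the real region with the largest overlap
        # area, or None when nothing overlaps.
        def overlap_area(a, b):
            w = min(a[2], b[2]) - max(a[0], b[0])
            h = min(a[3], b[3]) - max(a[1], b[1])
            return w * h if w > 0 and h > 0 else 0.0

        best, best_area = None, 0.0
        for box in real_boxes:
            area = overlap_area(predicted_box, box)
            if area > best_area:
                best, best_area = box, area
        return best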
In the embodiments of this application, multiple video frame images of the first video may all include the target user's human skeleton key points, so multiple face regions of the target user can be predicted from these video frame images; moreover, multiple images captured by the face capture machine may all include the target user's real face region. To facilitate later applications, the server can determine the target user's face image from each of the multiple images captured by the face capture machine, based on the target user's multiple predicted face regions in the face capture image coordinate system and each real face region in the multiple images captured by the face capture machine, obtaining multiple face images of the target user, and then determine the optimal face image from these face images.
It should be noted that the first video includes multiple video frame images, the face capture machine also captures multiple images, and the binocular camera and the face capture machine have a common shooting field of view; therefore, the target user may appear simultaneously in a video frame image of the first video and in an image captured by the face capture machine. In this way, the server can determine one face image of the target user based on the images collected by the binocular camera and the face capture machine at the same moment. Correspondingly, the server can also determine multiple face images of the target user based on the images collected by the binocular camera and the face capture machine at multiple moments.
For example, the images collected by the binocular camera and the face capture machine at the same moment are image 1 and image 2 respectively. The server converts the three-dimensional coordinates of the target user's human skeleton key points in image 1 in the binocular camera coordinate system into the face capture image coordinate system, predicts the target user's face region based on the coordinates of the key points of the target user's head and shoulders in image 1 in the face capture image coordinate system, and determines the target user's face image from image 2 based on the target user's predicted face region and each real face region in image 2.
There are multiple methods for determining the optimal face image from the multiple face images; for example, the multiple face images may be scored and the face image with the highest score selected as the optimal face image. The embodiments of this application do not limit the method for determining the optimal face image.
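Since the text leaves the scoring method open, the following sketch uses one possible rule, combining a Laplacian-variance sharpness measure with image size; both the rule and the weighting are assumptions, not part of the original text:

    import cv2
    import numpy as np

    def face_quality_score(face_img):
        # Higher variance of the Laplacian means a sharper image; larger
        # faces are also favored. This is only one illustrative choice.
        gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        h, w = gray.shape
        return sharpness * np.sqrt(h * w)

    def pick_best_face(face_images):
        # Score every face image of the target user and keep the
        # highest-scoring one as the optimal face image.
        return max(face_images, key=face_quality_score)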
It should be noted that the face capture machine in the above solution is monocular, but in practical applications a binocular camera may also be used. When the face capture machine is a binocular camera, the three-dimensional coordinates, in the face capture machine coordinate system, of each real face region in the images captured by the face capture machine can be determined. In this case, the server determines the target user's face image through the face capture machine as follows: the server converts the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into the face capture machine coordinate system, obtaining the three-dimensional coordinates of the target user's human skeleton key points in the face capture machine coordinate system; and, based on the three-dimensional coordinates of the key points of the target user's head and shoulders in the face capture machine coordinate system in the video frame images of the first video, determines the target user's face image through the face capture machine.
Since the face capture machine is a binocular camera, the three-dimensional coordinates of each real face region in the images captured by the face capture machine in the face capture machine coordinate system can be determined, yielding the corresponding three-dimensional real face regions. Then, based on the three-dimensional coordinates of the key points of the target user's head and shoulders in the face capture machine coordinate system in the video frame images of the first video, the target user's three-dimensional face region in the face capture machine coordinate system is predicted, obtaining the target user's three-dimensional predicted face region. The target user's three-dimensional predicted face region is compared with the three-dimensional real face regions corresponding to the real face regions in the images captured by the face capture machine, and the real face region, in the images captured by the face capture machine, corresponding to the overlapping three-dimensional real face region with the largest overlap volume is determined as the target user's face image.
Compared with a monocular face capture machine, a binocular face capture machine locates the target user's face image through three-dimensional coordinates, thereby improving the accuracy of determining the target user's face image. That is, with the help of spatial position coordinates, the target user's face image can be determined more precisely.
Step 405: The server associates the visible light package image, the X-ray package image and the target user's face image.
After the server determines the visible light package image, the X-ray package image and the target user's face image, it can associate the visible light package image, the X-ray package image and the target user's face image.
In practical applications, multiple users may need to have their packages inspected. To improve the accuracy of information association, after the server determines the target user's visible light package image, it can associate the target user's user identifier with the target user's visible light package image to obtain a first association relationship. After determining the target user's X-ray package image, it associates the visible light package image with the X-ray package image to obtain a second association relationship. After determining the target user's face image, it associates the target user's user identifier with the target user's face image to obtain a third association relationship. Then, based on the first, second and third association relationships, it associates the target user's visible light package image, X-ray package image and face image.
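A sketch of holding the three association relationships in plain dictionaries keyed by identifiers (the storage layout and the use of image IDs are illustrative assumptions):

    def associate(user_id, visible_id, xray_id, face_id, store):
        # first association : user identifier -> visible light package image
        # second association: visible light package image -> X-ray package image
        # third association : user identifier -> face image
        # The final triple is derived from the three relationships, so each
        # user's images stay correctly grouped even when many users pass
        # through the security inspection machine.
        store.setdefault("first", {})[user_id] = visible_id
        store.setdefault("second", {})[visible_id] = xray_id
        store.setdefault("third", {})[user_id] = face_id

        v = store["first"][user_id]
        triple = (v, store["second"][v], store["third"][user_id])
        store.setdefault("triples", []).append(triple)  # kept for later viewing and tracing
        return triple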
Based on the above description, the server can also determine the optimal face image from the target user's multiple face images. In this way, when associating the target user's visible light package image, X-ray package image and face image, the server can associate the target user's visible light package image, X-ray package image and optimal face image.
After the visible light package image, the X-ray package image and the target user's face image are associated, the association relationship among the three is stored, which makes it convenient for users to view and manage the images and to trace problems later.
The embodiments of this application can realize the association of the visible light package image, the X-ray package image and the target user's face image with just a binocular camera, a face capture machine, a security inspection machine and the like; the hardware environment is simple to build and the equipment requirements are modest. Moreover, the introduction of depth images enables precise positioning and tracking of the target user without user cooperation and essentially unaffected by foot traffic, so information can be associated even in scenarios with many users interweaving and crossing paths. In addition, associating the target user's visible light package image, X-ray package image and optimal face image facilitates later applications such as face comparison.
FIG. 6 is a schematic structural diagram of an information association system provided by an embodiment of the present application. The information association system includes: a server 601, a binocular camera 602 deployed above the security inspection machine, and a face capture machine 603 deployed above the X-ray detection area of the security inspection machine; wherein,
the binocular camera 602 is configured to collect a first video containing the target user;
the server 601 is configured to detect and track the target user through the first video, determine the user identifier of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video; determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package; determine, based on the package release time and through the security inspection machine, the X-ray package image of the package in the X-ray detection area; and determine, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the face image of the target user through the face capture machine 603;
the face capture machine 603 is configured to collect the face image of the target user.
Optionally, the server 601 detecting and tracking the target user through the first video collected by the binocular camera 602, and determining the user identifier of the target user and the three-dimensional coordinates of the target user's human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, may include: determining, based on the first video and the second video collected by the binocular camera 602, the depth images corresponding to the video frame images in the first video; detecting and tracking the target user based on the first video, and determining the user identifier of the target user and the coordinates of the human skeleton key points in the video frame images of the first video; and determining, based on the depth images corresponding to the video frame images in the first video and the coordinates of the human skeleton key points in the video frame images of the first video, the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video.
Optionally, the server 601 determining, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package may include: determining the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video; and obtaining the visible light package image of the package from the first video based on the user identifier and the package release time.
Optionally, the server 601 determining the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video may include: based on those three-dimensional coordinates, when it is determined that the positional relationship between the human skeleton key points and the package placement area satisfies the first package placement condition, determining that the target user is in the package-release state and determining the moment when the target user is in the package-release state as the package release time; or, based on those three-dimensional coordinates, when it is determined that the change in the target user's actions satisfies the second package placement condition, determining that the target user is in the package-release state and determining the moment when the target user is in the package-release state as the package release time.
Optionally, the first package placement condition means that one or more of the human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1; the second package placement condition means that one or more of the human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than an amplitude threshold, where M is an integer greater than 1; alternatively, the second package placement condition means that the trend of the target user's actions is from picking up the package to putting down the package.
Optionally, the server 601 determining, through the security inspection machine and based on the package release time, the X-ray package image of the package in the X-ray detection area may include: taking the package release time as the starting time point, determining, according to the conveying speed at which the security inspection machine conveys packages, the time at which the package placed by the target user is in the X-ray detection area, to obtain the X-ray detection time; and determining, through the security inspection machine and based on the X-ray detection time, the X-ray package image of the package in the X-ray detection area.
Optionally, the server 601 determining the target user's face image through the face capture machine 603 based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video may include: converting the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into the face capture image coordinate system, where the face capture image coordinate system refers to the coordinate system of the images captured by the face capture machine 603; and determining the target user's face image through the face capture machine 603 based on the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video.
Optionally, the server 601 determining the target user's face image through the face capture machine 603 based on the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video may include: selecting, from the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video, the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system; predicting, based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system, the region of the target user's face in the face capture image coordinate system, to obtain the target user's predicted face region; and determining the target user's face image based on the target user's predicted face region and each real face region in the images captured by the face capture machine 603.
Optionally, multiple video frame images of the first video include the human skeleton key points, and multiple images captured by the face capture machine 603 include the target user's real face region;
in this case, the server 601 determining the target user's face image based on the target user's predicted face region and each real face region in the images captured by the face capture machine 603 includes: determining multiple face images of the target user from the multiple images captured by the face capture machine 603 based on multiple predicted face regions of the target user and each real face region in the multiple images captured by the face capture machine 603, where the multiple predicted face regions refer to the face regions predicted from the multiple video frame images; and determining the optimal face image from the multiple face images;
the server 601 associating the visible light package image, the X-ray package image and the face image includes: associating the visible light package image, the X-ray package image and the optimal face image.
The embodiments of this application can realize the association of the visible light package image, the X-ray package image and the target user's face image with just a binocular camera, a face capture machine, a security inspection machine and the like; the hardware environment is simple to build and the equipment requirements are modest. Moreover, the introduction of depth images enables precise positioning and tracking of the target user without user cooperation and essentially unaffected by foot traffic, so information can be associated even in scenarios with many users interweaving and crossing paths. In addition, associating the target user's visible light package image, X-ray package image and optimal face image facilitates later applications such as face comparison.
FIG. 7 is a schematic structural diagram of an information association apparatus provided by an embodiment of the present application. The information association apparatus may be implemented by software, hardware or a combination of the two as part or all of a server. In the embodiments of this application, a binocular camera is deployed above the security inspection machine, and a face capture machine is deployed above the X-ray detection area of the security inspection machine. Please refer to FIG. 7; the apparatus includes: a detection and tracking module 701, a first determination module 702, a second determination module 703, a third determination module 704 and an association module 705.
The detection and tracking module 701 is configured to detect and track the target user through the first video collected by the binocular camera, and determine the user identifier of the target user and the three-dimensional coordinates, in the binocular camera coordinate system, of the target user's human skeleton key points in the video frame images of the first video;
the first determination module 702 is configured to determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, the package release time at which the target user places a package on the security inspection machine and the visible light package image of the package;
the second determination module 703 is configured to determine, through the security inspection machine and based on the package release time, the X-ray package image of the package in the X-ray detection area;
the third determination module 704 is configured to determine the face image of the target user through the face capture machine based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video;
the association module 705 is configured to associate the visible light package image, the X-ray package image and the face image.
Optionally, the detection and tracking module 701 includes:
a first determination submodule, configured to determine, based on the first video and the second video collected by the binocular camera, the depth images corresponding to the video frame images in the first video;
a second determination submodule, configured to detect and track the target user based on the first video, and determine the user identifier of the target user and the coordinates of the human skeleton key points in the video frame images of the first video;
a third determination submodule, configured to determine, based on the depth images corresponding to the video frame images in the first video and the coordinates of the human skeleton key points in the video frame images of the first video, the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video.
Optionally, the first determination module 702 includes:
a fourth determination submodule, configured to determine the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video;
an obtaining submodule, configured to obtain the visible light package image of the package from the first video based on the user identifier and the package release time.
Optionally, the fourth determination submodule is specifically configured to:
based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that the positional relationship between the human skeleton key points and the package placement area satisfies the first package placement condition, determine that the target user is in the package-release state, and determine the moment when the target user is in the package-release state as the package release time; or
based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that the change in the target user's actions satisfies the second package placement condition, determine that the target user is in the package-release state, and determine the moment when the target user is in the package-release state as the package release time.
Optionally, the first package placement condition means that one or more of the human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1;
the second package placement condition means that one or more of the human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than the amplitude threshold, where M is an integer greater than 1; alternatively, the second package placement condition means that the trend of the target user's actions is from picking up the package to putting down the package.
Optionally, the second determination module 703 includes:
a fifth determination submodule, configured to take the package release time as the starting time point and determine, according to the conveying speed at which the security inspection machine conveys packages, the time at which the package placed by the target user is in the X-ray detection area, to obtain the X-ray detection time;
a sixth determination submodule, configured to determine, through the security inspection machine and based on the X-ray detection time, the X-ray package image of the package in the X-ray detection area.
Optionally, the third determination module 704 includes:
a conversion submodule, configured to convert the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into the face capture image coordinate system, where the face capture image coordinate system refers to the coordinate system of the images captured by the face capture machine;
a seventh determination submodule, configured to determine the face image of the target user through the face capture machine based on the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video.
Optionally, the seventh determination submodule includes:
a first determination unit, configured to select, from the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video, the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system;
a prediction unit, configured to predict, based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system, the region of the target user's face in the face capture image coordinate system, to obtain the target user's predicted face region;
a second determination unit, configured to determine the target user's face image based on the target user's predicted face region and each real face region in the images captured by the face capture machine.
Optionally, multiple video frame images of the first video include the target user's human skeleton key points, and multiple images captured by the face capture machine include the target user's real face region;
the third determination unit is specifically configured to:
determine multiple face images of the target user from the multiple images captured by the face capture machine based on multiple predicted face regions of the target user and each real face region in the multiple images captured by the face capture machine, where the multiple predicted face regions refer to the face regions predicted from the multiple video frame images of the first video;
determine the optimal face image from the multiple face images;
the association module 705 is specifically configured to:
associate the visible light package image, the X-ray package image and the optimal face image.
The embodiments of this application can realize the association of the visible light package image, the X-ray package image and the target user's face image with just a binocular camera, a face capture machine, a security inspection machine and the like; the hardware environment is simple to build and the equipment requirements are modest. Moreover, the introduction of depth images enables precise positioning and tracking of the target user without user cooperation and essentially unaffected by foot traffic, so information can be associated even in scenarios with many users interweaving and crossing paths. In addition, associating the target user's visible light package image, X-ray package image and optimal face image facilitates later applications such as face comparison.
It should be noted that when the information association apparatus provided in the above embodiments associates information, the division into the above functional modules is only used as an example for illustration. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. In addition, the information association apparatus provided by the above embodiments and the information association method embodiments belong to the same concept; for its specific implementation process, see the method embodiments, which will not be repeated here.
FIG. 8 is a schematic structural diagram of a server provided by an embodiment of the present application. The server 800 includes a central processing unit (CPU) 801, a system memory 804 including a random access memory (RAM) 802 and a read-only memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The server 800 also includes a basic input/output system (I/O system) 806 that facilitates the transfer of information between the various components within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814 and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and input devices 809, such as a mouse and keyboard, for user input of information. The display 808 and the input devices 809 are both connected to the central processing unit 801 through an input/output controller 810 connected to the system bus 805. The basic input/output system 806 may also include the input/output controller 810 for receiving and processing input from various other devices such as a keyboard, mouse or electronic stylus. Similarly, the input/output controller 810 also provides output to a display screen, printer or other type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM drive.
Without loss of generality, computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid-state storage technologies; CD-ROM, DVD or other optical storage; and magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will know that computer storage media are not limited to the above. The system memory 804 and the mass storage device 807 described above may be collectively referred to as memory.
According to various embodiments of this application, the server 800 may also be connected, through a network such as the Internet, to a remote computer on the network for operation. That is, the server 800 may be connected to a network 812 through a network interface unit 811 connected to the system bus 805, or the network interface unit 811 may be used to connect to other types of networks or remote computer systems (not shown).
The above memory also includes one or more programs, which are stored in the memory and configured to be executed by the CPU.
In some embodiments, a computer-readable storage medium is also provided. A computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the information association method in the above embodiments are implemented. For example, the computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
It should be noted that the computer-readable storage medium mentioned in the embodiments of this application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the above computer-readable storage medium.
That is, in some embodiments, a computer program product containing instructions is also provided, which, when run on a computer, causes the computer to perform the steps of the information association method described above.
In some embodiments, a computer program containing instructions is also provided; when the computer program runs on a computer, it causes the computer to perform the steps of the information association method described above.
It should be understood that "at least one" mentioned herein means one or more, and "multiple" means two or more. In the description of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B can mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three kinds of relationships may exist; for example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone. In addition, to facilitate a clear description of the technical solutions of the embodiments of this application, words such as "first" and "second" are used in the embodiments of this application to distinguish identical or similar items with basically the same function and effect. Those skilled in the art can understand that the words "first", "second" and the like do not limit the quantity or execution order, and that items labeled "first", "second" and the like are not necessarily different.
The above are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims (15)

  1. An information association method, wherein a binocular camera is deployed above a security inspection machine and a face capture machine is deployed above an X-ray detection area of the security inspection machine, the method comprising:
    detecting and tracking a target user through a first video collected by the binocular camera, and determining a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video;
    determining, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package;
    determining, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area;
    determining, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine; and
    associating the visible light package image, the X-ray package image and the face image.
  2. The method according to claim 1, wherein the detecting and tracking a target user through a first video collected by the binocular camera, and determining a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video comprises:
    determining, based on the first video and a second video collected by the binocular camera, depth images corresponding to the video frame images in the first video;
    detecting and tracking the target user based on the first video, and determining the user identifier of the target user and coordinates of the human skeleton key points in the video frame images of the first video; and
    determining, based on the depth images corresponding to the video frame images in the first video and the coordinates of the human skeleton key points in the video frame images of the first video, the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video.
  3. The method according to claim 1, wherein the determining, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package comprises:
    determining the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video; and
    obtaining the visible light package image of the package from the first video based on the user identifier and the package release time.
  4. The method according to claim 3, wherein the determining the package release time based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video comprises:
    based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that a positional relationship between the human skeleton key points and a package placement area satisfies a first package placement condition, determining that the target user is in a package-release state, and determining a moment when the target user is in the package-release state as the package release time; or
    based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, when it is determined that a change in the target user's actions satisfies a second package placement condition, determining that the target user is in the package-release state, and determining the moment when the target user is in the package-release state as the package release time.
  5. The method according to claim 4, wherein the first package placement condition means that one or more of the human skeleton key points are located in the package placement area in N consecutive video frame images, where N is an integer greater than 1;
    the second package placement condition means that one or more of the human skeleton key points fluctuate in M consecutive video frame images with a fluctuation amplitude greater than an amplitude threshold, where M is an integer greater than 1; alternatively, the second package placement condition means that the trend of the target user's actions is from picking up the package to putting down the package.
  6. The method according to claim 1, wherein the determining, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area comprises:
    taking the package release time as a starting time point, determining, according to a conveying speed at which the security inspection machine conveys packages, a time at which the package placed by the target user is in the X-ray detection area, to obtain an X-ray detection time; and
    determining, based on the X-ray detection time and through the security inspection machine, the X-ray package image of the package in the X-ray detection area.
  7. The method according to claim 1, wherein the determining, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine comprises:
    converting the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video into a face capture image coordinate system, the face capture image coordinate system referring to a coordinate system of images captured by the face capture machine; and
    determining the face image of the target user through the face capture machine based on coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video.
  8. The method according to claim 7, wherein the determining the face image of the target user through the face capture machine based on coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video comprises:
    selecting, from the coordinates of the human skeleton key points in the face capture image coordinate system in the video frame images of the first video, coordinates of key points of the target user's head and shoulders in the face capture image coordinate system;
    predicting, based on the coordinates of the key points of the target user's head and shoulders in the face capture image coordinate system, a region of the target user's face in the face capture image coordinate system, to obtain a predicted face region of the target user; and
    determining the face image of the target user based on the predicted face region of the target user and each real face region in the images captured by the face capture machine.
  9. The method according to claim 8, wherein multiple video frame images of the first video include the human skeleton key points, and multiple images captured by the face capture machine include a real face region of the target user;
    the determining the face image of the target user based on the predicted face region of the target user and each real face region in the images captured by the face capture machine comprises:
    determining multiple face images of the target user from the multiple images captured by the face capture machine based on multiple predicted face regions of the target user and each real face region in the multiple images captured by the face capture machine, the multiple predicted face regions referring to face regions predicted from the multiple video frame images; and
    determining an optimal face image from the multiple face images;
    the associating the visible light package image, the X-ray package image and the face image comprises:
    associating the visible light package image, the X-ray package image and the optimal face image.
  10. An information association system, wherein the information association system comprises: a server, a binocular camera deployed above a security inspection machine, and a face capture machine deployed above an X-ray detection area of the security inspection machine; wherein,
    the binocular camera is configured to collect a first video containing a target user;
    the server is configured to detect and track the target user through the first video, determine a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video; determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package; determine, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area; and determine, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine;
    the face capture machine is configured to collect the face image of the target user.
  11. An information association apparatus, wherein a binocular camera is deployed above a security inspection machine and a face capture machine is deployed above an X-ray detection area of the security inspection machine, the apparatus comprising:
    a detection and tracking module, configured to detect and track a target user through a first video collected by the binocular camera, and determine a user identifier of the target user and three-dimensional coordinates, in a binocular camera coordinate system, of human skeleton key points of the target user in video frame images of the first video;
    a first determination module, configured to determine, based on the user identifier and the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a package release time at which the target user places a package on the security inspection machine and a visible light package image of the package;
    a second determination module, configured to determine, based on the package release time and through the security inspection machine, an X-ray package image of the package in the X-ray detection area;
    a third determination module, configured to determine, based on the three-dimensional coordinates of the human skeleton key points in the binocular camera coordinate system in the video frame images of the first video, a face image of the target user through the face capture machine;
    an association module, configured to associate the visible light package image, the X-ray package image and the face image.
  12. A server, wherein the server comprises a processor, a communication interface, a memory and a communication bus; the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used to store a computer program; and the processor is used to execute the program stored in the memory to implement the steps of the method according to any one of claims 1-9.
  13. A computer-readable storage medium, wherein a computer program is stored in the storage medium, and when the computer program is executed by a processor, the steps of the method according to any one of claims 1-9 are implemented.
  14. A computer program product containing instructions, which, when run on a computer, causes the computer to perform the steps of the method according to any one of claims 1-9.
  15. A computer program which, when run on a computer, causes the computer to perform the steps of the method according to any one of claims 1-9.
PCT/CN2022/083610 2021-03-29 2022-03-29 Information association method, system, apparatus, server and storage medium WO2022206744A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110336567.6 2021-03-29
CN202110336567.6A CN112949577B (zh) 2021-03-29 Information association method, apparatus, server and storage medium

Publications (1)

Publication Number Publication Date
WO2022206744A1 true WO2022206744A1 (zh) 2022-10-06

Family

ID=76227380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083610 WO2022206744A1 (zh) 2021-03-29 2022-03-29 信息关联方法、系统、装置、服务器及存储介质

Country Status (2)

Country Link
CN (1) CN112949577B (zh)
WO (1) WO2022206744A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311608A (zh) * 2022-10-11 2022-11-08 之江实验室 Multi-task multi-target association tracking method and apparatus

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949577B (zh) 2021-03-29 2023-05-09 杭州海康威视数字技术股份有限公司 Information association method, apparatus, server and storage medium
CN113435543B (zh) 2021-07-22 2024-04-09 湖南声迅科技有限公司 Visible light and X-ray image matching method and apparatus based on conveyor belt identification
CN114019572A (zh) 2021-10-11 2022-02-08 安徽太测临峰光电科技股份有限公司 X-ray security inspection method and inspection apparatus based on multi-camera fusion
CN114419700A (zh) 2021-12-29 2022-04-29 南京正驰科技发展有限公司 X-ray security inspection system with person identity and luggage correspondence
CN114295649B (zh) 2021-12-31 2023-11-03 杭州睿影科技有限公司 Information association method and apparatus, electronic device and storage medium
CN115422391B (zh) 2022-08-18 2023-05-26 成都智元汇信息技术股份有限公司 Person-package association method and apparatus based on search by image
CN117455956A (zh) 2023-12-22 2024-01-26 天津众合智控科技有限公司 Person-package association tracking method and system based on AI technology

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109490976A (zh) * 2018-12-28 2019-03-19 同方威视技术股份有限公司 Security inspection control system and method
CN109726617A (zh) * 2017-10-30 2019-05-07 同方威视科技江苏有限公司 Security inspection system and security inspection method
CN109920108A (zh) * 2018-02-24 2019-06-21 北京首都机场航空安保有限公司 Security information management system and method
CN110472612A (zh) * 2019-08-22 2019-11-19 海信集团有限公司 Human behavior recognition method and electronic device
CN112949577A (zh) * 2021-03-29 2021-06-11 杭州海康威视数字技术股份有限公司 Information association method, apparatus, server and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019127273A1 (zh) * 2017-12-28 2019-07-04 深圳市锐明技术股份有限公司 Multi-face detection method, apparatus, server, system and storage medium
CN110996084B (zh) * 2019-12-24 2022-12-27 成都极米科技股份有限公司 Projection image processing method and apparatus, projection device and storage medium
CN111290040A (zh) * 2020-03-12 2020-06-16 安徽启新明智科技有限公司 Active dual-view association method based on image recognition

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726617A (zh) * 2017-10-30 2019-05-07 同方威视科技江苏有限公司 Security inspection system and security inspection method
CN109920108A (zh) * 2018-02-24 2019-06-21 北京首都机场航空安保有限公司 Security information management system and method
CN109490976A (zh) * 2018-12-28 2019-03-19 同方威视技术股份有限公司 Security inspection control system and method
CN110472612A (zh) * 2019-08-22 2019-11-19 海信集团有限公司 Human behavior recognition method and electronic device
CN112949577A (zh) * 2021-03-29 2021-06-11 杭州海康威视数字技术股份有限公司 Information association method, apparatus, server and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311608A (zh) * 2022-10-11 2022-11-08 之江实验室 Multi-task multi-target association tracking method and apparatus
CN115311608B (zh) * 2022-10-11 2023-03-21 之江实验室 Multi-task multi-target association tracking method and apparatus

Also Published As

Publication number Publication date
CN112949577B (zh) 2023-05-09
CN112949577A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2022206744A1 (zh) 信息关联方法、系统、装置、服务器及存储介质 (Information association method, system, apparatus, server and storage medium)
US9646212B2 (en) Methods, devices and systems for detecting objects in a video
US7280687B2 (en) Device for detecting position/orientation of object
WO2021114884A1 (zh) 点云标注方法、装置、系统、设备及存储介质
JP2003203238A (ja) フレームレートで多数のビデオストリームを対応させるスケーラブルアーキテクチャ
JP5554726B2 (ja) データ関連付けのための方法と装置
WO2021218792A1 (zh) 包裹处理设备、包裹处理方法、电子设备及存储介质
KR101125233B1 (ko) 융합기술기반 보안방법 및 융합기술기반 보안시스템
CN102087746A (zh) 图像处理装置、图像处理方法和程序
CN109151295A (zh) 一种目标对象抓拍方法、装置及视频监控设备
KR20230101815A (ko) 하나 이상의 센서에 의해 등록된 미확인 검출들을 이용한 3차원 객체 추적
CN114624263B (zh) 一种双源双视角基于目标识别的切图方法及系统
US20140333730A1 (en) Method of 3d reconstruction of a scene calling upon asynchronous sensors
EP3806039A1 (en) Spatial positioning method and device, system thereof and computer-readable medium
CN105225248B (zh) 识别物体的运动方向的方法和设备
Hassan et al. 3D distance measurement accuracy on low-cost stereo camera
CN109064499A (zh) 一种基于分布式解析的多层框架抗震实验高速视频测量方法
US9292963B2 (en) Three-dimensional object model determination using a beacon
JP2018010599A (ja) 情報処理装置、パノラマ画像表示方法、パノラマ画像表示プログラム
WO2021031626A1 (zh) 图像处理方法、装置、计算机系统以及可读存储介质
CN115049322B (zh) 一种集装箱堆场的集装箱管理方法及系统
CN111985326B (zh) 数据处理方法、装置、电子设备和存储介质
JP4185271B2 (ja) 位置検出装置及び位置検出プログラム
CN109374919A (zh) 一种基于单个拍摄设备确定移动速度的方法及装置
CN108416273A (zh) 一种快速人脸识别系统及其识别方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778926

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22778926

Country of ref document: EP

Kind code of ref document: A1