CN116469142A - Target positioning and identifying method, device and readable storage medium - Google Patents

Target positioning and identifying method, device and readable storage medium

Info

Publication number
CN116469142A
Authority
CN
China
Prior art keywords
target
face
face image
image information
person
Prior art date
Legal status
Pending
Application number
CN202310266610.5A
Other languages
Chinese (zh)
Inventor
李清
彭俊坤
谭源正
赵丹
江勇
Current Assignee
Peng Cheng Laboratory
Original Assignee
Peng Cheng Laboratory
Priority date
Filing date
Publication date
Application filed by Peng Cheng Laboratory
Priority to CN202310266610.5A
Publication of CN116469142A
Status: Pending


Classifications

    • G06V40/172 — Human faces: classification, e.g. identification
    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 — Image or video recognition or understanding using neural networks
    • G06V10/95 — Hardware or software architectures for image or video understanding structured as a network, e.g. client-server architectures
    • G06V20/17 — Terrestrial scenes taken from planes or by drones
    • G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06V40/161 — Human faces: detection; localisation; normalisation
    • G06V40/168 — Human faces: feature extraction; face representation
    • H04W4/029 — Location-based management or tracking services
    • H04W76/10 — Connection setup
    • G06T2207/30196 — Subject of image: human being; person
    • G06T2207/30201 — Subject of image: face
    • G06V2201/07 — Target detection
    • Y02D30/70 — Reducing energy consumption in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target positioning and identifying method, a device, and a readable storage medium. The method comprises: acquiring a plurality of pieces of face image information of a person target through an unmanned aerial vehicle group; determining the target edge device with the earliest completion time in an edge device group and sending each piece of face image information to that device; converting, by the target edge device, the face image coordinates in the face image information into a set of three-dimensional coordinate points, and determining the face spatial position corresponding to the face image coordinates from that set; and extracting, by the target edge device, the face feature vectors from the face image information and fusing them to determine the identity information of the person target. Applied to a system consisting of the unmanned aerial vehicle group and the edge device group, the method achieves high-precision positioning and high-precision recognition of person targets at low cost and low latency.

Description

Target positioning and identifying method, device and readable storage medium
Technical Field
The present invention relates to the field of wireless networks, and in particular, to a method, an apparatus, and a computer readable storage medium for locating and identifying a target.
Background
In recent years, person recognition technology has been widely used to improve public safety. By analysing facial characteristics, this technology can effectively determine a person's identity. Traditional person recognition schemes rely mainly on images captured by cameras at fixed positions, which have limited fields of view, track moving targets inefficiently, and cannot perform large-scale person recognition in crowd scenes in open environments.
Person recognition technology can be applied to unmanned aerial vehicle surveillance to improve person positioning and recognition capability. However, a drone equipped with person positioning and identification typically relies on expensive high-precision positioning hardware; the viewing angle of a single drone often differs greatly from the orientation of the person; and the drone's limited onboard computing resources are easily overloaded by the heavy computation that person positioning and identification require, which also introduces high latency. Existing drones therefore suffer from high cost, low positioning accuracy for person targets, and poor accuracy and real-time performance of person recognition.
Disclosure of Invention
The main purpose of the present invention is to provide a target positioning and identifying method, a device, and a computer-readable storage medium, aiming to solve the technical problems that unmanned aerial vehicles currently applying person recognition technology are costly, position person targets with low accuracy, and recognise persons with poor accuracy and real-time performance.
In order to achieve the above object, the present invention provides a target positioning and identifying method, which is applied to a target positioning and identifying system; the target positioning and identifying system comprises an unmanned aerial vehicle group and an edge equipment group; the unmanned aerial vehicle group is in wireless communication connection with the edge equipment group;
the target positioning and identifying method comprises the following steps:
acquiring a plurality of face image information of a person target through the unmanned aerial vehicle group;
determining target edge equipment with earliest finishing time in the edge equipment group and scheduling each face image information to the target edge equipment;
converting face image coordinates in the face image information into a three-dimensional coordinate point set through the target edge equipment, and determining a face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set; and
and extracting face feature vectors in the face image information through the target edge equipment, and fusing the face feature vectors to determine identity information of the person target.
Optionally, the step of acquiring, by the unmanned aerial vehicle group, a plurality of face image information of the person target includes:
Acquiring a plurality of scene images in a visual angle range through the unmanned aerial vehicle group at intervals of preset shooting interval duration;
determining and acquiring a human body target image in the scene image based on a preset target detection model;
acquiring face image information in the human body target image based on a preset face detection model; the face image information characterizes a face image having face detection frames and face image coordinates.
Optionally, the edge device group includes a scheduled edge device and a non-scheduled edge device;
the step of determining the target edge device of the earliest completion time in the edge device group includes:
transmitting a test file packet to each non-scheduling edge device through the scheduling edge device to determine the current transmission delay of the non-scheduling edge device; and
acquiring the current task queue length, the central processing unit performance parameter and the image processor performance parameter of the non-scheduling edge equipment;
inputting the transmission delay, the task queue length, the central processing unit performance parameter and the image processor performance parameter into a preset multivariable linear regression model to obtain the predicted completion time of the non-scheduling edge equipment;
Comparing the predicted completion times of the respective non-scheduled edge devices to determine a target edge device of earliest completion times among the respective non-scheduled edge devices.
Optionally, the step of converting, by the target edge device, coordinates of a face image in the face image information into a three-dimensional coordinate point set includes:
and inputting face image coordinates and preset height space limiting parameters in the face image information to a preset two-dimensional-three-dimensional conversion model through the target edge equipment so as to convert the face image coordinates into a three-dimensional coordinate point set with a plurality of three-dimensional coordinate points.
Optionally, the step of determining the face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set includes:
traversing the three-dimensional coordinate points in the three-dimensional coordinate point set, and projecting the traversed three-dimensional coordinate points to each face image information;
judging whether the traversed three-dimensional coordinate points are projected in a face detection frame in all face image information or not;
and if the traversed three-dimensional coordinate points are in the face detection frames in all the face image information, determining the traversed three-dimensional coordinate points as the face space positions corresponding to the face image coordinates.
Optionally, the face image coordinates include facial feature coordinate points (the coordinates of the facial features such as eyes, nose and mouth); the step of fusing the face feature vectors to determine the identity information of the person target includes:
inputting the facial feature coordinate points and the resolution of each piece of facial image information to a preset fusion weight model to obtain feature fusion weights of each piece of facial image information;
multiplying the face feature vector by the corresponding feature fusion weight to obtain a weighted face feature vector;
and adding the face feature vectors after weighting to determine the identity information of the person target.
Optionally, the step of adding the face feature vectors after each weighting to determine identity information of the person target includes:
adding the weighted face feature vectors to obtain a fused face feature vector, and inputting the fused face feature vector into a support vector machine to determine a corresponding person class;
and if the person category is a preset attention person, outputting the spatial position and the face image of the person target.
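As a rough sketch of these fusion steps, the following Python snippet weights each per-view face feature vector, sums the weighted vectors into one fused vector, and classifies it with a support vector machine. The use of scikit-learn's SVC, assumed to have been trained beforehand on fused feature vectors of known identities, is an illustrative assumption rather than a detail disclosed by the patent.

```python
import numpy as np
from sklearn.svm import SVC


def fuse_and_classify(face_features, fusion_weights, classifier: SVC):
    """Weight each per-view face feature vector, sum them into a fused vector,
    and classify it with a pre-trained support vector machine."""
    fused = np.zeros_like(face_features[0], dtype=float)
    for vec, weight in zip(face_features, fusion_weights):
        fused += weight * vec                        # weighted face feature vector
    person_class = classifier.predict(fused.reshape(1, -1))[0]
    return fused, person_class
```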
Optionally, the target positioning and identifying system further comprises a cloud server; the edge equipment is in wireless communication connection with the cloud server;
After the step of determining the target edge device of the earliest completion time in the edge device group, the method further includes:
if the predicted completion time of the target edge device is longer than the preset shooting interval duration, sending the face image information to the cloud server; the cloud server is used for determining the face space position corresponding to the face image coordinates and for determining the identity information of the person target.
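A small sketch of this dispatch decision, combining the earliest-finish-time selection with the cloud fallback; the device naming and the "cloud_server" label are illustrative assumptions.

```python
def choose_executor(predicted_finish_times: dict, capture_interval: float) -> str:
    """Pick the edge device with the earliest predicted finish time; if even that
    device cannot finish before the next shot, fall back to the cloud server."""
    device, eft = min(predicted_finish_times.items(), key=lambda kv: kv[1])
    return "cloud_server" if eft > capture_interval else device
```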
In addition, to achieve the above object, the present invention also provides a target positioning and identifying system, including:
the unmanned aerial vehicle group is used for acquiring a plurality of face image information of the person target;
the edge equipment group is used for determining target edge equipment with earliest completion time in the edge equipment group and scheduling the face image information to the target edge equipment;
converting face image coordinates in the face image information into a three-dimensional coordinate point set, and determining a face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set; and extracting face feature vectors in the face image information, and fusing the face feature vectors to determine identity information of the person target.
In addition, in order to achieve the above object, the present invention also provides an object positioning and identifying device, which includes a processor, a storage unit, and an object positioning and identifying program stored on the storage unit and executable by the processor, wherein the object positioning and identifying program, when executed by the processor, implements the steps of the object positioning and identifying method as described above.
The present invention also provides a computer readable storage medium having stored thereon a target positioning and identifying program, wherein the target positioning and identifying program, when executed by a processor, implements the steps of the target positioning and identifying method as described above.
According to the target positioning and identifying method of the above technical scheme, the face image information of person targets is first acquired through the unmanned aerial vehicle group, so that large-scale person recognition can be performed in crowd scenes in open environments. Thanks to the flexibility of the drones, the tracking efficiency for person targets is effectively improved and face image information of each person target can be collected from different angles, which improves the accuracy of face recognition. Moreover, each drone transmits only the face image information rather than the entire scene images within its viewing angle range, which reduces the transmitted data volume, improves transmission efficiency, and ensures the real-time performance of image processing.

Then, by determining the target edge device with the earliest completion time in the edge device group and scheduling the face image information to it, the face image information acquired by the unmanned aerial vehicle group at the current moment is dispatched, as the current person positioning and identification task, to the target edge device; the load state, processing capability and other information of each edge device in the edge device group are obtained in real time so that the device with the current earliest completion time is selected.

Next, the face image coordinates in the face image information are converted by the target edge device into a set of three-dimensional coordinate points, and the face spatial position corresponding to the face image coordinates is determined from that set. Even without a dedicated positioning device such as an expensive depth camera or lidar, the system can first determine, from the two-dimensional face image coordinates obtained by an ordinary camera, all possible three-dimensional coordinate points — i.e. all candidate face spatial coordinates — and then, using the face image information from multiple drones, cross-search across multiple views with machine vision to determine the actual face spatial coordinates of the person target. The positioning of the person target is therefore accurate, and the hardware cost of drone positioning is greatly reduced.

Finally, the face feature vectors in the face image information are extracted by the target edge device and fused to determine the identity information of the person target. Fusing the face feature vectors of face images taken from different viewing angles yields a relatively complete face feature vector of the person target, which enables a comprehensive analysis of the facial features and greatly improves the accuracy of identity recognition. In addition, because the positioning and fusion recognition of person targets are performed on the target edge device with the earliest completion time, the load of the whole system is better balanced, and the efficiency and timeliness of person positioning and recognition are better and more stable.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment of a target locating and identifying device according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of a target positioning and recognition method according to the present invention;
FIG. 3 is a detailed flowchart of step S10 according to an embodiment of the target positioning and identifying method of the present invention;
FIG. 4 is a detailed flowchart of step S20 according to an embodiment of the target positioning and identifying method of the present invention;
FIG. 5 is a schematic diagram of an application scenario involved in the target location and recognition method of the present invention;
FIG. 6 is a system block diagram of dynamic task scheduling of heterogeneous devices according to the targeting and recognition method of the present invention;
FIG. 7 is a flowchart of a multi-view cross search algorithm related to the target location and identification method of the present invention;
FIG. 8 is a diagram of a fused weight network structure and training process involved in the target localization and recognition method of the present invention;
FIG. 9 is a schematic view of a scenario corresponding to dynamic scheduling and cross positioning involved in the target positioning and recognition method of the present invention;
fig. 10 is a schematic diagram of a multi-view fusion face recognition scenario related to the target positioning and recognition method of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides target positioning and identifying equipment. The target positioning and identifying device comprises a plurality of unmanned aerial vehicles and a plurality of edge devices with data processing and calculating capabilities.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a hardware operating environment of a target positioning and identifying device according to an embodiment of the present invention.
As shown in fig. 1, the object locating and identifying apparatus may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a storage unit 1005, and a communication bus 1002, where the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display and an input unit such as a control panel, and optionally a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g. a Wi-Fi interface). The storage unit 1005 may be a high-speed RAM storage unit or a stable non-volatile storage unit, such as a disk storage unit; alternatively, it may be a storage device independent of the aforementioned processor 1001. As a computer storage medium, the storage unit 1005 may contain a target positioning and recognition program.
Those skilled in the art will appreciate that the hardware configuration shown in fig. 1 does not constitute a limitation of the apparatus, and may include more or fewer components than shown, or may combine certain components, or may be arranged in different components.
With continued reference to fig. 1, the storage unit 1005 in fig. 1, which is a computer-readable storage medium, may include an operating system, a user interface module, a network communication module, and a target positioning and recognition program.
In fig. 1, the network communication module is mainly used for connecting with a server and performing data communication with the server; and the processor 1001 may call the object locating and identifying program stored in the storage unit 1005 and perform the following operations:
acquiring a plurality of face image information of a person target through the unmanned aerial vehicle group;
determining target edge equipment with earliest finishing time in the edge equipment group and scheduling each face image information to the target edge equipment;
converting face image coordinates in the face image information into a three-dimensional coordinate point set through the target edge equipment, and determining a face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set; and
And extracting face feature vectors in the face image information through the target edge equipment, and fusing the face feature vectors to determine identity information of the person target.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
acquiring a plurality of scene images in a visual angle range through the unmanned aerial vehicle group at intervals of preset shooting interval duration;
determining and acquiring a human body target image in the scene image based on a preset target detection model;
acquiring face image information in the human body target image based on a preset face detection model; the face image information characterizes a face image having face detection frames and face image coordinates.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
transmitting a test file packet to each non-scheduling edge device through the scheduling edge device to determine the current transmission delay of the non-scheduling edge device; and
acquiring the current task queue length, the central processing unit performance parameter and the image processor performance parameter of the non-scheduling edge equipment;
Inputting the transmission delay, the task queue length, the central processing unit performance parameter and the image processor performance parameter into a preset multivariable linear regression model to obtain the predicted completion time of the non-scheduling edge equipment;
comparing the predicted completion times of the respective non-scheduled edge devices to determine a target edge device of earliest completion times among the respective non-scheduled edge devices.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
and inputting face image coordinates and preset height space limiting parameters in the face image information to a preset two-dimensional-three-dimensional conversion model through the target edge equipment so as to convert the face image coordinates into a three-dimensional coordinate point set with a plurality of three-dimensional coordinate points.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
traversing the three-dimensional coordinate points in the three-dimensional coordinate point set, and projecting the traversed three-dimensional coordinate points to each face image information;
judging whether the traversed three-dimensional coordinate points are projected in a face detection frame in all face image information or not;
And if the traversed three-dimensional coordinate points are in the face detection frames in all the face image information, determining the traversed three-dimensional coordinate points as the face space positions corresponding to the face image coordinates.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
inputting the facial feature coordinate points and the resolution of each piece of facial image information to a preset fusion weight model to obtain feature fusion weights of each piece of facial image information;
multiplying the face feature vector by the corresponding feature fusion weight to obtain a weighted face feature vector;
and adding the face feature vectors after weighting to determine the identity information of the person target.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
adding the weighted face feature vectors to obtain a fused face feature vector, and inputting the fused face feature vector into a support vector machine to determine a corresponding person class;
and if the person category is a preset attention person, outputting the spatial position and the face image of the person target.
Further, the processor 1001 may call the object locating and identifying program stored in the memory 1005, and further perform the following operations:
if the prediction completion time of the target edge equipment is longer than the preset shooting interval duration, sending the face image information to the cloud server; the cloud server is used for determining a face space position corresponding to the face image coordinates; and determining identity information of the persona target.
In order to facilitate understanding of the following embodiments of the present invention, a series of major challenges and corresponding major solutions regarding the positioning and identification of human targets using a drone to which the present invention is directed will be briefly described herein:
To perform large-scale person recognition in crowd scenes in open environments, unmanned aerial vehicles may be used for person recognition and target location tracking. A drone-based person recognition and tracking solution benefits from a wide field of view and high manoeuvrability and can be applied to various scenarios such as military operations and security services. However, person recognition and tracking with unmanned aerial vehicles also poses great challenges, the main ones being as follows:
1) The person identification precision and real-time performance are poor
Currently, face recognition solutions based on deep neural networks have achieved high accuracy, provided that the face occupies a sufficient number of pixels in the image. However, because a single drone flies at a high altitude and its viewing angle differs greatly from the orientation of people on the ground, faces often occupy only a small portion of the picture and are deflected at a large angle from the drone camera, making them hard to recognise. Meanwhile, deep-neural-network face recognition consumes a large amount of computing resources. A drone's onboard computing resources are limited; deploying deep-neural-network face recognition directly on the drone would overload it and bring high delay to the system, which is unacceptable for real-time recognition.
2) The accuracy and working range of target positioning are insufficient
To track target persons in a crowd, most state-of-the-art positioning techniques require a depth camera or lidar on the drone, which is roughly ten times more expensive than a conventional camera. Moreover, in outdoor and long-range scenarios, the positioning accuracy and point-cloud density of depth cameras and lidar degrade sharply, resulting in poor positioning accuracy and a small working area.
In order to solve the technical problems and challenges to be dealt with, the main technical scheme of the invention is as follows:
The invention designs a real-time visual perception system in which multiple drones work cooperatively, namely the target positioning and identifying system, which can accurately and rapidly position and identify person targets in a crowd. The invention uses a drone platform with onboard computing capability and deploys a lightweight neural network model on each drone, so that persons in the scene are detected from the drones' different perspectives; at the same time, drone-side analysis extracts valid face data and offloads it to the edge devices, reducing the transmitted data volume. To position persons, the invention uses images from the different viewing angles of multiple drones and applies machine vision to cross-search across multiple views and calculate each person's three-dimensional position. To achieve high-precision recognition, the invention collects face data of the person from the different viewing angles of multiple drones, uses a neural network to judge how recognisable each view is, and performs fusion recognition accordingly. To optimise load balancing and end-to-end latency of the multi-drone system, the estimated task completion time of each edge device in the system is dynamically predicted by a lightweight machine learning algorithm, the optimal device (target edge device) is selected according to the Earliest Finish Time (EFT), and the data is offloaded to it for execution; in addition, when all edge devices would need a long processing time, the data can be offloaded to a cloud server. This realises scheduling and balancing of the workload among edge devices and between edge devices and the cloud server, minimises processing delay, improves the efficiency of target positioning and identification, and ensures its real-time performance.
Further, to facilitate understanding of the application scenario related to the present invention, please refer to fig. 5, which is a schematic diagram of the application scenario of the target positioning and identifying method of the present invention. As shown in fig. 5, the unmanned aerial vehicle group includes three unmanned aerial vehicles 1, 2 and 3. Each drone collects face image data (information) within a certain area from the air; after the currently optimal device is determined among the edge computing devices, the drones are guided by scheduling decision information to offload the collected face image data to that device, so that the current person positioning and identification task is processed on the currently optimal device to determine the spatial position and identity information of the person target.
The embodiment of the invention provides a target positioning and identifying method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a target positioning and identifying method according to the present invention; in a first embodiment of the present invention, the target positioning and identifying method is applied to a target positioning and identifying system; the target positioning and identifying system comprises an unmanned aerial vehicle group and an edge equipment group; the unmanned aerial vehicle group is in wireless communication connection with the edge equipment group;
The target positioning and identifying method comprises the following steps:
step S10, acquiring a plurality of face image information of a person target through the unmanned aerial vehicle group;
In this embodiment, the target positioning and recognition system is a comprehensive system with person positioning and recognition functions, composed of a plurality of unmanned aerial vehicles (the unmanned aerial vehicle group) and a plurality of edge devices (the edge device group). The unmanned aerial vehicle group shoots the scene within its viewing angle range from the air and acquires face image information of person targets from the scene; the edge devices are computing or data-processing devices located within a certain range of the unmanned aerial vehicle group and are mainly used for scheduling the positioning and identification tasks and for processing them. The unmanned aerial vehicle group and the edge device group can communicate through wireless communication technologies such as Wi-Fi LAN, ZigBee, 4G and 5G; to keep the communication efficient, the spatial distance between the unmanned aerial vehicle group and the edge device group should not be too large and can be adjusted according to actual needs.
The unmanned aerial vehicle group consists of multiple drones, and the number of drones in the group can be set according to actual needs. Each drone is equipped with a camera; in this embodiment an ordinary camera with basic video and photo capture is sufficient. A camera with a ranging function is not required, nor is a dedicated positioning device such as lidar, which reduces the cost of the unmanned aerial vehicle group.
During the flight of the unmanned aerial vehicle group, each drone acquires face image information of the person targets in the scene within its own viewing angle range, so the group as a whole acquires face image information from multiple viewing angles: it contains not only face image information of every person target in the scene but also face image information of the same person target from different viewing angles.
Specifically, the unmanned aerial vehicle group first captures scene images, i.e. each drone shoots, within its own viewing angle range, images of the scene that include person targets and other objects. A lightweight neural network then recognises and marks the person targets in the scene images, and the face image information in the scene images is further extracted.
Referring to fig. 3, in an embodiment, the step S10 includes:
step S11, acquiring a plurality of scene images in a visual angle range through an unmanned aerial vehicle group every preset shooting interval duration;
The preset shooting interval duration, i.e. the sampling frequency at which the drones collect scene images, can be set according to actual needs, for example 1 s: every 1 s, each drone in the unmanned aerial vehicle group collects a scene image within its own viewing angle range, so the group obtains a plurality of scene images.
Step S12, determining and acquiring a human body target image in the scene image based on a preset target detection model;
step S13, acquiring face image information in the human body target image based on a preset face detection model; the face image information characterizes a face image having face detection frames and face image coordinates.
In this embodiment, the drones have onboard computing capability: the scene images shot by a drone are processed on the airborne side by the target detection model and the face detection model, and the image recognition models carried on the drone are all lightweight neural networks. This reduces the drone's load, leaves more computing power for other work, and prevents the drone from being lost due to abnormal flight caused by overload.
Specifically, the lightweight target detection algorithm YoloX-tiny and the face detection algorithm RetinaFace are deployed on the airborne side of the drone. Compared with the traditional Yolo algorithms, YoloX-tiny removes the anchor mechanism, reducing the complexity and parameter count of the model. Meanwhile, YoloX-tiny is fine-tuned on the VisDrone drone target detection dataset, and data augmentation methods such as random scaling, cropping, mosaic arrangement (Mosaic) and multi-image mixing (MixUp) are used to strengthen the training of the network so that it better detects persons from the drone's viewing angle. Together, the two models extract from the scene image the face image, the coordinates of the facial features, and the position of the bounding box containing the face (the face detection frame).
After a scene image is acquired, the lightweight target detection model identifies which regions of the scene image are person images and extracts the human body target images (person images) from it. The face image information in each human body target image is then acquired by the lightweight face detection model: while recognising the human body target image, the face detection model marks the face detection frame on the face image and determines the image coordinates of the face image within the whole scene image, thereby producing face image information with a face detection frame and face image coordinates.
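As an illustration of this onboard pipeline, the following Python sketch chains person detection and face detection and packages only the face information for offloading. The function names `detect_persons` and `detect_faces` are stand-ins for the YoloX-tiny and RetinaFace inferences, and the `FaceInfo` layout is an assumption for illustration, not part of the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class FaceInfo:
    drone_id: int
    face_box: Tuple[int, int, int, int]   # face detection frame (x1, y1, x2, y2) in scene-image coordinates
    landmarks: np.ndarray                 # facial feature coordinate points, shape (5, 2)
    face_crop: np.ndarray                 # cropped face image to be offloaded


def detect_persons(frame: np.ndarray) -> List[Tuple[int, int, int, int]]:
    """Stand-in for the YoloX-tiny person detector: returns person bounding boxes."""
    return []


def detect_faces(person_crop: np.ndarray):
    """Stand-in for the RetinaFace detector: returns (face_box, landmarks) pairs."""
    return []


def extract_face_info(drone_id: int, frame: np.ndarray) -> List[FaceInfo]:
    """Onboard step: person detection, then face detection, offloading only face data."""
    faces: List[FaceInfo] = []
    for (px1, py1, px2, py2) in detect_persons(frame):
        person_crop = frame[py1:py2, px1:px2]
        for face_box, landmarks in detect_faces(person_crop):
            fx1, fy1, fx2, fy2 = face_box
            # shift face coordinates back into the full scene image
            box = (px1 + fx1, py1 + fy1, px1 + fx2, py1 + fy2)
            faces.append(FaceInfo(drone_id, box, landmarks + np.array([px1, py1]),
                                  frame[box[1]:box[3], box[0]:box[2]]))
    return faces
```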
In this embodiment, thanks to the lightweight image recognition neural networks, the drone only has to run the face-extraction task in parallel with its other work. This reduces its computational load, ensures the normal operation of other essential modules such as shooting and flight, and avoids the need for oversized, complex onboard and heat-dissipation devices, thereby reducing flight pressure and energy consumption. The drone then sends the edge devices only the extracted face image information, whose data volume is smaller than that of the scene image, which guarantees real-time data exchange and reduces latency.
Step S20, determining target edge equipment with earliest finishing time in the edge equipment group and scheduling each face image information to the target edge equipment;
The edge device group needs to designate at least one edge device as the scheduling edge device for scheduling the current positioning and identification task. The scheduling edge device may be chosen arbitrarily — one or several devices from the edge device group — or selected according to the performance of the edge devices, which is not limited here.
The scheduling edge device then evaluates the performance of the other, non-scheduling edge devices and predicts the time they need to complete the current task, so as to determine the target edge device with the earliest completion time in the edge device group and schedule the face image information to it.
Specifically, after the target edge device is determined, the scheduling edge device can send relevant scheduling decision information to each unmanned aerial vehicle, so that the unmanned aerial vehicle group is instructed to unload all face image information to the target edge device, and the current person target positioning and identifying task is completed through the target edge device with earliest completion time. Therefore, the processing efficiency of the person target positioning and identifying task is improved, and real-time positioning and identifying can be ensured.
In an embodiment, the edge device group includes a scheduled edge device and a non-scheduled edge device;
referring to fig. 4, in the step S20, the step of determining the earliest completion time of the target edge device in the edge device group includes:
step S21, sending test file packets to each non-scheduling edge device through the scheduling edge device to determine the current transmission delay of the non-scheduling edge device; and
After the scheduling edge device is selected, the other edge devices are defined as non-scheduling edge devices.
State collection and scheduling of the other non-scheduling edge devices are performed by the scheduling edge device. Specifically, at each preset shooting interval of the scene images in the unmanned aerial vehicle group, a test file packet is sent to each non-scheduling edge device, and the state feedback data returned by that device yields its maximum transmission delay Delay_j at the current moment; the test packet may be an empty packet.
Step S22, acquiring the current task queue length, the central processing unit performance parameter and the image processor performance parameter of the non-scheduling edge equipment;
The state feedback data received from the non-scheduling edge device also gives its current task queue length Queue_j and the CPU and GPU performance parameters of the edge device, CPU_j and GPU_j.
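A minimal sketch of how the scheduling edge device might collect these state parameters from one non-scheduling edge device — sending a test packet, timing the round trip as the transmission delay, and reading back a state report. The wire format and field names here are assumptions for illustration only.

```python
import json
import socket
import time


def probe_edge_device(host: str, port: int) -> dict:
    """Send a test packet to a non-scheduling edge device, time the round trip
    as its transmission delay, and parse its state report (queue length,
    CPU/GPU performance parameters)."""
    start = time.time()
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(b"PING")                      # the test packet may be empty
        reply = sock.recv(4096)                    # e.g. b'{"queue_len": 3, "cpu": 2.4, "gpu": 7.5}'
    state = json.loads(reply)
    state["delay"] = time.time() - start           # Delay_j for this device
    return state
```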
Step S23, inputting the transmission delay, the task queue length, the central processing unit performance parameter and the image processor performance parameter into a preset multivariable linear regression model to obtain the predicted completion time of the non-scheduling edge device;
step S24, comparing the predicted completion time of each non-scheduled edge device to determine a target edge device of earliest completion time among the non-scheduled edge devices.
The maximum transmission delay Delay_j, the current task queue length Queue_j and the CPU and GPU performance parameters CPU_j, GPU_j of each edge device are input into the multivariable linear regression (MLR) model preset in the scheduling edge device, which outputs the predicted completion time of the task on each non-scheduling edge device. Once the drones finish a round of shooting, the scheduling edge device compares the predicted completion times of the non-scheduling edge devices and assigns the positioning and identification task for that shot to the non-scheduling edge device with the shortest predicted completion time, i.e. the target edge device corresponding to the earliest completion time. It should be noted that the scheduling edge device may also take part in person positioning and identification tasks; in that case it also estimates its own predicted completion time, so that all edge devices are compared to determine the earliest completion time — the currently optimal edge device — and the current task is sent to that device.
For the multivariable linear regression model in this embodiment, the corresponding mathematical expression may be:

EFT_j = MLR(Delay_j, ‖Queue_j‖, CPU_j, GPU_j) = θ_0 + θ_1·Delay_j + θ_2·‖Queue_j‖ + θ_3·GPU_j + θ_4·CPU_j

where θ_0, θ_1, θ_2, θ_3, θ_4 are the updatable parameters of the multivariable linear regression; Delay_j is the maximum transmission delay of the j-th edge device; ‖Queue_j‖ is the number of tasks queued in the j-th edge device, i.e. the task queue length; and CPU_j, GPU_j denote the CPU and GPU performance parameters of the j-th edge device, respectively.
The multivariable linear regression model in this embodiment is trainable and updatable; to train it, a trajectory pool may be maintained to store real historical records. Each trajectory is a pair consisting of the input parameters (Delay_j, ‖Queue_j‖, CPU_j, GPU_j) and the real completion time. During system operation, the mean-squared-error loss can be used to update the multivariable linear regression model in real time, achieving accurate prediction of the completion time.
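The following sketch shows a multivariable linear regression predictor of this kind, updated online with a squared-error gradient step from the trajectory pool, and used to pick the device with the earliest predicted finish time. The learning rate, feature order, and state dictionary keys are illustrative assumptions.

```python
import numpy as np


class EFTPredictor:
    """Multivariable linear regression over (Delay_j, ||Queue_j||, CPU_j, GPU_j)."""

    def __init__(self, lr: float = 1e-3):
        self.theta = np.zeros(5)          # theta_0 .. theta_4
        self.lr = lr

    def _features(self, delay, queue_len, cpu, gpu):
        # feature order follows the formula above: bias, delay, queue, gpu, cpu
        return np.array([1.0, delay, queue_len, gpu, cpu])

    def predict(self, delay, queue_len, cpu, gpu) -> float:
        return float(self.theta @ self._features(delay, queue_len, cpu, gpu))

    def update(self, delay, queue_len, cpu, gpu, true_finish_time):
        x = self._features(delay, queue_len, cpu, gpu)
        err = self.theta @ x - true_finish_time
        self.theta -= self.lr * err * x   # gradient step on the squared error


def pick_target_device(predictor: EFTPredictor, device_states: list) -> dict:
    """Return the device state with the earliest predicted finish time (EFT)."""
    return min(device_states,
               key=lambda s: predictor.predict(s["delay"], s["queue_len"],
                                               s["cpu"], s["gpu"]))
```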
To facilitate understanding of this embodiment, please refer to fig. 6, which is a system block diagram of heterogeneous-device dynamic task scheduling in the target positioning and identification method of the present invention. As shown in fig. 6, the edge device group includes edge device 1 through edge device j. Any edge device is selected as the scheduling edge device and collects the device states of the other edge devices; it may also collect the states of the drones, such as their shooting state and flight state. Multivariable linear regression is then performed based on the device state information of the edge devices, with the trajectory pool storing the real historical records for the regression model. During this process, the device selector selects the target edge device with the earliest completion time and can generate an allocation table that numbers and orders the ongoing current tasks — for example, the executor corresponding to task 1 is edge device 1.
By determining the target edge device with the earliest completion time at the current moment, this embodiment of the invention greatly reduces the delay in processing target positioning and identification tasks, balances the load among the edge devices, improves the processing efficiency of these tasks, ensures that tasks proceed in an orderly manner, and improves the stability and reliability of the edge device group.
Step S30, converting the face image coordinates in the face image information into a three-dimensional coordinate point set through the target edge equipment, and determining a face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set; and
The face image information includes face image coordinates, which may be the coordinates of the facial features; any one facial feature coordinate point may be taken for converting the two-dimensional coordinates into three-dimensional coordinates. The target edge device, as the device executing the current person positioning and identification task, can first convert the two-dimensional face image coordinates through a preset image-to-space coordinate matrix to obtain candidate space coordinates of the person target, and thereby predict all possible three-dimensional coordinate points of the person target; compared with a fully determined three-dimensional point, these candidates lack only the depth coordinate (commonly represented by the z-axis coordinate). A set of three-dimensional coordinate points is thus obtained, and the true three-dimensional space coordinate of the person target must lie within it. To determine the spatial position of the person target, a unique three-dimensional coordinate point can then be determined by aligning the multi-view face images of the person target; this unique point is the three-dimensional spatial coordinate of the face, i.e. the face spatial position, and in effect the spatial position of the person target.
In an embodiment, step S30, the step of converting, by the target edge device, coordinates of a face image in the face image information into a three-dimensional coordinate point set includes:
and a step a of inputting face image coordinates and preset height space limiting parameters in the face image information to a preset two-dimensional-three-dimensional conversion model through the target edge equipment so as to convert the face image coordinates into a three-dimensional coordinate point set with a plurality of three-dimensional coordinate points.
In this embodiment, the preset two-dimensional-three-dimensional conversion model is used to convert two-dimensional face image coordinates to three-dimensional face space coordinates.
Taking one of the drones D_i as an example, define the three-dimensional position of a person in the real world as P_W = [x_W, y_W, z_W]^T, and let the coordinate of that person captured by drone D_i in the image (the face image coordinate) be P_i = [x_i, y_i]^T. According to the coordinate-system conversion rules of machine vision, the mapping between the three-dimensional point P_W and the two-dimensional image point P_i is:

s·[x_i, y_i, 1]^T = C_i·(R_i·P_W + T_i)    (1)

where s is a scale factor, C_i denotes the intrinsic matrix of drone D_i's camera, which establishes the mapping between the image pixel coordinate system and D_i's camera coordinate system, and R_i and T_i denote the extrinsic parameters of drone D_i, which establish the mapping between D_i's camera coordinate system and the world coordinate system. R_i and T_i can be obtained by PnP visual positioning, and C_i can be obtained by camera calibration techniques. From equation (1), the expression for recovering three-dimensional coordinates from two-dimensional image coordinates can be derived:

P_W = R_i^(-1)·(s·C_i^(-1)·[x_i, y_i, 1]^T − T_i)    (2)

Formula (2) is the mathematical expression of the two-dimensional-to-three-dimensional conversion model. Because the depth information (the scale factor s) is unknown, the three-dimensional coordinates of the face cannot be determined directly from the two-dimensional face image coordinates; instead a set of possible three-dimensional coordinate points is obtained, which forms an epipolar line.
To reduce the amount of computation the target edge device performs on the data, in addition to the face image coordinates P_i, preset height-space limiting parameters — a minimum height h_min and a maximum height h_max — can also be input into the preset two-dimensional-to-three-dimensional conversion model. The height space is used to limit the range of the three-dimensional coordinate point set and eliminate three-dimensional coordinate points that cannot correspond to a person target. The height-space limiting parameters relate to the spatial height of a human body (mainly the stature) and may, for example, be 0 to 3 m; other height-space limiting parameters may also be set and are not limited here.
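A possible sketch of this two-dimensional-to-three-dimensional conversion with the height-space limit: the face image coordinate is back-projected along its viewing ray (the epipolar candidates) and only points whose height lies within [h_min, h_max] are kept. The depth sampling range and step, and the assumption that the z-axis is the height axis, are illustrative.

```python
import numpy as np


def candidate_points(face_xy, K, R, T, h_min=0.0, h_max=3.0,
                     depth_range=(1.0, 100.0), step=0.05):
    """Back-project one face image coordinate into the set of candidate 3D points
    along its viewing ray, keeping only points within the height-space limit."""
    x, y = face_xy
    ray_cam = np.linalg.inv(K) @ np.array([x, y, 1.0])   # viewing ray in camera coordinates
    cam_center = -R.T @ T                                 # camera centre in world coordinates
    ray_world = R.T @ ray_cam
    points = []
    for depth in np.arange(*depth_range, step):
        p_world = cam_center + depth * ray_world
        if h_min <= p_world[2] <= h_max:                   # height-space limit (z assumed up)
            points.append(p_world)
    return points
```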
In an embodiment, step S30, the step of determining, according to the three-dimensional coordinate point set, a face space position corresponding to the face image coordinate includes:
step b, traversing the three-dimensional coordinate points in the three-dimensional coordinate point set, and projecting the traversed three-dimensional coordinate points to each face image information;
In this embodiment, a multi-view cross search algorithm may be used: for one unmanned aerial vehicle, after its three-dimensional coordinate point set is obtained, the three-dimensional coordinate points in the set are traversed starting from the first point, and each traversed point is projected to the view angles of all unmanned aerial vehicles, that is, to all the face images.
Step c, judging whether the traversed three-dimensional coordinate points are projected in a face detection frame in all face image information;
and d, if the traversed three-dimensional coordinate points are in the face detection frames in all the face image information, determining the traversed three-dimensional coordinate points as the face space positions corresponding to the face image coordinates.
For each traversed three-dimensional coordinate point, it is judged whether its projection falls within the face detection frame in every piece of face image information.

If the traversed three-dimensional coordinate point is projected within the face detection frames in all face image information, that point is necessarily the three-dimensional space coordinate at which the person target is located; the face space position corresponding to the face image coordinates is thereby determined and the traversal can stop.

If the traversed three-dimensional coordinate point is not projected within the face detection frames in all face image information (it may fall within only some of the face detection frames, or none of them), the next three-dimensional coordinate point is traversed, until the currently traversed point is projected within the face detection frames in all face image information, and the spatial position of the person target is thereby determined.
To further aid understanding of the multi-view cross search algorithm proposed in this embodiment, please refer to fig. 7, which is a flowchart of the multi-view cross search algorithm involved in the target positioning and identifying method of the present invention. The process of determining the face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set may be as follows (a code sketch is given after the list):
all unmanned aerial vehicles perform face detection;
selecting any main visual angle unmanned aerial vehicle;
acquiring the epipolar line according to the detection result (face image coordinates);

limiting the depth search range according to the height space limitation;

traversing the three-dimensional coordinate points on the epipolar line;
projecting to other unmanned aerial vehicle viewing angles;
judging whether the projection of the point falls in the face detection frames of all unmanned aerial vehicles or not;
if yes, exiting from the traversal, and using the current point as a position estimation point of the person; outputting a positioning result;
If not, go on traversing the next point.
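The flow above can be illustrated with a minimal Python sketch; the pinhole projection helper, the detection-box format and all function names are assumptions made for illustration, not part of the original disclosure.

```python
import numpy as np

def project_to_view(P_W, C, R, T):
    """Project a world point into one drone's image plane (pinhole model)."""
    p_cam = R @ P_W + T
    p_img = C @ p_cam
    return p_img[:2] / p_img[2]

def in_box(pt, box):
    """box = (x_min, y_min, x_max, y_max) of a face detection frame."""
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]

def cross_search(candidates, cameras, face_boxes):
    """Return the first candidate whose projection falls inside the face
    detection frame of every drone view, or None if no candidate qualifies.

    cameras    : list of (C, R, T) tuples, one per drone
    face_boxes : list of face detection boxes, aligned with `cameras`
    """
    for P_W in candidates:                     # traverse points on the epipolar line
        hits = all(in_box(project_to_view(P_W, C, R, T), box)
                   for (C, R, T), box in zip(cameras, face_boxes))
        if hits:                               # projected inside all face boxes
            return P_W                         # position estimate of the person
    return None
```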
In this embodiment of the invention, the three-dimensional position of a person is calculated by cross searching across multiple views with machine vision techniques, using images taken from different viewing angles by a plurality of unmanned aerial vehicles. A person can therefore be accurately positioned in the three-dimensional real world with ordinary cameras shooting two-dimensional images, and the facial sub-images of the same person can be aligned across the different unmanned aerial vehicle viewing angles. This allows multiple drones to form a unified perception of the real scene.
And S40, extracting face feature vectors in the face image information through the target edge equipment, and fusing the face feature vectors to determine the identity information of the person target.
After the multi-view face images of the person target are aligned, the target edge device runs a multi-view fusion recognition algorithm module in which a fusion weight network (FWN) is provided. Face feature vectors (for example, 1x512-dimensional vectors) are extracted from the face images of all views based on an image recognition technique (such as the ArcFace recognition algorithm). In order to make comprehensive use of the face features from these different angles and improve the accuracy of face recognition, the fusion weight network assigns a fusion weight to each face image to reflect the legibility of that face image. The legibility is related to viewing angle and resolution: the closer the viewing angle is to the front of the face and the higher the resolution, the better the legibility and the higher the fusion weight.
Then, a fusion layer in the fusion weight network fuses the face feature vectors of the different view angles using the fusion weights to generate a fused face feature vector. The fused face features contain more information than face features extracted from a single image. Finally, the features are classified by a support vector machine (SVM) algorithm. The support vector machine is trained on a database containing the feature vectors of the persons of interest; during testing the fused face features are classified, and a person classified as a target person is regarded as the target person to be found. The three-dimensional position and face images of the target person are then obtained and can be used for further operations, such as subsequent outdoor personnel search, monitoring, and broadcasting of the face images.
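A hedged sketch of this classification step using scikit-learn is shown below; the kernel choice, label convention and the synthetic training data are illustrative assumptions only.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative training data: fused 512-d face feature vectors with labels
# (1 = preset attention person / target person, 0 = other persons).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 512))
y_train = rng.integers(0, 2, size=200)

clf = SVC(kernel="rbf")            # kernel choice is an assumption
clf.fit(X_train, y_train)

# Testing: classify a newly fused face feature vector.
fused_feature = rng.normal(size=(1, 512))
if clf.predict(fused_feature)[0] == 1:
    print("Target person found: output 3D position and face image.")
```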
In an embodiment, the face image coordinates include: facial five-sense organ coordinate points;
step S40, the step of merging the face feature vectors to determine identity information of the person target includes:
step e, inputting the facial feature coordinate points and the resolution of each piece of facial image information to a preset fusion weight model to obtain feature fusion weights of each piece of facial image information;
Step f, multiplying the face feature vector by the corresponding feature fusion weight to obtain a weighted face feature vector;
and g, adding the face feature vectors after weighting to determine the identity information of the person target.
Please refer to fig. 8, which is a schematic diagram of the fusion weight network structure and training process involved in the target positioning and identifying method of the present invention.
Referring to the upper half of fig. 8, the fusion weight network takes as input the facial feature coordinate points of the face and the resolution of the face image, and outputs a fusion weight used for fusion. The neural network structure of the fusion weight network is a simple stack of fully connected layers, five layers in total, with dimensions of 180, 360, 180, 2 and 1 respectively. These network dimensions are set with the face-to-drone angle ranging from 0 to 360 degrees in mind, so that angle-related high-dimensional features can be extracted from the facial feature point coordinates of the face and mapped into a one-dimensional fusion weight that reflects how easily the face can be recognised at that angle.
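The described structure can be sketched as a small fully connected network, assuming PyTorch; the layer widths follow the 180/360/180/2/1 dimensions mentioned above, while the input dimension (landmark coordinates plus image resolution) and the activation choice are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FusionWeightNetwork(nn.Module):
    """Fully connected fusion weight network with layer widths
    180 -> 360 -> 180 -> 2 -> 1, as described in the text."""

    def __init__(self, in_dim: int = 12):      # e.g. 5 landmarks x 2 + (width, height)
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 180), nn.ReLU(),
            nn.Linear(180, 360),    nn.ReLU(),
            nn.Linear(360, 180),    nn.ReLU(),
            nn.Linear(180, 2),      nn.ReLU(),
            nn.Linear(2, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output a single scalar fusion weight per face image.
        return self.net(x)
```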
After the feature fusion weight of each piece of face image information is obtained, each original face feature vector is multiplied by its corresponding feature fusion weight to obtain a weighted face feature vector, and the weighted face feature vectors are then added to obtain the fused face feature vector. The identity information of the person target is then determined based on the fused face feature vector. Specifically, the fusion layer normalises the fusion weights of the angle feature vectors, multiplies each by the face feature vector of the corresponding angle, and adds all the multiplied face feature vectors to obtain the fused face feature vector. The fused face feature vector contains more detail than a face feature vector extracted from a single image.
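A minimal sketch of this fusion step, assuming the per-view weights and 512-dimensional face feature vectors are already available as NumPy arrays:

```python
import numpy as np

def fuse_features(features, weights):
    """Normalise the per-view fusion weights and compute the weighted sum.

    features : (K, 512) array, one face feature vector per drone view
    weights  : (K,) raw fusion weights output by the fusion weight network
    """
    w = np.asarray(weights, dtype=np.float64)
    w = w / (w.sum() + 1e-12)                  # normalise weights across views
    fused = (w[:, None] * features).sum(axis=0)
    return fused                               # fused 512-d face feature vector
```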
Classification is performed by a support vector machine (SVM): the fused face features are input into the support vector machine, which outputs the category of the features; a person target classified as a preset attention person is the target person to be found. The preset attention person (target person) can be set according to the actual application requirements. When the person category is identified as the preset attention person, the spatial position of the person target and its face image can be output to a specified system, facilitating subsequent activities such as manual confirmation, tracking, and searching for the person.
That is, the step of adding the face feature vectors after each weighting to determine the identity information of the person object includes:
adding the weighted face feature vectors to obtain a fused face feature vector, and inputting the fused face feature vector into a support vector machine to determine a corresponding person class;
and if the person category is a preset attention person, outputting the spatial position and the face image of the person target.
Further, please refer to the lower half of fig. 8, which shows the network structure and training process of the fusion weight network. The fusion weight network takes as input the facial feature coordinate points of the face and the resolution of the face image, and outputs a fusion weight used for fusion. A simple stack of fully connected layers is adopted as the network structure of the fusion weight network, which keeps it lightweight and fast to compute. During training of the fusion weight network, the ArcFace face recognition algorithm is used to extract features from the reference face image and from each training-set face image; the two features are then compared to obtain the real weight of the training face image, which represents the deviation of the training data from the reference face image (that is, it reflects legibility). A mean-square loss between this real weight and the weight output by the fusion weight network is then calculated to update the fusion weight network by back-propagation.
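A hedged sketch of one training update is given below: the "real" weight of each training face image is taken here as the cosine similarity of its ArcFace-style feature to the reference feature, and the network is updated with a mean-square loss. The optimizer, the similarity measure and the data layout are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def train_step(fwn, optimizer, landmarks_res, train_feat, ref_feat):
    """One update of the fusion weight network.

    landmarks_res : (B, in_dim) landmark coordinates + resolution per image
    train_feat    : (B, 512) features of the training face images
    ref_feat      : (1, 512) feature of the reference face image
    """
    # "Real" weight: similarity between training and reference features,
    # reflecting how legible (recognisable) the training image is.
    target_w = F.cosine_similarity(train_feat, ref_feat).unsqueeze(1)

    pred_w = fwn(landmarks_res)                # predicted fusion weight
    loss = F.mse_loss(pred_w, target_w)        # mean-square loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```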
In addition, thanks to its lightweight and fast characteristics, the fusion weight network is deployed on the central processing unit (CPU) for inference, while the ArcFace face recognition algorithm is deployed on the graphics processing unit (GPU) for inference, so the two work in parallel. A thread pool is also maintained, with one thread created for the face image of each view angle, so that multi-threaded parallel processing is achieved and the efficiency of face image fusion and recognition is further improved.
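The per-view parallelism can be sketched with a standard thread pool; the feature-extraction and weight-prediction callables are caller-supplied placeholders standing in for the GPU and CPU inference paths described above.

```python
from concurrent.futures import ThreadPoolExecutor

def process_all_views(face_images, extract_feature, predict_weight):
    """Run feature extraction (e.g. ArcFace on the GPU) and fusion-weight
    prediction (fusion weight network on the CPU) for every drone view in
    parallel, one thread per face image."""

    def process_view(face_image):
        feature = extract_feature(face_image)   # GPU inference path
        weight = predict_weight(face_image)     # CPU inference path
        return feature, weight

    with ThreadPoolExecutor(max_workers=max(1, len(face_images))) as pool:
        return list(pool.map(process_view, face_images))
```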
According to this embodiment of the invention, face images from different unmanned aerial vehicle viewing angles are fused together according to weights that reflect their legibility and are processed in parallel, which reduces processing delay. More importantly, compared with a face image from a single viewing angle, fusing the features of multi-view face images according to such a weighting rule can greatly improve the accuracy of identifying the person target.
In addition, in an embodiment, the target positioning and identifying system further comprises a cloud server; the edge equipment is in wireless communication connection with the cloud server;
step S20, after the step of determining the target edge device with the earliest completion time in the edge device group, the method further includes:
If the predicted completion time of the target edge device is longer than the preset shooting interval duration, the face image information is sent to the cloud server; the cloud server is used for determining the face space position corresponding to the face image coordinates and determining the identity information of the person target.
If the predicted completion time of the target edge device is longer than the preset shooting interval duration, that is, the earliest completion time is longer than the preset shooting interval duration, then the predicted completion times of all the edge devices are longer than the preset shooting interval duration, which means that the current load of the edge device group is high. In order to process the face image data transmitted by the unmanned aerial vehicles in time, in this case the scheduling edge device directs the unmanned aerial vehicle group to offload the face image information to the remote cloud server, so that the cloud server acts as the target edge device in the foregoing embodiments and performs the same steps to determine the spatial position and identity information of the person target.
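The offloading decision reduces to a comparison between the earliest predicted completion time and the shooting interval; a minimal sketch follows, with all names illustrative.

```python
def choose_execution_device(predicted_times, shooting_interval, cloud_server):
    """predicted_times: {edge_device: predicted completion time in seconds}."""
    target_device, earliest = min(predicted_times.items(), key=lambda kv: kv[1])
    if earliest > shooting_interval:
        # Every edge device is overloaded: offload the face image
        # information to the remote cloud server instead.
        return cloud_server
    return target_device
```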
According to the target positioning and identifying method of the above technical solution, the unmanned aerial vehicle group is first used to acquire the face image information of person targets, so that large-scale person recognition can be performed in crowd scenes in an open environment. The flexibility of the unmanned aerial vehicles effectively improves the tracking efficiency for person targets, and face image information of each person target can be acquired from different angles, which improves the accuracy of face recognition. The unmanned aerial vehicles also transmit only the face image information rather than all scene images within their viewing ranges, which reduces the amount of data transmitted, improves the efficiency of information transmission, and guarantees the real-time performance of image processing.

Next, by determining the target edge device with the earliest completion time in the edge device group and scheduling each piece of face image information to that device, the face image information currently acquired by the unmanned aerial vehicle group is scheduled, as the current person target positioning and recognition task, to the target edge device; the load state, processing capacity and other information of each edge device in the edge device group are obtained in real time, and the target edge device with the currently earliest completion time is selected.

Then, the face image coordinates in the face image information are converted into a three-dimensional coordinate point set by the target edge device, and the face space position corresponding to the face image coordinates is determined from that point set. Even without dedicated positioning hardware such as a high-cost depth camera or a laser radar, an unmanned aerial vehicle can therefore first determine the various possible three-dimensional coordinate points, i.e. the various possible face space coordinates, based on the two-dimensional face image coordinates obtained with an ordinary camera; then, based on this three-dimensional coordinate point set and the face image information of multiple unmanned aerial vehicles, the actual face space coordinates of the person target are determined by machine-vision cross search across multiple views. The face space coordinates, i.e. the positioning of the person target, are thus accurate, and the hardware cost of unmanned aerial vehicle positioning is greatly reduced.

The face feature vectors in the face image information are then extracted by the target edge device and fused to determine the identity information of the person target. Fusing the face feature vectors from face image information with different face viewing angles yields a relatively complete face feature vector of the person target, so that comprehensive analysis of the face feature vectors is achieved and the accuracy of identifying the identity information of the person target is greatly improved.

In addition, since person target positioning and fusion recognition are carried out on the target edge device with the earliest completion time, the load of the whole system is better balanced, and the efficiency and timeliness of person positioning and recognition are more stable.
In order to further enhance understanding of the above embodiments of the present invention, relevant descriptions and supplements are provided in connection with each practical application scenario in the present invention. Please refer to fig. 9 and 10. FIG. 9 is a schematic view of a scenario corresponding to dynamic scheduling and cross positioning involved in the target positioning and recognition method of the present invention; fig. 10 is a schematic diagram of a multi-view fusion face recognition scenario related to the target positioning and recognition method of the present invention.
As shown in fig. 9, the lower half of fig. 9, called dynamic scheduling of heterogeneous devices (DTSH), relates to the process of selecting the currently optimal edge device, that is, the process of determining the target edge device. The system comprises edge devices and a cloud device (cloud server). After the scheduling edge device acquires the device states, it inputs them into a multivariable linear regression model to output the expected completion time (predicted completion time), and the target device is determined by a device selector; the target device may be a target edge device or the cloud server.
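A hedged sketch of such a completion-time predictor using ordinary least squares is given below; the feature order follows the state variables named in the text, while the synthetic training data and numeric values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Device-state features per historical task:
# [transmission delay, task queue length, CPU performance, GPU performance]
rng = np.random.default_rng(1)
X = rng.random((500, 4))                       # illustrative historical samples
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 0.3 * X[:, 2] - 0.4 * X[:, 3] + 1.0

model = LinearRegression().fit(X, y)

# Expected (predicted) completion time for a candidate edge device's current state.
state = np.array([[0.05, 3.0, 0.8, 0.9]])
predicted_time = model.predict(state)[0]
print(f"Predicted completion time: {predicted_time:.3f} s")
```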
The upper half of fig. 9 is called multi-drone person cross positioning (MDPL). Unmanned aerial vehicles 1, 2 and 3 each detect and acquire the faces of three person targets (person 1, person 2, person 3). Multi-view cross positioning is carried out by the target device and the face images of the same person target are aligned, so as to obtain the spatial coordinates $(X_1, Y_1, Z_1)$, ..., $(X_k, Y_k, Z_k)$, ..., $(X_K, Y_K, Z_K)$ of the respective person targets.
As shown in fig. 10, which relates to multi-view fusion face recognition (MVFI): in the multi-image feature fusion stage, taking face image 1, face image 2 and face image 3 of a specific person k as an example, the facial feature coordinates and resolution in the face image information are input into the fusion weight network (FWN) to obtain the fusion weight of each image. These weights are combined with the face features extracted by the face feature extraction network, and the features are fused at the fusion layer of the fusion weight network to obtain the fused feature; the fused feature is compared with the face feature of the target person (attention person), and the recognition result for the person target is finally output.
Furthermore, the invention also provides a computer readable storage medium. The computer readable storage medium of the present invention stores a target positioning and identifying program, wherein the steps of the target positioning and identifying method described above are implemented when the target positioning and identifying program is executed by a processor.
The method implemented when the target positioning and identifying program is executed may refer to various embodiments of the target positioning and identifying method of the present invention, which are not described herein.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory location that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory location produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the description of the present invention and the accompanying drawings or direct/indirect application in other related technical fields are included in the scope of the invention.

Claims (10)

1. The target positioning and identifying method is characterized in that the target positioning and identifying method is applied to a target positioning and identifying system; the target positioning and identifying system comprises an unmanned aerial vehicle group and an edge equipment group; the unmanned aerial vehicle group is in wireless communication connection with the edge equipment group;
the target positioning and identifying method comprises the following steps:
acquiring a plurality of face image information of a person target through the unmanned aerial vehicle group;
determining target edge equipment with earliest finishing time in the edge equipment group and scheduling each face image information to the target edge equipment;
Converting face image coordinates in the face image information into a three-dimensional coordinate point set through the target edge equipment, and determining a face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set; and
and extracting face feature vectors in the face image information through the target edge equipment, and fusing the face feature vectors to determine identity information of the person target.
2. The method of claim 1, wherein the step of acquiring a plurality of face image information of a person object by the unmanned aerial vehicle group comprises:
acquiring a plurality of scene images in a visual angle range through the unmanned aerial vehicle group at intervals of preset shooting interval duration;
determining and acquiring a human body target image in the scene image based on a preset target detection model;
acquiring face image information in the human body target image based on a preset face detection model; the face image information characterizes a face image having face detection frames and face image coordinates.
3. The target positioning and identification method of claim 1, wherein the group of edge devices includes scheduled edge devices and non-scheduled edge devices;
The step of determining the target edge device of the earliest completion time in the edge device group includes:
transmitting a test file packet to each non-scheduling edge device through the scheduling edge device to determine the current transmission delay of the non-scheduling edge device; and
acquiring the current task queue length, the central processing unit performance parameter and the image processor performance parameter of the non-scheduling edge equipment;
inputting the transmission delay, the task queue length, the central processing unit performance parameter and the image processor performance parameter into a preset multivariable linear regression model to obtain the predicted completion time of the non-scheduling edge equipment;
comparing the predicted completion times of the respective non-scheduled edge devices to determine a target edge device of earliest completion times among the respective non-scheduled edge devices.
4. The method for locating and identifying a target as defined in claim 1, wherein the step of converting, by the target edge device, coordinates of a face image in the face image information into a set of three-dimensional coordinate points includes:
and inputting face image coordinates and preset height space limiting parameters in the face image information to a preset two-dimensional-three-dimensional conversion model through the target edge equipment so as to convert the face image coordinates into a three-dimensional coordinate point set with a plurality of three-dimensional coordinate points.
5. The method for locating and identifying a target according to claim 1, wherein the step of determining a face space position corresponding to the face image coordinates according to the three-dimensional coordinate point set includes:
traversing the three-dimensional coordinate points in the three-dimensional coordinate point set, and projecting the traversed three-dimensional coordinate points to each face image information;
judging whether the traversed three-dimensional coordinate points are projected in a face detection frame in all face image information or not;
and if the traversed three-dimensional coordinate points are in the face detection frames in all the face image information, determining the traversed three-dimensional coordinate points as the face space positions corresponding to the face image coordinates.
6. The method of claim 1, wherein the face image coordinates comprise: facial five-sense organ coordinate points; the step of fusing the face feature vectors to determine the identity information of the person target includes:
inputting the facial feature coordinate points and the resolution of each piece of facial image information to a preset fusion weight model to obtain feature fusion weights of each piece of facial image information;
Multiplying the face feature vector by the corresponding feature fusion weight to obtain a weighted face feature vector;
and adding the face feature vectors after weighting to determine the identity information of the person target.
7. The method of object localization and identification of claim 6 wherein the step of adding each of the weighted face feature vectors to determine identity information of the person object comprises:
adding the weighted face feature vectors to obtain a fused face feature vector, and inputting the fused face feature vector into a support vector machine to determine a corresponding person class;
and if the person category is a preset attention person, outputting the spatial position and the face image of the person target.
8. The target positioning and recognition method of claim 1, wherein the target positioning and recognition system further comprises a cloud server; the edge equipment is in wireless communication connection with the cloud server;
after the step of determining the target edge device of the earliest completion time in the edge device group, the method further includes:
If the prediction completion time of the target edge equipment is longer than the preset shooting interval duration, sending the face image information to the cloud server; the cloud server is used for determining a face space position corresponding to the face image coordinates; and determining identity information of the person target.
9. An object locating and identifying device, characterized in that it comprises a processor, a memory unit, and an object locating and identifying program stored on the memory unit that is executable by the processor, wherein the object locating and identifying program, when executed by the processor, implements the steps of the object locating and identifying method according to any of claims 1 to 8.
10. A computer readable storage medium, wherein a target positioning and identification program is stored on the computer readable storage medium, wherein the target positioning and identification program, when executed by a processor, implements the steps of the target positioning and identification method according to any of claims 1 to 8.
CN202310266610.5A 2023-03-13 2023-03-13 Target positioning and identifying method, device and readable storage medium Pending CN116469142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310266610.5A CN116469142A (en) 2023-03-13 2023-03-13 Target positioning and identifying method, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310266610.5A CN116469142A (en) 2023-03-13 2023-03-13 Target positioning and identifying method, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN116469142A true CN116469142A (en) 2023-07-21

Family

ID=87177957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310266610.5A Pending CN116469142A (en) 2023-03-13 2023-03-13 Target positioning and identifying method, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN116469142A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117576764A (en) * 2024-01-15 2024-02-20 四川大学 Video irrelevant person automatic identification method based on multi-target tracking
CN117576764B (en) * 2024-01-15 2024-04-16 四川大学 Video irrelevant person automatic identification method based on multi-target tracking

Similar Documents

Publication Publication Date Title
Geraldes et al. UAV-based situational awareness system using deep learning
Boudjit et al. Human detection based on deep learning YOLO-v2 for real-time UAV applications
CN106296812B (en) It is synchronous to position and build drawing method
Walter et al. On training datasets for machine learning-based visual relative localization of micro-scale UAVs
CN113377888B (en) Method for training object detection model and detection object
CN112070782B (en) Method, device, computer readable medium and electronic equipment for identifying scene contour
CN106973221A (en) Unmanned plane image capture method and system based on aesthetic evaluation
CN112927264B (en) Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof
Núnez et al. Real-time human body tracking based on data fusion from multiple RGB-D sensors
Devo et al. Autonomous single-image drone exploration with deep reinforcement learning and mixed reality
CN116469142A (en) Target positioning and identifying method, device and readable storage medium
Valenti et al. An autonomous flyer photographer
CN109885091B (en) Unmanned aerial vehicle autonomous flight control method and system
Pu et al. Aerial face recognition and absolute distance estimation using drone and deep learning
CN111611869B (en) End-to-end monocular vision obstacle avoidance method based on serial deep neural network
CN109544584A (en) It is a kind of to realize inspection surely as the method and system of precision measure
Basiri et al. Enhanced V-SLAM combining SVO and ORB-SLAM2, with reduced computational complexity, to improve autonomous indoor mini-drone navigation under varying conditions
Duan et al. Image digital zoom based single target apriltag recognition algorithm in large scale changes on the distance
CN110400333A (en) Coach's formula binocular stereo vision device and High Precision Stereo visual pattern acquisition methods
Ayalew et al. A review on object detection from unmanned aerial vehicle using CNN
CN114047783A (en) Unmanned aerial vehicle system and unmanned aerial vehicle simulation system
Maltezos et al. Preliminary design of a multipurpose UAV situational awareness platform based on novel computer vision and machine learning techniques
Zhang et al. Image recognition of supermarket shopping robot based on CNN
Wang et al. Aprus: An Airborne Altitude-Adaptive Purpose-Related UAV System for Object Detection
Li et al. Object recognition through UAV observations based on YOLO and generative adversarial network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination