CN115546516A - Personnel gathering method and device, computer equipment and storage medium - Google Patents

Personnel gathering method and device, computer equipment and storage medium

Info

Publication number
CN115546516A
Authority
CN
China
Prior art keywords
key frame
person
target
image feature
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211200033.1A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd filed Critical Beijing Real AI Technology Co Ltd
Priority to CN202211200033.1A priority Critical patent/CN115546516A/en
Publication of CN115546516A publication Critical patent/CN115546516A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F 16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The embodiments of the present application disclose a person profile aggregation method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring a first person profile and a second person profile, wherein the first person profile comprises a first key frame set containing first key frames of at least one image feature type, and the second person profile comprises a second key frame set containing second key frames of at least one image feature type; performing profile similarity calculation on the first key frame set and the second key frame set according to the image feature types of the first key frames and the second key frames to obtain a target similarity value; and if the target similarity value is greater than or equal to a preset threshold, merging the first person profile and the second person profile into a third person profile. By implementing the method of the embodiments of the present application, the person profile aggregation effect can be improved.

Description

Personnel gathering method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a person profile aggregation method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of security monitoring, more and more face cameras and security cameras are applied in daily life, and tracking the cross-time and cross-region trajectories of captured persons has become an important subject in the security field. Person profile aggregation technology has therefore been proposed; person profile aggregation refers to grouping a large number of face photos by individual.
The key point of person profile aggregation is how to judge whether two profiles belong to the same person; if they do, the two profiles are merged into one.
In the prior art, only face snapshots are used when aggregating profiles, and only one face snapshot serves as the key frame of a profile. The large number of human body photos generated by security cameras is not utilized, so the images available for aggregation are not rich enough; moreover, with a single face snapshot as the key frame, part of the feature information of the profiled person can be lost. This reduces the accuracy and recall of the aggregation result, the situation of one person having multiple profiles may occur, and the aggregation effect still needs to be improved.
Disclosure of Invention
The embodiments of the present application provide a person profile aggregation method and apparatus, a computer device, and a storage medium, which can improve the person profile aggregation effect.
In a first aspect, an embodiment of the present application provides a person profile aggregation method, which includes:
acquiring a first person profile and a second person profile, wherein the first person profile comprises a first key frame set, the first key frame set comprises first key frames of at least one image feature type, the second person profile comprises a second key frame set, and the second key frame set comprises second key frames of at least one image feature type;
performing archive similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value;
and if the target similarity value is larger than or equal to a preset threshold value, performing file aggregation processing on the first personnel file and the second personnel file to obtain a third personnel file.
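As an illustrative sketch only (the patent does not prescribe an implementation), the three steps of the first aspect can be outlined in Python. The profile layout, the cosine measure, and the merge rule used here are assumptions, not details from the application:

```python
def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

def try_merge(profile_a, profile_b, threshold=0.8):
    """Merge two person profiles into a third if key frames of a shared
    image feature type are similar enough; otherwise return None."""
    for ftype, feat_a in profile_a["keyframes"].items():
        feat_b = profile_b["keyframes"].get(ftype)
        if feat_b is None:
            continue  # no key frame of this image feature type in profile B
        if cosine_similarity(feat_a, feat_b) >= threshold:
            # Third person profile: union of both key frame sets.
            return {"keyframes": {**profile_b["keyframes"], **profile_a["keyframes"]}}
    return None  # target similarity below threshold; profiles stay separate
```

In this sketch each profile stores one feature vector per image feature type; a fuller implementation would hold several key frames per type and apply the priority ordering described in the embodiments below.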
In a second aspect, an embodiment of the present application further provides a personnel gathering device, which includes:
a transceiving module, configured to obtain a first person profile and a second person profile, where the first person profile includes a first keyframe set, the first keyframe set includes first keyframes of at least one image feature type, the second person profile includes a second keyframe set, and the second keyframe set includes second keyframes of at least one image feature type;
the processing module is used for calculating the file similarity of the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value; and if the target similarity value is larger than or equal to a preset threshold value, performing file aggregation processing on the first person file and the second person file to obtain a third person file.
In some embodiments, when the processing module performs the step of calculating the archival similarity of the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain the target similarity value, the processing module is specifically configured to:
determining a first target key frame and a second target key frame according to a preset correspondence between image feature types and priorities, wherein the first target key frame is the key frame in the first key frame set that has not undergone similarity calculation and whose image feature type currently has the highest priority, the second target key frame is the key frame in the second key frame set that has not undergone similarity calculation and whose image feature type currently has the highest priority, and the first target key frame and the second target key frame have the same image feature type;
performing archive similarity calculation on the first target key frame and the second target key frame to obtain a candidate similarity value;
and determining the target similarity value according to the comparison result of the candidate similarity value and the preset threshold value.
In some embodiments, when the step of determining the target similarity value according to the comparison result between the candidate similarity value and the preset threshold is executed by the processing module, the processing module is specifically configured to:
if the comparison result is that the candidate similarity value is greater than or equal to the preset threshold, determining the candidate similarity value as the target similarity value;
if the comparison result is that the candidate similarity value is smaller than the preset threshold value, determining whether other key frames which are not subjected to similarity calculation exist in the first key frame set and the second key frame set;
if there are other key frames that have not undergone similarity calculation, returning to the step of determining a first target key frame and a second target key frame according to the preset correspondence between image feature types and priorities;
and if there is no other key frame that has not undergone similarity calculation, determining the candidate similarity value as the target similarity value.
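The priority-ordered comparison in these steps can be sketched as follows. The priority table and the averaging similarity used in testing are illustrative assumptions, not values fixed by the patent:

```python
# Hypothetical priority map: lower value = higher priority.
PRIORITY = {"full_face": 0, "masked_face": 1, "body": 2}

def target_similarity(set_a, set_b, threshold, similarity):
    """Walk shared image feature types in descending priority; return early
    once a candidate similarity reaches the threshold, otherwise return the
    last candidate computed after all types are exhausted."""
    shared_types = sorted(set(set_a) & set(set_b), key=PRIORITY.get)
    candidate = None
    for ftype in shared_types:
        candidate = similarity(set_a[ftype], set_b[ftype])
        if candidate >= threshold:
            return candidate  # meets threshold: becomes the target similarity value
    return candidate  # no pair met the threshold; last candidate is the target value
```

Note the early exit: once any same-type pair clears the threshold, no lower-priority pairs need to be compared, which is where the computational saving comes from.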
In some embodiments, after performing the step of aggregating the first person profile and the second person profile to obtain a third person profile, the processing module is further configured to:
determining, for each image feature type, the candidate key frames of that image feature type in the third person profile according to the first key frames and the second key frames of that image feature type;
for the candidate key frames of each image feature type, if the number of the candidate key frames is less than or equal to a preset key frame number threshold, determining the candidate key frames as third key frames of the corresponding image feature types;
if the number of the candidate key frames is larger than the key frame number threshold, selecting a target number of candidate key frames from the candidate key frames as the third key frame corresponding to the image feature type, wherein the target number corresponds to the key frame number threshold.
In some embodiments, when the step of selecting a target number of candidate keyframes from the candidate keyframes as the third keyframe of the corresponding image feature type is executed by the processing module, the processing module is specifically configured to:
for each candidate key frame, determining the sum of the similarities between the candidate key frame and a plurality of target person images, wherein the target person images are person images in the third person profile whose image feature type is the same as that of the candidate key frame;
and selecting the candidate key frames with the maximum similarity sum of the target number from the candidate key frames as the third key frames of the corresponding image feature types according to the similarity sum of the candidate key frames.
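A minimal sketch of this key-frame selection after merging, assuming key frames and profile images are represented by comparable feature values and the `similarity` function is supplied by the caller:

```python
def select_keyframes(candidates, profile_images, max_keyframes, similarity):
    """Keep all candidate key frames if within the threshold count; otherwise
    keep those whose summed similarity to the profile's same-type images is largest."""
    if len(candidates) <= max_keyframes:
        return list(candidates)

    def sim_sum(frame):
        # Sum of similarities between this candidate and every target person image.
        return sum(similarity(frame, img) for img in profile_images)

    return sorted(candidates, key=sim_sum, reverse=True)[:max_keyframes]
```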
In some embodiments, after performing the step of aggregating the first person profile and the second person profile to obtain a third person profile, the processing module is further configured to:
and generating a staff track corresponding to the third staff file according to the spatio-temporal information of each staff image in the third staff file.
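The trajectory generation can be sketched as ordering the merged profile's person images by capture time; the `timestamp` and `location` field names are assumptions standing in for the spatio-temporal information attached to each image:

```python
def build_trajectory(images):
    """Sort a profile's person images by timestamp and emit
    (timestamp, location) points as the person's trajectory."""
    ordered = sorted(images, key=lambda img: img["timestamp"])
    return [(img["timestamp"], img["location"]) for img in ordered]
```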
In some embodiments, the at least one image feature type comprises at least one of a full face image type, a masked face image type, and a human body image type.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium, in which a computer program is stored, the computer program including program instructions, which when executed by a processor, implement the above method.
Compared with the prior art, in the embodiments of the present application the first person profile and the second person profile may each include key frames of multiple image feature types, and key frames of different image types can represent different states of the profiled person or features of different body parts. Combining key frames of multiple image feature types during aggregation reduces the loss of the person's features and makes the aggregation more comprehensive, thereby improving the accuracy and recall of the aggregation result, reducing occurrences of one person having multiple profiles, and improving the person profile aggregation effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a person document gathering method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for gathering documents for people according to an embodiment of the present application;
FIG. 3 is a sub-flowchart of a method for gathering documents for a person according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for gathering documents by a person according to another embodiment of the present disclosure;
FIG. 5 is a schematic block diagram of a person gathering device provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a server according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal in the embodiment of the present application;
fig. 8 is a schematic structural diagram of a server in the embodiment of the present application.
Detailed Description
The terms "first," "second," and the like in the description, claims, and drawings of the embodiments of the application are used to distinguish similar elements and do not necessarily describe a particular sequence or chronological order. It should be understood that data so labeled may be interchanged where appropriate, so that the embodiments described herein can be practiced in orders other than those illustrated or described. Furthermore, the terms "include" and "have," and any variations thereof, are intended to cover non-exclusive inclusion: a process, method, system, article, or apparatus that includes a list of steps or modules is not necessarily limited to those explicitly listed, but may include other steps or modules not explicitly listed or inherent to it. The partitioning of modules presented in the embodiments of the present application is merely a logical partitioning and may be implemented differently in practice: multiple modules may be combined or integrated into another system, or some features may be omitted or not implemented. The shown or discussed couplings, direct couplings, or communication connections between modules may be through interfaces, and indirect couplings or communication connections between modules may be electrical or take other similar forms; none of these is limiting in the present embodiments. Moreover, the modules or sub-modules described as separate components may or may not be physically separate, may or may not be physical modules, and may be distributed across multiple circuit modules; some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiments of the present application.
The embodiment of the application provides a method, a device, a computer device and a storage medium for gathering files for people, wherein an execution main body of the method for gathering files for people can be the device for gathering files for people provided by the embodiment of the application or the computer device integrated with the device for gathering files for people, the device for gathering files for people can be realized in a hardware or software mode, and the computer device can be a terminal or a server.
When the computer device is a server, the server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
When the computer device is a terminal, the terminal may include smart terminals with multimedia data processing functions (e.g., video data playing, music data playing), such as a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart TV, a smart speaker, a personal digital assistant (PDA), and a smart watch, but is not limited thereto.
The scheme of the embodiment of the application can be realized based on an artificial intelligence technology, and particularly relates to the technical field of computer vision in the artificial intelligence technology and the fields of cloud computing, cloud storage, databases and the like in the cloud technology, which are respectively introduced below.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer vision (CV) is a science that studies how to make machines "see": using cameras and computers in place of human eyes to identify, track, and measure targets, and further performing image processing so that the result is an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision research attempts to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, face recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
With the research and development of artificial intelligence technology, the artificial intelligence technology is developed and researched in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical services, smart customer service and the like.
The scheme of the embodiment of the application can be realized based on a cloud technology, and particularly relates to the technical fields of cloud computing, cloud storage, databases and the like in the cloud technology, which are respectively introduced below.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or local area network to realize computation, storage, processing, and sharing of data. It is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like applied in the cloud computing business model; these can form a resource pool that is used on demand and is flexible and convenient. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, image websites, and many portal websites, require large amounts of computing and storage resources. With the rapid development of the internet industry, each item may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and industrial data of all kinds require strong system background support, which can only be realized through cloud computing. In the embodiments of the present application, the identification result can be stored through cloud technology.
A distributed cloud storage system (hereinafter, referred to as a storage system) refers to a storage system that integrates a large number of storage devices (storage devices are also referred to as storage nodes) of different types in a network through application software or application interfaces to cooperatively work by using functions such as cluster application, grid technology, and a distributed storage file system, and provides a data storage function and a service access function to the outside. In the embodiment of the application, information such as network configuration and the like can be stored in the storage system, so that the server can conveniently call the information.
At present, the storage method of a storage system is as follows: logical volumes are created, and when a logical volume is created, it is allocated physical storage space, which may consist of the disks of one or several storage devices. A client stores data on a logical volume, that is, on a file system; the file system divides the data into parts, each part being an object that includes not only the data but also additional information such as a data identifier (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location of each object, so that when the client requests access to the data, the file system can let the client access it according to the recorded storage locations.
The process by which the storage system allocates physical storage space for a logical volume is specifically as follows: physical storage space is divided in advance into stripes according to capacity estimates for the objects to be stored in the logical volume (the estimates often leave a large margin over the capacity of the actual objects) and the Redundant Array of Independent Disks (RAID) scheme; one logical volume can be understood as one stripe, whereby physical storage space is allocated to the logical volume.
A database can be regarded, in short, as an electronic filing cabinet: a place for storing electronic files in which a user can add, query, update, and delete data. A "database" is a collection of data that is stored together in a way that can be shared by multiple users, has as little redundancy as possible, and is independent of applications.
A database management system (DBMS) is computer software designed for managing databases; it generally has basic functions such as storage, retrieval, security assurance, and backup. Database management systems can be classified by the database model they support, such as relational or XML (Extensible Markup Language); by the type of computer supported, such as server cluster or mobile phone; by the query language used, such as SQL (Structured Query Language) or XQuery; by performance emphasis, such as maximum size or maximum operating speed; or by other schemes. Regardless of the classification used, some DBMSs can span categories, for example by supporting multiple query languages simultaneously. In the embodiments of the present application, the identification result can be stored in the database management system for convenient retrieval by the server.
It should be noted that the terminal according to the embodiments of the present application may be a device providing voice and/or data connectivity to a service terminal, a handheld device with a wireless connection function, or another processing device connected to a wireless modem, such as a mobile telephone (or "cellular" telephone) or a computer with a mobile terminal, for example a portable, pocket, handheld, computer-built-in, or vehicle-mounted mobile device that exchanges voice and/or data with a radio access network. Examples include Personal Communication Service (PCS) phones, cordless phones, Session Initiation Protocol (SIP) phones, Wireless Local Loop (WLL) stations, and Personal Digital Assistants (PDAs).
In some embodiments, the present solution can be applied to a person profile system 1 as shown in Fig. 1. The person profile system 1 includes a server 10 and at least one image capture device 20. The image capture device 20 sends a collected first person profile to the server 10, where a person profile includes one collected image or multiple collected images; the server 10 then performs profile aggregation on the first person profile and a second person profile held in memory. The specific method is as follows: the server 10 obtains a first person profile and a second person profile, the first person profile comprising a first key frame set containing first key frames of at least one image feature type, and the second person profile comprising a second key frame set containing second key frames of at least one image feature type; performs profile similarity calculation on the first key frame set and the second key frame set according to the image feature types of the first and second key frames to obtain a target similarity value; and, if the target similarity value is greater than or equal to a preset threshold, merges the first person profile and the second person profile into a third person profile.
In some embodiments, the first person profile and the second person profile are both person profiles sent to the server 10 by one or more image capture devices 20. In other embodiments, the first person profile may be a person profile transmitted to the server 10 by the image capture device 20, while the second person profile is a person profile stored inside the server 10 that needs to be aggregated with the first. In still other embodiments, both profiles are stored inside the server 10 and currently need to be aggregated. In this embodiment, each of the two profiles includes one image or a plurality of previously aggregated images; they are simply the two profiles that currently need to be aggregated, and their specific acquisition paths are not limited here.
The technical solution of the present application will be described in detail with reference to several embodiments.
Referring to fig. 2, a method for gathering documents for people provided in an embodiment of the present application is described below, where the embodiment of the present application includes:
201. the method comprises the steps of obtaining a first person profile and a second person profile, wherein the first person profile comprises a first key frame set, the first key frame set comprises first key frames of at least one image feature type, the second person profile comprises a second key frame set, and the second key frame set comprises second key frames of at least one image feature type.
It should be noted that the key frames (including the first key frames and the second key frames) in this embodiment are high-quality, representative person images selected from the corresponding person profiles. When two person profiles need to undergo archive aggregation, it is only necessary to determine from the key frames of the two profiles whether they can be aggregated, thereby reducing the computation required for archive aggregation.
The image acquisition device in this embodiment includes various cameras such as a face camera and a security camera. The face camera is used to capture face images of persons, while the security camera can capture body images of persons. The face camera and the security camera may be disposed at the same position or at different positions, which is not limited here.
In some embodiments, in order to improve the quality of archived images, the server may filter acquired images before aggregation and discard unqualified images (for example, images with too low a resolution) in advance. Alternatively, the image acquisition device may assess the quality of an acquired image before sending it to the server and, if the image is unqualified, discard it without sending it to the server.
In this embodiment, the at least one image feature type includes at least one of a complete face image type, a mask-wearing face image type, and a human body image type. Accordingly, the first person profile and the second person profile include images of at least one of these image feature types.
If an acquired image includes both a face image and a human body image of a person: when the face in the image is clear, the image feature type is determined to be either the complete face image type or the mask-wearing face image type, depending on whether the face is wearing a mask; when the face image is not clear but the human body image is clear, the image feature type of the image is determined to be the human body image type.
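The decision described above can be sketched as follows. This is a minimal illustration under assumed names: the clarity flags and the mask flag would come from upstream face/body detectors, which the patent does not specify.

```python
from dataclasses import dataclass

FULL_FACE, MASKED_FACE, HUMAN_BODY = "full_face", "masked_face", "human_body"

@dataclass
class CapturedImage:
    face_clear: bool    # face region passes the clarity check (assumed upstream detector)
    wearing_mask: bool  # mask-detector result (assumed)
    body_clear: bool    # body region passes the clarity check (assumed)

def image_feature_type(img: CapturedImage):
    """Return the image feature type per the rules above, or None if unusable."""
    if img.face_clear:
        # clear face: split by mask presence
        return MASKED_FACE if img.wearing_mask else FULL_FACE
    if img.body_clear:
        # unclear face but clear body
        return HUMAN_BODY
    return None  # neither region is clear: filtered out as a failed-quality image
```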
202. And performing archive similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value.
In some embodiments, when the first person profile only includes one target person image, the step of calculating the target similarity is as follows:
in this case, the target person image itself is the first key frame of the first person profile. A target image feature type of the first key frame is determined first, second key frames of that target image feature type are then extracted from the second key frame set, and archive similarity calculation is performed on the first key frame and the extracted second key frames. Specifically, if there are multiple second key frames of the target image feature type, similarity is computed against each of them, the average of the similarities is calculated, and the average is determined as the target similarity value.
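The averaging step can be sketched as below. Cosine similarity over feature vectors is an assumption for illustration; the patent does not fix the similarity metric or the feature extractor.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def target_similarity(target_vec, second_keyframe_vecs):
    """Mean similarity between the target image and all second key frames
    of the same image feature type, used as the target similarity value."""
    sims = [cosine(target_vec, v) for v in second_keyframe_vecs]
    return sum(sims) / len(sims)
```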
It should be noted that, at the initial stage of building the second person profile, the second person profile may lack a second key frame of the target image feature type. In that case, when determining whether the target person image belongs to the second person profile, it is necessary to determine, using an existing feature association technique, whether an associated feature corresponding to the target person image exists in the second person profile. If such a feature exists, whether the target person image belongs to the second person profile is determined according to the associated feature; if it does belong, the target person image is added to the second person profile and determined as a key frame of the target image feature type for that profile.
For example, if the target image feature type of the target person image is the human body image type and the second person profile contains no images of the human body image type, it is further detected whether the second person profile contains a complete image (for example, a complete face image or a mask-wearing face image that also contains a human body image) whose spatio-temporal information is close to that of the target person image. Similarity calculation is then performed between the target person image and the human body portion of that complete image to obtain a target similarity value, and finally whether the target person image belongs to the second person profile is determined according to the target similarity value.
In some embodiments, to avoid the situation where no second key frame matching the image feature type of the first key frame can be found in the second person profile, the user may set at least one second key frame in the second person profile in advance for each image feature type, so that the second key frame set includes second key frames of multiple image feature types.
In some embodiments, if the target image feature type of the target person image is the complete face image type, the target person image is preferentially compared with key frames of the complete face image type in the second person profile; if that comparison fails, it is further compared with key frames of the mask-wearing face image type; if that comparison also fails, the target person image is determined not to belong to the second person profile, and otherwise it belongs to the second person profile. Similarly, if the target image feature type is the mask-wearing face image type, the target person image is preferentially compared with key frames of the mask-wearing face image type and, on failure, further compared with key frames of the complete face image type; if both comparisons fail, the target person image does not belong to the second person profile. If the target image feature type is the human body image type, the target person image can only be compared with key frames of the human body image type in the second person profile.
In some embodiments, when the first person profile includes a plurality of images and the first person profile includes a plurality of image feature types, the step of calculating the target similarity is as follows:
specifically, in some embodiments, the server assigns different priorities to different image feature types and preferentially performs archive similarity calculation on key frames of the image feature type with the higher priority. Referring to fig. 3, the specific steps of performing archive similarity calculation on the first key frame set and the second key frame set are as follows:
2021. and determining a first target key frame and a second target key frame according to the corresponding relation between the preset image feature type and the priority.
Specifically, in the present embodiment, the priority of the complete face image type is higher than that of the mask-wearing face image type, and the priority of the mask-wearing face image type is higher than that of the human body image type.
The first target key frame is the key frame in the first key frame set that has not undergone similarity calculation and currently has the highest image feature type priority; the second target key frame is the key frame in the second key frame set that has not undergone similarity calculation and currently has the highest image feature type priority; the first target key frame and the second target key frame have the same image feature type.
2022. And performing archive similarity calculation on the first target key frame and the second target key frame to obtain a candidate similarity value.
For example, a first face key frame of the complete face image type is acquired from the first key frame set, a second face key frame of the complete face image type is acquired from the second key frame set, and then archive similarity calculation is performed on the first face key frame and the second face key frame to obtain a candidate similarity value.
2023. And determining the target similarity value according to the comparison result of the candidate similarity value and the preset threshold value.
Specifically, if the comparison result is that the candidate similarity value is greater than or equal to the preset threshold, the candidate similarity value is determined as the target similarity value. If the candidate similarity value is smaller than the preset threshold, it is determined whether other key frames that have not undergone similarity calculation exist in the first key frame set and the second key frame set; if such key frames exist, step 2021 is executed again; if no such key frame exists, the candidate similarity value is determined as the target similarity value.
For example, in the present embodiment, a first face key frame of the complete face image type is first acquired from the first key frame set and a second face key frame of the complete face image type is acquired from the second key frame set; archive similarity calculation is performed on the two to obtain a candidate similarity value, and it is determined whether the candidate similarity value is greater than or equal to the preset threshold. If so, the candidate similarity value is determined as the target similarity value. If not, key frames of the next image feature type in the preset priority order are acquired: a first mask key frame of the mask-wearing face image type is taken from the first key frame set and a second mask key frame from the second key frame set, archive similarity calculation is performed on them to obtain a new candidate similarity value, and the threshold comparison is repeated. If the threshold is still not reached, a first body key frame of the human body image type is taken from the first key frame set and a second body key frame from the second key frame set, and archive similarity calculation is performed on them to obtain a candidate similarity value; since no lower-priority image feature type remains, this candidate similarity value is directly determined as the target similarity value.
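Steps 2021-2023 can be sketched as the following priority cascade. The function names, the dictionary representation of a key frame set, and the pluggable `similarity` callable are assumptions for illustration only.

```python
# Priority order from the embodiment: complete face > mask-wearing face > human body.
PRIORITY = ["full_face", "masked_face", "human_body"]

def archive_similarity(first_set, second_set, similarity, threshold):
    """Walk the image feature types in priority order, compare the matching
    key-frame pair, and stop as soon as a candidate similarity reaches the
    threshold. `first_set`/`second_set` map feature type -> key-frame features."""
    candidate = None
    for ftype in PRIORITY:
        if ftype in first_set and ftype in second_set:
            candidate = similarity(first_set[ftype], second_set[ftype])
            if candidate >= threshold:
                return candidate  # early exit: the archives can be merged
    # No remaining type reached the threshold: the last candidate value
    # becomes the target similarity value (None if no type matched at all).
    return candidate
```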
203. And if the target similarity value is larger than or equal to a preset threshold value, performing file aggregation processing on the first personnel file and the second personnel file to obtain a third personnel file.
In this embodiment, if the target similarity value is greater than or equal to the preset threshold, it indicates that the first person file and the second person file belong to the same person file, and at this time, the first person file and the second person file are subjected to archive aggregation processing to obtain a third person file.
If the target similarity value is smaller than the preset threshold value, it is determined that the first person file and the second person file do not belong to the same person file, and the first person file and the second person file do not need to be aggregated.
If the first person profile is a single person image obtained from the image acquisition device (for readability, when the first person profile is a single person image it is hereinafter referred to as the target person image), for example a single snapshot captured by the image acquisition device, and the second person profile is a person profile maintained in the server, then, if no matching profile is found in the server, a new person profile is created based on the target person image. Similarly, if the first person profile includes multiple aggregated images and no joinable profile is found in the server, a new person profile is created based on the first person profile.
Further, in this embodiment, Top K key frames are dynamically maintained for each image feature type, where the value of K may be 2 or another value such as 5; the specific value is not limited here. In some embodiments, after the first person profile and the second person profile are aggregated to obtain the third person profile, the method further includes:
a. and determining candidate key frames of the image feature types in the third person file according to the first key frame of each image feature type and the second key frame of each image feature type.
For example, in the first person profile, there are 2 first keyframes of the complete face image type, 2 first keyframes of the mask-wearing face image type, and 2 first keyframes of the human body image type; in the second personnel file, 2 second key frames of the complete face image type, 2 second key frames of the mask-wearing face image type and 2 second key frames of the human body image type are provided; at this time, there are 4 candidate key frames of the full face image type, 4 candidate key frames of the mask-worn face image type, and 4 candidate key frames of the human body image type in the third person profile.
b. And for the candidate key frames of each image feature type, if the number of the candidate key frames is less than or equal to a preset key frame number threshold, determining the candidate key frames as third key frames of the corresponding image feature types.
For example, for candidate key frames of the complete face image type, if the number of the candidate key frames of the complete face image type is less than or equal to the key frame number threshold, the candidate key frames do not need to be screened, and the obtained candidate key frames are directly determined as third key frames of the corresponding image feature types.
In this embodiment, the threshold of the number of key frames is a maximum value of each type of key frame that needs to be maintained, and at this time, the threshold of the number of key frames is Top K. If the value of Top K is 2, at this time, if the number of the obtained candidate keyframes is less than or equal to 2, at this time, the obtained candidate keyframes are directly determined as the third keyframe corresponding to the image feature type.
c. And if the number of the candidate key frames is larger than the threshold value of the number of the key frames, selecting a target number of candidate key frames from the candidate key frames as the third key frames corresponding to the image feature types, wherein the target number corresponds to the threshold value of the number of the key frames.
For example, for candidate key frames of the complete face image type, if the number of candidate key frames of the complete face image type is greater than the key frame number threshold, the candidate key frames need to be further filtered.
In some embodiments, the specific screening method is as follows:
for each candidate key frame, the sum of the similarities between the candidate key frame and a plurality of target person images is determined, where the target person images are person images in the third person profile whose image feature type is the same as that of the candidate key frame; then, according to the similarity sums of the candidate key frames, the target number of candidate key frames with the largest similarity sums are selected from the candidate key frames as the third key frames of the corresponding image feature type.
For example, for candidate key frames of the complete face image type, if the number of candidate key frames is 4 and the key frame number threshold is 2, then 2 third key frames of the third person profile need to be selected from the 4 candidate key frames. Specifically, the sum of the similarities between each candidate key frame and the person images of the complete face image type in the third person profile is determined, yielding a similarity sum for each candidate key frame, and the 2 candidate key frames with the largest similarity sums are selected as the third key frames, thereby realizing dynamic maintenance of the number of key frames.
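The Top-K maintenance step can be sketched as below. The function name and the pluggable `similarity` callable are assumptions; in practice the candidates and archive images would be feature vectors.

```python
def select_keyframes(candidates, archive_images, similarity, k=2):
    """Keep at most k key frames per image feature type: score each candidate
    by the sum of its similarities to all same-type images in the merged
    archive, and retain the k highest-scoring candidates."""
    if len(candidates) <= k:
        return list(candidates)  # at or under the threshold: keep them all
    scored = [
        (sum(similarity(c, img) for img in archive_images), c)
        for c in candidates
    ]
    scored.sort(key=lambda t: t[0], reverse=True)  # largest similarity sum first
    return [c for _, c in scored[:k]]
```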
The screening of candidate key frames of the mask-wearing face image type and the human body image type is similar to that of the complete face image type and is not repeated here.
In some embodiments, after the archives of the first person and the second person are aggregated to obtain a third person archives, the method further includes: and generating a staff track corresponding to the third staff file according to the spatio-temporal information of each staff image in the third staff file.
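The trajectory-generation step can be sketched as below. The `(timestamp, location)` tuple representation is an assumption standing in for the spatio-temporal information attached to each person image.

```python
def person_trajectory(images):
    """Sort the merged archive's images by capture time and emit the
    ordered sequence of camera locations as the person trajectory.
    `images`: iterable of (timestamp, location) tuples."""
    return [loc for _, loc in sorted(images, key=lambda t: t[0])]
```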
In this way, a person trajectory can be generated from the aggregated profile data. Because the third person profile contains images of multiple image feature types, the image recall rate is high, and the generated person trajectory is therefore highly accurate.
In some embodiments, to further illustrate the person archive-gathering method provided in this embodiment, refer to fig. 4. The image acquisition device sends one frame of complete-face capture photo to the server, and the server treats this frame as the first person profile and performs similarity calculation against a second person profile in the server; here K in Top K is 2, and the color-deepened partial images in fig. 4 are the key frame images. The server first determines that the image feature type of the captured photo is the complete face image type, then performs archive similarity calculation between the photo and a key frame of the complete face image type in the second profile to obtain a target similarity result. If the target similarity result is greater than or equal to the preset threshold, the photo is added to the second profile to obtain a third profile, and the Top K key frames are dynamically maintained.
To sum up, in the embodiments of the present application, because the first person profile and the second person profile can include key frames of multiple image feature types, and key frames of different image types can represent different states of the archived person or features of different body parts, combining key frames of multiple image feature types during archive aggregation reduces the loss of the archived person's features and makes the aggregation more comprehensive. This improves the accuracy and recall rate of the aggregation result, reduces the occurrence of one person having multiple profiles, and improves the person archive-gathering effect.
Fig. 5 is a schematic block diagram of a person archive-gathering device according to an embodiment of the present application. As shown in fig. 5, the present application also provides a person archive-gathering device corresponding to the above person archive-gathering method. The device comprises units for executing the above method and can be configured in a terminal or a server. Specifically, referring to fig. 5, the person archive-gathering device 500 includes a transceiver module 501 and a processing module 502, wherein:
a transceiver module 501, configured to obtain a first person profile and a second person profile, where the first person profile includes a first keyframe set, the first keyframe set includes first keyframes of at least one image feature type, the second person profile includes a second keyframe set, and the second keyframe set includes second keyframes of at least one image feature type;
a processing module 502, configured to perform archive similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame, so as to obtain a target similarity value; and if the target similarity value is larger than or equal to a preset threshold value, performing file aggregation processing on the first personnel file and the second personnel file to obtain a third personnel file.
In some embodiments, when the processing module 502 performs the step of calculating the archival similarity of the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain the target similarity value, the processing module is specifically configured to:
determining a first target key frame and a second target key frame according to a preset corresponding relation between an image feature type and a priority, wherein the first target key frame is a key frame which is not subjected to similarity calculation in the first key frame set and has the highest image feature type priority currently, the second target key frame is a key frame which is not subjected to similarity calculation in the second key frame set and has the highest image feature type priority currently, and the first target key frame and the second target key frame have the same image feature type;
performing archive similarity calculation on the first target key frame and the second target key frame to obtain a candidate similarity value;
and determining the target similarity value according to the comparison result of the candidate similarity value and the preset threshold value.
In some embodiments, when the step of determining the target similarity value according to the comparison result between the candidate similarity value and the preset threshold is executed by the processing module 502, the processing module is specifically configured to:
if the candidate similarity value is larger than or equal to the preset threshold value according to the comparison result, determining the candidate similarity value as the target similarity value;
if the comparison result is that the candidate similarity value is smaller than the preset threshold value, determining whether other key frames which are not subjected to similarity calculation exist in the first key frame set and the second key frame set;
if other key frames that have not undergone similarity calculation exist, returning to the step of determining a first target key frame and a second target key frame according to the preset correspondence between image feature types and priorities;
and if no other key frame that has not undergone similarity calculation exists, determining the candidate similarity value as the target similarity value.
In some embodiments, the processing module 502, after performing the step of aggregating the first person profile and the second person profile to obtain a third person profile, is further configured to:
determining candidate key frames of the image feature types in the third personnel file according to the first key frames of the image feature types and the second key frames of the image feature types;
for the candidate key frames of each image feature type, if the number of the candidate key frames is less than or equal to a preset key frame number threshold, determining the candidate key frames as third key frames of the corresponding image feature types;
and if the number of the candidate key frames is larger than the threshold value of the number of the key frames, selecting a target number of candidate key frames from the candidate key frames as the third key frames corresponding to the image feature types, wherein the target number corresponds to the threshold value of the number of the key frames.
In some embodiments, when the step of selecting a target number of candidate keyframes from the candidate keyframes as the third keyframes according to the corresponding image feature types is executed by the processing module 502, the processing module is specifically configured to:
for each candidate key frame, determining the sum of the similarities between the candidate key frame and a plurality of target person images, where the target person images are person images in the third person profile whose image feature type is the same as that of the candidate key frame;
and selecting, according to the similarity sums of the candidate key frames, the target number of candidate key frames with the largest similarity sums as the third key frames of the corresponding image feature type.
In some embodiments, the processing module 502, after performing the step of aggregating the first person profile and the second person profile to obtain a third person profile, is further configured to:
and generating a staff track corresponding to the third staff file according to the spatio-temporal information of each staff image in the third staff file.
In some embodiments, the at least one image feature type comprises at least one of a complete face image type, a mask-wearing face image type, and a human body image type.
To sum up, in the embodiments of the present application, because the first person profile and the second person profile can include key frames of multiple image feature types, and key frames of different image types can represent different states of the archived person or features of different body parts, combining key frames of multiple image feature types during archive aggregation reduces the loss of the archived person's features and makes the aggregation more comprehensive. This improves the accuracy and recall rate of the aggregation result, reduces the occurrence of one person having multiple profiles, and improves the person archive-gathering effect.
It should be noted that, as can be clearly understood by those skilled in the art, the specific implementation processes of the above-mentioned document gathering device and each unit may refer to the corresponding descriptions in the foregoing method embodiments, and for convenience and conciseness of description, detailed descriptions are omitted here.
The above describes the personnel document gathering device in the embodiment of the present application from the perspective of the modular functional entity, and the following describes the personnel document gathering device in the embodiment of the present application from the perspective of hardware processing.
It should be noted that, in the embodiments (including the embodiments shown in fig. 5) of the present application, all the entity devices corresponding to the transceiver modules may be transceivers, and all the entity devices corresponding to the processing modules may be processors. When one of the devices has the structure as shown in fig. 5, the processor, the transceiver and the memory implement the same or similar functions of the transceiver module and the processing module provided in the device embodiment corresponding to the device, and the memory in fig. 6 stores the computer program that the processor needs to call when executing the above-mentioned personnel document gathering method.
The apparatus shown in fig. 5 may have a structure as shown in fig. 6, when the apparatus shown in fig. 5 has a structure as shown in fig. 6, the processor in fig. 6 can implement the same or similar functions of the processing module provided by the apparatus embodiment corresponding to the apparatus, the transceiver in fig. 6 can implement the same or similar functions of the transceiver module provided by the apparatus embodiment corresponding to the apparatus, and the memory in fig. 6 stores a computer program that needs to be called when the processor executes the above-mentioned personnel gathering method. In this application, in the embodiment shown in fig. 5, the entity device corresponding to the transceiver module may be an input/output interface, and the entity device corresponding to the processing module may be a processor.
As shown in fig. 7, for convenience of description, only the parts related to the embodiments of the present application are shown; for specific technical details not disclosed, please refer to the method part of the embodiments of the present application. The terminal device may be any terminal device including a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale (POS) terminal, a vehicle-mounted computer, and the like; the following takes a mobile phone as an example:
fig. 7 is a block diagram illustrating a partial structure of a mobile phone related to a terminal device provided in an embodiment of the present application. Referring to fig. 7, the handset includes: radio Frequency (RF) circuit 610, memory 620, input unit 630, display unit 640, sensor 650, audio circuit 660, wireless fidelity (Wi-Fi) module 670, processor 680, and power supply 690. Those skilled in the art will appreciate that the handset configuration shown in fig. 7 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile phone in detail with reference to fig. 7:
the RF circuit 610 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 680; in addition, the data for designing uplink is transmitted to the base station. In general, RF circuit 610 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 610 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for mobile communication (GSM), general Packet Radio Service (GPRS), code Division Multiple Access (CDMA), wideband Code Division Multiple Access (WCDMA), long Term Evolution (LTE), email, short Message Service (SMS), etc.
The memory 620 may be used to store software programs and modules, and the processor 680 may execute various functional applications and data processing of the mobile phone by operating the software programs and modules stored in the memory 620. The memory 620 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 620 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 630 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the handset. Specifically, the input unit 630 may include a touch panel 631 and other input devices 632. The touch panel 631, also referred to as a touch screen, may collect a user's touch operations on or near it (e.g., operations performed on or near the touch panel 631 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch panel 631 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the coordinates to the processor 680, and can also receive and execute commands sent by the processor 680. The touch panel 631 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 631, the input unit 630 may include other input devices 632, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 640 may be used to display information input by the user or information provided to the user, as well as the various menus of the handset. The display unit 640 may include a display panel 641, which may optionally be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch panel 631 may cover the display panel 641; when the touch panel 631 detects a touch operation on or near it, it transmits the operation to the processor 680 to determine the type of the touch event, and the processor 680 then provides a corresponding visual output on the display panel 641 according to that type. Although in fig. 7 the touch panel 631 and the display panel 641 are shown as two independent components implementing the input and output functions of the handset, in some embodiments the touch panel 631 and the display panel 641 may be integrated to implement both functions.
The handset may also include at least one sensor 650, such as a light sensor, a motion sensor, or other sensors. Specifically, the light sensor may include an ambient light sensor, which adjusts the brightness of the display panel 641 according to the brightness of ambient light, and a proximity sensor, which turns off the display panel 641 and/or the backlight when the handset is moved to the ear. As one kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes) and, when stationary, the magnitude and direction of gravity; it can be used for applications that recognize the attitude of the handset (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer or tap detection). Other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor may also be configured on the handset and are not further described here.
The audio circuit 660, speaker 661, and microphone 662 can provide an audio interface between the user and the handset. The audio circuit 660 may transmit an electrical signal, converted from received audio data, to the speaker 661, which converts it into a sound signal for output; conversely, the microphone 662 converts a collected sound signal into an electrical signal, which the audio circuit 660 receives and converts into audio data; the audio data is then processed by the processor 680 and either transmitted via the RF circuit 610 to, for example, another handset, or output to the memory 620 for further processing.
Wi-Fi is a short-range wireless transmission technology. Through the Wi-Fi module 670, the handset can help the user send and receive emails, browse web pages, access streaming media, and so on; it provides the user with wireless broadband internet access. Although fig. 7 shows the Wi-Fi module 670, it is understood that it is not an essential component of the handset and may be omitted as needed without changing the essence of the application.
The processor 680 is the control center of the handset: it connects the various parts of the entire handset through various interfaces and lines, and performs the handset's functions and processes data by running or executing the software programs and/or modules stored in the memory 620 and invoking the data stored in the memory 620, thereby monitoring the handset as a whole. Optionally, the processor 680 may include one or more processing units; preferably, the processor 680 may integrate an application processor, which mainly handles the operating system, user interfaces, and application programs, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 680.
The handset also includes a power supply 690 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically coupled to the processor 680 via a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In the embodiment of the present application, the processor 680 included in the mobile phone also controls execution of the personnel gathering method whose flowchart is shown in fig. 2.
Fig. 8 is a schematic diagram of a server 720 provided in an embodiment of the present application. The server 720 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 722 (e.g., one or more processors), a memory 732, and one or more storage media 730 (e.g., one or more mass storage devices) storing applications 742 or data 744. The memory 732 and the storage medium 730 may be transient storage or persistent storage. A program stored on the storage medium 730 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processor 722 may be configured to communicate with the storage medium 730 and execute, on the server 720, the series of instruction operations in the storage medium 730.
The server 720 may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, and/or one or more operating systems 741, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on.
The steps performed by the server in the above embodiments, for example the steps of the server shown in fig. 2, may be based on the structure of the server 720 shown in fig. 8. For example, the processor 722, by invoking the instructions in the memory 732, performs the following:
acquiring a first person profile and a second person profile, wherein the first person profile comprises a first key frame set, the first key frame set comprises first key frames of at least one image feature type, the second person profile comprises a second key frame set, and the second key frame set comprises second key frames of at least one image feature type;
performing archive similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value;
and if the target similarity value is greater than or equal to a preset threshold, performing profile gathering processing on the first person profile and the second person profile to obtain a third person profile.
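The three steps above can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: the `PersonProfile` structure, the cosine-similarity measure, and the 0.8 threshold are all assumptions introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class PersonProfile:
    # Key frames grouped by image feature type, e.g. "full_face",
    # "mask_face", "body"; each type maps to a list of feature vectors.
    key_frames: dict

def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def profile_similarity(p1, p2):
    """Compare key frames of matching feature types; return the best score."""
    best = 0.0
    for ftype in p1.key_frames.keys() & p2.key_frames.keys():
        for f1 in p1.key_frames[ftype]:
            for f2 in p2.key_frames[ftype]:
                best = max(best, cosine(f1, f2))
    return best

def gather(p1, p2, threshold=0.8):
    """Merge two profiles into a third when the similarity meets the threshold."""
    if profile_similarity(p1, p2) >= threshold:
        merged = {}
        for ftype in p1.key_frames.keys() | p2.key_frames.keys():
            merged[ftype] = p1.key_frames.get(ftype, []) + p2.key_frames.get(ftype, [])
        return PersonProfile(merged)
    return None  # below threshold: the profiles are kept separate
```

When the threshold is met, the merged (third) profile simply pools the key frames of both input profiles per feature type; claims 4 and 5 below then cap how many of those pooled frames are retained.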
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is merely a logical functional division, and in actual implementation there may be other divisions: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may be stored in a computer-readable storage medium.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program is loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The technical solutions provided by the embodiments of the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the embodiments of the present application, and the descriptions of the embodiments are only intended to help in understanding the method and its core ideas. Meanwhile, a person skilled in the art may, following the ideas of the embodiments of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the embodiments of the present application.

Claims (10)

1. A person gathering method, comprising:
acquiring a first person profile and a second person profile, wherein the first person profile comprises a first key frame set, the first key frame set comprises first key frames of at least one image feature type, the second person profile comprises a second key frame set, and the second key frame set comprises second key frames of at least one image feature type;
performing archive similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value;
and if the target similarity value is greater than or equal to a preset threshold, performing profile gathering processing on the first person profile and the second person profile to obtain a third person profile.
2. The method of claim 1, wherein said performing an archival similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value comprises:
determining a first target key frame and a second target key frame according to a preset corresponding relation between image feature types and priorities, wherein the first target key frame is the key frame in the first key frame set that has not undergone similarity calculation and whose image feature type currently has the highest priority, the second target key frame is the key frame in the second key frame set that has not undergone similarity calculation and whose image feature type currently has the highest priority, and the image feature types of the first target key frame and the second target key frame are the same;
performing archive similarity calculation on the first target key frame and the second target key frame to obtain a candidate similarity value;
and determining the target similarity value according to the comparison result of the candidate similarity value and the preset threshold value.
3. The method according to claim 2, wherein the determining the target similarity value according to the comparison result of the candidate similarity value and the preset threshold comprises:
if the comparison result is that the candidate similarity value is greater than or equal to the preset threshold, determining the candidate similarity value as the target similarity value;
if the comparison result is that the candidate similarity value is smaller than the preset threshold value, determining whether other key frames which are not subjected to similarity calculation exist in the first key frame set and the second key frame set;
if there are other key frames that have not undergone similarity calculation, returning to the step of determining a first target key frame and a second target key frame according to the preset corresponding relation between image feature types and priorities;
and if there is no other key frame that has not undergone similarity calculation, determining the candidate similarity value as the target similarity value.
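The priority-ordered comparison of claims 2 and 3 can be sketched as below. The priority order, the one-key-frame-per-type simplification, and the `similarity` function are illustrative assumptions; the claims allow multiple key frames per feature type.

```python
# Feature types tried from highest to lowest priority (assumed order).
PRIORITY = ["full_face", "mask_face", "body"]

def target_similarity(frames1, frames2, similarity, threshold):
    """frames1/frames2 map an image feature type to one key-frame feature.

    Walks feature types in priority order; returns the first candidate
    similarity value that reaches the threshold, or, if none does, the
    last candidate computed (claim 3)."""
    candidate = 0.0
    for ftype in PRIORITY:
        if ftype in frames1 and ftype in frames2:
            candidate = similarity(frames1[ftype], frames2[ftype])
            if candidate >= threshold:
                # Higher-priority type already matched; lower-priority
                # types need not be compared at all.
                return candidate
    return candidate
```

The early return is the point of the priority scheme: when a high-priority type (e.g. a full-face key frame) already clears the threshold, the cheaper but less discriminative types are skipped entirely.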
4. The method of claim 1, wherein after the performing profile gathering processing on the first person profile and the second person profile to obtain the third person profile, the method further comprises:
determining, for each image feature type, candidate key frames in the third person profile according to the first key frames and the second key frames of that image feature type;
for the candidate key frames of each image feature type, if the number of the candidate key frames is less than or equal to a preset key frame number threshold, determining the candidate key frames as third key frames of the corresponding image feature type;
and if the number of the candidate key frames is greater than the key frame number threshold, selecting a target number of candidate key frames from the candidate key frames as the third key frames of the corresponding image feature type, wherein the target number corresponds to the key frame number threshold.
5. The method according to claim 4, wherein the selecting a target number of candidate key frames from the candidate key frames as the third key frames of the corresponding image feature type comprises:
for each candidate key frame, determining the sum of the similarities between the candidate key frame and a plurality of target person images, wherein the target person images are person images in the third person profile whose image feature type is the same as that of the candidate key frame;
and selecting, according to the similarity sums of the candidate key frames, the target number of candidate key frames with the largest similarity sums as the third key frames of the corresponding image feature type.
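Claims 4 and 5 together cap the number of key frames retained per feature type. A minimal sketch under stated assumptions: the pairwise `similarity` function and the default cap of 5 frames are both illustrative, not from the patent.

```python
def select_key_frames(candidates, target_person_images, similarity, max_frames=5):
    """Keep all candidates if within the cap (claim 4); otherwise keep the
    candidates whose summed similarity to the target person images is
    largest (claim 5)."""
    if len(candidates) <= max_frames:
        return list(candidates)
    # Score each candidate by its total similarity to the profile's images.
    scored = [(sum(similarity(c, img) for img in target_person_images), c)
              for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:max_frames]]
```

Ranking by the similarity sum favors key frames that are most representative of the merged profile as a whole, rather than frames that match only a single captured image well.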
6. The method of claim 1, wherein after the performing profile gathering processing on the first person profile and the second person profile to obtain the third person profile, the method further comprises:
and generating a person trajectory corresponding to the third person profile according to the spatio-temporal information of each person image in the third person profile.
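The trajectory generation in claim 6 amounts to ordering the merged profile's person images by their spatio-temporal information. A sketch, with the `timestamp` and `location` field names assumed for illustration:

```python
def person_trajectory(person_images):
    """person_images: iterable of dicts carrying the spatio-temporal
    information of each captured person image, here assumed to be a
    capture timestamp and a camera location.

    Returns (timestamp, location) points in chronological order."""
    ordered = sorted(person_images, key=lambda img: img["timestamp"])
    return [(img["timestamp"], img["location"]) for img in ordered]
```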
7. The method according to any one of claims 1 to 6, wherein the at least one image feature type comprises at least one of a full face image type, a mask face image type, and a human image type.
8. A personnel gathering device, comprising:
a transceiving module, configured to obtain a first person profile and a second person profile, where the first person profile includes a first keyframe set, the first keyframe set includes first keyframes of at least one image feature type, the second person profile includes a second keyframe set, and the second keyframe set includes second keyframes of at least one image feature type;
a processing module, configured to perform archive similarity calculation on the first key frame set and the second key frame set according to the image feature type of the first key frame and the image feature type of the second key frame to obtain a target similarity value, and, if the target similarity value is greater than or equal to a preset threshold, perform profile gathering processing on the first person profile and the second person profile to obtain a third person profile.
9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implementing the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program comprising program instructions which, when executed by a processor, implement the method according to any one of claims 1-7.
CN202211200033.1A 2022-09-29 2022-09-29 Personnel gathering method and device, computer equipment and storage medium Pending CN115546516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211200033.1A CN115546516A (en) 2022-09-29 2022-09-29 Personnel gathering method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115546516A (en) 2022-12-30

Family

ID=84731733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211200033.1A Pending CN115546516A (en) 2022-09-29 2022-09-29 Personnel gathering method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115546516A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765134A (en) * 2019-10-25 2020-02-07 四川东方网力科技有限公司 File establishing method, equipment and storage medium
CN113962326A (en) * 2021-11-10 2022-01-21 浙江商汤科技开发有限公司 Clustering method, device, equipment and computer storage medium
CN114155578A (en) * 2021-11-24 2022-03-08 浙江大华技术股份有限公司 Portrait clustering method, device, electronic equipment and storage medium
CN114357216A (en) * 2021-12-10 2022-04-15 浙江大华技术股份有限公司 Portrait gathering method and device, electronic equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561372A (en) * 2023-07-03 2023-08-08 北京瑞莱智慧科技有限公司 Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium
CN116561372B (en) * 2023-07-03 2023-09-29 北京瑞莱智慧科技有限公司 Personnel gear gathering method and device based on multiple algorithm engines and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination