CN115495606A - Image gathering method and system - Google Patents

Publication number
CN115495606A
Authority
CN
China
Prior art keywords
files
file
clustering
merging
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210963416.8A
Other languages
Chinese (zh)
Inventor
施翔飞
吴鸿伟
梁煜麓
蓝坤宏
赖光冰
江艺榕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd
Priority to CN202210963416.8A
Publication of CN115495606A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks

Abstract

The invention provides an image gathering method and system. The method comprises: acquiring pictures in real time, performing feature structuring and attribute recognition, and filtering out pictures that do not meet the picture quality score; segmenting and grouping the pictures according to spatio-temporal information, clustering each batch by density clustering, and merging within each batch using a union-find (disjoint-set) structure; comparing the discrete points that batch clustering could not aggregate, and the classes formed by batch clustering, against the dynamic base library for archiving; and merging the archives of the same object, traversing all archives, comparing them with the static base library, and assigning real-name identities to archives that lack them. The method gathers image archives through real-time clustering and cascaded merging, in which multiple merging stages interlock and complement one another. It preserves real-time performance while further improving the aggregation rate and the aggregation accuracy, and it addresses the problem of one person having multiple archives.

Description

Image gathering method and system
Technical Field
The invention relates to the technical field of image processing, in particular to an image gathering method and system.
Background
In the security field, the development of image recognition technology means that a large amount of portrait track data is generated every day. Comparing every picture consumes a large amount of computing resources, and the diversity of picture acquisition environments often causes missed matches, which gave rise to the concept of image archive gathering. Image gathering is an important data processing step, and the quality of its results affects the accuracy, reliability, and real-time performance of the downstream big-data technical and tactical analysis methods. However, existing image gathering technology needs a large amount of comparison computing power to achieve both real-time performance and accuracy. With limited computing power, processing generally takes at least a T+1 cycle, which cannot meet users' real-time requirements. Existing technology also needs a static base library to gather archives and suffers from a serious problem of one person having multiple archives, so it cannot provide a complete track for subsequent technical and tactical mining.
Disclosure of Invention
The prior art has the following problems: real-time performance is low, since gathering can only be triggered on a T+1 day basis; computing power consumption is high, since all pictures are compared with one another by 1:N comparison; a static base library is required for gathering; and one person may have multiple archives. To solve these technical problems, the invention provides an image gathering method and system.
According to a first aspect of the present invention, an image gathering method is provided, including:
S1: acquiring pictures in real time, performing feature structuring and attribute recognition, and filtering out pictures that do not meet the picture quality score;
S2: segmenting and grouping the pictures according to spatio-temporal information, clustering each batch by density clustering, and merging within each batch using a union-find (disjoint-set) structure;
S3: comparing the discrete points that batch clustering could not aggregate, and the classes formed by batch clustering, against the dynamic base library for archiving;
S4: merging the archives of the same object, traversing all archives, comparing them with the static base library, and assigning real-name identities to archives without real names.
In some specific embodiments, the batch clustering in S2 specifically includes:
S21: performing density clustering with the DBSCAN algorithm;
S22: calculating the average vector of each clustering result to obtain the cluster center;
S23: for each cluster center, finding the top-ranked cluster centers whose similarity is higher than a first threshold, and performing the union-find operation.
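As an illustration of S22 and S23, the following minimal Python sketch computes cluster centers as mean vectors and lists candidate center pairs for the union-find step. It assumes that DBSCAN has already assigned feature vectors to clusters. Cosine similarity, the `top_k` cut-off, and all function names are assumptions made for the example, not details given in the application.

```python
import math
from typing import List, Sequence, Tuple


def cluster_center(vectors: Sequence[Sequence[float]]) -> List[float]:
    """S22: the cluster center is the element-wise mean of the member vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity (an assumed metric; the source does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pairs(
    centers: Sequence[Sequence[float]], threshold: float, top_k: int
) -> List[Tuple[int, int, float]]:
    """S23: for each center, keep its top_k most similar centers above threshold."""
    pairs = []
    for i, ci in enumerate(centers):
        scored = [(cosine(ci, cj), j) for j, cj in enumerate(centers) if j != i]
        scored.sort(reverse=True)  # most similar first
        pairs.extend((i, j, s) for s, j in scored[:top_k] if s > threshold)
    return pairs
```

The resulting pairs are the only comparisons passed on to the union-find step, which is how limiting each center to its top results keeps the in-batch merge cheap.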
In some specific embodiments, the union-find operation in S23 includes:
S231: initializing each class as its own set, and traversing the elements to judge whether the root node to which each element belongs is a majority class;
S232: in response to the root node of the element being a majority class, finding the root nodes to which the other unprocessed elements belong, and merging the sets of the two elements if they meet the association threshold.
In some specific embodiments, S3 specifically includes: performing a 1:N search of the discrete points that batch clustering could not aggregate against the dynamic base library, judging whether any archive meets the search threshold, and selecting different thresholds according to the attributes of the discrete point and of the compared archive to judge whether the archiving requirement is met; and performing a 1:N search of the classes formed by batch clustering against the dynamic base library, judging whether any archive meets the search threshold, selecting different thresholds according to the attributes of the class and of the compared archive to judge whether the archiving requirement is met, and, if the requirement is not met and the identity cannot be confirmed, creating a new archive.
In some specific embodiments, merging the archives of the same object in S4 specifically includes: determining the pre-merge archives; running concurrent 1:N queries to obtain, for each class, the results that meet the search threshold, and storing them in a preprocessing array; determining the maximum-sample archive, then merging and sorting; and performing threshold pruning to obtain a final merge list that can be processed directly.
In some embodiments, determining the pre-merge archives comprises: screening out archives whose sample count is greater than a first sample threshold, and counting the number DN of archives in the screening result that have no last-merge-time field; judging whether DN exceeds a first archive-number threshold DNT; if so, taking out DNT archives that meet the above conditions; if not, taking out the DN archives that meet the conditions, screening out the archives for which the absolute difference between the last merge time and the current time exceeds a first time-difference threshold, sorting them in ascending order of last merge time, taking the first DNT-DN archives after sorting, and combining them with the DN archives.
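The selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Archive` record, its field names, and the use of numeric timestamps are hypothetical, and archives that have never been merged are modelled by `last_merge_time = None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Archive:
    archive_id: str
    sample_count: int
    last_merge_time: Optional[float] = None  # None: no last-merge-time field


def select_pre_merge(
    archives: List[Archive],
    now: float,
    sample_threshold: int,
    dnt: int,
    time_diff_threshold: float,
) -> List[Archive]:
    """Pick at most `dnt` archives for the next post-merging round."""
    eligible = [a for a in archives if a.sample_count > sample_threshold]
    never_merged = [a for a in eligible if a.last_merge_time is None]
    dn = len(never_merged)
    if dn > dnt:
        return never_merged[:dnt]  # enough never-merged archives on their own
    # Otherwise top up with the archives that were merged longest ago.
    stale = [
        a
        for a in eligible
        if a.last_merge_time is not None
        and abs(now - a.last_merge_time) > time_diff_threshold
    ]
    stale.sort(key=lambda a: a.last_merge_time)  # ascending last merge time
    return never_merged + stale[: dnt - dn]
```

Under this reading, never-merged archives are served first and the remaining quota goes to the archives whose last merge is oldest, so every eligible archive is eventually revisited.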
In some embodiments, determining the maximum-sample archive specifically comprises: traversing all preprocessing arrays; determining the archive with the most samples among the current archive and its neighbor archives; judging whether that archive is the current archive; if not, making the archive with the most samples the key, and taking all the original neighbor archives together with the current archive as its values; merging the values of identical keys and removing duplicates; and sorting the results in descending order of the key's sample count.
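One possible reading of this re-keying step is sketched below. The dictionary layout (`neighbors` maps each current archive to the archives returned by its 1:N query, `sample_count` maps archive to sample count) and the function name are assumptions for illustration. Ties are resolved in favor of the current archive, which is also an assumption.

```python
from typing import Dict, List, Set, Tuple


def rekey_by_largest(
    neighbors: Dict[str, List[str]], sample_count: Dict[str, int]
) -> List[Tuple[str, List[str]]]:
    """Re-key every group by its largest archive, merge duplicates, sort by size."""
    merged: Dict[str, Set[str]] = {}
    for current, near in neighbors.items():
        group = [current, *near]
        # The archive with the most samples becomes the key of the group.
        key = max(group, key=lambda a: sample_count[a])
        # All other members (including the current archive, if it is not
        # the key) become values; a set removes duplicates across groups.
        merged.setdefault(key, set()).update(a for a in group if a != key)
    ordered = sorted(
        merged.items(), key=lambda kv: sample_count[kv[0]], reverse=True
    )
    return [(key, sorted(values)) for key, values in ordered]
```

Keying each group by its largest archive means that later merges always fold smaller archives into larger ones.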
In some specific embodiments, the threshold pruning specifically comprises: initializing a deselection list; traversing the sorted preprocessing arrays; in response to the current key not being in the deselection list, obtaining the threshold corresponding to the archive attributes of the key and adding the key to the deselection list; traversing the value array under the key and, in response to the current value not being in the deselection list and meeting the comparison threshold, retaining the value and adding it to the deselection list; and outputting the pre-merge list.
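The pruning pass can be sketched as below. The input layout (a list of `(key, values)` groups already sorted by key sample count), the `similarity` lookup, and the `threshold_for` callback are hypothetical names introduced for the example. Treating "meets the threshold" as `>=` is also an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple


def prune(
    sorted_groups: List[Tuple[str, List[str]]],
    similarity: Dict[Tuple[str, str], float],
    threshold_for: Callable[[str], float],
) -> List[Tuple[str, List[str]]]:
    """Keep each archive in at most one merge group, subject to a threshold."""
    deselected: Set[str] = set()  # archives already used as a key or a value
    merge_list = []
    for key, values in sorted_groups:
        if key in deselected:
            continue  # already absorbed by a larger archive
        threshold = threshold_for(key)  # attribute-dependent threshold
        deselected.add(key)
        kept = []
        for value in values:
            if value not in deselected and similarity[(key, value)] >= threshold:
                kept.append(value)
                deselected.add(value)
        if kept:
            merge_list.append((key, kept))
    return merge_list
```

Because the groups arrive sorted by key size, a contested archive is claimed by the largest key that accepts it, and the deselection list prevents it from being merged a second time.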
In some embodiments, assigning real-name identities to archives without real names in S4 specifically includes: determining the pre-identity-landing archives, traversing them, and comparing the class-center feature of each archive with the static base library; landing the identity with the highest similarity in response to the comparison result being greater than a first landing threshold, or in response to the comparison result being less than the first landing threshold but greater than a second landing threshold, the identity most similar to the archive reference being the same certificate reference, and that comparison result being greater than a third landing threshold; in response to the same ID card already being present, recording a pre-merged identity archive, taking the archive with more samples as the main archive, merging the smaller archive into the larger one, and updating the archive's last update time; and triggering identity merging by scanning the pre-merged archive list and merging.
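The landing condition is stated tersely, so the sketch below encodes only one possible reading of it. Here `center_score` is the best similarity between the archive's class center and the static base library, `center_id` is the matched identity, and `reference_id` and `reference_score` are the best match for the archive reference. All parameter names are hypothetical, and the handling of a score exactly equal to the first threshold is an assumption.

```python
def should_land(
    center_score: float,
    center_id: str,
    reference_score: float,
    reference_id: str,
    t1: float,
    t2: float,
    t3: float,
) -> bool:
    """Decide whether the most similar identity may be assigned to the archive."""
    if center_score > t1:
        return True  # confident match on the class center alone
    # Borderline match: require the archive reference to agree on the same
    # identity with a similarity above the third threshold.
    return (
        t2 < center_score <= t1
        and reference_id == center_id
        and reference_score > t3
    )
```

Under this reading, the second and third thresholds act as a two-signal confirmation for matches that are too weak to land on the class center alone.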
In some embodiments, determining the pre-identity-landing archives specifically comprises: screening out archives whose sample count is greater than a second sample threshold, and counting the number DN' of archives in the screening result that have no last-landing-time field; judging whether DN' exceeds a second archive-number threshold DNT'; if so, taking out DNT' archives that meet the conditions; if not, taking out the DN' archives that meet the conditions, screening out the archives for which the absolute difference between the last landing time and the current time exceeds a second time-difference threshold, sorting them in ascending order of last merging time, taking the first DNT'-DN' archives after sorting, and combining them with the DN' archives.
According to a second aspect of the invention, a computer-readable storage medium is proposed, on which one or more computer programs are stored, which when executed by a computer processor implement the method of any of the above.
According to a third aspect of the present invention, there is provided an image gathering system, comprising:
a picture acquisition unit, configured to acquire pictures in real time, perform feature structuring and attribute recognition, and filter out pictures that do not meet the picture quality score;
a picture clustering unit, configured to segment and group the pictures according to spatio-temporal information, cluster each batch by density clustering, and merge within each batch using a union-find structure;
an archiving unit, configured to compare the discrete points that batch clustering could not aggregate, and the classes formed by batch clustering, against the dynamic base library for archiving;
an identity landing unit, configured to merge the archives of the same object, traverse all archives, compare them with the static base library, and assign real-name identities to archives without real names.
The invention provides an image gathering method and system in which multiple merging stages interlock and complement one another. Real-time performance is high, with minute-level clustering. Not every picture needs to undergo 1:N comparison: batch clustering is performed first and external comparison and archiving afterwards, which greatly reduces the comparison computing power required. Archives can be gathered in advance without a static base library, and offline real-name assignment is supported. The problem of one person having multiple archives is handled by a clustering-like post-merging step. The invention provides good data service support for subsequent big data analysis and can directly provide users with reliable one-person-one-archive track information.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments and many of the intended advantages of embodiments will be readily appreciated as they become better understood by reference to the following detailed description. Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of an image archiving method according to an embodiment of the present application;
FIG. 3 is a main flow diagram of an image filing method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of the union-find operation of a specific embodiment of the present application;
FIG. 5 is a flow diagram of direct archiving according to a specific embodiment of the present application;
FIG. 6 is a flow diagram of out-of-batch merging of a specific embodiment of the present application;
FIG. 7 is a flow diagram illustrating post-merging according to an exemplary embodiment of the present application;
FIG. 8 is a flowchart illustrating the determination of pre-merge archives according to an embodiment of the present application;
FIG. 9 is a flow chart illustrating the determination of a maximum-sample archive according to an embodiment of the present application;
FIG. 10 is a schematic flow chart of threshold pruning in accordance with a specific embodiment of the present application;
FIG. 11 is a schematic flow chart of identity landing of a specific embodiment of the present application;
FIG. 12 is a flowchart illustrating the determination of pre-identity-landing archives in accordance with an embodiment of the present application;
FIG. 13 is a block diagram of an image archiving system according to an embodiment of the present application;
FIG. 14 is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which an image filing method of an embodiment of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. Various applications, such as a data processing application, a data visualization application, a web browser application, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as a plurality of pieces of software or software modules (e.g., software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, such as a background information processing server providing support for mapping table data presented on the terminal devices 101, 102, 103. The background information processing server can process the acquired logical address and generate a processing result.
It should be noted that the method provided in the embodiment of the present application may be executed by the server 105, or may be executed by the terminal devices 101, 102, and 103, and the corresponding apparatus is generally disposed in the server 105, or may be disposed in the terminal devices 101, 102, and 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.
Fig. 2 shows a flowchart of an image gathering method according to an embodiment of the present application. As shown in fig. 2, the method includes:
S201: acquiring pictures in real time, performing feature structuring and attribute recognition, and filtering out pictures that do not meet the picture quality score.
S202: segmenting and grouping the pictures according to spatio-temporal information, clustering each batch by density clustering, and merging within each batch using a union-find (disjoint-set) structure.
In a specific embodiment, batch clustering specifically includes the steps of:
S21: performing density clustering with the DBSCAN algorithm;
S22: calculating the average vector of each clustering result to obtain the cluster center;
S23: for each cluster center, finding the top-ranked cluster centers whose similarity is higher than a first threshold, and performing the union-find operation.
The union-find operation specifically comprises:
S231: initializing each class as its own set, and traversing the elements to judge whether the root node to which each element belongs is a majority class;
S232: in response to the root node of the element being a majority class, finding the root nodes to which the other unprocessed elements belong, and merging the sets of the two elements if they meet an association threshold.
S203: comparing the discrete points that batch clustering could not aggregate, and the classes formed by batch clustering, against the dynamic base library for archiving.
In a specific embodiment, a 1:N search of the discrete points that batch clustering could not aggregate is performed against the dynamic base library, it is judged whether any archive meets the search threshold, and different thresholds are selected according to the attributes of the discrete point and of the compared archive to judge whether the archiving requirement is met; a 1:N search of the classes formed by batch clustering is likewise performed against the dynamic base library, it is judged whether any archive meets the search threshold, different thresholds are selected according to the attributes of the class and of the compared archive to judge whether the archiving requirement is met, and, if the requirement is not met and the identity cannot be confirmed, a new archive is created.
S204: merging the archives of the same object, traversing all archives, comparing them with the static base library, and assigning real-name identities to archives without real names.
In a specific embodiment, merging the archives of the same object specifically includes: determining the pre-merge archives; running concurrent 1:N queries to obtain, for each class, the results that meet the search threshold, and storing them in a preprocessing array; determining the maximum-sample archive, then merging and sorting; and performing threshold pruning to obtain a final merge list that can be processed directly. Determining the pre-merge archives comprises: screening out archives whose sample count is greater than a first sample threshold, and counting the number DN of archives in the screening result that have no last-merge-time field; judging whether DN exceeds a first archive-number threshold DNT; if so, taking out DNT archives that meet the above conditions; if not, taking out the DN archives that meet the conditions, screening out the archives for which the absolute difference between the last merge time and the current time exceeds a first time-difference threshold, sorting them in ascending order of last merge time, taking the first DNT-DN archives after sorting, and combining them with the DN archives. Determining the maximum-sample archive specifically includes: traversing all preprocessing arrays; determining the archive with the most samples among the current archive and its neighbor archives; judging whether that archive is the current archive; if not, making the archive with the most samples the key, and taking all the original neighbor archives together with the current archive as its values; merging the values of identical keys and removing duplicates; and sorting the results in descending order of the key's sample count.
The threshold pruning specifically comprises: initializing a deselection list; traversing the sorted preprocessing arrays; in response to the current key not being in the deselection list, obtaining the threshold corresponding to the archive attributes of the key and adding the key to the deselection list; traversing the value array under the key and, in response to the current value not being in the deselection list and meeting the comparison threshold, retaining the value and adding it to the deselection list; and outputting the pre-merge list.
In a specific embodiment, assigning real-name identities to archives without real names specifically includes: determining the pre-identity-landing archives, traversing them, and comparing the class-center feature of each archive with the static base library; landing the identity with the highest similarity in response to the comparison result being greater than a first landing threshold, or in response to the comparison result being less than the first landing threshold but greater than a second landing threshold, the identity most similar to the archive reference being the same certificate reference, and that comparison result being greater than a third landing threshold; in response to the same ID card already being present, recording a pre-merged identity archive, taking the archive with more samples as the main archive, merging the smaller archive into the larger one, and updating the archive's last update time; and triggering identity merging by scanning the pre-merged archive list and merging. Determining the pre-identity-landing archives specifically includes: screening out archives whose sample count is greater than a second sample threshold, and counting the number DN' of archives in the screening result that have no last-landing-time field; judging whether DN' exceeds a second archive-number threshold DNT'; if so, taking out DNT' archives that meet the conditions; if not, taking out the DN' archives that meet the conditions, screening out the archives for which the absolute difference between the last landing time and the current time exceeds a second time-difference threshold, sorting them in ascending order of last merging time, taking the first DNT'-DN' archives after sorting, and combining them with the DN' archives.
With continued reference to FIG. 3, FIG. 3 illustrates the main flow of an image gathering method according to a specific embodiment of the present application. As shown in FIG. 3, the method mainly includes the following steps:
Step 301: acquiring pictures in real time, performing feature structuring and attribute recognition, and filtering out pictures that do not meet the picture quality score.
Step 302: segmenting and grouping the pictures according to spatio-temporal information.
Step 303: calling the batch clustering module to cluster each batch, performing coarse clustering by density clustering, and merging within each batch using a union-find structure.
Step 304: calling the direct archiving module and the out-of-batch merging module, respectively, to compare the obtained discrete points and the obtained classes against the dynamic base library and archive them.
Step 305: calling the post-merging module to cluster the archives and merge the archives of the same object.
Step 306: calling the identity landing module to traverse all archives, compare them with the static base library, and confirm identities.
In a specific embodiment, the main process of batch clustering in the batch clustering module in step 303 is: first, performing density clustering with the DBSCAN algorithm; calculating the average vector of each clustering result to obtain the cluster center; for each cluster center, finding its top 1024 cluster centers whose similarity is greater than a first threshold; and performing the union-find operation. The union-find operation of the present application follows the rule that minority clusters are not merged with minority clusters and that minority clusters never initiate a merge. Only two kinds of merge are performed: a majority cluster absorbing a minority cluster, and a majority cluster merging with a majority cluster. Here, a majority cluster is a cluster whose number of samples is greater than N; otherwise the cluster is a minority cluster. The detailed union-find flow, shown in FIG. 4 according to a specific embodiment of the present application, includes the following steps:
Step 401: initializing each class as a set whose only element is the class itself, i.e., the set containing each point is initialized to the point itself.
Step 402: traversing the elements and determining the root node to which each element belongs.
Step 403: judging whether the element is a majority class; if so, proceeding to step 404; otherwise, proceeding to step 407.
Step 404: finding the root nodes to which the other unprocessed elements belong.
Step 405: judging whether the two satisfy the association threshold; if so, continuing to step 406; if not, proceeding to step 407.
Step 406: merging the sets of the two elements into one, taking the earliest root node as the root node.
Step 407: judging whether the traversal is finished; if so, ending; if not, returning to step 402 to continue the traversal.
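Steps 401 to 407 can be sketched with a small disjoint-set structure. The sketch below is an illustration under stated assumptions rather than the application's implementation: clusters are identified by index, `similarity` is a hypothetical callback returning the similarity of two cluster centers, the "earliest" root is taken to be the lowest index, and the majority test uses each cluster's own sample count.

```python
from typing import Callable, List


class DisjointSet:
    """Union-find in which the earliest (lowest-index) root survives a merge."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # step 401: each class is its own set

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # step 406: earliest root wins


def merge_in_batch(
    sizes: List[int],
    similarity: Callable[[int, int], float],
    majority_n: int,
    assoc_threshold: float,
) -> List[int]:
    """Return the root of every cluster after majority-initiated merging."""
    ds = DisjointSet(len(sizes))
    for i, size in enumerate(sizes):  # step 402: traverse the elements
        if size <= majority_n:
            continue  # step 403: minority clusters never initiate a merge
        for j in range(len(sizes)):  # step 404: look at the other elements
            if j != i and ds.find(i) != ds.find(j):
                if similarity(i, j) >= assoc_threshold:  # step 405
                    ds.union(i, j)
    return [ds.find(i) for i in range(len(sizes))]
```

Because only majority clusters start a merge, two small clusters that resemble each other stay separate unless a majority cluster absorbs both, which matches the rule that minority clusters are not merged with minority clusters.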
In a specific embodiment, the direct archiving module in step 304 mainly performs a 1:N search of the discrete points that batch clustering could not aggregate against the dynamic base library, judges whether any archive meets the search threshold, and selects different thresholds according to the attributes of the discrete point and of the compared archive to judge whether the archiving requirement is met. Fig. 5 shows a schematic flow chart of direct archiving according to a specific embodiment of the present application, including the following steps:
Step 501: Traverse all discrete points and perform a 1:N search for each.
Step 502: Judge whether a file meeting the search threshold exists; if so, continue to step 503; otherwise, perform step 504: do not archive directly, and push the point back to the to-be-clustered queue.
Step 503: Acquire data such as mask, age, gender, and number of samples corresponding to the file.
Step 505: Judge whether a mask is worn; if so, perform step 507 to use the comparison threshold for masks; if not, perform step 506 to use the comparison threshold for the corresponding gender and age group (male, female, old, young).
Step 508: Judge whether the comparison score is larger than the first threshold of the corresponding type; if so, perform step 509 for filing; otherwise, perform step 504: do not archive directly, and push the point back to the to-be-clustered queue.
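The threshold selection in steps 502 to 509 can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the `Match` record, the function `archive_discrete_point`, and all numeric threshold values are hypothetical placeholders, since the source only states that thresholds lie in [0,1] and depend on mask, gender, and age attributes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder values; the source does not give concrete numbers.
SEARCH_THRESHOLD = 0.60
MASK_THRESHOLD = 0.80
DEMOGRAPHIC_THRESHOLDS = {
    ("male", "adult"): 0.75,
    ("female", "adult"): 0.76,
    ("male", "old"): 0.78,
    ("female", "old"): 0.78,
    ("male", "young"): 0.82,
    ("female", "young"): 0.82,
}
DEFAULT_THRESHOLD = 0.80


@dataclass
class Match:
    """Best 1:N search hit for one discrete point (hypothetical record)."""
    file_id: str
    score: float
    mask: bool
    gender: str
    age_group: str


def archive_discrete_point(best: Optional[Match]) -> Optional[str]:
    """Return the file id to archive into, or None to push the point back
    to the to-be-clustered queue (step 504)."""
    # Step 502: is there any file that meets the search threshold?
    if best is None or best.score < SEARCH_THRESHOLD:
        return None
    # Steps 505-507: pick the threshold by mask, else by gender and age group.
    if best.mask:
        threshold = MASK_THRESHOLD
    else:
        threshold = DEMOGRAPHIC_THRESHOLDS.get(
            (best.gender, best.age_group), DEFAULT_THRESHOLD)
    # Steps 508-509: archive only if the score exceeds the chosen threshold.
    return best.file_id if best.score > threshold else None
```

Keeping the thresholds in a lookup table keyed by attributes makes it easy to tune each group separately, which appears to be the intent of the per-type thresholds described above.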
In a specific embodiment, in step 304, the batch external merging module mainly processes the classes formed by batch clustering: it performs a 1:N search of each class against the dynamic base library, judges whether any file meets the search threshold, and selects different thresholds according to the attributes of the class and the attributes of the compared file to judge whether the filing requirement is met; if the requirement is not met, the identity landing module is called, and if no identity is matched, a new file is added. The detailed process, shown in fig. 6 as a schematic flow chart of batch external merging according to a specific embodiment of the present application, includes the following steps:
Step 601: Traverse all classes and perform a 1:N search for each.
Step 602: Judge whether a file meeting the search threshold exists; if so, perform step 603; if not, perform step 608 to call the identity landing module for real-name processing.
Step 603: Acquire data such as mask, age, gender, and number of samples corresponding to the file.
Step 604: Judge whether a mask is worn; if so, perform step 606 to use the comparison threshold for masks; if not, perform step 605 to use the comparison threshold for the corresponding gender and age group (male, female, old, young).
Step 607: Judge whether the comparison score is larger than the first threshold of the corresponding type; if so, archive in step 610; if not, perform step 608 to call the identity landing module for real-name processing.
Step 609: Judge whether the real-name processing succeeded; if so, perform step 610 for filing; if not, perform step 611 to add an unknown-person file.
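The fallback chain of steps 602 to 611 (archive, else real-name processing, else a new unknown-person file) can be sketched as below. The function `archive_class` and its callback parameters `land_identity` and `new_unknown_file` are hypothetical names introduced for illustration; the per-type threshold is assumed to have been chosen already, as in steps 604 to 606.

```python
from typing import Callable, Optional


def archive_class(match_file: Optional[str],
                  score: float,
                  type_threshold: float,
                  land_identity: Callable[[], Optional[str]],
                  new_unknown_file: Callable[[], str]) -> str:
    """Return the id of the file that the class ends up in.

    match_file       -- best file from the 1:N search, or None if no file
                        met the search threshold (step 602)
    type_threshold   -- threshold already selected by mask / gender / age
    land_identity    -- hypothetical hook for the identity landing module;
                        returns a file id on success, None on failure
    new_unknown_file -- hypothetical hook that creates an unknown-person file
    """
    # Steps 602 and 607: a matching file exists and the score is high enough.
    if match_file is not None and score > type_threshold:
        return match_file                  # step 610: archive
    # Step 608: otherwise try real-name processing.
    landed = land_identity()
    if landed is not None:                 # step 609: real-name succeeded
        return landed                      # step 610: archive
    return new_unknown_file()              # step 611: add unknown-person file
```

Passing the two fallbacks as callables keeps the decision logic separate from storage and search details, which the source does not specify.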
In a specific embodiment, in step 305, the merging module mainly handles the problem of one person having multiple files: it clusters and merges the files, thereby reducing the number of such duplicates. Fig. 7 shows a flowchart of post-merging according to a specific embodiment of the present application, which includes the following steps:
Step 701: Determine the pre-merged files.
Step 702: Perform concurrent 1:N queries to obtain, for each class, the results that meet the search threshold, and store them in the pre-processing array in the form {key: [value1, ..., valueN]}.
Step 703: Determine the maximum-sample file, then merge and sort the results.
Step 704: Perform threshold pruning to obtain a final merging list that can be processed directly.
In an embodiment, the specific process of determining the pre-merged files in step 701 is shown in fig. 8, a schematic flow chart of determining the pre-merged files according to an embodiment of the present application, and includes the following steps:
Step 801: Screen out the files whose number of samples is larger than a first sample threshold (e.g., 10).
Step 802: Count the number DN of files in the screening result that have no last-merging-time field.
Step 803: Judge whether DN exceeds a first file-number threshold DNT; if so, perform step 804 to take out DNT files meeting the above condition; if not, perform step 805.
Step 805: Take out the DN files meeting the above condition.
Step 806: Screen out the files for which the absolute value of the time difference between the last merging time and the current time exceeds a first time-difference threshold (e.g., 2 days).
Step 807: Sort these files in ascending order of last merging time.
Step 808: Take out the first DNT-DN (DNT minus DN) files and combine them with the DN files taken out previously.
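Steps 801 to 808 amount to a bounded candidate selection that prefers files never merged before and then fills the remaining quota with the files merged longest ago. A minimal sketch, assuming each file is a dictionary with hypothetical keys `id`, `samples`, and `last_merge_time` (missing or `None` when the file has never been merged):

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List


def select_pre_merge_files(files: List[Dict[str, Any]],
                           now: datetime,
                           dnt: int,
                           min_samples: int = 10,
                           min_age: timedelta = timedelta(days=2)
                           ) -> List[Dict[str, Any]]:
    """Pick at most `dnt` files to be considered for merging in this round."""
    # Step 801: keep only files with enough samples.
    candidates = [f for f in files if f["samples"] > min_samples]
    # Step 802: files without a last-merging-time field.
    never_merged = [f for f in candidates if f.get("last_merge_time") is None]
    dn = len(never_merged)
    # Steps 803-804: too many never-merged files, so cap at DNT.
    if dn > dnt:
        return never_merged[:dnt]
    # Step 806: files whose last merge is older than the time threshold.
    stale = [f for f in candidates
             if f.get("last_merge_time") is not None
             and abs(now - f["last_merge_time"]) > min_age]
    # Step 807: oldest merge first.
    stale.sort(key=lambda f: f["last_merge_time"])
    # Steps 805 and 808: all DN never-merged files plus the first DNT-DN stale ones.
    return never_merged + stale[:dnt - dn]
```

The same pattern would apply to the selection of pre-identity landing files, with the last landing time in place of the last merging time.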
In an embodiment, the specific process of determining the maximum-sample file in step 703 is shown in fig. 9, a schematic flowchart of determining the maximum-sample file according to an embodiment of the present application, and includes the following steps:
Step 901: Traverse all entries of the pre-processing array.
Step 902: Determine the file with the most samples among the current file and its neighbor files.
Step 903: Judge whether the file with the most samples is the current file; if so, perform step 904 (no processing); if not, perform step 905.
Step 905: Make the maximum-sample file the key, and take all of the original neighbor files, together with the current file, as its values.
Step 906: Judge whether the traversal is finished; if so, continue to step 907.
Step 907: Combine the value results of identical keys and remove duplicates.
Step 908: Sort the results in descending order of the number of samples of the key.
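Steps 901 to 908 can be read as re-keying each {key: [values]} entry on its largest member and then merging entries that share a key. The sketch below follows that reading; the function name `regroup_by_largest` and the `samples` lookup table are hypothetical, and the largest file is removed from its own value list, which is an assumption the source does not state explicitly.

```python
from typing import Dict, List, Set, Tuple


def regroup_by_largest(pre: Dict[str, List[str]],
                       samples: Dict[str, int]
                       ) -> List[Tuple[str, List[str]]]:
    """Re-key every entry on its maximum-sample file and merge equal keys."""
    merged: Dict[str, Set[str]] = {}
    for key, neighbors in pre.items():                    # step 901
        group = [key] + list(neighbors)
        # Step 902: largest file among the current file and its neighbors.
        # max() returns the first maximum, so the current key wins ties.
        largest = max(group, key=lambda f: samples[f])
        # Steps 903-905: everything except the largest file becomes a value.
        values = [f for f in group if f != largest]
        # Step 907: merge values of identical keys; the set removes duplicates.
        merged.setdefault(largest, set()).update(values)
    # Step 908: sort by the sample count of the key, largest first.
    return sorted(((k, sorted(v)) for k, v in merged.items()),
                  key=lambda kv: samples[kv[0]], reverse=True)


if __name__ == "__main__":
    pre = {"a": ["b", "c"], "b": ["a"], "d": ["e"]}
    samples = {"a": 5, "b": 9, "c": 2, "d": 7, "e": 1}
    print(regroup_by_largest(pre, samples))
```

Here the entry keyed on "a" is re-keyed on "b" because "b" has more samples, and it is then merged with the existing "b" entry.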
In a specific embodiment, the specific flow of the threshold pruning in step 704 is shown in fig. 10, a schematic flow chart of threshold pruning according to a specific embodiment of the present application, and includes the following steps:
Step 1001: Initialize an elimination list.
Step 1002: Traverse the sorted pre-processing array.
Step 1003: Judge whether the current key exists in the elimination list; if so, perform step 1004 to eliminate the entry; if not, perform step 1006.
Step 1006: According to attributes such as mask and age of the file corresponding to the key, obtain the corresponding threshold from the threshold table, and add the key to the elimination list.
Step 1007: Traverse the value array under the key.
Step 1008: Judge whether the current value exists in the elimination list; if so, perform step 1012 to eliminate the value; if not, perform step 1009.
Step 1009: Judge whether the comparison threshold is met; if not, perform step 1012 to eliminate the value; if so, perform step 1010.
Step 1010: Retain the value and add it to the elimination list.
Step 1011: Judge whether the traversal of the value array is finished; if so, go to step 1005; if not, return to step 1007 to continue traversing the value array under the key.
Step 1005: Judge whether the traversal is finished; if so, output the pre-merging list; if not, return to step 1002 to continue traversing the sorted pre-processing array.
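The pruning loop of steps 1001 to 1012 ensures that each file appears in at most one merge group. A minimal sketch follows; `prune`, `score`, and `threshold_for` are hypothetical names, with `score(key, value)` standing in for the stored comparison score and `threshold_for(key)` for the lookup in the attribute-based threshold table. Dropping keys that end up with no surviving values is an added assumption.

```python
from typing import Callable, List, Set, Tuple


def prune(sorted_groups: List[Tuple[str, List[str]]],
          score: Callable[[str, str], float],
          threshold_for: Callable[[str], float]
          ) -> List[Tuple[str, List[str]]]:
    """Return the pre-merging list: (main file, files to merge into it)."""
    eliminated: Set[str] = set()               # step 1001
    result: List[Tuple[str, List[str]]] = []
    for key, values in sorted_groups:          # step 1002
        if key in eliminated:                  # steps 1003-1004: already used
            continue
        threshold = threshold_for(key)         # step 1006
        eliminated.add(key)
        kept: List[str] = []
        for value in values:                   # step 1007
            if value in eliminated:            # steps 1008, 1012
                continue
            if score(key, value) >= threshold:  # step 1009
                kept.append(value)             # step 1010: retain and mark
                eliminated.add(value)
        if kept:                               # assumption: skip empty groups
            result.append((key, kept))
    return result


if __name__ == "__main__":
    scores = {("b", "a"): 0.9, ("b", "c"): 0.5,
              ("d", "e"): 0.8, ("d", "c"): 0.85}
    groups = [("b", ["a", "c"]), ("a", ["x"]), ("d", ["e", "c"])]
    print(prune(groups, lambda k, v: scores[(k, v)], lambda k: 0.7))
```

Because the input is sorted by sample count in descending order, larger files claim their neighbors first, and a file that has already been claimed can neither be merged again nor act as a main file.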
In a specific embodiment, the identity landing module performs real-name processing on files without real names in step 306. Fig. 11 is a schematic flow chart of identity landing according to a specific embodiment of the present application, including the following steps:
Step 1101: Scan MongoDB to determine the pre-identity landing files.
Step 1102: Traverse the resulting files.
Step 1103: Compare the central feature of each file with the static base library.
Step 1104: Judge whether the score is greater than a first landing threshold (e.g., 90); if not, perform step 1105; if so, perform step 1106.
Step 1105: Judge whether the score is greater than a second landing threshold (e.g., 85), the certificate photo with the highest comparison similarity to the file photos is the same certificate photo, and that similarity is greater than a third landing threshold (e.g., 92); if so, go to step 1106; if not, go on to judge whether the traversal is finished.
Step 1106: Land on the identity with the highest similarity.
Step 1107: Judge whether a file with the same ID card already exists; if not, perform step 1108 to update the file identity information in MongoDB; if so, perform step 1109 to record the pre-merged identity files, taking the file with more samples as the main file so that the small file is merged into the large file.
Step 1110: Update the last update time of the file.
Step 1111: When the traversal is finished, trigger identity merging, scan the list of pre-merged identity files for merging, and update the file attribution information of the pictures in ES.
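The two-tier decision in steps 1104 to 1105 and the choice of main file in step 1109 can be sketched as follows. This reflects one reading of the translated text and should be treated as an assumption: the score against the static base library is compared with the first and second landing thresholds, while a separate photo-level similarity (`photo_similarity`, hypothetical) is compared with the third. The function names and the flag `same_certificate_top` are likewise hypothetical; the default values 90, 85, and 92 are the examples given in the steps.

```python
from typing import Dict


def should_land(score: float,
                same_certificate_top: bool,
                photo_similarity: float,
                first_threshold: float = 90.0,
                second_threshold: float = 85.0,
                third_threshold: float = 92.0) -> bool:
    """Decide whether a file may take the best-matching identity.

    score                -- file central feature vs. static base library
    same_certificate_top -- True if the file's photos all rank the same
                            certificate photo highest (assumed reading)
    photo_similarity     -- similarity to that certificate photo (assumed)
    """
    if score > first_threshold:                # step 1104
        return True
    return (score > second_threshold           # step 1105
            and same_certificate_top
            and photo_similarity > third_threshold)


def pick_main_file(samples_by_file: Dict[str, int]) -> str:
    """Step 1109: the file with more samples becomes the main file."""
    return max(samples_by_file, key=lambda f: samples_by_file[f])
```

The second tier lets a file with a slightly lower overall score still land when independent photo-level evidence points strongly at one certificate.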
In an embodiment, the specific process of determining the pre-identity landing files in step 1101 is shown in fig. 12, a schematic flowchart of determining the pre-identity landing files according to an embodiment of the present application, and includes the following steps:
Step 1201: Screen out the files whose number of samples is larger than a second sample threshold (e.g., 10).
Step 1202: Count the number DN' of files in the screening result that have no last-landing-time field.
Step 1203: Judge whether DN' exceeds a second file-number threshold DNT'; if so, perform step 1204 to take out DNT' files meeting the above condition; if not, perform step 1205.
Step 1205: Take out the DN' files meeting the above condition.
Step 1206: Screen out the files for which the absolute value of the time difference between the last landing time and the current time exceeds a second time-difference threshold (e.g., 2 days).
Step 1207: Sort these files in ascending order of last merging time.
Step 1208: Take out the first DNT'-DN' (DNT' minus DN') files and combine them with the DN' files taken out previously.
In the above embodiments, the dynamic base library refers to a data set whose picture set changes, and the static base library refers to a data set whose picture set does not change. Thresholds for which no example value is given above, such as the association threshold and the search threshold, take values in the interval [0,1] and may be set according to actual requirements.
The image filing method has high real-time performance, with clustering completed at the minute level; it can perform clustering without a static base library, and the batch clustering mode greatly reduces the computing power needed for comparison with the static base library. The method provides good data service support for subsequent big data analysis and can directly provide users with reliable one-person-one-file track information.
With continued reference to FIG. 13, FIG. 13 illustrates a block diagram of an image filing system according to an embodiment of the present application. The system specifically comprises a picture acquisition unit 1301, a picture clustering unit 1302, an archiving unit 1303, and an identity landing unit 1304. The picture acquisition unit 1301 is configured to acquire pictures in real time, perform feature structuring and attribute identification, and filter out pictures that do not meet the picture quality score; the picture clustering unit 1302 is configured to segment and group the pictures according to spatio-temporal information, cluster them in batches by density clustering, and perform intra-batch merging using union-find; the archiving unit 1303 is configured to compare the discrete points that cannot be aggregated by batch clustering and the classes formed by batch clustering, respectively, with the dynamic base library and archive them; the identity landing unit 1304 is configured to merge the files of the same object, traverse all files to compare them with the static base library, and perform real-name processing on files without real names.
Referring now to FIG. 14, shown is a block diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 14 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 14, the computer system 1400 includes a Central Processing Unit (CPU) 1401, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 1402 or a program loaded from a storage portion 1408 into a Random Access Memory (RAM) 1403. In the RAM 1403, various programs and data necessary for the operation of the system 1400 are also stored. The CPU 1401, ROM 1402, and RAM 1403 are connected to each other via a bus 1404. An input/output (I/O) interface 1405 is also connected to bus 1404.
The following components are connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a display such as a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1408 including a hard disk and the like; and a communication portion 1409 including a network interface card such as a LAN card, a modem, or the like. The communication portion 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as necessary. A removable medium 1411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1410 as necessary, so that a computer program read out therefrom is installed into the storage portion 1408 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1409 and/or installed from the removable medium 1411. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 1401. Note that the computer-readable medium of the present application can be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware.
As another aspect, the present application also provides a computer-readable storage medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being incorporated into the electronic device. The computer readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire pictures in real time, perform feature structuring and attribute identification, and filter out pictures that do not meet the picture quality score; segment and group the pictures according to spatio-temporal information, cluster them in batches by density clustering, and perform intra-batch merging using union-find; compare the discrete points that cannot be aggregated by batch clustering and the classes formed by batch clustering, respectively, with the dynamic base library and archive them; and merge the files of the same object, traverse all files to compare them with the static base library, and perform real-name processing on files without real names.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements in which any combination of the features described above or their equivalents does not depart from the spirit of the invention disclosed above. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. An image filing method, comprising:
S1: acquiring pictures in real time, performing feature structuring and attribute identification, and filtering out pictures that do not meet the picture quality score;
S2: segmenting and grouping the pictures according to spatio-temporal information, clustering in batches by density clustering, and then performing intra-batch merging using union-find;
S3: comparing the discrete points that cannot be aggregated by batch clustering and the classes formed by batch clustering, respectively, with the dynamic base library and archiving them;
S4: merging the files of the same object, traversing all files to compare them with the static base library, and performing real-name processing on files without real names.
2. The image filing method according to claim 1, wherein the clustering in batches in S2 specifically comprises:
S21: performing density clustering using the DBSCAN algorithm;
S22: calculating the average vector of each clustering result to obtain the cluster center;
S23: finding, among the top-ranked results, the cluster centers whose similarity is higher than a first threshold, and performing a union-find operation.
3. The image filing method according to claim 2, wherein the union-find operation in S23 includes:
S231: initializing each class as a set, and traversing the elements to judge whether the root node to which each element in the set belongs is a majority class;
S232: in response to the root node of the element being a majority class, searching for the root nodes to which other unprocessed elements belong, and merging the sets of the two elements if the two elements meet an association threshold.
4. The image filing method according to claim 1, wherein S3 specifically comprises: performing a 1:N search of the discrete points that cannot be aggregated by batch clustering against the dynamic base library, judging whether any file meets the search threshold, and selecting different thresholds according to the attributes of the discrete points and the attributes of the compared files to judge whether the filing requirement is met; and performing a 1:N search of the classes formed by batch clustering against the dynamic base library, judging whether any file meets the search threshold, selecting different thresholds according to the attributes of the classes and the attributes of the compared files to judge whether the filing requirement is met, and adding a new file if the requirement is not met and the identity cannot be confirmed.
5. The image filing method according to claim 1, wherein the merging of files of the same object in S4 specifically includes: determining pre-merged files; performing concurrent 1:N queries to obtain, for each class, the results that meet a search threshold, and storing the results in a pre-processing array; determining the maximum-sample file, then merging and sorting; and performing threshold pruning to obtain a final merging list that can be processed directly.
6. The image filing method of claim 5, wherein the determining pre-merged files comprises: screening out the files whose number of samples is larger than a first sample threshold, and counting the number DN of files in the screening result that have no last-merging-time field; judging whether DN exceeds a first file-number threshold DNT, and if so, taking out DNT files meeting the above condition; if not, taking out the DN files meeting the above condition, screening out the files for which the absolute value of the time difference between the last merging time and the current time exceeds a first time-difference threshold, sorting these files in ascending order of last merging time, and taking out the first DNT-DN files and combining them with the DN files.
7. The image filing method according to claim 6, wherein the determining the maximum-sample file comprises: traversing all entries of the pre-processing array, determining the file with the most samples among the current file and its neighbor files, and judging whether the file with the most samples is the current file; if not, making the file with the most samples the key and taking all of the original neighbor files, together with the current file, as its values; combining the value results of identical keys and removing duplicates; and sorting the results in descending order of the number of samples of the key.
8. The image filing method of claim 6, wherein said threshold pruning specifically comprises: initializing an elimination list; traversing the sorted pre-processing array; in response to the current key not existing in the elimination list, acquiring the corresponding threshold according to the attributes of the file corresponding to the key, adding the key to the elimination list, and traversing the value array under the key; in response to the current value not existing in the elimination list and meeting the comparison threshold, retaining the value and adding it to the elimination list; and outputting a pre-merging list.
9. The image filing method according to claim 1, wherein the real-name processing of files without real names in S4 specifically includes: determining pre-identity landing files, traversing the pre-identity landing files, and comparing the central feature of each file with the static base library; in response to the comparison result being greater than a first landing threshold, or the comparison result being less than the first landing threshold but greater than a second landing threshold while the certificate photo with the highest similarity to the file photos is the same certificate photo and that similarity is greater than a third landing threshold, landing on the identity with the highest similarity; in response to a file with the same ID card already existing, recording the pre-merged identity files, merging the small file into the large file by taking the file with more samples as the main file, and updating the last update time of the file; and triggering identity merging, and scanning the list of pre-merged files for merging.
10. The image filing method according to claim 9, wherein the determining the pre-identity landing files specifically includes: screening out the files whose number of samples is larger than a second sample threshold, and counting the number DN' of files in the screening result that have no last-landing-time field; judging whether DN' exceeds a second file-number threshold DNT', and if so, taking out DNT' files meeting the above condition; if not, taking out the DN' files meeting the above condition, screening out the files for which the absolute value of the time difference between the last landing time and the current time exceeds a second time-difference threshold, sorting these files in ascending order of last merging time, and taking out the first DNT'-DN' files and combining them with the DN' files.
11. A computer-readable storage medium having one or more computer programs stored thereon, which when executed by a computer processor perform the method of any one of claims 1 to 10.
12. An image filing system, comprising:
a picture acquisition unit, configured to acquire pictures in real time, perform feature structuring and attribute identification, and filter out pictures that do not meet the picture quality score;
a picture clustering unit, configured to segment and group the pictures according to spatio-temporal information, cluster them in batches by density clustering, and perform intra-batch merging using union-find;
a filing unit, configured to compare the discrete points that cannot be aggregated by batch clustering and the classes formed by batch clustering, respectively, with the dynamic base library and archive them;
an identity landing unit, configured to merge the files of the same object, traverse all files to compare them with the static base library, and perform real-name processing on files without real names.
CN202210963416.8A 2022-08-11 2022-08-11 Image gathering method and system Pending CN115495606A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210963416.8A CN115495606A (en) 2022-08-11 2022-08-11 Image gathering method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210963416.8A CN115495606A (en) 2022-08-11 2022-08-11 Image gathering method and system

Publications (1)

Publication Number Publication Date
CN115495606A true CN115495606A (en) 2022-12-20

Family

ID=84466067

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210963416.8A Pending CN115495606A (en) 2022-08-11 2022-08-11 Image gathering method and system

Country Status (1)

Country Link
CN (1) CN115495606A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056551A (en) * 2023-07-07 2023-11-14 北京瑞莱智慧科技有限公司 File aggregation method and device for driving path, computer equipment and storage medium
CN117056551B (en) * 2023-07-07 2024-04-02 北京瑞莱智慧科技有限公司 File aggregation method and device for driving path, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113255694B (en) Training image feature extraction model and method and device for extracting image features
US8718383B2 (en) Image and website filter using image comparison
US20210264195A1 (en) Technologies for enabling analytics of computing events based on augmented canonicalization of classified images
WO2022057302A1 (en) Clustering method and apparatus, electronic device, and storage medium
CN112257801B (en) Incremental clustering method and device for images, electronic equipment and storage medium
CN110941978B (en) Face clustering method and device for unidentified personnel and storage medium
CN111062431A (en) Image clustering method, image clustering device, electronic device, and storage medium
US9734434B2 (en) Feature interpolation
CN109697452B (en) Data object processing method, processing device and processing system
CN110895811B (en) Image tampering detection method and device
CN115495606A (en) Image gathering method and system
CN113657087B (en) Information matching method and device
CN114625918A (en) Video recommendation method, device, equipment, storage medium and program product
CN114693970A (en) Object classification method, deep learning model training method, device and equipment
CN111444364B (en) Image detection method and device
TW202018540A (en) Method, apparatus and electronic device for database updating and computer storage medium thereof
CN114639143B (en) Portrait archiving method, device and storage medium based on artificial intelligence
CN113657596B (en) Method and device for training model and image recognition
CN113051911B (en) Method, apparatus, device, medium and program product for extracting sensitive words
CN115546554A (en) Sensitive image identification method, device, equipment and computer readable storage medium
WO2021204039A1 (en) Method and apparatus for pushing information
CN113590447B (en) Buried point processing method and device
CN114897290A (en) Evolution identification method and device of business process, terminal equipment and storage medium
CN110413603B (en) Method and device for determining repeated data, electronic equipment and computer storage medium
CN112765022A (en) Webshell static detection method based on data stream and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination