CN117671440A - Abnormal portrait file detection method and system - Google Patents


Publication number
CN117671440A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311450400.8A
Other languages
Chinese (zh)
Inventor
江艺榕
毕永辉
黄仝宇
江逸鑫
梁煜麓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Information Security Research Institute Co ltd
Original Assignee
Xiamen Information Security Research Institute Co ltd
Application filed by Xiamen Information Security Research Institute Co ltd
Priority to CN202311450400.8A
Publication of CN117671440A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for detecting abnormal portrait files. The method comprises the following steps: acquiring file information; characterizing and extracting file features through a feature engineering module; predicting, with a pre-trained machine learning model, the probability that a file is abnormal; and determining the detected abnormal files. The abnormal characteristics of a file are characterized in multiple dimensions, and the machine learning model fuses these multidimensional features and outputs the probability that the file is abnormal. Abnormal files can therefore be detected more accurately and at a higher detection rate, which improves the performance of the portrait filing system and the quality of the filed data.

Description

Abnormal portrait file detection method and system
Technical Field
The invention belongs to the technical field of data analysis, and particularly relates to a detection method and a detection system for an abnormal portrait file.
Background
With continuing progress in portrait capture by front-end monitoring equipment and in portrait analysis based on artificial intelligence, the public safety industry has accumulated a large amount of portrait data. Portrait filing technology organizes this data into portrait files, which better support practical applications in the industry.
Portrait filing technology generally extracts feature values and attributes from portrait pictures with a deep neural network model, clusters the pictures by those features and attributes with a clustering algorithm, and finally merges the clusters into files. Each file contains a number of pictures together with their spatio-temporal information (the capture time and capture location reported by the front-end equipment), the image feature vectors and attribute information extracted by the deep neural network (such as age, gender, and whether a mask is worn), the position coordinates of the cropped portrait image within the background image, and the like.
Owing to factors such as the installation position of the front-end equipment, lighting, shooting angle, image quality, and interference from surrounding objects, files generated by existing portrait filing technology exhibit two kinds of abnormality: first, erroneous files that mix pictures of several different persons; second, files formed from non-living objects such as billboards. Such abnormal files degrade the quality of the filed portrait data to varying degrees and can also impair the practical effect of the downstream analysis techniques built on that data. At present no particularly effective method for detecting abnormal portrait files has been proposed; most detection schemes rely on spatio-temporal contradiction rules, and their detection rate and accuracy are low.
In the prior art, abnormal portrait files are mostly identified by screening out suspected files with a spatio-temporal contradiction rule and then computing the feature similarity of the two pictures involved in the contradiction to verify whether the file is abnormal. A spatio-temporal contradiction is usually defined as an excessively large movement distance within a short time. This is a strong anomaly indicator, and schemes based on it tend to have a low detection rate. Some prior art improves the characterization of the contradiction: a connectivity graph of the capture devices is built from the track data of all portrait files, the edges of the graph represent the connection probability between two devices, and a low connection probability replaces the original movement distance criterion. This approach is more expressive, but in practice it still considers only a single dimension, so the detection rate of abnormal portrait files remains low.
In view of the above, providing a method and a system for detecting abnormal portrait files is of considerable significance.
Disclosure of Invention
To solve the above problems, the invention provides a method and a system for detecting abnormal portrait files. File features are characterized from multiple dimensions and fused by a machine learning model, so that more abnormal files in the file repository are identified and the detection rate is improved, thereby overcoming the technical defects described above.
In a first aspect, the present invention provides a method for detecting an abnormal portrait file, including the following steps:
acquiring file information;
characterizing and extracting file features through a feature engineering module;
predicting with a pre-trained machine learning model to give the probability that the file is abnormal;
and determining the detected abnormal files.
Preferably, acquiring the file information comprises acquiring files offline according to a preset selection strategy, the acquired file content comprising the spatio-temporal information of the pictures, the face attribute recognition results, and the position coordinates of the portrait detection box within the background image, wherein the preset selection strategy specifically comprises:
selecting files with a larger number of pictures;
selecting files with lower file cohesion;
selecting files that have never undergone anomaly detection first, and then files that were judged normal at their last detection, starting with the earliest detection time.
Further preferably, characterizing and extracting the file features through the feature engineering module comprises dividing the file features into six categories, specifically:
basic attribute distribution of the portrait pictures, including but not limited to the number of pictures in the file, the number of days between the last appearance and the current day, the age-group distribution and gender distribution from face attribute recognition, and the mean and variance of picture quality;
capture time pattern, including the number of abnormally long continuous capture periods: captures are regarded as continuous while the time difference between adjacent track points stays within a threshold; when the duration of a continuous period reaches a preset threshold, the period is regarded as abnormal and the count is increased by 1;
number of abnormally long continuous capture periods involving multiple devices: the duration threshold is set lower, and a constraint that the period involves more devices is added;
capture space pattern, including the number of capture devices: the distinct capture devices of the portrait pictures in the file are counted;
number of capture regions: the capture devices of the portrait pictures in the file are mapped to capture regions, and the distinct regions are counted, a region being defined either as the street where the device is located or as the character string obtained by encoding the longitude and latitude of the device with the GeoHash address encoding technique;
number of active sites: a site is defined either as the street where the device is located or as the character string obtained by encoding the longitude and latitude of the device with the GeoHash address encoding technique, and a site is active when its number of capture days meets a threshold;
active site dispersion: the dispersion of the active sites is described by the diagonal length of the smallest rectangle that covers them;
characteristics of adjacent track points: the pictures in the file are sorted by capture time, and each pair of adjacent pictures is called a pair of front and back track points; number of short-time, excessive-distance events between front and back track points: scenarios such as travel by car or subway are filtered out, or counted separately, according to the location type of the capture device;
number of low-connectivity device transitions between front and back track points: a device connectivity graph is built from the front and back track points of all portrait files over one month or longer; if the two devices of a pair of front and back track points differ and the time difference is within a short threshold, one connection is counted between those two devices;
the number of track points captured by each device is counted at the same time;
the connectivity of two devices is finally defined as $\text{connectivity} = \dfrac{\text{number of connections}}{\text{average (or larger) number of track points of the two devices}}$; the device connectivity graph is updated periodically, for example once a week;
if the time difference between a pair of front and back track points is within the threshold and the connectivity of their devices on the graph is below a threshold, the low-connectivity count of the file is increased by 1;
feature similarity between portrait pictures, including file cohesion, defined as the average feature-vector similarity of any two pictures in the file: the file center feature is obtained by averaging the feature vectors of the portrait pictures in the file; denoting the file center feature vector as $x = (x_1, x_2, \ldots, x_n)$, the cohesion simplifies mathematically to the inner product of $x$ with itself, namely $\langle x, x \rangle = \sum_{i=1}^{n} x_i^2$;
number of low-similarity picture pairs among front and back track points with a spatio-temporal contradiction: when a pair of front and back track points meets the short-time large-distance condition or the low-connectivity condition described above, the feature similarity of the corresponding pictures is calculated, and if it is below a threshold the count is increased by 1;
position change pattern of the portrait detection box in the background image, including the number of devices for which the relative position of the detection box is unchanged: a static target is characterized by the fact that the position coordinates of its portrait detection box on the background image barely change, which is measured with the IOU index $\text{IOU} = \dfrac{S_{\text{intersection}}}{S_{\text{union}}}$,
wherein the numerator $S_{\text{intersection}}$ is the intersection of all detection boxes and the denominator $S_{\text{union}}$ is the union of all detection boxes; one IOU value is calculated from the portrait images in the file that were captured by the same device, and when the IOU is large, the faces captured by that device for this file are considered almost stationary and the count is increased by 1.
Further preferably, predicting with the pre-trained machine learning model to give the probability that the file is abnormal comprises:
feeding the file features extracted by the feature engineering module into the pre-trained machine learning model, which outputs the probability that the file is abnormal; if the probability reaches a preset threshold, the file is identified as an abnormal file.
Further preferably, the pre-trained machine learning model is a classification model built with an ensemble learning method such as random forest, XGBoost, or LightGBM.
Further preferably, the preset threshold is determined from the effect on the validation set when the machine learning model is trained offline: under the constraint that precision is not lower than 99%, the probability threshold that maximizes F0.5 or F1 is recommended as the preset threshold;
abnormal files for training are constructed by merging two normal files; the two normal files are selected by file search, the file of a different person with the highest similarity in the search results being taken as the merge partner.
Further preferably, the method further comprises:
after a file whose confidence exceeds the preset threshold is identified as abnormal, the file is deleted or logically deleted, or is submitted for manual verification as a pushed alert; the verification results are collected and fed back to model training for iterative optimization of the model.
In a second aspect, an embodiment of the present invention further provides a system for detecting an abnormal portrait file, including:
the acquisition module is configured to acquire archive information;
the characteristic engineering module is configured to describe and extract file characteristics through the characteristic engineering module;
the prediction module is configured to predict by adopting a pre-trained machine learning model and give a probability value that the file belongs to an abnormality;
and the abnormal file determining module is configured to determine the detected abnormal file.
In a third aspect, an embodiment of the present invention provides an electronic device, including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention provides a method and a system that characterize the abnormal features of a portrait file in multiple dimensions, fuse the information of each dimension with a machine learning method, and calculate the probability that the file is abnormal, thereby detecting abnormal files at a higher detection rate.
(2) Because the abnormal features of a file are characterized in multiple dimensions and fused by a machine learning model that outputs the probability that the file is abnormal, abnormal files can be detected more accurately and at a higher detection rate. This improves the performance of the portrait filing system and the quality of the filed data, and further improves the effect of techniques built on file trajectories, such as association analysis, trajectory analysis, footage analysis, urban population perception, and analysis and judgment in digital map fusion scenarios.
Drawings
The accompanying drawings are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain the principles of the invention. Other embodiments, and many of the intended advantages of the embodiments, will be readily appreciated as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present invention may be applied;
FIG. 2 is a flowchart illustrating a method for detecting an abnormal portrait file according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a detection system for abnormal portrait files according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a computer apparatus suitable for use in implementing an embodiment of the invention.
Detailed Description
FIG. 1 illustrates an exemplary system architecture 100 of a method for detecting an abnormal portrait file or a system for detecting an abnormal portrait file to which embodiments of the present invention may be applied.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices with communication capabilities including, but not limited to, smartphones, tablet computers, laptop and desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background information processing server that processes verification request information transmitted by the terminal devices 101, 102, 103. The background information processing server can analyze and otherwise process the received verification request information and obtain a processing result.
It should be noted that the method for detecting an abnormal portrait file according to the embodiment of the present invention is generally executed by the server 105, and accordingly the system for detecting an abnormal portrait file is generally disposed in the server 105. Alternatively, the method may be executed by the terminal devices 101, 102, 103, in which case the system for detecting an abnormal portrait file is disposed in the terminal devices 101, 102, 103.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When the server is software, it may be implemented as a plurality of pieces of software or software modules (for example, to provide a distributed service), or as a single piece of software or software module, which is not specifically limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the above-described apparatus architecture may not include a network, but only a server or terminal device.
In the prior art, abnormal portrait files are mostly identified by screening out suspected files with a spatio-temporal contradiction rule and then computing the feature similarity of the two pictures involved in the contradiction to verify whether the file is abnormal. A spatio-temporal contradiction is usually defined as an excessively large movement distance within a short time. This is a strong anomaly indicator, and schemes based on it tend to have a low detection rate.
Some prior art improves the characterization of the contradiction: a connectivity graph of the capture devices is built from the track data of all portrait files, the edges of the graph represent the connection probability between two devices, and a low connection probability replaces the original movement distance criterion. This approach is more expressive, but in practice it still considers only a single dimension, so the detection rate of abnormal portrait files remains low.
The invention improves on the above in two respects. First, files are characterized from more dimensions, including but not limited to the basic attribute distribution of the portrait pictures, the capture time pattern, the capture space pattern, the characteristics of front and back track points, the feature similarity among portrait pictures, and the position change pattern of the portrait detection box in the background image. Second, a decision model is trained with a machine learning method to fuse the multidimensional features and finally decide whether a file is abnormal.
The aim is to identify abnormal files in the file repository effectively and to improve the detection rate. After the file information is acquired, file features, in particular abnormal features, are extracted from a plurality of dimensions, and a pre-trained machine learning model fuses these multidimensional features to predict the probability that a file is abnormal, so that more abnormal files are detected.
In a first aspect, an embodiment of the present invention discloses a method for detecting an abnormal portrait file. As shown in fig. 2, the method includes the following steps:
S1, acquiring file information;
Specifically, because computing resources are limited, files are acquired periodically in an offline manner in this embodiment, which raises the question of the order in which files are selected. This embodiment provides the following selection strategy: (1) select files with a larger number of pictures; (2) select files with lower file cohesion; (3) select files that have never undergone anomaly detection first, and then files that were judged normal at their last detection, starting with the earliest detection time.
The acquired file content includes the spatio-temporal information of the pictures, the face attribute recognition results, the position coordinates of the portrait detection box on the background image, and the like.
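The selection strategy above can be sketched as a sort key. This is only an illustrative sketch: the source does not say how the three criteria are combined, so the priority order used here (never-checked files first, then oldest check time, then lower cohesion, then more pictures) is an assumption, and the `ArchiveMeta` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArchiveMeta:
    """Hypothetical per-file metadata used only for ordering."""
    archive_id: str
    picture_count: int
    cohesion: float
    last_checked: Optional[float]  # Unix time of the last check, None if never checked


def selection_order(archives: List[ArchiveMeta]) -> List[ArchiveMeta]:
    """Order files for offline detection (assumed combination of the three rules)."""
    def key(a: ArchiveMeta):
        never_checked = 0 if a.last_checked is None else 1   # unchecked files first
        checked_at = a.last_checked if a.last_checked is not None else 0.0
        # Then: earliest check time, lower cohesion, larger picture count.
        return (never_checked, checked_at, a.cohesion, -a.picture_count)
    return sorted(archives, key=key)


archives = [
    ArchiveMeta("a", 10, 0.9, 100.0),
    ArchiveMeta("b", 50, 0.5, None),
    ArchiveMeta("c", 5, 0.8, 50.0),
    ArchiveMeta("d", 80, 0.5, None),
]
ordered_ids = [a.archive_id for a in selection_order(archives)]
```

A weighted score over the three criteria would be an equally plausible reading; the tuple key is used here only because it keeps the precedence explicit.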
S2, describing and extracting file features through a feature engineering module;
In this embodiment, six categories of practical file features are provided: the basic attribute distribution of the portrait pictures, the capture time pattern, the capture space pattern, the characteristics of front and back track points, the feature similarity among portrait pictures, and the position change pattern of the portrait detection box in the background image.
Specifically:
Basic attribute distribution of the portrait pictures: includes, but is not limited to, the number of pictures in the file, the number of days between the last appearance and the current day, the age-group distribution and gender distribution from face attribute recognition, and the mean and variance of picture quality.
Capture time pattern. Number of abnormally long continuous capture periods: captures are regarded as continuous while the time difference between adjacent track points stays within a threshold, which in this embodiment may be on the order of 1 hour; if the duration of a continuous period reaches a preset threshold, the period is regarded as abnormal and the count is increased by 1;
Number of abnormally long continuous capture periods involving multiple devices: compared with the previous feature, the duration threshold may be lower, but a constraint that the period involves more devices is added.
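The two time-pattern counters above can be sketched as one function. This is a minimal sketch under stated assumptions: the 1-hour gap follows the embodiment, while the 12-hour duration threshold, the function name `count_long_runs`, and its parameters are hypothetical choices made for illustration.

```python
from typing import Optional, Sequence


def count_long_runs(
    times: Sequence[float],
    devices: Optional[Sequence[str]] = None,
    max_gap_s: float = 3600.0,            # gap treated as "continuous" (about 1 hour)
    min_duration_s: float = 12 * 3600.0,  # assumed duration threshold
    min_devices: int = 1,                 # extra constraint for the multi-device variant
) -> int:
    """Count continuous capture periods whose duration reaches the threshold."""
    if not times:
        return 0
    order = sorted(range(len(times)), key=lambda i: times[i])

    def qualifies(run) -> bool:
        if times[run[-1]] - times[run[0]] < min_duration_s:
            return False
        if devices is None:
            return True
        return len({devices[i] for i in run}) >= min_devices

    count = 0
    run = [order[0]]
    for prev, cur in zip(order, order[1:]):
        if times[cur] - times[prev] <= max_gap_s:
            run.append(cur)          # still the same continuous period
        else:
            if qualifies(run):
                count += 1
            run = [cur]              # a gap closes the period and starts a new one
    if qualifies(run):
        count += 1
    return count


# One 13-hour run sampled every 30 minutes, then a short unrelated run.
times = [i * 1800 for i in range(27)] + [200000, 200600]
single_device = ["d1"] * len(times)
two_devices = ["d1" if i % 2 == 0 else "d2" for i in range(len(times))]
```

Calling the function with `min_devices=2` and a lower `min_duration_s` gives the multi-device variant described in the text.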
Capture space pattern. Number of capture devices: the distinct capture devices of the portrait pictures in the file are counted;
Number of capture regions: the capture devices of the portrait pictures in the file are mapped to capture regions, and the distinct regions are counted. A region may be defined as the street where the device is located, or the longitude and latitude of the device may be encoded into a character string with the GeoHash address encoding technique to divide the regions.
Number of active sites: a site may be defined as the street where the device is located, or by encoding the longitude and latitude of the device into a character string with the GeoHash address encoding technique. A site counts as active when its number of capture days meets a threshold. Since an ordinary person does not have many active sites, an excessive number of active sites often accompanies an abnormality.
Active site dispersion: to characterize how dispersed the active sites are, one approach in this embodiment is to use the diagonal length of the smallest rectangle that covers the active sites.
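The spatial features above can be sketched with a plain GeoHash encoder. This is an illustrative sketch, not the patent's implementation: the GeoHash precision of 6, the 3-day activity threshold, the haversine distance used for the rectangle diagonal, and the names `geohash_encode` and `spatial_features` are all assumptions.

```python
import math
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Standard GeoHash: interleave longitude/latitude bisection bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    bits, use_lon = [], True
    while len(bits) < precision * 5:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bits.append(1 if lon >= mid else 0)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bits.append(1 if lat >= mid else 0)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        idx = 0
        for b in bits[i:i + 5]:
            idx = idx * 2 + b
        chars.append(_BASE32[idx])
    return "".join(chars)


def haversine_km(lat1, lon1, lat2, lon2) -> float:
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(min(1.0, math.sqrt(a)))


def spatial_features(points, precision: int = 6, min_active_days: int = 3) -> dict:
    """points: iterable of (device_id, lat, lon, capture_day)."""
    days, coords, devices = defaultdict(set), defaultdict(list), set()
    for dev, lat, lon, day in points:
        cell = geohash_encode(lat, lon, precision)
        devices.add(dev)
        days[cell].add(day)
        coords[cell].append((lat, lon))
    active = [c for c, d in days.items() if len(d) >= min_active_days]
    dispersion = 0.0
    if active:
        lats = [la for c in active for la, _ in coords[c]]
        lons = [lo for c in active for _, lo in coords[c]]
        # Diagonal of the smallest latitude/longitude rectangle covering active sites.
        dispersion = haversine_km(min(lats), min(lons), max(lats), max(lons))
    return {"device_count": len(devices), "region_count": len(days),
            "active_site_count": len(active), "active_site_dispersion_km": dispersion}


points = [
    ("d1", 24.48, 118.08, "2023-01-01"),
    ("d1", 24.48, 118.08, "2023-01-02"),
    ("d1", 24.48, 118.08, "2023-01-03"),
    ("d2", 24.60, 118.20, "2023-01-01"),
]
features = spatial_features(points)
```

Using the street name instead of a GeoHash cell only changes how `cell` is computed; the counting logic stays the same.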
Characteristics of front and back track points. Definition: the pictures in the file are sorted by capture time, and each pair of adjacent pictures is called a pair of front and back track points.
Number of short-time, excessive-distance events between front and back track points: a person's movement speed is limited, so if the time difference between a pair of front and back track points is short but the distance between their coordinates is very large, the implied speed clearly exceeds the upper limit of normal movement, which often indicates a risk of abnormality. In practice, travel by car or subway also produces high speeds, so when this count is calculated, such scenarios can be filtered out, or counted separately, according to the location type of the capture device;
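The short-time, excessive-distance count above can be sketched as follows. The sketch assumes a great-circle (haversine) distance, a 10-minute gap, a 30 km/h speed limit, and a hypothetical set of device location types to filter; none of these values come from the source.

```python
import math
from typing import FrozenSet, Sequence, Tuple

# (timestamp_s, latitude, longitude, device_location_type); layout is an assumption.
TrackPoint = Tuple[float, float, float, str]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0088 * math.asin(min(1.0, math.sqrt(a)))


def count_speed_violations(
    track: Sequence[TrackPoint],
    max_gap_s: float = 600.0,        # assumed "short time" window
    max_speed_kmh: float = 30.0,     # assumed upper bound on normal movement speed
    excluded_types: FrozenSet[str] = frozenset({"subway", "highway"}),
) -> int:
    """Count adjacent track-point pairs that imply an implausible speed."""
    points = sorted(track)
    count = 0
    for (t1, la1, lo1, k1), (t2, la2, lo2, k2) in zip(points, points[1:]):
        if k1 in excluded_types or k2 in excluded_types:
            continue                  # vehicle scenarios are filtered out
        dt = t2 - t1
        if dt > max_gap_s:
            continue                  # not a short-time pair
        # Compare distance with the farthest reachable distance; avoids dividing by dt.
        if haversine_km(la1, lo1, la2, lo2) > max_speed_kmh * dt / 3600.0:
            count += 1
    return count


walk = [(0, 24.0, 118.0, "street"), (60, 25.0, 118.0, "street")]
ride = [(0, 24.0, 118.0, "subway"), (60, 25.0, 118.0, "street")]
slow = [(0, 24.0, 118.0, "street"), (7200, 25.0, 118.0, "street")]
```

Counting the excluded pairs in a second counter instead of skipping them would give the "calculated separately" option mentioned in the text.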
Number of low-connectivity device transitions between front and back track points: a device connectivity graph must be built in advance. It is built from the front and back track points of all portrait files over one month (or a longer period); if the two devices of a pair of front and back track points differ and the time difference is within a short threshold, one connection is counted between those two devices;
The number of track points captured by each device is counted at the same time. Finally, the connectivity of two devices can be defined as:
$\text{connectivity} = \dfrac{\text{number of connections}}{\text{average (or larger) number of track points of the two devices}}$
The device connectivity graph may be updated periodically, for example once a week. With the graph available, if the time difference between a pair of front and back track points is within the threshold and the connectivity of their devices on the graph is below a threshold, the low-connectivity count of the file is increased by 1.
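The connectivity construction above can be sketched with two counters. This is a sketch under assumptions: the 10-minute gap and the 0.05 connectivity threshold are illustrative values, tracks are given as `(timestamp_s, device_id)` pairs, and the function names are hypothetical. The denominator can be either the larger or the average track-point count, as the text allows.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

Track = Sequence[Tuple[float, str]]  # (timestamp_s, device_id)


def build_connectivity(tracks: Iterable[Track], max_gap_s: float = 600.0):
    """Count device-to-device connections and per-device track points."""
    links, points = Counter(), Counter()
    for track in tracks:
        ordered = sorted(track)
        for _, dev in ordered:
            points[dev] += 1
        for (t1, d1), (t2, d2) in zip(ordered, ordered[1:]):
            if d1 != d2 and t2 - t1 <= max_gap_s:
                links[frozenset((d1, d2))] += 1   # undirected edge
    return links, points


def connectivity(links, points, a: str, b: str, use_max: bool = True) -> float:
    """connections / (larger or average track-point count of the two devices)."""
    denom = max(points[a], points[b]) if use_max else (points[a] + points[b]) / 2
    return links[frozenset((a, b))] / denom if denom else 0.0


def count_low_connectivity(track: Track, links, points,
                           max_gap_s: float = 600.0,
                           min_connectivity: float = 0.05) -> int:
    """Count adjacent pairs in one file whose devices are rarely connected."""
    ordered = sorted(track)
    count = 0
    for (t1, d1), (t2, d2) in zip(ordered, ordered[1:]):
        if d1 != d2 and t2 - t1 <= max_gap_s:
            if connectivity(links, points, d1, d2) < min_connectivity:
                count += 1
    return count


all_tracks = [
    [(0, "A"), (100, "B")],
    [(0, "A"), (200, "B")],
    [(0, "A"), (50, "C")],
]
links, points = build_connectivity(all_tracks)
```

In a periodic refresh, `build_connectivity` would simply be rerun over the most recent month of tracks.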
Feature similarity between portrait pictures. File cohesion (the average feature-vector similarity of any two pictures in the file): the file center feature can be obtained by averaging the feature vectors of the portrait pictures in the file. Denoting the file center feature vector as $x = (x_1, x_2, \ldots, x_n)$, the cohesion simplifies mathematically to the inner product of $x$ with itself, namely $\langle x, x \rangle = \sum_{i=1}^{n} x_i^2$.
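The simplification of cohesion to an inner product can be checked numerically. The sketch assumes that similarity is the dot product of L2-normalized feature vectors and that the average runs over all ordered pairs including each picture with itself; under those assumptions the average pairwise similarity equals the inner product of the center vector with itself exactly. The function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def center(vectors: Sequence[Vector]) -> List[float]:
    """File center feature: component-wise mean of the picture feature vectors."""
    m = len(vectors)
    return [sum(col) / m for col in zip(*vectors)]


def cohesion_pairwise(vectors: Sequence[Vector]) -> float:
    """Average dot-product similarity over all ordered pairs (self-pairs included)."""
    m = len(vectors)
    return sum(dot(a, b) for a in vectors for b in vectors) / (m * m)


def cohesion_center(vectors: Sequence[Vector]) -> float:
    """Same quantity in O(m) time: inner product of the center with itself."""
    c = center(vectors)
    return dot(c, c)


# Three unit-length toy feature vectors.
feats = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
slow_value = cohesion_pairwise(feats)
fast_value = cohesion_center(feats)
```

The center form matters in practice because it avoids the quadratic number of pairwise comparisons for files with many pictures.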
Number of low-similarity picture pairs among front and back track points with a spatio-temporal contradiction: when a pair of front and back track points meets the short-time large-distance condition or the low-connectivity condition described above, the feature similarity of the corresponding pictures is calculated, and if it is below a threshold the count is increased by 1.
The position change rule of the portrait detection frame in the background image, and the relative position of the portrait detection frame is unchanged, so that the number of devices is: static targets such as billboards often have a fixed-position attribute, and in order to characterize anomalies thereof, the feature that the position coordinates of the portrait small-image detection frame on the background image are almost unchanged can be used for characterization. In this embodiment, the IOU index in the target detection field is borrowed to describe:
wherein, molecule S Traffic intersection Is the intersection of all detection frames, denominator S And is combined with Is the union of all the detection frames.
The IOU metric is calculated over the portrait thumbnails in the file that were captured by the same device. When the IOU is large, the faces in the file captured by that device can be considered almost stationary, and the file's count for this metric is incremented by 1.
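The multi-frame IOU, taken as the area of the intersection of all detection frames divided by the area of their union, can be sketched as follows. Boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples in background-image coordinates; the grid-based union computation, the `min_boxes` guard, and all names are illustrative choices rather than the patent's own implementation.

```python
def intersection_area(boxes):
    """Area common to ALL boxes (zero if they do not all overlap)."""
    x1 = max(b[0] for b in boxes)
    y1 = max(b[1] for b in boxes)
    x2 = min(b[2] for b in boxes)
    y2 = min(b[3] for b in boxes)
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def union_area(boxes):
    """Area covered by at least one box, via coordinate compression."""
    xs = sorted({v for b in boxes for v in (b[0], b[2])})
    ys = sorted({v for b in boxes for v in (b[1], b[3])})
    area = 0.0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            # A grid cell counts once if any box fully covers it.
            if any(b[0] <= xa and xb <= b[2] and b[1] <= ya and yb <= b[3]
                   for b in boxes):
                area += (xb - xa) * (yb - ya)
    return area


def multi_box_iou(boxes):
    u = union_area(boxes)
    return intersection_area(boxes) / u if u > 0 else 0.0


def static_device_count(boxes_by_device, iou_threshold, min_boxes=2):
    """Number of devices whose detection frames barely move within one file."""
    return sum(
        1
        for boxes in boxes_by_device.values()
        if len(boxes) >= min_boxes and multi_box_iou(boxes) >= iou_threshold
    )


file_boxes = {
    "cam1": [(0, 0, 10, 10), (0, 0, 10, 10.5)],   # nearly identical frames
    "cam2": [(0, 0, 10, 10), (50, 50, 60, 60)],   # frames far apart
}
n_static = static_device_count(file_boxes, iou_threshold=0.9)
```

The coordinate-compression loop is cubic in the number of boxes, which is acceptable for a sketch; a sweep-line algorithm would be a better fit for devices with many captures.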
S3, predicting by adopting a pre-trained machine learning model, and giving out a probability value that the file belongs to an abnormality;
specifically, a machine learning model trained in advance is adopted, file features extracted by a feature engineering module are input, a probability value of file abnormality is output, and if the probability value reaches a preset threshold value, the file is identified as an abnormal file.
The pre-trained machine learning model may be a classification model built with an ensemble learning method such as random forest, XGBoost, or LightGBM.
In this embodiment, the preset threshold may be determined from the model's performance on the validation set during offline training. Practical scenarios place more weight on precision, and recall should be raised as far as possible while precision stays high. It is therefore recommended to use, as the preset threshold, the probability threshold that maximizes F0.5 or F1 under the constraint that precision is not lower than 99%.
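The threshold selection can be sketched as a scan over candidate thresholds on the validation set. The sketch reads the 99% constraint as a constraint on precision (an interpretation suggested by the pairing with recall and F0.5), treats label 1 as "abnormal", and uses each observed score as a candidate threshold; these choices and the function names are assumptions.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pick_threshold(scores, labels, min_precision=0.99, beta=0.5):
    """Return the threshold maximizing F-beta subject to precision >= min_precision.

    scores: predicted abnormal probabilities on the validation set.
    labels: 1 for abnormal files, 0 for normal files.
    Returns None when no candidate threshold satisfies the constraint.
    """
    best = None  # (f_beta, threshold)
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        if precision < min_precision:
            continue
        f = f_beta(precision, recall, beta)
        if best is None or f > best[0]:
            best = (f, t)
    return None if best is None else best[1]


val_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.2]
val_labels = [1, 1, 0, 1, 0, 0]
threshold = pick_threshold(val_scores, val_labels)
```

In this toy validation set, only thresholds 0.9 and 0.95 keep precision at 100%, and 0.9 wins because it recalls two of the three abnormal files instead of one.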
Besides feature engineering, sample quality also affects model performance. In practice, manually labeling files is time-consuming, and abnormal files are relatively hard to obtain. In that case, an abnormal file may be constructed by merging two normal files. The two normal files may also be chosen by file retrieval: the file in the retrieval results that has the highest similarity but belongs to a different target is taken as the merge partner, so that the constructed samples are of higher quality and more useful for model training.
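The sample construction can be sketched as a nearest-neighbour merge. The sketch assumes every input file is a verified normal file of a distinct person, represents each file only by its picture feature vectors, and uses cosine similarity between file centers as the retrieval score; a real system would presumably use its existing file search service instead. All names are hypothetical.

```python
import math


def center(features):
    """Component-wise mean of a file's picture feature vectors."""
    m = len(features)
    return [sum(col) / m for col in zip(*features)]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def synthetic_anomalies(files):
    """Merge each normal file with its most similar OTHER file.

    files: dict mapping file id -> list of feature vectors (assumed to be
           verified normal files belonging to different people).
    Returns a list of positive (abnormal) training samples.
    """
    centers = {fid: center(feats) for fid, feats in files.items()}
    samples, seen = [], set()
    for a in files:
        others = [b for b in files if b != a]
        if not others:
            continue
        # Hard negative: the look-alike file is the most confusable partner.
        b = max(others, key=lambda o: cosine(centers[a], centers[o]))
        pair = frozenset((a, b))
        if pair in seen:  # avoid emitting the same merged pair twice
            continue
        seen.add(pair)
        samples.append({
            "sources": tuple(sorted(pair)),
            "features": files[a] + files[b],
            "label": 1,
        })
    return samples


normal_files = {
    "p1": [[1.0, 0.0], [0.9, 0.1]],
    "p2": [[0.8, 0.2]],
    "p3": [[0.0, 1.0]],
}
samples = synthetic_anomalies(normal_files)
```

Merging look-alike files rather than random pairs yields samples that resemble the mistakes a clustering system actually makes, which is the quality argument made in the paragraph above.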
S4, determining the detected abnormal file.
Specifically, once a file whose confidence reaches the preset threshold is identified as an abnormal file, the file may be deleted, or logically deleted. Alternatively, an early-warning push may submit the file for manual verification; the verification results are collected and fed back into model training to support iterative model optimization.
In a second aspect, an embodiment of the present invention further discloses a system for detecting an abnormal portrait file, as shown in fig. 3, including: the system comprises an acquisition module 31, a characteristic engineering module 32, a prediction module 33 and an abnormal archive determination module 34.
In a specific embodiment, the acquisition module 31 is configured to acquire file information; the feature engineering module 32 is configured to characterize and extract file features; the prediction module 33 is configured to perform prediction using a pre-trained machine learning model and give a probability value that the file is abnormal; and the abnormal file determination module 34 is configured to determine the detected abnormal file.
The invention provides a method and a system that characterize the abnormal features of a portrait file along multiple dimensions. A machine learning method fuses the information from each dimension and calculates the probability that the file is abnormal, so that abnormal files are detected with a higher detection rate.
By characterizing a file's abnormal features along multiple dimensions, fusing them with a machine learning model, and outputting the probability that the file is abnormal, abnormal files can be detected more accurately and with a higher detection rate. This improves the performance of the portrait file gathering system and the quality of the file gathering data, and further improves the effect of technical and tactical methods based on file trajectories, such as association analysis, trajectory analysis, footage analysis, urban population perception, and analysis and judgment in digital map fusion scenarios.
Referring now to FIG. 4, there is illustrated a schematic diagram of a computer apparatus 600 suitable for use in an electronic device (e.g., the server or terminal device illustrated in FIG. 1) for implementing an embodiment of the present invention. The electronic device shown in fig. 4 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the invention.
As shown in fig. 4, the computer apparatus 600 includes a Central Processing Unit (CPU) 601 and a Graphics Processor (GPU) 602, which can perform various appropriate actions and processes according to programs stored in a Read Only Memory (ROM) 603 or programs loaded from a storage section 609 into a Random Access Memory (RAM) 604. In the RAM 604, various programs and data required for the operation of the apparatus 600 are also stored. The CPU 601, GPU602, ROM 603, and RAM 604 are connected to each other through a bus 605. An input/output (I/O) interface 606 is also connected to the bus 605.
The following components are connected to the I/O interface 606: an input portion 607 including a keyboard, a mouse, and the like; an output portion 608 including a display such as a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 609 including a hard disk and the like; and a communication section 610 including a network interface card such as a LAN card, a modem, or the like. The communication section 610 performs communication processing via a network such as the Internet. A drive 611 may also be connected to the I/O interface 606 as needed. A removable medium 612 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 611 as needed, so that a computer program read from it can be installed into the storage section 609 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 610, and/or installed from the removable medium 612. The above-described functions defined in the method of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 601 and a Graphics Processor (GPU) 602.
It should be noted that the computer readable medium according to the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: responding to the acquisition of the archive information; the file features are characterized and extracted through a feature engineering module; predicting by adopting a pre-trained machine learning model to give a probability value that the file belongs to an abnormality; and determining the detected abnormal file.
The above description is only illustrative of the preferred embodiments of the present invention and of the principles of the technology employed. Persons skilled in the art will appreciate that the scope of the invention is not limited to technical solutions formed by the specific combination of the technical features described above, but also covers other technical solutions formed by any combination of those technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by interchanging the above features with (but not limited to) technical features of similar function disclosed in the present invention.

Claims (10)

1. The method for detecting the abnormal portrait file is characterized by comprising the following steps:
responding to the acquisition of the archive information;
the file features are characterized and extracted through a feature engineering module;
predicting by adopting a pre-trained machine learning model to give a probability value that the file belongs to an abnormality;
and determining the detected abnormal file.
2. The method for detecting an abnormal portrait file according to claim 1, wherein acquiring file information includes acquiring file periodically in an offline manner through a preset selection policy, wherein file acquisition content includes picture space-time information, face attribute identification content, and position coordinate information of a portrait small picture detection frame in a background picture, and the preset selection policy specifically includes:
selecting files with larger picture numbers;
selecting files with lower file cohesion;
firstly, selecting files which are not subjected to abnormal detection, and then selecting files which are detected normally last time but have earliest detection time.
3. The method of claim 2, wherein the characterizing and extracting the profile features by the feature engineering module includes classifying the profile features into six categories, and the method comprises:
basic attribute distribution of portrait pictures, including but not limited to the number of archives pictures, the number of days when the last appearance time is from the current day, age group distribution of face attribute recognition, gender distribution and archives picture average quality and variance;
acquisition time pattern; number of times the duration of continuous capture is abnormally high: when the time difference between front and back track points is within a threshold, the capture is considered continuous, and when the duration reaches a preset threshold it is considered abnormal and the count is incremented by 1;
number of times the duration of continuous capture is abnormally high and multiple devices appear: the threshold is set lower, and a constraint that more devices appear is added;
acquisition spatial pattern; number of acquisition devices: the acquisition devices of the portrait pictures in the file are counted after deduplication;
number of acquisition regions: the acquisition devices of the portrait pictures in the file are converted into acquisition regions and then counted after deduplication, wherein a region is defined as the street where the device is located, or regions are divided by encoding the longitude and latitude of the device into a character string using the GeoHash address coding technique;
number of active places: a place is defined as the street where the device is located, or places are divided by converting the longitude and latitude of the device into a character string using the GeoHash address coding technique, and a place is defined as active when its number of capture days meets a threshold;
active place dispersion: the diagonal length of the smallest rectangle that covers the active places is used to describe how dispersed the active places are;
front and back track point features: the pictures in the file are sorted by acquisition time, and each adjacent pair of pictures is called front and back track points; number of times front and back track points are separated by a very large distance within a short time: scenes of riding in cars or subways are filtered out, or their occurrences are counted separately, according to the location type of the acquisition device;
number of times the device connectivity of front and back track points is low: a device connectivity map is constructed from the front and back track points of all portrait files over a period of one month or longer; if the devices of the front and back track points differ and the time difference is within a short threshold, one connection is counted for the two corresponding devices;
the number of track points collected by each device is counted at the same time;
finally, the connectivity of two devices is defined as the number of connections / (the average number of track points of the two devices, or the larger of the two numbers); the device connectivity map is updated periodically, for example once a week;
if the time difference between the front and back track points is within the threshold and the connectivity of the two devices on the connectivity map is below the threshold, the file's low-connectivity count for front and back track points is incremented by 1;
feature similarity between portrait pictures; file cohesion is the average feature-vector similarity of any two pictures in the file: the file center feature is obtained by averaging the feature vectors of the portrait pictures, the file center feature vector is denoted $x = (x_1, x_2, \ldots, x_n)$, and the cohesion simplifies to the inner product of $x$ with itself, namely $\langle x, x \rangle = \sum_{i=1}^{n} x_i^2$;
number of times the pictures at spatio-temporally contradictory front and back track points have low feature similarity: when a pair of front and back track points meets the short-time large-distance condition or the low-connectivity condition of the front and back track point features, the feature similarity of the corresponding pictures is calculated, and if the similarity is below a threshold, the count is incremented by 1;
position change pattern of the portrait detection frame in the background image; number of devices for which the relative position of the portrait detection frame is unchanged: a static target is characterized by the fact that the position coordinates of the portrait thumbnail detection frame on the background image remain almost unchanged, described by the IOU metric:
$\mathrm{IOU} = S_{\cap} / S_{\cup}$
wherein the numerator $S_{\cap}$ represents the intersection of all detection frames and the denominator $S_{\cup}$ represents the union of all detection frames; the IOU metric is calculated over the portrait thumbnails in the file that belong to the same device, and when the IOU is large, the faces in the file captured by that device are considered almost stationary and the count is incremented by 1.
4. A method of detecting an abnormal portrait archive according to claim 3, wherein predicting using a pre-trained machine learning model, giving a probability value that the archive belongs to an abnormality includes:
and (3) inputting the file features extracted by the feature engineering module by adopting a pre-trained machine learning model, outputting the abnormal probability value of the file, and identifying the file as an abnormal file if the probability value reaches a preset threshold value.
5. The method of claim 4, wherein the pre-trained machine learning model uses an ensemble learning model such as random forest, XGBoost and LightGBM to build classification models.
6. The method for detecting an abnormal portrait file according to claim 5, wherein the preset threshold is determined from the effect on a validation set during offline training of the machine learning model, and the probability threshold that maximizes F0.5 or F1 under the constraint that precision is not lower than 99% is recommended as the preset threshold;
an abnormal file is constructed by merging two normal files; the two normal files are selected by file retrieval, taking the non-identical target file with the highest similarity in the retrieval results as the merge partner.
7. The method for detecting an abnormal portrait file according to claim 6 further comprising:
after a file whose confidence reaches the preset threshold is identified as an abnormal file, the file is deleted or logically deleted, or the file is submitted for manual verification through an early-warning push, and the verification results are collected and fed back into model training to achieve iterative model optimization.
8. A system for detecting an abnormal portrait file, comprising:
the acquisition module is configured to acquire archive information;
the feature engineering module is configured to characterize and extract file features;
the prediction module is configured to predict by adopting a pre-trained machine learning model and give a probability value that the file belongs to an abnormality;
and the abnormal file determining module is configured to determine the detected abnormal file.
9. An electronic device, comprising:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1 to 7.
CN202311450400.8A 2023-11-02 2023-11-02 Abnormal portrait file detection method and system Pending CN117671440A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311450400.8A CN117671440A (en) 2023-11-02 2023-11-02 Abnormal portrait file detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311450400.8A CN117671440A (en) 2023-11-02 2023-11-02 Abnormal portrait file detection method and system

Publications (1)

Publication Number Publication Date
CN117671440A true CN117671440A (en) 2024-03-08

Family

ID=90077917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311450400.8A Pending CN117671440A (en) 2023-11-02 2023-11-02 Abnormal portrait file detection method and system

Country Status (1)

Country Link
CN (1) CN117671440A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117931738A (en) * 2024-03-21 2024-04-26 南京启数智能系统有限公司 Portrait file track treatment method and system based on road network reachability
CN117931738B (en) * 2024-03-21 2024-06-07 南京启数智能系统有限公司 Portrait file track treatment method and system based on road network reachability



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination