CN113095342A - Audit model optimization method and device based on misjudged sample picture and server
- Publication number
- CN113095342A (application number CN201911340682.XA)
- Authority
- CN
- China
- Prior art keywords
- feature vector
- category
- feature
- misjudgment
- optimized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
Abstract
The embodiments of the invention provide an audit model optimization method and device based on misjudged sample pictures, and a server. The method comprises the following steps: acquiring a feature vector of a misjudged sample picture; clustering the feature vectors of the misjudged sample pictures to obtain feature vector data of N categories; acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories; performing parallel training on an audit model to be optimized by using the N groups of training data sets; and determining an optimal audit model from the trained audit models to be optimized. According to the embodiments of the invention, automatic optimization training of the audit model based on misjudged sample pictures can be realized, so that emergencies in online content auditing can be responded to promptly and labor and time costs can be reduced.
Description
Technical Field
The present invention relates to the field of computer vision recognition technologies, and in particular, to a method for optimizing an audit model based on a misjudged sample (badcase) picture, an apparatus for optimizing an audit model based on a misjudged sample picture, a server, and a computer-readable storage medium.
Background
With the rapid development of internet technology, there are more and more video resources on the network. The quality of these videos varies widely: for example, some video clips contain content such as pornography or severe violence, and some videos have copyright problems. To filter out videos containing objectionable content, the content of the videos needs to be audited.
When content is audited, an audit model is usually used to audit pictures and judge whether they contain illegal or non-compliant content. However, the audit model may misjudge or miss some pictures, in which case picture data of misjudged samples fed back by users can be received. How to analyze the picture data of the misjudged samples so as to optimize the audit model in a targeted way is therefore important.
At present, the analysis of misjudged samples relies mainly on manual review by operators: the operators summarize the picture categories of the misjudged samples, then collect pictures of those specific categories offline and label them manually to generate a training set, so that the audit model can be trained in a targeted way.
However, this way of optimizing the audit model based on the analysis of misjudged samples is still heavily manual. It consumes considerable labor and time, has a long development cycle, and cannot respond promptly to emergencies in online content auditing. Therefore, there is a need for a new method for automatic optimization training of an audit model based on misjudged samples.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a new technical solution for auditing model optimization based on misjudged sample pictures.
According to a first aspect of the present invention, there is provided a method for optimizing an audit model based on a misjudged sample picture, the method including:
acquiring a feature vector of a misjudgment sample picture;
clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories;
acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
performing parallel training on an audit model to be optimized by utilizing the N groups of training data sets;
and determining an optimal audit model from the trained audit models to be optimized.
Optionally, the obtaining the feature vector of the misjudgment sample picture includes:
and extracting the characteristics of each misjudgment sample picture to obtain a characteristic vector corresponding to each misjudgment sample picture.
Optionally, a hierarchical clustering method is adopted to cluster the feature vectors of the misjudged sample pictures.
Optionally, the clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories includes:
clustering the characteristic vectors of the misjudgment sample pictures to obtain clustering results of N categories of different levels;
calculating the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to each category;
determining the most frequent label in each category as the category label corresponding to that category;
and determining each of the N category feature vectors, together with its corresponding category label, as the feature vector data of that category.
Optionally, the obtaining, according to the feature vector data of the N categories, N corresponding sets of training data sets from a feature database includes:
for each category of feature vector data, matching feature vectors in the category of feature vector data with feature vectors in the feature database to obtain a plurality of image data meeting a preset similarity threshold;
and enabling each picture data to respectively form training data in a picture-class format with the class in the feature vector data of the class, and obtaining a training data set corresponding to the feature vector data of the class.
Optionally, the determining an optimal audit model from the trained audit models to be optimized includes:
running N trained auditing models to be optimized in parallel;
obtaining an evaluation index value of each trained to-be-optimized audit model; the evaluation index value comprises the average of the recall rate, the precision rate and the F1 value; the F1 value is the harmonic mean of the recall rate and the precision rate;
and selecting the trained to-be-optimized audit model corresponding to the optimal evaluation index value as the optimal audit model.
Optionally, before the obtaining the feature vector of the misjudged sample picture, the method further includes:
and filtering repeated misjudgment sample pictures according to the RGB value of each misjudgment sample picture.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for optimizing an audit model based on a misjudged sample picture, the apparatus including:
the acquisition module is used for acquiring the characteristic vector of the misjudged sample picture;
the clustering module is used for clustering the characteristic vectors of the misjudgment sample pictures to obtain characteristic vector data of N categories;
the matching module is used for acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
the training module is used for performing parallel training on the to-be-optimized auditing model by utilizing the N groups of training data sets;
and the determining module is used for determining an optimal auditing model from the trained auditing models to be optimized.
Optionally, the obtaining module is specifically configured to:
and extracting the characteristics of each misjudgment sample picture to obtain a characteristic vector corresponding to each misjudgment sample picture.
Optionally, the clustering module is specifically configured to: and clustering the characteristic vectors of the misjudgment sample pictures by adopting a hierarchical clustering method.
Optionally, the clustering module is specifically configured to:
clustering the characteristic vectors of the misjudgment sample pictures to obtain clustering results of N categories of different levels;
calculating the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to each category;
determining the most frequent label in each category as the category label corresponding to that category;
and determining each of the N category feature vectors, together with its corresponding category label, as the feature vector data of that category.
Optionally, the matching module is specifically configured to:
for each category of feature vector data, matching feature vectors in the category of feature vector data with feature vectors in the feature database to obtain a plurality of image data meeting a preset similarity threshold;
and enabling each picture data to respectively form training data in a picture-class format with the class in the feature vector data of the class, and obtaining a training data set corresponding to the feature vector data of the class.
Optionally, the determining module is specifically configured to:
running N trained auditing models to be optimized in parallel;
obtaining an evaluation index value of each trained to-be-optimized audit model; the evaluation index value comprises the average of the recall rate, the precision rate and the F1 value; the F1 value is the harmonic mean of the recall rate and the precision rate;
and selecting the trained to-be-optimized audit model corresponding to the optimal evaluation index value as the optimal audit model.
Optionally, the apparatus further includes a filtering module, configured to filter repeated misjudgment sample pictures according to the RGB values of each misjudgment sample picture.
According to a third aspect of the present invention, there is provided a server including the apparatus for optimizing an audit model based on a misjudged sample picture according to the second aspect of the present invention, or the server includes:
a memory for storing executable commands;
a processor, configured to execute the method for optimizing an audit model based on a misjudged sample picture according to any one of the first aspect of the present invention under the control of the executable command.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium storing executable instructions, which when executed by a processor, perform the method for optimizing an audit model based on a misjudged sample picture according to any one of the first aspect of the present invention.
According to one embodiment of the invention, automatic optimization training of the audit model based on misjudged sample pictures can be realized, so that emergencies in online content auditing can be responded to promptly and labor and time costs are reduced.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic structural diagram of a server to which an audit model optimization method based on a misjudged sample picture according to an embodiment of the present invention may be applied;
FIG. 2 is a flowchart of an audit model optimization method based on misjudged sample pictures according to an embodiment of the present invention;
FIG. 3 shows a schematic diagram of hierarchical clustering in accordance with an embodiment of the present invention;
FIG. 4 shows a schematic flow diagram of an example according to an embodiment of the invention;
FIG. 5 is a schematic structural diagram of an apparatus for optimizing an audit model based on a misjudged sample picture according to an embodiment of the present invention;
FIG. 6 is a functional block diagram of a server according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< hardware configuration >
Fig. 1 is a block diagram showing a hardware configuration of a server 1000 that can implement an embodiment of the present invention.
Server 1000 may be, for example, a blade server or the like.
In one example, server 1000 may be a computer.
In another example, the server 1000 may be as shown in fig. 1, including a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600. Although the server may also include speakers, microphones, etc., these components are not relevant to the present invention and are omitted here.
The processor 1100 may be, for example, a central processing unit CPU, a microprocessor MCU, or the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a serial interface, and the like. Communication device 1400 is capable of wired or wireless communication, for example. The display device 1500 is, for example, a liquid crystal display panel. The input device 1600 may include, for example, a touch screen, a keyboard, and the like.
The server shown in FIG. 1 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present invention, the memory 1200 of the server 1000 is configured to store instructions for controlling the processor 1100 to operate so as to execute any method for optimizing an audit model based on a misjudged sample picture according to an embodiment of the present invention.
It should be understood by those skilled in the art that although a plurality of devices are shown for the server 1000 in FIG. 1, the present invention may relate to only some of them, for example, only the processor 1100 and the memory 1200 of the server 1000.
The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
< method examples >
The embodiment provides an auditing model optimization method based on misjudged sample pictures, which may be implemented by a server, for example, the server may be the server 1000 shown in fig. 1.
As shown in FIG. 2, the method comprises the following steps 2100-2500:
Step 2100, acquiring a feature vector of a misjudged sample picture.
A misjudged sample picture is picture data of a bad case (badcase) fed back by a user when a misjudgment or a missed judgment occurs while pictures are being audited with the audit model.
Before this step, the server 1000 may obtain the misjudged sample pictures fed back by users and store them in a data warehouse. In a content auditing scenario such as live streaming, a large proportion of the pictures are duplicates; therefore, to reduce the computational load, the server 1000 cleans the misjudged sample pictures in the data warehouse and filters out the duplicates before obtaining the feature vectors of the misjudged sample pictures. In one example, the server 1000 may filter out duplicate misjudged sample pictures according to the RGB (red, green, blue) values of each misjudged sample picture.
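The RGB-based filtering described above can be sketched as follows. This is only a minimal illustration under stated assumptions: pictures are represented as rows of `(r, g, b)` tuples, and two pictures count as duplicates only when every RGB value matches exactly. The function names `rgb_fingerprint` and `filter_duplicates` are hypothetical and do not come from the source.

```python
import hashlib

def rgb_fingerprint(pixels):
    """Hash the dimensions and raw RGB values of a picture.

    `pixels` is a list of rows, each row a list of (r, g, b) tuples (assumed layout).
    """
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    digest = hashlib.sha256(f"{height}x{width}:".encode())
    for row in pixels:
        for r, g, b in row:
            digest.update(bytes((r, g, b)))
    return digest.hexdigest()

def filter_duplicates(pictures):
    """Keep only the first picture seen for each distinct RGB fingerprint."""
    seen = set()
    unique = []
    for picture in pictures:
        key = rgb_fingerprint(picture)
        if key not in seen:
            seen.add(key)
            unique.append(picture)
    return unique

# Two identical red pictures and one blue picture: one red copy is dropped.
red = [[(255, 0, 0), (255, 0, 0)]]
blue = [[(0, 0, 255), (0, 0, 255)]]
unique_pictures = filter_duplicates([red, blue, red])
```

Exact matching of this kind would not catch re-encoded or slightly shifted frames; whether the embodiment tolerates such near-duplicates is not stated in the text.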
After filtering out repeated misjudgment sample pictures, the server 1000 extracts the features of each misjudgment sample picture to obtain a feature vector corresponding to each misjudgment sample picture. For example, the corresponding feature vector may be extracted from the misjudgment sample picture by an artificial intelligence technique such as a neural network algorithm.
Step 2200, clustering the feature vectors of the misjudged sample pictures to obtain feature vector data of N categories.
In this step, to avoid having to learn the data distribution manually in advance, the feature vectors of the misjudged sample pictures are clustered with a hierarchical clustering method that does not require hyperparameters such as the number of clusters or a distance threshold as input. In practice, hierarchical clustering methods include bottom-up merging methods and top-down splitting methods; in this embodiment, a bottom-up merging method, for example the DBSCAN clustering algorithm, may be used to cluster the feature vectors of the misjudged sample pictures.
Specifically, the server 1000 clusters the feature vectors of the misjudged sample pictures by using the hierarchical clustering method to obtain clustering results at different levels. As shown in FIG. 3, cluster analysis of the feature vectors of the misjudged sample pictures yields the clustering results of level 1 (N classes), level 2 (m classes), …, and level T (k classes), where N > m > k > 2.
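To make the idea of multi-level clustering results concrete, here is a minimal sketch of bottom-up merging. It is an assumption-laden illustration, not the patented procedure: the text names DBSCAN as an example, but DBSCAN is usually classed as a density-based algorithm, so the sketch swaps in a plain agglomerative procedure with centroid distance. The function names and the `min_clusters` stopping parameter are hypothetical.

```python
import math

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def agglomerative_levels(vectors, min_clusters=2):
    """Merge the two closest clusters repeatedly; record every level.

    Each level is a list of clusters; each cluster is a list of sample indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > min_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([vectors[i] for i in clusters[a]])
                cb = centroid([vectors[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
    return levels

# Five toy feature vectors: two tight pairs and one isolated point.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
levels = agglomerative_levels(features)
```

Each entry of `levels` plays the role of one level in FIG. 3, with the number of categories shrinking from level to level. The quadratic pair search is acceptable for a toy input only.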
After obtaining the clustering results, the server 1000 processes the clustering result of each level in parallel: it calculates the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector of that category; determines the most frequent label in each category as the category label of that category; and takes each category feature vector together with its category label as the feature vector data of that category. Note that the labels are added by the user when feeding back the misjudged sample pictures.
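The per-category computation can be sketched as below. The sketch assumes that "median of the feature vectors" means the element-wise median, which the text does not spell out; the function name and the label strings are hypothetical.

```python
from collections import Counter
from statistics import median

def category_feature_data(vectors, labels):
    """Return (element-wise median vector, most frequent label) for one category."""
    dim = len(vectors[0])
    median_vector = [median(v[i] for v in vectors) for i in range(dim)]
    majority_label = Counter(labels).most_common(1)[0][0]
    return median_vector, majority_label

# Three misjudged pictures in one category, with user-supplied labels.
vectors = [[1.0, 4.0], [2.0, 6.0], [9.0, 5.0]]
labels = ["violence", "violence", "normal"]
category_vector, category_label = category_feature_data(vectors, labels)
```

Unlike a mean, the median keeps the outlying first coordinate (9.0) from pulling the category vector away from the bulk of the samples.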
Step 2300, acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories.
Specifically, for the feature vector data of each category, the server 1000 matches the category feature vector against the feature vectors in the feature database, for example by the distance similarity between feature vectors, to obtain a plurality of picture data meeting a preset similarity threshold. Each picture is then paired with the category label in that feature vector data to form training data in a picture-category format, yielding the training data set corresponding to that category's feature vector data. That is, each hierarchical clustering result outputs a corresponding group of training data sets to await training of the audit model.
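The matching step might look like the following sketch. The text only says "distance similarity"; cosine similarity, the threshold value 0.9, the dictionary-shaped feature database and the file names are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_training_set(category_vector, category_label, feature_db, threshold=0.9):
    """Pair every sufficiently similar database picture with the category label."""
    return [
        (picture_id, category_label)
        for picture_id, vector in feature_db.items()
        if cosine_similarity(category_vector, vector) >= threshold
    ]

# Hypothetical feature database: picture identifier -> stored feature vector.
feature_db = {
    "img_001.jpg": [1.0, 0.0],
    "img_002.jpg": [0.9, 0.1],
    "img_003.jpg": [0.0, 1.0],
}
training_set = build_training_set([1.0, 0.0], "violence", feature_db)
```

A linear scan is shown for clarity; a production feature database would more likely use an approximate nearest-neighbor index, which the text does not discuss.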
Step 2400, performing parallel training on the to-be-optimized audit model by using the N groups of training data sets.
Specifically, the N groups of training data sets obtained in the above steps are respectively input into the to-be-optimized audit model, and the to-be-optimized audit model is trained in parallel to obtain trained to-be-optimized audit models 1, 2, …, and N.
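A rough sketch of the fan-out is given below. The `train` function is a placeholder that only records label counts; a real implementation would fine-tune a copy of the audit model. The thread pool, the dictionary-shaped model and the label names are assumptions for illustration only.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

def train(base_model, training_set):
    """Placeholder training: copy the base model and count the labels it saw."""
    model = copy.deepcopy(base_model)
    for _picture, label in training_set:
        model["label_counts"][label] = model["label_counts"].get(label, 0) + 1
    return model

def train_in_parallel(base_model, training_sets):
    """Train one independent copy of the base model per training data set."""
    workers = max(1, len(training_sets))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda ts: train(base_model, ts), training_sets))

base_model = {"label_counts": {}}
training_sets = [
    [("img_001.jpg", "category_a"), ("img_002.jpg", "category_a")],
    [("img_003.jpg", "category_b")],
]
trained = train_in_parallel(base_model, training_sets)
```

The point of the sketch is that each training data set produces its own trained candidate while the base model stays untouched, so the candidates can be compared afterwards.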
Step 2500, determining an optimal audit model from the trained audit models to be optimized.
In this step, the server 1000 runs the N trained to-be-optimized audit models in parallel and obtains an evaluation index value for each of them. The evaluation index value comprises the average of the recall rate, the precision rate and the F1 value, where the F1 value is the harmonic mean of the recall rate and the precision rate. The trained to-be-optimized audit model corresponding to the optimal evaluation index value is selected as the optimal audit model and deployed online.
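The selection rule can be illustrated with a small sketch. It assumes a binary task with a positive class named "violation" and reads the translated "accuracy rate" as precision, since F1 is defined as the harmonic mean of precision and recall; the model names and predictions are made up for the example.

```python
def evaluate(predictions, truths, positive="violation"):
    """Average of precision, recall and F1 for the positive class."""
    pairs = list(zip(predictions, truths))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return (precision + recall + f1) / 3

def select_best(model_predictions, truths):
    """Score every candidate and return the best name with all scores."""
    scores = {name: evaluate(preds, truths) for name, preds in model_predictions.items()}
    return max(scores, key=scores.get), scores

truths = ["violation", "violation", "normal", "normal"]
model_predictions = {
    "model_1": ["violation", "normal", "normal", "normal"],
    "model_2": ["violation", "violation", "violation", "normal"],
}
best_name, scores = select_best(model_predictions, truths)
```

Here `model_1` has precision 1 and recall 0.5, while `model_2` has precision 2/3 and recall 1, so `model_2` obtains the higher averaged index and would be the one deployed.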
< example >
Fig. 4 shows a schematic flow diagram of an example according to an embodiment of the invention.
As shown in fig. 4, the method for optimizing an audit model based on a misjudged sample picture in this example may include the following steps:
For example, the repeated erroneous judgment sample pictures may be filtered according to the RGB values of each of the erroneous judgment sample pictures.
Step 4400, clustering the feature vectors of the misjudged sample pictures by using a hierarchical clustering method to obtain clustering results of N categories at different levels.
Step 4600, match the feature vectors in the feature vector data of the category with the feature vectors in the feature database to obtain a plurality of image data meeting a preset similarity threshold.
Step 4800, performing parallel training on the to-be-optimized audit model by using the N groups of training data sets.
Step 4900, running the N trained to-be-optimized audit models in parallel to obtain the evaluation index values corresponding to the N trained to-be-optimized audit models, and determining, based on the evaluation index values, the optimal audit model to deploy online.
The method for optimizing the audit model based on the misjudged sample picture according to the embodiment is described above with reference to the drawings and examples. The method of the embodiment obtains the characteristic vector of the misjudged sample picture; clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories; acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories; performing parallel training on the to-be-optimized audit model by using the N groups of training data sets; and determining an optimal audit model from the trained audit model to be optimized. According to the embodiment of the invention, the automatic optimization training of the auditing model based on the misjudged sample picture can be realized, so that the emergency of online content auditing can be responded in time, and the labor cost and the time cost can be reduced.
< apparatus embodiment >
The present embodiment provides an auditing model optimizing device based on misjudged sample pictures, which is, for example, the auditing model optimizing device 5000 based on misjudged sample pictures shown in fig. 5.
As shown in fig. 5, the apparatus 5000 for optimizing an audit model based on a misjudged sample picture may include: the device comprises an acquisition module 5100, a clustering module 5200, a matching module 5300, a training module 5400 and a determination module 5500.
The obtaining module 5100 is configured to obtain a feature vector of the misjudged sample picture.
The clustering module 5200 is configured to cluster the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories.
The matching module 5300 is configured to obtain N corresponding sets of training data sets from the feature database according to the feature vector data of the N categories.
The training module 5400 is configured to perform parallel training on the to-be-optimized audit model by using the N sets of training data sets.
The determining module 5500 is configured to determine an optimal audit model from the trained audit models to be optimized.
Specifically, the obtaining module 5100 may be configured to extract features of each misjudged sample picture to obtain a feature vector corresponding to each misjudged sample picture.
Optionally, the clustering module 5200 clusters the feature vectors of the misjudged sample pictures by using a hierarchical clustering method. Specifically, the clustering module 5200 can cluster the feature vectors of the misjudged sample pictures to obtain clustering results of N categories at different levels; calculate the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to each category; determine the most frequent label in each category as the category label corresponding to that category; and determine each of the N category feature vectors, together with its corresponding category label, as the feature vector data of that category.
In an example, the matching module 5300 may be specifically configured to, for each of the feature vector data of the category, match a feature vector in the feature vector data of the category with a feature vector in the feature database to obtain a plurality of image data meeting a preset similarity threshold; and enabling each picture data to respectively form training data in a picture-class format with the class in the feature vector data of the class, and obtaining a training data set corresponding to the feature vector data of the class.
In one example, the determining module 5500 is specifically configured to run the N trained to-be-optimized audit models in parallel; obtain an evaluation index value of each trained to-be-optimized audit model, where the evaluation index value comprises the average of the recall rate, the precision rate and the F1 value; and select the trained to-be-optimized audit model corresponding to the optimal evaluation index value as the optimal audit model.
Optionally, the device 5000 for optimizing the audit model based on the misjudged sample pictures may further include a filtering module, configured to filter out duplicate misjudged sample pictures according to the RGB (red, green, blue) values of each misjudged sample picture.
The device for optimizing the audit model based on the misjudged sample picture in this embodiment can be used for executing the technical scheme of the method embodiment, and the implementation principle and the technical effect are similar, and are not described herein again.
< server embodiment >
In this embodiment, a server is further provided, where the server may include the auditing model optimizing device 5000 based on the misjudged sample picture described in the device embodiment of the present invention; alternatively, the server is a server 6000 shown in fig. 6, and includes:
a memory 6100 for storing executable commands.
A processor 6200, configured to perform a method described in any method embodiment of the present invention under control of an executable command stored in a memory 6100.
< computer-readable storage Medium embodiment >
This embodiment provides a computer-readable storage medium storing executable instructions that, when executed by a processor, perform the method described in any of the method embodiments of the present invention.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.
Claims (10)
1. An auditing model optimization method based on misjudgment sample pictures is characterized by comprising the following steps:
acquiring a feature vector of a misjudgment sample picture;
clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories;
acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
performing parallel training on an audit model to be optimized by utilizing the N groups of training data sets;
and determining an optimal audit model from the trained audit models to be optimized.
2. The method according to claim 1, wherein the obtaining the feature vector of the misjudged sample picture comprises:
and extracting features from each misjudgment sample picture to obtain a feature vector corresponding to each misjudgment sample picture.
3. The method according to claim 1, wherein the feature vectors of the misjudged sample pictures are clustered by a hierarchical clustering method.
4. The method according to claim 3, wherein the clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories comprises:
clustering the feature vectors of the misjudgment sample pictures to obtain clustering results of N categories at different levels;
calculating the median of the feature vectors of the misjudgment sample pictures in each category to obtain the category feature vector corresponding to each category;
determining the most frequent label in each category as the category label corresponding to that category;
and respectively determining the N category feature vectors and the corresponding category labels as the feature vector data of the categories.
5. The method according to claim 1, wherein the obtaining of the corresponding N sets of training data sets from the feature database according to the N categories of feature vector data comprises:
for the feature vector data of each category, matching the feature vector in that category's feature vector data against the feature vectors in the feature database to obtain a plurality of picture data meeting a preset similarity threshold;
and pairing each picture data with the category in that category's feature vector data to form training data in a picture-category format, thereby obtaining the training data set corresponding to that category's feature vector data.
6. The method of claim 1, wherein the determining an optimal audit model from the trained audit models to be optimized comprises:
running N trained auditing models to be optimized in parallel;
obtaining an evaluation index value of each trained auditing model to be optimized; the evaluation index value comprises the average of the recall rate, the precision rate and the F1 value; the F1 value is the harmonic mean of the recall rate and the precision rate;
and selecting the trained auditing model to be optimized that corresponds to the optimal evaluation index value as the optimal auditing model.
7. The method according to claim 1, wherein before the obtaining the feature vector of the misjudged sample picture, the method further comprises:
and filtering repeated misjudgment sample pictures according to the RGB value of each misjudgment sample picture.
8. An apparatus for optimizing an audit model based on a misjudged sample picture, the apparatus comprising:
the acquisition module is used for acquiring the feature vector of the misjudged sample picture;
the clustering module is used for clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories;
the matching module is used for acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
the training module is used for performing parallel training on the to-be-optimized auditing model by utilizing the N groups of training data sets;
and the determining module is used for determining an optimal auditing model from the trained auditing models to be optimized.
9. A server comprising the apparatus for optimizing an audit model based on a misjudged sample picture according to claim 8, or comprising:
a memory for storing executable commands;
a processor for executing the method for auditing model optimization based on misjudgment sample pictures according to any one of claims 1-7 under the control of the executable command.
10. A computer-readable storage medium storing executable instructions that, when executed by a processor, perform the method for audit model optimization based on misjudgment sample pictures according to any one of claims 1-7.
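For illustration only, the clustering steps recited in claims 3 and 4 can be sketched as follows. The claims do not name a linkage criterion or say how the median of a set of vectors is taken, so the naive average-linkage agglomerative procedure, the per-dimension median, and all function names here are assumptions rather than the claimed implementation.

```python
import math
from collections import Counter
from statistics import median


def agglomerative_clusters(vectors, n_clusters):
    """Naive average-linkage agglomerative clustering; returns lists of indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def category_feature_data(vectors, labels, n_clusters):
    """Return one (category_feature_vector, category_label) pair per cluster."""
    result = []
    for members in agglomerative_clusters(vectors, n_clusters):
        # Per-dimension median of the member vectors (assumed reading of "median").
        category_vector = [median(dim) for dim in zip(*(vectors[i] for i in members))]
        # The most frequent label among the members becomes the category label.
        category_label = Counter(labels[i] for i in members).most_common(1)[0][0]
        result.append((category_vector, category_label))
    return result
```

The pairwise search makes this sketch practical only for small sample sets; it is meant to show the data flow from feature vectors to category feature vector data, not a scalable design.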
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911340682.XA CN113095342B (en) | 2019-12-23 | 2019-12-23 | Audit model optimization method and device based on misjudgment sample picture and server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113095342A (en) | 2021-07-09 |
CN113095342B (en) | 2024-07-05 |
Family
ID=76663099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911340682.XA Active CN113095342B (en) | 2019-12-23 | 2019-12-23 | Audit model optimization method and device based on misjudgment sample picture and server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113095342B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140003708A1 (en) * | 2012-06-28 | 2014-01-02 | International Business Machines Corporation | Object retrieval in video data using complementary detectors |
CN105912500A (en) * | 2016-03-30 | 2016-08-31 | 百度在线网络技术(北京)有限公司 | Machine learning model generation method and machine learning model generation device |
CN107194430A (en) * | 2017-05-27 | 2017-09-22 | 北京三快在线科技有限公司 | A kind of screening sample method and device, electronic equipment |
CN107562742A (en) * | 2016-06-30 | 2018-01-09 | 苏宁云商集团股份有限公司 | A kind of image processing method and device |
CN108460427A (en) * | 2018-03-29 | 2018-08-28 | 国信优易数据有限公司 | A kind of disaggregated model training method, device and sorting technique and device |
US20180308234A1 (en) * | 2017-04-24 | 2018-10-25 | Taihao Medical Inc. | System and method for cloud medical image analysis |
CN108830294A (en) * | 2018-05-09 | 2018-11-16 | 四川斐讯信息技术有限公司 | A kind of augmentation method of image data |
US20180340729A1 (en) * | 2016-10-19 | 2018-11-29 | Emanate Wireless, Inc. | Cold storage health monitoring system |
CN108959567A (en) * | 2018-07-04 | 2018-12-07 | 武汉大学 | It is suitable for the safe retrieving method of large-scale image under a kind of cloud environment |
CN108960782A (en) * | 2018-07-10 | 2018-12-07 | 北京木瓜移动科技股份有限公司 | content auditing method and device |
CN109034188A (en) * | 2018-06-15 | 2018-12-18 | 北京金山云网络技术有限公司 | Acquisition methods, acquisition device, equipment and the storage medium of machine learning model |
CN109034076A (en) * | 2018-08-01 | 2018-12-18 | 天津工业大学 | A kind of automatic clustering method and automatic cluster system of mechanical fault signals |
CN109495783A (en) * | 2018-11-02 | 2019-03-19 | 平安科技(深圳)有限公司 | Video reviewing method, device, electronic equipment and medium |
CN109543713A (en) * | 2018-10-16 | 2019-03-29 | 北京奇艺世纪科技有限公司 | The modification method and device of training set |
CN109726120A (en) * | 2018-12-05 | 2019-05-07 | 北京计算机技术及应用研究所 | A kind of software defect confirmation method based on machine learning |
WO2019196130A1 (en) * | 2018-04-12 | 2019-10-17 | 广州飒特红外股份有限公司 | Classifier training method and device for vehicle-mounted thermal imaging pedestrian detection |
Non-Patent Citations (3)
Title |
---|
YUE Y , SHEN J , LIU R: "An Improved Adaptive Weighted Gaussian Nearest Neighbor Classification Method", 2019 CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 12 September 2019 (2019-09-12), pages 2712 - 2715 * |
张晓明: "基于SIFT特征的人脸表情识别研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, 15 May 2015 (2015-05-15), pages 138 - 1123 * |
朱亚奇;邓维斌;: "一种基于不平衡数据的聚类抽样方法", 南京大学学报(自然科学), no. 02, 30 March 2015 (2015-03-30), pages 211 - 219 * |
Also Published As
Publication number | Publication date |
---|---|
CN113095342B (en) | 2024-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11188789B2 (en) | Detecting poisoning attacks on neural networks by activation clustering | |
CN111753701B (en) | Method, device, equipment and readable storage medium for detecting violation of application program | |
EP3989158A1 (en) | Method, apparatus and device for video similarity detection | |
CN113382279B (en) | Live broadcast recommendation method, device, equipment, storage medium and computer program product | |
KR102002024B1 (en) | Method for processing labeling of object and object management server | |
CN112381104A (en) | Image identification method and device, computer equipment and storage medium | |
CN105518712A (en) | Keyword notification method, equipment and computer program product based on character recognition | |
CN111931809A (en) | Data processing method and device, storage medium and electronic equipment | |
CN105787133A (en) | Method and device for filtering advertisement information | |
CN109766435A (en) | The recognition methods of barrage classification, device, equipment and storage medium | |
CN113033682B (en) | Video classification method, device, readable medium and electronic equipment | |
CN112434178A (en) | Image classification method and device, electronic equipment and storage medium | |
KR102075111B1 (en) | Ui function test system and method | |
CN111783812A (en) | Method and device for identifying forbidden images and computer readable storage medium | |
CN111931859A (en) | Multi-label image identification method and device | |
CN113963186A (en) | Training method of target detection model, target detection method and related device | |
CN110895811B (en) | Image tampering detection method and device | |
CN114842411A (en) | Group behavior identification method based on complementary space-time information modeling | |
CN112016521A (en) | Video processing method and device | |
CN114898266A (en) | Training method, image processing method, device, electronic device and storage medium | |
CN113962199A (en) | Text recognition method, text recognition device, text recognition equipment, storage medium and program product | |
CN111444364B (en) | Image detection method and device | |
CN110674497B (en) | Malicious program similarity calculation method and device | |
CN116824455A (en) | Event detection method, device, equipment and storage medium | |
CN113095342B (en) | Audit model optimization method and device based on misjudgment sample picture and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||