CN113095342A - Audit model optimization method and device based on misjudged sample picture and server - Google Patents

Info

Publication number
CN113095342A
Authority
CN
China
Prior art keywords
feature vector
category
feature
misjudgment
optimized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911340682.XA
Other languages
Chinese (zh)
Other versions
CN113095342B (en)
Inventor
王森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN201911340682.XA priority Critical patent/CN113095342B/en
Publication of CN113095342A publication Critical patent/CN113095342A/en
Application granted granted Critical
Publication of CN113095342B publication Critical patent/CN113095342B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an audit model optimization method and device based on misjudged sample pictures, and a server. The method comprises the following steps: acquiring feature vectors of misjudged sample pictures; clustering the feature vectors of the misjudged sample pictures to obtain feature vector data of N categories; acquiring N corresponding groups of training data sets from a feature database according to the feature vector data of the N categories; training an audit model to be optimized in parallel using the N groups of training data sets; and determining an optimal audit model from the trained audit models to be optimized. The embodiment of the invention enables automatic optimization training of the audit model based on misjudged sample pictures, so that emergencies in online content auditing can be responded to promptly and labor and time costs can be reduced.

Description

Audit model optimization method and device based on misjudged sample picture and server
Technical Field
The present invention relates to the field of computer vision recognition technologies, and in particular, to a method for optimizing an audit model based on a misjudged sample (badcase) picture, an apparatus for optimizing an audit model based on a misjudged sample picture, a server, and a computer-readable storage medium.
Background
With the rapid development of internet technology, the volume of video resources on the network keeps growing. The quality of this large body of video is very uneven: for example, some video clips contain content such as pornography or severe violence, and some videos have copyright problems. To filter out videos containing objectionable content, the content of the videos needs to be audited.
When content is audited, an audit model is usually used to audit pictures and judge whether they contain illegal or non-compliant content. However, the audit model may misjudge a picture or miss a violation, in which case picture data of misjudged samples fed back by users can be received. How to analyze the picture data of the misjudged samples so as to optimize the audit model in a targeted way is therefore important.
At present, analysis of misjudged samples mainly depends on manual review by operators. The operators need to summarize the picture categories of the misjudged samples, then collect pictures of the specific categories offline and manually label them to generate a training set, with which the audit model is trained in a targeted way.
However, this way of optimizing the audit model based on the analysis of misjudged samples still relies heavily on manual processing. It consumes considerable labor and time, has a long development cycle, and cannot respond promptly to emergencies in online content auditing. Therefore, there is a need for a new method for automatic optimization training of an audit model based on misjudged samples.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a new technical solution for auditing model optimization based on misjudged sample pictures.
According to a first aspect of the present invention, there is provided a method for optimizing an audit model based on a misjudged sample picture, the method including:
acquiring a feature vector of a misjudgment sample picture;
clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories;
acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
performing parallel training on an audit model to be optimized by utilizing the N groups of training data sets;
and determining an optimal audit model from the trained audit models to be optimized.
Optionally, the obtaining the feature vector of the misjudgment sample picture includes:
and extracting the characteristics of each misjudgment sample picture to obtain a characteristic vector corresponding to each misjudgment sample picture.
Optionally, a hierarchical clustering method is adopted to cluster the feature vectors of the misjudged sample pictures.
Optionally, the clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories includes:
clustering the characteristic vectors of the misjudgment sample pictures to obtain clustering results of N categories of different levels;
calculating the median of the feature vectors of the misjudgment sample pictures in each category to obtain category feature vectors corresponding to each category;
determining the most frequent label in each category as a category label corresponding to each category;
and respectively determining the N category feature vectors and the corresponding category labels as the feature vector data of the categories.
Optionally, the obtaining, according to the feature vector data of the N categories, N corresponding sets of training data sets from a feature database includes:
for each category of feature vector data, matching feature vectors in the category of feature vector data with feature vectors in the feature database to obtain a plurality of image data meeting a preset similarity threshold;
and enabling each picture data to respectively form training data in a picture-class format with the class in the feature vector data of the class, and obtaining a training data set corresponding to the feature vector data of the class.
Optionally, the determining an optimal audit model from the trained audit models to be optimized includes:
running N trained auditing models to be optimized in parallel;
obtaining an evaluation index value of each trained to-be-optimized auditing model; the evaluation index value comprises the average value of the recall rate, the precision rate and the F1 value; the F1 value is a harmonic mean of the recall rate and the precision rate;
and selecting the trained to-be-optimized auditing model corresponding to the optimal evaluation index value as the optimal auditing model.
Optionally, before the obtaining the feature vector of the misjudged sample picture, the method further includes:
and filtering repeated misjudgment sample pictures according to the RGB value of each misjudgment sample picture.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for optimizing an audit model based on a misjudged sample picture, the apparatus including:
the acquisition module is used for acquiring the characteristic vector of the misjudged sample picture;
the clustering module is used for clustering the characteristic vectors of the misjudgment sample pictures to obtain characteristic vector data of N categories;
the matching module is used for acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
the training module is used for performing parallel training on the to-be-optimized auditing model by utilizing the N groups of training data sets;
and the determining module is used for determining an optimal auditing model from the trained auditing models to be optimized.
Optionally, the obtaining module is specifically configured to:
and extracting the characteristics of each misjudgment sample picture to obtain a characteristic vector corresponding to each misjudgment sample picture.
Optionally, the clustering module is specifically configured to: and clustering the characteristic vectors of the misjudgment sample pictures by adopting a hierarchical clustering method.
Optionally, the clustering module is specifically configured to:
clustering the characteristic vectors of the misjudgment sample pictures to obtain clustering results of N categories of different levels;
calculating the median of the feature vectors of the misjudgment sample pictures in each category to obtain category feature vectors corresponding to each category;
determining the most frequent label in each category as a category label corresponding to each category;
and respectively determining the N category feature vectors and the corresponding category labels as the feature vector data of the categories.
Optionally, the matching module is specifically configured to:
for each category of feature vector data, matching feature vectors in the category of feature vector data with feature vectors in the feature database to obtain a plurality of image data meeting a preset similarity threshold;
and enabling each picture data to respectively form training data in a picture-class format with the class in the feature vector data of the class, and obtaining a training data set corresponding to the feature vector data of the class.
Optionally, the determining module is specifically configured to:
running N trained auditing models to be optimized in parallel;
obtaining an evaluation index value of each trained to-be-optimized auditing model; the evaluation index value comprises the average value of the recall rate, the precision rate and the F1 value; the F1 value is a harmonic mean of the recall rate and the precision rate;
and selecting the trained to-be-optimized auditing model corresponding to the optimal evaluation index value as the optimal auditing model.
Optionally, the apparatus further includes a filtering module, configured to filter repeated misjudgment sample pictures according to the RGB values of each misjudgment sample picture.
According to a third aspect of the present invention, there is provided a server including the apparatus for optimizing an audit model based on a misjudged sample picture according to the second aspect of the present invention, or the server includes:
a memory for storing executable commands;
a processor, configured to execute the method for optimizing an audit model based on a misjudged sample picture according to any one of the first aspect of the present invention under the control of the executable command.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium storing executable instructions, which when executed by a processor, perform the method for optimizing an audit model based on a misjudged sample picture according to any one of the first aspect of the present invention.
According to one embodiment of the invention, automatic optimization training of the auditing model based on misjudged sample pictures can be realized, so that emergencies in online content auditing can be responded to promptly and labor and time costs are reduced.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic structural diagram of a server to which an audit model optimization method based on a misjudged sample picture according to an embodiment of the present invention may be applied;
FIG. 2 is a flowchart of an audit model optimization method based on misjudged sample pictures according to an embodiment of the present invention;
FIG. 3 shows a schematic diagram of hierarchical clustering in accordance with an embodiment of the present invention;
FIG. 4 shows a schematic flow diagram of an example according to an embodiment of the invention;
FIG. 5 is a schematic structural diagram of an apparatus for optimizing an audit model based on a misjudged sample picture according to an embodiment of the present invention;
FIG. 6 is a functional block diagram of a server according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
< hardware configuration >
Fig. 1 is a block diagram showing a hardware configuration of a server 1000 that can implement an embodiment of the present invention.
Server 1000 may be, for example, a blade server or the like.
In one example, server 1000 may be a computer.
In another example, the server 1000 may be as shown in fig. 1, including a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600. Although the server may also include speakers, microphones, etc., these components are not relevant to the present invention and are omitted here.
The processor 1100 may be, for example, a central processing unit CPU, a microprocessor MCU, or the like. The memory 1200 includes, for example, a ROM (read only memory), a RAM (random access memory), a nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a serial interface, and the like. Communication device 1400 is capable of wired or wireless communication, for example. The display device 1500 is, for example, a liquid crystal display panel. The input device 1600 may include, for example, a touch screen, a keyboard, and the like.
The server shown in Fig. 1 is merely illustrative and is in no way meant to limit the invention, its application, or uses. In an embodiment of the present invention, the memory 1200 of the server 1000 is configured to store instructions for controlling the processor 1100 to operate so as to execute any method for optimizing an audit model based on a misjudged sample picture according to an embodiment of the present invention.
It should be understood by those skilled in the art that although a plurality of devices are shown for the server 1000 in Fig. 1, the present invention may relate to only some of them, for example, only the processor 1100 and the memory 1200 of the server 1000.
The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor is well known in the art and will not be described in detail herein.
< method examples >
The embodiment provides an auditing model optimization method based on misjudged sample pictures, which may be implemented by a server, for example, the server may be the server 1000 shown in fig. 1.
As shown in FIG. 2, the method comprises the following steps 2100-2500:
Step 2100, obtaining a feature vector of the misjudged sample picture.
The misjudged sample picture refers to the picture data of a badcase fed back by a user when a misjudgment or a missed judgment occurs while a picture is being audited with the audit model.
Before this step, the server 1000 may obtain the misjudged sample pictures fed back by users and store them in a data warehouse. In a content auditing scenario such as live streaming, a large proportion of the pictures are repeats. To reduce the computational load, the server 1000 therefore cleans the misjudged sample pictures in the data warehouse and filters out the repeated ones before obtaining the feature vectors of the misjudged sample pictures. In one example, the server 1000 may filter out the repeated misjudged sample pictures according to the RGB (red, green, blue) values of each misjudged sample picture.
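The RGB-based filtering described above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: each picture is taken to be already decoded into a list of (R, G, B) tuples, "repeated" is taken to mean pixel-for-pixel identical, and the helper names `rgb_digest` and `filter_repeated_pictures`, the dictionary keys, and the choice of a SHA-256 digest are all hypothetical and not part of the disclosure.

```python
import hashlib

def rgb_digest(pixels, size):
    """Hash a picture's dimensions and RGB values into a fixed-length key."""
    h = hashlib.sha256()
    h.update(f"{size[0]}x{size[1]}".encode())
    for r, g, b in pixels:
        h.update(bytes((r, g, b)))
    return h.hexdigest()

def filter_repeated_pictures(pictures):
    """Keep only the first picture seen for each distinct RGB content.

    `pictures` is a list of dicts with the hypothetical keys
    'id', 'size' (width, height) and 'pixels' (list of RGB tuples).
    """
    seen = set()
    unique = []
    for pic in pictures:
        key = rgb_digest(pic["pixels"], pic["size"])
        if key not in seen:
            seen.add(key)
            unique.append(pic)
    return unique

pictures = [
    {"id": "a", "size": (2, 1), "pixels": [(255, 0, 0), (0, 255, 0)]},
    {"id": "b", "size": (2, 1), "pixels": [(255, 0, 0), (0, 255, 0)]},  # same RGB values as "a"
    {"id": "c", "size": (2, 1), "pixels": [(0, 0, 255), (0, 255, 0)]},
]
unique_ids = [p["id"] for p in filter_repeated_pictures(pictures)]
print(unique_ids)
```

Exact hashing only removes identical frames; tolerating small pixel differences would need a different comparison, which the text does not specify.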
After filtering out repeated misjudgment sample pictures, the server 1000 extracts the features of each misjudgment sample picture to obtain a feature vector corresponding to each misjudgment sample picture. For example, the corresponding feature vector may be extracted from the misjudgment sample picture by an artificial intelligence technique such as a neural network algorithm.
Step 2200, clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories.
In this step, to avoid the need to understand the data distribution manually in advance, the feature vectors of the misjudged sample pictures are clustered with a hierarchical clustering method, which does not require hyperparameters such as the number of clusters or a distance threshold to be supplied. In practice, hierarchical clustering methods include bottom-up merging (agglomerative) methods and top-down splitting (divisive) methods; in this embodiment, a bottom-up merging method, such as a DBSCAN clustering algorithm, may be used to cluster the feature vectors of the misjudged sample pictures.
Specifically, the server 1000 clusters the feature vectors of the misjudged sample pictures by using a hierarchical clustering method to obtain clustering results at different levels. As shown in Fig. 3, cluster analysis of the feature vectors of the misjudged sample pictures yields clustering results for level 1 (N categories), level 2 (m categories), …, and level T (k categories), where N > m > k > 2.
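To make the idea of clustering results at successive levels concrete, the following is a minimal sketch of bottom-up merging that records the partition after every merge. It uses plain centroid-distance agglomerative merging as an illustrative stand-in, not the specific algorithm of the embodiment. The function names and the `min_clusters` stopping parameter (set to 3 to mirror k > 2) are assumptions introduced here.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def agglomerative_levels(vectors, min_clusters=3):
    """Bottom-up merging; returns one partition (list of index lists) per level."""
    clusters = [[i] for i in range(len(vectors))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > min_clusters:
        best = None
        # Find the pair of clusters whose centroids are closest.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([vectors[i] for i in clusters[a]])
                cb = centroid([vectors[i] for i in clusters[b]])
                d = math.dist(ca, cb)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        levels.append([list(c) for c in clusters])
    return levels

features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
levels = agglomerative_levels(features)
sizes = [len(level) for level in levels]
print(sizes)
```

Each entry of `levels` corresponds to one level of the hierarchy, with the number of categories decreasing from level to level.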
After obtaining the clustering results, the server 1000 processes the clustering result of each level in parallel: it calculates the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to that category; determines the most frequent label in each category as the category label corresponding to that category; and determines the N category feature vectors and their corresponding category labels as the feature vector data of the categories. Note that the labels are added by the user when feeding back the misjudged sample pictures.
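The per-category computation can be illustrated with a short sketch. It assumes that the "median of the feature vectors" is taken component-wise, which the text does not state explicitly; the function name `category_feature_data` and the sample labels are hypothetical.

```python
from collections import Counter
from statistics import median

def category_feature_data(vectors, labels):
    """Return (category feature vector, category label) for one category."""
    # Assumption: the median is computed independently for each dimension.
    category_vector = [median(col) for col in zip(*vectors)]
    # The category label is the label that occurs most often in the category.
    category_label = Counter(labels).most_common(1)[0][0]
    return category_vector, category_label

cluster_vectors = [[0.1, 0.9], [0.3, 0.7], [0.2, 0.2]]
cluster_labels = ["violence", "violence", "normal"]
vector, label = category_feature_data(cluster_vectors, cluster_labels)
print(vector, label)
```

A median is less sensitive than a mean to a few outlying pictures inside a category, which may be one reason to prefer it here.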
Step 2300, obtaining N groups of corresponding training data sets from the feature database according to the feature vector data of the N categories.
Specifically, for the feature vector data of each category, the server 1000 matches the category feature vector against the feature vectors in the feature database, for example by distance-based similarity, to obtain a plurality of pictures that meet a preset similarity threshold. Each matched picture is then paired with the category label in the feature vector data of that category to form training data in a picture-category format, yielding the training data set corresponding to the feature vector data of that category. That is, each hierarchical clustering result outputs a corresponding group of training data sets, which then await training of the audit model.
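As a rough illustration of this matching step, the sketch below pairs every sufficiently similar database picture with the category label. The text only says "distance similarity", so cosine similarity, the 0.9 threshold, the in-memory dictionary standing in for the feature database, and the names `cosine_similarity` and `build_training_set` are all assumptions made for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_training_set(category_vector, category_label, feature_db, threshold=0.9):
    """Pair every database picture above the threshold with the category label."""
    return [
        (picture_id, category_label)
        for picture_id, vector in feature_db.items()
        if cosine_similarity(category_vector, vector) >= threshold
    ]

# Hypothetical feature database: picture identifier -> feature vector.
feature_db = {
    "img_001.jpg": [1.0, 0.0],
    "img_002.jpg": [0.9, 0.1],
    "img_003.jpg": [0.0, 1.0],
}
training_set = build_training_set([1.0, 0.05], "violence", feature_db)
print(training_set)
```

A production feature database would normally use an approximate nearest-neighbor index rather than a linear scan; the linear scan is used here only to keep the example self-contained.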
Step 2400, performing parallel training on the to-be-optimized auditing model by using the N groups of training data sets.
Specifically, the N groups of training data sets obtained in the above steps are respectively input into the to-be-optimized audit model, and the to-be-optimized audit model is trained in parallel to obtain trained to-be-optimized audit models 1, 2, …, and N.
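The fan-out of one training job per data set could look like the sketch below. The training itself is replaced by a deliberately trivial stand-in (`train_copy`, which only counts labels), because the text does not specify the audit model's architecture or training procedure; the thread pool and all names are assumptions for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def train_copy(training_set):
    """Hypothetical stand-in for training one copy of the audit model.

    The returned "model" is just the label counts of its training set.
    """
    return Counter(label for _picture, label in training_set)

def train_in_parallel(training_sets):
    """Train one model copy per training data set, preserving input order."""
    with ThreadPoolExecutor(max_workers=len(training_sets)) as pool:
        return list(pool.map(train_copy, training_sets))

training_sets = [
    [("a.jpg", "violence"), ("b.jpg", "violence")],
    [("c.jpg", "normal")],
]
models = train_in_parallel(training_sets)
print(len(models))
```

For real, compute-heavy training, separate processes or separate training workers would likely be used instead of threads; the structure of submitting N independent jobs and collecting N trained models stays the same.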
Step 2500, determining an optimal audit model from the trained audit models to be optimized.
In this step, the server 1000 runs the N trained to-be-optimized audit models in parallel and obtains an evaluation index value for each of them. The evaluation index value comprises the average of the recall, the precision and the F1 value, where the F1 value is the harmonic mean of the recall and the precision. The trained to-be-optimized audit model with the best evaluation index value is selected as the optimal audit model and deployed online.
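The selection rule can be worked through on a toy example. The sketch takes the second metric to be precision, since F1 is by definition the harmonic mean of precision and recall. It also assumes a binary setting with a single positive class, a shared labeled evaluation set, and that "optimal" means the highest score; the function name, label strings and model names are hypothetical.

```python
def evaluation_index(y_true, y_pred, positive="violation"):
    """Mean of recall, precision and F1 for one model's predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return (recall + precision + f1) / 3

y_true = ["violation", "violation", "normal", "normal"]
predictions = {
    "model_1": ["violation", "normal", "normal", "normal"],
    "model_2": ["violation", "violation", "violation", "normal"],
}
scores = {name: evaluation_index(y_true, pred) for name, pred in predictions.items()}
best_model = max(scores, key=scores.get)
print(best_model)
```

Here `model_1` has precision 1 and recall 0.5, while `model_2` has precision 2/3 and recall 1, so `model_2` obtains the higher averaged score and is selected.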
< example >
Fig. 4 shows a schematic flow diagram of an example according to an embodiment of the invention.
As shown in fig. 4, the method for optimizing an audit model based on a misjudged sample picture in this example may include the following steps:
Step 4100, obtaining the misjudged sample pictures fed back by users, and storing the misjudged sample pictures in a data warehouse.
Step 4200, filtering out the repeated misjudged sample pictures.
For example, the repeated misjudged sample pictures may be filtered out according to the RGB values of each misjudged sample picture.
Step 4300, extracting the features of each misjudged sample picture to obtain a feature vector corresponding to each misjudged sample picture.
Step 4400, clustering the feature vectors of the misjudged sample pictures by using a hierarchical clustering method to obtain clustering results of N categories at different levels.
Step 4500, calculating the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to each category; determining the most frequent label in each category as the category label corresponding to each category; and thereby obtaining feature vector data of N categories.
Step 4600, matching the feature vectors in the feature vector data of each category with the feature vectors in the feature database to obtain a plurality of pictures meeting a preset similarity threshold.
Step 4700, pairing each matched picture with the category in the feature vector data of the category to form training data in a picture-category format, and obtaining a training data set corresponding to the feature vector data of the category.
Step 4800, performing parallel training on the to-be-optimized audit model by using the N groups of training data sets.
Step 4900, running the N trained to-be-optimized audit models in parallel to obtain the evaluation index values corresponding to the N trained to-be-optimized audit models, and determining, based on the evaluation index values, the optimal audit model to deploy online.
The method for optimizing the audit model based on misjudged sample pictures according to this embodiment has been described above with reference to the drawings and examples. The method of this embodiment obtains the feature vectors of the misjudged sample pictures; clusters the feature vectors of the misjudged sample pictures to obtain feature vector data of N categories; acquires N corresponding groups of training data sets from a feature database according to the feature vector data of the N categories; trains the to-be-optimized audit model in parallel using the N groups of training data sets; and determines an optimal audit model from the trained audit models to be optimized. The embodiment of the invention enables automatic optimization training of the audit model based on misjudged sample pictures, so that emergencies in online content auditing can be responded to promptly and labor and time costs can be reduced.
< apparatus embodiment >
The present embodiment provides an auditing model optimizing device based on misjudged sample pictures, which is, for example, the auditing model optimizing device 5000 based on misjudged sample pictures shown in fig. 5.
As shown in fig. 5, the apparatus 5000 for optimizing an audit model based on a misjudged sample picture may include: the device comprises an acquisition module 5100, a clustering module 5200, a matching module 5300, a training module 5400 and a determination module 5500.
The obtaining module 5100 is configured to obtain a feature vector of the misjudged sample picture.
The clustering module 5200 is configured to cluster the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories.
The matching module 5300 is configured to obtain N corresponding sets of training data sets from the feature database according to the feature vector data of the N categories.
The training module 5400 is configured to perform parallel training on the to-be-optimized audit model by using the N sets of training data sets.
The determining module 5500 is configured to determine an optimal audit model from the trained audit models to be optimized.
Specifically, the obtaining module 5100 may be configured to extract features of each misjudged sample picture to obtain a feature vector corresponding to each misjudged sample picture.
Optionally, the clustering module 5200 clusters the feature vectors of the misjudged sample pictures by using a hierarchical clustering method. Specifically, the clustering module 5200 can cluster the feature vectors of the misjudged sample pictures to obtain clustering results of N categories at different levels; calculate the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to each category; determine the most frequent label in each category as the category label corresponding to each category; and determine the N category feature vectors and their corresponding category labels as the feature vector data of the categories.
In an example, the matching module 5300 may be specifically configured to, for each of the feature vector data of the category, match a feature vector in the feature vector data of the category with a feature vector in the feature database to obtain a plurality of image data meeting a preset similarity threshold; and enabling each picture data to respectively form training data in a picture-class format with the class in the feature vector data of the class, and obtaining a training data set corresponding to the feature vector data of the class.
In one example, the determining module 5500 is specifically configured to run the N trained to-be-optimized audit models in parallel; obtain an evaluation index value of each trained to-be-optimized audit model, where the evaluation index value comprises the average of the recall, the precision and the F1 value; and select the trained to-be-optimized audit model corresponding to the optimal evaluation index value as the optimal audit model.
Optionally, the device 5000 for optimizing the audit model based on misjudged sample pictures may further include a filtering module, configured to filter out repeated misjudged sample pictures according to the RGB (red, green, blue) values of each misjudged sample picture.
The device for optimizing the audit model based on the misjudged sample picture in this embodiment can be used for executing the technical scheme of the method embodiment, and the implementation principle and the technical effect are similar, and are not described herein again.
< server embodiment >
In this embodiment, a server is further provided, where the server may include the auditing model optimizing device 5000 based on the misjudged sample picture described in the device embodiment of the present invention; alternatively, the server is a server 6000 shown in fig. 6, and includes:
a memory 6100 for storing executable commands; and
a processor 6200 configured to perform, under control of the executable commands stored in the memory 6100, the method described in any method embodiment of the present invention.
< computer-readable storage Medium embodiment >
This embodiment provides a computer-readable storage medium storing executable instructions that, when executed by a processor, perform the method described in any of the method embodiments of the present invention.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. An audit model optimization method based on misjudged sample pictures, characterized by comprising the following steps:
acquiring a feature vector of a misjudgment sample picture;
clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories;
acquiring N groups of corresponding training data sets from a feature database according to the feature vector data of the N categories;
performing parallel training on an audit model to be optimized by utilizing the N groups of training data sets;
and determining an optimal audit model from the trained audit models to be optimized.
2. The method according to claim 1, wherein the obtaining the feature vector of the misjudged sample picture comprises:
extracting features from each misjudged sample picture to obtain the feature vector corresponding to each misjudged sample picture.
3. The method according to claim 1, wherein the feature vectors of the misjudged sample pictures are clustered by a hierarchical clustering method.
4. The method according to claim 3, wherein the clustering the feature vectors of the misjudgment sample pictures to obtain feature vector data of N categories comprises:
clustering the feature vectors of the misjudged sample pictures to obtain clustering results for N categories at different levels;
calculating the median of the feature vectors of the misjudged sample pictures in each category to obtain the category feature vector corresponding to each category;
determining the label that occurs most often in each category as the category label corresponding to each category;
and determining each of the N category feature vectors, together with its corresponding category label, as the feature vector data of that category.
5. The method according to claim 1, wherein the obtaining of the corresponding N sets of training data sets from the feature database according to the N categories of feature vector data comprises:
for the feature vector data of each category, matching the category feature vector in that feature vector data against the feature vectors in the feature database to obtain a plurality of picture data meeting a preset similarity threshold;
and pairing each picture data with the category label in that feature vector data to form training data in a picture-category format, thereby obtaining the training data set corresponding to the feature vector data of that category.
6. The method of claim 1, wherein the determining an optimal audit model from the trained audit models to be optimized comprises:
running the N trained audit models to be optimized in parallel;
obtaining an evaluation index value for each trained audit model to be optimized; the evaluation index value comprises the average of the recall rate, the precision (accuracy rate) and the F1 value; the F1 value is the harmonic mean of the recall rate and the precision;
and selecting the trained audit model to be optimized that corresponds to the best evaluation index value as the optimal audit model.
7. The method according to claim 1, wherein before the obtaining the feature vector of the misjudged sample picture, the method further comprises:
filtering out repeated misjudged sample pictures according to the RGB values of each misjudged sample picture.
8. An apparatus for optimizing an audit model based on a misjudged sample picture, the apparatus comprising:
an acquisition module, configured to acquire the feature vectors of the misjudged sample pictures;
a clustering module, configured to cluster the feature vectors of the misjudged sample pictures to obtain feature vector data of N categories;
a matching module, configured to acquire N corresponding training data sets from a feature database according to the feature vector data of the N categories;
a training module, configured to train the audit model to be optimized in parallel using the N training data sets;
and a determining module, configured to determine an optimal audit model from the trained audit models to be optimized.
9. A server comprising the apparatus for optimizing an audit model based on a misjudged sample picture according to claim 8, or comprising:
a memory for storing executable commands;
a processor for executing, under the control of the executable commands, the audit model optimization method based on misjudged sample pictures according to any one of claims 1-7.
10. A computer-readable storage medium storing executable instructions that, when executed by a processor, perform the audit model optimization method based on misjudged sample pictures according to any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911340682.XA CN113095342B (en) 2019-12-23 2019-12-23 Audit model optimization method and device based on misjudgment sample picture and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911340682.XA CN113095342B (en) 2019-12-23 2019-12-23 Audit model optimization method and device based on misjudgment sample picture and server

Publications (2)

Publication Number Publication Date
CN113095342A true CN113095342A (en) 2021-07-09
CN113095342B CN113095342B (en) 2024-07-05

Family

ID=76663099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911340682.XA Active CN113095342B (en) 2019-12-23 2019-12-23 Audit model optimization method and device based on misjudgment sample picture and server

Country Status (1)

Country Link
CN (1) CN113095342B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140003708A1 (en) * 2012-06-28 2014-01-02 International Business Machines Corporation Object retrieval in video data using complementary detectors
CN105912500A (en) * 2016-03-30 2016-08-31 百度在线网络技术(北京)有限公司 Machine learning model generation method and machine learning model generation device
CN107562742A (en) * 2016-06-30 2018-01-09 苏宁云商集团股份有限公司 A kind of image processing method and device
US20180340729A1 (en) * 2016-10-19 2018-11-29 Emanate Wireless, Inc. Cold storage health monitoring system
US20180308234A1 (en) * 2017-04-24 2018-10-25 Taihao Medical Inc. System and method for cloud medical image analysis
CN107194430A (en) * 2017-05-27 2017-09-22 北京三快在线科技有限公司 A kind of screening sample method and device, electronic equipment
CN108460427A (en) * 2018-03-29 2018-08-28 国信优易数据有限公司 A kind of disaggregated model training method, device and sorting technique and device
WO2019196130A1 (en) * 2018-04-12 2019-10-17 广州飒特红外股份有限公司 Classifier training method and device for vehicle-mounted thermal imaging pedestrian detection
CN108830294A (en) * 2018-05-09 2018-11-16 四川斐讯信息技术有限公司 A kind of augmentation method of image data
CN109034188A (en) * 2018-06-15 2018-12-18 北京金山云网络技术有限公司 Acquisition methods, acquisition device, equipment and the storage medium of machine learning model
CN108959567A (en) * 2018-07-04 2018-12-07 武汉大学 It is suitable for the safe retrieving method of large-scale image under a kind of cloud environment
CN108960782A (en) * 2018-07-10 2018-12-07 北京木瓜移动科技股份有限公司 content auditing method and device
CN109034076A (en) * 2018-08-01 2018-12-18 天津工业大学 A kind of automatic clustering method and automatic cluster system of mechanical fault signals
CN109543713A (en) * 2018-10-16 2019-03-29 北京奇艺世纪科技有限公司 The modification method and device of training set
CN109495783A (en) * 2018-11-02 2019-03-19 平安科技(深圳)有限公司 Video reviewing method, device, electronic equipment and medium
CN109726120A (en) * 2018-12-05 2019-05-07 北京计算机技术及应用研究所 A kind of software defect confirmation method based on machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
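YUE Y, SHEN J, LIU R: "An Improved Adaptive Weighted Gaussian Nearest Neighbor Classification Method", 2019 CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 12 September 2019 (2019-09-12), pages 2712 - 2715 *
Zhang Xiaoming (张晓明): "Research on Facial Expression Recognition Based on SIFT Features" (基于SIFT特征的人脸表情识别研究), China Master's Theses Full-text Database (Information Science and Technology), 15 May 2015 (2015-05-15), pages 138 - 1123 *
Zhu Yaqi (朱亚奇); Deng Weibin (邓维斌): "A Clustering Sampling Method Based on Imbalanced Data" (一种基于不平衡数据的聚类抽样方法), Journal of Nanjing University (Natural Science), no. 02, 30 March 2015 (2015-03-30), pages 211 - 219 *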

Also Published As

Publication number Publication date
CN113095342B (en) 2024-07-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant