CN115565201A - Taboo picture identification method, equipment and storage medium - Google Patents

Taboo picture identification method, equipment and storage medium

Info

Publication number
CN115565201A
Authority
CN
China
Prior art keywords
picture
resources
taboo
resource
hand
Prior art date
Legal status
Granted
Application number
CN202210405622.7A
Other languages
Chinese (zh)
Other versions
CN115565201B (en)
Inventor
石英男
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210405622.7A priority Critical patent/CN115565201B/en
Publication of CN115565201A publication Critical patent/CN115565201A/en
Application granted granted Critical
Publication of CN115565201B publication Critical patent/CN115565201B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The application provides a taboo picture identification method, device, and storage medium. The method includes: acquiring the picture resources involved in an operating system; identifying the hand picture resources among the picture resources according to a hand recognition model, where a hand picture resource is a picture resource that contains a hand; determining the coordinate information of the hand joint points in the hand picture resources according to a gesture recognition model; and predicting whether a hand picture resource is a taboo picture according to an SVM algorithm and the joint-point coordinate information. Because the preprocessed picture resources first pass through the hand recognition model, which has low computational cost and returns a bounding box, the gesture detection range is narrowed and the recognition efficiency of the gesture recognition model is improved.

Description

Taboo picture identification method, equipment and storage medium
Technical Field
The present application relates to the field of picture processing technologies, and in particular, to a taboo picture identification method, device, and storage medium.
Background
With globalization, more and more products are no longer limited to the domestic market and are moving to the international market. Different countries and regions have their own political positions, historical backgrounds, religious beliefs, and so on. To make products, such as the various electronic devices with display interfaces, better meet local requirements, taboo pictures are currently identified manually when releasing operating systems that involve picture resources, so as to ensure that the operating system versions for different countries and regions meet the requirements of the local political positions, historical backgrounds, religious beliefs, etc.
However, this identification method depends too heavily on manpower and is time-consuming and labor-intensive. A taboo picture identification scheme is therefore urgently needed to reduce the dependence on manpower and improve identification efficiency.
Disclosure of Invention
To solve the above technical problems, the application provides a taboo picture identification method, device, and storage medium, which aim to reduce human input and automatically identify the picture resources involved in an operating system in a more stable, efficient, and accurate way, ensuring that the operating system versions for different countries and regions meet the requirements of the local political positions, historical backgrounds, religious beliefs, etc.
In a first aspect, the present application provides a taboo picture identification method. The method includes: acquiring the picture resources involved in an operating system; identifying the hand picture resources among the picture resources according to a hand recognition model, where a hand picture resource is a picture resource that contains a hand; determining the coordinate information of the hand joint points in the hand picture resources according to a gesture recognition model; and predicting whether a hand picture resource is a taboo picture according to an SVM algorithm and the joint-point coordinate information. Because the preprocessed picture resources first pass through the hand recognition model, which has low computational cost and returns a bounding box, the gesture detection range is narrowed and the recognition efficiency of the gesture recognition model is improved.
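For intuition, the following Python sketch strings the three stages of this first aspect together. The model classes and the trained SVM classifier are hypothetical stand-ins, not APIs defined by the application:

```python
# Minimal sketch of the first-aspect pipeline; hand_model, gesture_model and
# svm_classifier are assumed, illustrative objects.
def identify_taboo_gestures(picture_resources, hand_model, gesture_model, svm_classifier):
    taboo_pictures = []
    for picture in picture_resources:
        # Stage 1: low-cost hand recognition narrows the detection range.
        bounding_box = hand_model.detect_hand(picture)
        if bounding_box is None:
            continue  # not a hand picture resource
        # Stage 2: locate the hand joint points inside the bounding box.
        joint_coordinates = gesture_model.locate_joints(picture, bounding_box)
        # Stage 3: the SVM predicts whether the joint layout is a taboo gesture.
        if svm_classifier.predict([joint_coordinates])[0] == "taboo":
            taboo_pictures.append(picture)
    return taboo_pictures
```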
According to the first aspect, the hand recognition model is obtained by training on the hand data in a gesture recognition data set based on the SSD target recognition framework, and is used for performing hand recognition on picture resources and returning a bounding box. Because the hand recognition model uses the SSD target recognition framework, which processes quickly and at low cost, feature information can be extracted efficiently while accuracy is guaranteed, saving computing resources.
According to the first aspect, or any implementation manner of the first aspect above, the gesture recognition model is of a MediaPipe frame structure, and is configured to locate 21 3D hand joint points inside a hand region in a hand picture resource recognized by the hand recognition model, and extract coordinate information of the 21 3D hand joint points.
According to the first aspect, or any implementation manner of the first aspect above, the method further includes: identifying the taboo animal picture resources among the picture resources according to a taboo animal recognition model, where a taboo animal picture resource is a picture resource that contains a taboo animal. The taboo animal recognition model is obtained by training, based on the ResNeXt convolutional neural network, on data in the Imagenet data set together with taboo animal image data crawled from the network. A taboo animal recognition model covering both real scenes (photos of animals) and virtual scenes (cartoon/simple-stroke animals) can thus be obtained, and taboo pictures of the taboo animal type can be identified more accurately.
According to the first aspect, or any implementation manner of the first aspect above, the method further includes: identifying a taboo flag picture resource in the picture resource according to a taboo flag identification model, wherein the taboo flag picture resource is a picture resource comprising a taboo flag in the picture resource, and the taboo flag identification model is a YoloV3 frame structure; and/or identifying the sensitive area picture resources in the picture resources according to the sensitive area identification model, wherein the sensitive area picture resources comprise the picture resources of the sensitive area, and the sensitive area identification model is a YoloV3 frame structure.
According to the first aspect, or any one of the above implementations of the first aspect, the method further comprises: when a picture resource is determined to be a taboo picture, generating a visual warning report from the name, the path address, and the visualized picture resource; submitting the visual warning report for manual review; and when review information submitted by the review personnel is received, deleting the taboo pictures confirmed by the review personnel from the operating system in response to that review information. By generating a visual warning report, the reviewer can directly determine whether a picture resource really is a taboo picture without having to look it up by its name and path address, which improves the efficiency of the manual review step.
According to the first aspect, or any implementation manner of the first aspect, after obtaining the picture resources involved in the operating system, the method further includes: extracting feature information from the picture resources according to a self-learned ResNet50 convolutional neural network model; classifying the feature information according to a self-learned classification model, and identifying the false-alarm picture resources, i.e. resources that a reviewer has determined are not taboo pictures; and filtering the false-alarm picture resources out of the picture resources. Each false-alarm picture resource therefore only needs to be checked once by a reviewer and will not appear in subsequent visual warning reports, reducing unnecessary human input.
According to the first aspect, or any implementation manner of the first aspect above, the method further includes: determining the false-alarm picture resources in a visual warning report according to the review information submitted by the review personnel; constructing a learning data set from the false-alarm picture resources; performing oversampling training on the ResNet50 convolutional neural network model based on the data in the learning data set to obtain a self-learned ResNet50 convolutional neural network model; and performing fitting training on a classification model based on the data in the learning data set to obtain a self-learned classification model. In this way, not only identical false-alarm picture resources but also similar picture resources can be filtered out.
According to the first aspect, or any implementation manner of the first aspect, after obtaining the picture resources involved in the operating system, the method further includes: preprocessing the picture resources to unify their size and channel number. This standardizes the picture resources to be recognized, so that the recognition models for the different taboo types can quickly and accurately extract feature information.
According to the first aspect, or any implementation manner of the first aspect, before preprocessing the picture resources to unify their size and channel number, the method further includes: determining the format of each picture resource; when the format is the vector drawable (DVG) format, adapting the attributes and values of the picture resource and converting it from the DVG format to the Scalable Vector Graphics (SVG) format; and converting the SVG-format picture resource into the Portable Network Graphics (PNG) format. DVG-format picture resources, which exist as XML, can thus be converted into the visual PNG format, which facilitates feature extraction by the subsequent recognition models and the generation of visual warning reports.
In a second aspect, the present application provides an electronic device. The electronic device includes: a memory and a processor, the memory and the processor coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the method of the first aspect or any possible implementation of the first aspect.
In a third aspect, the present application provides a computer readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer program comprising instructions for carrying out the method of the first aspect or any possible implementation manner of the first aspect.
In a fifth aspect, the present application provides a chip, which includes a processing circuit and a transceiver pin. Wherein the transceiver pin and the processing circuit are in communication with each other via an internal connection path, and the processing circuit is configured to perform the method of the first aspect or any one of the possible implementations of the first aspect to control the receiving pin to receive signals and to control the sending pin to send signals.
Drawings
FIG. 1 is a schematic diagram illustrating the data layer acquiring and pre-processing a picture resource;
FIG. 2 is a diagram illustrating an exemplary hand-containing picture resource;
FIG. 3 is a schematic view of the hand picture resource obtained by identifying the picture resource shown in FIG. 2 with the hand recognition model provided in the present application;
FIG. 4 is a schematic view of a hand picture resource obtained after identifying a picture resource containing a palm with the hand recognition model provided in the present application;
FIG. 5 is a schematic diagram of the hand picture resource shown in FIG. 4 after the hand joint points have been positioned with the gesture recognition model provided in the present application;
FIG. 6 is a schematic processing flow diagram of the data layer, reinforcement learning layer, model layer, and judgment layer in the taboo picture identification method provided by the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone.
The terms "first" and "second," and the like, in the description and in the claims of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular order of the objects. For example, the first target object and the second target object, etc. are specific sequences for distinguishing different target objects, rather than describing target objects.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g.," is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "such as" is intended to present relevant concepts in a concrete fashion.
In the description of the embodiments of the present application, the meaning of "a plurality" means two or more unless otherwise specified. For example, a plurality of processing units refers to two or more processing units; a plurality of systems refers to two or more systems.
In order to better understand the technical solutions provided in the embodiments of the present application, before describing them, the types of pictures currently considered taboo pictures (pictures not allowed to appear) by particular countries or regions are described first. Roughly, current taboo pictures can be classified into: naked skin, taboo gesture, taboo animal, taboo flag, and map mark.
Illustratively, in some implementations, the taboo requirement for taboo pictures of the naked-skin type is determined mainly for regions that strictly prohibit pictures of women with exposed skin, such as the Middle East and North Africa. For example, in the United Arab Emirates and other Middle East Islamic countries, where the main garments are the veil and the Arabic robe, Islam teaches women to wear long sleeves and trousers and not to expose skin; miniskirts and garments baring the chest or back are banned, as are various ladies' hats. Pictures of women in such garments are therefore considered taboo pictures in these countries.
For example, in other implementations, the taboo requirement for taboo pictures of the taboo-gesture type is determined mainly by the different interpretations different regions give the same gesture. Consider a picture of the "horns" gesture: in China it is associated with rock music and is not considered a taboo picture, but in Italy, Portugal, Spain, and other countries the gesture is an insult implying that a man's wife is unfaithful, so such a picture is a taboo picture in those Western countries.
For example, in other implementations, the taboo requirement for taboo pictures of the taboo-animal type is determined mainly by different regions' taboo requirements for the same animal. In the Middle East and North Africa, "pig" and "dog" are sensitive content, so pictures containing a pig or a dog are considered taboo pictures in those regions; in parts of South America, "cat", and especially "black cat", is sensitive content, so pictures containing a cat, and especially a black cat, are considered taboo pictures there.
Illustratively, in other implementations, taboo pictures of the taboo-flag type are determined mainly according to current taboo-flag requirements around the world.
For example, in other implementations, the taboo requirement for taboo pictures of the map-mark type is currently determined mainly by the existence of territorial disputes.
The five taboo picture types are described herein, and it should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation on the present embodiment. In practical application, if the region where the product is popularized relates to other taboo requirements, the type of the taboo picture and the requirement for judging the taboo picture can be adjusted according to requirements, and the method is not limited in the application.
Based on the taboo requirements (determination requirements) corresponding to these taboo picture types, and in order to let products, particularly the various electronic devices with display interfaces, better meet the requirements of local political positions, historical backgrounds, religious beliefs, and so on, the taboo picture identification scheme provided by the application automatically identifies the picture resources involved in an operating system in a more stable, efficient, and accurate way while reducing human input, so that the operating system versions for different countries and regions meet those local requirements.
Understandably, the operating system of an electronic device is usually issued by a package server. In some implementations, the technical scheme provided by the application can therefore be applied to the package server, so that when an iterative operating system needs to be updated and published, the taboo picture identification method provided by the application identifies taboo pictures in the upgrade package to be published and then filters (deletes) them, ensuring that the published operating system versions for different countries and regions meet the requirements of the respective country or region.
For example, in other implementations, taboo picture identification may instead be performed by the Over-the-Air Technology (OTA) servers for different countries and regions. That is, the package server only needs to issue one copy of the upgrade package of the iterative operating system; when the OTA server for a given country or region acquires the upgrade package from the package server, it identifies taboo pictures according to the taboo requirements configured in advance, using the technical scheme provided by the application, and then filters the identified taboo pictures, thereby ensuring that the iterative operating system meets the requirements, such as political positions, historical backgrounds, and religious beliefs, of the current country and region.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Implementation details of the technical solutions provided in the embodiments of the present application are described below, and the following description is provided only for convenience of understanding and is not necessary for implementing the present solution.
Illustratively, taking one feasible implementation as an example, the platform architecture to which the technical solution provided by the present application applies can be divided into a data layer, a model layer, a judgment layer, and a reinforcement learning layer. The data layer is used for acquiring the picture resources in the operating system that need to be identified and for preprocessing them. The model layer combines the recognition models corresponding to different taboo picture types according to business requirements, so as to identify the different taboo pictures. The judgment layer backtracks and confirms the taboo picture resources and automatically generates a visual report. The reinforcement learning layer trains a reinforcement learning model on the misreported picture resources marked in the visual report, so that in subsequent identification rounds the reinforcement learning model can filter the picture resources provided by the data layer, removing picture resources previously mistaken for taboo pictures, before the filtered picture resources are sent to the recognition models in the model layer.
In order to better understand the above platform architecture, the contents related to each layer are described below.
Specifically, the processing logic of the data layer is roughly divided into two steps of picture resource acquisition and picture resource preprocessing.
Illustratively, for the picture resource acquisition step, in some implementations the image resources within the code bin may be scanned automatically upon receiving a distributed version control system (git) command to scan the picture resources within the code bin. Taking the Android system as an example, the scanned picture resources fall into the Portable Network Graphics (PNG) format, the Joint Photographic Experts Group (JPEG/JPG) format, and the Android-specific vector drawable (DVG) format.
In addition, it should be noted that in the technical solution provided in the present application, to make it easy to locate and delete a taboo picture once a picture resource is later determined to be one, the picture resource acquisition step obtains not only the picture resource itself but also its name, stored path address, and so on, so that the taboo picture can subsequently be located accurately by name and path address and deleted.
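As a minimal sketch of such a picture resource scanner in python, where the code-bin root directory and the extension set are illustrative assumptions:

```python
import os

# Extensions follow the formats named above; .xml covers DVG-format resources.
PICTURE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".xml"}

def scan_code_bin(code_bin_root):
    """Walk the code bin and record each picture resource together with its
    name and path address, so a taboo picture can later be located and deleted."""
    picture_resources = []
    for dirpath, _dirnames, filenames in os.walk(code_bin_root):
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() in PICTURE_EXTENSIONS:
                picture_resources.append({
                    "name": filename,
                    "path": os.path.join(dirpath, filename),
                })
    return picture_resources
```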
For convenience of the following description, the program/function that scans picture resources out of the code bin is referred to in this application as the picture resource scanner, as shown for example in fig. 1. Furthermore, for picture resources in the PNG and JPG formats, such as the phone picture resource 101 shown in fig. 1, the background color in the code bin is transparent (no background color), so the resource is essentially invisible to the naked eye. To facilitate the identification processing of the model layer, the judgment layer, and the reinforcement learning layer, PNG- and JPG-format picture resources therefore need to be preprocessed when obtained by scanning, that is, the picture resource preprocessing step is executed, yielding the phone picture resource 102 shown in fig. 1, which can be seen with the naked eye.
Furthermore, for picture resources in the DVG format: to save image storage space, the Android system stores some picture resources, especially the icons carried by the system, as vector drawable objects (VectorDrawable), and these VD-form icons, i.e. DVG-format picture resources, are stored in the code bin as XML. For example, the XML code of a video-recorder picture resource can be as shown at 103 in fig. 1. Such picture resources are likewise not visible, so the model layer, the judgment layer, and the reinforcement learning layer cannot directly extract features from them, and DVG-format picture resources obtained by scanning also need preprocessing, that is, the picture resource preprocessing step is executed, yielding the video-recorder picture resource 104 shown in fig. 1, which is visible to the naked eye.
Furthermore, it will be appreciated that although picture assets in the PNG format and JPG format are not visible to the naked eye, the system can recognize that such picture assets are themselves stored in picture form, not in XML code. Thus, the preprocessing steps performed on the PNG format and JPG format picture resources may include, for example, size processing, channel number processing, and visualization processing.
Specifically, in the technical solution provided in the present application, the size processing unifies the sizes of the picture resources scanned by the picture resource scanner into a 640 × 640 × 3 format; the channel-number processing uniformly converts single-channel, three-channel, and four-channel picture resources into three channels.
It should be noted that in practice the common image channel layouts are single-channel, three-channel, and four-channel (RGBA). A single channel is a grayscale image (GrayScale) with only one layer; three channels is an ordinary color image (RGB) with the three layers R, G, and B; four channels is RGB + Alpha, where the final Alpha layer is usually an opacity parameter, so a four-channel image may be completely transparent (as with the picture resource 101 shown in fig. 1) or completely black.
In addition, it should be noted that the visualization processing above is directed specifically at four-channel picture resources whose Alpha layer is all black or all transparent.
For example, when the Alpha layer is all 1 (all black) or all 0 (all transparent), the Alpha layer can normally simply be deleted and the RGB layers used directly. But for a picture resource whose Alpha layer carries a small part of the picture while the RGB layers are used for the transparency processing, the Alpha layer must be used as the processed single-layer image, the RGB layers removed, and the result superimposed into a three-channel picture.
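The following python sketch illustrates the size and channel-number processing described above, using Pillow and numpy; the 640 × 640 target size and the alpha-handling policy are stated assumptions, not exact implementation details from the application:

```python
import numpy as np
from PIL import Image

def preprocess(path, size=(640, 640)):
    """Unify size and channel number; alpha handling follows the policy above."""
    img = Image.open(path)
    if img.mode == "RGBA":
        r, g, b, a = img.split()
        alpha = np.asarray(a)
        if alpha.min() == 255 or alpha.max() == 0:
            # Alpha all-opaque or all-transparent: delete it, use RGB directly.
            img = img.convert("RGB")
        else:
            # Alpha carries the drawing while RGB only handles transparency:
            # use the alpha layer as the single-layer image, drop RGB, and
            # superimpose it into a three-channel picture.
            img = Image.merge("RGB", (a, a, a))
    else:
        img = img.convert("RGB")  # single channel (grayscale) -> three channels
    return np.asarray(img.resize(size))
```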
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not intended to limit the present embodiment. In practical application, a preprocessing process of the picture resources can be set according to business needs, for example, any of the above items can be selected according to business needs, or image enhancement processing is added on the basis of the above processing process, so that subsequent feature extraction is facilitated, and the picture resources are better identified.
In addition, as can be seen from the above description, DVG-format picture resources exist as XML code. Preprocessing such picture resources therefore requires first adapting the attributes and values in the DVG-format picture resource to the corresponding Scalable Vector Graphics (SVG) attributes and values, saving the result as an SVG-format picture resource, and finally converting the SVG-format picture resource into a PNG-format picture resource using the cairosvg module package (a tool package for converting SVG to PNG) in python (a computer programming language), thereby completing the conversion from DVG to PNG. The preprocessing operations given above for PNG-format picture resources, such as size processing, channel-number processing, and visualization processing, can then be performed.
Understandably, because both DVG-format (or VD-format) and SVG-format picture resources are implemented on top of XML syntax, the two formats share the names of the basic graphic definitions, for example lines (line), texts (text), rectangles (rect), and circles (circle), and both lay pictures out by coordinates. Mutual conversion can therefore be achieved by adapting attributes and values. However, while SVG-format picture resources can be converted into PNG with the cairosvg module package in python, there is for now no way to convert the DVG format directly into PNG or JPG, so DVG-format picture resources must first be converted into SVG-format picture resources. If, as image processing technology develops, DVG-format picture resources can later be converted directly into PNG or JPG, the step of adapting DVG attributes and values to SVG attributes and values can be skipped and the conversion performed directly with a tool.
Specifically, in the present application, when adapting the attributes and values in a DVG-format picture resource to the corresponding SVG attributes and values, the DVG-format picture resource may first be loaded, its coordinate range identified, and the origin coordinate for the DVG-to-SVG conversion calculated; the attributes and values of the DVG-format picture resource are then adapted according to the calculated origin coordinate to obtain SVG graphic elements; finally an SVG file is created and the converted graphic elements are added to it. This implements the adaptation of DVG-format attributes and values into the corresponding SVG-format attributes and values.
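A hedged python sketch of this conversion chain follows; the attribute mapping shown covers only the common case (viewport and path data), and a full adapter would also handle groups, strokes, gradients, and the origin-coordinate calculation described above:

```python
import xml.etree.ElementTree as ET
import cairosvg

ANDROID = "{http://schemas.android.com/apk/res/android}"

def dvg_to_png(dvg_path, svg_path, png_path):
    """Adapt a DVG (VectorDrawable XML) resource into SVG, then render to PNG."""
    vector = ET.parse(dvg_path).getroot()
    width = vector.get(ANDROID + "viewportWidth", "24")
    height = vector.get(ANDROID + "viewportHeight", "24")

    # Shared graphic definitions map across directly: android:pathData -> d,
    # android:fillColor -> fill, both laid out on the same coordinate system.
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "viewBox": "0 0 {} {}".format(width, height),
    })
    for path in vector.iter("path"):
        ET.SubElement(svg, "path", {
            "d": path.get(ANDROID + "pathData", ""),
            "fill": path.get(ANDROID + "fillColor", "#000000"),
        })
    ET.ElementTree(svg).write(svg_path)

    # Finish with the cairosvg module package, as described above.
    cairosvg.svg2png(url=svg_path, write_to=png_path)
```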
The description of the picture resource acquiring step and the picture resource preprocessing step in the data layer is introduced here, and it should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation to the present embodiment. For the operation of adapting the attributes and values in the DVG format picture resources to the corresponding attributes and values in the SVG format picture resources, the operation of converting the SVG format picture resources to the PNG format picture resources, and the specific implementation details of the size processing, the channel number processing, the visualization processing, and the image enhancement processing, reference may be made to the existing standard, which is not described herein again.
For the processing logic of the model layer, according to the taboo picture types, for example, five types given above, the identification models corresponding to each taboo picture type can be selected in the model layer according to the business requirements for combination. For convenience of subsequent description, in the technical solution provided in the present application, a recognition model for recognizing a tabu picture of a naked-skin category is referred to as a naked-skin recognition model, a recognition model for recognizing a tabu picture of a tabu gesture category is referred to as a tabu gesture recognition model, a recognition model for recognizing a tabu picture of a tabu animal category is referred to as a tabu animal recognition model, a recognition model for recognizing a tabu picture of a tabu flag category is referred to as a tabu flag recognition model, and a recognition model for recognizing a tabu picture of a sensitive area (tabu map identifier) category is referred to as a sensitive area recognition model.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not intended to limit the present embodiment. In practical application, the identification models for identifying different tabu pictures can be named according to needs, and the naming is not limited herein.
Illustratively, continuing with the names above, in the technical solutions provided by the present application the bare skin recognition model includes a portrait recognition model and a bare skin degree detection model. The portrait recognition model is obtained by training on the portrait-category picture resources in the COCO data set based on the YoloV5 network and is used to recognize the portrait and gender in the preprocessed picture resources. The bare skin degree detection model is obtained by training on data in the Imagenet data set based on the NSFW framework of the residual network ResNet50 and is used to detect the degree of bare skin in the female picture resources identified by the portrait recognition model.
It can be understood that the YoloV5 network, like the general Yolo series of algorithms, belongs to the category of one-stage detection algorithms. For data preprocessing, YoloV5 continues the mosaic online image enhancement introduced by YoloV4, which aims to increase the number of small targets in a single batch, improve the network's ability to recognize small targets, and increase the data information per batch. YoloV5 is used here for target object recognition, specifically recognition of women in the present application, that is, the target object to be recognized is a portrait whose gender is female. The COCO data set, i.e. the Common Objects in Context (COCO) data set, is a data set usable for image recognition that includes large and rich object detection, segmentation, and caption data. It was captured mainly from complex daily scenes with scene understanding as its goal, and the positions of targets in the images are calibrated through accurate segmentation. Currently the images in the COCO data set cover 91 object classes, 328,000 images, and 2,500,000 labels. COCO is also the largest semantic segmentation data set so far, providing 80 classes and more than 330,000 picture resources, of which 200,000 are labeled, with more than 1,500,000 individual instances in the whole data set. Since the COCO data set itself carries the portrait (person) category, with different labels applied to males and females, in the technical scheme provided by the application the portrait recognition model for identifying female picture resources is obtained by iteratively training the YoloV5 network on the portrait-category picture resources in the COCO data set until a preset convergence requirement is met. After a preprocessed picture resource is then fed to the portrait recognition model as input, the model can accurately determine whether the currently input picture resource contains a woman, i.e. whether it is a female picture resource.
In addition, it should be noted that the YoloV5 network adopts a positive example definition different from that of the general Yolo series. The general Yolo series defines positive examples by the Intersection over Union (IOU) value between the prior box and the real target box: a prior box whose IOU exceeds a threshold is set as a positive example. Since prior boxes and real target boxes correspond one to one, there are at most only as many positive examples as real target boxes, leaving positive and negative examples unbalanced. YoloV5 instead defines positive examples by the aspect ratio between the prior box and the real target box: when the ratio is below a threshold the box is a positive example. It also increases the proportion of positive examples by allowing several prior boxes to match one real target box, and by setting an offset so that adjacent grid cells predict the same target simultaneously, further increasing the number of positive examples and greatly raising their proportion.
In terms of the loss function, YoloV5 guides training with three parts: class loss, target loss, and regression loss. Let $i$ index all prediction boxes, let $p$ index the predicted positive examples, let $\hat{y}$ be the predicted category, and let $y$ be the real category.

Illustratively, the class loss uses binary cross entropy, calculated as formula (1):

$$cls_{loss} = -\sum_{i}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right] \tag{1}$$

Illustratively, the target loss also uses binary cross entropy, calculated as formula (2):

$$obj_{loss} = -\sum_{p}\left[y_p \log \hat{y}_p + (1 - y_p)\log(1 - \hat{y}_p)\right] \tag{2}$$

Illustratively, the regression loss uses the GIOU (Generalized Intersection over Union) calculation, formula (3), where $iou_p$ is the intersection-over-union of the predicted positive example $p$ and its corresponding real target box $B_p^{gt}$, $B_p$ is the prediction box, and $C_p$ is the smallest box enclosing both:

$$giou_{loss} = \sum_{p}\left[1 - \left(iou_p - \frac{\left|C_p \setminus (B_p \cup B_p^{gt})\right|}{\left|C_p\right|}\right)\right] \tag{3}$$

Illustratively, the total loss is a weighted sum of the above three partial losses, calculated as formula (4):

$$loss = \alpha \cdot cls_{loss} + \beta \cdot obj_{loss} + \gamma \cdot giou_{loss} \tag{4}$$
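For concreteness, a small numpy sketch of formulas (1) to (4) follows; the weight values alpha, beta, and gamma are illustrative assumptions, since the application does not specify them:

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    # Binary cross entropy, as in formulas (1) and (2).
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def giou_loss(iou, union_area, enclosing_area):
    # Formula (3): GIOU subtracts from IOU the fraction of the smallest
    # enclosing box C not covered by the union of the two boxes.
    giou = iou - (enclosing_area - union_area) / enclosing_area
    return np.sum(1.0 - giou)

def total_loss(cls_loss, obj_loss, reg_loss, alpha=0.5, beta=1.0, gamma=0.05):
    # Formula (4): weighted sum of the three partial losses; the weights
    # here are assumed values for illustration only.
    return alpha * cls_loss + beta * obj_loss + gamma * reg_loss
```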
the content of the YoloV5 network and the content of the COCO data set are introduced here, and when a portrait recognition model is obtained by training image resources of portrait categories in the COCO data set based on the YoloV5 network, the content that is not described in this application may refer to the YoloV5 network and the related standard of the COCO data set, and details are not described here again.
In addition, in some implementations, the input to the trained portrait recognition model may be, for example, the picture resources after the preprocessing operation in the data layer, or the picture resources from which taboo pictures misreported in earlier rounds have been filtered by the reinforcement learning layer.
In addition, in some implementations, the content output after the portrait recognition model's recognition processing may be set to "1" or "0", with the convention that "1" indicates the currently recognized picture resource contains a woman, i.e. is a female picture resource, and "0" indicates it does not contain a woman.
In addition, in other implementation manners, the content output after the portrait recognition model recognition processing may be set to "Yes" or "No", and it is agreed that "Yes" indicates that the currently recognized picture resource includes a woman, that is, a woman picture resource, and "No" indicates that the currently recognized picture resource does not include a woman.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
In addition, it should be noted that the Imagenet data set used for training the bare skin degree detection model has, like a network, a plurality of nodes (Node), each Node corresponding to an item or subclass. By the Imagenet standard, a Node contains at least 500 pictures/images of the corresponding object usable for training; that is, the Imagenet data set is in effect a huge picture library for image/visual training. For the bare skin degree detection model, the present application still adopts the NSFW (Not Suitable For Work) framework of ResNet50 commonly used to train such models, but because the taboo pictures of the bare-skin category in this technical scheme mainly refer to female pictures with skin exposure that is strictly prohibited in the Middle East and North Africa regions, the bare skin degree detection model here differs substantially from existing models that identify content unsuitable for browsing. Specifically, the bare skin degree detection model in this technical scheme is trained on features such as the number of skin areas in the female picture resource identified by the portrait recognition model, the ratio of each skin area's pixels to all pixels in the female picture resource, the largest skin area in the female picture resource, and the total skin area. Moreover, because these regions have low tolerance for exposure, a convergence threshold can be set and the iterative training updated continuously until the convergence requirement is met.
Specifically, in the application, when the bare skin degree detection model obtained through this feature training detects a female picture resource, the above features serve as scoring parameters: the number of skin areas, the ratio of each skin area's pixels to all pixels in the female picture resource, the largest skin area, the total skin area, and so on are scored, and the nudity score of the female picture resource is then determined.
In addition, it should be noted that for a picture of a woman with, say, only her two arms bare, the nudity score determined by the bare skin degree detection model of the present application may be around 0.22, which would not make it a taboo picture for most regions but still makes it one for the Middle East and North Africa. A lower nudity threshold, such as 0.01, may therefore be set, so that comparing the nudity score determined by the model against the threshold determines whether the current female picture resource is a taboo picture.
For example, in some implementations, when the nudity score is smaller than the nudity threshold, the female picture resource may be determined not to be a taboo picture; otherwise it may be determined to be a taboo picture of the bare-skin type.
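A one-line decision sketch of this rule, with the 0.01 default following the example threshold above:

```python
def is_bare_skin_taboo(nudity_score, nudity_threshold=0.01):
    # Below the (deliberately low) threshold: not a taboo picture.
    # At or above it: a taboo picture of the bare-skin type.
    return nudity_score >= nudity_threshold
```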
The contents of the NSFW framework of ResNet50 and the Imagenet data set are described here, and when training a bare skin level detection model for data in the Imagenet data set based on the NSFW framework of ResNet50, the contents not described in the present application may be referred to the NSFW framework of ResNet50 and the relevant standards of the Imagenet data set, and thus, the details are not described here again.
In addition, it should be noted that, in some implementation manners, the content output after the identification processing by the bare skin degree detection model may be set to be "1" or "0", and it is agreed that "1" indicates that the current female picture resource is a taboo picture of a bare skin type, that is, the bare score is greater than or equal to the bare threshold, and "0" indicates that the current female picture resource is not a taboo picture, that is, the bare score is less than the bare threshold.
In addition, in other implementation manners, the content output after the bare skin degree detection model identification processing may be set to "Yes" or "No", and "Yes" is agreed to indicate that the current female picture resource is a taboo picture of a bare skin type, that is, the bare score is greater than or equal to the bare threshold, and "No" indicates that the current female picture resource is not a taboo picture, that is, the bare score is less than the bare threshold.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Therefore, in the technical scheme provided by the application, the bare skin recognition model for identifying taboo pictures of the bare-skin type is set to comprise a portrait recognition model and a bare skin degree detection model. The portrait recognition model first recognizes the portrait and gender in the preprocessed picture resources (or in the picture resources from which previously misreported taboo pictures have been filtered by the reinforcement learning layer), filtering out the large number of pictures that cannot be bare-skin taboo pictures; the bare skin degree detection model then scores the female picture resources identified by the portrait recognition model to determine the nudity score, and comparing that score against a low nudity threshold efficiently and accurately determines whether the current picture resource is a taboo picture.
Illustratively, most taboo gestures are small and numerous, and it is difficult to collect enough picture resources for deep learning training, so conventional neural network technology cannot be used directly for detection. Continuing with the names above, in the technical solution provided by the present application the taboo gesture recognition model therefore includes a hand recognition model and a gesture recognition model, so that taboo gesture pictures can be recognized efficiently and accurately. The hand recognition model is obtained by training on the hand data in a gesture recognition data set based on the SSD (Single Shot Detector) target recognition framework; it performs hand recognition on the preprocessed picture resources and returns a bounding box. The gesture recognition model is a MediaPipe frame structure; it performs accurate key point positioning of the 21 3D hand joint points (landmarks) inside the hand region of the hand picture resources (picture resources containing a hand) recognized by the hand recognition model (i.e. it automatically dots the 21 3D landmarks), extracts the coordinate information of the 21 3D landmarks, and analyzes that coordinate information with a Support Vector Machine (SVM) algorithm to predict the gesture in the hand picture resource.
SSD is a target detection algorithm whose main design idea is to extract features hierarchically and then perform box regression and classification in sequence. Specifically, in the technical scheme provided by the application, the hand recognition model is obtained by training on the hand data in a gesture recognition data set based on the SSD target recognition framework, so that when a preprocessed picture resource is recognized, the hand data is processed by this model, feature information is extracted using deep learning, and the accuracy of the extracted feature information is guaranteed.
In addition, the hand recognition model is of an SSD target recognition frame structure with high processing speed and low cost, so that the accuracy is guaranteed, and meanwhile, the feature information can be efficiently extracted, so that the operation resources are saved.
In addition, in some implementations the gesture recognition data set may be, for example, the HGRD hand gesture recognition data set, which collects a variety of hand pictures/images.
In another implementation, the gesture recognition data set may be, for example, a dynamic gesture data set, such as the CGD (ChaLearn Gesture Data) data set or the ChaLearn LAP IsoGD and ChaLearn LAP ConGD data sets derived from it; these are not listed exhaustively here, and the application is not limited in this respect.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Further, the MediaPipe framework referred to above is a framework mainly used for building pipelines over audio, video, or any time-series data; with its help, pipelines can be built for different media processing functions. At present MediaPipe is mainly applied to multi-hand tracking, face detection, object detection and tracking, 3D object detection and tracking, automatic video clipping pipelines, and so on, which are not listed exhaustively here, and the application is not limited in this respect. Specifically, in the technical solution provided by the present application, the gesture recognition model builds on MediaPipe's multi-hand tracking capability: the gesture recognition model of the MediaPipe frame structure uses single palm detection and, once that completes, dots the 21 3D landmarks in the detected hand region, i.e. performs accurate key point positioning.
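A short python sketch of the 21-landmark extraction with the MediaPipe Hands solution follows; the flattened 63-value feature layout is an assumption made for the SVM step later, not a format prescribed by the application:

```python
import cv2
import mediapipe as mp

def extract_hand_landmarks(image_path):
    """Return a flat list of 63 values (x, y, z for each of the 21 3D
    landmarks), or None when no hand is detected."""
    image = cv2.imread(image_path)
    with mp.solutions.hands.Hands(static_image_mode=True,
                                  max_num_hands=1) as hands:
        results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    landmarks = results.multi_hand_landmarks[0].landmark  # the 21 3D landmarks
    return [coord for lm in landmarks for coord in (lm.x, lm.y, lm.z)]
```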
In addition, after the gesture recognition model dots the 21 3D landmarks and extracts their coordinate information, an SVM algorithm is applied, specifically the one-versus-one method (OVO SVMs, also called the pairwise method). The idea of this algorithm is to train one SVM between each pair of classes, so k classes of samples (k an integer greater than 0) require k(k-1)/2 SVMs.
Illustratively, when the class of a sample is predicted with these SVMs, for example when classifying an unknown sample, the class with the highest final vote count is taken as the class of the unknown sample. For ease of understanding, an example follows.
For example, assume the samples have 4 classes: A, B, C, and D. For prediction, each pair of the 4 classes yields one SVM, giving the 6 SVM training sets (A, B), (A, C), (A, D), (B, C), (B, D), and (C, D). At test time the unknown sample is run against each of the 6 classifiers to obtain 6 results, and a vote over those results yields the final classification.
Illustratively, the voting process runs as follows:
initialize A = B = C = D = 0;
then vote with the 6 SVM classifiers:
(A, B)-classifier: if A wins, A = A + 1; otherwise B = B + 1;
(A, C)-classifier: if A wins, A = A + 1; otherwise C = C + 1;
(A, D)-classifier: if A wins, A = A + 1; otherwise D = D + 1;
(B, C)-classifier: if B wins, B = B + 1; otherwise C = C + 1;
(B, D)-classifier: if B wins, B = B + 1; otherwise D = D + 1;
(C, D)-classifier: if C wins, C = C + 1; otherwise D = D + 1;
the decision is Max(A, B, C, D).
Thus, if for example A wins every pairing it appears in, B beats C and D, and C beats D, the votes come out A = 3, B = 2, C = 1, D = 0, and the final classification of the sample is class A.
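The voting loop above reduces to a few lines of python; `pairwise_classifiers` is a hypothetical mapping from a class pair to a trained two-class SVM that returns the winning class:

```python
from itertools import combinations

def ovo_predict(sample, pairwise_classifiers, classes):
    """One-versus-one voting: one classifier per class pair, each winner
    gets a vote, and the class with the most votes is the prediction."""
    votes = {c: 0 for c in classes}          # e.g. A = B = C = D = 0
    for a, b in combinations(classes, 2):    # the k(k-1)/2 pairings
        winner = pairwise_classifiers[(a, b)](sample)
        votes[winner] += 1
    return max(votes, key=votes.get)         # the decision is Max(A, B, C, D)
```

In practice scikit-learn's `SVC` performs this one-versus-one scheme internally for multi-class problems, so the loop rarely needs to be written by hand.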
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not intended to limit the present embodiment.
For better understanding of the recognition logic of the tabu gesture picture, the following description is made in conjunction with fig. 2 to 5.
Illustratively, if the picture resource fed into the hand recognition model after preprocessing is the one shown in fig. 2 (containing the electronic device 201 and the hand 202), then after recognition by the hand recognition model the picture resource shown in fig. 2 is determined to contain a hand, is therefore determined to be a hand picture resource (a picture resource containing a hand), and the bounding box 203 around the position of the hand 202 is marked, as shown in fig. 3.
In addition, in some implementations, the content output after the hand recognition model is processed may be set to "1" or "0", and it is agreed that "1" indicates that the current picture resource includes a hand, that is, the hand picture resource, and "0" indicates that the current picture resource does not include a hand.
For example, if the picture resource fed into the hand recognition model after preprocessing contains the hand shown in fig. 4, then after recognition by the hand recognition model the picture resource is determined to contain a hand, is therefore determined to be a hand picture resource (a picture resource containing a hand), and the bounding box 302 around the hand 301 is marked, as shown in fig. 4.
In addition, in other implementations, the content output after the hand recognition model is processed may be set to "Yes" or "No", and it is agreed that "Yes" indicates that the current picture resource contains a hand, that is, the current picture resource is a hand picture resource, and "No" indicates that the current picture resource does not contain a hand.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
To more intuitively illustrate how the gesture recognition model plots the 21 landmarks in a hand picture resource, the hand picture resource recognized in fig. 4 is taken as an example.
For example, after the hand picture resource identified by the hand recognition model is input into the gesture recognition model and the gesture recognition model plots its points, the hand landmark schematic diagram shown in fig. 5 is obtained, where 0 to 20 are the 21 plotted landmarks.
Illustratively, after the 21 3D landmarks have been plotted, their coordinate information is extracted. For convenience of explanation, the coordinate information of the 21 3D landmarks is taken as a sample set in this application; the sample set is divided into 16 classes, and, based on the SVM algorithm, the samples are predicted by 16 × (16 - 1) / 2 = 120 pairwise SVMs, so as to classify the gestures in the hand picture resources.
Still taking the hand picture resource shown in fig. 4 as an example, after the sample corresponding to the coordinate information of the 21 plotted 3D landmarks in fig. 5 is predicted based on the SVM algorithm, it may be determined that the gesture in the hand picture resource shown in fig. 4 is a "five-finger open" gesture.
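As a hedged sketch of this step, the following uses scikit-learn's SVC (which internally applies the same one-vs-one voting over 120 pairwise machines for 16 classes) as a stand-in for the SVM algorithm described above; the .npy file names, array shapes, and labels are assumptions for illustration only.

```python
# A sketch of gesture classification from the 21 3D landmark coordinates;
# file names, shapes, and labels are assumptions.
import numpy as np
from sklearn.svm import SVC

X_train = np.load("landmark_samples.npy")   # (n_samples, 63): 21 landmarks x 3 coords
y_train = np.load("gesture_labels.npy")     # 16 gesture classes, e.g. "five-finger open"

clf = SVC(kernel="rbf", decision_function_shape="ovo")  # one-vs-one pairwise SVMs
clf.fit(X_train, y_train)

hand = np.load("hand_21x3.npy").reshape(1, -1)          # one detected hand
print(clf.predict(hand))                    # predicted gesture class
```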
In addition, it can be understood that the division of the samples into 16 classes in this application is determined by the 16 currently identified taboo gestures; that is, in practical applications, the number of classes into which the samples corresponding to hand picture resources are divided can be determined according to the kinds of taboo gestures that need to be recognized.
This concludes the introduction of the SSD target recognition framework, the MediaPipe framework, and the SVM algorithm. When the hand recognition model is trained based on the SSD target recognition framework, the gesture recognition model is constructed based on the MediaPipe framework, and gestures are predicted based on the SVM algorithm, content not described in this application can refer to the relevant documentation of the SSD target recognition framework, the MediaPipe framework, and the SVM algorithm, and is not described herein again.
Therefore, in the technical scheme provided by the application, the taboo gesture recognition model for taboo pictures of the taboo gesture type is set to include a hand recognition model and a gesture recognition model. The preprocessed picture resources (or the picture resources from which previously misreported taboo pictures have been filtered out by the reinforcement learning layer) are first subjected to hand recognition by the hand recognition model of the SSD target recognition framework structure, which has a small calculation cost, and a bounding box is returned, so that the gesture detection range is narrowed and the recognition efficiency of the gesture recognition model can be improved.
In addition, in the technical scheme provided by the application, when the hand recognition model recognizes that the current picture resource is a hand picture resource, the hand picture resource marked with the hand bounding box is input into the gesture recognition model for recognition, so that the gesture recognition model of the MediaPipe framework structure performs gesture detection only within a range where a hand has been determined to exist, further improving its recognition precision.
For example, in consideration of actual business requirements, for a taboo picture of the taboo animal type, neither a real animal image nor a cartoon/line-drawing animal image is suitable for display in a region where that animal is taboo. Therefore, to better meet user requirements and improve user experience, in the technical scheme provided by the application, when the taboo animal recognition model is trained based on a ResNeXt network (a convolutional neural network), the data set required for training fuses animal image data from the Imagenet data set with various cartoon/line-drawing animal image data crawled from the network, so that a taboo animal recognition model covering both real scenes (photographs of animals) and virtual scenes (cartoon/line-drawing animals) can be obtained, and taboo pictures of the taboo animal type can be recognized more accurately.
It can be understood that the current Imagenet data set covers about 12 million picture resources in more than 1000 categories, so the training data are relatively rich for the three kinds of common taboo animals, namely the "pig" and "dog" that are taboo in the North African and Central African regions and the "cat" (especially the "black cat") that is taboo in the South American region, which together account for about 350,000 picture resources.
Furthermore, to cover the three types of taboo animals in cartoon/line-drawing form, as many such pictures as possible, for example 50,000, can be crawled from the network, and the crawled content is processed before use.
Therefore, iterative training is carried out on the basis of the approximately 350,000 picture resources of the three types of taboo animals in the Imagenet data set and the 50,000 crawled cartoon/line-drawing picture resources; once the convergence condition is met, a taboo animal recognition model meeting the business requirements can be obtained.
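A minimal fine-tuning sketch under stated assumptions follows: torchvision's resnext50_32x4d stands in for the ResNeXt network, the directory "taboo_animals/" (one folder per class, mixing real and cartoon/line-drawing images) is hypothetical, and the hyperparameters are illustrative, not those of the embodiment.

```python
# A hedged sketch of training the taboo animal recognition model on the
# fused real + cartoon data set; paths and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("taboo_animals/", transform=tfm)  # pig/, dog/, cat/ ...
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnext50_32x4d(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
model.train()
for images, labels in loader:               # in practice, iterate until convergence
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```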
For the way of crawling picture resources in cartoon/line-drawing form from the network, reference may be made to existing standards, which are not described herein again.
Illustratively, considering that the numbers of taboo flags and sensitive areas are limited, so that it is difficult to collect a large quantity of picture resources for deep-learning training, and that no suitable framework or pipeline model is currently available, a YOLOv3 network is adopted to construct the recognition models for these two types of taboo pictures. That is, in practical applications, the taboo flag recognition model (for recognizing taboo pictures of the taboo flag type) and the sensitive area recognition model (for recognizing taboo pictures of the sensitive area type) may be two separate models: the taboo flag recognition model is trained on picture resources of the taboo flag type based on the YOLOv3 network, and the sensitive area recognition model is trained on picture resources of the sensitive area type based on the YOLOv3 network.
For example, in other implementations, the taboo flag recognition model and the sensitive area recognition model may instead be a single recognition model that can recognize both taboo pictures of the taboo flag type and taboo pictures of the sensitive area type. In this case, the picture resource data of the taboo flag type and of the sensitive area type may be fused into one data set, and the data in that data set is then trained based on the YOLOv3 network to obtain a recognition model covering both types.
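How such a fusion could look in practice is sketched below, assuming a YOLO-style images/labels directory layout; the directory names are placeholders, not paths used by the embodiment.

```python
# A sketch of fusing the taboo flag and sensitive area picture resources
# into one training data set; the file layout is an assumption.
import shutil
from pathlib import Path

def fuse(sources, out_dir="fused_dataset"):
    out = Path(out_dir)
    for sub in ("images", "labels"):
        (out / sub).mkdir(parents=True, exist_ok=True)
    for name, src in sources.items():
        for sub in ("images", "labels"):
            for f in (Path(src) / sub).iterdir():
                # prefix with the source name so files from the two sets cannot collide
                shutil.copy(f, out / sub / f"{name}_{f.name}")

fuse({"flag": "taboo_flags", "area": "sensitive_areas"})
```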
This concludes the description of how the recognition models corresponding to the five taboo types in the model layer are constructed. It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation on the present embodiment. The various network architectures and algorithms used are not detailed in this application; reference may be made to existing standards, which are not described here.
Regarding the processing logic of the judgment layer: to further improve identification accuracy, for a picture resource that fails to pass any pipeline in the model layer (the pipelines corresponding to the recognition models of the different taboo types), that is, a picture resource determined to be a taboo picture by the model layer, a warning report may be generated from the name and path address of the picture resource together with the visual picture resource, and the warning report is submitted for manual review.
It should be noted that, although this application still involves a manual review step, most of the picture resources are quickly and accurately filtered out before the manual review by the identification processing of the recognition models corresponding to the different taboo types in the model layer, and the picture resources submitted for manual review are greatly reduced (for example, from tens of thousands of pictures to hundreds), so that labor cost can be effectively reduced.
In addition, in the warning report submitted for manual review, the picture resources determined to be taboo pictures by the recognition models in the model layer are displayed in visual form, so that during manual review it can be determined directly whether a picture resource is a taboo picture, without searching for the picture resource by its name and path address, which improves the efficiency of the manual review step.
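A minimal sketch of assembling and pruning such a warning report is given below; the record layout mirrors tables 1 to 3, and the field names are assumptions, not the embodiment's actual data structure.

```python
# A sketch of the visual warning report; field names are assumptions
# modeled on tables 1 to 3 below.
from dataclasses import dataclass, field

@dataclass
class WarningReport:
    rows: list = field(default_factory=list)

    def add(self, picture_format, name, path, preview):
        # preview is the visual picture resource shown to the reviewer
        self.rows.append({"format": picture_format, "name": name,
                          "path": path, "preview": preview})

    def drop_false_alarms(self, review):
        # review maps a path address to True if the reviewer marked it a false alarm
        self.rows = [r for r in self.rows if not review.get(r["path"], False)]
```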
For better understanding, the following description is made with reference to a visual warning report table shown in table 1.
TABLE 1 visual warning report form
| Picture format | Name | Path address | Visual picture resource |
|---|---|---|---|
| PNG | black cat 1_1.PNG | http://schema.android.com/apk/res/android1.1 | (image) |
| JPG | demon horn gesture.JPG | http://schema.android.com/apk/res/android1.2 | (image) |
| PNG | touch gesture.PNG | http://schema.android.com/apk/res/android1.3 | (image) |
Illustratively, as shown in table 1, the picture resource in PNG format named "black cat 1_1.PNG" with the path address "http://schema.android.com/apk/res/android1.1", the picture resource in JPG format named "demon horn gesture.JPG" with the path address "http://schema.android.com/apk/res/android1.2", and the picture resource in PNG format named "touch gesture.PNG" with the path address "http://schema.android.com/apk/res/android1.3" are determined to be taboo pictures after identification processing by the taboo animal recognition model and the taboo gesture recognition model. In this case, the names and path addresses of these picture resources acquired by the picture resource scanner, together with the visual picture resources, are assembled into the visual warning report table shown in table 1 and submitted for manual review.
Correspondingly, if review information submitted by the reviewer is received after the visual warning report table has been submitted for manual review, the picture resources the reviewer determined to be false alarms are deleted from the visual warning report table in response to the review operation, and finally the remaining taboo pictures in the visual warning report table are deleted from the system according to their path addresses.
For ease of understanding, the following description refers to the manually reviewed visual warning report table shown in table 2.
TABLE 2 visual warning report table
| Picture format | Name | Path address | Visual picture resource | Whether false alarm |
|---|---|---|---|---|
| PNG | black cat 1_1.PNG | http://schema.android.com/apk/res/android1.1 | (image) | No |
| JPG | demon horn gesture.JPG | http://schema.android.com/apk/res/android1.2 | (image) | No |
| PNG | touch gesture.PNG | http://schema.android.com/apk/res/android1.3 | (image) | Yes |
Illustratively, as shown in table 2, according to the manual review result (the "Yes" or "No" review information recorded in the "whether false alarm" column), it can be determined which picture resources are false alarms and need to be deleted from the visual warning report table shown in table 2.
For example, in some implementations, it may be agreed that "Yes" indicates that the current picture resource is a false alarm, that is, the picture resource is not a taboo picture, does not need to be deleted from the system, and can be displayed normally; and that "No" indicates that the current picture resource is not a false alarm, that is, the picture resource is a taboo picture, needs to be deleted from the system, and cannot be displayed.
For example, in other implementations, it may instead be agreed that "1" indicates that the current picture resource is a false alarm (not a taboo picture, and can be displayed normally without being deleted from the system) and that "0" indicates that it is not a false alarm (a taboo picture that needs to be deleted from the system and cannot be displayed).
Illustratively, based on the visual warning report table shown in table 2, in response to the review operation of the reviewer, the picture resource determined by the reviewer to be a false alarm at the judgment layer, for example the "touch gesture.PNG" picture resource stored at the path address "http://schema.android.com/apk/res/android1.3" in table 2, is deleted from the visual warning report table; the visual warning report table after the false-alarm picture resource has been deleted is shown in table 3.
TABLE 3 visual warning report form
| Picture format | Name | Path address | Visual picture resource |
|---|---|---|---|
| PNG | black cat 1_1.PNG | http://schema.android.com/apk/res/android1.1 | (image) |
| JPG | demon horn gesture.JPG | http://schema.android.com/apk/res/android1.2 | (image) |
Finally, the remaining taboo picture "black cat 1_1.PNG" in the visual warning report table shown in table 3 is deleted from the system according to its path address "http://schema.android.com/apk/res/android1.1", and "demon horn gesture.JPG" is deleted from the system according to its path address "http://schema.android.com/apk/res/android1.2".
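A minimal sketch of this final deletion step follows, assuming the recorded path addresses resolve to removable files in the build tree; the function name is illustrative.

```python
# A sketch of deleting the remaining taboo pictures by their path addresses.
import os

def delete_remaining(report_rows):
    for row in report_rows:                 # rows left after false alarms were removed
        try:
            os.remove(row["path"])          # path address recorded at scan time
        except FileNotFoundError:
            pass                            # already removed; nothing to do
```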
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not intended to limit the present embodiment.
Regarding the processing logic of the reinforcement learning layer: to reduce false alarms among the taboo pictures identified by the recognition models in the model layer and to further reduce human input, the picture resources determined by the reviewer to be false alarms (hereinafter, false-alarm picture resources) are recorded. Feature information of the false-alarm picture resources is then extracted based on a ResNet50 convolutional neural network; the ResNet50 convolutional neural network model is trained in an oversampling, strong-memory manner, and a classification model (OneClassSVM) is trained in an overfitting manner. Thereafter, before the picture resources preprocessed by the data layer are input into the recognition models of the pipelines in the model layer, they are first input into the ResNet50 convolutional neural network model in the reinforcement learning layer for feature extraction, and the extracted high-level features are then input into the OneClassSVM classification model for processing. In this way, not only identical false-alarm picture resources but also similar picture resources can be filtered out in advance. That is, each false-alarm picture resource needs to be examined by the reviewer only once, after which it will no longer appear in the visual warning report table.
In addition, it should be noted that the ResNet50 convolutional neural network model in the reinforcement learning layer may be constructed by modifying an existing ResNet network model to adapt it to the image classification task in this application.
Specifically, the ResNet network model is modified by, for example, removing the last layer (top layer) of the network model, fixing (freezing) the parameters of all preceding neural network layers, sequentially adding 32 × 640 × 3 fully-connected layers and a final decision layer at the end of the network model, and finally training with the feature information in the oversampled false-alarm picture resource data set as parameters.
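A hedged sketch of this modification is given below, using torchvision's ResNet50 as the existing network model; the widths of the added fully-connected layers and the two-way decision layer are assumptions read loosely from the "32 × 640 × 3" figure in the text.

```python
# A sketch of the ResNet modification: drop the top layer, freeze the backbone,
# append fully-connected layers and a decision layer; widths are assumptions.
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights="IMAGENET1K_V1")
for p in backbone.parameters():
    p.requires_grad = False                 # fix (freeze) all pretrained layers
backbone.fc = nn.Sequential(                # replaces the removed top layer
    nn.Linear(backbone.fc.in_features, 640), nn.ReLU(),
    nn.Linear(640, 32), nn.ReLU(),
    nn.Linear(32, 2),                       # decision layer: false alarm or not
)
```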
Understandably, since the number of false-alarm picture resources is relatively small, in order to ensure the accuracy of the trained ResNet50 convolutional neural network model, the false-alarm picture resources need to be copied so that the training sample set contains enough sample data, for example 1000 copies of the same false-alarm picture resource, or even more.
In addition, it should be noted that, in some implementations, the update training of the ResNet50 convolutional neural network model and the OneClassSVM classification model in the reinforcement learning layer may be triggered when a preset time is reached, for example one month: the false-alarm picture resources recorded in the interim are obtained, and learning training is performed in the manner described above. This ensures that the false-alarm picture resources of the previous period, and picture resources similar to them, do not reappear in the next round of taboo picture identification, thereby reducing the false-alarm picture resources recorded in the visual warning report table and further reducing invalid human input.
For example, in other implementations, the update training may instead be triggered when the number of recorded false-alarm picture resources reaches a preset threshold, with learning training then performed in the manner described above to the same effect.
For example, in still other implementations, the update training may be triggered when the operating system needs an update iteration, that is, when a new version is to be released, with learning training again performed in the manner described above to the same effect.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
In addition, it should be noted that, in some implementations, the reinforcement learning layer in the technical scheme provided by this application may be introduced according to business requirements; that is, the picture resources processed by the data layer are first input into the reinforcement learning layer for processing, and the picture resources processed by the reinforcement learning layer are then input into the model layer for the taboo picture identification operation.
In addition, in other implementation manners, the reinforcement learning layer related in the technical solution provided by the present application may not be introduced, that is, the image resource processed by the data layer is directly input to the model layer to perform the operation of identifying the taboo image.
In addition, in other implementations, any one or more of the recognition models in the model layer (the bare skin recognition model, taboo gesture recognition model, taboo animal recognition model, taboo flag recognition model, and sensitive area recognition model) can be selected and combined according to business requirements, so that multiple deep-learning pipelines (Pipeline) identify taboo pictures. Identification and filtering of taboo pictures in an operating system can thus be completed more stably, efficiently, and accurately, so that operating system versions for different countries and regions can meet the requirements of local political positions, historical backgrounds, religious beliefs, and the like.
To better understand the technical solution provided by this application, the following describes implementations of different combinations of the five recognition models in the model layer, taking scenarios involving the data layer, reinforcement learning layer, model layer, and judgment layer as examples, in combination with fig. 6.
Referring to fig. 6, after an identification operation for taboo pictures is received, for example a specified git command, the picture resource scanner mentioned above acquires, in response to the git command, the picture resources involved in the operating system; that is, step S101 is executed.
Illustratively, in step S101, the picture resource scanner obtains the picture resources from the operating system, specifically from its code repository.
Understandably, in practical applications, a large number of picture resources may be stored in the code repository, so the preprocessing of step S102 may be executed once for each picture resource as it is scanned.
In addition, in other implementations, step S102 may be invoked once after all the picture resources in the code repository have been scanned, preprocessing all the scanned picture resources in a single pass.
In other implementations, step S102 may be executed once each time a preset number of picture resources has been scanned, implementing a timed, quantified batch processing operation.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not intended to limit the present embodiment.
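The timed, quantified batch strategy above can be sketched as follows; the batch size and function names are assumptions for illustration.

```python
# A minimal sketch of triggering step S102 once per preset number of
# scanned picture resources; batch_size is an assumption.
def scan_in_batches(picture_paths, preprocess, batch_size=100):
    batch = []
    for path in picture_paths:
        batch.append(path)
        if len(batch) == batch_size:        # preset number reached
            preprocess(batch)               # one S102 invocation per batch
            batch = []
    if batch:
        preprocess(batch)                   # remainder after the scan ends
```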
Then, when a picture resource is preprocessed in step S102, a preprocessing method corresponding to the format of the picture resource currently to be preprocessed is selected, and the preprocessing is performed.
For the preprocessing of picture resources in different formats, reference may be made to fig. 1 and its accompanying textual description, which are not repeated herein.
Next, after the operation of step S102 is completed, the operations to be executed by the data layer in this application are finished. At this point, the preprocessed picture resources may be directly input into the recognition models selected in the model layer, or may first be input into the reinforcement learning layer; this application takes the latter as an example.
Continuing to refer to fig. 6, illustratively, after a picture resource is preprocessed in step S102, the preprocessed picture resource is input into the self-learning-trained ResNet50 convolutional neural network model (hereinafter, ResNet50) in the reinforcement learning layer for feature extraction, and the extracted feature information is then processed by the self-learning-trained OneClassSVM classification model (hereinafter, OneClassSVM) in the reinforcement learning layer to determine whether the currently input picture resource is a false-alarm picture resource; that is, step S203 is executed, determining through ResNet50 and OneClassSVM whether the current input is a false-alarm picture resource.
Correspondingly, if after processing by ResNet50 and OneClassSVM the current picture resource is determined to be a false-alarm picture resource, that is, a picture resource previously mistaken for a taboo picture, step S204 is executed to filter out the false-alarm picture resource; otherwise, it is determined that the current picture resource has not previously been mistaken for a taboo picture, and the processed picture resource is input into the recognition models selected in the model layer.
Understandably, filtering out a false-alarm picture resource in step S204 may be, for example, removing it from the picture resource set scanned by the picture resource scanner, so that the filtered false-alarm picture resource is no longer identified in subsequent processing and does not reappear in the visual warning report, thereby reducing false alarms at the source and lowering human input cost.
Continuing to refer to fig. 6, illustratively, in the reinforcement learning layer, ResNet50 and OneClassSVM also need to be self-learning-trained periodically, that is, steps S201 and S202 are executed. In step S201, false-alarm picture resources are collected as a learning data set; in some implementations, the false-alarm picture resources are obtained from those recorded at the judgment layer in response to the review information submitted by the reviewer.
Then, when the learning data set has been obtained and the self-learning training conditions are met, for example the preset time is reached or the number of false-alarm picture resources in the learning data set reaches a preset number, step S202 is executed: oversampling training is performed on ResNet50 and overfitting training is performed on OneClassSVM based on the learning data set.
For the process of performing oversampling training on the ResNet50 and performing overfitting training on the oneplasssvm based on the learning data set, reference may be made to the above description of the processing logic portion of the reinforcement learning layer, and details are not repeated here.
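As a compact illustration of that training step, the following sketch duplicates the few recorded false-alarm feature vectors and fits scikit-learn's OneClassSVM to them; the copy count of 1000 follows the text above, while the kernel and nu values are assumptions.

```python
# A sketch of the oversampling (strong-memory) + overfitting training step;
# kernel and nu are assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

def train_false_alarm_filter(features, copies=1000):
    # features: (n, d) array extracted by the modified ResNet50 model
    oversampled = np.repeat(features, copies, axis=0)
    clf = OneClassSVM(kernel="rbf", nu=0.01).fit(oversampled)
    return clf                              # predict() == 1 -> known false alarm
```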
Continuing to refer to fig. 6, for example, in one feasible implementation, when the electronic product is oriented to the Middle East and North African regions, the taboo pictures to be filtered are female picture resources with bare skin. Therefore, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into the bare skin recognition model of the model layer; that is, step S301 is executed, and the bare skin recognition model performs taboo picture identification.
Specifically, as can be seen from the above description, the bare skin recognition model includes a portrait recognition model and a bare skin degree detection model. The taboo picture identification performed by the bare skin recognition model proceeds as follows: the portrait recognition model, obtained by training portrait resources in the COCO data set based on the YoloV5 network, recognizes portraits and gender so as to identify female picture resources; the identified female picture resources are then handed to the bare skin degree detection model, obtained by training data in the Imagenet data set with an NSFW framework based on ResNet50, which determines a bareness score for the female picture resources; finally, whether a female picture resource is a taboo picture, that is, a taboo picture of bare skin, is determined according to a preset bareness threshold and the bareness score determined by the bare skin degree detection model.
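The final thresholding step reduces to a single comparison; in the minimal sketch below, the bareness score is assumed to lie in [0, 1], and the threshold value is an assumption, not one specified by the embodiment.

```python
# A sketch of the bareness-threshold decision; the threshold is an assumption.
BARENESS_THRESHOLD = 0.7                    # assumed preset bareness threshold

def is_bare_skin_taboo(bareness_score: float) -> bool:
    return bareness_score >= BARENESS_THRESHOLD
```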
Continuing with fig. 6, after the bare skin recognition model of the model layer performs taboo picture identification, the judgment layer may execute step S401, that is, determine whether the picture resource is a taboo picture according to the output of the bare skin recognition model.
Correspondingly, if after identification by the bare skin recognition model the picture resource is determined not to be a taboo picture, that is, it can be displayed, step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version does not include bare-skin female pictures. If the picture resource is determined to be a taboo picture, that is, a taboo picture of bare skin, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, and step S403 is then executed.
Understandably, in practical applications, each time the identification result of a picture resource is obtained, the information of any picture resource determined to be a taboo picture is recorded in the visual warning report after the judgment of step S401; after all picture resources have been identified, the resulting visual warning report is submitted to the reviewer for review.
Correspondingly, after the visual warning report is submitted to the reviewer, if review information submitted by the reviewer is received, step S404 is executed: in response to the review information, the taboo pictures confirmed by the reviewer are deleted from the system, and the false-alarm picture resources are recorded.
For the form of the visual warning report and the form of the review information submitted by the review personnel, reference may be made to the descriptions of table 1 to table 3 above, which are not repeated herein.
Continuing to refer to fig. 6, for example, in one possible implementation, when the electronic product is oriented to regions such as Italy, Portugal, or Spain where certain gestures, such as the buffalo horn gesture, are taboo, the taboo pictures to be filtered are picture resources of the buffalo horn gesture. Therefore, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into the taboo gesture recognition model of the model layer; that is, step S302 is executed, and the taboo gesture recognition model performs taboo picture identification.
Specifically, as can be seen from the above description, the taboo gesture recognition model includes a hand recognition model and a gesture recognition model. The taboo picture identification proceeds as follows: the hand recognition model, obtained by training hand data in a gesture recognition data set based on the SSD target recognition framework, performs hand recognition on the picture resources to identify hand picture resources and marks a bounding box in each identified hand picture resource; the hand picture resources marked with bounding boxes are then handed to the gesture recognition model of the MediaPipe framework structure, which locates the 21 3D hand joint points in the hand region of the hand picture resource and extracts their coordinate information; finally, the coordinate information of the 21 3D landmarks is analyzed with the SVM algorithm to predict the gesture in the hand picture resource and determine whether it is a taboo gesture.
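For the landmark-location step, the publicly available MediaPipe Hands solution exposes exactly these 21 3D joint points; the following hedged sketch shows how their coordinates could be extracted from a bounding-box crop (the file name and confidence threshold are assumptions).

```python
# A sketch of extracting the 21 3D hand joint points with MediaPipe Hands;
# the input file and confidence threshold are assumptions.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1,
                                 min_detection_confidence=0.5)
image = cv2.imread("hand_crop.png")         # crop taken from the marked bounding box
result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
if result.multi_hand_landmarks:
    lm = result.multi_hand_landmarks[0].landmark
    coords = [(p.x, p.y, p.z) for p in lm]  # 21 joint points, indices 0 to 20
```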
Continuing with fig. 6, after the taboo gesture recognition model of the model layer performs taboo picture identification, the judgment layer executes step S401, that is, determines whether the picture resource is a taboo picture according to the output of the taboo gesture recognition model.
Correspondingly, if after identification by the taboo gesture recognition model the picture resource is determined not to be a taboo picture, that is, it can be displayed, step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version does not include taboo gesture pictures of the demon horn gesture. If the picture resource is determined to be a taboo picture, that is, a taboo picture of the demon horn gesture, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, and step S403 is then executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
With reference to fig. 6, for example, in one possible implementation, when the electronic product is oriented to regions such as the Middle East and North Africa that abstain from pigs and dogs, the taboo pictures to be filtered are picture resources of taboo animals such as pigs and dogs. Therefore, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into the taboo animal recognition model of the model layer; that is, step S303 is executed, and the taboo animal recognition model performs taboo picture identification.
For details of how the taboo animal recognition model identifies taboo animals, reference may be made to the description of the taboo animal recognition models in the model layer above, which is not repeated herein.
Continuing to refer to fig. 6, after the taboo animal recognition model of the model layer performs taboo picture identification, the judgment layer executes step S401, that is, determines whether the picture resource is a taboo picture according to the output of the taboo animal recognition model.
Correspondingly, if after identification by the taboo animal recognition model the picture resource is determined not to be a taboo picture, that is, it can be displayed, step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version does not include taboo pictures of pigs or dogs. If the picture resource is determined to be a taboo picture, that is, a taboo picture containing a pig and/or a dog, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, and step S403 is then executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
Continuing to refer to fig. 6, for example, in one possible implementation, when the electronic device is oriented to a region sensitive to taboo flags, the taboo pictures to be filtered are picture resources of the taboo flag type. Therefore, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into the taboo flag recognition model of the model layer; that is, step S304 is executed, and the taboo flag recognition model performs taboo picture identification.
For details of how the taboo flag recognition model identifies taboo flags, reference may be made to the description of the taboo flag recognition model in the model layer above, which is not repeated herein.
Continuing with fig. 6, after the taboo flag recognition model of the model layer performs taboo picture identification, the judgment layer may execute step S401, that is, determine whether the picture resource is a taboo picture according to the output of the taboo flag recognition model.
Correspondingly, if after identification by the taboo flag recognition model the picture resource is determined not to be a taboo picture, that is, it can be displayed, step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version does not include picture resources of taboo flags. If the picture resource is determined to be a taboo picture, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, that is, step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
Continuing to refer to fig. 6, for example, in one possible implementation, when the electronic device is oriented to a region sensitive to certain map marks (some sensitive areas), the taboo pictures to be filtered are picture resources currently listed as sensitive areas. Therefore, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into the sensitive area recognition model of the model layer; that is, step S305 is executed, and the sensitive area recognition model performs taboo picture identification.
For details of how the sensitive area recognition model identifies sensitive areas, reference may be made to the description of the sensitive area recognition model in the model layer above, which is not repeated herein.
Continuing with fig. 6, after the sensitive area recognition model of the model layer performs taboo picture identification, the judgment layer may execute step S401, that is, determine whether the picture resource is a taboo picture according to the output of the sensitive area recognition model.
Correspondingly, if after identification by the sensitive area recognition model the picture resource is determined not to be a taboo picture, that is, it can be displayed, step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version does not include picture resources of map marks corresponding to areas currently listed as sensitive. If the picture resource is determined to be a taboo picture, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, that is, step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
Continuing to refer to fig. 6, for example, in one possible implementation, if the region the electronic product is oriented to has requirements regarding both female bare skin and certain gestures, the bare skin recognition model and the taboo gesture recognition model need to be selected to jointly complete the identification processing of the picture resources. That is, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into both the bare skin recognition model and the taboo gesture recognition model of the model layer; that is, step S301 is executed (the bare skin recognition model performs taboo picture identification) and step S302 is executed (the taboo gesture recognition model performs taboo picture identification).
It should be noted that, in some implementations, when two or more recognition models are selected from the model layer, their identification operations may be performed in parallel; that is, the picture resources processed in step S203 are input into each selected recognition model separately for identification.
In addition, it should be noted that, in other implementations, when two or more recognition models are selected from the model layer, their identification operations may be performed in series; that is, the picture resources processed in step S203 are first input into one selected recognition model for identification, and after that model finishes, the picture resources are passed to the next selected recognition model for identification.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
For convenience of explanation, fig. 6 is described taking the parallel execution mode as an example. After the picture resources processed by the reinforcement learning layer are input into the bare skin recognition model and the taboo gesture recognition model in the model layer, the two models proceed as described above. In the bare skin recognition model, the portrait recognition model (trained on portrait-category picture resources in the COCO data set based on the YoloV5 network) recognizes portraits and gender to identify female picture resources; the identified female picture resources are handed to the bare skin degree detection model (an NSFW framework based on ResNet50, trained on data in the Imagenet data set) to determine a bareness score; and whether a female picture resource is a taboo picture of bare skin is determined from the preset bareness threshold and that score. Meanwhile, in the taboo gesture recognition model, the hand recognition model (trained on hand data in a gesture recognition data set based on the SSD target recognition framework) performs hand recognition on the picture resources, identifies hand picture resources, and marks bounding boxes; the hand picture resources marked with bounding boxes are handed to the gesture recognition model of the MediaPipe framework structure, which locates the 21 3D hand joint points in the hand region and extracts their coordinate information; and finally the coordinate information of the 21 3D landmarks is analyzed with the SVM algorithm to predict the gesture in the hand picture resource and determine whether it is a taboo gesture.
It should be noted that step S401 is executed both after the bare skin recognition model performs taboo picture identification and after the taboo gesture recognition model performs taboo picture identification.
It can be understood that, if the bare skin recognition model performed the identification, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture according to the output of the bare skin recognition model; if the taboo gesture recognition model performed the identification, step S401 determines this according to the output of the taboo gesture recognition model.
Correspondingly, if after the judgment of step S401 the current picture resource is not a taboo picture (neither a taboo picture of the bare skin type nor a taboo picture of the taboo gesture type), step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version includes neither type of taboo picture. If step S401 determines that the picture resource is a taboo picture, whether of the bare skin type or of the taboo gesture type, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, and step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
Continuing to refer to fig. 6, for example, in one possible implementation, if the region the electronic product is oriented to has requirements regarding certain gestures and certain animals, the taboo gesture recognition model and the taboo animal recognition model are selected to jointly complete the identification processing of the picture resources. That is, after the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into both the taboo gesture recognition model and the taboo animal recognition model of the model layer; that is, step S302 is executed (the taboo gesture recognition model performs taboo picture identification) and step S303 is executed (the taboo animal recognition model performs taboo picture identification).
For convenience of explanation, the parallel execution mode is again taken as an example. After the picture resources processed by the reinforcement learning layer are input into the taboo gesture recognition model and the taboo animal recognition model in the model layer, the taboo gesture recognition model performs hand recognition, bounding-box marking, location and extraction of the 21 3D hand joint points, and SVM-based gesture prediction as described above to determine whether a taboo gesture is present, while the taboo animal recognition model identifies whether the picture resources contain taboo animals in the manner described in the implementations above.
It should be noted that step S401 is executed in the judgment layer regardless of whether the taboo gesture recognition model or the taboo animal recognition model performed the taboo picture identification.
Understandably, if the taboo gesture recognition model performed the identification, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture according to the output of the taboo gesture recognition model; if the taboo animal recognition model performed the identification, step S401 determines this according to the output of the taboo animal recognition model.
Correspondingly, if after the judgment of step S401 the current picture resource is not a taboo picture (neither a taboo picture of the taboo gesture type nor a taboo picture of the taboo animal type), step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version includes neither type of taboo picture. If step S401 determines that the picture resource is a taboo picture, whether of the taboo gesture type or of the taboo animal type, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, and step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
Continuing to refer to fig. 6, for example, in one possible implementation, if the region the electronic product is oriented to has requirements regarding certain gestures, certain animals, certain flags, and map marks, the taboo gesture recognition model, taboo animal recognition model, taboo flag recognition model, and sensitive area recognition model are selected to jointly complete the identification processing of the picture resources. After the processing of step S203, the picture resources determined not to be false-alarm picture resources need to be input into these four models of the model layer; that is, step S302 is executed (the taboo gesture recognition model performs taboo picture identification), step S303 is executed (the taboo animal recognition model performs taboo picture identification), step S304 is executed (the taboo flag recognition model performs taboo picture identification), and step S305 is executed (the sensitive area recognition model performs taboo picture identification).
For convenience of explanation, the parallel execution mode is again taken as an example. After the picture resources processed by the reinforcement learning layer are input into the four selected models in the model layer, the taboo gesture recognition model performs hand recognition, bounding-box marking, location and extraction of the 21 3D hand joint points, and SVM-based gesture prediction as described above to determine whether a taboo gesture is present; the taboo animal recognition model identifies whether the picture resources contain taboo animals in the manner described above; the taboo flag recognition model identifies whether the picture resources contain taboo flags in the manner described above; and the sensitive area recognition model identifies whether the picture resources contain sensitive areas in the manner described above.
It should be noted that step S401 is executed regardless of which of the taboo gesture recognition model, taboo animal recognition model, taboo flag recognition model, or sensitive area recognition model performed the taboo picture identification.
Understandably, if the taboo gesture recognition model performed the identification, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture according to the output of the taboo gesture recognition model; if the taboo animal recognition model performed the identification, according to the output of the taboo animal recognition model; if the taboo flag recognition model performed the identification, according to the output of the taboo flag recognition model; and if the sensitive area recognition model performed the identification, according to the output of the sensitive area recognition model.
Correspondingly, if after the judgment of step S401 the current picture resource is not a taboo picture (neither a taboo picture of the taboo gesture type, nor of the taboo animal type, nor of the taboo flag type, nor of the sensitive area type), step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version includes none of these four types of taboo pictures. If step S401 determines that the picture resource is a taboo picture of any of these types, information such as the name, path address, and visual picture resource of the picture resource is recorded in the visual warning report, and step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementations and is not described here again.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment. In practical applications, according to business requirements, any one, any two, any three, or any four of the 5 recognition models provided by the model layer can be selected for the taboo picture identification operation, or all 5 recognition models can be selected to perform the identification operation together.
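As an illustration, any such combination can be expressed as a small pipeline builder; the callable model interface below is an assumption for the sketch, not the embodiment's actual API.

```python
# A sketch of combining any subset of the five recognition models into one
# taboo-picture check; the model interface is an assumption.
from typing import Callable, Dict, List

def build_pipeline(models: Dict[str, Callable], selected: List[str]) -> Callable:
    chosen = [models[name] for name in selected]
    def is_taboo(picture) -> bool:
        # a picture resource is taboo if any selected model flags it
        return any(model(picture) for model in chosen)
    return is_taboo

# e.g. a build for a region with bare-skin and taboo-animal requirements:
# check = build_pipeline(all_models, ["bare_skin", "taboo_animal"])
```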
In addition, it should be further noted that, in an actual application scenario, the taboo picture identification method provided by the foregoing embodiments and implemented by the electronic device may also be executed by a chip system included in the electronic device, where the chip system may include a processor. The chip system may be coupled to a memory, so that when the chip system runs, it calls the computer program stored in the memory to implement the steps executed by the electronic device. The processor in the chip system may be an application processor or a processor other than an application processor.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer instruction is stored in the computer-readable storage medium, and when the computer instruction runs on an electronic device, the electronic device is caused to execute the above related method steps to implement the taboo picture identification method in the above embodiment.
In addition, an embodiment of the present application further provides a computer program product, which when run on an electronic device, causes the electronic device to execute the above related steps, so as to implement the taboo picture identification method in the above embodiment.
In addition, embodiments of the present application also provide a chip (which may also be a component or a module), which may include one or more processing circuits and one or more transceiver pins; the transceiver pins and the processing circuit communicate with each other through an internal connection path, and the processing circuit executes the above related method steps to implement the taboo picture identification method in the above embodiments, controlling the receiving pin to receive signals and the sending pin to send signals.
In addition, as can be seen from the foregoing description, the electronic device, the computer readable storage medium, the computer program product, or the chip provided in the embodiments of the present application are all configured to execute the corresponding methods provided above, and therefore, the beneficial effects achieved by the electronic device, the computer readable storage medium, the computer program product, or the chip may refer to the beneficial effects in the corresponding methods provided above, which are not described herein again.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A taboo picture identification method, characterized by comprising the following steps:
obtaining picture resources involved in an operating system;
identifying hand picture resources in the picture resources according to a hand identification model, wherein the hand picture resources are the picture resources that include a hand;
determining coordinate information of hand joint points in the hand picture resources according to a gesture recognition model;
and predicting whether the hand picture resources are taboo pictures according to a support vector machine (SVM) algorithm and the coordinate information of the joint points.
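It should be noted that the following Python sketch is given only as an illustrative example for better understanding of claim 1 and is not part of the claim; it assumes the joint coordinates have already been extracted (21 joints x 3 coordinates = 63 features per sample), that a labelled training set (X_train, y_train) exists, and that scikit-learn's SVC stands in for the claimed SVM algorithm:

    # Sketch: SVM prediction over hand joint coordinates (63-dim feature vectors).
    # X_train, y_train are assumed to exist; label 1 = taboo gesture, 0 = not taboo.
    import numpy as np
    from sklearn.svm import SVC

    def train_gesture_svm(X_train, y_train):
        clf = SVC(kernel="rbf")
        clf.fit(X_train, y_train)
        return clf

    def predict_taboo(clf, joints):
        # joints: sequence of 21 (x, y, z) tuples -> one flattened 63-dim sample
        features = np.asarray(joints, dtype=np.float32).reshape(1, -1)
        return bool(clf.predict(features)[0])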
2. The method of claim 1, wherein the hand recognition model is obtained by training, based on an SSD target recognition framework, on hand data in a potential recognition dataset, and the hand recognition model performs hand recognition on the picture resources and returns a bounding box.
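As an illustrative example only: the sketch below uses torchvision's pretrained ssd300_vgg16 as a stand-in for the claimed SSD model, which is assumed to have been fine-tuned on hand data beforehand; it performs detection on a picture resource and returns bounding boxes:

    # Sketch: hand detection with an SSD detector returning bounding boxes.
    import torch
    from torchvision.models.detection import ssd300_vgg16

    model = ssd300_vgg16(weights="DEFAULT").eval()  # fine-tuning on hands assumed

    def detect_hands(image_tensor, score_threshold=0.5):
        # image_tensor: float tensor of shape (3, H, W) with values in [0, 1]
        with torch.no_grad():
            output = model([image_tensor])[0]
        keep = output["scores"] > score_threshold
        return output["boxes"][keep]  # (N, 4) boxes as (x1, y1, x2, y2)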
3. The method as claimed in claim 1, wherein the gesture recognition model is a MediaPipe framework structure, and is used to locate 21 3D hand joint points inside a hand region in the hand picture resources recognized by the hand recognition model, and extract coordinate information of the 21 3D hand joint points.
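As an illustrative example only: MediaPipe's public Hands solution locates the 21 3D hand joint points named in claim 3; the sketch below extracts their coordinate information from a single image:

    # Sketch: extracting 21 3D hand joint points with MediaPipe Hands.
    import cv2
    import mediapipe as mp

    def extract_joints(image_bgr):
        with mp.solutions.hands.Hands(static_image_mode=True,
                                      max_num_hands=1) as hands:
            results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if not results.multi_hand_landmarks:
            return None  # no hand found in this picture resource
        landmarks = results.multi_hand_landmarks[0].landmark
        # 21 joint points, each with normalized x, y and a relative depth z
        return [(lm.x, lm.y, lm.z) for lm in landmarks]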
4. The method of claim 1, further comprising:
identifying taboo animal picture resources in the picture resources according to a taboo animal identification model, wherein the taboo animal picture resources are the picture resources, among the picture resources, that include a taboo animal;
the taboo animal identification model is obtained by training, based on the ResNext convolutional neural network, on data in the ImageNet data set and on image data of taboo animals obtained through web crawling.
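As an illustrative example only: the sketch below adapts torchvision's ImageNet-pretrained ResNeXt backbone into a taboo-animal classifier; the pretrained weights stand in for training on the ImageNet data set, and the crawled taboo-animal images are assumed to be available for the subsequent fine-tuning:

    # Sketch: ResNeXt-based taboo-animal classifier (fine-tuning assumed).
    import torch.nn as nn
    from torchvision import models

    def build_animal_classifier(num_classes):
        model = models.resnext50_32x4d(weights="IMAGENET1K_V1")  # ImageNet weights
        model.fc = nn.Linear(model.fc.in_features, num_classes)  # new class head
        return model  # to be fine-tuned on the crawled taboo-animal image data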
5. The method of claim 1 or 4, further comprising:
identifying taboo flag picture resources in the picture resources according to a taboo flag identification model, wherein the taboo flag picture resources are the picture resources that include a taboo flag, and the taboo flag identification model is of a YoloV3 frame structure;
and/or,
identifying sensitive area picture resources in the picture resources according to a sensitive area identification model, wherein the sensitive area picture resources are the picture resources that include a sensitive area, and the sensitive area identification model is of a YoloV3 frame structure.
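As an illustrative example only: the sketch below assumes the publicly available ultralytics/yolov3 Torch Hub entry point, with the flag and sensitive-area detectors assumed to be YoloV3 models fine-tuned on the corresponding classes:

    # Sketch: YoloV3-style detection of taboo flags or sensitive areas.
    # Assumes the ultralytics/yolov3 Torch Hub entry point and fine-tuned weights.
    import torch

    detector = torch.hub.load("ultralytics/yolov3", "yolov3")

    def detect_taboo_regions(image_path, min_confidence=0.5):
        results = detector(image_path)
        detections = results.pandas().xyxy[0]  # boxes, confidences, class names
        return detections[detections["confidence"] > min_confidence]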
6. The method of claim 5, further comprising:
when a picture resource is determined to be a taboo picture, generating a visual warning report according to the name, the path address, and the visual picture resource of the picture resource;
submitting the visual warning report for manual review;
and when review information submitted by a reviewer is received, deleting, in response to the review information, the taboo picture confirmed by the reviewer from the operating system.
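As an illustrative example only: a minimal sketch of generating the visual warning report and deleting reviewer-confirmed taboo pictures; the file layout and field names are hypothetical:

    # Sketch: visual warning report plus review handling; field names hypothetical.
    import json
    import os

    def write_warning_report(flagged, report_path="warning_report.json"):
        entries = [{"name": r["name"], "path": r["path"]} for r in flagged]
        with open(report_path, "w", encoding="utf-8") as f:
            json.dump(entries, f, ensure_ascii=False, indent=2)

    def apply_review(confirmed_taboo_paths):
        # Delete only the pictures the reviewer confirmed as taboo.
        for path in confirmed_taboo_paths:
            if os.path.exists(path):
                os.remove(path)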
7. The method according to claim 6, characterized in that after said obtaining picture resources involved in the operating system, the method further comprises:
extracting feature information from the picture resources according to a self-learned ResNet50 convolutional neural network model;
classifying the feature information according to a self-learned classification model, and identifying false-alarm picture resources that a reviewer has determined are not taboo pictures;
and filtering the false-alarm picture resources out of the picture resources.
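As an illustrative example only: a ResNet50 trunk with its classification head removed yields 2048-dimensional features, and a separately trained classifier marks known false alarms; both models are assumed to have been self-learned as described in claim 8:

    # Sketch: false-alarm filtering via ResNet50 features plus a classifier.
    import torch
    import torch.nn as nn
    from torchvision import models

    backbone = models.resnet50(weights="IMAGENET1K_V1")
    backbone.fc = nn.Identity()  # expose the 2048-dim pooled feature vector
    backbone.eval()

    def is_false_alarm(image_tensor, classifier):
        # image_tensor: normalized (3, 224, 224); classifier: e.g. sklearn model
        with torch.no_grad():
            features = backbone(image_tensor.unsqueeze(0)).numpy()
        return classifier.predict(features)[0] == 1  # 1 = known false alarm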
8. The method of claim 7, further comprising:
determining the misreported picture resources in the visual warning report according to the review information submitted by the reviewer;
constructing a learning data set according to the misreported picture resources;
performing oversampling training on the ResNet50 convolutional neural network model based on the data in the learning data set to obtain the self-learned ResNet50 convolutional neural network model;
and performing overfitting training on a classification model based on the data in the learning data set to obtain the self-learned classification model.
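As an illustrative example only: the misreported pictures are typically few, so the sketch below uses PyTorch's WeightedRandomSampler as a stand-in for the claimed oversampling, drawing rare labels more often during retraining:

    # Sketch: oversampling the learning data set during retraining.
    from collections import Counter
    from torch.utils.data import DataLoader, WeightedRandomSampler

    def make_oversampled_loader(dataset, labels, batch_size=32):
        counts = Counter(labels)                      # per-label frequencies
        weights = [1.0 / counts[y] for y in labels]   # rare labels weigh more
        sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                        replacement=True)
        return DataLoader(dataset, batch_size=batch_size, sampler=sampler)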
9. The method according to claim 5, characterized in that after said obtaining picture resources involved in the operating system, the method further comprises:
and preprocessing the picture resources to unify the size and the number of channels of the picture resources.
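As an illustrative example only: unifying the size and the number of channels can be done with Pillow; the target size of 224x224 is an assumption:

    # Sketch: unifying picture size and channel count with Pillow.
    from PIL import Image

    def preprocess(path, size=(224, 224)):
        img = Image.open(path).convert("RGB")  # unify to 3 channels
        return img.resize(size)                # unify the spatial size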
10. The method according to claim 9, wherein before preprocessing the picture resources to unify the size and the number of channels of the picture resources, the method further comprises:
determining the format of the picture resources;
when the format of a picture resource is the vector drawable object (DVG) format, adapting the attributes and values of the picture resource, and converting the picture resource from the DVG format into the scalable vector graphics (SVG) format;
and converting the picture resources in the SVG format into the portable network graphics (PNG) format.
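As an illustrative example only: the DVG attribute/value adaptation is specific to this embodiment and is therefore shown as a hypothetical placeholder, while the SVG-to-PNG step uses the real cairosvg library:

    # Sketch: DVG -> SVG -> PNG conversion. adapt_dvg_to_svg is a hypothetical
    # stand-in for the claimed attribute/value adaptation; svg2png is cairosvg's.
    import cairosvg

    def adapt_dvg_to_svg(dvg_text):
        # Placeholder: map DVG attributes and values onto their SVG equivalents.
        raise NotImplementedError

    def dvg_to_png(dvg_text, out_path):
        svg_text = adapt_dvg_to_svg(dvg_text)
        cairosvg.svg2png(bytestring=svg_text.encode("utf-8"), write_to=out_path)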
11. An electronic device, characterized in that the electronic device comprises: a memory and a processor, the memory and the processor being coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the taboo picture identification method according to any one of claims 1 to 10.
12. A computer-readable storage medium, characterized by comprising a computer program which, when run on an electronic device, causes the electronic device to perform the taboo picture identification method according to any one of claims 1 to 10.
CN202210405622.7A 2022-04-18 2022-04-18 Taboo picture identification method, apparatus and storage medium Active CN115565201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210405622.7A CN115565201B (en) 2022-04-18 2022-04-18 Taboo picture identification method, apparatus and storage medium

Publications (2)

Publication Number Publication Date
CN115565201A true CN115565201A (en) 2023-01-03
CN115565201B CN115565201B (en) 2024-03-26

Family

ID=84737889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210405622.7A Active CN115565201B (en) 2022-04-18 2022-04-18 Taboo picture identification method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN115565201B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901346A (en) * 2010-05-06 2010-12-01 复旦大学 Method for identifying unsuitable content in colour digital image
CN106446932A (en) * 2016-08-30 2017-02-22 上海交通大学 Machine learning and picture identification-based evolvable prohibited picture batch processing method
CN107330453A (en) * 2017-06-19 2017-11-07 中国传媒大学 The Pornographic image recognizing method of key position detection is recognized and merged based on substep
CN111104820A (en) * 2018-10-25 2020-05-05 中车株洲电力机车研究所有限公司 Gesture recognition method based on deep learning
CN111783812A (en) * 2019-11-18 2020-10-16 北京沃东天骏信息技术有限公司 Method and device for identifying forbidden images and computer readable storage medium
JP2021144724A (en) * 2020-06-29 2021-09-24 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Image examination method, device, electronic apparatus, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIAQING LIU等: "An Improved Hand Gesture Recognition with Two-Stage Convolution Neural Networks Using a Hand Color Image and its Pseudo-Depth Image", 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP) *
WU Binfang; CHEN Han; XIAO Shuhao: "Gesture Recognition Based on SVM and Inception-v3" (基于SVM与Inception-v3的手势识别), Computer Systems & Applications (计算机系统应用), no. 05 *

Also Published As

Publication number Publication date
CN115565201B (en) 2024-03-26

Similar Documents

Publication Publication Date Title
US11797847B2 (en) Selecting instances of detected objects in images utilizing object detection models
US10095925B1 (en) Recognizing text in image data
CN111488826B (en) Text recognition method and device, electronic equipment and storage medium
US20120083294A1 (en) Integrated image detection and contextual commands
CN110033018B (en) Graph similarity judging method and device and computer readable storage medium
CN110348294A (en) The localization method of chart, device and computer equipment in PDF document
US11587301B2 (en) Image processing device, image processing method, and image processing system
TW201443807A (en) Visual clothing retrieval
CN110334214B (en) Method for automatically identifying false litigation in case
US11657306B2 (en) Form structure extraction by predicting associations
CN112884764A (en) Method and device for extracting land parcel in image, electronic equipment and storage medium
CN110363190A (en) A kind of character recognition method, device and equipment
CN116049397B (en) Sensitive information discovery and automatic classification method based on multi-mode fusion
WO2020071558A1 (en) Business form layout analysis device, and analysis program and analysis method therefor
Ravagli et al. Text recognition and classification in floor plan images
CN115546824B (en) Taboo picture identification method, apparatus and storage medium
CN117437647A (en) Oracle character detection method based on deep learning and computer vision
CN111931721B (en) Method and device for detecting color and number of annual inspection label and electronic equipment
Li et al. Comic image understanding based on polygon detection
CN114842482B (en) Image classification method, device, equipment and storage medium
CN110413823A (en) Garment image method for pushing and relevant apparatus
CN115565201A (en) Taboo picture identification method, equipment and storage medium
CN114387600A (en) Text feature recognition method and device, computer equipment and storage medium
CN110853115A (en) Method and equipment for creating development process page
Laumer et al. A Semi-automatic Label Digitization Workflow for the Siegfried Map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant