CN115565201B - Taboo picture identification method, apparatus and storage medium - Google Patents

Taboo picture identification method, apparatus and storage medium

Info

Publication number
CN115565201B
CN115565201B
Authority
CN
China
Prior art keywords
picture
tabu
resources
resource
hand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210405622.7A
Other languages
Chinese (zh)
Other versions
CN115565201A (en)
Inventor
石英男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202210405622.7A priority Critical patent/CN115565201B/en
Publication of CN115565201A publication Critical patent/CN115565201A/en
Application granted granted Critical
Publication of CN115565201B publication Critical patent/CN115565201B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a taboo picture identification method, apparatus, and storage medium. The taboo picture identification method comprises the following steps: acquiring the picture resources involved in an operating system; identifying, according to a hand recognition model, the hand picture resources among the picture resources, i.e., those picture resources that contain a hand; determining, according to a gesture recognition model, the coordinate information of the hand joint points in the hand picture resources; and predicting, according to an SVM algorithm and the coordinate information of the joint points, whether a hand picture resource is a taboo picture. In this way, the preprocessed picture resources first undergo hand recognition by a hand recognition model with a small computational cost, which returns bounding boxes; this narrows the gesture detection range and thus improves the recognition efficiency of the gesture recognition model.

Description

Taboo picture identification method, apparatus and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular to a method, an apparatus, and a storage medium for identifying taboo pictures.
Background
With globalization, more and more products are no longer limited to domestic markets but are also entering international markets. Because different countries and regions have their own particular historical backgrounds, taboo pictures are currently usually identified manually when an operating system involving picture resources is released, so that products, such as various electronic devices with display interfaces, better meet local demands and the operating system versions for different countries and regions satisfy local requirements such as historical background.
However, this manual recognition of taboo pictures relies too heavily on manpower and is time-consuming and labor-intensive, so a taboo picture recognition scheme that reduces the dependence on manpower and improves recognition efficiency is highly desirable.
Disclosure of Invention
In order to solve the above technical problems, the application provides a taboo picture identification method, device, and storage medium, which aim to reduce human input and to automatically identify the picture resources involved in an operating system in a more stable, efficient, and accurate manner, ensuring that operating system versions for different countries and regions can meet local requirements such as historical background.
In a first aspect, the present application provides a method for identifying a taboo picture. The method comprises the following steps: acquiring the picture resources involved in an operating system; identifying, according to a hand recognition model, the hand picture resources among the picture resources, i.e., those picture resources that contain a hand; determining, according to a gesture recognition model, the coordinate information of the hand joint points in the hand picture resources; and predicting, according to an SVM algorithm and the coordinate information of the joint points, whether a hand picture resource is a taboo picture. In this way, the preprocessed picture resources first undergo hand recognition by a hand recognition model with a small computational cost, which returns bounding boxes; this narrows the gesture detection range and thus improves the recognition efficiency of the gesture recognition model.
According to the first aspect, the hand recognition model is obtained by training on the hand data in a potential recognition data set based on the SSD object detection framework; it performs hand recognition on the picture resources and returns bounding boxes. Because the hand recognition model is built on the SSD object detection framework, which is fast and inexpensive to run, the extraction of feature information can be completed efficiently while accuracy is ensured, thereby saving computational resources.
According to the first aspect, or any implementation manner of the first aspect, the gesture recognition model has a MediaPipe framework structure and is used to locate the 21 3D hand joint points inside the hand region of a hand picture resource recognized by the hand recognition model, and to extract the coordinate information of those 21 3D hand joint points.
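The keypoint-to-classifier step described above can be sketched as follows. This is a hypothetical illustration on synthetic landmark data: the patent specifies 21 3D hand joint points and an SVM, but the kernel choice, feature layout, and training data below are assumptions, and scikit-learn's `SVC` is used as a stand-in for whatever SVM implementation the system employs.

```python
# Hedged sketch: flatten 21 3D hand joint points into a 63-dim feature
# vector and classify it with an SVM. Landmark data here is synthetic; in
# practice the coordinates would come from the gesture recognition model.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def flatten_landmarks(landmarks):
    """landmarks: array-like of shape (21, 3) -> feature vector of shape (63,)."""
    return np.asarray(landmarks, dtype=float).reshape(-1)

# Synthetic training set: 40 "allowed" and 40 "taboo" gesture samples,
# drawn from two well-separated clusters purely for illustration.
X_ok = rng.normal(0.0, 0.1, size=(40, 21, 3))
X_bad = rng.normal(1.0, 0.1, size=(40, 21, 3))
X = np.array([flatten_landmarks(s) for s in np.concatenate([X_ok, X_bad])])
y = np.array([0] * 40 + [1] * 40)  # 0 = allowed, 1 = taboo gesture

clf = SVC(kernel="rbf")  # the patent names an SVM; the kernel is assumed
clf.fit(X, y)

# Predict on a new (synthetic) hand close to the "taboo" cluster.
sample = flatten_landmarks(rng.normal(1.0, 0.1, size=(21, 3)))
pred = int(clf.predict([sample])[0])
```

Whether this particular feature layout (plain x/y/z concatenation, no normalization relative to the wrist joint) matches the patented system is not stated in the source.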
According to the first aspect, or any implementation manner of the first aspect, the method further includes: identifying, according to a taboo animal recognition model, the taboo animal picture resources among the picture resources, i.e., those picture resources that contain a taboo animal; the taboo animal recognition model is obtained by training, based on a ResNeXt convolutional neural network, on data in an image data set together with web-crawled picture data of taboo animals. In this way, a taboo animal recognition model covering both real scenes (animal photos) and virtual scenes (cartoon/stylized animals) can be obtained, so that taboo pictures of the taboo animal type can be accurately identified.
According to the first aspect, or any implementation manner of the first aspect, the method further includes: identifying, according to a taboo flag recognition model, the taboo flag picture resources among the picture resources, i.e., those picture resources that contain a taboo flag, the taboo flag recognition model having a YoloV3 framework structure; and/or identifying, according to a sensitive region recognition model, the sensitive region picture resources among the picture resources, i.e., those picture resources that contain a sensitive region, the sensitive region recognition model likewise having a YoloV3 framework structure.
According to the first aspect, or any implementation manner of the first aspect, the method further includes: when a picture resource is determined to be a taboo picture, generating a visual warning report from the name, path address, and visualized form of the picture resource; submitting the visual warning report for manual review; and, upon receiving the review information submitted by a reviewer, deleting the taboo pictures confirmed by the reviewer from the operating system in response to that review information. By generating a visual warning report, the reviewer can directly determine whether a picture resource is actually a taboo picture, without having to look it up by its name and path address, which improves the efficiency of the manual review step.
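As a hedged illustration of what one entry of such a visual warning report might look like, the sketch below inlines the picture next to its name and path address so a reviewer can judge it at a glance. The HTML layout, field names, and example values are assumptions, not taken from the patent.

```python
# Hypothetical sketch: one row of a visual warning report, with the
# picture embedded as a base64 data URI beside its name and path address.
import base64
import html

def report_entry(name, path, png_bytes, taboo_type):
    """Return one HTML table row with the picture inlined as base64."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return (
        "<tr>"
        f"<td>{html.escape(name)}</td>"
        f"<td>{html.escape(path)}</td>"
        f"<td>{html.escape(taboo_type)}</td>"
        f'<td><img src="data:image/png;base64,{b64}" alt="{html.escape(name)}"></td>'
        "</tr>"
    )

# Illustrative values only; the byte string is not a real PNG.
row = report_entry("ic_gesture.png", "res/drawable/ic_gesture.png",
                   b"\x89PNG...", "taboo gesture")
```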
According to the first aspect, or any implementation manner of the first aspect, after acquiring a picture resource involved in an operating system, the method further includes: extracting characteristic information from the picture resources according to the self-learned ResNet50 convolutional neural network model; classifying the characteristic information according to a classification model after self-learning, and identifying false alarm picture resources, wherein the false alarm picture resources are picture resources which are determined by rechecking personnel and are not tabu pictures; and filtering the false alarm picture resources from the picture resources. Therefore, each false alarm picture resource is only required to be checked once by a recheck person, and the follow-up picture resource cannot appear in the visual warning report table, so that unnecessary manpower investment can be reduced.
According to the first aspect, or any implementation manner of the first aspect, the method further includes: determining the false-alarm picture resources in the visual warning report according to the review information submitted by a reviewer; constructing a learning data set from the false-alarm picture resources; performing oversampling training of the ResNet50 convolutional neural network model on the data in the learning data set to obtain the self-learned ResNet50 convolutional neural network model; and performing fitting training of the classification model on the data in the learning data set to obtain the self-learned classification model. In this way, not only identical false-alarm picture resources but also similar picture resources can be filtered out.
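The oversampling step above can be sketched minimally as follows. The patent does not specify the sampling strategy; simple duplication with replacement is assumed here purely for illustration, so that the small set of reviewer-confirmed false alarms reaches a usable training-set size.

```python
# Hedged sketch: oversample a small false-alarm learning data set by
# duplication with replacement (an assumed strategy; the patent only says
# "oversampling training").
import random

def oversample(samples, target_size, seed=0):
    """Grow `samples` up to `target_size` by sampling with replacement."""
    rng = random.Random(seed)
    if len(samples) >= target_size:
        return list(samples)
    extra = [rng.choice(samples) for _ in range(target_size - len(samples))]
    return list(samples) + extra

# Hypothetical reviewer-confirmed false alarms.
false_alarms = ["icon_a.png", "icon_b.png", "icon_c.png"]
train_set = oversample(false_alarms, 10)
```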
According to the first aspect, or any implementation manner of the first aspect, after acquiring the picture resources involved in an operating system, the method further includes: preprocessing the picture resources to unify their size and channel number. This standardizes the picture resources to be identified, so that the recognition models for the different taboo types can quickly and accurately identify the extracted feature information.
According to the first aspect, or any implementation manner of the first aspect, before preprocessing the picture resources to unify their size and channel number, the method further includes: determining the format of each picture resource; when the format of a picture resource is the Drawable Vector Graphics (DVG) format, adapting the attributes and values of the picture resource and converting it from the DVG format to the Scalable Vector Graphics (SVG) format; and converting the picture resource from the SVG format to the Portable Network Graphics (PNG) format. In this way, picture resources stored as XML in the DVG format can be converted into the visualizable PNG format, which makes it convenient for the subsequent recognition models to extract features and for the visual warning report to be generated.
In a second aspect, the present application provides an electronic device. The electronic device includes: a memory and a processor, the memory and the processor coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the instructions of the first aspect or of the method in any possible implementation of the first aspect.
In a third aspect, the present application provides a computer readable medium for storing a computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer program comprising instructions for performing the method of the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a chip comprising a processing circuit and transceiver pins. The transceiver pins and the processing circuit communicate with each other via an internal connection path; the processing circuit performs the method of the first aspect, or any possible implementation manner of the first aspect, to control the receiving pin to receive signals and the transmitting pin to transmit signals.
Drawings
FIG. 1 is a schematic diagram illustrating a data layer acquiring and preprocessing picture resources;
FIG. 2 is an exemplary schematic diagram of a picture resource containing a hand;
FIG. 3 is a schematic diagram of a hand image resource obtained after the image resource shown in FIG. 2 is identified by using the hand identification model provided by the present application;
FIG. 4 is a schematic diagram of a hand picture resource obtained after a picture resource containing a palm is identified using the hand recognition model provided by the present application;
FIG. 5 is a schematic diagram of a hand joint point in the hand image resource shown in FIG. 4 after positioning using the gesture recognition model provided by the present application;
FIG. 6 is a schematic diagram of the processing flow of the data layer, reinforcement learning layer, model layer, and judgment layer in the taboo picture recognition method provided by the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of the present application.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A alone, both A and B, or B alone.
The terms first and second and the like in the description and in the claims of embodiments of the present application are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
In the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations, or descriptions. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
In order to better understand the technical solution provided by the embodiments of the present application, before describing it, the types of taboo pictures (pictures that are not allowed to appear) currently observed by specific countries and regions are first described. Roughly, taboo pictures can currently be divided into: bare skin, taboo gestures, taboo animals, taboo flags, and sensitive map identifications.
For example, in some implementations, taboo pictures of the bare-skin type are determined mainly according to the taboo requirements of regions where portrait pictures with exposed skin are strictly forbidden.
In other implementations, taboo pictures of the taboo gesture type are determined mainly according to the different interpretations that different regions give to the same gesture.
In other implementations, taboo pictures of the taboo animal type are determined mainly according to the different taboo requirements that different regions place on the same animal.
In other implementations, taboo pictures of the taboo flag type are determined mainly according to the taboo flag requirements currently recognized globally.
In other implementations, taboo pictures of the map identification type are determined mainly according to the taboo requirements concerning regions with territorial disputes.
With respect to the above five taboo picture types, it should be understood that the above description is merely an example listed for better understanding of the technical solution of the present embodiment and is not the only limitation of it. In practical application, if the region in which a product is promoted involves other taboo requirements, the taboo picture types and the criteria for judging taboo pictures can be adjusted as needed, and the application is not limited in this respect.
Based on the taboo requirements (judgment criteria) corresponding to the above taboo picture types, and in order to enable products, particularly various electronic devices with display interfaces, to better meet local requirements such as historical background, the present application provides a taboo picture identification scheme, so that the picture resources involved in an operating system can be automatically identified in a more stable, efficient, and accurate manner with reduced manual input, ensuring that the operating system versions for different countries and regions meet local requirements such as historical background.
It will be appreciated that the operating system of an electronic device is typically published by a packaging server. Therefore, in some implementations, the technical scheme provided by the application can run on the packaging server: when an operating system update iteration is to be released, the taboo picture identification method provided by the application is applied to the upgrade package of the operating system to be released, and the identified taboo pictures are then filtered out (deleted), thereby ensuring that the released operating system versions for different countries and regions meet the requirements of those countries and regions.
For example, in other implementations, the identification of taboo pictures for different countries and regions may also be carried out by an Over-the-Air (OTA) server. The packaging server then releases only a single copy of the update package of the iterated operating system; when the OTA servers of different countries and regions obtain the update package from the packaging server, they identify taboo pictures according to the technical scheme provided by the application and the taboo requirements configured in advance, and then filter out the identified taboo pictures, so that the updated operating system meets the requirements, such as historical background, of the current country and region.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Implementation details of the technical solutions provided in the embodiments of the present application are described below, and the following details are provided only for convenience of understanding, and are not necessary to implement the present embodiment.
By way of example, in one feasible implementation, the platform architecture to which the technical scheme of the application applies can be divided into a data layer, a model layer, a judgment layer, and a reinforcement learning layer. The data layer acquires the image resources in the operating system that need to be identified and preprocesses them. The model layer combines the recognition models corresponding to the different taboo picture types according to service requirements so as to identify the different kinds of taboo pictures. The judgment layer traces back and confirms the resources of the taboo pictures and automatically generates a visual report. The reinforcement learning layer trains a reinforcement learning model on the false-alarm picture resources marked in the visual report; in subsequent recognition runs, this model filters the picture resources provided by the data layer so that picture resources previously mistaken for taboo pictures are removed, and the filtered picture resources are then sent to the recognition models in the model layer for processing.
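The data flow between the four layers can be sketched schematically as plain functions. Every layer body below is a placeholder; only the wiring (data layer → reinforcement-learning filter → model layer → judgment layer) follows the description, and all names and sample values are assumptions.

```python
# Schematic sketch of the four-layer architecture; internals are stubs.
def data_layer(code_repo):
    # Acquire and preprocess picture resources (placeholder data).
    return [{"name": n, "path": f"{code_repo}/{n}"} for n in ("a.png", "b.png")]

def rl_filter(resources, known_false_alarms):
    # Drop resources previously confirmed as false alarms.
    return [r for r in resources if r["name"] not in known_false_alarms]

def model_layer(resources):
    # Run the combined recognition models (placeholder: flag everything).
    return [dict(r, taboo=True) for r in resources]

def judgment_layer(flagged):
    # Trace back taboo resources and build the visual report.
    return [f'{r["name"]} @ {r["path"]}' for r in flagged if r["taboo"]]

report = judgment_layer(model_layer(rl_filter(data_layer("repo"), {"b.png"})))
```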
For a better understanding of the platform architecture, the following description will refer to each layer separately.
Specifically, the processing logic of the data layer is roughly divided into two steps: picture resource acquisition and picture resource preprocessing.
For the picture resource acquisition step, in some implementations the image resources within the code repositories may be scanned automatically after a distributed version control system (git) command to scan the picture resources in the code repositories is received. Taking the Android system as an example, the scanned image resources can be classified into the Portable Network Graphics (PNG) format, the Joint Photographic Experts Group (JPEG/JPG) format, and the Android-specific drawable vector format (Drawable Vector Graphics, DVG).
In addition, it should be noted that, in the technical solution provided in the present application, in order to facilitate locating and deleting a taboo picture once a picture resource is determined to be one, the picture resource acquisition step obtains not only the picture resource itself but also its name and stored path address, so that the position of a taboo picture can later be located accurately from its name and path address and the picture deleted.
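The acquisition step just described can be sketched with the standard library alone: walk a code repository and record, for each picture resource, its name and stored path address. The set of file extensions (including `.xml` for DVG resources) is an assumption for illustration.

```python
# Hedged sketch of the picture resource scanner: collect name + path for
# every picture resource under a repository root.
import os

PICTURE_EXTS = {".png", ".jpg", ".jpeg", ".xml"}  # .xml assumed for DVG

def scan_picture_resources(root):
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            if os.path.splitext(fname)[1].lower() in PICTURE_EXTS:
                found.append({"name": fname,
                              "path": os.path.join(dirpath, fname)})
    return found
```

A confirmed taboo picture can later be deleted simply by removing the file at the recorded `path`.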
For convenience of the following description, the procedure/function that scans picture resources from a code repository is referred to herein as the picture resource scanner, as shown in fig. 1. Furthermore, regarding picture resources in PNG and JPG format, such as the phone picture resource 101 shown in fig. 1, the background color in the code repository is transparent (no background color), so the resource is virtually invisible to the naked eye. Therefore, to facilitate the recognition processing of the model layer, the judgment layer, and the reinforcement learning layer, picture resources in PNG and JPG format must be preprocessed once scanned, i.e., the picture resource preprocessing step is performed, yielding the phone picture resource 102 shown in fig. 1, which is visible to the naked eye.
As for picture resources in DVG format: in the Android system, to save image storage space, vector drawables (VectorDrawable, VD) are used to store picture resources, especially some of the system's own icons; icons in VD form, i.e., picture resources in DVG format, are stored in a code repository as XML. For example, the XML code of a video recorder picture resource may be as shown at 103 in fig. 1. The model layer, judgment layer, and reinforcement learning layer cannot extract features from such picture resources directly, so when DVG-format picture resources are obtained by scanning, they also need to be preprocessed, i.e., the picture resource preprocessing step is performed, yielding the video recorder picture resource 104 shown in fig. 1, which is visible to the naked eye.
Furthermore, although picture resources in PNG and JPG format are not visible to the naked eye, the system can still recognize them, because such picture resources are stored in picture form rather than as XML code. The preprocessing step performed on picture resources in PNG and JPG format may therefore include, for example, size processing, channel number processing, and visualization processing.
For example, in the technical solution provided in the present application, size processing unifies the size of the picture resources scanned by the picture resource scanner into a 640×640×3 format, and channel number processing unifies single-channel, three-channel, and four-channel picture resources into three channels.
It should be noted that, in practice, common images are divided by channel count into single-channel, three-channel, and four-channel (RGBA) images. A single-channel image is a grayscale photo (GrayScale) with only one layer; a three-channel image is an ordinary color photo (RGB) with three layers, R, G, and B; a four-channel image is RGB plus Alpha, where the last layer, Alpha, is generally an opacity parameter, so a four-channel picture resource may appear completely transparent (as shown at 101 in fig. 1) or completely black.
In addition, the visualization processing described above is aimed specifically at four-channel picture resources whose Alpha layer is completely black or completely transparent.
For example, when the Alpha layer is all 1 (all black) or all 0 (all transparent), the Alpha layer can generally simply be deleted and the RGB layers used directly. However, for image resources in which only a small portion of the Alpha layer is used and the RGB layers carry the transparency processing, the Alpha layer must be taken as the processed single-layer image, the RGB layers removed, and the single-layer image then stacked into a three-channel image.
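The channel-number unification just described can be sketched as follows: grayscale is stacked to three channels; four-channel RGBA is reduced to RGB either by dropping a trivial Alpha layer or, when Alpha carries the actual drawing, by using Alpha itself as a single layer and stacking it to three channels. The constant-Alpha test below is an assumption for illustration, and the 640×640 resizing step is omitted.

```python
# Hedged sketch of channel unification for the preprocessing step.
import numpy as np

def to_three_channels(img):
    img = np.asarray(img)
    if img.ndim == 2:                      # single channel (grayscale)
        return np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 3:                 # already three-channel RGB
        return img
    rgb, alpha = img[..., :3], img[..., 3]
    if alpha.min() == alpha.max():         # Alpha all-0 or all-1: drop it
        return rgb
    # Alpha holds the drawing: use it as a single layer, stacked to three.
    return np.stack([alpha] * 3, axis=-1)
```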
It should be understood that the above description is merely an example for better understanding of the technical solution of the present embodiment and is not the only limitation of it. In practical application, the preprocessing flow for the picture resources can be configured according to service requirements, for example by selecting any of the above operations as needed, or by adding image enhancement processing on top of them, to facilitate subsequent feature extraction and thereby better identify the picture resources.
Further, as is apparent from the above description, picture resources in DVG format exist in the form of XML code. To preprocess such picture resources, the attributes and values in the DVG-format picture resource are first adapted to the corresponding attributes and values of Scalable Vector Graphics (SVG), the result is stored as an SVG-format picture resource, and finally the cairosvg package in Python (a tool package for converting SVG to PNG) is used to convert it into a PNG-format picture resource, completing the conversion from DVG to PNG. The preprocessing operations given above for PNG-format picture resources, such as size processing, channel number processing, and visualization processing, can then be performed.
This works because picture resources in DVG (VD) format and in SVG format are both implemented on top of XML syntax, the names of the basic graphic definitions are common to both formats, for example lines (line), rectangles (rect), and circles (circle) carry the same names, and picture resources in both formats are laid out using coordinates; a mutual transformation can therefore be achieved by adapting attributes and values. SVG-format picture resources can be converted into PNG format with the cairosvg package in Python, whereas DVG has no way of being converted directly into PNG or JPG format, so the DVG-format picture resource must first be converted into an SVG-format picture resource. If, with the development of picture processing technology, DVG-format picture resources can later be converted directly into PNG or JPG format by some tool, the step of adapting the attributes and values of the DVG-format picture resource to the corresponding attributes and values of the SVG format can be skipped.
Specifically, in the application, adapting the attributes and values of a DVG-format picture resource to the corresponding attributes and values of an SVG-format picture resource proceeds, for example, as follows: first the DVG-format picture resource is loaded; then its scope is identified and the origin coordinate for the conversion to SVG is calculated; the attributes and values of the DVG-format picture resource are then adapted according to the calculated origin coordinate, yielding SVG-format picture elements; finally an SVG file is created and the converted picture elements are added to it. In this way, the attributes and values of the DVG-format picture resource are adapted to the corresponding attributes and values of the SVG format.
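A minimal, illustrative DVG (Android VectorDrawable) → SVG adaptation can be written with the standard library alone. Only the attribute mappings shown here (`pathData` → `d`, `fillColor` → `fill`, viewport → `viewBox`) are handled; a real converter must also cover groups, strokes, gradients, and the origin-coordinate calculation described above. The Android namespace URI is the real one; everything else is a simplified sketch.

```python
# Hedged sketch: adapt a DVG (VectorDrawable) XML string into SVG XML.
import xml.etree.ElementTree as ET

ANDROID = "{http://schemas.android.com/apk/res/android}"

def dvg_to_svg(dvg_xml):
    vector = ET.fromstring(dvg_xml)
    w = vector.get(ANDROID + "viewportWidth")
    h = vector.get(ANDROID + "viewportHeight")
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": w, "height": h,
        "viewBox": f"0 0 {w} {h}",
    })
    for p in vector.iter("path"):
        attrs = {"d": p.get(ANDROID + "pathData")}
        fill = p.get(ANDROID + "fillColor")
        if fill:
            attrs["fill"] = "#" + fill[-6:]  # drop Android's alpha byte
        ET.SubElement(svg, "path", attrs)
    return ET.tostring(svg, encoding="unicode")

dvg = ('<vector xmlns:android="http://schemas.android.com/apk/res/android" '
       'android:viewportWidth="24" android:viewportHeight="24">'
       '<path android:fillColor="#FF000000" android:pathData="M0,0 L24,24"/>'
       '</vector>')
svg = dvg_to_svg(dvg)
```

The resulting SVG string could then be handed to a tool such as cairosvg for the SVG → PNG step.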
This concludes the description of the picture resource obtaining step and the picture resource preprocessing step in the data layer. It should be understood that the foregoing description is merely an example for better understanding the technical solution of the present embodiment, and is not the only limitation of the present embodiment. For the operation of adapting the attributes and values in DVG format picture resources to the corresponding attributes and values in SVG format picture resources, the operation of converting SVG format picture resources into PNG format picture resources, and the specific implementation details of the size processing, channel-number processing, visualization processing and image enhancement processing, reference may be made to existing standards, which will not be described herein.
For the processing logic of the model layer, recognition models can be selected and combined at the model layer according to service requirements for the tabu picture types, for example the five kinds described above. In order to facilitate the following description, in the technical solution provided in the present application, the recognition model for recognizing tabu pictures of the bare skin type is referred to as a bare skin recognition model, the recognition model for recognizing tabu pictures of the tabu gesture type is referred to as a tabu gesture recognition model, the recognition model for recognizing tabu pictures of the tabu animal type is referred to as a tabu animal recognition model, the recognition model for recognizing tabu pictures of the tabu flag type is referred to as a tabu flag recognition model, and the recognition model for recognizing tabu pictures of the sensitive region (tabu map identification) type is referred to as a sensitive region recognition model.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment. In practical application, the recognition models for recognizing different tabu pictures can be named according to the needs, and the method is not limited.
For example, taking the above names as an example, specifically, in the technical solution provided in the present application, the bare skin recognition model includes a portrait recognition model and a bare skin degree detection model. The portrait recognition model is obtained by training on picture resources of the portrait category in the COCO dataset based on the YoloV5 network, and is used for recognizing portraits and genders in the preprocessed picture resources; the bare skin degree detection model is obtained by training on data in the Imagenet dataset based on the NSFW framework of the residual network ResNet50, and is used for detecting the degree of bare skin in the portrait picture resources (i.e., picture resources containing a portrait) identified by the portrait recognition model.
It can be appreciated that the YoloV5 network and the general Yolo series algorithms fall within the category of one-stage detection algorithms. In terms of data preprocessing, YoloV5 adopts the mosaic online image enhancement mode proposed by YoloV4, which aims to increase the number of small targets in a single batch, improve the network's recognition capability for small targets, and increase the data information of a single batch. That is, YoloV5 is used to identify a target object; specifically, in this application, to identify a person, that is, the identified target object is a person. The COCO dataset, i.e., the Common Objects in Context dataset, is a dataset that can be used for image recognition, including large and rich object detection, segmentation and caption data. The COCO dataset aims at scene understanding; it is mainly intercepted from complex daily scenes, and the targets in the images are calibrated by precise segmentation. Currently, the images in the COCO dataset include 91 classes of objects, 328,000 images and 2,500,000 labels. Furthermore, the COCO dataset is the largest dataset with semantic segmentation so far, providing 80 categories and more than 330,000 picture resources, of which 200,000 are annotated, and the number of individuals in the whole dataset exceeds 1,500,000. Moreover, the COCO dataset comes with a portrait (person) category of its own, with different labels given to men and women respectively.
Therefore, in the technical solution provided by the application, the portrait recognition model for recognizing portrait picture resources is obtained by iteratively training on the picture resources of the portrait category in the COCO dataset based on the YoloV5 network until the preset convergence requirement is met, so that after the preprocessed picture resources are input into the portrait recognition model as input parameters, whether the currently input picture resource contains a portrait, i.e., whether it is a portrait picture resource, can be accurately determined through the recognition processing of the picture resource by the portrait recognition model.
In addition, it should be noted that the YoloV5 network adopts a positive example definition mode different from that of the general Yolo series algorithms. The general Yolo series algorithms use the intersection-over-union (Intersection over Union, IOU) value between the prior box and the real target box to define positive examples: a prior box whose IOU value is larger than the threshold is set as a positive example, and prior boxes correspond one-to-one with real target boxes, so there are at most only as many positive examples as real target boxes, resulting in an imbalance between positive and negative examples. YoloV5 instead uses the aspect ratio between the prior box and the real target box to define positive examples: a prior box whose aspect ratio is smaller than the threshold is a positive example, the positive example proportion is increased by allowing multiple prior boxes to match one real target box, an offset is set, and adjacent grids are used to simultaneously predict the same target, further increasing the number of positive examples and greatly raising the positive example proportion.
In terms of the loss function, YoloV5 uses three parts, category loss, target loss and regression loss, to guide training. Let i be the index over all prediction boxes, p the index over predicted positive examples, y the predicted category, and y₁ the true category.
Illustratively, regarding the class loss, binary cross entropy is employed, whose calculation formula is as follows (1):

cls_loss = −Σ_p [ y₁·log y + (1 − y₁)·log(1 − y) ]  (1)
Illustratively, regarding the target loss, binary cross entropy is likewise used, computed over all prediction boxes i, and the calculation formula is as follows (2):

obj_loss = −Σ_i [ y₁·log y + (1 − y₁)·log(1 − y) ]  (2)
Illustratively, regarding the regression loss, the GIOU (Generalized Intersection over Union, which can be used both as the objective function of the regression part of target detection and as a metric) calculation method is adopted, and the calculation formula is as follows (3), wherein iou_p is the intersection-over-union ratio of the predicted positive example box p and the corresponding real box, and giou_p is the corresponding GIOU value:

giou_loss = Σ_p (1 − giou_p)  (3)
Illustratively, regarding the total loss, it is a weighted sum of the above three partial losses, and the calculation formula is as follows (4):

loss = α·cls_loss + β·obj_loss + γ·giou_loss  (4)
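As a numerical sanity check on the weighted-sum structure of the total loss, the three parts can be combined as below. The weight values and toy inputs here are illustrative assumptions only, not the trained configuration:

```python
# Sketch of the three loss parts and their weighted sum (formula (4)).
import math

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross entropy, as used for the class and target losses."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_pred)) / len(y_true)

def giou_loss(gious):
    """Regression loss: (1 - giou_p) averaged over predicted positives p."""
    return sum(1.0 - g for g in gious) / len(gious)

def total_loss(cls, obj, giou, alpha=0.5, beta=1.0, gamma=0.05):
    # loss = alpha*cls_loss + beta*obj_loss + gamma*giou_loss  (4)
    return alpha * cls + beta * obj + gamma * giou
```

The weights alpha, beta and gamma control how much each part guides training; their concrete values would be chosen during tuning.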
Regarding the YoloV5 network and the COCO dataset, when the portrait recognition model is obtained by training on the picture resources of the portrait category in the COCO dataset based on the YoloV5 network, for content not described in the present application, reference may be made to the relevant standards of the YoloV5 network and the COCO dataset, which are not described herein.
In addition, it should be noted that, in some implementations, the content input to train the portrait recognition model may be, for example, the picture resources after the preprocessing operation in the data layer, or the picture resources of previously reported tabu pictures filtered out by the reinforcement learning layer.
In addition, in some implementations, the content output after the recognition processing of the portrait recognition model may be set to be "1" or "0", where it is agreed that "1" indicates that the currently recognized picture resource contains a portrait, that is, it is a portrait picture resource, and "0" indicates that the currently recognized picture resource does not contain a portrait.
In addition, in other implementation manners, the content output after the recognition processing of the portrait recognition model may be set to be "Yes" or "No", where it is agreed that "Yes" indicates that the currently recognized picture resource includes a portrait, that is, the portrait picture resource, and "No" indicates that the currently recognized picture resource does not include a portrait.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Furthermore, it should be noted that, like a network, the Imagenet dataset used for training the bare skin degree detection model has a plurality of nodes (Node), each Node corresponding to an item or sub-category. According to the standards of the Imagenet dataset, a Node contains at least 500 pictures/images of the corresponding object for training; that is, the Imagenet dataset is in effect a huge picture library for image/vision training. Regarding the bare skin degree detection model, the present application still adopts the NSFW (Not Suitable For Work) framework based on ResNet50 commonly used for training existing bare skin degree detection models. However, because the tabu pictures of the bare skin type in the technical solution provided by the present application mainly refer to portrait pictures with skin exposure, which certain areas strictly require to be forbidden, the bare skin degree detection model involved here differs essentially from existing bare skin degree detection models used to identify content unsuitable for browsing (pornographic pictures). Specifically, the bare skin degree detection model in the technical solution provided by the present application is trained on features, identified by the portrait recognition model, such as the number of skin regions in the portrait picture resource, the ratio of the pixels of each skin region to all pixels in the portrait picture resource, and the largest skin region and total skin region in the portrait picture resource. And because the tolerance of these areas to exposure is low, a lower convergence value can be set, and iterative training is continuously updated until the convergence requirement is met.
Specifically, in the application, when the exposed skin degree detection model obtained through the feature training is used for detecting the portrait picture resources, the feature is mainly used as a scoring parameter, namely scoring parameters such as the number of skin areas, the ratio of pixels of each skin area to all pixels in the portrait picture resources, the maximum skin area and the total skin area in the portrait picture resources and the like are scored, so that the exposed score of the portrait picture resources is determined.
In addition, it should be noted that, for some portraits with only two bare arms, the exposure score determined by the bare skin degree detection model of the present application may be, for example, 0.22, which is not a tabu picture for most areas but is still considered a tabu picture for areas where bare skin is forbidden. Therefore a lower exposure threshold, such as 0.01, may be set, and thus, by comparing the exposure threshold with the exposure score determined by the bare skin degree detection model, it can be determined whether the current portrait picture resource is a tabu picture.
For example, in some implementations, when the exposure score is less than the exposure threshold, it may be determined that the portrait picture resource is not a tabu picture, and conversely, it is determined that the portrait picture resource is a tabu picture of an exposed skin type.
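The scoring parameters and threshold comparison described above can be sketched as below. The feature set follows the text (number of skin regions, per-region pixel ratios, largest and total skin area), but the weights and their combination are illustrative assumptions, not the trained model:

```python
# Hedged sketch of scoring a portrait picture resource for bare skin.
def exposure_score(skin_region_pixels, total_pixels):
    """skin_region_pixels: pixel count of each detected skin region."""
    if not skin_region_pixels or total_pixels == 0:
        return 0.0
    total_skin = sum(skin_region_pixels)          # total skin area
    ratios = [p / total_pixels for p in skin_region_pixels]
    largest = max(skin_region_pixels) / total_pixels  # largest skin region
    # Toy weighting of the scoring parameters named in the text.
    return min(1.0, 0.5 * (total_skin / total_pixels)
                    + 0.3 * largest + 0.2 * max(ratios))

def is_tabu(score, exposure_threshold=0.01):
    # Below the threshold -> not a tabu picture; otherwise a tabu
    # picture of the bare skin type.
    return score >= exposure_threshold
```

With the low 0.01 threshold from the text, even a modest score such as 0.22 for two bare arms would be flagged for areas where bare skin is forbidden.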
The description of the NSFW framework of ResNet50 and the Imagenet dataset ends here. When training the bare skin degree detection model on data in the Imagenet dataset based on the NSFW framework of ResNet50, for content not described in the present application, reference may be made to the relevant standards of the NSFW framework of ResNet50 and the Imagenet dataset, which will not be described herein.
In addition, it should be noted that, in some implementations, the content output after the identification process of the bare skin degree detection model may be set to be "1" or "0", where "1" indicates that the current portrait picture resource is a tabu picture of the bare skin type, that is, the bare score is greater than or equal to the bare threshold value, and "0" indicates that the current portrait picture resource is not a tabu picture, that is, the bare score is less than the bare threshold value.
In addition, in other implementation manners, the content output after the identification processing of the bare skin degree detection model is set to be "Yes" or "No", and it is agreed that "Yes" indicates that the current portrait picture resource is a tabu picture of the bare skin type, that is, the bare score is greater than or equal to the bare threshold value, and "No" indicates that the current portrait picture resource is not a tabu picture, that is, the bare score is less than the bare threshold value.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
In the technical solution provided by the application, the bare skin recognition model for recognizing tabu pictures of the bare skin type includes a portrait recognition model and a bare skin degree detection model. The preprocessed picture resources, or the picture resources of previously reported tabu pictures filtered by the reinforcement learning layer, are first recognized by the portrait recognition model, so that a large number of pictures that cannot be tabu pictures of the bare skin type can be filtered out; then the portrait picture resources recognized by the portrait recognition model are scored by the bare skin degree detection model to determine the exposure score, and the exposure score is compared with a low exposure threshold, so that whether the current picture resource is a tabu picture can be determined efficiently and accurately.
For example, since most tabu gestures are niche, it is difficult to collect a large number of picture resources for deep learning training, and thus conventional neural network technology cannot be used directly for detection. Still taking the above names as an example, specifically, in the technical solution provided in the present application, in order to efficiently and accurately recognize tabu gesture pictures, the tabu gesture recognition model includes a hand recognition model and a gesture recognition model. The hand recognition model is obtained by training on hand data in a gesture recognition dataset based on the SSD (Single Shot Detector) target recognition framework, is used for performing hand recognition on the preprocessed picture resources, and returns a bounding box; the gesture recognition model is of a MediaPipe framework structure and is used for accurately positioning key points in the hand region of the hand picture resources (picture resources containing a hand) recognized by the hand recognition model (i.e., automatically dotting 21 3D landmarks), extracting the coordinate information of the 21 3D landmarks, and analyzing that coordinate information based on the support vector machine (Support Vector Machine, SVM) algorithm so as to predict the gesture in the hand picture resource.
It can be understood that SSD is a target detection algorithm whose main design idea is layered feature extraction, with box regression and classification performed in sequence. In the technical solution provided by the application, the hand recognition model is obtained by training on the hand data in the gesture recognition dataset based on the SSD target recognition framework, so that when the preprocessed picture resources are recognized, the recognition processing by the hand recognition model extracts feature information through deep learning, ensuring the accuracy of the extracted feature information.
In addition, because the hand recognition model is of the SSD target recognition framework structure, which has a high processing speed and low cost, the extraction of feature information can be completed efficiently while accuracy is ensured, saving computing resources.
Furthermore, it should be noted that in some implementations, the gesture recognition data set may be, for example, HGRD (Hand Gesture Recognition Dataset) data set, and the HGRD data set collects various hand pictures/images.
Furthermore, in other implementations, the gesture recognition dataset may be, for example, a dynamic gesture dataset, such as CGD (ChaLearn Gesture Data), the ChaLearn LAP IsoGD dataset derived from CGD, the ChaLearn LAP ConGD dataset derived from CGD, etc., which are not listed here one by one, nor limited in this application.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
In addition, the MediaPipe framework described above is a framework mainly used for constructing pipelines for audio, video, or any time-series data. With the help of the MediaPipe framework, pipelines can be built for different media processing functions. Currently, MediaPipe is mainly applied to multi-hand tracking, face detection, object detection and tracking, 3D object detection and tracking, automatic video cropping pipelines, etc., which are not listed here one by one, nor limited thereto. Specifically, in the technical solution provided by the application, the gesture recognition model draws on the characteristic that the MediaPipe framework can be applied to multi-hand tracking; that is, the gesture recognition model of the MediaPipe framework structure uses single palm detection, and once detection is completed, it can dot 21 3D landmarks in the detected hand region, i.e., perform accurate key point positioning.
In addition, after the gesture recognition model dots the 21 3D landmarks and extracts their coordinate information, an SVM algorithm is adopted, specifically the one-versus-one variant (abbreviated as OVO SVMs, also called pairwise, in this application). The idea of this algorithm is to design an SVM between any two classes of samples, so k (an integer greater than 0) classes of samples require the design of k(k−1)/2 SVMs.
Illustratively, when the class of a sample is predicted by the SVMs, for example when classifying an unknown sample, the class that receives the most votes is the class of the unknown sample. For ease of understanding, the following description is given with reference to an example.
For example, assume that the samples have 4 classifications, A, B, C and D respectively. For prediction, an SVM is designed between any two of the 4 classifications A, B, C and D, giving 6 SVM training sets: (A, B), (A, C), (A, D), (B, C), (B, D) and (C, D). During testing, the corresponding vectors are tested against the 6 classifiers respectively to obtain 6 results, a voting mode is then adopted, and finally a set of results is obtained.
Illustratively, the voting process is, for example:
upon initialization of A, B, C and D, set A = B = C = D = 0;
the 6 SVM training sets then vote:
(A, B)-classifier: if A wins, A = A + 1; otherwise, B = B + 1;
(A, C)-classifier: if A wins, A = A + 1; otherwise, C = C + 1;
(A, D)-classifier: if A wins, A = A + 1; otherwise, D = D + 1;
(B, C)-classifier: if B wins, B = B + 1; otherwise, C = C + 1;
(B, D)-classifier: if B wins, B = B + 1; otherwise, D = D + 1;
(C, D)-classifier: if C wins, C = C + 1; otherwise, D = D + 1;
the decision is Max(A, B, C, D).
Thus, if the voting process finishes with A = 3, B = 2, C = 1 and D = 0, the final classification of the sample is the A classification.
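The pairwise voting above can be written compactly as follows. Here `pairwise_winner` is a hypothetical callable standing in for the trained binary SVMs, not a real SVM implementation:

```python
# One-versus-one (OVO) voting over k(k-1)/2 pairwise classifiers.
from itertools import combinations

def ovo_predict(classes, pairwise_winner):
    """classes: e.g. ["A", "B", "C", "D"]; pairwise_winner(a, b)
    returns whichever of a or b the (a, b)-classifier picks."""
    votes = {c: 0 for c in classes}        # A = B = C = D = 0
    for a, b in combinations(classes, 2):  # the 6 classifier pairs
        votes[pairwise_winner(a, b)] += 1  # the winner gets one vote
    return max(votes, key=votes.get)       # decision is Max(A, B, C, D)
```

With 4 classes and a classifier that always prefers the alphabetically earlier class, the votes come out A = 3, B = 2, C = 1, D = 0, matching the worked example above.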
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
For better understanding of the recognition logic of the tabu gesture pictures, the following description is made with reference to fig. 2 to 5.
For example, if the preprocessed picture resource input into the hand recognition model is as shown in fig. 2 (including the electronic device 201 and the hand 202), after the picture resource is recognized by the hand recognition model, it is determined that a hand is included in the picture resource shown in fig. 2, so the picture resource is determined to be a hand picture resource (a picture resource containing a hand), and the bounding box 203 where the hand 202 is located is marked, as shown in fig. 3.
In addition, it should be noted that, in some implementations, the content output after the processing of the hand recognition model may be set to "1" or "0", where it is agreed that "1" indicates that the current picture resource includes a hand, that is, the hand picture resource, and "0" indicates that the current picture resource does not include a hand.
For example, if the preprocessed picture resource input into the hand recognition model is as shown in fig. 4, after the hand recognition model recognizes the picture resource, it is determined that a hand is included in the picture resource shown in fig. 4, so the picture resource is determined to be a hand picture resource (a picture resource containing a hand), and the bounding box 302 where the hand 301 is located is marked, as shown in fig. 4.
In addition, in other implementations, the content output after the processing of the hand recognition model may be set to be "Yes" or "No", where it is agreed that "Yes" indicates that the current picture resource includes a hand, that is, the hand picture resource, and "No" indicates that the current picture resource does not include a hand.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
To more intuitively understand the dotting of the 21 3D landmarks in hand picture resources by the gesture recognition model, the hand picture resource recognized in fig. 4 is taken as an example.
For example, if the hand picture resource identified by the hand recognition model is input into the gesture recognition model, after the hand picture resource is dotted by the gesture recognition model, a hand landmark schematic diagram as shown in fig. 5 is obtained, wherein 0 to 20 are the dotted 21 3D landmarks.
Illustratively, after the 21 3D landmarks are obtained, their coordinate information is extracted. For convenience of explanation, the coordinate information of the 21 3D landmarks is taken as a sample set, the sample set is divided into 16 classifications, and based on the SVM algorithm given above, the samples are subjected to prediction processing through 16×(16−1)/2 = 120 SVMs, thereby realizing classification of the gesture in the hand picture resource.
Taking the hand picture resource shown in fig. 4 as an example, after predicting samples corresponding to the coordinate information of the 21 3D landmarks after dotting in fig. 5 based on the SVM algorithm, it may be determined that the gesture in the hand picture resource shown in fig. 4 is a gesture of "five fingers open".
In addition, it can be understood that dividing the samples into 16 categories is determined based on the 16 currently determined tabu gestures; that is, in practical application, the number of categories into which the samples corresponding to the hand picture resources are divided can be determined according to the kinds of tabu gestures to be recognized.
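Turning the 21 3D landmarks into an SVM sample can be sketched as follows. Flattening the coordinates and normalizing them relative to the wrist (landmark 0) are common choices assumed here for illustration; the application does not specify its exact feature encoding:

```python
# Hedged sketch: 21 (x, y, z) landmarks -> one 63-dimensional SVM sample.
def landmarks_to_features(landmarks):
    """landmarks: 21 (x, y, z) tuples; index 0 is taken as the wrist."""
    assert len(landmarks) == 21
    ox, oy, oz = landmarks[0]
    feats = []
    for x, y, z in landmarks:
        # Translate so the wrist is the origin, then flatten.
        feats.extend((x - ox, y - oy, z - oz))
    return feats  # fed to the k(k-1)/2 pairwise SVMs

# With k = 16 tabu-gesture classes, 16 * 15 // 2 = 120 pairwise SVMs.
PAIRWISE_SVMS = 16 * 15 // 2
```

Each hand picture resource thus yields one fixed-length vector, which the 120 pairwise SVMs vote on as in the earlier example.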
The description of the SSD target recognition framework, the MediaPipe framework and the SVM algorithm ends here. When training the hand recognition model based on the SSD target recognition framework, constructing the gesture recognition model based on the MediaPipe framework, and predicting gestures based on the SVM algorithm, for content not described in the present application, reference may be made to the relevant standards of the SSD target recognition framework, the MediaPipe framework and the SVM algorithm, which are not described herein.
Therefore, in the technical solution provided by the application, the tabu gesture recognition model for recognizing tabu pictures of the tabu gesture type is set to include a hand recognition model and a gesture recognition model, and the preprocessed picture resources, or the picture resources of previously reported tabu pictures filtered by the reinforcement learning layer, first undergo hand recognition through the hand recognition model of the SSD target recognition framework structure, which has a small computational cost, and a bounding box is returned, narrowing the gesture detection range, so that the recognition efficiency of the gesture recognition model can be improved.
In addition, in the technical scheme provided by the application, when the hand recognition model recognizes that the current picture resource is the hand picture resource, the hand picture marked with the hand bounding box is input into the gesture recognition model for recognition processing, so that the gesture recognition model of the MediaPipe framework structure can perform gesture detection in the range of determining that the hand exists, and the recognition precision of the gesture recognition model is improved.
For example, considering actual business requirements, for tabu pictures of the tabu animal type, neither real animal images nor animal images of cartoon/stick-figure type are suitable for display in areas where such pictures are tabu. Therefore, in order to better meet user requirements and improve user experience, in the technical solution provided by the application, when training the tabu animal recognition model based on a ResNeXt network (a convolutional neural network), the animal picture data in the Imagenet dataset and animal picture data of various cartoon/stick-figure types crawled from the network are fused, so that a tabu animal recognition model covering both real scenes (animal photos) and virtual scenes (cartoon/stick-figure animals) can be obtained, and tabu pictures of the tabu animal type can be recognized more accurately.
It can be understood that, currently, the Imagenet dataset already covers 12 million picture resources spanning more than 1,000 categories, and for common tabu animals such as "pigs", "dogs" and "cats", and especially "black cats", these three kinds of animals account for approximately 350,000 picture resources, so the training data are rich.
Furthermore, to ensure coverage, as many picture resources as possible of these three kinds of tabu animals in cartoon/stick-figure form are crawled, for example 50,000, and the crawled content is processed at a 1:1 ratio for subsequent processing.
From this, iterative training is performed based on the 350,000 picture resources of the three kinds of tabu animals in the Imagenet dataset and, for example, the 50,000 crawled cartoon/stick-figure picture resources, until the convergence condition is met, whereupon a tabu animal recognition model satisfying the business requirements can be obtained.
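The fusion of the two sources into one training list can be sketched minimally as below. The label names follow the text; the item structure and the filtering step are illustrative assumptions:

```python
# Hedged sketch of fusing Imagenet animal pictures with crawled
# cartoon/stick-figure pictures into one training set.
def fuse_datasets(imagenet_items, crawled_items):
    """Each item: (path, label), e.g. ('x.jpg', 'pig')."""
    fused = list(imagenet_items) + list(crawled_items)
    # Keep only the tabu-animal labels of interest for this model.
    allowed = {"pig", "dog", "cat", "black_cat"}
    return [(path, label) for path, label in fused if label in allowed]
```

The fused list would then feed the iterative training loop until the convergence condition is met.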
For the way of crawling picture resources in cartoon/stick-figure form from the network, reference may be made to existing standards, which will not be described here.
For example, considering that the number of tabu flags and sensitive regions is limited, it is difficult to collect a large number of picture resources for deep learning training, and no suitable framework or pipeline model is available for the time being, a YoloV3 network is adopted to construct the recognition models for recognizing these two types of tabu pictures. That is, in practical application, the tabu flag recognition model for recognizing tabu pictures of the tabu flag type and the sensitive region recognition model for recognizing tabu pictures of the sensitive region type can exist independently: the tabu flag recognition model is trained on picture resource data of the tabu flag type based on the YoloV3 network, and the sensitive region recognition model is trained on picture resource data of the sensitive region type based on the YoloV3 network.
In other implementations, the tabu flag recognition model for recognizing tabu pictures of the tabu flag type and the sensitive region recognition model for recognizing tabu pictures of the sensitive region type may be one recognition model, that is, a recognition model that can recognize tabu pictures of either the tabu flag type or the sensitive region type. In this case, the data of the picture resources of the tabu flag type and the data of the picture resources of the sensitive region type may be fused into one dataset, and the data in the dataset are then trained based on the YoloV3 network to obtain a recognition model capable of recognizing tabu pictures of the tabu flag type or the sensitive region type.
The above description is only an example for better understanding the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment. The algorithm is not described in detail in this application, and reference may be made to the existing standard, which is not described here.
For the processing logic of the judging layer, in order to further improve recognition accuracy, for a picture resource that does not pass any one pipeline in the model layer (the pipelines corresponding to the recognition models of the different tabu types), i.e., a picture resource determined by the model layer to be a tabu picture, a warning report (referred to as a warning report in the following) can be generated from the name and path address of the picture resource together with the visualized picture resource, and submitted for manual review.
It should be noted that, although the present application still involves a manual review link, before the manual review, most of the picture resources are rapidly and accurately filtered by means of recognition model recognition processing corresponding to different tabu types in the model layer, and the picture resources submitted to the manual review are greatly reduced, for example, from tens of thousands to hundreds of pictures, so that the labor cost can be effectively reduced.
In addition, in the warning report submitted for manual review, the picture resources which are determined to be tabu pictures by the identification model in the model layer are displayed in a visual mode, so that whether the picture resources are tabu pictures or not can be directly determined during manual review, and the picture resources are not required to be searched according to the names and path addresses of the picture resources to be determined, and the review efficiency of the manual review link is improved.
For better understanding, a visual warning report table is specifically described below in conjunction with the one shown in Table 1.
Table 1 visual warning report table
Illustratively, as shown in Table 1, a first picture resource in PNG format named "black cat 1_1.png" has the path address "http://schema.android.com/apk/res/android1.1", a second picture resource in JPG format named "devil ox horn gesture.jpg" has the path address "http://schema.android.com/apk/res/android1.2", and a third picture resource in PNG format named "touch gesture.png" has the path address "http://schema.android.com/apk/res/android1.3". These picture resources are determined to be tabu pictures after being identified by the tabu animal recognition model and the tabu gesture recognition model of the model layer. In this case, the name and path address acquired by the picture resource scanner when acquiring each picture resource, together with the preprocessed visualized picture resource, are recorded into the visual warning report table shown in Table 1 and submitted for manual review.
Correspondingly, after the visual warning report table is submitted for manual review and the review information submitted by the review personnel is received, in response to the review operation of the review personnel, the picture resources determined by the review personnel to be false alarms are deleted from the visual warning report table; finally, the remaining tabu pictures in the visual warning report table are deleted from the system according to their path addresses.
For ease of understanding, the following is specifically described in connection with a visual warning report table after manual review as shown in table 2.
Table 2 visual warning report table
For example, as shown in Table 2, according to the manual review result (the "whether misreported" column recording the review information "Yes" or "No"), each picture resource is either a false alarm or a confirmed tabu picture; the false alarms are to be deleted from the visual warning report table shown in Table 2, and the confirmed tabu pictures are to be deleted from the system.
For example, in some implementations, "Yes" may be agreed to indicate that the current picture resource is a false alarm, i.e., the picture resource is not a tabu picture, does not need to be deleted from the system, and may be displayed normally; "No" may be agreed to indicate that the current picture resource is not misreported, i.e., the picture resource is a tabu picture, which needs to be deleted from the system and cannot be displayed.
For example, in other implementations, it may instead be agreed that "1" indicates that the current picture resource is a false alarm, i.e., the picture resource is not a tabu picture, does not need to be deleted from the system, and may be displayed normally; "0" may be agreed to indicate that the current picture resource is not misreported, i.e., the picture resource is a tabu picture, which needs to be deleted from the system and cannot be displayed.
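The review convention described above amounts to partitioning the report by the review column: false alarms leave the report, confirmed tabu pictures are collected for deletion by path address. A minimal sketch, assuming a "misreport" field and the "Yes"/"1" convention from the text:

```python
# Hypothetical sketch of applying the manual review results to the visual
# warning report table. "Yes"/"1" marks a false alarm (not a tabu picture,
# kept in the system); anything else marks a confirmed tabu picture whose
# path address is collected for deletion from the system.

FALSE_ALARM = {"Yes", "1"}

def apply_review(report):
    """Split reviewed entries into false alarms (removed from the report)
    and confirmed tabu pictures (to be deleted by path address)."""
    false_alarms, to_delete = [], []
    for entry in report:
        if entry["misreport"] in FALSE_ALARM:
            false_alarms.append(entry)
        else:
            to_delete.append(entry["path"])
    return false_alarms, to_delete

# Illustrative reviewed table (names and paths are made up).
report = [
    {"name": "black_cat.png", "path": "/res/a.png", "misreport": "No"},
    {"name": "touch_gesture.png", "path": "/res/c.png", "misreport": "Yes"},
]
fa, paths = apply_review(report)
```

The false-alarm entries returned here are exactly what the reinforcement learning layer described later would collect as its learning data set.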
For example, based on the visual warning report table shown in Table 2, in response to the review operation of the review personnel at the judging layer, the picture resource determined by the review personnel to be a false alarm, for example the "touch gesture.png" picture resource stored under the path address "http://schema.android.com/apk/res/android1.3" in Table 2, is deleted from the visual warning report table; the visual warning report table with the false-alarm picture resource deleted is shown in Table 3.
Table 3 visual warning report table
Finally, the remaining tabu picture "black cat 1_1.png" in the visual warning report table is deleted from the system according to its path address "http://schema.android.com/apk/res/android1.1", and "devil ox horn gesture.jpg" is deleted from the system according to its path address "http://schema.android.com/apk/res/android1.2".
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
For the processing logic of the reinforcement learning layer: in order to reduce false alarms among the tabu pictures identified by the recognition models in the model layer and to further reduce human input, the picture resources that the review personnel determined to be false alarms (hereinafter referred to as false-alarm picture resources) are collected. Feature information of the false-alarm picture resources is extracted based on a ResNet50 convolutional neural network, the convolutional neural network model of the ResNet50 architecture is trained in an oversampled "strong memory" training mode, and a one-class classification model (OneClassSVM) is trained by fitting those features. Thereafter, before the recognition models of the pipelines in the model layer perform recognition, each picture resource preprocessed by the data layer is first input into the ResNet50 convolutional neural network model in the reinforcement learning layer for feature extraction, and the extracted high-level features are then input into the OneClassSVM classification model for processing, so that false-alarm picture resources of the same pattern, as well as similar picture resources, can be filtered out. That is, each false-alarm picture resource only needs to be checked once by the review personnel, and will not appear in the visual warning report table subsequently.
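The ResNet50 + OneClassSVM combination above can be pictured as: extract a feature vector per picture, fit a one-class model on false-alarm features only, and filter any new picture whose feature falls inside the learned region. The toy stand-in below replaces both networks with a plain Euclidean-distance rule; it illustrates only the one-class filtering idea, not the actual models of this application.

```python
# Toy stand-in for the reinforcement learning layer's filter: "train" on
# false-alarm feature vectors only, then flag any new feature vector that
# lies within a radius of a known false alarm, so the same pattern (and
# similar pictures) never reaches the model layer again.

def fit_one_class(false_alarm_features, radius=1.0):
    """'Train' by memorising the false-alarm features and a radius."""
    return {"centers": list(false_alarm_features), "radius": radius}

def is_false_alarm(model, feature):
    """A picture is filtered out if its feature is close to a known false alarm."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return any(dist(feature, c) <= model["radius"] for c in model["centers"])

# Illustrative 2-D "features"; real features would be high-dimensional
# ResNet50 activations.
model = fit_one_class([(0.0, 0.0), (5.0, 5.0)], radius=1.0)
near_known = is_false_alarm(model, (0.2, 0.1))   # close to a false alarm
unrelated = is_false_alarm(model, (3.0, 0.0))    # far from all false alarms
```

In practice the distance rule would be replaced by an actual one-class decision boundary (e.g. scikit-learn's `OneClassSVM`) fitted on the extracted features.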
In addition, it should be further noted that, regarding the construction of the ResNet50 convolutional neural network model in the reinforcement learning layer in the present application, an existing ResNet network model may be modified so as to adapt it to the image classification task in the present application.
Specifically, the modification of the ResNet network model is, for example, to remove the last layer (top layer) of the network model, then fix (freeze) the parameters of all the preceding neural network layers, then append a 32×640×640×3 fully connected layer followed by a final judgment layer at the end of the network model, and finally train using the feature information in the oversampled false-alarm picture resource data set.
It can be appreciated that, because the number of false-alarm picture resources is relatively small, in order to ensure the accuracy of the trained ResNet50 convolutional neural network model, the false-alarm picture resources need to be duplicated so that the training sample data set contains enough sample data; for example, the same false-alarm picture resource may be duplicated 1000 times or more.
In addition, it should be noted that in some implementations, the update training of the ResNet50 convolutional neural network model and the OneClassSVM classification model in the reinforcement learning layer may, for example, be performed when a preset time, such as one month, has elapsed: the false-alarm picture resources recorded during that period are obtained, and learning training is then performed in the manner described above, so as to ensure that previously recorded false-alarm picture resources and similar picture resources will not reappear in the next round of tabu picture identification, thereby reducing the false-alarm picture resources recorded in the visual warning report table and further reducing ineffective human input.
For example, in other implementations, the update training of the ResNet50 convolutional neural network model and the OneClassSVM classification model in the reinforcement learning layer may be performed when the number of recorded false-alarm picture resources meets a preset threshold, learning training then being performed in the manner described above, so as to ensure that previously recorded false-alarm picture resources and similar picture resources will not reappear in the next round of tabu picture identification, thereby reducing the false-alarm picture resources recorded in the visual warning report table and further reducing ineffective human input.
For example, in still other implementations, the update training of the ResNet50 convolutional neural network model and the OneClassSVM classification model in the reinforcement learning layer may be performed when the operating system needs an update iteration, that is, when a new version is to be released, learning training then being performed in the manner described above, so as to ensure that previously recorded false-alarm picture resources and similar picture resources will not reappear in the next round of tabu picture identification, thereby reducing the false-alarm picture resources recorded in the visual warning report table and further reducing ineffective human input.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
In addition, it should be noted that, in some implementations, the reinforcement learning layer involved in the technical scheme provided in the present application may be introduced according to service requirements: that is, the picture resources processed by the data layer are first input into the reinforcement learning layer for processing, and the picture resources processed by the reinforcement learning layer are then input into the model layer for the recognition operation of tabu pictures.
In addition, in other implementations, the reinforcement learning layer involved in the technical scheme provided by the present application may not be introduced, that is, the picture resources processed by the data layer are directly input into the model layer for the recognition operation of tabu pictures.
In addition, in other implementations, any one or a combination of several of the recognition models in the model layer (the bare skin recognition model, the tabu gesture recognition model, the tabu animal recognition model, the tabu flag recognition model and the sensitive region recognition model) may be selected according to service requirements, realizing several deep-learning pipelines (Pipelines) to recognize tabu pictures. In this way, the recognition and filtering of tabu pictures in an operating system can be made more stable, efficient and accurate, and the operating system versions released for different countries and regions can satisfy local requirements such as historical background.
In order to better understand the technical solution provided in the present application, the implementation manner of different combinations of five recognition models in the model layer is described below by taking the scenario related to the data layer, the reinforcement learning layer, the model layer and the judgment layer as an example in conjunction with fig. 6.
Referring to fig. 6, after an identification operation for tabu pictures is received, for example a specified git command, the picture resources involved in the operating system are acquired by the above-described picture resource scanner in response to the git command, that is, step S101 is performed.
Illustratively, in step S101, the picture resource scanner obtains the picture resources from the operating system, specifically from the code repository.
It will be appreciated that, in practical applications, a large number of picture resources may be stored in the code repository, so step S102, i.e. preprocessing the scanned picture resource, may be performed each time a picture resource is scanned.
In addition, in other implementations, step S102 may be invoked once after all the picture resources in the code repository have been scanned, preprocessing all the scanned picture resources in a single pass.
In addition, in other implementations, step S102 may be performed once after a preset number of picture resources are scanned, so as to implement a timed and quantitative batch processing operation.
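The per-picture, all-at-once, and fixed-batch scan strategies described in the last few paragraphs can all be expressed as one generic batching loop: `batch_size=1` gives per-picture preprocessing, `batch_size=len(resources)` a single pass over everything, and any intermediate value the timed-and-quantified batch operation. The function and variable names below are illustrative assumptions.

```python
# Hypothetical sketch unifying the three scan strategies: invoke the
# preprocessing step (S102) once per batch of scanned picture resources.

def scan_in_batches(resources, batch_size, preprocess):
    """Invoke `preprocess` once for each batch of scanned picture resources.
    Returns the number of preprocessing invocations."""
    batches = 0
    for i in range(0, len(resources), batch_size):
        preprocess(resources[i:i + batch_size])
        batches += 1
    return batches

# Illustrative run: 10 scanned resources, preset batch size of 4.
processed = []
calls = scan_in_batches(list(range(10)), 4, processed.extend)
```

Choosing the batch size trades off preprocessing latency (small batches surface results sooner) against per-invocation overhead (large batches amortize it).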
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
Then, step S102 is performed to preprocess the picture resources, and the corresponding preprocessing method is selected according to the format of the picture resources currently required to be preprocessed.
For preprocessing the picture resources in different formats, refer to fig. 1, and the text description of fig. 1 will not be repeated here.
Then, after the operation in step S102 is performed, the operations to be performed by the data layer in the present application are completed; at this point, the preprocessed picture resources may be directly input into the recognition models selected in the model layer, or may first be input into the reinforcement learning layer.
With continued reference to fig. 6, exemplarily, after a picture resource is preprocessed in step S102, the preprocessed picture resource is input into the self-learning-trained ResNet50 convolutional neural network model (hereinafter abbreviated as "ResNet50") in the reinforcement learning layer for feature extraction, and the self-learning-trained OneClassSVM classification model (hereinafter abbreviated as "OneClassSVM") in the reinforcement learning layer then processes the extracted feature information to determine whether the currently input picture resource is a false-alarm picture resource; that is, step S203 is executed, and whether the currently input picture resource is a false-alarm picture resource is determined through the ResNet50 and the OneClassSVM.
Correspondingly, if the current picture resource is determined to be a false-alarm picture resource after being processed by the ResNet50 and the OneClassSVM, that is, a picture resource previously mistaken for a tabu picture, step S204 is executed and the false-alarm picture resource is filtered out; otherwise, if the current picture resource is not a picture resource mistaken for a tabu picture, the processed picture resource is input into the recognition models selected in the model layer.
It can be understood that the filtering of the false-alarm picture resource in step S204 may, for example, remove it from the picture resource set scanned by the picture resource scanner, so that subsequent processing will not identify the filtered false-alarm picture resource and it will not appear in the visual warning report again; in this way false alarms are reduced at the source, and labor cost is reduced.
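Step S204 thus amounts to removing the identified false-alarm resources from the scanned picture-resource set before the model layer ever sees them. A minimal sketch with assumed names:

```python
# Hypothetical sketch of step S204: drop false-alarm picture resources from
# the scanned set so downstream recognition never examines them again.

def filter_false_alarms(scanned, false_alarm_paths):
    """Return the scanned resources with false alarms removed."""
    return [res for res in scanned if res["path"] not in false_alarm_paths]

# Illustrative scanned set and false-alarm paths.
scanned = [{"path": "/res/a.png"}, {"path": "/res/b.png"}]
remaining = filter_false_alarms(scanned, {"/res/b.png"})
```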
With continued reference to fig. 6, exemplarily, in the reinforcement learning layer it is also necessary to regularly perform self-learning training on the ResNet50 and the OneClassSVM, i.e., to perform step S201 and step S202. The false-alarm picture resources are collected as the learning data set in step S201; in some implementations they are obtained from the false-alarm picture resources recorded in the judging layer in response to the review information submitted by the review personnel.
Then, when the learning data set has been obtained and the self-learning training condition is satisfied, for example the preset time has been reached or the false-alarm picture resources in the learning data set have reached the preset number, step S202 is executed: the ResNet50 is trained with oversampling based on the learning data set, and the OneClassSVM is fitted.
For the process of training the ResNet50 with oversampling based on the learning data set and fitting the OneClassSVM, reference may be made to the description of the processing logic of the reinforcement learning layer above, which is not repeated here.
With continued reference to fig. 6, in an exemplary implementation, when the electronic product faces a region where bare skin is forbidden, the tabu pictures to be filtered are portrait picture resources with bare skin. Therefore, after the processing in step S203, the picture resources determined not to be false-alarm picture resources need to be input into the bare skin recognition model of the model layer, that is, step S301 is executed, and tabu picture recognition is performed by the bare skin recognition model.
Specifically, as can be seen from the above description, the bare skin recognition model includes a portrait recognition model and a bare skin degree detection model. The process of tabu picture recognition by the bare skin recognition model is specifically as follows: the portrait recognition model, obtained by training on the portrait-category picture resources in the COCO data set based on the YoloV5 network, performs portrait and gender recognition to identify portrait picture resources; the identified portrait picture resources are then submitted to the bare skin degree detection model, obtained by training on an image data set with the NSFW framework based on ResNet50, which determines the bare skin score of each portrait picture resource; finally, whether the portrait picture resource is a tabu picture is determined according to the preset bare skin threshold and the bare skin score determined by the bare skin degree detection model.
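The two-stage decision just described (portrait detection first, then a bare-skin score compared against a preset threshold) can be sketched as below. The detector and scorer here are stubs standing in for the YoloV5-based and ResNet50/NSFW-based models; the threshold value is an assumption for illustration.

```python
# Hypothetical sketch of the bare skin recognition flow: only pictures that
# contain a portrait are scored, and a score at or above the preset bare-skin
# threshold marks the picture as a tabu picture.

def is_bare_skin_tabu(picture, detect_portrait, nudity_score, threshold=0.8):
    """Two-stage decision: portrait gating, then threshold on the score."""
    if not detect_portrait(picture):
        return False  # non-portrait pictures are never bare-skin tabu
    return nudity_score(picture) >= threshold

# Stub models for illustration only (real ones are trained networks).
portraits = {"p1.png"}
scores = {"p1.png": 0.91, "landscape.png": 0.99}
flagged = is_bare_skin_tabu("p1.png", portraits.__contains__, scores.get)
```

Note how the portrait stage gates the score stage: a high score on a non-portrait picture (e.g. "landscape.png" above) never triggers a tabu determination, which is exactly why the two models are chained rather than run independently.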
With continued reference to fig. 6, after the bare skin recognition model of the model layer performs tabu picture recognition, the judging layer performs step S401, determining whether the picture resource is a tabu picture according to the output of the bare skin recognition model.
Correspondingly, if no tabu picture is identified by the bare skin recognition model, it is determined that the picture resource is not a tabu picture, that is, the picture resource can be displayed; step S402 is executed and the non-tabu picture is filtered out, thereby ensuring that the finally released operating system version does not include bare-skin portrait pictures. If a tabu picture is identified by the bare skin recognition model, it is determined that the picture resource is a tabu picture, that is, a bare-skin tabu picture, and the name, path address, visualized picture resource and other information of the picture resource are recorded into the visual warning report, that is, step S403 is executed.
It can be understood that, in practical application, after the recognition result corresponding to a picture resource is obtained and the determination in step S401 is made, the above information of each picture resource determined to be a tabu picture is recorded into the visual warning report; after all picture resources have been identified, the resulting visual warning report is submitted to the review personnel for review.
Accordingly, after the visual warning report is submitted to the review personnel, if the review information submitted by the review personnel is received, step S404 is executed: in response to the review information submitted by the review personnel, the tabu pictures confirmed by the review personnel are deleted from the system, and the false-alarm picture resources are recorded.
For the form of the visual warning report, and the form of the review information submitted by the review personnel, refer to the descriptions of tables 1 to 3, and are not repeated here.
With continued reference to fig. 6, in an exemplary implementation, when the electronic product faces a region where a certain gesture, for example the devil ox horn gesture, is tabu, the tabu pictures to be filtered are picture resources of the devil ox horn gesture. Therefore, after the processing in step S203, the picture resources determined not to be false-alarm picture resources need to be input into the tabu gesture recognition model of the model layer, that is, step S302 is executed, and the tabu gesture recognition model performs tabu picture recognition.
Specifically, as can be seen from the above description, the tabu gesture recognition model includes a hand recognition model and a gesture recognition model. The process of tabu picture recognition by the tabu gesture recognition model is specifically as follows: the hand recognition model, obtained by training on hand data based on the SSD target recognition framework, performs hand recognition on the picture resources to identify hand picture resources and marks a bounding box in each identified hand picture resource; the hand picture resources marked with bounding boxes are then handed to the gesture recognition model of the hand MediaPipe framework structure, which detects the 21 3D hand joint points in the hand region of the hand picture resource and extracts the coordinate information of the 21 3D hand joint points; finally, an SVM algorithm is adopted to analyze the coordinate information of the 21 3D landmarks, so that the gesture in the hand picture resource is predicted and it is further determined whether the gesture is a tabu gesture.
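The final gesture step above flattens the 21 3D hand joint points into a single 63-dimensional feature vector before classification. The sketch below illustrates that data flow, replacing the SVM with nearest-template matching purely so the example is self-contained; the template names and coordinates are made up.

```python
# Hypothetical sketch of the landmark-to-gesture step: 21 (x, y, z) hand
# joint points are flattened into one 63-element vector, which a classifier
# (an SVM in the text; nearest-template matching here) maps to a gesture.

def flatten_landmarks(landmarks):
    """21 (x, y, z) joint points -> one 63-element feature vector."""
    return [coord for point in landmarks for coord in point]

def classify_gesture(landmarks, templates):
    """Pick the template gesture whose flattened vector is closest."""
    feat = flatten_landmarks(landmarks)
    def sq_dist(vec):
        return sum((a - b) ** 2 for a, b in zip(feat, vec))
    return min(templates, key=lambda name: sq_dist(templates[name]))

# Illustrative landmark sets (real ones come from the MediaPipe hand model).
horns = [(0.1 * i, 0.0, 0.0) for i in range(21)]
flat = [(0.0, 0.0, 0.0)] * 21
templates = {"devil_horns": flatten_landmarks(horns),
             "open_palm": flatten_landmarks(flat)}
label = classify_gesture(horns, templates)
```

An SVM trained on many labelled 63-dimensional vectors plays the same role as the `templates` lookup here, but generalizes to landmark configurations it has not seen verbatim.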
With continued reference to fig. 6, after the tabu gesture recognition model of the model layer performs tabu picture recognition, the judging layer may execute step S401, i.e. determine whether the picture resource is a tabu picture according to the output of the tabu gesture recognition model.
Correspondingly, if no tabu picture is identified by the tabu gesture recognition model, it is determined that the picture resource is not a tabu picture, that is, the picture resource can be displayed; step S402 is executed and the non-tabu picture is filtered out, thereby ensuring that the finally released operating system version does not include tabu gesture pictures of the devil ox horn gesture. If a tabu picture is identified by the tabu gesture recognition model, it is determined that the picture resource is a tabu picture, that is, a tabu picture of the devil ox horn gesture, and the name, path address, visualized picture resource and other information of the picture resource are recorded into the visual warning report, that is, step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementation, and is not described here again.
With continued reference to fig. 6, in a possible implementation, the electronic product faces a region where "pig" and "dog" are tabu, and the tabu pictures to be filtered are picture resources of tabu animals such as "pig" and "dog". Therefore, after the processing in step S203, the picture resources determined not to be false-alarm picture resources need to be input into the tabu animal recognition model of the model layer, that is, step S303 is executed, and the tabu animal recognition model performs tabu picture recognition.
For implementation details of the tabu animal recognition model identifying tabu animals, reference may be made to the description of the several tabu animal recognition models in the model layer above, and details are not repeated here.
With continued reference to fig. 6, after the tabu animal recognition model of the model layer performs tabu picture recognition, the judging layer may execute step S401, i.e. determine whether the picture resource is a tabu picture according to the output of the tabu animal recognition model.
Correspondingly, if no tabu picture is identified by the tabu animal recognition model, it is determined that the picture resource is not a tabu picture, that is, the picture resource can be displayed; step S402 is executed and the non-tabu picture is filtered out, thereby ensuring that the finally released operating system version does not include tabu animal pictures of "pig" and "dog". If a tabu picture is identified by the tabu animal recognition model, it is determined that the picture resource is a tabu picture, i.e. a tabu picture containing a pig and/or a dog, and the name, path address, visualized picture resource and other information of the picture resource are recorded into the visual warning report, that is, step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementation, and is not described here again.
With continued reference to fig. 6, in a possible implementation, the electronic device faces a region sensitive to taboo flags, and the tabu pictures to be filtered are picture resources of the taboo flag type. Therefore, after the processing in step S203, the picture resources determined not to be false-alarm picture resources need to be input into the taboo flag recognition model of the model layer, that is, step S304 is executed, and the taboo flag recognition model performs tabu picture recognition.
For details of implementation of the taboo flag recognition model in the taboo flag recognition model, reference may be made to the description part of the model layer for the taboo flag recognition model, which is not repeated here.
With continued reference to fig. 6, after the taboo flag recognition model of the model layer performs tabu picture recognition, the judging layer may execute step S401, i.e. determine whether the picture resource is a tabu picture according to the output of the taboo flag recognition model.
Correspondingly, if no tabu picture is identified by the taboo flag recognition model, it is determined that the picture resource is not a tabu picture, that is, the picture resource can be displayed; step S402 is executed and the non-tabu picture is filtered out, thereby ensuring that the finally released operating system version does not include picture resources of taboo flags. If a tabu picture is identified by the taboo flag recognition model, it is determined that the picture resource is a tabu picture, and the name, path address, visualized picture resource and other information of the picture resource are recorded into the visual warning report, that is, step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementation, and is not described here again.
With continued reference to fig. 6, in an exemplary implementation, the electronic device faces a region sensitive to certain map markers (certain sensitive regions), and the tabu pictures to be filtered are picture resources of regions currently listed as sensitive. Therefore, after the processing in step S203, the picture resources determined not to be false-alarm picture resources need to be input into the sensitive region recognition model of the model layer, that is, step S305 is executed, and the sensitive region recognition model performs tabu picture recognition.
For details of the implementation of the sensitive area recognition model to recognize the sensitive area, reference may be made to the description part of the above model layer for the sensitive area recognition model, which is not repeated here.
With continued reference to fig. 6, after the sensitive region recognition model of the model layer performs tabu picture recognition, the judging layer may execute step S401, i.e. determine whether the picture resource is a tabu picture according to the output of the sensitive region recognition model.
Correspondingly, if no tabu picture is identified by the sensitive region recognition model, it is determined that the picture resource is not a tabu picture, that is, the picture resource can be displayed; step S402 is executed and the non-tabu picture is filtered out, thereby ensuring that the finally released operating system version does not include picture resources of map markers corresponding to regions currently listed as sensitive. If a tabu picture is identified by the sensitive region recognition model, it is determined that the picture resource is a tabu picture, and the name, path address, visualized picture resource and other information of the picture resource are recorded into the visual warning report, that is, step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the above implementation, and is not described here again.
With continued reference to fig. 6, in an exemplary implementation, when the region the electronic product faces has requirements both on exposed skin of persons and on certain gestures, the bare skin recognition model and the tabu gesture recognition model need to be selected to jointly complete the recognition processing of the picture resources. That is, after the processing in step S203, the picture resources determined not to be false-alarm picture resources need to be input into both the bare skin recognition model and the tabu gesture recognition model of the model layer: step S301 is executed and the bare skin recognition model performs tabu picture recognition, and step S302 is executed and the tabu gesture recognition model performs tabu picture recognition.
It should be noted that, in some implementations, when the number of recognition models selected from the model layer is 2 or more, the recognition processing operations of the recognition models may be performed in parallel, that is, the picture resources processed in step S203 are respectively input into each selected recognition model for recognition processing.
In addition, it should be noted that, in other implementations, when the number of recognition models selected from the model layer is 2 or more, the recognition processing operations of the recognition models may be performed serially, that is, the picture resources processed in step S203 are first input into one selected recognition model for recognition processing, and then input into the next selected recognition model for recognition processing, and so on.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment, and is not to be taken as the only limitation of the present embodiment.
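The parallel and serial arrangements described above differ only in when each selected model sees the picture: in parallel, every model examines it; in serial, the chain can stop at the first model that flags a tabu picture. The model functions below are trivial stand-ins for the selected recognition models, used only to illustrate the two control flows.

```python
# Hypothetical sketch of combining 2+ selected recognition models. A picture
# is a tabu picture if any selected model flags it.

def run_parallel(picture, models):
    """Every selected model examines the picture; tabu if any flags it."""
    results = [model(picture) for model in models]  # all models run
    return any(results)

def run_serial(picture, models):
    """Models examine the picture one after another, stopping at the first
    model that flags it (later models are skipped for that picture)."""
    for model in models:
        if model(picture):
            return True
    return False

# Stand-in recognition models for illustration only.
bare_skin = lambda p: p == "nude.png"
gesture = lambda p: p == "horns.png"
tabu = run_serial("horns.png", [bare_skin, gesture])
```

The serial form saves work on pictures flagged early, while the parallel form produces every model's verdict, which is useful when the warning report should name all tabu types a picture matched.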
For ease of illustration, fig. 6 takes parallel execution as an example. That is, after the picture resources processed by the reinforcement learning layer are input into the bare skin recognition model and the tabu gesture recognition model in the model layer: in the bare skin recognition model, the portrait recognition model obtained by training on the portrait-category picture resources in the COCO data set based on the YoloV5 network recognizes portraits and gender to identify portrait picture resources; the identified portrait picture resources are then submitted to the bare skin degree detection model, obtained by training on the data in an image data set with the NSFW framework based on ResNet50, which determines the bare skin score of each portrait picture resource; finally, whether the portrait picture resource is a tabu picture, i.e. a bare-skin tabu picture, is determined according to the preset bare skin threshold and the score determined by the bare skin degree detection model. Meanwhile, in the tabu gesture recognition model, the hand recognition model obtained by training on the hand data in the gesture recognition data set based on the SSD target recognition framework performs hand recognition on the picture resources to identify hand picture resources and marks a bounding box in each identified hand picture resource; the hand picture resources marked with bounding boxes are then handed to the gesture recognition model of the hand MediaPipe framework structure, which detects the 21 3D hand joint points in the hand region and extracts their coordinate information; finally, an SVM algorithm is adopted to analyze the coordinate information of the 21 3D landmarks, so that the gesture in the hand picture resource is predicted and it is further determined whether the gesture is a tabu gesture.
In addition, it should be noted that step S401 is executed by the judgment layer regardless of whether the taboo picture recognition was performed by the bare-skin recognition model or by the taboo-gesture recognition model.
It can be understood that if the bare-skin recognition model performed the taboo picture recognition, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture based on the output of the bare-skin recognition model; if the taboo-gesture recognition model performed the taboo picture recognition, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture based on the output of the taboo-gesture recognition model.
Accordingly, if the judgment in step S401 finds that the current picture resource is not a taboo picture (neither a taboo picture of the bare-skin type nor a taboo picture of the taboo-gesture type), step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version includes neither taboo pictures of the bare-skin type nor taboo pictures of the taboo-gesture type; if the judgment in step S401 finds that the picture resource is a taboo picture, whether of the bare-skin type or of the taboo-gesture type, the name, path address, and visualized form of the picture resource are recorded in the visualized warning report, i.e. step S403 is executed.
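The filter-or-record branching of steps S401 to S403 can be sketched as below. The dictionary shapes of the picture record and the report entries are hypothetical, chosen only to make the control flow concrete.

```python
def judge_and_record(picture, model_outputs, report):
    """Sketch of steps S401-S403: if no selected model flags the picture
    it is filtered out (S402); otherwise its name, path address and
    visualized form are appended to the warning report (S403)."""
    if not any(model_outputs.values()):   # S401: not a taboo picture
        return "filtered"                 # S402
    report.append({"name": picture["name"],
                   "path": picture["path"],
                   "image": picture["image"]})
    return "recorded"                     # S403
```

The same routine applies unchanged whichever subset of recognition models produced `model_outputs`, which mirrors the text's point that step S401 is model-agnostic.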
The execution logic of steps S403 and S404 is substantially the same as in the implementation above and is not repeated here.
With continued reference to fig. 6, in another exemplary implementation, suppose that the region the electronic product is intended for imposes requirements on taboo gestures of portraits and on certain animals; a taboo-gesture recognition model and a taboo-animal recognition model then need to be selected to jointly complete the recognition of the picture resources. That is, after the processing in step S203, the picture resources determined not to be false-alarm picture resources are input into the taboo-gesture recognition model and the taboo-animal recognition model of the model layer; step S302 is executed, in which the taboo-gesture recognition model performs taboo picture recognition, and step S303 is executed, in which the taboo-animal recognition model performs taboo picture recognition.
For ease of illustration, the parallel execution mode is again taken as an example. That is, after the picture resources processed by the reinforcement learning layer are input into the taboo-gesture recognition model and the taboo-animal recognition model in the model layer, the taboo-gesture recognition model proceeds as described above: the hand recognition model, obtained by training on hand data in a gesture recognition data set with an SSD target recognition framework, performs hand recognition on the picture resources, identifies the hand picture resources, and marks a bounding box in each of them; the hand picture resources marked with bounding boxes are then handed to the gesture recognition model built on the MediaPipe Hands framework, which locates the 21 3D hand joint points inside the hand region and extracts their coordinate information; finally, an SVM algorithm analyzes the coordinate information of the 21 3D landmarks to predict the gesture in the hand picture resource and determine whether it is a taboo gesture. Meanwhile, the taboo-animal recognition model identifies whether the picture resources contain taboo animals in the manner described in the implementation above.
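The parallel execution mode can be sketched with standard-library thread pooling; the entries of `models` here are stub callables standing in for the selected recognition models (e.g. taboo-gesture and taboo-animal), and the name-to-callable mapping is an assumption of this sketch, not an interface defined by the patent.

```python
from concurrent.futures import ThreadPoolExecutor

def run_models_in_parallel(picture, models):
    """Run the selected recognition models concurrently on one picture
    resource and collect each model's verdict keyed by model name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, picture)
                   for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

The per-model verdicts collected this way are exactly what the judgment layer's step S401 would then inspect.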
In addition, it should be noted that step S401 is executed by the judgment layer regardless of whether the taboo picture recognition was performed by the taboo-gesture recognition model or by the taboo-animal recognition model.
It can be understood that if the taboo-gesture recognition model performed the taboo picture recognition, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture based on the output of the taboo-gesture recognition model; if the taboo-animal recognition model performed the taboo picture recognition, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture based on the output of the taboo-animal recognition model.
Accordingly, if the judgment in step S401 finds that the current picture resource is not a taboo picture (neither a taboo picture of the taboo-gesture type nor a taboo picture of the taboo-animal type), step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version includes neither taboo pictures of the taboo-gesture type nor taboo pictures of the taboo-animal type; if the judgment in step S401 finds that the picture resource is a taboo picture, whether of the taboo-gesture type or of the taboo-animal type, the name, path address, visualized form, and other information of the picture resource are recorded in the visualized warning report, i.e. step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the implementation above and is not repeated here.
With continued reference to fig. 6, in yet another exemplary implementation, suppose that the region the electronic product is intended for imposes requirements on taboo gestures of portraits, on certain animals, and on certain flags and map identifications; a taboo-gesture recognition model, a taboo-animal recognition model, a taboo-flag recognition model, and a sensitive-region recognition model then need to be selected to jointly complete the recognition of the picture resources. That is, after the processing in step S203, the picture resources determined not to be false-alarm picture resources are input into the taboo-gesture recognition model, the taboo-animal recognition model, the taboo-flag recognition model, and the sensitive-region recognition model of the model layer: step S302 is executed, in which the taboo-gesture recognition model performs taboo picture recognition; step S303 is executed, in which the taboo-animal recognition model performs taboo picture recognition; step S304 is executed, in which the taboo-flag recognition model performs taboo picture recognition; and step S305 is executed, in which the sensitive-region recognition model performs taboo picture recognition.
For ease of illustration, the parallel execution mode is again taken as an example. That is, after the picture resources processed by the reinforcement learning layer are input into the four selected models in the model layer, the taboo-gesture recognition model proceeds as described above: the hand recognition model, obtained by training on hand data in a gesture recognition data set with an SSD target recognition framework, performs hand recognition on the picture resources, identifies the hand picture resources, and marks a bounding box in each of them; the hand picture resources marked with bounding boxes are then handed to the gesture recognition model built on the MediaPipe Hands framework, which locates the 21 3D hand joint points inside the hand region and extracts their coordinate information; finally, an SVM algorithm analyzes the coordinate information of the 21 3D landmarks to predict the gesture in the hand picture resource and determine whether it is a taboo gesture. Meanwhile, the taboo-animal recognition model identifies whether the picture resources contain taboo animals, the taboo-flag recognition model identifies whether they contain taboo flags, and the sensitive-region recognition model identifies whether they contain sensitive regions, each in the manner described in the implementations above.
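For a linear kernel, the final SVM step over the flattened landmark coordinates reduces to the sign of an inner product plus a bias. The sketch below shows only that decision function; the weights and bias are placeholders rather than values trained on a gesture data set, and the 63-dimensional feature layout (21 joints times 3 coordinates) follows the description above.

```python
def svm_decision(features, weights, bias):
    """Linear-SVM decision: sign(w . x + b). A positive margin is read
    here as label 1, i.e. a taboo gesture (label convention assumed)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0
```

A kernelized SVM, as typically used in practice, would replace the inner product with a kernel evaluation against support vectors, but the thresholded-margin structure of the decision is the same.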
In addition, it should be noted that step S401 is executed by the judgment layer regardless of which of the four models (the taboo-gesture recognition model, the taboo-animal recognition model, the taboo-flag recognition model, or the sensitive-region recognition model) performed the taboo picture recognition.
It can be understood that if the taboo-gesture recognition model performed the taboo picture recognition, step S401 in the judgment layer specifically determines whether the picture resource is a taboo picture based on the output of the taboo-gesture recognition model; if the taboo-animal recognition model performed it, the determination is based on the output of the taboo-animal recognition model; if the taboo-flag recognition model performed it, the determination is based on the output of the taboo-flag recognition model; and if the sensitive-region recognition model performed it, the determination is based on the output of the sensitive-region recognition model.
Accordingly, if the judgment in step S401 finds that the current picture resource is not a taboo picture (neither of the taboo-gesture type, nor of the taboo-animal type, nor of the taboo-flag type, nor of the sensitive-region type), step S402 is executed to filter out the non-taboo picture, thereby ensuring that the finally released operating system version includes no taboo pictures of any of these four types; if the judgment in step S401 finds that the picture resource is a taboo picture, whichever of the four types it belongs to, the name, path address, visualized form, and other information of the picture resource are recorded in the visualized warning report, i.e. step S403 is executed.
The execution logic of steps S403 and S404 is substantially the same as in the implementation above and is not repeated here.
It should be understood that the above description is only an example for better understanding of the technical solution of the present embodiment and is not to be taken as the only limitation thereof. In practical applications, depending on service requirements, any one, any two, any three, or any four of the five recognition models provided by the model layer may be selected to perform the taboo picture recognition operation, or all five may be selected to perform it jointly.
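The region-driven selection of model subsets described above can be expressed as a simple configuration lookup. The region names and the default of falling back to all five models for an unlisted region are illustrative assumptions of this sketch, not requirements stated in the text.

```python
# Hypothetical mapping from target region to the subset of the five
# model-layer recognizers its service requirements call for.
ALL_MODELS = {"bare_skin", "taboo_gesture", "taboo_animal",
              "taboo_flag", "sensitive_region"}

REGION_REQUIREMENTS = {
    "region_a": {"bare_skin", "taboo_gesture"},
    "region_b": {"taboo_gesture", "taboo_animal"},
    "region_c": {"taboo_gesture", "taboo_animal",
                 "taboo_flag", "sensitive_region"},
}

def select_models(region):
    """Return the recognizers to run for a region; unlisted regions
    conservatively run all five (assumed default)."""
    selected = REGION_REQUIREMENTS.get(region, ALL_MODELS)
    assert selected <= ALL_MODELS
    return selected
```

Each selected name would then be bound to its trained model and dispatched, e.g. in the parallel mode described earlier.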
In addition, it should be further noted that, in an actual application scenario, the taboo picture identification method provided by the above embodiments and performed by an electronic device may also be performed by a chip system included in the electronic device, where the chip system may include a processor. The chip system may be coupled to a memory, so that when the chip system runs it invokes a computer program stored in the memory to implement the steps performed by the electronic device described above. The processor in the chip system may be an application processor or a non-application processor.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer instructions which, when run on an electronic device, cause the electronic device to execute the related method steps to implement the taboo picture identification method of the above embodiments.
In addition, an embodiment of the present application further provides a computer program product which, when run on an electronic device, causes the electronic device to execute the related steps to implement the taboo picture identification method of the above embodiments.
In addition, an embodiment of the present application further provides a chip (which may also be a component or module) that may include one or more processing circuits and one or more transceiver pins, where the processing circuit executes the related method steps to implement the taboo picture identification method of the above embodiments, controlling the receiving pin to receive signals and the sending pin to send signals.
In addition, as can be seen from the foregoing description, the electronic device, computer-readable storage medium, computer program product, and chip provided by the embodiments of the present application are all used to perform the corresponding methods provided above; therefore, for the advantageous effects they achieve, reference may be made to those of the corresponding methods, which are not repeated here.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (7)

1. A taboo picture identification method, characterized in that the method comprises:
acquiring picture resources involved in an operating system, as well as the name and storage path address of each picture resource; wherein the acquiring of the picture resources involved in the operating system comprises: after receiving a distributed version control system command for scanning picture resources in a code bin, automatically scanning the picture resources in the code bin; the formats of the picture resources comprise a portable network graphic format, a joint image expert group format and a drawable object format; wherein the background color of the picture resources in the portable network graphic format and the joint image expert group format in the code bin is transparent, so these picture resources cannot be seen by the naked eye of a user; and the picture resources in the drawable object format are stored in the code bin in an XML format and likewise cannot be seen by the naked eye of a user;
for each picture resource, performing a picture resource preprocessing operation to process picture resources that cannot be seen by the naked eye of a user into picture resources that can be seen by the naked eye of a user; wherein the performing of a picture resource preprocessing operation on each picture resource, to process picture resources that cannot be seen by the naked eye of a user into picture resources that can be seen by the naked eye of a user, comprises: unifying the size and the number of channels of the picture resources in the portable network graphic format and the joint image expert group format, and performing visualization processing; and adapting the attributes and values of the picture resources in the drawable object format, converting them from the drawable object format to a scalable vector image format, converting the picture resources in the scalable vector image format into picture resources in the portable network graphic format, unifying their size and number of channels, and performing visualization processing;
for each picture resource visible to the naked eye of a user, extracting characteristic information from the picture resource according to a ResNet50 convolutional neural network model, classifying the characteristic information according to a classification model, and identifying picture resources that were mistakenly regarded as taboo pictures before the current taboo picture identification operation;
filtering, from the picture resources, the picture resources that were mistakenly regarded as taboo pictures before the current taboo picture identification operation; wherein the filtered picture resources that were mistakenly regarded as taboo pictures are still stored in the code bin;
for each picture resource visible to the naked eyes of a user, identifying a hand picture resource in the picture resource according to a hand identification model, wherein the hand picture resource is a picture resource comprising hands in the picture resource;
determining coordinate information of a hand joint point in the hand picture resource according to the gesture recognition model;
predicting whether the hand picture resource is a tabu picture or not according to a Support Vector Machine (SVM) algorithm and the coordinate information of the joint points;
when the picture resource is determined to be a tabu picture, generating a visual warning report according to the name and the path address of the picture resource which is determined to be the tabu picture and the tabu picture visible to the naked eyes of a user, which are acquired from a code bin;
receiving review information which is made by a review person in the visual warning report for each picture resource which is identified as a tabu picture, wherein the review information corresponding to each tabu picture is used for describing whether the tabu picture is a false report picture resource, and the false report picture resource is a picture resource which is determined by the review person and is not the tabu picture;
Deleting the false alarm picture resources, names of the false alarm picture resources and path addresses from the visual warning report according to the review information corresponding to each tabu picture recorded in the visual warning report;
deleting the picture resources marked as tabu pictures in the visual warning report from the code bin according to the visual warning report of deleting the related information of the false report picture resources;
copying the false positive picture resource to construct a learning data set;
re-performing oversampling training on the ResNet50 convolutional neural network model based on the data in the learning data set to obtain a self-learned ResNet50 convolutional neural network model, and re-performing fitting training on the classification model based on the data in the learning data set to obtain the self-learned classification model; when the picture resources related in the operating system comprise the false alarm picture resources, the false alarm picture resources are filtered out when the tabu picture identification is carried out based on the re-trained ResNet50 convolutional neural network model and the re-trained classification model, and the hand identification model does not carry out the tabu picture identification on the same false alarm picture resources.
2. The method of claim 1, wherein the hand recognition model is obtained by training on hand data in a gesture recognition data set based on an SSD target recognition framework, and is used for performing hand recognition on the picture resources and returning a bounding box.
3. The method according to claim 1, wherein the gesture recognition model is a MediaPipe framework structure, and is used for positioning 21 3D hand joint points inside a hand region in the hand picture resource recognized by the hand recognition model, and extracting coordinate information of the 21 3D hand joint points.
4. The method according to claim 1, wherein the method further comprises:
identifying a tabu animal picture resource in the picture resources according to a tabu animal identification model, wherein the tabu animal picture resource is a picture resource comprising tabu animals in the picture resources;
the tabu animal identification model is obtained by training data in an image data set and picture data of tabu animals crawled by a network based on a ResNext convolutional neural network.
5. The method according to any one of claims 1 to 4, further comprising:
Identifying a tabu flag picture resource in the picture resources according to a tabu flag identification model, wherein the tabu flag picture resource is the picture resource comprising a tabu flag in the picture resources, and the tabu flag identification model is a YoloV3 framework structure;
and/or the number of the groups of groups,
and identifying the sensitive area picture resources in the picture resources according to a sensitive area identification model, wherein the sensitive area picture resources are picture resources including sensitive areas in the picture resources, and the sensitive area identification model is a YoloV3 framework structure.
6. An electronic device, the electronic device comprising: a memory and a processor, the memory and the processor coupled; the memory stores program instructions that, when executed by the processor, cause the electronic device to perform the tabu picture identification method of any one of claims 1 to 5.
7. A computer readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform the tabu picture recognition method of any one of claims 1 to 5.
CN202210405622.7A 2022-04-18 2022-04-18 Taboo picture identification method, apparatus and storage medium Active CN115565201B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210405622.7A CN115565201B (en) 2022-04-18 2022-04-18 Taboo picture identification method, apparatus and storage medium


Publications (2)

Publication Number Publication Date
CN115565201A CN115565201A (en) 2023-01-03
CN115565201B true CN115565201B (en) 2024-03-26

Family

ID=84737889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210405622.7A Active CN115565201B (en) 2022-04-18 2022-04-18 Taboo picture identification method, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN115565201B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901346A (en) * 2010-05-06 2010-12-01 复旦大学 Method for identifying unsuitable content in colour digital image
CN106446932A (en) * 2016-08-30 2017-02-22 上海交通大学 Machine learning and picture identification-based evolvable prohibited picture batch processing method
CN107330453A (en) * 2017-06-19 2017-11-07 中国传媒大学 The Pornographic image recognizing method of key position detection is recognized and merged based on substep
CN111104820A (en) * 2018-10-25 2020-05-05 中车株洲电力机车研究所有限公司 Gesture recognition method based on deep learning
CN111783812A (en) * 2019-11-18 2020-10-16 北京沃东天骏信息技术有限公司 Method and device for identifying forbidden images and computer readable storage medium
JP2021144724A (en) * 2020-06-29 2021-09-24 ベイジン バイドゥ ネットコム サイエンス アンド テクノロジー カンパニー リミテッド Image examination method, device, electronic apparatus, and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Improved Hand Gesture Recognition with Two-Stage Convolution Neural Networks Using a Hand Color Image and its Pseudo-Depth Image; Jiaqing Liu et al.; 2019 IEEE International Conference on Image Processing (ICIP); full text *
Gesture recognition based on SVM and Inception-v3; Wu Binfang; Chen Han; Xiao Shuhao; Computer Systems & Applications (05); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant