CN117975173A - Child evil dictionary picture identification method and device based on light-weight visual converter - Google Patents
- Publication number
- CN117975173A (application CN202410389340.1A)
- Authority
- CN
- China
- Prior art keywords
- network
- convolution
- layer
- converter
- mobile network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention provides a child evil dictionary picture identification method and device based on a lightweight visual converter, and relates to the technical field of image identification. The identification method comprises: S1, acquiring a cartoon image to be identified; S2, preprocessing the cartoon image; S3, inputting the preprocessed cartoon image into a trained real-time child evil dictionary picture identification model based on the lightweight visual converter to obtain a prediction vector; and S4, comparing the prediction vector against a prediction threshold to judge whether the cartoon image belongs to the child evil dictionary pictures. The network structure of the real-time child evil dictionary picture identification model based on the lightweight visual converter comprises a first convolution layer network, a first mobile network, a second mobile network, a third mobile network, a fourth mobile network, a first lightweight converter network, a fifth mobile network, a second lightweight converter network, a sixth mobile network, a third lightweight converter network, a second convolution layer network and a multi-layer perceptron which are sequentially connected.
Description
Technical Field
The invention relates to the technical field of image recognition, in particular to a child evil dictionary picture recognition method and device based on a lightweight visual converter.
Background
A child evil dictionary picture (often associated with so-called "Elsagate" content) refers to a class of inappropriate images that may appear on children's content platforms. These pictures are often disguised with a child-friendly look but actually contain themes and scenes unsuitable for children to view.
Compared with general image recognition technology, which identifies the objects contained in an image, automatic recognition of child evil dictionary pictures is still at an early stage of development. Its main challenges include the following five points.
1. Strong camouflage: child evil dictionary pictures often imitate or are disguised as normal children's content, making them difficult to identify automatically. Producers may cleverly use familiar children's cartoon characters or styles, blurring the boundary between genuinely child-friendly content and inappropriate content.
2. Diversity and variation: producers continually try new ways to circumvent recognition. This diversity and changeability add to the complexity of recognition algorithms.
3. Difficult data acquisition: training and optimizing models requires a large amount of labeled data, whose collection may be limited by ethical and legal constraints.
4. Semantic understanding: some evil dictionary content may contain hidden malicious themes that require a deep understanding of semantics and context. Simple image feature extraction is therefore likely to miss some problems.
5. Deployment cost: to cope with the above four problems, conventional models have to be built very large, which makes them difficult to deploy on edge devices and slow to run.
In view of the above, the applicant has studied the prior art and has made the present application.
Disclosure of Invention
The invention provides a child evil dictionary picture identification method and device based on a lightweight visual converter, so as to alleviate at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a child evil dictionary picture identification method based on a lightweight visual converter, which includes steps S1 to S4.
S1, acquiring a cartoon image to be identified.
S2, preprocessing the cartoon image.
S3, inputting the preprocessed cartoon image into a pre-trained real-time child evil dictionary picture identification model based on the lightweight visual converter, and obtaining a prediction vector indicating whether the cartoon image belongs to the child evil dictionary pictures.
S4, comparing the prediction vector against a prediction threshold to judge whether the cartoon image belongs to the child evil dictionary pictures.
The network structure of the real-time child evil dictionary picture identification model based on the lightweight visual converter comprises a first convolution layer network, a first mobile network, a second mobile network, a third mobile network, a fourth mobile network, a first lightweight converter network, a fifth mobile network, a second lightweight converter network, a sixth mobile network, a third lightweight converter network, a second convolution layer network and a multi-layer perceptron which are sequentially connected.
In an alternative embodiment, the training step of the real-time child evil dictionary picture identification model based on the lightweight visual converter includes steps A1 to A4.
A1, randomly selecting batches from the training set, where each batch contains n pictures.
A2, scaling the n pictures in each batch to 256×256, and then applying data enhancement to all the pictures in the batch.
A3, inputting the data-enhanced picture data into an untrained real-time child evil dictionary picture identification model based on the lightweight visual converter, and obtaining a group of prediction confidences.
A4, computing the loss between the prediction confidences and the labels of the n pictures, and optimizing the obtained loss value through the back-propagation algorithm until training is completed. The Loss function adopts the Focal Loss function.
The Focal Loss function is:

$$FL(p, y) = -\alpha \, y \, (1 - p)^{\gamma} \log(p) - (1 - \alpha)(1 - y) \, p^{\gamma} \log(1 - p)$$

where $FL(\cdot)$ denotes the classification loss function, $\alpha$ is a hyper-parameter for balancing the imbalance of positive and negative samples in the loss function, $p$ is the model's classification prediction, $\gamma$ is a hyper-parameter for adjusting the loss of easy and difficult samples in the loss function, and $y$ is the category label of the picture.
In an alternative embodiment, the first convolution layer network takes as input a picture of size 256×256×3 and outputs a vector of size 128×128×16. The output of the first mobile network is a 128×128×16 vector. The output of the second mobile network is a 64×64×24 vector. The output of the third mobile network is a 64×64×24 vector. The output of the fourth mobile network is a 32×32×48 vector. The output of the first lightweight converter network is a 32×32×48 vector. The output of the fifth mobile network is a 16×16×64 vector. The output of the second lightweight converter network is a 16×16×64 vector. The output of the sixth mobile network is an 8×8×80 vector. The output of the third lightweight converter network is an 8×8×80 vector; the third lightweight converter network is formed by connecting 4 convolution layer networks and 3 converter modules. The output of the second convolution layer network is an 8×8×320 vector. The output of the multi-layer perceptron is a 1×1×2 confidence.
In an alternative embodiment, the first convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence. The convolution layer of the first convolution layer network comprises n = 16 convolution kernels of size 3×3.
In an alternative embodiment, the second convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence. The convolution layer of the second convolution layer network comprises n = 320 convolution kernels of size 1×1.
In an alternative embodiment, the first mobile network and the third mobile network each employ a residual structure. The mobile network of the residual structure comprises a grouping convolution, a batch normalization layer, a SiLU activation function, a convolution layer, a batch normalization layer, a SiLU activation function, and an addition operation connected in sequence.
The grouping convolution of the first mobile network comprises 32 convolution kernels of size 3×3, with step size 1 and group number 32. The convolution layer of the first mobile network comprises 16 convolution kernels of size 1×1, with step size 1 and padding 0.

The grouping convolution of the third mobile network comprises 48 convolution kernels of size 3×3, with step size 1 and group number 48. The convolution layer of the third mobile network comprises 24 convolution kernels of size 1×1, with step size 1 and padding 0.
In an alternative embodiment, the second, fourth, fifth, and sixth mobile networks all adopt a sequential connection structure. The mobile network of the sequential connection structure comprises a grouping convolution, a batch normalization layer, a SiLU activation function, a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence.
The grouping convolution of the second mobile network comprises 32 convolution kernels of size 3×3, with step size 1 and group number 32. The convolution layer of the second mobile network comprises 24 convolution kernels of size 1×1, with step size 1 and padding 0.

The grouping convolution of the fourth mobile network comprises 48 convolution kernels of size 3×3, with step size 1 and group number 48. The convolution layer of the fourth mobile network comprises 48 convolution kernels of size 1×1, with step size 1 and padding 0.

The grouping convolution of the fifth mobile network comprises 96 convolution kernels of size 3×3, with step size 1 and group number 96. The convolution layer of the fifth mobile network comprises 64 convolution kernels of size 1×1, with step size 1 and padding 0.

The grouping convolution of the sixth mobile network comprises 128 convolution kernels of size 3×3, with step size 1 and group number 128. The convolution layer of the sixth mobile network comprises 80 convolution kernels of size 1×1, with step size 1 and padding 0.
In an alternative embodiment, the lightweight converter network comprises a No. 1 convolution layer network, a No. 2 convolution layer network, a vector stretch operation, a plurality of converter modules, a vector stretch operation, and a No. 3 convolution layer network connected in sequence. The number of converter modules is 2 in the first lightweight converter network, 4 in the second, and 3 in the third. The No. 1 convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence; its convolution kernel size is 3×3. The No. 2 and No. 3 convolution layer networks each comprise a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence; their convolution kernel sizes are 1×1.
The number of convolution kernels in the first lightweight converter network is 48 for the No. 1 convolution layer network, 64 for the No. 2, and 48 for the No. 3.

The number of convolution kernels in the second lightweight converter network is 64 for the No. 1 convolution layer network, 80 for the No. 2, and 64 for the No. 3.

The number of convolution kernels in the third lightweight converter network is 80 for the No. 1 convolution layer network, 96 for the No. 2, and 80 for the No. 3.
In an alternative embodiment, the converter module comprises a connected attention layer and feed forward network.
The attention layer of the converter module includes a first layer normalization layer, a first linear layer, a vector stretch operation, an attention operation, a vector stretch operation, and a second linear layer connected in sequence.
The feed-forward network of the converter module includes a second layer normalization layer, a third linear layer, a SiLU activation function, a first Dropout random deactivation, a fourth linear layer, and a second Dropout random deactivation connected in sequence.

The output dimension of the first linear layer is 96. The output dimension of the second linear layer is the same as the input vector dimension of the converter module. The output dimension of the third linear layer is twice the input vector dimension of the converter module. The output dimension of the fourth linear layer is the same as the input vector dimension of the converter module. The deactivation rate of both the first and second Dropout random deactivations is 0.1.
In an alternative embodiment, the multi-layer perceptron includes five sequentially connected blocks, each consisting of a linear layer with input and output dimension 320×8×8, a layer normalization layer, and a ReLU activation function, followed by a sequentially connected linear layer with input dimension 320×8×8 and output dimension 1×1×2, a layer normalization layer, and a Sigmoid activation function.
In a second aspect, an embodiment of the present invention provides a child evil dictionary picture identification apparatus based on a lightweight visual converter, which includes a processor, a memory, and a computer program stored in the memory. The computer program is executable by the processor to implement the child evil dictionary picture identification method based on a lightweight visual converter described in any paragraph of the first aspect.
In a third aspect, embodiments of the present invention provide a computer-readable storage medium. The computer readable storage medium comprises a stored computer program, wherein the computer program controls a device where the computer readable storage medium is located to execute the method for identifying the child evil dictionary based on the lightweight visual converter according to any section of the first aspect.
By adopting the technical scheme, the invention can obtain the following technical effects:
The child evil dictionary picture identification method based on the lightweight visual converter does not need a large amount of computing power. Using computer vision analysis technology, it can identify child evil dictionary pictures in data streams in real time under high-traffic conditions, and it alleviates the challenges of strong camouflage, diversity and variation, semantic understanding, and high deployment cost posed by child evil dictionary content. The recognition model can thus identify various offending scenes quickly, efficiently, and at low cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of the child evil dictionary picture identification method based on a lightweight visual converter.

Fig. 2 is a detailed network structure diagram of the child evil dictionary picture identification model in an embodiment.
Fig. 3 is a schematic diagram of a first convolutional layer network in an embodiment.
Fig. 4 is a schematic diagram of a second convolutional layer network structure in an embodiment.
Fig. 5 is a schematic diagram of a mobile network structure (residual structure) in an embodiment.
Fig. 6 is a schematic diagram of a mobile network structure (sequential connection structure) in an embodiment.
Fig. 7 is a schematic diagram of a lightweight converter network structure in an embodiment.
Fig. 8 is a schematic diagram of a converter module structure in an embodiment.
Fig. 9 is a schematic structural diagram of a multi-layer perceptron in an embodiment.
Fig. 10 is a schematic diagram of a training process in an embodiment.
Fig. 11 is a schematic diagram of a test flow in an embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to figs. 1 to 11, a first embodiment of the present invention provides a child evil dictionary picture identification method based on a lightweight visual converter, which can be performed by a child evil dictionary picture identification device based on a lightweight visual converter (hereinafter referred to as the identification device), in particular by one or more processors in the identification device, to implement steps S1 to S4.
S1, acquiring a cartoon image to be identified.
The cartoon image to be identified can be a normal cartoon image or a child evil dictionary image similar to the normal image.
It is understood that the identification device may be an electronic device with computing capabilities, such as a portable notebook computer, a desktop computer, a server, a smart phone, or a tablet computer.
S2, preprocessing the cartoon image.
S3, inputting the preprocessed cartoon image into a pre-trained real-time child evil dictionary picture identification model based on the lightweight visual converter, and obtaining a prediction vector indicating whether the cartoon image belongs to the child evil dictionary pictures.
S4, comparing the prediction vector against a prediction threshold to judge whether the cartoon image belongs to the child evil dictionary pictures.
As shown in fig. 2, the network structure of the real-time child evil dictionary picture identification model based on the lightweight visual converter includes a first convolution layer network, a first mobile network, a second mobile network, a third mobile network, a fourth mobile network, a first lightweight converter network, a fifth mobile network, a second lightweight converter network, a sixth mobile network, a third lightweight converter network, a second convolution layer network, and a multi-layer perceptron that are sequentially connected.
Preferably, the first convolution layer network takes as input a picture of size 256×256×3 and outputs a vector of size 128×128×16. The output of the first mobile network is a 128×128×16 vector. The output of the second mobile network is a 64×64×24 vector. The output of the third mobile network is a 64×64×24 vector. The output of the fourth mobile network is a 32×32×48 vector. The output of the first lightweight converter network is a 32×32×48 vector. The output of the fifth mobile network is a 16×16×64 vector. The output of the second lightweight converter network is a 16×16×64 vector. The output of the sixth mobile network is an 8×8×80 vector. The output of the third lightweight converter network is an 8×8×80 vector; the third lightweight converter network is formed by connecting 4 convolution layer networks and 3 converter modules. The output of the second convolution layer network is an 8×8×320 vector. The output of the multi-layer perceptron is a 1×1×2 confidence.
Specifically, the real-time child evil dictionary picture identification model based on the lightweight visual converter in the embodiment of the invention adopts the lightweight visual converter as its core module. Compared with a traditional visual converter, the lightweight visual converter shows no obvious loss in precision but has a much smaller parameter count, so it can be deployed on edge devices and run smoothly. The low-power-consumption real-time child evil dictionary picture identification model based on the lightweight visual converter judges whether a picture belongs to the child evil dictionary by understanding the semantics of the picture.
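For orientation, the stage-by-stage tensor shapes listed above can be checked with a minimal PyTorch sketch that uses plain convolution stubs in place of the real mobile-network and lightweight-converter blocks; the class and variable names are illustrative, and the stubs use stride 2 wherever the listed outputs halve the spatial resolution:

```python
import torch
import torch.nn as nn

# (out_channels, stride) per stage, taken from the output sizes above.
STAGES = [
    (16, 2),   # first convolution layer network: 256 -> 128
    (16, 1),   # first mobile network
    (24, 2),   # second mobile network: 128 -> 64
    (24, 1),   # third mobile network
    (48, 2),   # fourth mobile network: 64 -> 32
    (48, 1),   # first lightweight converter network
    (64, 2),   # fifth mobile network: 32 -> 16
    (64, 1),   # second lightweight converter network
    (80, 2),   # sixth mobile network: 16 -> 8
    (80, 1),   # third lightweight converter network
    (320, 1),  # second convolution layer network (1x1 in the patent)
]

def build_shape_stub() -> nn.Sequential:
    layers, in_ch = [], 3
    for out_ch, stride in STAGES:
        layers.append(nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1))
        in_ch = out_ch
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 256, 256)      # a preprocessed cartoon image
print(build_shape_stub()(x).shape)   # torch.Size([1, 320, 8, 8])
```

The resulting 8×8×320 feature map is what the multi-layer perceptron consumes to produce the 1×1×2 confidence.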
The child evil dictionary picture identification method based on the lightweight visual converter does not need a large amount of computing power. Using computer vision analysis technology, it can identify child evil dictionary pictures in data streams in real time under high-traffic conditions, and it alleviates the challenges of strong camouflage, diversity and variation, semantic understanding, and high deployment cost. The child evil dictionary recognition model can thus recognize various offending scenes at low cost, high speed, and high efficiency.
In an alternative embodiment of the present invention, as shown in fig. 3, based on the above embodiment, the first convolution layer network includes a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence. The convolution layer of the first convolution layer network comprises n = 16 convolution kernels of size 3×3.

In an alternative embodiment of the present invention, as shown in fig. 4, based on the above embodiment, the second convolution layer network includes a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence. The convolution layer of the second convolution layer network comprises n = 320 convolution kernels of size 1×1.
In an alternative embodiment of the present invention, as shown in fig. 5, on the basis of the above embodiment, the first mobile network and the third mobile network each adopt a residual structure. The mobile network of the residual structure comprises a grouping convolution, a batch normalization layer, a SiLU activation function, a convolution layer, a batch normalization layer, a SiLU activation function, and an addition operation connected in sequence. The addition operation sums the input of the residual-structure mobile network and the output of the last SiLU activation function.
Specifically, the first mobile network consists of a grouping convolution (32 convolution kernels of size 3×3, step size 1, group number 32), a batch normalization layer, a SiLU activation function, a convolution layer (16 convolution kernels of size 1×1, step size 1, padding 0), a batch normalization layer, a SiLU activation function, and an addition operation connected in sequence.

The third mobile network consists of a grouping convolution (48 convolution kernels of size 3×3, step size 1, group number 48), a batch normalization layer, a SiLU activation function, a convolution layer (24 convolution kernels of size 1×1, step size 1, padding 0), a batch normalization layer, a SiLU activation function, and an addition operation connected in sequence.
As shown in fig. 6, in an alternative embodiment of the present invention, on the basis of the above embodiment, the second, fourth, fifth, and sixth mobile networks all adopt a sequential connection structure. The mobile network of the sequential connection structure comprises a grouping convolution, a batch normalization layer, a SiLU activation function, a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence.
Specifically, the second mobile network consists of a grouping convolution (32 convolution kernels of size 3×3, step size 1, group number 32), a batch normalization layer, a SiLU activation function, a convolution layer (24 convolution kernels of size 1×1, step size 1, padding 0), a batch normalization layer, and a SiLU activation function connected in sequence.

The fourth mobile network consists of a grouping convolution (48 convolution kernels of size 3×3, step size 1, group number 48), a batch normalization layer, a SiLU activation function, a convolution layer (48 convolution kernels of size 1×1, step size 1, padding 0), a batch normalization layer, and a SiLU activation function connected in sequence.

The fifth mobile network consists of a grouping convolution (96 convolution kernels of size 3×3, step size 1, group number 96), a batch normalization layer, a SiLU activation function, a convolution layer (64 convolution kernels of size 1×1, step size 1, padding 0), a batch normalization layer, and a SiLU activation function connected in sequence.

The sixth mobile network consists of a grouping convolution (128 convolution kernels of size 3×3, step size 1, group number 128), a batch normalization layer, a SiLU activation function, a convolution layer (80 convolution kernels of size 1×1, step size 1, padding 0), a batch normalization layer, and a SiLU activation function connected in sequence.
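A minimal PyTorch sketch of the mobile network described above follows. One labeled assumption: the patent lists a group count equal to the kernel count of the grouping convolution (e.g. 32 groups for 32 kernels), but a grouped convolution also requires the input channel count to be divisible by the group count, so this sketch uses groups equal to the input channel count, which preserves the depthwise character. The class name is illustrative.

```python
import torch
import torch.nn as nn

class MobileNetworkBlock(nn.Module):
    """Grouping conv -> BN -> SiLU -> 1x1 conv -> BN -> SiLU,
    with an optional residual addition (first/third mobile networks)."""
    def __init__(self, in_ch: int, mid_ch: int, out_ch: int, residual: bool):
        super().__init__()
        self.residual = residual
        self.body = nn.Sequential(
            # grouping convolution: 3x3 kernels, step size 1
            nn.Conv2d(in_ch, mid_ch, 3, stride=1, padding=1, groups=in_ch),
            nn.BatchNorm2d(mid_ch),
            nn.SiLU(),
            # 1x1 convolution layer, step size 1, padding 0
            nn.Conv2d(mid_ch, out_ch, 1, stride=1, padding=0),
            nn.BatchNorm2d(out_ch),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.body(x)
        return x + y if self.residual else y

# First mobile network: 32 grouped 3x3 kernels, 16-channel 1x1 output,
# residual structure (input and output are both 128x128x16).
first_mn = MobileNetworkBlock(16, 32, 16, residual=True)
print(first_mn(torch.randn(1, 16, 128, 128)).shape)  # (1, 16, 128, 128)
```

The sequential-connection variant (second, fourth, fifth, and sixth mobile networks) is the same block with `residual=False`.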
As shown in fig. 7, in an alternative embodiment of the present invention, the lightweight converter network includes a No. 1 convolution layer network, a No. 2 convolution layer network, a vector stretch operation, a plurality of converter modules, a vector stretch operation, and a No. 3 convolution layer network connected in sequence. The number of converter modules is 2 in the first lightweight converter network, 4 in the second, and 3 in the third.

The No. 1 convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence; its convolution kernel size is 3×3. The No. 2 and No. 3 convolution layer networks each comprise a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence; their convolution kernel sizes are 1×1.
The number of convolution kernels in the first lightweight converter network is 48 for the No. 1 convolution layer network, 64 for the No. 2, and 48 for the No. 3.

The number of convolution kernels in the second lightweight converter network is 64 for the No. 1 convolution layer network, 80 for the No. 2, and 64 for the No. 3.

The number of convolution kernels in the third lightweight converter network is 80 for the No. 1 convolution layer network, 96 for the No. 2, and 80 for the No. 3.
Specifically, the No. 1 convolution layer network has the same structure as the first convolution layer network of the overall model, but with n = 48, 64, or 80 (for the first, second, and third lightweight converter networks, respectively). The No. 2 convolution layer network has the same structure as the second convolution layer network of the overall model, but with n = 64, 80, or 96 (respectively). The No. 3 convolution layer network likewise follows the second convolution layer network's structure, but with n = 48, 64, or 80 (respectively).
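A sketch of the lightweight converter network's data flow, assuming the vector stretch operations are the flatten/unflatten steps between 2-D feature maps and token sequences (as in MobileViT-style designs). To keep the sketch self-contained, PyTorch's built-in `nn.TransformerEncoderLayer` stands in for the converter module described in the next section (its feed-forward activation differs); the class name and head count are illustrative.

```python
import torch
import torch.nn as nn

def conv_bn_silu(in_ch: int, out_ch: int, k: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.SiLU(),
    )

class LightweightConverterNetwork(nn.Module):
    """No. 1 conv (3x3) -> No. 2 conv (1x1) -> stretch to tokens ->
    N converter modules -> stretch back -> No. 3 conv (1x1)."""
    def __init__(self, channels: int, token_dim: int, num_modules: int):
        super().__init__()
        self.no1 = conv_bn_silu(channels, channels, 3)
        self.no2 = conv_bn_silu(channels, token_dim, 1)
        self.converter_modules = nn.Sequential(*[
            nn.TransformerEncoderLayer(token_dim, nhead=4,
                                       dim_feedforward=2 * token_dim,
                                       dropout=0.1, batch_first=True)
            for _ in range(num_modules)])
        self.no3 = conv_bn_silu(token_dim, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        y = self.no2(self.no1(x))
        tokens = y.flatten(2).transpose(1, 2)            # vector stretch
        tokens = self.converter_modules(tokens)
        y = tokens.transpose(1, 2).unflatten(2, (h, w))  # stretch back
        return self.no3(y)

# First lightweight converter network: 48 channels, 64-dim tokens, 2 modules.
lwt1 = LightweightConverterNetwork(48, 64, 2)
print(lwt1(torch.randn(1, 48, 32, 32)).shape)  # (1, 48, 32, 32)
```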
In an alternative embodiment of the invention, as shown in fig. 8, on the basis of the above-described embodiment, the converter module comprises a connected attention layer and feed-forward network.
The attention layer of the converter module includes a first layer normalization layer, a first linear layer, a vector stretch operation, an attention operation, a vector stretch operation, and a second linear layer connected in sequence.
The feed-forward network of the converter module includes a second layer normalization layer, a third linear layer, a SiLU activation function, a first Dropout random deactivation, a fourth linear layer, and a second Dropout random deactivation connected in sequence.

The output dimension of the first linear layer is 96. The output dimension of the second linear layer is the same as the input vector dimension of the converter module. The output dimension of the third linear layer is twice the input vector dimension of the converter module. The output dimension of the fourth linear layer is the same as the input vector dimension of the converter module. The deactivation rate of both the first and second Dropout random deactivations is 0.1.
Specifically, the converter module is a base module of a lightweight converter network.
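A sketch of the converter module per the description above: a pre-norm attention layer followed by a feed-forward network with SiLU and 0.1 dropout. Two labeled assumptions: the 96-dim output of the first linear layer is read here as being split into query/key/value of 32 dims each around the attention operation, and residual connections are assumed, as is standard for transformer blocks.

```python
import torch
import torch.nn as nn

class ConverterModule(nn.Module):
    def __init__(self, dim: int, qkv_dim: int = 96, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)            # first layer normalization
        self.to_qkv = nn.Linear(dim, qkv_dim)     # first linear layer (out 96)
        self.proj = nn.Linear(qkv_dim // 3, dim)  # second linear layer
        self.norm2 = nn.LayerNorm(dim)            # second layer normalization
        self.ffn = nn.Sequential(
            nn.Linear(dim, 2 * dim),              # third linear layer (2x dim)
            nn.SiLU(),
            nn.Dropout(dropout),                  # first Dropout, rate 0.1
            nn.Linear(2 * dim, dim),              # fourth linear layer
            nn.Dropout(dropout),                  # second Dropout, rate 0.1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # attention layer: norm -> linear -> stretch -> attention -> linear
        q, k, v = self.to_qkv(self.norm1(x)).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, -1)
        x = x + self.proj(attn @ v)
        # feed-forward network, also with a residual connection
        return x + self.ffn(self.norm2(x))

tokens = torch.randn(1, 32 * 32, 64)  # tokens entering the first converter network
print(ConverterModule(64)(tokens).shape)  # (1, 1024, 64)
```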
As shown in fig. 9, in an alternative embodiment of the present invention, the multi-layer perceptron includes five sequentially connected blocks, each consisting of a linear layer with input and output dimension 320×8×8, a layer normalization layer, and a ReLU activation function, followed by a sequentially connected linear layer with input dimension 320×8×8 and output dimension 1×1×2, a layer normalization layer, and a Sigmoid activation function.
Specifically, a picture is input into the low-power-consumption real-time child evil dictionary picture identification model based on the lightweight visual converter, and the model outputs the confidence of whether the picture is a child evil dictionary picture.
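A sketch of this head under the reading above, flattening the 8×8×320 feature map to width 320·8·8 = 20480; the flattening itself is an assumption, and note that five linear layers at this width each hold roughly 4×10⁸ weights, so the full-width head is memory-hungry:

```python
import torch
import torch.nn as nn

def build_mlp_head(feat: int = 320 * 8 * 8, num_blocks: int = 5) -> nn.Sequential:
    layers: list[nn.Module] = [nn.Flatten()]  # (B, 320, 8, 8) -> (B, feat)
    for _ in range(num_blocks):  # five linear + layer-norm + ReLU blocks
        layers += [nn.Linear(feat, feat), nn.LayerNorm(feat), nn.ReLU()]
    layers += [nn.Linear(feat, 2), nn.LayerNorm(2), nn.Sigmoid()]  # 1x1x2
    return nn.Sequential(*layers)

head = build_mlp_head()
confidence = head(torch.randn(1, 320, 8, 8))  # e.g. tensor([[0.47, 0.53]])
```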
Aiming at the shortcomings of publicly available child evil dictionary identification products and technologies, the embodiment of the invention particularly addresses the problems that existing approaches cannot identify child evil dictionary pictures in various scenes, cannot identify highly camouflaged child evil dictionary pictures, and are difficult to deploy for real-time operation on edge devices.
The real-time child evil dictionary picture identification model based on the lightweight visual converter has high identification accuracy and speed, and the lightweight model requires so little computing power that it runs smoothly on edge devices, so various child evil dictionary images can be better understood in more scenes. The harm of child evil dictionary content to children's psychological health can thereby be greatly reduced.
As shown in fig. 10, in an alternative embodiment of the present invention, the training step of the real-time child evil dictionary picture identification model based on the lightweight visual converter includes steps A1 to A4.
It will be appreciated that a child evil dictionary picture identification data set needs to be acquired before training. Specifically, a large number of pictures are crawled from mainstream websites at home and abroad, and from certain other websites, using crawler scripts with keywords such as "children's evil dictionary" and "elsagate"; manual secondary screening then yields a high-quality child evil dictionary data set. These pictures fall into two categories: normal animated images and child evil dictionary images.
After the data set is produced, it is divided into a training set and a test set for training and testing the low-power-consumption real-time child evil dictionary picture identification model based on the lightweight visual converter. The pictures of the training set and the test set do not overlap.
A1, carrying out batch random selection from the training set. Wherein each batch is n pictures. Specifically, the value of n can be reasonably selected according to the size of the video memory of the GPU device, which is not particularly limited in the present invention.
A2, scaling the n pictures in each batch to 256×256, and then applying data enhancement to all the pictures in the batch. The data enhancement uses MixUp, color space transformations, picture rotation, affine transformation, and other data enhancement methods, as sketched below.
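As one concrete example of the enhancement step, a minimal MixUp sketch (the Beta parameter 0.2 is a common default, not a value stated in the patent):

```python
import torch

def mixup(images: torch.Tensor, labels: torch.Tensor, alpha: float = 0.2):
    """Blend each image/label pair with a randomly chosen partner.
    images: (n, 3, 256, 256); labels: (n, 2) one-hot or soft labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1 - lam) * images[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_images, mixed_labels
```

Note that MixUp produces soft labels between 0 and 1, which is consistent with the remark below that the category label takes values between 0 and 1.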
A3, inputting the data-enhanced picture data into an untrained real-time child evil dictionary picture identification model based on the lightweight visual converter, and obtaining a group of prediction confidences.
A4, computing the loss between the prediction confidences and the labels of the n pictures, and optimizing the obtained loss value through the back-propagation algorithm until training is completed. The Loss function adopts the Focal Loss function.
The Focal Loss function is:

$$FL(p, y) = -\alpha \, y \, (1 - p)^{\gamma} \log(p) - (1 - \alpha)(1 - y) \, p^{\gamma} \log(1 - p)$$

where $FL(\cdot)$ denotes the classification loss function, $\alpha$ is a hyper-parameter for balancing the imbalance of positive and negative samples in the loss function, $p$ is the model's classification prediction, $\gamma$ is a hyper-parameter for adjusting the loss of easy and difficult samples in the loss function, and $y$ is the category label of the picture.

Through this learning procedure, the low-power-consumption real-time child evil dictionary picture identification model based on the lightweight visual converter continuously and iteratively learns to understand child evil dictionary scenes. $\gamma$ adjusts the loss of easy and difficult samples so that the loss function pays more attention to difficult samples. $y$, the category label of the picture, takes values between 0 and 1.
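A sketch of the binary Focal Loss as written above, with the common defaults α = 0.25 and γ = 2 (the patent does not state its hyper-parameter values):

```python
import torch

def focal_loss(p: torch.Tensor, y: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """p: predicted confidences in (0, 1); y: labels in [0, 1]
    (soft labels from MixUp are allowed)."""
    p = p.clamp(1e-6, 1 - 1e-6)  # avoid log(0)
    pos = -alpha * y * (1 - p) ** gamma * torch.log(p)
    neg = -(1 - alpha) * (1 - y) * p ** gamma * torch.log(1 - p)
    return (pos + neg).mean()

loss = focal_loss(torch.tensor([0.9, 0.2]), torch.tensor([1.0, 0.0]))
```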
In an alternative embodiment of the present invention, as shown in fig. 11, the accuracy of the model needs to be tested after training.
First, pictures are read from the test set portion of the child evil dictionary data set and uniformly scaled to 256×256. The picture data is then converted into tensors and input into the low-power-consumption real-time child evil dictionary picture identification model based on the lightweight visual converter, yielding a group of prediction vectors. Finally, each prediction vector is compared with a threshold to judge whether the picture is a child evil dictionary picture.
Preferably, the threshold in this patent is 0.5; in other embodiments the threshold may be set to other values, and the invention is not specifically limited in this regard. In testing, the low-power-consumption real-time child evil dictionary picture identification model based on the lightweight visual converter achieves 93.7% identification precision on the test set of the child evil dictionary identification data set, with a recall of 91.3%.
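The test flow above might look as follows in PyTorch; the trained `model` object is a placeholder, and treating index 1 of the prediction vector as the child-evil-dictionary class is an assumption for illustration:

```python
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),  # uniform scaling, as described
    transforms.ToTensor(),          # PIL image -> CHW float tensor
])

@torch.no_grad()
def is_evil_dictionary_picture(model: torch.nn.Module, path: str,
                               threshold: float = 0.5) -> bool:
    image = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    prediction = model(image)                  # 1x2 prediction vector
    return bool(prediction[0, 1] > threshold)  # compare with threshold
```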
The second embodiment of the invention provides a child evil dictionary picture identification device based on a lightweight visual converter, which comprises a processor, a memory, and a computer program stored in the memory. The computer program can be executed by the processor to implement the child evil dictionary picture identification method based on the lightweight visual converter according to any one of the embodiments.
In a third embodiment, embodiments of the present invention provide a computer-readable storage medium. The computer-readable storage medium comprises a stored computer program, wherein, when run, the computer program controls the device on which the computer-readable storage medium is located to execute the child evil dictionary picture identification method based on the lightweight visual converter according to any one of the embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein is merely one relationship describing the association of the associated objects, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if determined" or "if detected (stated condition or event)" may be interpreted as "when determined" or "in response to determination" or "when detected (stated condition or event)" or "in response to detection (stated condition or event), depending on the context.
References to "first\second" in the embodiments are merely to distinguish similar objects and do not represent a particular ordering for the objects, it being understood that "first\second" may interchange a particular order or precedence where allowed. It is to be understood that the "first\second" distinguishing aspects may be interchanged where appropriate, such that the embodiments described herein may be implemented in sequences other than those illustrated or described herein.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (5)
1. A child evil dictionary picture identification method based on a lightweight visual converter, characterized by comprising the following steps:
acquiring a cartoon image to be identified;
preprocessing the cartoon image;
inputting the preprocessed cartoon image into a pre-trained real-time child evil dictionary picture identification model based on a lightweight visual converter, and obtaining a prediction vector indicating whether the cartoon image belongs to the child evil dictionary pictures;
comparing the prediction vector against a prediction threshold to judge whether the cartoon image belongs to the child evil dictionary pictures;
the network structure of the real-time child evil dictionary picture identifying model based on the light-weight visual converter comprises a first convolution layer network, a first mobile network, a second mobile network, a third mobile network, a fourth mobile network, a first light-weight converter network, a fifth mobile network, a second light-weight converter network, a sixth mobile network, a third light-weight converter network, a second convolution layer network and a multi-layer perceptron which are sequentially connected;
The first convolution layer network takes as input a picture of size 256×256×3 and outputs a vector of size 128×128×16; the output of the first mobile network is a 128×128×16 vector; the output of the second mobile network is a 64×64×24 vector; the output of the third mobile network is a 64×64×24 vector; the output of the fourth mobile network is a 32×32×48 vector; the output of the first lightweight converter network is a 32×32×48 vector; the output of the fifth mobile network is a 16×16×64 vector; the output of the second lightweight converter network is a 16×16×64 vector; the output of the sixth mobile network is an 8×8×80 vector; the output of the third lightweight converter network is an 8×8×80 vector, and the third lightweight converter network is formed by connecting 4 convolution layer networks and 3 converter modules; the output of the second convolution layer network is an 8×8×320 vector; the output of the multi-layer perceptron is a 1×1×2 confidence;
the first convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence, wherein the convolution layer of the first convolution layer network comprises n = 16 convolution kernels of size 3×3;

the second convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence, wherein the convolution layer of the second convolution layer network comprises n = 320 convolution kernels of size 1×1;
The first mobile network and the third mobile network both adopt residual error structures;
the mobile network of the residual structure comprises a grouping convolution, a batch normalization layer, a SiLU activation function, a convolution layer, a batch normalization layer, a SiLU activation function, and an addition operation connected in sequence;
the grouping convolution of the first mobile network comprises 32 convolution kernels of size 3×3, with step size 1 and group number 32; the convolution layer of the first mobile network comprises 16 convolution kernels of size 1×1, with step size 1 and padding 0;

the grouping convolution of the third mobile network comprises 48 convolution kernels of size 3×3, with step size 1 and group number 48; the convolution layer of the third mobile network comprises 24 convolution kernels of size 1×1, with step size 1 and padding 0;
The second mobile network, the fourth mobile network, the fifth mobile network and the sixth mobile network all adopt sequential connection structures;
the mobile network of the sequential connection structure comprises a grouping convolution, a batch normalization layer, a SiLU activation function, a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence;
the grouping convolution of the second mobile network comprises 32 convolution kernels of size 3×3, with step size 1 and group number 32; the convolution layer of the second mobile network comprises 24 convolution kernels of size 1×1, with step size 1 and padding 0;

the grouping convolution of the fourth mobile network comprises 48 convolution kernels of size 3×3, with step size 1 and group number 48; the convolution layer of the fourth mobile network comprises 48 convolution kernels of size 1×1, with step size 1 and padding 0;

the grouping convolution of the fifth mobile network comprises 96 convolution kernels of size 3×3, with step size 1 and group number 96; the convolution layer of the fifth mobile network comprises 64 convolution kernels of size 1×1, with step size 1 and padding 0;

the grouping convolution of the sixth mobile network comprises 128 convolution kernels of size 3×3, with step size 1 and group number 128; the convolution layer of the sixth mobile network comprises 80 convolution kernels of size 1×1, with step size 1 and padding 0;
the lightweight converter network comprises a No. 1 convolution layer network, a No. 2 convolution layer network, a vector stretch operation, a plurality of converter modules, a vector stretch operation, and a No. 3 convolution layer network connected in sequence; the No. 1 convolution layer network comprises a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence, and its convolution kernel size is 3×3; the No. 2 and No. 3 convolution layer networks each comprise a convolution layer, a batch normalization layer, and a SiLU activation function connected in sequence, and their convolution kernel sizes are 1×1;
the number of convolution kernels in the first lightweight converter network is 48 for the No. 1 convolution layer network, 64 for the No. 2, and 48 for the No. 3;

the number of convolution kernels in the second lightweight converter network is 64 for the No. 1 convolution layer network, 80 for the No. 2, and 64 for the No. 3;

the number of convolution kernels in the third lightweight converter network is 80 for the No. 1 convolution layer network, 96 for the No. 2, and 80 for the No. 3;
the number of converter modules of the first lightweight converter network is 2; the number of converter modules of the second lightweight converter network is 4; the number of converter modules of the third lightweight converter network is 3;
the converter module includes a connected attention layer and feed forward network;
the attention layer of the converter module comprises a first layer normalization layer, a first linear layer, a vector stretching operation, an attention operation, a vector stretching operation and a second linear layer which are sequentially connected;
the feed-forward network of the converter module comprises a second layer normalization layer, a third linear layer, a SiLU activation function, a first Dropout random deactivation, a fourth linear layer, and a second Dropout random deactivation connected in sequence;

wherein the output dimension of the first linear layer is 96; the output dimension of the second linear layer is the same as the input vector dimension of the converter module; the output dimension of the third linear layer is twice the input vector dimension of the converter module; the output dimension of the fourth linear layer is the same as the input vector dimension of the converter module; and the deactivation rate of both the first and second Dropout random deactivations is 0.1.
2. The child evil dictionary picture identification method based on the lightweight visual converter according to claim 1, wherein the training step of the real-time child evil dictionary picture identification model based on the lightweight visual converter comprises the following steps:
randomly selecting batches from the training set, wherein each batch contains n pictures;
scaling the n pictures in each batch to 256×256, and then applying data enhancement to all pictures in the batch;
inputting the enhanced picture data into the untrained real-time child evil dictionary picture identification model based on the lightweight visual converter to obtain a set of prediction confidences;
calculating the loss between the prediction confidences and the labels of the n pictures, and optimizing the resulting loss value through the back-propagation algorithm until training is complete; wherein the loss function adopts the Focal Loss function;
The Focal Loss function is:
$$\mathrm{FL}(p, y) = -\,\alpha\, y\,(1-p)^{\gamma}\log(p) \;-\; (1-\alpha)\,(1-y)\,p^{\gamma}\log(1-p)$$

where $\mathrm{FL}$ denotes the classification loss function, $\alpha$ is a hyper-parameter that balances the imbalance between positive and negative samples in the loss function, $p$ denotes the classification prediction of the model, $\gamma$ is a hyper-parameter that adjusts the contribution of easy and difficult samples to the loss, and $y$ denotes the category label of the picture.
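As a concrete reference, a minimal PyTorch sketch of this binary focal loss follows. The function name and the default values of α and γ are our choices (the claim fixes neither here), and `p` is read as the predicted probability of the positive class.

```python
import torch

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss matching the claimed formula. alpha balances
    positive/negative samples; gamma down-weights easy samples. The
    default values are common choices, not taken from the patent."""
    p = p.clamp(1e-6, 1 - 1e-6)  # numerical safety for the logs
    pos = -alpha * y * (1 - p) ** gamma * torch.log(p)
    neg = -(1 - alpha) * (1 - y) * p ** gamma * torch.log(1 - p)
    return (pos + neg).mean()

# Example: a confident correct prediction contributes almost nothing,
# while a confident wrong one dominates the loss.
p = torch.tensor([0.95, 0.10])   # predicted probability of the positive class
y = torch.tensor([1.0, 1.0])     # ground-truth category labels
print(focal_loss(p, y))
```

With γ > 0, well-classified samples (large p when y = 1) are down-weighted, so training concentrates on the hard, easily confused pictures — exactly the easy/difficult balancing the claim attributes to γ.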
3. The child evil dictionary picture identification method based on a lightweight visual converter according to claim 1 or 2, wherein the multi-layer perceptron comprises a linear layer whose input and output dimensions are 320×8, a layer normalization layer and a ReLU activation function, connected in sequence, followed by a linear layer with input dimension 320×8 and output dimension 1×2, a layer normalization layer and a Sigmoid activation function, connected in sequence.
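A minimal sketch of this perceptron head follows, under our reading of the (partly garbled) claim text that the first linear stage preserves the 320×8 width; the variable name and input shape are hypothetical.

```python
import torch
import torch.nn as nn

# Two-stage MLP head: 320*8 -> 320*8 with LayerNorm + ReLU, then
# 320*8 -> 2 with LayerNorm + Sigmoid producing the two-class confidence.
mlp = nn.Sequential(
    nn.Linear(320 * 8, 320 * 8), nn.LayerNorm(320 * 8), nn.ReLU(),
    nn.Linear(320 * 8, 2), nn.LayerNorm(2), nn.Sigmoid(),
)
print(mlp(torch.randn(1, 320 * 8)).shape)  # torch.Size([1, 2])
```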
4. A child evil dictionary picture identification device based on a lightweight visual converter, characterized by comprising a processor, a memory, and a computer program stored in the memory; the computer program is executable by the processor to implement the child evil dictionary picture identification method based on a lightweight visual converter as set forth in any one of claims 1 to 3.
5. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein, when run, the computer program controls a device in which the computer-readable storage medium is located to perform the child evil dictionary picture identification method based on a lightweight visual converter as claimed in any one of claims 1 to 3.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410389340.1A (CN117975173B) | 2024-04-02 | 2024-04-02 | Child evil dictionary picture identification method and device based on light-weight visual converter |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN117975173A | 2024-05-03 |
| CN117975173B | 2024-06-21 |
Family ID: 90849893
Country Status (1)
| Country | Link |
|---|---|
| CN | CN117975173B (en) |
Patent Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110929603A * | 2019-11-09 | 2020-03-27 | 北京工业大学 | Weather image identification method based on lightweight convolutional neural network |
| US2024/0064318A1 * | 2021-02-25 | 2024-02-22 | Huawei Technologies Co., Ltd. | Apparatus and method for coding pictures using a convolutional neural network |
| CN114612477A * | 2022-03-03 | 2022-06-10 | 成都信息工程大学 | Lightweight image segmentation method, system, medium, terminal and application |
| CN115690542A * | 2022-11-03 | 2023-02-03 | 国网甘肃省电力公司 | Improved YOLOv5-based aerial insulator directional identification method |
| CN116309110A * | 2023-01-06 | 2023-06-23 | 南京莱斯电子设备有限公司 | Low-light image defogging method based on lightweight deep neural network |
| CN116563844A * | 2023-04-06 | 2023-08-08 | 武汉轻工大学 | Cherry tomato maturity detection method, device, equipment and storage medium |
| CN117456480A * | 2023-12-21 | 2024-01-26 | 华侨大学 | Light vehicle re-identification method based on multi-source information fusion |
| CN117689731A * | 2024-02-02 | 2024-03-12 | 陕西德创数字工业智能科技有限公司 | Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model |
Non-Patent Citations (2)
| Title |
|---|
| Li Ya et al.: "Face attribute recognition method based on multi-task learning", Computer Engineering (计算机工程), no. 03, 15 March 2020 (2020-03-15) * |
| Zheng Dong et al.: "Vehicle and pedestrian detection network based on lightweight SSD", Journal of Nanjing Normal University (Natural Science Edition) (南京师大学报(自然科学版)), no. 01, 20 March 2019 (2019-03-20) * |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |