CN115082873A - Image recognition method and device based on path fusion and storage medium


Info

Publication number
CN115082873A
Authority
CN
China
Prior art keywords: image, sample, weight, feature, normalized
Prior art date
Legal status
Pending
Application number
CN202110262494.0A
Other languages
Chinese (zh)
Inventor
汪韬
张睿欣
陈星宇
李绍欣
李季檩
黄飞跃
Current Assignee
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Application filed by Tencent Cloud Computing Beijing Co Ltd
Priority application: CN202110262494.0A
Publication: CN115082873A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image recognition method, an image recognition apparatus, and a storage medium based on path fusion, belonging to the field of artificial intelligence and, in particular, to computer vision and machine learning. The method comprises the following steps: acquiring an image to be recognized, and extracting image features of the image to be recognized through a shared convolution layer in an image recognition model; acquiring a first normalized image feature and a second normalized image feature corresponding to the image features through a first batch normalization layer and a second batch normalization layer in the image recognition model; acquiring a first weight and a second weight corresponding to the image features through a path feature fusion module in the image recognition model; determining a fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight; and determining the image category of the image to be recognized based on the fusion feature. The method improves the accuracy of image recognition, is simple to operate, and has high applicability.

Description

Image recognition method and device based on path fusion and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image recognition method and apparatus based on path fusion, and a storage medium.
Background
Deep learning is widely used in many vision tasks, such as image classification, object detection, and face recognition. When a deep learning model is deployed in an actual service, an attacker can alter the model input by adding noise imperceptible to human vision or hearing to a normal sample, thereby tampering with the model output. This is especially dangerous in scenarios that directly involve identity verification for payment, such as face-scanning verification: if the deep learning model is not sufficiently robust, it poses a serious security risk.
In the course of research and practice, the inventors of the present application found that the prior art addresses this risk by training the deep learning model with adjustable adversarial training (OAT), in which a trade-off parameter is set manually according to the risk level of the business scenario to balance the model's natural accuracy (its accuracy on normal samples) against its adversarial accuracy (its accuracy on adversarial samples). For example, a larger trade-off parameter is set in a high-risk scenario with strict security requirements, giving the model lower natural accuracy but higher adversarial accuracy; a smaller trade-off parameter is set in a low-risk scenario with looser security requirements, giving higher natural accuracy but lower adversarial accuracy. In real scenarios, however, traffic conditions may vary significantly over time, so a fixed trade-off parameter is difficult to adapt to changing business scenarios. Moreover, once the trade-off parameter is introduced, setting a small value in a low-risk scenario does little damage to natural accuracy but leaves the model with low robustness; the tension between robustness and natural accuracy therefore persists, the model's recognition accuracy on images is low, and its applicability is poor.
Disclosure of Invention
The embodiments of the application provide an image recognition method, an image recognition apparatus, and a storage medium based on path fusion, which can improve the accuracy of image category recognition while being simple to operate and highly applicable.
A first aspect of an embodiment of the present application provides an image recognition method based on path fusion, including:
acquiring an image to be recognized, inputting the image to be recognized into an image recognition model, and extracting image features of the image to be recognized through a shared convolution layer in the image recognition model;
acquiring a first normalized image feature corresponding to the image features through a first batch normalization layer in the image recognition model, and acquiring a second normalized image feature corresponding to the image features through a second batch normalization layer in the image recognition model, wherein the first batch normalization layer and the second batch normalization layer are trained on sample images with different noise intensities;
acquiring a first weight and a second weight corresponding to the image features through a path feature fusion module in the image recognition model, wherein the first weight indicates the weight of the first normalized image feature, the second weight indicates the weight of the second normalized image feature, and the sum of the first weight and the second weight is 1;
determining a fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight, and determining the image category of the image to be recognized based on the fusion feature.
With reference to the first aspect, in a possible implementation manner, before the image to be recognized is input into the image recognition model, the method further includes:
acquiring first sample images covering at least three noise intensities, inputting each first sample image into the image recognition model, and extracting the sample image features of each first sample image through a shared convolution layer in the image recognition model;
training at least three batch normalization layers in the image recognition model according to the sample image features of the first sample images to obtain at least three trained batch normalization layers, wherein the sample image features of each noise intensity are used to train one batch normalization layer;
and determining the first batch normalization layer and the second batch normalization layer of the image recognition model from the at least three batch normalization layers, wherein the first batch normalization layer is trained on the sample images with the minimum noise intensity, and the second batch normalization layer is trained on the sample images with the maximum noise intensity.
With reference to the first aspect, in a possible implementation manner, after the determining of the first batch normalization layer and the second batch normalization layer of the image recognition model from the at least three batch normalization layers, the method further includes:
deleting, from the at least three batch normalization layers, the remaining batch normalization layers other than the first batch normalization layer and the second batch normalization layer, to obtain the batch normalization layers of the image recognition model.
With reference to the first aspect, in a possible implementation manner, before the obtaining, through the path feature fusion module in the image recognition model, of the first weight and the second weight corresponding to the image features, the method further includes:
acquiring a second sample image for training the path feature fusion module of the image recognition model, inputting the second sample image into the image recognition model, and extracting the sample image features of the second sample image through a shared convolution layer in the image recognition model;
acquiring a first normalized sample image feature corresponding to the sample image features through the first batch normalization layer in the image recognition model, and acquiring a second normalized sample image feature corresponding to the sample image features through the second batch normalization layer in the image recognition model;
and training the path feature fusion module based on the sample image features, the first normalized sample image feature, the second normalized sample image feature, and the image category label of the second sample image, such that, for any image feature, the path feature fusion module outputs a weight for the normalized image feature produced by the first batch normalization layer and a weight for the normalized image feature produced by the second batch normalization layer.
With reference to the first aspect, in one possible implementation manner, the training of the path feature fusion module based on the sample image features, the first normalized sample image feature, the second normalized sample image feature, and the image category label of the second sample image includes:
acquiring a first sample weight and a second sample weight corresponding to the sample image features through the path feature fusion module in the image recognition model, wherein the first sample weight indicates the weight of the first normalized sample image feature, the second sample weight indicates the weight of the second normalized sample image feature, and the sum of the first sample weight and the second sample weight is 1;
determining a fusion feature of the second sample image based on the first normalized sample image feature, the first sample weight, the second normalized sample image feature, and the second sample weight, and determining the image category of the second sample image based on the fusion feature of the second sample image;
and calculating a classification loss of the image recognition model according to the image category of the second sample image and the image category label of the second sample image, and adjusting the network parameters of the path feature fusion module based on the classification loss, so as to train the path feature fusion module.
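To make this training step concrete, the following is a minimal PyTorch sketch under stated assumptions: the attribute names (backbone, bn0, bn1, gate, classifier) are illustrative, not identifiers from the patent, and only the path feature fusion module's parameters are updated while the shared convolution layers and both batch normalization layers stay frozen, mirroring the description above.

```python
import torch.nn.functional as F

def train_fusion_module_step(model, optimizer, images, labels):
    # Freeze everything except the path feature fusion module ("gate") so the
    # shared convolution layers and both batch normalization layers keep
    # their learned parameters.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.gate.parameters():
        p.requires_grad = True

    feats = model.backbone(images)            # shared convolution layers
    bn0_out = model.bn0(feats)                # first normalized sample image feature
    bn1_out = model.bn1(feats)                # second normalized sample image feature
    w = model.gate(feats)                     # (batch, 2), each row sums to 1
    w0 = w[:, 0].view(-1, 1, 1, 1)
    w1 = w[:, 1].view(-1, 1, 1, 1)
    fused = w0 * bn0_out + w1 * bn1_out       # fusion feature of the sample image
    logits = model.classifier(fused)

    loss = F.cross_entropy(logits, labels)    # classification loss vs. category label
    optimizer.zero_grad()
    loss.backward()                           # gradients reach only the gate
    optimizer.step()
    return loss.item()
```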
With reference to the first aspect, in one possible implementation manner, the determining of the fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight includes:
determining a first weighted image feature according to the first normalized image feature and the first weight, and determining a second weighted image feature according to the second normalized image feature and the second weight;
generating the fusion feature of the image to be recognized according to the first weighted image feature and the second weighted image feature;
wherein the fusion feature of the image to be recognized satisfies:
X = W_0 · BN_0(x) + W_1 · BN_1(x)
where X is the fusion feature of the image to be recognized, W_0 is the first weight, BN_0(x) is the first normalized image feature, W_1 is the second weight, BN_1(x) is the second normalized image feature, W_0·BN_0(x) is the first weighted image feature, and W_1·BN_1(x) is the second weighted image feature.
In a second aspect, the present application provides an image recognition apparatus based on path fusion, the apparatus comprising:
a first acquisition module, configured to acquire an image to be recognized, input the image to be recognized into an image recognition model, and extract image features of the image to be recognized through a shared convolution layer in the image recognition model;
a second acquisition module, configured to acquire a first normalized image feature corresponding to the image features through a first batch normalization layer in the image recognition model, and acquire a second normalized image feature corresponding to the image features through a second batch normalization layer in the image recognition model, wherein the first batch normalization layer and the second batch normalization layer are trained on sample images with different noise intensities;
a third acquisition module, configured to acquire, through a path feature fusion module in the image recognition model, a first weight and a second weight corresponding to the image features, wherein the first weight indicates the weight of the first normalized image feature, the second weight indicates the weight of the second normalized image feature, and the sum of the first weight and the second weight is 1;
a first determining module, configured to determine a fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight, and determine the image category of the image to be recognized based on the fusion feature.
With reference to the second aspect, in a possible implementation manner, the apparatus further includes:
a fourth acquisition module, configured to acquire first sample images covering at least three noise intensities, input each first sample image into the image recognition model, and extract the sample image features of each first sample image through a shared convolution layer in the image recognition model;
a first training module, configured to train at least three batch normalization layers in the image recognition model according to the sample image features of the first sample images to obtain at least three trained batch normalization layers, wherein the sample image features of each noise intensity are used to train one batch normalization layer;
a second determining module, configured to determine the first batch normalization layer and the second batch normalization layer of the image recognition model from the at least three batch normalization layers, wherein the first batch normalization layer is trained on the sample images with the minimum noise intensity, and the second batch normalization layer is trained on the sample images with the maximum noise intensity.
With reference to the second aspect, in a possible implementation manner, the apparatus further includes:
a deleting module, configured to delete, from the at least three batch normalization layers, the remaining batch normalization layers other than the first batch normalization layer and the second batch normalization layer, to obtain the batch normalization layers of the image recognition model.
With reference to the second aspect, in a possible implementation manner, the apparatus further includes:
a fifth acquisition module, configured to acquire a second sample image for training the path feature fusion module of the image recognition model, input the second sample image into the image recognition model, and extract the sample image features of the second sample image through a shared convolution layer in the image recognition model;
a sixth acquisition module, configured to acquire a first normalized sample image feature corresponding to the sample image features through the first batch normalization layer in the image recognition model, and acquire a second normalized sample image feature corresponding to the sample image features through the second batch normalization layer in the image recognition model;
a second training module, configured to train the path feature fusion module based on the sample image features, the first normalized sample image feature, the second normalized sample image feature, and the image category label of the second sample image, such that, for any image feature, the path feature fusion module outputs a weight for the normalized image feature produced by the first batch normalization layer and a weight for the normalized image feature produced by the second batch normalization layer.
With reference to the second aspect, in a possible implementation manner, the second training module further includes:
a first obtaining unit, configured to obtain, through the path feature fusion module in the image recognition model, a first sample weight and a second sample weight corresponding to the sample image features, wherein the first sample weight indicates the weight of the first normalized sample image feature, the second sample weight indicates the weight of the second normalized sample image feature, and the sum of the first sample weight and the second sample weight is 1;
a first determining unit, configured to determine a fusion feature of the second sample image based on the first normalized sample image feature, the first sample weight, the second normalized sample image feature, and the second sample weight, and determine the image category of the second sample image based on the fusion feature of the second sample image;
a first adjusting unit, configured to calculate a classification loss of the image recognition model based on the image category of the second sample image and the image category label of the second sample image, and adjust the network parameters of the path feature fusion module based on the classification loss.
With reference to the second aspect, in a possible implementation manner, the first determining module is further configured to:
determine a first weighted image feature according to the first normalized image feature and the first weight, and determine a second weighted image feature according to the second normalized image feature and the second weight;
and generate the fusion feature of the image to be recognized according to the first weighted image feature and the second weighted image feature;
wherein the fusion feature of the image to be recognized satisfies:
X = W_0 · BN_0(x) + W_1 · BN_1(x)
where X is the fusion feature of the image to be recognized, W_0 is the first weight, BN_0(x) is the first normalized image feature, W_1 is the second weight, BN_1(x) is the second normalized image feature, W_0·BN_0(x) is the first weighted image feature, and W_1·BN_1(x) is the second weighted image feature.
In a third aspect, the present application provides a computer device, including: a processor, a transceiver, a memory, and a network interface. The processor is connected to the memory, the transceiver, and the network interface, wherein the network interface is configured to provide a data communication function, the memory is configured to store program code, and the processor and the transceiver are configured to call the program code to perform the method according to the first aspect or any possible implementation manner of the first aspect of the present application.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, perform the method according to the first aspect or any possible implementation manner of the first aspect of the present application.
In the application, an image to be recognized is acquired and input into an image recognition model, and image features of the image to be recognized are extracted through a shared convolution layer in the image recognition model. A first normalized image feature corresponding to the image features is then acquired through a first batch normalization layer in the trained image recognition model, and a second normalized image feature is acquired through a second batch normalization layer in the trained image recognition model. A first weight and a second weight corresponding to the image features are acquired through a path feature fusion module in the trained image recognition model, wherein the first weight indicates the weight of the first normalized image feature, the second weight indicates the weight of the second normalized image feature, and the two weights sum to 1. Finally, a fusion feature of the image to be recognized is determined based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight, and the image category of the image to be recognized is determined based on the fusion feature. Because the image features are normalized by the trained first and second batch normalization layers to obtain the corresponding normalized features, applying two different batch normalization layers to different images to be recognized improves both accuracy and robustness. Moreover, because the first weight and the second weight are output directly by the trained path feature fusion module for an image to be recognized of any noise intensity, the method is more applicable in actual service scenarios and simpler to operate.
Drawings
In order to illustrate the technical solutions in the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is a schematic diagram of a network architecture provided in an embodiment of the present application;
Fig. 2 is a scene schematic diagram of an image recognition method based on path fusion provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart of an image recognition method based on path fusion provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an image recognition model provided in an embodiment of the present application;
Fig. 5 is another schematic flowchart of an image recognition method based on path fusion provided in an embodiment of the present application;
Fig. 6 is a schematic diagram of a training process of an image recognition model provided in an embodiment of the present application;
Fig. 7 is another schematic diagram of a training process of an image recognition model provided in an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a multi-path image recognition model provided in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a single-path image recognition model provided in an embodiment of the present application;
Fig. 10 is a schematic flowchart of a training method of a path feature fusion module provided in an embodiment of the present application;
Fig. 11 is a schematic structural diagram of an image recognition apparatus based on path fusion provided in an embodiment of the present application;
Fig. 12 is another schematic structural diagram of an image recognition apparatus based on path fusion provided in an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the abilities of perception, reasoning, and decision making. Artificial intelligence is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
The image recognition method based on path fusion provided by the embodiments of the application belongs to Computer Vision (CV) and Machine Learning (ML), both within the field of artificial intelligence. Computer vision is the science of how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and further processing the resulting images so that they are better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition. Machine learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied across all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formula learning.
Fig. 1 is a network architecture diagram provided in an embodiment of the present application. As shown in fig. 1, the network architecture may include a service server 100 and a user terminal cluster, and the cluster may include a user terminal 10a, a user terminal 10b, ..., and a user terminal 10n. Communication connections may exist within the user terminal cluster: for example, between the user terminal 10a and the user terminal 10b, and between the user terminal 10b and the user terminal 10n. Any user terminal in the cluster may also have a communication connection with the service server 100: for example, between the user terminal 10a and the service server 100, and between the user terminal 10b and the service server 100.
A target application may be installed on each user terminal in the cluster (the user terminal 10a, the user terminal 10b, and the user terminal 10n). Optionally, the target application may be an application that displays data such as text, images, and videos. For example, the target application may be an image category detection application, used by a user to upload an image or video and detect the image category of the recognition object it contains. Alternatively, the target application may be a face-scanning payment application, used to check images or videos captured by the camera of the user terminal and to recognize whether the image category in the captured picture is a person's face. If it is confirmed to be a person's face, the application further confirms whether the image category matches the target image category of the face associated with the payment account: if they match, the payment transaction proceeds; if they do not match, the transaction is intercepted. Optionally, the target application may also be an attendance check-in application based on face recognition, used to check images or videos captured by the camera of the user terminal and to recognize whether the image category in the captured picture is a person's face. If so, the application further confirms whether the image category matches the target image category of the face associated with the login account: if they match, the attendance check-in succeeds; otherwise, it fails. The service server 100 in the present application may collect service data, such as images or videos, uploaded through these applications; optionally, the service data may include images to be recognized uploaded by users. For convenience of explanation, the image to be recognized is used directly as an example of the service data. The service server 100 may determine, from the image to be recognized, a first normalized image feature, a first weight, a second normalized image feature, and a second weight, and determine a fusion feature of the image to be recognized based on them. The service server 100 then determines the image category of the image to be recognized based on the fusion feature and returns the image category to the user terminal. Optionally, the user terminal may be any one selected from the user terminal cluster in the embodiment corresponding to fig. 1, for example the user terminal 10b, and the user may view the image category of the image to be recognized on the display page of the user terminal 10b.
It is to be understood that the method provided in the embodiments of the present application may be executed by a computer device, which includes, but is not limited to, a terminal or a server; the service server 100 in the embodiments of the present application may be a computer device, and a user terminal in the user terminal cluster may also be a computer device, which is not limited here. The service server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may include smart terminals that carry an image recognition function (e.g., recognizing a person's face region in an image), such as a smart phone, tablet computer, notebook computer, desktop computer, smart television, smart speaker, or smart watch, but is not limited thereto. The user terminal and the service server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application.
Referring to fig. 2, fig. 2 is a scene schematic diagram of an image recognition method based on path fusion according to an embodiment of the present application. As shown in fig. 2, when user A uses a target application (such as a face-scanning payment application) in the user terminal, user A uploads an image 20a through the user terminal 10b; the image 20a contains an object to be recognized (such as a person), and the user terminal 10b then sends an image recognition request for the image 20a to the service server 100. Specifically, the service server 100 may detect and collect the uploaded image 20a to be recognized, which contains a portrait, input it into an image recognition model, and extract the image features of the image to be recognized through a shared convolution layer in the model. The service server 100 then obtains a first normalized image feature corresponding to the image features through a first batch normalization layer in the model and a second normalized image feature through a second batch normalization layer, where the two batch normalization layers are trained on sample images with different noise intensities. Next, a first weight and a second weight corresponding to the image features are obtained through a path feature fusion module in the model, where the first weight indicates the weight of the first normalized image feature, the second weight indicates the weight of the second normalized image feature, and the two weights sum to 1. Finally, the service server 100 may determine, through the image recognition model, a fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight, and determine the image category of the image to be recognized based on the fusion feature. Optionally, the service server 100 may return the image category determined by the model to the user terminal 10b; the user terminal 10b may then determine whether the face verification passes based on the returned image category. If the verification passes, the terminal enters the payment step, and user A sees a payment-success page on the user terminal 10b; if the verification fails and the transaction is intercepted, user A sees a payment-failure page on the user terminal 10b.
When the service server 100 obtains the first normalized feature and the second normalized feature of the image to be recognized through the image recognition model, they are obtained through the first batch normalization layer and the second batch normalization layer of the model; when it obtains the first weight and the second weight, they are obtained through the path feature fusion module. Therefore, to improve the prediction accuracy of the first batch normalization layer, the second batch normalization layer, and the path feature fusion module for the first normalized feature, the second normalized feature, the first weight, and the second weight, these components of the image recognition model may be trained and adjusted to obtain their optimized versions. For the specific process of obtaining the trained first and second batch normalization layers, refer to the description of steps S201 to S203 in the embodiment corresponding to fig. 5 below. For the specific process of obtaining the trained path feature fusion module, refer to the description of steps S301 to S303 in the embodiment corresponding to fig. 7 below.
Optionally, if the trained image recognition model is stored locally on the user terminal 10b, the user terminal 10b may perform image category recognition on the image to be recognized locally. Since training the first batch normalization layer, the second batch normalization layer, and the path feature fusion module involves a large amount of offline computation, the local image recognition model may be trained by the service server 100 and then sent to the user terminal 10b; the choice may be determined according to the actual application scenario and is not limited here.
Further, for ease of understanding, please refer to fig. 3, which is a schematic flowchart of an image recognition method based on path fusion according to an embodiment of the present application. The method may be performed by a user terminal (e.g., the user terminal shown in fig. 1 or fig. 2) or jointly by the user terminal and a service server (e.g., the service server 100 in the embodiment corresponding to fig. 1 or fig. 2). For ease of understanding, this embodiment is described with the method executed by the user terminal. The image recognition method based on path fusion includes at least the following steps S101 to S104:
S101, obtaining an image to be recognized, inputting the image to be recognized into an image recognition model, and extracting image features of the image to be recognized through a shared convolution layer in the image recognition model.
In some possible embodiments, please refer to fig. 4, which is a schematic structural diagram of the image recognition model provided in the present application. As shown in fig. 4, the shared convolution layers perform feature extraction on the input image to be recognized. Convolution features of the image in different dimensions are obtained through multiple shared convolution layers in the image recognition model, and the features extracted by each shared convolution layer are abstracted and combined into higher-order features in the next layer. Based on these convolution features, each convolution layer outputs a corresponding feature map, where a point on the feature map corresponds to a region of the input image. The image features extracted by each shared convolution layer can be viewed as three-dimensional features formed by stacking multiple two-dimensional maps. Each two-dimensional map is one feature map: a grayscale picture has a single feature map, while a color picture typically has three superposed feature maps (red, green, and blue). Optionally, multiple convolution kernels may exist between adjacent shared convolution layers, and the feature maps generated by each shared convolution layer are the convolution results of the previous layer's feature maps with the kernels between the two layers; that is, the number of kernels between layers determines the number of feature maps generated. As the depth of the image recognition model increases, the features extracted by each shared convolution layer become more detailed and the feature maps become smaller, so the number of feature maps in the next convolution layer can be increased to extract the previous layer's features more fully, until image features that sufficiently represent the image to be recognized are output.
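As a concrete illustration, the following is a minimal PyTorch sketch of such a shared convolution backbone; the layer widths and names are illustrative assumptions, not values from the patent, and the batch normalization and activation layers described next are attached separately.

```python
import torch
import torch.nn as nn

# Illustrative shared convolution layers: the number of feature maps grows
# (3 -> 64 -> 128) while their spatial size shrinks, as described above.
shared_conv = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
    nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
)

x = torch.randn(1, 3, 224, 224)   # an image to be recognized (RGB)
features = shared_conv(x)         # shape (1, 128, 56, 56): 128 feature maps
```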
S102, acquiring a first normalized image feature and a second normalized image feature corresponding to the image features through a first batch normalization layer and a second batch normalization layer in the image recognition model.
In some possible embodiments, the first normalized image feature corresponding to the image features is acquired through the first batch normalization layer in the image recognition model, and the second normalized image feature corresponding to the image features is acquired through the second batch normalization layer in the image recognition model. The first batch normalization layer and the second batch normalization layer are trained on sample images with different noise intensities; for the specific process of training on sample images with different noise intensities and obtaining the trained first and second batch normalization layers, refer to step S203 in the embodiment corresponding to fig. 5, which is not repeated here.
Specifically, a Batch Normalization (BN) layer normalizes the mean and variance of the input image features, which alleviates, to a certain extent, the problem that scattered feature distributions in a deep network slow down learning or even make it difficult. Therefore, a batch normalization layer is connected in series after the shared convolution layers of the image recognition model to accelerate model training and improve model accuracy. Referring to fig. 4, in an optional embodiment of the present application, the image features corresponding to an image to be recognized of any noise intensity are input into the trained first batch normalization layer and the trained second batch normalization layer, yielding the corresponding first normalized image feature BN_0(x) and second normalized image feature BN_1(x).
S103, acquiring a first weight and a second weight corresponding to the image features through a path feature fusion module in the image recognition model.
In some possible embodiments, please refer to fig. 4. As shown in fig. 4, the image features are input into the trained path feature fusion module to obtain a first weight W_0 and a second weight W_1 corresponding to the image features, where the first weight W_0 indicates the weight of the first normalized image feature, the second weight W_1 indicates the weight of the second normalized image feature, and the sum of W_0 and W_1 is 1. When the path feature fusion module is trained, the other components of the image recognition model are kept fixed to preserve the respective performance of the first batch normalization layer and the second batch normalization layer; at the same time, the trained path feature fusion module can further extract features from the acquired image features and generate the first weight W_0 for the first normalized feature and the second weight W_1 for the second normalized feature more accurately. For the specific training process, refer to steps S301 to S303 in the embodiment corresponding to fig. 7, which is not repeated here.
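One plausible form of such a module is sketched below; the patent does not specify its internals, so the global average pooling and linear layer are assumptions, with the softmax guaranteeing that the two weights sum to 1.

```python
import torch.nn as nn

class PathFeatureFusionGate(nn.Module):
    """Hypothetical path feature fusion module: features -> (W_0, W_1)."""
    def __init__(self, channels=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # global context of the feature maps
        self.fc = nn.Linear(channels, 2)         # one logit per path

    def forward(self, features):                 # features: (batch, C, H, W)
        pooled = self.pool(features).flatten(1)  # (batch, C)
        return self.fc(pooled).softmax(dim=1)    # (batch, 2): [W_0, W_1], sums to 1
```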
S104, determining the fusion feature of the image to be recognized, and determining the image category of the image to be recognized based on the fusion feature.
In some possible embodiments, a first weighted image feature is determined according to the first normalized image feature and the first weight, a second weighted image feature is determined according to the second normalized image feature and the second weight, the fusion feature of the image to be recognized is generated according to the first weighted image feature and the second weighted image feature, and the image category of the image to be recognized is determined based on the fusion feature. Recognizing the image through the fusion feature improves the accuracy of image category prediction.
Specifically, referring to fig. 4, the fusion feature of the image to be recognized satisfies:
X = W_0 · BN_0(x) + W_1 · BN_1(x)
where X is the fusion feature of the image to be recognized, W_0 is the first weight, BN_0(x) is the first normalized image feature, W_1 is the second weight, BN_1(x) is the second normalized image feature, W_0·BN_0(x) is the first weighted image feature, and W_1·BN_1(x) is the second weighted image feature.
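Putting the pieces above together, a hedged sketch of this fusion step (the variable names continue the earlier illustrative snippets):

```python
gate = PathFeatureFusionGate(channels=128)
w = gate(features)                           # (batch, 2): [W_0, W_1] per image
w0 = w[:, 0].view(-1, 1, 1, 1)               # broadcast over channels and space
w1 = w[:, 1].view(-1, 1, 1, 1)
fused = w0 * normalized0 + w1 * normalized1  # X = W_0·BN_0(x) + W_1·BN_1(x)
```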
In the application, an image to be recognized is acquired and input into an image recognition model, and image features of the image to be recognized are extracted through a shared convolution layer in the image recognition model. A first normalized feature corresponding to the image features is then acquired through the first batch normalization layer in the trained image recognition model, and a second normalized feature is acquired through the second batch normalization layer. A first weight and a second weight corresponding to the image features are acquired through the path feature fusion module in the trained image recognition model, where the first weight indicates the weight of the first normalized feature, the second weight indicates the weight of the second normalized feature, and the two weights sum to 1. Finally, a first weighted image feature is determined from the first normalized image feature and the first weight, a second weighted image feature is determined from the second normalized image feature and the second weight, the fusion feature of the image to be recognized is generated from the two weighted image features, and the image category of the image to be recognized is determined based on the fusion feature. Because the image features are normalized by the trained first and second batch normalization layers to obtain the corresponding normalized features, normalizing different images to be recognized through two different batch normalization layers improves the accuracy and robustness of the image recognition model. Moreover, because the first weight and the second weight are output directly by the trained path feature fusion module for an image to be recognized of any noise intensity, the method is more applicable in actual service scenarios and simpler to operate.
In some possible embodiments, please refer to fig. 5, and fig. 5 is a flowchart illustrating a training method of a batch normalization layer according to an embodiment of the present disclosure. The method may be performed by a user terminal (e.g., the user terminal shown in fig. 1 or fig. 2) or may be performed by both the user terminal and a service server (e.g., the service server 100 in the embodiment corresponding to fig. 1 or fig. 2). For ease of understanding, the present embodiment is described as an example in which the method is executed by the user terminal described above. The training method of the batch normalization layer at least includes the following steps S201 to S203:
S201, acquiring first sample images covering at least three noise intensities, and extracting the sample image features of each first sample image through a shared convolution layer in the image recognition model.
In some possible embodiments, please refer to fig. 6, which is a schematic diagram of a training process of the image recognition model provided in the present application. As shown in fig. 6, the acquired first sample images covering at least three noise intensities include normal sample images (sample images with noise intensity equal to 0) and adversarial sample images (sample images with noise intensity greater than 0), where an adversarial sample image is obtained by adding noise of a certain intensity to a normal sample image by a certain method. Training of this kind is called adversarial training: the adversarial sample images are added to the training sample set and trained together with the normal sample images, so that as the number of training iterations increases, the robustness of the image recognition model to adversarial sample images improves, while the model retains a natural accuracy no lower than that of an ordinarily trained model together with higher adversarial accuracy. Here, natural accuracy refers to the model's recognition accuracy on normal sample images, and adversarial accuracy refers to its recognition accuracy on adversarial sample images. In an optional embodiment of the present application, the process of generating adversarial sample images by adding noise to normal sample images may be implemented by an iterative attack. The adversarial sample images obtained through the iterative attack and the normal sample images are input together into the shared convolution layer of the image recognition model, and the sample image features of each sample image are extracted.
Specifically, an adversarial sample may be obtained using a gradient-based attack such as Projected Gradient Descent (PGD). The input normal sample image is attacked iteratively: in each iteration step, the normal sample image is fed through the input layer and output layer of the iterative attack to determine its input value and output value, and a loss value is determined from the difference between the output value and the true output label of the normal sample image. The loss value is back-propagated to the input layer of the iterative attack, and the sample gradient of the normal sample image is computed from its input value and the loss value, as shown in formula (1):

g_t = ∇_X L(X_t, y)    (1)

where X_t is the input sample, y is the true label, L is the loss function, ∇_X denotes the gradient computation, and g_t is the gradient computation result.

In some possible embodiments, since the noise added to the normal sample image is usually limited to a certain range (determined by the implementation scenario), the sample gradient is projected to a certain threshold (determined by the implementation scenario) as the noise intensity. Optionally, the noise direction may be expressed as g_t/||g_t||, as shown in formula (2):

X_{t+1} = Π_{X+S}(X_t + ε·(g_t/||g_t||))    (2)
and (2) the iteration of each step is accumulated based on the noise of the previous step, but the accumulated value does not exceed the element, wherein the element plays a role in restricting the noise strength and restricts the attack effect on the sample image. The larger the epsilon is, the larger the generated image noise intensity of the confrontation sample is, the more easily the confrontation sample is perceived by human eyes, and the stronger the corresponding attack effect is; the smaller the epsilon, the lower the generated strength of the noise of the confrontation sample image, the less easily the confrontation sample image is perceived by human eyes, and the lower the corresponding attack effect is. For convenience of description, the following will exemplify the case where the noise strength is represented by ∈. Optionally, referring to fig. 6, as shown in fig. 6, the noise intensity is added to the normal sample image to generate a countermeasure sample image, and the first sample image including the countermeasure sample image and the normal sample image is input into the shared convolution layer in the image recognition model and the sample image features of the sample images are extracted.
S202, training at least three batch normalization layers in the image recognition model according to the sample image features of the first sample images to obtain at least three batch normalization layers after training.
In some possible embodiments, at least three batch normalization layers in the image recognition model are trained according to the acquired sample image features of each first sample image to obtain at least three trained batch normalization layers, wherein the sample image features of the sample images of one noise intensity are used to train one batch normalization layer. Optionally, the at least three batch normalization layers may be five batch normalization layers. Referring to fig. 6, the image recognition model includes five batch normalization layers connected in parallel, and all the batch normalization layers use the same shared convolution layer and the same activation layer, thereby forming a multi-pass image recognition model. The shared convolution layer is used to extract image features, each batch normalization layer is used to normalize the extracted image features, and the normalized image features are then input into the activation layer for a nonlinear transformation, which strengthens the learning capability of the image recognition model. Because countermeasure sample images of various noise intensities may be encountered in an actual business scene, countermeasure sample images of various noise intensities are added during training, and different batch normalization layers are selected according to the different noise intensities, so that the disentanglement of the sample image features of different noise intensities is fully realized. Disentanglement is also referred to herein as decoupling, i.e., transforming the originally entangled sample image features into a feature space with a better representation, in which variations of different features can be separated from each other. For example, a face data set in sample images of different noise intensities is passed through the shared convolution layer, and the different sample image features are input into the corresponding batch normalization layers according to their noise intensities, so that separate feature representations of information such as whether each face in the face data set smiles and its hair color can be obtained.
Specifically, in an alternative embodiment of the present application, the at least three batch normalization layers may be five batch normalization layers, and the corresponding image recognition model is a five-way image recognition model. Optionally, the at least three first sample images including at least three noise intensities may be five first sample images including five noise intensities, where the five noise intensities may be {0, 1, 2, 4, 8 }. Here, {0, 1, 2, 4, 8} is merely an example, and may be determined according to an actual application scenario, and is not limited herein. For example, as shown in fig. 6, the corresponding sample image feature with the noise intensity of 0 may be input into the batch normalization layer 1 in fig. 6 for training, the corresponding sample image feature with the noise intensity of 1 may be input into the batch normalization layer 2 for training, the corresponding sample image feature with the noise intensity of 2 may be input into the batch normalization layer 3 for training, the corresponding sample image feature with the noise intensity of 4 may be input into the batch normalization layer 4 for training, the corresponding sample image feature with the noise intensity of 8 may be input into the batch normalization layer 5 for training, and finally the batch normalization layer 1, the batch normalization layer 2, the batch normalization layer 3, the batch normalization layer 4, and the batch normalization layer 5 after training may be obtained. In the application, if the noise intensity of the sample image is known and an appropriate batch normalization layer is selected for training, the image recognition model can obtain the best accuracy.
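As a sketch only, the five parallel batch normalization layers sharing one convolution layer and one activation layer could be organized as follows in PyTorch; the class name, the channel sizes, and the routing interface are assumptions made for illustration.

```python
import torch.nn as nn

class MultiBNBlock(nn.Module):
    """Sketch of one block of the multi-pass model: a shared convolution,
    one batch normalization layer per noise intensity, a shared activation."""
    def __init__(self, in_ch, out_ch, noise_levels=(0, 1, 2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # shared convolution layer
        self.bns = nn.ModuleDict(
            {str(lvl): nn.BatchNorm2d(out_ch) for lvl in noise_levels}  # one BN per noise intensity
        )
        self.act = nn.ReLU(inplace=True)                                # shared activation layer

    def forward(self, x, noise_level):
        feat = self.conv(x)                      # shared feature extraction
        feat = self.bns[str(noise_level)](feat)  # route features by the sample's noise intensity
        return self.act(feat)
```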
S203, a first batch normalization layer and a second batch normalization layer of the image recognition model are determined from the at least three batch normalization layers.
In some possible embodiments, please refer to fig. 7, another schematic diagram of a training process of the image recognition model provided in the present application. As shown in fig. 7, a first batch normalization layer and a second batch normalization layer of the image recognition model are determined from the at least three batch normalization layers, where the first batch normalization layer is trained from the sample images with the smallest noise intensity (for example, noise intensity 0) among the at least three sample images; in this case, the first batch normalization layer may be batch normalization layer 1. The second batch normalization layer is trained from the sample images with the largest noise intensity (for example, noise intensity 8); in this case, the second batch normalization layer may be batch normalization layer 5. Optionally, after the first batch normalization layer and the second batch normalization layer of the image recognition model are determined from the at least three batch normalization layers, the remaining batch normalization layers other than the first batch normalization layer and the second batch normalization layer may be masked or deleted to obtain the batch normalization layers of the image recognition model. This is because, in an actual business scene, the noise intensity of a countermeasure sample image may be an arbitrary decimal that does not exactly match any of the five trained noise intensities {0, 1, 2, 4, 8}, and when the noise intensity of the countermeasure sample image cannot be obtained, the accuracy of the five-pass image recognition model is not high enough. Therefore, in an alternative embodiment of the present application, a five-pass image recognition model as shown in fig. 6 is constructed, batch normalization layer 1 through batch normalization layer 5 are trained with the sample image features corresponding to the five noise intensities, and after training is completed, batch normalization layer 2, batch normalization layer 3, and batch normalization layer 4 are deleted, batch normalization layer 1 is used as the first batch normalization layer in the image recognition model, and batch normalization layer 5 is used as the second batch normalization layer in the image recognition model. As shown in fig. 7, the trained first batch normalization layer and second batch normalization layer can then process the sample image features corresponding to any noise intensity between 0 and 8. The five-pass model with five batch normalization layers is constructed at the initial stage of training to improve the feature extraction capability of the shared convolution layer on countermeasure sample images of every noise intensity; retaining batch normalization layer 1 and batch normalization layer 5 in fig. 6 and deleting batch normalization layer 2, batch normalization layer 3, and batch normalization layer 4 realizes the processing of sample image features corresponding to any noise intensity, which improves the accuracy of the image recognition model and broadens its practical application scenarios.
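Continuing the earlier sketch, masking or deleting the intermediate layers after training reduces the five-pass block to the two batch normalization layers that are kept; `block` and the intensity keys are the assumed names from the previous snippet, not terms of the present application.

```python
# keep only the layers trained on the smallest and largest noise intensities
first_bn = block.bns["0"]    # first batch normalization layer (clean samples)
second_bn = block.bns["8"]   # second batch normalization layer (strongest noise)
for lvl in ("1", "2", "4"):  # delete batch normalization layers 2, 3 and 4
    del block.bns[lvl]
```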
In the application, the sample image features of at least three first sample images covering at least three noise intensities are extracted through a shared convolution layer in an image recognition model, and the sample image features of each first sample image are then input into at least three batch normalization layers in the image recognition model for training. Finally, the batch normalization layer trained on the sample images with the smallest noise intensity is taken as the first batch normalization layer, the batch normalization layer trained on the sample images with the largest noise intensity is taken as the second batch normalization layer, and the remaining batch normalization layers other than the first batch normalization layer and the second batch normalization layer are deleted to obtain the batch normalization layers of the image recognition model. In the present application, during training of the image recognition model, a multi-pass image recognition model is first constructed; as shown in fig. 8, fig. 8 is a schematic structural diagram of the multi-pass image recognition model provided in the present application, which includes at least three batch normalization layers. The first sample images are input into the multi-pass image recognition model shown in fig. 8 for training, which improves the extraction capability of the shared convolution layer on image features, and deleting the remaining batch normalization layers enables the first batch normalization layer and the second batch normalization layer to process the sample image features corresponding to any noise intensity. As shown in fig. 9, fig. 9 is a schematic structural diagram of a single-pass image recognition model provided in the present application. Through training of the multi-pass image recognition model, both the natural precision and the countermeasure precision of the first batch normalization layer, and likewise of the second batch normalization layer, can be superior to those of the batch normalization layer in the single-pass image recognition model shown in fig. 9; the multi-pass image recognition model is therefore superior to the single-pass image recognition model shown in fig. 9, while remaining simple to operate and highly applicable.
In some possible embodiments, please refer to fig. 10 together, and fig. 10 is a flowchart illustrating a training method of a path fusion module according to an embodiment of the present disclosure. The method may be performed by a user terminal (e.g., the user terminal shown in fig. 1 or fig. 2) or may be performed by both the user terminal and a service server (e.g., the service server 100 in the embodiment corresponding to fig. 1 or fig. 2). For ease of understanding, the present embodiment is described as an example in which the method is executed by the user terminal described above. The training method of the path fusion module at least comprises the following steps S301 to S305:
S301, a second sample image for training the path feature fusion module of the image recognition model is obtained, and the sample image features of the second sample image are extracted through the shared convolution layer in the image recognition model.
In some possible embodiments, the second sample image includes a challenge sample of any noise intensity. Specifically, for a specific implementation of obtaining the sample image feature of the second sample image in step S301, reference may be made to the description of obtaining the sample image feature of the first sample image in step S201 in the embodiment corresponding to fig. 5, and details will not be repeated here.
S302, acquiring a first normalized sample image characteristic and a second normalized sample image characteristic corresponding to the sample image characteristic through a first batch normalization layer and a second batch normalization layer in the image recognition model.
In some possible embodiments, the normalization parameters obtained after training of the first batch normalization layer and the second batch normalization layer may be acquired. The first batch normalization layer normalizes the sample image features according to its normalization parameters to obtain the first normalized sample image features, and the second batch normalization layer normalizes the sample image features according to its normalization parameters to obtain the second normalized sample image features. Therefore, the first normalized sample image features corresponding to the sample image features may be obtained through the first batch normalization layer in the image recognition model, and the second normalized sample image features may be obtained through the second batch normalization layer. Here, the first batch normalization layer may be batch normalization layer 1 shown in fig. 6, and the second batch normalization layer may be batch normalization layer 5 shown in fig. 6; that is, by the time the path feature fusion module is trained, the batch normalization layers in the image recognition model have already been trained, and at this point they are the first batch normalization layer and the second batch normalization layer. The first batch normalization layer, the second batch normalization layer, and the other components in the image recognition model are fixed at this stage, so that their respective performances are maintained and the accuracy of the weights output by the trained path feature fusion module for the normalized features is improved; the method is simple to operate and highly applicable.
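A minimal sketch of this fixing step follows, assuming a model object that exposes the shared convolution, the two kept batch normalization layers, and the path feature fusion module under the hypothetical attribute names below.

```python
import torch

# freeze everything except the path feature fusion module (names are assumptions)
for p in model.conv.parameters():
    p.requires_grad = False
for bn in (model.first_bn, model.second_bn):
    bn.eval()                        # fix the trained running mean/variance
    for p in bn.parameters():
        p.requires_grad = False      # fix the trained scale/shift parameters
optimizer = torch.optim.SGD(model.fusion_module.parameters(), lr=1e-3)
```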
S303, training the path characteristic fusion module based on the sample image characteristics, the first normalized sample image characteristics, the second normalized sample image characteristics and the image category label of the second sample image.
In some possible embodiments, a first sample weight and a second sample weight corresponding to the sample image feature are obtained by a path feature fusion module in an image recognition model, the first sample weight is used for marking the weight occupied by the first normalized sample image feature, the second sample weight is used for marking the weight occupied by the second normalized sample image feature, and the sum of the first sample weight and the second sample weight is 1. A fused feature of the second sample image is determined based on the first normalized sample image feature, the first sample weight, the second normalized sample image feature, and the second sample weight.
Wherein the fusion feature of the second sample image satisfies:

X_sample = W_sample0 · BN_sample0(x) + W_sample1 · BN_sample1(x)

In the formula, X_sample is the fusion feature of the second sample image, W_sample0 is the first sample weight, BN_sample0(x) is the first normalized sample image feature, W_sample1 is the second sample weight, and BN_sample1(x) is the second normalized sample image feature.
In some possible embodiments, the image class of the second sample image is determined based on the fusion feature of the second sample image, and the classification loss of the image recognition model is calculated from the image class of the second sample image and the image class label of the second sample image, so as to adjust the network parameters of the path feature fusion module based on the classification loss. Optionally, if the image class determination is used for image classification, the classification loss between the image class of the second sample image and the image class label of the second sample image may be calculated using a cross entropy loss function; if it is used for face recognition, the classification loss may be calculated using an ArcFace loss function. Finally, the network parameters of the path feature fusion module are adjusted based on the classification loss, so that for any image feature the path feature fusion module outputs the weight corresponding to the normalized image features output by the first batch normalization layer and the weight corresponding to the normalized image features output by the second batch normalization layer. When the path feature fusion module is trained, the other components in the image recognition model need to be fixed so as to maintain the respective performances of the first batch normalization layer and the second batch normalization layer; meanwhile, the trained path feature fusion module can further extract features from the acquired image features, making the generated first weight marking the first normalized features and second weight marking the second normalized features more accurate.
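The following sketch illustrates one plausible form of the path feature fusion module (a softmax gate over the shared features, which guarantees that the two weights sum to 1) together with a single training step. The gate design, all names, and the cross entropy loss choice are assumptions consistent with the description above, not a definitive implementation of the present application.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PathFeatureFusion(nn.Module):
    """Sketch: predicts the two path weights from the shared feature map."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, 2)
        )

    def forward(self, feat):
        w = torch.softmax(self.gate(feat), dim=1)  # W_sample0 + W_sample1 = 1
        return w[:, 0], w[:, 1]

def fusion_training_step(feat, bn0, bn1, fusion, classifier, labels, optimizer):
    """One step of training the fusion module; bn0/bn1 and classifier are frozen."""
    w0, w1 = fusion(feat)
    fused = (w0.view(-1, 1, 1, 1) * bn0(feat)      # W_sample0 * BN_sample0(x)
             + w1.view(-1, 1, 1, 1) * bn1(feat))   # W_sample1 * BN_sample1(x)
    logits = classifier(F.relu(fused))             # shared activation, then classification head
    loss = F.cross_entropy(logits, labels)         # ArcFace would replace this for face recognition
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```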
In the present application, the sample image features of a second sample image are extracted by the shared convolution layer in the image recognition model; the first normalized sample image features and the second normalized sample image features corresponding to the sample image features are acquired through the first batch normalization layer and the second batch normalization layer in the image recognition model; the first sample weight and the second sample weight corresponding to the sample image features are acquired through the path feature fusion module in the image recognition model; the fusion feature of the second sample image is determined based on the first normalized sample image features, the first sample weight, the second normalized sample image features, and the second sample weight; and the image class of the second sample image is determined based on the fusion feature of the second sample image. Finally, the trained path feature fusion module can, from the image features extracted from an image to be recognized, automatically produce the first weight marking the first normalized image features and the second weight marking the second normalized image features. The method therefore has higher applicability in actual business scenes and is simpler to operate.
Further, please refer to fig. 11, fig. 11 is a schematic structural diagram of an image recognition apparatus based on pathway fusion according to the present application. The image recognition device based on path fusion can be a computer program (including program code) running in a computer device, for example, the image recognition device based on path fusion is an application software; the apparatus may be adapted to perform the corresponding steps in the methods provided herein. As shown in fig. 11, the image recognition apparatus based on path fusion includes: the device comprises a first obtaining module 10, a second obtaining module 20, a third obtaining module 30 and a first determining module 40.
A first obtaining module 10, configured to obtain an image to be recognized, input the image to be recognized into an image recognition model, and extract an image feature of the image to be recognized through a shared convolution layer in the image recognition model;
a second obtaining module 20, configured to obtain first normalized image features corresponding to the image features through a first normalization layer in the image recognition model, and obtain second normalized image features corresponding to the image features through a second normalization layer in the image recognition model, where the first normalization layer and the second normalization layer are obtained through training of sample images with different noise intensities, respectively;
a third obtaining module 30, configured to obtain, through a path feature fusion module in the image recognition model, a first weight and a second weight corresponding to the image feature, where the first weight is used to mark a weight occupied by the first normalized image feature, the second weight is used to mark a weight occupied by the second normalized image feature, and a sum of the first weight and the second weight is 1;
a first determining module 40, configured to determine a fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight, and determine an image type of the image to be recognized based on the fusion feature.
In some possible embodiments, the first obtaining module 10 may obtain an image to be recognized, where the image to be recognized may be an image uploaded by a user terminal or an image captured by a camera of the user terminal. The first obtaining module 10 may input the obtained image to be recognized into an image recognition model and obtain the image features of the image to be recognized through the image recognition model. As shown in fig. 4, the shared convolution layer in the image recognition model may perform feature extraction on the input image to be recognized to obtain its image features. The second obtaining module 20 may obtain the first normalized image features corresponding to the image features through the first batch normalization layer in the image recognition model, and obtain the second normalized image features through the second batch normalization layer. Normalizing the mean and variance of the input image features through a batch normalization layer alleviates, to a certain extent, the problem that scattered feature distributions in a deep network make learning slow or even difficult. Therefore, batch normalization layers are connected in series after the shared convolution layers of the image recognition model, which accelerates model training and improves model precision. As shown in fig. 4, in an alternative embodiment of the present application, the image features corresponding to an image to be recognized of any noise intensity are input into the trained first batch normalization layer and second batch normalization layer to obtain the corresponding first normalized image features BN_0(x) and second normalized image features BN_1(x).
In some possible embodiments, the third obtaining module 30 may obtain the first weight W_0 and the second weight W_1 corresponding to the image features through the path feature fusion module in the image recognition model. The path feature fusion module here has already been trained; when it was trained, the other components in the image recognition model were fixed so as to maintain the respective performances of the first batch normalization layer and the second batch normalization layer, and the trained path feature fusion module can further extract features from the acquired image features and generate a more accurate first weight W_0 marking the first normalized features and second weight W_1 marking the second normalized features. As shown in fig. 4, in an alternative embodiment of the present application, the image features are input into the trained path feature fusion module, and the first weight W_0 and the second weight W_1 corresponding to the image features are obtained. The first determining module 40 may determine the fusion feature X of the image to be recognized based on the first normalized image features BN_0(x), the first weight W_0, the second normalized image features BN_1(x), and the second weight W_1, and determine the image class of the image to be recognized based on the fusion feature. In an alternative embodiment of the present application, the fusion feature may be X = W_0 · BN_0(x) + W_1 · BN_1(x). Performing image recognition on the image to be recognized through the fusion feature improves the accuracy of image class prediction.
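Putting the pieces together, inference with the trained model could look like the following sketch; the attribute names mirror the hypothetical ones used in the earlier snippets and are assumptions, not the names used by the present application.

```python
import torch

@torch.no_grad()
def recognize(model, image):
    """Sketch of the forward pass of the path-fusion recognition model."""
    feat = model.conv(image)                     # shared convolution layer
    w0, w1 = model.fusion_module(feat)           # first and second weights, sum to 1
    fused = (w0.view(-1, 1, 1, 1) * model.first_bn(feat)    # W_0 * BN_0(x)
             + w1.view(-1, 1, 1, 1) * model.second_bn(feat))  # W_1 * BN_1(x)
    logits = model.classifier(model.act(fused))  # shared activation, then classification head
    return logits.argmax(dim=1)                  # predicted image category
```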
In the present application, an image to be recognized is obtained by the first obtaining module 10, the image to be recognized is input into an image recognition model, and the image features of the image to be recognized are extracted through the shared convolution layer in the image recognition model. Then, the second obtaining module 20 obtains the first normalized image features corresponding to the image features through the first batch normalization layer in the trained image recognition model, and obtains the second normalized image features through the second batch normalization layer. The third obtaining module 30 obtains the first weight and the second weight corresponding to the image features through the path feature fusion module in the trained image recognition model, where the first weight marks the weight occupied by the first normalized image features, the second weight marks the weight occupied by the second normalized image features, and the sum of the first weight and the second weight is 1. Finally, the first determining module 40 determines the fusion feature of the image to be recognized based on the first normalized image features, the first weight, the second normalized image features, and the second weight, and determines the image class of the image to be recognized based on the fusion feature. Because the image features are normalized through the trained first batch normalization layer and second batch normalization layer to obtain the corresponding normalized features, normalizing different images to be recognized through two different batch normalization layers can improve the accuracy and robustness of the image recognition model; moreover, in the present application the first weight and the second weight are output directly by the trained path feature fusion module for an image to be recognized of any noise intensity, so the applicability in actual business scenes is higher and the operation is simpler.
In a possible implementation, referring to fig. 12, the apparatus further includes:
a fourth obtaining module 50, configured to obtain at least three first sample images including at least three noise intensities, input each first sample image into the image recognition model, and extract a sample image feature of each first sample image through a shared convolution layer in the image recognition model;
a first training module 60, configured to train at least three batch normalization layers in the image recognition model according to the sample image features of the first sample images, so as to obtain at least three trained batch normalization layers, where the sample image features of the sample images of one noise intensity are used to train one batch normalization layer;
a second determining module 70, configured to determine a first batch normalization layer and a second batch normalization layer of the image recognition model from the at least three batch normalization layers, where the first batch normalization layer is trained from the sample image with the smallest noise intensity among the at least three sample images, and the second batch normalization layer is trained from the sample image with the largest noise intensity among the at least three sample images.
In a possible embodiment, the above apparatus further comprises:
a deleting module 80, configured to delete the remaining batch normalization layers except the first batch normalization layer and the second batch normalization layer from the at least three batch normalization layers to obtain a batch normalization layer of the image recognition model.
In some possible embodiments, the fourth obtaining module 50 may obtain at least three first sample images including at least three noise intensities, including a normal sample image (a sample image with a noise intensity equal to 0) and a countermeasure sample image (a sample image with a noise intensity greater than 0), where the countermeasure sample image refers to a sample image obtained by adding a certain noise intensity to the normal sample image by a certain method. Namely, the confrontation sample image is also taken as a training sample and added into the training sample set, the normal sample image and the confrontation sample image are trained simultaneously, the robustness of the image recognition model to the confrontation sample image can be improved along with the gradual increase of the training times of the image recognition model, and meanwhile, the image recognition model has natural precision not lower than that of a common training model and higher confrontation precision. In an optional embodiment of the present application, the process of generating the countermeasure sample image by adding a certain noise intensity to the normal sample image by a certain method may be implemented by an iterative attack. And simultaneously inputting the confrontation sample image and the normal sample image which are obtained through the iterative attack into the shared convolution layer in the image recognition model, and extracting the sample image characteristics of each sample image. Alternatively, as shown in fig. 6, the noise intensity is added to the normal sample image to generate a countermeasure sample image, and the first sample image including the countermeasure sample image and the normal sample image is input to the shared convolution layer in the image recognition model and the sample image features of the sample images are extracted.
In some possible embodiments, the first training module 60 may train at least three batch normalization layers in the image recognition model according to the acquired sample image features of each first sample image to obtain at least three trained batch normalization layers, wherein the sample image features of the sample image with one noise intensity are used for training one batch normalization layer. Optionally, the at least three batch normalization layers may be five batch normalization layers. As shown in fig. 6, the image recognition model includes five batch normalization layers connected in parallel, and the image recognition model of multiple channels is configured by using the same shared convolution layer and the same activation layer for all the batch normalization layers. The shared convolution layer is used for extracting image features, each batch of normalization layers are used for normalizing the extracted image features, and then the normalized image features are input into the activation layer to be subjected to nonlinear transformation so that the learning capability of the image recognition model is stronger. Because various noise intensity countermeasure sample images may be encountered in an actual business scene, various noise intensity countermeasure sample images are added during training, and the countermeasure sample images are trained by selecting different batch normalization layers according to different noise intensities, so that the disentanglement of the sample image features with different noise intensities is fully realized. In an optional embodiment of the present application, as shown in fig. 6, the corresponding sample image features with noise intensity of 0 may be input into the batch normalization layer 1 in fig. 6 for training, the corresponding sample image features with noise intensity of 1 may be input into the batch normalization layer 2 for training, the corresponding sample image features with noise intensity of 2 may be input into the batch normalization layer 3 for training, the corresponding sample image features with noise intensity of 4 may be input into the batch normalization layer 4 for training, the corresponding sample image features with noise intensity of 8 may be input into the batch normalization layer 5 for training, and finally the trained batch normalization layer 1, batch normalization layer 2, batch normalization layer 3, batch normalization layer 4, and batch normalization layer 5 may be obtained. In the application, if the noise intensity of the sample image is known and an appropriate batch normalization layer is selected for training, the image recognition model can obtain the best accuracy.
In some possible embodiments, the second determining module 70 may determine a first batch normalization layer and a second batch normalization layer of the image recognition model from the at least three batch normalization layers. Optionally, referring to fig. 7, a first batch normalization layer and a second batch normalization layer of the image recognition model are determined from the at least three batch normalization layers, where the first batch normalization layer is trained from the sample images with the smallest noise intensity (for example, noise intensity 0); in this case, the first batch normalization layer may be batch normalization layer 1. The second batch normalization layer is trained from the sample images with the largest noise intensity (for example, noise intensity 8); in this case, the second batch normalization layer may be batch normalization layer 5.
In some possible embodiments, the deleting module 80 may, after the first batch normalization layer and the second batch normalization layer of the image recognition model are determined, delete the remaining batch normalization layers other than the first batch normalization layer and the second batch normalization layer to obtain the batch normalization layers of the image recognition model. This is because, in an actual business scene, the noise intensity of a countermeasure sample image may be an arbitrary decimal that does not exactly match any of the five trained noise intensities {0, 1, 2, 4, 8}, and when the noise intensity of the countermeasure sample image cannot be obtained, the accuracy of the five-pass image recognition model is not high enough. Therefore, in an alternative embodiment of the present application, a five-pass image recognition model as shown in fig. 6 is constructed, batch normalization layer 1 through batch normalization layer 5 are trained with the sample image features corresponding to the five noise intensities, and after training is completed, batch normalization layer 2, batch normalization layer 3, and batch normalization layer 4 are deleted, batch normalization layer 1 is used as the first batch normalization layer in the image recognition model, and batch normalization layer 5 is used as the second batch normalization layer in the image recognition model. As shown in fig. 7, the trained first batch normalization layer and second batch normalization layer can then process the sample image features corresponding to any noise intensity between noise intensity 0 and noise intensity 8. The five-pass model with five batch normalization layers is constructed at the initial stage of training to improve the feature extraction capability of the shared convolution layer on countermeasure sample images of every noise intensity; retaining batch normalization layer 1 and batch normalization layer 5 in fig. 6 and deleting batch normalization layer 2, batch normalization layer 3, and batch normalization layer 4 realizes the processing of sample image features corresponding to any noise intensity, which improves the accuracy of the image recognition model and broadens its practical application scenarios.
In the present application, the sample image features of the at least three first sample images covering at least three noise intensities acquired by the fourth acquiring module 50 are extracted through the shared convolution layer in the image recognition model, and the sample image features of each first sample image are then input into at least three batch normalization layers in the image recognition model for training through the first training module 60. Finally, the second determining module 70 takes the batch normalization layer trained on the sample images with the smallest noise intensity as the first batch normalization layer and the batch normalization layer trained on the sample images with the largest noise intensity as the second batch normalization layer, and the deleting module 80 deletes the remaining batch normalization layers other than the first batch normalization layer and the second batch normalization layer to obtain the batch normalization layers of the image recognition model. In the present application, during training of the image recognition model, a multi-pass image recognition model is first constructed; as shown in fig. 8, fig. 8 is a schematic structural diagram of the multi-pass image recognition model provided in the present application, which includes at least three batch normalization layers. The first sample images are input into the multi-pass image recognition model shown in fig. 8 for training, which improves the extraction capability of the shared convolution layer on image features, and deleting the remaining batch normalization layers enables the first batch normalization layer and the second batch normalization layer to process the sample image features corresponding to any noise intensity. As shown in fig. 9, fig. 9 is a schematic structural diagram of a single-pass image recognition model provided in the present application. Through training of the multi-pass image recognition model, both the natural precision and the countermeasure precision of the first batch normalization layer, and likewise of the second batch normalization layer, can be superior to those of the batch normalization layer in the single-pass image recognition model shown in fig. 9; the multi-pass image recognition model is therefore superior to the single-pass image recognition model shown in fig. 9, while remaining simple to operate and highly applicable.
In a possible embodiment, the above apparatus further comprises:
a fifth obtaining module 90, configured to obtain a second sample image used for training a path feature fusion module of the image recognition model, input the second sample image into the image recognition model, and extract a sample image feature of the second sample image through a shared convolution layer in the image recognition model;
a sixth obtaining module 100, configured to obtain, through a first batch of normalization layers in the image recognition model, first normalized sample image features corresponding to the sample image features, and obtain, through a second batch of normalization layers in the image recognition model, second normalized sample image features corresponding to the sample image features;
a second training module 110, configured to train the path feature fusion module based on the sample image features, the first normalized sample image features, the second normalized sample image features, and the image type labels of the second sample images, so that the path feature fusion module outputs a weight corresponding to a normalized image feature of any image feature output through the first batch normalization layer, and a weight corresponding to a normalized image feature output through the second batch normalization layer.
In a possible implementation manner, the second training module 110 further includes:
a first obtaining unit 1101, configured to obtain, by a path feature fusion module in the image recognition model, a first sample weight and a second sample weight corresponding to the sample image feature, where the first sample weight is used to label a weight occupied by the first normalized sample image feature, the second sample weight is used to label a weight occupied by the second normalized sample image feature, and a sum of the first sample weight and the second sample weight is 1;
a first determining unit 1102 configured to determine a fusion feature of the second sample image based on the first normalized sample image feature, the first sample weight, the second normalized sample image feature, and the second sample weight, and determine an image type of the second sample image based on the fusion feature of the second sample image;
a first adjusting unit 1103, configured to calculate a classification loss of the image recognition model according to the image class of the second sample image and the image class label of the second sample image, and adjust the network parameters of the path feature fusion module based on the classification loss.
In a possible implementation, the first determining unit 1102 is further configured to:
determining a first weighted image feature according to the first normalized image feature and the first weight, and determining a second weighted image feature according to the second normalized image feature and the second weight;
generating fusion characteristics of the image to be identified according to the first weight image characteristics and the second weight image characteristics;
the fusion characteristics of the images to be recognized meet the following conditions:
X = W_0 · BN_0(x) + W_1 · BN_1(x)

wherein X is the fusion feature of the image to be recognized, W_0 is the first weight, BN_0(x) is the first normalized image feature, W_1 is the second weight, BN_1(x) is the second normalized image feature, W_0 · BN_0(x) is the first weighted image feature, and W_1 · BN_1(x) is the second weighted image feature.
In some possible embodiments, the fifth obtaining module 90 may obtain a second sample image used for training the path feature fusion module of the image recognition model, and extract a sample image feature of the second sample image through a shared convolution layer in the image recognition model, where the second sample image includes a countermeasure sample of any noise intensity. The sixth obtaining module 100 may obtain the normalized parameters obtained after the training of the first normalization layer and the second normalization layer. The first normalization layer normalizes the sample image characteristics according to the corresponding normalization parameters to obtain first normalized sample image characteristics; and the second batch of normalization layers normalize the sample image characteristics according to the corresponding normalization parameters and obtain second normalized sample image characteristics. Therefore, a first normalized sample image feature corresponding to the sample image feature may be obtained through a first batch of normalization layers in the image recognition model, and a second normalized sample image feature corresponding to the sample image feature may be obtained through a second batch of normalization layers in the image recognition model. Here, the first batch normalization layer may be the batch normalization layer 1 shown in fig. 6, and the second batch normalization layer may be the batch normalization layer 5 shown in fig. 6, that is, when the path feature fusion module is trained, the batch normalization layer in the image recognition model is already trained, and at this time, the batch normalization layer in the image recognition model is the first batch normalization layer and the second batch normalization layer. At this time, fixing other components such as the first normalization layer and the second normalization layer in the image model can maintain respective performances of the components such as the first normalization layer and the second normalization layer, and improve the accuracy of the trained weight of the normalized feature output by the path feature fusion module.
In some possible embodiments, the first obtaining unit 1101 of the second training module 110 may obtain, through the path feature fusion module in the image recognition model, a first sample weight and a second sample weight corresponding to the sample image features, where the first sample weight is used to mark the weight occupied by the first normalized sample image features, the second sample weight is used to mark the weight occupied by the second normalized sample image features, and the sum of the first sample weight and the second sample weight is 1. The first determining unit 1102 of the second training module 110 may determine the fusion feature of the second sample image based on the first normalized sample image features, the first sample weight, the second normalized sample image features, and the second sample weight. The fusion feature of the second sample image may satisfy X_sample = W_sample0 · BN_sample0(x) + W_sample1 · BN_sample1(x), where X_sample is the fusion feature of the second sample image, W_sample0 is the first sample weight, BN_sample0(x) is the first normalized sample image feature, W_sample1 is the second sample weight, and BN_sample1(x) is the second normalized sample image feature.
In some possible embodiments, the first adjusting unit 1103 in the second training module 110 may determine the image class of the second sample image based on the fusion feature of the second sample image, and calculate the classification loss of the image recognition model from the image class of the second sample image and the image class label of the second sample image, so as to adjust the network parameters of the path feature fusion module based on the classification loss. Optionally, if the image class determination is used for image classification, the classification loss between the image class of the second sample image and the image class label of the second sample image may be calculated using a cross entropy loss function; if it is used for face recognition, the classification loss may be calculated using an ArcFace loss function. Finally, the network parameters of the path feature fusion module are adjusted based on the classification loss, so that for any image feature the path feature fusion module outputs the weight corresponding to the normalized image features output by the first batch normalization layer and the weight corresponding to the normalized image features output by the second batch normalization layer. When the path feature fusion module is trained, the other components in the image recognition model need to be fixed so as to maintain the respective performances of the first batch normalization layer and the second batch normalization layer; meanwhile, the trained path feature fusion module can further extract features from the acquired image features, making the generated first weight marking the first normalized features and second weight marking the second normalized features more accurate.
In the present application, the fifth obtaining module 90 obtains a second sample image for training the path feature fusion module of the image recognition model and extracts the sample image features of the second sample image through the shared convolution layer in the image recognition model; the sixth obtaining module 100 obtains the first normalized sample image features and the second normalized sample image features corresponding to the sample image features through the first batch normalization layer and the second batch normalization layer in the image recognition model; and the first obtaining unit 1101 in the second training module 110 obtains the first sample weight and the second sample weight corresponding to the sample image features through the path feature fusion module in the image recognition model. The first determining unit 1102 in the second training module 110 may determine the first weighted image features from the first normalized image features and the first weight, determine the second weighted image features from the second normalized image features and the second weight, generate the fusion feature from the first weighted image features and the second weighted image features, and determine the image class of the second sample image based on the fusion feature of the second sample image. The first adjusting unit 1103 in the second training module 110 calculates the classification loss of the image recognition model from the image class of the second sample image and the image class label of the second sample image, so as to adjust the network parameters of the path feature fusion module based on the classification loss. Finally, the trained path feature fusion module can extract the image features of an image to be recognized and automatically produce the first weight marking the first normalized image features and the second weight marking the second normalized image features. The method therefore has higher applicability in actual business scenes and is simpler to operate.
Further, please refer to fig. 13, where fig. 13 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 13, the computer device 2000 may be applied to a server, which may be the service server 100 in the embodiment corresponding to fig. 1; the computer device 2000 may be applied to a terminal, which may be the user terminal 10a, the user terminals 10b, …, or the user terminal 10n in the embodiment corresponding to fig. 1; the computer device 2000 may also be the computer device in the embodiment corresponding to fig. 3. The computer device 2000 may include: a processor 2001, a network interface 2004 and a memory 2005, the computer device 2000 further comprising: a transceiver 2003, and at least one communication bus 2002. The communication bus 2002 is used to implement connection communication between these components. The network interface 2004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). Memory 2005 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 2005 may optionally also be at least one memory device located remotely from the aforementioned processor 2001. As shown in fig. 13, the memory 2005 which is a kind of computer-readable storage medium may include therein an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 2000 shown in fig. 13, the network interface 2004 may provide a network communication function; the processor 2001 and transceiver 2003 may be used to invoke the device control application stored in memory 2005 to implement:
the transceiver 2003 is used for acquiring an image to be recognized.
The processor 2001 is configured to input the image to be recognized into an image recognition model, and extract image features of the image to be recognized through a shared convolution layer in the image recognition model.
The processor 2001 is further configured to obtain a first normalized image feature corresponding to the image feature through a first normalization layer in the image recognition model, and obtain a second normalized image feature corresponding to the image feature through a second normalization layer in the image recognition model, where the first normalization layer and the second normalization layer are obtained by training sample images with different noise intensities, respectively;
the processor 2001 is configured to obtain a first weight and a second weight corresponding to the image feature through a path feature fusion module in the image recognition model, where the first weight is used to mark a weight occupied by the first normalized image feature, the second weight is used to mark a weight occupied by the second normalized image feature, and a sum of the first weight and the second weight is 1;
the processor 2001 is configured to determine a fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight, and determine an image type of the image to be recognized based on the fusion feature.
In one possible implementation, the processor 2001 is further configured to:
inputting each first sample image into the image recognition model based on at least three first sample images including at least three noise intensities acquired by the transceiver 2003, and extracting sample image features of each first sample image by a shared convolution layer in the image recognition model;
training at least three batch normalization layers in the image recognition model according to the sample image features of the first sample images to obtain at least three trained batch normalization layers, wherein the sample image features of the sample images of one noise intensity are used for training one batch normalization layer;
and determining a first batch of normalization layers and a second batch of normalization layers of the image recognition model from the at least three batch of normalization layers, wherein the first batch of normalization layers is obtained by training the sample images with the minimum noise intensity in the at least three sample images, and the second batch of normalization layers is obtained by training the sample images with the maximum noise intensity in the at least three sample images.
In one possible embodiment, after the first batch normalization layer and the second batch normalization layer of the image recognition model are determined from the at least three batch normalization layers, the processor 2001 is further configured to:
deleting the rest of the at least three batch normalization layers except the first batch normalization layer and the second batch normalization layer to obtain a batch normalization layer of the image recognition model.
In one possible implementation, the processor 2001 is further configured to:
acquiring a second sample image of a path feature fusion module for training the image recognition model, inputting the second sample image into the image recognition model, and extracting sample image features of the second sample image through a shared convolution layer in the image recognition model;
acquiring first normalized sample image features corresponding to the sample image features through a first batch of normalization layers in the image recognition model, and acquiring second normalized sample image features corresponding to the sample image features through a second batch of normalization layers in the image recognition model;
the processor 2001 is further configured to:
the path feature fusion module is trained based on the sample image features, the first normalized sample image features, the second normalized sample image features, and the image class label of the second sample image, such that the path feature fusion module outputs a weight corresponding to a normalized image feature output by any image feature through the first batch normalization layer, and a weight corresponding to a normalized image feature output by the second batch normalization layer.
In one possible implementation, the processor 2001 is further configured to:
acquiring a first sample weight and a second sample weight corresponding to the sample image features through a path feature fusion module in the image recognition model, wherein the first sample weight is used for marking the weight occupied by the first normalized sample image features, the second sample weight is used for marking the weight occupied by the second normalized sample image features, and the sum of the first sample weight and the second sample weight is 1;
the processor 2001 is further configured to:
determining a fusion feature of the second sample image based on the first normalized sample image feature, the first sample weight, the second normalized sample image feature, and the second sample weight, and determining an image type of the second sample image based on the fusion feature of the second sample image;
and calculating a classification loss of the image recognition model according to the image class of the second sample image and the image class label of the second sample image, and adjusting the network parameters of the path feature fusion module based on the classification loss so as to train the path feature fusion module.
In one possible embodiment, the determining the fusion feature of the image to be recognized based on the first normalized image feature, the first weight, the second normalized image feature, and the second weight includes:
determining a first weighted image feature according to the first normalized image feature and the first weight, and determining a second weighted image feature according to the second normalized image feature and the second weight;
and generating the fusion feature of the image to be recognized according to the first weighted image feature and the second weighted image feature;
the fusion feature of the image to be recognized satisfies the following condition:
X = W₀·BN₀(x) + W₁·BN₁(x)

wherein X is the fusion feature of the image to be recognized, W₀ is the above first weight, BN₀(x) is the above first normalized image feature, W₁ is the above second weight, BN₁(x) is the above second normalized image feature, W₀·BN₀(x) is the first weighted image feature, and W₁·BN₁(x) is the second weighted image feature.
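Read as code, the condition is a per-sample weighted sum of the two normalized paths. A minimal sketch, assuming 2D feature maps and standard PyTorch BatchNorm2d layers standing in for BN₀ and BN₁ (the channel count and shapes are illustrative):

```python
import torch
import torch.nn as nn

bn0 = nn.BatchNorm2d(64)  # BN0: path trained on minimum-noise samples
bn1 = nn.BatchNorm2d(64)  # BN1: path trained on maximum-noise samples

def fuse(x, w0, w1):
    # X = W0 * BN0(x) + W1 * BN1(x), with per-sample scalar weights.
    w0 = w0.view(-1, 1, 1, 1)  # broadcast over channel and spatial dims
    w1 = w1.view(-1, 1, 1, 1)
    return w0 * bn0(x) + w1 * bn1(x)

x = torch.randn(4, 64, 32, 32)            # image features for a batch of 4
w = torch.softmax(torch.randn(4, 2), 1)   # stand-in weights, each row sums to 1
X = fuse(x, w[:, 0], w[:, 1])             # fusion feature, same shape as x
```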
In the application, an image to be recognized is obtained, the image to be recognized is input into an image recognition model, and image features of the image to be recognized are extracted through a shared convolution layer in the image recognition model. A first normalized image feature corresponding to the image features is then acquired through a first batch normalization layer in the trained image recognition model, and a second normalized image feature corresponding to the image features is acquired through a second batch normalization layer in the trained image recognition model. A first weight and a second weight corresponding to the image features are acquired through a path feature fusion module in the trained image recognition model, wherein the first weight indicates the weight assigned to the first normalized image feature, the second weight indicates the weight assigned to the second normalized image feature, and the sum of the first weight and the second weight is 1. Finally, a first weighted image feature is determined according to the first normalized image feature and the first weight, a second weighted image feature is determined according to the second normalized image feature and the second weight, the fusion feature of the image to be recognized is generated according to the first weighted image feature and the second weighted image feature, and the image category of the image to be recognized is determined based on the fusion feature. Because the image features are normalized through the trained first batch normalization layer and the trained second batch normalization layer to obtain the corresponding normalized features, applying two different batch normalization layers to different images to be recognized improves the accuracy and robustness of the image recognition model.
Each batch normalization layer in the image recognition model can be trained by obtaining at least three first sample images including a normal sample image (a sample image with a noise intensity equal to 0) and adversarial sample images (sample images with a noise intensity greater than 0), extracting the sample image features of each first sample image through the shared convolution layer in the image recognition model, and inputting the sample image features of each first sample image into at least three batch normalization layers in the image recognition model for training. Finally, the batch normalization layer trained with the sample images of the minimum noise intensity is taken as the first batch normalization layer, the batch normalization layer trained with the sample images of the maximum noise intensity is taken as the second batch normalization layer, and the rest of the at least three batch normalization layers except the first batch normalization layer and the second batch normalization layer are deleted to obtain the batch normalization layers of the image recognition model. Because adversarial sample images of various noise intensities may be encountered in an actual business scene, adversarial sample images of various noise intensities are added during training, and different batch normalization layers are selected to train on samples of different noise intensities, so that the sample image features of different noise intensities are fully disentangled. Through training of the multi-path image recognition model, the natural accuracy and adversarial accuracy of the first batch normalization layer can be better than those of the batch normalization layer in the single-path image recognition model shown in FIG. 9, and the natural accuracy and adversarial accuracy of the second batch normalization layer can likewise be better, so that the multi-path image recognition model outperforms the single-path image recognition model shown in FIG. 9.
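The routing described here, with one batch normalization layer per noise intensity behind a shared convolution, might look like the following sketch; the three-intensity setup, layer sizes, and use of a PyTorch ModuleList are assumptions for illustration:

```python
import torch
import torch.nn as nn

class MultiBNBackbone(nn.Module):
    """Sketch of a multi-path training backbone: one shared convolution,
    one BatchNorm2d per noise intensity, so the statistics of clean and
    adversarial sample features stay disentangled during training."""
    def __init__(self, channels: int = 64, num_intensities: int = 3):
        super().__init__()
        self.shared_conv = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(channels) for _ in range(num_intensities))

    def forward(self, x: torch.Tensor, intensity_idx: int) -> torch.Tensor:
        feat = self.shared_conv(x)             # shared sample image features
        return self.bns[intensity_idx](feat)   # intensity-specific BN path

# After training, bns[0] (noise intensity 0) would serve as the first batch
# normalization layer, bns[-1] (maximum intensity) as the second, and the
# remaining batch normalization layers would be deleted.
```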
The path feature fusion module in the image recognition model can be trained by extracting the sample image features of a second sample image through the shared convolution layer in the image recognition model, acquiring first normalized sample image features and second normalized sample image features corresponding to the sample image features through the first batch normalization layer and the second batch normalization layer in the image recognition model, acquiring a first sample weight and a second sample weight corresponding to the sample image features through the path feature fusion module, determining a fusion feature of the second sample image based on the first normalized sample image features, the first sample weight, the second normalized sample image features, and the second sample weight, and determining the image category of the second sample image based on the fusion feature of the second sample image. Finally, the trained path feature fusion module can extract the image features of the image to be recognized and automatically produce a first weight for the first normalized image feature and a second weight for the second normalized image feature. In addition, because the first weight and the second weight are directly output by the trained path feature fusion module for an image to be recognized of any noise intensity, the applicability in an actual business scene is higher and the operation is simpler.
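Putting the pieces together, an end-to-end inference path might look like the following self-contained sketch; the gating head, layer sizes, and class names are illustrative assumptions rather than the patent's mandated design:

```python
import torch
import torch.nn as nn

class PathFusionRecognizer(nn.Module):
    def __init__(self, channels: int = 64, num_classes: int = 10):
        super().__init__()
        self.shared_conv = nn.Conv2d(3, channels, 3, padding=1)
        self.bn0 = nn.BatchNorm2d(channels)   # minimum-noise path
        self.bn1 = nn.BatchNorm2d(channels)   # maximum-noise path
        self.gate = nn.Sequential(            # assumed path feature fusion module
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 2), nn.Softmax(dim=1))
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, num_classes))

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feat = self.shared_conv(image)        # image features
        w = self.gate(feat)                   # first and second weights, sum to 1
        fused = (w[:, 0].view(-1, 1, 1, 1) * self.bn0(feat)
                 + w[:, 1].view(-1, 1, 1, 1) * self.bn1(feat))
        return self.classifier(fused)         # logits over image categories

# The same forward pass handles inputs of any noise intensity: the gate
# produces the weights automatically, with no manual tuning.
model = PathFusionRecognizer().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 32, 32))
    category = logits.argmax(dim=1)
```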
Further, it is to be noted that the present application also provides a computer-readable storage medium. The computer program executed by the aforementioned image recognition apparatus based on path fusion is stored in the computer-readable storage medium, and the computer program includes program instructions. When the processor executes the program instructions, the image recognition method based on path fusion described in the embodiments corresponding to FIG. 3 and/or FIG. 5 and/or FIG. 10 can be performed, and details are therefore not described here again. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer-readable storage medium referred to in the present application, reference is made to the description of the method embodiments of the present application. By way of example, the program instructions may be deployed to be executed on one computing device or on multiple computing devices at one site.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The computer-readable storage medium may be the image recognition apparatus based on path fusion provided in any of the foregoing embodiments or an internal storage unit of the electronic device, such as a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device. The computer-readable storage medium may further include a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
The terms "first", "second", and the like in the claims, in the description and in the drawings of the present invention are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described generally in terms of their functionality in the foregoing description. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present application, the disclosed apparatus and methods may also be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of units is only one type of logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not to be construed as limiting the scope of the present application; the present application is therefore not limited thereto, and all equivalent variations and modifications made within the scope of the claims still fall within the scope of the present application.

Claims (10)

1. An image recognition method based on path fusion, characterized in that the method comprises:
acquiring an image to be recognized, inputting the image to be recognized into an image recognition model, and extracting image characteristics of the image to be recognized through a shared convolution layer in the image recognition model;
acquiring first normalized image features corresponding to the image features through a first batch normalization layer in the image recognition model, and acquiring second normalized image features corresponding to the image features through a second batch normalization layer in the image recognition model, wherein the first batch normalization layer and the second batch normalization layer are respectively obtained by training with sample images of different noise intensities;
acquiring a first weight and a second weight corresponding to the image features through a path feature fusion module in the image recognition model, wherein the first weight indicates the weight assigned to the first normalized image features, the second weight indicates the weight assigned to the second normalized image features, and the sum of the first weight and the second weight is 1;
determining a fusion feature of the image to be recognized based on the first normalized image features, the first weight, the second normalized image features, and the second weight, and determining an image category of the image to be recognized based on the fusion feature.
2. The method of claim 1, wherein before inputting the image to be recognized into an image recognition model, the method further comprises:
acquiring at least three first sample images comprising at least three noise intensities, inputting each first sample image into the image recognition model, and extracting sample image characteristics of each first sample image through a shared convolution layer in the image recognition model;
training at least three batch normalization layers in the image recognition model according to the sample image features of the first sample images to obtain at least three trained batch normalization layers, wherein the sample image features of the sample images with one noise intensity are used for training one batch normalization layer;
and determining a first batch normalization layer and a second batch normalization layer of the image recognition model from the at least three batch normalization layers, wherein the first batch normalization layer is obtained by training with the sample image of the minimum noise intensity in the at least three sample images, and the second batch normalization layer is obtained by training with the sample image of the maximum noise intensity in the at least three sample images.
3. The method of claim 2, wherein after determining a first batch normalization layer and a second batch normalization layer of the image recognition model from the at least three batch normalization layers, the method further comprises:
deleting the rest of the at least three batch normalization layers except the first batch normalization layer and the second batch normalization layer to obtain the batch normalization layer of the image recognition model.
4. The method according to claim 2 or 3, wherein before the acquiring of the first weight and the second weight corresponding to the image features through the path feature fusion module in the image recognition model, the method further comprises:
acquiring a second sample image, inputting the second sample image into the image recognition model, and extracting sample image characteristics of the second sample image through a shared convolution layer in the image recognition model;
acquiring first normalized sample image features corresponding to the sample image features through the first batch normalization layer in the image recognition model, and acquiring second normalized sample image features corresponding to the sample image features through the second batch normalization layer in the image recognition model;
training the path feature fusion module based on the sample image features, the first normalized sample image features, the second normalized sample image features, and the image category label of the second sample image, so that, for any image feature, the path feature fusion module outputs a weight corresponding to the normalized image feature output by the first batch normalization layer and a weight corresponding to the normalized image feature output by the second batch normalization layer.
5. The method of claim 4, wherein training the road feature fusion module based on the sample image features, the first normalized sample image features, the second normalized sample image features, and the image class labels of the second sample image comprises:
acquiring a first sample weight and a second sample weight corresponding to the sample image features through a path feature fusion module in the image recognition model, wherein the first sample weight is used for marking the weight occupied by the first normalized sample image features, the second sample weight is used for marking the weight occupied by the second normalized sample image features, and the sum of the first sample weight and the second sample weight is 1;
determining a fusion feature of the second sample image based on the first normalized sample image feature, the first sample weight, the second normalized sample image feature, and the second sample weight, and determining an image class of the second sample image based on the fusion feature of the second sample image;
and calculating the classification loss of the image recognition model according to the image category of the second sample image and the image category label of the second sample image, and adjusting the network parameters of the path fusion feature module based on the classification loss so as to train the path feature fusion module.
6. The method of claim 5, wherein the determining the fusion feature of the image to be recognized based on the first normalized image features, the first weight, the second normalized image features, and the second weight comprises:
determining a first weighted image feature according to the first normalized image feature and the first weight, and determining a second weighted image feature according to the second normalized image feature and the second weight;
and generating the fusion feature of the image to be recognized according to the first weighted image feature and the second weighted image feature.
7. An image recognition apparatus based on path fusion, the apparatus comprising:
the first acquisition module is used for acquiring an image to be recognized, inputting the image to be recognized into an image recognition model, and extracting the image features of the image to be recognized through a shared convolution layer in the image recognition model;
a second acquisition module, configured to acquire first normalized image features corresponding to the image features through a first batch normalization layer in the image recognition model, and acquire second normalized image features corresponding to the image features through a second batch normalization layer in the image recognition model, wherein the first batch normalization layer and the second batch normalization layer are respectively obtained by training with sample images of different noise intensities;
a third acquisition module, configured to acquire, through a path feature fusion module in the image recognition model, a first weight and a second weight corresponding to the image features, wherein the first weight indicates the weight assigned to the first normalized image features, the second weight indicates the weight assigned to the second normalized image features, and the sum of the first weight and the second weight is 1;
a first determining module, configured to determine a fusion feature of the image to be recognized based on the first normalized image features, the first weight, the second normalized image features, and the second weight, and determine an image category of the image to be recognized based on the fusion feature.
8. The apparatus of claim 7, further comprising:
the fourth acquisition module is used for acquiring at least three first sample images comprising at least three noise intensities, inputting each first sample image into the image recognition model, and extracting the sample image characteristics of each first sample image through a shared convolution layer in the image recognition model;
the first training module is used for training at least three batch normalization layers in the image recognition model according to the sample image features of each first sample image to obtain at least three trained batch normalization layers, wherein the sample image features of the sample images with one noise intensity are used for training one batch normalization layer;
and the second determining module is used for determining a first batch normalization layer and a second batch normalization layer of the image recognition model from the at least three batch normalization layers, wherein the first batch normalization layer is obtained by training with the sample image of the minimum noise intensity in the at least three sample images, and the second batch normalization layer is obtained by training with the sample image of the maximum noise intensity in the at least three sample images.
9. A computer device, comprising: a processor, a transceiver, a memory, and a network interface;
the processor is connected to the transceiver, the memory, and the network interface, wherein the transceiver and the network interface are configured to provide data communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any one of claims 1-6.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, perform the method of any of claims 1-6.
CN202110262494.0A 2021-03-10 2021-03-10 Image recognition method and device based on path fusion and storage medium Pending CN115082873A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110262494.0A CN115082873A (en) 2021-03-10 2021-03-10 Image recognition method and device based on path fusion and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110262494.0A CN115082873A (en) 2021-03-10 2021-03-10 Image recognition method and device based on path fusion and storage medium

Publications (1)

Publication Number Publication Date
CN115082873A true CN115082873A (en) 2022-09-20

Family

ID=83241740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110262494.0A Pending CN115082873A (en) 2021-03-10 2021-03-10 Image recognition method and device based on path fusion and storage medium

Country Status (1)

Country Link
CN (1) CN115082873A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115550071A (en) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 Data processing method, device, storage medium and equipment
CN115550071B (en) * 2022-11-29 2023-04-07 支付宝(杭州)信息技术有限公司 Data processing method, device, storage medium and equipment

Similar Documents

Publication Publication Date Title
WO2022161286A1 (en) Image detection method, model training method, device, medium, and program product
CN111368943B (en) Method and device for identifying object in image, storage medium and electronic device
CN111461089A (en) Face detection method, and training method and device of face detection model
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
CN108229375B (en) Method and device for detecting face image
CN111444826A (en) Video detection method and device, storage medium and computer equipment
CN110516734B (en) Image matching method, device, equipment and storage medium
CN112801054A (en) Face recognition model processing method, face recognition method and device
CN115050064A (en) Face living body detection method, device, equipment and medium
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN114219971A (en) Data processing method, data processing equipment and computer readable storage medium
CN112308093B (en) Air quality perception method based on image recognition, model training method and system
CN111626212B (en) Method and device for identifying object in picture, storage medium and electronic device
CN113591603A (en) Certificate verification method and device, electronic equipment and storage medium
CN115082873A (en) Image recognition method and device based on path fusion and storage medium
CN113706550A (en) Image scene recognition and model training method and device and computer equipment
CN113255531B (en) Method and device for processing living body detection model, computer equipment and storage medium
CN115708135A (en) Face recognition model processing method, face recognition method and device
CN115905605A (en) Data processing method, data processing equipment and computer readable storage medium
CN114677611A (en) Data identification method, storage medium and device
CN114067394A (en) Face living body detection method and device, electronic equipment and storage medium
CN113569809A (en) Image processing method, device and computer readable storage medium
CN112749711A (en) Video acquisition method and device and storage medium
CN117079336B (en) Training method, device, equipment and storage medium for sample classification model
CN113011387B (en) Network training and human face living body detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination