CN113065634A - Image processing method, neural network training method and related equipment

Info

Publication number
CN113065634A
Authority
CN
China
Prior art keywords
information
category
similarity
feature
neural network
Legal status
Pending
Application number
CN202110218469.2A
Other languages
Chinese (zh)
Inventor
李傲雪
黄维然
谢凌曦
李震国
王立威
Current Assignee
Peking University
Huawei Technologies Co Ltd
Original Assignee
Peking University
Huawei Technologies Co Ltd
Application filed by Peking University and Huawei Technologies Co Ltd
Priority to CN202110218469.2A
Publication of CN113065634A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches, based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Abstract

The embodiments of the present application disclose an image processing method, a neural network training method, and related devices, applied to the field of image processing within the field of artificial intelligence. The method includes: updating each piece of first category information according to first similarity information to obtain N pieces of second category information, where the first similarity information indicates the similarity between the pieces of first category information and each piece of second category information indicates updated feature information of one category of support images; obtaining, through a first neural network, first feature information corresponding to a query image, updating the first feature information according to the second category information to obtain second feature information, and performing a feature processing operation according to the second feature information to obtain a prediction result; and training the first neural network according to a first loss function, where the first loss function indicates the similarity between the prediction result and an expected result. This reduces the probability that the second feature information overfits to a small number of images, and improves the accuracy of the prediction result output by the neural network as a whole.

Description

Image processing method, neural network training method and related equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method, a neural network training method, and a related device.
Background
Artificial intelligence (AI) is the simulation, extension, and expansion of human intelligence using a computer or a computer-controlled machine. Artificial intelligence covers the study of the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions. Currently, small sample learning (few-shot learning) for neural networks used in image processing is an active research direction in artificial intelligence.
Small sample learning refers to training a neural network on a new category using only a small number of labeled training images. Because the number of images of the new category is small, a neural network trained in this way easily overfits when used for feature extraction: the acquired features fit the few images of the new category rather than the new category as a whole, which in turn degrades the accuracy of the prediction result output by the whole neural network.
Therefore, a training method capable of improving the accuracy of the neural network is urgently needed.
Disclosure of Invention
The embodiments of the present application provide an image processing method, a neural network training method, and related devices, which reduce the probability that the second feature information overfits to a small number of images; that is, they help acquire feature information that correctly reflects the whole category, thereby improving the accuracy of the prediction result output by the whole neural network.
In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:
In a first aspect, an embodiment of the present application provides a training method for a neural network, which may be used in the field of image processing in the field of artificial intelligence. The method includes: the training device acquires a training image set, where the training image set includes N categories of support images and query images, and N is an integer greater than 1. The N categories of support images and the query images correspond to a meta-learning training mode; the query images may also be called test images, and meta-learning is a branch of small sample learning research. The training device acquires N pieces of first category information corresponding to the N categories of support images, and updates each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one category of support images before updating, the first similarity information indicates the similarity between the pieces of first category information among the N pieces, and one piece of second category information indicates updated feature information of one category of support images. The training device performs feature extraction on the query image through a first neural network to obtain at least one piece of first feature information corresponding to the query image, and updates the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the query image. The training device performs a feature processing operation through the first neural network according to the at least one piece of second feature information to obtain a prediction result corresponding to the query image. The training device trains the first neural network according to the expected result corresponding to the query image, the prediction result, and a first loss function until a preset condition is met, where the first loss function indicates the similarity between the prediction result corresponding to the query image and the expected result corresponding to the query image. A minimal sketch of one such training episode is given below.
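The following is a minimal PyTorch sketch of one meta-learning episode following the steps above. It is an illustration under assumptions, not the patent's implementation: `backbone`, `head`, `update_category_info`, and `update_feature_info` are hypothetical names, the first category information is taken as the class-mean feature of the support images, and classification with cross-entropy is assumed as the feature processing operation and first loss function (the two helper functions are sketched under the later implementation manners).

```python
import torch
import torch.nn.functional as F

def train_episode(backbone, head, optimizer, support_images, support_labels,
                  query_images, query_targets, n_classes):
    # First category information: one feature vector per category, here the
    # mean backbone feature of that category's support images (an assumption).
    support_feats = backbone(support_images)                    # [S, M]
    first_cat = torch.stack([support_feats[support_labels == c].mean(dim=0)
                             for c in range(n_classes)])        # [N, M]

    # Second category information: each first category information updated
    # with the others according to first similarity information.
    second_cat = update_category_info(first_cat)                # [N, M]

    # First feature information of the query images, then updated with the
    # N second category information to give second feature information.
    first_feat = backbone(query_images)                         # [Q, M]
    second_feat = update_feature_info(first_feat, second_cat)   # [Q, M]

    # Feature processing operation and first loss function (similarity between
    # prediction and expected result), assumed here to be classification
    # with cross-entropy.
    prediction = head(second_feat)                              # [Q, N]
    loss = F.cross_entropy(prediction, query_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```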
In this implementation, the number of support images of the target category (any one of the N categories) is small, so the first category information corresponding to the support images of the target category deviates considerably from the actual feature information of images of the entire target category; that is, the first category information corresponding to the target category cannot correctly reflect the actual features of images of the entire target category. After category information corresponding to the other categories is introduced into the first category information corresponding to the target category, this deviation can be reduced; that is, the second category information corresponding to the target category can more accurately reflect the actual features of images of the entire target category. Because the second feature information is obtained based on the second category information, the probability that the second feature information overfits to a small number of training images is reduced, which helps acquire feature information that correctly reflects the whole category, thereby improving the accuracy of the prediction result output by the whole neural network.
In one possible implementation manner of the first aspect, for the first category information corresponding to a target category among the N pieces of first category information, where the target category is any one of the N categories: the training device may acquire the similarity between the first category information corresponding to the target category and the first category information corresponding to each category other than the target category, and determine, according to that similarity, the weight of the first category information corresponding to each category other than the target category. The training device multiplies the first category information corresponding to each category other than the target category by its corresponding weight, and superimposes the multiplication results on the first category information corresponding to the target category, thereby obtaining the second category information corresponding to the target category. A sketch of this weighted update follows.
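A minimal sketch of this update, assuming softmax-normalized scaled dot products as the similarity-derived weights (the patent only requires weights determined from the similarity; the function name `update_category_info` is hypothetical):

```python
import torch

def update_category_info(first_cat: torch.Tensor) -> torch.Tensor:
    """first_cat: [N, M], one row of first category information per category.
    Returns the N pieces of second category information. Assumes N > 1."""
    m = first_cat.shape[1]
    sim = first_cat @ first_cat.t() / m ** 0.5       # first similarity information
    sim.fill_diagonal_(float('-inf'))                # weight only the other categories
    weights = torch.softmax(sim, dim=1)              # similarity-derived weights
    # Multiply the other categories' information by their weights and
    # superimpose the result on the target category's information.
    return first_cat + weights @ first_cat
```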
In a possible implementation manner of the first aspect, the updating, by the training device, of the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the query image includes: calculating second similarity information according to the N pieces of second category information and the at least one piece of first feature information, where the second similarity information indicates the similarity between each piece of second category information and each piece of first feature information; and updating each piece of first feature information according to the second similarity information to obtain at least one piece of second feature information in one-to-one correspondence with the at least one piece of first feature information.
In this implementation manner, the second similarity information between each piece of second category information and each piece of first feature information is calculated first; that is, the correlation between the second category information and the first feature information is computed, and the first feature information is updated based on that correlation. Although the N categories of support images correspond to N pieces of second category information, only one of the N pieces has the highest similarity with a particular piece of first feature information, and it is that piece of second category information that should be relied on most when updating the first feature information. Performing the updating operation based on the second similarity information therefore refines the process of generating the second feature information, so that second feature information that more accurately reflects the features of the whole category can be obtained, which helps improve the accuracy of the output result of the whole neural network. A sketch of this similarity-weighted update follows.
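A minimal sketch of the similarity-weighted feature update, assuming scaled dot products softmaxed over the N categories as the second similarity information and an additive update (`update_feature_info` is a hypothetical name):

```python
import torch

def update_feature_info(first_feat: torch.Tensor,
                        second_cat: torch.Tensor) -> torch.Tensor:
    """first_feat: [Q, M] first feature information for the query image(s);
    second_cat: [N, M] second category information.
    Returns [Q, M] second feature information."""
    m = first_feat.shape[1]
    sim = first_feat @ second_cat.t() / m ** 0.5   # [Q, N] second similarity info
    attn = torch.softmax(sim, dim=1)               # most similar category dominates
    return first_feat + attn @ second_cat          # updated feature information
```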
In a possible implementation manner of the first aspect, the feature extraction network of the first neural network includes M feature channels, and the second category information includes feature information corresponding to the M feature channels; specifically, the second category information may be expressed as a vector of M elements in one-to-one correspondence with the M feature channels. The first feature information likewise includes feature information corresponding to the M feature channels, and may also be expressed as a vector of M elements in one-to-one correspondence with the M feature channels. The second similarity information is used to indicate the similarity between the second category information and the first feature information in each of the M feature channels; that is, the second similarity information indicates the similarity between the target second category information and the target first feature information at the feature-channel level.
In this implementation, although the N categories of support images correspond to N pieces of second category information, for a specific channel of a specific piece of first feature information, only one channel of one piece of second category information has the highest similarity with it, and that channel of second category information is the one that should be relied on most when updating the corresponding channel of the first feature information. Performing the updating operation based on channel-level feature similarity therefore further improves the precision of generating the second feature information, so that category information that more accurately reflects the features of the whole category can be obtained, which helps improve the accuracy of the output result of the whole neural network.
In a possible implementation manner of the first aspect, the calculating, by the training device, of the second similarity information according to the N pieces of second category information and the at least one piece of first feature information includes: normalizing any one of the N pieces of second category information, normalizing any one of the at least one piece of first feature information, and multiplying the normalized second category information by the normalized first feature information element-wise to obtain the second similarity information. The sketch below illustrates this channel-level computation.
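A minimal sketch of the channel-level similarity, assuming L2 normalization and an element-wise product so that one similarity value is obtained per feature channel (`channel_similarity` and the gated update in the usage comment are hypothetical illustrations):

```python
import torch
import torch.nn.functional as F

def channel_similarity(first_feat: torch.Tensor,
                       second_cat: torch.Tensor) -> torch.Tensor:
    """first_feat, second_cat: [M] vectors, one element per feature channel.
    Returns an [M] vector of channel-level second similarity information."""
    return F.normalize(first_feat, dim=0) * F.normalize(second_cat, dim=0)

# Hypothetical usage: gate the per-channel update of the first feature
# information by the per-channel similarity.
# second_feat = first_feat + channel_similarity(first_feat, second_cat) * second_cat
```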
In a possible implementation manner of the first aspect, for the calculation of the second similarity information between any one of the N pieces of second category information and any one of the at least one piece of first feature information, the calculating, by the training device, of the second similarity information according to the N pieces of second category information and the at least one piece of first feature information includes: the training device directly performs a similarity calculation between the second category information as a whole and the first feature information as a whole to obtain the second similarity information, and the second similarity information indicates the similarity between the second category information and the first feature information at the whole-information level.
In this implementation manner, the second similarity information is calculated at the whole-information level, and the target first feature information is updated based on that whole-information-level similarity. Because the second similarity information fuses the features at the whole-information level, the similarity between the second category information and the first feature information can be measured more accurately, so that more suitable second category information is selected to update the first feature information, improving the accuracy of the generated second feature information. In addition, this provides another implementation of calculating the similarity information between the second category information and the first feature information, improving the implementation flexibility of the scheme. A sketch follows.
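A minimal sketch of the whole-information-level similarity, assuming cosine similarity between the full vectors (one scalar per pair of first feature information and second category information; the choice of cosine similarity is an assumption):

```python
import torch
import torch.nn.functional as F

def global_similarity(first_feat: torch.Tensor,
                      second_cat: torch.Tensor) -> torch.Tensor:
    """first_feat: [M], second_cat: [M].
    Returns a scalar: whole-information-level second similarity."""
    return torch.dot(F.normalize(first_feat, dim=0),
                     F.normalize(second_cat, dim=0))
```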
In one possible implementation manner of the first aspect, the method further includes: the training device updates the at least one piece of first feature information through the first neural network according to the N pieces of first category information to obtain at least one piece of third feature information corresponding to the query image, where the at least one piece of third feature information is in one-to-one correspondence with the at least one piece of second feature information; and the training device performs a fusion operation on the at least one piece of third feature information and the at least one piece of second feature information through the first neural network to obtain at least one piece of updated second feature information corresponding to the query image. The performing, by the training device, of the feature processing operation through the first neural network according to the at least one piece of second feature information includes: the training device performs the feature processing operation through the first neural network according to the at least one piece of updated second feature information.
In this implementation manner, the first feature information is updated according to the second category information to obtain the second feature information, and is also updated according to the first category information to obtain the third feature information; the third feature information is then fused with the second feature information to obtain the updated second feature information, on which the feature processing is performed. Because the second feature information is obtained based on the second category information (that is, to avoid overfitting of the trained neural network, the second feature information incorporates information of other categories), while the third feature information preserves the information of a single category as much as possible, the two are complementary. The trained neural network can therefore draw on richer information when performing the feature processing operation, improving the accuracy of the prediction result output by the neural network. A sketch of this fusion follows.
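A minimal sketch of the fusion, reusing the hypothetical `update_feature_info` above; the simple convex combination with weight `alpha` is an assumption (the patent only requires some fusion operation):

```python
import torch

def fused_second_feature(first_feat: torch.Tensor,
                         first_cat: torch.Tensor,
                         second_cat: torch.Tensor,
                         alpha: float = 0.5) -> torch.Tensor:
    # Second feature information: incorporates other-category information.
    second_feat = update_feature_info(first_feat, second_cat)
    # Third feature information: preserves single-category information.
    third_feat = update_feature_info(first_feat, first_cat)
    # Fusion operation yielding the updated second feature information.
    return alpha * second_feat + (1.0 - alpha) * third_feat
```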
In one possible implementation manner of the first aspect, the performing, by the training device through the first neural network, of the feature processing operation according to the at least one piece of second feature information to obtain a prediction result corresponding to the query image includes: the training device performs an object detection operation through the first neural network according to the at least one piece of second feature information to obtain a first prediction result corresponding to the query image, where the first prediction result indicates the predicted position of at least one detection frame in the query image and the predicted category corresponding to each detection frame; or the training device performs a classification operation through the first neural network according to the at least one piece of second feature information to obtain a second prediction result corresponding to the query image, where the second prediction result indicates the predicted category of the query image. The sketch below illustrates the two kinds of heads.
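A minimal sketch of the two alternative feature-processing heads, both consuming the second feature information; the single-linear-layer shapes are illustrative assumptions, not the patent's architecture:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Produces the second prediction result: a predicted category."""
    def __init__(self, m_channels: int, n_classes: int):
        super().__init__()
        self.cls = nn.Linear(m_channels, n_classes)

    def forward(self, second_feat: torch.Tensor) -> torch.Tensor:
        return self.cls(second_feat)          # [Q, N] category logits

class DetectionHead(nn.Module):
    """Produces the first prediction result: detection-frame positions
    and a predicted category per frame."""
    def __init__(self, m_channels: int, n_classes: int):
        super().__init__()
        self.box = nn.Linear(m_channels, 4)   # predicted frame position
        self.cls = nn.Linear(m_channels, n_classes)

    def forward(self, second_feat: torch.Tensor):
        return self.box(second_feat), self.cls(second_feat)
```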
In this implementation, the first neural network may be a neural network used for object detection and may also be a neural network used for image classification, so that the application scenario of the scheme is expanded, and the implementation flexibility of the scheme is improved.
In a second aspect, an embodiment of the present application provides an image processing method, which may be used in the field of image processing in the field of artificial intelligence. The method may include: the execution device acquires N pieces of second category information, where one piece of second category information indicates updated feature information of one category of images, the N pieces of second category information are obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates the feature information of one category of images, the first similarity information indicates the similarity between the pieces of first category information among the N pieces, and N is an integer greater than or equal to 1. The execution device performs feature extraction on the image to be processed through the first neural network to obtain at least one piece of first feature information corresponding to the image to be processed, and updates the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed. The execution device performs a feature processing operation through the first neural network according to the at least one piece of second feature information to obtain a prediction result corresponding to the image to be processed.
In one possible implementation manner of the second aspect, the acquiring, by the execution device, of the N pieces of second category information includes: the execution device generates the N pieces of second category information itself according to N categories of labeled support images, or receives the N pieces of second category information sent by the training device.
In the second aspect of the embodiment of the present application, the executing device may further perform steps performed by the training device in various possible implementation manners of the first aspect, and for specific implementation steps of the second aspect and the various possible implementation manners of the second aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in the first aspect and the various possible implementation manners of the first aspect, and details are not repeated here.
In a third aspect, an embodiment of the present application provides a training method for a neural network, which may be used in the field of image processing in the field of artificial intelligence. The method includes: the training device acquires a training image set, where the training image set includes a query image and N categories of support images, N is an integer greater than or equal to 1, and the N categories of support images and the query image correspond to a meta-learning training mode. The training device acquires N pieces of first category information corresponding to the N categories of support images, and updates each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one category of support images before updating, the first similarity information indicates the similarity between the N pieces of first category information, and one piece of second category information indicates updated feature information of one category of support images. The training device performs feature extraction on the query image through a second neural network to obtain first feature information corresponding to the query image, and calculates third similarity information indicating the similarity between the first feature information and each of the N pieces of second category information. The training device generates first indication information through the second neural network according to the third similarity information, where the first indication information is used to indicate the predicted category corresponding to the query image. The training device trains the second neural network according to second indication information corresponding to the query image, the first indication information, and a second loss function until a preset condition is met, where the second indication information is used to indicate the correct category corresponding to the query image and the second loss function indicates the similarity between the first indication information and the second indication information. A sketch of this classification-by-similarity step is given below.
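A minimal sketch of the third-aspect prediction and loss, assuming cosine similarity as the third similarity information and cross-entropy over the similarities as the second loss function (`second_nn` is a hypothetical feature extractor):

```python
import torch
import torch.nn.functional as F

def third_aspect_step(second_nn, query_image, second_cat, correct_label):
    """query_image: [1, ...]; second_cat: [N, M]; correct_label: [1]."""
    first_feat = second_nn(query_image)                      # [1, M]
    # Third similarity information: similarity between the first feature
    # information and each of the N second category information.
    third_sim = F.cosine_similarity(first_feat, second_cat)  # [N]
    first_indication = third_sim.argmax()                    # predicted category
    # Second loss function: drives the first indication information toward
    # the second indication information (the correct category).
    loss = F.cross_entropy(third_sim.unsqueeze(0), correct_label)
    return first_indication, loss
```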
In this embodiment of the application, because a small sample training mode is adopted, the number of support images for each of the N categories is small, and the N pieces of first category information obtained from the N categories of support images can hardly reflect the information of each category correctly. After category information corresponding to other categories is introduced into the first category information corresponding to the target category (any one of the N categories), the deviation between the first category information corresponding to the target category and the actual feature information of images of the entire target category can be reduced; that is, compared with the N pieces of first category information, the N pieces of second category information can more accurately reflect the actual features of images of the N categories. Because the predicted category of the query image is obtained by calculating the similarity between the feature information of the query image and the N pieces of category information corresponding to the N categories of images, improving the accuracy of the N pieces of category information also improves the accuracy of the predicted category of the query image output by the neural network.
In the third aspect of the embodiment of the present application, the training device may further perform steps performed by the training device in various possible implementation manners of the first aspect, and for specific implementation steps of the third aspect and the various possible implementation manners of the third aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in the first aspect and the various possible implementation manners of the first aspect, and details are not repeated here.
In a fourth aspect, an embodiment of the present application provides an image processing method, which may be used in the field of image processing in the field of artificial intelligence. The method may include: the execution device acquires N pieces of second category information, where one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates the feature information of one category of images, the first similarity information indicates the similarity between the pieces of first category information among the N pieces, and N is an integer greater than or equal to 1. The execution device performs feature extraction on the image to be processed through a second neural network to obtain first feature information corresponding to the image to be processed, and calculates third similarity information, where the third similarity information indicates the similarity between the first feature information and each of the N pieces of second category information. The execution device generates first indication information through the second neural network according to the third similarity information, where the first indication information is used to indicate the predicted category corresponding to the image to be processed.
In the fourth aspect of the embodiment of the present application, the execution device may further execute the steps executed by the execution device in various possible implementation manners of the second aspect, and for specific implementation steps of the fourth aspect and various possible implementation manners of the fourth aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in the second aspect and various possible implementation manners of the second aspect, and details are not repeated here.
In a fifth aspect, an embodiment of the present application provides a training apparatus for a neural network, which may be used in the field of image processing in the field of artificial intelligence. The apparatus includes: an acquisition module, configured to acquire a training image set, where the training image set includes N categories of support images and query images, N is an integer greater than 1, and the N categories of support images and the query images correspond to a meta-learning training mode; the acquisition module is further configured to acquire N pieces of first category information corresponding to the N categories of support images, and update each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one category of support images before updating, the first similarity information indicates the similarity between the pieces of first category information among the N pieces, and one piece of second category information indicates updated feature information of one category of support images; a generating module, configured to perform feature extraction on the query image through a first neural network to obtain at least one piece of first feature information corresponding to the query image, and update the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the query image; a processing module, configured to perform a feature processing operation through the first neural network according to the at least one piece of second feature information to obtain a prediction result corresponding to the query image; and a training module, configured to train the first neural network according to the expected result corresponding to the query image, the prediction result, and a first loss function until a preset condition is met, where the first loss function indicates the similarity between the prediction result and the expected result.
The training apparatus for a neural network provided in the fifth aspect of the embodiment of the present application may further perform steps performed by the training device in each possible implementation manner of the first aspect, and for specific implementation steps of the fifth aspect and each possible implementation manner of the fifth aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner of the first aspect, and details are not repeated here.
In a sixth aspect, an embodiment of the present application provides an image processing apparatus, which may be used in the field of image processing in the field of artificial intelligence, and includes: the acquisition module is used for acquiring N pieces of second category information, wherein one piece of second category information indicates updated characteristic information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates the characteristic information of one category of images, the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1; the generating module is used for extracting the features of the image to be processed through the first neural network to obtain at least one piece of first feature information corresponding to the image to be processed, and updating the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed; and the processing module is used for executing characteristic processing operation through the first neural network according to the at least one piece of second characteristic information so as to obtain a prediction result corresponding to the image to be processed.
The image processing apparatus provided in the sixth aspect of the embodiment of the present application may further perform steps performed by an execution device in each possible implementation manner of the second aspect, and for specific implementation steps of each possible implementation manner of the sixth aspect and the sixth aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner of the second aspect, and details are not repeated here.
In a seventh aspect, an embodiment of the present application provides a training apparatus for a neural network, which may be used in the field of image processing in the field of artificial intelligence. The apparatus includes: an acquisition module, configured to acquire a training image set, where the training image set includes a query image and N categories of support images, N is an integer greater than or equal to 1, and the N categories of support images and the query image correspond to a meta-learning training mode; the acquisition module is further configured to acquire N pieces of first category information corresponding to the N categories of support images, and update each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one category of support images before updating, the first similarity information indicates the similarity between the pieces of first category information among the N pieces, and one piece of second category information indicates updated feature information of one category of support images; a generating module, configured to perform feature extraction on the query image through a second neural network to obtain first feature information corresponding to the query image and calculate third similarity information, where the third similarity information indicates the similarity between the first feature information and each of the N pieces of second category information; a processing module, configured to generate first indication information through the second neural network according to the third similarity information, where the first indication information is used to indicate the predicted category corresponding to the query image; and a training module, configured to train the second neural network according to second indication information corresponding to the query image, the first indication information, and a second loss function until a preset condition is met, where the second indication information is used to indicate the correct category corresponding to the query image and the second loss function indicates the similarity between the first indication information and the second indication information.
The training apparatus for a neural network provided in the seventh aspect of the embodiment of the present application may further perform steps performed by the training device in each possible implementation manner of the third aspect, and for specific implementation steps of the seventh aspect and each possible implementation manner of the seventh aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner of the third aspect, and details are not repeated here.
In an eighth aspect, an embodiment of the present application provides an image processing apparatus, which can be used in the field of image processing in the field of artificial intelligence, and includes: the acquisition module is used for acquiring N pieces of second category information, wherein one piece of second category information indicates updated characteristic information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates the characteristic information of one category of images, the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1; the generating module is used for extracting features of the image to be processed through a second neural network to obtain first feature information corresponding to the image to be processed, and calculating third similarity information, wherein the third similarity information indicates the similarity between the first feature information and each piece of second category information in the N pieces of second category information; and the processing module is used for generating first indication information through the second neural network according to the third similarity information, and the first indication information is used for indicating a prediction category corresponding to the image to be processed.
The image processing apparatus provided in the eighth aspect of the embodiment of the present application may further perform steps performed by the execution device in each possible implementation manner of the fourth aspect, and for specific implementation steps of the eighth aspect and each possible implementation manner of the eighth aspect of the embodiment of the present application and beneficial effects brought by each possible implementation manner, reference may be made to descriptions in each possible implementation manner of the fourth aspect, and details are not repeated here.
In a ninth aspect, an embodiment of the present application provides a training device, which may include a processor and a memory coupled to the processor, the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the training method for a neural network according to the first aspect or the third aspect is implemented.
In a tenth aspect, an embodiment of the present application provides an execution device, which may include a processor and a memory coupled to the processor, the memory storing program instructions; when the program instructions stored in the memory are executed by the processor, the image processing method according to the second aspect or the fourth aspect is implemented.
In an eleventh aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the program runs on a computer, the program causes the computer to execute the method for training a neural network according to the first aspect or the third aspect, or the computer to execute the method for processing an image according to the second aspect or the fourth aspect.
In a twelfth aspect, the present embodiments provide a circuit system, where the circuit system includes a processing circuit configured to execute the method for training a neural network according to the first aspect or the third aspect, or the processing circuit is configured to execute the method for processing an image according to the second aspect or the fourth aspect.
In a thirteenth aspect, the present application provides a computer program, which when run on a computer, causes the computer to execute the method for training a neural network according to the first or third aspect, or causes the computer to execute the method for processing an image according to the second or fourth aspect.
In a fourteenth aspect, the present application provides a chip system, which includes a processor, and is configured to implement the functions recited in the above aspects, for example, sending or processing data and/or information recited in the above methods. In one possible design, the system-on-chip further includes a memory for storing program instructions and data necessary for the server or the communication device. The chip system may be formed by a chip, or may include a chip and other discrete devices.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework according to an embodiment of the present application;
FIG. 2 is a system architecture diagram of an image processing system according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of a training method for a neural network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of generating second category information in a training method for a neural network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of generating second feature information in a training method for a neural network according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of generating a first prediction result in a training method for a neural network according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of generating a first prediction result in a training method for a neural network according to an embodiment of the present application;
FIG. 8 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 9 is a schematic flowchart of another training method for a neural network according to an embodiment of the present application;
FIG. 10 is a schematic flowchart of another image processing method according to an embodiment of the present application;
FIG. 11 is a schematic flowchart of another training method for a neural network according to an embodiment of the present application;
FIG. 12 is a schematic flowchart of another image processing method according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a training apparatus for a neural network according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of another training apparatus for a neural network according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 16 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application;
FIG. 17 is a schematic structural diagram of another training apparatus for a neural network according to an embodiment of the present application;
FIG. 18 is a schematic structural diagram of another training apparatus for a neural network according to an embodiment of the present application;
FIG. 19 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present application;
FIG. 20 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 21 is a schematic structural diagram of a training device according to an embodiment of the present application;
FIG. 22 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The general workflow of the artificial intelligence system is described first. Referring to FIG. 1, which shows a schematic structural diagram of the artificial intelligence main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition onward, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (providing and processing technology) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and provides support through a base platform. Communication with the outside is achieved through sensors. Computing power is provided by intelligent chips, which may specifically be hardware acceleration chips such as a central processing unit (CPU), an embedded neural network processor (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), or a field programmable gate array (FPGA). The base platform includes related platform guarantees and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data is provided to the intelligent chips in the distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, and training on data.
Inference is the process of simulating human intelligent inference in a computer or intelligent system: according to an inference control strategy, the machine uses formalized information to reason about and solve problems. A typical function is searching and matching.
Decision-making refers to the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, commercialize intelligent information decision-making, and realize practical applications. The main application fields include smart terminals, smart manufacturing, smart transportation, smart homes, smart healthcare, smart security, autonomous driving, smart cities, and the like.
The embodiments of the present application can be applied to the field of image processing in the field of artificial intelligence, and more specifically to application scenarios in which a neural network for image processing is trained in a small sample mode. Training in a small sample mode includes meta-learning-based training methods, two-stage fine-tuning training methods, or other types of small sample training methods. It should be noted that two-stage fine-tuning is compatible with meta-learning: in each of the two stages of two-stage fine-tuning, the meta-learning training method can be used. The function of the neural network may specifically be object detection, image classification, or other operations performed on images, which are not exhaustively listed here.
As an example, in the field of smart manufacturing, the trained neural network may be applied in a scenario where failure detection is performed remotely on equipment in a plant. In this application scenario, a maintenance person remotely obtains an image of equipment in a factory and, according to the image, detects the positions of key modules in the equipment through the neural network so as to locate the fault. However, when the neural network is trained, only a few labeled training images of each equipment module of newly shipped equipment in a laboratory environment can be obtained, and a large number of labeled training images of each equipment module in an actual factory environment are lacking; the neural network used to detect the modules in the equipment therefore needs to be trained in a small sample fashion.
As another example, in the field of smart driving, high demands are placed on the performance of object detection algorithms, and some unusual (also referred to as out-of-distribution) situations can reduce the stability of the detection algorithms, making smart driving technology difficult to commercialize. For example, a small stop sign under an overpass may be misdetected as a pedestrian, causing the vehicle to brake suddenly; or a green belt may cover part of a pedestrian's body, so that the roadside pedestrian goes undetected. Such scenes are unusual, so only a small amount of labeled data for them can be obtained, and the neural network for detecting objects in the driving environment therefore needs to be trained in a small sample fashion.
It should be understood that the above examples are only for convenience of understanding the present solution and are not intended to limit it. In the foregoing scenarios, because small sample learning trains the neural network with a small number of labeled training images, the feature extraction capability of the trained neural network easily overfits to those few labeled training images and cannot accurately extract the feature information of the entire class, which affects the accuracy of the prediction result output by the whole neural network. For example, if in the training images corresponding to a driving scenario a green belt blocks part of each pedestrian's body and all the pedestrians wear red clothes, the trained neural network easily overfits and learns the feature "pedestrians wear red clothes"; but wearing red clothes is not a feature of "person" in general, so the accuracy of the prediction result output by the trained neural network is affected.
In order to solve the above problem, an embodiment of the present application provides a training method for a neural network. Before describing the training method in detail, the image processing system provided in the embodiments of the present application is described with reference to FIG. 2, which is a system architecture diagram of an image processing system according to an embodiment of the present application. In FIG. 2, the training system 200 of the neural network includes an execution device 210, a training device 220, a database 230, and a data storage system 240, and the execution device 210 includes a calculation module 211.
The database 230 stores a training image set. The training device 220 generates a target model/rule 201 and iteratively trains it using the training image set in the database 230 to obtain a mature target model/rule 201. The target model/rule 201 may be implemented by a neural network or by a non-neural-network model; in the embodiments of the present application, only the case where the target model/rule 201 is a neural network is described as an example.
Specifically, in each training pass, the training device 220 may obtain N pieces of second category information, where one piece of second category information indicates the updated feature information of one category of images. The second category information is obtained according to N pieces of first category information and first similarity information, where one piece of first category information indicates the feature information of one category of images, the first similarity information indicates the similarity between the pieces of first category information among the N pieces, and N is an integer greater than or equal to 1. The training device 220 performs feature extraction on a training image through the target model/rule 201 to obtain first feature information corresponding to the training image, updates the first feature information according to the N pieces of second category information to obtain second feature information corresponding to the training image, and performs a feature processing operation through the target model/rule 201 according to the second feature information to obtain a prediction result corresponding to the training image. Because the number of support images of a target category among the N categories is small, the first category information corresponding to the target category cannot correctly reflect the actual features of the whole target category of images. Introducing the category information of the other categories into the first category information corresponding to the target category reduces the deviation between that first category information and the actual feature information of the whole target category. Since the second feature information is obtained based on the second category information, the probability that the second feature information over-fits to the few training images is reduced, and the accuracy of the prediction result output by the entire neural network is improved.
After the training device 220 obtains the mature target model/rule 201, the target model/rule 201 is deployed to the execution device 210, and the calculation module 211 of the execution device 210 performs image processing through the target model/rule 201. The execution device 210 may be embodied in various systems or devices, such as a mobile phone, a tablet, a laptop, a VR device, a monitoring system, or a radar data processing system.
The execution device 210 may call data, code, etc. in the data storage system 240, or store data, instructions, etc. in the data storage system 240. The data storage system 240 may be disposed in the execution device 210 or the data storage system 240 may be an external memory with respect to the execution device 210.
In some embodiments of the present application, referring to fig. 2, a "user" may interact directly with the execution device 210; that is, the execution device 210 may directly display the prediction result output by the target model/rule 201 to the "user". It should be noted that fig. 2 is only an architectural schematic diagram of the image processing system provided in the embodiments of the present application, and the positional relationships among the devices, modules, and the like shown in the figure do not constitute any limitation. For example, in other embodiments of the present application, the execution device 210 and the client device may be separate devices: the execution device 210 is configured with an input/output (I/O) interface and exchanges data with the client device through the I/O interface.
In combination with the above description, the neural network trained by the training device 220 may be used for object detection or for image classification. The implementation flows of the two application scenarios differ, so the two scenarios are described separately below: first, the neural network used for object detection; second, the neural network used for image classification.
First, the neural network is used for object detection on images
In some embodiments of the present application, the neural network is used to perform object detection on an image, and a specific implementation flow of a training phase and an inference phase of the neural network provided in the embodiments of the present application is described below.
(I) training phase
In the embodiments of the present application, the training phase describes the process by which the training device 220 generates a mature neural network using the image data set in the database 230. Specifically, please refer to fig. 3; fig. 3 is a flowchart of a training method for a neural network provided in an embodiment of the present application, and the method may include the following steps:
301. the training device obtains a training image set, wherein the training image set comprises N types of supporting images and a query image.
In the embodiments of the application, the training device is preconfigured with training data of a first neural network, which includes a plurality of images and an expected result corresponding to each image. Since the first neural network performs object detection on images, the expected result indicates the correct position of each of the one or more detection frames corresponding to an image and the correct category corresponding to each detection frame.
Before performing a training operation on the first neural network in a meta-learning mode, the training device needs to acquire a training image set corresponding to one meta-task. Specifically, the training device may select N categories of support images and at least one query image from the training data of the first neural network according to the expected result corresponding to each image in the training data (i.e., select the training image set), and obtain the expected result corresponding to each image in the training image set. A query image may also be referred to as a test image; N is an integer greater than 1, for example 2, 3, or 4. Each of the N categories of support images includes one or more support images, and the expected result corresponding to each support image indicates the position of one detection frame and the correct category corresponding to that detection frame.
It should be noted that, although the expected result corresponding to each image in the training data of the first neural network may indicate the positions of one or more detection frames, the expected result corresponding to each support image indicates the position of only one detection frame. As an example, suppose the N categories include cat, rabbit, car, and dog, and objects of the two categories "cat" and "rabbit" exist simultaneously in one target image of the training data; the expected result corresponding to the target image in the training data then indicates the positions of two detection frames whose correct categories are "cat" and "rabbit" respectively. If the training device selects the target image as a support image of the category "rabbit", the training device obtains from that expected result only the position of the detection frame corresponding to the category "rabbit".
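For convenience of understanding, a simplified sketch in Python of assembling one such meta-task is shown below. The annotation layout and the helper name sample_meta_task are illustrative assumptions and do not come from the present embodiment.

```python
# A simplified sketch of assembling one meta-task; the annotation layout and
# the helper name sample_meta_task are assumptions made for the example.
import random
from collections import defaultdict

def sample_meta_task(annotations, n_way=3, k_shot=1, n_query=1):
    """annotations: list of (image_id, [(box, category), ...]) pairs."""
    by_category = defaultdict(list)
    for image_id, boxes in annotations:
        for box, category in boxes:
            # keep only the detection frame of this category, as in the
            # "rabbit" example above: one (image, single box) support record
            by_category[category].append((image_id, box))
    categories = random.sample(sorted(by_category), n_way)
    support = {c: random.sample(by_category[c], k_shot) for c in categories}
    query = random.sample(
        [a for a in annotations
         if any(cat in categories for _, cat in a[1])],
        n_query)
    return support, query
```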
302. The training equipment acquires N pieces of first category information corresponding to the N types of support images, wherein the first category information indicates feature information of one type of support images before updating.
In the embodiment of the application, after the training device acquires the N types of support images and the expected result corresponding to each support image, the training device needs to acquire N pieces of first-class information corresponding to the N types of support images, and one piece of first-class information indicates feature information of one type of support image before updating.
Specifically, the support images of a target category among the N categories include one or more support images. The training device performs feature extraction on each support image of the target category through the neural network according to the expected result corresponding to that support image, obtaining the feature information of each support image of the target category, and then obtains the first category information corresponding to the target category according to the feature information of the one or more support images of that category. The training device performs this operation for each of the N categories of support images, thereby obtaining the N pieces of first category information corresponding to the N categories of support images.
303. The training equipment updates each piece of first category information according to the N pieces of first category information and the first similarity information to obtain N pieces of second category information, wherein the first similarity information indicates the similarity between the pieces of first category information in the N pieces of first category information, and one piece of second category information indicates the updated feature information of one type of supporting image.
In the embodiment of the application, after acquiring N pieces of first category information corresponding to N types of support images, the training device may calculate first similarity information; after the first similarity information is obtained through calculation, each piece of first category information is updated according to the N pieces of first category information and the first similarity information, so that N pieces of second category information are obtained.
The first similarity information indicates the similarity between each of the N pieces of first category information, and the first similarity information may be specifically expressed as a matrix, an array, an index, a table or other types of data; further, if the first similarity information is represented as an N-by-N matrix, each row in the matrix represents the similarity between one first category information and the other first category information in the N first category information. A second category of information indicates updated feature information for a type of support image.
Specifically, regarding the calculation of the similarity between two pieces of first category information: in one implementation, a similarity calculation function may be preconfigured in the training device, and the training device may calculate the similarity between every two pieces of first category information among the N pieces one by one through the preconfigured function to generate the first similarity information. The similarity calculation function includes, but is not limited to, cosine similarity, Euclidean distance, Mahalanobis distance, Manhattan distance, or other functions for calculating similarity, which are not exhaustively listed here.
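As an illustration of the first implementation, a minimal sketch in Python follows, computing the first similarity information as an N-by-N matrix of cosine similarities; the function name and tensor shapes are assumptions made for the example.

```python
# A minimal sketch, assuming PyTorch: the first similarity information as an
# N-by-N matrix of cosine similarities between N pieces of first category
# information, each an M-dimensional vector.
import torch
import torch.nn.functional as F

def first_similarity_matrix(h: torch.Tensor) -> torch.Tensor:
    """h: [N, M], one row per piece of first category information.
    Returns [N, N], whose (i, k) entry is the cosine similarity
    between category i and category k."""
    h_norm = F.normalize(h, dim=1)  # scale each row to unit length
    return h_norm @ h_norm.t()      # pairwise cosine similarities

# Usage with N = 5 categories and M = 256 feature channels:
# sim = first_similarity_matrix(torch.randn(5, 256))
```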
In another implementation manner, a neural network with adjustable weight parameters may be preconfigured in the training device, and the training device calculates similarity between every two pieces of first-class information in the N pieces of first-class information one by using the neural network to generate first similarity information.
Regarding the process of generating the second category information: the training device updates each piece of first category information, through a generation module in the feature update network, according to each piece of first category information and the first similarity information, thereby generating the N pieces of second category information corresponding to the N categories of support images.
Specifically, consider the first category information corresponding to a target category among the N pieces of first category information, where the target category is any one of the N categories. The training device obtains the similarity between the first category information corresponding to the target category and the first category information corresponding to each category other than the target category, and determines, according to these similarities, the weight of the first category information corresponding to each other category. The training device then multiplies the first category information corresponding to each other category by its weight and superimposes the multiplication results onto the first category information corresponding to the target category, thereby obtaining the second category information corresponding to the target category. The training device performs the foregoing operation on the first category information corresponding to each of the N categories to obtain the second category information (i.e., the updated category information) corresponding to each of the N categories.
More specifically, the training device may directly use the similarity between the first category information corresponding to the other categories and the first category information corresponding to the target category as the weight, or may further process the similarity between the first category information corresponding to the other categories and the first category information corresponding to the target category to obtain the weight, where the further processing may be normalization processing, multiplication with a preset value, other processing, or the like.
As an example, suppose N is 5 and the 5 pieces of first category information corresponding to the N categories are first category information 1 through first category information 5. Suppose the similarity between first category information 1 and first category information 2 is 0.3, between 1 and 3 is 0.2, between 1 and 4 is 0.2, and between 1 and 5 is 0.3. The updated feature of first category information 1 (i.e., the second category information corresponding to it) is then obtained by superimposing onto first category information 1 the feature of first category information 2 multiplied by 0.3, that of first category information 3 multiplied by 0.2, that of first category information 4 multiplied by 0.2, and that of first category information 5 multiplied by 0.3. Repeating this operation for first category information 2 through 5 yields the updated feature of each piece of first category information. It should be understood that the foregoing example is merely provided to facilitate understanding of the present solution.
To further understand the present solution, an example of a formula for generating a second category of information is disclosed below:
$$g_i = \sum_{k \neq i} \phi(h_i, h_k)\,\psi(h_k) + h_i \qquad (1)$$

wherein $g_i$ represents one of the N pieces of second category information, $h_i$ represents one of the N pieces of first category information, $h_k$ represents a piece of first category information other than $h_i$ among the N pieces, $\phi(\cdot,\cdot)$ is a similarity calculation function for measuring the similarity between its two input elements, and $\psi(\cdot)$ is a transformation applied to $h_k$. It should be understood that equation (1) is only an example and is not intended to limit the present solution.
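For convenience of understanding, a hedged sketch of equation (1) in Python follows. The cosine form chosen for $\phi(\cdot,\cdot)$ and the linear layer standing in for $\psi(\cdot)$ are assumptions; the text above leaves both functions open.

```python
# A hedged sketch of equation (1), assuming PyTorch; phi and psi are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoryUpdater(nn.Module):
    def __init__(self, m: int):
        super().__init__()
        self.psi = nn.Linear(m, m)  # assumed transformation psi(.)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        """h: [N, M] first category information -> [N, M] second category information."""
        h_norm = F.normalize(h, dim=1)
        phi = h_norm @ h_norm.t()                # phi(h_i, h_k): cosine similarity
        phi = phi - torch.diag(torch.diag(phi))  # restrict the sum to k != i
        return phi @ self.psi(h) + h             # g_i = sum_k phi(h_i, h_k) psi(h_k) + h_i
```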
For a more intuitive understanding of the present solution, please refer to fig. 4, which is a schematic diagram of generating the second category information in the training method of a neural network according to an embodiment of the present application. In fig. 4, taking N = 5 as an example, the 5 pieces of first category information corresponding to the 5 categories are first category information 1 through first category information 5. As shown in fig. 4, the training device calculates the similarity between any two of the 5 pieces of first category information through a similarity calculation function; fig. 4 takes as an example the calculation of the similarity between each of first category information 1 to 4 and first category information 5, the four obtained similarity scores being 0.3, 0.2, 0.2, and 0.3. The training device uses the similarity between each of first category information 1 to 4 and first category information 5 as the weight of that piece of first category information, and superimposes first category information 1 to 4 onto first category information 5 by weighted summation to obtain second category information 5. The training device performs the same operations for first category information 1 to 4 to obtain second category information 1 to 4. It should be understood that the example in fig. 4 is only for convenience of understanding the present solution and is not intended to limit it.
304. The training equipment performs feature extraction on the query image through the first neural network to obtain first feature information corresponding to the query image.
In the embodiments of the application, the training device inputs the query image into the first neural network to extract features of the query image through the feature extraction network in the first neural network, obtaining one or more feature maps corresponding to the query image. If the feature extraction network in the first neural network includes M feature channels, M feature maps in one-to-one correspondence with the M feature channels are generated through the feature extraction network, where M is an integer greater than or equal to 1. Each feature map may be expressed as a matrix, and the one or more feature maps corresponding to the query image may be expressed as a matrix or as a three-dimensional tensor. Further, the feature extraction network in the first neural network may adopt a convolutional neural network, for example, the residual neural network ResNet-101, the residual neural network ResNet-50, the convolutional neural network VGG, or another neural network.
The training device processes the feature maps through a region proposal network (RPN) in the first neural network to obtain C candidate frames corresponding to the query image; a candidate frame may also be referred to as a candidate region, a region of interest (ROI), or by other names. The training device crops the feature maps corresponding to the query image according to the C candidate frames through the first neural network, thereby obtaining C pieces of first feature information in one-to-one correspondence with the C candidate frames. Each of the C pieces of first feature information includes feature information corresponding to the M feature channels and may be specifically expressed as a vector of M elements in one-to-one correspondence with the M feature channels.
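A hedged sketch of this step follows, using a generic backbone and roi_align from torchvision; the ResNet-50 backbone, the externally supplied candidate boxes (standing in for the RPN output), and the mean pooling are assumptions made for the example.

```python
# A hedged sketch of step 304, assuming PyTorch and torchvision.
import torch
from torchvision.models import resnet50
from torchvision.ops import roi_align

# feature extraction network: ResNet-50 without its pooling/classifier head,
# producing a [B, 2048, H/32, W/32] feature map (M = 2048 feature channels)
backbone = torch.nn.Sequential(*list(resnet50().children())[:-2])

def first_feature_information(query_image: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """query_image: [1, 3, H, W]; boxes: [C, 4] as (x1, y1, x2, y2) in image
    coordinates. Returns [C, M] first feature information, one row per box."""
    fmap = backbone(query_image)
    rois = torch.cat([torch.zeros(len(boxes), 1), boxes], dim=1)  # prepend batch index 0
    crops = roi_align(fmap, rois, output_size=(7, 7), spatial_scale=1 / 32)
    return crops.mean(dim=(2, 3))  # pool each [M, 7, 7] crop into an M-vector
```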
305. And the training equipment updates the first characteristic information through the first neural network according to the second category information so as to obtain second characteristic information corresponding to the query image.
In this embodiment of the application, after obtaining N second category information and C first feature information, the training device may update each first feature information by using the N second category information to obtain C times N second feature information corresponding to the query image.
Specifically, consider target first feature information among the C pieces of first feature information and target second category information among the N pieces of second category information, where the target first feature information is any one of the C pieces and the target second category information is any one of the N pieces. In one implementation, the training device calculates second similarity information between the target second category information and the target first feature information, and updates the target first feature information according to the second similarity information to obtain one piece of second feature information corresponding to the target first feature information.
It should be noted that if the foregoing operation is performed between the target first feature information and each of the N pieces of second category information, N pieces of second feature information corresponding to the target first feature information are obtained. Since each of the C pieces of first feature information undergoes the operation that the training device performs on the target first feature information, the training device generates N pieces of second feature information for each piece of first feature information, thereby obtaining C times N pieces of second feature information corresponding to the query image in total.
In the embodiments of the application, the second similarity information between each piece of second category information and the first feature information is calculated first, that is, the correlation between the two is measured, and the first feature information is updated based on that correlation. Since the N categories of support images correspond to N pieces of second category information but only one of them has the highest similarity with a given piece of first feature information, the second category information with the highest similarity should contribute most to updating that first feature information. Performing the update based on the second similarity information therefore refines the process of generating the second feature information, yields second feature information that more accurately reflects the features of the whole category, and helps improve the accuracy of the output of the entire neural network.
Further, in one case, the first category information is obtained through the feature extraction network in the first neural network; each of the N pieces of first category information includes feature information corresponding to the M feature channels, and each piece of second category information likewise includes feature information corresponding to the M feature channels and may be expressed as a vector of M elements in one-to-one correspondence with the M feature channels. The second similarity information indicates the similarity between the target second category information and the target first feature information in each of the M feature channels, that is, at the feature-channel level.
In the embodiments of the present application, since the N categories of support images correspond to N pieces of second category information, for a specific channel of a specific piece of first feature information, only the corresponding channel of one of the N pieces of second category information has the highest similarity with it, and that channel should contribute most to updating the channel of the first feature information. Performing the update based on channel-level feature similarity therefore further improves the accuracy of generating the second feature information, yields second feature information that more accurately reflects the features of the whole category, and helps improve the accuracy of the output of the entire neural network.
More specifically, in one implementation of calculating the second similarity information in this case, the training device normalizes both the target second category information and the target first feature information and multiplies the two normalized quantities, thereby obtaining the similarity between the target second category information and the target first feature information in each of the M feature channels (i.e., the second similarity information).
In another implementation manner, the training device performs normalization processing on the target second category information and the target first feature information, subtracts the normalized target second category information and the normalized target first feature information, and then performs normalization processing to obtain the similarity (i.e., the second similarity information) between each feature channel of the target second category information and each feature channel of the target first feature information in the M feature channels.
In another implementation manner, the training device performs normalization processing on the target second category information and the target first feature information, splices the target second category information subjected to normalization processing and the target first feature information subjected to normalization processing, and inputs the spliced target second category information and the target first feature information into the fully-connected network to obtain the second similarity information generated by the fully-connected network.
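The three implementations above may be sketched as follows, where g is one piece of target second category information and r one piece of target first feature information, both [M] vectors; sigmoid serves as the normalization (as in fig. 5 below), and the fully connected layer sizes are assumptions.

```python
# A sketch of the three channel-level second-similarity implementations,
# assuming PyTorch; M and the fully connected sizes are assumptions.
import torch
import torch.nn as nn

M = 256  # assumed number of feature channels
fully_connected = nn.Sequential(nn.Linear(2 * M, M), nn.Sigmoid())

def sim_multiply(g, r):
    return torch.sigmoid(g) * torch.sigmoid(r)                 # first implementation

def sim_subtract(g, r):
    return torch.sigmoid(torch.sigmoid(g) - torch.sigmoid(r))  # second implementation

def sim_fully_connected(g, r):
    return fully_connected(torch.cat([torch.sigmoid(g), torch.sigmoid(r)]))  # third implementation
```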
In another case, the second similarity information indicates the similarity between the second category information and the first feature information at the whole-information level: the training device directly performs similarity calculation on the target second category information and the target first feature information to obtain the second similarity information. In the embodiments of the application, calculating whole-information-level second similarity information and updating the target first feature information based on it allows the similarity between the second category information and the first feature information to be measured more accurately, since the second similarity information fuses features at the whole-information level; more appropriate second category information is thus selected to update the first feature information, improving the accuracy of the generated second feature information. In addition, this provides another implementation for calculating the similarity information between the second category information and the first feature information, improving the implementation flexibility of the solution.
Further, in an implementation manner, the training device may perform similarity calculation on the target second category information and the target first feature information according to a preconfigured similarity calculation function to obtain second similarity information between the target second category information and the target first feature information, where the similarity calculation function includes, but is not limited to, cosine similarity, euclidean distance, mahalanobis distance, manhattan distance, or other functions used for calculating similarity, and the like, which is not exhaustive here.
In another implementation, the training device calculates a similarity between the target second category information and the target first feature information using a neural network to generate second similarity information between the target second category information and the target first feature information.
Regarding the process of generating the second feature information from the second similarity information and the first feature information: in one implementation, after obtaining the second similarity information between the target second category information and the target first feature information, the training device may directly multiply the second similarity information by the target first feature information to obtain one piece of second feature information corresponding to the target first feature information.
In another implementation manner, the training device may also determine a weight of the target second category information according to second similarity information between the target second category information and the target first feature information, and perform weighted summation on the target second category information and the target first feature information to obtain one piece of second feature information corresponding to the target first feature information. The training device may also calculate a second feature information corresponding to the target first feature information in other manners, which are not exhaustive here.
To further understand the present solution, an example of a formula for generating the second characteristic information is disclosed as follows:
$$A_T(r_j, g_i) = \bigl(\sigma(r_j) \otimes \sigma(g_i)\bigr) \otimes r_j \qquad (2)$$

wherein $r_j$ represents the j-th piece of first feature information among the C pieces, $g_i$ represents the i-th piece of second category information among the N pieces, $A_T(r_j, g_i)$ represents the second feature information corresponding to the j-th first feature information generated according to the i-th second category information, $\sigma(\cdot)$ represents the sigmoid function used for normalization, $\otimes$ represents multiplication, and $\sigma(r_j) \otimes \sigma(g_i)$ represents the similarity between the j-th first feature information and the i-th second category information. It should be understood that equation (2) is only an example shown for convenience of understanding the present solution and is not intended to limit the present solution.
For a more intuitive understanding of the present solution, please refer to fig. 5, which is a schematic diagram of generating the second feature information in the training method of a neural network according to an embodiment of the present application. A1 represents any one of the N pieces of second category information, A2 represents any one of the C pieces of first feature information, and A3 represents one of the C times N pieces of second feature information. As shown in fig. 5, the training device normalizes A1 and A2 with a sigmoid function, multiplies the normalized A1 and the normalized A2 to obtain the second similarity information between A1 and A2, and multiplies A2 by this second similarity information to obtain the second feature information corresponding to A2 (i.e., A3 in fig. 5). It should be understood that the example in fig. 5 is only for convenience of understanding the present solution and is not intended to limit it.
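A minimal sketch of equation (2) and fig. 5 follows; the vectorized helper producing all C times N results at once is an assumed convenience, not part of the present embodiment.

```python
# A minimal sketch of equation (2), assuming PyTorch: sigmoid-normalize g_i
# and r_j, multiply them channel-wise to obtain the second similarity
# information, then multiply it back onto r_j.
import torch

def a_t(r_j: torch.Tensor, g_i: torch.Tensor) -> torch.Tensor:
    """r_j, g_i: [M] -> one piece of second feature information, [M]."""
    sim = torch.sigmoid(r_j) * torch.sigmoid(g_i)  # second similarity information
    return sim * r_j

def a_t_all(r: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    """r: [C, M], g: [N, M] -> [C, N, M] second feature information."""
    sim = torch.sigmoid(r)[:, None, :] * torch.sigmoid(g)[None, :, :]
    return sim * r[:, None, :]
```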
In another implementation, after the training device obtains the C pieces of first feature information and the N pieces of second category information, for the target first feature information and the target second category information, the training device may instead skip calculating the second similarity information and directly concatenate the two, thereby completing the update of the target first feature information with the target second category information and obtaining one piece of second feature information corresponding to the target first feature information.
In another implementation manner, the training device may further perform a weighted summation on the target first feature information and the target second category information to obtain one second feature information corresponding to the target first feature information. The training device may also use other ways to complete the update of the target first feature information by using the target second category information, which is not exhaustive here.
306. And the training equipment updates the first characteristic information through the first neural network according to the first category information so as to obtain third characteristic information corresponding to the query image.
In some embodiments of the application, after the training device obtains the C pieces of first feature information, it may update each of them through the first neural network according to the N pieces of first category information, so as to obtain C times N pieces of third feature information corresponding to the query image. The specific implementation of step 306 is similar to that of step 305, except that the second category information in step 305 is replaced with the first category information in step 306, and step 305 yields C times N pieces of second feature information while step 306 yields C times N pieces of third feature information; for details, refer to the description of step 305, which is not repeated here.
To further understand the present solution, an example of a formula for generating the third characteristic information is disclosed as follows:
$$A_C(r_j, h_i) = \mathrm{FCN}(r_j - h_i) \oplus r_j \qquad (3)$$

wherein $r_j$ represents the j-th piece of first feature information among the C pieces, $h_i$ represents the i-th piece of first category information among the N pieces, $A_C(r_j, h_i)$ represents the third feature information corresponding to the j-th first feature information generated based on the i-th first category information, $\oplus$ represents addition, $-$ represents subtraction, and $\mathrm{FCN}(\cdot)$ represents a fully connected network. It should be understood that equation (3) is only an example shown for convenience of understanding the present solution and is not intended to limit the present solution.
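A heavily hedged sketch of equation (3) as reconstructed above follows; the hidden sizes, the activation, and the exact placement of the subtraction relative to FCN(·) are assumptions.

```python
# A heavily hedged sketch of equation (3), assuming PyTorch: a fully
# connected network maps the difference r_j - h_i to a residual that is
# added back onto r_j. Sizes and activation are assumptions.
import torch.nn as nn

class ACModule(nn.Module):
    def __init__(self, m: int = 256):
        super().__init__()
        self.fcn = nn.Sequential(nn.Linear(m, m), nn.ReLU(), nn.Linear(m, m))

    def forward(self, r_j, h_i):
        return self.fcn(r_j - h_i) + r_j  # FCN(r_j - h_i) (+) r_j
```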
It should be noted that the embodiments of the present application do not limit the execution order of step 305 and step 306: step 305 may be executed first and then step 306, or step 306 first and then step 305.
307. And the training equipment performs fusion operation on the third characteristic information and the second characteristic information through the first neural network to obtain updated second characteristic information corresponding to the query image.
In some embodiments of the application, after the training device obtains the C times N pieces of second feature information and the C times N pieces of third feature information, since the N pieces of first category information and the N pieces of second category information are in one-to-one correspondence, the C times N pieces of second feature information and the C times N pieces of third feature information are also in one-to-one correspondence. The third feature information and the second feature information may then be fused through the first neural network according to this correspondence, obtaining C times N pieces of updated second feature information corresponding to the query image. The fusion operation includes, but is not limited to, concatenation, addition, subtraction, or other fusion methods, which are not exhaustively listed here.
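A minimal sketch of this fusion, assuming element-wise addition as the fusion operation, is shown below; with concatenation, the channel dimension would double instead.

```python
# A minimal sketch of the fusion in step 307, assuming element-wise addition.
import torch

def fuse(second_feat: torch.Tensor, third_feat: torch.Tensor) -> torch.Tensor:
    """second_feat, third_feat: [C, N, M] -> [C, N, M] updated second feature information."""
    return second_feat + third_feat
```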
308. The training equipment executes object detection operation through the first neural network according to the second characteristic information so as to obtain a first prediction result corresponding to the query image, wherein the first prediction result indicates the prediction position of at least one detection frame in the query image and the prediction category corresponding to each detection frame.
In some embodiments of the present application, steps 306 and 307 are optional. If steps 306 and 307 are executed, step 308 may include: the training device performs an object detection operation through the feature processing network in the first neural network according to the C times N pieces of updated second feature information generated in step 307, to obtain the first prediction result corresponding to the query image (i.e., the first prediction result output by the entire first neural network), where the first prediction result indicates the predicted position of at least one detection frame in the query image and the prediction category corresponding to each detection frame. The feature processing network in the first neural network may be implemented by a region-based convolutional neural network (RCNN), a fast region-based convolutional neural network (Fast RCNN), or another neural network.
For a more intuitive understanding of the present solution, please refer to fig. 6, which is a schematic flowchart of generating the first prediction result in the training method of a neural network according to an embodiment of the present application. As shown in fig. 6, the training device acquires N categories of support images and one query image, and then:
B1. the training device acquires N pieces of first category information corresponding to the N categories of support images;
B2. the training device calculates the first similarity information according to the N pieces of first category information, the first similarity information indicating the similarity between any two of the N pieces of first category information;
B3. the training device generates N pieces of second category information according to the first similarity information and the N pieces of first category information;
B4. the training device performs feature extraction on the query image through the first neural network to obtain C pieces of first feature information corresponding to the query image;
B5. the training device updates each piece of first feature information according to the N pieces of second category information through the first neural network to obtain N times C pieces of second feature information corresponding to the query image;
B6. the training device updates each piece of first feature information according to the N pieces of first category information through the first neural network to obtain N times C pieces of third feature information corresponding to the query image;
B7. the training device fuses the N times C pieces of second feature information and the N times C pieces of third feature information through the first neural network to obtain N times C pieces of updated second feature information;
B8. the training device outputs, through the first neural network and according to the N times C pieces of updated second feature information, the first prediction result corresponding to the query image, which indicates at least one detection frame corresponding to the query image and the prediction category of each detection frame.
It should be understood that the example in fig. 6 is only for convenience of understanding the present solution and is not intended to limit it.
In the embodiments of the application, the first feature information is updated according to the second category information to obtain the second feature information, and is also updated according to the first category information to obtain the third feature information; the third feature information and the second feature information are then fused to obtain the updated second feature information, on which feature processing is performed. Because the second feature information is obtained based on the second category information (that is, it incorporates information of other categories in order to avoid over-fitting of the trained neural network), while the third feature information preserves the information of a single category as much as possible, the two are complementary. The trained neural network can therefore synthesize richer information when performing the feature processing operation, improving the accuracy of the prediction result it outputs.
If steps 306 and 307 are not performed, step 308 may include: the training apparatus performs an object detection operation by the feature processing network in the first neural network according to the C times N second feature information generated by step 305 to obtain a first prediction result corresponding to the query image (i.e., a first prediction result output by the entire first neural network).
For a more intuitive understanding of the present solution, please refer to fig. 7, which is a schematic flowchart of generating the first prediction result in the training method of a neural network according to an embodiment of the present application. In fig. 7, taking N = 3 as an example, the three categories of support images are support images of a dog, a cat, and a car. The training device performs feature extraction on the three categories of support images through the feature extraction network of the first neural network to obtain three pieces of first category information, one piece representing the features of each category of images. Fig. 7 illustrates only the process of generating the second category information corresponding to the category "dog": the training device calculates the similarity between the first category information of "dog" and that of "cat" to obtain similarity 1, and between the first category information of "dog" and that of "car" to obtain similarity 2. It then multiplies the first category information of "cat" by similarity 1 to obtain product 1, multiplies the first category information of "car" by similarity 2 to obtain product 2, and sums the first category information of "dog", product 1, and product 2 to obtain the second category information corresponding to "dog". Repeating these operations yields the second category information corresponding to "cat" and "car".
The training device also performs feature extraction on the query image through the feature extraction network of the first neural network to generate C pieces of first feature information corresponding to the query image; fig. 7 takes C = 3 as an example. After obtaining the 3 pieces of second category information corresponding to the 3 categories and the 3 pieces of first feature information, the training device updates each piece of first feature information according to the 3 pieces of second category information to obtain 9 (i.e., 3 times 3) pieces of second feature information; for the specific implementation of "updating the first feature information according to the second category information" in fig. 7, refer to the description of fig. 5 above, which is not repeated here. The training device then generates, through the feature processing network of the first neural network and according to the 9 pieces of second feature information, the first prediction result corresponding to the query image, which, as shown in fig. 7, indicates the predicted position of one detection frame corresponding to the query image and the prediction category corresponding to that detection frame (i.e., "dog" in fig. 7). It should be understood that the example in fig. 7 is only for convenience of understanding the present solution and is not intended to limit it.
309. The training equipment trains the first neural network according to a first expected result, a first prediction result and a first loss function corresponding to the query image until a preset condition is met, and the first loss function indicates the similarity between the first prediction result and the first expected result.
In this embodiment, the training device may calculate the function value of the first loss function according to the first expected result and the first prediction result corresponding to the query image, and train the first neural network according to that function value. Specifically, the training device may perform a gradient update on the weight parameters of the first neural network by back propagation according to the function value of the first loss function, thereby completing one training pass of the first neural network. The training device may repeat steps 301 to 309 multiple times to iteratively train the first neural network.
The first expected result indicates the correct position of at least one detection frame corresponding to the query image and the correct category corresponding to each detection frame. The first loss function indicates the degree of similarity between the first prediction result and the first expected result, and the training goal of the training device is to increase that similarity. The preset condition may be that the function value of the first loss function reaches a preset threshold, or that the number of iterations of training exceeds a preset number.
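For convenience of understanding, a generic sketch of one iteration of step 309 follows. The SGD optimizer, the learning rate, and the names first_neural_network and first_loss_function are placeholders; the text above only requires that the first loss function indicate the similarity between the first prediction result and the first expected result.

```python
# A generic sketch of one iteration of step 309, assuming PyTorch;
# first_neural_network and first_loss_function are assumed placeholders.
import torch

optimizer = torch.optim.SGD(first_neural_network.parameters(), lr=1e-3)  # assumed optimizer

def train_step(query_image, first_expected_result):
    first_prediction_result = first_neural_network(query_image)
    loss = first_loss_function(first_prediction_result, first_expected_result)
    optimizer.zero_grad()
    loss.backward()   # gradient update of the weight parameters by back propagation
    optimizer.step()
    return loss.item()
```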
(II) reasoning phase
In the embodiment of the present application, specifically, please refer to fig. 8, where fig. 8 is a schematic flowchart of an image processing method according to the embodiment of the present application, and the image processing method according to the embodiment of the present application may include:
801. the execution equipment acquires N pieces of second category information, wherein one piece of second category information indicates updated feature information of one category of images, the N pieces of second category information are obtained according to the N pieces of first category information and the first similarity information, one piece of first category information indicates the feature information of one category of images, and the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information.
In this embodiment of the application, in one case, the execution device generates the N pieces of second category information in the manner of steps 301 to 303 in the embodiment corresponding to fig. 3; that is, the execution device itself generates the N pieces of second category information according to N categories of labeled support images, which may be sent to the execution device by the training device or obtained by the execution device during operation.
Under another condition, the training equipment calculates to obtain N second category information according to N support images included in the training data of the first neural network, and sends the N second category information to the execution equipment; correspondingly, the executing device receives the N pieces of second category information.
802. The executing equipment extracts the features of the image to be processed through the first neural network to obtain first feature information corresponding to the image to be processed.
803. And the execution equipment updates the first characteristic information through the first neural network according to the second category information so as to obtain second characteristic information corresponding to the image to be processed.
804. And the execution equipment updates the first characteristic information through the first neural network according to the first class information so as to obtain third characteristic information corresponding to the image to be processed.
805. And the execution equipment executes fusion operation on the third characteristic information and the second characteristic information through the first neural network so as to obtain updated second characteristic information corresponding to the image to be processed.
806. The execution device executes the object detection operation through the first neural network according to the second characteristic information to obtain a first prediction result corresponding to the image to be processed, wherein the first prediction result indicates the prediction position of at least one detection frame in the image to be processed and the prediction category corresponding to each detection frame.
In this embodiment of the application, a specific implementation manner of the executing device executing steps 802 to 806 is similar to an implementation manner of steps 304 to 308 in the embodiment corresponding to fig. 3, except that the query images in steps 304 to 308 in the embodiment corresponding to fig. 3 are replaced with the images to be processed in steps 802 to 806.
Second, the neural network is used for image classification
In some embodiments of the present application, a neural network is used for image classification, and a specific implementation flow of a training phase and an inference phase of the neural network provided in the embodiments of the present application is described below.
(I) training phase
In some embodiments of the present application, specifically, please refer to fig. 9, where fig. 9 is a schematic flowchart of a training method of a neural network provided in the embodiment of the present application, and the training method of the neural network provided in the embodiment of the present application may include:
901. the training device obtains a training image set, wherein the training image set comprises N types of supporting images and a query image.
902. The training equipment acquires N pieces of first category information corresponding to the N types of support images, wherein the first category information indicates feature information of one type of support images before updating.
903. The training equipment updates each piece of first category information according to the N pieces of first category information and the first similarity information to obtain N pieces of second category information, wherein the first similarity information indicates the similarity between the pieces of first category information in the N pieces of first category information, and one piece of second category information indicates the updated feature information of one type of supporting image.
In the embodiment of the present application, the specific implementation of steps 901 to 903 by the training device is similar to that of steps 301 to 303 in the embodiment corresponding to fig. 3 and can be understood with reference to the above description. The training device is likewise preconfigured with training data of the first neural network, which includes a plurality of images and an expected result corresponding to each image. The difference is that, since the first neural network performs image classification, the expected result corresponding to a support image in this application scenario indicates the correct category of the entire support image.
904. The training equipment performs feature extraction on the query image through the first neural network to obtain first feature information corresponding to the query image.
905. And the training equipment updates the first characteristic information through the first neural network according to the second category information so as to obtain second characteristic information corresponding to the query image.
906. And the training equipment updates the first characteristic information through the first neural network according to the first category information so as to obtain third characteristic information corresponding to the query image.
907. And the training equipment performs fusion operation on the third characteristic information and the second characteristic information through the first neural network to obtain updated second characteristic information corresponding to the query image.
In the embodiment of the present application, the specific implementation manner of the training device to execute steps 904 to 907 is similar to the specific implementation manner of the training device to execute steps 304 to 307 in the corresponding embodiment of fig. 3, and can be understood by referring to the above description. The difference is that in step 904, the training device performs feature extraction on the query image through the first neural network to obtain one piece of first feature information corresponding to the entire query image, where the one piece of first feature information is used to indicate features of the entire query image.
Correspondingly, in step 905, the training device updates the first feature information through the first neural network according to the second category information, so as to obtain N second feature information corresponding to the query image. In step 906, the training device updates the first feature information through the first neural network according to the first category information to obtain N third feature information corresponding to the query image.
908. And the training equipment executes classification operation through the first neural network according to the second characteristic information to obtain a second prediction result corresponding to the query image, wherein the second prediction result indicates the prediction category of the query image.
909. The training equipment trains the first neural network according to a second expected result, a second predicted result and a first loss function corresponding to the query image until a preset condition is met, and the first loss function indicates the similarity between the second predicted result and the second expected result.
In the embodiment of the present application, the specific implementation manner of the training apparatus to execute the steps 908 and 909 is similar to the specific implementation manner of the training apparatus to execute the steps 308 and 309 in the corresponding embodiment of fig. 3, and can be understood by referring to the above description. The difference is that in steps 908 and 909, the training device performs a classification operation through the first neural network, the second prediction result indicates a prediction class of the entire query image, and the second desired result indicates a correct class of the entire query image.
In the embodiment of the application, the first neural network can be not only a neural network for object detection but also a neural network for image classification, which expands the application scenarios of the scheme and improves the flexibility of its implementation.
(II) reasoning phase
In some embodiments of the present application, specifically, please refer to fig. 10, where fig. 10 is a schematic flowchart of an image processing method provided in the embodiment of the present application, and the image processing method provided in the embodiment of the present application may include:
1001. The execution device acquires N pieces of second category information, where one piece of second category information indicates the updated feature information of one category of images, the N pieces of second category information are obtained according to the N pieces of first category information and the first similarity information, one piece of first category information indicates the feature information of one category of images, and the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information.
In this embodiment of the application, in one case, the execution device generates the N pieces of second category information in the same manner as steps 801 to 803 in the embodiment corresponding to fig. 8; that is, the execution device also generates the N pieces of second category information from N types of labeled support images, where the N types of labeled support images may be sent to the execution device by the training device or may be collected by the execution device during operation.
In another case, the training device calculates the N pieces of second category information according to the N types of support images included in the training data of the first neural network and sends them to the execution device; correspondingly, the execution device receives the N pieces of second category information.
1002. The execution device performs feature extraction on the image to be processed through the first neural network to obtain first feature information corresponding to the image to be processed.
1003. The execution device updates the first feature information through the first neural network according to the second category information, so as to obtain second feature information corresponding to the image to be processed.
1004. The execution device updates the first feature information through the first neural network according to the first category information, so as to obtain third feature information corresponding to the image to be processed.
1005. The execution device performs a fusion operation on the third feature information and the second feature information through the first neural network, so as to obtain updated second feature information corresponding to the image to be processed.
1006. The execution device performs a classification operation through the first neural network according to the second feature information, so as to obtain a second prediction result corresponding to the image to be processed, where the second prediction result indicates the prediction category of the image to be processed.
In this embodiment of the application, the specific manner in which the execution device executes steps 1002 to 1006 is similar to that of steps 904 to 908 in the embodiment corresponding to fig. 9, except that the query image in steps 904 to 908 is replaced with the image to be processed in steps 1002 to 1006.
In the above embodiments, after the N pieces of second category information and one piece of first feature information are acquired, the first feature information is updated using the N pieces of second category information. However, when the neural network is used for image classification, the similarity between the first feature information and each piece of second category information may instead be calculated directly after the N pieces of second category information are acquired, so that the prediction category of the image is obtained directly from those similarities.
In the embodiment of the present application, because the number of support images of a target category (any one of the N categories) is small, there is a large deviation between the first category information corresponding to the support images of the target category and the actual feature information of the images of the entire target category; that is, the first category information corresponding to the target category cannot correctly reflect the actual features of the images of the entire target category. After the category information corresponding to the other categories is introduced into the first category information corresponding to the target category, this deviation can be reduced; that is, the second category information corresponding to the target category reflects the actual features of the images of the entire target category more accurately. Since the second feature information is obtained based on the second category information, the probability that the second feature information overfits a small number of training images is reduced, which helps obtain feature information that correctly reflects the entire category, thereby improving the accuracy of the prediction result output by the entire neural network. In addition, the embodiment of the application provides not only a specific implementation scheme for the training phase of the first neural network but also one for the inference phase, which improves the completeness of the scheme.
(I) training phase
In an embodiment of the present application, specifically, please refer to fig. 11, where fig. 11 is a schematic flowchart of a training method of a neural network provided in an embodiment of the present application, and the training method of the neural network provided in the embodiment of the present application may include:
1101. The training device obtains a training image set, where the training image set includes N types of support images and a query image.
1102. The training device acquires N pieces of first category information corresponding to the N types of support images, where one piece of first category information indicates the feature information of one type of support image before updating.
1103. The training device updates each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where the first similarity information indicates the similarity between the pieces of first category information among the N pieces of first category information, and one piece of second category information indicates the updated feature information of one type of support image.
1104. The training device performs feature extraction on the query image through the second neural network to obtain first feature information corresponding to the query image.
In the embodiment of the present application, the specific manner in which the training device executes steps 1101 to 1104 is similar to that of steps 901 to 904 in the embodiment corresponding to fig. 9, and can be understood by referring to the above description. The second neural network in steps 1101 to 1104 has a meaning similar to that of the first neural network in steps 901 to 904; both are neural networks for image classification. Since the first neural network can also be a neural network for object detection in the embodiment corresponding to fig. 3, the terms first neural network and second neural network are used in this embodiment to distinguish the two.
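Step 1103 above does not fix the exact update rule. Purely as a hedged sketch, assuming that each piece of first category information is a class prototype (for example, the mean feature of that class's support images), that the first similarity information is pairwise cosine similarity, and that the update is a similarity-weighted mixture, the step could look as follows in Python (PyTorch); the function name and the temperature parameter tau are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

def update_category_info(first_cats, tau=1.0):
    """Hedged sketch of step 1103.

    first_cats: (N, C) N pieces of first category information (assumed to be
                class prototypes, e.g. mean support features per class).
    Returns:    (N, C) N pieces of second category information.
    """
    normed = F.normalize(first_cats, dim=-1)
    # First similarity information: cosine similarity between every pair of
    # pieces of first category information, scaled by a temperature tau.
    sim = normed @ normed.t() / tau          # (N, N)
    weights = F.softmax(sim, dim=-1)         # each row sums to 1
    # Each piece of second category information is a similarity-weighted
    # mixture of all N pieces of first category information.
    return weights @ first_cats
```

Under these assumptions, every updated prototype borrows from the other categories in proportion to their similarity, so a category with few support images is pulled toward better-estimated neighbors, which is the deviation-reduction effect described in this application.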
1105. The training device calculates third similarity information, where the third similarity information indicates the similarity between the first feature information and each of the N pieces of second category information.
In some embodiments of the present application, after obtaining the N pieces of second category information and the first feature information corresponding to the query image, the training device may calculate the third similarity information, where the third similarity information indicates the similarity between the first feature information and each of the N pieces of second category information; this may also be referred to as the distance between the first feature information and each piece of second category information.
Specifically, the third similarity information may be calculated as follows. In one implementation, a similarity calculation function may be preconfigured in the training device, and the training device may use the preconfigured similarity calculation function to calculate the similarity between the first feature information and each of the N pieces of second category information one by one, so as to generate the third similarity information. The similarity calculation function includes, but is not limited to, cosine similarity, Euclidean distance, Mahalanobis distance, Manhattan distance, or other functions for calculating similarity, which are not exhaustively listed here.
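For illustration only, the following Python (PyTorch) sketch shows how such a preconfigured similarity calculation function could be applied one by one to the N pieces of second category information. The function name, tensor shapes, and choice of metrics are assumptions of this sketch; Mahalanobis distance, for example, would additionally require a covariance estimate and is omitted:

```python
import torch
import torch.nn.functional as F

def third_similarity(first_feat, second_cats, metric="cosine"):
    """Sketch of step 1105: similarity between the first feature information
    (shape (C,)) and each of the N pieces of second category information
    (shape (N, C)); returns a tensor of shape (N,)."""
    if metric == "cosine":
        return F.cosine_similarity(first_feat.unsqueeze(0), second_cats, dim=-1)
    if metric == "euclidean":
        # Distances are negated so that a larger value means "more similar".
        return -torch.cdist(first_feat.unsqueeze(0), second_cats).squeeze(0)
    if metric == "manhattan":
        return -(first_feat.unsqueeze(0) - second_cats).abs().sum(dim=-1)
    raise ValueError(f"unsupported metric: {metric}")
```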
In another implementation, the training device may use a neural network layer in the second neural network to calculate the similarity between the first feature information and each of the N pieces of second category information one by one, so as to generate the third similarity information.
Optionally, the training device may further update the first feature information corresponding to the query image using the N pieces of second category information to obtain updated first feature information (that is, the second feature information in the embodiment corresponding to fig. 3); correspondingly, the third similarity information then indicates the similarity between the updated first feature information and each of the N pieces of second category information. For the specific manner in which the training device updates the first feature information using the N pieces of second category information, refer to the description in the embodiment corresponding to fig. 3, which is not repeated here.
1106. The training device generates first indication information through the second neural network according to the third similarity information, where the first indication information is used to indicate the prediction category corresponding to the query image.
In some embodiments of the application, after obtaining the third similarity information, the training device may generate the first indication information through the second neural network according to the third similarity information. The higher the similarity between the first feature information and one of the N pieces of second category information, the higher the probability that the query image belongs to the category to which that piece of second category information points. The first indication information is used to indicate the prediction category corresponding to the query image; further, the first indication information may include N probability values indicating the probabilities that the query image belongs to each of the N categories respectively, or the first indication information may directly indicate a specific category.
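As a minimal illustration of step 1106 (the variable names and the use of a softmax are assumptions of this sketch, not prescribed by this embodiment), the third similarity information can be turned into either form of the first indication information as follows:

```python
import torch
import torch.nn.functional as F

sims = torch.tensor([0.3, 1.2, 0.7])  # toy third similarity information, N = 3
probs = F.softmax(sims, dim=-1)       # first indication info as N probability values
pred_category = int(sims.argmax())    # or as a single predicted category (here: 1)
```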
1107. The training device trains the second neural network according to second indication information corresponding to the query image, the first indication information, and a second loss function until a preset condition is met, where the second indication information is used to indicate the correct category corresponding to the query image, and the second loss function indicates the similarity between the first indication information and the second indication information.
In this embodiment of the present application, the specific manner in which the training device executes step 1107 is similar to that of step 909 in the embodiment corresponding to fig. 9, and can be understood by referring to the above description.
In the embodiment of the application, because a small-sample (few-shot) training mode is adopted, the number of support images of each of the N categories is small, and the N pieces of first category information obtained from the N types of support images can hardly reflect the information of each category correctly. After the category information corresponding to the other categories is introduced into the first category information corresponding to a target category (any one of the N categories), the deviation between the first category information corresponding to the target category and the actual feature information of the images of the entire target category can be reduced; that is, the second category information corresponding to the target category reflects the actual features of the images of the entire target category more accurately. In other words, compared with the N pieces of first category information, the N pieces of second category information reflect the actual features of the N categories of images more accurately. Since the prediction category of the query image is obtained by calculating the similarity between the feature information of the query image and the N pieces of category information corresponding to the N categories of images, improving the accuracy of those N pieces of category information improves the accuracy of the prediction category output by the neural network.
(II) reasoning phase
In the embodiment of the present application, specifically, please refer to fig. 12, where fig. 12 is a flowchart illustrating a method for processing an image according to the embodiment of the present application, where the method for processing an image according to the embodiment of the present application may include:
1201. The execution device acquires N pieces of second category information, where one piece of second category information indicates the updated feature information of one category of images, the N pieces of second category information are obtained according to the N pieces of first category information and the first similarity information, one piece of first category information indicates the feature information of one category of images, and the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information.
1202. The execution device performs feature extraction on the image to be processed through the second neural network to obtain first feature information corresponding to the image to be processed.
In the embodiment of the present application, the specific manner in which the execution device performs steps 1201 and 1202 is similar to that of steps 1001 and 1002 in the embodiment corresponding to fig. 10, and can be understood by referring to the above description.
1203. The execution device calculates third similarity information, where the third similarity information indicates the similarity between the first feature information and each of the N pieces of second category information.
1204. The execution device generates first indication information through the second neural network according to the third similarity information, where the first indication information is used to indicate the prediction category corresponding to the image to be processed.
In this embodiment of the present application, the specific manner in which the execution device performs steps 1203 and 1204 is similar to that in which the training device performs steps 1105 and 1106 in the embodiment corresponding to fig. 11, and can be understood by referring to the above description.
In order to understand the beneficial effects of the embodiments of the present application more intuitively, object detection is taken as an example, and the neural network is evaluated on the PASCAL VOC data set; the evaluation results on PASCAL VOC are shown in table 1 below.
Method                                      K=1    K=2    K=3    K=5
TFA(fc)                                     22.9   34.5   40.4   46.7
FA                                          24.2   35.3   42.2   49.1
The embodiments of the present application  25.8   36.9   43.4   51.1

TABLE 1
In table 1, the metric is mean average precision (mAP); the larger the value, the better the performance of the neural network. Referring to table 1, the neural network obtained by the training method provided in the embodiment of the present application performs best in all four settings, that is, when the K value is 1, 2, 3, or 5.
On the basis of the embodiments corresponding to fig. 1 to 12, in order to better implement the above-mentioned scheme of the embodiments of the present application, the following also provides related equipment for implementing the above-mentioned scheme. Referring to fig. 13, fig. 13 is a schematic structural diagram of a training apparatus for a neural network according to an embodiment of the present disclosure, where the training apparatus 1300 for a neural network includes: an obtaining module 1301, configured to obtain a training image set, where the training image set includes N types of support images and a query image, and N is an integer greater than 1; the obtaining module 1301 is further configured to obtain N pieces of first category information corresponding to the N types of support images, and update each piece of first category information according to the N pieces of first category information and the first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one type of support image before update, the first similarity information indicates similarity between each piece of first category information in the N pieces of first category information, and one piece of second category information indicates updated feature information of one type of support image; a generating module 1302, configured to perform feature extraction on the query image through a first neural network to obtain at least one piece of first feature information corresponding to the query image, and update the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the query image; the processing module 1303 is configured to perform a feature processing operation through the first neural network according to the at least one piece of second feature information to obtain a prediction result corresponding to the query image; the training module 1304 is configured to train the first neural network according to the expected result, the predicted result, and the first loss function corresponding to the query image until a preset condition is met, where the first loss function indicates a similarity between the predicted result and the expected result.
In a possible design, please refer to fig. 14, fig. 14 is a schematic structural diagram of a training apparatus for a neural network provided in an embodiment of the present application, and the generating module 1302 includes a calculating sub-module 13021 and an updating sub-module 13022, where the calculating sub-module 13021 is configured to calculate second similarity information according to N second category information and at least one first feature information, where the second similarity information indicates a similarity between the second category information and the first feature information; the updating submodule 13022 is configured to update the first feature information according to the second similarity information to obtain second feature information.
In one possible design, the first neural network includes M feature channels in the feature extraction network, the second category information includes feature information corresponding to the M feature channels, the first feature information includes feature information corresponding to the M feature channels, and the second similarity information is used to indicate similarities between the second category information and each of the M feature channels of the first feature information.
In one possible design, the calculating sub-module 13021 is specifically configured to perform similarity calculation on the second category information and the first feature information to obtain second similarity information, where the second similarity information indicates the similarity of the second category information and the first feature information in the whole information hierarchy.
In one possible design, referring to fig. 14, the training apparatus 1300 for neural network further includes: an updating module 1305, configured to update, according to the N pieces of first category information, at least one piece of first feature information through a first neural network to obtain at least one piece of third feature information corresponding to the query image; a fusion module 1306, configured to perform a fusion operation on the at least one third feature information and the at least one second feature information through the first neural network to obtain at least one updated second feature information corresponding to the query image; the processing module 1303 is specifically configured to execute a feature processing operation through the first neural network according to the at least one updated second feature information.
In one possible design, the processing module 1303 is specifically configured to: according to the at least one piece of second characteristic information, performing object detection operation through the first neural network to obtain a first prediction result corresponding to the query image, wherein the first prediction result indicates the prediction position of at least one detection frame in the query image and the prediction category corresponding to each detection frame; or, according to the at least one piece of second characteristic information, performing a classification operation through the first neural network to obtain a second prediction result corresponding to the query image, wherein the second prediction result indicates the prediction category of the query image.
It should be noted that, the information interaction, the execution process, and the like between the modules/units in the training apparatus 1300 of the neural network are based on the same concept as the method embodiments corresponding to fig. 3 to fig. 7 and fig. 9 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not repeated herein.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, where the image processing apparatus 1500 includes: an obtaining module 1501, configured to obtain N pieces of second category information, where one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates feature information of one category of images, the first similarity information indicates similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1; the generating module 1502 is configured to perform feature extraction on the image to be processed through the first neural network to obtain at least one piece of first feature information corresponding to the image to be processed, and update the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed; the processing module 1503 is configured to perform a feature processing operation through the first neural network according to the at least one second feature information, so as to obtain a prediction result corresponding to the image to be processed.
In a possible design, please refer to fig. 16, fig. 16 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure, and the image processing apparatus 1500 further includes: an updating module 1504, configured to update the first feature information through the first neural network according to the N second category information to obtain updated first feature information corresponding to the query image, where the third similarity information indicates a similarity between the updated first feature information and each of the N second category information.
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the image processing apparatus 1500 are based on the same concept as the method embodiments corresponding to fig. 8 and fig. 10 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 17, fig. 17 is a schematic structural diagram of a training apparatus for a neural network provided in an embodiment of the present application, where the training apparatus 1700 for a neural network includes: an obtaining module 1701, configured to obtain a training image set, where the training image set includes a query image and N types of support images, and N is an integer greater than or equal to 1; the obtaining module 1701 is further configured to obtain N pieces of first category information corresponding to the N types of support images, and update each piece of first category information according to the N pieces of first category information and the first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one type of support image before update, the first similarity information indicates similarity between each piece of first category information of the N pieces of first category information, and one piece of second category information indicates updated feature information of one type of support image; a generating module 1702, configured to perform feature extraction on the query image through a second neural network to obtain first feature information corresponding to the query image, and calculate third similarity information, where the third similarity information indicates a similarity between the first feature information and each piece of second category information in the N pieces of second category information; a processing module 1703, configured to generate, according to the third similarity information, first indication information through the second neural network, where the first indication information is used to indicate a prediction category corresponding to the query image; a training module 1704, configured to train a second neural network according to second indication information, the first indication information, and a second loss function corresponding to the query image until a preset condition is satisfied, where the second indication information is used to indicate a correct category corresponding to the query image, and the second loss function indicates a similarity between the first indication information and the second indication information.
In a possible design, please refer to fig. 18, fig. 18 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, and the generating module 1702 includes a calculating submodule 17021 and an updating submodule 17022, where the calculating submodule 17021 is configured to calculate second similarity information according to N second category information and at least one first feature information, and the second similarity information indicates a similarity between the second category information and the first feature information; and an updating submodule 17022, configured to update the first feature information according to the second similarity information to obtain second feature information.
It should be noted that, the information interaction, the execution process, and the like between the modules/units in the training apparatus 1700 of the neural network are based on the same concept as those of the method embodiments corresponding to fig. 11 in the present application, and specific contents thereof may be referred to the description of the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 19, fig. 19 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, and an image processing apparatus 1900 includes: an obtaining module 1901, configured to obtain N pieces of second category information, where one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates feature information of one category of images, the first similarity information indicates similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1; a generating module 1902, configured to perform feature extraction on the image to be processed through the first neural network to obtain at least one piece of first feature information corresponding to the image to be processed, and update the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed; a processing module 1903, configured to perform a feature processing operation through the first neural network according to the at least one second feature information, so as to obtain a prediction result corresponding to the image to be processed.
It should be noted that, the contents of information interaction, execution process, and the like between the modules/units in the image processing apparatus 1900 are based on the same concept as that of the method embodiments corresponding to fig. 12 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described herein again.
Referring to fig. 20, fig. 20 is a schematic structural diagram of an execution device provided in the embodiment of the present application. The execution device 2000 may be embodied as a virtual reality (VR) device, a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a monitoring data processing device, or a radar data processing device, which is not limited herein. Specifically, the execution device 2000 includes: a receiver 2001, a transmitter 2002, a processor 2003, and a memory 2004 (where the number of processors 2003 in the execution device 2000 may be one or more; one processor is taken as an example in fig. 20), and the processor 2003 may include an application processor 20031 and a communication processor 20032. In some embodiments of the present application, the receiver 2001, the transmitter 2002, the processor 2003, and the memory 2004 may be connected by a bus or in other manners.
The memory 2004 may include read-only memory and random access memory, and provides instructions and data to the processor 2003. A portion of the memory 2004 may also include non-volatile random access memory (NVRAM). The memory 2004 stores operating instructions for the processor, executable modules, or data structures, or a subset or an expanded set thereof, where the operating instructions may include various operating instructions for implementing various operations.
The processor 2003 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the above embodiments of the present application may be applied to the processor 2003 or implemented by the processor 2003. The processor 2003 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by an integrated logic circuit of hardware in the processor 2003 or by instructions in the form of software. The processor 2003 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 2003 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of the present application may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory 2004, and the processor 2003 reads the information in the memory 2004 and completes the steps of the above method in combination with its hardware.
The receiver 2001 may be used to receive input numeric or character information and to generate signal inputs related to the relevant settings and function control of the execution device. The transmitter 2002 may be configured to output numeric or character information through a first interface; the transmitter 2002 may also be configured to send instructions to a disk group through the first interface to modify data in the disk group; the transmitter 2002 may also include a display device such as a display screen.
In this embodiment of the application, in one case, the image processing apparatus 1500 described in the corresponding embodiment of fig. 15 or fig. 16 may be disposed on the execution device 2000, and the application processor 20031 in the processor 2003 is used to implement the function of the execution device in the corresponding embodiment of fig. 8 or fig. 10. It should be noted that, the specific manner in which the application processor 20031 executes the above steps is based on the same concept as that of each method embodiment corresponding to fig. 8 or fig. 10 in the present application, and the technical effect brought by the method embodiment is the same as that of each method embodiment corresponding to fig. 8 or fig. 10 in the present application, and specific contents may refer to descriptions in the foregoing method embodiments in the present application, and are not described again here.
In this embodiment, in one case, the image processing apparatus 1900 described in the corresponding embodiment of fig. 19 may be disposed on the execution device 2000, and the application processor 20031 in the processor 2003 is used to implement the function of the execution device in the corresponding embodiment of fig. 12. It should be noted that, the specific manner in which the application processor 20031 executes the above steps is based on the same concept as that of each method embodiment corresponding to fig. 12 in the present application, and the technical effect brought by the specific manner is the same as that of each method embodiment corresponding to fig. 12 in the present application, and specific contents may refer to the description in the foregoing method embodiments in the present application, and are not described again here.
Referring to fig. 21, fig. 21 is a schematic structural diagram of a training device provided in the embodiment of the present application. Specifically, the training device 2100 is implemented by one or more servers and may vary considerably depending on configuration or performance; it may include one or more central processing units (CPUs) 2122 (e.g., one or more processors), a memory 2132, and one or more storage media 2130 (e.g., one or more mass storage devices) storing an application program 2142 or data 2144. The memory 2132 and the storage medium 2130 may be transient storage or persistent storage. The program stored on the storage medium 2130 may include one or more modules (not shown), each of which may include a series of instruction operations on the training device. Still further, the central processor 2122 may be configured to communicate with the storage medium 2130 to execute, on the training device 2100, the series of instruction operations in the storage medium 2130.
The training device 2100 may also include one or more power supplies 2126, one or more wired or wireless network interfaces 2150, one or more input/output interfaces 2158, and/or one or more operating systems 2141, such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
In this embodiment of the application, in one case, the training apparatus 1300 of the neural network described in the embodiment corresponding to fig. 13 or fig. 14 is deployed on the training device 2100, and the central processor 2122 is configured to implement the functions of the training device in the embodiments corresponding to fig. 3 to fig. 7 and fig. 9. It should be noted that the specific manner in which the central processor 2122 executes the above steps is based on the same concept as the method embodiments corresponding to fig. 3 to fig. 7 and fig. 9 in the present application and brings the same technical effects; for details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.
In another case, the training apparatus 1700 of the neural network described in the embodiment corresponding to fig. 17 or fig. 18 is deployed on the training device 2100, and the central processor 2122 is configured to implement the functions of the training device in the embodiment corresponding to fig. 11. It should be noted that the specific manner in which the central processor 2122 executes the above steps is based on the same concept as the method embodiment corresponding to fig. 11 in the present application and brings the same technical effects; for details, refer to the descriptions in the foregoing method embodiments, which are not repeated here.
Embodiments of the present application also provide a computer program product, which when executed on a computer, causes the computer to execute the steps executed by the apparatus in the method described in the foregoing embodiment shown in fig. 8 or 10, or causes the computer to execute the steps executed by the training apparatus in the method described in the foregoing embodiments shown in fig. 3 to 7 and 9, or causes the computer to execute the steps executed by the apparatus in the method described in the foregoing embodiment shown in fig. 12, or causes the computer to execute the steps executed by the training apparatus in the method described in the foregoing embodiment shown in fig. 11.
Also provided in the embodiments of the present application is a computer-readable storage medium, in which a program for signal processing is stored, and when the program is run on a computer, the program causes the computer to execute the steps executed by an apparatus in the method described in the foregoing embodiment shown in fig. 8 or 10, or causes the computer to execute the steps executed by an apparatus in the method described in the foregoing embodiments shown in fig. 3 to 7 and 9, or causes the computer to execute the steps executed by an apparatus in the method described in the foregoing embodiment shown in fig. 12, or causes the computer to execute the steps executed by an apparatus in the method described in the foregoing embodiment shown in fig. 11.
The image processing apparatus, the training apparatus of the neural network, the execution device, and the training device provided in the embodiment of the present application may specifically be a chip, where the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to enable the chip to execute the image processing method described in the embodiment shown in fig. 8 or 10, or to enable the chip to execute the neural network training method described in the embodiments shown in fig. 3 to 7 and 9, or to enable the chip to execute the image processing method described in the embodiment shown in fig. 12, or to enable the chip to execute the neural network training method described in the embodiment shown in fig. 11. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a Random Access Memory (RAM), and the like.
Specifically, referring to fig. 22, fig. 22 is a schematic structural diagram of a chip provided in the embodiment of the present application. The chip may be embodied as a neural network processing unit (NPU) 220. The NPU 220 is mounted on a host CPU as a coprocessor, and the host CPU allocates tasks to it. The core part of the NPU is the arithmetic circuit 2203, and the controller 2204 controls the arithmetic circuit 2203 to extract matrix data from memory and perform multiplication.
In some implementations, the arithmetic circuit 2203 internally includes a plurality of processing units (PEs). In some implementations, the arithmetic circuit 2203 is a two-dimensional systolic array. The arithmetic circuit 2203 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2203 is a general-purpose matrix processor.
For example, assume that there are an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 2202 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit takes the data of matrix A from the input memory 2201, performs a matrix operation with matrix B, and stores the partial or final results of the obtained matrix in the accumulator 2208.
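As a toy model only (not the actual circuit), the following NumPy sketch mimics this accumulate-as-you-stream behavior: matrix B plays the role of the pre-loaded weights, tiles of matrix A stream in, and the partial results are summed in an accumulator, as in accumulator 2208; the function name and tile size are assumptions of this sketch:

```python
import numpy as np

def npu_matmul(A, B, tile=2):
    """Toy model of the arithmetic circuit: B plays the pre-loaded weight
    matrix, tiles of A stream in, and partial products are accumulated."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))                    # plays the role of accumulator 2208
    for k0 in range(0, K, tile):            # stream matrix A tile by tile
        C += A[:, k0:k0 + tile] @ B[k0:k0 + tile, :]  # accumulate partial results
    return C
```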
The unified memory 2206 is used to store input data and output data. The weight data is transferred to the weight memory 2202 directly through the direct memory access controller (DMAC) 2205. The input data is also carried into the unified memory 2206 through the DMAC.
The bus interface unit (BIU) 2210 is used for the interaction between the AXI bus and both the DMAC and the instruction fetch buffer (IFB) 2209. Specifically, it is used by the instruction fetch buffer 2209 to fetch instructions from the external memory, and by the memory access controller 2205 to fetch the original data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 2206, to transfer weight data to the weight memory 2202, or to transfer input data to the input memory 2201.
The vector calculation unit 2207 includes a plurality of operation processing units and, where necessary, further processes the output of the arithmetic circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, or magnitude comparison. It is mainly used for network calculation at the non-convolution/fully-connected layers of a neural network, such as batch normalization, pixel-level summation, and up-sampling of feature planes.
In some implementations, the vector calculation unit 2207 can store a processed output vector to the unified memory 2206. For example, the vector calculation unit 2207 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 2203, for example linear interpolation of the feature planes extracted by a convolutional layer, or a vector of accumulated values used to generate activation values. In some implementations, the vector calculation unit 2207 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the arithmetic circuit 2203, for example for use in subsequent layers in a neural network.
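Again as a toy model with assumed names, the post-processing role of the vector calculation unit can be pictured as a simple function applied to the accumulated matrix output, for example a bias addition followed by an activation:

```python
import numpy as np

def vector_unit(acc_out, bias):
    """Toy model of vector calculation unit 2207: further process the
    accumulated matrix output, here with a bias add and a ReLU-style
    activation (one of many possible post-processing operations)."""
    return np.maximum(acc_out + bias, 0.0)
```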
The instruction fetch buffer 2209, connected to the controller 2204, is used to store instructions used by the controller 2204.
The unified memory 2206, the input memory 2201, the weight memory 2202, and the instruction fetch buffer 2209 are all on-chip memories. The external memory is private to the NPU hardware architecture.
Here, the operations of the layers in the first neural network illustrated in fig. 3 to fig. 9 may be performed by the arithmetic circuit 2203 or the vector calculation unit 2207.
Any of the aforementioned processors may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the program of the method of the first aspect.
It should be noted that the above-described embodiments of the apparatus are merely schematic, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, any function completed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function can take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software implementation is preferable in most cases. Based on such an understanding, the technical solutions of the present application may essentially be embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (e.g., infrared, radio, or microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).

Claims (26)

1. A method of training a neural network, the method comprising:
acquiring a training image set, wherein the training image set comprises N types of supporting images and query images, N is an integer greater than 1, and the N types of supporting images and the query images correspond to a meta-learning training mode;
acquiring N pieces of first category information corresponding to the N types of supporting images, and updating each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, wherein one piece of first category information indicates feature information of one type of supporting image before updating, the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information, and one piece of second category information indicates updated feature information of one type of supporting image;
extracting the features of the query image through a first neural network to obtain at least one piece of first feature information corresponding to the query image, and updating the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the query image;
according to the at least one piece of second feature information, performing feature processing operation through the first neural network to obtain a prediction result corresponding to the query image;
and training the first neural network according to an expected result corresponding to the query image, the predicted result and a first loss function until a preset condition is met, wherein the first loss function indicates the similarity between the predicted result and the expected result.
2. The method according to claim 1, wherein the updating, by the first neural network, the at least one first feature information according to the N second category information to obtain at least one second feature information corresponding to the query image comprises:
calculating second similarity information according to the N pieces of second category information and the at least one piece of first feature information, wherein the second similarity information indicates the similarity between the second category information and the first feature information;
and updating the first feature information according to the second similarity information to obtain the second feature information.
3. The method of claim 2,
the first neural network includes M feature channels in a feature extraction network, the second category information includes feature information corresponding to the M feature channels, the first feature information includes feature information corresponding to the M feature channels, and the second similarity information is used to indicate similarities between the second category information and the first feature information in each of the M feature channels.
4. The method according to claim 2, wherein the calculating second similarity information according to the N second category information and the at least one first feature information comprises:
and performing similarity calculation on the second category information and the first feature information to obtain the second similarity information, wherein the second similarity information indicates the similarity of the second category information and the first feature information in the whole information hierarchy.
5. The method according to any one of claims 1 to 4, further comprising:
updating the at least one first feature information through the first neural network according to the N pieces of first category information to obtain at least one third feature information corresponding to the query image;
performing fusion operation on the at least one third feature information and the at least one second feature information through the first neural network to obtain at least one updated second feature information corresponding to the query image;
the performing, by the first neural network, a feature processing operation according to the at least one second feature information includes:
performing a feature processing operation by the first neural network according to the at least one updated second feature information.
6. The method according to any one of claims 1 to 4, wherein the performing, by the first neural network, a feature processing operation according to the at least one second feature information to obtain a prediction result corresponding to the query image comprises:
according to the at least one piece of second feature information, performing an object detection operation through the first neural network to obtain a first prediction result corresponding to the query image, wherein the first prediction result indicates the prediction position of at least one detection frame in the query image and the prediction category corresponding to each detection frame; or,
according to the at least one piece of second feature information, performing a classification operation through the first neural network to obtain a second prediction result corresponding to the query image, wherein the second prediction result indicates the prediction category of the query image.
7. An image processing method, characterized in that the method comprises:
acquiring N pieces of second category information, wherein one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates feature information of one category of images, the first similarity information indicates similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1;
extracting features of an image to be processed through a first neural network to obtain at least one piece of first feature information corresponding to the image to be processed, and updating the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed;
and according to the at least one piece of second feature information, performing a feature processing operation through the first neural network to obtain a prediction result corresponding to the image to be processed.
8. The method according to claim 7, wherein the updating the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed comprises:
calculating second similarity information according to the N pieces of second category information and the at least one piece of first feature information, wherein the second similarity information indicates the similarity between the second category information and the first feature information;
and updating the first feature information according to the second similarity information to obtain the second feature information.
9. A method of training a neural network, the method comprising:
acquiring a training image set, wherein the training image set includes N types of support images and a query image, N is an integer greater than or equal to 1, and the N types of support images and the query image correspond to a meta-learning training mode;
acquiring N pieces of first category information corresponding to the N types of support images, and updating each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, wherein one piece of first category information indicates feature information of one type of support image before updating, the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information, and one piece of second category information indicates updated feature information of one type of support image;
performing feature extraction on the query image through a second neural network to obtain first feature information corresponding to the query image, and calculating third similarity information, wherein the third similarity information indicates the similarity between the first feature information and each piece of second category information in the N pieces of second category information;
generating first indication information through the second neural network according to the third similarity information, wherein the first indication information is used for indicating a prediction category corresponding to the query image;
training the second neural network according to second indication information corresponding to the query image, the first indication information and a second loss function until a preset condition is met, wherein the second indication information is used for indicating a correct category corresponding to the query image, and the second loss function indicates the similarity between the first indication information and the second indication information.
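By way of illustration, a sketch of one training episode under claim 9, with several assumptions the claim leaves open: class prototypes taken as mean support features, cosine similarity throughout, softmax attention for the prototype update, and cross-entropy as the second loss function. Backpropagation through the second neural network and the preset stopping condition are omitted for brevity; the names are hypothetical.

```python
import numpy as np

def episode_loss(support, query_feature, query_label):
    """support: dict class -> (K, D) support features; query_feature: (D,)."""
    classes = sorted(support)
    # First category information: one prototype per class of support images.
    first_cat = np.stack([support[c].mean(axis=0) for c in classes])    # (N, D)
    # First similarity information: pairwise similarity between prototypes.
    unit = first_cat / np.linalg.norm(first_cat, axis=1, keepdims=True)
    first_sim = unit @ unit.T                                           # (N, N)
    # Second category information: prototypes refined by attending to each other.
    attn = np.exp(first_sim) / np.exp(first_sim).sum(axis=1, keepdims=True)
    second_cat = attn @ first_cat                                       # (N, D)
    # Third similarity information: query feature vs each refined prototype.
    third_sim = second_cat @ query_feature / (
        np.linalg.norm(second_cat, axis=1) * np.linalg.norm(query_feature) + 1e-8)
    # First indication information: predicted class distribution; second loss
    # function: cross-entropy against the correct class (second indication info).
    probs = np.exp(third_sim) / np.exp(third_sim).sum()
    return -np.log(probs[classes.index(query_label)])

rng = np.random.default_rng(0)
support = {c: rng.normal(size=(5, 32)) for c in range(3)}
print(episode_loss(support, rng.normal(size=32), query_label=1))
```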
10. The method of claim 9, further comprising:
and updating the first feature information through the first neural network according to the N pieces of second category information to obtain updated first feature information corresponding to the query image, wherein the third similarity information indicates the similarity between the updated first feature information and each piece of second category information in the N pieces of second category information.
11. An image processing method, characterized in that the method comprises:
acquiring N pieces of second category information, wherein one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates feature information of one category of images, the first similarity information indicates similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1;
performing feature extraction on an image to be processed through a second neural network to obtain first feature information corresponding to the image to be processed, and calculating third similarity information, wherein the third similarity information indicates the similarity between the first feature information and each piece of second category information in the N pieces of second category information;
and generating first indication information through the second neural network according to the third similarity information, wherein the first indication information is used for indicating a prediction category corresponding to the image to be processed.
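A corresponding inference sketch for claim 11, assuming the N pieces of second category information were computed ahead of time (e.g. during training) and the same cosine similarity as in the training sketch above; names are hypothetical.

```python
import numpy as np

def predict(first_feature, second_category_info):
    """Return the index of the most similar category for the image to be processed."""
    sims = second_category_info @ first_feature / (
        np.linalg.norm(second_category_info, axis=1)
        * np.linalg.norm(first_feature) + 1e-8)   # third similarity information
    return int(np.argmax(sims))                   # first indication information

print(predict(np.ones(8), np.eye(3, 8)))  # 0
```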
12. An apparatus for training a neural network, the apparatus comprising:
an obtaining module, configured to obtain a training image set, where the training image set includes N types of support images and a query image, where N is an integer greater than 1, and the N types of support images and the query image correspond to a meta-learning training mode;
the acquiring module is further configured to acquire N pieces of first category information corresponding to the N types of support images, and update each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one type of support image before update, the first similarity information indicates similarity between each piece of first category information of the N pieces of first category information, and one piece of second category information indicates updated feature information of one type of support image;
the generating module is used for extracting the features of the query image through a first neural network to obtain at least one piece of first feature information corresponding to the query image, and updating the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the query image;
the processing module is used for executing feature processing operation through the first neural network according to the at least one piece of second feature information so as to obtain a prediction result corresponding to the query image;
and the training module is used for training the first neural network according to the expected result corresponding to the query image, the predicted result and a first loss function until a preset condition is met, wherein the first loss function indicates the similarity between the predicted result and the expected result.
13. The apparatus of claim 12, wherein the generation module comprises a computation submodule and an update submodule, wherein,
the calculation sub-module is configured to calculate second similarity information according to the N pieces of second category information and the at least one piece of first feature information, where the second similarity information indicates a similarity between the second category information and the first feature information;
and the updating submodule is used for updating the first feature information according to the second similarity information to obtain the second feature information.
14. The apparatus of claim 13, wherein
a feature extraction network of the first neural network includes M feature channels, the second category information includes feature information corresponding to the M feature channels, the first feature information includes feature information corresponding to the M feature channels, and the second similarity information indicates the similarity between the second category information and the first feature information in each of the M feature channels.
15. The apparatus of claim 13, wherein
the calculation sub-module is specifically configured to perform similarity calculation on the second category information and the first feature information to obtain the second similarity information, where the second similarity information indicates the similarity between the second category information and the first feature information at the level of the information as a whole.
16. The apparatus of any one of claims 12 to 15, further comprising:
the updating module is used for updating the at least one piece of first feature information through the first neural network according to the N pieces of first category information to obtain at least one piece of third feature information corresponding to the query image;
the fusion module is used for executing fusion operation on the at least one third feature information and the at least one second feature information through the first neural network to obtain at least one updated second feature information corresponding to the query image;
the processing module is specifically configured to execute a feature processing operation through the first neural network according to the at least one updated second feature information.
17. The apparatus of any one of claims 12 to 15, wherein
the processing module is specifically configured to:
according to the at least one piece of second feature information, perform an object detection operation through the first neural network to obtain a first prediction result corresponding to the query image, where the first prediction result indicates a predicted position of at least one detection frame in the query image and a prediction category corresponding to each detection frame; or,
according to the at least one piece of second feature information, perform a classification operation through the first neural network to obtain a second prediction result corresponding to the query image, where the second prediction result indicates a prediction category of the query image.
18. An image processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain N pieces of second category information, where one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates feature information of one category of images, the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1;
the generating module is used for extracting features of an image to be processed through a first neural network to obtain at least one piece of first feature information corresponding to the image to be processed, and updating the at least one piece of first feature information through the first neural network according to the N pieces of second category information to obtain at least one piece of second feature information corresponding to the image to be processed;
and the processing module is used for executing a feature processing operation through the first neural network according to the at least one piece of second feature information, so as to obtain a prediction result corresponding to the image to be processed.
19. The apparatus of claim 18, wherein the generation module comprises a computation submodule and an update submodule, wherein,
the calculation sub-module is configured to calculate second similarity information according to the N pieces of second category information and the at least one piece of first feature information, where the second similarity information indicates a similarity between the second category information and the first feature information;
and the updating submodule is used for updating the first feature information according to the second similarity information to obtain the second feature information.
20. An apparatus for training a neural network, the apparatus comprising:
an obtaining module, configured to obtain a training image set, where the training image set includes a query image and N types of support images, where N is an integer greater than or equal to 1, and the N types of support images and the query image correspond to a meta-learning training mode;
the acquiring module is further configured to acquire N pieces of first category information corresponding to the N types of support images, and update each piece of first category information according to the N pieces of first category information and first similarity information to obtain N pieces of second category information, where one piece of first category information indicates feature information of one type of support image before update, the first similarity information indicates similarity between each piece of first category information of the N pieces of first category information, and one piece of second category information indicates updated feature information of one type of support image;
a generating module, configured to perform feature extraction on the query image through a second neural network to obtain first feature information corresponding to the query image, and calculate third similarity information, where the third similarity information indicates a similarity between the first feature information and each piece of second category information in the N pieces of second category information;
a processing module, configured to generate, according to the third similarity information, first indication information through the second neural network, where the first indication information is used to indicate a prediction category corresponding to the query image;
a training module, configured to train the second neural network according to second indication information corresponding to the query image, the first indication information, and a second loss function until a preset condition is met, where the second indication information is used to indicate a correct category corresponding to the query image, and the second loss function indicates a similarity between the first indication information and the second indication information.
21. The apparatus of claim 20, further comprising:
and the updating module is used for updating the first feature information through the first neural network according to the N pieces of second category information to obtain updated first feature information corresponding to the query image, and the third similarity information indicates the similarity between the updated first feature information and each piece of second category information in the N pieces of second category information.
22. An image processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain N pieces of second category information, where one piece of second category information indicates updated feature information of one category of images, the second category information is obtained according to N pieces of first category information and first similarity information, one piece of first category information indicates feature information of one category of images, the first similarity information indicates the similarity between each piece of first category information in the N pieces of first category information, and N is an integer greater than or equal to 1;
the generating module is used for extracting features of an image to be processed through a second neural network to obtain first feature information corresponding to the image to be processed, and calculating third similarity information, wherein the third similarity information indicates the similarity between the first feature information and each piece of second category information in the N pieces of second category information;
and the processing module is used for generating first indication information through the second neural network according to the third similarity information, wherein the first indication information is used for indicating a prediction category corresponding to the image to be processed.
23. A computer program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 6, or causes the computer to perform the method of claim 7 or 8, or causes the computer to perform the method of claim 9 or 10, or causes the computer to perform the method of claim 11.
24. A computer-readable storage medium, characterized by comprising a program which, when run on a computer, causes the computer to perform the method of any one of claims 1 to 6, or causes the computer to perform the method of claim 7 or 8, or causes the computer to perform the method of claim 9 or 10, or causes the computer to perform the method of claim 11.
25. A training device, comprising a processor and a memory, the processor being coupled to the memory, wherein
the memory is used for storing a program;
the processor is configured to execute the program in the memory, to cause the training device to perform the method of any one of claims 1 to 6, or to cause the training device to perform the method of claim 9 or 10.
26. An execution device, comprising a processor and a memory, the processor being coupled to the memory, wherein
the memory is used for storing a program;
the processor is configured to execute the program in the memory, to cause the execution device to perform the method of claim 7 or 8, or to cause the execution device to perform the method of claim 11.
CN202110218469.2A 2021-02-26 2021-02-26 Image processing method, neural network training method and related equipment Pending CN113065634A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110218469.2A CN113065634A (en) 2021-02-26 2021-02-26 Image processing method, neural network training method and related equipment


Publications (1)

Publication Number Publication Date
CN113065634A true CN113065634A (en) 2021-07-02

Family

ID=76559266

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275060A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Recognition model updating processing method and device, electronic equipment and storage medium
US20200242410A1 (en) * 2019-01-30 2020-07-30 Mitsubishi Electric Research Laboratories, Inc. System for Training Descriptor with Active Sample Selection
CN109961089A (en) * 2019-02-26 2019-07-02 中山大学 Small sample and zero sample image classification method based on metric learning and meta learning
US20200334490A1 (en) * 2019-04-16 2020-10-22 Fujitsu Limited Image processing apparatus, training method and training apparatus for the same
CN110717554A (en) * 2019-12-05 2020-01-21 广东虚拟现实科技有限公司 Image recognition method, electronic device, and storage medium
CN111242199A (en) * 2020-01-07 2020-06-05 中国科学院苏州纳米技术与纳米仿生研究所 Training method and classification method of image classification model
CN111797893A (en) * 2020-05-26 2020-10-20 华为技术有限公司 Neural network training method, image classification system and related equipment
CN111860588A (en) * 2020-06-12 2020-10-30 华为技术有限公司 Training method for graph neural network and related equipment
CN111881954A (en) * 2020-07-15 2020-11-03 中国科学院自动化研究所 Transduction reasoning small sample classification method based on progressive cluster purification network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AOXUE LI et al.: "Few-Shot Learning With Global Class Representations", 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 9714-9723 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023207823A1 (en) * 2022-04-29 2023-11-02 华为技术有限公司 Method for obtaining feature information of category description, image processing method, and device
CN115065707A (en) * 2022-07-27 2022-09-16 北京华夏圣远能源科技有限公司 Remote monitoring method, device and medium for micromolecule recyclable fracturing fluid sand mixing truck
CN115065707B (en) * 2022-07-27 2022-11-04 北京华夏圣远能源科技有限公司 Remote monitoring method, device and medium for micromolecule recyclable fracturing fluid sand mixing truck

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination