WO2021082480A1

WO2021082480A1 - Image classification method and related device

Info

Publication number: WO2021082480A1
Application number: PCT/CN2020/097906
Authority: WO
Inventors: 张钧萍; 吴斯凡; 郭庆乐; 汪鹏程
Original assignee: 华为技术有限公司
Priority date: 2019-10-29
Filing date: 2020-06-24
Publication date: 2021-05-06
Also published as: CN110991236A

Abstract

An image classification method and a related device. An image classification model used in the method can extract a spatial feature and a spectral feature from a target image, and spatial-spectral features formed by combining the two features can represent attribute information of the image in a multi-dimensional manner. Therefore, classifying the image on the basis of the spatial-spectral features can effectively improve the accuracy of an image classification result, such that an object in the image can be accurately recognized. The method comprises: first acquiring the target image needing to be classified, wherein the target image is an image generated on the basis of a hyperspectral image, then extracting the spatial feature of the target image and the spectral feature of the target image by means of the image classification model, then constructing, by means of the image classification model, the spatial-spectral features according to the spatial feature and the spectral feature, acquiring a classification result of the spatial-spectral features by means of the image classification model, and finally determining, according to the classification result, the category to which the target image belongs.

Description

An image classification method and related device

Technical field

This application relates to the field of artificial intelligence (AI), and in particular to an image classification method and related devices.

Background technique

Artificial intelligence (AI) technology is a technical discipline that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence. AI technology obtains the best results by perceiving the environment, acquiring knowledge, and using knowledge. In other words, artificial intelligence technology is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. The use of artificial intelligence for image processing is a common application of artificial intelligence.

Take a monitoring scene as an example. In this scene, a visible light image can be obtained through a monitoring device. The visible light image usually contains multiple types of objects, such as people, houses, boxes, etc., in order to capture a certain type of target object to achieve monitoring , You can select a part of the image containing an object (or certain objects) from the visible light image as the target image to be classified, and then extract the spatial characteristics of the target image (used to characterize the geometric shape, texture, etc. of the object) through the neural network Etc.), and classify the spatial characteristics of the target image through a neural network, and then determine the category to which the target image belongs.

In the above-mentioned image classification process, since only the spatial characteristics of the image are considered, it is not sufficient to fully characterize all the attribute information of the image, resulting in a low accuracy rate of the image classification result, and the object in the image cannot be correctly identified.

Summary of the invention

The embodiments of the present application provide an image classification method and related devices, which can effectively improve the accuracy of image classification results and accurately identify objects in the image.

The first aspect of the embodiments of the present application provides an image classification method, which includes:

If image classification is required, the target image to be classified can be acquired first, where the target image is an image generated based on a multispectral image;

After obtaining the target image, you can first obtain the image classification model, which is a deep network model, and then extract the spatial characteristics of the target image and the spectral characteristics of the target image through the image classification model;

After obtaining the spatial features and spectral features of the target image, the image classification model is used to construct the spatial features according to the spatial and spectral features;

After obtaining the empty spectrum feature, the classification result of the empty spectrum feature can be obtained through the image classification model, and the classification result includes the probability that the target image is located in each category;

From the classification results to determine which category the target image has the highest probability, the category to which the target image belongs can be finally determined.

From the above image classification method, it can be seen that the image classification model used in the image classification process can extract the spatial and spectral features of the target image, and the combination of the two forms the spatial spectral features that can be characterized in multiple dimensions The attribute information of the image, so the classification of the image based on the empty spectrum feature can effectively improve the accuracy of the classification result of the image, and accurately identify the object in the image.

In a possible implementation of the first aspect, before extracting the spatial features of the target image and the spectral features of the target image through the image classification model, the method further includes:

Obtain the spatial information of the target image and the spectral information of the target image, where the spectral information of the target image is a one-dimensional vector formed by the target image, and the spatial information of the target image is a two-dimensional vector formed by the target image and the neighborhood image of the target image vector;

Extracting the spatial characteristics of the target image and the spectral characteristics of the target image through the image classification model includes:

Through the image classification model, the spatial information and the spectral information are extracted respectively, and the spatial characteristics of the target image and the spectral characteristics of the target image are obtained.

In the above embodiment, the spatial information and spectral information of the target image can be extracted first, and used as the input of the image classification model to further extract the spatial characteristics of the target image and the spectral characteristics of the target image, which improves the flexibility and choice of the solution. Sex.

In a possible implementation manner of the first aspect, the image classification model includes a first branch network and a second branch network, the first branch network includes a first convolutional layer and a first pooling layer, and the second branch network includes The second convolutional layer and the second pooling layer respectively perform feature extraction on the spatial information and spectral information through the image classification model, and obtain the spatial characteristics of the target image and the spectral characteristics of the target image, including:

Perform convolution processing on the spectral information through the first convolution layer to obtain the first spectral feature of the target image;

Perform maximum pooling processing on the first spectral feature through the first pooling layer to obtain the second spectral feature of the target image;

Perform convolution processing on the spatial information through the second convolution layer to obtain the first spatial feature of the target image;

The first spatial feature is maximally pooled by the second pooling layer to obtain the second spatial feature of the target image.

In the above embodiment, the image classification model includes two branch networks, and both the first branch network and the second branch network include a convolutional layer and a pooling layer, so it can pass through the first convolutional layer of the first branch network. Perform convolution processing on the spectral information of the target image to obtain the first spectral feature, and then perform maximum pooling processing on the first spectral feature through the first pooling layer to obtain the second spectral feature. In the same way, the second spectral feature can also be obtained. The second convolutional layer of the branch network performs convolution processing on the spatial information of the target image to obtain the first spatial feature, and then performs the maximum pooling processing on the first spatial feature through the second pooling layer to obtain the second spatial feature, improving This improves the flexibility and selectivity of the program.

In a possible implementation of the first aspect, the image classification model further includes a fully connected layer, and the construction of the space spectrum feature based on the spatial feature and the spectral feature through the image classification model includes:

The second spectral feature and the second spatial feature are respectively stretched through the image classification model to obtain the third spectral feature and the third spatial feature, where the third spectral feature and the third spatial feature are one-dimensional vectors;

The third spectral feature and the third spatial feature are fused through the fully connected layer to obtain the spatial spectrum feature.

In the above embodiment, the image classification model also includes a fully connected layer. After the second spectral feature and the second spatial feature output by the first pooling layer and the second pooling layer are obtained, the two features can be combined. Stretching (ie element reorganization) to make it a one-dimensional vector, namely the third spectral feature and the third spatial feature, and then the third spectral feature and the third spatial feature are fused through the fully connected layer to obtain a single The scale of the empty spectrum feature improves the flexibility and selectivity of the scheme.

In a possible implementation of the first aspect, the image classification model includes a first branch network and a second branch network, and the first branch network includes n first convolutional layers and n-1 first pooling layers, The second branch network includes n second convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2, and the spatial information and spectral information are respectively feature extracted through the image classification model to obtain the target image The spatial characteristics and spectral characteristics of the target image include:

Perform convolution processing on the spectral information through the first first convolution layer to obtain the first first spectral feature;

Through the first first pooling layer to the n-1th first pooling layer, the first first spectral feature to the n-1th first spectral feature are respectively subjected to maximum pooling processing to obtain the first From the second spectral feature to the n-1th second spectral feature;

Through the second first convolutional layer to the nth first convolutional layer, the first second spectral feature to the n-1th second spectral feature are respectively convolved to obtain the second first spectral feature To the nth first spectral feature;

Perform convolution processing on the spatial information through the first second convolution layer to obtain the first first spatial feature;

Through the first second pooling layer to the n-1th second pooling layer, the first first spatial feature to the n-1th first spatial feature are respectively subjected to maximum pooling processing to obtain the first From the second spatial feature to the n-1th second spatial feature;

Perform convolution processing on the first second spatial feature to the n-1th second spatial feature through the second second convolutional layer to the nth second convolutional layer to obtain the second first spatial feature To the nth first spatial feature.

In the above embodiment, the image classification model includes two branch networks, and the first branch network includes n first convolutional layers and n-1 first pooling layers, so the spectral information of the target image can be used as the first The input of the first convolutional layer, the first first spectral feature is obtained after convolution, and then the first first spectral feature is used as the input of the first first pooling layer, after the maximum pooling, the first 1 second spectral feature, and then use the first second spectral feature as the input of the second first convolutional layer. After convolution, the second first spectral feature is obtained, and then the second first spectral feature As the input of the second first pooling layer, and so on, the first first convolutional layer to the nth first convolutional layer can output the first first spectral feature to the nth first spectrum, respectively Features, the first first pooling layer to the n-1th first pooling layer can respectively output the first second spectral characteristic to the n-1th second spectral characteristic.

In the same way, by performing feature extraction on the spatial information of the target image through the second branch network, the first first spatial feature to the nth first spatial feature and the first second spatial feature to n-1 are also obtained. The second spatial feature improves the flexibility and selectivity of the scheme.

In a possible implementation of the first aspect, the image classification model further includes n fully connected layers, and constructing the space spectrum feature according to the spatial feature and the spectral feature through the image classification model includes:

Through the image classification model, the first second spectral feature to the n-1th second spectral feature, the first second spatial feature to the n-1th second spatial feature, the nth first spectral feature and the first The n first spatial features are respectively stretched to obtain the first third spectral feature to the nth third spectral feature, and the first third spatial feature to the nth third spatial feature, where the third The spectral feature and the third spatial feature are one-dimensional vectors;

The n pairs of feature groups are respectively fused through n fully connected layers to obtain n sub-space spectral features, where a third spectral feature and a third spatial feature with the same order form a feature group;

Through the image classification model, the n sub-space spectrum features are spliced to obtain the spatial spectrum feature.

In the foregoing embodiment, the image classification model further includes n fully connected layers. The first second spectral feature to the n-1th second spectral feature, and the nth first spectral feature can be stretched, corresponding to the first third spectral feature to the nth third spectral feature In the same way, the first second spatial feature to the n-1th second spatial feature, and the nth first spatial feature can be stretched to obtain the first third spatial feature to the nth A third spatial feature. Then use the first third spectral feature and the first third spatial feature as the input of the first fully connected layer. After fusion, the first subspace spectral feature is obtained, and the second third spectral feature and the second The third spatial feature is used as the input of the second fully connected layer. After fusion, the second subspace spectrum feature is obtained, and so on, until n subspace spectrum features are obtained, and then n subspace spectrum features are spliced together to obtain a multiple The scale of the empty spectrum feature improves the flexibility and selectivity of the scheme.

In a possible implementation of the first aspect, the image classification model further includes a classification layer, and obtaining the classification result of the empty spectrum feature through the image classification model includes:

The spatial spectrum features are classified by the classification layer to obtain the classification results.

In the foregoing embodiment, the image classification model further includes a classification layer. After the empty spectrum feature is obtained, the empty spectrum feature can be split by the classification layer to obtain the classification result of the target image, which improves the flexibility and selectivity of the scheme.

The second aspect of the embodiments of the present application provides a method for model training, which includes:

Obtain an image to be trained, and the image to be trained is an image generated based on a hyperspectral image;

Extract the spatial characteristics of the image to be trained and the spectral characteristics of the image to be trained through the classification model to be trained;

Construct empty spectrum features according to spatial and spectral features through the classification model to be trained;

Obtain the classification result of the empty spectrum feature through the classification model to be trained;

According to the classification results and the real results, the target loss function is used to train the classification model to be trained to obtain the image classification model.

The image classification model obtained from the above-mentioned model training method can extract the spatial and spectral features of the target image. The space spectrum feature formed by the combination of the two can represent the attribute information of the image in multiple dimensions, so it is based on the space spectrum feature The classification of the image can effectively improve the accuracy of the classification result of the image and accurately identify the objects in the image.

In a possible implementation of the second aspect, before extracting the spatial features of the image to be trained and the spectral features of the image to be trained through the classification model to be trained, the method further includes:

Obtain the spatial information of the image to be trained and the spectral information of the image to be trained, where the spectral information of the image to be trained is a one-dimensional vector formed by the image to be trained, and the spatial information of the image to be trained is the image to be trained and the neighbors of the image to be trained. A two-dimensional vector formed by the domain image;

Extracting the spatial features of the image to be trained and the spectral features of the image to be trained through the classification model to be trained includes:

The spatial information and the spectral information are respectively feature extracted through the classification model to be trained to obtain the spatial features of the image to be trained and the spectral features of the image to be trained.

In a possible implementation of the second aspect, the classification model to be trained includes a first branch network and a second branch network, and the first branch network includes n first convolutional layers and n-1 first pooling layers , The second branch network includes n second convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2. The spatial information and spectral information are respectively feature extracted through the classification model to be trained to obtain the The spatial characteristics of the training image and the spectral characteristics of the image to be trained include:

In a possible implementation manner of the second aspect, the classification model to be trained further includes n fully connected layers, and the construction of the spatial spectrum feature according to the spatial feature and the spectral feature through the classification model to be trained includes:

Through the classification model to be trained, the first second spectral feature to the n-1th second spectral feature, the first second spatial feature to the n-1th second spatial feature, the nth first spectral feature and The nth first spatial feature is stretched separately to obtain the first third spectral feature to the nth third spectral feature, and the first third spatial feature to the nth third spatial feature, where the The three-spectral feature and the third spatial feature are one-dimensional vectors;

The n subspace spectrum features are spliced by the classification model to be trained to obtain the spatial spectrum features.

In a possible implementation manner of the second aspect, the classification model to be trained further includes a classification layer, and obtaining the classification result of the empty spectrum feature through the classification model to be trained includes:

A third aspect of the embodiments of the present application provides an image classification device, including:

The first acquisition module is configured to acquire a target image, and the target image is an image generated based on a hyperspectral image;

The extraction module is used to extract the spatial characteristics of the target image and the spectral characteristics of the target image through the image classification model;

The construction module is used to construct the empty spectrum feature according to the spatial feature and the spectral feature through the image classification model;

The second acquisition module is used to acquire the classification result of the empty spectrum feature through the image classification model;

The determining module is used to determine the category to which the target image belongs according to the classification result.

In a possible implementation manner of the third aspect, the device further includes:

The third acquisition module is used to acquire the spatial information of the target image and the spectral information of the target image, where the spectral information of the target image is a one-dimensional vector formed by the target image, and the spatial information of the target image is the neighbor of the target image and the target image. A two-dimensional vector formed by the domain image;

The extraction module is also used to perform feature extraction on the spatial information and the spectral information through the image classification model to obtain the spatial features of the target image and the spectral features of the target image.

In a possible implementation manner of the third aspect, the image classification model includes a first branch network and a second branch network, the first branch network includes a first convolutional layer and a first pooling layer, and the second branch network includes The second convolutional layer and the second pooling layer, the extraction module is also used to:

In a possible implementation of the third aspect, the image classification model further includes a fully connected layer, and the building module is also used to:

In a possible implementation manner of the third aspect, the image classification model includes a first branch network and a second branch network, and the first branch network includes n first convolutional layers and n-1 first pooling layers, The second branch network includes n second convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2, and the extraction module is also used for:

In a possible implementation manner of the third aspect, the image classification model further includes n fully connected layers, and the building module is further used for:

In a possible implementation manner of the third aspect, the image classification model further includes a classification layer, and the second acquisition module is further configured to classify the empty spectrum features through the classification layer to obtain a classification result.

A fourth aspect of the embodiments of the present application provides a model training device, which includes:

The first acquisition module is used to acquire an image to be trained, and the image to be trained is an image generated based on a hyperspectral image;

The extraction module is used to extract the spatial characteristics of the image to be trained and the spectral characteristics of the image to be trained through the classification model to be trained;

The construction module is used to construct the empty spectrum feature according to the spatial feature and the spectral feature through the classification model to be trained;

The second acquisition module is used to acquire the classification result of the empty spectrum feature through the classification model to be trained;

The training module is used to train the classification model to be trained through the target loss function according to the classification result and the real result to obtain the image classification model.

In a possible implementation manner of the fourth aspect, the device further includes:

The third acquisition module is used to acquire the spatial information of the image to be trained and the spectral information of the image to be trained, where the spectral information of the image to be trained is a one-dimensional vector formed by the image to be trained, and the spatial information of the image to be trained is the image to be trained A two-dimensional vector formed by the image and the neighborhood image of the image to be trained;

The extraction module is also used to perform feature extraction on the spatial information and the spectral information respectively through the classification model to be trained to obtain the spatial features of the image to be trained and the spectral features of the image to be trained.

In a possible implementation manner of the fourth aspect, the classification model to be trained includes a first branch network and a second branch network, and the first branch network includes n first convolutional layers and n-1 first pooling layers , The second branch network includes n second convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2, and the extraction module is also used for:

In a possible implementation manner of the fourth aspect, the classification model to be trained further includes n fully connected layers, and the building module is further used for:

In a possible implementation manner of the fourth aspect, the classification model to be trained further includes a classification layer, and the second acquisition module is further configured to classify the empty spectrum features through the classification layer to obtain a classification result.

A fifth aspect of the embodiments of the present application provides an image classification device, including:

One or more central processing units, memory, input and output interfaces, wired or wireless network interfaces, power supply;

The memory is a short-term storage memory or a persistent storage memory;

The central processing unit is configured to communicate with the memory, and execute the instruction operations in the memory on the image classification device to execute any possible implementation of the first aspect and the first aspect, and any possible implementation of the second and second aspects The method in the way.

The sixth aspect of the embodiments of the present application provides a computer-readable storage medium, including instructions, which when the instructions run on a computer, cause the computer to execute any possible implementation manner as in the first aspect and the first aspect, and the second aspect and The method in any possible implementation of the second aspect.

The seventh aspect of the embodiments of the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to execute any one of the possible implementation manners of the first aspect and the first aspect, the second aspect and the second aspect In any one of the possible implementation manners.

It can be seen from the above technical solutions that the embodiments of the present application have the following advantages:

The embodiment of the present application provides a method and related device for image classification, wherein the method first obtains a target image to be classified, the target image is an image generated based on a hyperspectral image, and then the target image is extracted through an image classification model The spatial characteristics of the target image and the spectral characteristics of the target image are used to construct the empty spectrum feature according to the spatial and spectral characteristics through the image classification model, and the classification result of the empty spectrum feature is obtained through the image classification model. Finally, the category of the target image is determined according to the classification result . The image classification model used in the above process can extract the spatial and spectral features of the target image. The space spectrum feature formed by the combination of the two can represent the attribute information of the image in multiple dimensions. Therefore, the image is processed based on the space spectrum feature. Classification can effectively improve the accuracy of image classification results and accurately identify objects in the image.

Description of the drawings

FIG. 1 is a schematic diagram of an image classification model provided by an embodiment of the application;

FIG. 2 is a schematic flowchart of an image classification method provided by an embodiment of the application;

FIG. 3 is another schematic diagram of an image classification model provided by an embodiment of the application;

FIG. 4 is another schematic flowchart of the image classification method provided by an embodiment of this application;

FIG. 5 is a schematic diagram of feature extraction provided by an embodiment of this application;

FIG. 6 is a schematic flowchart of a model training method provided by an embodiment of the application;

FIG. 7 is a schematic diagram of another process of the model training method provided by an embodiment of the application;

FIG. 8 is a schematic structural diagram of an image classification apparatus provided by an embodiment of the application;

FIG. 9 is a schematic structural diagram of a model training device provided by an embodiment of the application;

FIG. 10 is a schematic structural diagram of an image classification device provided by an embodiment of the application.

Detailed ways

The embodiment of the application provides a method and related device for image classification. If a certain multispectral image needs to be classified, the image classification model that has been trained can be obtained first. The image classification model is composed of two branch networks, fully connected Layer and classification layer composition. When performing image classification, the spectral features of the image can be extracted through the first branch network to characterize the spectral reflectance distribution of the object surface in the image, and the spatial features of the image can be extracted through the second branch network for Characterize the features such as the contour, surface texture and shadow of the object in the image, and then use the fully connected layer to fuse the spectral features and the spatial features to obtain the spatial spectrum features of the image, and then classify the spatial spectrum features of the image through the classification layer to obtain The classification result of the image finally determines the category to which the image belongs. In the process of image classification, because the object analyzed by the image classification model is the spatial spectrum feature of the image, the spatial spectrum feature not only involves the spatial feature of the image, but also considers the spectral feature of the image, so the spatial spectrum feature can be more comprehensively characterized The attribute information of the image, for example, comprehensively reflects the spectral reflectance distribution of the surface of the object in the image, as well as the contour, surface texture, and shadow of the object. Therefore, compared with the traditional method only considering the spatial characteristics, the blank in this application The spectral feature can analyze the effective attribute information of the image from multiple aspects, so that the image classification model has a high accuracy rate for the classification result of the image, and can accurately identify the object in the image.

The embodiments of the present application will be described below in conjunction with the drawings. A person of ordinary skill in the art knows that with the development of technology and the emergence of new scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.

The terms "first" and "second" in the specification and claims of the application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific sequence or sequence. It should be understood that the terms used in this way can be interchanged under appropriate circumstances, and this is merely a way of distinguishing objects with the same attribute used in describing the embodiments of the present application. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include Listed or inherent to these processes, methods, products, or equipment.

This application uses AI technology to classify images. Specifically, a monitoring scene is taken as an example for illustration. In this scene, images can be collected by monitoring equipment. The images usually contain multiple types of objects, such as people, vehicles, and houses, in order to achieve real-time monitoring of certain types of objects. Generally, it is necessary to classify the objects contained in the image to correctly identify the category to which the object belongs.

In order to improve the accuracy of image classification results, the present application provides an image classification method, which is implemented by an image classification device, wherein the image classification device includes monitoring equipment for acquiring multiple frames of images to be classified. It is worth noting that the images collected by the monitoring device in this application are usually multispectral images. Multispectral images refer to images with more than 3 spectral channels, for example, hyperspectral images with 128 spectral channels and so on.

In addition, the image classification model used to classify multi-spectral images in this application is a deep network model, which can perform feature extraction and classification on multi-spectral images, and then identify the object categories in the images. Figure 1 is a schematic diagram of an image classification model provided by an embodiment of the application. As shown in Figure 1, the image classification model includes a fully connected layer, a classification layer, and two branch networks. The first branch network includes a first branch network. A convolutional layer and a first pooling layer, and the second branch network also includes a second convolutional layer and a second pooling layer. When using the above-mentioned multi-spectral image as the input of the two branch networks of the image splitting model, the first branch network can be used to extract the spectral features of the multi-spectral image, and the second branch network can be used to extract the spatial features of the multi-spectral image, fully connected layer The empty spectrum feature can be constructed based on the spectral feature and the spatial feature, and the classification layer can be classified based on the empty spectrum feature to obtain the classification result of the image.

Fig. 2 is a schematic flow chart of a method for image classification provided by an embodiment of the application. Please refer to Fig. 2. The method for image classification based on the image classification model shown in Fig. 1 includes:

201. Obtain a target image;

After the image classification device obtains the multi-spectral image through the monitoring device, it can generate the target image to be classified based on the multi-spectral image. It should be noted that in the process of classifying a certain frame of multispectral image, the image classification model can classify the entire multispectral image or part of the multispectral image, because the multispectral image can be regarded as composed of multiple pixels. Therefore, the process of classifying multispectral images by the image classification model can be regarded as the process of classifying multiple parallel input pixels. For each pixel, the operation performed by the image classification model is the same Yes, any pixel in the multispectral image can be used as the input of the image classification model, that is, the target image to be classified.

202. Acquire spatial information of the target image and spectral information of the target image;

After acquiring the target image, the spatial information and spectral information of the target image can be further acquired, where the spectral information of the target image is a one-dimensional vector formed by the target image, and the spatial information of the target image is the target image and the neighborhood image of the target image The two-dimensional vector formed.

Specifically, the target image is a certain pixel in the multispectral image, and the spectral curve of the pixel can be generated based on the pixel first, and the spectral curve is used as the spectral information of the pixel as the input of the first branch network . For example, through the monitoring equipment to obtain continuous multi-frame multi-spectral images within a certain period of time (a certain frame of multi-spectral image is the multi-spectral image N that needs to be classified currently), and each pixel in the multi-spectral image N is For the target image, take pixel point n (such as the nth pixel point in the multispectral image N) as an example for description. Multiple frames of continuous multispectral images form a three-dimensional image block. The image block has three dimensions, namely the wide dimension, the high dimension and the spectral dimension. When the pixel point n is determined, the spectral dimension of the image block can be determined from In each frame of multispectral image, obtain the pixel point corresponding to pixel point n (such as the nth pixel point in each frame of multispectral image), that is, pixel point n and multiple pixels corresponding to pixel point n are obtained, This part of the pixel points can constitute the spectral curve of the pixel point n, and the spectral curve is presented in the form of a one-dimensional vector, that is, the spectral information of the pixel point n.

In addition, while acquiring the spectral information of the target image, the spatial information of the target image can also be acquired. Since the target image is a certain pixel in the multispectral image, the pixel can be generated based on the pixel and the neighborhood information of the pixel. The spatial information of the pixel is used as the input of the second branch network. The above example is still used for explanation. Since the number of bands of the multispectral image is too large, the above image block can be reduced by principal component analysis technology, and the first principal component obtained (can be understood as a frame of image obtained by compressing the image block) Determine the pixel point n, and select the neighborhood image of the pixel point n, such as r×r pixels centered on the pixel point n, the value of r can be set according to actual needs. At this time, the pixel point n and the neighborhood image of the pixel point n constitute the spatial information of the pixel point n, and the spatial information is presented in the form of a two-dimensional vector.

After the spectral information and spatial information of the target image are obtained, the two pieces of information can be correspondingly input into the two branch networks of the image classification model to realize feature extraction and image classification.

203. Perform convolution processing on the spectral information through the first convolution layer to obtain the first spectral feature of the target image.

After the spectral information of the target image is obtained, it can be input into the first convolutional layer of the first branch network of the image classification model. After the first convolutional layer performs convolution processing on the spectral information, the first spectral feature of the target image can be obtained . Specifically, the spectral information is a one-dimensional vector. After processing by the first convolution layer, a first spectral feature composed of m one-dimensional vectors can be obtained (for example, a plane image composed of m one-dimensional vectors side by side) ), where m is greater than or equal to 2.

204. Perform maximum pooling processing on the first spectral feature through the first pooling layer to obtain the second spectral feature of the target image;

After the first spectral feature is obtained, the first spectral feature can be used as the input of the first pooling layer. After the first pooling layer performs maximum pooling on the first spectral feature, the second spectral feature can be obtained. Specifically, the first pooling layer can halve the length of each one-dimensional vector in the first spectral feature, so as to obtain the compressed spectral feature, that is, the second spectral feature.

205. Perform convolution processing on the spatial information through the second convolution layer to obtain the first spatial feature of the target image.

After the spatial information of the target image is obtained, it can be input into the second convolutional layer of the second branch network of the image classification model. After the second convolutional layer convolves the spatial information, the first spatial feature of the target image can be obtained . Specifically, the spectral information is a two-dimensional vector. After the second convolutional layer is processed, the first spatial feature composed of k two-dimensional vectors can be obtained (for example, one composed of k two-dimensional vectors side by side has a certain Thickness of the image block), where k is greater than or equal to 2.

It should be understood that the execution order of step 205 and step 203 is in no particular order, and can be performed simultaneously or asynchronously, and there is no specific limitation here.

206. Perform maximum pooling processing on the first spatial feature through the second pooling layer to obtain the second spatial feature of the target image.

After the first spatial feature is obtained, the first spatial feature can be used as the input of the second pooling layer. After the second pooling layer performs maximum pooling on the first spatial feature, the second spatial feature can be obtained. Specifically, the second pooling layer can halve the length and width of each two-dimensional vector in the first spatial feature, and the number of two-dimensional vectors of the first spatial feature (that is, the thickness of the image block) remains unchanged, thereby obtaining The compressed spatial feature is the second spatial feature.

207. Perform stretching processing on the second spectral feature and the second spatial feature through the image classification model, respectively, to obtain the third spectral feature and the third spatial feature;

After obtaining the second spectral feature and the second spatial feature, since the second spectral feature is a two-dimensional vector composed of m one-dimensional vectors, and the second spatial feature is a three-dimensional vector composed of k two-dimensional vectors, it can be passed through the image The classification model stretches the second spectral feature and the second spatial feature, so that the elements of the second spectral feature and the second spatial feature are rearranged to form a one-dimensional third spectral feature and third spatial feature, that is, the third The spectral feature and the third spatial feature are one-dimensional vectors.

208. Perform fusion processing on the third spectral feature and the third spatial feature through the fully connected layer to obtain the spatial spectrum feature;

After obtaining the one-dimensional third spectral feature and the third spatial feature, the third spectral feature and the third spatial feature can be used as the input of the fully connected layer, so that the fully connected layer can merge the third spectral feature and the third spatial feature , Get the empty spectrum feature, so far, get the empty spectrum feature of the target image.

209. Perform classification processing on the space spectrum feature through the classification layer to obtain a classification result;

After the empty spectrum features of the target image are obtained, the empty spectrum features can be classified through the classification layer of the image classification model to obtain the classification result. The classification result includes the probability that the target image is located in each category. For example, the classification result includes: pixel n The probability of belonging to category A is 67%, the probability of pixel n belonging to category B is 20%, and the probability of pixel n belonging to category C is 13%.

210. Determine the category to which the target image belongs according to the classification result.

From the classification results to determine which category the target image has the highest probability, the category to which the target image belongs can be finally determined. As in the above example, when the classification result indicates that the probability of pixel n belonging to category A is the highest, it can be determined that the category to which pixel n belongs is category A.

The image classification model used in this embodiment can extract the spatial and spectral features of the target image, and the space spectrum feature formed by the combination of the two can represent the attribute information of the image in multiple dimensions, so this embodiment can effectively acquire the image Highly reliable attribute information. Furthermore, the image is classified based on the spatial spectrum feature, and the classification result is more effective in spatial correlation and spectral correlation, which can effectively improve the accuracy of the classification result of the image and accurately identify the objects in the image.

The spatial spectrum features generated in the above embodiments are single-scale features. In the process of classifying multi-spectral images, the scene reflected by the image usually contains objects of various scales, such as larger-scale buildings and smaller-scale objects. However, the single-scale spatial spectrum feature generally cannot describe the scene of the multispectral image well, and it is easy to cause the information of small objects to be lost, which affects the accuracy of the image classification results.

Therefore, in order to further improve the accuracy of the image classification results, this application also provides another image classification model. FIG. 3 is another schematic diagram of the image classification model provided by an embodiment of the application. As shown in FIG. 3, the image classification The model is a multiscale spectral-spatial unified network (MSSN), which includes two branch networks, n fully connected layers, and a classification layer. The first branch network includes n first convolutions. Layer and n-1 first pooling layers, n first convolutional layers and n-1 first pooling layers are alternately connected, and the second branch network includes n second convolutional layers and n-1 first pooling layers. Two pooling layers, n second convolutional layers and n-1 second pooling layers are alternately connected. It is worth noting that the input of the first fully connected layer is the output of the first first pooling layer and the output of the first second pooling layer, and the input of the second fully connected layer is the second first. The output of the pooling layer and the output of the second second pooling layer, and so on, the input of the n-1th fully connected layer is the output of the n-1th first pooling layer and the n-1th The output of the second pooling layer, and the input of the nth fully connected layer is the output of the nth first convolutional layer and the output of the nth second convolutional layer. In addition, after the output of all fully connected layers is spliced, it is used as the input of the classification layer for final image classification.

Fig. 4 is a schematic diagram of another flow chart of an image classification method provided by an embodiment of this application. Please refer to Fig. 4. This method performs image classification based on the image classification model shown in Fig. 3, including:

401. Obtain a target image;

402. Acquire spatial information of the target image and spectral information of the target image.

For the specific description of step 401 to step 402, please refer to the relevant description of step 201 to step 202 in the above-mentioned embodiment, which will not be repeated here.

403. Perform convolution processing on the spectral information through the first first convolution layer to obtain the first first spectral feature;

404. Perform maximum pooling processing on the first first spectral feature to the n-1th first spectral feature through the first first pooling layer to the n-1th first pooling layer, respectively, to obtain the first Second spectral feature to the n-1th second spectral feature;

405. Perform convolution processing on the first second spectral feature to the n-1th second spectral feature through the second first convolutional layer to the nth first convolutional layer to obtain the second first convolutional layer. From the spectral feature to the n-th first spectral feature;

After obtaining the spectral information of the target image, it can be used as the input of the first branch network of the image classification model. In order to facilitate understanding, the following describes the process of extracting the spectral features of the first branch network with reference to Figure 3. The spectral information of the target image is input into the first first convolutional layer, and the first first convolutional layer convolves the spectral information. After the product processing, the first first spectral feature is obtained, and then the first first spectral feature is input into the first first pooling layer for maximum pooling processing to obtain a second spectral feature, and then the first The second spectral feature is input to the second first convolutional layer for convolution processing to obtain the second first spectral feature, and then the second first spectral feature is input to the second first pooling layer for maximum pooling processing , Get the second second spectral feature, and so on, until the n-1th second spectral feature is input to the nth first convolutional layer for convolution processing to get the nth first spectral feature, so far, Then the spectral feature extraction of the target image is completed.

Since n first convolutional layers and n-1 first pooling layers in the first branch network are alternately connected, 2n-1 spectral features can be generated (including n first spectral features and n-1th Two spectral features), and each spectral feature has a size difference. For ease of understanding, the size change between two adjacent spectral features will be introduced below in conjunction with FIG. 5. For ease of description, the value of n is 3. Figure 5 is a schematic diagram of feature extraction provided by an embodiment of this application. As shown in Figure 5, suppose that the first branch network includes three first convolutional layers and two first pooling layers. Dimensional vector) After inputting the first first convolutional layer, the first spectral feature a composed of m one-dimensional vectors can be obtained, where m is greater than or equal to 2. Then, the first spectral feature a is input into the first first pooling layer, so that the length of each one-dimensional vector of the first spectral feature a is halved, and the second spectral feature x is obtained. Then the second spectral feature x is input into the second first convolutional layer, and the first spectral feature b composed of p one-dimensional vectors can be obtained, where p is greater than m. Then, the first spectral feature b is input into the second first pooling layer, so that the length of each one-dimensional vector of the first spectral feature b is halved to obtain the second spectral feature y. Finally, the second spectral feature y is input into the third first convolutional layer, and the first spectral feature c composed of t one-dimensional vectors can be obtained, where t is greater than p.

It should be understood that, in FIG. 5, each branch network is illustrated schematically with only 3 convolutional layers and 2 pooling layers, and the number of convolutional layers and pooling layers of each branch network in the embodiment of the present application are not described. The quantity constitutes a limit. In the same way, FIG. 5 only uses three fully connected layers for schematic illustration, and does not limit the number of fully connected layers in the embodiment of the present application.

406. Perform convolution processing on the spatial information through the first second convolution layer to obtain the first first spatial feature.

407. Perform maximum pooling processing on the first first spatial feature to the n-1th first spatial feature through the first second pooling layer to the n-1th second pooling layer, respectively, to obtain the first Second spatial feature to the n-1th second spatial feature;

408. Perform convolution processing on the first second spatial feature to the n-1th second spatial feature through the second second convolutional layer to the nth second convolutional layer, respectively, to obtain the second first Spatial feature to the nth first spatial feature;

In the same way, after the spatial information of the target image is obtained, it can be used as the input of the second branch network of the image classification model. For ease of understanding, the following describes the process of extracting spatial features by the first branch network with reference to Figure 3: The spatial information of the target image is input into the first second convolutional layer, and the first second convolutional layer convolves the spatial information After the product processing, the first first spatial feature is obtained, and then the first first spatial feature is input to the first second pooling layer for maximum pooling processing to obtain a second spatial feature, and then the first The second spatial feature is input to the second second convolutional layer for convolution processing to obtain the second first spatial feature, and then the second first spatial feature is input to the second second pooling layer for maximum pooling processing , Get the second second spatial feature, and so on, until the n-1th second spatial feature is input to the nth second convolutional layer for convolution processing to obtain the nth first spatial feature, so far, Then the spatial feature extraction of the target image is completed. Since n second convolutional layers and n-1 second pooling layers in the first branch network are alternately connected, n first spatial features and n-1 second spatial features can be generated.

Since n second convolutional layers and n-1 second pooling layers in the second branch network are alternately connected, 2n-1 spatial features can be generated (including n first spatial features and n-1th Two spatial features), and each spatial feature has a difference in size. For ease of understanding, the following will still introduce the size change between two adjacent spatial features in conjunction with Figure 5. As shown in Figure 5, suppose that the second branch network includes 3 second convolutional layers and 2 second pooling. Layer, when the spatial information (a two-dimensional vector) is input to the first second convolutional layer, the first spatial feature d composed of k two-dimensional vectors can be obtained (one composed of k two-dimensional vectors side by side has Image block with a certain thickness), where k is greater than or equal to 2. Then the first spatial feature d is input to the first second pooling layer, so that the length and width of each two-dimensional vector of the first spatial feature d are halved, and the second spatial feature z is obtained. Then the second spatial feature z is input to the second second convolutional layer, and the first spatial feature e composed of q two-dimensional vectors can be obtained, where q is greater than k. Then, the first spatial feature e is input into the second second pooling layer, so that the length and width of each two-dimensional vector of the first spatial feature e are halved to obtain the second spatial feature u. Finally, the second spatial feature u is input into the third second convolutional layer, and the first spatial feature f composed of s two-dimensional vectors can be obtained, where s is greater than q.

409. Use the image classification model to convert the first second spectral feature to the n-1th second spectral feature, the first second spatial feature to the n-1th second spatial feature, and the nth first spectral feature And the nth first spatial feature are respectively stretched to obtain the first third spectral feature to the nth third spectral feature, and the first third spatial feature to the nth third spatial feature;

After obtaining n first spectral features, n-1 second spectral features, n first spatial features, and n-1 second spatial features, you can use the image classification model to compare 1 second spectral feature to nth -1 second spectral feature and n-th first spectral feature are respectively stretched, so that the elements of this part of the feature are reorganized, corresponding to the first third spectral feature to the n-th third spectral feature, and Each third spectral feature is a one-dimensional vector. In the same way, the first second spatial feature to the n-1th second spatial feature and the nth first spatial feature can also be stretched through the image classification model to obtain the first third spatial feature correspondingly To the nth third spatial feature, and each third spatial feature is a one-dimensional vector.

410. Perform fusion processing on n pairs of feature groups through n fully connected layers, respectively, to obtain n subspace spectral features, where a third spectral feature and a third spatial feature with the same order form a feature group;

After n third spectral features and n third spatial features are obtained, the first third spectral feature and the first third spatial feature can be combined into a feature group, and the second third spectral feature and the second third spectral feature can be combined into a feature group. The third spatial feature forms a feature group, and so on, until the nth third spectral feature and the nth third spatial feature are formed into a feature group, and finally n feature groups are obtained. Then input the first feature group into the first fully connected layer, so that the first fully connected layer will fuse the two features in the feature group to obtain the first sub-empty spectrum feature, and at the same time input the second feature group into the first fully connected layer. 2 fully connected layers, so that the second fully connected layer merges the two features in the feature group to obtain the second sub-empty spectrum feature, and so on, until the nth feature group is input to the nth fully connected layer , So that the nth fully connected layer fuses the two features in the feature group to obtain the nth subspace spectrum feature.

Specifically, the formula used for the fusion processing of the fully connected layer is as follows:

y ⁱ = f[W ⁱ (spe ⁱ +spa ⁱ )+b ⁱ ]

In the formula, y ⁱ is the sub-space spectrum feature output by the i-th fully connected layer, f() is the activation function, W ⁱ is the preset weight, and b ⁱ is the preset bias, when i is 1 to n- When any value in 1, spe ⁱ is the i-th second spectral feature, spa ⁱ is the i-th second spatial feature, when i is n, spe ⁱ is the i-th first spectral feature, spa ⁱ Is the i-th first spatial feature.

411. Perform splicing processing on the n sub-space spectrum features through the image classification model to obtain the spatial spectrum feature;

After the n sub-space spectrum features are obtained, the n sub-space spectrum features can be spliced through the image classification model to obtain the empty spectrum feature. Specifically, the formula used by the image classification model for splicing processing is as follows:

output=concat(y ¹ ,y ² ,y ³ ,...)

In the formula, output is the space spectrum feature obtained by splicing multiple sub-space spectrum features. Since each sub-space spectrum feature represents a different scale, the multiple sub-space spectrum features of different scales can be spliced to obtain multi-scale space spectrum features.

412. Perform classification processing on the space spectrum feature through the classification layer to obtain a classification result;

413. Determine the category to which the target image belongs according to the classification result.

For specific descriptions of step 412 to step 413, please refer to the relevant descriptions of step 209 to step 210 in the foregoing embodiment, which will not be repeated here.

The image classification model used in this embodiment can effectively extract multi-scale spatial spectrum features of multi-spectral images, and perform image classification based on the spatial spectrum features. The resulting classification results can effectively distinguish objects of different scales. In order to accurately interpret the complex scene in the image, and accurately identify the object category in the scene.

In order to further explain the method of image classification provided by the embodiments of the present application, an application example will be provided below for specific introduction, and the application example includes:

The image classification device obtains a hyperspectral image for classification through a hyperspectral imager. The scene of the hyperspectral image contains objects of different scales, such as large-scale walls, cars, people, and smaller-scale glasses and For objects such as skin, 17 types of image samples are marked from the hyperspectral image. For example, the first type of image sample is a wall, the second type of image sample is a car, and so on. Then obtain the MSSN that has completed the training, and input the aforementioned hyperspectral image into the MSSN for image classification, and obtain the corresponding classification results.

Through quantitative analysis of the classification results, the analysis results are shown in Table 1. Table 1 shows the scene interpretation accuracy of MSSN on hyperspectral images. Among them, this application example also uses support vector machine (SVM). ) The performance of image classification is used as a comparison. It should be noted that the samples used in the training process of SVM and MSSN are the same, and the hyperspectral images used in image classification of SVM and MSSN are also the same.

Table 1 Analysis results

It can be seen from Table 1 that compared with SVM, the hyperspectral image scene interpretation based on the MSSN network can obtain higher classification accuracy. For example, the classification accuracy of the first type of sample under MSSN is 100, and the classification accuracy under SVM is 98.96, which means that after entering the first type of sample (ie wall image sample) marked in the hyperspectral image into the MSSN, the accuracy of the correct identification can reach 100%, and after entering the first type of sample into the SVN, it can The accuracy rate of being correctly identified is 98.96%. Therefore, the classification accuracy of MSSN for each category is higher than that of SVM, especially the three types of samples of 7, 8, and 9. MSSN greatly improves the classification accuracy and effectively shows the performance of MSSN.

It is worth noting that in this application example, the following three indicators are used to measure the performance of MSSN and SVM, which are:

(1) Overall accuracy OA, OA = the number of samples correctly classified/the number of samples to be classified, for example, the number of samples of the first type is 100 (for example, 100 pixels belonging to the wall in the hyperspectral image), after After classification, the number of samples correctly classified into the first category out of 100 samples is the number of samples correctly classified.

(2) Average accuracy AA, AA = the sum of the correct rates of each category/the number of categories.

(3) Kappa coefficient, Kappa=(OA-OO)/(1-AO), where AO is the theoretical accuracy, which is the preset accuracy value.

The above is a specific introduction to the image classification method provided in the embodiment of the present application, and the model training method provided in the embodiment of the present application will be described below. FIG. 6 is a schematic flow chart of the model training method provided by the embodiment of the application. Please refer to FIG. 6. The method includes:

601. Obtain an image to be trained;

602. Obtain spatial information of the image to be trained and spectral information of the image to be trained.

603. Perform convolution processing on the spectral information through the first convolution layer to obtain the first spectral feature of the image to be trained.

604. Perform maximum pooling processing on the first spectral feature through the first pooling layer to obtain the second spectral feature of the image to be trained.

605. Perform convolution processing on the spatial information through the second convolution layer to obtain the first spatial feature of the image to be trained.

606. Perform maximum pooling processing on the first spatial feature through the second pooling layer to obtain the second spatial feature of the image to be trained.

607. Perform stretching processing on the second spectral feature and the second spatial feature by the classification model to be trained to obtain the third spectral feature and the third spatial feature.

608. Perform fusion processing on the third spectral feature and the third spatial feature through the fully connected layer to obtain the spatial spectrum feature;

609. Perform classification processing on the space spectrum feature through the classification layer to obtain a classification result;

For specific descriptions of step 601 to step 609, reference may be made to related descriptions of step 201 to step 209 in the above-mentioned embodiment, which will not be repeated here.

610. According to the classification result and the real result, the to-be-trained classification model is trained through the target loss function to obtain the image classification model.

Since the image to be trained is a certain pixel in the multispectral image, the classification result includes the probability that the image to be trained belongs to each category, but the classification result is not necessarily correct. Since the correct category of the image to be trained in the multispectral image is marked in advance when the image to be trained is obtained, that is, the real result, the difference between the classification result of the image to be trained and the real result can be calculated through the objective loss function. If the difference between the two is beyond the qualified range, adjust the parameters of the classification model to be trained, and re-train with additional samples to be trained until the gap between the classification result of the image to be trained and the real result meets the requirements, then the image can be obtained. 2 The image classification model in the corresponding embodiment.

The image classification model obtained in this embodiment can extract the spatial and spectral features of the target image. The space spectrum feature formed by the combination of the two can represent the attribute information of the image in multiple dimensions. Therefore, the image is performed based on the space spectrum feature. Classification can effectively improve the accuracy of image classification results and accurately identify objects in the image.

FIG. 7 is a schematic diagram of another flow chart of the model training method provided by an embodiment of the application. Please refer to FIG. 7. The method includes:

701. Obtain an image to be trained;

702. Obtain spatial information of the image to be trained and spectral information of the image to be trained.

703. Perform convolution processing on the spectral information through the first first convolution layer to obtain the first first spectral feature;

704. Perform maximum pooling processing on the first first spectral feature to the n-1th first spectral feature through the first first pooling layer to the n-1th first pooling layer, respectively, to obtain the first Second spectral feature to the n-1th second spectral feature;

705. Perform convolution processing on the first second spectral feature to the n-1th second spectral feature through the second first convolutional layer to the nth first convolutional layer, respectively, to obtain the second first convolutional layer. From the spectral feature to the n-th first spectral feature;

706. Perform convolution processing on the spatial information through the first second convolution layer to obtain the first first spatial feature.

707. Perform maximum pooling processing on the first first spatial feature to the n-1th first spatial feature through the first second pooling layer to the n-1th second pooling layer, respectively, to obtain the first Second spatial feature to the n-1th second spatial feature;

708. Perform convolution processing on the first second spatial feature to the n-1th second spatial feature through the second second convolutional layer to the nth second convolutional layer, respectively, to obtain the second first Spatial feature to the nth first spatial feature;

709. Use the to-be-trained classification model to convert the first second spectral feature to the n-1th second spectral feature, the first second spatial feature to the n-1th second spatial feature, and the nth first spectrum. The feature and the nth first spatial feature are respectively stretched to obtain the first third spectral feature to the nth third spectral feature, and the first third spatial feature to the nth third spatial feature;

710. Perform fusion processing on n pairs of feature groups respectively through n fully connected layers to obtain n subspace spectral features, where a third spectral feature and a third spatial feature with the same order form a feature group;

711: Perform splicing processing on the n sub-space spectrum features through the classification model to be trained to obtain the empty spectrum feature;

712. Perform classification processing on the space spectrum feature through the classification layer to obtain a classification result.

For specific descriptions of step 701 to step 712, please refer to the relevant descriptions of step 401 to step 412 in the foregoing embodiment, which will not be repeated here.

713. According to the classification result and the real result, the to-be-trained classification model is trained through the target loss function to obtain the image classification model.

Since the image to be trained is a certain pixel in the multispectral image, the classification result includes the probability that the image to be trained belongs to each category, but the classification result is not necessarily correct. Since the correct category of the image to be trained in the multispectral image is marked in advance when the image to be trained is obtained, that is, the real result, the difference between the classification result of the image to be trained and the real result can be calculated through the objective loss function. If the difference between the two is beyond the qualified range, adjust the parameters of the classification model to be trained, and re-train with additional samples to be trained until the gap between the classification result of the image to be trained and the real result meets the requirements, then the image can be obtained. 4 corresponds to the image classification model in the embodiment.

The above is a specific introduction to the model training method provided by the embodiment of the present application. The image classification device and the model training device provided in the embodiment of the present application will be separately described below. FIG. 8 is a schematic structural diagram of an image classification device provided by an embodiment of the application. As shown in FIG. 8, the device includes:

The first acquisition module 801 is configured to acquire a target image, and the target image is an image generated based on a hyperspectral image;

The extraction module 802 is used to extract the spatial characteristics of the target image and the spectral characteristics of the target image through the image classification model;

The construction module 803 is used to construct the empty spectrum feature according to the spatial feature and the spectral feature through the image classification model;

The second acquisition module 804 is configured to acquire the classification result of the empty spectrum feature through the image classification model;

The determining module 805 is used to determine the category to which the target image belongs according to the classification result.

Optionally, the device further includes:

The extraction module 802 is also used to perform feature extraction on the spatial information and the spectral information respectively through the image classification model to obtain the spatial features of the target image and the spectral features of the target image.

Optionally, the image classification model includes a first branch network and a second branch network, the first branch network includes a first convolutional layer and a first pooling layer, and the second branch network includes a second convolutional layer and a second pooling layer. The layer, the extraction module 802 is also used to:

Optionally, the image classification model further includes a fully connected layer, and the building module 803 is also used to:

Optionally, the image classification model includes a first branch network and a second branch network, the first branch network includes n first convolutional layers and n-1 first pooling layers, and the second branch network includes n second The convolutional layer and n-1 second pooling layers, where n is greater than or equal to 2, the extraction module 802 is also used for:

Optionally, the image classification model further includes n fully connected layers, and the building module 803 is also used to:

Optionally, the image classification model further includes a classification layer, and the second acquisition module 804 is further configured to classify the empty spectrum features through the classification layer to obtain a classification result.

Fig. 9 is a schematic structural diagram of a model training device provided by an embodiment of the application. As shown in Fig. 9, the device includes:

The first acquisition module 901 is configured to acquire an image to be trained, and the image to be trained is an image generated based on a hyperspectral image;

The extraction module 902 is configured to extract the spatial features of the image to be trained and the spectral features of the image to be trained through the classification model to be trained;

The construction module 903 is used to construct an empty spectrum feature according to the spatial feature and the spectral feature through the classification model to be trained;

The second obtaining module 904 is configured to obtain the classification result of the empty spectrum feature through the classification model to be trained;

The training module 905 is used to train the to-be-trained classification model through the target loss function according to the classification result and the real result to obtain the image classification model.

Optionally, the device further includes:

The extraction module 902 is further configured to perform feature extraction on the spatial information and the spectral information respectively through the classification model to be trained to obtain the spatial features of the image to be trained and the spectral features of the image to be trained.

Optionally, the classification model to be trained includes a first branch network and a second branch network. The first branch network includes n first convolutional layers and n-1 first pooling layers, and the second branch network includes n first convolutional layers. Two convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2, and the extraction module 902 is also used for:

Optionally, the classification model to be trained further includes n fully connected layers, and the building module 903 is also used to:

Optionally, the classification model to be trained further includes a classification layer, and the second acquisition module 904 is further configured to classify the empty spectrum features through the classification layer to obtain a classification result.

It should be noted that the information interaction and execution process among the various modules/units of the above-mentioned device are based on the same concept as the method embodiment of the present application, and the technical effects brought by it are the same as those of the method embodiment of the present application, and the specific content may be Please refer to the description in the method embodiment shown in the foregoing application, which will not be repeated here.

Fig. 10 is a schematic structural diagram of an image classification device provided by an embodiment of the application. Please refer to Fig. 10. The device includes: one or more central processing units 1001, a memory 1002, an input/output interface 1003, and a wired or wireless network interface 1004, Power supply 1005;

The memory 1002 is a short-term storage memory or a persistent storage memory;

The central processing unit 1001 is configured to communicate with the memory 1002, and execute the instruction operations in the memory 1002 on the image classification device to perform operations performed by the image classification device in FIG. 2 or FIG. 4, and details are not described herein again.

The embodiment of the present application also relates to a computer-readable storage medium, including instructions, which when run on a computer, cause the computer to execute the method corresponding to FIG. 2 or FIG. 4.

The embodiment of the present application also relates to providing a computer program product containing instructions, which when running on a computer, causes the computer to execute the method corresponding to FIG. 2 or FIG. 4.

Those skilled in the art can clearly understand that, for the convenience and conciseness of the description, the specific working process of the above-described system, device, and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative, for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or It can be integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , Including several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks or optical disks and other media that can store program codes. .

Claims

An image classification method, characterized in that the method includes:

Acquiring a target image, the target image being an image generated based on a multispectral image;

Extracting the spatial feature of the target image and the spectral feature of the target image through an image classification model;

Constructing a spatial spectrum feature according to the spatial feature and the spectral feature through the image classification model;

Acquiring the classification result of the empty spectrum feature through the image classification model;

According to the classification result, the category to which the target image belongs is determined.
The method according to claim 1, wherein before extracting the spatial feature of the target image and the spectral feature of the target image through an image classification model, the method further comprises:

Acquire the spatial information of the target image and the spectral information of the target image, wherein the spectral information of the target image is a one-dimensional vector formed by the target image, and the spatial information of the target image is the target image And a two-dimensional vector formed by the neighborhood image of the target image;

The extracting the spatial feature of the target image and the spectral feature of the target image through an image classification model includes:

The spatial information and the spectral information are respectively feature extracted through the image classification model to obtain the spatial features of the target image and the spectral features of the target image.
The method according to claim 2, wherein the image classification model includes a first branch network and a second branch network, the first branch network includes a first convolutional layer and a first pooling layer, the The second branch network includes a second convolutional layer and a second pooling layer. The spatial information and the spectral information are respectively feature extracted through the image classification model to obtain the spatial features of the target image and the The spectral characteristics of the target image include:

Performing convolution processing on the spectral information through the first convolution layer to obtain the first spectral feature of the target image;

Performing maximum pooling processing on the first spectral feature through the first pooling layer to obtain the second spectral feature of the target image;

Performing convolution processing on the spatial information through the second convolution layer to obtain the first spatial feature of the target image;

Perform maximum pooling processing on the first spatial feature through the second pooling layer to obtain the second spatial feature of the target image.
The method according to claim 3, wherein the image classification model further comprises a fully connected layer, and constructing a space spectrum feature according to the spatial feature and the spectral feature through the image classification model comprises:

The second spectral feature and the second spatial feature are respectively stretched through the image classification model to obtain a third spectral feature and a third spatial feature, wherein the third spectral feature and the third spatial feature Is a one-dimensional vector;

The third spectral feature and the third spatial feature are fused through the fully connected layer to obtain a spatial spectrum feature.
The method according to claim 2, wherein the image classification model includes a first branch network and a second branch network, and the first branch network includes n first convolutional layers and n-1 first convolutional layers. Pooling layer, the second branch network includes n second convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2, and the spatial information and all the spatial information are compared through the image classification model. Performing feature extraction on the spectral information respectively to obtain the spatial feature of the target image and the spectral feature of the target image includes:

Performing convolution processing on the spectral information by using the first first convolution layer to obtain the first first spectral feature;

Through the first pooling layer to the n-1th first pooling layer, the first first spectral feature to the n-1th first spectral feature are respectively subjected to maximum pooling processing to obtain The first second spectral feature to the n-1th second spectral feature;

Convolution processing is performed on the first second spectral feature to the n-1th second spectral feature through the second first convolutional layer to the nth first convolutional layer to obtain the first 2 first spectral features to the nth first spectral feature;

Performing convolution processing on the spatial information by using the first second convolution layer to obtain the first first spatial feature;

Through the first second pooling layer to the n-1th second pooling layer, the first first spatial feature to the n-1th first spatial feature are respectively subjected to maximum pooling processing to obtain The first second spatial feature to the n-1th second spatial feature;

The first second spatial feature to the n-1th second spatial feature are respectively subjected to convolution processing through the second second convolutional layer to the nth second convolutional layer to obtain the first 2 first spatial features to nth first spatial feature.
The method according to claim 5, wherein the image classification model further comprises n fully connected layers, and constructing a space spectrum feature according to the spatial feature and the spectral feature through the image classification model comprises:

According to the image classification model, the first second spectral feature to the n-1th second spectral feature, the first second spatial feature to the n-1th second spatial feature, the first The n first spectral features and the nth first spatial feature are respectively stretched to obtain the first third spectral feature to the nth third spectral feature, and the first third spatial feature to the nth A third spatial feature, wherein the third spectral feature and the third spatial feature are one-dimensional vectors;

The n pairs of feature groups are respectively fused through the n fully connected layers to obtain n sub-space spectral features, wherein one of the third spectral features and one of the third spatial features with the same order constitutes one of the features group;

The n subspace spectrum features are spliced by the image classification model to obtain the spatial spectrum feature.
The method according to any one of claims 1 to 6, wherein the image classification model further comprises a classification layer, and obtaining the classification result of the empty spectrum feature through the image classification model comprises:

Perform classification processing on the empty spectrum feature through the classification layer to obtain a classification result.
A method for model training is characterized in that it includes:

Acquiring an image to be trained, where the image to be trained is an image generated based on a hyperspectral image;

Extracting the spatial feature of the image to be trained and the spectral feature of the image to be trained through the classification model to be trained;

Constructing an empty spectrum feature according to the spatial feature and the spectral feature through the classification model to be trained;

Acquiring the classification result of the empty spectrum feature through the classification model to be trained;

According to the classification result and the real result, the classification model to be trained is trained through the target loss function to obtain an image classification model.
The method according to claim 8, characterized in that, before extracting the spatial characteristics of the image to be trained and the spectral characteristics of the image to be trained through the classification model to be trained, the method further comprises:

Obtain the spatial information of the image to be trained and the spectral information of the image to be trained, wherein the spectral information of the image to be trained is a one-dimensional vector formed by the image to be trained, and the spatial information of the image to be trained Is a two-dimensional vector formed by the image to be trained and the neighborhood image of the image to be trained;

The extracting the spatial feature of the image to be trained and the spectral feature of the image to be trained through the classification model to be trained includes:

The spatial information and the spectral information are respectively feature extracted through the to-be-trained classification model to obtain the spatial features of the to-be-trained image and the spectral features of the to-be-trained image.
The method according to claim 9, wherein the classification model to be trained includes a first branch network and a second branch network, and the first branch network includes n first convolutional layers and n-1 A pooling layer, the second branch network includes n second convolutional layers and n-1 second pooling layers, where n is greater than or equal to 2, and the spatial information is analyzed by the classification model to be trained Performing feature extraction separately from the spectral information to obtain the spatial feature of the image to be trained and the spectral feature of the image to be trained includes:

Performing convolution processing on the spectral information by using the first first convolution layer to obtain the first first spectral feature;

Through the first pooling layer to the n-1th first pooling layer, the first first spectral feature to the n-1th first spectral feature are respectively subjected to maximum pooling processing to obtain The first second spectral feature to the n-1th second spectral feature;

Convolution processing is performed on the first second spectral feature to the n-1th second spectral feature through the second first convolutional layer to the nth first convolutional layer to obtain the first 2 first spectral features to the nth first spectral feature;

Performing convolution processing on the spatial information by using the first second convolution layer to obtain the first first spatial feature;

Through the first second pooling layer to the n-1th second pooling layer, the first first spatial feature to the n-1th first spatial feature are respectively subjected to maximum pooling processing to obtain The first second spatial feature to the n-1th second spatial feature;

The first second spatial feature to the n-1th second spatial feature are respectively subjected to convolution processing through the second second convolutional layer to the nth second convolutional layer to obtain the first 2 first spatial features to nth first spatial feature.
The method according to claim 10, wherein the classification model to be trained further comprises n fully connected layers, and constructing a spatial spectrum feature according to the spatial feature and the spectral feature through the classification model to be trained comprises:

According to the classification model to be trained, the first second spectral feature to the n-1th second spectral feature, the first second spatial feature to the n-1th second spatial feature, the The nth first spectral feature and the nth first spatial feature are respectively stretched to obtain the first third spectral feature to the nth third spectral feature, and the first third spatial feature to the first n third spatial features, wherein the third spectral feature and the third spatial feature are one-dimensional vectors;

The n pairs of feature groups are respectively fused through the n fully connected layers to obtain n sub-space spectral features, wherein one of the third spectral features and one of the third spatial features with the same order constitutes one of the features group;

The n sub-space-spectrum features are spliced by the classification model to be trained to obtain the null-spectrum features.
The method according to any one of claims 8 to 11, wherein the classification model to be trained further comprises a classification layer, and obtaining the classification result of the empty spectrum feature through the classification model to be trained comprises:

Perform classification processing on the empty spectrum feature through the classification layer to obtain a classification result.
An image classification device, characterized in that it comprises:

The first acquisition module is configured to acquire a target image, the target image being an image generated based on a hyperspectral image;

An extraction module for extracting the spatial characteristics of the target image and the spectral characteristics of the target image through an image classification model;

A construction module, configured to construct an empty spectrum feature according to the spatial feature and the spectral feature through the image classification model;

The second acquisition module is configured to acquire the classification result of the empty spectrum feature through the image classification model;

The determining module is configured to determine the category to which the target image belongs according to the classification result.
A model training device is characterized in that it comprises:

The first acquisition module is configured to acquire an image to be trained, and the image to be trained is an image generated based on a hyperspectral image;

An extraction module, configured to extract the spatial characteristics of the image to be trained and the spectral characteristics of the image to be trained through the classification model to be trained;

A construction module, configured to construct an empty spectrum feature according to the spatial feature and the spectral feature through the classification model to be trained;

The second acquisition module is configured to acquire the classification result of the empty spectrum feature through the classification model to be trained;

The training module is used to train the classification model to be trained through the target loss function according to the classification result and the real result to obtain an image classification model.
An image classification device, characterized in that it comprises:

One or more central processing units, memory, input and output interfaces, wired or wireless network interfaces, power supply;

The memory is a short-term storage memory or a persistent storage memory;

The central processing unit is configured to communicate with the memory, and execute the instruction operations in the memory on the image classification device to execute the method according to any one of claims 1-12.
A computer-readable storage medium, comprising instructions, when the instructions run on a computer, cause the computer to execute the method according to any one of claims 1-12.
A computer program product containing instructions that, when run on a computer, causes the computer to execute the method according to any one of claims 1 to 12.