WO2023137916A1 - Graph neural network-based image scene classification method and apparatus - Google Patents

Graph neural network-based image scene classification method and apparatus

Info

Publication number
WO2023137916A1
Authority
WO
WIPO (PCT)
Prior art keywords
superpixel
target
image
sample
unit
Prior art date
Application number
PCT/CN2022/090725
Other languages
French (fr)
Chinese (zh)
Inventor
Wang Jun (王俊)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2023137916A1 publication Critical patent/WO2023137916A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a graph neural network-based image scene classification method, device, electronic equipment, and storage medium.
  • Image scene classification means that, for a given image, the scene it belongs to (such as nature, street, or indoor) is judged by identifying the information and content the image contains, so as to achieve the purpose of scene classification.
  • Convolutional Neural Networks are widely used in computer vision tasks such as image scene classification.
  • the embodiment of the present application proposes an image scene classification method based on a graph neural network, the method comprising:
  • performing superpixel segmentation on the target image to be classified to obtain a target superpixel segmented image;
  • obtaining a plurality of target superpixel units under the target superpixel segmented image, using each of the target superpixel units as a node, and obtaining the node features of each target superpixel unit and the edge features between adjacent target superpixel units;
  • for each target superpixel unit, determining the state vector of the target superpixel unit according to the node features of the target superpixel unit;
  • for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit;
  • inputting the updated state vectors of all target superpixel units into a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image;
  • determining an image scene classification result corresponding to the target image according to the target scene label.
  • the embodiment of the present application proposes an image scene classification device based on a graph neural network, including:
  • the image segmentation module is used to perform superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image;
  • the feature extraction module is used to obtain a plurality of target superpixel units under the target superpixel segmentation image, and each of the target superpixel units is used as a node to obtain node features of each target superpixel unit and edge features between adjacent target superpixel units;
  • a state determination module configured to, for each target superpixel unit, determine the state vector of the target superpixel unit according to the node characteristics of the target superpixel unit;
  • a state update module configured to, for each target superpixel unit, update the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit;
  • a label output module configured to input the updated state vectors of all target superpixel units to the pre-trained image scene classification model, so that the image scene classification model outputs the target scene label based on the target superpixel segmented image;
  • a scene classification module configured to determine an image scene classification result corresponding to the target image according to the target scene label.
  • an embodiment of the present application proposes an electronic device, the electronic device includes a memory, a processor, a program stored in the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory, when the program is executed by the processor, an image scene classification method based on a graph neural network is implemented, wherein the image scene classification method based on a graph neural network includes:
  • performing superpixel segmentation on the target image to be classified to obtain a target superpixel segmented image;
  • obtaining a plurality of target superpixel units under the target superpixel segmented image, using each of the target superpixel units as a node, and obtaining the node features of each target superpixel unit and the edge features between adjacent target superpixel units;
  • for each target superpixel unit, determining the state vector of the target superpixel unit according to the node features of the target superpixel unit;
  • for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit;
  • inputting the updated state vectors of all target superpixel units into a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image;
  • determining an image scene classification result corresponding to the target image according to the target scene label.
  • an embodiment of the present application proposes a storage medium, the storage medium is a computer-readable storage medium for computer-readable storage, the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement a graph neural network-based image scene classification method, wherein the graph neural network-based image scene classification method includes:
  • performing superpixel segmentation on the target image to be classified to obtain a target superpixel segmented image;
  • obtaining a plurality of target superpixel units under the target superpixel segmented image, using each of the target superpixel units as a node, and obtaining the node features of each target superpixel unit and the edge features between adjacent target superpixel units;
  • for each target superpixel unit, determining the state vector of the target superpixel unit according to the node features of the target superpixel unit;
  • for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit;
  • inputting the updated state vectors of all target superpixel units into a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image;
  • determining an image scene classification result corresponding to the target image according to the target scene label.
  • the scheme of this application is based on graph neural network modeling, and constructs graph data based on superpixel units obtained by superpixel segmentation of target images.
  • the correlations between adjacent superpixel units and their edge features are considered in the modeling process, so that the message passing property of the graph neural network can be used to achieve effective image scene classification.
  • learning correlations between local features through the graph neural network, rather than being limited to correlations between single pixel pairs, better realizes feature transfer and reuse, effectively obtains global context information, improves the accuracy of deep models in image understanding tasks, and avoids the limitations of high-cost spatial information acquisition.
  • Fig. 1 is a schematic flow diagram of an image scene classification method based on a graph neural network provided by an embodiment of the present application
  • Fig. 2a is a schematic diagram of the target image to be classified in an embodiment of the present application.
  • Fig. 2b is a schematic diagram of a target superpixel segmented image in an embodiment of the present application.
  • Fig. 3 is a schematic diagram of the message passing process of the graph neural network of the embodiment of the present application.
  • Fig. 4 is a schematic flow chart of step S140 in Fig. 1;
  • FIG. 5 is a schematic diagram of a training process of an image scene classification model provided by an embodiment of the present application.
  • FIG. 6 is a schematic flow diagram of another image scene classification method based on a graph neural network provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of target superpixel segmentation images of different levels generated based on different preset segmentation thresholds
  • FIG. 8 is a schematic diagram of another image scene classification model training process provided by the embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an image scene classification device based on a graph neural network provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, artificial intelligence attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in a way similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is also a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • GNN is a neural network that operates directly on graph structures.
  • a graph structure usually includes multiple nodes.
  • a node can represent an object or concept, and an edge can represent the relationship between nodes.
  • GNN uses a state vector to represent the state of the node.
  • GNN is based on a message propagation mechanism: each node updates its state by exchanging messages with the other nodes until the states reach stable values. The output of the GNN is then computed at each node from its current state. The main process of GNN learning can thus be described as iteratively aggregating and updating the neighborhood information of nodes in the graph data.
  • at each layer, a node updates its own information by aggregating the features of adjacent nodes together with its own features from the previous layer, usually applying a nonlinear transformation to the aggregated information.
  • after k such layers, each node has obtained the information of all adjacent nodes within k hops, as illustrated by the sketch below.
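  • The following toy sketch (plain Python, with illustrative names not taken from the source) shows this k-hop effect: after k aggregation rounds, each node's value depends on every node within k hops.

```python
def aggregate_rounds(values, adjacency, k):
    """Toy k-round neighborhood aggregation on a graph.

    `values[v]` is a number attached to node v; `adjacency[v]` lists
    the neighbors of v. After k rounds, each node's value depends on
    every node within k hops, illustrating the k-layer receptive field.
    """
    for _ in range(k):
        values = {v: values[v] + sum(values[u] for u in adjacency[v])
                  for v in values}
    return values
```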
  • Region growing algorithm: digital image segmentation algorithms are generally based on one of two fundamental properties of gray values: discontinuity and similarity.
  • the former property is applied by segmenting the image at discontinuous changes in gray level, such as image edges.
  • the latter property is applied by segmenting the image into similar regions according to predefined similarity criteria.
  • the region growing algorithm is based on the second property, that is, the similarity of the image's gray values.
  • the basic idea of the region growing algorithm is to merge pixels with similar properties. For each region, a seed point is designated as the starting point of growth; the pixels in the area around the seed point are then compared with the seed point, and points with similar properties are merged into the region, which continues growing outward until no pixel meeting the condition remains. The growth of the region is then complete.
  • Image scene classification means that, for a given image, the scene it belongs to (such as nature, street, or indoor) is judged by identifying the information and content the image contains, so as to achieve the purpose of scene classification.
  • Convolutional neural network (CNN) is widely used in computer vision tasks such as image scene classification.
  • directly using a convolutional neural network model for classification can achieve scene category classification with a certain accuracy.
  • however, the way a conventional convolutional neural network extracts and models image scene information does not conform to the actual way the human brain performs cognition, which brings problems such as poor model interpretability and limited accuracy.
  • existing methods for acquiring global context information, such as non-local modules and various attention mechanisms, have too high a parameter cost and are difficult to apply to scenarios with high-resolution input images. Therefore, how to improve the accuracy of image scene classification and reduce the amount of parameters in the classification process has become a technical problem to be solved urgently.
  • the embodiment of the present application provides a graph neural network-based image scene classification method, device, electronic equipment, and storage medium, aiming at improving the accuracy of image scene classification and reducing the amount of parameters in the classification process.
  • Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the image scene classification method based on the graph neural network provided in the embodiment of the present application relates to the technical fields of artificial intelligence and image processing.
  • the image scene classification method provided in the embodiment of the present application may be applied to a terminal, may also be applied to a server, and may also be software running on the terminal or the server.
  • the terminal can be a smart phone, tablet computer, notebook computer, desktop computer, etc.
  • the server can be configured as an independent physical server, or can be configured as a server cluster or distributed system composed of multiple physical servers, and can also be configured as a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms;
  • the application can be used in numerous general purpose or special purpose computer system environments or configurations. Examples: personal computers, server computers, handheld or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, etc.
  • This application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • the application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • FIG. 1 shows a schematic flowchart of a method for classifying image scenes based on a graph neural network provided by an embodiment of the present application.
  • the image scene classification method includes but is not limited to the following steps S110-S160.
  • Step S110 performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image.
  • FIG. 2a is a target image to be classified, and by performing superpixel segmentation on the target image, a target superpixel segmented image as shown in FIG. 2b can be obtained.
  • step S110, performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image, may be specifically implemented as follows: a region growing algorithm is used to perform region segmentation on the target image to be classified to obtain the target superpixel segmented image.
  • the region growing algorithm is an image segmentation method that segments the image based on the similarity of its gray values.
  • the target image can be regarded as an image composed of N*N pixels based on preset pixel parameters, and a seed point is selected from the N*N pixels based on preset rules; it is then judged whether the gray value of an adjacent pixel and the gray value of the current seed point meet a preset similarity criterion, and if so, the adjacent pixel is added to the region to which the current seed point belongs.
  • the condition for region growing is in fact a set of similarity criteria defined according to the continuity between pixel gray levels, and the stopping condition defines a termination rule: essentially, when no pixel satisfies the condition for joining a certain region, region growing stops.
  • the algorithm defines a variable: the maximum pixel gray value distance reg_maxdist. When the absolute value of the difference between the gray value of the pixel to be added and the average gray value of all pixels in the segmented region is less than or equal to reg_maxdist, the pixel is added to the segmented region; otherwise, region growing stops. A minimal sketch of this rule follows.
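  • The following is a minimal sketch of this stopping rule, assuming a single-channel image held as a 2-D NumPy array; the function name and the default threshold are illustrative, not taken from the source:

```python
import numpy as np
from collections import deque

def region_grow(gray, seed, reg_maxdist=20.0):
    """Grow a single region from `seed` on a 2-D grayscale image.

    A neighboring pixel joins the region when the absolute difference
    between its gray value and the current region mean is at most
    reg_maxdist; growing stops when no 4-connected neighbor qualifies.
    """
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    region_sum, region_count = float(gray[seed]), 1
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                mean = region_sum / region_count
                if abs(float(gray[ny, nx]) - mean) <= reg_maxdist:
                    mask[ny, nx] = True
                    region_sum += float(gray[ny, nx])
                    region_count += 1
                    frontier.append((ny, nx))
    return mask
```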
  • Step S120 acquiring a plurality of target superpixel units under the target superpixel segmentation image, using each of the target superpixel units as a node, and acquiring node features of each target superpixel unit and edge features between adjacent target superpixel units.
  • in superpixel segmentation, the image is subdivided into multiple image sub-regions (sets of pixels), and these image sub-regions are the superpixel units.
  • the superpixel unit under the target superpixel segmentation image is used as the target superpixel unit, and then the graph data is constructed based on the target superpixel unit.
  • each of the target superpixel units is regarded as a node, and the node features of each target superpixel unit and the edge features between adjacent target superpixel units are obtained.
  • the node features include at least one of grayscale features, shape features, and texture features.
  • the grayscale feature, shape feature, and texture feature will be exemplified below.
  • the grayscale feature describes the apparent physical properties of the target object through grayscale variation, which directly reflects the color characteristics of the target itself.
  • the grayscale values of the target in the red, green, and blue bands form a set of vectors, from which multiple indicators including mean, brightness, variance, and standard deviation can be calculated.
  • shape features are generally determined by the outer contour of the target in the image, which reflects the geometric form of the target to a certain extent; a target can be distinguished from other objects based on its compact outer contour, and shape features have the advantage of rotation invariance at a recognizable resolution.
  • Commonly used shape features of objects include length, width, area, perimeter, density, roundness, shape index, and rectangularity.
  • the texture information in the image embodies the combination of grayscale features and spatial features, and can reflect the spatial distribution properties of pixel color information. It usually appears as a locally regular pattern at an intermediate scale between the pixel level and the scene level, and constitutes semi-macro-level target knowledge. In actual computation, texture is often described as the spatial regularity and correlation of image gray levels within a specific 3×3, 5×5, 7×7, or larger window.
  • the most classic and widely used statistical calculation method for texture features is the gray level co-occurrence matrix method proposed by Haralick et al.
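  • As an illustration of how such node features might be computed, the following is a sketch assuming scikit-image ≥ 0.19 for the gray level co-occurrence matrix; the exact feature set is not specified by the source, so the selection below is an assumption:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def superpixel_node_features(gray, labels, unit_id):
    """Grayscale and texture features for one superpixel unit.

    `gray` is a uint8 grayscale image; `labels` assigns a unit id to
    every pixel. Returns an illustrative feature vector:
    [mean, std, area, GLCM contrast, GLCM homogeneity].
    """
    mask = labels == unit_id
    values = gray[mask]
    ys, xs = np.where(mask)
    # Bounding-box window around the unit for the co-occurrence matrix.
    window = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    glcm = graycomatrix(window, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    return np.array([
        values.mean(),                        # grayscale mean
        values.std(),                         # grayscale variation
        float(mask.sum()),                    # area (shape proxy)
        graycoprops(glcm, "contrast")[0, 0],  # texture contrast
        graycoprops(glcm, "homogeneity")[0, 0],
    ])
```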
  • Step S130 for each target superpixel unit, determine the state vector of the target superpixel unit according to the node characteristics of the target superpixel unit.
  • the initial state of the target superpixel unit may be determined from the node features of the target superpixel unit according to a preset mapping rule.
  • for a target superpixel unit $v$, an initial state vector (denoted here as $h_v^{(0)}$) can be obtained, where $v \in N$ and $N$ is the set of target superpixel units.
  • Step S140 for each target superpixel unit, update the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit.
  • after each node determines its initial state vector, it exchanges messages with adjacent nodes and updates its own node state according to the messages of the adjacent nodes and of the edges connecting it to those nodes.
  • FIG. 3 shows a schematic diagram of a message passing process of a graph neural network.
  • the state vector of node v1 can be updated by considering the state vectors of neighboring nodes v3, v5 and v8 and the edge features connected with neighboring nodes.
  • step S140 according to the state vector of the target superpixel unit, the state vector of the adjacent target superpixel unit, and the edge features between the target superpixel unit and the adjacent target superpixel unit, the state vector of the target superpixel unit is updated to obtain the updated state vector of the target superpixel unit, which can be specifically implemented through the following method steps S141-S142:
  • Step S141 according to the state vector of the target superpixel unit, the state vector of the adjacent target superpixel unit, and the edge features between the target superpixel unit and the adjacent target superpixel unit, determine the relationship feature vector of the target superpixel unit.
  • calculation formula of the relationship feature vector can refer to the following formula (1):
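  • The body of formula (1) was not preserved in this text; a plausible standard message-passing form, stated as an assumption rather than as the patent's exact formula, is:

```latex
% Hypothetical reconstruction of formula (1): the relationship feature
% vector r_v^{(t)} aggregates the neighbor states and edge features.
r_v^{(t)} = \sum_{u \in \mathcal{N}(v)} f\!\left(h_v^{(t)},\, h_u^{(t)},\, e_{vu}\right) \tag{1}
```

  • where $\mathcal{N}(v)$ is the set of target superpixel units adjacent to $v$, $e_{vu}$ is the edge feature between $v$ and $u$, and $f$ is a learnable function.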
  • Step S142 updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain an updated state vector of the target superpixel unit.
  • the updated state vector of the target superpixel unit can be expressed by the following formula (2):
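  • The body of formula (2) was likewise not preserved; a plausible form consistent with the description, stated as an assumption, is:

```latex
% Hypothetical reconstruction of formula (2): the updated state combines
% the previous state with the relationship feature vector through a
% learnable function g (for example a GRU cell or an MLP).
h_v^{(t+1)} = g\!\left(h_v^{(t)},\, r_v^{(t)}\right) \tag{2}
```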
  • for example, if the initial state vector of a certain target superpixel unit indicates that its initial state is "plant", then after collecting the state vectors and edge feature vectors of adjacent target superpixel units and updating its own state vector, the updated state vector may indicate that it is "tree".
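  • Putting formulas (1) and (2) together, the following is a minimal NumPy sketch of one update round; the sum aggregation, the weight matrices W_msg and W_self, and the tanh nonlinearity are illustrative assumptions, since the patent text does not disclose the exact parameterization:

```python
import numpy as np

def update_states(states, neighbors, W_self, W_msg):
    """One synchronous message-passing round over the superpixel graph.

    states[v]    : current state vector of target superpixel unit v
    neighbors[v] : list of (u, e_vu) pairs, e_vu being the edge feature
                   between v and its adjacent unit u
    W_self, W_msg: illustrative weight matrices (not from the source)
    """
    new_states = {}
    for v, h_v in states.items():
        # Relationship feature vector: aggregate neighbor states together
        # with the connecting edge features (cf. formula (1)).
        r_v = np.zeros_like(h_v)
        for u, e_vu in neighbors[v]:
            r_v = r_v + W_msg @ np.concatenate([states[u], e_vu])
        # Updated state from the old state and the relationship feature
        # vector (cf. formula (2)).
        new_states[v] = np.tanh(W_self @ h_v + r_v)
    return new_states
```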
  • in step S150, the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model, so that the image scene classification model outputs the corresponding target scene label according to the state vectors of all target superpixel units.
  • in step S160, the image scene classification result can be determined according to the target scene label.
  • before step S150 is performed, the image scene classification model provided by the embodiment of the present application needs to be trained to obtain the pre-trained image scene classification model.
  • FIG. 5 shows a schematic diagram of a training process of an image scene classification model.
  • the training process of the image scene classification model provided by the embodiment of the present application may include the following steps S200-S250.
  • Step S200 acquiring a sample image and a sample scene label corresponding to the sample image.
  • a sample image set is constructed first, and a sample scene label is attached to each sample image in the sample image set.
  • Step S210 performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image.
  • Step S220 obtaining a plurality of sample superpixel units under the sample superpixel segmented image, using each sample superpixel unit as a node, and obtaining node features of each sample superpixel unit and edge features between adjacent sample superpixel units.
  • Step S230 for each sample superpixel unit, determine the state vector of the sample superpixel unit according to the node characteristics of the sample superpixel unit.
  • Step S240 for each sample superpixel unit, update the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit.
  • Step S250 using the updated state vectors of all sample superpixel units as input and the sample scene label as expected output, to train the image scene classification model.
  • by performing multiple rounds of iterative training on the image scene classification model until the image scene classification model satisfies the training end condition, a trained image scene classification model can be obtained.
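  • Steps S200-S250 amount to a standard supervised training loop. The following is a hedged PyTorch-style sketch; `build_graph` is a hypothetical helper standing in for steps S210-S240, and the loss and optimizer choices are illustrative:

```python
import torch

def train_classifier(model, samples, epochs=10, lr=1e-3):
    """Supervised training loop corresponding to steps S200-S250.

    `samples` yields (image, label) pairs, where `label` is a LongTensor
    of shape (1,). `build_graph` is a hypothetical helper covering steps
    S210-S240 (segmentation, feature extraction, message passing); it
    returns the stacked updated state vectors for one sample image.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for image, label in samples:
            states = build_graph(image)    # steps S210-S240
            logits = model(states)         # shape (1, num_classes)
            loss = loss_fn(logits, label)  # step S250: supervise
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```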
  • FIG. 6 shows a schematic flowchart of another image scene classification method based on a graph neural network provided by an embodiment of the present application.
  • the image scene classification method includes but is not limited to the following steps S310-S380.
  • Step S310 performing superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain a plurality of target superpixel segmented images, each target superpixel segmented image including a different number of target superpixel units.
  • the embodiments of the present application control the threshold of superpixel segmentation to generate different target superpixel segmentation images based on different preset segmentation thresholds, and each target superpixel segmentation image includes different numbers of target superpixel units.
  • three different levels of target superpixel segmentation images can be generated based on different preset segmentation thresholds, so as to decompose the image into a multi-level network structure for target information expression.
  • for example, the image is divided into three levels of superpixel unit sets, containing 8, 4, and 2 superpixel units respectively.
  • different segmentation thresholds can be determined by adjusting the size of preset pixel parameters in the process of performing superpixel segmentation on the target image based on the region growing algorithm.
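  • A compact sketch of step S310, where `segment` is a hypothetical region-growing routine and the threshold values are illustrative:

```python
def multilevel_segmentations(image, thresholds=(10.0, 25.0, 60.0)):
    """One superpixel segmentation per preset threshold (step S310).

    `segment` is a hypothetical region-growing routine parameterized by
    the maximum gray-value distance; looser thresholds merge more pixels,
    so each successive level contains fewer, coarser superpixel units.
    """
    return [segment(image, reg_maxdist=t) for t in thresholds]
```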
  • Step S320 traversing each target superpixel segmented image, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmented image.
  • each target superpixel segmented image is traversed, and the following steps S330-S360 are respectively performed for each target superpixel segmented image.
  • Step S330 for multiple target superpixel units under the currently traversed target superpixel segmentation image, use each target superpixel unit as a node, and acquire node features of each target superpixel unit and edge features between adjacent target superpixel units.
  • Step S340 for each target superpixel unit, determine the state vector of the target superpixel unit according to the node characteristics of the target superpixel unit.
  • Step S350 for each target superpixel unit, update the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit.
  • step S360 the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image.
  • Step S370 acquiring target scene labels output by the image scene classification model based on each target superpixel segmented image.
  • for example, target scene labels of different levels are obtained based on the target superpixel segmented images of different levels:
  • the target scene label based on level 1 is "nature";
  • the target scene label based on level 2 is "forest";
  • the target scene label based on level 3 is "shrub forest".
  • Step S380 Determine an image scene classification result corresponding to the target image according to the target scene label output based on each superpixel segmented image.
  • Step S380 may specifically include: concatenating target scene labels output from each superpixel segmented image to obtain an image scene classification result corresponding to the target image.
  • in this way, the target scene labels corresponding to the three levels, "nature", "forest", and "shrub forest", can be obtained, and the target scene labels of the three levels are finally concatenated to output the final scene classification result "nature-forest-shrub forest".
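  • The concatenation in step S380 reduces to a simple join of the per-level labels, for example:

```python
# Per-level target scene labels concatenated into the final result.
levels = ["nature", "forest", "shrub forest"]
result = "-".join(levels)   # "nature-forest-shrub forest"
```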
  • before step S360 is performed, the image scene classification model needs to be trained to obtain the pre-trained image scene classification model.
  • FIG. 8 shows a schematic diagram of a training process of an image scene classification model.
  • the training process of the image scene classification model provided by the embodiment of the present application may include the following steps S400-S420.
  • Step S400 acquiring a sample image and a sample scene label corresponding to the sample image.
  • a sample image set is constructed first, and a sample scene label is attached to each sample image in the sample image set.
  • Step S410 perform superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain multiple sample superpixel segmented images, each sample superpixel segmented image includes a different number of sample superpixel units.
  • the superpixel segmentation is performed on the sample image based on different preset segmentation thresholds, and the specific segmentation method can refer to the implementation process of the previous step S310, which will not be repeated here.
  • Step S420 traversing each sample superpixel segmented image to train the image scene classification model based on each sample superpixel segmented image, the training process includes steps S421-S424:
  • Step S421 taking each of the sample superpixel units under the currently traversed sample superpixel segmented image as a node, and acquiring node features of each sample superpixel unit and edge features between adjacent sample superpixel units.
  • Step S422 for each sample superpixel unit, determine the state vector of the sample superpixel unit according to the node characteristics of the sample superpixel unit.
  • Step S423 For each sample superpixel unit, update the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit.
  • Step S424 taking the updated state vectors of all sample superpixel units as input and the sample scene label as expected output to train the image scene classification model.
  • the implementation process of steps S421-S424 is similar to that of the previous steps S220-S250, so reference may be made to the relevant descriptions of steps S220-S250, which will not be repeated here.
  • the embodiment of the present application also provides an image scene classification device based on a graph neural network, the device includes:
  • the image segmentation module 810 is used to perform superpixel segmentation on the target image to be classified to obtain the target superpixel segmentation image
  • the feature extraction module 820 is used to obtain a plurality of target superpixel units under the target superpixel segmentation image, and use each of the target superpixel units as a node to obtain node features of each target superpixel unit and edge features between adjacent target superpixel units;
  • a state determination module 830 configured to, for each target superpixel unit, determine the state vector of the target superpixel unit according to the node characteristics of the target superpixel unit;
  • the state update module 840 is configured to, for each target superpixel unit, update the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit;
  • the label output module 850 is used to input the updated state vectors of all target superpixel units to the image scene classification model trained in advance, so that the image scene classification model outputs the target scene label based on the target superpixel segmented image;
  • a scene classification module 860 configured to determine an image scene classification result corresponding to the target image according to the target scene label.
  • the graph neural network-based image scene classification device of the present application further includes a training module for training the image scene classification model to obtain a trained image scene classification model.
  • the embodiment of the present application also provides an electronic device, the electronic device includes: a memory, a processor, a program stored on the memory and operable on the processor, and a data bus for realizing connection and communication between the processor and the memory, and the above-mentioned image scene classification method is implemented when the program is executed by the processor.
  • the electronic device may be any intelligent terminal including a tablet computer, a vehicle-mounted computer, and the like.
  • FIG. 10 illustrates a hardware structure of an electronic device in another embodiment.
  • the electronic device includes:
  • the processor 901 can be implemented by a general-purpose CPU (Central Processing Unit, central processing unit), microprocessor, application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute related programs to realize the technical solutions provided by the embodiments of the present application;
  • the memory 902 may be implemented in the form of a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 902 can store an operating system and other application programs.
  • the relevant program codes are stored in the memory 902, and the processor 901 invokes and executes a graph neural network-based image scene classification method according to the embodiment of the present application.
  • the method includes: performing superpixel segmentation on the target image to be classified to obtain a target superpixel segmented image; obtaining a plurality of target superpixel units under the target superpixel segmented image, using each target superpixel unit as a node, and obtaining the node features of each target superpixel unit and the edge features between adjacent target superpixel units; for each target superpixel unit, determining the state vector of the target superpixel unit according to the node features of the target superpixel unit;
  • for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit; inputting the updated state vectors of all target superpixel units into the pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and determining the image scene classification result corresponding to the target image according to the target scene label;
  • the input/output interface 903 is used to realize information input and output
  • the communication interface 904 is used to realize communication interaction between this device and other devices, which may be realized in a wired manner (such as USB or a network cable) or wirelessly (such as a mobile network, Wi-Fi, or Bluetooth); and
  • bus 905 for transferring information between various components of the device (such as processor 901, memory 902, input/output interface 903 and communication interface 904);
  • the processor 901, the memory 902, the input/output interface 903, and the communication interface 904 are connected to each other within the device through the bus 905.
  • in some embodiments, before the updated state vectors of all target superpixel units are input into the pre-trained image scene classification model to obtain the target scene label through the image scene classification model, the method further includes: obtaining a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image; obtaining a plurality of sample superpixel units under the sample superpixel segmented image, using each of the sample superpixel units as a node, and obtaining the node features of each sample superpixel unit and the edge features between adjacent sample superpixel units; for each sample superpixel unit, determining the state vector of the sample superpixel unit according to the node features of the sample superpixel unit; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit; and using the updated state vectors of all sample superpixel units as input and the sample scene label as expected output to train the image scene classification model.
  • in some embodiments, before the updated state vectors of all target superpixel units are input into the pre-trained image scene classification model to obtain the target scene label through the image scene classification model, the method further includes: obtaining a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain a plurality of sample superpixel segmented images, each sample superpixel segmented image including a different number of sample superpixel units; and traversing each sample superpixel segmented image to train the image scene classification model based on each sample superpixel segmented image, where the training process includes: taking each of the sample superpixel units under the currently traversed sample superpixel segmented image as a node, and obtaining the node features of each sample superpixel unit and the edge features between adjacent sample superpixel units; for each sample superpixel unit, determining the state vector of the sample superpixel unit according to the node features of the sample superpixel unit; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit; and using the updated state vectors of all sample superpixel units as input and the sample scene label as expected output to train the image scene classification model.
  • the superpixel segmentation of the target image to be classified to obtain the target superpixel segmentation image includes:
  • the target image to be classified is subjected to superpixel segmentation based on different preset segmentation thresholds to obtain multiple target superpixel segmented images, and each target superpixel segmented image includes different numbers of target superpixel units.
  • the acquiring a plurality of target superpixel units under the target superpixel segmentation image includes: traversing each target superpixel segmentation image, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmentation image; determining the image scene classification result corresponding to the target image according to the target scene label includes: acquiring the target scene label output by the image scene classification model based on each target superpixel segmentation image; and determining the image scene classification result corresponding to the target image based on the target scene label output based on each superpixel segmentation image.
  • the determining the image scene classification result corresponding to the target image according to the target scene label output based on each superpixel segmented image includes: concatenating the target scene labels output from the superpixel segmented images to obtain the image scene classification result corresponding to the target image.
  • updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, includes: determining the relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units; and updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
  • the embodiment of the present application also provides a storage medium, which is a computer-readable storage medium for computer-readable storage.
  • the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement a graph neural network-based image scene classification method, wherein the method includes: performing superpixel segmentation on the target image to be classified to obtain a target superpixel segmented image; obtaining a plurality of target superpixel units under the target superpixel segmented image, using each target superpixel unit as a node, and obtaining the node features of each target superpixel unit and the edge features between adjacent target superpixel units; for each target superpixel unit, determining the state vector of the target superpixel unit according to the node features of the target superpixel unit; for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit; inputting the updated state vectors of all target superpixel units into a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and determining an image scene classification result corresponding to the target image according to the target scene label.
  • in some embodiments, before the updated state vectors of all target superpixel units are input into the pre-trained image scene classification model to obtain the target scene label through the image scene classification model, the method further includes: obtaining a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image; obtaining a plurality of sample superpixel units under the sample superpixel segmented image, using each of the sample superpixel units as a node, and obtaining the node features of each sample superpixel unit and the edge features between adjacent sample superpixel units; for each sample superpixel unit, determining the state vector of the sample superpixel unit according to the node features of the sample superpixel unit; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit; and using the updated state vectors of all sample superpixel units as input and the sample scene label as expected output to train the image scene classification model.
  • in some embodiments, before the updated state vectors of all target superpixel units are input into the pre-trained image scene classification model to obtain the target scene label through the image scene classification model, the method further includes: obtaining a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain a plurality of sample superpixel segmented images, each sample superpixel segmented image including a different number of sample superpixel units; and traversing each sample superpixel segmented image to train the image scene classification model based on each sample superpixel segmented image, where the training process includes: taking each of the sample superpixel units under the currently traversed sample superpixel segmented image as a node, and obtaining the node features of each sample superpixel unit and the edge features between adjacent sample superpixel units; for each sample superpixel unit, determining the state vector of the sample superpixel unit according to the node features of the sample superpixel unit; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit; and using the updated state vectors of all sample superpixel units as input and the sample scene label as expected output to train the image scene classification model.
  • the superpixel segmentation of the target image to be classified to obtain the target superpixel segmentation image includes:
  • the target image to be classified is subjected to superpixel segmentation based on different preset segmentation thresholds to obtain multiple target superpixel segmented images, and each target superpixel segmented image includes different numbers of target superpixel units.
  • the acquiring a plurality of target superpixel units under the target superpixel segmentation image includes: traversing each target superpixel segmentation image, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmentation image; determining the image scene classification result corresponding to the target image according to the target scene label includes: acquiring the target scene label output by the image scene classification model based on each target superpixel segmentation image; and determining the image scene classification result corresponding to the target image based on the target scene label output based on each superpixel segmentation image.
  • the determining the image scene classification result corresponding to the target image according to the target scene label output based on each superpixel segmented image includes: concatenating the target scene labels output from the superpixel segmented images to obtain the image scene classification result corresponding to the target image.
  • updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, includes: determining the relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and adjacent target superpixel units; and updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
  • the computer-readable storage medium may be non-volatile or volatile.
  • memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
  • the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
  • the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of artificial intelligence. Embodiments of the present application provide a graph neural network-based image scene classification method and apparatus, an electronic device, and a storage medium. The method comprises: performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image; for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit; and inputting the updated state vectors of all target superpixel units into a pre-trained image scene classification model to obtain a target scene label. According to the present application, global context information can be obtained effectively, which improves the accuracy of the model in image understanding tasks and avoids the limitations of high-cost spatial information acquisition.

Description

基于图神经网络的图像场景分类方法及装置Image scene classification method and device based on graph neural network
本申请要求于2022年1月21日提交中国专利局、申请号为202210073146.3,发明名称为“基于图神经网络的图像场景分类方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese patent application with the application number 202210073146.3 and the invention title "Image Scene Classification Method and Device Based on Graph Neural Network" submitted to the China Patent Office on January 21, 2022, the entire contents of which are incorporated in this application by reference.
技术领域technical field
本申请涉及人工智能技术领域,尤其涉及一种基于图神经网络的图像场景分类方法、装置、电子设备及存储介质。The present application relates to the technical field of artificial intelligence, and in particular to a graph neural network-based image scene classification method, device, electronic equipment, and storage medium.
背景技术Background technique
图像场景分类是指对于已经给定的图像,通过识别它所包含的信息和内容来判断其所属的场景(例如自然、街道、室内等),从而达到场景分类的目的。卷积神经网络(CNN)在计算机视觉任务如图像场景分类中应用十分广泛。Image scene classification means that for a given image, by identifying the information and content it contains to judge the scene it belongs to (such as nature, street, indoor, etc.), so as to achieve the purpose of scene classification. Convolutional Neural Networks (CNNs) are widely used in computer vision tasks such as image scene classification.
技术问题technical problem
以下是发明人意识到的现有技术的技术问题:直接利用卷积神经网络模型进行分类,虽然可以实现一定精度的场景类别分类,但是,常规的卷积神经网络对图像场景信息的提取和建模,并不符合人脑认知的实际方式,因此也带来了模型可解释性差、精度有限等问题。现有的全局上下文信息获取的方法,如采用非局部均值(non-loca l)、各种注意力机制,参数成本太高,难以应用于高分辨率输入图像的场景中。因此,如何提高图像场景分类的准确性及减小分类过程中的参数量,成为了亟待解决的技术问题。The following is the technical problem of the prior art realized by the inventor: directly using the convolutional neural network model for classification, although it can achieve a certain accuracy of scene category classification, but the extraction and modeling of image scene information by conventional convolutional neural networks does not conform to the actual way of human brain cognition, so it also brings problems such as poor interpretability and limited accuracy of the model. The existing global context information acquisition methods, such as non-local means and various attention mechanisms, have too high a parameter cost and are difficult to apply to high-resolution input image scenarios. Therefore, how to improve the accuracy of image scene classification and reduce the amount of parameters in the classification process has become a technical problem to be solved urgently.
技术解决方案technical solution
第一方面,本申请实施例提出了一种基于图神经网络的图像场景分类方法,所述方法包括:In the first aspect, the embodiment of the present application proposes an image scene classification method based on a graph neural network, the method comprising:
对待分类的目标图像进行超像素分割,得到目标超像素分割图像;Perform superpixel segmentation on the target image to be classified to obtain the target superpixel segmentation image;
获取所述目标超像素分割图像下的多个目标超像素单元,将每个所述目标超像素单元分别作为一个节点,获取每个目标超像素单元的节点特征、相邻目标超像素单元之间的边特征;Obtaining a plurality of target superpixel units under the target superpixel segmentation image, using each of the target superpixel units as a node, and acquiring node features of each target superpixel unit and edge features between adjacent target superpixel units;
对于每个目标超像素单元,根据所述目标超像素单元的节点特征,确定所述目标超像素单元的状态向量;For each target superpixel unit, according to the node characteristics of the target superpixel unit, determine the state vector of the target superpixel unit;
对于每个目标超像素单元,根据所述目标超像素单元的状态向量、相邻目标超像素单元的状态向量、所述目标超像素单元与相邻目标超像素单元之间的边特征,对所述目标超像素单元的状态向量进行更新,得到所述目标超像素单元更新后的状态向量;For each target superpixel unit, according to the state vector of the target superpixel unit, the state vector of the adjacent target superpixel unit, and the edge feature between the target superpixel unit and the adjacent target superpixel unit, the state vector of the target superpixel unit is updated to obtain the updated state vector of the target superpixel unit;
将所有目标超像素单元更新后的状态向量输入至预先训练好的图像场景分类模型,以使所述图像场景分类模型输出基于所述目标超像素分割图像的目标场景标签;The updated state vectors of all target superpixel units are input to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image;
根据所述目标场景标签确定对应于所述目标图像的图像场景分类结果。An image scene classification result corresponding to the target image is determined according to the target scene label.
In a second aspect, an embodiment of the present application proposes a graph neural network-based image scene classification apparatus, comprising:

an image segmentation module, configured to perform superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;

a feature extraction module, configured to acquire a plurality of target superpixel units under the target superpixel segmented image, use each target superpixel unit as a node, and acquire node features of each target superpixel unit and edge features between adjacent target superpixel units;

a state determination module, configured to, for each target superpixel unit, determine a state vector of the target superpixel unit according to the node features of the target superpixel unit;

a state update module, configured to, for each target superpixel unit, update the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;

a label output module, configured to input the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and

a scene classification module, configured to determine an image scene classification result corresponding to the target image according to the target scene label.
In a third aspect, an embodiment of the present application proposes an electronic device, comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements a graph neural network-based image scene classification method, the method comprising:

performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;

acquiring a plurality of target superpixel units under the target superpixel segmented image, using each target superpixel unit as a node, and acquiring node features of each target superpixel unit and edge features between adjacent target superpixel units;

for each target superpixel unit, determining a state vector of the target superpixel unit according to the node features of the target superpixel unit;

for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;

inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and

determining an image scene classification result corresponding to the target image according to the target scene label.
In a fourth aspect, an embodiment of the present application proposes a storage medium, the storage medium being a computer-readable storage medium for computer-readable storage, the storage medium storing one or more programs executable by one or more processors to implement a graph neural network-based image scene classification method, the method comprising:

performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;

acquiring a plurality of target superpixel units under the target superpixel segmented image, using each target superpixel unit as a node, and acquiring node features of each target superpixel unit and edge features between adjacent target superpixel units;

for each target superpixel unit, determining a state vector of the target superpixel unit according to the node features of the target superpixel unit;

for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;

inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and

determining an image scene classification result corresponding to the target image according to the target scene label.
Beneficial Effects
The solution of the present application performs modeling based on a graph neural network and constructs graph data from the superpixel units obtained by superpixel segmentation of the target image. In addition, to fully mine the spatio-temporal topological relationships of the target image scene, the correlations between adjacent superpixel units and the edge features are taken into account during modeling, so that the message-passing property of the graph neural network can be exploited to achieve effective image scene classification. By learning correlations between local features through the graph neural network, rather than being limited to correlations between individual pixel pairs, feature transfer and reuse are better realized, global context information is obtained effectively, the accuracy of deep models on image understanding tasks is improved, and the limitation of costly spatial information is removed.
Description of Drawings
Fig. 1 is a schematic flowchart of a graph neural network-based image scene classification method provided by an embodiment of the present application;

Fig. 2a is a schematic diagram of a target image to be classified in an embodiment of the present application;

Fig. 2b is a schematic diagram of a target superpixel segmented image in an embodiment of the present application;

Fig. 3 is a schematic diagram of the message-passing process of the graph neural network in an embodiment of the present application;

Fig. 4 is a schematic flowchart of step S140 in Fig. 1;

Fig. 5 is a schematic diagram of a training process of an image scene classification model provided by an embodiment of the present application;

Fig. 6 is a schematic flowchart of another graph neural network-based image scene classification method provided by an embodiment of the present application;

Fig. 7 is a schematic diagram of target superpixel segmented images at different levels generated based on different preset segmentation thresholds;

Fig. 8 is a schematic diagram of another training process of an image scene classification model provided by an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a graph neural network-based image scene classification apparatus provided by an embodiment of the present application;

Fig. 10 is a schematic diagram of the hardware structure of an electronic device provided by an embodiment of the present application.
Embodiments of the Present Invention
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application, not to limit it.

First, several terms involved in the present application are explained:
Artificial intelligence (AI): a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, artificial intelligence attempts to understand the essence of intelligence and to produce intelligent machines that can respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Artificial intelligence can simulate the information processes of human consciousness and thinking, and uses digital computers or machines controlled by digital computers to perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Graph neural networks (GNN): a GNN is a neural network that operates directly on a graph structure. A graph structure usually comprises multiple nodes; a node can represent an object or concept, and an edge can represent a relationship between nodes. A GNN represents the state of each node by a state vector. Based on a message-propagation mechanism, each node updates its own state by exchanging messages with other nodes until a stable value is reached; the output of the GNN is then computed at each node from its current state. The main process of GNN learning is to iteratively aggregate and update the neighborhood information of the nodes in the graph data. In one iteration, each node updates its own information by aggregating the features of its adjacent nodes and its own features from the previous layer, usually followed by a nonlinear transformation of the aggregated information. By stacking multiple layers, each node can obtain information from adjacent nodes within the corresponding number of hops.
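To make this concrete, the following is a minimal sketch, in Python, of one way such graph data could be represented: nodes carrying state vectors and unordered node pairs carrying edge features. The class and field names are hypothetical, not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class SuperpixelGraph:
    # node id -> state vector of that node (e.g. a list or numpy array)
    node_states: dict = field(default_factory=dict)
    # unordered node pair (u, v) -> edge feature vector between u and v
    edge_feats: dict = field(default_factory=dict)

    def neighbors(self, v):
        """Return every node that shares an edge with node v."""
        return [b if a == v else a for (a, b) in self.edge_feats if v in (a, b)]
```

Running several update iterations over such a graph lets each node accumulate information from neighbors within the corresponding number of hops, as described above.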
Region growing algorithm: digital image segmentation algorithms are generally based on one of two basic properties of gray values: discontinuity and similarity. The former property is applied by segmenting an image according to discontinuous changes in gray level, such as image edges. The latter property is mainly applied by partitioning an image into similar regions according to predefined criteria. The region growing algorithm is based on the second property, that is, the similarity of image gray values. Its basic idea is to merge pixels with similar properties. For each region, a seed point is first designated as the starting point of growth; the pixels in the neighborhood of the seed point are then compared with the seed point, and points with similar properties are merged and continue to grow outward until no pixel satisfies the inclusion condition. The growth of that region is then complete.
Image scene classification means that, for a given image, the scene to which it belongs (for example, nature, street, or indoor) is judged by identifying the information and content the image contains, thereby achieving the purpose of scene classification. Convolutional neural networks (CNNs) are widely used in computer vision tasks such as image scene classification. However, although directly using a convolutional neural network model for classification can achieve scene category classification with a certain accuracy, the way a conventional convolutional neural network extracts and models image scene information does not conform to the actual manner of human cognition, which brings problems such as poor model interpretability and limited accuracy. Existing methods for acquiring global context information, such as non-local means and various attention mechanisms, carry too high a parameter cost and are difficult to apply to scenarios with high-resolution input images. Therefore, how to improve the accuracy of image scene classification while reducing the number of parameters involved in the classification process has become an urgent technical problem.

On this basis, embodiments of the present application provide a graph neural network-based image scene classification method, apparatus, electronic device, and storage medium, aiming to improve the accuracy of image scene classification and to reduce the number of parameters involved in the classification process.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The graph neural network-based image scene classification method provided by the embodiments of the present application relates to the technical fields of artificial intelligence and image processing. The method may be applied to a terminal or to a server, or may be software running on a terminal or server. In some embodiments, the terminal may be a smartphone, tablet computer, notebook computer, desktop computer, or the like; the server may be configured as an independent physical server, as a server cluster or distributed system composed of multiple physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application implementing the image scene classification method, but is not limited to the above forms.

The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices. The application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments, where tasks are performed by remote processing devices linked through a communications network; in such environments, program modules may be located in both local and remote computer storage media, including storage devices.
Please refer to Fig. 1, which shows a schematic flowchart of a graph neural network-based image scene classification method provided by an embodiment of the present application. As shown in Fig. 1, the method includes, but is not limited to, the following steps S110-S160.

Step S110: perform superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image.

Exemplarily, Fig. 2a shows a target image to be classified; by performing superpixel segmentation on this target image, the target superpixel segmented image shown in Fig. 2b can be obtained.
Exemplarily, in step S110, performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image may be implemented as follows: a region growing algorithm is used to perform region segmentation on the target image to be classified, yielding the target superpixel segmented image.

The region growing algorithm segments an image digitally based on the similarity of its gray values. When performing superpixel segmentation on the target image with a region growing algorithm, the target image may first be regarded, based on preset pixel parameters, as an image composed of N*N pixels; seed points are selected from these N*N pixels according to preset rules, and it is then judged whether the gray value of each neighboring pixel and the gray value of the current seed point satisfy a preset similarity; if so, the neighboring pixel is added to the region to which the current seed point belongs. The condition for region growing is in effect a set of similarity criteria defined on the continuity of pixel gray levels, and the stopping condition defines a termination rule: essentially, region growing stops when no pixel satisfies the condition for joining a region. In the algorithm, a variable reg_maxdist, the maximum gray-value distance, is defined. When the absolute difference between the gray value of a candidate pixel and the average gray value of all pixels in the already-segmented region is less than or equal to reg_maxdist, the pixel is added to that region; otherwise, the region growing algorithm stops.
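As a minimal sketch of this procedure, assuming a single-channel grayscale image, 4-connectivity, and a running region mean (illustrative choices, not prescribed by the text):

```python
from collections import deque

import numpy as np


def region_grow(gray: np.ndarray, seed: tuple, reg_maxdist: float = 10.0) -> np.ndarray:
    """Grow one region from `seed` and return its boolean mask."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    region_sum, region_cnt = float(gray[seed]), 1
    frontier = deque([seed])
    while frontier:
        y, x = frontier.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):  # 4-neighborhood
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                # Join if the candidate pixel is within reg_maxdist of the region mean.
                if abs(float(gray[ny, nx]) - region_sum / region_cnt) <= reg_maxdist:
                    mask[ny, nx] = True
                    region_sum += float(gray[ny, nx])
                    region_cnt += 1
                    frontier.append((ny, nx))
    return mask  # growth stops once no border pixel satisfies the criterion
```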
Step S120: acquire a plurality of target superpixel units under the target superpixel segmented image, use each target superpixel unit as a node, and acquire node features of each target superpixel unit and edge features between adjacent target superpixel units.

As shown in Fig. 2b, after the target image undergoes superpixel segmentation, it is subdivided into multiple image sub-regions (sets of pixels); these sub-regions are the superpixel units. In the embodiment of the present application, the superpixel units under the target superpixel segmented image are used as target superpixel units, and graph data is then constructed from them. Specifically, each target superpixel unit is used as a node, and the node features of each target superpixel unit and the edge features between adjacent target superpixel units are acquired.
Exemplarily, the node features include at least one of a grayscale feature, a shape feature, and a texture feature, which are illustrated below.

(1) Grayscale features

Grayscale features describe the apparent physical properties of a target object through gray-level variation and directly reflect the target's own color pattern. In a multi-channel image, the gray values of a target in the red, green, and blue bands form a vector, from which multiple indicators can be computed, including mean, brightness, variance, and standard deviation.

(2) Shape features

For image targets, most targets usually exhibit regular geometric forms in their apparent shape. As an important means of target representation in images, shape features are generally determined by the outer contour the target presents in the image and reflect its geometric form to a certain extent. A compact outer contour distinguishes a target from others, and at a recognizable resolution shape features have advantages such as rotation invariance. Commonly used shape features include length, width, area, perimeter, density, roundness, shape index, and rectangularity.

(3) Texture features

Texture information in an image embodies a combination of grayscale and spatial characteristics and reflects the spatial distribution of pixel color information. It usually appears as a locally regular pattern at an intermediate scale between the pixel level and the scene level, constituting a semi-macroscopic level of target knowledge. In practical computation, texture is often described as the spatially regular distribution of, and correlation between, image gray levels within a specific 3*3, 5*5, 7*7, or larger window. The most classic and widely used statistical method for computing texture features is the gray-level co-occurrence matrix (GLCM) method proposed by Haralick et al.
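A minimal sketch of assembling such node features for one superpixel is given below. The particular statistics, and the use of scikit-image's GLCM helpers graycomatrix/graycoprops, are illustrative assumptions rather than the feature set prescribed by the application; `gray` is assumed to be a uint8 grayscale image and `mask` a boolean mask of the superpixel.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops


def node_features(gray: np.ndarray, mask: np.ndarray) -> np.ndarray:
    pixels = gray[mask].astype(np.float64)
    # (1) Grayscale features: mean, variance, standard deviation.
    gray_feats = [pixels.mean(), pixels.var(), pixels.std()]
    # (2) Shape features: area and a bounding-box rectangularity measure.
    ys, xs = np.nonzero(mask)
    area = float(mask.sum())
    bbox_area = float((ys.max() - ys.min() + 1) * (xs.max() - xs.min() + 1))
    shape_feats = [area, area / bbox_area]
    # (3) Texture features: GLCM statistics over the superpixel's bounding box
    # (using the bounding box is a common simplification).
    patch = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    glcm = graycomatrix(patch, distances=[1], angles=[0], symmetric=True, normed=True)
    tex_feats = [graycoprops(glcm, "contrast")[0, 0], graycoprops(glcm, "homogeneity")[0, 0]]
    return np.array(gray_feats + shape_feats + tex_feats, dtype=np.float32)
```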
Step S130: for each target superpixel unit, determine the state vector of the target superpixel unit according to its node features.
Exemplarily, in the embodiment of the present application, after the node features of each target superpixel unit are determined, the initial state of each target superpixel unit may be determined from its node features according to a preset mapping rule. For a target superpixel unit $v$, its initial state vector can be denoted $h_v^{(0)}$, where $v \in N$ and $N$ is the set of target superpixel units.
Step S140: for each target superpixel unit, update the state vector of the target superpixel unit according to its state vector, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit.

It can be understood that the embodiment of the present application is based on the message-propagation mechanism of the graph neural network: after determining its initial state vector, each node also exchanges messages with its adjacent nodes and updates its own state according to the messages of those nodes and of the edges connecting it to them.

Please refer to Fig. 3, which shows a schematic diagram of the message-passing process of the graph neural network. For node v1 in Fig. 3, the state vector of v1 can be updated by considering the state vectors of the adjacent nodes v3, v5, and v8 together with the features of the edges connecting them.
Referring to Fig. 4, in step S140, updating the state vector of a target superpixel unit according to its state vector, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units may specifically be implemented through the following steps S141-S142:

Step S141: determine a relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units.
Specifically, the relationship feature vector is computed by the following formula (1):

$$m_v^{(t+1)} = \sum_{w \in \mathcal{N}(v)} M_t\left(h_v^{(t)},\, h_w^{(t)},\, e_{vw}\right) \tag{1}$$

In the above formula, $m_v^{(t+1)}$ denotes the relationship feature vector of the target superpixel unit $v$; $h_v^{(t)}$ denotes the state vector of the target superpixel unit $v$ at iteration $t$ (its initial state vector when $t = 0$); $h_w^{(t)}$ denotes the state vector of an adjacent target superpixel unit $w$; $e_{vw}$ denotes the edge feature vector between the target superpixel unit $v$ and the adjacent target superpixel unit $w$; $M_t$ denotes the message passing function; and $\mathcal{N}(v)$ denotes the set of adjacent target superpixel units.
Step S142: update the state vector of the target superpixel unit according to its state vector and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
Specifically, the updated state vector of the target superpixel unit can be expressed by the following formula (2):

$$h_v^{(t+1)} = U_t\left(h_v^{(t)},\, m_v^{(t+1)}\right) \tag{2}$$

In the above formula, $h_v^{(t+1)}$ denotes the updated state vector of the target superpixel unit $v$; $h_v^{(t)}$ denotes its state vector before the update; $m_v^{(t+1)}$ denotes the relationship feature vector of the target superpixel unit $v$; and $U_t$ denotes the state update model.
For example, in a specific example, the initial state vector of a certain target superpixel unit indicates that its initial state is "plant"; after collecting the state vectors and edge feature vectors of adjacent target superpixel units, its own state vector is updated, and the updated state vector indicates "tree".
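A minimal sketch of one such update iteration, implementing formulas (1) and (2), is given below; the tanh-of-linear-map forms chosen for the message passing function $M_t$ and the state update model $U_t$ are illustrative stand-ins, not the parameterization prescribed by the application.

```python
import numpy as np


def message_passing_step(states, edge_feats, W_msg, W_upd):
    """One iteration of formulas (1)-(2) over all nodes.

    states:     {v: h_v} current state vectors, each of shape (d,)
    edge_feats: {(v, w): e_vw} edge feature vectors, each of shape (d_e,)
    W_msg:      (d, 2*d + d_e) weights of the message function M_t
    W_upd:      (d, 2*d) weights of the state update model U_t
    """
    neighbors = {v: [] for v in states}
    for (a, b) in edge_feats:
        neighbors[a].append(b)
        neighbors[b].append(a)
    new_states = {}
    for v, h_v in states.items():
        # Formula (1): m_v = sum over w in N(v) of M_t(h_v, h_w, e_vw).
        m_v = np.zeros_like(h_v)
        for w in neighbors[v]:
            e_vw = edge_feats[(v, w)] if (v, w) in edge_feats else edge_feats[(w, v)]
            m_v += np.tanh(W_msg @ np.concatenate([h_v, states[w], e_vw]))
        # Formula (2): h_v' = U_t(h_v, m_v).
        new_states[v] = np.tanh(W_upd @ np.concatenate([h_v, m_v]))
    return new_states
```

Repeating this step until the states change negligibly between iterations corresponds to the "stable value" described above.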
S150: input the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image.

It can be understood that after the state vectors of all nodes have reached stable values through the update iterations, the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model, so that the model outputs the corresponding target scene label according to the state vectors of all target superpixel units.
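As a minimal sketch of this readout step, where mean-pooling of the node states and a linear softmax head are illustrative assumptions:

```python
import numpy as np


def classify_scene(updated_states, W_cls, b_cls, scene_labels):
    """updated_states: {v: h_v'}; W_cls: (num_classes, d); b_cls: (num_classes,)."""
    h_graph = np.mean(list(updated_states.values()), axis=0)  # pool all node states
    logits = W_cls @ h_graph + b_cls
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                      # softmax
    return scene_labels[int(np.argmax(probs))]                # target scene label
```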
S160: determine an image scene classification result corresponding to the target image according to the target scene label.

It can be understood that once the target scene label corresponding to the target image is obtained, the image scene classification result can be determined from it.
It can be understood that before step S150, the image scene classification model provided by the embodiment of the present application needs to be trained to obtain the pre-trained model. Please refer to Fig. 5, which shows a schematic diagram of the training process of the image scene classification model. As shown in Fig. 5, the training process may include the following steps S200-S250.
Step S200: acquire a sample image and a sample scene label corresponding to the sample image.

It can be understood that before model training, a sample image set is first constructed, and each sample image in the set is assigned a sample scene label.

Step S210: perform superpixel segmentation on the sample image to obtain a sample superpixel segmented image.

Step S220: acquire a plurality of sample superpixel units under the sample superpixel segmented image, use each sample superpixel unit as a node, and acquire node features of each sample superpixel unit and edge features between adjacent sample superpixel units.

Step S230: for each sample superpixel unit, determine the state vector of the sample superpixel unit according to its node features.

Step S240: for each sample superpixel unit, update the state vector of the sample superpixel unit according to its state vector, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit.

The implementation of steps S210-S240 is similar to that of steps S110-S140 above; reference may be made to the related descriptions of steps S110-S140, which are not repeated here.

Step S250: train the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as expected output.

By performing multiple rounds of iterative training on the image scene classification model until it satisfies the training termination condition, a trained image scene classification model is obtained.
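A minimal sketch of this training loop, assuming the updated sample state vectors have already been computed as described in steps S210-S240, and using PyTorch with cross-entropy loss and the Adam optimizer as illustrative choices:

```python
import torch
import torch.nn as nn


def train_classifier(model: nn.Module, samples, epochs: int = 10, lr: float = 1e-3):
    """samples: iterable of (states, label) pairs, where `states` is a
    (num_superpixels, dim) tensor of updated sample state vectors and
    `label` is the index of the sample scene label."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for states, label in samples:
            logits = model(states.mean(dim=0))  # pooled graph-level input
            loss = loss_fn(logits.unsqueeze(0), torch.tensor([label]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```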
Please refer to Fig. 6, which shows a schematic flowchart of another graph neural network-based image scene classification method provided by an embodiment of the present application. As shown in Fig. 6, the method includes, but is not limited to, the following steps S310-S380.
Step S310: perform superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain multiple target superpixel segmented images, each containing a different number of target superpixel units.

It can be understood that, since an image scene usually has a certain hierarchy, the embodiment of the present application controls the superpixel segmentation threshold so that different target superpixel segmented images, each containing a different number of target superpixel units, are generated based on different preset segmentation thresholds.

For example, as shown in Fig. 7, target superpixel segmented images at three different levels can be generated for the target image based on different preset segmentation thresholds, decomposing the image into a multi-level network structure for expressing target information. In the example shown in Fig. 7, the image is divided into three levels of superpixel unit sets, containing 8, 4, and 2 superpixels respectively.

In a specific implementation, the different segmentation thresholds can be determined by adjusting the preset pixel parameters when performing superpixel segmentation on the target image based on the region growing algorithm.
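A minimal sketch, assuming a hypothetical segment_superpixels wrapper around the region-growing procedure sketched earlier:

```python
def multi_level_segmentation(image, thresholds=(5.0, 15.0, 40.0)):
    """Run the same segmentation under several preset thresholds.

    A larger threshold merges more pixels, yielding fewer, larger superpixels,
    so the returned list runs from the finest to the coarsest level.
    The threshold values here are illustrative.
    """
    return [segment_superpixels(image, reg_maxdist=t) for t in thresholds]
```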
Step S320: traverse each target superpixel segmented image and acquire the plurality of target superpixel units under the currently traversed target superpixel segmented image.

It can be understood that after the target superpixel segmented images at different levels are obtained, each of them is traversed, and the following steps S330-S360 are performed for each target superpixel segmented image.

Step S330: for the plurality of target superpixel units under the currently traversed target superpixel segmented image, use each target superpixel unit as a node, and acquire node features of each target superpixel unit and edge features between adjacent target superpixel units.

Step S340: for each target superpixel unit, determine the state vector of the target superpixel unit according to its node features.

Step S350: for each target superpixel unit, update the state vector of the target superpixel unit according to its state vector, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit.

Step S360: input the updated state vectors of all target superpixel units to the pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image.

The implementation of steps S330-S360 is similar to that of steps S120-S150 above; reference may be made to the related descriptions of steps S120-S150, which are not repeated here.
Step S370: acquire the target scene labels output by the image scene classification model based on each target superpixel segmented image.

It can be understood that the present application obtains target scene labels at different levels based on the target superpixel segmented images at the different levels. For example, the target scene label obtained at level 1 is "nature", the label obtained at level 2 is "forest", and the label obtained at level 3 is "shrub forest".

Step S380: determine the image scene classification result corresponding to the target image according to the target scene labels output based on each superpixel segmented image.

Step S380 may specifically include: concatenating the target scene labels output from the superpixel segmented images to obtain the image scene classification result corresponding to the target image.

Following the previous example, the target superpixel segmented images at the three different levels yield the target scene labels "nature", "forest", and "shrub forest" corresponding to the three levels; concatenating the labels of the three levels outputs the final scene classification result "nature-forest-shrub forest".
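A one-line sketch of this concatenation step:

```python
def concat_labels(level_labels):
    # e.g. ["nature", "forest", "shrub forest"] -> "nature-forest-shrub forest"
    return "-".join(level_labels)
```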
In this embodiment, by generating a multi-scale representation of the target image that covers both microscopic and macroscopic scales, the extraction and mining of the component units in the image scene and of their spatial topological relationships is better realized, and the key multi-level information of the scene is captured more deeply, achieving more accurate and effective image scene classification.
It can be understood that before step S360, the image scene classification model needs to be trained to obtain the pre-trained model. Please refer to Fig. 8, which shows a schematic diagram of the training process of the image scene classification model. As shown in Fig. 8, the training process provided by the embodiment of the present application may include the following steps S400-S420.
Step S400: acquire a sample image and a sample scene label corresponding to the sample image.

It can be understood that before model training, a sample image set is first constructed, and each sample image in the set is assigned a sample scene label.

Step S410: perform superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain multiple sample superpixel segmented images, each containing a different number of sample superpixel units.

It can be understood that, for the sample image, superpixel segmentation is performed based on different preset segmentation thresholds; for the specific segmentation method, reference may be made to the implementation of step S310 above, which is not repeated here.

Step S420: traverse each sample superpixel segmented image to train the image scene classification model based on each sample superpixel segmented image; the training process includes steps S421-S424:

Step S421: use each sample superpixel unit under the currently traversed sample superpixel segmented image as a node, and acquire node features of each sample superpixel unit and edge features between adjacent sample superpixel units.

Step S422: for each sample superpixel unit, determine the state vector of the sample superpixel unit according to its node features.

Step S423: for each sample superpixel unit, update the state vector of the sample superpixel unit according to its state vector, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit.

Step S424: train the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as expected output.
It should be noted that the specific implementation of steps S421-S424 is similar to that of steps S220-S250 above; reference may be made to the related descriptions of steps S220-S250, which are not repeated here.
Please refer to Fig. 9; an embodiment of the present application also provides a graph neural network-based image scene classification apparatus, comprising:

an image segmentation module 810, configured to perform superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;

a feature extraction module 820, configured to acquire a plurality of target superpixel units under the target superpixel segmented image, use each target superpixel unit as a node, and acquire node features of each target superpixel unit and edge features between adjacent target superpixel units;

a state determination module 830, configured to, for each target superpixel unit, determine a state vector of the target superpixel unit according to the node features of the target superpixel unit;

a state update module 840, configured to, for each target superpixel unit, update the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;

a label output module 850, configured to input the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and

a scene classification module 860, configured to determine an image scene classification result corresponding to the target image according to the target scene label.
It can be understood that, in some embodiments, the graph neural network-based image scene classification apparatus of the present application further includes a training module, configured to train the image scene classification model to obtain a trained image scene classification model.

It should be noted that, since the information exchange and execution processes between the modules of the above apparatus are based on the same conception as the method embodiments of the present application, reference may be made to the method embodiment sections for their specific functions and technical effects, which are not repeated here.
An embodiment of the present application also provides an electronic device, comprising a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory; the program, when executed by the processor, implements the above image scene classification method. The electronic device may be any intelligent terminal, including a tablet computer, a vehicle-mounted computer, and the like.

Please refer to Fig. 10, which illustrates the hardware structure of an electronic device of another embodiment, the electronic device comprising:
a processor 901, which may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute related programs to realize the technical solutions provided by the embodiments of the present application;

a memory 902, which may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 may store an operating system and other application programs. When the technical solutions provided by the embodiments of this specification are implemented through software or firmware, the relevant program code is stored in the memory 902 and invoked by the processor 901 to execute the graph neural network-based image scene classification method of the embodiments of the present application, the method comprising: performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image; acquiring a plurality of target superpixel units under the target superpixel segmented image, using each target superpixel unit as a node, and acquiring node features of each target superpixel unit and edge features between adjacent target superpixel units; for each target superpixel unit, determining a state vector of the target superpixel unit according to the node features of the target superpixel unit; for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit; inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and determining an image scene classification result corresponding to the target image according to the target scene label;

an input/output interface 903, configured to realize information input and output;

a communication interface 904, configured to realize communication interaction between this device and other devices, where communication may be realized in a wired manner (for example, USB or network cable) or in a wireless manner (for example, mobile network, WIFI, or Bluetooth); and

a bus 905, which transfers information between the components of the device (for example, the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);

wherein the processor 901, the memory 902, the input/output interface 903, and the communication interface 904 realize communication connections with one another inside the device through the bus 905.
Exemplarily, before inputting the updated state vectors of all target superpixel units to the pre-trained image scene classification model to obtain the target scene label through the image scene classification model, the method further includes: acquiring a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image; acquiring a plurality of sample superpixel units under the sample superpixel segmented image, using each sample superpixel unit as a node, and acquiring node features of each sample superpixel unit and edge features between adjacent sample superpixel units; for each sample superpixel unit, determining a state vector of the sample superpixel unit according to its node features; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to its state vector, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as expected output.
Exemplarily, before inputting the updated state vectors of all target superpixel units to the pre-trained image scene classification model to obtain the target scene label through the image scene classification model, the method further includes: acquiring a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain multiple sample superpixel segmented images, each containing a different number of sample superpixel units; and traversing each sample superpixel segmented image to train the image scene classification model based on each sample superpixel segmented image, the training process including: using each sample superpixel unit under the currently traversed sample superpixel segmented image as a node, and acquiring node features of each sample superpixel unit and edge features between adjacent sample superpixel units; for each sample superpixel unit, determining a state vector of the sample superpixel unit according to its node features; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to its state vector, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as expected output.
Exemplarily, the performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image includes:
performing superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain a plurality of target superpixel segmented images, each target superpixel segmented image including a different number of target superpixel units. Correspondingly, the acquiring a plurality of target superpixel units under the target superpixel segmented image includes: traversing the target superpixel segmented images, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmented image. The determining an image scene classification result corresponding to the target image according to the target scene label includes: acquiring the target scene label output by the image scene classification model based on each target superpixel segmented image, and determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images.
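At inference time, the corresponding traversal over the per-threshold segmented images could be sketched as follows. Again, `build_superpixel_graph` and the threshold values are assumptions, and the model is a generic stand-in that returns scene logits for one graph.

```python
import torch

@torch.no_grad()
def classify_multiscale(model, image, thresholds=(50, 100, 200)):
    model.eval()
    labels = []
    for n_segments in thresholds:       # traverse each segmented image
        node_feats, edges, edge_feats = build_superpixel_graph(
            image, n_segments=n_segments)
        logits = model(torch.from_numpy(node_feats),
                       torch.from_numpy(edges),
                       torch.from_numpy(edge_feats))
        labels.append(int(logits.argmax()))
    return labels                       # one target scene label per threshold
```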
Exemplarily, the determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images includes: splicing the target scene labels output for the respective superpixel segmented images to obtain the image scene classification result corresponding to the target image.
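The application does not define how the splicing is performed; one plausible reading, shown below, is to concatenate the per-segmentation labels into a composite result and keep a majority vote as the headline scene. Both choices are assumptions.

```python
from collections import Counter

def splice_labels(per_scale_labels, class_names):
    # Concatenate the per-scale labels; a majority vote picks the headline class.
    names = [class_names[i] for i in per_scale_labels]
    majority = Counter(names).most_common(1)[0][0]
    return {"per_scale": names, "scene": majority}

# splice_labels([2, 2, 0], ["indoor", "street", "nature"])
# -> {"per_scale": ["nature", "nature", "indoor"], "scene": "nature"}
```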
Exemplarily, the updating, for each target superpixel unit, the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, includes: determining a relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units; and updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
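A sketch of one such update round follows, under the assumption of a summed message function for the relationship feature vector and a GRU cell for the fusion step; neither is specified by the application.

```python
import torch
import torch.nn as nn

class StateUpdater(nn.Module):
    def __init__(self, state_dim, edge_dim):
        super().__init__()
        # Stand-in message MLP over (own state, neighbour state, edge feature).
        self.message = nn.Sequential(
            nn.Linear(2 * state_dim + edge_dim, state_dim), nn.ReLU())
        self.gru = nn.GRUCell(state_dim, state_dim)

    def forward(self, states, edges, edge_feats):
        src, dst = edges[:, 0], edges[:, 1]
        # Messages flow in both directions along each undirected edge.
        m_fwd = self.message(torch.cat([states[dst], states[src], edge_feats], dim=1))
        m_bwd = self.message(torch.cat([states[src], states[dst], edge_feats], dim=1))
        relation = torch.zeros_like(states)      # relationship feature vector
        relation.index_add_(0, dst, m_fwd)
        relation.index_add_(0, src, m_bwd)
        # Fuse the relationship features with the current state vector.
        return self.gru(relation, states)
```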
An embodiment of the present application further provides a storage medium. The storage medium is a computer-readable storage medium for computer-readable storage, and stores one or more programs executable by one or more processors to implement a graph neural network-based image scene classification method, wherein the graph neural network-based image scene classification method includes: performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image; acquiring a plurality of target superpixel units under the target superpixel segmented image, taking each target superpixel unit as a node, and acquiring the node feature of each target superpixel unit and the edge features between adjacent target superpixel units; for each target superpixel unit, determining the state vector of the target superpixel unit according to the node feature of the target superpixel unit; for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit; inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and determining an image scene classification result corresponding to the target image according to the target scene label.
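For the classification model itself, the application only states that the updated state vectors are its input and a scene label its output. A minimal assumed readout, which would sit on top of the state updater sketched earlier, pools the updated state vectors and maps them to scene logits:

```python
import torch.nn as nn

class SceneReadout(nn.Module):
    # The mean pooling and single linear head are illustrative assumptions;
    # the application does not fix the readout architecture.
    def __init__(self, state_dim, num_scenes):
        super().__init__()
        self.classifier = nn.Linear(state_dim, num_scenes)

    def forward(self, updated_states):           # (num_units, state_dim)
        graph_vec = updated_states.mean(dim=0)   # pool all superpixel units
        return self.classifier(graph_vec)        # scene logits
```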
Exemplarily, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further includes: acquiring a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image; acquiring a plurality of sample superpixel units under the sample superpixel segmented image, taking each sample superpixel unit as a node, and acquiring the node feature of each sample superpixel unit and the edge features between adjacent sample superpixel units; for each sample superpixel unit, determining the state vector of the sample superpixel unit according to the node feature of the sample superpixel unit; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit; and training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
Exemplarily, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further includes: acquiring a sample image and a sample scene label corresponding to the sample image; performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain a plurality of sample superpixel segmented images, each sample superpixel segmented image including a different number of sample superpixel units; and traversing the sample superpixel segmented images to train the image scene classification model based on each sample superpixel segmented image. The training process includes: taking each sample superpixel unit under the currently traversed sample superpixel segmented image as a node, and acquiring the node feature of each sample superpixel unit and the edge features between adjacent sample superpixel units; for each sample superpixel unit, determining the state vector of the sample superpixel unit according to the node feature of the sample superpixel unit; for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain the updated state vector of the sample superpixel unit; and training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
Exemplarily, the performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image includes:
performing superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain a plurality of target superpixel segmented images, each target superpixel segmented image including a different number of target superpixel units. Correspondingly, the acquiring a plurality of target superpixel units under the target superpixel segmented image includes: traversing the target superpixel segmented images, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmented image. The determining an image scene classification result corresponding to the target image according to the target scene label includes: acquiring the target scene label output by the image scene classification model based on each target superpixel segmented image, and determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images.
Exemplarily, the determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images includes: splicing the target scene labels output for the respective superpixel segmented images to obtain the image scene classification result corresponding to the target image.
Exemplarily, the updating, for each target superpixel unit, the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, includes: determining a relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units; and updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
The computer-readable storage medium may be non-volatile or volatile.
As a non-transitory computer-readable storage medium, the memory can be used to store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory, and may further include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some implementations, the memory optionally includes memories disposed remotely from the processor, and these remote memories may be connected to the processor via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, which does not thereby limit the scope of rights of the embodiments of the present application. Any modification, equivalent replacement, or improvement made by those skilled in the art without departing from the scope and essence of the embodiments of the present application shall fall within the scope of rights of the embodiments of the present application.

Claims (20)

  1. A graph neural network-based image scene classification method, wherein the method comprises:
    performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;
    acquiring a plurality of target superpixel units under the target superpixel segmented image, taking each target superpixel unit as a node, and acquiring a node feature of each target superpixel unit and edge features between adjacent target superpixel units;
    for each target superpixel unit, determining a state vector of the target superpixel unit according to the node feature of the target superpixel unit;
    for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;
    inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and
    determining an image scene classification result corresponding to the target image according to the target scene label.
  2. The method according to claim 1, wherein, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further comprises:
    acquiring a sample image and a sample scene label corresponding to the sample image;
    performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image;
    acquiring a plurality of sample superpixel units under the sample superpixel segmented image, taking each sample superpixel unit as a node, and acquiring a node feature of each sample superpixel unit and edge features between adjacent sample superpixel units;
    for each sample superpixel unit, determining a state vector of the sample superpixel unit according to the node feature of the sample superpixel unit;
    for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and
    training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
  3. The method according to claim 1, wherein, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further comprises:
    acquiring a sample image and a sample scene label corresponding to the sample image;
    performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain a plurality of sample superpixel segmented images, each sample superpixel segmented image comprising a different number of sample superpixel units; and
    traversing the sample superpixel segmented images to train the image scene classification model based on each sample superpixel segmented image, the training process comprising:
    taking each sample superpixel unit under the currently traversed sample superpixel segmented image as a node, and acquiring a node feature of each sample superpixel unit and edge features between adjacent sample superpixel units;
    for each sample superpixel unit, determining a state vector of the sample superpixel unit according to the node feature of the sample superpixel unit;
    for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and
    training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
  4. The method according to claim 1, wherein the performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image comprises:
    performing superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain a plurality of target superpixel segmented images, each target superpixel segmented image comprising a different number of target superpixel units;
    the acquiring a plurality of target superpixel units under the target superpixel segmented image comprises:
    traversing the target superpixel segmented images, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmented image; and
    the determining an image scene classification result corresponding to the target image according to the target scene label comprises:
    acquiring the target scene label output by the image scene classification model based on each target superpixel segmented image; and
    determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images.
  5. The method according to claim 4, wherein the determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images comprises:
    splicing the target scene labels output for the respective superpixel segmented images to obtain the image scene classification result corresponding to the target image.
  6. The method according to claim 1, wherein the updating, for each target superpixel unit, the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, comprises:
    determining a relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units; and
    updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
  7. The method according to claim 1, wherein the performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image comprises:
    performing region segmentation on the target image to be classified by using a region growing algorithm to obtain the target superpixel segmented image.
  8. A graph neural network-based image scene classification apparatus, comprising:
    an image segmentation module, configured to perform superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;
    a feature extraction module, configured to acquire a plurality of target superpixel units under the target superpixel segmented image, take each target superpixel unit as a node, and acquire a node feature of each target superpixel unit and edge features between adjacent target superpixel units;
    a state determination module, configured to determine, for each target superpixel unit, a state vector of the target superpixel unit according to the node feature of the target superpixel unit;
    a state update module, configured to update, for each target superpixel unit, the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;
    a label output module, configured to input the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and
    a scene classification module, configured to determine an image scene classification result corresponding to the target image according to the target scene label.
  9. An electronic device, wherein the electronic device comprises a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements a graph neural network-based image scene classification method;
    wherein the graph neural network-based image scene classification method comprises:
    performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;
    acquiring a plurality of target superpixel units under the target superpixel segmented image, taking each target superpixel unit as a node, and acquiring a node feature of each target superpixel unit and edge features between adjacent target superpixel units;
    for each target superpixel unit, determining a state vector of the target superpixel unit according to the node feature of the target superpixel unit;
    for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;
    inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and
    determining an image scene classification result corresponding to the target image according to the target scene label.
  10. The electronic device according to claim 9, wherein, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further comprises:
    acquiring a sample image and a sample scene label corresponding to the sample image;
    performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image;
    acquiring a plurality of sample superpixel units under the sample superpixel segmented image, taking each sample superpixel unit as a node, and acquiring a node feature of each sample superpixel unit and edge features between adjacent sample superpixel units;
    for each sample superpixel unit, determining a state vector of the sample superpixel unit according to the node feature of the sample superpixel unit;
    for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and
    training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
  11. The electronic device according to claim 9, wherein, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further comprises:
    acquiring a sample image and a sample scene label corresponding to the sample image;
    performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain a plurality of sample superpixel segmented images, each sample superpixel segmented image comprising a different number of sample superpixel units; and
    traversing the sample superpixel segmented images to train the image scene classification model based on each sample superpixel segmented image, the training process comprising:
    taking each sample superpixel unit under the currently traversed sample superpixel segmented image as a node, and acquiring a node feature of each sample superpixel unit and edge features between adjacent sample superpixel units;
    for each sample superpixel unit, determining a state vector of the sample superpixel unit according to the node feature of the sample superpixel unit;
    for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and
    training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
  12. The electronic device according to claim 9, wherein the performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image comprises:
    performing superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain a plurality of target superpixel segmented images, each target superpixel segmented image comprising a different number of target superpixel units;
    the acquiring a plurality of target superpixel units under the target superpixel segmented image comprises:
    traversing the target superpixel segmented images, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmented image; and
    the determining an image scene classification result corresponding to the target image according to the target scene label comprises:
    acquiring the target scene label output by the image scene classification model based on each target superpixel segmented image; and
    determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images.
  13. The electronic device according to claim 12, wherein the determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images comprises:
    splicing the target scene labels output for the respective superpixel segmented images to obtain the image scene classification result corresponding to the target image.
  14. The electronic device according to claim 9, wherein the updating, for each target superpixel unit, the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, comprises:
    determining a relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units; and
    updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
  15. A storage medium, the storage medium being a computer-readable storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement a graph neural network-based image scene classification method;
    wherein the graph neural network-based image scene classification method comprises:
    performing superpixel segmentation on a target image to be classified to obtain a target superpixel segmented image;
    acquiring a plurality of target superpixel units under the target superpixel segmented image, taking each target superpixel unit as a node, and acquiring a node feature of each target superpixel unit and edge features between adjacent target superpixel units;
    for each target superpixel unit, determining a state vector of the target superpixel unit according to the node feature of the target superpixel unit;
    for each target superpixel unit, updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain an updated state vector of the target superpixel unit;
    inputting the updated state vectors of all target superpixel units to a pre-trained image scene classification model, so that the image scene classification model outputs a target scene label based on the target superpixel segmented image; and
    determining an image scene classification result corresponding to the target image according to the target scene label.
  16. The storage medium according to claim 15, wherein, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further comprises:
    acquiring a sample image and a sample scene label corresponding to the sample image;
    performing superpixel segmentation on the sample image to obtain a sample superpixel segmented image;
    acquiring a plurality of sample superpixel units under the sample superpixel segmented image, taking each sample superpixel unit as a node, and acquiring a node feature of each sample superpixel unit and edge features between adjacent sample superpixel units;
    for each sample superpixel unit, determining a state vector of the sample superpixel unit according to the node feature of the sample superpixel unit;
    for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and
    training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
  17. The storage medium according to claim 15, wherein, before the updated state vectors of all target superpixel units are input to the pre-trained image scene classification model so that the target scene label is obtained through the image scene classification model, the method further comprises:
    acquiring a sample image and a sample scene label corresponding to the sample image;
    performing superpixel segmentation on the sample image based on different preset segmentation thresholds to obtain a plurality of sample superpixel segmented images, each sample superpixel segmented image comprising a different number of sample superpixel units; and
    traversing the sample superpixel segmented images to train the image scene classification model based on each sample superpixel segmented image, the training process comprising:
    taking each sample superpixel unit under the currently traversed sample superpixel segmented image as a node, and acquiring a node feature of each sample superpixel unit and edge features between adjacent sample superpixel units;
    for each sample superpixel unit, determining a state vector of the sample superpixel unit according to the node feature of the sample superpixel unit;
    for each sample superpixel unit, updating the state vector of the sample superpixel unit according to the state vector of the sample superpixel unit, the state vectors of adjacent sample superpixel units, and the edge features between the sample superpixel unit and the adjacent sample superpixel units, to obtain an updated state vector of the sample superpixel unit; and
    training the image scene classification model with the updated state vectors of all sample superpixel units as input and the sample scene label as the expected output.
  18. The storage medium according to claim 15, wherein the performing superpixel segmentation on the target image to be classified to obtain the target superpixel segmented image comprises:
    performing superpixel segmentation on the target image to be classified based on different preset segmentation thresholds to obtain a plurality of target superpixel segmented images, each target superpixel segmented image comprising a different number of target superpixel units;
    the acquiring a plurality of target superpixel units under the target superpixel segmented image comprises:
    traversing the target superpixel segmented images, and acquiring a plurality of target superpixel units under the currently traversed target superpixel segmented image; and
    the determining an image scene classification result corresponding to the target image according to the target scene label comprises:
    acquiring the target scene label output by the image scene classification model based on each target superpixel segmented image; and
    determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images.
  19. The storage medium according to claim 18, wherein the determining the image scene classification result corresponding to the target image according to the target scene labels output based on the respective superpixel segmented images comprises:
    splicing the target scene labels output for the respective superpixel segmented images to obtain the image scene classification result corresponding to the target image.
  20. The storage medium according to claim 15, wherein the updating, for each target superpixel unit, the state vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units, to obtain the updated state vector of the target superpixel unit, comprises:
    determining a relationship feature vector of the target superpixel unit according to the state vector of the target superpixel unit, the state vectors of adjacent target superpixel units, and the edge features between the target superpixel unit and the adjacent target superpixel units; and
    updating the state vector of the target superpixel unit according to the state vector of the target superpixel unit and the relationship feature vector, to obtain the updated state vector of the target superpixel unit.
PCT/CN2022/090725 2022-01-21 2022-04-29 Graph neural network-based image scene classification method and apparatus WO2023137916A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210073146.3A CN114399002A (en) 2022-01-21 2022-01-21 Image scene classification method and device based on graph neural network
CN202210073146.3 2022-01-21

Publications (1)

Publication Number Publication Date
WO2023137916A1 (en)

Family ID: 81232596

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/090725 WO2023137916A1 (en) 2022-01-21 2022-04-29 Graph neural network-based image scene classification method and apparatus

Country Status (2)

Country Link
CN (1) CN114399002A (en)
WO (1) WO2023137916A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399002A (en) * 2022-01-21 2022-04-26 平安科技(深圳)有限公司 Image scene classification method and device based on graph neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695636B (en) * 2020-06-15 2023-07-14 北京师范大学 Hyperspectral image classification method based on graph neural network
CN113298129B (en) * 2021-05-14 2024-02-02 西安理工大学 Polarized SAR image classification method based on superpixel and graph convolution network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180061046A1 (en) * 2016-08-31 2018-03-01 International Business Machines Corporation Skin lesion segmentation using deep convolution networks guided by local unsupervised learning
CN109741341A (en) * 2018-12-20 2019-05-10 华东师范大学 A kind of image partition method based on super-pixel and long memory network in short-term
CN110110741A (en) * 2019-03-26 2019-08-09 深圳大学 A kind of multiple dimensioned classification method, device and computer readable storage medium
CN113160177A (en) * 2021-04-23 2021-07-23 杭州电子科技大学 Plane segmentation method based on superpixel and graph convolution network
CN113313164A (en) * 2021-05-27 2021-08-27 复旦大学附属肿瘤医院 Digital pathological image classification method and system based on superpixel segmentation and image convolution
CN114399002A (en) * 2022-01-21 2022-04-26 平安科技(深圳)有限公司 Image scene classification method and device based on graph neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LONG JIANWU, YAN ZERAN, CHEN HONGFA: "A Graph Neural Network for superpixel image classification", JOURNAL OF PHYSICS: CONFERENCE SERIES, INSTITUTE OF PHYSICS PUBLISHING, GB, vol. 1871, no. 1, 1 April 2021 (2021-04-01), GB , pages 012071, XP093079775, ISSN: 1742-6588, DOI: 10.1088/1742-6596/1871/1/012071 *

Also Published As

Publication number Publication date
CN114399002A (en) 2022-04-26

Similar Documents

Publication Publication Date Title
Chen et al. Linear spectral clustering superpixel
CN112232293B (en) Image processing model training method, image processing method and related equipment
TWI821671B (en) A method and device for positioning text areas
WO2022120997A1 (en) Distributed slam system and learning method therefor
CN114677565B (en) Training method and image processing method and device for feature extraction network
Xu et al. Weakly supervised deep semantic segmentation using CNN and ELM with semantic candidate regions
CN111784699B (en) Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
WO2020125100A1 (en) Image search method, apparatus, and device
Wang et al. Duplicate discovery on 2 billion internet images
WO2023137916A1 (en) Graph neural network-based image scene classification method and apparatus
CN108829692B (en) Flower image retrieval method based on convolutional neural network
CN112132145A (en) Image classification method and system based on model extended convolutional neural network
CN116452810A (en) Multi-level semantic segmentation method and device, electronic equipment and storage medium
CN113987236A (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
Salem et al. Semantic image inpainting using self-learning encoder-decoder and adversarial loss
Gao et al. Full-scale video-based detection of smoke from forest fires combining ViBe and MSER algorithms
US9875528B2 (en) Multi-frame patch correspondence identification in video
Al-Saidi et al. Fuzzy fractal dimension based on escape time algorithm
CN111079930A (en) Method and device for determining quality parameters of data set and electronic equipment
CN108961268B (en) Saliency map calculation method and related device
CN112906517B (en) Self-supervision power law distribution crowd counting method and device and electronic equipment
CN112330697B (en) Image segmentation method and device, electronic equipment and readable storage medium
CN111626311A (en) Heterogeneous graph data processing method and device
Hu et al. Convolutional neural networks with hybrid weights for 3D point cloud classification
JP2021527859A (en) Irregular shape segmentation in an image using deep region expansion