CN115082742A - Training method and device for image classification model, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115082742A
Authority
CN
China
Prior art keywords
classification
node
graph
attribute
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210920449.4A
Other languages
Chinese (zh)
Inventor
丁佳
李小星
马璐
吕晨翀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yizhun Medical AI Co Ltd
Original Assignee
Beijing Yizhun Medical AI Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yizhun Medical AI Co Ltd
Priority to CN202210920449.4A
Publication of CN115082742A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a training method and apparatus for an image classification model, an electronic device and a storage medium. The method includes: acquiring visual features corresponding to a cross-cut image and a longitudinal-cut image of a target object; mapping the visual features to each target node included in a first attribute graph based on an inter-graph transformer included in the image classification model, and determining the prediction probability of the attribute feature of each target node; mapping each target node included in the first attribute graph to each classification node included in a classification graph based on the inter-graph transformer; performing feature fusion on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and determining a classification prediction probability corresponding to each classification node; confirming a first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object; and adjusting parameters of the image classification model based on the first sub-loss and the second sub-loss.

Description

Training method and device for image classification model, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for training an image classification model, an electronic device, and a storage medium.
Background
The classification task of a lesion is usually complex: not only must the lesion be scanned comprehensively, but the attribute features of different sections of the lesion must also be considered jointly in the judgment. If the lesion is classified by means of a single ultrasound image only, the obtained classification result is not necessarily accurate.
Disclosure of Invention
The present disclosure provides a training method and apparatus for an image classification model, an electronic device, and a storage medium, to at least solve the above technical problems in the prior art.
According to a first aspect of the present disclosure, there is provided a training method of an image classification model, including:
acquiring visual features corresponding to a cross-cut image and a longitudinal-cut image of a target object;
mapping the visual features to each target node included in the first attribute graph based on an inter-graph transformer included in the image classification model, and determining the prediction probability of the attribute features of each target node; the number of the target nodes included in the first attribute graph is the same as the attribute feature number included in the target object;
mapping each target node included in the first attribute graph to each classification node included in a classification graph based on the inter-graph transformer;
based on an intra-graph transformer included in the image classification model, performing feature fusion on each classification node in the classification graph, and determining a classification prediction probability corresponding to each classification node;
confirming a first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object;
adjusting parameters of the image classification model based on the first sub-loss and the second sub-loss.
In the foregoing solution, the acquiring the visual features corresponding to the cross-cut image and the longitudinal-cut image of the target object includes:
inputting the cross-cut image and the longitudinal-cut image of the target object into an encoder of the image classification model, and confirming that the output of the encoder is the visual feature of the target object.
In the above solution, the mapping the visual features to the target nodes included in the first attribute map based on the inter-map transformer included in the image classification model, and determining the prediction probability of the attribute features of each target node include performing the following operations on each target node in the first attribute map:
updating the feature of the first target node based on edge weights between each source node in the visual feature and the first target node in the first attribute map and a first projection matrix;
confirming a prediction probability of the attribute feature of the first target node based on the feature of the first target node;
wherein a source node included in the visual feature comprises a pixel in the visual feature.
In the foregoing solution, the determining the prediction probability of the attribute feature of the first target node based on the feature of the first target node includes:
converting the feature of the first target node into a one-dimensional probability based on the feature of the first target node and a second projection matrix;
each element in the one-dimensional probability corresponds to the prediction probability of one attribute feature.
In the above solution, the edge weight between each source node in the visual feature and the first target node in the first attribute map is determined by:
respectively projecting a first target node and each source node in the visual features into a public feature space, and confirming the projection of the first target node and the projection of the source node corresponding to each source node;
confirming attention weight values between the first target node projection and each source node projection based on an attention mechanism;
and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first target node and each source node.
In the above solution, the first attribute graph is determined based on the set of each target node and the set of edges between any two target nodes in each target node.
In the above solution, the mapping, based on the inter-graph transformer, each target node included in the first attribute graph to each classification node included in a classification graph includes performing the following operations on each classification node in the classification graph:
the inter-graph transformer updates the feature of the first classification node based on an edge weight between each target node included in the first attribute graph and the first classification node in the classification graph and a third projection matrix.
In the above solution, the edge weight between each target node included in the first attribute graph and the first classification node in the classification graph is determined based on the following manner:
respectively projecting the updated first classification node and each updated classification node in the first attribute graph into a public feature space, and confirming the projection of the first classification node and the projection of a target node corresponding to each target node;
confirming attention weights between the first classification node projection and each target node projection based on an attention mechanism;
and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first classification node and each target node.
In the above solution, the performing, based on the intra-graph transformer included in the image classification model, feature fusion on each classification node in the classification graph and determining the classification prediction probability corresponding to each classification node includes performing the following operations on each classification node in the classification graph:
fusing all classification nodes in the classification graph based on edge weights between the first classification node and other classification nodes except the first classification node and a fourth projection matrix, and updating the first classification node based on a fusion result;
confirming the classification prediction probability of the first classification node based on the updated first classification node.
In the above scheme, the first sub-loss is determined based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object, including:
confirming the prediction probability of the attribute characteristics of each target node and the cross entropy loss of the label of the attribute characteristics of each target node as a first sub-loss;
and confirming the classification prediction probability corresponding to each classification node and the cross entropy loss of the classification label of the target object as a second sub-loss.
According to a second aspect of the present disclosure, an image classification method is provided, which is implemented based on an image classification model obtained by training the above method, and the method includes:
acquiring a cross-cut image and a longitudinal-cut image corresponding to a target object to be classified;
inputting the cross-cut image and the longitudinal-cut image into an inter-graph transformer included in the image classification model, and outputting a second attribute graph corresponding to the target object to be classified;
inputting the second attribute graph into the inter-graph transformer, and outputting a classification graph corresponding to the target object to be classified;
inputting the classification graph into an intra-graph transformer included in the image classification model, and confirming that the output of the intra-graph transformer is the classification result of the target object to be classified.
According to a third aspect of the present disclosure, there is provided an apparatus for training an image classification model, the apparatus comprising:
a first acquisition unit, used for acquiring visual features corresponding to a cross-cut image and a longitudinal-cut image of a target object;
a first mapping unit, configured to map the visual features to the target nodes included in the first attribute graph based on an inter-graph transformer included in the image classification model, and determine a prediction probability of the attribute feature of each target node; the number of target nodes included in the first attribute graph is the same as the number of attribute features contained in the target object;
a second mapping unit, configured to map, based on the inter-graph transformer, each target node included in the first attribute graph to each classification node included in a classification graph;
a fusion unit, configured to perform feature fusion on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and determine the classification prediction probability corresponding to each classification node;
a confirming unit, configured to confirm the first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object;
an adjusting unit, configured to adjust a parameter of the image classification model based on the first sub-loss and the second sub-loss.
According to a fourth aspect of the present disclosure, there is provided an image classification apparatus implemented based on an image classification model obtained by training with the above method, the apparatus including:
a second acquisition unit, used for acquiring a cross-cut image and a longitudinal-cut image corresponding to a target object to be classified;
a first input unit, configured to input the cross-cut image and the longitudinal-cut image into an inter-graph transformer included in the image classification model, and output a second attribute graph corresponding to the target object to be classified;
a second input unit, used for inputting the second attribute graph into the inter-graph transformer and outputting a classification graph corresponding to the target object to be classified;
and a classification unit, used for inputting the classification graph into an intra-graph transformer included in the image classification model and confirming that the output of the intra-graph transformer is the classification result of the target object to be classified.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods of the present disclosure.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the present disclosure.
According to the training method of the image classification model provided by the present disclosure, visual features corresponding to a cross-cut image and a longitudinal-cut image of a target object are acquired; the visual features are mapped to each target node included in the first attribute graph based on an inter-graph transformer included in the image classification model, and the prediction probability of the attribute feature of each target node is determined, where the number of target nodes included in the first attribute graph is the same as the number of attribute features contained in the target object; each target node included in the first attribute graph is mapped to each classification node included in a classification graph based on the inter-graph transformer; feature fusion is performed on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and the classification prediction probability corresponding to each classification node is determined; a first sub-loss is confirmed based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; a second sub-loss is confirmed based on the classification prediction probability corresponding to each classification node and the classification label of the target object; and parameters of the image classification model are adjusted based on the first sub-loss and the second sub-loss. By combining different sections of the target object, the attribute features closely related to classification are fully utilized, so that the model can not only output the classification result but also predict the corresponding attribute features.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
in the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart diagram illustrating an alternative method for training an image classification model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart illustrating an alternative method for training an image classification model provided by the embodiment of the present disclosure;
FIG. 3 shows a schematic structure of an image classification model provided by an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating an alternative image classification method provided by the embodiment of the disclosure;
FIG. 5 is a schematic diagram illustrating an alternative structure of a training apparatus for an image classification model provided in an embodiment of the present disclosure;
fig. 6 is a schematic diagram illustrating an alternative structure of an image classification apparatus provided in an embodiment of the present disclosure;
fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the objects, features and advantages of the present disclosure more apparent and understandable, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the description that follows, references to the terms "first", "second", and the like, are intended only to distinguish similar objects and not to indicate a particular ordering for the objects, it being understood that "first", "second", and the like may be interchanged under certain circumstances or sequences of events to enable embodiments of the present disclosure described herein to be practiced in other than the order illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the disclosure only and is not intended to be limiting of the disclosure.
It should be understood that, in the various embodiments of the present disclosure, the sequence numbers of the implementation processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation processes of the embodiments of the present disclosure.
Taking the classification of breast ultrasound images as an example, benign and malignant breast lesions are generally graded clinically using BIRADS, with six categories: 2, 3, 4A, 4B, 4C and 5. A lesion judged to be BIRADS 2 is usually a benign lesion that is cystic or solid, with regular morphology and clear borders, and carries no risk of malignancy; a BIRADS 3 lesion is a clinically typical benign change with a risk of malignancy of 2% or less; BIRADS 4A, 4B and 4C lesions usually show one, two, or three to four malignant signs on ultrasound, respectively, with a risk of malignancy ranging from 3% to 95%; and a lesion rated BIRADS 5 has a probability of malignancy of about 95%. Correctly judging the BIRADS grade is therefore very important for the diagnosis of the lesion.
Generally, the diagnosis of the lesion BIRADS depends on a plurality of attribute characteristic indexes of the ultrasound image, such as echo type, boundary, shape, calcification and the like.
In the field of computer vision, most existing BIRADS classification of breast lesions is based on ultrasound images and performs BIRADS classification directly on features extracted by a convolutional neural network (CNN) model. Such approaches do not focus on the lesion characteristics closely related to BIRADS, and the CNN has a larger computation cost than a Transformer structure, so the model is not computationally efficient.
The classification task for breast ultrasound lesions is complex: clinically, the BIRADS grade of a breast lesion is judged not only by scanning the lesion comprehensively, but also by jointly considering the attribute features of its different sections. Based on the above analysis, classifying a lesion from the ultrasound image alone is incomplete, and the obtained result is most likely not optimal.
In the related art, the lesion attribute features closely related to BIRADS classification are not fully utilized; moreover, compared with a Transformer structure, the CNN model has a larger computation cost and a lower computational efficiency.
To address these shortcomings of the image classification methods in the related art, the present disclosure provides a training method for an image classification model in which the lesion attributes and the BIRADS grades are encoded into separate graphs, so that features are dynamically transferred both between graphs and within graphs across this multi-graph structure, thereby solving at least some or all of the above technical problems. It should be understood by those skilled in the art that the above description of breast ultrasound images is only an example and is not intended to limit the training method of the image classification model or the application scenarios of the image classification method provided by the embodiments of the present disclosure.
Before the embodiments of the present disclosure are described in further detail, the principles of the inter-graph transformer and the intra-graph transformer used in the embodiments of the present disclosure will be described.
1) Inter-graph Transformer (Tr1)
The inter-graph transformer transfers features between different graphs. For ease of illustration, the two graphs are referred to as the first graph and the second graph; the inter-graph transformer transfers features from the first graph to the second graph. Optionally, when the first graph is the graph corresponding to the visual features, the second graph is the first attribute graph; when the first graph is the first attribute graph, the second graph is the classification graph.
The inter-graph transformer transforms the features of the first graph and, weighted by the corresponding edge weights (the edge weights between a node in the first graph and a node in the second graph), passes them to the nodes in the second graph to update those nodes. The update formula may be:

$$v_j \leftarrow v_j + \sigma\Big(\sum_{i=1}^{N} \tilde{e}_{i,j}\, W_s v_i\Big) \qquad (1)$$

where $v_j$ is the feature of node $j$ in the second graph, $v_i$ is the feature of node $i$ in the first graph, $W_s$ is a projection matrix, $\sigma$ is a nonlinear activation, and $\tilde{e}_{i,j}$ is the normalized edge weight between node $i$ in the first graph and node $j$ in the second graph. Each node in the first graph is projected into the space corresponding to the second graph, a weighted sum is taken using the edge weights to each node in the second graph, and after nonlinear activation the result is added to the original feature of the node in the second graph, thereby updating the features of the second graph's nodes.
Optionally, the attention weight between the first graph and the second graph can be learned automatically by an Attention mechanism:

$$e_{i,j} = \mathrm{Attention}\big(W_s^A v_i,\; W_t^A v_j\big) \qquad (2)$$

where $e_{i,j}$ represents the attention weight of node $i$ in the first graph with respect to node $j$ in the second graph, $W_s^A$ is a mapping matrix that projects the node features of the first graph into a common feature space, and $W_t^A$ is a mapping matrix that projects the node features of the second graph into the same common feature space. The Attention operation transforms the two projected features (the node feature of the first graph and the node feature of the second graph) into a probability that serves as the association weight of the edge between the two nodes. The Attention uses a scaled dot-product operation with multiple attention heads.

Then, for each node in the second graph, the attention weights over all the nodes in the first graph are normalized, characterizing the relative importance of each node in the first graph to that node in the second graph:

$$\tilde{e}_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k=1}^{N} \exp(e_{k,j})} \qquad (3)$$

where the sum in the denominator runs over the $N$ nodes of the first graph (for example, taking the sample image as a breast ultrasound image, in the present disclosure the first attribute graph has 29 nodes and the classification graph has 6 nodes). After the normalized attention weights are obtained, the features of the nodes in the second graph can be updated through formula (1).
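As a non-limiting illustration, the following Python (PyTorch-style) sketch shows one possible implementation of the inter-graph transfer described by formulas (1) to (3); the class name InterGraphTransformer, the single-head scaled dot-product attention and the dimension arguments are assumptions of this example rather than details taken from the present disclosure.

```python
import torch.nn as nn
import torch.nn.functional as F

class InterGraphTransformer(nn.Module):
    """Sketch of inter-graph feature transfer (formulas (1)-(3)).

    Transfers features from the N nodes of a first graph to the M nodes
    of a second graph, using attention-derived edge weights.
    """
    def __init__(self, dim_src, dim_dst, dim_common):
        super().__init__()
        self.w_s_a = nn.Linear(dim_src, dim_common, bias=False)  # W_s^A: first graph -> common space
        self.w_t_a = nn.Linear(dim_dst, dim_common, bias=False)  # W_t^A: second graph -> common space
        self.w_s = nn.Linear(dim_src, dim_dst, bias=False)       # W_s: first graph -> second graph's space
        self.scale = dim_common ** -0.5

    def forward(self, src_nodes, dst_nodes):
        # src_nodes: (N, dim_src) features of the first graph
        # dst_nodes: (M, dim_dst) features of the second graph
        q = self.w_t_a(dst_nodes)               # (M, dim_common)
        k = self.w_s_a(src_nodes)               # (N, dim_common)
        attn = q @ k.t() * self.scale           # (M, N): e_{i,j} from scaled dot products, formula (2)
        edge = F.softmax(attn, dim=-1)          # formula (3): normalize over the first graph's nodes
        msg = edge @ self.w_s(src_nodes)        # formula (1): weighted sum of projected source features
        return dst_nodes + F.relu(msg)          # residual update of the second graph's nodes
```

A multi-head variant, as mentioned above, would split dim_common into several heads and concatenate the per-head results before the residual update.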
2) Intra-graph Transformer (Tr2)
Feature transfer within a graph models the correlation between the nodes of the same graph and fuses features according to how closely the nodes are related, specifically by combining the features of each node's neighbours with its own. The node update formula is:

$$v_n \leftarrow v_n + \sigma\Big(\sum_{m} \tilde{e}_{n,m}\, W_t v_m\Big) \qquad (4)$$

where $W_t$ is a projection matrix that projects the node features into the output space, and $\tilde{e}_{n,m}$ is the edge weight between the $n$-th node and the $m$-th node of the graph; the edge weights can be learned through the attention mechanism in the same way as described above. The nodes within the same graph are updated through formula (4).
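Similarly, a minimal sketch of the intra-graph fusion of formula (4) is given below, assuming the same single-head scaled dot-product attention as in the previous sketch; the class name IntraGraphTransformer is an assumption of this example.

```python
import torch.nn as nn
import torch.nn.functional as F

class IntraGraphTransformer(nn.Module):
    """Sketch of intra-graph feature fusion (formula (4)) among the nodes of one graph."""
    def __init__(self, dim, dim_common):
        super().__init__()
        self.w_q = nn.Linear(dim, dim_common, bias=False)  # projects node features into a common space
        self.w_k = nn.Linear(dim, dim_common, bias=False)
        self.w_t = nn.Linear(dim, dim, bias=False)          # W_t: projection into the output space
        self.scale = dim_common ** -0.5

    def forward(self, nodes):
        # nodes: (M, dim) features of the nodes of a single graph
        attn = self.w_q(nodes) @ self.w_k(nodes).t() * self.scale  # pairwise attention weights e_{n,m}
        edge = F.softmax(attn, dim=-1)                             # normalized edge weights
        fused = edge @ self.w_t(nodes)                             # combine the neighbours' projected features
        return nodes + F.relu(fused)                               # residual update, as in formula (4)
```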
Fig. 1 shows an alternative flowchart of a training method of an image classification model provided by an embodiment of the present disclosure, which will be described according to various steps.
Step S101, obtaining visual characteristics corresponding to a transverse cutting image and a longitudinal cutting image corresponding to a target object.
In some embodiments, the target object may be a lesion sample (lesion), and the cross-cut image and the longitudinal-cut image corresponding to the target object may be a cross-cut image and a longitudinal-cut image of the lesion corresponding to the lesion sample.
In some embodiments, the training device of the image classification model may input the cross-cut image and the longitudinal-cut image corresponding to the target object into an encoder of the image classification model, and confirm that the output of the encoder is the visual feature of the target object; the encoder may be, for example, a ResNet50.
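As one way to realize this step, the sketch below encodes a cross-cut/longitudinal-cut image pair into a grid of visual features whose pixels later serve as source nodes; the use of torchvision's ResNet-50 backbone (assuming a recent torchvision), the 1x1 channel reduction and the choice to concatenate the two views side by side are assumptions of this example and are not prescribed by the disclosure.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class PairEncoder(nn.Module):
    """Sketch: encode a (cross-cut, longitudinal-cut) image pair into visual features."""
    def __init__(self, out_dim=256):
        super().__init__()
        backbone = resnet50(weights=None)
        # keep everything up to the last convolutional stage (drop avgpool and fc)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.reduce = nn.Conv2d(2048, out_dim, kernel_size=1)

    def forward(self, cross_img, long_img):
        # cross_img, long_img: (B, 3, H, W) tensors of the same lesion
        feats = [self.reduce(self.features(x)) for x in (cross_img, long_img)]
        x = torch.cat(feats, dim=-1)           # place the two views side by side: (B, C, h, 2w)
        # flatten spatial positions: every pixel becomes one source node
        return x.flatten(2).transpose(1, 2)    # (B, num_source_nodes, C)
```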
Step S102, based on the inter-graph transformer included in the image classification model, the visual features are mapped to each target node included in the first attribute graph, and the prediction probability of the attribute features of each target node is determined.
In some embodiments, the number of target nodes included in the first attribute graph is the same as the number of attribute features potentially contained in the target object; the apparatus may update the feature of a first target node in the first attribute graph based on the edge weights between each source node in the visual features and the first target node and a first projection matrix, and confirm the prediction probability of the attribute feature of the first target node based on the feature of the first target node; a source node included in the visual features is a pixel in the visual features, and the first projection matrix is used for projecting each source node into the feature space corresponding to the first attribute graph. The number of attribute features potentially contained in the target object is the total number of attribute features that may exist for the target object; for example, if the possible boundary of the target object is one of clear, relatively clear, insufficiently clear and unclear, the boundary attribute contributes 4 potential attribute features.
In particular, the apparatus may map the visual features to each target node included in the first attribute graph based on the above formula (1), and update the feature of the first target node.

At this time, the first graph is the image corresponding to the visual features and the second graph is the first attribute graph; in formula (1), $v_i$ is the feature of the $i$-th source node in the visual features, $N$ is the total number of source nodes in the visual features, $W_s$ is the matrix that maps a source node into the feature space corresponding to the first attribute graph (i.e., the first projection matrix), $\tilde{e}_{i,j}$ represents the edge weight between source node $i$ and target node $j$ in the first attribute graph, and $v_j$ is the feature of target node $j$.

Further, in formula (2), $W_s^A$ is the projection matrix that projects the source nodes into the common space, and $W_t^A$ is the matrix that projects the target nodes into the common space.
In some embodiments, after the apparatus identifies the feature of the first target node, the apparatus may also identify a predicted probability of the attribute feature of the first target node based on the feature of the first target node.
In specific implementation, the device converts the feature of the first target node into a one-dimensional probability based on the feature of the first target node and a second projection matrix; each element in the one-dimensional probability corresponds to the prediction probability of one attribute feature. Further, the number of elements included in the one-dimensional probability is the same as the number of target nodes included in the first attribute graph, and the number of target nodes included in the first attribute graph is the same as the number of attribute features potentially contained in the target object; for example, as shown in Table 1, if the number of attribute features potentially contained in the target object is 29, the number of target nodes included in the first attribute graph is 29. It should be noted that the attribute features potentially contained in the target object are not necessarily the attribute features that the target object actually has; taking the boundary attribute as an example, the corresponding attribute features are clear, relatively clear, insufficiently clear and unclear, but the true boundary of the target object is only one of these and cannot be all of them.
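As an illustration of this conversion, the sketch below applies a second projection followed by a sigmoid to each target node; treating the second projection matrix as a single linear layer shared across the attribute nodes, with one probability read out per node, is an assumed simplification for this example.

```python
import torch
import torch.nn as nn

class AttributeHead(nn.Module):
    """Sketch: convert each target node's feature into an attribute prediction probability."""
    def __init__(self, node_dim):
        super().__init__()
        self.w_n = nn.Linear(node_dim, 1)   # second projection matrix (assumed shared across nodes)

    def forward(self, target_nodes):
        # target_nodes: (M, node_dim), one row per attribute node (e.g. M = 29)
        logits = self.w_n(target_nodes).squeeze(-1)   # (M,)
        return torch.sigmoid(logits)                  # prediction probability per attribute feature
```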
In some embodiments, the first attribute graph is determined based on the set of target nodes and the set of edges between any two of the target nodes.
In particular, the first attribute graph $G$ may be characterized as $G = (V, E)$, where $V = \{v_a\}_{a=1}^{M}$, $v_a \in \mathbb{R}^d$, represents the set of target nodes in the first attribute graph (or the set of features of the target nodes in the first attribute graph), the feature dimension of each target node being $d$; $E = \{e_{ab}\}_{a,b \in [1,M]}$ is the set of edges between any two target nodes in the first attribute graph, and $M$ is the total number of target nodes included in the first attribute graph.
Step S103, mapping each target node included in the first attribute graph to each classification node included in a classification graph based on the inter-graph transformer.
In some embodiments, the apparatus updates the feature of the first classification node based on an edge weight and a third projection matrix between each target node included in the first attribute map and the first classification node in the classification map; in step S103, the first attribute map is a target attribute map formed after each target node is updated.
In specific implementation, the device respectively projects the updated first classification node and each updated classification node in the first attribute map into a public feature space, and confirms the projection of the first classification node and the projection of a target node corresponding to each target node; confirming attention weight values between the first classification node projection and each target node projection based on an attention mechanism; and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first classification node and each target node.
Specifically, the apparatus may update the features of the first classification node based on formulas (1) to (3). At this time, the first graph is the first attribute graph and the second graph is the classification graph; the number of classification nodes equals the number of possible classification results of the target object (categories 2, 3, 4A, 4B, 4C and 5, i.e. 6 possible classification results); in formula (1), $v_j$ is the feature of classification node $j$ in the classification graph, $\tilde{e}_{i,j}$ represents the edge weight between target node $i$ in the first attribute graph and classification node $j$ in the classification graph, $W_s$ is the third projection matrix, and $v_i$ is the feature of target node $i$ in the first attribute graph.
Step S104, performing feature fusion on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and determining the classification prediction probability corresponding to each classification node.
In some embodiments, the apparatus fuses classification nodes in the classification graph based on edge weights between the first classification node and other classification nodes except the first classification node and a fourth projection matrix, and updates the first classification node based on a fusion result; confirming the classification prediction probability of the first classification node based on the updated first classification node.
Specifically, the apparatus may determine edge weights between a first classification node and other classification nodes other than the first classification node based on formulas (2) to (3), fuse the classification nodes in the classification map based on formula (4), and update the first classification node based on a fusion result.
At this time, the mapping matrices $W_s^A$ and $W_t^A$ in formula (2) serve the same purpose of mapping the features of each classification node in the classification graph into a common feature space; in formula (4), $v_m$ and $v_n$ respectively represent the features of classification node $m$ and classification node $n$ in the classification graph, $\tilde{e}_{n,m}$ is the edge weight between classification node $n$ and classification node $m$, $W_t$ is the fourth projection matrix, and the number of classification nodes equals the number of possible classification results of the target object.
Step S105, confirming a first sub-loss based on the prediction probability of the attribute characteristics of each target node and the label of the attribute characteristics of each target node; and confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object.
In some embodiments, the apparatus identifies the predicted probability of the attribute feature of each target node and the cross-entropy penalty of the label of the attribute feature of each target node as a first sub-penalty; and confirming the classification prediction probability corresponding to each classification node and the cross entropy loss of the classification label of the target object as a second sub-loss.
Step S106, adjusting parameters of the image classification model based on the first sub-loss and the second sub-loss.
In some embodiments, the device performs a weighted summation of the first sub-loss $\mathrm{Loss}_1$ and the second sub-loss $\mathrm{Loss}_2$, e.g. $\mathrm{Loss} = \lambda\,\mathrm{Loss}_1 + (1-\lambda)\,\mathrm{Loss}_2$, and then back-propagates to optimize the parameters of the image classification model. When the total loss tends to be stable, it is determined that the training of the image classification model is finished.
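A minimal sketch of this weighted combination and back-propagation step is given below; the value of λ, the use of binary cross entropy against a one-hot BIRADS label, and the model interface are assumptions of this example.

```python
import torch.nn.functional as F

def training_step(model, optimizer, cross_img, long_img, attr_labels, cls_label, lam=0.5):
    """Sketch: weighted combination of the two sub-losses and one parameter update."""
    # attr_labels: float tensor of 0/1 attribute labels; cls_label: long tensor with the BIRADS class index
    attr_probs, cls_probs = model(cross_img, long_img)            # assumed model interface
    loss1 = F.binary_cross_entropy(attr_probs, attr_labels)       # first sub-loss: attribute supervision
    cls_onehot = F.one_hot(cls_label, num_classes=cls_probs.shape[-1]).float()
    loss2 = F.binary_cross_entropy(cls_probs, cls_onehot)         # second sub-loss: BIRADS supervision
    loss = lam * loss1 + (1.0 - lam) * loss2                      # Loss = lambda*Loss1 + (1-lambda)*Loss2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```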
In this way, the training method of the image classification model provided by the embodiment of the present disclosure makes full use of the attribute features closely related to image classification. The image classification model can output the final image classification result and also predict the corresponding attribute features; in addition, compared with a traditional CNN model, the image classification model provided by the embodiment of the present disclosure has a small computation cost, and through the two feature transfer modes, the inter-graph transformer and the intra-graph transformer, the features interact better, improving the prediction accuracy.
Next, taking an example that a target object is a breast ultrasound image, fig. 2 shows an alternative flow chart of the training method of the image classification model provided by the embodiment of the disclosure, and fig. 3 shows a schematic structure of the image classification model provided by the embodiment of the disclosure, which will be described with reference to fig. 2 and fig. 3.
Breast lesions include 6 attributes: boundary (margin), shape (shape), calcification (calcifications_in_mass), posterior echo (posterior_features), echo type (echo_pattern) and blood flow (vascularity), and each attribute has its corresponding attribute features; for details, refer to Table 1. During training, an ultrasound cross-cut image and an ultrasound longitudinal-cut image of the same lesion are used as an image pair; the attribute features of the lesion and its BIRADS grade need to be labeled in advance.
TABLE 1 Breast lesion attributes and included features
[Table 1 is reproduced as an image in the original publication; it lists, for each of the six attributes above, the corresponding attribute features, 29 in total.]
Referring to Table 1, the attribute features of breast lesions include 29 kinds and the classification includes 6 kinds. When training the image classification model, visual features are extracted from the input cross-cut and longitudinal-cut images of a lesion, the visual features are converted into an attribute graph using the inter-graph Transformer, and the attribute graph features are then converted into a higher-level BIRADS graph (the classification graph) using the inter-graph Transformer again, on which classification prediction is performed. The node features of the same graph (the classification graph) are fused using the intra-graph Transformer. The attributes and the BIRADS grade are supervised simultaneously to optimize the training of the image classification model.
Step S201, acquiring visual features corresponding to the cross-cut image and the longitudinal-cut image.
In some embodiments, the cross-cut image and the longitudinal-cut image are an image pair of the same lesion (the same target object), and the training device of the image classification model extracts the visual features X of the lesion from the input cross-cut and longitudinal-cut images using, for example, a ResNet50 as the encoder. The visual features are then converted into the first attribute graph using the inter-graph Transformer.
Step S202, the visual features are mapped to the first attribute map, and the first sub-loss is confirmed.
In some embodiments, the first attribute graph is initialized before model training. The first attribute graph is denoted $G = (V, E)$, where $V = \{v_a\}_{a=1}^{M}$, $v_a \in \mathbb{R}^d$, represents the set of target nodes in the first attribute graph (or the set of features of the target nodes), the feature dimension of each target node being $d$; $E = \{e_{ab}\}_{a,b \in [1,M]}$ is the set of edges between any two target nodes in the first attribute graph, and $M$ is the total number of target nodes. When training on breast ultrasound images, $M$ is the total number of attribute features, i.e. 29, and each target node in the first attribute graph corresponds to one attribute feature in Table 1.
The visual features $X$ obtained in step S201 are transferred to the first attribute graph using the inter-graph Transformer (refer to step S102); it should be noted that when the visual features are input, each pixel therein is regarded as a node of the graph structure:

$$f_n = Tr_1(X)$$
$$p_n = \mathrm{sigmoid}(W_n f_n)$$

where $f_n$ is the (latent) feature of the $n$-th target node, $n \in [1, M]$, and $W_n$ is a linear projection matrix (the second projection matrix) that converts the feature of the target node into a one-dimensional probability. The one-dimensional probability of the $n$-th target node may be expressed as $p_n = (p_{n1}, p_{n2}, \ldots, p_{nN}) \in (0,1)$; each element of $p_n$ represents the prediction probability of an attribute feature corresponding to the $n$-th target node, and the cross-entropy loss $\mathrm{Loss}_1$ is computed against the binary label $y_n \in \{0, 1\}$ of the $n$-th target node.
Step S203, the first attribute map is mapped to the classification map.
Since the attribute feature is very important reference information for determining the lesion classification, the feature of each target node in the first attribute map is mapped to the classification map using the inter-map transformer (refer to step S103). The number of classification nodes included in the classification map is 6, corresponding to 2, 3, 4A, 4B, 4C, 5 of BIRADS.
$$f_c = Tr_1(f_n)$$

where $f_c$ is the feature of the $c$-th classification node in the classification graph, $c \in [1, 6]$.
And step S204, fusing the characteristics of each classification node in the classification graph and determining a second sub-loss.
In some embodiments, the inter-graph transformer updates features of a first classification node in the classification graph based on edge weights between each target node in the first attribute graph and the first classification node and a third projection matrix; specifically, the updated first classification node and each updated classification node in the first attribute map are respectively projected into a common feature space, and the projection of the first classification node and the projection of a target node corresponding to each target node are confirmed; confirming attention weight values between the first classification node projection and each target node projection based on an attention mechanism; and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first classification node and each target node. Fusing all classification nodes in the classification graph based on edge weights between the first classification node and other classification nodes except the first classification node and a fourth projection matrix, and updating the first classification node based on a fusion result; confirming the classification prediction probability of the first classification node based on the updated first classification node.
In some embodiments, the apparatus performs feature fusion on the relevance of each classification node in the classification map based on an intra-map Transformer, and then performs classification by linear projection to obtain the predicted probability of each classification node:
$$f_c \leftarrow Tr_2(f_c)$$
$$p = \mathrm{sigmoid}(W_c f_c)$$

Similarly, $W_c$ is a linear projection matrix; $p$ represents the prediction probability of each classification node, and the cross-entropy loss $\mathrm{Loss}_2$ is calculated against the classification label of the target object.
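Putting steps S201 to S204 together, the sketch below assembles the components defined in the earlier sketches (PairEncoder, InterGraphTransformer, IntraGraphTransformer) into one forward pass; the learnable node embeddings used to initialize the 29 attribute nodes and 6 classification nodes, and all names and dimensions, are assumptions of this example.

```python
import torch
import torch.nn as nn

class LesionClassifier(nn.Module):
    """Sketch of the overall model: encoder -> Tr1 -> attribute probs -> Tr1 -> Tr2 -> BIRADS probs.

    Reuses PairEncoder, InterGraphTransformer and IntraGraphTransformer from the sketches above.
    """
    def __init__(self, feat_dim=256, node_dim=256, n_attr=29, n_cls=6):
        super().__init__()
        self.encoder = PairEncoder(out_dim=feat_dim)
        self.attr_nodes = nn.Parameter(torch.randn(n_attr, node_dim))   # initial first attribute graph
        self.cls_nodes = nn.Parameter(torch.randn(n_cls, node_dim))     # initial classification graph
        self.tr1_visual_to_attr = InterGraphTransformer(feat_dim, node_dim, node_dim)
        self.tr1_attr_to_cls = InterGraphTransformer(node_dim, node_dim, node_dim)
        self.tr2_cls = IntraGraphTransformer(node_dim, node_dim)
        self.w_n = nn.Linear(node_dim, 1)    # attribute head (second projection)
        self.w_c = nn.Linear(node_dim, 1)    # classification head (linear projection W_c)

    def forward(self, cross_img, long_img):
        # one sample at a time for simplicity: visual features -> (num_pixels, feat_dim)
        visual = self.encoder(cross_img, long_img)[0]
        f_n = self.tr1_visual_to_attr(visual, self.attr_nodes)    # f_n = Tr1(X)
        p_attr = torch.sigmoid(self.w_n(f_n)).squeeze(-1)         # p_n: 29 attribute probabilities
        f_c = self.tr1_attr_to_cls(f_n, self.cls_nodes)           # map attribute graph to classification graph
        f_c = self.tr2_cls(f_c)                                   # fuse classification nodes (Tr2)
        p_cls = torch.sigmoid(self.w_c(f_c)).squeeze(-1)          # p: 6 BIRADS probabilities
        return p_attr, p_cls
```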
Step S205, adjusting parameters of the image classification model based on the first sub-loss and the second sub-loss.
In some embodiments, the device performs a weighted summation of the first sub-loss $\mathrm{Loss}_1$ and the second sub-loss $\mathrm{Loss}_2$, e.g. $\mathrm{Loss} = \lambda\,\mathrm{Loss}_1 + (1-\lambda)\,\mathrm{Loss}_2$, and then back-propagates to optimize the parameters of the image classification model. When the total loss tends to be stable, it is determined that the training of the image classification model is finished.
In this way, the training method of the image classification model provided by the embodiment of the present disclosure makes full use of the attribute features closely related to image classification. The image classification model can output the final image classification result and also predict the corresponding attribute features; in addition, compared with a traditional CNN model, the image classification model provided by the embodiment of the present disclosure has a small computation cost, and through the two feature transfer modes, the inter-graph transformer and the intra-graph transformer, the features interact better, improving the prediction accuracy.
Fig. 4 is a schematic flow chart illustrating an alternative image classification method provided by the embodiment of the present disclosure, which will be described according to various steps.
Step S301, acquiring a cross-cut image and a longitudinal-cut image corresponding to the target object to be classified.
The cross-cut image and the longitudinal-cut image correspond to the same lesion (the target object to be classified).
Step S302, inputting the cross-cut image and the longitudinal-cut image into the inter-graph transformer included in the image classification model, and outputting a second attribute graph corresponding to the target object to be classified.
In some embodiments, the image classification device inputs the cross-cut image and the longitudinal-cut image into an encoder included in an image classification model, and obtains visual features corresponding to the cross-cut image and the longitudinal-cut image; inputting the visual feature into the inter-graph transformer, and outputting the second attribute graph. Wherein the number of target nodes included in the second attribute graph is the same as the total number of possible (potential) attribute features of the target object to be classified.
Wherein, the possible attribute characteristics of the target object to be classified comprise 29 attribute characteristics in table 1.
Step S303, inputting the second attribute map into the inter-map transformer, and outputting a classification map corresponding to the target object to be classified.
In some embodiments, the image classification device inputs the second attribute map into the inter-map transformer, which maps each target node in the second attribute map into a classification map, updating the features of each classification node included in the classification map.
Step S304, inputting the classification graph into the intra-graph transformer included in the image classification model, and confirming that the output of the intra-graph transformer is the classification result of the target object to be classified.
In some embodiments, the features of each classification node are fused based on an intra-graph transformer, and the output of the intra-graph transformer is confirmed as the classification result of the target object to be classified.
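The inference flow of steps S301 to S304 then reduces to a single forward pass of the trained model; the argmax used below to read the BIRADS category off the classification probabilities is an assumption for illustration, reusing the LesionClassifier sketched earlier.

```python
import torch

BIRADS = ["2", "3", "4A", "4B", "4C", "5"]

@torch.no_grad()
def classify(model, cross_img, long_img):
    """Sketch: classify one lesion from its cross-cut / longitudinal-cut image pair."""
    model.eval()
    attr_probs, cls_probs = model(cross_img, long_img)
    birads = BIRADS[int(torch.argmax(cls_probs))]   # classification result of the target object
    return birads, attr_probs                       # also return the predicted attribute probabilities
```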
Therefore, by the image classification method provided by the embodiment of the disclosure, the attribute features closely related to image classification are fully utilized, so that a final image classification result can be output, and the corresponding attribute features can be predicted.
Fig. 5 is a schematic structural diagram of an alternative training apparatus for an image classification model provided in an embodiment of the present disclosure, which will be described according to various parts.
In some embodiments, the training apparatus 500 for image classification models includes a first obtaining unit 501, a first mapping unit 502, a second mapping unit 503, a fusing unit 504, a confirming unit 505, and an adjusting unit 506.
The first obtaining unit 501 is configured to obtain visual features corresponding to a cross-cut image and a longitudinal-cut image corresponding to a target object;
the first mapping unit 502 is configured to map the visual features to target nodes included in the first attribute map based on an inter-map transformer included in the image classification model, and determine a prediction probability of the attribute features of each target node; the number of the target nodes included in the first attribute graph is the same as the attribute feature number included in the target object;
the second mapping unit 503 is configured to map each target node included in the first attribute map to each classification node included in a classification map based on the inter-map transformer;
the fusion unit 504 is configured to perform feature fusion on each classification node in the classification map based on an intra-map converter included in an image classification model, and determine a classification prediction probability corresponding to each classification node;
the confirming unit 505 is configured to confirm the first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object;
the adjusting unit 506 is configured to adjust a parameter of the image classification model based on the first sub-loss and the second sub-loss.
The first obtaining unit 501 is specifically configured to input the cross-cut image and the longitudinal-cut image corresponding to the target object into an encoder of the image classification model, and confirm that the output of the encoder is the visual feature of the target object.
The first mapping unit 502 is specifically configured to perform the following operations on each target node in the first attribute graph:
updating the feature of the first target node based on edge weights between each source node in the visual feature and the first target node in the first attribute map and a first projection matrix;
confirming a prediction probability of the attribute feature of the first target node based on the feature of the first target node;
wherein a source node included in the visual feature comprises a pixel in the visual feature.
The first mapping unit 502 is specifically configured to convert the feature of the first target node into a one-dimensional probability based on the feature of the first target node and a second projection matrix;
each element in the one-dimensional probability corresponds to the prediction probability of one attribute feature.
The first mapping unit 502 is specifically configured to respectively project a first target node and each source node in the visual feature into a common feature space, and confirm a projection of the first target node and a projection of the source node corresponding to each source node;
confirming attention weight values between the first target node projection and each source node projection based on an attention mechanism;
and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first target node and each source node.
In some embodiments, the first attribute graph is determined based on the set of target nodes and the set of edges between any two of the target nodes.
The second mapping unit 503 is specifically configured to perform the following operations on each classification node in the classification map:
the inter-graph transformer updates the feature of the first classification node based on an edge weight between each target node included in the first attribute graph and the first classification node in the classification graph and a third projection matrix.
The second mapping unit 503 is specifically configured to respectively project the updated first classification node and each updated classification node in the first attribute map into a common feature space, and confirm the projection of the first classification node and the projection of a target node corresponding to each target node;
confirming attention weight values between the first classification node projection and each target node projection based on an attention mechanism;
and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first classification node and each target node.
The merging unit 504 is specifically configured to perform the following operations on each classification node in the classification map:
fusing all classification nodes in the classification graph based on edge weights between the first classification node and other classification nodes except the first classification node and a fourth projection matrix, and updating the first classification node based on a fusion result;
confirming the classification prediction probability of the first classification node based on the updated first classification node.
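For the intra-graph feature fusion described above, each classification node is fused with the other classification nodes and the fused node is turned into a classification prediction; the self-edge masking, the residual update, and the per-node prediction head are assumptions of this sketch (which also assumes at least two classification nodes).

import torch
import torch.nn as nn

class IntraGraphFusion(nn.Module):
    """Sketch of an intra-graph transformer: fuses the classification nodes with each other."""
    def __init__(self, feat_dim, num_classes):
        super().__init__()
        self.q_proj = nn.Linear(feat_dim, feat_dim)
        self.k_proj = nn.Linear(feat_dim, feat_dim)
        self.w4 = nn.Linear(feat_dim, feat_dim)          # "fourth projection matrix"
        self.head = nn.Linear(feat_dim, num_classes)     # per-node classification head

    def forward(self, cls_nodes):                        # cls_nodes: (B, N_cls, C), N_cls >= 2
        q, k = self.q_proj(cls_nodes), self.k_proj(cls_nodes)
        attn = torch.einsum('bic,bjc->bij', q, k) / q.size(-1) ** 0.5
        # exclude the self-edge so every node is fused only with the other classification nodes
        self_mask = torch.eye(cls_nodes.size(1), device=cls_nodes.device, dtype=torch.bool)
        edge_w = attn.masked_fill(self_mask, float('-inf')).softmax(dim=-1)
        # update each node from the fusion result via the fourth projection matrix
        fused = cls_nodes + edge_w @ self.w4(cls_nodes)
        # a softmax over the last dimension gives each node's classification prediction probability
        return fused, self.head(fused)                   # fused nodes, per-node class logits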
The confirming unit 505 is specifically configured to confirm the cross entropy loss between the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node as the first sub-loss;
and confirm the cross entropy loss between the classification prediction probability corresponding to each classification node and the classification label of the target object as the second sub-loss.
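Putting the two sub-losses together, one possible training step is sketched below. This sketch assumes a full model returning per-node attribute logits and per-node classification logits, an equal weighting of the two cross-entropy terms, and broadcasting the single classification label of the target object to every classification node; none of these choices is mandated by the present disclosure.

import torch.nn.functional as F

def training_step(model, optimizer, batch):
    """One sketched training step: two cross-entropy sub-losses, summed, then backpropagated."""
    cross_cut, longitudinal_cut, attr_labels, cls_label = batch
    attr_logits, cls_logits = model(cross_cut, longitudinal_cut)   # assumed model outputs
    # first sub-loss: attribute prediction of each target node vs. its attribute label
    loss_attr = F.cross_entropy(attr_logits.flatten(0, 1), attr_labels.flatten())
    # second sub-loss: classification prediction of each classification node vs. the class label
    # (the single label is broadcast to every classification node in this sketch)
    per_node_labels = cls_label.unsqueeze(1).expand(-1, cls_logits.size(1))
    loss_cls = F.cross_entropy(cls_logits.flatten(0, 1), per_node_labels.flatten())
    loss = loss_attr + loss_cls            # equal weighting is an assumption of this sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()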
Fig. 6 is a schematic diagram illustrating an optional structure of an image classification apparatus provided in an embodiment of the present disclosure, which is described below part by part.
In some embodiments, the image classification apparatus 600 includes a second acquisition unit 601, a first input unit 602, a second input unit 603, and a classification unit 604.
The second acquisition unit 601 is configured to acquire a cross-cut image and a longitudinal-cut image corresponding to a target object to be classified;
the first input unit 602 is configured to input the cross-cut image and the longitudinal-cut image into an inter-graph transformer included in the image classification model, and output a second attribute graph corresponding to the target object to be classified;
the second input unit 603 is configured to input the second attribute graph into the inter-graph transformer, and output a classification graph corresponding to the target object to be classified;
the classification unit 604 is configured to input the classification graph into an intra-graph transformer included in the image classification model, and confirm that the output of the intra-graph transformer is the classification result of the target object to be classified.
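The inference path of the apparatus described above can be sketched as follows, reusing the hypothetical modules from the previous sketches; how the per-node predictions are pooled into one classification result (mean pooling here) is likewise an assumption of this sketch.

import torch

@torch.no_grad()
def classify(encoder, vis_to_attr, attr_to_cls, intra_graph, cross_cut, longitudinal_cut):
    """Sketch of inference with the hypothetical modules defined in the sketches above."""
    visual_feats = encoder(cross_cut, longitudinal_cut)      # visual features of the two views
    attr_nodes, _attr_logits, _ = vis_to_attr(visual_feats)  # second attribute graph (node features)
    cls_nodes = attr_to_cls(attr_nodes)                      # classification graph
    _, cls_logits = intra_graph(cls_nodes)                   # intra-graph transformer output
    # pool the per-node predictions and take the most probable class (pooling is an assumption)
    return cls_logits.mean(dim=1).argmax(dim=-1)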
According to embodiments of the present disclosure, an electronic device and a readable storage medium are also provided.
Fig. 7 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. The RAM 803 can also store various programs and data required for the operation of the electronic device 800. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the electronic device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of various general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the training method of the image classification model and/or the image classification method. For example, in some embodiments, the training method of the image classification model and/or the image classification method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the training method of the image classification model and/or the image classification method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the training method of the image classification model and/or the image classification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present disclosure, "a plurality" means two or more unless specifically limited otherwise.
The above description covers only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any changes or substitutions that can be readily conceived by a person skilled in the art within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (15)

1. A method for training an image classification model, the method comprising:
acquiring visual features corresponding to a transverse cutting image and a longitudinal cutting image corresponding to a target object;
mapping the visual features to each target node included in the first attribute graph based on an inter-graph transformer included in the image classification model, and determining the prediction probability of the attribute features of each target node; the number of the target nodes included in the first attribute graph is the same as the number of the attribute features included in the target object;
mapping each target node included in the first attribute graph to each classification node included in a classification graph based on the inter-graph transformer;
performing feature fusion on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and determining a classification prediction probability corresponding to each classification node;
confirming a first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object;
adjusting parameters of the image classification model based on the first sub-penalty and the second sub-penalty.
2. The method of claim 1, wherein the acquiring visual features corresponding to a transverse cutting image and a longitudinal cutting image corresponding to a target object comprises:
inputting the transverse cutting image and the longitudinal cutting image corresponding to the target object into an encoder of the image classification model, and confirming that the output of the encoder is the visual features corresponding to the target object.
3. The method of claim 1, wherein the mapping the visual features to each target node included in the first attribute graph based on an inter-graph transformer included in the image classification model, and determining the prediction probability of the attribute features of each target node, comprises performing the following operations on each target node in the first attribute graph:
updating the feature of the first target node based on edge weights between each source node in the visual feature and the first target node in the first attribute map and a first projection matrix;
confirming a prediction probability of the attribute feature of the first target node based on the feature of the first target node;
wherein a source node included in the visual feature comprises a pixel in the visual feature.
4. The method of claim 3, wherein the identifying the predicted probability of the attribute feature of the first target node based on the feature of the first target node comprises:
converting the feature of the first target node into a one-dimensional probability based on the feature of the first target node and a second projection matrix;
each element in the one-dimensional probability corresponds to the prediction probability of one attribute feature.
5. The method of claim 3, wherein the edge weight between each source node in the visual feature and the first target node in the first attribute map is determined by:
respectively projecting a first target node and each source node in the visual features into a common feature space, and confirming the projection of the first target node and the projection of the source node corresponding to each source node;
confirming attention weight values between the first target node projection and each source node projection based on an attention mechanism;
and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first target node and each source node.
6. The method of claim 1, wherein the first attribute graph is determined based on a set of target nodes and a set of edges between any two of the target nodes.
7. The method of claim 1, wherein said mapping each target node included in the first attribute graph to each classification node included in a classification graph based on the inter-graph transformer comprises performing the following operations on each classification node in the classification graph:
the inter-graph transformer updates the feature of the first classification node based on an edge weight between each target node included in the first attribute graph and the first classification node in the classification graph and a third projection matrix.
8. The method of claim 7, wherein the edge weight between each target node included in the first attribute graph and the first classification node in the classification graph is determined by:
respectively projecting the first classification node and each updated target node included in the first attribute graph into a common feature space, and confirming the projection of the first classification node and the projection of the target node corresponding to each target node;
confirming attention weight values between the first classification node projection and each target node projection based on an attention mechanism;
and carrying out normalization processing on the attention weight value, and confirming that the normalization processing result is the edge weight between the first classification node and each target node.
9. The method according to claim 1 or 7, wherein the performing feature fusion on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and determining the classification prediction probability corresponding to each classification node, comprises performing the following operations on each classification node in the classification graph:
fusing all classification nodes in the classification graph based on edge weights between the first classification node and other classification nodes except the first classification node and a fourth projection matrix, and updating the first classification node based on a fusion result;
confirming the classification prediction probability of the first classification node based on the updated first classification node.
10. The method of claim 1, wherein the confirming a first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node, and confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object, comprises:
confirming the cross entropy loss between the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node as the first sub-loss;
and confirming the cross entropy loss between the classification prediction probability corresponding to each classification node and the classification label of the target object as the second sub-loss.
11. An image classification method, implemented based on an image classification model trained by the method according to any one of claims 1 to 10, the method comprising:
acquiring a transverse cutting image and a longitudinal cutting image corresponding to a target object to be classified;
inputting the transverse cutting image and the longitudinal cutting image into an inter-graph transformer included in the image classification model, and outputting a second attribute graph corresponding to the target object to be classified;
inputting the second attribute graph into the inter-graph transformer, and outputting a classification graph corresponding to the target object to be classified;
inputting the classification graph into an intra-graph transformer included in the image classification model, and confirming that the output of the intra-graph transformer is the classification result of the target object to be classified.
12. An apparatus for training an image classification model, the apparatus comprising:
a first acquisition unit, configured to acquire visual features corresponding to a transverse cutting image and a longitudinal cutting image corresponding to a target object;
a first mapping unit, configured to map the visual features to each target node included in the first attribute graph based on an inter-graph transformer included in the image classification model, and determine the prediction probability of the attribute features of each target node; wherein the number of target nodes included in the first attribute graph is the same as the number of attribute features included in the target object;
a second mapping unit, configured to map each target node included in the first attribute graph to each classification node included in a classification graph based on the inter-graph transformer;
a fusion unit, configured to perform feature fusion on each classification node in the classification graph based on an intra-graph transformer included in the image classification model, and determine the classification prediction probability corresponding to each classification node;
a confirming unit, configured to confirm the first sub-loss based on the prediction probability of the attribute feature of each target node and the label of the attribute feature of each target node; confirming a second sub-loss based on the classification prediction probability corresponding to each classification node and the classification label of the target object;
an adjusting unit, configured to adjust parameters of the image classification model based on the first sub-loss and the second sub-loss.
13. An image classification apparatus, implemented based on an image classification model trained by the method according to any one of claims 1 to 10, the apparatus comprising:
a second acquisition unit, configured to acquire a transverse cutting image and a longitudinal cutting image corresponding to a target object to be classified;
a first input unit, configured to input the transverse cutting image and the longitudinal cutting image into an inter-graph transformer included in the image classification model, and output a second attribute graph corresponding to the target object to be classified;
a second input unit, configured to input the second attribute graph into the inter-graph transformer and output a classification graph corresponding to the target object to be classified;
and a classification unit, configured to input the classification graph into an intra-graph transformer included in the image classification model and confirm that the output of the intra-graph transformer is the classification result of the target object to be classified.
14. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10;
or to perform the method of claim 11.
15. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-10;
or to perform the method of claim 11.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210920449.4A CN115082742A (en) 2022-08-02 2022-08-02 Training method and device for image classification model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115082742A true CN115082742A (en) 2022-09-20

Family

ID=83242044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210920449.4A Pending CN115082742A (en) 2022-08-02 2022-08-02 Training method and device for image classification model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115082742A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809087A (en) * 2023-12-20 2024-04-02 北京医准医疗科技有限公司 Training method and device for breast image classification model and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Room 3011, 2nd Floor, Building A, No. 1092 Jiangnan Road, Nanmingshan Street, Liandu District, Lishui City, Zhejiang Province, 323000
Applicant after: Zhejiang Yizhun Intelligent Technology Co.,Ltd.
Address before: No. 1202-1203, 12 / F, block a, Zhizhen building, No. 7, Zhichun Road, Haidian District, Beijing 100083
Applicant before: Beijing Yizhun Intelligent Technology Co.,Ltd.