CN111291825B - Focus classification model training method, apparatus, computer device and storage medium

Focus classification model training method, apparatus, computer device and storage medium

Info

Publication number
CN111291825B
CN111291825B (application CN202010115041.0A)
Authority
CN
China
Prior art keywords
focus
image information
focus area
target
area
Prior art date
Legal status
Active
Application number
CN202010115041.0A
Other languages
Chinese (zh)
Other versions
CN111291825A (en)
Inventor
甘伟焜
詹维伟
张璐
陈超
黄凌云
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010115041.0A
Publication of CN111291825A
Priority to PCT/CN2020/099473 (published as WO2021169126A1)
Application granted
Publication of CN111291825B

Classifications

    • G06F18/24: Pattern recognition; analysing; classification techniques
    • G06N3/045: Neural networks; architectures; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06T7/0012: Image analysis; inspection of images; biomedical image inspection
    • G06T2207/10132: Image acquisition modality; ultrasound image
    • G06T2207/20081: Special algorithmic details; training, learning
    • G06T2207/20084: Special algorithmic details; artificial neural networks [ANN]
    • G06T2207/30004: Subject of image; biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a focus classification model training method, apparatus, computer device and storage medium. The training method comprises the following steps: acquiring a sample data set, wherein the sample data set comprises a plurality of sample pictures each marked with focus areas, and each focus area is marked with a corresponding classification label; performing local image information acquisition on each focus area to obtain the local image information of each focus area; performing global image information acquisition on each focus area to obtain the global image information of each focus area; and training a pre-built focus classification model with dual input channels using the local image information and global image information of each focus area and the classification label corresponding to each focus area, to obtain a target focus classification model. The invention reduces the amount of sample data required while still achieving accurate focus classification.

Description

Focus classification model training method, apparatus, computer device and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a focus classification model training method, a focus classification model training apparatus, a computer device and a storage medium.
Background
Traditionally, disease assessment has been carried out manually by doctors. Manual assessment is time-consuming and involves many steps: a comprehensive diagnosis must weigh multiple indicators of the focus, such as whether it is calcified, whether it is cystic, and its echo level, so the diagnostic result is hard to quantify. Moreover, because the assessment depends on the doctor's experience, judgment errors easily occur in actual diagnostic practice.
With the rapid development of medical and computer image processing technologies, automatic identification and diagnosis of medical images has become a research hot spot at the intersection of computer image technology and medical imaging. Computer-aided diagnosis of ultrasound images mainly consists in constructing a fast and accurate classifier that assists doctors in judging whether a focus area is benign or malignant.
Benign/malignant diagnosis is, to a certain extent, a classification problem, and existing focus classification methods work by feeding images marked with focus areas into a pre-trained focus classification model. However, focus sizes vary widely: the number of pixels a focus area occupies in an ultrasound image may range from a few hundred to tens of thousands (while the benign or malignant nature of a focus has no direct relation to its size). Because existing focus classification models have a single input channel, their classification performance is easily affected by image size, and they often perform poorly on data spanning a wide size range. The existing remedy is to build a separate model for each focus size range, but each separate model must be trained with samples of the corresponding scale, which increases the amount of sample data required.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a focus classification model training method, a focus classification model training apparatus, a computer device and a storage medium that reduce the amount of sample data required while still achieving accurate focus classification.
In order to achieve the above object, the present invention provides a focus classification model training method, comprising:
acquiring a sample data set, wherein the sample data set comprises a plurality of sample pictures each marked with focus areas, and each focus area is marked with a corresponding classification label;
performing local image information acquisition on each focus area to obtain the local image information of each focus area;
performing global image information acquisition on each focus area to obtain the global image information of each focus area;
and training a pre-built focus classification model with dual input channels using the local image information and global image information of each focus area and the classification label corresponding to each focus area, to obtain a target focus classification model.
In one embodiment of the present invention, the local image information acquisition comprises:
constructing a focus rectangular frame based on the uppermost, lowermost, leftmost and rightmost points of the focus area;
and randomly moving the focus rectangular frame multiple times within the neighborhood of the focus area, taking the image information inside the frame after each move as the local image information of the focus area.
In one embodiment of the present invention, before the focus rectangular frame is randomly moved multiple times within the neighborhood of the focus area, the local image information acquisition further comprises:
enlarging the focus rectangular frame by a preset enlargement ratio, with the center of the frame as the reference.
In one embodiment of the present invention, the global image information acquisition comprises:
dividing the focus areas by size into first-type, second-type and third-type focus areas, whose sizes decrease in that order;
downsampling each focus area of the first type, and taking the downsampled image information as the global image information of the corresponding first-type focus area;
taking the image information of each focus area of the second type as the global image information of the corresponding second-type focus area;
and interpolating each focus area of the third type, and taking the interpolated image information as the global image information of the corresponding third-type focus area.
In one embodiment of the present invention, the focus classification model comprises a first convolution network, a second convolution network, a splicing layer and a fully connected classification layer, wherein the input of the first convolution network receives the local image information of each focus area, the input of the second convolution network receives the global image information of each focus area, the input of the splicing layer is connected to the outputs of the first and second convolution networks, and the input of the fully connected classification layer is connected to the output of the splicing layer.
In one embodiment of the present invention, after the target focus classification model is obtained, the method further comprises:
obtaining a target case picture in which a target focus area is marked;
performing the local image information acquisition on the target focus area to obtain the local image information of the target focus area;
performing the global image information acquisition on the target focus area to obtain the global image information of the target focus area;
and processing the local image information and global image information of the target focus area with the target focus classification model to obtain a classification result for the target focus area.
In order to achieve the above object, the present invention further provides a focus classification model training device, including:
a sample acquisition module, used for acquiring a sample data set, wherein the sample data set comprises a plurality of sample pictures each marked with focus areas, and each focus area is marked with a corresponding classification label;
a first local image information acquisition module, used for performing local image information acquisition on each focus area to obtain the local image information of each focus area;
a first global image information acquisition module, used for performing global image information acquisition on each focus area to obtain the global image information of each focus area;
and a model training module, used for training a pre-built focus classification model with dual input channels using the local image information and global image information of each focus area and the classification label corresponding to each focus area, to obtain a target focus classification model.
In one embodiment of the present invention, the first local image information acquisition module is specifically configured to:
construct a focus rectangular frame based on the uppermost, lowermost, leftmost and rightmost points of the focus area;
and randomly move the focus rectangular frame multiple times within the neighborhood of the focus area, taking the image information inside the frame after each move as the local image information of the focus area.
In one embodiment of the present invention, the first local image information acquisition module is further configured to: before randomly moving the focus rectangular frame multiple times within the neighborhood of the focus area, enlarge the focus rectangular frame by a preset enlargement ratio with the center of the frame as the reference.
In one embodiment of the present invention, the first global image information acquisition module is specifically configured to:
divide the focus areas by size into first-type, second-type and third-type focus areas, whose sizes decrease in that order;
downsample each focus area of the first type, and take the downsampled image information as the global image information of the corresponding first-type focus area;
take the image information of each focus area of the second type as the global image information of the corresponding second-type focus area;
and interpolate each focus area of the third type, and take the interpolated image information as the global image information of the corresponding third-type focus area.
In one embodiment of the present invention, the focus classification model comprises a first convolution network, a second convolution network, a splicing layer and a fully connected classification layer, wherein the input of the first convolution network receives the local image information of each focus area, the input of the second convolution network receives the global image information of each focus area, the input of the splicing layer is connected to the outputs of the first and second convolution networks, and the input of the fully connected classification layer is connected to the output of the splicing layer.
In one embodiment of the invention, the apparatus further comprises:
the case picture acquisition module is used for acquiring a target case picture after the model training module obtains the target focus classification model, wherein a target focus area is marked in the target case picture;
the second local image information acquisition module is used for carrying out the local image information acquisition processing on the target focus area to obtain the local image information of the target focus area;
the second global image information acquisition module is used for carrying out global image information acquisition processing on the target focus area to obtain global image information of the target focus area;
And the model processing module is used for processing the local image information and the global image information of the target focus area by utilizing the target focus classification model to obtain a classification result of the target focus area.
To achieve the above object, the present invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the steps of the aforementioned method when executing the computer program.
In order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned method.
By adopting the above technical solution, the invention achieves the following beneficial effects:
The invention designs a focus classification model with dual input channels that receive the local image information and global image information of a focus area respectively. Because the local image information carries more detail about the focus area while the global image information reflects its overall characteristics, the trained focus classification model is more accurate than existing single-input focus classification models. Meanwhile, the invention does not need to build separate models for different focus size ranges, which reduces the amount of sample data that model training requires.
Drawings
FIG. 1 is a flow chart of one embodiment of a lesion classification model training method of the present invention;
FIG. 2 is a schematic diagram of a lesion classification model according to the present invention;
FIG. 3 is a schematic diagram of the first and second convolutional networks of the present invention;
FIG. 4 is a schematic diagram of a residual module according to the present invention;
FIG. 5 is a schematic diagram of the structure of the attention network according to the present invention;
FIG. 6 is a flow chart of another embodiment of a lesion classification model training method according to the present invention;
FIG. 7 is a block diagram illustrating one embodiment of a lesion classification model training device according to the present invention;
FIG. 8 is a block diagram illustrating another embodiment of a lesion classification model training device according to the present invention;
Fig. 9 is a hardware architecture diagram of a computer device of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
Example 1
The present embodiment provides a focus classification model training method, as shown in fig. 1, which includes the following steps:
S11, acquiring a sample data set, wherein the sample data set comprises a plurality of sample pictures respectively marked with focus areas, and each focus area is respectively marked with a corresponding classification label.
Specifically, this step may obtain sample pictures from a hospital's sample database. The focus area contour in each sample picture is marked in advance, either manually by a doctor or automatically/semi-automatically by an existing detection or segmentation algorithm, and the classification label for each focus area is likewise marked by a doctor in advance, for example a benign label [1, 0] and a malignant label [0, 1]. In this embodiment, the sample pictures mainly refer to ultrasound images, in particular ultrasound images marked with thyroid or breast focus areas.
S12, performing local image information acquisition on each focus area to obtain the local image information of each focus area. In this embodiment, the local image information acquisition may be implemented through the following steps:
S121, constructing a focus rectangular frame by taking the uppermost point, the lowermost point, the leftmost point and the rightmost point of the focus area as references.
S122, randomly moving the focus rectangular frame multiple times within the neighborhood of the focus area, and collecting the image information inside the frame after each move as the local image information of the corresponding focus area. The frame is moved for the following reason: in practice, ultrasound images suffer from low resolution, artifacts, and the absence of a clear boundary around the focus area, so the marked position of the focus area is not necessarily accurate, and moving the frame several times compensates for this. Furthermore, enlarging the data set through random movement further reduces, to some extent, the amount of sample data required.
Preferably, before step S122 is performed, the local image information acquisition may further include the following step: enlarging the focus rectangular frame by a preset enlargement ratio N%, with the center of the frame as the reference, so that the size of the enlarged frame equals the size of the original frame × (1 + N%). The frame is enlarged for the following reason: when classifying a focus area, combining the image of the focus area with that of its surrounding region yields a better classification result.
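As a concrete illustration of steps S121-S122 and the optional enlargement, the sketch below builds the bounding box from the focus area's extreme points, enlarges it about its center, and crops several randomly shifted copies. It is a minimal sketch under assumptions: the function name, the mask-based input and the 10%-of-box shift range are illustrative choices, not values fixed by the patent.

```python
import numpy as np

def collect_local_patches(image, mask, enlarge_ratio=0.1, n_shifts=5, rng=None):
    """Crop randomly shifted local patches around a marked focus area.

    `image` is a 2-D ultrasound array; `mask` is a same-sized binary array
    marking the focus area (both names are illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    ys, xs = np.nonzero(mask)
    top, bottom = ys.min(), ys.max()          # uppermost / lowermost points
    left, right = xs.min(), xs.max()          # leftmost / rightmost points
    cy, cx = (top + bottom) / 2.0, (left + right) / 2.0
    # enlarged size = original size * (1 + N%), about the box center
    h = (bottom - top + 1) * (1.0 + enlarge_ratio)
    w = (right - left + 1) * (1.0 + enlarge_ratio)
    patches = []
    for _ in range(n_shifts):
        # random shift within the neighborhood (here: up to 10% of the box size)
        dy = rng.uniform(-0.1, 0.1) * h
        dx = rng.uniform(-0.1, 0.1) * w
        y0 = int(max(cy + dy - h / 2, 0))
        x0 = int(max(cx + dx - w / 2, 0))
        y1 = int(min(cy + dy + h / 2, image.shape[0]))
        x1 = int(min(cx + dx + w / 2, image.shape[1]))
        patches.append(image[y0:y1, x0:x1])
    return patches
```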
S13, performing global image information acquisition on each focus area to obtain the global image information of each focus area. In this embodiment, the global image information acquisition comprises the following steps:
S131, dividing the focus areas by size into first-type, second-type and third-type focus areas, whose sizes decrease in that order. The division proceeds as follows: first, the long axis of each focus area in the sample data set is measured (the long axis is the longest axis passing through the center of the focus area) and the median M of all long axes is computed; focus areas with a long axis larger than M are treated as large focus areas, and those with a long axis smaller than M as small focus areas. Then, the median M1 of the long axes of all large focus areas is computed, and the large focus areas whose long axis exceeds M1 become the first-type focus areas; the average M2 of the long axes of all small focus areas is computed, and the small focus areas whose long axis is below M2 become the third-type focus areas. The remaining areas, namely the large focus areas with a long axis not exceeding M1 and the small focus areas with a long axis not below M2, form the second-type focus areas.
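The size trichotomy of step S131 can be sketched as follows. The sketch follows the text as written (medians for M and M1, the average for M2); all names are illustrative, and a degenerate data set with very few focus areas would need extra guards.

```python
import numpy as np

def partition_by_long_axis(long_axes):
    """Split focus areas into the three size classes of step S131.
    Returns the thresholds M1 and M2 plus a class index (1, 2 or 3) per area."""
    a = np.asarray(long_axes, dtype=float)
    M = np.median(a)            # median of all long axes
    M1 = np.median(a[a > M])    # median over the large focus areas
    M2 = np.mean(a[a < M])      # average over the small focus areas
    # class 1: axis > M1; class 3: axis < M2; class 2: everything in between
    classes = np.where(a > M1, 1, np.where(a < M2, 3, 2))
    return M1, M2, classes
```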
S132, downsampling each focus area of the first type so that its long axis is adjusted to equal M1, and taking the downsampled image information as the global image information of the corresponding first-type focus area.
S133, taking the image information of each focus area of the second type, unchanged, as the global image information of the corresponding second-type focus area.
S134, interpolating each focus area of the third type so that its long axis is adjusted to equal M2, and taking the interpolated image information as the global image information of the corresponding third-type focus area.
Through the above processing, every focus area in the sample data set is brought into a fixed size range (long axis between M2 and M1). Since the benign or malignant nature of a focus area has no direct relation to its size, this resizing does not affect focus classification, and it avoids the adverse effect that large differences in focus size would otherwise have on the subsequent focus classification model.
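Steps S132-S134 amount to a per-area resize rule. The sketch below uses SciPy's `zoom` as one possible resampler; the patent prescribes downsampling and interpolation but not a specific method, so the choice of bilinear-style resampling here is an assumption.

```python
from scipy.ndimage import zoom

def unify_long_axis(focus_img, long_axis, M1, M2):
    """Resize a focus-area crop so its long axis falls within [M2, M1]."""
    if long_axis > M1:
        factor = M1 / long_axis      # S132: downsample first-type areas to M1
    elif long_axis < M2:
        factor = M2 / long_axis      # S134: interpolate third-type areas up to M2
    else:
        return focus_img             # S133: second-type areas kept unchanged
    return zoom(focus_img, factor, order=1)  # bilinear-style resampling
```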
S14, training a pre-built focus classification model with dual input channels using the local image information and global image information of each focus area and the classification labels corresponding to each focus area, to obtain a target focus classification model.
Specifically, this embodiment trains the focus classification model by gradient descent, and weight normalization, a commonly used deep-learning training accelerator, may be applied during training to speed it up. Meanwhile, the commonly used K-fold cross-validation method can be adopted to validate the trained model and obtain its performance, such as accuracy and AUC.
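A minimal PyTorch sketch of this training recipe follows: plain gradient descent on a cross-entropy loss, with weight normalization applied to the convolution layers. The epoch structure and learning rate are assumptions; `model` stands for any dual-input network (one is sketched later in this section) and `loader` is assumed to yield (local patch, global patch, label) batches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import weight_norm

def train_one_epoch(model, loader, lr=1e-3):
    # reparameterize conv weights as w = g * v / ||v|| to speed up training
    for m in model.modules():
        if isinstance(m, nn.Conv2d) and not hasattr(m, "weight_g"):
            weight_norm(m)
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # plain gradient descent
    for local_img, global_img, label in loader:
        opt.zero_grad()
        loss = F.cross_entropy(model(local_img, global_img), label)
        loss.backward()
        opt.step()
```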
In an embodiment, the enlargement ratio N% may be adjusted several times to obtain multiple trained focus classification models, and the best-performing of these models is taken as the target focus classification model. The ratio N% can be adjusted with the following rule: assuming the range of N% is set to at most 100%, first set the initial value of N% to 10%, then increase N% by a predetermined step (e.g., 10%) until N% reaches 100%. It should be appreciated that different values of N% yield different local image information for each focus area, and therefore different trained focus classification models.
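The N% sweep can be written as a simple selection loop. A hedged sketch: the three callables are assumed hooks for dataset construction, training and K-fold evaluation, not APIs defined by the patent.

```python
def select_target_model(build_dataset, train_model, cross_validate_auc):
    """Sweep N% from 10% to 100% in 10% steps; keep the best model by AUC."""
    best_model, best_auc = None, 0.0
    for n_pct in range(10, 101, 10):                 # N% = 10%, 20%, ..., 100%
        dataset = build_dataset(enlarge_ratio=n_pct / 100)
        model = train_model(dataset)
        auc = cross_validate_auc(model, dataset, k=5)
        if auc > best_auc:                           # best model = target model
            best_model, best_auc = model, auc
    return best_model
```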
Preferably, as shown in fig. 2, the focus classification model adopted in this embodiment comprises a first convolution network, a second convolution network, a splicing layer and a fully connected classification layer, where the first convolution network and the second convolution network are parallel networks: the input of the first convolution network receives the local image information of each focus area, the input of the second convolution network receives the global image information of each focus area, the input of the splicing layer is connected to the outputs of the first and second convolution networks, and the input of the fully connected classification layer is connected to the output of the splicing layer. In this way, the first and second convolution networks extract the features of the local and global image information respectively; the splicing layer concatenates these features and feeds them to the fully connected classification layer, which finally performs the classification and outputs the result.
Specifically, the first and second convolution networks in this embodiment are deep residual networks (ResNet), each preferably comprising convolution layers, residual modules (ResBlock) and a pooling layer. Fig. 3 shows their concrete structure: after the first/second convolution network receives the local/global image information, the information is processed in sequence by a first convolution layer, 3 first residual modules, a second convolution layer, 3 second residual modules, a third convolution layer, 3 third residual modules and a pooling layer, yielding the features of the local/global image information. Preferably, the parameters of the third convolution layer, the third residual modules and the pooling layer are independent between the two networks, while the parameters of the other layers are shared.
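The dual-branch layout of figs. 2 and 3 can be sketched as below. This is a simplified stand-in, not the patent's exact network: plain convolution blocks replace the residual-plus-attention modules, and the channel counts are assumptions; only the topology (shared early layers, independent third stage and pooling, concatenation, fully connected classifier) mirrors the description.

```python
import torch
import torch.nn as nn

class DualInputLesionClassifier(nn.Module):
    """Two parallel branches (local / global), concatenation, FC classifier."""
    def __init__(self, n_classes=2):
        super().__init__()
        # early layers shared between the two branches, as the text suggests
        self.shared = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        def head():  # branch-specific third stage + pooling (independent weights)
            return nn.Sequential(
                nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.local_head, self.global_head = head(), head()
        self.classifier = nn.Linear(128 * 2, n_classes)  # fully connected layer

    def forward(self, local_img, global_img):
        f_local = self.local_head(self.shared(local_img))
        f_global = self.global_head(self.shared(global_img))
        fused = torch.cat([f_local, f_global], dim=1)    # splicing layer
        return self.classifier(fused)
```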
In addition, the structure of each residual module in the invention is shown in fig. 4 and comprises a residual module layer and an attention network. The residual module layer comprises a fourth, a fifth and a sixth convolution layer. After receiving an input signal, the residual module layer processes it through the fourth, fifth and sixth convolution layers in sequence, then sums the output of the sixth convolution layer with the input signal and feeds the result to the attention network for processing.
The attention network serves to direct the model toward the regions that matter. Its structure is shown in fig. 5 and comprises two paths: the first path consists of a 1×1 seventh convolution layer; the second path consists of a first max-pooling layer, an eighth convolution layer, a second max-pooling layer, a ninth convolution layer, a sub-pixel upsampling layer and a classification layer, and is used to obtain an attention weighting matrix, which is finally multiplied with the output of the first path so that the model pays more attention to high-weight regions. Sub-pixel upsampling restores resolution by rearranging convolution channels rather than by plain interpolation; compared with ordinary upsampling, it does not need to reduce the model's channels to 1 during convolution, which makes it better at integrating information across multiple channels.
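Under assumptions about kernel sizes and pooling factors (none are given in the text), the two-path attention of fig. 5 can be sketched with PyTorch's `PixelShuffle` as the sub-pixel upsampling layer:

```python
import torch.nn as nn

class AttentionBlock(nn.Module):
    """Path 1: 1x1 conv. Path 2: pool twice, convolve, restore resolution with
    sub-pixel upsampling, squash to (0, 1) weights, then re-weight path 1.
    Input height/width are assumed divisible by 4 so the shapes line up."""
    def __init__(self, channels):
        super().__init__()
        self.path1 = nn.Conv2d(channels, channels, 1)   # 1x1 seventh conv layer
        self.path2 = nn.Sequential(
            nn.MaxPool2d(2), nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2), nn.Conv2d(channels, channels * 16, 3, padding=1),
            nn.PixelShuffle(4),   # sub-pixel upsampling: 16*C channels -> C, 4x size
            nn.Sigmoid(),         # per-pixel attention weights
        )

    def forward(self, x):
        return self.path1(x) * self.path2(x)  # emphasize high-weight regions
```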
In summary, this embodiment designs a focus classification model with two input channels that receive the local image information and global image information of a focus area respectively. Because the local image information carries more detail about the focus area while the global image information reflects its overall characteristics, the trained focus classification model is more accurate than existing single-input focus classification models. Meanwhile, the invention does not need to build separate models for different focus size ranges, which reduces the amount of sample data required.
Example 2
The method of this embodiment differs from the first embodiment in that, after the target focus classification model is obtained in step S14, the steps shown in fig. 6 are executed, specifically:
S21, acquiring a target case picture, wherein the outline of a target focus area is marked in the target case picture.
S22, performing the local image information acquisition on the target focus area to obtain the local image information of the target focus area. The local image information acquisition in this step is the same as that in step S12 and is not repeated here.
S23, performing the global image information acquisition on the target focus area to obtain the global image information of the target focus area. The global image information acquisition in this step is the same as that in step S13 and is not repeated here.
S24, processing the local image information and global image information of the target focus area with the target focus classification model trained in the first embodiment, to obtain a benign/malignant classification result for the target focus area.
In this embodiment, the two input channels of the target focus classification model receive the local image information and global image information of the target focus area respectively; because the model classifies more accurately, the target focus area in the case picture can be classified accurately.
It should be noted that, for simplicity of description, the first and second embodiments are described as a series of action combinations, but those skilled in the art should understand that the invention is not limited by the order of the actions described, as some steps may be performed in another order or simultaneously. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and the actions involved are not necessarily required to practice the invention.
Example 3
The present embodiment provides a focus classification model training apparatus 10, as shown in fig. 7, which includes:
the sample acquisition module 11 is configured to acquire a sample data set, where the sample data set includes a plurality of sample pictures marked with focal areas respectively, and each focal area is marked with a corresponding classification label respectively;
A first local image information acquisition module 12, configured to acquire and process local image information of each focal zone, so as to obtain local image information of each focal zone;
the first global image information acquisition module 13 is configured to perform global image information acquisition processing on each focal zone to obtain global image information of each focal zone;
The model training module 14 is configured to train a pre-established focus classification model with a dual input channel by using the local image information and the global image information of each focus area and the classification label corresponding to each focus area, so as to obtain a target focus classification model.
In this embodiment, the first local image information acquisition module is specifically configured to:
Constructing a focus rectangular frame by taking the uppermost point, the lowermost point, the leftmost point and the rightmost point of the focus area as references;
And randomly moving the focus rectangular frame for a plurality of times in the neighborhood of the focus area, and taking the image information in the focus rectangular frame after each movement as the local image information of the focus area.
In this embodiment, the first local image information acquisition module is further configured to: before randomly moving the focus rectangular frame for a plurality of times in the neighborhood of the focus area,
And amplifying the focus rectangular frame according to a preset amplifying proportion by taking the center of the focus rectangular frame as a reference.
In this embodiment, the first global image information acquisition module is specifically configured to:
Dividing each focus area into a first focus area, a second focus area and a third focus area according to the size, wherein the sizes of the first focus area, the second focus area and the third focus area are sequentially reduced;
Performing downsampling treatment on each focus area in the focus areas of the first type, and taking the image information subjected to downsampling treatment as global image information of the corresponding focus areas in the focus areas of the first type;
Taking the image information of each focus area in the focus areas of the second type as the global image information of the corresponding focus areas in the focus areas of the second type;
And performing interpolation processing on each focus region in the focus region of the third type, and taking the image information subjected to interpolation processing as global image information of a corresponding focus region in the focus region of the third type.
In this embodiment, the focus classification model includes a first convolution network, a second convolution network, a splicing layer, and a fully-connected classification layer, where an input end of the first convolution network is configured to receive local image information of each focus area, an input end of the second convolution network is configured to receive global image information of each focus area, an input end of the splicing layer is respectively connected with output ends of the first convolution network and the second convolution network, and an input end of the fully-connected classification layer is connected with an output end of the splicing layer.
Example 4
This embodiment provides a focus classification model training device 10 based on the third embodiment; as shown in fig. 8, it differs from the third embodiment in that the device further comprises the following modules:
The case picture obtaining module 15 is used for obtaining a target case picture, wherein a target focus area is marked in the target case picture;
The second local image information acquisition module 16 is configured to acquire and process the local image information of the target focal zone, so as to obtain local image information of the target focal zone;
The second global image information acquisition module 17 is configured to acquire and process the global image information of the target focal zone, so as to obtain global image information of the target focal zone;
the model processing module 18 is configured to process the local image information and the global image information of the target focal zone by using the target focal classification model obtained by the focal classification model training device in the third embodiment, so as to obtain a classification result of the target focal zone.
The above device embodiments are substantially similar to the method embodiments, so their description is relatively brief; for relevant details, refer to the description of the method embodiments provided herein. Those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and the modules involved are not necessarily essential to the invention.
Example 5
This embodiment provides a computer device capable of executing a program, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server, or a cabinet server (including an independent server, or a server cluster composed of multiple servers). As shown in fig. 9, the computer device 20 of this embodiment comprises at least, but is not limited to, a memory 21 and a processor 22, which can be communicatively connected to each other through a system bus. It should be noted that fig. 9 only shows the computer device 20 with components 21-22, but it should be understood that not all of the illustrated components must be implemented; more or fewer components may be implemented instead.
In this embodiment, the memory 21 (i.e., a readable storage medium) includes flash memory, hard disks, multimedia cards, card-type memories (e.g., SD or DX memories), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memories, magnetic disks, optical disks, and the like. In some embodiments, the memory 21 may be an internal storage unit of the computer device 20, such as its hard disk or internal memory. In other embodiments, the memory 21 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card provided on the computer device 20. Of course, the memory 21 may also include both the internal storage unit and an external storage device of the computer device 20. In this embodiment, the memory 21 is generally used to store the operating system and the various kinds of application software installed on the computer device 20, such as the program code of the focus classification model training device 10 of the third/fourth embodiment. Further, the memory 21 may be used to temporarily store various data that have been output or are to be output.
The processor 22 may, in some embodiments, be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 20. In this embodiment, the processor 22 is configured to run the program code stored in the memory 21 or to process data, for example to run the focus classification model training device 10, thereby implementing the focus classification model training method of the first embodiment.
Example 6
This embodiment provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an App application store, on which a computer program is stored that performs a corresponding function when executed by a processor. The computer-readable storage medium of this embodiment is used to store the focus classification model training device 10, which, when executed by a processor, implements the focus classification model training method of the first embodiment.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is preferred.
The foregoing describes only preferred embodiments of the present invention and does not limit the scope of the invention; any equivalent structural or process transformation made using the contents of this specification and the drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the invention.

Claims (8)

1. A method for training a lesion classification model, comprising:
Acquiring a sample data set, wherein the sample data set comprises a plurality of sample pictures respectively marked with focus areas, and each focus area is respectively marked with a corresponding classification label;
Collecting and processing local image information of each focus area to obtain local image information of each focus area;
global image information acquisition processing is carried out on each focus area, so that global image information of each focus area is obtained;
Training a focus classification model with a double input channel, which is built in advance, by utilizing the local image information and the global image information of each focus zone and the classification label corresponding to each focus zone to obtain a target focus classification model;
The global image information acquisition processing includes:
Dividing each focus area into a first focus area, a second focus area and a third focus area according to the size, wherein the sizes of the first focus area, the second focus area and the third focus area are sequentially reduced;
performing downsampling treatment on each focus area in the focus areas of the first type to adjust the long axes of each focus area in the focus areas of the first type to be the same through downsampling, and taking the image information subjected to the downsampling treatment as global image information of the corresponding focus areas in the focus areas of the first type;
Taking the image information of each focus area in the focus areas of the second type as the global image information of the corresponding focus areas in the focus areas of the second type;
Performing interpolation treatment on each focus zone in the third type focus zone, so that long axes of each focus zone in the third type focus zone are adjusted to be identical through interpolation, and taking the image information subjected to the interpolation treatment as global image information of a corresponding focus zone in the third type focus zone;
The local image information acquisition processing includes:
Constructing a focus rectangular frame by taking the uppermost point, the lowermost point, the leftmost point and the rightmost point of the focus area as references;
And randomly moving the focus rectangular frame for a plurality of times in the neighborhood of the focus area, and taking the image information in the focus rectangular frame after each movement as the local image information of the focus area.
2. The method of claim 1, wherein the local image information acquisition process further comprises, before randomly moving the focal rectangular frame a plurality of times in a neighborhood of the focal zone:
And amplifying the focus rectangular frame according to a preset amplifying proportion by taking the center of the focus rectangular frame as a reference.
3. The method of claim 1, wherein the lesion classification model comprises a first convolutional network, a second convolutional network, a splicing layer, and a fully-connected classification layer, wherein an input of the first convolutional network is configured to receive local image information of each lesion region, an input of the second convolutional network is configured to receive global image information of each lesion region, an input of the splicing layer is respectively connected with an output of the first convolutional network and an output of the second convolutional network, and an input of the fully-connected classification layer is connected with an output of the splicing layer.
4. The method of claim 1, further comprising, after obtaining the target lesion classification model:
Obtaining a target case picture, wherein a target focus area is marked in the target case picture;
The local image information of the target focus area is acquired and processed to obtain the local image information of the target focus area;
the global image information acquisition processing is carried out on the target focus area, so that global image information of the target focus area is obtained;
and processing the local image information and the global image information of the target focus area by using the target focus classification model to obtain a classification result of the target focus area.
5. A lesion classification model training device, the device comprising:
a sample acquisition module, used for acquiring a sample data set, wherein the sample data set comprises a plurality of sample pictures respectively marked with focus areas, and each focus area is respectively marked with a corresponding classification label;
The first local image information acquisition module is used for carrying out local image information acquisition processing on each focus area to obtain local image information of each focus area;
the first global image information acquisition module is used for carrying out global image information acquisition processing on each focus area to obtain global image information of each focus area;
The model training module is used for training a pre-established focus classification model with double input channels by utilizing the local image information and the global image information of each focus zone and the classification labels corresponding to each focus zone to obtain a target focus classification model;
the first global image information acquisition module is specifically configured to:
Dividing each focus area into a first focus area, a second focus area and a third focus area according to the size, wherein the sizes of the first focus area, the second focus area and the third focus area are sequentially reduced;
performing downsampling treatment on each focus area in the focus areas of the first type to adjust the long axes of each focus area in the focus areas of the first type to be the same through downsampling, and taking the image information subjected to the downsampling treatment as global image information of the corresponding focus areas in the focus areas of the first type;
Taking the image information of each focus area in the focus areas of the second type as the global image information of the corresponding focus areas in the focus areas of the second type;
Performing interpolation treatment on each focus zone in the third type focus zone, so that long axes of each focus zone in the third type focus zone are adjusted to be identical through interpolation, and taking the image information subjected to the interpolation treatment as global image information of a corresponding focus zone in the third type focus zone;
The first local image information acquisition module is specifically configured to:
Constructing a focus rectangular frame by taking the uppermost point, the lowermost point, the leftmost point and the rightmost point of the focus area as references;
And randomly moving the focus rectangular frame for a plurality of times in the neighborhood of the focus area, and taking the image information in the focus rectangular frame after each movement as the local image information of the focus area.
6. The lesion classification model training device according to claim 5, further comprising:
the case picture acquisition module is used for acquiring a target case picture after the model training module obtains the target focus classification model, wherein a target focus area is marked in the target case picture;
the second local image information acquisition module is used for carrying out the local image information acquisition processing on the target focus area to obtain the local image information of the target focus area;
the second global image information acquisition module is used for carrying out global image information acquisition processing on the target focus area to obtain global image information of the target focus area;
And the model processing module is used for processing the local image information and the global image information of the target focus area by utilizing the target focus classification model to obtain a classification result of the target focus area.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
CN202010115041.0A 2020-02-25 2020-02-25 Focus classification model training method, apparatus, computer device and storage medium Active CN111291825B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010115041.0A CN111291825B (en) 2020-02-25 2020-02-25 Focus classification model training method, apparatus, computer device and storage medium
PCT/CN2020/099473 WO2021169126A1 (en) 2020-02-25 2020-06-30 Lesion classification model training method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010115041.0A CN111291825B (en) 2020-02-25 2020-02-25 Focus classification model training method, apparatus, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN111291825A CN111291825A (en) 2020-06-16
CN111291825B (en) 2024-05-07

Family

ID=71025672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010115041.0A Active CN111291825B (en) 2020-02-25 2020-02-25 Focus classification model training method, apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN111291825B (en)
WO (1) WO2021169126A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291825B (en) * 2020-02-25 2024-05-07 平安科技(深圳)有限公司 Focus classification model training method, apparatus, computer device and storage medium
CN113763352B (en) * 2021-09-06 2024-04-02 杭州类脑科技有限公司 Abdominal cavity hydrops image processing method and system
CN114052795B (en) * 2021-10-28 2023-11-07 南京航空航天大学 Focus imaging and anti-false-prick therapeutic system combined with ultrasonic autonomous scanning
CN114092427B (en) * 2021-11-12 2023-05-16 深圳大学 Crohn's disease and intestinal tuberculosis classification method based on multi-sequence MRI image
CN114664410B (en) * 2022-03-11 2022-11-08 北京医准智能科技有限公司 Video-based focus classification method and device, electronic equipment and medium
CN115345828A (en) * 2022-07-12 2022-11-15 江苏诺鬲生物科技有限公司 Artificial intelligence algorithm-based immunofluorescence staining picture lesion identification method and device
CN115019110A (en) * 2022-07-13 2022-09-06 北京深睿博联科技有限责任公司 Focus identification method and device based on chest image
CN114999569B (en) * 2022-08-03 2022-12-20 北京汉博信息技术有限公司 Method, device and computer readable medium for typing focus stroma
CN116342582B (en) * 2023-05-11 2023-08-04 湖南工商大学 Medical image classification method and medical equipment based on deformable attention mechanism

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460756A (en) * 2018-11-09 2019-03-12 天津新开心生活科技有限公司 Medical image processing method, apparatus, electronic equipment and computer-readable medium
CN110008971A (en) * 2018-08-23 2019-07-12 腾讯科技(深圳)有限公司 Image processing method, device, computer readable storage medium and computer equipment
CN110580482A (en) * 2017-11-30 2019-12-17 腾讯科技(深圳)有限公司 Image classification model training, image classification and personalized recommendation method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108563982B (en) * 2018-01-05 2020-01-17 百度在线网络技术(北京)有限公司 Method and apparatus for detecting image
CN111291825B (en) * 2020-02-25 2024-05-07 平安科技(深圳)有限公司 Focus classification model training method, apparatus, computer device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580482A (en) * 2017-11-30 2019-12-17 腾讯科技(深圳)有限公司 Image classification model training, image classification and personalized recommendation method and device
CN110008971A (en) * 2018-08-23 2019-07-12 腾讯科技(深圳)有限公司 Image processing method, device, computer readable storage medium and computer equipment
CN109460756A (en) * 2018-11-09 2019-03-12 天津新开心生活科技有限公司 Medical image processing method, apparatus, electronic equipment and computer-readable medium

Also Published As

Publication number Publication date
CN111291825A (en) 2020-06-16
WO2021169126A1 (en) 2021-09-02

Similar Documents

Publication Publication Date Title
CN111291825B (en) Focus classification model training method, apparatus, computer device and storage medium
CN110428475B (en) Medical image classification method, model training method and server
CN107895367B (en) Bone age identification method and system and electronic equipment
US10482603B1 (en) Medical image segmentation using an integrated edge guidance module and object segmentation network
CN111401201B (en) Aerial image multi-scale target detection method based on spatial pyramid attention drive
Pinaya et al. Unsupervised brain imaging 3D anomaly detection and segmentation with transformers
US20220198230A1 (en) Auxiliary detection method and image recognition method for rib fractures based on deep learning
CN110599500B (en) Tumor region segmentation method and system of liver CT image based on cascaded full convolution network
CN113728335A (en) Method and system for classification and visualization of 3D images
Li et al. Automated measurement network for accurate segmentation and parameter modification in fetal head ultrasound images
CN112862830B (en) Multi-mode image segmentation method, system, terminal and readable storage medium
CN110827236B (en) Brain tissue layering method, device and computer equipment based on neural network
CN112614133B (en) Three-dimensional pulmonary nodule detection model training method and device without anchor point frame
CN114494296A (en) Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN112750137B (en) Liver tumor segmentation method and system based on deep learning
CN116188479B (en) Hip joint image segmentation method and system based on deep learning
CN111192320B (en) Position information determining method, device, equipment and storage medium
CN112420170A (en) Method for improving image classification accuracy of computer aided diagnosis system
CN117437423A (en) Weak supervision medical image segmentation method and device based on SAM collaborative learning and cross-layer feature aggregation enhancement
CN108596900B (en) Thyroid-associated ophthalmopathy medical image data processing device and method, computer-readable storage medium and terminal equipment
CN113379770B (en) Construction method of nasopharyngeal carcinoma MR image segmentation network, image segmentation method and device
Wang et al. An efficient hierarchical optic disc and cup segmentation network combined with multi-task learning and adversarial learning
CN114565617A (en) Pruning U-Net + + based breast tumor image segmentation method and system
CN114445419A (en) Lung segment segmentation method, device and system based on bronchial topological structure
CN112614092A (en) Spine detection method and device

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40023091)
SE01 Entry into force of request for substantive examination
GR01 Patent grant