CN117011628A - Image classification model training method, image classification method and device - Google Patents


Info

Publication number
CN117011628A
CN117011628A (application CN202210673448.4A)
Authority
CN
China
Prior art keywords
image
classification model
image classification
training
classified
Prior art date
Legal status
Pending
Application number
CN202210673448.4A
Other languages
Chinese (zh)
Inventor
卢东焕
宁慕楠
张欣宇
熊竹
陈勃
郑冶枫
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210673448.4A
Publication of CN117011628A


Classifications

    • G06V 10/774: Image or video recognition using pattern recognition or machine learning; generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 10/764: Image or video recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/82: Image or video recognition using pattern recognition or machine learning, using neural networks
    • G16H 50/80: ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention provides an image classification model training method, an image classification method and apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: combining a first original sample image and a second original sample image to form a training sample set for an image classification model; determining a multi-task loss function of the image classification model according to the use environment of the image classification model; and training the image classification model through the training sample set and the multi-task loss function, and determining model parameters of the image classification model, so that images to be classified can be classified by the image classification model. In this way, the training accuracy of the image classification model is improved while the total amount of training data is reduced and no data labeling is required, infected persons are found promptly and accurately, and time is saved in the epidemiological investigation of infectious diseases.

Description

Image classification model training method, image classification method and device
Technical Field
The present invention relates to disease type information processing technology, and more particularly, to an image classification model training method, an image classification method and apparatus, an electronic device, and a storage medium.
Background
At present, rapid detection of type A coronavirus can be completed at home by purchasing rapid test strips, test cassettes, or test cards. However, such self-testing only completes the biological or chemical detection step performed by the test strip, cassette, or card, and the reporting and collection of the information obtained after detection remain problematic: with existing detection methods, the detected person must still visually observe and interpret the result and then report it by operating a mobile phone, so detection results cannot be collected and uploaded automatically and rapidly, and the advantages of informatization are not fully exploited. Meanwhile, because users differ in their proficiency at capturing and uploading images, some users cannot upload complete detection images, which adversely affects information collection. As a result, the relevant authorities cannot comprehensively grasp the state of the epidemic, which hampers the prevention and investigation of infectious diseases.
Disclosure of Invention
In view of the above, embodiments of the present invention provide an image classification model training method, an image classification method, an apparatus, an electronic device, and a storage medium, which can train an image classification model in a weakly supervised manner while reducing the total amount of training data and avoiding heavy data labeling, thereby steadily improving the training accuracy of the image classification model, shortening training time, and reducing overfitting of the image classification model. At the same time, the multi-task loss function of the image classification model can flexibly adapt to the detection requirements of different diseases, enabling large-scale application of the image classification model. Moreover, images carrying infection risk can be accurately identified from the classification results, infected persons can be found promptly and accurately, and time is saved in the epidemiological investigation of infectious diseases.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides an image classification model training method, which comprises the following steps:
acquiring a first original sample image without labels and a second original sample image carrying label information;
combining the first original sample image and the second original sample image to form a training sample set of an image classification model;
determining a multi-task loss function of the image classification model according to the use environment of the image classification model;
and training the image classification model through the training sample set and the multi-task loss function, and determining model parameters of the image classification model, so as to classify the image to be classified through the image classification model.
The embodiment of the invention also provides an image classification method, which comprises the following steps:
obtaining an image to be classified and disease type information, wherein the disease type information comprises:
a disease type identifier, and a confirmed-diagnosis bounding-box state corresponding to the disease type identifier;
extracting features of the image to be classified through a feature extraction network in the image classification model to obtain image features of the image to be classified;
classifying the image features of the image to be classified through an image classification network in the image classification model to obtain a classification result of the image to be classified;
wherein the image classification model is trained by the method of any one of claims 1 to 5;
and when the classification result of the image to be classified matches the confirmed-diagnosis bounding-box state, issuing confirmed-diagnosis alarm information.
The embodiment of the invention also provides an image classification model training device, which comprises:
the information transmission module is used for acquiring a first original sample image without labels and a second original sample image carrying label information;
the information processing module is used for combining the first original sample image and the second original sample image to form a training sample set of an image classification model;
the information processing module is used for determining a multi-task loss function of the image classification model according to the use environment of the image classification model;
the information processing module is used for training the image classification model through the training sample set and the multi-task loss function and determining model parameters of the image classification model so as to classify the image to be classified through the image classification model.
In the above arrangement:
the information processing module is used for determining the disease type corresponding to the image classification model according to the use environment of the image classification model;
the information processing module is used for determining, according to the disease type, the bounding box type corresponding to the image classification model and the classification loss function corresponding to the bounding box type;
the information processing module is used for acquiring a confidence loss function of the bounding box;
the information processing module is used for calculating the multi-task loss function of the image classification model according to the classification loss function and the confidence loss function.
In the above arrangement:
the information processing module is used for performing local augmentation processing on the second original sample image according to the use environment of the image classification model to obtain a locally augmented image, so as to perform spatial filtering on the local neighborhood information of the second original sample image;
the information processing module is used for performing global augmentation processing on the second original sample image to obtain a globally augmented image, so as to adjust the sharpness of the second original sample image.
In the above arrangement:
the information processing module is used for determining a dynamic noise threshold value matched with the use environment of the image classification model;
the information processing module is used for denoising the training sample set according to the dynamic noise threshold value to form a training sample set matched with the dynamic noise threshold value; or,
the information processing module is used for determining a fixed noise threshold corresponding to the image classification model, and denoising the training sample set according to the fixed noise threshold so as to form a training sample set matched with the fixed noise threshold.
In the above arrangement:
the information processing module is used for adjusting network parameters of the image classification model based on the feature vector of the training sample set and the multi-task loss function;
the information processing module is used for determining network parameters of the image classification model until the loss functions of different dimensions corresponding to the image classification model reach corresponding convergence conditions so as to realize the adaptation of the parameters of the image classification model to the use environment.
The embodiment of the invention also provides an image classification device, which comprises:
The data transmission module is used for acquiring images to be classified and disease type information, wherein the disease type information comprises:
a disease type identifier, and a confirmed-diagnosis bounding-box state corresponding to the disease type identifier;
the data processing module is used for extracting features of the image to be classified through a feature extraction network in the image classification model to obtain image features of the image to be classified;
the data processing module is used for classifying the image features of the image to be classified through an image classification network in the image classification model to obtain a classification result of the image to be classified;
and the data processing module is used for issuing confirmed-diagnosis alarm information when the classification result of the image to be classified matches the confirmed-diagnosis bounding-box state.
The embodiment of the invention also provides electronic equipment, which comprises:
a memory for storing executable instructions;
and the processor is used for realizing the image classification model training method when running the executable instructions stored in the memory.
The embodiment of the invention also provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the image classification model training method.
The embodiment of the invention has the following beneficial effects:
the method comprises the steps of obtaining a first original sample image without labels and a second original sample image carrying label information; combining the first original sample image and the second original sample image to form a training sample set of an image classification model; determining a multi-task loss function of the image classification model according to the use environment of the image classification model; and training the image classification model through the training sample set and the task loss function, and determining model parameters of the image classification model so as to classify the image to be classified through the image classification model. Therefore, the training of the image classification model can be realized, the weak supervision training mode is facilitated on the premise of reducing the total quantity of training data and needing no repeated data marking, the training accuracy of the image classification model is stably improved, the training time is shortened, the over-fitting defect of the image classification model is reduced, meanwhile, the multi-task loss function of the image classification model can flexibly adapt to the detection use requirements of different diseases, and the large-scale application of the image classification model is realized. Meanwhile, the images with infection risk can be accurately judged according to the classification result of the images, the infected person can be timely and accurately found, and the time for epidemiological investigation of infectious diseases is saved.
Drawings
FIG. 1A is a schematic view of an environment in which an image classification model training method according to an embodiment of the present invention is used;
FIG. 1B is an alternative schematic view of an image to be classified according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the composition structure of an image classification model training apparatus according to an embodiment of the present invention;
FIG. 3 is an alternative schematic view of an image to be classified according to an embodiment of the present invention;
FIG. 4 is an alternative schematic view of an image to be classified according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of an alternative image classification model training method according to an embodiment of the present invention;
FIG. 6A is a schematic diagram illustrating the working principle of a YOLO network according to an embodiment of the present invention;
FIG. 6B is a schematic diagram of a model structure of an image classification model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the Focus operation in an image classification model according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a mini program capturing images to be classified according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of detection selection in an embodiment of the invention;
FIG. 10 is a schematic diagram of an optional processing flow of the image classification method according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a classification result according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described below in further detail with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without creative effort fall within the scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
Before describing embodiments of the present invention in further detail, the terms and terminology involved in the embodiments of the present invention are explained as follows; these explanations apply throughout the description below.
1) In response to: used to indicate the condition or state on which a performed operation depends; when the condition or state depended upon is satisfied, the performed operation(s) may occur in real time or with a set delay; unless otherwise specified, there is no restriction on the order in which multiple such operations are performed.
2) Client: a carrier in a terminal that implements specific functions; for example, a mobile client (APP) is the carrier of specific functions in a mobile terminal, such as online live streaming (video push) or online video playback.
3) Convolutional Neural Network (CNN): a class of feedforward neural networks that includes convolutional computation and has a deep structure, and one of the representative algorithms of deep learning. Convolutional neural networks have representation learning capability and can perform shift-invariant classification of input information according to their hierarchical structure.
4) Model training: multi-class learning on an image data set. The model can be constructed with a deep learning framework such as TensorFlow or PyTorch, and a multi-class image classification model is formed from combinations of neural network layers such as CNN layers. The input of the model is a three-channel or original-channel matrix obtained by reading the image with a tool such as OpenCV; the model outputs multi-class probabilities, and the image classification result is finally produced through an algorithm such as softmax. During training, an objective function such as cross-entropy drives the model toward correct predictions.
5) Neural Network (NN): an Artificial Neural Network (ANN), abbreviated as neural network or neural-like network, is a mathematical or computational model in the fields of machine learning and cognitive science that mimics the structure and function of biological neural networks (the central nervous system of animals, particularly the brain) and is used to estimate or approximate functions.
6) Contrastive loss: a loss function under which a mapping can be learned such that points of the same category that lie far apart in a high-dimensional space are mapped by the function to a low-dimensional space where their distances become smaller, while points of different categories that lie close together become farther apart after mapping. As a result, in the low-dimensional space, points of the same category cluster together and different categories are separated. This is similar to Fisher dimensionality reduction, but Fisher dimensionality reduction has no out-of-sample extension and cannot act on new samples.
7) Softmax: the normalized exponential function, a generalization of the logistic function. It can "compress" a K-dimensional vector of arbitrary real numbers into another K-dimensional real vector such that each element lies in the range (0, 1) and the sum of all elements is 1. A brief code sketch illustrating definitions 6) and 7) is given after this list.
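For concreteness, the following is a minimal sketch of the contrastive loss of definition 6) and the softmax of definition 7), assuming PyTorch; the margin value and tensor shapes are illustrative assumptions rather than values taken from this document.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """z1, z2: paired embeddings; same_class: 1.0 for same-category pairs, else 0.0.
    Same-category points are pulled together in the low-dimensional space,
    while different-category points are pushed at least `margin` apart."""
    d = F.pairwise_distance(z1, z2)
    return (same_class * d.pow(2)
            + (1.0 - same_class) * F.relu(margin - d).pow(2)).mean()

z1, z2 = torch.randn(4, 8), torch.randn(4, 8)
pair_labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
print(contrastive_loss(z1, z2, pair_labels))

# Softmax: a K-dimensional real vector is "compressed" so that every element
# lies in (0, 1) and all elements sum to 1.
p = torch.softmax(torch.tensor([2.0, 1.0, 0.1]), dim=0)
print(p, p.sum())  # elements in (0, 1); the sum is 1.0
```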
FIG. 1A is a schematic diagram of a usage scenario of the image classification model training method provided by an embodiment of the present application. Referring to FIG. 1A, terminals (including a terminal 10-1 and a terminal 10-2) are provided with corresponding clients capable of executing different functions. Through a network 300, the clients browse medical record information of different corresponding images obtained by the terminals (including the terminal 10-1) from a corresponding server 200, or obtain corresponding medical images and analyze the epidemiological investigation results of the images. The terminals are connected to the server 200 through the network 300, which may be a wide area network, a local area network, or a combination of the two, and which uses wireless links for data transmission. The types of medical record information of corresponding images that the terminals (including the terminal 10-1) obtain from the server 200 through the network 300 may be the same or different; for example, the terminals (including the terminal 10-1 and the terminal 10-2) may obtain from the corresponding server 200, through the network 300, epidemiological investigation results matching an image, or may obtain epidemiological investigations of the population associated with the current target (e.g., close contacts, secondary close contacts, and spatiotemporal contacts). The server 200 may store medical record information of the corresponding images for different images, or may store epidemiological investigations matching that medical record information. In some embodiments of the present application, the disease type information stored in the server 200 may include information on various types of infectious diseases, distinguished by corresponding disease type identifiers; a confirmed-diagnosis early-warning threshold corresponding to each disease type identifier may also be stored, and after the classification result of an image to be classified is obtained through the image classification model, this threshold is used to promptly issue confirmed-diagnosis alarm information to notify the corresponding disease control department. The disease type identifier carried by the disease information in the present application can characterize various infectious diseases, which are specifically divided into class A, class B, and class C. Taking type A coronavirus infection as an example, the terminal 10-2 can perform self-detection through antigen test strips to determine whether a type A coronavirus infection is confirmed. Referring to FIG. 1B, an optional schematic diagram of an image to be classified in an embodiment of the present application: if clearly visible red bands appear on both the T detection line and the C quality control line, the result is judged positive, and the darker the T line, the stronger the positive; if only the C quality control line shows a clearly visible red band, the result is judged negative; and if no red band appears on the C quality control line, the result is judged invalid. The terminal 10-2 captures the image shown in FIG. 1B, so that the server 200 can obtain the corresponding image to be classified, and the image classification process is performed through the image classification model deployed in the server 200 to determine whether the user is infected with type A coronavirus.
The embodiment of the invention can be implemented in combination with cloud technology. Cloud technology refers to a hosting technology that unifies series of resources such as hardware, software, and networks in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data; it can also be understood as the general term for network technology, information technology, integration technology, management platform technology, application technology, and the like applied on the basis of the cloud computing business model. Background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources, so cloud technology needs the support of cloud computing.
It should be noted that cloud computing is a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, enabling various application systems to obtain computing power, storage space, and information services on demand. The network that provides the resources is referred to as the "cloud". From the user's perspective, the resources in the cloud can be expanded without limit, and can be acquired at any time, used on demand, expanded at any time, and paid for according to use. As a basic capability provider of cloud computing, a cloud computing resource pool platform, called a cloud platform for short and generally termed infrastructure as a service (IaaS, Infrastructure as a Service), deploys multiple types of virtual resources in the resource pool for external clients to select and use. The cloud computing resource pool mainly comprises computing devices (virtualized machines, including operating systems), storage devices, and network devices.
Referring to FIG. 1A, the image classification method provided by the embodiment of the present application may be implemented by a corresponding cloud device, for example: the terminals (including the terminal 10-1 and the terminal 10-2) are connected to the server 200 located in the cloud through the network 300, which may be a wide area network, a local area network, or a combination of the two. It should be noted that the server 200 may be a physical device or a virtualized device.
Specifically, as shown in fig. 1A in the foregoing embodiment, the server 200 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and basic cloud computing services such as big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.
It should be noted that, for any existing infectious disease, an image classification model can be trained based on the image classification model training method of this embodiment when epidemiological investigation is performed, to facilitate remote review and use by doctors; the image classification model can also be fine-tuned according to the disease type to meet the epidemiological investigation requirements of the disease. For example, when type A coronavirus test strips serve as the images to be classified, the regions include: a sample-addition window, a detection window, and marker-line windows (C line and T line); when AIDS test strips serve as the images to be classified, the regions include: a sample-addition window and a result-judgment window.
The server 200 transmits the medical record information of the corresponding image of the same image to the terminal (terminal 10-1 and/or terminal 10-2) through the network 300, so that the user of the terminal (terminal 10-1 and/or terminal 10-2) can analyze the medical record information of the corresponding image. As an example, the server 200 deploys a corresponding neural network model; before deployment, the image classification model must first be trained, which specifically includes: acquiring a first original sample image without labels and a second original sample image carrying label information; combining the first original sample image and the second original sample image to form a training sample set of an image classification model; determining a multi-task loss function of the image classification model according to the use environment of the image classification model; and training the image classification model through the training sample set and the multi-task loss function, and determining model parameters of the image classification model so as to classify the image to be classified through the image classification model.
The structure of the image classification model training apparatus according to the embodiment of the present invention is described in detail below. The image classification model training apparatus may be implemented in various forms, such as a dedicated terminal with the processing functions of the image classification model training apparatus, or a server provided with those processing functions, for example, the server 200 in FIG. 1A. FIG. 2 is a schematic diagram of the composition structure of an image classification model training apparatus according to an embodiment of the present invention. It can be understood that FIG. 2 shows only an exemplary structure of the apparatus, not its entire structure, and that part or all of the structure shown in FIG. 2 can be implemented as required.
The image classification model training device provided by the embodiment of the invention comprises: at least one processor 201, a memory 202, a user interface 203, and at least one network interface 204. The various components in the image classification model training apparatus are coupled together by a bus system 205. It is understood that the bus system 205 is used to enable connected communications between these components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, keyboard, mouse, trackball, click wheel, keys, buttons, touch pad, or touch screen, etc.
It will be appreciated that the memory 202 may be either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operation on the terminal (e.g., 10-1). The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application may comprise various applications.
In some embodiments, the image classification model training apparatus provided by the embodiments of the present invention may be implemented by a combination of software and hardware. As an example, the image classification model training apparatus provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor, programmed to perform the image classification model training method provided by the embodiments of the present invention. For example, a processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of implementing the image classification model training apparatus provided by the embodiment of the present invention by a combination of software and hardware, the apparatus may be directly embodied as a combination of software modules executed by the processor 201. The software modules may be located in a storage medium, the storage medium is located in the memory 202, the processor 201 reads the executable instructions included in the software modules in the memory 202, and completes the image classification model training method provided by the embodiment of the present invention in combination with necessary hardware (including, for example, the processor 201 and other components connected to the bus system 205).
By way of example, the processor 201 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a Digital Signal Processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
As an example of implementing the image classification model training apparatus provided by the embodiment of the present invention purely in hardware, the apparatus may be implemented directly by the processor 201 in the form of a hardware decoding processor, for example, by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field-Programmable Gate Arrays (FPGAs), or other electronic components.
The memory 202 in embodiments of the present invention is used to store various types of data to support the operation of the image classification model training apparatus. Examples of such data include: any executable instructions for operation on the image classification model training apparatus; a program implementing the image classification model training method of an embodiment of the present invention may be contained in the executable instructions.
In other embodiments, the image classification model training apparatus provided by the embodiments of the present invention may be implemented in software. FIG. 2 shows the image classification model training apparatus stored in the memory 202, which may be software in the form of a program, a plug-in, or the like, and which comprises a series of modules. As an example of the program stored in the memory 202, the image classification model training apparatus includes the following software modules: an information transmission module 2081 and an information processing module 2082. When the software modules in the image classification model training apparatus are read into RAM by the processor 201 and executed, the image classification model training method provided by the embodiment of the invention is implemented, where the functions of each software module in the image classification model training apparatus include:
The information transmission module 2081 is configured to obtain a first original sample image without labels and a second original sample image carrying label information.
The information processing module 2082 is configured to combine the first original sample image and the second original sample image to form a training sample set of the image classification model.
The information processing module 2082 is configured to determine a multi-task loss function of the image classification model according to the use environment of the image classification model.
The information processing module 2082 is configured to train the image classification model through the training sample set and the multi-task loss function, and determine model parameters of the image classification model, so as to classify the image to be classified through the image classification model.
After the training of the image classification model is completed, the image classification apparatus can then be deployed in the electronic device, which specifically includes:
the data transmission module is used for acquiring images to be classified and disease type information, wherein the disease type information comprises: a disease type identifier, and a confirmed-diagnosis bounding-box state corresponding to the disease type identifier; the data processing module is used for extracting features of the image to be classified through a feature extraction network in the image classification model to obtain image features of the image to be classified; the data processing module is used for classifying the image features of the image to be classified through an image classification network in the image classification model to obtain a classification result of the image to be classified; and the data processing module is used for issuing confirmed-diagnosis alarm information when the classification result of the image to be classified matches the confirmed-diagnosis bounding-box state.
The image classification model training method provided by the present application is further described with reference to the usage scenario shown in FIG. 1A. First, consider how images to be classified are processed during disease diagnosis in the related art. Taking type A coronavirus infection as an example, the terminal 10-2 can perform self-detection through antigen test strips to determine whether a type A coronavirus infection is confirmed. Referring to FIG. 3 and FIG. 4, each an optional schematic diagram of an image to be classified in an embodiment of the present application, the results of type A coronavirus antigen detection can be divided into three categories, positive, negative, and invalid: positive if both the C line and the T line are marked, negative if only the C line is marked, and invalid if the C line is not marked. In practical application, however, it is difficult to obtain a good judgment directly from this rule alone. As shown in FIG. 3, because of product defects, judgment cannot be made merely by comparing the positions of the C and T character marks with the marker lines, since on some antigen reagents the C and T characters are not distinct, which hinders automatic recognition. In another case, shown in FIG. 4, users unskilled at information capture cannot acquire a complete image to be classified, which affects recognition accuracy. When the close contact corresponding to the image to be classified shown in FIG. 4 passes the isolation period, the conventional isolation prevention and control measures for close contacts fail to a certain extent, and there is a risk that the disease spreads again. Manual collection cannot keep pace with the transmission speed of epidemic diseases and increases the workload of repeated manual statistics.
To overcome the above drawbacks, refer to FIG. 5, an optional flowchart of the image classification model training method according to an embodiment of the present invention, which can be applied to usage scenarios in which different disease type information is selected for prediction. It is to be understood that the steps shown in FIG. 5 may be performed by various electronic devices running the image classification model training apparatus, for example, a dedicated terminal, a server, or a server cluster with image classification processing functions. The steps shown in FIG. 5 are described below.
Step 501: the image classification model training device acquires a first original sample image without labels and a second original sample image carrying label information.
In some embodiments of the present invention, taking type A coronavirus infection as an example, the terminal 10-2 can perform self-detection through antigen test strips to determine whether a type A coronavirus infection is confirmed, and the resulting image to be classified is shown in FIG. 1B. In the training stage of the image classification model, the sample-addition window, the detection window, and the marker-line windows where the C line and the T line are located would each need to be annotated in the original sample image, which increases labeling cost and reduces the amount of usable data. To increase the amount of sample image data and improve the training precision of the classification model, a weakly supervised loss based on whole-image annotation is introduced. Original sample images using the weakly supervised loss can be divided into two types: type A coronavirus antigen detection images (both positive and negative are usable), serving as the second original sample images, and irrelevant or invalid images, serving as the first original sample images.
In some embodiments of the present application, at the initial stage of acquiring training samples, users' image acquisition devices differ and their acquisition skills vary, so the sharpness of the second original sample image may be affected. To ensure the sharpness of the second original sample image, the following processing may be performed:
according to the use environment of the image classification model, carrying out local augmentation processing on the second original sample image to obtain a local augmentation image so as to realize the execution of spatial filtering on local neighborhood information of the second original sample image; and performing global augmentation processing on the second original sample image to obtain a global augmented image so as to realize adjustment of the definition of the second original sample image. The second original sample image is marked as X, and two types of transformation are performed, namely: global augmentation processing Global and local augmentation processing L ocoal. Wherein the global augmentation process may derive a transformation function mapping the input color to the output color. The local augmentation process may perform spatial filtering to determine pixel colors based on the local neighborhood information. In some embodiments of the present application, in order to enhance the sample processing effect, when performing image processing, local augmentation processing is enhanced in the X1 direction, only Local transformation is performed, and Global and Local transformation is performed on the X2 side of the Global augmentation processing result, so that the image classification model can better learn the relationship between Global augmentation processing and Local augmentation processing. It should be noted that examples of the data augmentation operation include, but are not limited to, size scaling, color dithering, gaussian filtering, etc., and the present application is not particularly limited thereto.
Step 502: the image classification model training device combines the first original sample image and the second original sample image to form a training sample set of the image classification model.
In some embodiments of the invention, a dynamic noise threshold matching the use environment of the image classification model is determined, and the initial training sample set is denoised according to the dynamic noise threshold to form an initial training sample set matching that threshold. Because the use environments of image classification models differ, the matching dynamic noise thresholds differ as well. For example, taking type A coronavirus infection as an example, if the proportions of users' positive, negative, and invalid sample images differ too greatly, the training speed of the image classification model is affected; therefore, the proportion of the training sample set occupied by positive sample images (namely, the second original sample images) can be increased, or, in an AIDS detection environment, the proportion of the training sample set occupied by negative sample images (namely, the first original sample images) can be increased according to the dynamic noise value, so as to improve the training accuracy of the image classification model.
In some embodiments of the present invention, a fixed noise threshold corresponding to the image classification model may also be determined, and the initial training sample set may be denoised according to the fixed noise threshold to form an initial training sample set matching it. When the image classification model is fixed in a corresponding hardware mechanism, such as a handheld detection terminal whose use environment is the detection of type A coronavirus infection, the noise is homogeneous, so a fixed noise threshold corresponding to the image classification model can effectively improve the training speed of the image classification model, reduce the user's waiting time, and facilitate large-scale use of the image classification model for type A coronavirus infection.
Step 503: and the image classification model training device determines a multi-task loss function of the image classification model according to the use environment of the image classification model.
In some embodiments of the present invention, determining the multi-task loss function of the image classification model according to the use environment of the image classification model may be achieved in the following way:
determining the disease type corresponding to the image classification model according to the use environment of the image classification model; determining, according to the disease type, the bounding box type corresponding to the image classification model and the classification loss function corresponding to the bounding box type; obtaining a confidence loss function of the bounding box; and calculating the multi-task loss function of the image classification model according to the classification loss function and the confidence loss function. Taking type A coronavirus infection as an example, the terminal 10-2 can perform self-detection through antigen test strips to determine whether a type A coronavirus infection is confirmed, and the generated image to be classified is shown in FIG. 1B. In this case, the image classification model may include a YOLO (You Only Look Once) network. Referring to FIG. 6A, a schematic diagram of the working principle of the YOLO network in an embodiment of the present invention, the YOLO network included in the image classification model can identify three different regions in the image to be detected of the antigen test strip, namely the sample-addition window, the detection window, and the marker-line window (including the T line and the C line). Classification of the image to be detected can then be achieved by judging whether these regions have the expected relative positional relationships in the image: if clearly visible red bands appear on both the T detection line and the C quality control line, the result is judged positive, and the darker the T line, the stronger the positive; if only the C quality control line shows a clearly visible red band, the result is judged negative; and if no red band appears on the C quality control line, the result is judged invalid.
In operation of the YOLO network, the image classification model first divides an input sample image into S×S grid cells; each cell is responsible for detecting targets whose center point falls within that cell, and for each cell the network predicts the results of B boxes. The prediction output comprises the confidence that each box contains a target, the coordinates of each box, and the probability of each box over multiple categories, so for each image the network outputs S×S×(5×B+C) predicted values, where C represents the number of category probability values. For each grid cell, 5×B can be decomposed into two parts, 4×B+1×B, where 4×B represents the number of coordinates of the upper-left and lower-right corners of the detection boxes and 1×B represents the confidences of the detection boxes. Finally, the sample-addition window, the detection window, and the marker-line window (including the T line and the C line) in the image to be detected can be identified through the YOLO network.
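The output-size bookkeeping above can be checked with a few lines of arithmetic; S=19 and B=3 are illustrative assumptions here, while C=3 follows from the three detection windows discussed in the next paragraph.

```python
# Per-cell prediction size and total output size of the YOLO head described above.
S, B, C = 19, 3, 3            # S and B are assumed values; C = 3 detection windows
per_cell = 5 * B + C          # 4*B corner coordinates + 1*B confidences + C class probabilities
total = S * S * per_cell      # S x S x (5*B + C) predicted values per image
print(per_cell, total)        # 18 predictions per cell, 6498 per image under these assumptions
```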
Since the test strip image of type A coronavirus has three detection boxes, C=3 here. As shown in FIG. 6A, the first term takes the value 1 if a target falls within the grid cell and 0 otherwise. Each bounding box has 5 parameters to predict: x, y, h, w, and confidence. (x, y) represents the center point coordinates of the box, relative to the boundaries of the grid cell; (h, w) represents the height and width of the box, relative to the whole picture. Boxes whose confidence is below a threshold are then removed, and finally redundant boxes are removed through Non-Maximum Suppression to obtain the final prediction result. Non-maximum suppression is commonly used in computer vision tasks to suppress detection results that are not maxima; in target detection tasks it mainly removes redundant overlapping detection boxes. The detection results are sorted in descending order of probability, specifically as follows: (a) assume that the boxes predicted as a detection box category in the image classification detection task are arranged in descending order as A, B, C, D, E; (b) select the box A with the highest probability, mark it as an accepted box, and evaluate the IoU values of the remaining boxes B, C, D, E with A (the overlapping part of the two regions divided by their combined part); (c) the IoU threshold is typically set to 0.2-0.5; a box above this threshold is redundant and must be discarded; suppose B exceeds the threshold and is discarded, while C, D, E do not exceed it; (d) from the remaining detection boxes, continue by selecting the box C with the highest probability, marking it as an accepted box, calculating the IoU values of boxes D and E with C, and discarding any whose IoU value is greater than the threshold; (e) repeat the above process iteratively until all detection boxes of each category have been processed. Through this non-maximum suppression processing, real-time detection of each detection box in the classified image can be realized.
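A minimal sketch of the procedure (a)-(e), in plain Python; the box format (corner coordinates) and the 0.2-0.5 threshold range follow the text, while everything else is an illustrative assumption.

```python
def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2); returns overlap area divided by union area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    """Sort by descending confidence, accept the best box, discard remaining
    boxes whose IoU with an accepted box exceeds the threshold, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept  # indices of the accepted boxes
```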
Step 504: the image classification model training device trains the image classification model through the training sample set and the task loss function, and determines model parameters of the image classification model so as to classify the image to be classified through the image classification model.
In some embodiments of the present invention, training the image classification model through the training sample set and the multi-task loss function and determining the model parameters of the image classification model may be implemented as follows:
adjusting the network parameters of the image classification model based on the feature vectors of the training sample set and the multi-task loss function, and determining the network parameters of the image classification model once the loss functions of the different dimensions corresponding to the image classification model reach their respective convergence conditions, so that the parameters of the image classification model are adapted to the use environment. To better illustrate the construction of the multi-task loss function and the training of the image classification model, the training process is described below, taking as an example an image classification model that adopts the yolov5 structure.
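One training step under such a multi-task loss might look like the following sketch, assuming PyTorch; the two-headed model, the loss weights, and all names are illustrative stand-ins, since concrete values are not given here.

```python
import torch

def train_step(model, optimizer, images, cls_targets, conf_targets,
               cls_loss_fn, conf_loss_fn, w_cls=1.0, w_conf=1.0):
    """One optimization step: the multi-task loss is the weighted sum of the
    bounding-box classification loss and the bounding-box confidence loss."""
    optimizer.zero_grad()
    cls_pred, conf_pred = model(images)          # assumed two-headed model
    loss = (w_cls * cls_loss_fn(cls_pred, cls_targets)
            + w_conf * conf_loss_fn(conf_pred, conf_targets))
    loss.backward()                              # back-propagate through both task heads
    optimizer.step()
    return loss.item()                           # iterate until each loss term converges
```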
Referring to FIG. 6B, a schematic diagram of a model structure of an image classification model according to an embodiment of the present invention, the yolov5 structure is mainly divided into four parts: the input end, the feature extraction network (Backbone), the Neck structure, and the Prediction structure, where the Neck structure and the Prediction structure can form the image classification network in the image classification model. First, operations such as Mosaic data enhancement, adaptive anchor box calculation, and adaptive picture scaling are performed at the input end. The input end adopts a data enhancement mode in which the original pictures are spliced by random scaling, cropping, arrangement, and the like. This enriches the backgrounds of the pictures and effectively increases the batch size (the number of samples selected for one training step), thereby reducing the dependence of training on the batch size itself.
Next, the Focus slicing operation and the CSPDarknet53 structure are added to the Backbone section. Darknet53 is the backbone network of yolov3 and is modified from ResNet. Darknet53 not only keeps the expressive capability of ResNet for features, but also avoids the gradient vanishing problem caused by the excessive depth of the ResNet network. The accuracy of Darknet53 on ImageNet is comparable to that of ResNet-152, while its speed is far higher than that of ResNet-152.
The CSP structure divides the input into two parts: one part is computed through a residual block, the other passes directly through a shortcut, and finally the two are spliced by a Concat operation, as shown in FIG. 3. With the shortcut operation, the CSP structure reduces part of the computation and greatly enriches the gradient combinations, and the reduced computation also means faster inference. Because repeated gradient information is avoided and the diversity of gradient information is increased through the Concat operation, the learning capacity of the CNN (convolutional neural network) is improved, and inference accuracy improves as well. A minimal sketch of such a block is given below.
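The split/residual/shortcut/Concat flow can be sketched as a simplified PyTorch module; channel counts, activation choice and layer names are illustrative, not the exact yolov5 implementation:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Residual bottleneck: 1x1 then 3x3 convolution with a skip connection."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 1, bias=False)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(ch), nn.BatchNorm2d(ch)
        self.act = nn.SiLU()

    def forward(self, x):
        y = self.act(self.bn1(self.conv1(x)))
        y = self.act(self.bn2(self.conv2(y)))
        return x + y                                   # residual addition

class CSPBlock(nn.Module):
    """CSP: split the channels, send one half through residual blocks,
    pass the other half through a shortcut, then splice with Concat."""
    def __init__(self, ch, n=1):
        super().__init__()
        half = ch // 2
        self.split_main = nn.Conv2d(ch, half, 1, bias=False)   # residual branch
        self.split_short = nn.Conv2d(ch, half, 1, bias=False)  # shortcut branch
        self.blocks = nn.Sequential(*[Bottleneck(half) for _ in range(n)])
        self.fuse = nn.Conv2d(ch, ch, 1, bias=False)           # after Concat

    def forward(self, x):
        main = self.blocks(self.split_main(x))
        short = self.split_short(x)
        return self.fuse(torch.cat([main, short], dim=1))      # Concat splicing
```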
The CSPDarknet53 structure combines the CSPNet network with Darknet53, and has the advantages of a small amount of computation and high accuracy. Referring to fig. 7, fig. 7 is a schematic diagram of the Focus operation in the image classification model according to the embodiment of the present invention. As shown in fig. 7, before entering the Backbone, a 608×608×3 RGB three-channel color image can be converted into a 304×304×12 feature, so that the input size is reduced while the number of channels is quadrupled compared with before, reducing floating-point operations and increasing speed. The feature map is then turned into a 304×304×64 feature map through a Convolution (CBL) operation; the Focus layer converts the information on the w-h plane into the channel dimension, and different features are extracted through 3×3 convolutions. In this way, the information loss caused by downsampling can be reduced, which improves the training accuracy of the image classification model and ensures accurate identification of the type A coronavirus detection test paper image. The slicing step can be expressed as follows.
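The slicing itself is a lossless rearrangement, as this PyTorch sketch shows (an N×C×H×W tensor layout is assumed):

```python
import torch

def focus_slice(x):
    """Focus slicing: take every second pixel at four phase offsets and stack
    them on the channel axis, e.g. (N, 3, 608, 608) -> (N, 12, 304, 304).
    Information on the w-h plane moves into the channel dimension losslessly."""
    return torch.cat([x[..., ::2, ::2],     # top-left pixels
                      x[..., 1::2, ::2],    # bottom-left
                      x[..., ::2, 1::2],    # top-right
                      x[..., 1::2, 1::2]],  # bottom-right
                     dim=1)

x = torch.randn(1, 3, 608, 608)
print(focus_slice(x).shape)  # torch.Size([1, 12, 304, 304])
```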
The role of the Neck part is to fuse and extract features within the network, and it is a key part of the network. The Neck part adopts an FPN+PAN structure, i.e., the result of fusing an FPN (Feature Pyramid Network) with a PANet (Path Aggregation Network). As shown in FIG. 5, the FPN works top-down, merging top-layer features with bottom-layer features through upsampling and Concat operations. The Bottom-up Path Augmentation part, in contrast to the FPN, adopts a bottom-up feature pyramid structure. In this way, parameters between the bottom layer and the top layer are fused better, and the detection effect is improved. The path indicated by the left arrow runs from bottom to top through hundreds of network layers, so shallow-layer information is severely lost; the path indicated by the right arrow passes through far fewer layers, so shallow information is better preserved, information loss is reduced, and the network's ability to extract and fuse features is enhanced. A structural sketch follows.
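A compressed sketch of the two passes, assuming three backbone levels already projected to a common channel width (all module names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FPNPANNeck(nn.Module):
    """Minimal FPN+PAN sketch: a top-down pass (upsample + Concat) followed
    by a bottom-up pass (strided-convolution downsample + Concat)."""
    def __init__(self, ch=256):
        super().__init__()
        self.reduce = nn.ModuleList(nn.Conv2d(2 * ch, ch, 1) for _ in range(4))
        self.down = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, stride=2, padding=1) for _ in range(2))

    def forward(self, c3, c4, c5):   # increasing depth, decreasing resolution
        # top-down (FPN): fuse deep semantics into shallower maps
        p4 = self.reduce[0](torch.cat([c4, F.interpolate(c5, scale_factor=2)], 1))
        p3 = self.reduce[1](torch.cat([c3, F.interpolate(p4, scale_factor=2)], 1))
        # bottom-up (PAN): push precise shallow localization back up
        n4 = self.reduce[2](torch.cat([p4, self.down[0](p3)], 1))
        n5 = self.reduce[3](torch.cat([c5, self.down[1](n4)], 1))
        return p3, n4, n5            # features handed to the Prediction heads
```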
In the Prediction part, the most important element is the definition of the loss function, which is the criterion on which subsequent back-propagation depends. The Prediction part calculates the bounding-box loss using the generalized intersection-over-union loss (GIoU Loss, Generalized Intersection over Union); the Prediction-part loss function further includes the classification loss (classification loss) and the confidence loss (confidence loss) of the bounding box. Referring to fig. 1B and taking a type A coronavirus infection as an example, A may be used to represent the predicted frame, B the real frame, and C the minimum closed frame containing A and B; the calculation of the intersection over union (IoU, Intersection over Union) refers to formula 1:
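Formula 1 is not reproduced in the text; with A, B and C as defined above, the standard intersection over union and the generalized form on which the bounding-box loss is presumably based are:

\[
\mathrm{IoU}=\frac{|A\cap B|}{|A\cup B|},\qquad
\mathcal{L}_{\mathrm{GIoU}}=1-\mathrm{IoU}+\frac{|C\setminus(A\cup B)|}{|C|}
\]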
The remaining loss consists of two parts: the first is the confidence loss of the bounding box, representing whether this box is a correct target box; the second is the classification loss, representing which class of targets this bounding box should belong to. Both use the cross-entropy loss function; the calculation refers to equation 2:
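Equation 2 is likewise not reproduced in the text; consistent with the symbol definitions in the next paragraph, the standard cross-entropy form is:

\[
\mathcal{L}=-\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{M}y_{i,c}\,\log p_{i,c}
\]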
where N represents the total number of bounding boxes and M represents the number of categories; in the present invention, M=2 for the confidence loss and M=3 for the category loss. y_{i,c} represents the true class of the sample: y_{i,c}=1 when sample i belongs to class c, and 0 otherwise. p_{i,c} represents the probability, predicted by the network, that sample i belongs to class c.
After the image classification model produces the final S×S×(5×B+C) predicted values, the C channel results, that is, an S×S×C output, are obtained; global maximum pooling (max pool) is then performed on this output to obtain C values, i.e., the maximum probability over all possible detection frames in the image is used to represent the probability that a sample-adding window, a detection window and an identification line (C or T line) exist in the image. For a type A coronavirus antigen detection image, whether positive or negative, these boxes should all be present; for irrelevant images, these boxes should not exist. At this point the image classification model is trained and can be deployed on a corresponding server or cloud server network. A pooling sketch follows.
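The pooling step reduces the grid to one probability per window class, as in this sketch (the grid size and class count below are illustrative):

```python
import torch

def window_probabilities(grid):
    """Collapse an S x S x C grid of per-cell class probabilities to C values:
    the maximum over all cells represents 'this window appears somewhere'."""
    s1, s2, c = grid.shape
    return grid.reshape(-1, c).max(dim=0).values   # global max pool over S*S cells

grid = torch.rand(19, 19, 3)       # C=3 window classes, as in the test strip
print(window_probabilities(grid))  # three per-class existence probabilities
```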
Referring to fig. 8, fig. 8 is a schematic diagram of an applet collecting an image to be classified in the embodiment of the present application; a user of an instant messaging client may collect and upload the image to be classified through the applet of the instant messaging client. When a user collects images to be classified through the applet shown in fig. 8, the user may be shown the detection selection interface of fig. 9 (fig. 9 is a schematic diagram of detection selection in the embodiment of the present application), in which three infectious diseases are taken as examples: a C-type virus, an A-type coronavirus and a B-type virus. All three can be spread by contact, and all three are acute infectious diseases. The C-type virus and the A-type coronavirus additionally belong to respiratory infectious diseases: when the disease type information is determined to be a respiratory infectious disease, a knowledge graph is used to determine that the first contact mode corresponding to respiratory infectious diseases is air contact, the first transmission media are spray, dust and aerosol, and the first susceptible people are those in contact with spray, dust and aerosol. When the disease type information is determined to be a digestive tract infectious disease, the knowledge graph is used to determine that the second contact mode corresponding to digestive tract infectious diseases is fecal transmission, the second transmission media are the surrounding environment, food and water sources polluted by feces, and the second susceptible people are those in contact with the environment, food and water sources polluted by feces. The user can flexibly select the type of disease to be detected according to the use requirement.
Referring to fig. 10, fig. 10 is a schematic diagram of an optional process of the image classification method according to the embodiment of the present invention, which specifically includes the following steps:
step 1001: obtaining an image to be classified and disease type information, wherein the disease type information comprises: a disease type identifier, and a definitive image bounding box state corresponding to the disease type identifier.
Step 1002: and extracting the features of the image to be classified through the feature extraction network in the image classification model to obtain the image features of the image to be classified.
Step 1003: and classifying the image features of the image to be classified through the image classification network in the image classification model to obtain the classification result of the image to be classified.
The classification result of the image to be classified is shown in fig. 11. The integrity of the detection areas needs to be determined first: if any one of the three types of boxes is missing, the user is prompted that the uploaded result is invalid. If all three types of boxes are present and there are two identification lines, the image is classified as a positive case. If only one identification line is detected, then, since the C and T characters are not obvious as described above, whether the identification line is C or T is determined from its position relative to the sample-adding window. Because users may upload images in any orientation, it cannot be assumed that the C line lies above the T line; however, according to the design of existing type A coronavirus antigen detection reagents, the sample-adding window necessarily lies below the detection window, so an identification line in the half of the detection window closer to the sample-adding window is considered the T line, and otherwise the C line. The output results are determined accordingly; a sketch of this decision rule follows.
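In the sketch below, the label names, the single-box assumption for each window and the centre-distance heuristic for "the half of the detection window closer to the sample-adding window" are illustrative assumptions:

```python
def interpret(boxes):
    """`boxes` maps 'sample_window' / 'detection_window' / 'line' to lists of
    detected (cx, cy) box centres, after non-maximum suppression."""
    if not boxes.get("sample_window") or not boxes.get("detection_window") \
            or not boxes.get("line"):
        return "invalid: please upload again"   # one of the three types is missing
    lines = boxes["line"]
    if len(lines) >= 2:
        return "positive"                       # both C and T lines are present

    def dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    sample = boxes["sample_window"][0]
    window = boxes["detection_window"][0]       # centre of the detection window
    # a line in the half of the detection window nearer the sample-adding
    # window is the T line; otherwise it is the C line
    is_t = dist(lines[0], sample) < dist(window, sample)
    return "T line only" if is_t else "C line only (negative)"
```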
Step 1004: and when the classification result of the image to be classified is the same as the definitive image bounding box state, sending out definitive-diagnosis alarm information.
Taking the type A coronavirus as an example, when the classification result of the image to be classified is confirmed to be a positive case, the user corresponding to the image to be classified may exhibit symptoms such as reduced muscle strength, sensory symptoms, aphasia, blurred vision, dizziness, headache, nausea, vomiting, cognitive dysfunction and consciousness dysfunction, indicating that the user may have been infected with the type A coronavirus, so a definitive-diagnosis alarm message needs to be sent out in time. The image to be classified is collected through the applet of the instant messaging client and sent to the cloud server; when the classification result of the image to be classified is the same as the definitive image bounding box state, the registered user information of the instant messaging client is sent to the cloud server, so that the target object corresponding to the image to be classified is tracked through the cloud server and the user corresponding to the image to be classified can be isolated and treated.
The beneficial technical effects are as follows:
the method comprises: acquiring a first original sample image without labels and a second original sample image carrying label information; combining the first original sample image and the second original sample image to form a training sample set of the image classification model; determining a multi-task loss function of the image classification model according to the use environment of the image classification model; and training the image classification model through the training sample set and the multi-task loss function and determining the model parameters of the image classification model, so as to classify the image to be classified through the image classification model. In this way, training of the image classification model is realized in a weakly supervised manner, reducing the total amount of training data and requiring no repeated data annotation, which steadily improves the training accuracy of the image classification model, shortens the training time and reduces overfitting of the image classification model; meanwhile, the multi-task loss function of the image classification model can flexibly adapt to the detection requirements of different diseases, enabling large-scale application of the image classification model. In addition, images carrying infection risk can be accurately judged from the classification result, infected persons can be found timely and accurately, and the time needed for epidemiological investigation of infectious diseases is saved.
The foregoing description of the embodiments of the invention is not intended to limit the scope of the invention, but is intended to cover any modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (12)

1. A method of training an image classification model, the method comprising:
acquiring a first original sample image without labels and a second original sample image carrying label information;
combining the first original sample image and the second original sample image to form a training sample set of an image classification model;
determining a multi-task loss function of the image classification model according to the use environment of the image classification model;
and training the image classification model through the training sample set and the multi-task loss function, and determining model parameters of the image classification model so as to classify the image to be classified through the image classification model.
2. The method of claim 1, wherein determining a multi-tasking loss function of the image classification model based on a use environment of the image classification model comprises:
Determining the disease type corresponding to the image classification model according to the use environment of the image classification model;
determining a boundary box type corresponding to the image classification model and a classification loss function corresponding to the boundary box type according to the disease type;
obtaining a confidence loss function of the bounding box;
and calculating a multi-task loss function of the image classification model according to the classification loss function and the confidence loss function.
3. The method according to claim 1, wherein the method further comprises:
according to the use environment of the image classification model, carrying out local augmentation processing on the second original sample image to obtain a local augmentation image, so as to perform spatial filtering on the local neighborhood information of the second original sample image;
and performing global augmentation processing on the second original sample image to obtain a global augmented image so as to realize adjustment of the definition of the second original sample image.
4. The method according to claim 1, wherein the method further comprises:
determining a dynamic noise threshold value matched with the use environment of the image classification model;
Denoising the training sample set according to the dynamic noise threshold value to form a training sample set matched with the dynamic noise threshold value; or,
and determining a fixed noise threshold corresponding to the image classification model, and denoising the training sample set according to the fixed noise threshold to form a training sample set matched with the fixed noise threshold.
5. The method of claim 1, wherein training the image classification model by the training sample set and the multi-task loss function and determining model parameters of the image classification model comprises:
adjusting network parameters of the image classification model based on the feature vectors of the training sample set and the multi-task loss function;
and determining network parameters of the image classification model until the loss functions of different dimensions corresponding to the image classification model reach corresponding convergence conditions, so as to adapt the parameters of the image classification model to the use environment.
6. A method of classifying images, the method comprising:
obtaining an image to be classified and disease type information, wherein the disease type information comprises:
A disease type identifier, and a definitive image bounding box state corresponding to the disease type identifier;
extracting features of the image to be classified through a feature extraction network in the image classification model to obtain image features of the image to be classified;
classifying the image features of the image to be classified through an image classification network in the image classification model to obtain a classification result of the image to be classified;
wherein the image classification model is trained by the method of any one of claims 1 to 5;
and when the classification result of the image to be classified is the same as the definitive image bounding box state, sending out definitive-diagnosis alarm information.
7. The method of claim 6, wherein the method further comprises:
collecting an image to be classified through an applet of an instant messaging client, and sending the image to be classified to a cloud server;
and when the classification result of the image to be classified is the same as the definitive image bounding box state, sending the registered user information of the instant messaging client to the cloud server so as to track the target object corresponding to the image to be classified through the cloud server.
8. An image classification model training apparatus, the apparatus comprising:
the information transmission module is used for acquiring a first original sample image without labels and a second original sample image carrying label information;
the information processing module is used for combining the first original sample image and the second original sample image to form a training sample set of an image classification model;
the information processing module is used for determining a multi-task loss function of the image classification model according to the use environment of the image classification model;
the information processing module is used for training the image classification model through the training sample set and the multi-task loss function and determining model parameters of the image classification model so as to classify the image to be classified through the image classification model.
9. An image classification apparatus, the apparatus comprising:
the data transmission module is used for acquiring images to be classified and disease type information, wherein the disease type information comprises:
a disease type identifier, and a definitive image bounding box state corresponding to the disease type identifier;
the data processing module is used for extracting the features of the image to be classified through a feature extraction network in the image classification model to obtain the image features of the image to be classified;
the data processing module is used for classifying the image features of the image to be classified through an image classification network in the image classification model to obtain a classification result of the image to be classified;
wherein the image classification model is trained by the method of any one of claims 1 to 5;
and the data processing module is used for sending out definitive-diagnosis alarm information when the classification result of the image to be classified is the same as the definitive image bounding box state.
10. An electronic device, the electronic device comprising:
a memory for storing executable instructions;
a processor for implementing the image classification model training method of any one of claims 1 to 5 or the image classification method of any one of claims 6 to 7 when executing the executable instructions stored in the memory.
11. A computer program product comprising a computer program or instructions which, when executed by a processor, implements the image classification model training method of any one of claims 1 to 5, or implements the image classification method of any one of claims 6 to 7.
12. A computer readable storage medium storing executable instructions which when executed by a processor implement the image classification model training method of any one of claims 1 to 5 or the image classification method of any one of claims 6 to 7.
CN202210673448.4A 2022-06-14 2022-06-14 Image classification model training method, image classification method and device Pending CN117011628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210673448.4A CN117011628A (en) 2022-06-14 2022-06-14 Image classification model training method, image classification method and device

Publications (1)

Publication Number Publication Date
CN117011628A true CN117011628A (en) 2023-11-07

Family

ID=88573324



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination