CN110689025A - Image recognition method, device and system, and endoscope image recognition method and device - Google Patents

Image recognition method, device and system, and endoscope image recognition method and device

Info

Publication number
CN110689025A
Authority
CN
China
Prior art keywords
image
image recognition
output
layer
recognition model
Prior art date
Legal status
Granted
Application number
CN201910872399.5A
Other languages
Chinese (zh)
Other versions
CN110689025B (en)
Inventor
王晓宁
付星辉
尚鸿
孙钟前
Current Assignee
Tencent Healthcare Shenzhen Co Ltd
Original Assignee
Tencent Healthcare Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Healthcare Shenzhen Co Ltd
Priority to CN201910872399.5A
Publication of CN110689025A
Application granted
Publication of CN110689025B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems


Abstract

The disclosure provides an image recognition method, device and system and an endoscope image recognition method and device, relating to the field of artificial intelligence. The method comprises the following steps: acquiring an original image, and inputting the original image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks; extracting features of a target object in the original image through the network main body structure to obtain image features corresponding to the target object; and classifying, through each output layer, the sub-image features in the image features that correspond to its task, so as to output a classification result and characterization information corresponding to the target object. With the characterization information and their own experience, users can judge the credibility of the classification result obtained by image recognition, which improves image recognition efficiency and the accuracy of recognition results and further reduces labor cost.

Description

Image recognition method, device and system, and endoscope image recognition method and device
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular, to an image recognition method, an endoscope image recognition method, an image recognition apparatus, an endoscope image recognition apparatus, and an image recognition system.
Background
With advances in computer technology and algorithms, artificial intelligence has, after a turbulent development history, become a strategic development direction for countries around the world. Medical care, as one of the application scenarios of artificial intelligence with the greatest social and commercial value, has attracted wide public attention in recent years. With artificial intelligence, machines can be taught to perceive and understand the world and thereby assist doctors in diagnosing diseases. According to statistics from relevant departments, more than 90% of medical data currently comes from medical images, and medical image data has become one of the indispensable pieces of "evidence" for doctors' diagnoses. How to use the massive amount of medical image data to assist doctors in disease diagnosis and improve their diagnostic efficiency is a key focus of researchers.
At present, after a medical image is recognized, a deep learning model can generally give a corresponding prediction result, such as whether a certain disease is present, but the reliability of that prediction is unknown. Relying blindly on the output of a machine learning model can interfere with a doctor's auxiliary diagnosis, so when diagnosing a disease the doctor still needs to judge the disease type according to other apparent characteristics.
In view of this, there is a need in the art to develop a new image recognition method.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the disclosure provide an image recognition method, an image recognition device, an image recognition system, an endoscope image recognition method and an endoscope image recognition device, which can, at least to some extent, output corresponding characterization information together with the recognition result, so that a user can judge the reliability of the recognition result according to the characterization information, improving image recognition efficiency and the accuracy of recognition results while reducing the cost of manual recognition.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of an embodiment of the present disclosure, there is provided an image recognition method including: acquiring an original image, and inputting the original image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks; extracting features of a target object in the original image through the network main body structure to obtain image features corresponding to the target object; classifying the sub-image features corresponding to the tasks in the image features through the output layers to output classification results and characterization information corresponding to the target object.
According to an aspect of an embodiment of the present disclosure, there is provided an image recognition apparatus including: the image acquisition module is used for acquiring an original image and inputting the original image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks; the feature extraction module is used for extracting features of a target object in the original image through the network main body structure so as to obtain image features corresponding to the target object; and the classification output module is used for classifying the sub-image features corresponding to the tasks in the image features through the output layers so as to output the classification result and the characterization information corresponding to the target object.
In some embodiments of the present disclosure, the network body structure comprises a starting convolutional layer, a plurality of dense convolutional neural network modules, a transition layer for connecting adjacent dense convolutional neural network modules, and a terminating pooling layer; based on the foregoing, the feature extraction module is configured to: performing feature extraction on the target object through the initial convolutional layer to obtain first feature information; performing feature extraction on the first feature information through the dense convolutional neural network module and the transition layer which are sequentially connected to obtain second feature information, wherein output information of the dense convolutional neural network module comprises image features extracted by each feature extraction layer in the dense convolutional neural network module, and the transition layer is used for performing downsampling on the output information of the dense convolutional neural network module; and performing global average pooling on the second feature information through the termination pooling layer to obtain image features corresponding to the target object.
In some embodiments of the present disclosure, the output layer comprises a fully connected layer and a normalization layer; based on the foregoing solution, the classification output module is configured to: determine a target output layer from the plurality of output layers, acquire a target task corresponding to the target output layer, and acquire a target sub-image feature corresponding to the target task from the image features according to the target task; fully connect the target sub-image features through the fully connected layer to obtain third feature information; normalize the sub-feature information in the third feature information through the normalization layer to acquire a probability value corresponding to the sub-feature information; and determine output information corresponding to the target task according to the probability value, and take the output information as the classification result or the characterization information.
In some embodiments of the present disclosure, based on the foregoing solution, the image recognition apparatus further includes: a first training sample acquisition module, configured to acquire a training data set, where the training data set includes an image sample and a plurality of label samples corresponding to the image sample, where each label sample corresponds to each task; and the first model training module is used for training the image recognition model to be trained according to the image sample and the label sample so as to obtain the image recognition model.
In some embodiments of the present disclosure, based on the foregoing, the first model training module is configured to: determining a target label sample from the label samples according to a target task; inputting the image sample into the image recognition model to be trained, and performing feature extraction on a target object in the image sample through the image recognition model to be trained so as to enable an output layer corresponding to the target task to output prediction information; and determining a loss value according to the prediction information, the target label sample and a loss function, and optimizing parameters of the to-be-trained image recognition model to minimize the loss value so as to complete training of the to-be-trained image recognition model.
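For illustration, this training step may be sketched as follows. The sketch assumes a PyTorch-style model that returns one output per task and uses cross-entropy as the loss function; the disclosure names neither a framework nor a specific loss, and all identifiers here are hypothetical:

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()  # assumed loss function; the disclosure does not specify one

def train_step(model, optimizer, image_sample, label_samples, target_task):
    """One optimization step for the output layer corresponding to target_task.

    `model` is assumed to map an image to a dict of per-task logits;
    `label_samples` maps each task name to its label sample tensor.
    """
    target_label = label_samples[target_task]      # target label sample for the target task
    prediction = model(image_sample)[target_task]  # prediction information from the matching output layer
    loss = criterion(prediction, target_label)     # loss value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                               # optimize parameters to minimize the loss value
    return loss.item()
```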
In some embodiments of the present disclosure, based on the foregoing solution, the image recognition apparatus further includes: the first initialization module is used for acquiring model parameters of an image recognition model obtained based on natural image training and initializing the network main body structure by taking the model parameters as initial values; and the second initialization module is used for initializing the output layer in a random initialization mode.
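A sketch of this initialization scheme, assuming torchvision's ImageNet-pretrained DenseNet-121 stands in for the "image recognition model obtained based on natural image training" (an assumption; the task names and class counts are illustrative):

```python
import torch.nn as nn
from torchvision.models import densenet121, DenseNet121_Weights

# network main body structure initialized with parameters trained on natural images
pretrained = densenet121(weights=DenseNet121_Weights.IMAGENET1K_V1)
backbone = pretrained.features  # start conv, Dense Blocks, transition layers

# output layers initialized randomly, one per task (names and class counts illustrative)
heads = nn.ModuleDict({
    "disease_type": nn.Linear(1024, 2),  # DenseNet-121 yields 1024-dim features
    "lesion_color": nn.Linear(1024, 3),
})
for head in heads.values():
    nn.init.normal_(head.weight, std=0.01)  # random initialization of the output layer
    nn.init.zeros_(head.bias)
```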
According to an aspect of an embodiment of the present disclosure, there is provided an endoscopic image recognition method including: acquiring an original endoscope image, and inputting the original endoscope image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks; performing feature extraction on the lesion in the original endoscope image through the network main body structure to acquire image features corresponding to the lesion; and classifying sub-image features corresponding to the tasks in the image features through the output layers to output a diagnosis result and auxiliary diagnosis information corresponding to the lesion.
According to an aspect of an embodiment of the present disclosure, there is provided an endoscopic image recognition apparatus including: an endoscope image acquisition module, used for acquiring an original endoscope image and inputting the original endoscope image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks; an image feature extraction module, used for performing feature extraction on the lesion in the original endoscope image through the network main body structure so as to acquire image features corresponding to the lesion; and an image classification output module, used for classifying the sub-image features corresponding to the tasks in the image features through the output layers so as to output a diagnosis result and auxiliary diagnosis information corresponding to the lesion.
In some embodiments of the present disclosure, based on the foregoing solution, the different tasks include: a disease type classification task, a lesion color degree classification task, a lesion edge classification task, and a lesion depression degree classification task.
In some embodiments of the present disclosure, the network body structure comprises a starting convolutional layer, a plurality of dense convolutional neural network modules, a transition layer for connecting adjacent dense convolutional neural network modules, and a terminating pooling layer; based on the foregoing scheme, the image feature extraction module is configured to: perform feature extraction on the lesion through the starting convolutional layer to obtain first feature information; perform feature extraction on the first feature information through the sequentially connected dense convolutional neural network modules and transition layers to obtain second feature information, wherein the output information of a dense convolutional neural network module comprises the image features extracted by each feature extraction layer in that module, and the transition layer is used for downsampling the output information of the dense convolutional neural network module; and perform global average pooling on the second feature information through the terminating pooling layer to obtain the image features corresponding to the lesion.
In some embodiments of the present disclosure, based on the foregoing, the image classification output module is configured to: perform full connection and normalization processing on first sub-image features related to the disease type in the image features through the output layer corresponding to the disease type classification task to determine the diagnosis result; and, meanwhile, perform full connection and normalization processing on second sub-image features related to lesion color, lesion edge or lesion surface morphology in the image features through the output layers corresponding to the lesion color degree classification task, the lesion edge classification task and the lesion depression degree classification task to determine the auxiliary diagnosis information.
In some embodiments of the present disclosure, based on the foregoing, the endoscopic image recognition apparatus further includes: a second training sample acquisition module, configured to acquire an endoscope image training sample set, where the endoscope image training sample set includes endoscope image samples and a plurality of label samples corresponding to the endoscope image samples, where each label sample corresponds to each task; a target label determining module, configured to determine a target label sample from the label samples according to a target task; and a second model training module, configured to input the endoscope image sample into an image recognition model to be trained, and perform feature extraction on a lesion in the endoscope image sample through the image recognition model to be trained, so that the output layer corresponding to the target task outputs prediction information; and to determine a loss value according to the prediction information, the target label sample and a loss function, and optimize parameters of the to-be-trained image recognition model to minimize the loss value, so as to complete training of the to-be-trained image recognition model.
In some embodiments of the present disclosure, based on the foregoing, the endoscopic image recognition apparatus may be further configured to: alternately carry out the image recognition corresponding to each task on the endoscope image sample through the image recognition model to be trained.
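The alternation may be sketched as cycling through the tasks, one task per training step (continuing the hypothetical `train_step` helper sketched earlier; `model`, `optimizer` and `data_loader` are assumed to exist, and the task names are illustrative):

```python
from itertools import cycle

tasks = ["disease_type", "lesion_color", "lesion_edge", "lesion_depression"]

# alternately perform the image recognition training corresponding to each task
for (image_sample, label_samples), task in zip(data_loader, cycle(tasks)):
    train_step(model, optimizer, image_sample, label_samples, task)
```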
According to an aspect of an embodiment of the present disclosure, there is provided an image recognition system including: the shooting device is used for acquiring image signals to generate an original image containing a target object; an image recognition device connected to the photographing device for receiving the original image, and comprising one or more processors and a storage device, wherein the storage device is configured to store one or more programs, which when executed by the one or more processors, cause the one or more processors to perform the image recognition method or the endoscopic image recognition method as described in the above embodiments on the original image; and the display device is connected with the image recognition device and used for receiving the image recognition result output by the image recognition device and displaying the image recognition result on a display screen of the display device.
In the technical solutions provided by some embodiments of the present disclosure, feature extraction is performed on a target object in an original image through an image recognition model that includes a network main body structure and a plurality of output layers corresponding to different tasks. First, the network main body structure performs feature extraction on the target object in the original image to generate image features corresponding to the target object; then, the output layers corresponding to the different tasks classify the sub-image features corresponding to their tasks to output a classification result and characterization information corresponding to the target object. This technical scheme outputs the characterization information of the target object together with its classification result, helping the user determine the reliability of the classification result, improving image recognition efficiency and the accuracy of recognition results, and further reducing the cost of manual recognition and labeling.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which technical aspects of embodiments of the present disclosure may be applied;
FIG. 2 schematically shows a flow diagram of an image recognition method according to an embodiment of the present disclosure;
FIG. 3 schematically shows a structural schematic comparing multiple single-task learning models with a multi-task learning model, according to one embodiment of the present disclosure;
FIG. 4 schematically illustrates a structural schematic of a network body structure according to one embodiment of the present disclosure;
FIG. 5 schematically shows a flow diagram of image feature extraction by the network main body structure according to one embodiment of the present disclosure;
FIG. 6 schematically shows a flow diagram of output layer classification according to one embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow diagram of an endoscopic image recognition method according to one embodiment of the present disclosure;
FIG. 8 schematically illustrates a structural diagram of an image recognition model according to one embodiment of the present disclosure;
FIG. 9 schematically illustrates a training flow diagram of an image recognition model to be trained, according to one embodiment of the present disclosure;
FIG. 10 schematically illustrates a training flow diagram of an image recognition model to be trained, according to one embodiment of the present disclosure;
FIG. 11 schematically illustrates a block diagram of an image recognition apparatus according to one embodiment of the present disclosure;
FIG. 12 schematically illustrates a block diagram of an image recognition device according to one embodiment of the present disclosure;
FIG. 13 schematically illustrates a block diagram of an image recognition system according to one embodiment of the present disclosure;
fig. 14 shows a schematic structural diagram of a computer system suitable for implementing the image recognition apparatus of the embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired. For example, the server 103 may be a server cluster composed of a plurality of servers. The terminal apparatus 101 may be a photographing device with an imaging unit such as a video camera, a still camera, a smart phone, an endoscope, and the like, and an image containing a target object may be acquired by the terminal apparatus 101.
In an embodiment of the present disclosure, after the terminal device 101 acquires an original image containing a target object, the original image may be sent to the server 103 through the network 102. After acquiring the original image, the server 103 may perform image recognition on the target object in the original image to acquire a classification result and characterization information corresponding to the target object. Specifically, image recognition may be performed through an image recognition model loaded in the server 103, where the image recognition model includes a network main body structure and a plurality of output layers connected to the network main body structure and corresponding to different tasks. The network main body structure performs feature extraction on the target object in the original image to acquire image features corresponding to the target object, and then the output layers corresponding to the different tasks classify the sub-image features corresponding to their tasks in the image features, so as to output the classification result and characterization information corresponding to the target object. The classification result is attribute information of the target object, and the characterization information is an auxiliary basis for judging that the target object belongs to that classification. According to the technical scheme of the embodiment of the disclosure, the user can judge the credibility of the classification result obtained by image recognition according to the characterization information and experience, which improves image recognition efficiency and the accuracy of recognition results and further reduces labor cost.
The image recognition method and the endoscope image recognition method provided by the embodiments of the present disclosure are generally executed by a server, and accordingly, the image recognition apparatus and the endoscope image recognition apparatus are generally provided in the server. However, in other embodiments of the present disclosure, the image recognition method and the endoscope image recognition method provided by the embodiments of the present disclosure may be executed by a terminal device.
In the related art, taking the recognition of medical images as an example, an image recognition model to be trained can be trained on collected labeled endoscope images, and a classifier is obtained by fine-tuning parameters; an unlabeled medical image is then input into the trained image recognition model, which outputs a prediction result to assist a doctor in disease diagnosis. However, while recognizing the medical image predicts whether the patient has a certain disease, the doctor cannot know the basis of the model's judgment, and the reliability of the prediction result is uncertain; when the prediction result is inaccurate, the doctor is easily misled and the auxiliary diagnosis is interfered with.
In view of the problems in the related art, the embodiments of the present disclosure provide an image recognition method and an endoscopic image recognition method, which are implemented based on machine learning, one branch of Artificial Intelligence (AI). Artificial intelligence is a theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the capabilities of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is a science that studies how to make machines "see": it uses cameras and computers, in place of human eyes, to identify, track and measure targets, and further processes the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specializes in studying how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The scheme provided by the embodiment of the disclosure relates to an artificial intelligence image recognition technology, and is specifically explained by the following embodiments:
The embodiments of the disclosure first provide an image recognition method, which can be applied to fields such as medical image recognition and equipment damage analysis; the implementation details of the technical scheme of the embodiments of the disclosure are elaborated as follows:
fig. 2 schematically shows a flow diagram of an image recognition method according to an embodiment of the present disclosure, which may be performed by a server, which may be the server 103 shown in fig. 1. Referring to fig. 2, the image recognition method at least includes steps S210 to S230, which are described in detail as follows:
in step S210, an original image is obtained and input to an image recognition model, where the image recognition model includes a network main structure and a plurality of output layers corresponding to different tasks and connected to the network main structure.
In an embodiment of the present disclosure, the terminal device 101 may acquire an original image. The original image may be formed by shooting a target object with the terminal device 101 and imaging the captured image signal through its imaging unit, may be downloaded from a network by the terminal device 101, or may be an image stored locally on the terminal device 101, which is not particularly limited in the embodiments of the present disclosure. After the original image sent by the terminal device 101 is acquired, it can be input into an image recognition model to recognize, classify, and predict the target object in the original image. The original image can be any image, for example an equipment damage image, in which the damaged position is the target object; by recognizing, classifying and predicting the damaged position, the damage type can be determined, and it can then be decided, according to the damage type, whether the equipment can continue to be used after repair or should be scrapped directly. Of course, the original image may be another type of image, such as a medical image or an image of animal or plant tissue.
In one embodiment of the present disclosure, the image recognition model includes a network main body structure and a plurality of output layers corresponding to different tasks, wherein the network main body structure is used for performing feature extraction on a received original image to obtain image features corresponding to a target object in the original image; the output layer is used for carrying out classification prediction according to partial sub-image features in the image features output by the network main body structure so as to output a classification result or characterization information corresponding to the target object. In the embodiment of the disclosure, in order to obtain the prediction result and simultaneously obtain information for assisting in judging the reliability of the prediction result, multi-task learning is combined on the basis of image recognition, a network main body structure is a part of a model structure shared by a plurality of tasks, and an output layer is an independent model structure corresponding to each task, that is, the image recognition model in the embodiment of the disclosure includes a shared structure and an independent structure corresponding to the plurality of tasks.
Fig. 3 shows a structural schematic comparing multiple single-task learning models with multi-task learning. As shown in fig. 3, on the left side of the arrow are multiple single-task learning models, each corresponding to one task; when multi-task prediction is required for a piece of input information, the input information must be separately input into the learning model corresponding to each task, and each learning model processes it to output the prediction result for its task. On the right side of the arrow is multi-task learning: there is only one learning model corresponding to the plurality of tasks, the tasks share part of the model parameters, and at the same time each task has independent output layer parameters. The shared parameters reduce computation and allow the noise in the individual tasks to offset one another, which improves the generalization ability of the model and reduces overfitting; the independent output layers extract, from the shared part, the features most relevant to each task and learn each task's specific classification boundary, so the model remains flexible enough to achieve high accuracy on complex tasks such as image recognition.
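The shared-plus-independent arrangement of fig. 3 can be illustrated with a minimal PyTorch sketch (the framework choice and all names are illustrative, not taken from the disclosure):

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """A backbone shared by all tasks, plus one independent output layer per task."""
    def __init__(self, backbone: nn.Module, feature_dim: int, task_classes: dict):
        super().__init__()
        self.backbone = backbone  # model parameters shared by all tasks
        self.heads = nn.ModuleDict({  # independent output layer parameters per task
            task: nn.Linear(feature_dim, n_classes)
            for task, n_classes in task_classes.items()
        })

    def forward(self, x):
        features = self.backbone(x)  # shared feature extraction
        return {task: head(features) for task, head in self.heads.items()}
```

A single forward pass through such a model yields one prediction per task, which is what allows the classification result and the characterization information to be output together.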
In step S220, feature extraction is performed on the target object in the original image through the network body structure to obtain an image feature corresponding to the target object.
In one embodiment of the present disclosure, the image recognition model may be any neural network model used for image recognition, such as CNN, R-CNN, Faster R-CNN, VGG, and the like. In the embodiment of the present disclosure, the image recognition model is a dense convolutional neural network (DenseNet) model, in which the network main body structure includes a start convolutional layer, a plurality of dense convolutional neural network modules (Dense Blocks), a transition layer for connecting adjacent dense convolutional neural network modules, and a termination pooling layer. Fig. 4 shows a schematic structural diagram of the network main body structure; as shown in fig. 4, the network main body structure 400 includes, in order, a start convolutional layer 401, a first Dense Block 402, a transition layer 403, a second Dense Block 404, a transition layer 405, a third Dense Block 406, a transition layer 407, a fourth Dense Block 408, and a termination pooling layer 409.
Next, the flow of image feature extraction is explained based on the network main body structure shown in fig. 4. Fig. 5 is a schematic diagram of the flow of extracting image features by the network main body structure; as shown in fig. 5, the flow at least includes steps S501-S503, specifically:
in step S501, feature extraction is performed on the original image by the start convolution layer to acquire first feature information.
In one embodiment of the present disclosure, an original image is first input into the start convolutional layer 401 of the network main body structure 400, and the start convolutional layer 401 may perform feature extraction on the original image with a convolution kernel of a preset size and a preset stride; for example, the convolution kernel of the start convolutional layer 401 may be set to 7 × 7 with a stride of 2. By performing feature extraction on the target object in the original image through the start convolutional layer 401, first feature information corresponding to the target object can be acquired.
In step S502, feature extraction is performed on the first feature information through a dense convolutional neural network module and a transition layer that are connected in sequence to obtain second feature information, where output information of the dense convolutional neural network module includes image features extracted by each feature extraction layer in the dense convolutional neural network module, and the transition layer is used to perform downsampling on output information of the dense convolutional neural network module.
In one embodiment of the present disclosure, the start convolutional layer 401 outputs the first feature information to the first Dense Block 402 connected to it, and the first Dense Block 402 processes the first feature information to obtain a first output feature. The first Dense Block 402 includes a plurality of convolutional layers, for example six. The first convolutional layer performs feature extraction on the first feature information and transmits its output to the second through sixth convolutional layers; the second convolutional layer receives the nonlinearly transformed output of the first convolutional layer, performs feature extraction on it, and transmits its output to the third through sixth convolutional layers; the third convolutional layer receives the nonlinearly transformed outputs of the first and second convolutional layers, performs feature extraction, and transmits its output to the fourth through sixth convolutional layers; and so on, until the first output feature, produced by nonlinearly transforming the outputs of the first through sixth convolutional layers, is obtained. The nonlinear transformation may specifically be performed by a structure consisting of a batch normalization layer, an activation layer, and a convolutional layer. The first output feature is then input into the transition layer 403 connected to the first Dense Block 402. Because the first output feature contains the features output by every convolutional layer in the first Dense Block 402, its dimensionality is large; to improve the computational efficiency of the system, the output of a Dense Block must be downsampled through a transition layer to reduce the data dimensionality. Specifically, the transition layer 403 includes a 1 × 1 convolutional layer and a pooling layer: first, feature extraction is performed on the first output feature through the 1 × 1 convolutional layer to preliminarily reduce the dimensionality of the feature map, and then 2 × 2 average pooling with a stride of 2 is applied through the pooling layer to further reduce the dimensionality of the first output feature.
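A minimal sketch of the dense connectivity and the transition layer just described (all names are hypothetical; the batch normalization and activation are placed before each convolution, matching the nonlinear transformation mentioned above):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN -> ReLU -> 3x3 conv; its input is the concatenation of all earlier outputs."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.bn(x)))

class DenseBlock(nn.Module):
    """Each layer receives the nonlinearly transformed outputs of every preceding layer."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)  # output keeps the features of every layer

class Transition(nn.Module):
    """1x1 conv for preliminary dimension reduction, then 2x2 average pooling with stride 2."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(x))
```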
Further, the second Dense Block 404, the transition layer 405, the third Dense Block 406, the transition layer 407, and the fourth Dense Block 408 perform convolution operations or downsampling on the features output by their adjacent structures as described above, until the fourth Dense Block 408 outputs the second feature information corresponding to the target object.
In step S503, the second feature information is globally average pooled through the termination pooling layer to acquire the image features corresponding to the target object.
In an embodiment of the present disclosure, after the termination pooling layer 409 receives the second feature information, pooling the second feature information to obtain an image feature corresponding to the target object, in the embodiment of the present disclosure, the pooling performed on the second feature information may specifically be global average pooling, but may also be other pooling manners, which is not specifically limited in this embodiment of the present disclosure.
It should be noted that the structures of the dense convolutional neural network modules included in the network main body structure in the embodiment of the present disclosure may be the same or different. In the network main body structure shown in fig. 4, the first Dense Block 402, the second Dense Block 404, the third Dense Block 406, and the fourth Dense Block 408 may have the same number of convolutional layers, for example six each, or different numbers, for example 6 convolutional layers in the first Dense Block 402, 12 in the second Dense Block 404, 24 in the third Dense Block 406, and 16 in the fourth Dense Block 408; the convolution kernels of the convolutional layers may likewise be the same or different in size. Similarly, each transition layer in the network main body structure 400 includes a convolutional layer and a pooling layer, and the convolution kernel size, pooling manner and stride of each transition layer may be the same or different; for example, the transition layer 403, the transition layer 405, and the transition layer 407 in fig. 4 may each include a convolutional layer with a 1 × 1 convolution kernel and a pooling layer performing 2 × 2 average pooling with a stride of 2.
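Assembling the layers of fig. 4, and reusing the `DenseBlock` and `Transition` sketches above, gives a backbone along the following lines (a sketch assuming the DenseNet-121-style layer counts 6/12/24/16 mentioned above; the initial max pooling and the channel counts are assumptions):

```python
import torch.nn as nn

class NetworkBody(nn.Module):
    """Start conv -> four Dense Blocks with transitions -> terminating global average pooling."""
    def __init__(self, growth_rate=32, block_layers=(6, 12, 24, 16), init_channels=64):
        super().__init__()
        self.start_conv = nn.Sequential(  # 7x7 convolution with stride 2, as in fig. 4
            nn.Conv2d(3, init_channels, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),  # assumed, as in DenseNet-121
        )
        blocks, channels = [], init_channels
        for i, num_layers in enumerate(block_layers):
            blocks.append(DenseBlock(num_layers, channels, growth_rate))
            channels += num_layers * growth_rate
            if i < len(block_layers) - 1:  # transition layers connect adjacent Dense Blocks
                blocks.append(Transition(channels, channels // 2))
                channels //= 2
        self.blocks = nn.Sequential(*blocks)
        self.terminate_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.feature_dim = channels  # 1024 with these settings

    def forward(self, x):
        x = self.blocks(self.start_conv(x))
        return self.terminate_pool(x).flatten(1)  # image features of the target object
```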
In step S230, the sub-image features corresponding to each task in the image features are classified by each output layer to output a classification result and characterization information corresponding to the target object.
In an embodiment of the present disclosure, after the network main body structure performs feature extraction on the target object in the original image to generate the image features corresponding to the target object, the image features may be input into the output layers, so that each output layer performs classification according to the sub-image features corresponding to its task and determines the classification result corresponding to the target object and the characterization information for judging the reliability of that classification result.
In an embodiment of the present disclosure, each output layer includes a fully connected layer and a normalization layer, where the normalization layer is a softmax layer, and each output layer corresponds to a different task. For example, in the field of equipment damage detection, the damage type is judged from the condition of the damage; the damage type may be, for example, corrosion damage or mechanical damage, and different damage types differ in the texture, color and depth of the damaged surface. Therefore, to help the user assess the reliability of the predicted damage type, the texture, color and damage depth of the damaged surface also need to be predicted from the damage image, assisting the user in judging whether the predicted damage type is correct.
Because a plurality of output layers exist and the tasks corresponding to the output layers are different, when the image features are classified and predicted through the output layers, the image features required by the output layers are different. Fig. 6 shows a schematic flow diagram of output layer classification, and as shown in fig. 6, the output layer classification flow at least includes steps S601 to S603, specifically:
in step S601, a target output layer is determined from the plurality of output layers, a target task corresponding to the target output layer is obtained, and a target sub-image feature corresponding to the target task is obtained from the image features according to the target task.
In an embodiment of the present disclosure, any one of the output layers may be taken as the target output layer. After the target output layer is determined, the target task corresponding to it can be determined; for example, if the task corresponding to the target output layer is damage depth classification, then damage depth classification is determined as the target task. After the target task is determined, when the network main body structure outputs the image features corresponding to the target object, the target sub-image features related to the target task are input into the target output layer, so that the target output layer performs classification prediction according to them. It should be noted that, in the embodiment of the present disclosure, the target task may also be determined first, and the target output layer and the target sub-image features corresponding to the target task then determined according to the target task.
In step S602, the target sub-image features are fully connected through the fully connected layer to obtain third feature information.
In an embodiment of the present disclosure, after the target sub-image features are obtained, the fully connected layer performs full connection on them and converts them into a one-dimensional vector. This one-dimensional vector is the third feature information, and the number of elements it contains equals the number of classes for the target task.
In step S603, the sub-feature information in the third feature information is normalized by the normalization layer to obtain a probability value corresponding to the sub-feature information.
In an embodiment of the disclosure, the normalization layer performs normalization processing on each piece of sub-feature information in the third feature information, converting each into a value between 0 and 1, namely the probability value corresponding to that sub-feature information; the class information corresponding to the maximum probability value is the information finally output by the output layer. For example, if the target task is damage depth classification with four damage depth classes, and the probability values obtained after normalization are 0.3, 0.5, 0.1 and 0.1 respectively, then the class corresponding to the maximum probability value 0.5 is determined as the damage depth that is finally output.
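The per-task output layer of steps S601-S603 thus reduces to a fully connected layer followed by softmax. A sketch using the damage-depth example above (sizes and names are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_dim, num_classes = 1024, 4          # illustrative sizes
head = nn.Linear(feature_dim, num_classes)  # fully connected layer of the target output layer

sub_features = torch.randn(1, feature_dim)  # stand-in for the target sub-image features
logits = head(sub_features)                 # third feature information: one value per class
probs = F.softmax(logits, dim=1)            # normalization layer: values in (0, 1) summing to 1
predicted = probs.argmax(dim=1)             # e.g. probs [0.3, 0.5, 0.1, 0.1] -> class index 1
```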
According to the above technical scheme, the input image is processed by an image recognition model combined with multi-task learning, and the classification information and characterization information corresponding to the target object in the input image are obtained; the user can determine the credibility of the classification information according to the characterization information, laying a foundation for subsequent data processing. The technical solution in the embodiment of the present disclosure may also be applied to the field of medical imaging. Generally, only a disease prediction result can be obtained from a medical image; the doctor has no way of knowing the reliability of that prediction, and if the doctor treats a patient according to an unreliable prediction result, the consequences could be serious. The present disclosure therefore further provides an endoscopic image recognition method. Fig. 7 schematically illustrates a flowchart of the endoscopic image recognition method according to an embodiment of the present disclosure; the method may be executed by a server, which may be the server 103 illustrated in fig. 1. Referring to fig. 7, the endoscopic image recognition method at least includes steps S710 to S730, specifically:
in step S710, an original endoscope image is obtained and input to an image recognition model, where the image recognition model includes a network main structure and a plurality of output layers corresponding to different tasks and connected to the network main structure;
in step S720, feature extraction is performed on the lesion in the original endoscopic image through a network main body structure to obtain an image feature corresponding to the lesion;
in step S730, the sub-image features corresponding to each task in the image features are classified by each output layer to output a diagnosis result and auxiliary diagnosis information corresponding to the lesion.
The endoscope image recognition method shown in fig. 7 is similar to the image recognition method shown in fig. 2: image recognition and classification prediction are performed on an original endoscope image through an image recognition model combined with multi-task learning, so that a diagnosis result and auxiliary diagnosis information corresponding to a lesion in the original endoscope image can be obtained. A doctor can determine the reliability of the diagnosis result output by the image recognition model according to the auxiliary diagnosis information. When the reliability is judged to be high, a treatment plan can be formulated on the basis of the diagnosis result; when the reliability is judged to be low, the diagnosis result can be discarded, and the lesion condition is observed from the endoscope image by manual recognition to determine the disease type.
In one embodiment of the present disclosure, the image recognition model for recognizing endoscope images may include four output layers, each corresponding to a different task. Fig. 8 shows a schematic structural diagram of this image recognition model. As shown in fig. 8, the image recognition model 800 includes a network main body structure 800-a and an output layer 800-b. The network main body structure 800-a sequentially includes a starting convolutional layer 801, a first Dense Block 802, a transition layer 803, a second Dense Block 804, a transition layer 805, a third Dense Block 806, a transition layer 807, a fourth Dense Block 808 and a terminating pooling layer 809; the output layer 800-b includes a first output layer 810, a second output layer 811, a third output layer 812 and a fourth output layer 813. The first output layer 810 outputs the disease diagnosis result corresponding to the disease type classification task; the second output layer 811 outputs auxiliary diagnosis information corresponding to the lesion color degree classification task; the third output layer 812 outputs auxiliary diagnosis information corresponding to the lesion edge classification task; and the fourth output layer 813 outputs auxiliary diagnosis information corresponding to the lesion depression degree classification task. A doctor can judge the credibility of the disease diagnosis result according to all of the auxiliary diagnosis information and use it to guide the formulation of a subsequent treatment scheme. Further, the composition and size of each layer in the endoscope image recognition model 800 may be the same as or different from those of the image recognition model described above, which is not particularly limited in the embodiments of the present disclosure.
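A minimal PyTorch sketch of this four-head structure follows, under stated assumptions: the backbone reuses torchvision's DenseNet-121 feature stack (starting convolution, four dense blocks, transition layers), and the per-task class counts follow the labeling rules described later in the training section; none of the exact layer sizes are claimed by the disclosure.

```python
import torch.nn as nn
from torchvision.models import densenet121

class MultiTaskEndoscopeNet(nn.Module):
    """Sketch of fig. 8: shared network main body structure 800-a plus one
    fully connected output layer per task (800-b)."""
    def __init__(self):
        super().__init__()
        # densenet121().features already stacks the starting convolution,
        # four dense blocks and the intervening transition layers.
        self.backbone = nn.Sequential(
            densenet121(weights=None).features,
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),   # terminating global average pooling
            nn.Flatten(),              # -> (N, 1024) image feature
        )
        self.heads = nn.ModuleDict({   # class counts assumed from the labeling rules
            "disease":    nn.Linear(1024, 5),
            "color":      nn.Linear(1024, 8),
            "edge":       nn.Linear(1024, 4),
            "depression": nn.Linear(1024, 7),
        })

    def forward(self, x):
        feat = self.backbone(x)
        return {name: head(feat) for name, head in self.heads.items()}
```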
Similarly, when the image recognition model is used to recognize and classify the original endoscope image, feature extraction is first performed on the original endoscope image through the starting convolutional layer 801 to obtain first feature information; then feature extraction is performed on the first feature information through the sequentially connected first Dense Block 802, transition layer 803, second Dense Block 804, transition layer 805, third Dense Block 806, transition layer 807 and fourth Dense Block 808 to obtain second feature information, where the processing performed by the dense convolutional neural network modules and the transition layers is the same as described above and is not repeated here; the second feature information is then pooled, for example by global average pooling, through the terminating pooling layer 809 to obtain the image feature corresponding to the lesion. Finally, the first output layer 810 receives the first sub-image feature related to the disease type in the image feature and performs full connection and normalization processing on it to determine the diagnosis result; meanwhile, the second output layer 811, the third output layer 812 and the fourth output layer 813 respectively receive the second sub-image features related to the lesion color, the lesion edge and the lesion surface morphology in the image feature, and perform full connection and normalization processing on them to determine the auxiliary diagnosis information.
By processing the original endoscope image with an image recognition model combined with multi-task learning, a disease diagnosis result and auxiliary diagnosis information can be acquired simultaneously, and a doctor can judge the reliability of the disease diagnosis result according to the auxiliary diagnosis information and clinical experience, so as to decide whether to accept it. The technical scheme of the embodiments of the present disclosure thus provides a more detailed diagnosis basis and a more trustworthy conclusion alongside the disease diagnosis result, incorporates the diagnostic experience of doctors, assists doctors in disease diagnosis to the greatest extent, reduces missed diagnoses and misdiagnoses, and improves the practicability of computer-aided diagnosis systems.
Before the image recognition model is used to extract features from original images or original endoscope images, the image recognition model to be trained needs to be trained into a stable image recognition model. Specifically, a training data set is first acquired, where the training data set includes image samples and a plurality of label samples corresponding to each image sample, each label sample corresponding to one task; the image recognition model to be trained is then trained according to the image samples and the label samples to obtain the image recognition model.
In an embodiment of the present disclosure, for an image recognition model having a plurality of output layers corresponding to different tasks, the tasks need to be performed alternately during training so as to adjust both the shared parameters and the task-specific parameters of the image recognition model to be trained, thereby obtaining a stable image recognition model.
Next, a model training process in the embodiment of the present disclosure will be described using an endoscopic image as an example. Fig. 9 shows a schematic diagram of a training flow of an image recognition model to be trained, and as shown in fig. 9, a model training process includes steps S901 to S904, specifically:
in step S901, a training data set is acquired.
In one embodiment of the present disclosure, the training data set includes endoscope image samples and a plurality of manually annotated label samples corresponding to each image sample. Endoscope image recognition corresponds to four tasks, so there are four manually annotated label samples, each corresponding to one task; for example, the four labels may be: ulcer for disease type, severe redness for color, jagged for edge, and central depression for lesion depression degree. When an endoscope image is annotated manually, the following annotation rules can be set according to the disease characteristics presented in the image. Disease type: 0 (normal), 1 (inflammation), 2 (ulcer), 3 (tumor), 4 (other). Color: 0 (normal), 1 (mild redness), 2 (moderate redness), 3 (severe redness with bleeding), 4 (red-white mixed), 5 (white), 6 (orange), 7 (blue). Edge: 0 (clear), 1 (gradually unclear), 2 (obviously unclear), 3 (jagged). Lesion depression degree: 0 (flat), 1 (slightly concave), 2 (concave), 3 (slightly convex), 4 (convex), 5 (central depression with peripheral depression), 6 (central depression with peripheral convexity). Annotators can label the endoscope images according to these rules to form the label information corresponding to each endoscope image.
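For concreteness, the annotation schema above can be written down as plain data; in the sketch below the field names and English class names are the editor's rendering of the rules, not identifiers from the disclosure.

```python
# Annotation schema sketched from the labeling rules above; indices match
# the numeric codes (e.g. color 3 = "severe redness with bleeding").
LABEL_SCHEMA = {
    "disease":    ["normal", "inflammation", "ulcer", "tumor", "other"],
    "color":      ["normal", "mild redness", "moderate redness",
                   "severe redness with bleeding", "red-white mixed",
                   "white", "orange", "blue"],
    "edge":       ["clear", "gradually unclear", "obviously unclear", "jagged"],
    "depression": ["flat", "slightly concave", "concave", "slightly convex",
                   "convex", "central and peripheral depression",
                   "central depression with peripheral convexity"],
}

# One training example then pairs an image with one label per task, e.g.
sample = {"image": "endoscope_0001.png",
          "labels": {"disease": 2, "color": 3, "edge": 3, "depression": 1}}
```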
In step S902, an image recognition model to be trained is initialized.
In one embodiment of the present disclosure, the shared network-layer parameters in the image recognition model may be initialized using, as initial values, the parameters of an image recognition model trained on natural images. The natural images may be arbitrary images and are not limited to the type of image the model will later process; for example, the ImageNet data set may be used. Because the network main body structure in the image recognition model is the network layer shared by all tasks while each output layer corresponds to one task, the parameters in the network main body structure can be initialized with the parameters of the trained image recognition model, and each output layer can be initialized randomly, that is, by random assignment.
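A sketch of this initialization, reusing the `MultiTaskEndoscopeNet` sketch above: the shared backbone is loaded from torchvision's ImageNet-pretrained DenseNet-121 (standing in for "a model trained on natural images"), while the task heads keep their default random initialization.

```python
from torchvision.models import densenet121, DenseNet121_Weights

model = MultiTaskEndoscopeNet()
pretrained = densenet121(weights=DenseNet121_Weights.IMAGENET1K_V1)
# backbone[0] is the DenseNet feature stack in the sketch above; copy the
# natural-image parameters into it as initial values.
model.backbone[0].load_state_dict(pretrained.features.state_dict())
# The nn.Linear heads are left with PyTorch's random initialization.
```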
In step S903, the image sample is input to the image recognition model to be trained, and feature extraction is performed on the target object in the image sample through the image recognition model to be trained, so that the output layer corresponding to the target task outputs prediction information.
In one embodiment of the present disclosure, the different tasks are performed alternately during training, adjusting the model parameters until the model converges. A target task should be determined before training, and the image recognition model to be trained, including the output layer corresponding to that target task, is then trained. During training, an image sample is input into the image recognition model to be trained; feature extraction is performed on the image sample through the network main body structure in the model, the image feature corresponding to the target object is output to the output layer, and the output layer performs full connection and normalization on the sub-image feature related to the target task in the image feature to output the corresponding prediction information. For example, if the image sample is an endoscope image sample and the target task is the lesion color classification task, feature extraction may be performed on the lesion in the endoscope image sample through the network main body structure to obtain the image feature corresponding to the lesion; the output layer corresponding to the lesion color classification task then acquires the sub-image feature related to the lesion color in the image feature, and after full connection and normalization, outputs the color class with the highest matching degree.
In step S904, a loss value is determined according to the prediction information, the target label sample and the loss function, and the loss value is minimized by optimizing parameters of the image recognition model to be trained, so as to complete training of the image recognition model to be trained.
In an embodiment of the present disclosure, after the prediction result output by the image recognition model to be trained is obtained, a loss value may be determined according to the prediction result, the target label sample and a loss function, and the parameters in the image recognition model to be trained are optimized by an optimizer so as to minimize the loss value, that is, to make the loss function converge. The loss function may be, for example, a cross-entropy loss function, and the optimizer may train using stochastic gradient descent or another method; the parameters in the shared structure and in the task-specific structures are updated through back propagation. By performing the different tasks alternately, the training of the image recognition model to be trained is completed when the loss functions corresponding to all of the tasks have converged.
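Putting steps S903 and S904 together, a minimal alternating training loop might look as follows; the batch loader, batch size, epoch count and learning rate are placeholders, and the one-optimizer-per-task arrangement mirrors the description of fig. 10 below.

```python
import torch
import torch.nn as nn

model = MultiTaskEndoscopeNet()      # shared backbone + four heads, sketched above
criterion = nn.CrossEntropyLoss()    # cross-entropy loss, as suggested above
optimizers = {                       # one SGD optimizer per task (cf. fig. 10)
    name: torch.optim.SGD(list(model.backbone.parameters()) + list(head.parameters()),
                          lr=1e-3, momentum=0.9)
    for name, head in model.heads.items()
}

def next_batch(task):
    """Placeholder loader; a real one would return annotated endoscope images."""
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, model.heads[task].out_features, (8,))
    return images, labels

for epoch in range(10):                                      # assumed epoch count
    for task in ("disease", "color", "edge", "depression"):  # tasks alternate
        images, targets = next_batch(task)
        logits = model.heads[task](model.backbone(images))
        loss = criterion(logits, targets)        # prediction vs. target label sample
        optimizers[task].zero_grad()
        loss.backward()                          # back-propagate to shared + own layers
        optimizers[task].step()
```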
Fig. 10 is a schematic diagram illustrating a training flow of the image recognition model to be trained. As shown in fig. 10, an original endoscope image is input into the network main body structure of the image recognition model to be trained, and image recognition is performed on the target object in the original endoscope image through the network main body structure to obtain the image feature corresponding to the target object. The output layer corresponding to each task then classifies the sub-image feature related to that task in the image feature to obtain the classification result corresponding to the task, such as the disease classification result, color degree classification result, edge classification result and depression degree classification result in fig. 10. Next, the target label information corresponding to each task is compared with its classification result to determine a loss value: the disease classification result is compared with the target disease label "ulcer" to determine the disease classification loss value; the color degree classification result is compared with the target color label "severe redness" to determine the color classification loss value; the edge classification result is compared with the target edge label "jagged" to determine the edge classification loss value; and the depression degree classification result is compared with the target depression label "central depression" to determine the depression degree classification loss value. Finally, the disease classification loss is optimized through a first optimizer to determine the parameters that minimize the disease classification loss function, and these parameters are used to update the network main body structure parameters and the parameters of the output layer corresponding to the disease classification task; similarly, the color degree classification loss, the edge classification loss and the depression degree classification loss are optimized through a second, a third and a fourth optimizer respectively to determine the parameters that minimize the corresponding loss functions, and those parameters are used to update the network main body structure parameters and the parameters of the output layers corresponding to the color degree, edge and depression degree classification tasks.
It is worth noting that during training the four tasks alternately adjust the model parameters until convergence; when any one task is performed, the image sample and the label sample corresponding to that task are input into the image recognition model to be trained, and the other label samples are used only when their corresponding tasks are performed.
Embodiments of the apparatus of the present disclosure are described below, which may be used to perform the image recognition methods in the above-described embodiments of the present disclosure. For details that are not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the image recognition method described above in the present disclosure.
Fig. 11 schematically shows a block diagram of an image recognition apparatus according to an embodiment of the present disclosure.
Referring to fig. 11, an image recognition apparatus 1100 according to an embodiment of the present disclosure includes: an image acquisition module 1101, a feature extraction module 1102 and a classification output module 1103.
The image acquisition module 1101 is configured to acquire an original image, and input the original image to an image recognition model, where the image recognition model includes a network main body structure and a plurality of output layers connected to the network main body structure and corresponding to different tasks; a feature extraction module 1102, configured to perform feature extraction on a target object in the original image through the network main body structure to obtain an image feature corresponding to the target object; a classification output module 1103, configured to classify, by each output layer, the sub-image features corresponding to each task in the image features, so as to output a classification result and characterization information corresponding to the target object.
In one embodiment of the present disclosure, the network main body structure comprises a starting convolutional layer, a plurality of dense convolutional neural network modules, a transition layer for connecting adjacent dense convolutional neural network modules, and a terminating pooling layer; the feature extraction module 1102 is configured to: perform feature extraction on the target object through the starting convolutional layer to obtain first feature information; perform feature extraction on the first feature information through the sequentially connected dense convolutional neural network modules and transition layers to obtain second feature information, where the output information of a dense convolutional neural network module comprises the image features extracted by each feature extraction layer in the module, and the transition layer downsamples that output information; and perform global average pooling on the second feature information through the terminating pooling layer to obtain the image feature corresponding to the target object.
In one embodiment of the present disclosure, the output layer includes a fully connected layer and a normalization layer; the classification output module 1103 is configured to: determine a target output layer from the plurality of output layers, acquire the target task corresponding to the target output layer, and acquire, according to the target task, the target sub-image feature corresponding to the target task from the image feature; fully connect the target sub-image feature through the fully connected layer to obtain third feature information; normalize the sub-feature information in the third feature information through the normalization layer to acquire the probability value corresponding to each piece of sub-feature information; and determine the output information corresponding to the target task according to the probability values, taking the output information as the classification result or the characterization information.
In one embodiment of the present disclosure, the image recognition apparatus 1100 further includes: a first training sample acquisition module, configured to acquire a training data set, where the training data set includes an image sample and a plurality of label samples corresponding to the image sample, where each label sample corresponds to each task; and the first model training module is used for training the image recognition model to be trained according to the image sample and the label sample so as to obtain the image recognition model.
In one embodiment of the disclosure, the first model training module is configured to: determining a target label sample from the label samples according to a target task; inputting the image sample into the image recognition model to be trained, and performing feature extraction on a target object in the image sample through the image recognition model to be trained so as to enable an output layer corresponding to the target task to output prediction information; and determining a loss value according to the prediction information, the target label sample and a loss function, and optimizing parameters of the to-be-trained image recognition model to minimize the loss value so as to complete training of the to-be-trained image recognition model.
In one embodiment of the present disclosure, the image recognition apparatus 1100 further includes: the first initialization module is used for acquiring model parameters of an image recognition model obtained based on natural image training and initializing the network main body structure by taking the model parameters as initial values; and the second initialization module is used for initializing the output layer in a random initialization mode.
Fig. 12 schematically shows a block diagram of an endoscopic image recognition apparatus according to an embodiment of the present disclosure.
Referring to fig. 12, an endoscopic image recognition apparatus 1200 according to one embodiment of the present disclosure includes: an endoscopic image acquisition module 1201, an image feature extraction module 1202, and an image classification output module 1203.
The endoscope image acquisition module 1201 is configured to acquire an original endoscope image and input the original endoscope image to an image recognition model, where the image recognition model includes a network main body structure and a plurality of output layers connected to the network main body structure and corresponding to different tasks; the image feature extraction module 1202 is configured to perform feature extraction on the lesion in the original endoscope image through the network main body structure to obtain the image feature corresponding to the lesion; and the image classification output module 1203 is configured to classify, through each output layer, the sub-image features corresponding to each task in the image features, so as to output the diagnosis result and auxiliary diagnosis information corresponding to the lesion.
In one embodiment of the present disclosure, the different tasks include: the disease type classification task, the lesion color degree classification task, the lesion edge classification task and the lesion depression degree classification task.
In one embodiment of the present disclosure, the network main body structure comprises a starting convolutional layer, a plurality of dense convolutional neural network modules, a transition layer for connecting adjacent dense convolutional neural network modules, and a terminating pooling layer; the image feature extraction module 1202 is configured to: perform feature extraction on the lesion through the starting convolutional layer to obtain first feature information; perform feature extraction on the first feature information through the sequentially connected dense convolutional neural network modules and transition layers to obtain second feature information, where the output information of a dense convolutional neural network module comprises the image features extracted by each feature extraction layer in the module, and the transition layer downsamples that output information; and perform global average pooling on the second feature information through the terminating pooling layer to obtain the image feature corresponding to the lesion.
In one embodiment of the present disclosure, the image classification output module 1203 is configured to: fully connect and normalize the first sub-image feature related to the disease type in the image features through the output layer corresponding to the disease type classification task to determine the diagnosis result; and, at the same time, perform full connection and normalization processing on the second sub-image features related to the lesion color, the lesion edge or the lesion surface morphology in the image features through the output layers corresponding to the lesion color degree classification task, the lesion edge classification task or the lesion depression degree classification task to determine the auxiliary diagnosis information.
In one embodiment of the present disclosure, the endoscopic image recognition apparatus 1200 further includes: a second training sample acquisition module, configured to acquire an endoscope image training sample set, where the endoscope image training sample set includes endoscope image samples and a plurality of label samples corresponding to each endoscope image sample, each label sample corresponding to one task; a target label determining module, configured to determine a target label sample from the label samples according to a target task; and a second model training module, configured to input the endoscope image sample into an image recognition model to be trained and perform feature extraction on the lesion in the endoscope image sample through the image recognition model to be trained so that the output layer corresponding to the target task outputs prediction information, and to determine a loss value according to the prediction information, the target label sample and a loss function and optimize the parameters of the image recognition model to be trained to minimize the loss value, thereby completing the training of the image recognition model to be trained.
In one embodiment of the present disclosure, the endoscopic image recognition device 1200 may be further configured to: and alternately carrying out image recognition corresponding to each task on the endoscope image sample through the image recognition model to be trained.
An embodiment of the present disclosure further provides an image recognition system, fig. 13 shows a schematic structural diagram of the image recognition system, and as shown in fig. 13, an image recognition system 1300 includes: the photographing device 1301, the image recognition device 1302, and the display device 1303, specifically:
a photographing device 1301 for acquiring an image signal to generate an original image containing a target object; an image recognition device 1302 connected to the photographing device 1301 for receiving the original image, and including one or more processors and a storage device, wherein the storage device is configured to store one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are caused to perform the image recognition method or the endoscopic image recognition method according to the above embodiments on the original image; and a display device 1303 connected to the image recognition device 1302, and configured to receive the image recognition result output by the image recognition device and display the image recognition result on a display screen of the display device.
The image recognition system can be used to recognize any type of input image and acquire the classification result and characterization information corresponding to the target object in it. For example, when the system is used to recognize endoscope images, an image of the diseased part can be captured through the endoscope lens and sent to the image recognition device; the image recognition device recognizes and classifies the endoscope image according to the endoscopic image recognition method of the embodiments of the present disclosure to output the disease diagnosis result corresponding to the diseased part together with auxiliary diagnosis information for judging the reliability of that result, and the doctor can then judge the reliability of the diagnosis result according to the auxiliary diagnosis information and clinical experience.
Fig. 14 shows a schematic structural diagram of a computer system suitable for implementing the image recognition device 1302 according to the embodiments of the present disclosure.
It should be noted that the computer system 1400 of the image recognition device 1302 shown in fig. 14 is only an example and should not impose any limitation on the functions or scope of application of the embodiments of the present disclosure.
As shown in fig. 14, the computer system 1400 includes a Central Processing Unit (CPU) 1401, which can execute various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1402 or a program loaded from a storage portion 1408 into a Random Access Memory (RAM) 1403, thereby implementing the image recognition method described in the above embodiments. The RAM 1403 also stores various programs and data necessary for system operation. The CPU 1401, the ROM 1402 and the RAM 1403 are connected to each other via a bus 1404, to which an Input/Output (I/O) interface 1405 is also connected.
The following components are connected to the I/O interface 1405: an input portion 1406 including a keyboard, a mouse, and the like; an output portion 1407 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1408 including a hard disk and the like; and a communication section 1409 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication section 1409 performs communication processing via a network such as the Internet. A drive 1410 is also connected to the I/O interface 1405 as necessary. A removable medium 1411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1410 as necessary, so that a computer program read out from it is installed into the storage portion 1408 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 1409 and/or installed from the removable medium 1411. When executed by the Central Processing Unit (CPU) 1401, the computer program performs the various functions defined in the system of the present disclosure.
It should be noted that the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the image recognition device described in the above embodiments, or may exist separately without being assembled into that device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to implement the methods described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

1. An image recognition method, comprising:
acquiring an original image, and inputting the original image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks;
extracting features of a target object in the original image through the network main body structure to obtain image features corresponding to the target object;
classifying the sub-image features corresponding to the tasks in the image features through the output layers to output classification results and characterization information corresponding to the target object.
2. The image recognition method of claim 1, wherein the network main body structure comprises a starting convolutional layer, a plurality of dense convolutional neural network modules, a transition layer for connecting adjacent dense convolutional neural network modules, and a terminating pooling layer;
the extracting features of the target object in the original image through the network main body structure to obtain image features corresponding to the target object includes:
performing feature extraction on the target object through the starting convolutional layer to obtain first feature information;
performing feature extraction on the first feature information through the dense convolutional neural network module and the transition layer which are sequentially connected to obtain second feature information, wherein output information of the dense convolutional neural network module comprises image features extracted by each feature extraction layer in the dense convolutional neural network module, and the transition layer is used for performing downsampling on the output information of the dense convolutional neural network module;
and performing global average pooling on the second feature information through the terminating pooling layer to obtain image features corresponding to the target object.
3. The image recognition method of claim 1, wherein the output layer comprises a fully connected layer and a normalization layer;
classifying the sub-image features corresponding to the tasks in the image features through the output layers to output classification results and characterization information corresponding to the target object, including:
determining a target output layer from the plurality of output layers, acquiring a target task corresponding to the target output layer, and acquiring a target sub-image feature corresponding to the target task from the image feature according to the target task;
fully connecting the target sub-image features through the fully connected layer to obtain third feature information;
normalizing the sub-feature information in the third feature information through the normalization layer to acquire a probability value corresponding to the sub-feature information;
and determining output information corresponding to the target task according to the probability value, and taking the output information as the classification result or the characterization information.
4. The image recognition method of claim 1, wherein prior to inputting the original image to an image recognition model, the method further comprises:
acquiring a training data set, wherein the training data set comprises an image sample and a plurality of label samples corresponding to the image sample, and each label sample corresponds to each task;
and training an image recognition model to be trained according to the image sample and the label sample to obtain the image recognition model.
5. The image recognition method of claim 4, wherein the training of the image recognition model to be trained according to the image sample and the label sample to obtain the image recognition model comprises:
determining a target label sample from the label samples according to a target task;
inputting the image sample into the image recognition model to be trained, and performing feature extraction on a target object in the image sample through the image recognition model to be trained so as to enable an output layer corresponding to the target task to output prediction information;
and determining a loss value according to the prediction information, the target label sample and a loss function, and optimizing parameters of the to-be-trained image recognition model to minimize the loss value so as to complete training of the to-be-trained image recognition model.
6. The image recognition method of claim 4, wherein prior to training the image recognition model to be trained according to the image sample and the label sample, the method further comprises:
obtaining model parameters of an image recognition model obtained based on natural image training, and initializing the network main body structure by taking the model parameters as initial values;
and initializing the output layer by a random initialization mode.
7. An endoscopic image recognition method comprising:
acquiring an original endoscope image, and inputting the original endoscope image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks;
performing feature extraction on the lesion in the original endoscope image through the network main body structure to acquire image features corresponding to the lesion;
classifying sub-image features corresponding to the tasks in the image features through the output layers to output a diagnosis result and auxiliary diagnosis information corresponding to the lesion.
8. The endoscopic image recognition method according to claim 7, wherein the different tasks include: the disease type classification task, the lesion color degree classification task, the lesion edge classification task and the lesion depression degree classification task.
9. The endoscopic image recognition method of claim 8, wherein the network main body structure comprises a starting convolutional layer, a plurality of dense convolutional neural network modules, a transition layer for connecting adjacent dense convolutional neural network modules, and a terminating pooling layer;
the extracting features of the lesion in the original endoscope image through the network main body structure to obtain image features corresponding to the lesion includes:
performing feature extraction on the lesion through the starting convolutional layer to obtain first feature information;
performing feature extraction on the first feature information through the dense convolutional neural network module and the transition layer which are sequentially connected to obtain second feature information, wherein output information of the dense convolutional neural network module comprises image features extracted by each feature extraction layer in the dense convolutional neural network module, and the transition layer is used for performing downsampling on the output information of the dense convolutional neural network module;
and performing global average pooling on the second feature information through the terminating pooling layer to obtain image features corresponding to the lesion.
10. The endoscopic image recognition method according to claim 8, wherein said classifying, by each of the output layers, sub-image features corresponding to each of the tasks among the image features to output a diagnosis result and auxiliary diagnosis information corresponding to the lesion, comprises:
fully connecting and normalizing the first sub-image features related to the disease type in the image features through the output layer corresponding to the disease type classification task to determine the diagnosis result; and simultaneously,
performing full connection and normalization processing on second sub-image features related to the lesion color, the lesion edge or the lesion surface morphology in the image features through the output layer corresponding to the lesion color degree classification task, the lesion edge classification task or the lesion depression degree classification task to determine the auxiliary diagnosis information.
11. The endoscopic image recognition method according to claim 7, wherein before inputting the original endoscopic image to an image recognition model, the method further comprises:
acquiring an endoscope image training sample set, wherein the endoscope image training sample set comprises endoscope image samples and a plurality of label samples corresponding to the endoscope image samples, and each label sample corresponds to each task;
determining a target label sample from the label samples according to a target task;
inputting the endoscope image sample into an image recognition model to be trained, and performing feature extraction on a lesion in the endoscope image sample through the image recognition model to be trained so as to enable an output layer corresponding to the target task to output prediction information;
and determining a loss value according to the prediction information, the target label sample and a loss function, and optimizing parameters of the to-be-trained image recognition model to minimize the loss value so as to complete training of the to-be-trained image recognition model.
12. The endoscopic image recognition method of claim 11, further comprising:
and alternately carrying out image recognition corresponding to each task on the endoscope image sample through the image recognition model to be trained.
13. An image recognition apparatus, comprising:
the image acquisition module is used for acquiring an original image and inputting the original image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks;
the feature extraction module is used for extracting features of a target object in the original image through the network main body structure so as to obtain image features corresponding to the target object;
and the classification output module is used for classifying the sub-image features corresponding to the tasks in the image features through the output layers so as to output the classification result and the characterization information corresponding to the target object.
14. An endoscopic image recognition apparatus, comprising:
the endoscope image acquisition module is used for acquiring an original endoscope image and inputting the original endoscope image into an image recognition model, wherein the image recognition model comprises a network main body structure and a plurality of output layers which are connected with the network main body structure and correspond to different tasks;
the image feature extraction module is used for performing feature extraction on the lesion in the original endoscope image through the network main body structure so as to acquire image features corresponding to the lesion;
and the image classification output module is used for classifying the sub-image features corresponding to the tasks in the image features through the output layers so as to output a diagnosis result and auxiliary diagnosis information corresponding to the lesion.
15. An image recognition system, comprising:
the shooting device is used for acquiring image signals to generate an original image containing a target object;
an image recognition device connected to the photographing device for receiving the original image, and comprising one or more processors and a storage device, wherein the storage device is configured to store one or more programs that, when executed by the one or more processors, cause the one or more processors to perform the image recognition method of any one of claims 1 to 6 or the endoscopic image recognition method of any one of claims 7 to 12 on the original image;
and the display device is connected with the image recognition device and used for receiving the image recognition result output by the image recognition device and displaying the image recognition result on a display screen of the display device.
Also Published As

Publication number Publication date
CN110689025B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN110689025B (en) Image recognition method, device and system, and endoscope image recognition method and device
CN111476292B (en) Few-shot meta-learning training method for artificial intelligence medical image classification
WO2020215984A1 (en) Medical image detection method based on deep learning, and related device
Fernandes et al. Automated methods for the decision support of cervical cancer screening using digital colposcopies
Yu et al. MSCI: A multistate dataset for colposcopy image classification of cervical cancer screening
CN109544518B (en) Method and system for bone maturity assessment
CN112949786A (en) Data classification identification method, device, equipment and readable storage medium
CN114445670B (en) Training method, device and equipment of image processing model and storage medium
CN111932529B (en) Image classification and segmentation method, device and system
US11935213B2 (en) Laparoscopic image smoke removal method based on generative adversarial network
Alawad et al. Machine learning and deep learning techniques for optic disc and cup segmentation–a review
CN113706562B (en) Image segmentation method, device and system and cell segmentation method
CN113781387A (en) Model training method, image processing method, device, equipment and storage medium
CN117036288A (en) Tumor subtype diagnosis method for whole-slide pathological images
CN113822846A (en) Method, apparatus, device and medium for determining region of interest in medical image
CN113822323A (en) Brain scanning image identification processing method, device, equipment and storage medium
CN117975101A (en) Traditional Chinese medicine disease classification method and system based on fusion of tongue image and text information
CN116452812A (en) Camouflage object identification and semantic segmentation method
CN115965785A (en) Image segmentation method, device, equipment, program product and medium
CN117523626A (en) Pseudo RGB-D face recognition method
CN117036658A (en) Image processing method and related equipment
Li et al. Image analysis and diagnosis of skin diseases-a review
Percannella et al. Joint Intensity Classification and Specimen Segmentation on HEp-2 Images: A Deep Learning Approach
Nyemeesha et al. Hybrid Features for the Identification and Categorization of Skin Cancer
Fredrik et al. Convolution Neural Networks for Clinical Image Segmentation with Evolutionary Compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40020804
Country of ref document: HK

GR01 Patent grant