CN110647926A - Medical image stream identification method and device, electronic equipment and storage medium


Info

Publication number
CN110647926A
CN110647926A
Authority
CN
China
Prior art keywords
medical image
sample
target frame
image stream
frame
Prior art date
Legal status
Pending
Application number
CN201910876588.XA
Other languages
Chinese (zh)
Inventor
郭泽豪
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910876588.XA priority Critical patent/CN110647926A/en
Publication of CN110647926A publication Critical patent/CN110647926A/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention disclose a medical image stream identification method and device, an electronic device, and a storage medium. After a medical image stream of an object to be identified is acquired, a locally deployed trained image classification engine can be called directly; a plurality of medical image frames in the medical image stream are classified by the trained image classification engine, the examination type of the object to be identified is determined according to the classification results of the plurality of medical image frames, and the determined examination type and the medical image stream are then sent to a server, so that the server calls an artificial intelligence engine corresponding to the examination type to identify the medical image stream. This scheme not only improves processing efficiency but also simplifies the implementation flow and conditions and improves applicability.

Description

Medical image stream identification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of communication, and in particular to a medical image stream identification method and device, an electronic device, and a storage medium.
Background
With the development of Artificial Intelligence (AI), AI is increasingly applied to medical image recognition. An existing medical image recognition system generally includes an image acquisition device, a video box, and a background device. The image acquisition device acquires a video signal of the object to be identified and transmits it to the video box; the video box extracts image frames from the video signal and transmits them to the background device for identification. Because the background device is provided with AI engines for various examination types, such as a colorectal engine and an upper gastrointestinal engine, and routes image frames of different examination types to the corresponding identification modules for processing, the video box needs to attach the examination type of each image frame when transmitting it to the background device.
Generally, before the video signal is acquired, medical staff must upload the registration information of the object to be recognized to the background device through a Picture Archiving and Communication System (PACS); the video box then transmits an image frame to the background device, which looks up the registration information to determine the examination type. That is, the video box first sends an image frame to the background device to determine the examination type, and only after receiving the examination type returned by the background device can it send the examination type together with the image frames to the background device, so that the corresponding AI engine is called to identify them.
In the process of researching and practicing the prior art, the inventor found that the examination type generally has to be entered manually, so the processing efficiency is low. Moreover, since the examination type has to reach the background device through the PACS, the code of the PACS must be modified to establish communication between the PACS and the background device; for some old PACS systems the code is difficult to modify, so the scheme cannot be implemented on them at all. In short, the prior art is complex to implement and has poor applicability.
Disclosure of Invention
The embodiments of the invention provide a medical image stream identification method and device, an electronic device, and a storage medium, which can improve processing efficiency, simplify the implementation flow and conditions, and improve applicability.
The embodiment of the invention provides a medical image stream identification method, which comprises the following steps:
acquiring a medical image stream of an object to be identified;
calling a trained image classification engine, and classifying a plurality of medical image frames in the medical image stream through the trained image classification engine;
determining the examination type of the object to be identified according to the classification results of the plurality of medical image frames;
and sending the examination type and the medical image stream to a server so that the server calls an AI engine corresponding to the examination type to identify the medical image stream.
Correspondingly, an embodiment of the present invention further provides a medical image stream recognition apparatus, including:
the acquisition unit is used for acquiring a medical image stream of an object to be identified;
the classification unit is used for calling the trained image classification engine and classifying the plurality of medical image frames in the medical image stream through the trained image classification engine;
the determining unit is used for determining the examination type of the object to be identified according to the classification results of the plurality of medical image frames;
and the sending unit is used for sending the examination type and the medical image stream to a server so that the server calls an AI engine corresponding to the examination type to identify the medical image stream.
Optionally, in some embodiments of the present invention, the classification unit includes a calling subunit, an intercepting subunit, an extracting subunit, and a determining subunit, as follows:
the calling subunit is used for calling the trained image classification engine;
the intercepting subunit is configured to intercept a medical image frame that needs to be currently classified from the medical image stream to obtain a target frame;
the extraction subunit is configured to perform feature extraction on the target frame according to a preset feature channel through a trained image classification engine to obtain feature information of the target frame;
the determining subunit is configured to determine the category of the target frame based on the feature information of the target frame, and to trigger the intercepting subunit to perform the operation of intercepting the medical image frame that currently needs to be classified from the medical image stream, until the number of classified medical image frames meets a preset condition.
Optionally, in some embodiments of the present invention, the classification unit further includes a correction subunit, as follows:
the correction subunit is configured to acquire correlations between the feature channels, correct the feature information of the target frame according to the correlations, and obtain corrected feature information of the target frame;
the determining subunit is specifically configured to determine the category of the target frame based on the corrected feature information of the target frame.
Optionally, in some embodiments of the present invention, the trained image classification engine includes a classification network and a correction network; the extraction subunit is specifically configured to perform feature extraction on the target frame according to a preset feature channel through the classification network to obtain feature information of the target frame;
the correcting subunit is specifically configured to correct, according to the correlation, the feature information of the target frame through the correction network, so as to obtain corrected feature information of the target frame.
Optionally, in some embodiments of the present invention, the correcting subunit is specifically configured to determine the numerical distribution among the feature channels according to the feature information of the target frame to obtain global information, generate a corresponding weight for each feature channel according to the global information, and calibrate the feature information of the target frame in the channel dimension based on the weight of each feature channel, to obtain the corrected feature information of the target frame.
Optionally, in some embodiments of the present invention, the correcting subunit is specifically configured to perform global average pooling processing on each feature channel of the feature information of the target frame to obtain global information.
Optionally, in some embodiments of the present invention, the determining unit is specifically configured to calculate a ratio of each category according to the classification result of the plurality of medical image frames, and determine the category with the highest ratio as the examination type of the object to be identified.
Optionally, in some embodiments of the present invention, the determining unit is specifically configured to calculate the ratio of each category according to the classification results of the plurality of medical image frames; if a category whose ratio is higher than a preset threshold exists, the category whose ratio is higher than the preset threshold is determined as the examination type of the object to be identified; if no category's ratio is higher than the preset threshold, the operation of classifying the plurality of medical image frames in the medical image stream by the trained image classification engine is performed again, until the examination type of the object to be recognized is determined.
Optionally, in some embodiments of the present invention, the medical image stream identification apparatus further includes a convergence unit, as follows:
the acquisition unit is also used for acquiring a plurality of medical image stream samples marked with examination types;
the classification unit is further configured to invoke an image classification engine, and classify a plurality of sample frames in the medical image stream sample by the image classification engine;
the determining unit is further configured to predict an examination type of the medical image stream sample according to the classification results of the multiple sample frames;
and the convergence unit is used for converging the image classification engine based on the predicted examination type and the labeled examination type to obtain the trained image classification engine.
Optionally, in some embodiments of the present invention, the classifying unit is specifically configured to intercept multiple sample frames from the medical image stream sample according to a preset policy, perform feature extraction on the sample frames according to a preset feature channel through the image classification engine to obtain feature information of the sample frames, and determine the category of the sample frames based on the feature information of the sample frames.
Optionally, in some embodiments of the present invention, the classifying unit is specifically configured to obtain a correlation between feature channels, correct the feature information of the sample frame according to the correlation to obtain corrected feature information of the sample frame, and determine the category of the sample frame based on the corrected feature information of the sample frame.
Optionally, in some embodiments of the present invention, the medical image stream identification apparatus further includes a preprocessing unit, as follows:
the preprocessing unit is used for preprocessing the sample frame to obtain a preprocessed sample frame, and the preprocessing comprises black edge cutting, scaling, graying and whitening;
the classification unit is specifically configured to perform feature extraction on the preprocessed sample frame according to a preset feature channel through the image classification engine to obtain feature information of the sample frame.
Correspondingly, an embodiment of the invention further provides an electronic device, including a memory and a processor; the memory stores an application program, and the processor is configured to run the application program in the memory to perform the operations in any medical image stream identification method provided by the embodiments of the present invention.
In addition, an embodiment of the present invention provides a storage medium storing a plurality of instructions suitable for being loaded by a processor to perform the steps in any medical image stream identification method provided by the embodiments of the present invention.
After the medical image stream of the object to be identified is collected, a locally deployed trained image classification engine can be called directly; a plurality of medical image frames in the medical image stream are classified by the trained image classification engine, the examination type of the object to be identified is determined according to the classification results of the plurality of medical image frames, and the determined examination type and the medical image stream are then sent to a server so that the server can call the AI engine corresponding to the examination type to identify the medical image stream. Because the medical image frames do not need to be transmitted to the server through the PACS to determine the examination type, but the examination type can be determined directly locally, the code of the PACS does not need to be modified and the method is not limited by the PACS, so the implementation process and conditions are simplified and the applicability is wider. In addition, the examination type is identified automatically by the trained image classification engine, so compared with the existing scheme in which the examination type needs to be entered manually and uploaded to the server, the processing efficiency can be greatly improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of a scene of a medical image stream identification method according to an embodiment of the present invention;
fig. 2 is a flowchart of a medical image stream identification method according to an embodiment of the present invention;
fig. 3 is a flow chart of a medical image stream identification method according to an embodiment of the present invention;
FIG. 4 is an exemplary view of an enteroscope image and a gastroscope image in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an image classification engine according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating a classification network in the image classification engine according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a calibration module in a calibration network in the image classification engine according to an embodiment of the present invention;
fig. 8 is another flowchart of a medical image stream identification method according to an embodiment of the present invention;
fig. 9 is an exemplary diagram of a video box accessing an image classification engine after training in the medical image stream recognition method according to the embodiment of the present invention;
fig. 10 is a schematic structural diagram of a medical image stream identification apparatus provided in an embodiment of the present invention;
fig. 11 is another schematic structural diagram of a medical image stream identification apparatus according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a medical image stream identification method, a medical image stream identification device, electronic equipment and a storage medium, which can be used for identifying a medical image stream based on artificial intelligence.
The medical image stream identification apparatus may be integrated in an electronic device such as a terminal; specifically, the terminal may be a video box, a tablet computer, a notebook computer, or a Personal Computer (PC). The terminal may be placed according to the requirements of the practical application, for example, in a hospital.
For example, taking the medical image stream recognition apparatus integrated in a terminal (placed in a hospital) as an example, referring to fig. 1: after collecting a medical image stream of an object to be recognized, a medical image acquisition device transmits the medical image stream to the terminal. The terminal invokes a trained image classification engine, classifies a plurality of medical image frames in the medical image stream with it, determines the examination type of the object to be recognized according to the classification results of the plurality of medical image frames, and transmits the examination type and the medical image stream to a server (such as a cloud server). The server invokes the AI engine corresponding to the examination type to recognize the medical image stream: if the examination type is a colorectal examination, the server invokes a colorectal engine; if the examination type is an upper gastrointestinal examination, the server invokes an upper gastrointestinal engine, and so on. Thereafter, the server may return the recognition result to the terminal.
The following are detailed below. It should be noted that the description sequence numbers of the following embodiments are not intended to limit the preferred sequences of the embodiments.
The present embodiment will be described from the perspective of a medical image stream recognition apparatus, which may be specifically integrated in an electronic device such as a terminal; the terminal can be a video box, a tablet computer, a notebook computer, or a PC and the like.
A medical image flow identification method, comprising: the method comprises the steps of collecting a medical image stream of an object to be identified, calling a trained image classification engine, classifying a plurality of medical image frames in the medical image stream through the trained image classification engine, determining an examination type of the object to be identified according to a classification result of the plurality of medical image frames, and sending the examination type and the medical image stream to a server so that the server calls an AI engine corresponding to the examination type to identify the medical image stream.
As shown in fig. 2, a specific process of the medical image stream identification method may be as follows:
101. Acquire a medical image stream of an object to be identified.
For example, a medical image stream sent by a medical image acquisition device may be received. The medical image capturing device may include a Magnetic Resonance Imaging (MRI), a Computed Tomography (CT), a colposcope, a gastroscope, or other endoscope devices.
The medical image stream consists of multiple frames of images; for convenience of description, these frames are referred to as medical image frames in the embodiments of the present invention. The medical image stream may be obtained by the medical image acquisition device imaging living body tissue and then provided to the medical image stream recognition apparatus.
The living tissue refers to a component of a living body (an independent individual with a living form that can respond to external stimuli, such as a human, a cat, or a dog), for example, the brain, heart, spleen, stomach, or vagina of a human body, or a body organ of another animal.
It should be noted that, when the medical image acquisition device provides the medical image stream to the medical image stream recognition device, a segment of the medical image stream may be provided to the medical image stream recognition device, or may be provided to the medical image stream recognition device in real time frame by frame, and the specific implementation manner may be determined according to the requirements of the actual application.
102. Call the trained image classification engine, and classify a plurality of medical image frames in the medical image stream through the trained image classification engine.
The method for classifying the plurality of medical image frames in the medical image stream by the trained image classification engine may be various, for example, the method may specifically be as follows:
(1) Intercept the medical image frame that currently needs to be classified from the medical image stream to obtain a target frame.
For example, if the medical image capturing device transmits the medical image stream of the object to be recognized to the medical image stream recognition device frame by frame after capturing the medical image stream, at this time, the medical image stream recognition device may use the currently received medical image frame as the medical image frame that needs to be classified currently, so as to obtain the target frame.
For another example, if the medical image capturing device transmits a segment of the medical image stream (i.e., a plurality of medical image frames) to the medical image stream recognition apparatus after capturing it, the medical image stream recognition apparatus may, according to a preset policy, select a frame at random from the received segment as the target frame, or take the most recently received medical image frame as the target frame, and so on.
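A minimal Python sketch of these two interception strategies (the function name and the `policy` values are illustrative, not terms from the patent):

```python
import random

def pick_target_frame(frames, policy="latest"):
    """Select the medical image frame to classify next (the 'target frame').

    `frames` is the buffered segment of the medical image stream; when frames
    arrive one by one, the buffer simply holds the most recent frame.
    """
    if not frames:
        return None
    if policy == "latest":          # take the most recently received frame
        return frames[-1]
    if policy == "random":          # pick any frame from the received segment
        return random.choice(frames)
    raise ValueError(f"unknown interception policy: {policy}")
```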
(2) Perform feature extraction on the target frame according to a preset feature channel through the trained image classification engine to obtain feature information of the target frame.
The trained image classification engine is obtained by training on a plurality of medical image stream samples labeled with examination types. Its network structure and specific network parameters may be set according to the requirements of the practical application. For example, taking as an example a trained image classification engine that includes a classification network, the step of "performing feature extraction on the target frame according to a preset feature channel through the trained image classification engine to obtain feature information of the target frame" may include:
and performing feature extraction on the target frame according to a preset feature channel through the classification network to obtain feature information of the target frame.
The classification network may be a Convolutional Neural Network (CNN), and its specific network structure and network parameters may also be set according to the requirements of the practical application; for example, an Inception-V4 network or an Inception-ResNet-V2 network (both classification networks) may be used.
Optionally, since different feature channels have different degrees of importance and their extracted features contribute differently to the current task, in order to improve the classification accuracy, the features useful for the current task can be enhanced and the features of little use to the current task suppressed, according to the importance of each feature channel. The importance of each feature channel can be embodied by the correlation between the feature channels; that is, the feature information can be "corrected" according to the correlation between the feature channels.
That is, after the step "extracting the features of the target frame according to the preset feature channel by the trained image classification engine to obtain the feature information of the target frame", the method for recognizing the medical image stream may further include:
and acquiring the correlation among all the characteristic channels, and correcting the characteristic information of the target frame according to the correlation to obtain the corrected characteristic information of the target frame.
Specifically, the "correcting" function may be implemented by embedding a correcting module in the classification network, that is, besides the classification network, the trained image classification engine may further include a correcting network, that is, the step "obtaining the correlation between the feature channels, and correcting the feature information of the target frame according to the correlation to obtain the corrected feature information of the target frame" may include:
and acquiring the correlation among all the characteristic channels, and correcting the characteristic information of the target frame through the correction network according to the correlation to obtain the corrected characteristic information of the target frame.
For example, the value distribution status between the feature channels may be determined according to the feature information of the target frame to obtain global information, and then the feature information of the target frame is corrected by the correction network according to the global information to obtain corrected feature information of the target frame. Specifically, the following may be mentioned:
and generating corresponding weights for the characteristic channels according to the global information, and calibrating the characteristic information of the target frame on the channel dimension based on the weights of the characteristic channels to obtain the corrected characteristic information of the target frame.
The global information may also be referred to as a channel descriptor (Squeeze), which reflects the correlation between the feature channels in the classification network; specifically, it may be obtained by performing global average pooling on each feature channel of the feature information of the target frame. The network structure of the correction network may be determined according to the requirements of the practical application; for example, a Squeeze-and-Excitation network (SENet) may be adopted.
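As an illustration of this squeeze-and-excitation correction, here is a minimal PyTorch sketch assuming 2-D convolutional feature maps; the class name and the reduction ratio of 16 are illustrative choices, not taken from the patent:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation correction module (illustrative).

    Squeeze: global average pooling condenses every feature channel into a
    single value, the 'global information' / channel descriptor.
    Excitation: two fully connected layers turn that descriptor into one
    weight per channel, which recalibrates the features in the channel
    dimension.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # weights in (0, 1), one per feature channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)        # global information per channel
        w = self.excite(w).view(b, c, 1, 1)   # per-channel weights
        return x * w                          # corrected feature information
```

For example, `SEBlock(256)(torch.randn(1, 256, 17, 17))` returns a recalibrated feature map of the same shape.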
(3) Determine the category of the target frame based on the feature information of the target frame.
For example, the classification network may be specifically used to identify the feature information of the target frame, so as to obtain the category of the target frame.
Optionally, if, in step (2), the feature information of the target frame has been corrected, at this time, the step "determining the category of the target frame based on the feature information of the target frame" may specifically be: and determining the category of the target frame based on the corrected characteristic information of the target frame.
(4) Return to the step of intercepting the medical image frame that currently needs to be classified from the medical image stream, i.e., return to step (1), until the number of classified medical image frames meets a preset condition.
The preset condition can be set according to the requirements of practical application. For example, a value such as "10" or "15" may be set; alternatively, it may be set to a ratio, such as "10% of the total frame number of the medical image stream", or the like.
103. Determine the examination type of the object to be identified according to the classification results of the plurality of medical image frames; for example, either of the following methods may be adopted:
(1) The first method:
and calculating the occupation ratio of each category according to the classification results of the plurality of medical image frames, and determining the category with the highest occupation ratio as the inspection type of the object to be identified.
For example, taking 10 medical image frames as an example, if the classification result of 8 medical image frames is "colorectal examination" and the classification result of 2 medical image frames is "upper gastrointestinal tract examination", then at this time, it may be determined that the examination type of the object to be identified is "colorectal examination".
(2) The second method:
Calculate the ratio of each category according to the classification results of the plurality of medical image frames. If a category whose ratio is higher than a preset threshold exists, determine that category as the examination type of the object to be identified; if no category's ratio is higher than the preset threshold, return to the step of classifying the plurality of medical image frames in the medical image stream through the trained image classification engine, until the examination type of the object to be recognized is determined.
The threshold can be set according to the requirements of the practical application. For example, taking a threshold of 80% and 10 medical image frames, if at least 8 (10 × 80% = 8) medical image frames are classified as a certain examination type (e.g., "colorectal examination"), the examination type of the object to be identified may be determined as that examination type. Otherwise, if fewer than 8 frames agree, the step of classifying medical image frames through the image classification engine is performed again to continue classifying subsequent medical image frames, until the number of medical image frames of a certain examination type reaches 8, at which point the examination type of the object to be identified may be determined as that examination type.
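Both methods reduce to computing per-category ratios over the classified frames. A minimal Python sketch (the function and variable names are illustrative):

```python
from collections import Counter

def decide_examination_type(frame_labels, threshold=None):
    """Determine the examination type from per-frame classification results.

    With `threshold=None` this implements the first method (the category with
    the highest ratio wins); with e.g. `threshold=0.8` it implements the
    second method and returns None when no category's ratio has reached the
    threshold yet, so the caller keeps classifying further frames.
    """
    counts = Counter(frame_labels)
    label, hits = counts.most_common(1)[0]
    ratio = hits / len(frame_labels)
    if threshold is None:
        return label                                   # method (1)
    return label if ratio >= threshold else None       # method (2)

# Example from the text: 8 of 10 frames classified as colorectal examination.
labels = ["colorectal"] * 8 + ["upper_gi"] * 2
assert decide_examination_type(labels) == "colorectal"
assert decide_examination_type(labels, threshold=0.8) == "colorectal"
```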
104. Send the examination type and the medical image stream to a server so that the server calls the AI engine corresponding to the examination type to identify the medical image stream.
The AI engine may be configured to identify a type of pathology for a medical image frame in the medical image stream.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, AI is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that machines acquire the capabilities of perception, reasoning, and decision making.
AI technology is a broad field involving both hardware-level and software-level technologies. Basic AI technologies (hardware level) generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. AI software technologies mainly include several directions, such as computer vision, speech processing, natural language processing, and Machine Learning (ML)/deep learning.
Among them, machine learning and deep learning are the core of AI, which is a fundamental approach for enabling computers to have intelligence, and is applied throughout various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and/or inductive learning. Through machine learning and deep learning, the computer can simulate or realize the learning behaviors of human beings so as to acquire new knowledge or skills, further reorganize the existing knowledge structure and continuously improve the performance of the computer.
The AI engine of this embodiment is obtained by performing machine learning and deep learning, with AI techniques, on a large number of medical images labeled with pathology types. For example, the upper gastrointestinal engine is obtained by machine learning and deep learning on a large number of medical images labeled with upper gastrointestinal pathology types, the colorectal engine is obtained by machine learning and deep learning on a large number of medical images labeled with colorectal pathology types, and so on.
It should be noted that the specific types and number of AI engines may be determined according to the pathology types and the requirements of the practical application; for example, one AI engine may be set for each pathology type, or, to improve processing efficiency, several AI engines may be provided for each pathology type. Of course, multiple AI engines serving the same pathology type need to be of the same kind. For example, pathology types such as colorectal cancer need to be determined through colorectal examination, so a "colorectal engine" can be set to identify them; likewise, pathologies of the upper digestive tract need to be determined through upper digestive tract examination, so an "upper digestive tract engine" can be set to identify them, and so on.
For example, taking a colorectal engine and an upper gastrointestinal engine arranged in the server as an example, if the examination type is colorectal examination, the server may call the colorectal engine to identify the medical image stream; if the examination type is upper gastrointestinal examination, the server may call an upper gastrointestinal engine to identify the medical image stream, and so on.
Thereafter, optionally, the server may also return the identification result to the medical image flow identification device, so that the medical image flow identification device provides the identification result to the relevant person, such as the object to be identified or the medical staff.
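To make the overall flow of steps 101 to 104 concrete, the following Python sketch strings the pieces together. It is only an illustration under stated assumptions: `classify_frame` stands in for the trained image classification engine, `send_to_server` for the upload channel, and it reuses the hypothetical `decide_examination_type` helper sketched above; none of these names come from the patent.

```python
def identify_stream(frame_source, classify_frame, send_to_server,
                    num_frames=10, threshold=0.8):
    """Steps 101-104 in one loop (illustrative only).

    frame_source   -- iterable yielding medical image frames (step 101)
    classify_frame -- callable: frame -> examination-type label (step 102)
    send_to_server -- callable receiving (examination_type, frames) so the
                      server can call the matching AI engine (step 104)
    """
    labels, frames = [], []
    for frame in frame_source:
        frames.append(frame)
        labels.append(classify_frame(frame))   # step 102: classify each frame
        if len(labels) >= num_frames:
            # step 103: decide once enough frames have been classified
            exam_type = decide_examination_type(labels, threshold=threshold)
            if exam_type is not None:
                send_to_server(exam_type, frames)   # step 104
                return exam_type
    return None  # stream ended before a confident decision
```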
Optionally, the trained image classification engine may be set in advance by an operation and maintenance person, or may be obtained by self-training of the medical image stream recognition device. Before the step "call the trained image classification engine", the method for recognizing a medical image stream may further include steps S1 to S4, as follows:
and S1, acquiring a plurality of medical image flow samples marked with examination types.
For example, an original medical image stream may be obtained from a database or a network, and then the original medical image stream is labeled with an examination type, so as to obtain a medical image stream sample labeled with the examination type.
For another example, the original medical image stream transmitted by a medical image acquisition device, such as an MRI, colposcope, enteroscope, gastroscope, or other endoscopic device, may be directly received, and then the original medical image stream may be labeled with an examination type, so as to obtain a sample of the medical image stream labeled with the examination type, and so on.
S2. Call an image classification engine, and classify a plurality of sample frames in the medical image stream samples through the image classification engine.
For example, a plurality of sample frames may be captured from the medical image stream sample according to a preset policy, then feature extraction is performed on the sample frames according to a preset feature channel by the image classification engine to obtain feature information of the sample frames, and the category of the sample frames is determined based on the feature information of the sample frames.
Optionally, in view of the fact that the features extracted by different feature channels contribute differently to the current task (such as classification), in order to improve the classification accuracy, the useful features may be enhanced and the features of little use to the current task suppressed, according to the importance of each feature channel. That is, the feature information of the sample frame may be "corrected" (i.e., recalibrated) according to the importance of each feature channel, so that the feature information serves the current task better.
The importance degree of each feature channel can be represented by the correlation among the feature channels, so that the feature information can be corrected according to the correlation among the feature channels. That is, after the step of performing feature extraction on the sample frame according to a preset feature channel by using the image classification engine to obtain feature information of the sample frame, "the medical image stream identification method may further include:
and acquiring the correlation among all the characteristic channels, and correcting the characteristic information of the sample frame according to the correlation to obtain the corrected characteristic information of the sample frame.
Then, at this time, the step "determining the class of the sample frame based on the feature information of the sample frame" may specifically be: the class of the sample frame is determined based on the corrected feature information of the sample frame.
Optionally, there are differences between medical image capturing devices, and the shooting environment may introduce its own differences or interference factors. For example, owing to the endoscope hardware, sample frames intercepted from a medical image stream captured by a medical image capturing device may have black borders at their peripheral edges; and because the illumination intensities of endoscope devices differ between hospitals, images of the same type may still differ in color saturation and intensity. Therefore, to improve classification accuracy, the sample frames may be preprocessed after being intercepted. That is, after the step of "intercepting a plurality of sample frames from the medical image stream sample according to a preset policy", the medical image stream identification method may further include:
and preprocessing the sample frame, such as black edge cutting, scaling, graying, whitening and the like, to obtain a preprocessed sample frame.
Then, the step of performing, by the image classification engine, feature extraction on the sample frame according to a preset feature channel to obtain feature information of the sample frame may specifically be: and performing feature extraction on the preprocessed sample frame according to a preset feature channel through the image classification engine to obtain feature information of the sample frame.
Graying removes the color characteristics of the image frame, because color is of little significance to this binary classification task; whitening normalizes the pixel values of each channel, which speeds up model training while preventing overfitting.
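A minimal sketch of this preprocessing chain using OpenCV and NumPy; the 299 × 299 target size is an assumption matching common Inception-style inputs, and the black-border threshold is illustrative:

```python
import cv2
import numpy as np

def preprocess_sample_frame(frame_bgr, size=(299, 299), black_thresh=10):
    """Black edge cropping, scaling, graying, and whitening (illustrative)."""
    # Graying: drop color, which matters little for the binary task.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # Black edge cropping: keep only the rows/columns with image content.
    mask = gray > black_thresh
    if mask.any():
        r0, r1 = np.where(mask.any(axis=1))[0][[0, -1]]
        c0, c1 = np.where(mask.any(axis=0))[0][[0, -1]]
        gray = gray[r0:r1 + 1, c0:c1 + 1]

    # Scaling to the network input size.
    resized = cv2.resize(gray, size).astype(np.float32)

    # Whitening: zero mean, unit variance; speeds up training, curbs overfitting.
    return (resized - resized.mean()) / (resized.std() + 1e-8)
```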
S3. Predict the examination type of the medical image stream sample according to the classification results of the plurality of sample frames.
For example, the ratio of each category is calculated from the classification results of the plurality of sample frames, and the category with the highest ratio is determined as the examination type of the medical image flow sample.
Alternatively, for example, the occupation ratio of each category may be calculated according to the classification results of the plurality of sample frames, and if there is a category of which the occupation ratio is higher than a preset threshold, the category of which the occupation ratio is higher than the preset threshold is determined as the examination type of the medical image stream sample; and if the category with the occupation ratio higher than the preset threshold value does not exist, returning to execute the step of classifying the plurality of sample frames by the image classification engine until the examination type of the medical image stream sample is determined.
The threshold may be set according to the requirement of the actual application, and the step is similar to step 103 and is not described herein again.
S4. Converge the image classification engine based on the predicted examination type and the labeled examination type to obtain the trained image classification engine.
For example, a preset loss function may be adopted, and the image classification engine is converged according to the predicted examination type and the labeled examination type to obtain the trained image classification engine.
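A minimal PyTorch training-loop sketch; cross-entropy and Adam are assumed stand-ins for the "preset loss function" and optimizer, which the patent does not name:

```python
import torch
import torch.nn as nn

def train_engine(engine, loader, epochs=10, lr=1e-4, device="cpu"):
    """Converge the image classification engine against the labeled types.

    `loader` is assumed to yield (preprocessed sample frames, examination-type
    labels) batches; `engine` is assumed to output per-class logits.
    """
    engine = engine.to(device)
    criterion = nn.CrossEntropyLoss()                 # assumed loss function
    optimizer = torch.optim.Adam(engine.parameters(), lr=lr)
    for _ in range(epochs):
        for frames, exam_types in loader:
            frames, exam_types = frames.to(device), exam_types.to(device)
            optimizer.zero_grad()
            loss = criterion(engine(frames), exam_types)
            loss.backward()       # propagate the prediction-vs-label error
            optimizer.step()      # update the engine's parameters
    return engine
```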
As can be seen from the above, after the medical image stream of the object to be recognized is acquired, the locally deployed trained image classification engine may be invoked directly; the trained image classification engine classifies a plurality of medical image frames in the medical image stream, the examination type of the object to be recognized is determined according to the classification results of the plurality of medical image frames, and the determined examination type and the medical image stream are then sent to the server, so that the server invokes the artificial intelligence engine corresponding to the examination type to recognize the medical image stream. Because the medical image frames do not need to be transmitted to the server through the PACS to determine the examination type, but the examination type can be determined directly locally, the code of the PACS does not need to be modified and the method is not limited by the PACS (it even works without one), which simplifies the implementation process and conditions and gives the scheme wider applicability. In addition, the examination type is identified automatically by the trained image classification engine; compared with the existing scheme in which the examination type must be entered manually and uploaded to the server, the processing efficiency is greatly improved, misoperation caused by excessive manual participation is reduced, and the identification accuracy is improved.
The method according to the preceding embodiment is illustrated in further detail below by way of example.
In this embodiment, the medical image stream recognition apparatus is specifically integrated in a video box as an example.
There may be a plurality of video boxes, placed in the consulting rooms of a hospital and each connected to a medical image acquisition device, from which it can receive the transmitted medical image stream. The trained image classification engine may be installed in a video box, or installed in a hospital system and called by each video box; referring to fig. 3, for convenience of description, this embodiment takes the case where the trained image classification engine is installed in the video box as an example.
The video box may include a plurality of modules, for example, may include an acquisition card, an interface display module (abbreviated as interface display in fig. 3) and an image processing module (abbreviated as image processing in fig. 3), and specifically may be as follows:
the acquisition card is used for receiving a medical image stream or a medical image stream sample transmitted by medical image acquisition equipment, such as endoscope equipment, and transmitting the medical image stream or the medical image stream sample to the interface display module and the image processing module.
The interface display module may be configured to display the acquired medical image stream, and display information and processing results generated by the image processing module in the processing process, such as displaying a classification result and an examination type.
The image processing module can be used to call the trained image classification engine, classify a plurality of medical image frames in the medical image stream acquired by the acquisition card through the trained image classification engine, determine the examination type of the object to be identified according to the classification results of the plurality of medical image frames, and then send the determined examination type and the medical image stream to the server. In addition, during training, it can call the image classification engine, classify a plurality of sample frames in the medical image stream samples collected by the acquisition card through the image classification engine, predict the examination type of the medical image stream sample according to the classification results of the plurality of sample frames, and converge the image classification engine based on the predicted examination type and the labeled examination type to obtain the trained image classification engine. Optionally, the information and processing results generated during processing, such as classification results and examination types, may also be transmitted to the interface display module for the user to view.
In addition, as shown in fig. 3, for convenience of description, this embodiment is described with the medical image acquisition device being an endoscope device (which may include an enteroscope, a gastroscope, and the like) and the examination types including enteroscopy and gastroscopy.
Based on the flow framework shown in fig. 3, the following describes the training of the image classification engine and how to implement the medical image stream recognition method based on the trained image classification engine in detail.
(I) Training an image classification engine.
1. Collecting.
First, the video box can collect a plurality of medical image stream samples labeled with examination types. For example, referring to fig. 3, the acquisition card in the video box may acquire an original medical image stream from a database or a network, or receive the original medical image stream sent by an endoscope device; the original medical image stream is then labeled with an examination type to obtain a medical image stream sample labeled with the examination type. Thereafter, on the one hand, the labeled medical image stream sample can be transmitted to the interface display module so that it can be displayed for the relevant personnel to view; on the other hand, it can be transmitted to the image processing module in the video box for further processing.
The original medical image stream may be labeled by a relevant person, such as a medical care provider, according to a specific examination type, for example, if the original medical image stream is a medical image stream collected during enteroscopy, the original medical image stream is labeled as enteroscopy, and if the original medical image stream is a medical image stream collected during gastroscopy, the original medical image stream is labeled as gastroscopy, and the like.
Optionally, after the medical image stream samples are obtained, part of them may be used as a training set and another part as a verification set to verify the trained image classification engine. For example, 30000 pictures may be used as a training set containing 19647 enteroscopy pictures and 10353 gastroscopy pictures, and 6766 pictures may be used as a verification set containing 3383 enteroscopy pictures and 3383 gastroscopy pictures, and so on.
2. Classifying.
secondly, the video box can call an image classification engine, and classify a plurality of sample frames in the medical image stream sample through the image classification engine.
For example, an image classification engine may be called from a system of a hospital by an image processing module in a video box, then a plurality of sample frames are captured from the medical image stream sample according to a preset policy, then feature extraction is performed on the sample frames according to a preset feature channel through the image classification engine, so as to obtain feature information of the sample frames, and the category of the sample frames is determined based on the feature information of the sample frames.
For example, as shown in fig. 4, if the feature information of the sample frame conforms to the feature of the enteroscopy image, it can be determined that the sample frame is an enteroscopy image, and further, it is determined that the examination type of the sample frame is enteroscopy; for another example, if the feature information of the sample frame matches the features of the gastroscopic image, the sample frame may be determined as the gastroscopic image, and the examination type of the sample frame may be determined as the gastroscopy, and so on.
Optionally, in order to improve the classification accuracy, the extracted feature information of the sample frame may be corrected according to the importance degree of each feature channel, so as to improve the useful features of the current classification task and suppress the less useful features of the current classification task. Specifically, the image classification engine may be implemented by introducing a correction network into the classification network, that is, as shown in fig. 5, the image classification engine may include a classification network and a correction network.
Optionally, the classification network may be implemented by an Inception-V4 network or an Inception-ResNet-V2 network, and the correction network may be implemented by SENet. For convenience of description, this embodiment takes the classification network being an Inception-ResNet-v2 network as an example.
As shown in fig. 5 and fig. 6, the classification network (Inception-ResNet-v2) may include a plurality of classification modules, such as classification module A (5 × Inception-ResNet-A), classification module B (10 × Inception-ResNet-B), and classification module C (5 × Inception-ResNet-C). Of course, other network layers may also be set according to the requirements of the practical application, such as an Input layer, an initial layer (Stem), an Average Pooling layer, a random deactivation layer (Dropout), and an activation layer (Softmax).
Optionally, because the data dimensions input to each network layer may differ, in order to match the dimensions where the network layers connect, a corresponding dimension adjustment layer ("Reduction") may be set after some classification modules; specifically, Reduction-A may be set after classification module A, Reduction-B after classification module B, and so on.
Correspondingly, the correction network may also comprise a plurality of correction modules (e.g., SE Blocks), each corresponding to a classification module of the classification network.
For example, as shown in fig. 5, a correction module A (SE Block A) may be connected to classification module A and is mainly used for correcting the feature information output by classification module A; similarly, correction module B (SE Block B) may be connected to classification module B to correct the feature information output by classification module B, and correction module C (SE Block C) may be connected to classification module C to correct the feature information output by classification module C.
The input layer of the classification network is mainly used for receiving the sample frame (or the target frame), and the initial layer (Stem) is used for performing convolution processing on the sample frame (or the target frame) received by the input layer so as to reduce the number of characteristics of the sample frame (or the target frame) and further reduce the calculation burden of the classification network.
The classification modules can be connected in series, and are mainly used for performing feature extraction on a sample frame (or a target frame) subjected to initial layer processing according to a preset feature channel to obtain feature information of the sample frame (or the target frame), then transmitting the feature information to the correction module, receiving corrected feature information returned by the correction module, and then determining the category of the sample frame based on the corrected feature information. For example, referring to fig. 5, the following may be specifically mentioned:
After receiving a sample frame (or a target frame) output by the initial layer, the classification module A can perform feature extraction on it to obtain feature information A of the sample frame (or the target frame), and then send the feature information A to the correction module A; the correction module A obtains the correlation between the feature channels in the classification module A, corrects the feature information A according to the correlation, and returns the corrected feature information A to the classification module A; the classification module A then transmits the corrected feature information A to the next network layer, namely Reduction-A, which performs dimension reduction processing on the corrected feature information A and passes the processed feature information A on to the classification module B.
Similar to the classification module A, the classification module B performs feature extraction on the information processed by Reduction-A to obtain feature information B, and then sends the feature information B to the correction module B; the correction module B obtains the correlation between the feature channels in the classification module B, corrects the feature information B according to the correlation, and returns the corrected feature information B to the classification module B; the classification module B then transmits the corrected feature information B to the next network layer, namely Reduction-B, which performs dimension reduction on it and passes the result on to the classification module C.
Similarly, after receiving the information processed by Reduction-B, the classification module C performs feature extraction on it to obtain feature information C, and then sends the feature information C to the correction module C; the correction module C obtains the correlation between the feature channels in the classification module C, corrects the feature information C according to the correlation, and returns the corrected feature information C to the classification module C, which passes it on to the next network layer. The average pooling layer performs average pooling on the corrected feature information C, after which the random deactivation layer (Dropout) temporarily discards some features with a certain probability to prevent overfitting, thereby improving the classification effect; then, the information after the random deactivation processing may be further processed by the activation function layer (Softmax maps the information into values in (0,1) that sum to 1, and may therefore be interpreted as probabilities) to obtain a classification result, for example, the probability that the sample frame (or the target frame) belongs to a certain examination type.
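To make the above data flow concrete, the following is a minimal PyTorch-style sketch of the chained pipeline (classification module → correction module → reduction layer, repeated, then pooling, random deactivation, and Softmax). All stage classes here are hypothetical stand-ins (nn.Identity placeholders) for the Inception-ResNet and SENet blocks; only the wiring reflects the description above, and a sketch of the correction module itself is given after the Scale formulas below.

import torch
import torch.nn as nn

class CorrectedClassificationNet(nn.Module):
    """Skeleton of the fig. 5 pipeline; every stage is a placeholder."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.stem = nn.Identity()                                # initial layer (Stem)
        self.block_a, self.se_a = nn.Identity(), nn.Identity()   # 5 x Inception-ResNet-A + SENet-A
        self.reduction_a = nn.Identity()                         # Reduction-A
        self.block_b, self.se_b = nn.Identity(), nn.Identity()   # 10 x Inception-ResNet-B + SENet-B
        self.reduction_b = nn.Identity()                         # Reduction-B
        self.block_c, self.se_c = nn.Identity(), nn.Identity()   # 5 x Inception-ResNet-C + SENet-C
        self.pool = nn.AdaptiveAvgPool2d(1)                      # average pooling layer
        self.dropout = nn.Dropout(p=0.2)                         # random deactivation layer
        self.fc = nn.LazyLinear(num_classes)                     # feeds the Softmax layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.stem(x)
        x = self.reduction_a(self.se_a(self.block_a(x)))         # A -> correct -> reduce
        x = self.reduction_b(self.se_b(self.block_b(x)))         # B -> correct -> reduce
        x = self.se_c(self.block_c(x))                           # C -> correct
        x = self.dropout(self.pool(x).flatten(1))
        return torch.softmax(self.fc(x), dim=1)                  # per-class probabilities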
Thereafter, the classification result with the probability greater than the set value can be determined as the inspection type of the sample frame (or the target frame).
For example, if the set value is 55% and the obtained classification result is "enteroscopy probability is 80%", it may be determined that the examination type of the sample frame (or the target frame) is "enteroscopy", and so on.
It should be noted that the classification network may be a multi-class network or a binary (two-class) network; in the binary case, when it is determined that the current classification result does not belong to one inspection type, it may be determined that the inspection type of the sample frame (or the target frame) is the other inspection type.
For example, taking the case where the examination types include "enteroscopy" and "gastroscopy" and the set value is 55%: if the current classification result is "probability of enteroscopy is 40%", it may be determined that the examination type of the sample frame (or target frame) is not "enteroscopy", and it can therefore be concluded that the examination type of the sample frame (or target frame) is the other examination type, i.e., "gastroscopy".
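As a quick illustration of this decision rule, the following is a hedged Python sketch (the two class names and the 55% set value are just the examples above, not fixed parameters of the scheme):

def decide_examination_type(p_enteroscopy: float, set_value: float = 0.55) -> str:
    """Binary decision: if the enteroscopy probability does not exceed the
    set value, the frame is attributed to the other examination type."""
    return "enteroscopy" if p_enteroscopy > set_value else "gastroscopy"

assert decide_examination_type(0.80) == "enteroscopy"   # 80% > 55%
assert decide_examination_type(0.40) == "gastroscopy"   # 40% <= 55%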
In addition, it should be noted that the structure and specific parameters of the correction module in the correction network may be determined according to the requirements of the actual application; for example, the correction module may include a compression (Squeeze) part, an excitation (Excitation) part, and a weighting (Scale) part. The Squeeze part can perform global average pooling on each feature channel of the received feature information to obtain global information, the Excitation part then generates a corresponding weight for each feature channel according to the global information, and the Scale part calibrates the received feature information in the channel dimension based on the weights of the feature channels to obtain corrected feature information.
For example, take the feature information received by the correction module to be X: if X includes c1 feature maps of size w × h, the correction module can convert X into feature information U, which includes c2 feature maps of size w × h, where c1 and c2 are the numbers of channels; as shown in fig. 7, the formula is:

F_tr(·, θ): X → U,  X ∈ R^(w×h×c1),  U ∈ R^(w×h×c2)
For example, the feature information U may be obtained by performing convolution processing on the feature information X, as follows:

U_c = V_c ∗ X = Σ_{s=1..c1} V_c^s ∗ X^s

where U_c represents the output corresponding to the c-th convolution kernel, V_c represents the c-th convolution kernel, V_c^s represents the s-th channel of the c-th convolution kernel, and X^s represents the s-th input channel.
After obtaining the feature information U, the Squeeze part of the correction module may perform global average pooling on each feature channel of U to obtain global information, i.e., convert the w × h × c2 input into a 1 × 1 × c2 output; as shown in fig. 7, the formula is:

F_sq(U_c) = (1 / (w × h)) Σ_{i=1..w} Σ_{j=1..h} U_c(i, j)

where F_sq(U_c) is the global information of the c-th channel, c2 is the number of channels of the feature information U, and w × h is the size of the feature map in each channel.
Then, the Excitation part of the correction module can perform full-connection operations and activation-function processing on F_sq(U_c), so as to generate a corresponding weight for each feature channel; as shown in fig. 7, the formula is:

F_ex(·) = F_ex(F_sq, K) = σ(K2 · δ(K1 · F_sq))

where K1 and K2 are full-connection weight matrices, K1 being of dimension (c2/r) × c2 and K2 of dimension c2 × (c2/r), and r is a scaling parameter; δ(K1 · F_sq) multiplies F_sq by K1 (i.e., performs a full-connection operation) and passes the result through a ReLU layer δ, and σ(K2 · δ(K1 · F_sq)) multiplies δ(K1 · F_sq) by K2 (i.e., performs another full-connection operation) and applies the activation function σ.
The specific value of the scaling parameter r may be determined according to the requirement of the actual application, for example, r may be 16, and the like. The purpose of this parameter is to reduce the number of channels and thus the amount of computation.
Since F_sq is of dimension 1 × 1 × c2, K1 · F_sq yields a result of dimension 1 × 1 × (c2/r); this then passes through a ReLU layer, whose output dimension is unchanged. The output of the ReLU layer is multiplied by K2, whose dimension is c2 × (c2/r), so that after this multiplication the output dimension is restored to 1 × 1 × c2; finally, processing by an activation function, such as a sigmoid function, yields S, which reflects the weight of each of the c2 feature channels of the feature information U.
Thereafter, the Scale part of the correction module can recalibrate the feature information based on the weight S_c of each feature channel to obtain the corrected feature information; as shown in fig. 7, the formula is:

F_scale(·) = F_scale(U_c, S_c) = U_c · S_c

where F_scale(U_c, S_c) is the corrected feature information, U_c is the feature map of the feature information U in the c-th channel, and S_c is the weight of the c-th channel.
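Taken together, the Squeeze, Excitation and Scale parts amount to a standard SE block. The following is a minimal PyTorch sketch of such a correction module implementing the formulas above (the class name is illustrative; the channel count c2 and scaling parameter r are the only configuration):

import torch
import torch.nn as nn

class CorrectionModule(nn.Module):
    """SE-style correction: Squeeze -> Excitation -> Scale."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)                   # F_sq: global average pooling
        self.excitation = nn.Sequential(                         # F_ex: K1 -> ReLU -> K2 -> sigmoid
            nn.Linear(channels, channels // r, bias=False),      # K1: c2 -> c2/r
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels, bias=False),      # K2: c2/r -> c2
            nn.Sigmoid(),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = u.shape
        s = self.squeeze(u).view(n, c)                           # global information, 1 x 1 x c2
        s = self.excitation(s).view(n, c, 1, 1)                  # per-channel weights S
        return u * s                                             # F_scale: channel-wise recalibration

For example, se = CorrectionModule(channels=256) can recalibrate a batch of 256-channel feature maps via se(torch.randn(2, 256, 35, 35)).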
3. Predicting the examination type of the medical image stream sample;
Furthermore, the video box may calculate the ratio of each category according to the classification results of the plurality of sample frames, and determine the category with the highest ratio as the examination type of the medical image stream sample, thereby obtaining the predicted examination type of the medical image stream sample.
For example, taking 10 sample frames as an example, if the classification result of 7 sample frames is "enteroscopy", then at this time, the examination type of the medical image stream sample can be determined to be "enteroscopy".
For another example, taking 15 sample frames as an example, if the classification result of 10 sample frames is "gastroscopy", then the type of examination of the medical image stream sample may be determined as "gastroscopy", and so on.
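A hedged sketch of this majority-vote step in Python (collections.Counter is simply one convenient way to compute the ratios; the frame counts mirror the examples above):

from collections import Counter

def predict_examination_type(frame_categories: list) -> str:
    """Return the category with the highest ratio among the classified sample frames."""
    counts = Counter(frame_categories)          # e.g. {"enteroscopy": 7, "gastroscopy": 3}
    return counts.most_common(1)[0][0]          # category with the highest ratio

assert predict_examination_type(["enteroscopy"] * 7 + ["gastroscopy"] * 3) == "enteroscopy"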
4. Converging;
finally, the video box may adopt a preset loss function, and converge the image classification engine according to the predicted inspection type and the labeled inspection type, for example, may specifically converge the classification network in the image classification engine, so as to obtain the trained image classification engine.
The loss function may be determined according to the requirement of the actual application, and is not described herein.
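For instance, with cross-entropy as the preset loss function, the convergence step could look like the following sketch (the optimizer choice, learning rate and integer label encoding are assumptions for illustration, not mandated by the scheme; engine is assumed to output raw class scores):

import torch
import torch.nn as nn

def converge(engine: nn.Module, sample_frames: torch.Tensor,
             labeled_types: torch.Tensor, steps: int = 100) -> nn.Module:
    """One possible convergence loop: predicted vs. labeled examination types."""
    criterion = nn.CrossEntropyLoss()                   # preset loss function
    optimizer = torch.optim.Adam(engine.parameters(), lr=1e-4)
    for _ in range(steps):
        optimizer.zero_grad()
        scores = engine(sample_frames)                  # predicted class scores
        loss = criterion(scores, labeled_types)         # compare with the labeled types
        loss.backward()                                 # back-propagate the error
        optimizer.step()                                # update the classification network
    return engine                                       # the trained image classification engine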
And secondly, based on the trained image classification engine, the medical image stream can be identified.
As shown in fig. 5 and 8, a medical image stream identification method may specifically include the following processes:
201. the endoscope device collects medical image streams of the object to be identified and sends the collected medical image streams to the video box.
For example, a physician at each physician workstation may activate an endoscopic device to examine an object to be identified, such as a patient, such as by activating an enteroscope or gastroscope that extends into the patient's body and transmits a stream of acquired medical images to a video cassette in real time during the examination.
Each endoscope apparatus may be configured with one video box, or, optionally, a plurality of endoscope apparatuses may share one video box. The connection between the endoscope apparatus and the video box may be wired or wireless; for example, connection and communication may be performed through a wireless network or Bluetooth, which is not described herein.
202. The video box receives the medical image stream and invokes the trained image classification engine and executes step 203.
For example, the video box may receive the medical image stream sent by the endoscope device through a capture card; then, on the one hand, the received medical image stream is displayed on the screen through the interface display module for relevant personnel, such as medical staff, to view in real time; on the other hand, the trained image classification engine is called through the image processing module.
Optionally, in order to reduce the communication traffic between the video box and the endoscope apparatus, the endoscope apparatus may be configured to report a medical image frame or a segment of the medical image stream to the video box at intervals, or the video box may be configured to collect a medical image frame or a segment of the medical image stream from the endoscope apparatus at intervals.
For convenience of description, in this embodiment, the video box collecting one medical image frame from the endoscope apparatus every 20 ms is taken as an example.
203. And the video box intercepts the current medical image frame to be classified from the medical image stream to obtain a target frame.
For example, continuing the example in which the video box collects one medical image frame every 20 ms, the video box may take the currently collected medical image frame, such as F1 (Frame 1), as the target frame, and then execute step 204.
For another example, once the category of F1 (Frame 1) has been determined, the medical image frame currently needing to be classified is updated to F2 (Frame 2); F2 may then be used as the target frame, step 204 is executed, and so on.
204. And the video box extracts the features of the target frame according to a preset feature channel through a classification network in the trained image classification engine to obtain the feature information of the target frame.
For example, referring to fig. 9, taking the target frame as F1 as an example, at this time, the video box may perform feature extraction on F1 according to a preset feature channel through the classification network in the trained image classification engine, so as to obtain feature information of F1.
For another example, if the target frame is F2, at this time, the video box may perform feature extraction on F2 according to a preset feature channel through a classification network in the trained image classification engine to obtain feature information of F2, and so on.
205. And the video box acquires the correlation among all the characteristic channels in the classification network, and corrects the characteristic information of the target frame through the correction network according to the correlation to obtain the corrected characteristic information of the target frame.
For example, the video box may determine the value distribution status among the feature channels according to the feature information of the target frame to obtain global information, then generate a corresponding weight for each feature channel through the correction network in the trained image classification engine according to the global information, and calibrate the feature information of the target frame in the channel dimension through the correction network based on the weights of the feature channels to obtain the corrected feature information of the target frame; for the specific correction method, reference may be made to the related description of the training process of the image classification engine, and details are not repeated herein.
206. The video box determines the category of the target frame based on the corrected feature information of the target frame.
For example, the video box may predict, through the classification network, the probability that the target frame belongs to an enteroscopy image (or a gastroscopy image) according to the corrected feature information of the target frame; if the probability is greater than a set value, the target frame is determined to be an enteroscopy image, and the category of the target frame is accordingly determined to be "enteroscopy".
For example, if the set value is 60% and the probability that the target frame F1 belongs to an enteroscopy image is 70%, F1 may be determined to be an enteroscopy image, and its category is determined to be "enteroscopy".
207. The video box determines whether the number of the classified medical image frames meets a preset condition, if so, step 208 is executed, and if not, step 203 is executed again, namely, the step of capturing the medical image frames needing to be classified currently from the medical image stream is executed again.
The preset condition can be set according to the requirements of practical application. For example, a value such as "10" or "15" may be set; alternatively, it may be set to a ratio, such as "10% of the total frame number of the medical image stream", or the like.
For example, if the preset condition is "equal to 10" and the current target frame is F1, the number of currently classified medical image frames is 1, so the process returns to step 203, i.e., the target frame is updated to the next medical image frame, such as F2, and steps 204 to 207 are repeated until the number of classified medical image frames is 10; for example, referring to fig. 9, step 208 is executed only after the categories of F1 to F10 have all been determined.
208. And the video box calculates the ratio of each category according to the classification result of the plurality of medical image frames, and determines the category with the highest ratio as the inspection type of the object to be identified.
For example, taking 10 medical image frames as an example, if the classification result of 8 medical image frames is "enteroscopy" and the classification result of 2 medical image frames is "gastroscopy", the ratio of "enteroscopy" is the highest, so that at this time, the examination type of the object to be identified can be determined to be "enteroscopy".
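Steps 203 to 208 can be summarized by the following hedged Python sketch (classify_frame stands in for the engine call of steps 204 to 206, and the preset condition of 10 frames follows the example above):

from collections import Counter

def identify_examination_type(frames, classify_frame, preset_count: int = 10) -> str:
    """Classify target frames one by one until the preset condition is met,
    then return the category with the highest ratio (steps 203 to 208)."""
    results = []
    for target_frame in frames:                         # step 203: intercept the target frame
        results.append(classify_frame(target_frame))    # steps 204-206: classify it
        if len(results) >= preset_count:                # step 207: preset condition met?
            break
    return Counter(results).most_common(1)[0][0]        # step 208: highest-ratio category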
209. The video box sends the examination type and the medical image stream to the server.
For example, the video box acquires current network information, establishes a connection with the server according to the acquired network information, and transmits the examination type and the medical image stream to the server through the connection.
For example, the examination type and the medical image frames after the target frame may be transmitted to the server, for example, if the current target frame is F10, at this time, the examination type and the medical image frames after F10, such as F11, F12, and F13, etc., may be transmitted to the server.
For another example, if the target frame is already stored when the target frame is classified, the examination type and all the received medical image frames may be sent to the server; for example, if the current target frame is F10 and F1 to F10 have been saved before F1 to F10 were classified, the examination type may be sent to the server together with F1 to F10 and the medical image frames after F10, such as F11, F12 and F13.
Optionally, in order to ensure information security, reduce the amount of transmitted data, and improve transmission efficiency, a compression password may be agreed upon between the video box and the server, so that the video box can compress and encrypt the examination type and the medical image stream according to the compression password and, after obtaining the compressed ciphertext, send it to the server.
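A hedged sketch of this compress-then-encrypt step in Python, assuming zlib for compression and the Fernet symmetric cipher from the third-party cryptography package as the agreed key (both are illustrative choices; the patent does not fix a particular algorithm):

import zlib
from cryptography.fernet import Fernet  # pip install cryptography

def pack_for_server(examination_type: str, image_stream: bytes, key: bytes) -> bytes:
    """Compress and then encrypt the examination type plus the medical image stream."""
    payload = examination_type.encode() + b"\x00" + image_stream
    return Fernet(key).encrypt(zlib.compress(payload))          # compressed ciphertext

def unpack_on_server(ciphertext: bytes, key: bytes) -> tuple:
    """Server side: decrypt and decompress, then split the two fields."""
    payload = zlib.decompress(Fernet(key).decrypt(ciphertext))
    examination_type, image_stream = payload.split(b"\x00", 1)
    return examination_type.decode(), image_stream

key = Fernet.generate_key()                                     # the agreed password/key
blob = pack_for_server("enteroscopy", b"...frames...", key)
assert unpack_on_server(blob, key) == ("enteroscopy", b"...frames...")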
It should be noted that the server may be a single entity or a server cluster, and is not limited herein.
210. And calling the AI engine corresponding to the inspection type by the server to identify the medical image stream to obtain an identification result.
For example, if the examination type is colorectal examination, the server calls a colorectal engine to identify the medical image stream; if the examination type is upper gastrointestinal examination, the server calls an upper gastrointestinal engine to identify the medical image stream, and the like.
Optionally, if a compressed password is agreed between the video box and the server, and the server receives the compressed ciphertext, the server may decrypt and decompress the received compressed ciphertext according to the compressed password to obtain an inspection type and a medical image stream, and then call an AI engine corresponding to the inspection type to identify the medical image stream to obtain an identification result.
211. And the server sends the identification result to the video box.
212. After the video box receives the identification result, the identification result is displayed on a screen through an interface display module so that relevant personnel such as medical personnel can check the identification result.
It should be noted that, in this embodiment, the trained image classification engine being installed in the video box is only taken as an example; it should be understood that the trained image classification engine may also be installed in other devices besides the video box, such as a system in a hospital. If it is installed in a device other than the video box, the target frames may be sent to the trained image classification engine one by one for classification; optionally, in order to save communication traffic, reduce signaling interactions, and improve processing efficiency, a preset number of medical image frames, for example 10 medical image frames, may instead be sent to the trained image classification engine together, and the trained image classification engine determines the inspection type of the object to be identified according to the classification results of these medical image frames and returns it to the video box in one pass, which is not described herein again.
In addition, it should be further noted that, in the present embodiment, only the medical image capturing device, specifically, the endoscope device, is taken as an example for description, it should be understood that the medical image capturing device may also be other devices, such as MRI, CT, colposcope, or the like, and the examination type also corresponds to the specific medical image capturing device, which is similar to that in the present embodiment and is not described herein again.
As can be seen from the above, in this embodiment, the image classification engine may be trained by collecting a plurality of medical image stream samples labeled with inspection types, to obtain a trained image classification engine. Then, after the medical image stream of the object to be recognized is collected, the trained image classification engine may be directly called to classify a plurality of medical image frames in the medical image stream; the inspection type of the object to be recognized is determined according to the classification results of the plurality of medical image frames, and the determined inspection type and the medical image stream are then sent to the server, so that the server calls the artificial intelligence engine corresponding to the inspection type to recognize the medical image stream. According to this scheme, the medical image frames do not need to be transmitted to the server through the PACS to determine the examination type; instead, the examination type can be determined locally, so the code of the PACS does not need to be modified, the implementation process can be simplified, the limitations of the PACS are avoided, and the applicability is wide. In addition, the inspection type can be automatically identified by the trained image classification engine, so that, compared with the existing scheme in which the inspection type needs to be manually input and uploaded to the server, the processing efficiency can be greatly improved; meanwhile, misoperation caused by excessive manual participation can be reduced, and the identification accuracy improved.
In addition, because the scheme introduces a correction network to correct the characteristic information when classifying based on the characteristic information so as to promote the characteristics useful for the current task and inhibit the characteristics with little use for the current task, the classification accuracy can be improved, and the performance of the whole system is improved.
Accordingly, in order to better implement the above method, an embodiment of the present invention further provides a medical image stream recognition apparatus, which may be specifically integrated in an electronic device, such as a terminal, where the terminal may be a video box, a tablet computer, a notebook computer, or a PC.
For example, as shown in fig. 10, the medical image flow identification apparatus may include an acquisition unit 301, a classification unit 302, a determination unit 303, and a transmission unit 304, as follows:
(1) An acquisition unit 301;
an acquisition unit 301 for acquiring a medical image stream of the object to be identified.
For example, the acquisition unit 301 may be specifically configured to receive a medical image stream transmitted by a medical image acquisition device.
It should be noted that, when the medical image acquisition device provides the medical image stream to the acquisition unit 301, a segment of the medical image stream may be provided to the acquisition unit 301, or may be provided to the acquisition unit 301 in real time frame by frame, and the specific implementation may be determined according to the requirements of the actual application, which is not described herein again.
(2) A classification unit 302;
the classifying unit 302 is configured to invoke a trained image classification engine, and classify a plurality of medical image frames in the medical image stream by the trained image classification engine.
For example, the classification unit 302 may include a call subunit, an intercept subunit, an extract subunit, and a determine subunit, as follows:
the calling subunit is used for calling the trained image classification engine.
The intercepting subunit is configured to intercept a medical image frame that needs to be classified currently from the medical image stream to obtain a target frame.
The extraction subunit is configured to perform feature extraction on the target frame according to a preset feature channel through the trained image classification engine, so as to obtain feature information of the target frame.
The determining subunit is configured to determine the category of the target frame based on the feature information of the target frame, and trigger the capturing subunit to perform an operation of capturing the medical image frame currently required to be classified from the medical image stream until the number of the classified medical image frames meets a preset condition.
The trained image classification engine is formed by training a plurality of medical image stream samples marked with examination types. The network structure and specific network parameters of the trained image classification engine may be set according to the requirements of practical applications, for example, taking the case where the trained image classification engine includes a classification network, at this time:
the extracting subunit is specifically configured to perform feature extraction on the target frame according to a preset feature channel through the classification network, so as to obtain feature information of the target frame.
The classification network may be an inclusion-V4 network or an inclusion-ResNet-V2 network, which may be referred to in the foregoing embodiments and will not be described herein again.
Optionally, since the importance degrees of different feature channels are different and the extracted features thus contribute differently to the current task, in order to improve the classification accuracy, features useful for the current task can be promoted according to the importance degree of each feature channel, and features of little use to the current task suppressed. Since the importance degree of each feature channel can be embodied by the correlation among the feature channels, the feature information can be corrected according to the correlation among the feature channels; that is, optionally, the classification unit 302 may further include a correction subunit, as follows:
and the correcting subunit is used for acquiring the correlation among the characteristic channels, and correcting the characteristic information of the target frame according to the correlation to obtain the corrected characteristic information of the target frame.
Then, at this time, the determining subunit may specifically determine the category of the target frame based on the corrected feature information of the target frame.
Specifically, the "correct" function may be implemented by embedding a correction module in the classification network, that is, in addition to the classification network, the trained image classification engine may further include a correction network, that is:
the correcting subunit is specifically configured to correct the feature information of the target frame through the correcting network according to the correlation, so as to obtain corrected feature information of the target frame.
For example, the correcting subunit may be specifically configured to determine a numerical distribution condition between each feature channel according to the feature information of the target frame, to obtain global information, generate a corresponding weight for each feature channel according to the global information, and calibrate the feature information of the target frame in a channel dimension based on the weight of each feature channel, to obtain corrected feature information of the target frame.
The value distribution status among the feature channels can be obtained by performing global Average Pooling (Average Pooling) processing on each feature channel of the feature information of the target frame, that is:
the correction subunit, a specific section, may be configured to perform global average pooling processing on each feature channel of the feature information of the target frame to obtain global information.
(3) A determining unit 303;
a determining unit 303, configured to determine an examination type of the object to be identified according to the classification results of the plurality of medical image frames;
for example, the determining unit 303 may be specifically configured to calculate a ratio of each category according to the classification results of the plurality of medical image frames, and determine the category with the highest ratio as the examination type of the object to be identified.
Alternatively, for another example, the determining unit 303 may be specifically configured to calculate the ratio of each category according to the classification results of the plurality of medical image frames; if a category with a ratio higher than a preset threshold exists, determine that category as the checking type of the object to be identified; and if no category with a ratio higher than the preset threshold exists, return to execute the operation of classifying the plurality of medical image frames in the medical image stream by the trained image classification engine until the inspection type of the object to be recognized is determined.
The threshold may be set according to the requirement of practical application, which is not described herein.
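A hedged sketch of this threshold-based variant (the check-every-10-frames cadence and the classify_frame helper are hypothetical; the loop keeps classifying further frames until some category's ratio clears the threshold):

from collections import Counter

def identify_with_threshold(frames, classify_frame, threshold: float = 0.8,
                            check_every: int = 10) -> str:
    """Keep classifying frames until one category's ratio exceeds the threshold."""
    results = []
    for frame in frames:
        results.append(classify_frame(frame))
        if len(results) % check_every:                  # re-check once per batch of frames
            continue
        category, count = Counter(results).most_common(1)[0]
        if count / len(results) > threshold:            # ratio higher than the preset threshold
            return category
    raise RuntimeError("stream ended before any category cleared the threshold")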
(4) A sending unit 304;
a sending unit 304, configured to send the examination type and the medical image stream to a server, so that the server invokes an AI engine corresponding to the examination type to identify the medical image stream.
For example, if the examination type is colorectal examination, the server may call the colorectal engine to identify the medical image stream; if the examination type is upper gastrointestinal examination, the server may call an upper gastrointestinal engine to identify the medical image stream, and so on.
Thereafter, the server may also return the identification result to the medical image flow identification device, so that the medical image flow identification device provides the identification result to the relevant person, such as the object to be identified or the medical staff. Namely, the medical image flow identification device may further include a receiving unit, as follows:
and the receiving unit is used for receiving the identification result sent by the server and displaying the identification result on a screen.
Optionally, the sending unit 304 may also be configured to send the identification result to a specified user according to a preset policy, such as to a mobile phone of a specified doctor or patient, and so on.
Optionally, the trained image classification engine may be set in advance by an operation and maintenance person, or may be obtained by self-training of the medical image stream recognition device. As shown in fig. 11, the medical image stream recognition apparatus further includes a convergence unit 305, as follows:
the acquiring unit 301 may be further configured to acquire a plurality of medical image stream samples labeled with examination types.
The classification unit 302 may also be configured to invoke an image classification engine, and classify a plurality of sample frames in the medical image stream sample by the image classification engine.
For example, the classifying unit 302 may be specifically configured to intercept multiple sample frames from the medical image stream sample according to a preset policy, perform feature extraction on the sample frames according to a preset feature channel through the image classification engine to obtain feature information of the sample frames, and determine the category of the sample frames based on the feature information of the sample frames, which may be specifically referred to in the foregoing embodiments and is not described herein again.
The determining unit 303 is further configured to predict an examination type of the medical image stream sample according to the classification result of the plurality of sample frames.
The convergence unit 305 may be configured to converge the image classification engine based on the predicted inspection type and the labeled inspection type to obtain a trained image classification engine.
Optionally, in view of the fact that the features extracted by different feature channels have different effects on the current task (such as classification), in order to improve the accuracy of classification, useful features may be promoted according to the importance degree of each feature channel, and features of little use to the current task suppressed. That is, the feature information of the sample frame can be "corrected" according to the importance of each feature channel, that is:
the classifying unit 302 may be specifically configured to obtain a correlation between each feature channel, correct the feature information of the sample frame according to the correlation to obtain corrected feature information of the sample frame, and determine the category of the sample frame based on the corrected feature information of the sample frame.
Optionally, because there are differences between different medical image capturing devices and there may be some differences or interference factors in the shooting environment, in order to improve the accuracy of classification, after a sample frame is intercepted, the sample frame may be further preprocessed, that is, as shown in fig. 11, the medical image stream recognition apparatus may further include a preprocessing unit 306, as follows:
the preprocessing unit 306 may be configured to preprocess the sample frame to obtain a preprocessed sample frame; the preprocessing comprises the processing of black edge cutting, scaling, graying, whitening and the like.
At this time, the classification unit 302 may be specifically configured to perform feature extraction on the preprocessed sample frame according to a preset feature channel by using the image classification engine, so as to obtain feature information of the sample frame.
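A hedged sketch of such preprocessing with OpenCV and NumPy (the black-pixel threshold of 10 and the 299 × 299 target size are illustrative assumptions, not values fixed by the patent):

import cv2
import numpy as np

def preprocess_sample_frame(frame: np.ndarray, size: int = 299) -> np.ndarray:
    """Black-edge cutting, scaling, graying and whitening of one sample frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # graying
    ys, xs = np.where(gray > 10)                        # locate non-black content
    if ys.size:                                         # cut black edges if content found
        gray = gray[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    scaled = cv2.resize(gray, (size, size)).astype(np.float32)   # scaling
    return (scaled - scaled.mean()) / (scaled.std() + 1e-8)      # whitening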
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, after the collecting unit 301 of the medical image stream recognition apparatus of this embodiment collects the medical image stream of the object to be recognized, the classifying unit 302 may directly call the local trained image classification engine and classify the multiple medical image frames in the medical image stream through it; the determining unit 303 determines the inspection type of the object to be recognized according to the classification results of the multiple medical image frames, and the sending unit 304 sends the determined inspection type and the medical image stream to the server, so that the server calls the artificial intelligence engine corresponding to the inspection type to recognize the medical image stream. Because the medical image frames do not need to be transmitted to the server through the PACS to determine the examination type, but the examination type can be determined directly locally, the code of the PACS does not need to be modified and the method is not limited by the PACS, so the realization process and conditions can be simplified and the applicability is wider. In addition, the inspection type can be automatically identified by the trained image classification engine, so that, compared with the existing scheme in which the inspection type needs to be manually input and uploaded to the server, the processing efficiency can be greatly improved; meanwhile, misoperation caused by excessive manual participation can be reduced, and the identification accuracy improved.
In addition, an embodiment of the present invention further provides an electronic device, as shown in fig. 12, which shows a schematic structural diagram of the electronic device according to the embodiment of the present invention, specifically:
the electronic device may include components such as a processor 401 of one or more processing cores, memory 402 of one or more computer-readable storage media, a power supply 403, and an input unit 404. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 12 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components. Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the whole electronic device by various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 402 and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by operating the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 access to the memory 402.
The electronic device further comprises a power supply 403 for supplying power to the various components, and preferably, the power supply 403 is logically connected to the processor 401 through a power management system, so that functions of managing charging, discharging, and power consumption are realized through the power management system. The power supply 403 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The electronic device may further include an input unit 404, and the input unit 404 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable file corresponding to the process of one or more application programs into the memory 402 according to the following instructions, and the processor 401 runs the application program stored in the memory 402, thereby implementing various functions as follows:
the method comprises the steps of collecting a medical image stream of an object to be identified, calling a trained image classification engine, classifying a plurality of medical image frames in the medical image stream through the trained image classification engine, determining an examination type of the object to be identified according to a classification result of the plurality of medical image frames, and sending the examination type and the medical image stream to a server so that the server calls an AI engine corresponding to the examination type to identify the medical image stream.
The trained image classification engine can be trained by a plurality of medical image stream samples marked with examination types.
Optionally, the trained image classification engine may be set in advance by an operation and maintenance person, or may be obtained by self-training of the electronic device, that is, the processor 401 may also run an application program stored in the memory 402, so as to implement the following functions:
the method comprises the steps of collecting a plurality of medical image stream samples marked with inspection types, calling an image classification engine, classifying a plurality of sample frames in the medical image stream samples through the image classification engine, predicting the inspection types of the medical image stream samples according to the classification results of the plurality of sample frames, converging the image classification engine based on the predicted inspection types and the marked inspection types, and obtaining the trained image classification engine.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
As can be seen from the above, after the electronic device of this embodiment collects the medical image stream of the object to be recognized, it may directly invoke the local trained image classification engine, classify a plurality of medical image frames in the medical image stream through the trained image classification engine, determine the inspection type of the object to be recognized according to the classification results of the plurality of medical image frames, and send the determined inspection type and the medical image stream to the server, so that the server invokes the artificial intelligence engine corresponding to the inspection type to recognize the medical image stream. Because the medical image frames do not need to be transmitted to the server through the PACS to determine the examination type, but the examination type can be determined directly locally, the code of the PACS does not need to be modified and the method is not limited by the PACS, so the realization process and conditions can be simplified and the applicability is wider. In addition, the inspection type can be automatically identified by the trained image classification engine, so that, compared with the existing scheme in which the inspection type needs to be manually input and uploaded to the server, the processing efficiency can be greatly improved; meanwhile, misoperation caused by excessive manual participation can be reduced, and the identification accuracy improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present invention provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any one of the medical image stream recognition methods provided by the embodiments of the present invention. For example, the instructions may perform the steps of:
the method comprises the steps of collecting a medical image stream of an object to be identified, calling a trained image classification engine, classifying a plurality of medical image frames in the medical image stream through the trained image classification engine, determining an examination type of the object to be identified according to a classification result of the plurality of medical image frames, and sending the examination type and the medical image stream to a server so that the server calls an AI engine corresponding to the examination type to identify the medical image stream.
The trained image classification engine can be trained by a plurality of medical image stream samples marked with examination types.
Optionally, the instructions may further perform the following steps:
the method comprises the steps of collecting a plurality of medical image stream samples marked with inspection types, calling an image classification engine, classifying a plurality of sample frames in the medical image stream samples through the image classification engine, predicting the inspection types of the medical image stream samples according to the classification results of the plurality of sample frames, converging the image classification engine based on the predicted inspection types and the marked inspection types, and obtaining the trained image classification engine.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps in any medical image stream identification method provided by the embodiment of the present invention, the beneficial effects that can be achieved by any medical image stream identification method provided by the embodiment of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The medical image stream identification method, the medical image stream identification device, the electronic device and the storage medium provided by the embodiment of the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (15)

1. A method for medical image flow identification, comprising:
acquiring a medical image stream of an object to be identified;
calling a trained image classification engine, and classifying a plurality of medical image frames in the medical image stream through the trained image classification engine;
determining the inspection type of the object to be identified according to the classification results of the plurality of medical image frames;
and sending the examination type and the medical image stream to a server so that the server calls an artificial intelligence engine corresponding to the examination type to identify the medical image stream.
2. The method of claim 1, wherein classifying the plurality of medical image frames in the medical image stream by the trained image classification engine comprises:
intercepting a medical image frame needing to be classified currently from the medical image stream to obtain a target frame;
performing feature extraction on the target frame according to a preset feature channel through a trained image classification engine to obtain feature information of the target frame;
and determining the category of the target frame based on the characteristic information of the target frame, and returning to execute the step of intercepting the medical image frames which need to be classified currently from the medical image stream until the number of the classified medical image frames meets a preset condition.
3. The method according to claim 2, wherein after the feature extraction is performed on the target frame according to a preset feature channel by the trained image classification engine to obtain the feature information of the target frame, the method further comprises:
obtaining the correlation among all characteristic channels;
correcting the characteristic information of the target frame according to the correlation to obtain corrected characteristic information of the target frame;
the determining the category of the target frame based on the feature information of the target frame specifically includes: determining a category of the target frame based on the corrected feature information of the target frame.
4. The method according to claim 3, wherein the trained image classification engine includes a classification network and a correction network, and the obtaining of the feature information of the target frame by performing feature extraction on the target frame according to a preset feature channel through the trained image classification engine includes:
extracting the features of the target frame according to a preset feature channel through the classification network to obtain feature information of the target frame;
the correcting the feature information of the target frame according to the correlation to obtain the corrected feature information of the target frame includes: and correcting the characteristic information of the target frame through the correction network according to the correlation to obtain corrected characteristic information of the target frame.
5. The method of claim 4, wherein the obtaining the correlation between the characteristic channels comprises:
determining the numerical distribution condition among all characteristic channels according to the characteristic information of the target frame to obtain global information;
and according to the correlation, correcting the feature information of the target frame through the correction network to obtain corrected feature information of the target frame, specifically: and generating corresponding weights for all the characteristic channels according to the global information, and calibrating the characteristic information of the target frame on the channel dimension based on the weights of all the characteristic channels to obtain the corrected characteristic information of the target frame.
6. The method according to claim 5, wherein the determining a value distribution status between the feature channels according to the feature information of the target frame to obtain global information includes:
and carrying out global average pooling on each characteristic channel of the characteristic information of the target frame to obtain global information.
7. The method according to any one of claims 1 to 6, wherein the determining the examination type of the object to be identified according to the classification results of the plurality of medical image frames comprises:
calculating the proportion of each category according to the classification result of the plurality of medical image frames;
and determining the class with the highest proportion as the inspection type of the object to be identified.
8. The method according to any one of claims 1 to 6, wherein the determining the examination type of the object to be identified according to the classification results of the plurality of medical image frames comprises:
calculating the proportion of each category according to the classification result of the plurality of medical image frames;
if the category with the occupation ratio higher than the preset threshold exists, determining the category with the occupation ratio higher than the preset threshold as the checking type of the object to be identified;
and if the category with the occupation ratio higher than the preset threshold value does not exist, returning to the step of classifying the plurality of medical image frames in the medical image stream by the trained image classification engine until the inspection type of the object to be recognized is determined.
9. The method of any of claims 1 to 6, wherein before invoking the trained image classification engine, further comprising:
collecting a plurality of medical image stream samples marked with examination types;
calling an image classification engine, and classifying a plurality of sample frames in the medical image stream sample through the image classification engine;
predicting the inspection type of the medical image flow sample according to the classification result of the multiple sample frames;
and converging the image classification engine based on the predicted inspection type and the labeled inspection type to obtain the trained image classification engine.
10. The method of claim 9, wherein the classifying, by the image classification engine, a plurality of sample frames in the medical image stream sample comprises:
intercepting a plurality of sample frames from the medical image stream sample according to a preset strategy;
performing feature extraction on the sample frame according to a preset feature channel through the image classification engine to obtain feature information of the sample frame;
determining the category of the sample frame based on the characteristic information of the sample frame.
11. The method according to claim 10, wherein after the extracting the features of the sample frame according to a preset feature channel by the image classification engine to obtain the feature information of the sample frame, the method further comprises:
obtaining the correlation among all characteristic channels;
correcting the characteristic information of the sample frame according to the correlation to obtain corrected characteristic information of the sample frame;
the determining the type of the sample frame based on the characteristic information of the sample frame specifically includes: determining a category of the sample frame based on the corrected feature information of the sample frame.
12. The method according to claim 10, wherein after the intercepting a plurality of sample frames from the medical image stream sample according to the preset strategy, further comprising:
preprocessing the sample frame to obtain a preprocessed sample frame, wherein the preprocessing comprises black edge cutting, scaling, graying and whitening;
the image classification engine is used for extracting the features of the sample frame according to a preset feature channel to obtain the feature information of the sample frame, and the method specifically comprises the following steps: and performing feature extraction on the preprocessed sample frame according to a preset feature channel through the image classification engine to obtain feature information of the sample frame.
13. A medical image flow identification device, comprising:
the acquisition unit is used for acquiring a medical image stream of an object to be identified;
the classification unit is used for calling the trained image classification engine and classifying the plurality of medical image frames in the medical image stream through the trained image classification engine;
the determining unit is used for determining the inspection type of the object to be identified according to the classification results of the plurality of medical image frames;
and the sending unit is used for sending the inspection type and the medical image stream to a server so that the server calls an artificial intelligence engine corresponding to the inspection type to identify the medical image stream.
14. An electronic device, comprising a memory and a processor, wherein the memory stores an application program, and the processor is configured to run the application program in the memory to perform the operations in the medical image stream identification method according to any one of claims 1 to 12.
15. A storage medium, storing a plurality of instructions adapted to be loaded by a processor to perform the steps in the medical image stream identification method according to any one of claims 1 to 12.
CN201910876588.XA 2019-09-17 2019-09-17 Medical image stream identification method and device, electronic equipment and storage medium Pending CN110647926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910876588.XA CN110647926A (en) 2019-09-17 2019-09-17 Medical image stream identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910876588.XA CN110647926A (en) 2019-09-17 2019-09-17 Medical image stream identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110647926A true CN110647926A (en) 2020-01-03

Family

ID=69010622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910876588.XA Pending CN110647926A (en) 2019-09-17 2019-09-17 Medical image stream identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110647926A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501904A (en) * 2023-06-28 2023-07-28 中国人民解放军总医院 Distributed storage method, device, equipment and medium
CN116501904B (en) * 2023-06-28 2023-09-22 中国人民解放军总医院 Distributed storage method, device, equipment and medium

Similar Documents

Publication Publication Date Title
US11967414B2 (en) Image recognition model training method and apparatus, and image recognition method, apparatus, and system
US11900647B2 (en) Image classification method, apparatus, and device, storage medium, and medical electronic device
WO2011033890A1 (en) Diagnosis processing device, diagnosis processing system, diagnosis processing method, diagnosis processing program and computer-readable recording medium, and classification processing device
CN110689025B (en) Image recognition method, device and system and endoscope image recognition method and device
CN108553081B (en) Diagnosis system based on tongue fur image
CN111292839B (en) Image processing method, image processing device, computer equipment and storage medium
JP2004209245A (en) Method for extracting region of interest from image of tongue and method and apparatus for monitoring health using image of tongue
KR20190115713A (en) Device for vessel detection and retinal edema diagnosis using multi-functional neurlal network and method for detecting and diagnosing same
TW202125415A (en) Training method, equipment and storage medium of 3d target detection and model
CN109919166B (en) Method and device for acquiring classification information of attributes
CN103841175B (en) A kind of image feature information transmission method, apparatus and system
CN111488912B (en) Laryngeal disease diagnosis system based on deep learning neural network
WO2023221697A1 (en) Method and apparatus for training image recognition model, device and medium
US20230051411A1 (en) Image Processing Method and Apparatus, Computer Device, Storage Medium, and Program Product
WO2019143228A1 (en) Method and device for predicting chronic lung disease by using fractal dimension value
CN115620384B (en) Model training method, fundus image prediction method and fundus image prediction device
Wang et al. A cell phone app for facial acne severity assessment
TW202044271A (en) Method and system for analyzing skin texture and skin lesion using artificial intelligence cloud based platform
CN110647926A (en) Medical image stream identification method and device, electronic equipment and storage medium
KR102036052B1 (en) Artificial intelligence-based apparatus that discriminates and converts medical image conformity of non-standardized skin image
CN116469127A (en) Method and device for detecting key points of cow face, electronic equipment and storage medium
CN110197722B (en) AI-CPU system platform
CN113344914B (en) Method and device for intelligently analyzing PPD skin test result based on image recognition
CN111950584B (en) Intelligent identification method and system for position integrity in X-ray chest radiography
CN113706449B (en) Pathological image-based cell analysis method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40020108
Country of ref document: HK

SE01 Entry into force of request for substantive examination