CN113744266B - Method and device for displaying a lesion detection frame, electronic device, and storage medium
Method and device for displaying a lesion detection frame, electronic device, and storage medium
- Publication number
- CN113744266B (application CN202111291667.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- displayed
- focus
- detection
- lesion
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10068—Endoscopic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Endoscopes (AREA)
Abstract
The application provides a method and device for displaying a lesion detection frame, an electronic device, and a storage medium. The display method includes: generating and displaying, on a first canvas, a video picture corresponding to acquired endoscope video data, and acquiring a video frame image from that data; preprocessing the video frame image to obtain a preprocessed image; performing lesion recognition and detection on the preprocessed image with a trained lesion detection model to obtain a detection result; and, in response to the detection result identifying a lesion and lesion position data, generating and displaying a lesion detection frame on a second canvas based on the lesion position data. The second canvas is a background-transparent canvas located above the first canvas. Because the video picture on the first canvas does not affect the display of the lesion detection frame, the stability, continuity, and accuracy of the detection frame display are improved, which in turn improves the accuracy and efficiency of endoscopic diagnosis.
Description
Technical Field
The application relates to the field of computer-assisted medical technology, and in particular to a method and device for displaying a lesion detection frame, an electronic device, and a storage medium.
Background
With the development of electronic imaging, medical imaging, and artificial-intelligence technologies, presenting the internal tissues and organs of the human body as images by means of media such as X-rays, electromagnetic fields, ultrasonic waves, and endoscopes has become an important means for doctors to diagnose and treat diseases. Therefore, automatically detecting lesions in such images and displaying a detection frame in real time to prompt the doctor is of great significance for reducing the doctor's burden, lowering the difficulty of diagnosis, and improving the lesion detection rate.
However, during actual real-time computer-assisted examination of medical images, the size, shape, and position of a lesion in the field of view change greatly as the doctor moves the observation angle, so the lesion detection frame becomes unstable and discontinuous, and the detection frames for multiple or small lesions jump back and forth. This degrades the doctor's operating experience and can even affect diagnostic accuracy.
Disclosure of Invention
The application provides a method and device for displaying a lesion detection frame, an electronic device, and a storage medium, aiming to solve the problem of unstable display of the lesion detection frame in the prior art.
In one aspect, the present application provides a method for displaying a lesion detection frame, where the method includes:
generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data; preprocessing the video frame image to obtain a preprocessed image; performing lesion recognition and detection on the preprocessed image using a trained lesion detection model to obtain a detection result; and, in response to the detection result identifying a lesion and lesion position data, generating and displaying a lesion detection frame on a second canvas based on the lesion position data; wherein the second canvas is located above the first canvas and is a background-transparent canvas.
In one possible implementation of the present application, generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data, includes: generating and displaying a video picture corresponding to endoscope video data on a first canvas based on endoscope video data acquired in real time, and acquiring a video frame image, together with the current time corresponding to that video frame image, at a preset frequency based on the endoscope video data acquired in real time.
Generating and displaying a lesion detection frame on a second canvas based on the lesion position data, in response to the detection result identifying a lesion and lesion position data, includes: in response to the detection result identifying a lesion and lesion position data, storing the lesion position data and the current time corresponding to the video frame image in a list to be displayed, where the lesion position data and the current time corresponding to the video frame image form a set of data to be displayed; and acquiring a set of the data to be displayed from the list to be displayed, and displaying a lesion detection frame on the second canvas based on the lesion position data in that set.
After acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame on the second canvas based on the lesion position data in that set, the method further includes: after the lesion detection frame has been displayed for a preset duration, clearing it from the second canvas.
Here, both the preset frequency and the preset duration are determined based on the recognition and detection efficiency of the trained lesion detection model.
In one possible implementation of the present application, the list to be displayed includes at least two sets of data to be displayed. Before acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame on the second canvas based on the lesion position data in that set, the method further includes: dynamically merging the at least two sets of data to be displayed into one set of data to be displayed.
In one possible implementation of the present application, dynamically merging the at least two sets of data to be displayed into one set of data to be displayed includes: dynamically merging the at least two sets of data to be displayed into one set of data to be displayed through a non-maximum suppression algorithm.
In one possible implementation of the present application, before acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame on the second canvas based on the lesion position data in that set, the method further includes: filtering out expired data to be displayed from the list to be displayed, based on the current time stored with each video frame image in the list, the preset duration, and the current display time.
In one possible implementation of the present application, performing lesion recognition and detection on the preprocessed image using the trained lesion detection model to obtain a detection result includes: acquiring an initial target detection model and a lesion training set; training the initial target detection model with the lesion training set to obtain the lesion detection model; and performing lesion recognition and detection on the preprocessed image with the lesion detection model to obtain a detection result.
In one possible implementation of the present application, preprocessing the video frame image to obtain a preprocessed image includes: recognizing the video frame image with a trained edge-cropping network model to obtain a valid information region; and cropping the video frame image based on the valid information region.
In another aspect, the present application provides a display device for a lesion detection frame, the display device including:
a display module, configured to generate and display a video picture corresponding to the endoscope video data on a first canvas, and, in response to the detection result identifying a lesion and lesion position data, to generate and display a lesion detection frame on a second canvas based on the lesion position data; an acquisition module, configured to acquire video frame images based on the acquired endoscope video data; a preprocessing module, configured to preprocess the video frame image to obtain a preprocessed image; and a recognition and detection module, configured to perform lesion recognition and detection on the preprocessed image using the trained lesion detection model to obtain a detection result.
In another aspect, the present application further provides an electronic device, which includes a memory and a processor coupled to each other, where the processor is configured to execute program instructions stored in the memory to implement the above method for displaying a lesion detection frame.
In another aspect, the present application further provides a computer-readable storage medium storing program instructions which, when executed by a processor, implement the above method for displaying a lesion detection frame.
In the method for displaying a lesion detection frame provided by the application, a video picture corresponding to the endoscope video data is generated and displayed on a first canvas based on the acquired endoscope video data, and a video frame image is acquired from that data; the video frame image is then preprocessed to obtain a preprocessed image; lesion recognition and detection are performed on the preprocessed image with a trained lesion detection model to obtain a detection result; and, in response to the detection result identifying a lesion and lesion position data, a lesion detection frame is generated and displayed on a second canvas based on the lesion position data, where the second canvas is a background-transparent canvas located above the first canvas. Preprocessing the acquired video frame image before running the lesion detection model improves the accuracy and precision of lesion recognition. Because the lesion and its position data are identified from the video frame image and the lesion detection frame is drawn on the second canvas while the first canvas continuously renders the video picture from the acquired endoscope video data, the video picture on the first canvas does not affect the display of the lesion detection frame, which improves the stability and continuity of the detection frame display. At the same time, the improved accuracy and efficiency of identifying the lesion and its position improve the accuracy with which the detection frame highlights the lesion in the displayed picture, and thus the accuracy and efficiency of endoscopic diagnosis.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a method for displaying a lesion detection frame according to an embodiment of the present application;
fig. 2 is a schematic flowchart illustrating a method for displaying a lesion detection frame according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart diagram of one embodiment of S23;
fig. 4 is a schematic flowchart illustrating a method for displaying a lesion detection frame according to another embodiment of the present disclosure;
fig. 5 is a schematic diagram of a frame of an embodiment of a display device for a lesion detection frame provided in the present application;
FIG. 6 is a block diagram of an embodiment of an electronic device of the present application;
FIG. 7 is a block diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
In the description of the present application, it is to be understood that terms such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", and "outer" indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they should therefore not be considered limiting. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of the described features. In the description of the present application, "a plurality" means two or more unless specifically defined otherwise.
In this application, the word "exemplary" is used to mean "serving as an example, instance, or illustration". Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. The following description is presented to enable any person skilled in the art to make and use the application. In the following description, details are set forth for the purpose of explanation. It will be apparent to one of ordinary skill in the art that the application may be practiced without these specific details. In other instances, well-known structures and processes are not shown in detail to avoid obscuring the description with unnecessary detail. Thus, this application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Embodiments of the present application provide a method and an apparatus for displaying a lesion detection frame, an electronic device, and a storage medium, which are described in detail below.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of the method for displaying a lesion detection frame according to an embodiment of the present application. The endoscope detection terminal 110 and the display terminal 130 are communicatively connected. The endoscope detection terminal 110 has a shooting function and is mainly responsible for collecting endoscopic images of a site to be examined inside an organism such as a human or animal body. Depending on the examined site, endoscopes can be classified into gastrointestinal endoscopes, otorhinolaryngological endoscopes, oral endoscopes, urethrocystoscopes, laparoscopes, and the like. The endoscopic image video data collected by the endoscope detection terminal 110 is transmitted to the display terminal 130. The display terminal 130 acquires the endoscope video data collected by the endoscope detection terminal 110 and performs the method for displaying a lesion detection frame provided by the present application, which specifically includes: generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data; preprocessing the video frame image to obtain a preprocessed image; performing lesion recognition and detection on the preprocessed image with a trained lesion detection model to obtain a detection result; and, in response to the detection result identifying a lesion and lesion position data, generating and displaying a lesion detection frame on a second canvas based on the lesion position data, where the second canvas is a background-transparent canvas located above the first canvas.
The endoscope detection terminal 110 and the display terminal 130 may be devices that include both receiving and transmitting hardware, that is, devices capable of two-way communication over a two-way communication link. Such a terminal may include a cellular or other communication device with a single-line display, a multi-line display, or no multi-line display.
In this embodiment, the display terminal 130 includes, but is not limited to, portable terminals such as mobile phones and tablets, fixed terminals such as computers and information kiosks, and various virtual terminals. The display terminal 130 may include a display device and a host, and has a display function.
It is to be understood that the display scenario of the lesion detection frame may further include one or more servers, and/or one or more terminals connected to a server over a network, which is not limited here. The server includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud server formed by multiple servers.
It should be noted that the application scenario shown in fig. 1 is only an example; the scenario described in this embodiment is intended to illustrate the technical solution of the embodiments more clearly and does not limit the technical solutions provided by the embodiments of the present application.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a method for displaying a lesion detection frame according to an embodiment of the present disclosure. The display method comprises the following steps:
S21: Generating and displaying a video picture corresponding to the endoscope video data on the first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data.
In this step, the endoscope video data is video data acquired by the endoscope; it may be acquired in real time or may be video data collected and stored by the endoscope earlier. The endoscope has an imaging function and can acquire video data while performing an endoscopic examination on a living body such as a human or animal. In medical examination, endoscopes can be classified into enteroscopes, otolaryngoscopes, oral endoscopes, urocystoscopes, laparoscopes, and the like, according to the organ sites they examine.
The endoscope collects video data; the endoscope video data is acquired, and a video picture corresponding to it is generated and displayed on the first canvas based on the acquired data. That is, the display interface plays back the endoscopy image video. A video frame image is then also acquired from the endoscope video data. A video consists of consecutive video frame images; acquiring a video frame image here means acquiring a single frame, or acquiring one frame from the video at a certain frequency or time interval. It should be understood that the acquired endoscope video data may be encoded and need to be decoded before display.
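For concreteness, the continuous display on the first canvas and the frame sampling of S21 might be wired up as in the following sketch, assuming the decoded endoscope feed is available as an HTMLVideoElement; the element ids, the onFrameSampled hand-off, and the 50 ms interval are illustrative assumptions rather than anything prescribed by this application.

```typescript
// Minimal sketch of S21: continuously paint the live feed onto the first
// canvas, and sample one frame (plus its acquisition time) at a preset
// interval for the detection pipeline. All names here are illustrative.
const videoEl = document.querySelector<HTMLVideoElement>("#endoscope-feed")!;
const firstCanvas = document.querySelector<HTMLCanvasElement>("#first-canvas")!;
const videoCtx = firstCanvas.getContext("2d")!;

function renderLoop(): void {
  // First canvas: redraw the decoded endoscope picture every display frame.
  videoCtx.drawImage(videoEl, 0, 0, firstCanvas.width, firstCanvas.height);
  requestAnimationFrame(renderLoop);
}
requestAnimationFrame(renderLoop);

function onFrameSampled(frame: ImageData, acquiredAt: number): void {
  // Hand-off point to preprocessing (S22) and lesion detection (S23).
}

const sampleIntervalMs = 50; // matched to the model's average recognition time
setInterval(() => {
  const frame = videoCtx.getImageData(0, 0, firstCanvas.width, firstCanvas.height);
  onFrameSampled(frame, Date.now()); // the current time T travels with the frame
}, sampleIntervalMs);
```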
S22: Preprocessing the video frame image to obtain a preprocessed image.
After the video frame image is acquired in the previous step, it is preprocessed, for example by cropping or scaling with respect to image size, color, brightness, resolution, and so on, to remove invalid information regions and interference, reduce noise, and retain the valid information region and salient features, so that the preprocessed image meets the requirements of subsequent processing. The preprocessing may be a single processing method or a combination of several; for example, only cropping or only scaling may be performed, or both, and the order of the operations can be set and adjusted for the specific application scenario. In the subsequent lesion recognition, the preprocessed image improves the precision and accuracy of recognition, shortens recognition time, and reduces the time the image spends in subsequent data transmission.
In medical examination, endoscopes can be classified into enteroscopes, otolaryngoscopes, oral endoscopes, urethrocystoscopes, laparoscopes, and the like, according to the organ examined or the application scenario. Because the structures of organ sites differ, the shapes and sizes of the valid regions in the corresponding endoscope video frame images also differ. Different types of endoscopes therefore apply corresponding preprocessing to their video frame images, which may be different or the same.
In this step, the preprocessing may include cropping. When cropping a video frame image, a fixed cropping region can be used according to the organ the endoscope examines: an enteroscope has a cropping region for enteroscope video frame images, and a gastroscope has a cropping region for gastroscope video frame images. The cropping regions of the two may be the same or different and can be set according to actual examination requirements. Besides cropping with a fixed region, the contour of the valid region can be obtained by an edge detection algorithm such as Roberts, Prewitt, Sobel, or Canny, and the valid region cropped along that contour. Alternatively, the valid region can be identified by a neural-network-based method and the image cropped accordingly. Specifically, the neural-network-based method may include: recognizing the video frame image with a trained edge-cropping network model to obtain a valid information region, then cropping the video frame image based on the identified region, removing the invalid information at the image edges and retaining the useful information region. The trained edge-cropping network model can be obtained by training an initial convolutional neural network model on a valid-region training set. Identifying and cropping the valid region with a neural network improves the universality of the cropping method: it is applicable to many types of endoscopes, to video frame images of different brightness and color produced under various light source types, and to endoscope video frame images from different examination scenarios.
The preprocessing may also include image scaling, which saves time in subsequent processing of the image and improves efficiency. Specifically, the scaling may use an image scaling algorithm known in the art, such as nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation.
In one implementation scenario, a video frame image of size 1920 × 1080 is captured. The valid region is identified with the neural-network-based method, and the image is cropped to that region, yielding an image of size 1250 × 1080. A black background picture of 1250 × 1250, the maximum of the cropped image's length and width, is generated, and the cropped image is pasted in the middle of it, giving an aspect ratio of 1:1 and a size of 1250 × 1250. The 1:1 aspect ratio ensures the image is not distorted during subsequent scaling. The 1250 × 1250 image is then scaled with the bilinear interpolation algorithm to 608 × 608. Bilinear interpolation is a comparatively faithful scaling algorithm: it uses the four real pixel values surrounding a virtual point in the source image to jointly determine one pixel value in the target image, so the result is better than nearest-neighbor interpolation, the scaled image quality is high, discontinuous pixel values do not occur, and excessive loss of feature information in the image is effectively avoided.
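The crop-pad-scale sequence of this example could be sketched as follows, assuming the frame is already on a canvas and the valid region has been identified upstream; the Rect shape, the function name, and the default model size of 608 are illustrative assumptions.

```typescript
// Sketch of the worked example: crop the valid region, paste it centered
// on a square black background, then scale to the model input size.
interface Rect { x: number; y: number; w: number; h: number; }

function preprocess(src: HTMLCanvasElement, valid: Rect, modelSize = 608): HTMLCanvasElement {
  // Square side = the longer edge of the crop (1250 in the example), so the
  // 1:1 aspect ratio is preserved and later scaling does not distort.
  const side = Math.max(valid.w, valid.h);
  const square = document.createElement("canvas");
  square.width = side;
  square.height = side;
  const sctx = square.getContext("2d")!;
  sctx.fillStyle = "black";
  sctx.fillRect(0, 0, side, side);
  // Paste the cropped valid region into the middle of the black square.
  sctx.drawImage(src, valid.x, valid.y, valid.w, valid.h,
                 (side - valid.w) / 2, (side - valid.h) / 2, valid.w, valid.h);

  // Scale the square down to modelSize x modelSize; browsers typically apply
  // bilinear-like filtering when imageSmoothingEnabled is true (an assumption
  // about the rendering engine, not a guarantee of this API).
  const out = document.createElement("canvas");
  out.width = modelSize;
  out.height = modelSize;
  const octx = out.getContext("2d")!;
  octx.imageSmoothingEnabled = true;
  octx.drawImage(square, 0, 0, modelSize, modelSize);
  return out;
}
```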
Bilinear interpolation extends linear interpolation to a function of two variables: linear interpolation is performed in each of the two directions, and the pixel value sought is obtained by interpolating among its four neighbouring pixels. Choosing a coordinate system in which the four neighbouring pixels have coordinates (0,0), (0,1), (1,0), and (1,1), the value of the pixel sought is computed by the interpolation formula.
The interpolation formula is shown in formula (I):

$$f(x,y) = f(0,0)(1-x)(1-y) + f(1,0)\,x(1-y) + f(0,1)(1-x)\,y + f(1,1)\,xy \tag{I}$$

Expressed as a matrix operation, the interpolation formula becomes formula (II):

$$f(x,y) = \begin{bmatrix} 1-x & x \end{bmatrix} \begin{bmatrix} f(0,0) & f(0,1) \\ f(1,0) & f(1,1) \end{bmatrix} \begin{bmatrix} 1-y \\ y \end{bmatrix} \tag{II}$$
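Transcribed directly into code, formula (I) reads as below; this is only a per-pixel sketch, and a real scaler would additionally handle image borders and channel interleaving.

```typescript
// Bilinear interpolation on the unit square, per formula (I): the four
// neighbouring pixel values jointly determine the interpolated value.
function bilinear(
  f00: number, f10: number, f01: number, f11: number,
  x: number, y: number // fractional offsets within [0, 1]
): number {
  return f00 * (1 - x) * (1 - y)
       + f10 * x * (1 - y)
       + f01 * (1 - x) * y
       + f11 * x * y;
}
```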
S23: Performing lesion recognition and detection on the preprocessed image with the trained lesion detection model to obtain a detection result.
The preprocessed image obtained in S22 is subjected to lesion recognition and detection by the trained lesion detection model, which outputs a detection result. The detection result can be of two types: either no lesion is identified, in which case no lesion detection frame is displayed; or a lesion and its position are identified, in which case the subsequent step S24 is performed. The lesion position data may be region coordinate information, such as the coordinates of two diagonal corner points of the frame region, or the coordinates of all four corner points.
When a lesion is identified, its position data, that is, the positional region information of the lesion, can also be identified. In this embodiment, the lesion detection model may be a single model, with one model performing both lesion recognition and lesion position recognition. In other embodiments, the lesion detection model may be a composite of two or more recognition models, in which some models perform lesion recognition and others perform lesion position recognition; the two recognitions may run simultaneously, or position recognition may run after a lesion is recognized. The lesion detection model in this application may be any model for image recognition and position recognition known in the art, such as the Yolo, SSD, or Fast-RCNN target detection algorithms. The Yolo target detection algorithm includes multiple versions such as Yolo V1, Yolo V2, and Yolo V3.
In one embodiment, referring to fig. 3, fig. 3 is a schematic flow chart of an embodiment of S23, including the following steps:
S231: Obtaining an initial target detection model and a lesion training set.
In this embodiment, the initial target detection model may be a YoloV3 target detection algorithm model.
The lesion training set can be a digestive-tract lesion training set such as a stomach, intestinal, or esophageal lesion training set, or a lesion training set for another examined site, such as an oral lesion training set or a urethra-and-bladder lesion training set.
One lesion training set may correspond to a single examined site or to several different sites. For example, a lesion training set may target only the stomach and consist of multiple stomach images, including images containing stomach lesions and images without lesions; a single stomach image may contain one or more lesions. The stomach images may also correspond to different light source types and may include several different types of stomach lesions. A lesion training set may equally correspond to multiple sites, such as a digestive-tract training set covering several digestive-tract organs, which may include stomach images, intestine images, and so on.
Further, when one lesion training set corresponds to multiple examined sites or to different endoscope application scenarios, it may be divided into several sub-training sets; for example, a digestive-tract training set may be divided into a stomach sub-training set and an intestine sub-training set.
S232: Training the initial target detection model with the lesion training set to obtain the lesion detection model.
The images in the lesion training set are input into the initial target detection model, the model output is compared with the labels of the training images, and backpropagation continuously adjusts the model's weights and bias parameters, optimizing the model. This trains the initial target detection model and yields a lesion detection model capable of identifying lesions.
In this step, different lesion detection models can be trained with different lesion training sets. Training with the lesion training set of one examined site yields a lesion detection model for that site; for example, training the initial target detection model with an intestinal lesion training set yields an intestinal lesion detection model, improving the model's specificity and pertinence. This can be refined further by preparing separate training sets for different lesion types caused by different pathologies of a site, such as a training set for intestinal polyps, and training a detection model only for intestinal polyps, improving the model's specificity and accuracy.
On the other hand, training with a lesion training set covering multiple examined sites yields a lesion detection model applicable to all of them, improving the model's universality; training the initial target detection model with several different lesion training sets likewise yields a model applicable to multiple sites. It should be understood that the accuracy of the lesion model is positively correlated with the number of samples in the training set: the larger the sample count, the higher the accuracy of the trained model.
S233: Performing lesion recognition and detection on the preprocessed image with the lesion detection model to obtain a detection result.
The preprocessed image from S22 is input into the trained lesion detection model to obtain the detection result.
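The inference call itself depends on how the YoloV3 model is deployed, which this application does not prescribe; the following sketch therefore hides it behind a hypothetical LesionModel interface, and the Detection shape and the 0.5 confidence cutoff are assumptions for illustration.

```typescript
// Hedged sketch of S23: run the trained detector on the preprocessed image.
interface Detection {
  box: [number, number, number, number]; // [x1, y1, x2, y2] lesion position data
  score: number;
}

interface LesionModel {
  detect(image: HTMLCanvasElement): Promise<Detection[]>;
}

async function runDetection(model: LesionModel, pre: HTMLCanvasElement): Promise<Detection[]> {
  const detections = await model.detect(pre);
  // An empty result corresponds to "no lesion identified": no frame is drawn.
  return detections.filter(d => d.score > 0.5);
}
```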
S24: in response to the detection result being that the lesion and lesion location data are identified, a lesion detection box is generated and displayed on the second canvas based on the lesion location data. The second canvas is located on the upper layer of the first canvas, and the second canvas is a background transparent canvas.
When the detection result identifies a lesion and lesion position data, a lesion detection frame is generated and displayed on the second canvas according to the lesion position data. When the detection result indicates that no lesion is identified, this step is not performed and no lesion detection frame is generated.
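A sketch of how the two canvases could be stacked so that redrawing the video never disturbs the detection frame; the element ids, CSS positioning, and box styling are illustrative assumptions.

```typescript
// Second canvas: transparent overlay stacked above the first canvas (here
// via absolute positioning inside a shared positioned container).
const overlay = document.querySelector<HTMLCanvasElement>("#second-canvas")!;
overlay.style.position = "absolute";
overlay.style.left = "0";
overlay.style.top = "0";
overlay.style.pointerEvents = "none"; // input falls through to the video layer
const overlayCtx = overlay.getContext("2d")!;

function drawLesionFrames(boxes: Array<[number, number, number, number]>): void {
  overlayCtx.clearRect(0, 0, overlay.width, overlay.height); // keep background transparent
  overlayCtx.strokeStyle = "lime";
  overlayCtx.lineWidth = 3;
  for (const [x1, y1, x2, y2] of boxes) {
    overlayCtx.strokeRect(x1, y1, x2 - x1, y2 - y1);
  }
}

function clearLesionFrames(): void {
  // Called after the preset display duration (S36); the first canvas is untouched.
  overlayCtx.clearRect(0, 0, overlay.width, overlay.height);
}
```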
Referring to fig. 4, fig. 4 is a schematic flow chart of another embodiment of a method for displaying a lesion detection frame provided in the present application, which specifically includes:
s31: generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the endoscope video data acquired in real time, and acquiring a video frame image and the current time corresponding to the video frame image at a preset frequency based on the endoscope video data acquired in real time; wherein the preset frequency is determined based on the recognition detection efficiency of the trained lesion detection model.
In this embodiment, the endoscope collects video data in real time. In this step, the video data collected by the endoscope is acquired in real time, a real-time video picture is generated and displayed on the first canvas, and a video frame image is acquired at a preset frequency together with its current time. That is, one frame of the real-time video is acquired, and the current time of that frame is recorded at the same moment. The preset frequency may be determined by the efficiency of lesion recognition in the subsequent step; equivalently, video frame images are obtained at a certain time interval. Since the lesion detection model needs a certain amount of time to recognize a lesion, the interval between the video frame images fed into the model must match the model's throughput. If the interval is too short, the next frame may already be waiting for lesion recognition while the previous frame is still being processed and its result not yet output; the model then cannot keep up with every frame, frames awaiting recognition accumulate, and the timeliness and alignment of the lesion detection frame display suffer.
S32: Preprocessing the video frame image to obtain a preprocessed image.
This step can refer to the related description of S22, and will not be described herein.
S33: Performing lesion recognition and detection on the preprocessed image with the trained lesion detection model to obtain a detection result.
This step can refer to the related description of S23, and will not be described herein.
S34: In response to the detection result identifying a lesion and lesion position data, storing the lesion position data and the current time corresponding to the video frame image in a list to be displayed; the lesion position data and the current time corresponding to the video frame image form a set of data to be displayed.
When the detection result identifies a lesion and lesion position data P, the lesion position data P and the current time T corresponding to the video frame image are stored in the list to be displayed, List = [(P, T)]. In this embodiment, the lesion position data P may be expressed as frame-region coordinate information P = [x1, y1, x2, y2]. If several lesions are identified in one video frame image, the lesion position data may be a single set of frame-region coordinates covering the positions of all the lesions, or one set of frame-region coordinates per lesion; lesions that lie close together may be recognized and output as one set of frame-region coordinates. It should be understood that the list to be displayed may store other data besides the lesion position data P and the current time T, as required.
In this embodiment, the list to be displayed is consumed on a first-in first-out basis: the lesion position data P stored in the List is taken out to generate and display the lesion detection frame, and each entry is removed from the list once taken, reducing data storage pressure.
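The list to be displayed might look like the following sketch; the entry shape is an assumption (the text notes that further fields can be stored as needed).

```typescript
// To-be-displayed list of S34: each entry pairs the lesion position data P
// with the acquisition time T of its video frame, consumed first-in first-out.
interface ToDisplay {
  boxes: Array<[number, number, number, number]>; // lesion position data P
  acquiredAt: number;                             // current time T (ms since epoch)
}

const displayList: ToDisplay[] = [];

function pushResult(entry: ToDisplay): void {
  displayList.push(entry);
}

function takeNext(): ToDisplay | undefined {
  // shift() removes the oldest entry as it is taken, keeping storage pressure low.
  return displayList.shift();
}
```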
S35: Acquiring a set of data to be displayed from the list to be displayed, and displaying a lesion detection frame on the second canvas based on the lesion position data in that set.
In one embodiment, only one set of lesion position data is kept in the list at any time: as a new set of lesion position data is stored, the previous set is taken out and cleared, and a lesion detection frame is displayed on the second canvas based on it. In other words, throughout the period in which a lesion is detected, the lesion detection frame is displayed continuously and stably, improving the visual effect and the assistance provided during endoscopy, so that the doctor can examine and judge lesions more accurately from the displayed video picture and detection frame. Of course, the list may also hold no lesion position data, meaning no lesion was identified in the video frame images over a period of time.
In another embodiment, the list to be displayed contains at least two sets of data to be displayed. In that case, before S35 is performed, the sets may be dynamically merged into one set, and a lesion detection frame generated and displayed for the merged set. This avoids visual jitter and instability of the lesion detection frame caused by large differences between the lesion positions, and hence the frame display positions, of several consecutive video frame images. After dynamic merging, the positions of consecutively generated detection frames differ only slightly, no large deviations occur, and the lesion detection frame appears visually stable.
If the list to be displayed includes at least two sets of data to be displayed, its data length L is greater than 1, List = [P1, P2, ...]. After dynamic merging, List = [P] and the data length is L = 1.
Specifically, at least two sets of data to be displayed can be dynamically merged into one set through a non-maximum suppression algorithm: the dynamic merging uses non-maximum suppression (NMS) to remove redundant data from the list to be displayed. The NMS conversion rule is shown in formula (III):

$$s_i = \begin{cases} s_i, & \mathrm{IoU}(m, b_i) < \text{thresh} \\ 0, & \mathrm{IoU}(m, b_i) \ge \text{thresh} \end{cases} \tag{III}$$

where thresh is a threshold that may take a value between 0 and 0.5, such as 0.5, and IoU is the intersection over union of two boxes. The NMS algorithm proceeds as follows: given N boxes, the score of each box computed by the classifier is s_i, 1 ≤ i ≤ N. A set H of candidate boxes to be processed is built and initialized to contain all N boxes, and a set M for the optimal boxes is built and initialized empty. All boxes in H are sorted, the box m with the highest score is selected and moved from H to M, and the remaining boxes in H are traversed: the IoU of each with m is computed, and any box whose IoU exceeds the threshold thresh is considered to overlap m and is removed from H. Repeating this dynamically merges the data into the optimal boxes.
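An illustrative implementation of this NMS flow over plain boxes (the 0.5 threshold follows the text; the Box tuple shape is an assumption):

```typescript
type Box = [number, number, number, number]; // [x1, y1, x2, y2]

// Intersection over union of two boxes.
function iou(a: Box, b: Box): number {
  const ix = Math.max(0, Math.min(a[2], b[2]) - Math.max(a[0], b[0]));
  const iy = Math.max(0, Math.min(a[3], b[3]) - Math.max(a[1], b[1]));
  const inter = ix * iy;
  const union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter;
  return union > 0 ? inter / union : 0;
}

// Greedy NMS: repeatedly keep the highest-scoring candidate box m and drop
// every remaining box whose IoU with m reaches the threshold.
function nms(boxes: Box[], scores: number[], thresh = 0.5): Box[] {
  const order = boxes.map((_, i) => i).sort((i, j) => scores[j] - scores[i]);
  const kept: Box[] = [];      // the set M of optimal boxes
  const suppressed = new Set<number>();
  for (const i of order) {     // H: candidates, in descending score order
    if (suppressed.has(i)) continue;
    kept.push(boxes[i]);
    for (const j of order) {
      if (j !== i && !suppressed.has(j) && iou(boxes[i], boxes[j]) >= thresh) {
        suppressed.add(j);     // overlaps m above thresh: removed from H
      }
    }
  }
  return kept;
}
```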
In another embodiment, before S35, that is, before a set of data to be displayed is taken from the list and a lesion detection frame is displayed on the second canvas based on its lesion position data, expired data may be filtered out of the list, based on the acquisition time T stored with each video frame image in the list, the preset duration, and the current display time.
In step S34, the lesion position data in the list to be displayed is dynamic, and so is its length. A length L of 0 means there is no lesion and no lesion position data. A length L of 1 over a period of time means that a lesion exists in the corresponding video frames of that period and that the speed of lesion recognition matches the speed at which detection frames are generated. If the length L grows beyond a certain point, lesion detection frames have not been generated and displayed in time for a while and part of the data in the list has expired. If that expired data were still rendered, the lesion detection frame might not match the video picture on the first canvas: the frame would no longer correspond to the lesion's position in the picture, showing an offset, that is, the frame would no longer accurately enclose the lesion. Moreover, when L is large, dynamic merging such as the NMS algorithm also takes longer.
Before the data in the list is displayed, each entry is checked for expiry and expired entries are removed. For example, when the method for generating the lesion detection frame is invoked, the current time N, the current display time, is obtained. In the list List = [(P1, T1), (P2, T2), ...], P1, P2, ... are the lesion position data of two or more adjacent video frame images, and T1, T2, ... are their corresponding acquisition times T. Here the preset duration S is the preset duration of step S36. Expired data is identified and removed based on the acquisition time T of each entry, the preset duration S, and the current display time N, specifically: compute TS = T + S and compare TS with the current display time N; if TS is earlier than N, the entry has expired, and it is removed without being displayed. The TS of each entry in List = [(P1, T1), (P2, T2), ...] is compared with N in turn. Equivalently, the time difference ΔT = N − TS can be computed for each entry: if ΔT > 0, the entry is removed from the list; if ΔT ≤ 0, step S35 is executed.
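Using the ToDisplay shape from the earlier sketch, the expiry rule reduces to a one-line filter; this is a sketch of the check as described, not a prescribed implementation.

```typescript
// Drop entries whose display window has already passed: an entry expires
// when TS = T + S lies before the current display time N (deltaT = N - TS > 0).
function dropExpired(list: ToDisplay[], presetDurationMs: number): ToDisplay[] {
  const now = Date.now(); // current display time N
  return list.filter(e => e.acquiredAt + presetDurationMs >= now);
}
```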
S36: After the lesion detection frame has been displayed for the preset duration, clearing the lesion detection frame from the second canvas; the preset duration is determined based on the recognition and detection efficiency of the trained lesion detection model.
Because the dynamic perception of the human eye is limited, video synthesized from 25 frames per second is perceived as a continuous picture. Therefore, even though each lesion detection frame is cleared after being displayed for the preset duration, the human eye perceives the detection frame as displayed continuously, giving a better display effect.
In one embodiment, the lesion detection model is a YoloV3-608 model with an average recognition time of 50 milliseconds; since prediction time depends on the hardware configuration and actual operating conditions, the recognition time fluctuates, for example within 30-70 milliseconds. The recognition and detection efficiency of the model is inversely proportional to the recognition time: F = 1000/time, where time is the model's average recognition time in milliseconds. In this embodiment, time is 50 milliseconds and F is 20. The preset frequency of step S31 may be set from F; for example, if the preset frequency equals F, the sampling interval equals the model's average recognition time, recognition keeps pace with sampling, and image acquisition and detection are balanced. The sampling interval may also be slightly longer than the average recognition time, which prevents sampling from outpacing the model's recognition speed, leaving too many images queued for recognition and making the whole display method sluggish. The preset duration of step S36 may likewise be set from F; for example, if it equals the average recognition time, detection frame display and lesion recognition are balanced and the display of detection frames is continuous. The preset duration may also be slightly longer than the average recognition time and still achieve continuous display. But if the preset duration is much longer than the average recognition time, too much data to be displayed accumulates, the detection frame data is displayed late or lags badly, the displayed frame deviates greatly from the lesion's position in the video picture, and the display becomes inaccurate.
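The tuning rule of this embodiment might be captured as in the following sketch; the 10% margin is an assumption standing in for "slightly longer than the average recognition time".

```typescript
// Derive the sampling interval (S31) and display duration (S36) from the
// model's average recognition time, with F = 1000 / time.
function tuneTiming(avgRecognitionMs: number) {
  const detectionRate = 1000 / avgRecognitionMs;               // F, e.g. 20 for 50 ms
  const sampleIntervalMs = Math.ceil(avgRecognitionMs * 1.1);  // slightly above average
  const displayDurationMs = Math.ceil(avgRecognitionMs * 1.1); // box lives ~one cycle
  return { detectionRate, sampleIntervalMs, displayDurationMs };
}

// With the 50 ms example from the text: F = 20, interval and duration are 55 ms.
const timing = tuneTiming(50);
```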
Referring to fig. 5, fig. 5 is a schematic diagram of a frame of an embodiment of a display device of a lesion detection frame provided in the present application. The display device 50 includes:
A display module 51, configured to generate and display a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and, in response to the detection result identifying a lesion and lesion position data, to generate and display a lesion detection frame on a second canvas based on the lesion position data.
An acquisition module 52, configured to acquire video frame images based on the acquired endoscope video data.
A preprocessing module 53, configured to preprocess the video frame image to obtain a preprocessed image.
A recognition and detection module 54, configured to perform lesion recognition and detection on the preprocessed image with the trained lesion detection model to obtain a detection result.
The display device 50 improves the accuracy and precision of lesion recognition by preprocessing the acquired video frame images before running the lesion detection model. The lesion and its position data are identified from the video frame image and the lesion detection frame is generated on the second canvas, while the first canvas continuously generates and displays video pictures from the acquired endoscope video data; the video picture on the first canvas therefore does not affect the display of the lesion detection frame, improving the stability and continuity of the display. At the same time, the improved accuracy and efficiency of identifying the lesion and its position improve the accuracy with which the detection frame highlights the lesion in the displayed picture, and thus the accuracy and efficiency of endoscopic diagnosis.
In some embodiments of the present application, the display module 51 is further configured to generate and display a video picture corresponding to the endoscope video data on the first canvas based on endoscope video data acquired in real time; to store, in response to the detection result identifying a lesion and lesion position data, the lesion position data and the current time corresponding to the video frame image in a list to be displayed, where the two form a set of data to be displayed; to acquire a set of data to be displayed from the list and display a lesion detection frame on the second canvas based on its lesion position data; and to clear the lesion detection frame from the second canvas after it has been displayed for the preset duration. Both the preset frequency and the preset duration are determined based on the recognition and detection efficiency of the trained lesion detection model. The acquisition module 52 is further configured to acquire the video frame image and its corresponding current time at the preset frequency based on the endoscope video data acquired in real time.
In some embodiments of the present application, the display module 51 is further configured to dynamically merge at least two sets of data to be displayed into one set of data to be displayed when the list to be displayed includes at least two sets of data to be displayed.
In some embodiments of the present application, the display module 51 is further configured to dynamically merge at least two sets of data to be displayed into one set of data to be displayed through a non-maximum suppression algorithm.
In some embodiments of the present application, the display module 51 is further configured to filter expired data out of the list to be displayed, based on the acquisition time stored with each video frame image in the list, the preset duration, and the current display time.
In some embodiments of the present application, the recognition detection module 54 is further configured to acquire an initial target detection model and a lesion training set; train the initial target detection model with the lesion training set to obtain the lesion detection model; and perform lesion recognition detection on the preprocessed image using the lesion detection model to obtain a detection result.
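The application does not name a detection framework, so the following sketch reduces the train-then-detect flow to an assumed interface; every name in it is hypothetical.

```typescript
// Assumed interface for the train-then-detect flow.
interface Box { x: number; y: number; width: number; height: number; }
interface Sample { image: ImageData; boxes: Box[]; }

interface DetectionModel {
  train(trainingSet: Sample[]): Promise<void>;
  detect(image: ImageData): Promise<Box[]>;
}

// Train the initial target detection model on the lesion training set, then
// run lesion recognition detection on a preprocessed image.
async function trainAndDetect(
  initialModel: DetectionModel,
  lesionTrainingSet: Sample[],
  preprocessedImage: ImageData,
): Promise<Box[]> {
  await initialModel.train(lesionTrainingSet); // yields the trained lesion detection model
  return initialModel.detect(preprocessedImage);
}
```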
In some embodiments of the present application, the preprocessing module 53 is further configured to recognize the video frame image using the trained edge-cutting network model to obtain an effective information area, and to crop the video frame image based on the effective information area.
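One way this cropping step could look, with the edge-cutting model hidden behind an assumed callback and OffscreenCanvas assumed available in the runtime:

```typescript
// Crop the frame to the effective information area returned by the
// edge-cutting model; detectEffectiveArea is an assumed callback.
interface Region { x: number; y: number; width: number; height: number; }

function cropToEffectiveArea(
  frame: ImageData,
  detectEffectiveArea: (frame: ImageData) => Region,
): ImageData {
  const region = detectEffectiveArea(frame);
  const canvas = new OffscreenCanvas(frame.width, frame.height);
  const ctx = canvas.getContext("2d")!;
  ctx.putImageData(frame, 0, 0);
  // Reading back only the effective region discards black borders, menus, and
  // other non-informative areas around the endoscopic picture.
  return ctx.getImageData(region.x, region.y, region.width, region.height);
}
```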
Referring to fig. 6, fig. 6 is a schematic frame diagram of an embodiment of an electronic device according to the present application. The electronic device 60 comprises a memory 61 and a processor 62 coupled to each other, and the processor 62 is configured to execute program instructions stored in the memory 61 to implement the steps of any of the above embodiments of the method for displaying a lesion detection frame. In one specific implementation scenario, the electronic device 60 may include, but is not limited to, a display device, a microcomputer, or a server.
Specifically, the processor 62 is configured to control itself and the memory 61 to implement the steps of any of the above embodiments of the method for displaying a lesion detection frame. The processor 62, which may also be referred to as a CPU (central processing unit), may be an integrated circuit chip with signal processing capability. The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or any other conventional processor. In addition, the processor 62 may be implemented jointly by multiple integrated circuit chips.
In the above scheme, the processor 62 generates and displays a video picture corresponding to the endoscope video data on the first canvas based on the acquired endoscope video data, and acquires a video frame image based on the acquired endoscope video data; preprocesses the video frame image to obtain a preprocessed image; performs lesion recognition detection on the preprocessed image using the trained lesion detection model to obtain a detection result; and, in response to the detection result being that a lesion and lesion position data are identified, generates and displays a lesion detection frame on the second canvas based on the lesion position data. The second canvas is located above the first canvas and has a transparent background. As with the display device 50 described above, preprocessing the video frame images before detection improves the accuracy and precision of lesion recognition; decoupling the two canvases keeps the video pictures from disturbing the lesion detection frame, improving the stability and continuity of its display; and the improved recognition accuracy and efficiency improve the accuracy with which the detection frame highlights the lesion, and thus the accuracy and efficiency of endoscopic diagnosis.
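Putting the pieces together, a sketch of the loop such a processor might run follows; every hook, like the timing constants, is an assumption standing in for the components sketched earlier.

```typescript
// End-to-end loop: sample, preprocess, detect, draw on the overlay, clear.
interface LesionBox { x: number; y: number; width: number; height: number; }

interface PipelineHooks {
  grabFrame(): ImageData;                         // copy the current endoscope frame
  preprocess(frame: ImageData): ImageData;        // e.g. crop to the effective information area
  detect(frame: ImageData): Promise<LesionBox[]>; // trained lesion detection model
  drawBoxes(boxes: LesionBox[]): void;            // draw on the second (transparent) canvas
  clearOverlay(): void;                           // wipe the second canvas
}

const FREQUENCY_MS = 100; // assumed preset frequency
const DURATION_MS = 500;  // assumed preset duration

function runPipeline(hooks: PipelineHooks): void {
  setInterval(async () => {
    const frame = hooks.grabFrame();
    const boxes = await hooks.detect(hooks.preprocess(frame));
    if (boxes.length > 0) {
      hooks.drawBoxes(boxes); // first canvas keeps playing video independently
      setTimeout(hooks.clearOverlay, DURATION_MS);
    }
  }, FREQUENCY_MS);
}
```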
Referring to fig. 7, fig. 7 is a block diagram of an embodiment of a computer-readable storage medium according to the present application. The computer-readable storage medium 70 stores program instructions 700 executable by a processor, the program instructions 700 being used to implement the steps of any of the above embodiments of the method for displaying a lesion detection frame. For example, the program instructions 700 are used to implement the following steps:
generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data; preprocessing the video frame image to obtain a preprocessed image; performing lesion recognition detection on the preprocessed image using the trained lesion detection model to obtain a detection result; and, in response to the detection result being that a lesion and lesion position data are identified, generating and displaying a lesion detection frame on a second canvas based on the lesion position data, wherein the second canvas is located above the first canvas and is a background-transparent canvas.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is only a logical division of function, and an actual implementation may divide them differently; units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form. Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit. If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in whole or in part as a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The method and device for displaying a lesion detection frame, the electronic device, and the storage medium provided by the embodiments of the present application are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help in understanding the method and its core idea. Meanwhile, those skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application; in summary, the content of this specification should not be construed as limiting the present invention.
Claims (9)
1. A method for displaying a lesion detection frame, characterized by comprising the following steps:
generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data;
preprocessing the video frame image to obtain a preprocessed image;
performing lesion recognition detection on the preprocessed image by using a trained lesion detection model to obtain a detection result;
in response to the detection result being that a lesion and lesion position data are identified, generating and displaying a lesion detection frame on a second canvas based on the lesion position data;
wherein the generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data, and acquiring a video frame image based on the acquired endoscope video data, comprises:
generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the endoscope video data acquired in real time, and
acquiring a video frame image and the current time corresponding to the video frame image at a preset frequency based on the endoscope video data acquired in real time;
the generating and displaying a lesion detection frame on a second canvas based on the lesion position data in response to the detection result being that a lesion and lesion position data are identified comprises:
in response to the detection result being that the lesion and the lesion position data are identified, storing the lesion position data and the current time corresponding to the video frame image in a list to be displayed; wherein the lesion position data and the current time corresponding to the video frame image are a set of data to be displayed;
acquiring a set of the data to be displayed from the list to be displayed, and displaying a lesion detection frame in a second canvas based on the lesion position data in the data to be displayed;
after the acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame in a second canvas based on the lesion position data in the data to be displayed, the method further comprises:
clearing the lesion detection frame from the second canvas after the lesion detection frame has been displayed for a preset duration;
wherein the second canvas is located above the first canvas and is a background-transparent canvas; the preset frequency is determined based on the recognition detection efficiency of the trained lesion detection model; and the preset duration is determined based on the recognition detection efficiency of the trained lesion detection model.
2. The display method according to claim 1, wherein the list to be displayed includes at least two sets of data to be displayed;
before the acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame in a second canvas based on the lesion position data in the data to be displayed, the method further includes:
and dynamically combining the at least two groups of data to be displayed into one group of data to be displayed.
3. The method according to claim 2, wherein the dynamically merging the at least two sets of data to be displayed into one set of data to be displayed comprises:
and dynamically combining the at least two groups of data to be displayed into a group of data to be displayed through a non-maximum suppression algorithm.
4. The display method according to claim 1, wherein before the acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame in a second canvas based on the lesion position data in the data to be displayed, the method further comprises:
filtering out expired data to be displayed from the list to be displayed based on the current time, the preset duration, and the current time corresponding to the video frame images in the list to be displayed.
5. The display method according to claim 1, wherein the performing lesion recognition detection on the preprocessed image by using the trained lesion detection model to obtain a detection result comprises:
acquiring an initial target detection model and a lesion training set;
training the initial target detection model by using the lesion training set to obtain the lesion detection model;
and performing lesion recognition detection on the preprocessed image by using the lesion detection model to obtain a detection result.
6. The display method according to claim 1, wherein the preprocessing the video frame image to obtain a preprocessed image comprises:
recognizing the video frame image by using the trained edge-cutting network model to obtain an effective information area;
and cropping the video frame image based on the effective information area.
7. A display device of a lesion detection frame, the display device comprising:
the display module is used for generating and displaying a video picture corresponding to the endoscope video data on a first canvas based on the acquired endoscope video data;
the acquisition module is used for acquiring a video frame image based on the acquired endoscope video data, and for acquiring the video frame image and the current time corresponding to the video frame image at a preset frequency based on the endoscope video data acquired in real time;
the preprocessing module is used for preprocessing the video frame image to obtain a preprocessed image;
the recognition detection module is used for performing lesion recognition detection on the preprocessed image by using the trained lesion detection model to obtain a detection result;
the display module is further used for: in response to the detection result being that a lesion and lesion position data are identified, generating and displaying a lesion detection frame on a second canvas based on the lesion position data; generating and displaying a video picture corresponding to the endoscope video data on the first canvas based on the endoscope video data acquired in real time; in response to the detection result being that the lesion and the lesion position data are identified, storing the lesion position data and the current time corresponding to the video frame image in a list to be displayed, wherein the lesion position data and the current time corresponding to the video frame image are a set of data to be displayed; acquiring a set of the data to be displayed from the list to be displayed and displaying a lesion detection frame in the second canvas based on the lesion position data in the data to be displayed; and clearing the lesion detection frame after the lesion detection frame has been displayed for a preset duration; wherein the second canvas is located above the first canvas and is a background-transparent canvas.
8. An electronic device comprising a memory and a processor coupled to each other, wherein the processor is configured to execute program instructions stored in the memory to implement the method for displaying a lesion detection frame according to any one of claims 1 to 6.
9. A computer-readable storage medium having stored thereon program instructions that, when executed by a processor, implement the method for displaying a lesion detection frame according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111291667.8A CN113744266B (en) | 2021-11-03 | 2021-11-03 | Method and device for displaying focus detection frame, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113744266A CN113744266A (en) | 2021-12-03 |
CN113744266B true CN113744266B (en) | 2022-02-08 |
Family
ID=78727290
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111291667.8A Active CN113744266B (en) | 2021-11-03 | 2021-11-03 | Method and device for displaying focus detection frame, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113744266B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116596861B (en) * | 2023-04-28 | 2024-02-23 | 中山大学 | Dental lesion recognition method, system, equipment and storage medium |
CN118334360B (en) * | 2024-03-19 | 2024-10-18 | 无锡锐盈科技有限公司 | YoLov3 algorithm-based multi-scale fusion prediction method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447969A (en) * | 2018-10-29 | 2019-03-08 | 北京青燕祥云科技有限公司 | Hepatic space occupying lesion recognition methods, device and realization device |
CN112686329A (en) * | 2021-01-06 | 2021-04-20 | 西安邮电大学 | Electronic laryngoscope image classification method based on dual-core convolution feature extraction |
CN112911165A (en) * | 2021-03-02 | 2021-06-04 | 杭州海康慧影科技有限公司 | Endoscope exposure method, device and computer readable storage medium |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107256552B (en) * | 2017-06-14 | 2020-08-18 | 成都微识医疗设备有限公司 | Polyp image recognition system and method |
CN107705852A (en) * | 2017-12-06 | 2018-02-16 | 北京华信佳音医疗科技发展有限责任公司 | Real-time the lesion intelligent identification Method and device of a kind of medical electronic endoscope |
CN110648304A (en) * | 2018-06-11 | 2020-01-03 | 上海梵焜医疗器械有限公司 | Intelligent auxiliary diagnosis method for handheld hard endoscope |
CN110176295A (en) * | 2019-06-13 | 2019-08-27 | 上海孚慈医疗科技有限公司 | A kind of real-time detecting method and its detection device of Gastrointestinal Endoscopes lower portion and lesion |
EP4057909A4 (en) * | 2019-11-15 | 2023-11-29 | Geisinger Clinic | Systems and methods for a deep neural network to enhance prediction of patient endpoints using videos of the heart |
CN111724361B (en) * | 2020-06-12 | 2023-08-01 | 深圳技术大学 | Method and device for displaying focus in real time, electronic equipment and storage medium |
CN111915573A (en) * | 2020-07-14 | 2020-11-10 | 武汉楚精灵医疗科技有限公司 | Digestive endoscopy focus tracking method based on time sequence feature learning |
CN112001915A (en) * | 2020-09-01 | 2020-11-27 | 山东省肿瘤防治研究院(山东省肿瘤医院) | Endoscope image processing method and system and readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |