CN112766066A - Method and system for processing and displaying dynamic video stream and static image


Info

Publication number
CN112766066A
CN112766066A (application CN202011632065.XA)
Authority
CN
China
Prior art keywords
image
video stream
frame
displaying
storing
Prior art date
Legal status
Pending
Application number
CN202011632065.XA
Other languages
Chinese (zh)
Inventor
陈宗喜
杜强
张龙
Current Assignee
Beijing Xbentury Network Technology Co ltd
Original Assignee
Beijing Xbentury Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xbentury Network Technology Co ltd filed Critical Beijing Xbentury Network Technology Co ltd
Priority to CN202011632065.XA priority Critical patent/CN112766066A/en
Publication of CN112766066A publication Critical patent/CN112766066A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 7/0012: Biomedical image inspection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 30/00: ICT specially adapted for the handling or processing of medical images
    • G16H 30/20: ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Epidemiology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method, a system, an electronic device and a storage medium for processing and displaying a dynamic video stream and static images, relating to the technical field of artificial intelligence. The method comprises the following steps: obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream that carries the prediction results after frame-extraction prediction, and storing or playing and displaying the video stream data; and classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data. Because the dynamic video stream and the still picture come from the same source, the ai prediction results are the same and the generated detection and segmentation coordinate points are the same, so the positions marked on the current dynamic-video frame-extracted image and on the static image are identical. The frame-extracted image is saved as a static image and sent to the client to display the picture and the suspicious lesion position, so the image and the ai result displayed by the client are correct and identical to the ai result in the dynamic video.

Description

Method and system for processing and displaying dynamic video stream and static image
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a method and a system for processing and displaying a dynamic video stream and a static image, an electronic device, and a storage medium.
Background
DICOM (Digital Imaging and Communications in Medicine) is the international standard for medical images and related information. It defines a medical image format of clinically usable quality that can be used for data exchange. During an ultrasound examination, the operating physician moves the probe over the scanned region for real-time dynamic imaging; when the probe reaches a suspicious location, the image is frozen and examined carefully, a diagnostic opinion is given, and the reporting physician writes it into the diagnosis report. In other words, the sonographer must observe both the dynamic video-stream images and the still picture images. Current artificial intelligence technology in the field of image recognition is based on processing image pixels.
At present, acquiring the video stream of the ultrasound device through a video capture card can satisfy the requirements of both the dynamic mode and the static mode. However, there are two problems:
the dynamic video ai identification result and the static image ai identification result are inconsistent
The doctor's workflow is to search for suspicious locations in the dynamic video stream and then make a further lesion judgment on a still picture. The ai running on the dynamic video identifies the frame-extracted images in real time and draws the identification results on the picture in real time; if the doctor captures an image from the real-time video stream at that moment, the identification result on the captured image may be inconsistent with the identification result on the dynamic video.
To guarantee real-time behaviour, current video push/pull streaming technology tolerates a certain packet loss rate under network congestion and similar conditions; the video is still displayed in real time, but small mosaics may appear. As a result, the video stream received by the linux server may have lost packets, so the frame-extracted image is not a complete image.
The Windows client obtains the video stream directly from the capture card, so the captured image is complete.
Because the pixels of the frame-extracted image on the linux side differ from those of the still image captured on the Windows side, the ai identification results differ.
The above makes the product unstable, which troubles doctors, may even mislead them, and can cause medical accidents.
2. Low real-time performance.
The rtsp stream is pushed and pulled twice: pushing requires the video stream to be encoded first and then sent over the network, and pulling requires the video stream to be received over the network first and then decoded. Encoding and decoding occupy the CPU, and the network bandwidth consumption is large. Each link introduces a certain delay, so the video stream carrying the ai prediction result ends up noticeably delayed relative to the original picture of the ultrasound device, which degrades the user experience and increases the doctors' workload.
Disclosure of Invention
An object of the present invention is to provide a method, a system, an electronic device and a storage medium for processing and displaying a dynamic video stream and static images, so as to solve at least one of the problems mentioned in the background.
In a first aspect, an embodiment of the present invention provides a method for processing and displaying a dynamic video stream and static images, used to assist a doctor in diagnosis, comprising the following steps:
obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and storing or playing and displaying the video stream data;
and classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data.
Optionally, obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream carrying the prediction results after frame-extraction prediction, and storing or playing and displaying the video stream data comprises:
decoding the original video stream through the ai model to obtain the individual frame images, and then extracting frames at intervals of a predetermined number of image frames according to the available system hardware resources, to obtain frame-extracted images and first images that are not frame-extracted;
inputting the frame-extracted images into the ai model to classify, detect and segment lesions, and drawing the detected and segmented coordinate points and the classification results to obtain second images;
and encoding the first images and the second images into a target video stream, and storing or playing and displaying the target video stream.
Optionally, classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data comprises:
inputting the frame-extracted images into the ai model for classification, detection and segmentation;
and drawing the detected and segmented coordinate points and the classification results to obtain still picture data, and storing or playing and displaying the still picture data.
In a second aspect, an embodiment of the present invention provides a system for processing and displaying a dynamic video stream and static images, the display system comprising:
an ultrasound ai host and a control unit, wherein the ultrasound ai host runs a linux operating system and is equipped with a video acquisition device used to acquire the original video stream;
an ultrasound device connected to the video acquisition device by a video cable;
a processing module, used to classify, detect and segment lesions on the frame-extracted images to obtain still picture data, and to store or play and display the still picture data;
the processing module is further used to obtain frame-extracted images from the dynamic video stream, generate video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and store or play and display the video stream data.
Optionally, the display system further comprises an external control end connected to the ultrasound ai host.
In a third aspect, an embodiment of the present invention provides an image processing display apparatus, comprising:
a first acquisition module, used to acquire the dynamic video stream;
a processing module, used to obtain frame-extracted images from the dynamic video stream, generate video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and store or play and display the video stream data;
the processing module is also used to classify, detect and segment lesions on the frame-extracted images to obtain still picture data, and to store or play and display the still picture data.
Optionally, the apparatus further comprises:
a second acquisition module, used to capture a still picture from the original video stream; the still picture is input into the ai model through the processing module to obtain and display a target picture carrying the detected and segmented coordinate points and the classification result.
Further, the processing module comprises:
a first image acquisition unit, used to decode the original video stream through the ai model to obtain the individual frame images, and then extract frames at intervals of a predetermined number of image frames according to the available system hardware resources, to obtain frame-extracted images and first images that are not frame-extracted;
a second image acquisition unit, used to input the frame-extracted images into the ai model to classify, detect and segment lesions, and to draw the detected and segmented coordinate points and the classification results to obtain second images;
and an image processing unit, used to encode the first images and the second images into a target video stream, and to store or play and display the target video stream.
In a fourth aspect, the present invention provides an electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the above method by executing the executable instructions.
In a fifth aspect, the present invention provides a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the above-described method.
Advantageous effects
The invention proposes a method, a system, an electronic device and a storage medium for processing and displaying a dynamic video stream and static images. The method obtains frame-extracted images from the dynamic video stream, generates video stream data from the video stream that carries the prediction results after frame-extraction prediction, and stores or plays and displays the video stream data; it classifies, detects and segments lesions on the frame-extracted images to obtain still picture data, and stores or plays and displays the still picture data. Because the dynamic video stream and the still picture come from the same source, the ai prediction results are the same and the generated detection and segmentation coordinate points are the same, so the positions marked on the current dynamic-video frame-extracted image and on the static image are identical. The frame-extracted image is saved as a static image and sent to the client to display the picture and the suspicious lesion position, so the image and the ai result displayed by the client are correct and identical to the ai result in the dynamic video.
Drawings
FIG. 1 is a flow chart of a method for processing and displaying a dynamic video stream and a static image according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a specific method of step S20 in FIG. 1;
FIG. 3 is a flowchart illustrating a specific method of step S40 in FIG. 1;
FIG. 4 is a block diagram of a motion video stream and still image processing display system according to an embodiment of the present invention;
FIG. 5 is a block diagram of a motion video stream and still image processing display device according to an embodiment of the present invention;
FIG. 6 is a block diagram of the processing module of FIG. 5;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, features defined as "first", "second", may explicitly or implicitly include one or more of the described features. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The invention will be further described with reference to the following description and specific examples, taken in conjunction with the accompanying drawings:
the present embodiment aims to provide a dynamic video stream and a still image processing and displaying method, which can make ai recognition results of a dynamic mode image and a still mode image consistent.
In the related art, a DICOM image file consists of two parts: a header containing the parameter information and the pixel data (Pixel Data). Every DICOM file must include the header. The header begins with a file preamble of 128 bytes of 00H, followed by the DICOM prefix, a 4-byte string "DICM" that can be used to determine whether a file is a DICOM file. The header also contains other useful information, such as the transfer format of the file and the application that generated it. The pixel data describe the luminance value of each point of the image. DICOM has four content levels: Patient, Study, Series and Image. Although the contents of the upper levels are identical in many images, they are present in every image file. An image is composed of several Information Entities; an information entity is subdivided into Modules; the smallest unit inside each Module is called an Attribute or data Element. In a DICOM file, each data element is stored at a fixed position, so as long as the start address of the file in memory is known, the corresponding data element can be found from the offset of its storage position. Compared with simple medical image recognition, artificial-intelligence diagnosis of ultrasound scan images requires much more research effort and is far harder to process. Unlike examination results such as magnetic resonance, CT and electrocardiogram, most ultrasound diagnosis relies on dynamic images of different sections acquired by the doctor, which places higher demands on the individual operating skill of the sonographer.
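The preamble and "DICM" prefix described above can be checked directly. The following is a minimal sketch, assuming a local file path; the function name and path are hypothetical and only illustrate the 128-byte preamble plus 4-byte prefix layout.

```python
def is_dicom_file(path: str) -> bool:
    """Return True if the file starts with the 128-byte preamble followed by 'DICM'."""
    with open(path, "rb") as f:
        header = f.read(132)            # 128-byte preamble + 4-byte DICOM prefix
    if len(header) < 132:
        return False
    return header[128:132] == b"DICM"   # the prefix marks a DICOM file

# Usage (path is hypothetical):
# print(is_dicom_file("/data/ultrasound/IMG0001.dcm"))
```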
The doctor's workflow is to search for suspicious locations in the dynamic video stream first and then make a further lesion judgment on a still picture. The ai running on the dynamic video identifies the frame-extracted images in real time and draws the identification results on the picture in real time; if the doctor captures an image from the real-time video stream at that moment, the identification result on the captured image may be inconsistent with that on the dynamic video, which confuses and may even mislead the doctor and cause medical accidents.
The embodiment of the present invention shown in fig. 1 provides a method for processing and displaying a dynamic video stream and static images, used to assist a doctor in diagnosis. As shown in fig. 1, the method comprises the following steps:
S20, obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and storing or playing and displaying the video stream data;
S40, classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data.
With this method, frame-extracted images are obtained from the dynamic video stream; after frame-extraction prediction, the video stream carrying the detected and segmented coordinate points and the classification results is turned into video stream data, which is stored or played and displayed; lesions are classified, detected and segmented on the frame-extracted images to obtain still picture data, which is stored or played and displayed. Because the dynamic video stream and the still picture come from the same source, the ai prediction results are the same and the generated detection and segmentation coordinate points are the same, so the positions marked on the current dynamic-video frame-extracted image and on the static image are identical. The frame-extracted image is saved as a static image and sent to the client to display the picture and the suspicious lesion position, so the image and the ai result displayed by the client are correct and identical to the ai result in the dynamic video.
As shown in fig. 2, obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and storing or playing and displaying the video stream data specifically comprises the following steps (a minimal code sketch follows step S203):
S201, decoding the original video stream through the ai model to obtain the individual frame images, and then extracting frames at intervals of a predetermined number of image frames according to the available system hardware resources, to obtain frame-extracted images and first images that are not frame-extracted;
S202, inputting the frame-extracted images into the ai model to classify, detect and segment lesions, and drawing the detected and segmented coordinate points and the classification results to obtain second images;
S203, encoding the first images and the second images into a target video stream, and storing or playing and displaying the target video stream.
As shown in fig. 3, classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data specifically comprises the following steps (a minimal code sketch follows step S402):
S401, inputting the frame-extracted images into the ai model for classification, detection and segmentation;
S402, drawing the detected and segmented coordinate points and the classification results to obtain still picture data, and storing or playing and displaying the still picture data.
The embodiment of the invention also provides a system for processing and displaying the dynamic video stream and static images. The scheme in the related art uses two hosts. One runs a Windows system equipped with a video acquisition device and the client software; the Windows system is user-friendly and handles functional interaction through the client. The other runs a linux system equipped with the ai module; the linux system is friendly to artificial-intelligence development and convenient for development and deployment. The Windows host acquires the video stream of the ultrasound device through the capture card and transmits it to the linux host over the rtsp video streaming protocol; after ai processing on the linux host, the ai result is drawn into the video stream, which is then sent back to the Windows client over rtsp for display.
At present, artificial-intelligence technology in the field of image recognition is based on processing image pixels. Common image formats such as bmp, jpg and png store each pixel as three bytes, R, G and B. An image with a resolution of 1920 × 1080 consists of 1920 × 1080 pixels, i.e. 1920 × 1080 × 3 bytes in total, and each byte consists of 8 bits, so the R, G and B values range from 0 to 255. If even one of the 1920 × 1080 × 3 × 8 bits differs between two images, the two images are different: the ai treats them as two distinct images whose pixel values differ, and the prediction results are likely to differ, even though no difference is visible to the naked eye.
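A small numerical check of the claim above, assuming numpy arrays as the pixel representation: flipping a single bit in one channel of one pixel of a 1920 × 1080 RGB image makes the two arrays compare as different, even though nothing changes to the naked eye.

```python
import numpy as np

a = np.zeros((1080, 1920, 3), dtype=np.uint8)   # one 1920 x 1080 RGB image
b = a.copy()
b[0, 0, 0] ^= 1                                  # flip one bit of one channel of one pixel

print(a.nbytes)               # 6220800 bytes = 1920 * 1080 * 3
print(np.array_equal(a, b))   # False: to pixel-based ai these are two different images
print(int((a != b).sum()))    # 1 differing byte out of 6,220,800
```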
To this end, as shown in fig. 4, an embodiment of the present invention provides a dynamic video stream and still image processing display system, the display system including:
an ultrasound ai host 100 having a linux operating system, the ultrasound ai host being equipped with a video acquisition device 101, the video acquisition device 101 may be, for example, a video acquisition card, the video acquisition device being configured to acquire an original video stream; the ultrasonic ai host 100 directly acquires the video stream from the video capture card, and the problem of network frame loss does not exist.
An ultrasound device 200 connected to the video acquisition apparatus by a video line;
the processing module 102 is configured to classify, detect and segment the focus according to the frame-extracted image to obtain static image data, and store or play and display the static image data; the display mode can be displayed by a Windows client 400 connected to the ultrasound ai host 100, for example;
the processing module 102 is further configured to obtain a frame-extracted image according to the dynamic video stream, generate video stream data from the video stream with the detected and segmented coordinate points and the classification result after frame extraction prediction, and store or play and display the video stream data; the display mode can be sent to the Windows client 400 connected with the ultrasound ai host 100 through rtsp for display; only once rtsp push flow and pull flow, the cpu resource occupation is less, the network bandwidth occupation is less, and the time delay is small.
Preferably, the display system further comprises: the external control terminal 300, for example, may be a handle button, which is connected to the ultrasound ai host 100, for example, through a USB connection. By pressing a handle key, a doctor captures a trigger event, and stores a frame-drawing image and a static picture locally in the memory.
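A minimal sketch of that trigger path, under the assumption that the button event arrives as a callback (the USB delivery itself is not shown) and that the class and file names are hypothetical: the latest frame-extracted image and its annotated still picture are kept in memory and written to local files when the button is pressed.

```python
import cv2

class CaptureTrigger:
    """Keep the most recent frame-extracted image and its annotated still
    picture in memory; save both to local files when the button fires."""
    def __init__(self):
        self.last_frame = None    # most recent frame-extracted image
        self.last_still = None    # same frame with the ai results drawn on it

    def update(self, frame, still):
        self.last_frame, self.last_still = frame, still

    def on_button_pressed(self, prefix="capture"):
        if self.last_frame is None:
            return
        cv2.imwrite(f"{prefix}_frame.png", self.last_frame)   # raw frame-extracted image
        cv2.imwrite(f"{prefix}_still.png", self.last_still)   # still picture with ai results
```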
As shown in fig. 5, an embodiment of the present invention provides an apparatus for processing and displaying a dynamic video stream and still images, the apparatus comprising:
a first obtaining module 20, configured to obtain the dynamic video stream;
a second obtaining module 60, configured to capture a still picture from the original video stream; the still picture is input into the ai model through the processing module to obtain and display a target picture carrying the detected and segmented coordinate points and the classification result;
a processing module 102, configured to obtain frame-extracted images from the dynamic video stream, generate video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and store or play and display the video stream data;
the processing module 102 is further configured to classify, detect and segment lesions on the frame-extracted images to obtain still picture data, and to store or play and display the still picture data.
In the apparatus of this embodiment, the first obtaining module 20 obtains the dynamic video stream, the second obtaining module 60 captures a still picture from the original video stream, and the still picture is input into the ai model through the processing module to obtain and display a target picture carrying the detected and segmented coordinate points and the classification result. The processing module 102 obtains frame-extracted images from the dynamic video stream, generates video stream data from the video stream carrying the prediction results after frame-extraction prediction, and stores or plays and displays the video stream data; it also classifies, detects and segments lesions on the frame-extracted images to obtain still picture data, and stores or plays and displays the still picture data. Because the dynamic video stream and the still picture come from the same source, the ai prediction results are the same and the generated detection and segmentation coordinate points are the same, so the positions marked on the current dynamic-video frame-extracted image and on the static image are identical. The frame-extracted image is saved as a static image and sent to the client to display the picture and the suspicious lesion position, so the image and the ai result displayed by the client are correct and identical to the ai result in the dynamic video.
Preferably, the apparatus further comprises:
a second obtaining module 60, configured to capture a still picture from the original video stream; the still picture is input into the ai model through the processing module to obtain and display a target picture carrying the detected and segmented coordinate points and the classification result.
Specifically, the processing module 102 comprises:
a first image acquisition unit 1021, configured to decode the original video stream through the ai model to obtain the individual frame images, and then extract frames at intervals of a predetermined number of image frames according to the available system hardware resources, to obtain frame-extracted images and first images that are not frame-extracted;
a second image acquisition unit 1022, configured to input the frame-extracted images into the ai model to classify, detect and segment lesions, and to draw the detected and segmented coordinate points and the classification results to obtain second images;
and an image processing unit 1023, configured to encode the first images and the second images into a target video stream, and to store or play and display the target video stream.
An electronic device is also provided in the embodiments of the present application. Fig. 7 shows a schematic structural diagram of an electronic device to which the embodiments of the present application can be applied. As shown in fig. 7, the computer electronic device includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the programs and data necessary for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse and the like; an output section 707 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as needed.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The present application also provides a computer readable storage medium, which may be the computer readable storage medium included in one of the dynamic video streaming and still image processing display systems in the above embodiments; or it may be a computer-readable storage medium that exists separately and is not built into the electronic device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the motion video streaming and still image processing display methods described herein.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for processing and displaying a dynamic video stream and static images, used to assist a doctor in diagnosis, comprising the following steps:
obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and storing or playing and displaying the video stream data;
and classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data.
2. The processing and displaying method of claim 1, wherein obtaining frame-extracted images from the dynamic video stream, generating video stream data from the video stream carrying the prediction results after frame-extraction prediction, and storing or playing and displaying the video stream data comprises:
decoding the original video stream through the ai model to obtain the individual frame images, and then extracting frames at intervals of a predetermined number of image frames according to the available system hardware resources, to obtain frame-extracted images and first images that are not frame-extracted;
inputting the frame-extracted images into the ai model to classify, detect and segment lesions, and drawing the detected and segmented coordinate points and the classification results to obtain second images;
and encoding the first images and the second images into a target video stream, and storing or playing and displaying the target video stream.
3. The processing and displaying method of claim 1, wherein classifying, detecting and segmenting lesions on the frame-extracted images to obtain still picture data, and storing or playing and displaying the still picture data comprises:
inputting the frame-extracted images into the ai model for classification, detection and segmentation;
and drawing the detected and segmented coordinate points and the classification results to obtain still picture data, and storing or playing and displaying the still picture data.
4. A system for processing and displaying a dynamic video stream and still images, the display system comprising:
an ultrasound ai host and a control unit, wherein the ultrasound ai host runs a linux operating system and is equipped with a video acquisition device used to acquire the original video stream;
an ultrasound device connected to the video acquisition device by a video cable;
a processing module, used to classify, detect and segment lesions on the frame-extracted images to obtain still picture data, and to store or play and display the still picture data;
the processing module is further used to obtain frame-extracted images from the dynamic video stream, generate video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and store or play and display the video stream data.
5. The display system of claim 4, further comprising an external control end connected to the ultrasound ai host.
6. An image processing display apparatus, characterized in that the apparatus comprises:
a first acquisition module, used to acquire the dynamic video stream;
a processing module, used to obtain frame-extracted images from the dynamic video stream, generate video stream data from the video stream carrying the detected and segmented coordinate points and the classification results after frame-extraction prediction, and store or play and display the video stream data;
the processing module is also used to classify, detect and segment lesions on the frame-extracted images to obtain still picture data, and to store or play and display the still picture data.
7. The image processing display device according to claim 6, further comprising:
a second acquisition module, used to capture a still picture from the original video stream; the still picture is input into the ai model through the processing module to obtain and display a target picture carrying the detected and segmented coordinate points and the classification result.
8. The image processing display device according to claim 6, wherein the processing module comprises:
a first image acquisition unit, used to decode the original video stream through the ai model to obtain the individual frame images, and then extract frames at intervals of a predetermined number of image frames according to the available system hardware resources, to obtain frame-extracted images and first images that are not frame-extracted;
a second image acquisition unit, used to input the frame-extracted images into the ai model to classify, detect and segment lesions, and to draw the detected and segmented coordinate points and the classification results to obtain second images;
and an image processing unit, used to encode the first images and the second images into a target video stream, and to store or play and display the target video stream.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1-4 by executing the executable instructions.
10. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 4.
Application CN202011632065.XA, filed 2020-12-31 (priority 2020-12-31), published as CN112766066A: Method and system for processing and displaying dynamic video stream and static image. Status: Pending.

Priority Applications (1)

Application number: CN202011632065.XA; priority date: 2020-12-31; filing date: 2020-12-31; title: Method and system for processing and displaying dynamic video stream and static image

Applications Claiming Priority (1)

Application number: CN202011632065.XA; priority date: 2020-12-31; filing date: 2020-12-31; title: Method and system for processing and displaying dynamic video stream and static image

Publications (1)

Publication number: CN112766066A; publication date: 2021-05-07

Family

ID=75699602

Family Applications (1)

Application number: CN202011632065.XA; title: Method and system for processing and displaying dynamic video stream and static image; priority date: 2020-12-31; filing date: 2020-12-31; status: Pending

Country Status (1)

Country Link
CN (1) CN112766066A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018107679A1 (en) * 2016-12-12 2018-06-21 华为技术有限公司 Method and device for acquiring dynamic three-dimensional image
CN109727243A (en) * 2018-12-29 2019-05-07 无锡祥生医疗科技股份有限公司 Breast ultrasound image recognition analysis method and system
CN109919031A (en) * 2019-01-31 2019-06-21 厦门大学 A kind of Human bodys' response method based on deep neural network
CN109886243A (en) * 2019-03-01 2019-06-14 腾讯科技(深圳)有限公司 Image processing method, device, storage medium, equipment and system
CN111227864A (en) * 2020-01-12 2020-06-05 刘涛 Method and apparatus for lesion detection using ultrasound image using computer vision
CN111311635A (en) * 2020-02-08 2020-06-19 腾讯科技(深圳)有限公司 Target positioning method, device and system
CN111405288A (en) * 2020-03-19 2020-07-10 北京字节跳动网络技术有限公司 Video frame extraction method and device, electronic equipment and computer readable storage medium
CN111428083A (en) * 2020-03-19 2020-07-17 平安国际智慧城市科技股份有限公司 Video monitoring warning method, device, equipment and storage medium
CN111553329A (en) * 2020-06-14 2020-08-18 深圳天海宸光科技有限公司 Gas station intelligent safety processing system and method based on machine vision
CN111881776A (en) * 2020-07-07 2020-11-03 腾讯科技(深圳)有限公司 Dynamic expression obtaining method and device, storage medium and electronic equipment

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114664410A (en) * 2022-03-11 2022-06-24 北京医准智能科技有限公司 Video-based focus classification method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN110599421B (en) Model training method, video fuzzy frame conversion method, device and storage medium
US11694114B2 (en) Real-time deployment of machine learning systems
US20110150420A1 (en) Method and device for storing medical data, method and device for viewing medical data, corresponding computer program products, signals and data medium
CN107621932B (en) Local amplification method and device for display image
CN112786163B (en) Ultrasonic image processing display method, system and storage medium
JP7420916B2 (en) Real-time deployment of machine learning systems
CN111784668A (en) Digestive endoscopy image automatic freezing method based on perceptual hash algorithm
JP2014041433A (en) Display device, display method, television receiver, and display control device
CN113658175A (en) Method and device for determining symptom data
CN112766066A (en) Method and system for processing and displaying dynamic video stream and static image
CN112862752A (en) Image processing display method, system electronic equipment and storage medium
CN116074585B (en) Super-high definition video coding and decoding method and device based on AI and attention mechanism
JP2005508587A (en) Method and apparatus for performing real-time storage of ultrasound video image information
CN112261417A (en) Video pushing method and system, equipment and readable storage medium
CN114630124B (en) Neural endoscope backup method and system
CN106303366B (en) Video coding method and device based on regional classification coding
CN112863647A (en) Video stream processing and displaying method, system and storage medium
CN111243046B (en) Image quality detection method, device, electronic equipment and storage medium
CN114496175A (en) Medical image viewing method, device, equipment and storage medium
KR102669621B1 (en) Method and Apparatus for Capturing and Storing High Resolution Endoscope Image
CN112641466A (en) Ultrasonic artificial intelligence auxiliary diagnosis method and device
CN112686109A (en) Method and device for extracting offline video file, electronic equipment and storage medium
CN107070806B (en) Data transmission method and device
KR102633823B1 (en) Apparatus for discriminating medical image and method thereof
CN111798967A (en) Wisdom ultrasonic testing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination