CN117017355B - Thyroid autonomous scanning system based on multimodal generative dialogue - Google Patents
Thyroid autonomous scanning system based on multimodal generative dialogue
- Publication number: CN117017355B
- Application number: CN202311292889.0A
- Authority: CN (China)
- Legal status: Active
Classifications
- G06V20/70 — Labelling scene content, e.g. deriving syntactic or semantic representations
- A61B8/08 — Detecting organic movements or changes, e.g. tumours, cysts, swellings
- A61B8/085 — Locating body or organic structures, e.g. tumours, calculi, blood vessels, nodules
- A61B8/5215 — Data or image processing for diagnosis involving processing of medical diagnostic data
- A61B8/54 — Control of the diagnostic device
- G06F16/90332 — Natural language query formulation or dialogue systems
- G06N3/0455 — Auto-encoder networks; encoder-decoder networks
- G06N3/0475 — Generative networks
- G06V10/26 — Segmentation of patterns in the image field
- G06V10/764 — Recognition or understanding using classification, e.g. of video objects
- G06V10/82 — Recognition or understanding using neural networks
- G06V2201/03 — Recognition of patterns in medical or anatomical images
Abstract
The invention discloses a thyroid autonomous scanning system based on a multimodal generative dialogue, comprising a hardware device and an autonomous scanning unit that controls the actions of the hardware device. The hardware device comprises a mechanical arm, a force sensor, a clamping jaw, an upper computer, a display device and medical ultrasonic equipment. The autonomous scanning unit is built into the upper computer and comprises an image analysis module, a multimodal question-answering module, a decision module and a control module. The invention uses a generative multimodal large language model to achieve fully autonomous scanning of the thyroid: no manual path or parameter setting is needed during scanning; the scanning start position is determined by zero-force dragging; the position and posture of the probe are adjusted cooperatively according to the sensor data and the ultrasonic image; and action encoding converts the continuous absolute motion of the mechanical arm into discrete relative motion, avoiding scanning failures caused by calibration precision errors.
Description
Technical Field
The invention belongs to the technical field of medical diagnosis, and particularly relates to a thyroid autonomous scanning system based on a multimodal generative dialogue.
Background
Ultrasound is a non-invasive, safe, radiation-free, real-time imaging examination that is widely used for clinical diagnosis and monitoring. In China, ultrasound examinations are performed billions of times per year, and this massive demand is a great challenge for the scarce pool of sonographers, especially in remote areas where medical resources are limited. With the development of artificial intelligence, robots with autonomous scanning capability can help sonographers improve work efficiency and reduce patient waiting time.
Publication No. CN113855067A describes an autonomous positioning and scanning method that fuses visual images and medical images: it collects the coordinates of organs and their external position areas from the visual and medical images, after which the target name, target parameters, position information and communication target must be set manually for the system. Because motion planning requires manually set acquisition-target parameters and the scan cannot be controlled remotely once started, the method is not truly autonomous scanning.
Publication Nos. CN114155940A and CN115227404A describe autonomous ultrasonic scanning skill strategy generation methods based on reinforcement learning, which face three difficulties: first, the collection and cleaning of training data; second, an unstable training process in which the network does not converge easily; third, the locally optimal solution is not necessarily the globally optimal solution. Moreover, such methods are complete end-to-end network predictions: when an abnormality occurs, its cause cannot be traced back and the system cannot self-optimize, so it can only wait for human intervention.
Publication No. CN115429327A describes an automatic scanning sequence selection method using positioning marks and preset scanning tracks. The autonomous scanning of this method is essentially a set of standardized scanning track controls; it cannot handle contingencies caused by differences in the physiological structures of scanned persons or by small movements, and its scanning track includes only position changes, with no adjustment of the probe posture.
Publication No. CN115670515A describes an autonomous scanning method that locates the thyroid using a depth camera, controls the probe position by segmenting the thyroid in the image with a neural network, and adjusts the probe posture to be perpendicular to the skin according to a force sensor. In practice, however, a probe perpendicular to the skin does not necessarily produce the best image: thyroid scanning is a joint optimization of position, posture and force, and must cover both the transverse and longitudinal planes.
In view of the shortcomings of the above published patents, and drawing on the technological advances brought by generative artificial intelligence, we propose a thyroid autonomous scanning system based on a multimodal generative dialogue.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing a thyroid autonomous scanning system based on a multimodal generative dialogue, which achieves fully autonomous scanning of the thyroid using a generative multimodal large language model; no manual path or parameter setting is needed during scanning, and the scanning start position is determined by zero-force dragging.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a thyroid autonomous scanning system based on multimodal generation dialog, comprising:
the system comprises a hardware device and an autonomous scanning unit for controlling the action of the hardware device;
the hardware device includes:
the device comprises a mechanical arm, a force sensor, a clamping jaw, an upper computer, display equipment and medical ultrasonic equipment;
the force sensor is matched with the mechanical arm for use, and the clamping jaw is arranged at the action end of the mechanical arm;
the autonomous scanning unit is built in the upper computer, and comprises:
the system comprises an image analysis module, a multimodal question-answering module, a decision module and a control module;
the image analysis module is used for processing and identifying images from the ultrasonic equipment and generating metadata for the decision module to process;
the multimodal question-answering module is used for encoding the input text and images, obtaining answers with a generative dialogue language model, and encoding those answers into robot operations;
the decision module is used for summarizing the image analysis results and the answers of the question-answering module, generating a final decision for the control module to execute, and recording the image analysis results so that they can be summarized into the final ultrasound report;
the control module is responsible for executing the operations generated by the decision module while ensuring that they are safe;
the modules exchange agreed data with one another through data interfaces, while the components within each module exchange all data through shared storage.
Preferably, the medical ultrasonic equipment is one or more of an ultrasonic three-dimensional diagnostic apparatus, a fully digital color Doppler ultrasound apparatus and a color Doppler ultrasound scanner.
Preferably, the image analysis module includes:
a thyroid region segmentation component, a thyroid nodule detection component, a thyroid nodule attribute classification component, a thyroid diffuse lesion classification component, and an image quality confidence component;
the thyroid region segmentation component identifies the thyroid region and the trachea region in the ultrasonic image based on deep learning, performs pixel-level classification with a semantic segmentation network to obtain masks of the thyroid and trachea regions, calculates the centroid coordinates, contour and area of the thyroid region from the masks, and passes them to the decision module;
the thyroid nodule detection component identifies nodules in the thyroid region based on deep learning, uses a target detection network to generate a detection box for the nodules, and uses a multi-target tracking algorithm to obtain a unique identification number for the detection box;
the thyroid nodule attribute classification component performs multi-attribute classification of nodules based on deep learning; the classification network has 8 output layers;
the thyroid diffuse lesion classification component performs single-attribute classification of the thyroid region based on deep learning; the classification network has 1 output layer;
the image quality confidence component calculates the confidence of each pixel of the ultrasonic image based on a random walk method and transmits the confidence to the decision module.
Preferably, the multimodal question-answering module includes:
a word vector component, a graph vector component, a multi-modal dialogue question-answering component and an action coding component;
the word vector component encodes text based on deep learning, using a trained language model to encode the input text into a vector of fixed dimension;
the graph vector component encodes images based on deep learning, extracting image features with a convolutional network whose classification layer is removed and whose feature layer is retained, and reducing them to a vector of fixed dimension;
the multi-modal dialogue question-answering component generates answer text from the word vector and the graph vector based on a multimodal generative dialogue language model;
the action encoding component encodes the answers from the generative dialogue language model into three-dimensional probe operations.
Preferably, the decision module includes:
a scanning state management component, a thyroid search component, a thyroid scanning component, a posture adjustment component and an ultrasonic reporting component;
the scanning state management component is responsible for scheduling the scanning states: it judges whether the goal of the current state has been completed and whether another state should be entered, and does not communicate with the control module directly;
the thyroid search component judges whether the thyroid has been found from the centroid coordinates and area provided by the thyroid region segmentation component of the image analysis module, and judges whether it is the left or right lobe from the centroid coordinates and the center of the trachea;
the thyroid scanning component is responsible for scanning the transverse and longitudinal sections of the left and right thyroid lobes, obtaining a thyroid bounding rectangle from the thyroid region provided by the thyroid region segmentation component of the image analysis module;
the posture adjustment component assembles the prompt text from a language template and sends it to the word vector component, sends the confidence map and the ultrasonic image to the graph vector component, and adjusts the posture of the probe according to the outputs of the action encoding component and the image quality confidence component;
the ultrasound reporting component uses the unique thyroid nodule identification numbers from the image analysis module to collect the attributes of each nodule according to the set priority, computes the nodule grading from the collected attributes, and stores a real-time ultrasonic image of each nodule.
Preferably, the control module includes:
the system comprises a robot control assembly, a coordinate system conversion assembly and a safety control assembly;
the robot control assembly is responsible for issuing a control instruction of the mechanical arm, uploading data of each sensor of the mechanical arm, and entering and exiting zero-force dragging;
the coordinate system conversion component converts the instruction based on the working coordinate system provided by the decision module into an instruction of a robot base coordinate system given to the control component, and the component further comprises a set of calibration program for ensuring the accuracy of the coordinate system;
the safety control component is responsible for guaranteeing the action safety of the mechanical arm, ensuring that the contact force is between 2N and 4N during scanning, and the filtering decision module can cause out-of-range operation.
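By way of a purely illustrative, non-limiting sketch (the patent itself provides no code), the 2 N-4 N contact-force rule and the out-of-range filtering of the safety control component could be expressed as follows; the function names, step size and workspace bounds are assumptions of ours, not part of the disclosure:

```python
# Hypothetical sketch of the safety control component's two duties:
# keep contact force in the 2 N - 4 N band, and reject commanded poses
# that would leave an allowed workspace.  All limits are illustrative.

WORKSPACE = {"x": (-0.30, 0.30), "y": (-0.30, 0.30), "z": (0.00, 0.40)}  # meters

def regulate_force(force_n: float, z_step: float = 0.0005) -> float:
    """Return a corrective Z offset: press in if too light, back off if too hard."""
    if force_n < 2.0:
        return -z_step   # move probe toward the skin
    if force_n > 4.0:
        return +z_step   # lift probe away from the skin
    return 0.0           # force already in the safe band

def filter_command(target: dict) -> bool:
    """Accept a commanded pose only if every axis stays inside the workspace."""
    return all(WORKSPACE[axis][0] <= target[axis] <= WORKSPACE[axis][1]
               for axis in WORKSPACE)
```

In such a design the force regulator runs at a higher rate than the decision module, so unsafe decisions are vetoed before reaching the mechanical arm.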
The technical effects and advantages of the invention: compared with the prior art, the thyroid autonomous scanning system based on a multimodal generative dialogue has the following effects:
the generative multimodal large language model achieves fully autonomous scanning of the thyroid; no manual path or parameter setting is needed during scanning; the scanning start position is determined by zero-force dragging;
the position and posture of the probe are adjusted cooperatively according to the sensor data and the ultrasonic image; action encoding converts the continuous absolute motion of the mechanical arm into discrete relative motion, avoiding scanning failures caused by calibration precision errors;
abnormalities during the scanning process are handled automatically through image segmentation of the thyroid region; diffuse and space-occupying lesions are analyzed during scanning through image classification and target detection, and a report is generated automatically.
Drawings
FIG. 1 is a scan state transition diagram;
FIG. 2 is a block and component flow architecture diagram;
fig. 3 is a schematic diagram of a tool coordinate system and a base coordinate system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a thyroid autonomous scanning system based on a multimodal generative dialogue, as shown in figs. 1-3. The system uses a generative multimodal large language model to achieve fully autonomous scanning of the thyroid; no manual path or parameter setting is needed during scanning; the scanning start position is determined by zero-force dragging.
At the same time, the position and posture of the probe are adjusted cooperatively according to the sensor data and the ultrasonic image; action encoding converts the continuous absolute motion of the mechanical arm into discrete relative motion, avoiding scanning failures caused by calibration precision errors.
Furthermore, the system handles abnormalities during scanning automatically through image segmentation of the thyroid region, and analyzes diffuse and space-occupying lesions during scanning through image classification and target detection.
A thyroid autonomous scanning system comprising:
the system comprises a hardware device and an autonomous scanning unit for controlling the action of the hardware device;
the hardware device comprises:
the device comprises a mechanical arm, a force sensor, a clamping jaw, an upper computer, display equipment and medical ultrasonic equipment;
the force sensor is matched with the mechanical arm for use, and the clamping jaw is arranged at the action end of the mechanical arm;
the autonomous scanning unit is arranged in the upper computer, and comprises:
the system comprises an image analysis module, a multi-mode question answering module, a decision module and a control module. The image analysis module comprises a thyroid region segmentation component, a thyroid nodule detection component, a thyroid nodule attribute classification component, a thyroid diffuse lesion classification component and an image quality confidence component. The multi-modal question-answering module comprises a word vector component, a graph vector component, a multi-modal dialogue question-answering component and an action coding component. The decision module comprises a scanning state management component, a thyroid search component, a thyroid scanning component, a posture adjustment component and an ultrasonic reporting component. The control module comprises a robot control component, a coordinate system conversion component and a safety control component. And the different modules interact and communicate appointed data through data interfaces, and the components in the modules interact all the data through sharing storage.
The image analysis module is responsible for processing and identifying the image from the ultrasonic equipment and generating metadata for processing by the decision module.
The thyroid region segmentation component identifies thyroid regions and tracheal regions in the ultrasonic image based on deep learning, performs pixel-level classification by using a segmentation semantic segmentation network to obtain masks of the thyroid regions and the tracheal regions, calculates centroid coordinates, contours and areas of the thyroid regions according to the masks, and gives the centroid coordinates, contours and areas to the decision module. The thyroid nodule detection component identifies nodules in the thyroid region based on deep learning, uses a target detection network to generate a detection box for the nodules, and uses a multi-target tracking algorithm to obtain a unique identification number for the detection box.
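The mask-derived quantities mentioned above (centroid coordinates, contour and area of the thyroid region) can be computed directly from a binary segmentation mask. The following NumPy sketch is purely illustrative; the function name and the 4-neighbor contour definition are our assumptions, not part of the disclosure:

```python
import numpy as np

def mask_geometry(mask: np.ndarray):
    """Given a binary mask (nonzero = thyroid pixel), return the centroid
    (row, col), the area in pixels, and a rough contour: mask pixels with
    at least one zero 4-neighbor."""
    mask = mask.astype(bool)
    ys, xs = np.nonzero(mask)
    area = int(ys.size)
    if area == 0:
        return None, 0, np.empty((0, 2), dtype=int)
    centroid = (float(ys.mean()), float(xs.mean()))
    # Interior pixels have all four 4-neighbors set; the rest of the
    # mask pixels form the boundary (contour).
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    contour = np.argwhere(mask & ~interior)
    return centroid, area, contour
```

The centroid and area feed the thyroid search component's "found / not found" and left/right-lobe decisions described later.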
The thyroid nodule attribute classification component performs multi-attribute classification of nodules based on deep learning. The multi-attribute classification network has 8 output layers, for example a ResNet with 8 fully connected output heads, identifying 8 attributes of a nodule covering 30 categories in total, as follows:
multi-attribute classification table
TABLE 1
| Orientation | Margin | Halo | Composition | Echogenicity | Echo texture | Echogenic foci | Posterior echo features |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Taller than wide | Smooth | Halo present | Solid | Hyperechoic | Homogeneous | Microcalcification | Enhancement |
| Wider than tall | Irregular | No halo | Predominantly solid | Isoechoic | Heterogeneous | Comet-tail artifact | Attenuation |
| | Ill-defined | | Predominantly cystic | Hypoechoic | | Macrocalcification | No change |
| | Extrathyroidal extension | | Cystic | Very hypoechoic | | Peripheral calcification | Mixed change |
| | | | Spongiform | Anechoic | | Echogenic focus without defined extent | |
| | | | | | | Punctate echogenic focus of undetermined significance | |
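The 8-head structure described above (a shared backbone feeding eight attribute-specific output layers) can be sketched in a framework-agnostic way. The head sizes below follow Table 1; the feature dimension and random weights are illustrative stand-ins for a trained backbone and fully connected layers:

```python
import numpy as np

# Classes per attribute head, following Table 1 (orientation, margin,
# halo, composition, echogenicity, echo texture, echogenic foci,
# posterior echo features).
HEAD_SIZES = [2, 4, 2, 5, 5, 2, 6, 4]   # 30 categories in total

FEAT_DIM = 128                           # illustrative feature dimension
rng = np.random.default_rng(0)
# One weight matrix per head, standing in for 8 fully connected layers.
HEADS = [rng.normal(size=(FEAT_DIM, n)) for n in HEAD_SIZES]

def classify_attributes(features: np.ndarray) -> list[int]:
    """Map one backbone feature vector to one class index per attribute."""
    return [int(np.argmax(features @ w)) for w in HEADS]
```

In a real system the backbone would be, for example, a ResNet trained on labeled nodule crops, and each head would carry a softmax loss for its attribute.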
The thyroid diffuse lesion classification component performs single-attribute classification of the thyroid region based on deep learning; the classification network uses a single output layer to distinguish 4 categories: normal thyroid, toxic goiter, subacute thyroiditis and Hashimoto's thyroiditis.
The image quality confidence component calculates the confidence of each pixel of the ultrasound image based on the random walk method and delivers the confidence to the decision module.
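One common formulation of random-walk confidence for ultrasound treats each pixel's confidence as the probability that a random walker started there is absorbed at the transducer (top) row before the bottom row. The sketch below solves this with uniform transition weights via Jacobi relaxation; it is a simplified illustration only — practical formulations weight the walk by image intensity and beam geometry, which the patent does not detail:

```python
import numpy as np

def confidence_map(h: int, w: int, iters: int = 2000) -> np.ndarray:
    """Per-pixel confidence = probability that a 4-neighbor random walk
    reaches the top row (confidence 1) before the bottom row (0),
    with reflecting left/right edges and uniform weights."""
    c = np.zeros((h, w))
    c[0, :] = 1.0            # transducer row: full confidence
    for _ in range(iters):   # Jacobi relaxation of the harmonic equation
        c[1:-1, 1:-1] = 0.25 * (c[:-2, 1:-1] + c[2:, 1:-1] +
                                c[1:-1, :-2] + c[1:-1, 2:])
        c[1:-1, 0] = c[1:-1, 1]     # no flux through the left edge
        c[1:-1, -1] = c[1:-1, -2]   # no flux through the right edge
    return c
```

With uniform weights the confidence decays roughly linearly with depth; the decision module compares such a map against a threshold to trigger posture adjustment.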
The multimodal question-answering module is responsible for encoding the input text and images, obtaining answers with a generative dialogue language model, and encoding those answers into robot operations.
The word vector component encodes text based on deep learning, using a trained language model to encode the input text into a vector of fixed dimension.
The graph vector component encodes images based on deep learning, extracting image features with a convolutional network whose classification layer is removed and whose feature layer is retained, and reducing them to a vector of fixed dimension.
The multi-modal dialogue question-answering component generates answer text from the word vector and the graph vector based on a multimodal generative dialogue language model.
The action encoding component encodes the answers from the generative dialogue language model into three-dimensional probe operations using a 6-output multi-attribute classification network, for example a multi-layer perceptron with 6 output heads. There are 6 operations across the three dimensions, each with 3 categories, as listed below:
Table 2. Operation classes

Tool coordinate system | Move in X direction | Move in Y direction | Move in Z direction | Rotate about X axis | Rotate about Y axis | Rotate about Z axis |
---|---|---|---|---|---|---|
Positive move/rotation | 1 | 1 | 1 | 1 | 1 | 1 |
No operation | 0 | 0 | 0 | 0 | 0 | 0 |
Negative move/rotation | 2 | 2 | 2 | 2 | 2 | 2 |
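Decoding Table 2 from the network output can be sketched as follows; the layout of the logits (six 3-way heads) is an assumption consistent with the 6-output multi-attribute network described above.

```python
import numpy as np

AXES = ["move_x", "move_y", "move_z", "rot_x", "rot_y", "rot_z"]
# Class codes per Table 2: 0 = no operation, 1 = positive move/rotation, 2 = negative.

def decode_action(logits):
    """Decode a hypothetical 18-value network output (6 heads x 3 classes)
    into one Table-2 operation code per tool-frame axis via per-head argmax."""
    logits = np.asarray(logits, dtype=float).reshape(6, 3)
    classes = logits.argmax(axis=1)
    return dict(zip(AXES, classes.tolist()))

# Example: logits favouring +X motion and negative Z rotation, no-op elsewhere
action = decode_action([5, 9, 1,  9, 2, 1,  8, 0, 0,  7, 1, 1,  6, 0, 0,  0, 1, 9])
```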
The decision module is responsible for summarizing the image analysis results and the answers of the question-answering module, generating a final decision that is passed to the control module for execution, and recording the image analysis results for summarization into the final ultrasound report.
The scanning state management component is responsible for scheduling the scanning states: it judges whether the goal of the current state has been completed and whether another state should be entered; it does not communicate directly with the control module. The scanning process has three states: thyroid search, thyroid scanning, and posture adjustment. The goal of the thyroid search state is to find the thyroid region and center it in the image; the goal of the thyroid scanning state is to scan the upper, lower, left, and right boundaries of the thyroid region; the goal of the posture adjustment state is for the image confidence map to exceed a threshold, and its trigger condition is the confidence map falling below that threshold. Thyroid search is performed twice, once for each of the left and right lobes; after the search goal is achieved, the thyroid scanning state is entered. The posture adjustment state is entered whenever its trigger condition is met and exited once its goal is satisfied.
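The three-state scheduler above can be sketched as a small state machine. The state names and transition conditions follow the text; the confidence threshold value and the resume-after-adjustment bookkeeping are illustrative assumptions.

```python
SEARCH, SCAN, ADJUST = "thyroid_search", "thyroid_scan", "pose_adjust"

class ScanScheduler:
    """Minimal sketch of the scan-state scheduler; second-lobe bookkeeping
    and overall termination are omitted for brevity."""
    def __init__(self, thresh=0.6):
        self.state, self.resume, self.thresh = SEARCH, SEARCH, thresh

    def step(self, thyroid_centered, scan_done, confidence):
        if self.state != ADJUST and confidence < self.thresh:
            self.resume, self.state = self.state, ADJUST   # low quality triggers adjustment
        elif self.state == ADJUST and confidence >= self.thresh:
            self.state = self.resume                       # adjustment goal met: resume
        elif self.state == SEARCH and thyroid_centered:
            self.state = SCAN                              # search goal met
        elif self.state == SCAN and scan_done:
            self.state = SEARCH                            # search the other lobe next
        return self.state
```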
The thyroid search component judges whether the thyroid has been found from the centroid coordinates and area provided by the thyroid region segmentation component of the image analysis module, and distinguishes the left lobe from the right lobe using the centroid coordinates and the trachea center. If the thyroid region is found, the probe's direction of movement is determined from the thyroid center and the image center; if not, the probe scans along a fixed Z-shaped (zigzag) route. Based on clinical statistics, the zigzag window height is set to 65 mm and its width to 55 mm, with a step length of 1 mm per probe movement; the generated instruction is passed to the control module for execution.
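Generating the zigzag search waypoints can be sketched as below. The 65 mm height, 55 mm width, and 1 mm step come from the text; reusing the 1 mm step as the row spacing is an assumption, since the patent does not state the row pitch.

```python
def zigzag_waypoints(height_mm=65, width_mm=55, step_mm=1):
    """Probe waypoints for the fixed zigzag search window: sweep across the
    width, drop one row, sweep back, until the window height is covered."""
    pts, y, direction = [], 0, 1
    while y <= height_mm:
        xs = range(0, width_mm + 1, step_mm)
        for x in (xs if direction > 0 else reversed(xs)):
            pts.append((x, y))
        y += step_mm              # row pitch: assumption (re-uses the 1 mm step)
        direction = -direction    # alternate sweep direction each row
    return pts

pts = zigzag_waypoints()
```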
The thyroid scanning component is responsible for scanning the cross sections and longitudinal sections of the left and right thyroid lobes. A thyroid bounding rectangle is obtained from the thyroid region provided by the thyroid region segmentation component of the image analysis module, and the probe is moved so that the upper-left corner of the rectangle is at the center of the image. Scanning then proceeds in a bow-shaped (serpentine) pattern in the tool coordinate system: rightward along the X axis until the upper-right corner reaches the image center, then once downward along the Y axis, then leftward, downward, and rightward in turn. When the lower-right corner of the rectangle reaches the center of the image, the scanning state management component is notified that thyroid scanning is complete. The downward step length is set to 5 mm, and the leftward/rightward step length to 1 mm. When a thyroid nodule is found, the probe is rotated 90 degrees about the Z axis of the tool coordinate system to switch to longitudinal-section scanning, and a posture adjustment is performed.
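The serpentine pass can be sketched geometrically as below. The 1 mm lateral and 5 mm downward steps come from the text; replacing the image-center-based corner triggers with a rectangle in tool-frame millimetres is a simplifying assumption.

```python
def serpentine_scan(rect_w_mm, rect_h_mm, lateral_mm=1, down_mm=5):
    """Sketch of the bow-shaped pass over the thyroid bounding rectangle:
    sweep along X, step 5 mm down along Y, sweep back, until the last row
    is swept. Returns the move list and the final (x, y) position."""
    moves, x, y, d = [], 0, 0, 1
    while True:
        while 0 <= x + d * lateral_mm <= rect_w_mm:
            x += d * lateral_mm
            moves.append(("lateral", d))
        if y + down_mm > rect_h_mm:
            break                          # bottom corner reached: scan complete
        y += down_mm
        moves.append(("down", 1))
        d = -d                             # reverse sweep direction
    return moves, (x, y)

moves, end = serpentine_scan(3, 10)
```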
The posture adjustment component analyzes the confidence map output by the image quality confidence component, assembles a prompt text from the language template and sends it to the word vector component, sends the confidence map and the ultrasound image to the image vector component, obtains a response text through the multi-modal dialogue question-answering component, and obtains the probe's movement and rotation directions through the action encoding component, with a movement step of 1 millimeter and a rotation step of 1 degree per action, each submitted to the control module for execution. During posture adjustment, the confidence of the ultrasound image and the length and width of the nodule detection box are recorded; after 20 adjustment steps the probe is returned to the posture with the highest confidence observed during the adjustment period, and if that highest confidence is still below the threshold, the posture adjustment state is re-entered.
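The best-posture bookkeeping of this loop can be sketched as follows. The 20-step budget comes from the text; the threshold value, the pose representation, and the early-stop-on-success rule are assumptions of the sketch.

```python
def adjust_pose(step_results, max_steps=20, thresh=0.6):
    """step_results: iterable of (pose, mean_confidence) pairs observed while
    the dialogue loop nudges the probe (1 mm / 1 degree per step). Returns
    the best pose seen within the step budget and whether another adjustment
    round is needed (best confidence still below the threshold)."""
    best_pose, best_conf = None, -1.0
    for i, (pose, conf) in enumerate(step_results):
        if i >= max_steps:
            break
        if conf > best_conf:
            best_pose, best_conf = pose, conf
        if conf >= thresh:                 # goal reached: stop adjusting early
            return pose, False
    return best_pose, best_conf < thresh
```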
The ultrasound reporting component, keyed by the thyroid nodule unique identification number from the image analysis module, collects nodule attributes according to the set priorities, calculates the nodule grade from the collected attributes (Table 4), and saves a real-time ultrasound image of the nodule. It assembles a text from the language template, for example: nodule found, grade 4B, vertical orientation, smooth margin, halo present, solid, hypoechoic, homogeneous, macrocalcification, posterior echo enhancement. This text is sent to the word vector component, the saved nodule ultrasound image is sent to the image vector component, and a conclusion is generated to write the report.
The set priorities are as follows:

Table 3. Priority settings
Orientation | Vertical (1 point) > Horizontal (0 points) |
Margin | Extracapsular invasion (1 point) > Irregular (1 point) > Ill-defined (1 point) > Smooth (0 points) |
Halo | Halo present (0 points) > No halo (0 points) |
Composition | Solid (1 point) > Predominantly solid (0 points) > Predominantly cystic (0 points) > Spongiform (0 points) > Cystic (0 points) |
Echogenicity | Markedly hypoechoic (1 point) > Hypoechoic (0 points) > Isoechoic (0 points) > Hyperechoic (0 points) > Anechoic (0 points) |
Echo texture | Heterogeneous (0 points) > Homogeneous (0 points) |
Focal echogenic foci | Microcalcifications (1 point) > Macrocalcifications (0 points) > Peripheral calcification (0 points) > Punctate echogenic foci (0 points) > Comet-tail artifact (-1 point) > No focal echogenic foci (0 points) |
Posterior echo | Enhancement (0 points) > Attenuation (0 points) > No change (0 points); mixed change of any two of the three (0 points) |
The nodule grading is as follows:

Table 4. Nodule grading
Score | -1 | 0 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|
Grade | 2 | 3 | 4A | 4B | 4C | 4C | 5 |
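The score-to-grade lookup of Tables 3 and 4 can be sketched as data plus a sum. Only the point-carrying attributes are listed here (the normalized spellings are assumptions); all other attributes score 0 per Table 3.

```python
# Attribute points per Table 3 (unlisted attributes score 0).
POINTS = {
    "vertical": 1,
    "extracapsular_invasion": 1, "irregular_margin": 1, "ill_defined_margin": 1,
    "solid": 1, "markedly_hypoechoic": 1, "microcalcifications": 1,
    "comet_tail_artifact": -1,
}
# Total score -> grade per Table 4.
GRADE = {-1: "2", 0: "3", 1: "4A", 2: "4B", 3: "4C", 4: "4C", 5: "5"}

def grade_nodule(attributes):
    """Sum the per-attribute points and look up the grade, clamping the
    total to the score range covered by Table 4."""
    score = sum(POINTS.get(a, 0) for a in attributes)
    return GRADE[max(-1, min(5, score))]
```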
The control module is responsible for executing the operation generated by the decision module, and ensures the safety of the operation.
The robot control assembly is responsible for issuing a control command of the mechanical arm, uploading data of each sensor of the mechanical arm, and entering and exiting zero-force dragging.
The coordinate system conversion component converts instructions expressed in the working coordinate system, as provided by the decision module, into robot base coordinate system instructions for the control component; the component also includes a calibration procedure to ensure the accuracy of the coordinate systems.
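The frame conversion is the standard homogeneous-transform change of coordinates, sketched below; the transform itself (obtained from the calibration procedure) is assumed given, and the example offset is illustrative.

```python
import numpy as np

def work_to_base(T_base_work, p_work):
    """Convert a point expressed in the working (tool/scan) coordinate system
    into the robot base coordinate system using a 4x4 homogeneous transform
    produced by calibration."""
    p = np.append(np.asarray(p_work, dtype=float), 1.0)   # homogeneous coords
    return (T_base_work @ p)[:3]

# Example: working frame translated by (100, 50, 0) mm relative to the base
T = np.eye(4)
T[:3, 3] = [100.0, 50.0, 0.0]
```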
The safety control component is responsible for guaranteeing the safety of the mechanical arm's actions: it ensures the contact force stays between 2 N and 4 N during scanning, and filters decision module operations that would cause out-of-range motion.
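A rule-based check of this kind can be sketched as follows. The workspace box, the 0.5 mm force-regulation nudge, and the sign convention for pressing toward the patient are assumptions; only the 2-4 N window is from the text.

```python
def safe_command(move_mm, contact_force_n,
                 workspace=((-60, 60), (-60, 60), (-20, 5)),
                 pos=(0.0, 0.0, 0.0)):
    """Reject a decision-module move that would leave an illustrative
    workspace box, and nudge the Z target to keep contact force in the
    2-4 N window (simple sketch; the patent does not specify the controller)."""
    target = [p + d for p, d in zip(pos, move_mm)]
    for t, (lo, hi) in zip(target, workspace):
        if not lo <= t <= hi:
            return None                    # filtered: out-of-range operation
    if contact_force_n > 4.0:
        target[2] -= 0.5                   # back off to reduce contact force
    elif contact_force_n < 2.0:
        target[2] += 0.5                   # press in to regain contact
    return tuple(target)
```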
Finally, it should be noted that the foregoing embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that the technical solution may be modified, or some of its features replaced by equivalents; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the scope of the present invention.
Claims (5)
1. A multimodal generation dialog-based thyroid autonomous scanning system, comprising:
the system comprises a hardware device and an autonomous scanning unit for controlling the action of the hardware device;
the hardware device includes:
the device comprises a mechanical arm, a force sensor, a clamping jaw, an upper computer, display equipment and medical ultrasonic equipment;
the force sensor is matched with the mechanical arm for use, and the clamping jaw is arranged at the action end of the mechanical arm;
the autonomous scanning unit is built in the upper computer, and comprises:
the system comprises an image analysis module, a multi-mode question-answering module, a decision module and a control module;
the image analysis module is used for processing and identifying the image from the ultrasonic equipment and generating metadata for processing by the decision module;
the multi-modal question-answering module is used for encoding the input text and images, obtaining answers with a generative dialogue language model, and encoding the answers into robot operations; the multi-modal question-answering module comprises:
a word vector component, a graph vector component, a multi-modal dialogue question-answering component and an action coding component;
the word vector component encodes the text based on deep learning, and the word vector encodes the input text into a vector with fixed dimension by using a trained language model;
the image vector component encodes the image based on deep learning: the image vector extracts image features using a convolutional network with the classification layer removed and the feature layer retained, and reduces them to a fixed-dimension vector;
the multi-modal dialogue question-answering component generates an answer text according to the word vector and the graph vector based on a multi-modal generated dialogue language model;
the action coding component codes answers obtained by the generated dialogue language model into probe operation in three dimensions;
the decision module is used for summarizing the results of the image analysis and the answers of the question-answer module, generating a final decision and delivering the final decision to the control module for execution, and recording the results of the image analysis and summarizing the results into a final ultrasonic report;
the control module is responsible for executing the operation generated by the decision module, so that the safety of the operation is ensured;
the modules interact and communicate appointed data through a data interface, and components in the modules interact all data through sharing storage.
2. A multimodal generation dialog-based thyroid autonomous scanning system as claimed in claim 1 wherein: the medical ultrasonic equipment is one or more of an ultrasonic three-dimensional diagnostic apparatus, a full-digital color Doppler ultrasound apparatus and ultrasonic color Doppler.
3. A multimodal generation dialog-based thyroid autonomous scanning system as claimed in claim 1 wherein: the image analysis module comprises:
a thyroid region segmentation component, a thyroid nodule detection component, a thyroid nodule attribute classification component, a thyroid diffuse lesion classification component, and an image quality confidence component;
the thyroid region segmentation component identifies the thyroid region and trachea region in an ultrasound image based on deep learning, performs pixel-level classification using a semantic segmentation network to obtain masks of the thyroid and trachea regions, calculates the centroid coordinates, contour, and area of the thyroid region from the masks, and delivers them to the decision module;
the thyroid nodule detection component identifies nodules in the thyroid region based on deep learning, uses a target detection network to generate a detection box for the nodules, and uses a multi-target tracking algorithm to obtain a unique identification number for the detection box;
the thyroid nodule attribute classification component performs multi-attribute classification of nodules based on deep learning, using a classification network with 8 output layers;
the thyroid diffuse lesion classification component performs single-attribute classification of the thyroid region based on deep learning, using a classification network with 1 output layer;
the image quality confidence component calculates the confidence of each pixel of the ultrasonic image based on a random walk method and transmits the confidence to the decision module.
4. A multimodal generation dialog-based thyroid autonomous scanning system as claimed in claim 1 wherein: the decision module comprises:
a scanning state management component, a thyroid search component, a thyroid scanning component, a posture adjustment component and an ultrasonic reporting component;
the scanning state management component is responsible for scheduling the scanning states: it judges whether the goal of the current state has been completed and whether another state should be entered, and does not communicate directly with the control module;
the thyroid search component judges whether the thyroid has been found according to the centroid coordinates and area provided by the thyroid region segmentation component of the image analysis module, and distinguishes the left lobe from the right lobe through the centroid coordinates and the trachea center;
the thyroid gland scanning component is responsible for scanning the cross section and the longitudinal section of left and right thyroid glands, and a thyroid gland rectangular frame is obtained according to the thyroid gland region provided by the thyroid gland region segmentation component of the image analysis module;
the posture adjustment component assembles the prompt text from the language template and sends it to the word vector component, sends the confidence map and ultrasound image to the image vector component, and adjusts the probe posture according to the outputs of the action encoding component and the image quality confidence component;
the ultrasound reporting component, keyed by the thyroid nodule unique identification number from the image analysis module, collects nodule attributes according to the set priorities, calculates the nodule grade from the collected attributes, and saves a real-time ultrasound image of the nodule.
5. A multimodal generation dialog-based thyroid autonomous scanning system as claimed in claim 1 wherein: the control module includes:
the system comprises a robot control assembly, a coordinate system conversion assembly and a safety control assembly;
the robot control assembly is responsible for issuing a control instruction of the mechanical arm, uploading data of each sensor of the mechanical arm, and entering and exiting zero-force dragging;
the coordinate system conversion component converts instructions based on the working coordinate system provided by the decision module into robot base coordinate system instructions for the control component, and further comprises a calibration procedure to ensure the accuracy of the coordinate systems;
the safety control component is responsible for guaranteeing the safety of the mechanical arm's actions, ensuring the contact force stays between 2 N and 4 N during scanning, and filtering decision module operations that would cause out-of-range motion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311292889.0A CN117017355B (en) | 2023-10-08 | 2023-10-08 | Thyroid autonomous scanning system based on multi-modal generation type dialogue |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117017355A CN117017355A (en) | 2023-11-10 |
CN117017355B true CN117017355B (en) | 2024-01-12 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014207642A1 (en) * | 2013-06-28 | 2014-12-31 | Koninklijke Philips N.V. | Ultrasound acquisition feedback guidance to a target view |
CN113855067A (en) * | 2021-08-23 | 2021-12-31 | 谈斯聪 | Visual image and medical image fusion recognition and autonomous positioning scanning method |
CN115089212A (en) * | 2022-05-08 | 2022-09-23 | 中南大学湘雅二医院 | Three-dimensional vision-guided automatic neck ultrasonic scanning method and system for mechanical arm |
CN115153634A (en) * | 2022-07-22 | 2022-10-11 | 中山大学孙逸仙纪念医院 | Intelligent ultrasonic examination and diagnosis method and system |
CN115429327A (en) * | 2022-08-23 | 2022-12-06 | 安徽医科大学第一附属医院 | Thyroid full-automatic ultrasonic intelligent scanning method and platform |
CN115670515A (en) * | 2022-10-26 | 2023-02-03 | 华南理工大学 | Ultrasonic robot thyroid detection system based on deep learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3373820A4 (en) * | 2015-11-10 | 2019-06-26 | Exact Imaging Inc. | A system comprising indicator features in high-resolution micro-ultrasound images |
AU2017281281B2 (en) * | 2016-06-20 | 2022-03-10 | Butterfly Network, Inc. | Automated image acquisition for assisting a user to operate an ultrasound device |
US20200194117A1 (en) * | 2018-12-13 | 2020-06-18 | University Of Maryland, College Park | Systems, methods, and media for remote trauma assessment |
Legal Events

Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||