WO2022252908A1 - Object recognition method and apparatus, computer device, and storage medium

Object recognition method and apparatus, computer device, and storage medium

Info

Publication number
WO2022252908A1
Authority
WO
WIPO (PCT)
Prior art keywords: features, image, feature, spatial, computer device
Application number: PCT/CN2022/091089
Other languages: English (en), French (fr)
Inventors: 何楠君, 卢东焕, 李悦翔, 林一, 马锴, 郑冶枫
Original Assignee: 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Application filed by 腾讯科技(深圳)有限公司
Priority to US17/991,385 (published as US20230080098A1)
Publication of WO2022252908A1

Classifications

    • G06V10/806: Fusion of extracted features (image or video recognition using pattern recognition or machine learning)
    • G16H30/40: ICT specially adapted for processing medical images, e.g. editing
    • G06T7/0012: Biomedical image inspection
    • G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/253: Fusion techniques of extracted features
    • G06T7/11: Region-based segmentation
    • G06V10/26: Segmentation of patterns in the image field
    • G06V10/454: Integrating filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/62: Extraction of image or video features relating to a temporal dimension
    • G06V10/82: Image or video recognition using neural networks
    • G06T2207/10081: Computed x-ray tomography [CT]
    • G06T2207/20081: Training; Learning
    • G06T2207/30096: Tumor; Lesion
    • G06V2201/03: Recognition of patterns in medical or anatomical images

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to an object recognition method, apparatus, computer device, and storage medium.
  • In the related art, the recognition result of the target object is obtained by recognizing the collected medical image of the target object, and the state of the target object is determined according to the recognition result, or the medical image is segmented according to the recognition result.
  • Embodiments of the present application provide an object recognition method, apparatus, computer device, and storage medium, which improve recognition accuracy. The technical solutions are as follows:
  • In one aspect, an object recognition method is provided, comprising:
  • the computer device extracts spatial features of multiple medical images respectively, where the multiple medical images are images of the same target object at different times;
  • the computer device fuses the extracted multiple spatial features to obtain a first fused spatial feature of the target object;
  • the computer device extracts spatio-temporal features of the target object based on the first fused spatial feature, where the spatio-temporal features characterize changes in the spatial information of the multiple medical images at different times; and
  • the computer device identifies the target object based on the spatio-temporal feature, and obtains an identification result of the target object.
  • In another aspect, an object recognition apparatus is provided, comprising:
  • a spatial feature extraction module, configured to extract the spatial features of multiple medical images respectively, where the multiple medical images are images of the same target object at different times;
  • a spatial feature fusion module, configured to fuse the extracted multiple spatial features to obtain the first fused spatial feature of the target object;
  • a spatio-temporal feature extraction module configured to extract the spatio-temporal feature of the target object based on the first fused spatial feature, the spatio-temporal feature representing the change of the spatial information of the plurality of medical images at different moments;
  • An object identification module configured to identify the target object based on the spatio-temporal features, and obtain an identification result of the target object.
  • In another aspect, a computer device is provided, comprising a processor and a memory, wherein at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor to implement the operations performed by the object recognition method described in the above aspect.
  • In another aspect, a computer-readable storage medium is provided, wherein at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the operations performed by the object recognition method described in the above aspect.
  • In another aspect, a computer program product or computer program is provided, comprising computer program code stored in a computer-readable storage medium. A processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device implements the operations performed by the object recognition method described in the above aspects.
  • According to the method, apparatus, computer device, and storage medium provided in the embodiments of the present application, the spatial features of multiple medical images of the target object are first extracted respectively; after the spatial features of each medical image are fully extracted, the multiple spatial features are fused, and the spatio-temporal features of the target object are extracted based on the obtained first fused spatial feature. The spatio-temporal features can represent the changes of the spatial information of the multiple medical images at different moments, and the temporal relationship between the multiple medical images is considered during extraction, so that the extracted spatio-temporal features can more accurately represent the spatial information and temporal information of the multiple medical images, thereby improving the accuracy of the recognition result when the target object is identified based on the spatio-temporal features.
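  • To make the above solution concrete, the following is a minimal PyTorch-style sketch of the described pipeline; all class names, layer choices, and dimensions are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class ObjectRecognizer(nn.Module):
    """Sketch: per-image spatial extraction -> fusion -> spatio-temporal
    extraction -> recognition (dimensions and depths are illustrative)."""
    def __init__(self, dim=256, num_classes=2):
        super().__init__()
        # Spatial branch: a Transformer encoder layer applied to each image independently.
        self.spatial = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        # Temporal branch: a second Transformer over the fused spatial features.
        self.temporal = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.head = nn.Sequential(nn.Linear(dim, num_classes), nn.Softmax(dim=-1))

    def forward(self, images):  # images: (T, regions, dim) token features, one row per medical image
        spatial = [self.spatial(img.unsqueeze(0)) for img in images]  # spatial feature of each image
        fused = torch.cat(spatial, dim=1)                             # first fused spatial feature
        spatiotemporal = self.temporal(fused)                         # spatio-temporal features
        return self.head(spatiotemporal.mean(dim=1))                  # recognition result

# e.g. 5 medical images of the same target object, each as 16 region tokens of dimension 256
result = ObjectRecognizer()(torch.randn(5, 16, 256))
```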
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a flow chart of an object recognition method provided by an embodiment of the present application.
  • FIG. 3 is a flow chart of another object recognition method provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an image recognition model provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of another image recognition model provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of another image recognition model provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a first extraction network provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another first extraction network provided by an embodiment of the present application.
  • FIG. 9 is a flow chart of another object recognition method provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another image recognition network provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a heat map provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an image recognition model in the related art provided by an embodiment of the present application.
  • FIG. 13 is a flow chart of an image segmentation method provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an object recognition apparatus provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of another object recognition apparatus provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • The terms "first", "second", and the like used in this application may be used to describe various concepts herein, but unless otherwise specified, these concepts are not limited by these terms. These terms are only used to distinguish one concept from another.
  • For example, without departing from the scope of the present application, a first image feature may be referred to as a second image feature, and, similarly, a second image feature may be referred to as a first image feature.
  • "At least one" includes one, two, or more than two; "multiple" includes two or more; "each" refers to each of the corresponding plurality; and "any" refers to any one of the plurality.
  • For example, if a plurality of medical images includes 3 medical images, "each medical image" refers to each of the 3 medical images, and "any medical image" refers to any one of the 3, which can be the first, the second, or the third.
  • The solution provided by the embodiments of the present application involves artificial intelligence technologies such as computer vision and machine learning.
  • By calling the image recognition model, the spatial features and spatio-temporal features of the target object are extracted, and the target object is identified based on the spatio-temporal features to obtain the recognition result.
  • the object recognition method provided in the embodiment of the present application is executed by a computer device.
  • the computer device is a terminal or a server.
  • the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., but not limited thereto.
  • The server is an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • The computer programs involved in the embodiments of the present application can be deployed and executed on one computer device, on multiple computer devices located at one location, or on multiple computer devices distributed in multiple locations and interconnected through a communication network; multiple computer devices distributed in multiple locations and interconnected through a communication network can form a blockchain system.
  • In some embodiments, the computer device used to identify the object in the embodiments of the present application is a node in the blockchain system. The node extracts the spatial features of the multiple medical images of the target object, extracts the spatio-temporal features of the target object based on the multiple spatial features, and identifies the target object based on the spatio-temporal features to obtain the recognition result; this node or nodes corresponding to other devices in the blockchain can then store the recognition result of the target object.
  • Transformer: a deep learning network structure that includes a multi-head self-attention module (Multi-head Self-Attention), a multi-layer perceptron (MLP, Multi-Layer Perceptron), and regularization layers, and that uses a residual structure.
  • The multi-head self-attention module is composed of multiple self-attention modules, and the output results of the multiple self-attention modules are concatenated to obtain the output result of the multi-head self-attention module.
  • CNN (Convolutional Neural Network): a deep learning network widely used in image classification tasks, including at least a convolutional layer, a pooling layer, or other processing layers.
  • ResNet (Residual Network): a CNN network structure. ResNet is easy to optimize and alleviates the gradient vanishing problem caused by increasing depth in deep neural networks.
  • CT (Computed Tomography).
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes at least one terminal 101 (one terminal 101 is taken as an example in FIG. 1 ) and a server 102 .
  • the terminal 101 and the server 102 are connected through a wireless or wired network.
  • a target application provided by the server 102 is installed on the terminal 101, and the terminal 101 can implement functions such as image recognition and image transmission through the target application.
  • the terminal 101 is a computer, mobile phone, tablet computer or other terminals.
  • the target application is a target application in the operating system of the terminal 101, or a target application provided by a third party.
  • The target application is a medical application that has the function of recognizing medical images and also has other functions, such as generating medical records and displaying medical images.
  • the server 102 is a background server of the target application or a cloud server providing services such as cloud computing and cloud storage.
  • the embodiment of the present application provides a cervical state recognition scenario.
  • The terminal collects cervical images of the same object every 30 seconds to obtain 5 CT images and sends the 5 CT images to the server. The server extracts the spatial features of the 5 CT images respectively, extracts the spatio-temporal features of the object's cervix based on the 5 spatial features, performs recognition based on the spatio-temporal features to obtain the cervix recognition result of the object, and returns the cervix recognition result to the terminal. The cervix recognition result is then used as a basis for auxiliary judgment and, combined with other information about the subject, the state of the subject's cervix is determined.
  • the embodiment of the present application also provides a CT image segmentation scenario.
  • The terminal collects cervical images of the same object every 30 seconds to obtain 5 CT images and sends the 5 CT images to the server. The server extracts the spatial features of the 5 CT images respectively, extracts the spatio-temporal features of the subject's cervix based on the 5 spatial features, performs recognition based on the spatio-temporal features to obtain the cervix recognition result, and returns it to the terminal. The terminal determines the lesion area in each CT image according to the recognition result and segments each CT image to obtain the lesion area in each CT image, so as to further process the lesion area.
  • FIG. 2 is a flow chart of an object recognition method provided by an embodiment of the present application.
  • The execution subject of the embodiment of the present application is a computer device. Referring to FIG. 2, the method comprises the following steps:
  • a computer device extracts spatial features of multiple medical images respectively.
  • the multiple medical images are images of the same target object at different times, and the target object is any object, for example, the target object refers to a person or a certain part of the body.
  • The spatial feature characterizes the spatial information of the corresponding medical image; for example, the spatial information includes at least the size information of the medical image and the pixel values and position information of pixels in the medical image.
  • the process of extracting the spatial features of the multiple medical images is performed independently without interfering with each other.
  • the computer device fuses the extracted multiple spatial features to obtain a first fused spatial feature of the target object.
  • In the embodiment of the present application, the computer device first extracts the spatial features of each medical image and then further extracts the spatio-temporal features based on the extracted multiple spatial features. Since the subsequent feature extraction needs to consider the temporal relationship between the multiple medical images, the extracted multiple spatial features are first fused to obtain the first fused spatial feature.
  • the computer device extracts the spatio-temporal features of the target object based on the first fused spatial feature.
  • Since the spatio-temporal features are obtained by extracting temporal features on the basis of the first fused spatial feature, the extracted spatio-temporal features include the spatial information of each medical image and the time-series information of the multiple medical images. The time-series information of the multiple medical images refers to the chronological order of the multiple medical images and the changes of the medical images at different moments; that is, the spatio-temporal features represent the changes of the spatial information of the multiple medical images at different moments.
  • the computer device identifies the target object based on the spatiotemporal feature, and obtains an identification result of the target object.
  • The recognition result is used to indicate the state of the target object; optionally, the state of the target object includes a normal state and an abnormal state. Alternatively, the recognition result is used to indicate abnormal areas in each medical image.
  • In the embodiment of the present application, the spatial features of the multiple medical images of the target object are first extracted respectively; after the spatial features of each medical image are fully extracted, the multiple spatial features are fused, and the spatio-temporal features of the target object are extracted based on the first fused spatial feature obtained by fusion. The spatio-temporal features represent the changes of the spatial information of the multiple medical images at different moments, and the temporal relationship between the multiple medical images is considered during extraction, so that the extracted spatio-temporal features can more accurately represent the spatial information and temporal information of the multiple medical images; therefore, when the target object is identified based on the spatio-temporal features, the accuracy of the recognition result is also improved.
  • FIG. 3 is a flow chart of another object recognition method provided by an embodiment of the present application.
  • The execution subject of the embodiment of the present application is a computer device. Referring to FIG. 3, the method comprises the following steps:
  • a computer device respectively extracts first image features of multiple medical images of a target object.
  • the target object is any object, and the target object refers to a human body or a certain part in the human body, for example, the target object is any part such as lungs, stomach, or uterus.
  • the multiple medical images are images of the same target object at different times, that is, the multiple medical images are images obtained by collecting the target object at different times.
  • the medical image is a CT image, an image taken by X-ray irradiation or an image collected by other methods.
  • the multiple medical images are collected by the computer device, or sent to the computer device after being collected by other devices, which is not limited in the present application.
  • the intervals between acquisition times of any two adjacent medical images are the same or different.
  • the duration of the interval is 30 seconds, 60 seconds or other durations.
  • For example, the interval between the acquisition times of the first medical image and the second medical image is 30 seconds, the interval between the acquisition times of the second medical image and the third medical image is 30 seconds, and the interval between the acquisition times of the third medical image and the fourth medical image is 30 seconds.
  • The first image feature is used to describe the corresponding medical image, and the first image feature is in vector, matrix, or other form.
  • the computer device respectively encodes the multiple medical images to obtain the first image feature of each medical image.
  • the computer device extracts spatial features of the multiple medical images based on the first image features of the multiple medical images respectively.
  • the spatial feature represents the spatial information of the corresponding medical image, for example, the spatial information includes at least size information of the medical image, pixel values of pixels in the medical image, and position information.
  • the spatial features are in vector, matrix or other form.
  • the process of extracting the spatial features of the multiple medical images is performed independently without interfering with each other.
  • The following takes the extraction of the spatial features of any one medical image as an example to illustrate the process of extracting spatial features.
  • In some embodiments, the medical image is divided into a plurality of image regions; that is, the medical image includes a plurality of image regions. Correspondingly, the computer device divides the first image feature of the medical image into a plurality of region features, each region feature corresponding to one image region in the medical image; obtains the first attention parameters corresponding to the multiple region features respectively; performs weighted fusion of the multiple region features based on the multiple first attention parameters to obtain the second image feature corresponding to the medical image; and extracts the spatial feature of the medical image based on the second image feature.
  • the first attention parameter represents the importance of the corresponding regional feature in the first image feature
  • the second image feature is also used to describe the corresponding medical image.
  • The difference between the first image feature and the second image feature is that the second image feature is obtained by adjusting the first image feature according to the importance of the different region features. Compared with the first image feature, the second image feature can more accurately represent the more important image regions in the medical image.
  • In some embodiments, the computer device maps each region feature into at least two feature spaces to obtain at least two mapping features corresponding to each region feature, where the at least two feature spaces characterize the similarity of different pixels in the corresponding image region in corresponding dimensions; based on the at least two mapping features corresponding to each region feature, the first attention parameter corresponding to each region feature is obtained.
  • In some embodiments, the computer device extracts the spatial feature of the medical image based on the second image feature as follows: the computer device directly performs spatial feature extraction on the second image feature to obtain the spatial feature of the medical image.
  • In other embodiments, in order to avoid losing information from the first image feature in the process of processing the first image feature to obtain the second image feature, which would make the extracted spatial feature inaccurate, the computer device fuses the second image feature and the first image feature to obtain the third image feature corresponding to the medical image, and extracts the spatial feature of the medical image based on the third image feature.
  • The third image feature includes the first image feature and the second image feature; therefore, the third image feature includes the complete information of the medical image and can highlight the information of the more important image regions in the medical image.
  • In some embodiments, in order to reduce the amount of computation and improve the processing speed, the computer device first normalizes the first image feature to obtain the processed first image feature, and then performs the step of determining the first attention parameters on the normalized first image feature. Similarly, the third image feature is normalized to obtain the processed third image feature, and the step of extracting the spatial feature is performed on the normalized third image feature.
  • The normalization process can limit the numerical values contained in the image features to the range of 0 to 1, which avoids large differences between the values contained in the image features that would complicate the processing.
  • the embodiment of the present application does not limit the order in which the spatial features of multiple medical images are extracted.
  • Optionally, the spatial features of the multiple medical images are extracted at the same time, or the spatial feature of each medical image is extracted sequentially according to the acquisition times corresponding to the medical images.
  • the computer device fuses the extracted multiple spatial features to obtain a first fused spatial feature of the target object.
  • the computer equipment first extracts the spatial features of each medical image, and then performs temporal feature extraction on the basis of the extracted multiple spatial features.
  • When extracting temporal features, it is necessary to consider the differences between the multiple medical images; instead of extracting temporal features from each spatial feature separately, the multiple spatial features are first fused to obtain the first fused spatial feature.
  • Optionally, fusing the multiple spatial features may be concatenating the multiple spatial features, so that the obtained first fused spatial feature includes the spatial features of the multiple medical images.
  • the computer device extracts the spatio-temporal features of the target object based on the first fused spatial feature.
  • The extracted spatio-temporal features include the spatial information of each medical image and the time-series information of the multiple medical images. The time-series information refers to the chronological order of the multiple medical images and the changes of the medical images at different moments; that is, the spatio-temporal features represent the changes of the spatial information of the multiple medical images at different moments.
  • In some embodiments, the computer device divides the first fused spatial feature into a plurality of spatial sub-features according to the medical images, each spatial sub-feature corresponding to one medical image; obtains the second attention parameters corresponding to the multiple spatial sub-features respectively; fuses the multiple spatial sub-features based on the multiple second attention parameters to obtain the second fused spatial feature corresponding to the multiple medical images; and extracts the spatio-temporal features based on the second fused spatial feature.
  • the second attention parameter represents the importance of the corresponding spatial sub-feature in the first fused spatial feature.
  • In some embodiments, the computer device maps each spatial sub-feature into at least two feature spaces to obtain at least two mapping features corresponding to each spatial sub-feature, where the at least two feature spaces characterize the similarity of different pixels in the corresponding medical image in corresponding dimensions; based on the at least two mapping features corresponding to each spatial sub-feature, the second attention parameter corresponding to each spatial sub-feature is obtained.
  • In some embodiments, the computer device extracts the spatio-temporal features based on the second fused spatial feature as follows: the computer device directly performs time-series feature extraction on the second fused spatial feature to obtain the spatio-temporal features.
  • In other embodiments, in order to avoid losing information from the first fused spatial feature in the process of processing the first fused spatial feature to obtain the second fused spatial feature, which would make the extracted spatio-temporal features inaccurate, the computer device fuses the second fused spatial feature and the first fused spatial feature to obtain the third fused spatial feature of the target object, and extracts the spatio-temporal features based on the third fused spatial feature.
  • The third fused spatial feature includes the first fused spatial feature and the second fused spatial feature; therefore, while the third fused spatial feature contains the spatial information of all the medical images, it can also highlight the information of the more important medical images among them.
  • In some embodiments, in order to reduce the amount of computation and improve the processing speed, the computer device first normalizes the first fused spatial feature to obtain the processed first fused spatial feature, and then performs the step of determining the second attention parameters on the normalized first fused spatial feature. Similarly, the third fused spatial feature is normalized to obtain the processed third fused spatial feature, and the step of extracting the spatio-temporal features is performed on the normalized third fused spatial feature.
  • the computer device identifies the target object based on the spatiotemporal feature, and obtains an identification result of the target object.
  • the recognition result is used to indicate the state of the target object, or the recognition result is used to indicate the abnormal region in each medical image.
  • the state of the target object includes a normal state and an abnormal state.
  • the normal state indicates that the target object has not changed, and the abnormal state indicates that the target object has changed compared with the target object in the normal state.
  • the normal state indicates that the target object has no disease, and the abnormal state indicates that the target object has a disease.
  • the recognition result includes a first category and a second category, wherein the first category indicates that the target object is in a normal state, and the second category indicates that the target object is in an abnormal state. For example, the first category is negative and the second category is positive.
  • The abnormal area in the medical image refers to the area in the medical image where lesions occur; for example, the abnormal area is a lesion area. Correspondingly, the normal area in the medical image refers to the area in the medical image where no lesion occurs.
  • In some embodiments, the computer device separately segments each medical image to obtain the abnormal area in each medical image, that is, it segments out the abnormal area in each medical image so as to facilitate further processing of the segmented abnormal area. For example, the computer device segments a CT image of the cervix, segments out the lesion area in the CT image, and further identifies the lesion area to determine its size and shape, so as to obtain more accurate information on the lesion area.
  • the identification results obtained in the embodiments of the present application are only a basis for assisting doctors in identifying diseases.
  • a doctor needs to combine the recognition results of the target object, other cancer-related information, and the physical condition of the target object to identify whether the target object has cancer.
  • In the embodiment of the present application, the spatial features of the multiple medical images of the target object are first extracted respectively; after the spatial features of each medical image are fully extracted, the multiple spatial features are fused, and the spatio-temporal features of the target object are extracted based on the obtained first fused spatial feature. The spatio-temporal features can represent the changes of the spatial information of the multiple medical images at different times, and the temporal relationship between the multiple medical images is considered during extraction, so that the extracted spatio-temporal features can more accurately represent the spatial information and time-series information of the multiple medical images; therefore, when the target object is identified based on the spatio-temporal features, the accuracy of the identification result is also improved.
  • In addition, the first image feature and the second image feature are fused, and spatial feature extraction is performed on the fused third image feature, so that more information can be used when extracting the spatial feature. This further improves the accuracy of the spatial feature and avoids losing information from the first image feature in the process of acquiring the second image feature, which would make the extracted spatial feature inaccurate.
  • In addition, the first fused spatial feature and the second fused spatial feature are fused, and time-series feature extraction is performed on the third fused spatial feature obtained through fusion, so that more information can be used when extracting the spatio-temporal features. This further improves the accuracy of the spatio-temporal features and avoids losing information from the first fused spatial feature in the process of obtaining the second fused spatial feature, which would make the extracted spatio-temporal features inaccurate.
  • In addition, the first attention parameter corresponding to each region feature in the first image feature is used to obtain the second image feature corresponding to the first image feature, so that the second image feature can highlight the region features of the more important image regions. Similarly, the second attention parameter corresponding to each spatial sub-feature in the first fused spatial feature is used to obtain the second fused spatial feature corresponding to the first fused spatial feature, so that the second fused spatial feature can highlight the spatial features of the more important medical images.
  • The above embodiment shown in FIG. 3 is described by taking, as an example, the case where the computer device directly processes multiple medical images to realize object recognition. In other embodiments, the computer device invokes an image recognition model to process the multiple medical images and realize object recognition.
  • the structure of the image recognition model is introduced below:
  • Referring to FIG. 4, the image recognition model 400 includes a first extraction network 401, a second extraction network 402, and a recognition network 403.
  • the first extraction network 401 is connected with the second extraction network 402
  • the second extraction network 402 is also connected with the recognition network 403
  • The first extraction network 401 is used to extract the spatial features of the medical images, the second extraction network 402 is used to extract the spatio-temporal features of the target object, and the recognition network 403 is used to recognize the target object.
  • the image recognition model 400 is TiT (Transformer in Transformer), that is, the image recognition model 400 is a cascaded Transformer, and TiT is obtained by cascading at least two Transformers.
  • Both the first extraction network and the second extraction network are Transformers.
  • In some embodiments, the image recognition model 400 includes a plurality of first extraction networks 401 (three are taken as an example in FIG. 5); each first extraction network 401 is used to extract spatial features based on one medical image, and the plurality of first extraction networks 401 are respectively connected to the second extraction network 402.
  • In some embodiments, the image recognition model 400 also includes a third extraction network 404; the third extraction network 404 is connected to the first extraction network 401 and is used to extract the image features of the medical images, that is, to convert the medical images into a form that can be processed by the computer device.
  • the image recognition model 400 includes a plurality of first extraction networks 401
  • the third extraction network 404 is respectively connected to each first extraction network 401 .
  • The first extraction network 401 and the second extraction network 402 have similar network structures. Taking the network structure of the first extraction network 401 as an example, referring to FIG. 7, the first extraction network 401 includes a first regularization layer 411, a first attention layer 421, and a first extraction layer 431.
  • the first regularization layer 411 is connected with the first attention layer 421
  • the first attention layer 421 is also connected with the first extraction layer 431 .
  • The first extraction layer 431 includes a first fusion layer, a second regularization layer, and a multi-layer perceptron, and the first extraction network 401 further includes a second fusion layer 441.
  • The first fusion layer is connected with the last layer of the previous network, the first attention layer 421, and the second regularization layer; the second regularization layer is also connected with the multi-layer perceptron; the multi-layer perceptron is also connected with the second fusion layer 441; and the second fusion layer 441 is also connected with the first fusion layer.
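  • A minimal sketch of this wiring, assuming standard pre-norm Transformer components (LayerNorm as the regularization layers, residual addition as the fusion layers; all dimensions are illustrative):

```python
import torch
import torch.nn as nn

class FirstExtractionNetwork(nn.Module):
    """Regularization -> attention -> fusion (residual) -> regularization ->
    multi-layer perceptron -> fusion (residual), as described above."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)                                   # first regularization layer
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # first attention layer
        self.norm2 = nn.LayerNorm(dim)                                   # second regularization layer
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):          # x: (batch, regions, dim) first image feature
        n = self.norm1(x)
        y, _ = self.attn(n, n, n)  # second image feature
        x = x + y                  # first fusion layer: third image feature
        z = self.mlp(self.norm2(x))  # first extraction layer (multi-layer perceptron)
        return x + z               # second fusion layer: fused spatial feature

feature = FirstExtractionNetwork()(torch.randn(1, 16, 256))
```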
  • FIG. 9 is a flowchart of another object recognition method provided by the embodiment of the present application.
  • The execution subject of the embodiment of the present application is a computer device. Referring to FIG. 9, the method comprises the following steps:
  • the computer device invokes a third extraction network to respectively extract first image features of multiple medical images.
  • the third extraction network is used to encode the medical image to obtain the first image feature of the medical image.
  • In some embodiments, the image recognition model includes one third extraction network, and the third extraction network sequentially extracts the first image features of the multiple medical images; or the image recognition model includes multiple third extraction networks, and each third extraction network extracts the first image feature of one medical image.
  • Optionally, the third extraction network extracts the first image features of the multiple medical images using the following formula:
  • $M = \mathrm{Encoder}(x)$
  • where M represents the extracted first image feature, x represents the input medical image, and $\mathrm{Encoder}(\cdot)$ represents feature extraction using a CNN.
  • The resolution (length × width) of any medical image is H × W, the number of channels of any medical image is C, the number of medical images is T, and both C and T are positive integers.
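  • The following is a hedged sketch of $M = \mathrm{Encoder}(x)$, assuming a small convolutional encoder that maps each H × W × C medical image to a sequence of region features; the architecture and dimensions are illustrative, not those of the patent:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Illustrative CNN encoder: maps T medical images of shape (C, H, W)
    to T first image features of shape (regions, dim)."""
    def __init__(self, in_channels=1, dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):                    # x: (T, C, H, W)
        m = self.conv(x)                     # (T, dim, H/4, W/4)
        return m.flatten(2).transpose(1, 2)  # (T, regions, dim): first image features M

M = Encoder()(torch.randn(5, 1, 64, 64))     # e.g. T=5 CT images, C=1, H=W=64
```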
  • the computer device calls the first extraction network to extract spatial features of multiple medical images based on the first image features of the multiple medical images respectively.
  • In some embodiments, when the image recognition model includes only one first extraction network, the computer device invokes the first extraction network and sequentially extracts spatial features based on the first image features of the multiple medical images, thereby obtaining multiple spatial features.
  • In other embodiments, when the image recognition model includes multiple first extraction networks, the computer device calls each first extraction network to extract spatial features based on the first image feature of one medical image, obtaining the spatial feature of that medical image.
  • the first extraction network includes a first attention layer and a first extraction layer.
  • In some embodiments, the computer device calls the first attention layer to divide the first image feature of the medical image into multiple region features, obtains the first attention parameters corresponding to the multiple region features, and fuses the multiple region features according to the multiple first attention parameters to obtain the second image feature corresponding to the medical image; it then calls the first extraction layer to extract the spatial feature of the medical image based on the second image feature.
  • the first attention parameter represents the importance of the corresponding regional feature in the first image feature
  • each regional feature corresponds to an image region in the medical image
  • the medical image includes multiple image regions.
  • In some embodiments, the computer device invokes the first attention layer, maps each region feature into at least two feature spaces, and obtains the at least two mapping features corresponding to each region feature; based on the at least two mapping features corresponding to each region feature, the first attention parameter corresponding to each region feature is obtained.
  • In some embodiments, the computer device invokes the first attention layer and maps each region feature to three feature spaces, which correspond to the query dimension, the key dimension, and the value dimension, and determines the first attention parameter corresponding to each region feature using the following formulas:
  • $[q, k, v] = y\,U_{qkv}$
  • $A = \mathrm{softmax}(q k^{\top} / \sqrt{D_h})$
  • $\mathrm{SA}(y) = A\,v$
  • $\mathrm{MSA}(y) = [\mathrm{SA}_1(y); \mathrm{SA}_2(y); \ldots; \mathrm{SA}_k(y)]\,U_{msa}$
  • where q represents the mapping feature of the query dimension, k represents the mapping feature of the key dimension, v represents the mapping feature of the value dimension, y represents any region feature, $U_{qkv}$ represents model parameters obtained from training, A represents the first attention parameter of the region feature, $\mathrm{softmax}(\cdot)$ indicates normalization processing, and $D_h$ indicates the number of dimensions of the hidden layer in the first attention layer.
  • $\mathrm{SA}(y)$ represents the region feature after weighting, $\mathrm{MSA}(y)$ represents the second image feature, the subscript k indicates that the medical image is divided into k image regions, and $U_{msa}$ represents model parameters obtained from training.
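  • A from-scratch sketch of the formulas above; U_qkv and U_msa are random stand-ins for the trained parameters, and the head count and dimensions are illustrative assumptions:

```python
import math
import torch

def self_attention(y, U_qkv, D_h):
    """SA(y) = A v, with [q, k, v] = y U_qkv and A = softmax(q k^T / sqrt(D_h))."""
    q, k, v = torch.chunk(y @ U_qkv, 3, dim=-1)
    A = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(D_h), dim=-1)  # first attention parameter
    return A @ v

regions, dim, heads, D_h = 16, 256, 4, 64
y = torch.randn(regions, dim)                              # region features of one medical image
U_qkv = [torch.randn(dim, 3 * D_h) for _ in range(heads)]  # stand-ins for trained parameters
U_msa = torch.randn(heads * D_h, dim)
# MSA(y) = [SA_1(y); ...; SA_k(y)] U_msa: concatenate the head outputs, then project.
msa = torch.cat([self_attention(y, U, D_h) for U in U_qkv], dim=-1) @ U_msa
```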
  • In some embodiments, the first extraction network includes a residual network structure; that is, the first extraction network also includes a first fusion layer. The computer device calls the first fusion layer to fuse the second image feature and the first image feature to obtain the third image feature corresponding to the medical image, and then calls the first extraction layer to extract the spatial feature of the medical image based on the third image feature.
  • In some embodiments, the first extraction network further includes a first regularization layer and a second regularization layer. The computer device calls the first regularization layer to normalize the first image feature and obtain the processed first image feature.
  • the computer device invokes the second regularization layer to perform normalization processing on the third image features to obtain the processed third image features.
  • the first extraction layer includes a multi-layer perceptron
  • the computer device invokes the multi-layer perceptron to extract spatial features based on the third image feature.
  • In some embodiments, in order to avoid losing information from the third image feature in the process of processing the third image feature to obtain the spatial feature, which would make the subsequently extracted spatio-temporal features inaccurate, the computer device fuses the third image feature with the spatial feature to obtain the fused spatial feature, and the fused spatial feature is processed subsequently.
  • the computer device invokes the second extraction network, fuses the extracted multiple spatial features to obtain a first fusion spatial feature, and extracts a spatio-temporal feature based on the first fusion spatial feature.
  • the second extraction network includes a third fusion layer
  • the computer device invokes the third fusion layer to fuse multiple spatial features to obtain the first fusion spatial feature.
  • Optionally, the third fusion layer obtains the first fused spatial feature by concatenating the spatial features:
  • $z = [s_1; s_2; \ldots; s_T]$
  • where z represents the first fused spatial feature, $s_t$ represents the spatial feature of the t-th medical image, and T represents that there are a total of T medical images.
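  • A minimal sketch of this concatenation, with illustrative shapes (T = 5 images, 16 regions each, dimension 256):

```python
import torch

spatial_features = [torch.randn(1, 16, 256) for _ in range(5)]  # spatial feature of each medical image
z = torch.cat(spatial_features, dim=1)                          # first fused spatial feature: (1, 80, 256)
```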
  • the network structure of the second extraction network is similar to the network structure of the first extraction network
  • the second extraction network includes a second attention layer and a second extraction layer
  • The computer device calls the second attention layer to divide the first fused spatial feature into multiple spatial sub-features, obtains the second attention parameters corresponding to the multiple spatial sub-features respectively, and fuses the multiple spatial sub-features based on the multiple second attention parameters to obtain the second fused spatial feature corresponding to the multiple medical images; it then calls the second extraction layer to extract the spatio-temporal features based on the second fused spatial feature.
  • In some embodiments, the computer device invokes the second attention layer to map each spatial sub-feature into at least two feature spaces and obtain the at least two mapping features corresponding to each spatial sub-feature; based on the at least two mapping features corresponding to each spatial sub-feature, the second attention parameter corresponding to each spatial sub-feature is obtained.
  • In some embodiments, the second extraction network includes a residual network structure; that is, the second extraction network also includes a fourth fusion layer. The computer device calls the fourth fusion layer to fuse the second fused spatial feature and the first fused spatial feature to obtain the third fused spatial feature of the target object, and then calls the second extraction layer to extract the spatio-temporal features based on the third fused spatial feature.
  • In some embodiments, the second extraction network further includes a third regularization layer and a fourth regularization layer. The computer device calls the third regularization layer to normalize the first fused spatial feature and obtain the processed first fused spatial feature.
  • the computer device invokes the fourth regularization layer to perform normalization processing on the third fusion spatial feature to obtain the processed third fusion spatial feature.
  • the second extraction layer includes a multi-layer perceptron
  • the computer device invokes the multi-layer perceptron to perform temporal feature extraction on the third fused spatial feature to obtain a spatio-temporal feature.
  • The spatio-temporal features are obtained as $f = \mathrm{TT}(z)$, where f represents the spatio-temporal feature and $\mathrm{TT}(\cdot)$ represents the time-series feature extraction.
  • the embodiment of the present application only takes one second extraction layer as an example.
  • In other embodiments, the image recognition model includes multiple second extraction layers; the spatio-temporal feature output by the current second extraction layer is input to the next second extraction layer until the spatio-temporal feature output by the last second extraction layer is obtained, and the spatio-temporal feature output by the last second extraction layer is determined as the spatio-temporal feature of the target object.
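  • A hedged sketch of this stacking, using a generic Transformer encoder layer as a stand-in for the second extraction layer (the layer type and depth are assumptions):

```python
import torch
import torch.nn as nn

layers = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True) for _ in range(4)
)
f = torch.randn(1, 80, 256)  # first fused spatial feature z
for layer in layers:
    f = layer(f)             # the output of the current layer feeds the next
# f is the spatio-temporal feature output by the last second extraction layer
```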
  • the computer device invokes the identification network, identifies the target object based on the spatiotemporal features, and obtains an identification result of the target object.
  • the recognition network is used to recognize the target object, and obtain the recognition result of the target object.
  • the recognition network includes an MLP and an activation function Softmax
  • the computer device invokes the MLP and the activation function Softmax to recognize a target object and obtain a recognition result.
  • the output of the recognition network is 0 or 1.
  • In other embodiments, the output of the recognition network is a probability; when the output probability is greater than the reference probability, it indicates that the target object is in a normal state, and when the output probability is not greater than the reference probability, it indicates that the target object is in an abnormal state.
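  • A minimal sketch of the recognition network, assuming an MLP followed by Softmax and a reference probability of 0.5 (the patent does not fix its value):

```python
import torch
import torch.nn as nn

head = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2), nn.Softmax(dim=-1))
spatiotemporal = torch.randn(1, 256)       # pooled spatio-temporal feature of the target object
prob = head(spatiotemporal)[0, 0].item()   # probability assigned to the "normal" class
state = "normal" if prob > 0.5 else "abnormal"  # compare with the reference probability
```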
  • Referring to FIG. 10, the first image features corresponding to the three medical images are extracted through the third extraction network 1001, and the obtained three first image features are respectively input into the corresponding first extraction networks 1002; each first extraction network 1002 outputs a spatial feature, the three spatial features are then input to the second extraction network 1003, which outputs the spatio-temporal features of the target object, and the spatio-temporal features are then input to the recognition network 1004 to obtain the recognition result of the target object.
  • In any first extraction network 1002, the first image feature is normalized through a regularization layer, the processed first image feature is mapped to three feature spaces, the three resulting mapping features are processed by the multi-head attention layer to output the second image feature, the first image feature and the second image feature are fused to obtain the third image feature, the third image feature is then normalized through a regularization layer, the processed third image feature is input to the multi-layer perceptron to obtain the corresponding spatial feature, and finally, through a fusion layer, the spatial feature is fused with the third image feature to obtain the fused spatial feature.
  • In some embodiments, the computer device invokes the recognition network to recognize each medical image of the target object separately; after identifying the abnormal region in each medical image, it marks the abnormal region in the medical image and outputs the marked medical image.
  • Optionally, the abnormal area in the medical image is outlined with a colored solid line, or the abnormal area is filled with a color not present in the medical image, or marked in other ways, which is not limited in this embodiment of the present application.
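  • A hedged sketch of such marking, assuming OpenCV is available and using a stand-in mask in place of the recognition network's output:

```python
import numpy as np
import cv2  # OpenCV; assumed available

image = cv2.cvtColor(np.zeros((64, 64), np.uint8), cv2.COLOR_GRAY2BGR)  # placeholder medical image
mask = np.zeros((64, 64), np.uint8)
mask[20:40, 20:40] = 255  # hypothetical abnormal region

# Option 1: outline the abnormal area with a colored solid line.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(image, contours, -1, (0, 0, 255), 2)

# Option 2: fill the abnormal area with a color not present in the grayscale image.
image[mask > 0] = (0, 0, 255)
```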
  • The related art provides a structure of an image recognition model, as shown in FIG. 12, taking three medical images of a target object as an example. The three medical images are respectively subjected to feature extraction by the corresponding convolutional neural networks 1101, the three extracted features are all input to the graph convolutional network 1102, and the graph convolutional network 1102 fuses the three features and recognizes the fused feature to obtain a recognition result. A circle in the graph convolutional network 1102 represents an extracted feature.
  • In the related art, a corresponding convolutional neural network needs to be trained separately for each medical image, resulting in a large amount of training, difficult model training, and low recognition efficiency. Moreover, when features are extracted for different medical images, the differences between the different medical images are not fully considered; therefore, the spatial information and temporal information of the multiple medical images are not fully utilized, resulting in low recognition accuracy.
  • image recognition models in related technologies include Early fusion (early fusion) model, Voting (voting) model, MLP, LSTM (Long Short-Term Memory, long-term short-term memory network) and GCN (Graph Convolutional Networks, graph convolution network ), compare the recognition results of the image recognition model TiT in the present application with the recognition results of the image recognition model in the related art, using precision (Precision), recall (Recall), accuracy (Accuracy), and parameters in the model It can be seen that the recognition accuracy of the image recognition model in this application is higher, and the training process is simpler. See the following Table 1 for the comparison results. It can be seen from Table 1 that the precision rate, recall rate and accuracy rate of the recognition results obtained by the image recognition model in this application are the largest, and compared with GCN, in this application The image recognition model needs to learn a small number of parameters.
  • the heat map indicates the lesion area in the corresponding medical image
  • the recognition results of the present application with the corresponding heat map, it can be determined that the The method can accurately identify the lesion area in the medical image, and the accuracy of the recognition result is high.
  • the method provided in the embodiment of the present application calls the image recognition model to identify the target object, first calls the first extraction network to extract the spatial features of multiple medical images of the target object, and after fully extracting the spatial features of each medical image, calls the first extraction network
  • the second extraction network fuses multiple spatial features, and extracts the spatiotemporal features of the target object based on the obtained first fused spatial features.
  • the spatiotemporal features can represent the spatial information changes of multiple medical images at different moments, and the extraction takes into account
  • the temporal relationship between multiple medical images enables the extracted spatio-temporal features to more accurately represent the spatial and temporal information of multiple medical images, thereby invoking the recognition network.
  • the recognition results are also improved. the accuracy rate.
  • the first extraction network and the second extraction network in the embodiment of the present application adopt a residual network structure, which alleviates the gradient disappearance problem caused by increasing the depth in the deep neural network, so that when extracting spatial features or extracting spatiotemporal features, More information can be used to further improve the accuracy of spatial features or spatiotemporal features.
  • both the first extraction network and the second extraction network use the attention layer, and the attention layer can be used to further process the first image features, so that the processed second image features can be highlighted Regional features of more important image regions; similarly, the first fused spatial features can be further processed by using the attention layer, so that the processed second fused spatial features can highlight more important spatial features of medical images.
  • the image recognition model needs to be trained before the computer device invokes the image recognition model to recognize objects.
  • the training process includes:
  • the computer equipment obtains multiple sample images and the sample recognition results to which the multiple sample images belong; the image recognition model is called to process the multiple sample images to obtain the predicted recognition results of the sample objects; according to the sample recognition results and the predicted recognition results, the training image Identify the model.
  • the multiple sample images are images of the same sample object at different times.
  • the computer device performs multiple iterative training on the image recognition model, and the iterative training ends when the number of training times of the image recognition model reaches a reference number, or the training time of the image recognition model reaches the reference time.
  • the known colposcope data set Time-lapsed Colposcopic Images (TCI, time-lapsed colposcopic image) is used as the sample data set of the image recognition model, the sample data set contains 7668 patients' time-lapsed colposcopic images, and the patient's The age distribution ranges from 24 to 49 years old. These patients were divided into 4 categories, namely non-cancerous (no cancer), Cervical Intraepithelial Neoplasia1 (CIN1, cervical intraepithelial neoplasia 1), CIN2 ⁇ 3 and Cancer (cancer). CIN1, CIN2-3, and Cancer were combined into category 1, collectively referred to as low-grade squamous intraepithelial lesions or more severe.
  • TCI time-lapsed colposcopic Images
  • the sample data of each patient includes images of 5 time nodes (the initial image, the image after 60 seconds, the image of 90 seconds, the image of 120 seconds and the image of 150 seconds).
  • the computer device uses a cross-entropy loss function or other loss functions to process the output probability, and trains the image recognition model according to the output result of the loss function.
  • the computer device that invokes the image recognition model to recognize the object in FIG. 9 above may be the same computer device as the computer device that trains the image recognition model, or may be a different computer device.
  • the computer device in the above embodiment shown in FIG. 9 is a server, or a user's terminal, and the computer device for training an image recognition model is a developer's terminal or server.
  • the computer device in the above embodiment shown in FIG. 9 and the computer device for training the image recognition model are the same server.
  • the image recognition model in the embodiment of the present application includes a residual network structure, so the model training process of the image recognition model is simpler, the calculation amount is small, and the training speed of the image recognition model is obviously improved.
  • the method provided by the embodiment of the present application can be applied in various scenarios.
  • the image segmentation scenario of the present application will be described below through the embodiment shown in FIG. 13 :
  • the computer equipment collects multiple CT images of the cervix at different times.
  • the computer device respectively extracts first image features of each CT image.
  • the computer device extracts spatial features of each CT image based on the extracted multiple first image features respectively.
  • the computer device fuses the extracted multiple spatial features to obtain a first fused spatial feature of the cervix.
  • the computer device extracts the space-time feature of the cervix based on the first fusion space feature.
  • the computer device determines an identification result of the cervix based on the spatio-temporal feature, and the identification result is used to indicate an abnormal area in each CT image.
  • the computer device Based on the identification result of the cervix, the computer device separately segments each CT image to obtain a lesion area in each CT image.
  • Fig. 14 is a schematic structural diagram of an object recognition device provided by an embodiment of the present application. Referring to Figure 14, the device includes:
  • the spatial feature extraction module 1401 is used to extract the spatial features of multiple medical images respectively, and the multiple medical images are images of the same target object at different times;
  • a spatial feature fusion module 1402 configured to fuse the extracted multiple spatial features to obtain the first fused spatial feature of the target object
  • the spatio-temporal feature extraction module 1403 is configured to extract the spatio-temporal feature of the target object based on the first fused spatial feature, and the spatio-temporal feature represents the change of the spatial information of multiple medical images at different moments;
  • the object identification module 1404 is configured to identify the target object based on the spatio-temporal features, and obtain the identification result of the target object.
  • the device provided in the embodiment of the present application first extracts the spatial features of multiple medical images of the target object respectively, and after fully extracting the spatial features of each medical image, fuses the multiple spatial features, and based on the obtained first fusion spatial features , to extract the spatio-temporal features of the target object, the spatio-temporal features can represent the changes of the spatial information of multiple medical images at different times, and the temporal relationship between multiple medical images is considered during extraction, so that the extracted spatio-temporal features can be more accurately
  • the spatial information and time series information of multiple medical images are represented, so that when the target object is identified based on the spatiotemporal features, the accuracy of the identification result is also improved.
  • the device further includes:
  • An image feature extraction module 1405, configured to extract first image features of a plurality of medical images respectively;
  • the spatial feature extraction module 1401 is configured to extract the spatial features of the multiple medical images based on the first image features of the multiple medical images respectively.
  • the spatial feature extraction module 1401 includes:
  • the first attention determination unit 1411 is configured to, for each medical image, divide the first image feature of the medical image into a plurality of regional features, respectively acquire the first attention parameters corresponding to the plurality of regional features, and the first attention parameters Representing the importance of the corresponding regional feature in the first image feature, the medical image includes a plurality of image regions, and each regional feature corresponds to an image region in the medical image;
  • the first feature fusion unit 1421 is configured to perform weighted fusion of multiple regional features based on multiple first attention parameters to obtain a second image feature corresponding to the medical image;
  • the spatial feature extraction unit 1431 is configured to extract the spatial feature of the medical image based on the second image feature.
  • the first attention determination unit 1411 is configured to:
  • Each regional feature is mapped to at least two feature spaces to obtain at least two mapping features corresponding to each regional feature, wherein at least two feature spaces represent the similarity of different pixels in the corresponding image area in the corresponding dimensions Spend;
  • a first attention parameter corresponding to each regional feature is obtained.
  • the spatial feature extraction unit 1431 is configured to:
  • the spatial feature of the medical image is extracted.
  • the spatial feature extraction module 1401 further includes:
  • the first normalization unit 1441 is configured to perform normalization processing on the third image features to obtain processed third image features.
  • the spatial feature extraction module 1401 further includes:
  • the second normalization unit 1451 is configured to perform normalization processing on the first image features of each medical image respectively to obtain the processed first image features of each medical image.
  • the spatio-temporal feature extraction module 1403 includes:
  • the second attention determination unit 1413 is configured to divide the first fused spatial feature into a plurality of spatial sub-features, respectively obtain the second attention parameters corresponding to the plurality of spatial sub-features, and the second attention parameters represent the corresponding spatial sub-features
  • the degree of importance in the first fused spatial feature, each spatial sub-feature corresponds to a medical image
  • the second feature fusion unit 1423 is configured to fuse multiple spatial sub-features based on multiple second attention parameters to obtain second fused spatial features corresponding to multiple medical images;
  • the spatiotemporal feature extraction unit 1433 is configured to extract spatiotemporal features based on the second fused spatial features.
  • the spatio-temporal feature extraction unit 1433 is configured to:
  • the recognition result is used to indicate the state of the target object.
  • the device further includes:
  • a state determining module 1406, configured to determine the state of the target object based on the recognition result.
  • the recognition result is used to indicate an abnormal region in each medical image, see FIG. 15, and the apparatus further includes:
  • the image segmentation module 1407 is configured to segment each medical image based on the recognition result to obtain abnormal regions in each medical image.
  • the image recognition model includes a first extraction network, a second extraction network, and a recognition network
  • the spatial feature extraction module 1401 is used to call the first extraction network to extract spatial features of multiple medical images
  • the spatial feature fusion module 1402 is used to call the second extraction network to fuse the extracted multiple spatial features to obtain the first fused spatial feature;
  • the spatio-temporal feature extraction module 1403 is used to call the second extraction network to extract spatio-temporal features based on the first fusion spatial feature;
  • the object recognition module 1404 is configured to call the recognition network to recognize the target object based on the spatio-temporal features, and obtain the recognition result of the target object.
  • the image recognition model also includes a third extraction network, referring to FIG. 15 , the device also includes:
  • the image feature extraction module 1405 is used to call the third extraction network to extract the first image features of multiple medical images respectively;
  • the spatial feature extraction module 1401 is configured to call the first extraction network to extract the spatial features of the multiple medical images based on the first image features of the multiple medical images respectively.
  • the first extraction network includes a first attention layer and a first extraction layer.
  • the spatial feature extraction module 1401 includes:
  • the first attention determination unit 1411 is configured to call the first attention layer for each medical image, divide the first image features of the medical image into multiple regional features, and obtain the first attention corresponding to the multiple regional features respectively Parameters, the first attention parameter characterizes the importance of the corresponding regional feature in the image feature, each regional feature corresponds to an image area in the medical image, and the medical image includes multiple image areas;
  • the first feature fusion unit 1421 is configured to call the first attention layer to fuse multiple regional features according to multiple first attention parameters to obtain second image features corresponding to the medical image;
  • the spatial feature extraction unit 1431 is configured to invoke the first extraction layer to extract the spatial features of the medical image based on the second image features.
  • the second extraction network includes a second attention layer and a second extraction layer.
  • the spatio-temporal feature extraction module 1403 includes:
  • the second attention determination unit 1413 is configured to call the second attention layer, divide the first fusion spatial feature into a plurality of spatial sub-features, obtain the second attention parameters corresponding to the plurality of spatial sub-features, and the second attention
  • the parameter represents the importance of the corresponding spatial sub-feature in the first fusion spatial feature, and each spatial sub-feature corresponds to a medical image;
  • the second feature fusion unit 1423 is configured to call the second attention layer, and fuse multiple spatial sub-features based on multiple second attention parameters to obtain second fused spatial features corresponding to multiple medical images;
  • the spatio-temporal feature extraction unit 1433 is configured to invoke the second extraction layer to extract spatio-temporal features based on the second fusion spatial feature.
  • the training process of the image recognition model includes:
  • the image recognition model is trained.
  • the object recognition device provided in the above embodiment recognizes objects, it only uses the division of the above-mentioned functional modules as an example. In practical applications, the above-mentioned function allocation can be completed by different functional modules according to needs. The internal structure of the computer equipment is divided into different functional modules to complete all or part of the functions described above.
  • the object recognition device provided by the above embodiment and the object recognition method embodiment belong to the same idea, and the specific implementation process thereof is detailed in the method embodiment, and will not be repeated here.
  • the embodiment of the present application also provides a computer device, the computer device includes a processor and a memory, at least one computer program is stored in the memory, and the at least one computer program is loaded and executed by the processor, so as to realize the object recognition of the above embodiment The operation performed by the method.
  • FIG. 16 is a schematic structural diagram of a terminal 1600 provided by an embodiment of the present application.
  • the terminal 1600 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc., and the terminal 1600 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal and other names.
  • the terminal 1600 includes: a processor 1601 and a memory 1602 .
  • the processor 1601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 1601 can adopt at least one hardware form in DSP (Digital Signal Processing, digital signal processing), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array, programmable logic array) accomplish.
  • the processor 1601 may also include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • AI Artificial Intelligence, artificial intelligence
  • Memory 1602 may include one or more computer-readable storage media, which may be non-transitory.
  • the non-transitory computer-readable storage medium in the memory 1602 is used to store at least one computer program, and the at least one computer program is used to be executed by the processor 1601 to implement the methods provided by the method embodiments in this application. object recognition method.
  • the terminal 1600 may optionally further include: a peripheral device interface 1603 and at least one peripheral device.
  • the processor 1601, the memory 1602, and the peripheral device interface 1603 may be connected through buses or signal lines.
  • Each peripheral device can be connected to the peripheral device interface 1603 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a display screen 1604 and a camera assembly 1605 .
  • the peripheral device interface 1603 may be used to connect at least one peripheral device related to I/O (Input/Output, input/output) to the processor 1601 and the memory 1602 .
  • the processor 1601, memory 1602 and peripheral device interface 1603 are integrated on the same chip or circuit board; in some other embodiments, any one of the processor 1601, memory 1602 and peripheral device interface 1603 or The two can be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the display screen 1604 is used to display a UI (User Interface, user interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • the display screen 1604 also has the ability to collect touch signals on or above the surface of the display screen 1604 .
  • the touch signal can be input to the processor 1601 as a control signal for processing.
  • the camera assembly 1605 is used to capture images or videos.
  • the camera component 1605 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
  • terminal 1600 also includes one or more sensors 1606 .
  • the one or more sensors 1606 include, but are not limited to: an acceleration sensor 1611 , a gyro sensor 1612 , a pressure sensor 1613 , an optical sensor 1614 and a proximity sensor 1615 .
  • FIG. 16 does not limit the terminal 1600, and may include more or less components than shown in the figure, or combine certain components, or adopt different component arrangements.
  • FIG. 17 is a schematic structural diagram of a server provided by an embodiment of the present application.
  • the server 1700 may have relatively large differences due to different configurations or performances, and may include one or more than one processor (Central Processing Units, CPU) 1701 and one Or more than one memory 1702, wherein at least one computer program is stored in the memory 1702, and the at least one computer program is loaded and executed by the processor 1701 to implement the methods provided by the above-mentioned method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, and an input and output interface for input and output, and the server may also include other components for realizing device functions, which will not be repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor, so as to realize the object recognition method of the above-mentioned embodiment The action performed.
  • the embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer program code, and the computer program code is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program code from the computer-readable storage medium, and the processor executes the computer program code, so that the computer device implements the operations performed by the object recognition method of the above-mentioned embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Radiology & Medical Imaging (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Quality & Reliability (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Image Analysis (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

本申请实施例公开了一种对象识别方法、装置、计算机设备及存储介质,属于计算机技术领域。该方法包括:计算机设备分别提取多个医学图像的空间特征(201),所述多个医学图像为同一目标对象在不同时刻的图像;计算机设备融合所提取的多个空间特征,得到所述目标对象的第一融合空间特征(202);计算机设备基于所述第一融合空间特征,提取所述目标对象的时空特征(203);计算机设备基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果(204)。该方法提取得到的时空特征能够更加准确地表示多个医学图像的空间信息和时序信息,从而基于该时空特征识别目标对象时,也提高了识别结果的准确率。

Description

对象识别方法、装置、计算机设备及存储介质
本申请要求于2021年06月03日提交、申请号为202110617124.4、发明名称为“对象识别方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及计算机技术领域,特别涉及一种对象识别方法、装置、计算机设备及存储介质。
背景技术
随着计算机技术的发展,采用图像处理技术辅助进行对象识别已成为一种常用手段。例如,在医学领域中,通过对采集的目标对象的医学图像进行识别,得到目标对象的识别结果,根据该识别结果确定目标对象的状态,或者根据该识别结果对医学图像进行分割。
发明内容
本申请实施例提供了一种对象识别方法、装置、计算机设备及存储介质,提高了识别准确率。所述技术方案如下:
一方面,提供了一种对象识别方法,所述方法包括:
计算机设备分别提取多个医学图像的空间特征,所述多个医学图像为同一目标对象在不同时刻的图像;
所述计算机设备融合所提取的多个空间特征,得到所述目标对象的第一融合空间特征;
所述计算机设备基于所述第一融合空间特征,提取所述目标对象的时空特征,所述时空特征表征所述多个医学图像在不同时刻的空间信息的变化;
所述计算机设备基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果。
另一方面,提供了一种对象识别装置,所述装置包括:
空间特征提取模块,用于分别提取多个医学图像的空间特征,所述多个医学图像为同一目标对象在不同时刻的图像;
空间特征融合模块,用于融合所提取的多个空间特征,得到所述目标对象的第一融合空间特征;
时空特征提取模块,用于基于所述第一融合空间特征,提取所述目标对象的时空特征,所述时空特征表征所述多个医学图像在不同时刻的空间信息的变化;
对象识别模块,用于基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果。
另一方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条计算机程序,所述至少一条计算机程序由所述处理器加载并执行,以实现如上述方面所述的对象识别方法所执行的操作。
另一方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机程序,所述至少一条计算机程序由处理器加载并执行,以实现如上述方面所述的对象识别方法所执行的操作。
另一方面,提供了一种计算机程序产品或计算机程序,所述计算机程序产品或所述计算机程序包括计算机程序代码,所述计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取所述计算机程序代码,处理器执行所述计算机程序代码,使得所述计算机设备实现如上述方面所述的对象识别方法所执行的操作。
本申请实施例提供的方法、装置、计算机设备及存储介质,先分别提取目标对象的多个医学图像的空间特征,在充分提取了每个医学图像的空间特征后,融合多个空间特征,并基于得到的第一融合空间特征,提取目标对象的时空特征,该时空特征能够表征多个医学图像在不同时刻的空间信息的变化,且提取时考虑了多个医学图像之间的时间关系,使提取的时空特征能够更加准确地表示多个医学图像的空间信息和时序信息,从而基于该时空特征识别目标对象时,也提高了识别结果的准确率。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请实施例的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种实施环境的示意图;
图2是本申请实施例提供的一种对象识别方法的流程图;
图3是本申请实施例提供的另一种对象识别方法的流程图;
图4是本申请实施例提供的一种图像识别模型的结构示意图;
图5是本申请实施例提供的另一种图像识别模型的结构示意图;
图6是本申请实施例提供的另一种图像识别模型的结构示意图;
图7是本申请实施例提供的一种第一提取网络的结构示意图;
图8是本申请实施例提供的另一种第一提取网络的结构示意图;
图9是本申请实施例提供的另一种对象识别方法的流程图;
图10是本申请实施例提供的另一种图像识别网络的结构示意图;
图11是本申请实施例提供的一种热力图的示意图;
图12是本申请实施例提供的一种相关技术中图像识别模型的结构示意图;
图13是本申请实施例提供的一种图像分割方法的流程图;
图14是本申请实施例提供的一种对象识别装置的结构示意图;
图15是本申请实施例提供的另一种对象识别装置的结构示意图;
图16是本申请实施例提供的一种终端的结构示意图;
图17是本申请实施例提供的一种服务器的结构示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
可以理解,本申请所使用的术语“第一”、“第二”等可在本文中用于描述各种概念,但除非特别说明,这些概念不受这些术语限制。这些术语仅用于将一个概念与另一个概念区分。举例来说,在不脱离本申请的范围的情况下,可以将第一图像特征称为第二图像特征,将第二图像特征称为第一图像特征。
本申请所使用的术语“至少一个”、“多个”、“每个”、“任一”等,至少一个包括一个、两个或两个以上,多个包括两个或两个以上,每个是指对应的多个中的每一个,任一是指多个中的任意一个。举例来说,多个医学图像包括3个医学图像,而每个医学图像是指这3个医学图像中的每一个医学图像,任一是指这3个医学图像中的任意一个,可以是第一个,可以是第二个,也可以是第三个。
本申请实施例提供的方案涉及人工智能的计算机视觉、机器学习等技术,通过调用图像识别模型,提取目标对象的空间特征和时空特征,并基于时空特征识别目标对象,以得到识别结果。
本申请实施例提供的对象识别方法,由计算机设备执行。可选地,该计算机设备为终端或服务器。可选地,该终端是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等,但并不局限于此。可选地,该服务器是独立的物理服务器,或者,是多个物理服务器构成的服务器集群或者分布式系统,或者,是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network,内容分发网络)、以及大数据和人工智能平台等基础云计算服务的云服务器。
在一种可能实现方式中,本申请实施例所涉及的计算机程序可被部署在一个计算机设备上执行,或者在位于一个地点的多个计算机设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算机设备上执行,分布在多个地点且通过通信网络互连的多个计算机设备组成区块链系统。
在一种可能实现方式中,本申请实施例中用于识别对象的计算机设备是区块链系统中的节点,该节点提取目标对象的多个医学图像的空间特征,以及基于多个医学图像的空间特征,提取目标对象的时空特征,并基于该时空特征识别目标对象,得到识别结果,之后该节点或者该区块链中的其他设备对应的节点还能够存储该目标对象的识别结果。
为了便于理解本申请实施例,先对本申请实施例涉及到的关键词进行解释:
Transformer:一种深度学习网络结构,该网络结构包括多头自注意力模块(Multi-head Self-attention)、多层感知机(MLP,Multi-Layer Perceptron)以及正则化层,且该网络结构中采用了残差结构。其中,多头自注意力模块是由多个自注意力模块级联得到的,多个自注意力模块的输出结果级联,即可得到多头自注意力模块的输出结果。
CNN(Convolutional Neural Network,卷积神经网络):一种广泛应用于图像分类任务的深度学习网络,至少包含卷积层、池化层或其他处理层。
ResNet(Residual Network,残差网络):一种CNN网络结构,ResNet容易进行优化,且缓解了深度神经网络中增加深度带来的梯度消失问题。
CT(Computed Tomography,电子计算机断层扫描)图像:CT图像是采用X射线对人体或物体中具有一定厚度的层面进行扫描,并接收透过该层面的X射线,对接收到的X射线进行处理后得到的图像。
图1是本申请实施例提供的一种实施环境的示意图。参见图1,该实施环境包括至少一个终端101(图1中以1个终端101为例)和服务器102。终端101和服务器102之间通过无线或者有线网络连接。
终端101上安装由服务器102提供服务的目标应用,终端101能够通过该目标应用实现例如图像识别、图像传输等功能。可选地,终端101为电脑、手机、平板电脑或者其他终端。可选地,目标应用为终端101操作系统中的目标应用,或者为第三方提供的目标应用。例如,目标应用为医疗应用,该医疗识别应用具有识别医学图像的功能,另外还具有其他功能,例如,生成病历、显示医学图像等功能。可选地,服务器102为该目标应用的后台服务器或者为提供云计算以及云存储等服务的云服务器。
在图1所示的实施环境的基础上,本申请实施例提供了一种宫颈状态识别场景。在此场景中,终端每隔30秒采集同一个对象的宫颈图像,得到了5个CT图像,将5个CT图像发送给服务器,服务器分别提取5个CT图像的空间特征,再基于5个空间特征提取该对象宫颈的时空特征,基于该时空特征进行识别,得到该对象的宫颈识别结果,并将该宫颈识别结果返回给终端,后续即可将该宫颈识别结果作为一种辅助判断的依据,并结合该对象的其他信息,确定该对象的宫颈的状态。
在图1所示的实施环境的基础上,本申请实施例还提供了一种CT图像分割场景。在此场景下,终端每隔30秒采集同一个对象的宫颈图像,得到了5个CT图像,将5个CT图像发送给服务器,服务器分别提取5个CT图像的空间特征,再基于5个空间特征提取该对象 宫颈的时空特征,基于该时空特征进行识别,得到该对象的宫颈识别结果,并将该宫颈识别结果返回给终端,终端根据该识别结果,确定每个CT图像中的病灶区域,对每个CT图像进行分割,得到每个CT图像中的病灶区域,以便对病灶区域进行进一步处理。
图2是本申请实施例提供的一种对象识别方法的流程图。本申请实施例的执行主体为计算机设备。参见图2,该方法包括以下步骤:
201、计算机设备分别提取多个医学图像的空间特征。
其中,多个医学图像为同一目标对象在不同时刻的图像,该目标对象为任一对象,例如该目标对象是指一个人或者是身体中的某个部位。空间特征表征对应的医学图像的空间信息,例如空间信息至少包括医学图像的尺寸信息、医学图像中像素点的像素值或位置信息。且计算机设备在提取多个医学图像的空间特征时,该多个医学图像的空间特征提取过程独立执行,互不干扰。
202、计算机设备融合所提取的多个空间特征,得到目标对象的第一融合空间特征。
本申请实施例中,计算机设备先分别提取每个医学图像的空间特征,之后基于提取的多个空间特征进一步提取时空特征,由于后续提取特征时,需要考虑多个医学图像之间的时间关系,因此,先对提取得到的多个空间特征进行融合,得到第一融合空间特征。
203、计算机设备基于第一融合空间特征,提取目标对象的时空特征。
本申请实施例中,由于时空特征是在第一融合空间特征的基础上,进行时序特征提取得到的,因此提取的时空特征中包含每个医学图像的空间信息以及多个医学图像的时序信息,该多个医学图像的时序信息是指多个医学图像对应的时间先后顺序以及不同时刻的医学图像的变化情况,即时空特征表征多个医学图像在不同时刻的空间信息的变化。
204、计算机设备基于时空特征识别目标对象,得到目标对象的识别结果。
其中,识别结果用于指示目标对象的状态,可选地,目标对象的状态包括正常状态和异常状态。或者,识别结果用于指示每个医学图像中的异常区域。
本申请实施例提供的方法,先分别提取目标对象的多个医学图像的空间特征,在充分提取了每个医学图像的空间特征后,融合多个空间特征,并基于融合得到的第一融合空间特征提取目标对象的时空特征,该时空特征表征多个医学图像在不同时刻的空间信息的变化,且提取时考虑了多个医学图像之间的时间关系,使提取的时空特征更加准确地表示多个医学图像的空间信息和时序信息,从而基于该时空特征识别目标对象时,也提高了识别结果的准确率。
图3是本申请实施例提供的另一种对象识别方法的流程图。本申请实施例的执行主体为计算机设备。参见图3,该方法包括以下步骤:
301、计算机设备分别提取目标对象的多个医学图像的第一图像特征。
其中,目标对象为任一对象,该目标对象是指人体或者人体中的某个部位,例如,目标对象为肺部、胃部、子宫等任一部位。多个医学图像为同一目标对象在不同时刻的图像,即多个医学图像是在不同时刻对目标对象进行采集得到的图像。该医学图像为CT图像、通过X光照射拍摄的图像或采用其他方式采集的图像。可选地,该多个医学图像为该计算机设备采集的,或者是由其他设备采集之后发送给该计算机设备的,本申请对此不做限制。
可选地,任两个相邻的医学图像的采集时间之间间隔相同的时长或者不同的时长。例如,间隔的时长为30秒、60秒或其他时长。例如,对于四个医学图像,第一个医学图像的采集时间与第二个医学图像的采集时间之间间隔30秒,第二个医学图像的采集时间与第三个医学图像的采集时间之间间隔30秒,第三个医学图像的采集时间与第四个医学图像的采集时间之间间隔30秒。
其中,第一图像特征用于描述对应的医学图像,该第一图像特征为向量、矩阵或其他形式。在一种可能实现方式中,计算机设备分别对多个医学图像进行编码,得到每个医学图像 的第一图像特征。
302、计算机设备分别基于多个医学图像的第一图像特征,提取多个医学图像的空间特征。
其中,空间特征表征对应的医学图像的空间信息,例如空间信息至少包括医学图像的尺寸信息、医学图像中像素点的像素值、位置信息。该空间特征为向量、矩阵或其他形式。
计算机设备在提取多个医学图像的空间特征时,该多个医学图像的空间特征提取过程独立执行,互不干扰,下面以提取任一医学图像的空间特征为例,对提取空间特征的过程进行说明。
在一种可能实现方式中,医学图像被划分为多个图像区域,即医学图像包括多个图像区域,对应地,计算机设备将医学图像的第一图像特征划分为多个区域特征,每个区域特征对应医学图像中的一个图像区域,并且,分别获取多个区域特征对应的第一注意力参数,基于多个第一注意力参数,对多个区域特征进行加权融合,得到医学图像对应的第二图像特征;基于第二图像特征,提取医学图像的空间特征。
其中,第一注意力参数表征对应的区域特征在第一图像特征中的重要程度,该第二图像特征也用于描述对应的医学图像,该第一图像特征与该第二图像特征的区别在于:第二图像特征是在第一图像特征的基础上,根据不同的区域特征的重要程度对第一图像特征进行调整后得到的,第二图像特征与第一图像特征相比能够更加准确地表征医学图像中较为重要的图像区域。
对于第一注意力参数的确定,在一种可能实现方式中,计算机设备将每个区域特征分别映射到至少两个特征空间中,得到每个区域特征对应的至少两个映射特征,其中,该至少两个特征空间表征对应图像区域中的不同像素点在对应的维度上的相似度;基于每个区域特征对应的至少两个映射特征,获取每个区域特征对应的第一注意力参数。
在一种可能实现方式中,计算机设备基于第二图像特征,提取医学图像的空间特征,包括:计算机设备直接对第二图像特征进行空间特征提取,得到医学图像的空间特征。
在另一种可能实现方式中,为了避免在对第一图像特征进行处理,得到第二图像特征的过程中丢失第一图像特征中的信息,导致提取的空间特征不准确,计算机设备融合第二图像特征与第一图像特征,得到医学图像对应的第三图像特征;基于第三图像特征,提取医学图像的空间特征。其中,第三图像特征中包含第一图像特征和第二图像特征,因此该第三图像特征包含医学图像完整的信息的同时,又能够凸显医学图像中较为重要的图像区域的信息。
另外,在一种可能实现方式中,计算机设备为了减小处理过程中的计算量,提高处理速度,先对第一图像特征进行归一化处理,得到处理后的第一图像特征,之后针对归一化处理后的第一图像特征执行确定第一注意力参数的步骤。同理,对第三图像特征进行归一化处理,得到处理后的第三图像特征,之后针对归一化处理后的第三图像特征执行提取空间特征的步骤。其中,归一化处理可将图像特征中包含的数值限定在0到1的范围内,从而避免图像特征中包含的各个数值之间相差较大,导致处理过程复杂。
上述提取空间特征的过程是以一个医学图像为例进行说明的,本申请中的每个医学图像均能够采用上述实施方式来提取对应的空间特征。
需要说明的是,本申请实施例对多个医学图像提取空间特征的先后顺序不做限制,可选地,同时分别提取多个医学图像的空间特征,或者,按照医学图像对应的采集时间,依次提取每个医学图像的空间特征。
303、计算机设备融合所提取的多个空间特征,得到目标对象的第一融合空间特征。
本申请实施例中,计算机设备先提取每个医学图像的空间特征,然后在提取的多个空间特征的基础上进行时序特征提取,由于进行时序特征提取时,需要考虑多个医学图像之间的时间关系,而不是分别针对每个空间特征进行时序特征提取,因此,需要先融合多个空间特征,得到第一融合空间特征。
其中,融合多个空间特征可以是拼接该多个空间特征,使得到的第一融合空间特征包含多个医学图像的空间特征。
304、计算机设备基于第一融合空间特征,提取目标对象的时空特征。
由于时空特征是在第一融合空间特征的基础上,进行时序特征提取得到的,因此提取的时空特征包含每个医学图像的空间信息以及多个医学图像的时序信息,该多个医学图像的时序信息是指多个医学图像对应的时间先后顺序以及不同时刻的医学图像的变化情况,即时空特征表征多个医学图像在不同时刻的空间信息的变化。也就是说,时空特征表征多个医学图像的变化情况。
在一种可能实现方式中,计算机设备按照医学图像,将第一融合空间特征划分为多个空间子特征,每个空间子特征对应一个医学图像,分别获取多个空间子特征对应的第二注意力参数;基于多个第二注意力参数,融合多个空间子特征,得到多个医学图像对应的第二融合空间特征;基于第二融合空间特征,提取时空特征。其中,第二注意力参数表征对应的空间子特征在第一融合空间特征中的重要程度。
对于第二注意力参数的确定,在一种可能实现方式中,计算机设备将每个空间子特征分别映射到至少两个特征空间中,得到每个空间子特征对应的至少两个映射特征,其中至少两个特征空间表征对应医学图像中的不同像素点在对应的维度上的相似度;基于每个空间子特征对应的至少两个映射特征,获取每个空间子特征对应的第二注意力参数。
在一种可能实现方式中,计算机设备基于第二融合空间特征,提取医学图像的空间特征,包括:计算机设备直接对第二融合空间进行时序特征提取,得到时空特征。
在另一种可能实现方式中,为了避免在对第一融合空间特征进行处理,得到第二融合空间特征的过程中丢失第一融合空间特征中的信息,导致提取的空间特征不准确,计算机设备融合第二融合空间特征与第一融合空间特征,得到目标对象的第三融合空间特征;基于第三融合空间特征,提取时空特征。其中,第三融合空间特征中包含第一融合空间特征和第二融合空间特征,因此该第三融合空间特征中在包含全部医学图像的空间信息的同时,又能够凸显多个医学图像中较为重要的医学图像的信息。
另外,在一种可能实现方式中,计算机设备为了减小处理过程中的计算量,提高处理速度,先对第一融合空间特征进行归一化处理,得到处理后的第一融合空间特征,之后针对归一化处理后的第一融合空间特征执行确定第二注意力参数的步骤。同理,对第三融合空间特征进行归一化处理,得到处理后的第三融合空间特征,之后针对归一化处理后的第三融合空间特征执行提取时空特征的步骤。
305、计算机设备基于时空特征识别目标对象,得到目标对象的识别结果。
其中,识别结果用于指示目标对象的状态,或者识别结果用于指示每个医学图像中的异常区域。
在一种可能实现方式中,目标对象的状态包括正常状态和异常状态,正常状态指示目标对象未发生变化,异常状态指示目标对象相对于正常状态下的目标对象发生了变化。例如,在对目标对象进行疾病识别的场景下,正常状态表示目标对象没有发生病变,异常状态表示目标对象发生了病变。可选地,识别结果包括第一类别和第二类别,其中,第一类别表示目标对象处于正常状态,第二类别表示目标对象处于异常状态。例如,第一类别为阴性,第二类别为阳性。
在一种可能实现方式中,医学图像中的异常区域是指医学图像中发生病变的区域,例如,异常区域为病灶区域;对应地,医学图像中的正常区域是指医学图像中未发生病变的区域。计算机设备基于识别结果,分别对每个医学图像进行分割,得到每个医学图像中的异常区域,即将每个医学图像中的异常区域分割出来,便于对分割出的异常区域进行进一步的处理。例如,计算机设备对宫颈的CT图像进行分割,分割出CT图像中的病灶区域,对该病灶区域进行进一步识别,以确定该病灶区域的尺寸、形状等,得到该病灶区域更准确的信息。
需要说明的是,在医学领域中,本申请实施例中得到的识别结果仅是一种辅助医生对疾病进行识别的依据。例如,在癌症识别场景下,医生需要结合目标对象的识别结果、癌症相关的其他信息以及目标对象的身体状况,来识别目标对象是否患有癌症。
本申请实施例提供的方法,先分别提取目标对象的多个医学图像的空间特征,在充分提取了每个医学图像的空间特征后,融合多个空间特征,并基于得到的第一融合空间特征,提取目标对象的时空特征,该时空特征能够表征多个医学图像在不同时刻的空间信息的变化,且提取时考虑了多个医学图像之间的时间关系,使提取的时空特征能够更加准确地表示多个医学图像的空间信息和时序信息,从而基于该时空特征识别目标对象时,也提高了识别结果的准确率。
并且,本申请实施例中在提取空间特征时,融合第一图像特征与第二图像特征,对融合得到的第三图像特征进行空间特征提取,使提取空间特征时,能够利用更多的信息,进一步提高了空间特征的准确率,避免了获取第二图像特征的过程中丢失第一图像特征中的信息,导致提取的空间特征不准确。同理,在提取时空特征时,将融合第一融合空间特征与第二融合空间特征,对融合得到的第三融合空间特征进行时序特征提取,使提取时空特征时,能够利用更多的信息,进一步提高了时空特征的准确率,避免了获取第二融合空间特征的过程中丢失第一融合空间特征中的信息,导致提取的时空特征不准确。
并且,本申请实施例中,利用第一图像特征中每个区域特征对应的第一注意力参数,获取第一图像特征对应的第二图像特征,使第二图像特征中能够凸显出更加重要的图像区域的区域特征;同理,利用第一融合空间特征中每个空间子特征对应的第二注意力参数,获取第一融合空间特征对应的第二融合空间特征,使第二融合空间特征能够凸显出更加重要的医学图像的空间特征。
上述图3所示的实施例是以计算机设备直接对多个医学图像进行处理,实现对象识别为例进行说明,在另一实施例中,计算机设备调用图像识别模型对多个医学图像进行处理,实现对象识别。下面先对图像识别模型的结构进行介绍:
参见图4,该图像识别模型400包括第一提取网络401、第二提取网络402和识别网络403。其中,第一提取网络401与第二提取网络402连接,第二提取网络402还与识别网络403连接,第一提取网络401用于提取医学图像的空间特征,第二提取网络402用于提取目标对象的时空特征,识别网络403用于识别目标对象。
在一种可能实现方式中,图像识别模型400为TiT(Transformer in Transformer),即图像识别模型400为级联Transformer,TiT由至少两个Transformer级联得到。其中,第一提取网络和第二提取网络均为一个Transformer。
可选地,对于每个医学图像,分别采用不同的第一提取网络401来提取空间特征,这种情况下,参见图5,该图像识别模型400包括多个第一提取网络401(图5中以3个为例),每个第一提取网络401用于基于一个医学图像提取空间特征,该多个第一提取网络401分别与第二提取网络402连接。
可选地,参见图6,该图像识别模型4001还包括第三提取网络404,该第三提取网络404与第一提取网络401连接,该第三提取网络404用于提取医学图像的图像特征,即将医学图像转换为计算机设备能够处理的形式。在图像识别模型400包括多个第一提取网络401的情况下,第三提取网络404分别与每个第一提取网络401连接。
在一种可能实现方式中,第一提取网络401和第二提取网络402具有类似的网络结构,以第一提取网络401的网络结构为例,参见图7,该第一提取网络401包括第一正则化层411、第一注意力层421、第一提取层431。其中,第一正则化层411与第一注意力层421连接,第一注意力层421还与第一提取层431连接。
可选地,参见图8,第一提取层431包括第一融合层、第二正则化层和多层感知机,第一提取网络401还包括第二融合层441。其中,第一融合层与上一个网络的最后一层、第一注意力层421及第二正则化层连接,第二正则化层还与多层感知机连接,多层感知机还与第二融合层441连接,第二融合层还与第一融合层连接。
下面对调用上述所示的图像识别模型进行对象识别的过程进行详细说明。图9是本申请 实施例提供的另一种对象识别方法的流程图。本申请实施例的执行主体为计算机设备。参见图9,该方法包括以下步骤:
901、计算机设备调用第三提取网络,分别提取多个医学图像的第一图像特征。
其中,第三提取网络用于对医学图像进行编码,以获取医学图像的第一图像特征。
可选地,图像识别模型包括一个第三提取网络,该第三提取网络依次提取多个医学图像的第一图像特征;或者图像识别模型包括多个第三提取网络,每个第三提取网络提取一个医学图像的第一图像特征。
例如,第三提取网络采用下述公式提取多个医学图像的第一图像特征:
M=Encoder(x)
其中,M表示提取得到的第一图像特征,x表示输入的医学图像,Encoder(·)表示采用CNN进行提取。
其中,任一医学图像的分辨率(长×宽)为H×W,任一医学图像的通道数量为C,医学图像的个数为T,C和T均为正整数。
902、计算机设备调用第一提取网络,分别基于多个医学图像的第一图像特征,提取多个医学图像的空间特征。
本申请实施例中,图像识别模型中仅包括一个第一提取网络的情况下,计算机设备调用该第一提取网络,依次基于多个医学图像的第一图像特征提取空间特征,从而得到多个空间特征;图像识别模型包括多个第一提取网络的情况下,计算机设备分别调用一个第一提取网络,基于一个医学图像的第一图像特征提取空间特征,得到该医学图像的空间特征。
任一第一提取网络对医学图像的空间特征的提取过程是相同的。下面以任一第一提取网络为例,对空间特征的提取过程进行说明:
在一种可能实现方式中,第一提取网络包括第一注意力层和第一提取层,对于每个医学图像,计算机设备调用第一注意力层,将医学图像的第一图像特征划分为多个区域特征,分别获取多个区域特征对应的第一注意力参数,按照多个第一注意力参数,对多个区域特征进行融合,得到医学图像对应的第二图像特征;调用第一提取层,基于第二图像特征,提取医学图像的空间特征。其中,第一注意力参数表征对应的区域特征在第一图像特征中的重要程度,每个区域特征对应医学图像中的一个图像区域,医学图像包括多个图像区域。
对于第一注意力参数的确定,在一种可能实现方式中,计算机设备调用第一注意力层,将每个区域特征分别映射到至少两个特征空间中,得到每个区域特征对应的至少两个映射特征;基于每个区域特征对应的至少两个映射特征,获取每个区域特征对应的第一注意力参数。
例如,计算机设备调用第一注意力层,将每个区域特征分别映射至三个特征空间中,该三个特征空间分别对应的查询(query)维度、键(key)维度以及值(value)特征维度,采用下述公式,确定每个区域特征对应的第一注意力参数:
[q,k,v]=yU qkv
Figure PCTCN2022091089-appb-000001
其中,q表示查询维度的映射特征,k表示键维度的映射特征,v表示值特征维度的映射特征,y表示任一区域特征,U qkv表示训练得到的模型参数,A表示该任一区域特征对应的第一注意力参数,softmax(·)表示进行归一化处理,D h表示第一注意力层中的隐藏层的维度个数。
对应地,采用下述公式,确定医学图像对应的第二图像特征:
SA(y)=Av
MSA(y)=[SA 1(y);SA 2(y);…;SA k(y)]U mas
其中,SA(y)表示对任一区域特征进行加权后的区域特征,MSA(y)表示第二图像特征,k表示将医学图像划分为了k个图像区域,U mas表示训练得到的模型参数。
在一种可能实现方式中,第一提取网络包括残差网络结构,即第一提取网络还包括第一 融合层,计算机设备调用第一融合层,融合第二图像特征与第一图像特征,得到医学图像对应的第三图像特征;调用第一提取层,基于第三图像特征,提取医学图像的空间特征。
另外,在一种可能实现方式中,为了减小处理过程中的计算量,提高处理速度,第一提取网络还包括第一正则化层和第二正则化层,计算机设备调用第一正则化层,对第一图像特征进行归一化处理,得到处理后的第一图像特征。同理,计算机设备调用第二正则化层,对第三图像特征进行归一化处理,得到处理后的第三图像特征。
可选地,第一提取层包括多层感知机,计算机设备调用多层感知机基于第三图像特征,提取空间特征。
在一种可能实现方式中,为了避免在对第三图像特征进行处理得到空间特征的过程中丢失第三图像特征中的信息,导致后续提取的时空特征不准确,计算机设备融合第三图像特征与空间特征,得到融合后的空间特征,后续对该融合后的空间特征进行处理。
903、计算机设备调用第二提取网络,融合所提取的多个空间特征,得到第一融合空间特征,基于第一融合空间特征,提取时空特征。
在一种可能实现方式中,第二提取网络包括第三融合层,计算机设备调用该第三融合层融合多个空间特征,得到第一融合空间特征。例如,在第三融合层中采用下述公式,得到第一融合空间特征:
Figure PCTCN2022091089-appb-000002
其中,z表示第一融合空间特征,
Figure PCTCN2022091089-appb-000003
表示输入的医学图像的空间特征,T表示共T个医学图像。其中,
Figure PCTCN2022091089-appb-000004
与上述步骤902中得到的输出MSA(y)相比,
Figure PCTCN2022091089-appb-000005
是在MSA(y)中拼接了一行或者一列进行训练得到的模型参数后得到的。
在一种可能实现方式中,第二提取网络的网络结构与第一提取网络的网络结构类似,第二提取网络包括第二注意力层和第二提取层,计算机设备调用第二注意力层,将第一融合空间特征划分为多个空间子特征,分别获取多个空间子特征对应的第二注意力参数,基于多个第二注意力参数,融合多个空间子特征,得到多个医学图像对应的第二融合空间特征;调用第二提取层,基于第二融合空间特征,提取时空特征。
对于第二注意力参数的确定,在一种可能实现方式中,计算机设备调用第二注意力层,将每个空间子特征分别映射到至少两个特征空间中,得到每个空间子特征对应的至少两个映射特征;基于每个空间子特征对应的至少两个映射特征,获取每个空间子特征对应的第二注意力参数。
在一种可能实现方式中,第二提取网络包括残差网络结构,即第二提取网络还包括第四融合层,计算机设备调用第四融合层,融合第二融合空间特征与第一融合空间特征,得到目标对象的第三融合空间特征;调用第二提取层,基于第三融合空间特征,提取时空特征。
另外,在一种可能实现方式中,为了减小处理过程中的计算量,提高处理速度,第二提取网络还包括第三正则化层和第四正则化层,计算机设备调用第三正则化层,对第一融合空间特征进行归一化处理,得到处理后的第一融合空间特征。同理,计算机设备调用第四正则化层,对第三融合空间特征进行归一化处理,得到处理后的第三融合空间特征。
可选地,第二提取层包括多层感知机,计算机设备调用多层感知机对第三融合空间特征进行时序特征提取,得到时空特征。
例如,在第二提取网络中采用下述公式提取时空特征:
Figure PCTCN2022091089-appb-000006
其中,f表示时空特征,TT(·)表示进行时序特征提取,
Figure PCTCN2022091089-appb-000007
表示第一融合空间特征。其中,
Figure PCTCN2022091089-appb-000008
与上述融合得到的z相比,
Figure PCTCN2022091089-appb-000009
是在z中拼接了一行或一列进行训练得到的模型参数后得到的。
需要说明的是,本申请实施例仅是以一个第二提取层为例进行说明,在另一实施例中,图像识别模型包括多个第二提取层,将当前第二提取层输出的时空特征,输入至下一个第二提取层,直至得到最后一个第二提取层输出的时空特征,将最后一个第二提取层输出的时空特征确定为目标对象的时空特征。
904、计算机设备调用识别网络,基于时空特征识别目标对象,得到目标对象的识别结果。
其中,识别网络用于识别目标对象,得到目标对象的识别结果。
在一种可能实现方式中,识别网络包括MLP和激活函数Softmax,计算机设备调用该MLP和激活函数Softmax,识别目标对象,得到识别结果。
在一种可能实现方式中,识别网络的输出为0或1,输出为1时,表示目标对象为正常状态,输出为0时,表示目标对象为异常状态。或者,识别网络的输出为概率,输出的概率大于参考概率时,表示目标对象为正常状态,输出的概率不大于参考概率时,表示目标对象为异常状态。
例如,参见图10,以目标对象的三个医学图像为例,首先经过第三提取网络1001,提取三个医学图像对应的第一图像特征,将得到的三个第一图像特征分别输入至对应的第一提取网络1002,经过第一提取网络1002输出空间特征,再将三个空间特征输入至第二提取网络1003,经过第二提取网络1003输出目标对象的时空特征,再将时空特征输入至识别网络1004,得到目标对象的识别结果。其中,在任一第一提取网络1002中,经过正则化层对第一图像特征进行归一化处理,将处理后的第一图像特征分别映射至三个特征空间,再经过多头注意力层对映射到的三个映射特征进行处理,输出第二图像特征,融合第一图像特征与第二图像特征,得到第三图像特征,将第三图像特征再经过一个正则化层进行归一化处理,得到处理后的第三图像特征,将处理后的第三图像特征输入至多层感知机,经过多层感知机的处理,得到对应的空间特征,再经过一个融合层,融合该空间特征与第三图像特征,得到融合后的空间特征。
在另一种可能实现方式中,计算机设备调用识别网络,分别对目标对象的每个医学图像进行识别,识别出每个医学图像中的异常区域之后,在医学图像中标记出异常区域,输出标记后的医学图像。例如,采用彩色的实线圈出医学图像中的异常区域,或者在异常区域填充医学图像中没有的颜色,或者采用其他方式标记,本申请实施例对此不做限制。
另外,相关技术中提供了一种图像识别模型的结构,参见图11,也以目标对象的三个医学图像为例,这三个医学图像分别由对应的卷积神经网络1101进行特征提取,将提取得到的三个特征均输入至图卷积网络1102,由图卷积网络1102融合三个特征,对融合后的特征进行识别,得到识别结果。其中,图卷积网络1102中的一个圆表示提取得到的一个特征。
本申请与相关技术相比:
相关技术中针对每个医学图像需要分别训练对应的卷积神经网络,导致训练量大,模型训练困难,识别效率低,且由于是针对不同的医学图像分别提取特征,没有充分考虑不同医学图像之间的关系,因此,对多个医学图像的空间信息和时序信息利用不充分,导致识别准确率较低。
另外,相关技术中的图像识别模型包括Early fusion(早期融合)模型、Voting(投票)模型、MLP、LSTM(Long Short-Term Memory,长短期记忆网络)和GCN(Graph Convolutional Networks,图卷积网络),对比本申请中的图像识别模型TiT的识别结果与相关技术中的图像识别模型的识别结果,采用精确率(Precision)、召回率(Recall)、准确率(Accuracy)、和模型中的参数数量对识别结果进行评估,可以看出本申请中的图像识别模型的识别准确率更高,且训练过程更加简单。对比结果参见下述表1,从表1中可看出,本申请中的图像识别模型得到的识别结果的精确率、召回率和准确率均是最大的,且与GCN相比,本申请中的图像识别模型需要学习的参数数量较少。
表1
Figure PCTCN2022091089-appb-000010
另外,参见图12所示的医学图像及热力图,该热力图中指示了对应的医学图像中的病灶区域,通过对比本申请的识别结果及对应的热力图,可以确定本申请实施例提供的方法能够准确识别出医学图像中的病灶区域,得到的识别结果的准确率较高。
本申请实施例提供的方法,调用图像识别模型识别目标对象,先调用第一提取网络分别提取目标对象的多个医学图像的空间特征,在充分提取了每个医学图像的空间特征后,调用第二提取网络,融合多个空间特征,并基于得到的第一融合空间特征,提取目标对象的时空特征,该时空特征能够表征多个医学图像在不同时刻的空间信息的变化,且提取时考虑了多个医学图像之间的时间关系,使提取的时空特征能够更加准确地表示多个医学图像的空间信息和时序信息,从而调用识别网络,基于该时空特征识别目标对象时,也提高了识别结果的准确率。
并且,本申请实施例中的第一提取网络和第二提取网络中采用了残差网络结构,缓解了深度神经网络中增加深度带来的梯度消失问题,使提取空间特征或者提取时空特征时,能够利用更多的信息,进一步提高空间特征或时空特征的准确率。
并且,本申请实施例中,第一提取网络和第二提取网络均采用了注意力层,利用注意力层能够对第一图像特征进行进一步处理,使处理后的第二图像特征中能够凸显出更加重要的图像区域的区域特征;同理,利用注意力层能够对第一融合空间特征进行进一步处理,使处理后的第二融合空间特征能够凸显出更加重要的医学图像的空间特征。
本申请实施例中,计算机设备调用图像识别模型识别对象之前需要先训练图像识别模型。训练过程包括:
计算机设备获取多个样本图像及多个样本图像所属的样本识别结果;调用图像识别模型,对多个样本图像进行处理,得到样本对象的预测识别结果;根据样本识别结果和预测识别结果,训练图像识别模型。其中,多个样本图像为同一样本对象在不同时刻的图像。计算机设备对图像识别模型进行多次迭代训练,图像识别模型的训练次数达到参考次数,或者图像识别模型的训练时长达到参考时长时结束迭代训练。
可选地,采用已知的阴道镜数据集Time-lapsed Colposcopic Images(TCI,时序阴道镜图像)作为图像识别模型的样本数据集,该样本数据集中包含7668个病人的时序阴道镜图像,病人的年龄分布在24岁到49岁之间。这些病人被划分为4类,分别是non-cancerous(没有癌症)、Cervical Intraepithelial Neoplasia1(CIN1,宫颈上皮内瘤变1)、CIN2~3以及Cancer(癌症)。将CIN1、CIN2~3以及Cancer合并成1类,统称为低度鳞状上皮内病变或更严重。将样本数据集中80%的样本用于训练图像识别模型,20%的样本用于测试图像识别模型。其中,每个病人的样本数据均包含5个时间节点的图像(初始图像,60秒后的图像,90秒的图像,120秒的图像以及150秒的图像)。
在一种可能实现方式中,输出的识别结果为概率的情况下,计算机设备采用交叉熵损失函数或者其他损失函数,对输出的概率进行处理,根据损失函数的输出结果训练图像识别模型。
需要说明的是,上述图9中调用图像识别模型识别对象的计算机设备,与训练图像识别模型的计算机设备可以是同一个计算机设备,也可以是不同的计算机设备。例如,上述图9所示实施例中的计算机设备是服务器,或者是用户的终端,训练图像识别模型的计算机设备是开发人员的终端或服务器。或者,上述图9所示实施例中的计算机设备和训练图像识别模型的计算机设备是同一个服务器。
本申请实施例中的图像识别模型中包含残差网络结构,因此该图像识别模型的模型训练过程更加简单,计算量小,明显提高了图像识别模型的训练速度。
本申请实施例提供的方法可应用于多种场景下,以下将通过图13所示的实施例,对本申请的图像分割场景进行说明:
1301、计算机设备采集宫颈在不同时刻的多个CT图像。
1302、计算机设备分别提取每个CT图像的第一图像特征。
1303、计算机设备分别基于提取的多个第一图像特征,提取每个CT图像的空间特征。
1304、计算机设备融合所提取的多个空间特征,得到宫颈的第一融合空间特征。
1305、计算机设备基于第一融合空间特征,提取宫颈的时空特征。
1306、计算机设备基于时空特征,确定宫颈的识别结果,该识别结果用于指示每个CT图像中的异常区域。
1307、计算机设备基于宫颈的识别结果,分别对每个CT图像进行分割,得到每个CT图像中的病灶区域。
图14是本申请实施例提供的一种对象识别装置的结构示意图。参见图14,该装置包括:
空间特征提取模块1401,用于分别提取多个医学图像的空间特征,多个医学图像为同一目标对象在不同时刻的图像;
空间特征融合模块1402,用于融合所提取的多个空间特征,得到目标对象的第一融合空间特征;
时空特征提取模块1403,用于基于第一融合空间特征,提取目标对象的时空特征,时空特征表征多个医学图像在不同时刻的空间信息的变化;
对象识别模块1404,用于基于时空特征识别目标对象,得到目标对象的识别结果。
本申请实施例提供的装置,先分别提取目标对象的多个医学图像的空间特征,在充分提取了每个医学图像的空间特征后,融合多个空间特征,并基于得到的第一融合空间特征,提取目标对象的时空特征,该时空特征能够表征多个医学图像在不同时刻的空间信息的变化,且提取时考虑了多个医学图像之间的时间关系,使提取的时空特征能够更加准确地表示多个医学图像的空间信息和时序信息,从而基于该时空特征识别目标对象时,也提高了识别结果的准确率。
在一种可能实现方式中,参见图15,该装置还包括:
图像特征提取模块1405,用于分别提取多个医学图像的第一图像特征;
空间特征提取模块1401,用于分别基于多个医学图像的第一图像特征,提取多个医学图像的空间特征。
在一种可能实现方式中,参见图15,空间特征提取模块1401,包括:
第一注意力确定单元1411,用于对于每个医学图像,将医学图像的第一图像特征划分为多个区域特征,分别获取多个区域特征对应的第一注意力参数,第一注意力参数表征对应的区域特征在第一图像特征中的重要程度,医学图像包括多个图像区域,每个区域特征对应医学图像中的一个图像区域;
第一特征融合单元1421,用于基于多个第一注意力参数,对多个区域特征进行加权融合,得到医学图像对应的第二图像特征;
空间特征提取单元1431,用于基于第二图像特征,提取医学图像的空间特征。
在一种可能实现方式中,第一注意力确定单元1411,用于:
将每个区域特征分别映射到至少两个特征空间中,得到每个区域特征对应的至少两个映射特征,其中至少两个特征空间表征对应图像区域中的不同像素点在对应的维度上的相似度;
基于每个区域特征对应的至少两个映射特征,获取每个区域特征对应的第一注意力参数。
在一种可能实现方式中,空间特征提取单元1431,用于:
融合第二图像特征与第一图像特征,得到医学图像对应的第三图像特征;
基于第三图像特征,提取医学图像的空间特征。
在一种可能实现方式中,参见图15,空间特征提取模块1401,还包括:
第一归一化单元1441,用于对第三图像特征进行归一化处理,得到处理后的第三图像特征。
在一种可能实现方式中,参见图15,空间特征提取模块1401,还包括:
第二归一化单元1451,用于分别对每个医学图像的第一图像特征进行归一化处理,得到每个医学图像处理后的第一图像特征。
在一种可能实现方式中,参见图15,时空特征提取模块1403,包括:
第二注意力确定单元1413,用于将第一融合空间特征划分为多个空间子特征,分别获取多个空间子特征对应的第二注意力参数,第二注意力参数表征对应的空间子特征在第一融合空间特征中的重要程度,每个空间子特征对应一个医学图像;
第二特征融合单元1423,用于基于多个第二注意力参数,融合多个空间子特征,得到多个医学图像对应的第二融合空间特征;
时空特征提取单元1433,用于基于第二融合空间特征,提取时空特征。
在一种可能实现方式中,参见图15,时空特征提取单元1433,用于:
融合第二融合空间特征与第一融合空间特征,得到目标对象的第三融合空间特征;
基于第三融合空间特征,提取时空特征。
在一种可能实现方式中,识别结果用于指示目标对象的状态,参见图15,该装置还包括:
状态确定模块1406,用于基于识别结果,确定目标对象的状态。
在一种可能实现方式中,识别结果用于指示每个医学图像中的异常区域,参见图15,该装置还包括:
图像分割模块1407,用于基于识别结果,分别对每个医学图像进行分割,得到每个医学图像中的异常区域。
在一种可能实现方式中,图像识别模型包括第一提取网络、第二提取网络和识别网络,空间特征提取模块1401,用于调用第一提取网络,分别提取多个医学图像的空间特征;
空间特征融合模块1402,用于调用第二提取网络,融合所提取的多个空间特征,得到第一融合空间特征;
时空特征提取模块1403,用于调用第二提取网络,基于第一融合空间特征,提取时空特征;
对象识别模块1404,用于调用识别网络,基于时空特征识别目标对象,得到目标对象的识别结果。
在一种可能实现方式中,图像识别模型还包括第三提取网络,参见图15,装置还包括:
图像特征提取模块1405,用于调用第三提取网络,分别提取多个医学图像的第一图像特征;
空间特征提取模块1401,用于调用第一提取网络,分别基于多个医学图像的第一图像特征,提取多个医学图像的空间特征。
在一种可能实现方式中,第一提取网络包括第一注意力层和第一提取层,参见图14,空间特征提取模块1401,包括:
第一注意力确定单元1411,用于对于每个医学图像,调用第一注意力层,将医学图像的第一图像特征划分为多个区域特征,分别获取多个区域特征对应的第一注意力参数,第一注 意力参数表征对应的区域特征在图像特征中的重要程度,每个区域特征对应医学图像中的一个图像区域,医学图像包括多个图像区域;
第一特征融合单元1421,用于调用第一注意力层,按照多个第一注意力参数,对多个区域特征进行融合,得到医学图像对应的第二图像特征;
空间特征提取单元1431,用于调用第一提取层,基于第二图像特征,提取医学图像的空间特征。
在一种可能实现方式中,第二提取网络包括第二注意力层和第二提取层,参见图14,时空特征提取模块1403,包括:
第二注意力确定单元1413,用于调用第二注意力层,将第一融合空间特征划分为多个空间子特征,分别获取多个空间子特征对应的第二注意力参数,第二注意力参数表征对应的空间子特征在第一融合空间特征中的重要程度,每个空间子特征对应一个医学图像;
第二特征融合单元1423,用于调用第二注意力层,基于多个第二注意力参数,融合多个空间子特征,得到多个医学图像对应的第二融合空间特征;
时空特征提取单元1433,用于调用第二提取层,基于第二融合空间特征,提取时空特征。
在一种可能实现方式中,图像识别模型的训练过程包括:
获取多个样本图像及多个样本图像所属的样本识别结果,多个样本图像为同一样本对象在不同时刻的图像;
调用图像识别模型,对多个样本图像进行处理,得到样本对象的预测识别结果;
根据样本识别结果和预测识别结果,训练图像识别模型。
上述所有可选技术方案,可以采用任意结合形成本申请的可选实施例,在此不再一一赘述。
需要说明的是:上述实施例提供的对象识别装置在识别对象时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将计算机设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的对象识别装置与对象识别方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
本申请实施例还提供了一种计算机设备,该计算机设备包括处理器和存储器,存储器中存储有至少一条计算机程序,该至少一条计算机程序由处理器加载并执行,以实现上述实施例的对象识别方法所执行的操作。
可选地,该计算机设备提供为终端。图16是本申请实施例提供的一种终端1600的结构示意图。该终端1600可以是智能手机、平板电脑、笔记本电脑、台式计算机、智能音箱、智能手表等,该终端1600还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
终端1600包括有:处理器1601和存储器1602。
处理器1601可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器1601可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。在一些实施例中,处理器1601还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器1602可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。在一些实施例中,存储器1602中的非暂态的计算机可读存储介质用于存储至少一条计算机程序,该至少一条计算机程序用于被处理器1601所执行以实现本申请中方法实施例提供的对象识别方法。
在一些实施例中,终端1600还可选包括有:外围设备接口1603和至少一个外围设备。处理器1601、存储器1602和外围设备接口1603之间可以通过总线或信号线相连。各个外围 设备可以通过总线、信号线或电路板与外围设备接口1603相连。具体地,外围设备包括:显示屏1604、摄像头组件1605中的至少一种。
外围设备接口1603可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器1601和存储器1602。在一些实施例中,处理器1601、存储器1602和外围设备接口1603被集成在同一芯片或电路板上;在一些其他实施例中,处理器1601、存储器1602和外围设备接口1603中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
显示屏1604用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏1604是触摸显示屏时,显示屏1604还具有采集在显示屏1604的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器1601进行处理。
摄像头组件1605用于采集图像或视频。可选地,摄像头组件1605包括前置摄像头和后置摄像头。前置摄像头设置在终端的前面板,后置摄像头设置在终端的背面。
在一些实施例中,终端1600还包括有一个或多个传感器1606。该一个或多个传感器1606包括但不限于:加速度传感器1611、陀螺仪传感器1612、压力传感器1613、光学传感器1614以及接近传感器1615。
本领域技术人员可以理解,图16中示出的结构并不构成对终端1600的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
可选地,该计算机设备提供为服务器。图17是本申请实施例提供的一种服务器的结构示意图,该服务器1700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(Central Processing Units,CPU)1701和一个或一个以上的存储器1702,其中,存储器1702中存储有至少一条计算机程序,该至少一条计算机程序由处理器1701加载并执行以实现上述各个方法实施例提供的方法。当然,该服务器还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器还可以包括其他用于实现设备功能的部件,在此不做赘述。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有至少一条计算机程序,该至少一条计算机程序由处理器加载并执行,以实现上述实施例的对象识别方法所执行的操作。
本申请实施例还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机程序代码,该计算机程序代码存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机程序代码,处理器执行该计算机程序代码,使得计算机设备实现上述实施例的对象识别方法所执行的操作。
需要说明的是,在本申请实施例中,涉及到对象特征、对象图像等相关的数据,当本申请以上实施例运用到具体产品或技术中时,需要获得用户许可或者同意,且相关数据的收集、使用和处理需要遵守相关国家和地区的相关法律法规和标准。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,该程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上仅为本申请实施例的可选实施例,并不用以限制本申请实施例,凡在本申请实施例的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (16)

  1. 一种对象识别方法,所述方法包括:
    计算机设备分别提取多个医学图像的空间特征,所述多个医学图像为同一目标对象在不同时刻的图像;
    所述计算机设备融合所提取的多个空间特征,得到所述目标对象的第一融合空间特征;
    所述计算机设备基于所述第一融合空间特征,提取所述目标对象的时空特征,所述时空特征表征所述多个医学图像在不同时刻的空间信息的变化;
    所述计算机设备基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果。
  2. 根据权利要求1所述的方法,其中,所述计算机设备分别提取多个医学图像的空间特征之前,所述方法还包括:
    所述计算机设备分别提取所述多个医学图像的第一图像特征;
    所述计算机设备分别提取多个医学图像的空间特征,包括:
    所述计算机设备分别基于所述多个医学图像的第一图像特征,提取所述多个医学图像的空间特征。
  3. 根据权利要求2所述的方法,其中,所述计算机设备分别基于所述多个医学图像的第一图像特征,提取所述多个医学图像的空间特征,包括:
    对于每个医学图像,所述计算机设备将所述医学图像的第一图像特征划分为多个区域特征,分别获取所述多个区域特征对应的第一注意力参数,所述第一注意力参数表征对应的区域特征在所述图像特征中的重要程度,所述医学图像包括多个图像区域,每个区域特征对应所述医学图像中的一个图像区域;
    所述计算机设备基于多个第一注意力参数,对所述多个区域特征进行加权融合,得到所述医学图像对应的第二图像特征;
    所述计算机设备基于所述第二图像特征,提取所述医学图像的空间特征。
  4. 根据权利要求3所述的方法,其中,所述分别获取所述多个区域特征对应的第一注意力参数,包括:
    将每个区域特征分别映射到至少两个特征空间中,得到每个区域特征对应的至少两个映射特征,其中所述至少两个特征空间表征对应图像区域中的不同像素点在对应的维度上的相似度;
    基于所述每个区域特征对应的至少两个映射特征,获取所述每个区域特征对应的第一注意力参数。
  5. 根据权利要求3所述的方法,其中,所述计算机设备基于所述第二图像特征,提取所述医学图像的空间特征,包括:
    所述计算机设备融合所述第二图像特征与所述第一图像特征,得到所述医学图像对应的第三图像特征;
    所述计算机设备基于所述第三图像特征,提取所述医学图像的空间特征。
  6. 根据权利要求5所述的方法,其中,所述基于所述第三图像特征,提取所述医学图像的空间特征之前,所述方法还包括:
    对所述第三图像特征进行归一化处理,得到处理后的所述第三图像特征。
  7. 根据权利要求3所述的方法,其中,所述对于每个医学图像,所述计算机设备将所述医学图像的第一图像特征划分为多个区域特征之前,所述方法还包括:
    所述计算机设备分别对所述每个医学图像的第一图像特征进行归一化处理,得到所述每个医学图像处理后的所述第一图像特征。
  8. 根据权利要求1所述的方法,其中,所述计算机设备基于所述第一融合空间特征,提取所述目标对象的时空特征,包括:
    所述计算机设备将所述第一融合空间特征划分为多个空间子特征,分别获取所述多个空间子特征对应的第二注意力参数,所述第二注意力参数表征对应的空间子特征在所述第一融合空间特征中的重要程度,每个空间子特征对应一个医学图像;
    所述计算机设备基于多个第二注意力参数,融合所述多个空间子特征,得到所述多个医学图像对应的第二融合空间特征;
    所述计算机设备基于所述第二融合空间特征,提取所述时空特征。
  9. 根据权利要求8所述的方法,其中,所述计算机设备基于所述第二融合空间特征,提取所述时空特征,包括:
    所述计算机设备融合所述第二融合空间特征与所述第一融合空间特征,得到所述目标对象的第三融合空间特征;
    所述计算机设备基于所述第三融合空间特征,提取所述时空特征。
  10. 根据权利要求1所述的方法,其中,所述方法基于图像识别模型执行,所述图像识别模型包括第一提取网络、第二提取网络和识别网络,所述计算机设备分别提取多个医学图像的空间特征,包括:
    所述计算机设备调用所述第一提取网络,分别提取多个医学图像的空间特征;
    所述计算机设备融合所提取的多个空间特征,得到所述目标对象的第一融合空间特征,包括:
    所述计算机设备调用所述第二提取网络,融合所提取的多个空间特征,得到所述第一融合空间特征;
    所述计算机设备基于所述第一融合空间特征,提取所述目标对象的时空特征,包括:
    所述计算机设备调用所述第二提取网络,基于所述第一融合空间特征,提取所述时空特征;
    所述计算机设备基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果,包括:
    所述计算机设备调用所述识别网络,基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果。
  11. 根据权利要求10所述的方法,其中,所述图像识别模型还包括第三提取网络,所述计算机设备调用所述第一提取网络,分别提取多个医学图像的空间特征之前,所述方法还包括:
    所述计算机设备调用所述第三提取网络,分别提取所述多个医学图像的第一图像特征;
    所述计算机设备调用所述第一提取网络,分别提取多个医学图像的空间特征,包括:
    所述计算机设备调用所述第一提取网络,分别基于所述多个医学图像的第一图像特征,提取所述多个医学图像的空间特征。
  12. 根据权利要求11所述的方法,其中,所述第一提取网络包括第一注意力层和第一提取层,所述计算机设备调用所述第一提取网络,分别提取多个医学图像的空间特征,包括:
    对于每个医学图像,所述计算机设备调用所述第一注意力层,将所述医学图像的第一图像特征划分为多个区域特征,分别获取所述多个区域特征对应的第一注意力参数,所述第一注意力参数表征对应的区域特征在所述第一图像特征中的重要程度,每个区域特征对应所述医学图像中的一个图像区域,所述医学图像包括多个图像区域;
    所述计算机设备调用所述第一注意力层,按照多个第一注意力参数,对所述多个区域特征进行融合,得到所述医学图像对应的第二图像特征;
    所述计算机设备调用所述第一提取层,基于所述第二图像特征,提取所述医学图像的空间特征。
  13. 一种对象识别装置,所述装置包括:
    空间特征提取模块,用于分别提取多个医学图像的空间特征,所述多个医学图像为同一目标对象在不同时刻的图像;
    空间特征融合模块,用于融合所提取的多个空间特征,得到所述目标对象的第一融合空间特征;
    时空特征提取模块,用于基于所述第一融合空间特征,提取所述目标对象的时空特征,所述时空特征表征所述多个医学图像在不同时刻的空间信息的变化;
    对象识别模块,用于基于所述时空特征识别所述目标对象,得到所述目标对象的识别结果。
  14. 一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条计算机程序,所述至少一条计算机程序由所述处理器加载并执行,以实现如权利要求1至12任一权利要求所述的对象识别方法所执行的操作。
  15. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条计算机程序,所述至少一条计算机程序由处理器加载并执行,以实现如权利要求1至12任一权利要求所述的对象识别方法所执行的操作。
  16. 一种计算机程序产品,所述计算机程序产品包括计算机程序代码,所述计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从所述计算机可读存储介质读取所述计算机程序代码,所述处理器执行所述计算机程序代码,使得所述计算机设备实现如权利要求1至12任一权利要求所述的对象识别方法所执行的操作。
PCT/CN2022/091089 2021-06-03 2022-05-06 对象识别方法、装置、计算机设备及存储介质 WO2022252908A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/991,385 US20230080098A1 (en) 2021-06-03 2022-11-21 Object recognition using spatial and timing information of object images at diferent times

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110617124.4 2021-06-03
CN202110617124.4A CN113610750B (zh) 2021-06-03 2021-06-03 对象识别方法、装置、计算机设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/991,385 Continuation US20230080098A1 (en) 2021-06-03 2022-11-21 Object recognition using spatial and timing information of object images at diferent times

Publications (1)

Publication Number Publication Date
WO2022252908A1 true WO2022252908A1 (zh) 2022-12-08

Family

ID=78303409

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091089 WO2022252908A1 (zh) 2021-06-03 2022-05-06 对象识别方法、装置、计算机设备及存储介质

Country Status (3)

Country Link
US (1) US20230080098A1 (zh)
CN (1) CN113610750B (zh)
WO (1) WO2022252908A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610750A (zh) * 2021-06-03 2021-11-05 腾讯医疗健康(深圳)有限公司 对象识别方法、装置、计算机设备及存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114529889A (zh) * 2022-01-28 2022-05-24 燕山大学 一种分心驾驶行为识别方法、装置及存储介质
CN114677443B (zh) * 2022-05-27 2022-08-19 深圳智华科技发展有限公司 光学定位方法、装置、设备及存储介质
CN115496976B (zh) * 2022-08-29 2023-08-11 锋睿领创(珠海)科技有限公司 多源异构数据融合的视觉处理方法、装置、设备及介质
CN116759079B (zh) * 2023-08-23 2023-11-03 首都医科大学附属北京朝阳医院 基于多特征融合的出血转化判定方法、装置、介质及终端
CN117476208A (zh) * 2023-11-06 2024-01-30 无锡市惠山区人民医院 一种基于时间序列的医学影像的认知功能障碍智能辅助识别系统
CN117351197A (zh) * 2023-12-04 2024-01-05 北京联影智能影像技术研究院 图像分割方法、装置、计算机设备和存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117890A (zh) * 2018-08-24 2019-01-01 腾讯科技(深圳)有限公司 一种图像分类方法、装置和存储介质
CN109190540A (zh) * 2018-06-06 2019-01-11 腾讯科技(深圳)有限公司 活检区域预测方法、图像识别方法、装置和存储介质
CN110472532A (zh) * 2019-07-30 2019-11-19 中国科学院深圳先进技术研究院 一种视频对象行为识别方法和装置
CN110490863A (zh) * 2019-08-22 2019-11-22 北京红云智胜科技有限公司 基于深度学习的检测冠脉造影有无完全闭塞病变的系统
US20200311940A1 (en) * 2019-04-01 2020-10-01 Siemens Healthcare Gmbh Probabilistic motion model for generating medical images or medical image sequences
CN113610750A (zh) * 2021-06-03 2021-11-05 腾讯医疗健康(深圳)有限公司 对象识别方法、装置、计算机设备及存储介质

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7088850B2 (en) * 2004-04-15 2006-08-08 Edda Technology, Inc. Spatial-temporal lesion detection, segmentation, and diagnostic information extraction system and method
US20140310243A1 (en) * 2010-08-16 2014-10-16 Mr. Steven James McGee Heart beacon cycle
CN104216949A (zh) * 2014-08-13 2014-12-17 中国科学院计算技术研究所 一种融合空间信息的图像特征聚合表示方法及系统
JP2019152927A (ja) * 2018-02-28 2019-09-12 株式会社エクォス・リサーチ 画像データ生成装置、画像認識装置、画像データ生成プログラム、及び、画像認識プログラム
CN109461140A (zh) * 2018-09-29 2019-03-12 沈阳东软医疗系统有限公司 图像处理方法及装置、设备和存储介质
CN110689025B (zh) * 2019-09-16 2023-10-27 腾讯医疗健康(深圳)有限公司 图像识别方法、装置、系统及内窥镜图像识别方法、装置
CN110633700B (zh) * 2019-10-21 2022-03-25 深圳市商汤科技有限公司 视频处理方法及装置、电子设备和存储介质
CN111652331B (zh) * 2020-08-05 2021-05-11 腾讯科技(深圳)有限公司 一种图像识别方法、装置和计算机可读存储介质
CN112016461B (zh) * 2020-08-28 2024-06-11 深圳市信义科技有限公司 一种多目标的行为识别方法及系统
CN112733818B (zh) * 2021-03-30 2021-08-13 深圳佑驾创新科技有限公司 基于注意力机制的车灯状态识别方法、装置、终端和介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190540A (zh) * 2018-06-06 2019-01-11 腾讯科技(深圳)有限公司 活检区域预测方法、图像识别方法、装置和存储介质
CN109117890A (zh) * 2018-08-24 2019-01-01 腾讯科技(深圳)有限公司 一种图像分类方法、装置和存储介质
US20200311940A1 (en) * 2019-04-01 2020-10-01 Siemens Healthcare Gmbh Probabilistic motion model for generating medical images or medical image sequences
CN110472532A (zh) * 2019-07-30 2019-11-19 中国科学院深圳先进技术研究院 一种视频对象行为识别方法和装置
CN110490863A (zh) * 2019-08-22 2019-11-22 北京红云智胜科技有限公司 基于深度学习的检测冠脉造影有无完全闭塞病变的系统
CN113610750A (zh) * 2021-06-03 2021-11-05 腾讯医疗健康(深圳)有限公司 对象识别方法、装置、计算机设备及存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610750A (zh) * 2021-06-03 2021-11-05 腾讯医疗健康(深圳)有限公司 对象识别方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN113610750B (zh) 2024-02-06
US20230080098A1 (en) 2023-03-16
CN113610750A (zh) 2021-11-05

Similar Documents

Publication Publication Date Title
WO2022252908A1 (zh) 对象识别方法、装置、计算机设备及存储介质
WO2021179205A1 (zh) 医学图像分割方法、医学图像分割装置及终端设备
WO2021051965A1 (zh) 图像处理方法及装置、电子设备、存储介质和计算机程序
US11900647B2 (en) Image classification method, apparatus, and device, storage medium, and medical electronic device
WO2020103676A1 (zh) 图像识别方法、装置、系统及存储介质
CN110276366A (zh) 使用弱监督模型来检测对象
Kou et al. Microaneurysms segmentation with a U-Net based on recurrent residual convolutional neural network
CN109997147A (zh) 用于诊断耳部病理的耳镜图像分析的系统和方法
US11967181B2 (en) Method and device for retinal image recognition, electronic equipment, and storage medium
WO2021159811A1 (zh) 青光眼辅助诊断装置、方法及存储介质
WO2023165012A1 (zh) 问诊方法和装置、电子设备及存储介质
CN113257383B (zh) 匹配信息确定方法、显示方法、装置、设备及存储介质
CN111598168B (zh) 图像分类方法、装置、计算机设备及介质
WO2023065503A1 (zh) 一种面部表情的分类方法和电子设备
WO2023173646A1 (zh) 一种表情识别方法及装置
KR101925603B1 (ko) 병리 영상의 판독을 보조하는 방법 및 이를 이용한 장치
Zhang et al. A survey of wound image analysis using deep learning: Classification, detection, and segmentation
WO2023142532A1 (zh) 一种推理模型训练方法及装置
CN110796659A (zh) 一种目标检测结果的鉴别方法、装置、设备及存储介质
Huan et al. Multilevel and multiscale feature aggregation in deep networks for facial constitution classification
TWI728369B (zh) 人工智慧雲端膚質與皮膚病灶辨識方法及其系統
Wang et al. Multiscale feature fusion for skin lesion classification
Qiu et al. A novel tongue feature extraction method on mobile devices
Yan et al. Skin lesion classification based on the VGG‐16 fusion residual structure
CN113761119A (zh) 状态检测方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22814955

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE