CN117352133A - Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method - Google Patents

Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method

Info

Publication number
CN117352133A
Authority
CN
China
Prior art keywords
virtual
task
data
real
chinese medicine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311220175.9A
Other languages
Chinese (zh)
Inventor
胡镜清
王传池
吴珊
陈南杰
许强
刘微微
陈帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Hu Jingqing
Original Assignee
Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine filed Critical Guangdong Xinhuangpu Joint Innovation Institute Of Traditional Chinese Medicine
Priority to CN202311220175.9A
Publication of CN117352133A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 - ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/90 - ICT specially adapted for therapies or health-improving plans relating to alternative medicines, e.g. homeopathy or oriental medicines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Alternative & Traditional Medicine (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of traditional Chinese medicine virtual-real (deficiency-excess, xu-shi) functional state identification, and provides a multi-task joint learning identification method based on multi-modal data. The virtual-real identification task is subdivided into a virtual-real classification task and a key slot-value extraction task, so that multi-modal traditional Chinese medicine data such as tongue images, facial videos, still images, audio and text can be effectively fused; key feature vectors are extracted through task-specific attention mechanisms, the losses of the different tasks are computed, and these losses are fused with a gate mechanism. The multi-task joint learning virtual-real recognition model thus constructed can objectively quantify the probability of each virtual-real identification result, achieving efficient and accurate identification of traditional Chinese medicine virtual-real functional states.

Description

Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method
Technical Field
The application relates to the technical field of traditional Chinese medicine virtual-real (deficiency-excess) functional state identification, and in particular to the application of multi-modal data fusion and multi-task joint learning in this field; more specifically, the invention relates to a multi-task joint learning method for identifying traditional Chinese medicine virtual-real functional states based on multi-modal data.
Background
Deficiency and excess (xu and shi) are traditional Chinese medicine terms that, in brief, describe the functional state of the human body. Traditional Chinese medicine holds that the deficiency-excess functional state is a relatively stable, inherent characteristic formed over the course of a person's life on the basis of congenital endowment and acquired influences, encompassing morphological structure, physiological function and psychological state. The concept is discussed in detail as early as the Su Wen of the Huangdi Neijing (Yellow Emperor's Inner Canon), and the distribution of deficiency and excess also differs across regions and populations. In current practice, eight-principle syndrome differentiation is the core method for classifying deficiency-excess functional states, with judgments made by observing signs, pulse condition and other information. Research on identifying these states mainly takes as its reference standard the classification and determination of traditional Chinese medicine constitution issued by the Chinese medicine society, based on the constitution scale developed by Wang Qi's team. In clinical practice, however, the identification result depends heavily on the subjective feelings of the person being assessed and on the physician's professional knowledge and experience, and lacks an objective analysis process.
With the development of artificial intelligence, deep learning and related technologies have enabled traditional Chinese medicine research, making objective identification of virtual-real functional states possible. For example, Chinese patent CN110532907B, published on 2022-01-21, describes a traditional Chinese medicine human-body constitution classification method based on bimodal feature extraction from facial and tongue images, in which constitution-related facial and tongue features are extracted by deep learning and similar methods to improve the accuracy of constitution classification. As another example, Chinese patent CN116189884B, published on 2023-07-25, describes a multimodal-fusion constitution determination method and system based on facial vision: the face, lips, eyes, tongue and pulse are detected with advanced artificial intelligence paradigms, and multimodal fusion is used to combine multidimensional data so that different modalities complement each other, reaching more comprehensive and accurate conclusions while collecting, processing and analyzing multiple kinds of human-body data under a limited hardware structure. Judged by results, however, the identification accuracy of the prior art still leaves room for improvement.
Disclosure of Invention
To address the limitations of the prior art, the invention provides a multi-task joint learning method for identifying traditional Chinese medicine virtual-real functional states based on multi-modal data, adopting the following technical scheme:
a method for constructing a multi-mode data-based multi-task joint learning virtual-real identification model comprises the following steps:
s1, collecting traditional Chinese medicine multi-mode data of a preset number of sample personnel; the traditional Chinese medicine multi-modal data comprises at least two of image data, video data, voice data and text data; marking the types of the virtual and real functional states of the sample personnel according to the preset virtual and real functional states;
s2, extracting feature vectors of the traditional Chinese medicine multi-mode data, and labeling key slot closing values of an extraction result; the extracted result is subjected to data normalization to obtain normalized feature vectors
S3, applying a preset two-way long and short time memory model to the feature vectorPerforming representation learning, and obtaining a recognition classification feature representation vector y according to a preset recognition classification attention mechanism and a trough value attention mechanism cls Slot value attention extraction feature representation vector y ext
S4, representing the vector y by the identification classification characteristic cls Slot value attention extraction feature representation vector y ext Performing multi-task joint learning on a preset neural network by combining the labeling results of the steps S1 and S2 to obtain an identification classification task loss cls Slot value extraction task loss ext
S5, using a door mechanism network to identify and classify task loss cls Slot value extraction task loss ext Fusion to obtain fusion loss gate
S6, according to the fusion loss gate And correcting the weights of all layers of the neural network through back propagation to complete the construction of the multi-task joint learning virtual-real identification model.
Compared with the prior art, the invention subdivides the virtual-real identification task into a virtual-real classification task and a key slot-value extraction task, so that multi-modal traditional Chinese medicine data such as tongue images, facial videos, still images, audio and text can be effectively fused; during model construction, key feature vectors are extracted through task-specific attention mechanisms, the losses of the different tasks are computed, and these losses are fused through a gate mechanism. The multi-task joint learning virtual-real recognition model thus constructed can objectively quantify the probability of each virtual-real identification result, achieving efficient and accurate identification of traditional Chinese medicine virtual-real functional states.
As a preferred solution, the virtual-real functional states comprise ten types: yin deficiency, blood deficiency, qi stagnation, fire heat and qi deficiency, together with wind, essence deficiency, phlegm dampness, blood stasis and yang deficiency.
As a preferred embodiment, step S3 comprises the following steps:
The feature vectors x̂, unified to sequence length T, are input to the bidirectional long short-term memory model to generate the hidden states h_i = [h_i^fw ; h_i^bw], i = 1, …, T, where h_i^fw denotes the forward output and h_i^bw the backward output of the BiLSTM at the i-th time step.
The hidden state h_j of the j-th time step after the BiLSTM is treated as both the key K and the value V, while the trainable weight matrix W, applied to the hidden states, supplies the query vectors Q. The slot-value attention extraction feature representation vector y_ext is then obtained as follows:

e_ij = h_i^T · W · h_j
α_ij = exp(e_ij) / Σ_{k=1..T} exp(e_ik)
c_i = Σ_{j=1..T} α_ij · h_j
y_ext,i = σ(W_ext · [h_i ; c_i])

where y_ext,i denotes the slot-value label representation for the i-th feature of x̂, and W_ext is a trainable output projection; c_i is the slot-value context vector, computed as the weighted sum of the BiLSTM hidden states h_1, …, h_T; σ denotes an activation function; i and j index the time steps of the hidden states; e_ij is the weight score obtained by multiplying the j-th hidden state h_j with the i-th hidden state h_i and the matrix W; and α_ij is the normalized weight obtained by dividing exp(e_ij), the score of the i-th hidden state h_i querying the key h_j, by the sum of the scores over the hidden states of all time steps.
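The attention formulas above can be illustrated with a minimal NumPy sketch. The BiLSTM hidden states are stand-in random values, and the output projection W_ext and the use of tanh as the activation σ are assumptions made for the sketch, not details fixed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                      # sequence length, hidden size
H = rng.normal(size=(T, d))      # stand-in BiLSTM hidden states h_1..h_T
W = rng.normal(size=(d, d))      # trainable bilinear scoring matrix

# e[i, j] = h_i^T W h_j : unnormalized score of step i querying step j
e = H @ W @ H.T

# alpha[i, j] = softmax over j of e[i, j] (rows sum to 1)
alpha = np.exp(e - e.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)

# c_i = sum_j alpha[i, j] h_j : per-step slot-value context vectors
C = alpha @ H

sigma = np.tanh                           # assumed activation
W_ext = rng.normal(size=(2 * d, 3))       # assumed projection to 3 slot labels
y_ext = sigma(np.concatenate([H, C], axis=1) @ W_ext)
print(y_ext.shape)                        # one slot representation per time step
```

The softmax normalization per row mirrors the α_ij formula: each hidden state's scores over all T time steps are divided by their sum.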
As a preferred embodiment, step S5 is implemented by the following formula:

loss_gate = Σ v * tanh(W1^g · loss_cls + W2^g · loss_ext)

where v, W1^g and W2^g are trainable parameters.
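A minimal NumPy sketch of the gate fusion formula, assuming scalar task losses and vector-shaped trainable parameters v, W1^g and W2^g (their exact shapes are not fixed by the text):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                      # assumed gate width
v = rng.normal(size=d)                     # trainable vector v
W1 = rng.normal(size=d)                    # trainable weight W1^g
W2 = rng.normal(size=d)                    # trainable weight W2^g

loss_cls, loss_ext = 0.83, 0.41            # example scalar task losses

# loss_gate = sum_k v_k * tanh(W1_k * loss_cls + W2_k * loss_ext)
loss_gate = float(np.sum(v * np.tanh(W1 * loss_cls + W2 * loss_ext)))
print(loss_gate)
```

Because v, W1^g and W2^g are learned jointly with the network, the gate can adaptively rebalance the two task losses during training.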
As a preferred aspect, the image data comprise tongue images and/or facial images; in step S2, feature vectors are extracted from the image data as follows:
After preprocessing the image data, including resizing, graying and denoising, the images are input into a preset CNN model to demarcate a region-of-interest (ROI) image, and the ROI image is input into a preset feature extraction network to extract the feature vectors.
As a preferred solution, the video data are multi-view video data of the tongue and/or face; in step S2, feature vectors are extracted from the video data as follows:
Using a preset 3D CNN model, the video data, comprising the two spatial dimensions X, Y and the temporal dimension T, are projected onto all combinations of two-dimensional views, namely (X, Y), (X, T) and (T, Y); for each two-dimensional view, a two-dimensional output matrix Y ∈ R^(512×n) is generated as the feature vector, where n denotes the number of frames along the omitted dimension.
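The view decomposition can be sketched by averaging a toy video volume over each omitted axis; this stands in for the three 2D views the 3D CNN operates on and is an illustration, not the model itself:

```python
import numpy as np

rng = np.random.default_rng(2)
T, Y, X = 16, 32, 32
video = rng.random((T, Y, X))    # toy single-channel clip: time, height, width

# Collapse the omitted axis to obtain each two-dimensional view:
view_xy = video.mean(axis=0)     # (Y, X): spatial appearance
view_xt = video.mean(axis=1)     # (T, X): horizontal motion over time
view_ty = video.mean(axis=2)     # (T, Y): vertical motion over time

print(view_xy.shape, view_xt.shape, view_ty.shape)
```

In the method itself, each view would then pass through the 3D CNN's 2D branches to produce the 512×n output matrix per view.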
As a preferred solution, in step S2, feature vectors are extracted from the voice data as follows:
After MFCC conversion of the voice data, the result is input into a preset x-vector framework; the channel weights are calibrated through a preset Tanh-layer activation and a 1D-Conv layer, and the feature vector of the voice data is output.
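One possible reading of the "Tanh activation plus 1D-Conv channel calibration" step is a squeeze-and-excitation-style reweighting of MFCC channels. The sketch below is such an interpretation with toy values, not the exact x-vector pipeline; the kernel size and the use of per-channel means are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n_mfcc, n_frames = 20, 100
mfcc = rng.normal(size=(n_mfcc, n_frames))   # assumed precomputed MFCC matrix

# Derive a per-channel weight: 1D convolution over channel statistics,
# squashed by tanh, then used to rescale each MFCC channel.
stats = mfcc.mean(axis=1)                    # per-channel summary
kernel = rng.normal(size=3)                  # toy 1D-conv kernel
conv = np.convolve(stats, kernel, mode="same")
weights = np.tanh(conv)                      # bounded channel weights
calibrated = mfcc * weights[:, None]         # reweighted features
print(calibrated.shape)
```

The tanh keeps every channel weight in (-1, 1), so no single channel can dominate the calibrated representation.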
The invention also comprises the following:
A multi-task joint learning method for identifying traditional Chinese medicine virtual-real functional states based on multi-modal data, comprising the following steps:
S7, acquiring the traditional Chinese medicine multi-modal data of a user and performing feature extraction and normalization;
S8, inputting the result of step S7 into the multi-task joint learning virtual-real recognition model obtained by the construction method described above, to obtain the probability of each virtual-real functional state type for the user;
S9, filtering the user's virtual-real functional state type probabilities against a preset threshold to obtain the user's virtual-real functional state identification result.
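Steps S8 and S9 reduce to thresholding a vector of per-state probabilities. A minimal sketch with made-up probabilities and an assumed preset threshold of 0.5:

```python
import numpy as np

states = ["yin deficiency", "blood deficiency", "qi stagnation", "fire heat",
          "qi deficiency", "wind", "essence deficiency", "phlegm dampness",
          "blood stasis", "yang deficiency"]
# Example model output: one probability per functional state type
probs = np.array([0.81, 0.12, 0.05, 0.66, 0.30, 0.02, 0.04, 0.55, 0.08, 0.11])
threshold = 0.5                              # assumed preset threshold

# Keep every state whose probability clears the threshold (multi-label)
result = [s for s, p in zip(states, probs) if p >= threshold]
print(result)   # → ['yin deficiency', 'fire heat', 'phlegm dampness']
```

Because identification is multi-label, several states can pass the threshold simultaneously, matching the fact that a person may be labeled with multiple functional states.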
Compared with the prior art, the invention innovatively incorporates key modal slot-value extraction and, through multi-task joint learning, explicitly integrates the local key features of the modal information with the global representation learned by the classification task, so that the virtual-real identification task gains from the joint learning. The result probability of virtual-real identification can be objectively quantified, the identification conclusion is obtained more accurately, key data sources can be traced, and efficient, accurate identification of traditional Chinese medicine virtual-real functional states is achieved.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for identifying virtual and real functional states of a multi-modal data-based multi-task joint learning traditional Chinese medicine as described above.
A computer device comprising a storage medium, a processor, and a computer program stored in the storage medium and executable by the processor; when the computer program is executed by the processor, the method for identifying the virtual and real functional states of the traditional Chinese medicine based on the multi-mode data multi-task combined learning is realized.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to embodiment 1 of the present invention;
fig. 2 is a schematic diagram of a method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to embodiment 1 of the present invention;
fig. 3 is a schematic diagram of a door mechanism network used in the method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to embodiment 1 of the present invention;
fig. 4 is a flow chart of a method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to embodiment 2 of the present invention.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present application, it should be understood that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below and features of the embodiments may be combined with each other without conflict.
Example 1
After analyzing the prior art, the inventors concluded that its accuracy remains limited chiefly because it over-weights global features while ignoring local features, and fails to exploit the representation-learning gains offered by a task that extracts key modal information from the raw data. This embodiment therefore provides the following scheme:
referring to fig. 1 and 2, the method for constructing the multi-mode data-based multi-task joint learning virtual-real recognition model includes the following steps:
s1, collecting traditional Chinese medicine multi-mode data of a preset number of sample personnel; the traditional Chinese medicine multi-modal data comprises at least two of image data, video data, voice data and text data; marking the types of the virtual and real functional states of the sample personnel according to the preset virtual and real functional states;
s2, extracting feature vectors of the traditional Chinese medicine multi-mode data, and labeling key slot closing values of an extraction result; the extracted result is subjected to data normalization to obtain normalized feature vectors
S3, applying a preset two-way long and short time memory model to the feature vectorPerforming representation learning, and obtaining a recognition classification feature representation vector y according to a preset recognition classification attention mechanism and a trough value attention mechanism cls Slot value attention extraction feature representation vector y ext
S4, representing the vector y by the identification classification characteristic cls Slot value attention extraction feature representation vector y ext Performing multi-task joint learning on a preset neural network by combining the labeling results of the steps S1 and S2 to obtain an identification classification task loss cls Slot value extraction task loss ext
S5, using a door mechanism network to identify and classify task loss cls Slot value extraction task loss ext Fusion to obtain fusion loss gate
S6, according to the fusion loss gate And correcting the weights of all layers of the neural network through back propagation to complete the construction of the multi-task joint learning virtual-real identification model.
Compared with the prior art, the invention subdivides the virtual-real identification task into a virtual-real classification task and a key slot-value extraction task, so that multi-modal traditional Chinese medicine data such as tongue images, facial videos, still images, audio and text can be effectively fused; during model construction, key feature vectors are extracted through task-specific attention mechanisms, the losses of the different tasks are computed, and these losses are fused through a gate mechanism. The multi-task joint learning virtual-real recognition model thus constructed can objectively quantify the probability of each virtual-real identification result, achieving efficient and accurate identification of traditional Chinese medicine virtual-real functional states.
Specifically, in traditional Chinese medicine the functional state characterizes the functional condition of the human constitution, and identifying it is analogous to judging a person's physiological cycle or physiological status. Identification of the deficiency-excess functional state is not a direct diagnosis of health condition or disease type; the process involves no structural questions of pathology or medicine, and concerns only the manifestations of the body's systems, organs and physiological functions.
Functional states are generally divided into two basic types, deficiency and excess. A deficiency functional state indicates that the body's function is insufficient or weak in some respects, with external manifestations that may include fatigue, physical weakness and poor appetite; an excess functional state indicates hyperactive or excessive function in some respects, usually accompanied by damp-heat and turbid phlegm in the body. Information from deficiency-excess identification helps the physician better understand a patient's physical characteristics, determine the nature of an illness, and formulate a corresponding treatment scheme in subsequent diagnosis or treatment.
In this embodiment, the classification attention and slot-value attention mechanisms learn attention weight distributions for the two tasks separately; the learned feature vectors serve, respectively, the virtual-real functional state classification task and the key slot-value extraction task. Learning the two tasks separately, instead of sharing the same attention weights, makes the key feature extraction process more explicit, lets multi-task learning improve the accuracy of virtual-real classification, minimizes the potential redundancy or ambiguity associated with each modality, and improves the overall performance of the model.
A sample person is a volunteer who provides training data for this embodiment. During data acquisition, images and videos are preferably shot in natural daylight or under uniform artificial lighting, with the sample person seated or standing about 60 ± 10 cm from the camera; the videos may include tongue and facial videos, and the images tongue and facial images. Audio can be collected with a recorder, the sample person either speaking freely or reading a fixed poem provided by the system, for no more than 10 seconds. All of these acquisitions can be performed with a mobile device such as a mobile phone, effectively replacing the acquisition functions of instruments such as tongue- and face-diagnosis devices, which is convenient and relatively inexpensive. The text data may include raw health measurement scales, questionnaires and the like.
As a preferred embodiment, the virtual-real functional states comprise ten types: yin deficiency, blood deficiency, qi stagnation, fire heat and qi deficiency, together with wind, essence deficiency, phlegm dampness, blood stasis and yang deficiency.
Specifically, each sample person is labeled with virtual-real functional states: after the traditional Chinese medicine multi-modal data of a sample person are collected, medical staff label the applicable categories from the ten virtual-real functional states according to the person's condition and their own experience; each sample person may be labeled with multiple functional states.
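Multi-state labelling of this kind naturally yields a multi-hot target vector over the ten state types; a minimal sketch (the encode helper and the fixed state ordering are illustrative, not part of the method):

```python
import numpy as np

STATES = ["yin deficiency", "blood deficiency", "qi stagnation", "fire heat",
          "qi deficiency", "wind", "essence deficiency", "phlegm dampness",
          "blood stasis", "yang deficiency"]

def encode(labels):
    """Multi-hot vector over the ten functional state types."""
    vec = np.zeros(len(STATES), dtype=np.float32)
    for name in labels:
        vec[STATES.index(name)] = 1.0
    return vec

# A sample person annotated with two states at once:
y = encode(["qi deficiency", "phlegm dampness"])
print(y)
```

Such vectors are the classification targets that the loss_cls term of the joint learning would be computed against.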
More specifically, in this embodiment the five-deficiency and five-excess functional states are defined as follows:
yin deficiency refers to the condition of yin deficiency without yang control, moistening and nourishing, etc., and is mainly manifested as dry throat, dysphoria with feverish sensation in the chest, and night sweat, and thready and rapid pulse.
Yang deficiency refers to the condition of yang-qi deficiency in the body, with its effects of warming and nourishing, promoting and the like failing to take aversion to cold limbs as the main manifestation.
Qi deficiency refers to the condition of primordial qi deficiency, hypofunction of qi in promoting, consolidating, defending and transforming, or hypofunction of viscera and tissues, manifested as shortness of breath, debilitation, listlessness, and pulse deficiency.
Blood deficiency refers to the condition of blood deficiency failing to nourish viscera, meridians and tissues, manifested as pale complexion, face, lips and tongue and thready pulse.
Essence deficiency refers to the deficiency of kidney essence, which is a substance that maintains vital activities of people, and is a weak state mainly manifested by insufficient energy, alopecia and shaking of teeth, amnesia and deafness.
Qi stagnation refers to the condition of qi stagnation of a certain part of the human body or viscera and meridians, unsmooth operation, distending, choking and pain.
Phlegm-dampness refers to the condition of internal resistance or fluid channeling, and is mainly manifested as cough with excessive phlegm, chest distress, nausea, dizziness, obesity, etc.
Blood stasis refers to the condition of internal stagnation of blood and unsmooth blood circulation, and is mainly manifested as fixed stinging, tumor, bleeding and blood stasis.
The heat refers to the condition of excessive internal yang heat caused by exogenous heat, improper diet, excessive emotion, etc., and is mainly manifested as fever, thirst, flushed complexion, constipation, yellow urine, etc.
Wind: the condition of wind-like shaking is mainly manifested by excessive heat, excessive yang, yin deficiency, blood deficiency and the like in the body.
As a preferred embodiment, the image data comprise tongue images and/or face images; in the step S2, feature vectors are extracted from the image data as follows:
after the image data are preprocessed (resizing, graying and denoising), they are input into a preset CNN model that delineates a region-of-interest (ROI) image, and the ROI image is input into a preset feature extraction network to extract feature vectors.
Specifically, the CNN model can be Faster R-CNN; for the feature extraction network, ResNet can be chosen, taking the output of the pooling layer before the last fully connected layer as the feature vector.
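The image branch above can be sketched as follows. This is a minimal stand-in, not the patent's implementation: the ROI box is assumed to be given (in practice it would come from Faster R-CNN), and an 8×8 strided-mean pooling replaces the ResNet pooling-layer features.

```python
import numpy as np

def extract_image_feature(image: np.ndarray, roi: tuple) -> np.ndarray:
    """Grayscale an RGB image, crop a (hypothetical) ROI box, and pool the
    patch into a fixed-length vector; the pooling is only a stand-in for
    the ResNet pooling-layer output named in the text."""
    # Grayscale with the usual luminance weights.
    gray = image @ np.array([0.299, 0.587, 0.114])
    # Crop the region of interest (in practice delineated by Faster R-CNN).
    y0, y1, x0, x1 = roi
    patch = gray[y0:y1, x0:x1]
    # Pool into an 8x8 grid by strided means, then flatten to a 64-d vector.
    grid = 8
    pooled = np.array([[patch[i::grid, j::grid].mean() for j in range(grid)]
                       for i in range(grid)])
    return pooled.ravel()
```

In a real pipeline the 64-dimensional stand-in would be replaced by the 512-dimensional pooled ResNet features described in the text.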
As a preferred embodiment, the video data are multi-view video data of the tongue and/or face; in the step S2, feature vectors are extracted from the video data as follows:
using a preset 3D CNN model, the video data, comprising two spatial dimensions X, Y and a temporal dimension T, are projected onto all combinations of two-dimensional views, namely (X, Y), (X, T) and (T, Y); for each two-dimensional view a two-dimensional output matrix in R^{512×n} is generated as the feature vector, where n denotes the number of frames along the omitted dimension.
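The view-projection geometry can be illustrated with a toy sketch. Here the 3D CNN is replaced by simple averaging over the omitted axis, so only the (X, Y)/(X, T)/(T, Y) decomposition described in the text is demonstrated; the function name and axis convention are assumptions.

```python
import numpy as np

def project_views(video: np.ndarray) -> dict:
    """Project a video tensor with axes (T, Y, X) onto the three 2D view
    combinations (X, Y), (X, T), (T, Y) by averaging over the omitted
    dimension -- a simplified stand-in for the 3D CNN in the text, whose
    real output per view is a 512 x n feature matrix."""
    return {
        "XY": video.mean(axis=0),   # average over T -> shape (Y, X)
        "XT": video.mean(axis=1),   # average over Y -> shape (T, X)
        "TY": video.mean(axis=2),   # average over X -> shape (T, Y)
    }
```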
As a preferred embodiment, in the step S2, the extraction of the feature vector is performed on the voice data by:
after MFCC conversion of the voice data, the result is input into a preset x-vector framework, and the feature vector of the voice data is output after a preset Tanh-layer activation and 1D-Conv-layer channel-weight calibration.
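One core step of the x-vector framework, statistics pooling over frame-level features, can be sketched as follows. This shows only the pooling stage, assuming MFCC frames are already computed; the TDNN layers and the Tanh/1D-Conv channel calibration named in the text are omitted.

```python
import numpy as np

def stats_pooling(frames: np.ndarray) -> np.ndarray:
    """x-vector-style statistics pooling: collapse a (T, d) sequence of
    frame-level features (e.g. MFCCs) into a single utterance-level vector
    by concatenating the per-dimension mean and standard deviation."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])
```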
For raw health-measurement-scale text data, feature vectors may be extracted with models such as ActBERT, UniLMv2, MASS, TENER, or BioBERT.
In the step S2, the extraction result is labeled with key slot values. Taking a certain sample person as an example, with the person's face image, tongue image and video-frame images as inputs, the annotator marks the key local regions that characterize the person's virtual-real functions, for example the region showing the tongue-coating color: a vector of the same dimension as the image pixels is introduced, initialized to 0, and the entries corresponding to that region are set to 1. Audio and text data are labeled in the same way.
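The slot-value labeling just described amounts to building a binary mask over the (flattened) input. A minimal sketch, with hypothetical span coordinates:

```python
import numpy as np

def slot_mask(length: int, key_spans: list) -> np.ndarray:
    """Build the key-slot-value label described in the text: a vector the
    same length as the flattened input, all zeros, with positions inside
    annotated key regions (e.g. the tongue-coating-colour area) set to 1."""
    mask = np.zeros(length, dtype=np.int64)
    for start, end in key_spans:
        mask[start:end] = 1
    return mask
```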
As an alternative embodiment, in the step S2, the normalized feature vector x̃ is obtained by normalizing the extraction result as follows:
the extracted feature vectors are zero-padded along the sequence-length dimension to a unified length and then passed through a fully connected layer to obtain the normalized feature. If the traditional Chinese medicine multi-modal data comprise all four of image, video, voice and text data, the normalization can be written as
x̃ = FC([pad(x_1); pad(x_2); pad(x_3); pad(x_4)])
where x_1, x_2, x_3, x_4 denote the image, video, audio and text feature-vector matrices respectively; pad(·) appends zeros along the dimension representing the sequence length so that the modal vectors can be concatenated; and x̃ is the normalized feature vector.
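The pad-concatenate-project step can be sketched as follows. The shapes, the random (untrained) weight matrix, and the function name are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_features(mats, target_len, out_dim):
    """Zero-pad each modality's feature matrix along the sequence-length
    axis to a unified length, concatenate along the feature axis, and apply
    a fully connected projection (here a random, untrained weight matrix)."""
    padded = []
    for m in mats:  # each m has shape (seq_len_i, d_i)
        pad = target_len - m.shape[0]
        padded.append(np.pad(m, ((0, pad), (0, 0))))
    x = np.concatenate(padded, axis=1)       # (target_len, sum of d_i)
    w = rng.standard_normal((x.shape[1], out_dim))
    return x @ w                             # (target_len, out_dim)
```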
As a preferred embodiment, the step S3 includes the following steps:
the feature vector x̃ = (x̃_1, ..., x̃_T), with unified sequence length T, is input into the bidirectional long short-term memory model to generate the hidden states h_i = [h→_i; h←_i], where h→_i denotes the forward output of the bidirectional LSTM at the i-th time step and h←_i denotes its backward output at the i-th time step;
after the bidirectional LSTM, the hidden state h_j at the j-th time step of the feature vector x̃ is regarded as the key K and the value V, and the trainable weight matrix W^{ext} is regarded as the query vector Q; the slot-value attention extraction feature representation vector y^{ext} is obtained by
y_i^{ext} = σ(W^{ext}(h_i + c_i^{ext})),  c_i^{ext} = Σ_{j=1}^{T} α_{i,j} h_j,  α_{i,j} = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}),
wherein y_i^{ext} denotes the slot-value label of the i-th feature of the feature vector x̃; c_i^{ext} denotes the slot-value context vector, computed as the weighted sum Σ_j α_{i,j} h_j of the bidirectional LSTM hidden states h_1, ..., h_T;
σ denotes an activation function; i, j index the time steps of the hidden states; e_{i,j} = h_i^T W^{ext} h_j denotes the weight obtained by multiplying the j-th time-step hidden state h_j with the i-th time-step hidden state h_i and W^{ext}; and α_{i,j} is the normalized weight obtained by dividing exp(e_{i,j}), the score of hidden state h_i querying Q against key K at h_j, by the sum of the scores over all time steps.
Specifically, the above formulas take the slot-value attention mechanism as an example; the formulas of the recognition-classification attention mechanism are identical. In this scheme, to adapt to multi-task learning, the two tasks use separate attention mechanisms: one attention mechanism is adapted to the virtual-real recognition task, and the other to slot-value extraction.
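The attention computation can be sketched as follows. The bilinear score e[i, j] = h_i^T W h_j is an interpretation of the garbled patent formula, not a verbatim copy, and the weight matrix here is supplied by the caller rather than trained.

```python
import numpy as np

def slot_attention_context(h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Compute attention contexts from BiLSTM hidden states h (T, d):
    scores e[i, j] = h_i^T W h_j, softmax-normalized over j to alpha[i, j],
    then contexts c_i = sum_j alpha[i, j] * h_j."""
    e = h @ w @ h.T                        # (T, T) score matrix
    e = e - e.max(axis=1, keepdims=True)   # numerical stability
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ h                       # one context vector per time step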
As an alternative embodiment, in the step S4, suppose there are N samples, each with C possible classes, and the sequence length is T. Based on the labeling results of the steps S1 and S2, multi-task learning is performed to obtain the virtual-real recognition task loss loss_cls and the slot-value extraction task loss loss_ext. Since each sample may carry several labels, both can be written as binary cross-entropy losses, e.g.
loss_cls = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{C} [y_{i,j} log p_{i,j} + (1 - y_{i,j}) log(1 - p_{i,j})]
wherein y_{i,j} is the true label (0 or 1) indicating whether the i-th sample belongs to the j-th class, and p_{i,j} is the model's predicted probability that the i-th sample belongs to the j-th class; loss_ext is computed analogously over the T sequence positions.
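The reconstructed multi-label cross-entropy can be computed directly; a minimal sketch, assuming a binary label matrix and predicted probabilities:

```python
import numpy as np

def multilabel_bce(y: np.ndarray, p: np.ndarray) -> float:
    """Multi-label binary cross-entropy: average of
    -[y*log(p) + (1-y)*log(1-p)] over the N samples and C classes.
    Each sample may carry several functional-state labels at once."""
    eps = 1e-12
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())
```

The slot-value extraction loss loss_ext would apply the same formula over the T sequence positions instead of the C classes.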
As a preferred embodiment, said step S5 is implemented by the following formula:
loss_gate = Σ v · tanh(W_1^g · loss_cls + W_2^g · loss_ext)
wherein v, W_1^g and W_2^g are all trainable parameters; for the underlying principle, see fig. 3.
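The gate fusion of step S5 reduces to a one-line computation; in this sketch v, w1, w2 are scalar stand-ins for the trainable parameters v, W_1^g, W_2^g of the formula.

```python
import numpy as np

def gate_fusion(loss_cls: float, loss_ext: float,
                v: float = 1.0, w1: float = 0.5, w2: float = 0.5) -> float:
    """Gate-mechanism fusion from step S5:
    loss_gate = v * tanh(w1 * loss_cls + w2 * loss_ext).
    The default parameter values are illustrative, not trained."""
    return v * float(np.tanh(w1 * loss_cls + w2 * loss_ext))
```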
Example 2
Referring to fig. 4, the method for identifying virtual and real functional states of the traditional Chinese medicine based on multi-mode data multi-task combined learning comprises the following steps:
s7, acquiring traditional Chinese medicine multi-modal data of a user, and performing feature extraction and normalization processing;
s8, inputting the result of the step S7 into the multi-task joint learning virtual-real recognition model obtained by the construction method described in embodiment 1, to obtain the probabilities of the user's virtual-real functional state types;
and S9, filtering the virtual and actual functional state type probability of the user according to a preset threshold value to obtain a virtual and actual functional state identification result of the user.
Compared with the prior art, the invention introduces joint key-modal slot-value extraction: multi-task joint learning explicitly integrates the local key features in the modal information with the global representation learned by the classification task, so that the virtual-real recognition task gains from the joint learning. The result probabilities of virtual-real recognition can thus be quantified objectively, the virtual-real functional state conclusion is obtained more accurately, the key data sources can be traced, and efficient, accurate traditional Chinese medicine virtual-real functional state recognition is achieved.
Specifically, in this embodiment, the acquisition mode of the multi-modal data of the traditional Chinese medicine of the user may be the same as the acquisition mode of the sample personnel in embodiment 1, and the data form similar to that during training is obtained after the processing in step S7.
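Step S9's threshold filtering can be sketched in a few lines; the state names and the 0.5 default threshold are illustrative assumptions.

```python
def filter_states(probs: dict, threshold: float = 0.5) -> list:
    """Step S9 sketch: keep only the virtual-real functional states whose
    predicted probability reaches the preset threshold."""
    return [state for state, p in probs.items() if p >= threshold]
```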
Example 3
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the multi-modal data-based multi-task joint learning traditional Chinese medicine virtual-real functional state recognition method described in embodiment 2.
Example 4
A computer device comprising a storage medium, a processor, and a computer program stored in the storage medium and executable by the processor; when the computer program is executed by a processor, the method for identifying virtual and real functions of the traditional Chinese medicine based on multi-mode data multi-task combined learning is realized as described in the embodiment 2.
The foregoing is merely various embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. The method for constructing the multi-mode data-based multi-task joint learning virtual-real identification model is characterized by comprising the following steps of:
s1, collecting traditional Chinese medicine multi-mode data of a preset number of sample personnel; the traditional Chinese medicine multi-modal data comprises at least two of image data, video data, voice data and text data; marking the types of the virtual and real functional states of the sample personnel according to the preset virtual and real functional states;
s2, extracting feature vectors from the traditional Chinese medicine multi-modal data, and labeling the extraction result with key slot values; normalizing the extraction result to obtain a normalized feature vector x̃;
S3, applying a preset bidirectional long short-term memory model to the feature vector x̃ for representation learning, and obtaining, according to a preset recognition-classification attention mechanism and a preset slot-value attention mechanism, a recognition-classification feature representation vector y_cls and a slot-value attention extraction feature representation vector y_ext;
S4, using the recognition-classification feature representation vector y_cls and the slot-value attention extraction feature representation vector y_ext, together with the labeling results of the steps S1 and S2, to perform multi-task joint learning on a preset neural network, obtaining a recognition-classification task loss loss_cls and a slot-value extraction task loss loss_ext;
S5, fusing the recognition-classification task loss loss_cls and the slot-value extraction task loss loss_ext with a gate-mechanism network to obtain the fused loss loss_gate;
S6, correcting the weights of all layers of the neural network by back propagation according to the fused loss loss_gate, to complete the construction of the multi-task joint learning virtual-real recognition model.
2. The method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to claim 1, wherein the virtual-real (deficiency-excess) functional states comprise five deficiency states: yin deficiency, yang deficiency, qi deficiency, blood deficiency, and essence deficiency; and five excess states: qi stagnation, phlegm-dampness, blood stasis, fire-heat, and wind.
3. The method for constructing the multi-modal data-based multi-task joint learning virtual-real recognition model according to claim 1, wherein the step S3 includes the following steps:
the feature vector x̃ = (x̃_1, ..., x̃_T), with unified sequence length T, is input into the bidirectional long short-term memory model to generate the hidden states h_i = [h→_i; h←_i], where h→_i denotes the forward output of the bidirectional LSTM at the i-th time step and h←_i denotes its backward output at the i-th time step;
after the bidirectional LSTM, the hidden state h_j at the j-th time step of the feature vector x̃ is regarded as the key K and the value V, and the trainable weight matrix W^{ext} is regarded as the query vector Q; the slot-value attention extraction feature representation vector y^{ext} is obtained by
y_i^{ext} = σ(W^{ext}(h_i + c_i^{ext})),  c_i^{ext} = Σ_{j=1}^{T} α_{i,j} h_j,  α_{i,j} = exp(e_{i,j}) / Σ_{k=1}^{T} exp(e_{i,k}),
wherein y_i^{ext} denotes the slot-value label of the i-th feature of the feature vector x̃; c_i^{ext} denotes the slot-value context vector, computed as the weighted sum Σ_j α_{i,j} h_j of the bidirectional LSTM hidden states h_1, ..., h_T;
σ denotes an activation function; i, j index the time steps of the hidden states; e_{i,j} = h_i^T W^{ext} h_j denotes the weight obtained by multiplying the j-th time-step hidden state h_j with the i-th time-step hidden state h_i and W^{ext}; and α_{i,j} is the normalized weight obtained by dividing exp(e_{i,j}), the score of hidden state h_i querying Q against key K at h_j, by the sum of the scores over all time steps.
4. The method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to claim 1, wherein the step S5 is implemented by the following formula:
loss_gate = Σ v · tanh(W_1^g · loss_cls + W_2^g · loss_ext)
wherein v, W_1^g and W_2^g are all trainable parameters.
5. The method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to claim 1, wherein the image data includes tongue images and/or face images; in the step S2, the image data is subjected to feature vector extraction by:
after preprocessing including resizing, graying and denoising, the image data are input into a preset CNN model to delineate a region-of-interest (ROI) image, and the ROI image is input into a preset feature extraction network to extract feature vectors.
6. The method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to claim 1, wherein the video data is multi-view video data for tongue and/or face; in the step S2, the video data is extracted with feature vectors by:
projecting the video data, comprising two spatial dimensions X, Y and a temporal dimension T, onto all combinations of two-dimensional views, namely (X, Y), (X, T) and (T, Y), using a preset 3D CNN model; and generating, for each two-dimensional view, a two-dimensional output matrix in R^{512×n} as the feature vector, where n denotes the number of frames along the omitted dimension.
7. The method for constructing a multi-modal data-based multi-task joint learning virtual-real recognition model according to claim 1, wherein in the step S2, feature vector extraction is performed on the speech data by:
after MFCC conversion of the voice data, the result is input into a preset x-vector framework, and the feature vector of the voice data is output after a preset Tanh-layer activation and 1D-Conv-layer channel-weight calibration.
8. The multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-real functional state identification method is characterized by comprising the following steps of:
s7, acquiring traditional Chinese medicine multi-modal data of a user, and performing feature extraction and normalization processing;
s8, inputting the result of the step S7 into the multi-task joint learning virtual-real recognition model obtained by the multi-modal-data-based construction method according to any one of claims 1 to 7, to obtain the probabilities of the user's virtual-real functional state types;
and S9, filtering the virtual and actual functional state type probability of the user according to a preset threshold value to obtain a virtual and actual functional state identification result of the user.
9. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method for identifying virtual and real functional states in multi-modal data-based multi-task joint learning traditional Chinese medicine as claimed in claim 8.
10. A computer device comprising a storage medium, a processor, and a computer program stored in the storage medium and executable by the processor; the computer program, when executed by a processor, realizes the multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-real functional state identification method as claimed in claim 8.
CN202311220175.9A 2023-09-20 2023-09-20 Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method Pending CN117352133A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311220175.9A CN117352133A (en) 2023-09-20 2023-09-20 Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method

Publications (1)

Publication Number Publication Date
CN117352133A true CN117352133A (en) 2024-01-05

Family

ID=89356622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311220175.9A Pending CN117352133A (en) 2023-09-20 2023-09-20 Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method

Country Status (1)

Country Link
CN (1) CN117352133A (en)

Similar Documents

Publication Publication Date Title
Filippini et al. Thermal infrared imaging-based affective computing and its application to facilitate human robot interaction: A review
US10219736B2 (en) Methods and arrangements concerning dermatology
CN112861624A (en) Human body posture detection method, system, storage medium, equipment and terminal
CN105393252A (en) Physiologic data acquisition and analysis
Zhu Computer Vision‐Driven Evaluation System for Assisted Decision‐Making in Sports Training
Casalino et al. Contact-less real-time monitoring of cardiovascular risk using video imaging and fuzzy inference rules
Tan et al. Autoencoder-based transfer learning in brain–computer interface for rehabilitation robot
Woo et al. Speech map: A statistical multimodal atlas of 4D tongue motion during speech from tagged and cine MR images
CN107658004A (en) A kind of method and system for checking high in the clouds medical image information on mobile terminals
CN112420141A (en) Traditional Chinese medicine health assessment system and application thereof
Sarath Human emotions recognition from thermal images using Yolo algorithm
CN117935339A (en) Micro-expression recognition method based on multi-modal fusion
Gan et al. FEAFA+: an extended well-annotated dataset for facial expression analysis and 3d facial animation
Germanese et al. Computer Vision Tasks for Ambient Intelligence in Children’s Health
Kwaśniewska et al. Real-time facial features detection from low resolution thermal images with deep classification models
Zhao et al. DFME: A New Benchmark for Dynamic Facial Micro-expression Recognition
CN117352133A (en) Multi-mode data-based multi-task combined learning traditional Chinese medicine virtual-actual functional state identification method
CN113555106B (en) Intelligent traditional Chinese medicine remote auxiliary diagnosis and treatment platform based on generation countermeasure network
CN113129277A (en) Tongue coating detection system based on convolutional neural network
Almonacid-Uribe et al. Deep learning for diagonal earlobe crease detection
Zheng et al. Sports Biology Seminar of Three‐dimensional Movement Characteristics of Yoga Standing Based on Image Recognition
CN117036877B (en) Emotion recognition method and system for facial expression and gesture fusion
Chen et al. Palpation localization of radial artery based on 3-dimensional convolutional neural networks
CN117334299A (en) Multi-mode traditional Chinese medicine deficiency and excess functional state identification method and related device
Saha et al. Personalized pain study platform using evidence-based continuous learning tool

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240920

Address after: No. 81, Xiangxue Avenue Middle, Huangpu District, Guangzhou, Guangdong 510000

Applicant after: Guangdong Xinhuangpu Joint Innovation Institute of traditional Chinese Medicine

Country or region after: China

Applicant after: Hu Jingqing

Address before: No. 81, Xiangxue Avenue Middle, Huangpu District, Guangzhou, Guangdong 510000

Applicant before: Guangdong Xinhuangpu Joint Innovation Institute of traditional Chinese Medicine

Country or region before: China