CN113628709B - Similar object determination method, device, equipment and storage medium - Google Patents
- Publication number
- CN113628709B (CN202111177589.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- medical record
- local feature
- sample
- local
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/60—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Public Health (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Epidemiology (AREA)
- Evolutionary Biology (AREA)
- Artificial Intelligence (AREA)
- Primary Health Care (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
The application discloses a method, an apparatus, a device and a storage medium for determining similar objects, and belongs to the technical field of computers and the Internet. The method comprises: acquiring feature data of a first object; acquiring a plurality of local feature vectors of the first object based on the feature data; determining a similarity between the plurality of local feature vectors of the first object and a plurality of local feature vectors of a second object; and determining that the first object is similar to the second object when a preset condition is met. In the application, each local feature vector represents the features of the first object over a period of time, so the local feature vectors represent the features of the first object at a finer granularity. Moreover, where a local feature vector can represent the first state of the first object, mutual influence between local feature vectors in the first state and local feature vectors in the second state is avoided in subsequent processing, which improves the accuracy of the subsequent processing result.
Description
Technical Field
The present application relates to the field of computer and internet technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining similar objects.
Background
A medical record is the original record of the whole process of diagnosing and treating an object in a hospital, and plays an important role in the subsequent diagnosis and treatment of the object's disease.
In the related art, before a first object is treated, feature extraction is performed on the current and historical medical record data of the first object to determine global features of the first object's disorder. A second object with similar global features is then selected and regarded as an object whose disorder is similar to that of the first object, and the first object is treated with reference to the treatment data of the second object.
However, in the related art described above, the second object with a similar disorder is determined directly from the global features, which are not fine-grained enough, so the degree of similarity between the disorders of different objects cannot be measured accurately.
Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a device and a storage medium for determining similar objects, which improve the accuracy of the disorder similarity measurement result. The technical solutions are as follows.
According to an aspect of the embodiments of the present application, there is provided a similar object determination method, including the following steps:
acquiring medical record data of a first object, wherein the medical record data comprises medical record information at a plurality of recording moments;
obtaining a plurality of local feature vectors of the first object based on the medical record data, wherein one local feature vector is used to characterize the condition features and health status of the first object over a period of time;
determining a trajectory similarity between the plurality of local feature vectors of the first object and a plurality of local feature vectors of a second object, wherein the trajectory similarity is used to characterize the similarity of the two groups of local feature vectors in time sequence;
determining that the first object and the second object have similar disorders if the trajectory similarity satisfies a condition.
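Purely as an illustration of the last two steps, the following Python sketch compares two time-ordered groups of local feature vectors. This section does not state how the trajectory similarity is computed or what the condition is; dynamic time warping (DTW), the mapping from distance to similarity, and the `threshold` value are all assumptions chosen for the sketch, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two local feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(traj_a: List[Vector], traj_b: List[Vector]) -> float:
    """Dynamic time warping distance between two time-ordered vector groups.

    DTW is an assumed stand-in for the trajectory comparison; it tolerates
    trajectories of different lengths and local shifts in time.
    """
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(traj_a[i - 1], traj_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def trajectory_similarity(traj_a: List[Vector], traj_b: List[Vector]) -> float:
    """Map the DTW distance into (0, 1]; identical trajectories give 1.0."""
    return 1.0 / (1.0 + dtw_distance(traj_a, traj_b))


def have_similar_disorders(traj_a: List[Vector], traj_b: List[Vector],
                           threshold: float = 0.5) -> bool:
    """Assumed condition: the similarity reaches a fixed threshold."""
    return trajectory_similarity(traj_a, traj_b) >= threshold
```

With this choice, two objects whose local feature vectors follow the same course at different speeds are still treated as similar, which is one way to read "similarity in time sequence".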
According to an aspect of an embodiment of the present application, there is provided a method for training a medical record data processing model, the method including the steps of:
acquiring a training sample of a medical record data processing model, wherein the training sample comprises medical record data of a sample object and a task result label, and the medical record data comprises medical record information at a plurality of recording moments; wherein the medical record data processing model comprises a local feature coding network, a global feature coding network and a task result output network;
encoding the medical record data of the sample object by adopting the local feature coding network to obtain a plurality of local feature vectors of the sample object, wherein one local feature vector is used to characterize the condition features and health status of the sample object over a period of time;
acquiring a global feature vector of the sample object by adopting the global feature coding network based on the association relationship among the plurality of local feature vectors of the sample object, wherein the global feature vector is used to characterize the global condition features and the overall health status of the sample object;
executing a downstream task by adopting the task result output network according to the global feature vector to obtain a task output result of the sample object;
determining the training loss of the medical record data processing model based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network;
and training the medical record data processing model according to the training loss.
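As a rough sketch of how a training loss might depend on both the task output and the parameters of the local feature coding network, the Python below adds a parameter penalty to a task loss. This section does not specify the form of either term; binary cross-entropy, the squared-norm penalty and `penalty_weight` are assumptions made only for illustration, and all names are hypothetical.

```python
import math
from typing import Sequence


def task_loss(predicted_prob: float, label: int) -> float:
    """Binary cross-entropy between the task output and the task result label.

    Assumes a binary downstream task whose output is a probability.
    """
    eps = 1e-12
    p = min(max(predicted_prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def parameter_penalty(local_params: Sequence[float]) -> float:
    """Squared L2 norm of the local feature coding network's parameters (assumed form)."""
    return sum(w * w for w in local_params)


def training_loss(predicted_prob: float, label: int,
                  local_params: Sequence[float],
                  penalty_weight: float = 0.01) -> float:
    """Task loss plus a weighted penalty on the local coding network parameters."""
    return task_loss(predicted_prob, label) + penalty_weight * parameter_penalty(local_params)
```

The model would then be trained by adjusting its parameters to reduce this value, for example with gradient descent.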
According to an aspect of the embodiments of the present application, there is provided a similar object determination apparatus, including:
the medical record data acquisition module is used for acquiring medical record data of the first object, wherein the medical record data comprises medical record information at a plurality of recording moments;
a local vector acquisition module, configured to acquire a plurality of local feature vectors of the first object based on the medical record data, wherein one local feature vector is used to characterize the condition features and health status of the first object over a period of time;
a trajectory similarity determination module, configured to determine a trajectory similarity between the plurality of local feature vectors of the first object and a plurality of local feature vectors of a second object, wherein the trajectory similarity is used to characterize the similarity of the two groups of local feature vectors in time sequence;
a similar object determination module, configured to determine that the first object and the second object have similar disorders if the trajectory similarity satisfies a condition.
According to an aspect of the embodiments of the present application, there is provided an apparatus for training a medical record data processing model, the apparatus including the following modules:
a training sample acquisition module, configured to acquire a training sample of a medical record data processing model, wherein the training sample comprises medical record data of a sample object and a task result label, and the medical record data comprises medical record information at a plurality of recording moments; wherein the medical record data processing model comprises a local feature coding network, a global feature coding network and a task result output network;
a local feature acquisition module, configured to encode the medical record data of the sample object by adopting the local feature coding network to obtain a plurality of local feature vectors of the sample object, wherein one local feature vector is used to characterize the condition features and health status of the sample object over a period of time;
a global feature acquisition module, configured to acquire a global feature vector of the sample object by adopting the global feature coding network based on the association relationship among the plurality of local feature vectors of the sample object, wherein the global feature vector is used to characterize the global condition features and the overall health status of the sample object;
a downstream task execution module, configured to execute a downstream task according to the global feature vector by adopting the task result output network to obtain a task output result of the sample object;
a training loss determination module, configured to determine the training loss of the medical record data processing model based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network;
and a target model training module, configured to train the medical record data processing model according to the training loss.
According to an aspect of the embodiments of the present application, there is provided a computer device, including a processor and a memory, where at least one instruction, at least one program, a code set, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the similar object determination method described above or implement the training method of the medical record data processing model described above.
According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium having at least one instruction, at least one program, a code set, or a set of instructions stored therein, where the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the similar object determination method or the training method of the medical record data processing model.
According to an aspect of the embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the similar object determination method or the training method of the medical record data processing model described above.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
A plurality of local feature vectors of the first object are acquired from the medical record data, and the similarity in time sequence between the local feature vectors of the first object and those of the second object is determined. Because a local feature vector characterizes the condition features of the first object over a period of time, the medical record data of the first object is converted into condition features for multiple time periods, so the local feature vectors reflect the condition of the first object at a finer granularity, and the measurement result is finer and more accurate when the disorder similarity between the first object and the second object is subsequently measured according to the trajectory similarity. Moreover, a local feature vector also characterizes the health status of the first object over a period of time, which further improves the accuracy of the disorder similarity measurement; and because the local feature vector can characterize the health status of the first object, mutual influence between local feature vectors in a healthy state and local feature vectors in an unhealthy state is avoided in subsequent processing, which improves the accuracy of the subsequent processing result.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating an example medical record data processing model;
FIG. 2 is a flowchart of a similar object determination method provided in an embodiment of the present application;
FIG. 3 is a flowchart of a similar object determination method according to another embodiment of the present application;
FIG. 4 illustrates a schematic diagram of a user interface;
FIG. 5 is a flowchart of a method for training a medical record data processing model according to an embodiment of the present application;
FIG. 6 is a block diagram of a similar object determination apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a similar object determination apparatus according to another embodiment of the present application;
FIG. 8 is a block diagram of a medical record data processing model training apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of a medical record data processing model training apparatus according to another embodiment of the present application;
FIG. 10 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence is a comprehensive discipline that covers a wide range of fields and includes both hardware-level and software-level technologies. The basic artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field involves natural language, that is, the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering by machines, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary field that involves probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or implements human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, the artificial intelligence technology is developed and applied in a plurality of fields, such as common smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, automatic driving, unmanned aerial vehicles, robots, smart medical care, smart customer service, and the like.
The solutions provided in the embodiments of the present application involve artificial intelligence technologies such as machine learning: the medical record data processing model is trained using training samples. Optionally, after the medical record data processing model is trained, the computer device may invoke the medical record data processing model to process the medical record data of the first object. The first object refers to any living being with vital signs, such as a human, a cat, a dog, a tree, and the like, which is not limited in the embodiments of the present application; the medical record data includes medical record information at a plurality of recording times.
Illustratively, as shown in FIG. 1, the medical record data processing model 10 includes a local feature coding network 11, a global feature coding network 12 and a task result output network 13. After acquiring the medical record data of the first object, the computer device performs feature extraction and grouping on the medical record data to obtain a plurality of feature data sets, and the local feature coding network 11 then encodes each of the feature data sets to obtain a plurality of local feature vectors of the first object, where one local feature vector is used to characterize the condition features and health status of the first object within a time period. Next, the global feature coding network 12 obtains a global feature vector of the first object based on the association relationship between the plurality of local feature vectors; the global feature vector is used to characterize the global condition features and the overall health status of the first object. Finally, the task result output network 13 executes the downstream task according to the global feature vector and obtains the task output result of the first object. The downstream task may be any task, such as a disease identification task, a recovery probability prediction task, a death probability prediction task, or a determination task for a specific disease, which is not limited in the embodiments of the present application.
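The data flow described above can be sketched in Python as follows. The three networks are replaced by simple stand-in functions (averaging and a logistic output) so that only the order of operations is shown; the fixed `window_days` grouping rule, the record layout and every function name are assumptions for illustration and are not taken from the application.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (recording day, feature values already extracted from the medical record information)
Record = Tuple[int, List[float]]


def group_by_window(records: List[Record], window_days: int = 7) -> List[List[List[float]]]:
    """Group extracted features into time-ordered feature data sets (assumed rule)."""
    groups: Dict[int, List[List[float]]] = defaultdict(list)
    for day, features in records:
        groups[day // window_days].append(features)
    return [groups[key] for key in sorted(groups)]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def local_encode(feature_set: List[List[float]]) -> List[float]:
    """Stand-in for local feature coding network 11: one vector per time period."""
    return mean_vector(feature_set)


def global_encode(local_vectors: List[List[float]]) -> List[float]:
    """Stand-in for global feature coding network 12 (a real network would model associations)."""
    return mean_vector(local_vectors)


def task_output(global_vector: List[float]) -> float:
    """Stand-in for task result output network 13: a probability for a binary task."""
    return 1.0 / (1.0 + math.exp(-sum(global_vector)))


def run_model(records: List[Record], window_days: int = 7):
    """Return the local feature vectors, the global feature vector and the task output."""
    local_vectors = [local_encode(group) for group in group_by_window(records, window_days)]
    global_vector = global_encode(local_vectors)
    return local_vectors, global_vector, task_output(global_vector)
```

The list of local feature vectors returned here is the intermediate result that the similar object determination method uses, independently of the downstream task output.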
In addition, in the embodiment of the present application, after obtaining the output result of the local feature coding network 11, the computer device determines, according to the plurality of local feature vectors of the first object, a trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object, and determines that the first object and the second object have similar disorders in a case that the trajectory similarity satisfies a condition.
For convenience of description, in the following method embodiments the execution subject of each step is described, by way of example only, as a computer device, which may be any electronic device with computing and storage capabilities. For example, the computer device may be a server, which may be an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers; when there are a plurality of servers, they may form a blockchain, with each server being a node on the blockchain. As another example, the computer device may be a terminal, such as, but not limited to, a smart phone, a tablet computer, a laptop computer, a desktop computer, a smart speaker or a smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in the present application. In the embodiments, the steps may all be executed by the same computer device, or may be executed interactively by a plurality of different computer devices, which is not limited herein. It should be noted that the execution subject of the similar object determination method described below and the execution subject of the training method of the medical record data processing model described below may be the same computer device or different computer devices, which is not limited in the embodiments of the present application.
The technical solution of the present application will be described in detail with reference to several embodiments.
Referring to fig. 2, a flowchart of a similar object determination method according to an embodiment of the present application is shown. The method can comprise the following steps (201-204).
The first object refers to any living organism having vital signs, such as a human being, an animal such as a cat or a dog, or a plant such as a flower or grass, which is not limited in the embodiments of the present application. The medical record data is data recorded about the condition of the first object at different recording times. In an embodiment of the present application, the computer device obtains the medical record data of the first object before determining objects similar to the first object. The medical record data includes medical record information at a plurality of recording times.
Optionally, with the recording time as the dividing criterion, the medical record data can be divided into current medical record data and historical medical record data. The current medical record data refers to data acquired by the computer device in real time at the current moment, and the historical medical record data refers to data acquired by the computer device before the current moment.
In a possible implementation, in a real-time medical record data processing scenario, the medical record data includes current medical record data and historical medical record data, and in this case the latest of the recording times included in the medical record data is the current time. Optionally, when the current medical record data of the first object is acquired, the computer device determines to execute the step of acquiring the medical record data of the first object, and further queries and acquires the historical medical record data of the first object from the medical record data storage database according to the identification information of the first object. For example, the current medical record data may be real-time data reported by a user to the computer device through a client of an application program, where the user may be a doctor, the patient, or a guardian of the patient, which is not limited in the embodiments of the present application.
In another possible implementation, in a non-real-time medical record data processing scenario, the medical record data includes only historical medical record data, and in this case the latest of the recording times included in the medical record data is before the current time. Optionally, when acquiring medical record data, the computer device acquires identification information of the first object concerned, and further queries and acquires the historical medical record data of the first object from the medical record data storage database according to the identification information.
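The two acquisition scenarios can be sketched as a single lookup, shown below in Python. The in-memory dictionary standing in for the medical record data storage database, the `(recording time, information)` record layout and the function name are hypothetical and serve only to illustrate the difference between the scenarios.

```python
from typing import Dict, List, Optional, Tuple

# (recording time, medical record information); the layout is an assumption
MedicalRecord = Tuple[int, str]


def acquire_medical_records(store: Dict[str, List[MedicalRecord]],
                            object_id: str,
                            current_record: Optional[MedicalRecord] = None
                            ) -> List[MedicalRecord]:
    """Return the medical record data of one object in time order.

    Non-real-time scenario: only historical records are looked up by the
    object's identification information. Real-time scenario: the newly
    reported current record is appended, so the latest recording time in
    the result is the current time.
    """
    history = sorted(store.get(object_id, []))
    if current_record is None:
        return history
    return history + [current_record]
```

For example, calling the function with only an identifier returns the stored history, while passing a freshly reported record as `current_record` yields the history followed by that record.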
Optionally, the recording times included in the medical record data may be all the recording times corresponding to the first object, or may be a part of the recording times corresponding to the first object.
In a possible implementation manner, in order to ensure the integrity of the medical record data, the recording times included in the medical record data are all the recording times corresponding to the first object. When acquiring the medical record data of the first object, the computer device directly acquires the medical record data of the first object at all the recording times without screening the recording times, so that the operation is simple and convenient, and the data is complete and rich.
In another possible embodiment, in order to reduce the amount of calculation for the medical record data, the recording times included in the medical record data are a part of the recording times corresponding to the first object. When acquiring the medical record data of the first object, the computer device screens all the recording times corresponding to the first object to obtain the partial recording times, and determines the medical record data of the first object from the medical record information recorded at the partial recording times. Optionally, the screening conditions for the recording times include, but are not limited to, at least one of the following: the time difference between the recording time and the current time is less than a threshold value, the recording time is within the outbreak period of a specific disease condition, the medical record information recorded at the recording time includes information related to a specific disease condition, and the like. The threshold may be any value, which is not limited in the embodiment of the present application.
In the case where the recording times included in the medical record data are a part of the recording times corresponding to the first object, the partial recording times necessarily include the current time in the real-time medical record data processing scenario.
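As a minimal sketch of the first screening condition only (the time difference between the recording time and the current time being less than a threshold), the screening could look as follows. The function name `screen_recording_times`, the 7-day threshold and the sample timestamps are illustrative assumptions and are not taken from the embodiment.

```python
from datetime import datetime, timedelta


def screen_recording_times(recording_times, now, max_age):
    """Keep the recording times whose gap to `now` is smaller than `max_age`.

    Hypothetical helper: only the time-difference screening condition is shown.
    """
    kept = [t for t in recording_times if now - t < max_age]
    return sorted(kept)  # earliest first


now = datetime(2021, 10, 9, 12, 0)
times = [
    datetime(2021, 10, 9, 12, 0),  # the current time (real-time scenario)
    datetime(2021, 10, 8, 9, 30),
    datetime(2021, 9, 1, 8, 0),    # older than the threshold, screened out
]
recent = screen_recording_times(times, now, timedelta(days=7))
```

In the real-time scenario the current time has a time difference of zero, so it always passes this particular screening condition.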
Step 202, a plurality of local feature vectors of the first object are obtained based on the medical record data.
In an embodiment of the present application, after acquiring the medical record data, the computer device acquires a plurality of local feature vectors of the first object based on the medical record data. One local feature vector is used for characterizing the disease condition features and health status of the first object within a time period.
Optionally, after acquiring the medical record data, the computer device performs feature extraction and grouping processing on the medical record data to acquire a plurality of feature data groups, and then determines a plurality of local feature vectors according to the plurality of feature data groups. One feature data group corresponds to one local feature vector.
It should be noted that the order of executing the feature extraction and the grouping processing is not limited in the embodiment of the present application. Optionally, the computer device may first perform feature extraction on the medical record data to obtain feature data at a plurality of recording times, and then group the feature data according to the recording times to obtain a plurality of feature data groups; or, the computer device may first group the medical record data according to the recording times to obtain a plurality of medical record data groups, and then perform feature extraction on each medical record data group to obtain the corresponding feature data group, one feature data group corresponding to one medical record data group.
It should be further noted that the above describes grouping the medical record data according to the recording time; in an exemplary embodiment, the medical record data may also be grouped according to other grouping conditions. Illustratively, the other grouping conditions may be: similar feature data are grouped into one group, or the feature data are divided evenly so that each group contains the same amount of feature data.
Step 203, the trajectory similarity between the plurality of local feature vectors of the first object and a plurality of local feature vectors of a second object is determined.
In an embodiment of the present application, after acquiring the plurality of local feature vectors of the first object, the computer device determines the trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object. The trajectory similarity is used for representing the similarity of the two groups of local feature vectors in time sequence. The plurality of local feature vectors of the second object may be acquired by the computer device after the plurality of local feature vectors of the first object are acquired, or may have been acquired and stored when the computer device previously processed the medical record data of the second object, which is not limited in this embodiment of the application.
Optionally, the second object may be any object other than the first object, or may be a specific object selected from candidate objects.
In one possible embodiment, the second object is any object other than the first object. Optionally, after acquiring the plurality of local feature vectors of the first object, the computer device acquires any one object as a second object, and acquires a plurality of local feature vectors of the second object.
In another possible embodiment, the second object is a specific object chosen from the candidates. Optionally, the computer device acquires medical record data of a plurality of candidate objects after acquiring the plurality of local feature vectors of the first object, and further selects, as the second object, a candidate object that satisfies a matching condition with the medical record data of the first object from the plurality of candidate objects according to medical record data corresponding to each candidate object. Wherein the matching condition includes but is not limited to at least one of the following: the recording time of the medical record information is close to the recording time of the medical record information of the first object, the medical condition description data included in the medical record information is similar to the medical condition description data included in the medical record information of the first object, the medical record information includes medical condition description data for a target medical condition, and the number of recording times included in the medical record information is equal to the number of recording times included in the medical record information of the first object.
It should be noted that the number of the second objects may be one or more, and the embodiment of the present application is not limited thereto.
Optionally, in this embodiment of the application, after the computer device obtains the plurality of local feature vectors of the first object, in addition to performing step 203, the plurality of local feature vectors of the first object may be stored in a suitable location, so as to facilitate subsequent direct application of the plurality of local feature vectors of the first object.
In one possible embodiment, in order to save the storage space, when storing the plurality of local feature vectors of the first object acquired this time, it is necessary to first clear the plurality of local feature vectors of the first object stored before.
In another possible implementation manner, in order to ensure the integrity of data, when the plurality of local feature vectors of the first object acquired this time are stored, it is first determined whether the acquisition manner of the local feature vectors this time is the same as the acquisition manner of the previously stored local feature vectors; the previous local feature vectors acquired in the same acquisition manner are then cleared, the previous local feature vectors acquired in different acquisition manners are retained, and the local feature vectors acquired this time are stored. The acquisition manner covers every acquisition step for the local feature vector mentioned in the present application (such as the medical record data processing sequence, the medical record data processing manner, and the like); that is, two acquisition manners are determined to be different if any one of the acquisition steps is different. Of course, in an exemplary embodiment, to facilitate data search, the local feature vectors of the same object acquired according to different acquisition steps are stored in different locations according to the acquisition steps. In this case, when the computer device stores the local feature vectors acquired this time, it does not need to perform the determination regarding the acquisition steps, and only needs to clear the local feature vectors stored in the storage location corresponding to the acquisition steps used this time.
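The storage strategy of keeping one storage location per acquisition manner can be sketched as a mapping keyed by object and acquisition manner. This is only an illustration under stated assumptions: the class name `LocalFeatureStore`, the string labels for the acquisition manners and the in-memory dictionary are hypothetical and are not specified in the embodiment.

```python
class LocalFeatureStore:
    """Hypothetical in-memory store with one slot per (object, acquisition manner)."""

    def __init__(self):
        self._slots = {}

    def save(self, object_id, acquisition_manner, vectors):
        # Writing to the slot replaces only the vectors previously acquired in the
        # same manner; vectors acquired in other manners stay untouched.
        self._slots[(object_id, acquisition_manner)] = [list(v) for v in vectors]

    def load(self, object_id, acquisition_manner):
        return self._slots.get((object_id, acquisition_manner))


store = LocalFeatureStore()
store.save("object-1", "extract_then_group", [[0.1, 0.2]])
store.save("object-1", "group_then_extract", [[0.3, 0.4]])
store.save("object-1", "extract_then_group", [[0.5, 0.6]])  # overwrites the first entry
```

Because the acquisition manner is part of the key, no separate comparison of acquisition steps is needed at save time.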
Step 204, in a case where the trajectory similarity satisfies a condition, it is determined that the first object and the second object have similar disease conditions.
In this embodiment, after acquiring the trajectory similarity, the computer device determines whether the trajectory similarity satisfies the condition, and determines that the first object and the second object have similar disease conditions if the trajectory similarity satisfies the condition.
Optionally, for the trajectory similarity of different representation forms, the corresponding conditions are different.
In a possible embodiment, the trajectory similarity is represented in a numerical form, and the condition is that the trajectory similarity is greater than a target value, and after the trajectory similarity is obtained, the computer device determines that the trajectory similarity satisfies the condition if the trajectory similarity is greater than the target value. The target value may be any value, which is not limited in the embodiments of the present application.
In another possible embodiment, the trajectory similarity is expressed in the form of a label, and the condition is that the label of the trajectory similarity is a specific label. Illustratively, the labels include "dissimilar", "somewhat similar", "very similar" and "almost identical", and the condition is that the label of the trajectory similarity belongs to a specified subset of these labels, such as "very similar" or "almost identical".
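The two representation forms described above can be checked with one small function. This is a hedged sketch: the function name `satisfies_condition`, the default target value of 0.8 and the chosen set of accepted labels are assumptions made for illustration, since the embodiment leaves the target value and the specific labels open.

```python
def satisfies_condition(similarity, target_value=0.8,
                        accepted_labels=("very similar", "almost identical")):
    """Return True if the trajectory similarity satisfies the condition.

    Label form: the label must be one of `accepted_labels` (assumed set).
    Numerical form: the value must be greater than `target_value` (assumed value).
    """
    if isinstance(similarity, str):
        return similarity in accepted_labels
    return similarity > target_value
```

For example, `satisfies_condition(0.9)` and `satisfies_condition("very similar")` both hold under these assumed settings, while `satisfies_condition("dissimilar")` does not.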
Optionally, in an embodiment of the present application, when determining that the first object and the second object have similar disease conditions, the computer device may determine treatment data for the first object from the treatment data of the second object; alternatively, the computer device may also send the relationship between the first object and the second object directly to a user such as a medical staff member.
In summary, in the technical solution provided in the embodiment of the present application, a plurality of local feature vectors of the first object are obtained from the medical record data, and the trajectory similarity, that is, the degree of similarity in time sequence between the local feature vectors of the first object and those of the second object, is then determined according to the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object. Because each local feature vector characterizes the disease condition features of the first object within one time period, the medical record data of the first object is converted into disease condition features over multiple time periods, so that the local feature vectors reflect the disease condition features of the first object at a finer granularity, and the result is finer and more accurate when the disease condition similarity between the first object and the second object is subsequently measured according to the trajectory similarity. Moreover, the local feature vector also characterizes the health status of the first object within the time period, which improves the accuracy of the disease condition similarity measurement; because the local feature vector can represent the health status of the first object, mutual interference in subsequent processing between local feature vectors in a healthy state and local feature vectors in an unhealthy state is avoided, and the accuracy of the subsequent processing result is improved.
Next, a manner of obtaining the local feature vector will be described.
In an exemplary embodiment, the above step 202 includes the following steps.
2021. Feature extraction is performed on the medical record information at a plurality of recording times contained in the medical record data to obtain feature data at the plurality of recording times.
In the embodiment of the present application, after acquiring medical record data of the first object, the computer device performs feature extraction on medical record information at a plurality of recording times included in the medical record data to obtain feature data at the plurality of recording times.
Optionally, in this embodiment of the present application, the medical record information includes disease description data for describing a disease and examination assay data for describing an examination result, and the computer device needs to perform feature extraction on the disease description data and the examination assay data respectively when performing feature extraction.
Taking a target recording time among the plurality of recording times as an example, for the medical record information at the target recording time, at least one disease characteristic corresponding to the target recording time is extracted from the disease description data contained in that medical record information, where the disease characteristics are, for example, cough, fever, suspected cold, diabetes and the like; and at least one state characteristic corresponding to the target recording time is extracted from the examination assay data contained in that medical record information, where the state characteristic is used for describing the health status of the first object. The feature data at the target recording time includes: the at least one disease characteristic corresponding to the target recording time and the at least one state characteristic corresponding to the target recording time.
The target recording time is any one of the plurality of recording times. In the embodiment of the application, each recording time corresponds to at least one disease characteristic and at least one state characteristic.
2022. According to the recording times of the feature data, the feature data at the plurality of recording times are grouped to obtain a plurality of feature data groups.
In the embodiment of the application, after the computer device acquires the feature data, the computer device performs grouping processing on the feature data at a plurality of recording times according to the recording time of the feature data to obtain a plurality of feature data groups. Wherein one characteristic data group comprises at least one characteristic data.
Optionally, in this embodiment of the present application, when performing grouping processing, the computer device sorts the feature data in order of recording time from earliest to latest to obtain a feature data sequence, and then groups the feature data sequence according to a grouping duration to obtain a plurality of feature data groups. The grouping duration may be any value, which is not limited in the embodiment of the present application; for example, the grouping duration may be 3 hours, 1 day, 1 week, 1 month, and the like. Optionally, in order to ensure the accuracy of the subsequent trajectory similarity calculation, the grouping duration corresponding to the first object is the same as the grouping duration corresponding to the second object.
Optionally, different feature data sets may include different feature data, and may also include repeated feature data.
In one possible embodiment, different characteristic data are included in different characteristic data sets. When the computer device groups the feature data sequence, for adjacent feature data groups, the feature data corresponding to the latest recording time in the former feature data group and the feature data corresponding to the earliest recording time in the latter feature data group are adjacent feature data in the feature data sequence.
In another possible embodiment, different feature data groups include repeated feature data. When the computer device groups the feature data sequence, for adjacent feature data groups, the feature data corresponding to the earliest recording time in the former feature data group and the feature data corresponding to the earliest recording time in the latter feature data group are adjacent feature data in the feature data sequence; or, the feature data corresponding to the latest recording time in the former feature data group and the feature data corresponding to the latest recording time in the latter feature data group are adjacent feature data in the feature data sequence.
2023. Each feature data group is encoded by using a local feature coding network to obtain a plurality of local feature vectors of the first object.
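The two grouping variants described above (groups without repeated feature data, and groups with repeated feature data) can be sketched as follows. This is an illustrative sketch only: the function names, the `(recording_time, feature)` tuple layout and the 1-day grouping duration are assumptions, and the overlapping variant shows only the case in which the earliest feature data of adjacent groups are adjacent in the sequence.

```python
from datetime import datetime, timedelta


def group_disjoint(records, duration):
    """Split time-sorted (recording_time, feature) pairs into consecutive windows."""
    ordered = sorted(records, key=lambda r: r[0])
    if not ordered:
        return []
    start = ordered[0][0]
    buckets = {}
    for t, feature in ordered:
        index = (t - start) // duration  # which window this record falls into
        buckets.setdefault(index, []).append((t, feature))
    return [buckets[i] for i in sorted(buckets)]


def group_sliding(records, duration):
    """Start one window at every record, so adjacent groups may share records."""
    ordered = sorted(records, key=lambda r: r[0])
    groups = []
    for i, (t0, _) in enumerate(ordered):
        groups.append([r for r in ordered[i:] if r[0] - t0 < duration])
    return groups


base = datetime(2021, 10, 1)
records = [
    (base + timedelta(hours=25), "c"),
    (base, "a"),
    (base + timedelta(hours=44), "d"),
    (base + timedelta(hours=5), "b"),
]
disjoint = [[f for _, f in g] for g in group_disjoint(records, timedelta(days=1))]
sliding = [[f for _, f in g] for g in group_sliding(records, timedelta(days=1))]
# disjoint -> [["a", "b"], ["c", "d"]]
# sliding  -> [["a", "b"], ["b", "c"], ["c", "d"], ["d"]]
```

Using the same `duration` for the first object and the second object keeps the resulting groups comparable, in line with the note on the grouping duration above.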
In this embodiment, after the computer device obtains the feature data sets, the computer device uses a local feature coding network to code each feature data set, so as to obtain a plurality of local feature vectors of the first object. Wherein each feature data set corresponds to a local feature vector.
Optionally, in this embodiment of the application, before the feature data group is encoded, or after the feature data is acquired, in a case where the feature data includes numerical data, normalization processing is performed on the numerical data to obtain processed feature data, and in a case where the feature data includes text data, numerical quantization processing is performed on the text data to obtain processed feature data. Wherein, the processed characteristic data is used for encoding.
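The preprocessing described above can be sketched with two small helpers. The concrete techniques are assumptions: the embodiment does not state which normalization or numerical quantization is used, so min-max scaling and an integer vocabulary index are shown here purely as stand-ins, and the names `min_max_normalize` and `build_vocabulary` are hypothetical.

```python
def min_max_normalize(values):
    """Scale numerical feature values into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal: avoid division by zero
    return [(v - lo) / (hi - lo) for v in values]


def build_vocabulary(tokens):
    """Map each distinct text feature to an integer id (assumed quantization)."""
    vocabulary = {}
    for token in tokens:
        vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


temperatures = min_max_normalize([36.5, 38.5, 37.5])
vocabulary = build_vocabulary(["cough", "fever", "cough"])
encoded = [vocabulary[t] for t in ["fever", "cough"]]
```

The processed values (`temperatures`, `encoded`) would then be the inputs to the encoding step.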
Optionally, in this embodiment of the present application, the number of the local feature vectors of the first object is m, the number of the local feature vectors of the second object is n, and both m and n are integers greater than 1. Next, the manner of acquiring the trajectory similarity is described.
In the case where n and m are not equal, in an exemplary embodiment, the above step 203 includes the following several steps.
2031a, obtaining the distance between each local feature vector of the first object and each local feature vector of the second object, and obtaining n × m distances.
2032a, determining the track similarity according to n × m distances.
In the embodiment of the application, when the computer device obtains the trajectory similarity, a plurality of local feature vectors of the first object and a plurality of local feature vectors of the second object are obtained, and the distance between each local feature vector of the first object and each local feature vector of the second object is obtained according to the local feature vectors corresponding to the first object and the second object, so as to obtain n × m distances.
In the embodiment of the present application, after acquiring the n × m distances, the computer device acquires the recording times of the feature data respectively corresponding to the m local feature vectors of the first object, and the recording times of the feature data respectively corresponding to the n local feature vectors of the second object; further, based on the order of the recording times from earliest to latest, a time series corresponding to the local feature vectors of the first object and a time series corresponding to the local feature vectors of the second object are determined. In addition, since a local feature vector is acquired from a feature data group, the feature data group includes at least one feature data, and different feature data may correspond to different recording times, when acquiring the recording time corresponding to a local feature vector, the average value or the median value of the recording times of the feature data included in the corresponding feature data group may be used as the recording time corresponding to the local feature vector.
In this embodiment, after acquiring the time series of the local feature vectors corresponding to the first object and the second object, the computer device traverses the m local feature vectors of the first object and the n local feature vectors of the second object by using the two time series as a reference, and obtains a shortest traversal feature path between the m local feature vectors of the first object and the n local feature vectors of the second object. The shortest traversal feature path is the traversal path with the smallest sum of distances among the paths that traverse the m local feature vectors of the first object and the n local feature vectors of the second object once. It should be noted that, when traversing the local feature vectors, different local feature vectors of the same object need to be traversed in the order of their recording times. That is, for the local feature vectors of both objects, the local feature vectors with earlier recording times are uniformly traversed before those with later recording times; or, the local feature vectors with later recording times are uniformly traversed before those with earlier recording times.
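The choice of a single recording time for a feature data group (average value or median value) can be sketched as follows. The function name `representative_time` and the `how` parameter are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta
from statistics import median


def representative_time(times, how="mean"):
    """Return the mean or median of the recording times in one feature data group."""
    base = min(times)
    offsets = [(t - base).total_seconds() for t in times]  # seconds after the earliest time
    value = median(offsets) if how == "median" else sum(offsets) / len(offsets)
    return base + timedelta(seconds=value)


group_times = [
    datetime(2021, 10, 9, 10, 0),
    datetime(2021, 10, 9, 12, 0),
    datetime(2021, 10, 9, 17, 0),
]
mean_time = representative_time(group_times)               # 13:00 on the same day
median_time = representative_time(group_times, "median")   # 12:00 on the same day
```

Sorting the local feature vectors by this representative time then yields the time series used in the subsequent steps.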
In this embodiment of the present application, after obtaining the shortest traversal feature path, the computer device determines the trajectory similarity according to a sum of shortest distances corresponding to the shortest traversal feature path.
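One way to realize an order-preserving traversal with a minimal sum of distances is a dynamic-programming alignment in the style of dynamic time warping. The sketch below is offered under that assumption and is not stated to be the exact procedure of the embodiment; the Euclidean distance, the function names and the mapping from distance sum to similarity, 1 / (1 + sum), are likewise illustrative choices. Both inputs are assumed to be already sorted by recording time.

```python
import math


def shortest_traversal_cost(first, second):
    """Minimal sum of pairwise distances over order-preserving alignments."""
    rows, cols = len(first), len(second)
    # All pairwise distances between the two sets of local feature vectors.
    dist = [[math.dist(u, v) for v in second] for u in first]
    inf = float("inf")
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            # Advance in the first sequence, the second sequence, or both.
            best_previous = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist[i - 1][j - 1] + best_previous
    return cost[rows][cols]


def trajectory_similarity(first, second):
    """Assumed mapping: a smaller distance sum gives a similarity closer to 1."""
    return 1.0 / (1.0 + shortest_traversal_cost(first, second))


first_vectors = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
second_vectors = [(0.0, 0.0), (2.0, 0.0)]
cost = shortest_traversal_cost(first_vectors, second_vectors)   # 1.0
similarity = trajectory_similarity(first_vectors, second_vectors)  # 0.5
```

Because every vector of both sequences is visited and neither index ever moves backwards, the alignment respects the recording-time order of each object even when the two sequences have different lengths.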
In the case where n and m are equal, in an exemplary embodiment, the above step 203 includes the following steps.
2031b, acquiring the recording time of the feature data corresponding to the m local feature vectors of the first object, and the recording time of the feature data corresponding to the n local feature vectors of the second object.
2032b, determining a time series corresponding to the local feature vectors of the first object and a time series corresponding to the local feature vectors of the second object based on the order of the recording times from earliest to latest.
2033b, determining the correspondence between the m local feature vectors of the first object and the n local feature vectors of the second object according to the time series corresponding to the local feature vectors of the first object and the time series corresponding to the local feature vectors of the second object.
2034b, obtaining the distance between the local feature vectors with corresponding relation.
2035b, determining the track similarity according to the sum of the distances between the local feature vectors with corresponding relationship.
In the first local feature vector and the second local feature vector having a correspondence relationship, the order of the first local feature vector in the time series of the local feature vector of the first object is the same as the order of the second local feature vector in the time series of the local feature vector of the second object. The first local feature vector is any one of a plurality of local feature vectors of the first object, and the second local feature vector is any one of a plurality of local feature vectors of the second object.
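Steps 2031b to 2035b can be sketched as sorting both sets of local feature vectors by recording time, pairing vectors that occupy the same position in the two time series, and summing the distances of the pairs. The Euclidean distance, the `(recording_time, vector)` tuple layout and the function name `aligned_distance_sum` are assumptions made for this illustration.

```python
import math


def aligned_distance_sum(first, second):
    """Sum of distances between vectors at the same position in each time series.

    `first` and `second` are lists of (recording_time, vector) pairs of equal length.
    """
    if len(first) != len(second):
        raise ValueError("both objects must have the same number of local feature vectors")
    first_series = sorted(first, key=lambda pair: pair[0])
    second_series = sorted(second, key=lambda pair: pair[0])
    return sum(
        math.dist(u, v)
        for (_, u), (_, v) in zip(first_series, second_series)
    )


first = [(2, (1.0, 0.0)), (1, (0.0, 0.0))]      # given out of order on purpose
second = [(10, (0.0, 3.0)), (20, (1.0, 4.0))]
total = aligned_distance_sum(first, second)      # 3.0 + 4.0 = 7.0
```

Only the relative order within each time series matters here, so the recording times of the two objects do not need to share a scale.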
It should be noted that the above description of the trajectory similarity acquisition manner is only exemplary and explanatory. In an exemplary embodiment, the computer device may also use other algorithms to calculate the trajectory similarity, such as EDR (Edit Distance on Real sequence), LCSS (Longest Common Subsequence), DTW (Dynamic Time Warping), Fréchet distance, OWD (One Way Distance), LIP (Locality In-between Polylines), and the like.
Referring to fig. 3, a flowchart of a similar object determination method according to another embodiment of the present application is shown. The method can include the following steps (301-304).
Step 301 and step 302 are the same as step 201 and step 202 in the embodiment of fig. 2; for details, refer to the embodiment of fig. 2, which are not repeated here.
Step 303, obtaining a global feature vector of the first object based on the association relationship between the plurality of local feature vectors of the first object.
In an embodiment of the present application, after obtaining the plurality of local feature vectors, the computer device obtains a global feature vector of the first object based on an association relationship between the plurality of local feature vectors of the first object. Wherein the global feature vector is used to characterize global condition features and overall health status of the first object.
Optionally, in this embodiment of the present application, the computer device obtains, through the global feature coding network, the global feature vector of the first object based on an association relationship between the plurality of local feature vectors of the first object.
Step 304, the downstream task is executed according to the global feature vector to obtain a task output result of the first object.
In the embodiment of the application, after obtaining the global feature vector, the computer device executes a downstream task according to the global feature vector to obtain a task output result of the first object. The downstream task may be any task; for example, the downstream task may be a disease condition identification task, a recovery probability prediction task, a death probability prediction task, a task of determining whether a specific disease condition is present, and the like, which is not limited in the embodiment of the present application.
Optionally, in this embodiment of the present application, the computer device executes a downstream task according to the global feature vector through the task result output network, so as to obtain a task output result of the first object. Wherein, different downstream tasks correspond to different task result output networks.
Of course, in an exemplary embodiment, if desired, the computer device may also perform the downstream tasks according to the local feature vector and the global feature vector; alternatively, the downstream task is performed based on the feature vector of the first object (including the global feature vector and/or the local feature vector), and the feature vector of the second object (including the global feature vector and/or the local feature vector).
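The data flow from local feature vectors to a global feature vector and then to a task output can be sketched as follows. This is a stand-in under loud assumptions: the embodiment uses trained networks whose architectures are not specified here, so an untrained dot-product attention followed by averaging replaces the global feature coding network, and a logistic function over a weighted sum replaces the task result output network. All function names and parameters are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_feature_vector(local_vectors):
    """Assumed stand-in: let each local vector attend to all others, then average."""
    dim = len(local_vectors[0])
    attended = []
    for query in local_vectors:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in local_vectors])
        attended.append(
            [sum(w * vec[i] for w, vec in zip(weights, local_vectors)) for i in range(dim)]
        )
    return [sum(vec[i] for vec in attended) / len(attended) for i in range(dim)]


def task_output(global_vector, weights, bias=0.0):
    """Assumed stand-in for a task head that outputs a probability."""
    return 1.0 / (1.0 + math.exp(-(dot(global_vector, weights) + bias)))


local_vectors = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
global_vector = global_feature_vector(local_vectors)
probability = task_output(global_vector, weights=[0.0, 0.0])  # 0.5 with zero weights
```

A different downstream task would use a different task head on the same global feature vector, which mirrors the statement that different downstream tasks correspond to different task result output networks.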
Optionally, in this embodiment of the application, after obtaining the task output result, the computer device may display the task output result in a user interface of an application. Illustratively, as shown in fig. 4, the user interface 40 includes a basic information input section 41 and a medical record information input section 42. The computer device acquires the identification information of the first object through the basic information input section 41, and acquires the medical record data of the first object through the medical record information input section 42; further, the computer device executes the steps described in the above embodiments of fig. 2 and fig. 3 according to the medical record data of the first object, obtains the task output result corresponding to the downstream task, and displays the task output result in the output result display section 43. Illustratively, in fig. 4, the downstream tasks include a disease condition identification task 44, a recovery probability prediction task 45, and a death probability prediction task 46.
In summary, in the technical solution provided by the embodiment of the present application, the global feature vector is determined through the association relationship between the local feature vectors, so that the global feature vector can accurately represent the global disease condition features and the overall health status of the first object, and the accuracy of the task output result of the subsequent downstream task is improved; moreover, because the global feature vector can represent the overall health status of the first object, mutual interference in downstream task execution between global feature vectors in a healthy state and global feature vectors in an unhealthy state is avoided.
In addition, the downstream task is not limited in the application, that is, the global feature vector can be flexibly applied to various downstream tasks.
Optionally, in the present application, a medical record data processing model is used to obtain the trajectory similarity and the task output result according to the medical record data. The medical record data processing model includes: a local feature coding network, a global feature coding network and a task result output network. Next, a method for training the medical record data processing model in the present application will be described.
Referring to fig. 5, a flowchart of a method for training a medical record data processing model according to an embodiment of the present application is shown. The method can include the following steps (501-506).
Step 501, a training sample of the medical record data processing model is acquired.
The training sample is used for training the medical record data processing model. In the embodiment of the application, before the medical record data processing model is trained, the computer device acquires a training sample of the medical record data processing model.
The training sample comprises medical record data of a sample object and a task result label, the medical record data comprises medical record information of a plurality of recording moments, and the task result label is a standard task output result of a downstream task.
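The content of a training sample described above can be sketched as a small data structure. All class names, field names and example values below are hypothetical and are chosen only to mirror the description (medical record information at several recording times plus one task result label); the label is shown as a single number for a probability-style downstream task, which is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MedicalRecordEntry:
    """Medical record information at one recording time (hypothetical layout)."""
    recorded_at: datetime
    condition_description: str                       # disease description data
    examination_results: dict = field(default_factory=dict)  # examination assay data


@dataclass
class TrainingSample:
    """Medical record data of one sample object plus its task result label."""
    object_id: str
    records: list
    task_result_label: float


sample = TrainingSample(
    object_id="sample-001",
    records=[
        MedicalRecordEntry(datetime(2021, 9, 1, 9, 0), "cough, fever", {"temperature": 38.5}),
        MedicalRecordEntry(datetime(2021, 9, 3, 9, 0), "cough", {"temperature": 37.0}),
    ],
    task_result_label=1.0,
)
```

Each entry carries both kinds of data from which the disease characteristics and the state characteristics are later extracted.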
Step 502, the medical record data of the sample object is encoded by using a local feature encoding network, and a plurality of local feature vectors of the sample object are obtained.
In the embodiment of the application, after the computer device obtains the training sample, the computer device encodes medical record data of the sample object in the training sample by using a local feature coding network to obtain a plurality of local feature vectors of the sample object. Wherein one local feature vector is used for characterizing the disease features and the health status of the sample object within a time period.
In an exemplary embodiment, the step 502 includes the following steps.
5021. Feature extraction is performed on the medical record information at a plurality of recording times contained in the medical record data of the sample object to obtain sample feature data at the plurality of recording times.
In the embodiment of the present application, after acquiring medical record data of the sample object, a computer device performs feature extraction on medical record information at a plurality of recording times included in the medical record data to obtain sample feature data at the plurality of recording times.
Optionally, in this embodiment of the present application, the medical record information includes sample disease description data for describing a disease, and sample examination assay data for describing an examination result, and when performing feature extraction, the computer device needs to perform feature extraction on the sample disease description data and the sample examination assay data, respectively.
Taking a target recording time among the plurality of recording times as an example, for the medical record information of the target recording time, the computer device extracts at least one sample disease feature corresponding to the target recording time from the sample disease description data contained in that medical record information, and extracts at least one sample state feature corresponding to the target recording time from the sample examination assay data contained in that medical record information, where the sample state feature is used for describing the health state of the sample object. The sample feature data of the target recording time includes: the at least one sample disease feature corresponding to the target recording time and the at least one sample state feature corresponding to the target recording time.
5022. Group the sample feature data at the plurality of recording moments according to the recording moments of the sample feature data, to obtain a plurality of sample feature data groups.
In the embodiment of the application, after the computer device obtains the sample feature data, the computer device performs grouping processing on the sample feature data at a plurality of recording moments according to the recording moments of the sample feature data to obtain a plurality of sample feature data sets. Wherein, at least one sample characteristic data is included in one sample characteristic data group.
Optionally, in this embodiment of the present application, when performing grouping processing, the computer device sorts the sample feature data in order of recording time from earliest to latest to obtain a sample feature data sequence, and then groups the sample feature data sequence according to a grouping duration to obtain a plurality of sample feature data groups. The grouping duration may be any value, which is not limited in the embodiment of the present application; for example, the grouping duration may be 3 hours, 1 day, 1 week, or 1 month.
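The sorting and grouping just described can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function name `group_by_duration`, the `(recording_time, features)` tuple layout, and the choice to anchor the windows at the earliest record are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def group_by_duration(records, duration):
    """Sort (recording_time, features) pairs from earliest to latest and
    split them into consecutive windows of length `duration`.

    Windows are anchored at the earliest recording time (an assumption);
    windows that contain no record are skipped, so every returned group
    holds at least one piece of feature data.
    """
    ordered = sorted(records, key=lambda r: r[0])
    if not ordered:
        return []
    t0 = ordered[0][0]
    groups = {}
    for ts, feats in ordered:
        idx = (ts - t0) // duration  # timedelta // timedelta yields an int
        groups.setdefault(idx, []).append((ts, feats))
    return [groups[k] for k in sorted(groups)]


# Hypothetical records, deliberately out of order.
records = [
    (datetime(2021, 1, 3, 9), "c"),
    (datetime(2021, 1, 1, 8), "a"),
    (datetime(2021, 1, 1, 20), "b"),
]
groups = group_by_duration(records, timedelta(days=1))
```

With a grouping duration of 1 day, the first two records fall into one group and the third into another; each group would then be encoded into one local feature vector.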
5023. Encode each sample feature data group separately by using the local feature coding network to obtain a plurality of local feature vectors of the sample object, where each sample feature data group corresponds to one local feature vector.
In this embodiment, after obtaining the sample feature data groups, the computer device encodes each sample feature data group by using the local feature coding network, to obtain the plurality of local feature vectors of the sample object.
Optionally, in this embodiment of the application, before the sample feature data group is encoded, or after the sample feature data is acquired, in a case that the sample feature data includes numerical data, normalization processing is performed on the numerical data to obtain processed sample feature data, and in a case that the sample feature data includes text data, numerical quantization processing is performed on the text data to obtain processed sample feature data. Wherein, the processed sample characteristic data is used for encoding.
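The preprocessing described above can be illustrated with a short sketch. The text does not specify which normalization or numerical quantization scheme is used, so min-max scaling and a simple vocabulary index are assumed here purely for illustration; the function names are hypothetical.

```python
def min_max_normalize(values):
    """Scale numerical values to [0, 1] (min-max scaling, an assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def quantize_text(tokens, vocab=None):
    """Map text tokens to integer ids, extending the vocabulary as needed."""
    vocab = {} if vocab is None else vocab
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
        ids.append(vocab[tok])
    return ids, vocab


temperatures = min_max_normalize([36.5, 38.5, 37.5])
symptom_ids, vocab = quantize_text(["cough", "fever", "cough"])
```

The processed numerical and text features would then be passed to the local feature coding network for encoding.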
Step 503, obtaining a global feature vector of the sample object based on the association relationship between the plurality of local feature vectors of the sample object by using a global feature coding network.
In this embodiment of the present application, after obtaining the local feature vectors, the computer device uses the global feature coding network to obtain a global feature vector of the sample object based on the association relationships between the plurality of local feature vectors of the sample object. The global feature vector is used to characterize the global condition features and overall health status of the sample object.
Step 504, executing a downstream task according to the global feature vector by adopting a task result output network to obtain a task output result of the sample object.
In the embodiment of the application, after the computer device obtains the global feature vector, the computer device executes a downstream task according to the global feature vector by using a task result output network to obtain a task output result of the sample object. The downstream task may be any task, and different downstream tasks correspond to different task result output networks.
Step 505, determining the training loss of the medical record data processing model based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network.
In the embodiment of the application, after the computer device obtains the task output result, the training loss of the medical record data processing model is determined based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network.
Optionally, the training loss includes a first sub-loss and a second sub-loss. The first sub-loss is used for measuring the accuracy of an output result of the medical record data processing model, and the second sub-loss is used for measuring the difference between local feature vectors of adjacent time periods from the same sample object. In an exemplary embodiment, the step 505 includes the following steps.
5051. Determine a first sub-loss based on the task output result and the task result label of the sample object.
Illustratively, assume that the task output result is $y$ and the task result label is $\hat{y}$. Then the first sub-loss $L_y$ is:

$$L_y = H\big(q(\hat{y}),\, q(y)\big), \qquad y = d\big(g(f(x))\big)$$

where $x$ denotes the medical record data of the sample object, $f(\cdot)$ represents the local feature coding network, $g(\cdot)$ represents the global feature coding network, $d(\cdot)$ represents the task result output network, $H$ represents the cross-entropy function, and $q$ represents a distribution.
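As a hedged illustration of a cross-entropy first sub-loss, the sketch below compares a label distribution with the distribution produced from task output scores. It assumes a classification-style downstream task, a one-hot label, and a softmax over the raw output scores; none of these choices is stated in the text, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw output scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(label_dist, output_dist, eps=1e-12):
    """H(p, q) = -sum_i p_i * log(q_i), with p the label distribution."""
    return -sum(p * math.log(q + eps) for p, q in zip(label_dist, output_dist))


label = [0.0, 1.0]             # one-hot task result label (assumed format)
output = softmax([0.0, 0.0])   # task output scores turned into a distribution
first_sub_loss = cross_entropy(label, output)
```

A smaller value indicates that the task output distribution is closer to the label distribution.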
5052. Determine a second sub-loss based on the parameters corresponding to the local feature coding network.
Optionally, in this embodiment of the application, when the computer device obtains the second sub-loss, the computer device obtains the weight matrices of the networks in each layer in the local feature coding network, determines the maximum singular values corresponding to the weight matrices of the networks in each layer, and further determines the second sub-loss according to the maximum singular values corresponding to the networks in each layer.
Illustratively, suppose the number of network layers in the local feature coding network is $L$ and the weight matrix of the $l$-th layer is $W_l$. Then the second sub-loss $L_s$ is determined from the maximum singular values $\sigma(W_l)$, $l = 1, \dots, L$, where $\sigma(W_l)$ represents the maximum singular value of the $l$-th layer weight matrix in the local feature coding network.
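The following sketch shows one way to compute a second sub-loss from per-layer maximum singular values. Two assumptions are made that the text does not confirm: the maximum singular value is estimated by power iteration, and the per-layer values are combined by a plain sum. The function names are hypothetical.

```python
import math
import random


def max_singular_value(W, iters=200, seed=0):
    """Estimate the largest singular value of W (a list of rows) by power
    iteration on W^T W, starting from a seeded random vector."""
    rng = random.Random(seed)
    rows, cols = len(W), len(W[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return 0.0  # W maps the start vector to zero (e.g. W is all zeros)
        v = [x / norm for x in v]
    u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in u))  # ||W v|| for the unit vector v


def second_sub_loss(layer_weights):
    """Sum of per-layer maximum singular values (the sum is an assumption)."""
    return sum(max_singular_value(W) for W in layer_weights)


# Two hypothetical layers with known largest singular values 4 and 2.
layers = [[[3.0, 0.0], [0.0, 4.0]], [[2.0]]]
loss_s = second_sub_loss(layers)
```

Bounding the maximum singular value of each layer limits how much the encoder can amplify differences between its inputs, which is consistent with the stated purpose of keeping local feature vectors of adjacent time periods close.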
5053. Determine the training loss of the medical record data processing model according to the first sub-loss and the second sub-loss.
In an embodiment of the present application, after obtaining the first sub-loss and the second sub-loss, the computer device determines a training loss of the medical record data processing model according to the first sub-loss and the second sub-loss.
Illustratively, assume that the first sub-loss is $L_y$ and the second sub-loss is $L_s$. Then the training loss $L$ is obtained by combining $L_y$ and $L_s$.
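One common way to combine a task loss with a regularization term is a weighted sum. The form below is given only as an assumed example; the weighting coefficient $\lambda$ is a hypothetical hyperparameter that does not appear in the text.

```latex
% Assumed combination: weighted sum of the two sub-losses.
% \lambda \ge 0 is a hypothetical trade-off coefficient.
L = L_y + \lambda \, L_s
```

Under this assumption, setting $\lambda = 1$ gives the plain sum of the two sub-losses, and larger values of $\lambda$ place more emphasis on keeping the local feature vectors of adjacent time periods close.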
Step 506, training the medical record data processing model according to the training loss.
In the embodiment of the application, after acquiring the training loss, the computer device trains the medical record data processing model according to the training loss.
Optionally, during the model training, the computer device adjusts parameters of the medical record data processing model according to the training loss, and repeats the above steps 501 to 506 based on the medical record data processing model after the parameters are adjusted, until the training loss converges.
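The repeat-until-convergence procedure can be sketched as a generic loop. The convergence test used here (change in loss below a tolerance) and the toy one-parameter model are assumptions for illustration; `step` is a hypothetical callable standing in for one pass through the training steps.

```python
def train_until_converged(step, params, tol=1e-6, max_iters=10_000):
    """Call `step` repeatedly until the loss changes by less than `tol`.

    `step(params)` is a hypothetical callable that performs one training
    pass and returns (updated_params, training_loss).
    """
    previous = float("inf")
    loss = previous
    for _ in range(max_iters):
        params, loss = step(params)
        if abs(previous - loss) < tol:
            break  # the loss has stopped changing: treat as converged
        previous = loss
    return params, loss


def toy_step(w, lr=0.1):
    """Toy stand-in for the model: one parameter w with loss (w - 3)^2."""
    loss = (w - 3.0) ** 2
    grad = 2.0 * (w - 3.0)
    return w - lr * grad, loss


w, final_loss = train_until_converged(toy_step, 0.0)
```

In the toy run, `w` approaches 3 and the loop stops once successive losses differ by less than the tolerance.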
In summary, in the technical solution provided in the embodiment of the present application, the local feature vector, the global feature vector, and the task output result are obtained according to medical record data through the medical record data processing model, on one hand, the local feature vector is used for representing the disease features of the object in a period of time, and the medical record data of the object is converted into the disease features in a plurality of periods of time, so that the global feature vector obtained according to the association relationship between the local feature vectors is more accurate; on the other hand, the local feature vector is also used for representing the health state of the object in a period of time, the global feature vector can be used for representing the overall health state of the object, the mutual influence between the local feature vector in the health state and the local feature vector in the non-health state in the subsequent processing process of the task result output network is avoided, and the accuracy of the task output result is improved.
It should be noted that, with regard to some details of the medical record data processing model, reference may be made to the description of the embodiment of fig. 2 and 3.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 6, a block diagram of a similar object determining apparatus according to an embodiment of the present application is shown. The device has the function of realizing the similar object determination method, and the function can be realized by hardware or by hardware executing corresponding software. The device can be computer equipment, and can also be arranged in the computer equipment. The apparatus 600 may include: a medical record data acquisition module 610, a local vector acquisition module 620, a trajectory similarity determination module 630 and a similar object determination module 640.
The medical record data acquiring module 610 is configured to acquire medical record data of a first object, where the medical record data includes medical record information at a plurality of recording moments.
The local vector obtaining module 620 is configured to obtain a plurality of local feature vectors of the first object based on the medical record data; wherein one local feature vector is used to characterize the condition features and health status of the first subject over a period of time.
The trajectory similarity determining module 630 is configured to determine a trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object; wherein the trajectory similarity is used for representing the similarity of the two groups of local feature vectors in time sequence.
The similar object determination module 640 is configured to determine that the first object and the second object have similar symptoms if the trajectory similarity satisfies a condition.
In an exemplary embodiment, as shown in fig. 7, the local vector acquisition module 620 includes: a feature extraction sub-module 621, a data grouping sub-module 622, and a vector acquisition sub-module 623.
The feature extraction sub-module 621 is configured to perform feature extraction on medical record information at the multiple recording moments included in the medical record data to obtain feature data at the multiple recording moments.
The data grouping sub-module 622 is configured to perform grouping processing on the feature data at the multiple recording moments according to the recording moments of the feature data to obtain multiple feature data groups.
The vector obtaining sub-module 623 is configured to encode each feature data set by using a local feature coding network, so as to obtain a plurality of local feature vectors of the first object, where each feature data set corresponds to one local feature vector.
In an exemplary embodiment, the medical record information includes condition description data for describing a condition, and examination assay data for describing an examination result. As shown in fig. 7, the feature extraction sub-module 621 is configured to:
for medical record information at a target recording time, extracting at least one condition feature corresponding to the target recording time from the condition description data contained in the medical record information at the target recording time;
and extracting at least one state feature corresponding to the target recording time from the examination and assay data contained in the medical record information of the target recording time, wherein the state feature is used for describing the health state of the first object.
In an exemplary embodiment, as shown in fig. 7, the data grouping sub-module 622 is configured to:
sorting all the feature data in order of recording time from earliest to latest to obtain a feature data sequence;
and grouping the characteristic data sequence according to the grouping duration to obtain a plurality of characteristic data groups.
In an exemplary embodiment, as shown in fig. 7, the apparatus 600 further includes: a data processing module 650.
The data processing module 650 is configured to, when the feature data includes numerical data, perform normalization processing on the numerical data to obtain processed feature data.
The data processing module 650 is further configured to perform numerical quantization processing on the text data to obtain processed feature data when the feature data includes text data; wherein the processed feature data is used for encoding.
In an exemplary embodiment, the number of local feature vectors of the first object is n, the number of local feature vectors of the second object is m, and n and m are integers greater than 1. As shown in fig. 7, the trajectory similarity determining module 630 includes: a distance acquisition sub-module 631 and a similarity determination sub-module 632.
The distance obtaining sub-module 631 is configured to obtain, when n and m are not equal, a distance between each local feature vector of the first object and each local feature vector of the second object, so as to obtain n × m distances.
The similarity determination submodule 632 is configured to determine the trajectory similarity according to the n × m distances.
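The n × m distances can be arranged as a matrix with one row per local feature vector of the first object and one column per local feature vector of the second object. The sketch below assumes Euclidean distance, which this passage does not specify; the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(first, second):
    """Return an n x m matrix D with D[i][j] = distance(first[i], second[j])."""
    return [[euclidean(u, v) for v in second] for u in first]


first_vectors = [[0.0, 0.0], [3.0, 4.0]]                 # n = 2
second_vectors = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]    # m = 3
D = pairwise_distances(first_vectors, second_vectors)
```

The resulting matrix is the input from which the trajectory similarity is then determined.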
In an exemplary embodiment, as shown in fig. 7, the similarity determination submodule 632 is configured to:
acquiring the recording times of the feature data respectively corresponding to the n local feature vectors of the first object, and the recording times of the feature data respectively corresponding to the m local feature vectors of the second object;
determining a time sequence corresponding to the local feature vectors of the first object and a time sequence corresponding to the local feature vectors of the second object, in order of recording time from earliest to latest;
traversing the n local feature vectors of the first object and the m local feature vectors of the second object, with the two time sequences as a reference, to obtain a shortest traversal feature path between the n local feature vectors of the first object and the m local feature vectors of the second object; wherein the shortest traversal feature path is the traversal path that passes once through the n local feature vectors of the first object and the m local feature vectors of the second object with the smallest sum of distances;
and determining the trajectory similarity according to the sum of distances corresponding to the shortest traversal feature path.
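The text does not name the algorithm used to find the shortest traversal feature path. Dynamic time warping (DTW) is one standard technique that matches the description (a monotonic traversal of two time-ordered sequences that minimizes the summed distance) and is assumed in the sketch below; the function names and the absolute-difference distance are likewise illustrative.

```python
def dtw_distance(first, second, dist):
    """Smallest summed distance over monotonic alignments of two sequences.

    `first` and `second` are lists of local feature vectors already sorted
    from earliest to latest; `dist` is any distance function on two vectors.
    """
    n, m = len(first), len(second)
    inf = float("inf")
    # cost[i][j]: best summed distance aligning first[:i] with second[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(first[i - 1], second[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in the first sequence only
                cost[i][j - 1],      # advance in the second sequence only
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


def abs_diff(a, b):
    """Sum of absolute coordinate differences (illustrative distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


path_cost = dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [2.0]], abs_diff)
```

A smaller path cost corresponds to a higher trajectory similarity; how the cost is mapped to a similarity score is left open here.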
In an exemplary embodiment, as shown in fig. 7, the trajectory similarity determination module 630 is configured to:
under the condition that n and m are equal, acquiring the recording times of the feature data respectively corresponding to the n local feature vectors of the first object, and the recording times of the feature data respectively corresponding to the m local feature vectors of the second object;
determining a time sequence corresponding to the local feature vectors of the first object and a time sequence corresponding to the local feature vectors of the second object, in order of recording time from earliest to latest;
determining the correspondence between the n local feature vectors of the first object and the m local feature vectors of the second object according to the time sequence corresponding to the local feature vectors of the first object and the time sequence corresponding to the local feature vectors of the second object;
obtaining the distance between each pair of local feature vectors having the correspondence;
and determining the trajectory similarity according to the sum of the distances between the pairs of local feature vectors having the correspondence.
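When both objects have the same number of local feature vectors, the steps above reduce to sorting each sequence by recording time, pairing vectors by position, and summing the pairwise distances. A minimal sketch follows, assuming `(recording_time, vector)` tuples and a sum-of-absolute-differences distance; both are illustrative choices and the function names are hypothetical.

```python
def l1(a, b):
    """Sum of absolute coordinate differences (illustrative distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def aligned_distance_sum(first, second, dist):
    """Pair time-sorted local feature vectors by position and sum distances.

    `first` and `second` are lists of (recording_time, vector) tuples of
    equal length.
    """
    if len(first) != len(second):
        raise ValueError("both objects need the same number of local feature vectors")
    first_sorted = sorted(first, key=lambda item: item[0])
    second_sorted = sorted(second, key=lambda item: item[0])
    return sum(dist(u, v) for (_, u), (_, v) in zip(first_sorted, second_sorted))


first = [(2, [1.0]), (1, [0.0])]       # deliberately out of order
second = [(10, [0.5]), (20, [3.0])]
total = aligned_distance_sum(first, second, l1)
```

Here the earliest vector of the first object is paired with the earliest vector of the second object, and so on, regardless of the absolute recording times.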
In an exemplary embodiment, as shown in fig. 7, the apparatus 600 further includes: a second object selection module 660.
The medical record data acquiring module 610 is further configured to acquire medical record data of a plurality of candidate objects.
The second object selection module 660 is configured to select, as the second object, a candidate object that satisfies a matching condition with the medical record data of the first object from the multiple candidate objects according to medical record data respectively corresponding to the candidate objects; wherein the matching condition comprises at least one of: the recording time of the medical record information is close, the medical record information comprises similar disease description data, the medical record information comprises disease description data aiming at the target disease, and the number of the recording time contained in the medical record information is the same.
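Candidate selection can be sketched as a simple filter. The example below applies two of the listed matching conditions together (the same number of recording moments, and condition description data that mentions a target condition). The dictionary layout, field names, substring matching, and the decision to require both conditions are assumptions made for illustration.

```python
def select_second_objects(first, candidates, target_condition):
    """Return ids of candidates that match the first object.

    A candidate matches when it has the same number of recording moments
    as the first object and at least one of its condition descriptions
    mentions `target_condition` (both criteria are illustrative).
    """
    selected = []
    for cand in candidates:
        same_count = len(cand["records"]) == len(first["records"])
        has_target = any(
            target_condition in rec["description"] for rec in cand["records"]
        )
        if same_count and has_target:
            selected.append(cand["id"])
    return selected


first_object = {
    "id": "p0",
    "records": [{"description": "fever, cough"}, {"description": "cough"}],
}
candidates = [
    {"id": "p1", "records": [{"description": "cough"}, {"description": "headache"}]},
    {"id": "p2", "records": [{"description": "cough"}]},
    {"id": "p3", "records": [{"description": "rash"}, {"description": "rash"}]},
]
matches = select_second_objects(first_object, candidates, "cough")
```

Pre-filtering candidates in this way reduces the number of trajectory similarity computations that need to be performed.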
In an exemplary embodiment, as shown in fig. 7, the apparatus 600 further includes: a global vector acquisition module 670 and a task result acquisition module 680.
The global vector obtaining module 670 is configured to obtain a global feature vector of the first object based on an association relationship between a plurality of local feature vectors of the first object, where the global feature vector is used to characterize a global disorder feature and an overall health status of the first object.
The task result obtaining module 680 is configured to execute a downstream task according to the global feature vector to obtain a task output result of the first object.
In summary, in the technical solution provided in the embodiment of the present application, a plurality of local feature vectors of a first object are obtained from medical record data, and the trajectory similarity, that is, the degree of similarity in time sequence between the local feature vectors of the first object and those of a second object, is then determined from the two groups of local feature vectors. Because each local feature vector characterizes the condition features of the first object within one time period, the medical record data of the first object is converted into condition features over multiple time periods, so the local feature vectors reflect the condition features of the first object at a finer granularity, and the result is finer and more accurate when the condition similarity between the first object and the second object is subsequently measured according to the trajectory similarity. Moreover, each local feature vector also characterizes the health state of the first object within a time period, which further improves the accuracy of the condition-similarity measurement; and because the local feature vectors can characterize the health state of the first object, mutual influence between local feature vectors in a healthy state and local feature vectors in an unhealthy state is avoided in subsequent processing, which improves the accuracy of the subsequent processing results.
Referring to fig. 8, a block diagram of a medical record data processing model training apparatus according to an embodiment of the present application is shown. The device has the function of realizing the medical record data processing model training method, and the function can be realized by hardware or by hardware executing corresponding software. The device can be computer equipment, and can also be arranged in the computer equipment. The apparatus 800 may include: a training sample acquisition module 810, a local feature acquisition module 820, a global feature acquisition module 830, a downstream task execution module 840, a training loss determination module 850, and a target model training module 860.
The training sample acquisition module 810 is configured to acquire a training sample of a medical record data processing model, where the training sample includes medical record data of a sample object and a task result label, and the medical record data includes medical record information at a plurality of recording moments; wherein the medical record data processing model comprises: a local feature coding network, a global feature coding network and a task result output network.
The local feature obtaining module 820 is configured to encode the medical record data of the sample object by using the local feature encoding network to obtain a plurality of local feature vectors of the sample object; wherein one local feature vector is used to characterize the condition features and health status of the sample object over a time period.
The global feature obtaining module 830 is configured to obtain a global feature vector of the sample object based on an association relationship between a plurality of local feature vectors of the sample object by using the global feature coding network, where the global feature vector is used to characterize a global disorder feature and an overall health status of the sample object.
The downstream task execution module 840 is configured to execute a downstream task according to the global feature vector by using the task result output network, so as to obtain a task output result of the sample object.
The training loss determining module 850 is configured to determine a training loss of the medical record data processing model based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network.
The target model training module 860 is configured to train the medical record data processing model according to the training loss.
In an exemplary embodiment, as shown in fig. 9, the training loss determination module 850 includes: a loss determination submodule 851.
The loss determining sub-module 851 is configured to determine a first sub-loss based on the task output result and the task result label of the sample object, where the first sub-loss is used to measure the accuracy of the output result of the medical record data processing model.
The loss determining sub-module 851 is further configured to determine a second sub-loss based on the parameter corresponding to the local feature coding network, where the second sub-loss is used to measure a difference between local feature vectors from adjacent time periods of the same sample object.
The loss determining submodule 851 is further configured to determine a training loss of the medical record data processing model according to the first sub-loss and the second sub-loss.
In an exemplary embodiment, as shown in fig. 9, the loss determination sub-module 851 is configured to:
acquiring a weight matrix of each layer of network in the local feature coding network;
determining maximum singular values respectively corresponding to the weight matrixes of the networks in each layer;
and determining the second sub-loss according to the maximum singular values respectively corresponding to the networks of each layer.
In an exemplary embodiment, as shown in fig. 9, the local feature obtaining module 820 includes: a feature extraction sub-module 821, a data grouping sub-module 822, and a vector acquisition sub-module 823.
The feature extraction sub-module 821 is configured to perform feature extraction on the medical record information of the plurality of recording moments included in the medical record data of the sample object to obtain sample feature data of the plurality of recording moments.
The data grouping submodule 822 is configured to perform grouping processing on the sample feature data at the multiple recording moments according to the recording moments of the sample feature data, so as to obtain multiple sample feature data groups.
The vector obtaining sub-module 823 is configured to encode each sample feature data set by using a local feature coding network, to obtain a plurality of local feature vectors of the sample object, where each sample feature data set corresponds to one local feature vector.
In an exemplary embodiment, the medical record information includes sample condition description data for describing a condition, and sample examination assay data for describing an examination result. As shown in fig. 9, the feature extraction sub-module 821 is configured to:
for medical record information at a target recording time, extracting at least one sample disease characteristic corresponding to the target recording time from sample disease description data contained in the medical record information at the target recording time;
extracting at least one sample state characteristic corresponding to the target recording time from sample examination and assay data contained in medical record information of the target recording time, wherein the sample state characteristic is used for describing the health state of the sample object; wherein the sample characteristic data of the target recording time comprises: the at least one sample condition characteristic corresponding to the target recording time and the at least one sample state characteristic corresponding to the target recording time.
In summary, in the technical solution provided in the embodiment of the present application, the local feature vector, the global feature vector, and the task output result are obtained according to medical record data through the medical record data processing model, on one hand, the local feature vector is used for representing the disease features of the object in a period of time, and the medical record data of the object is converted into the disease features in a plurality of periods of time, so that the global feature vector obtained according to the association relationship between the local feature vectors is more accurate; on the other hand, the local feature vector is also used for representing the health state of the object in a period of time, the global feature vector can be used for representing the overall health state of the object, the mutual influence between the local feature vector in the health state and the local feature vector in the non-health state in the subsequent processing process of the task result output network is avoided, and the accuracy of the task output result is improved.
It should be noted that, when the apparatus provided in the foregoing embodiment implements its functions, the division into the functional modules described above is only an example; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided above belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not described herein again.
Referring to fig. 10, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device can be used for realizing the functions of the similar object determination method or the training method of the medical record data processing model. Specifically:
The computer device 1000 includes a Central Processing Unit (CPU) 1001, a system memory 1004 including a Random Access Memory (RAM) 1002 and a Read Only Memory (ROM) 1003, and a system bus 1005 connecting the system memory 1004 and the central processing unit 1001. The computer device 1000 also includes a basic Input/Output system (I/O system) 1006, which helps to transfer information between various devices within the computer, and a mass storage device 1007, which stores an operating system 1013, application programs 1014, and other program modules 1015.
The basic input/output system 1006 includes a display 1008 for displaying information and an input device 1009, such as a mouse, keyboard, etc., for user input of information. Wherein a display 1008 and an input device 1009 are connected to the central processing unit 1001 via an input-output controller 1010 connected to the system bus 1005. The basic input/output system 1006 may also include an input/output controller 1010 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, the input-output controller 1010 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 1007 is connected to the central processing unit 1001 through a mass storage controller (not shown) connected to the system bus 1005. The mass storage device 1007 and its associated computer-readable media provide non-volatile storage for the computer device 1000. That is, the mass storage device 1007 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash Memory or other solid state Memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 1004 and mass storage device 1007 described above may be collectively referred to as memory.
According to various embodiments of the present application, the computer device 1000 may also run by being connected, through a network such as the Internet, to a remote computer on the network. That is, the computer device 1000 may be connected to the network 1012 through the network interface unit 1011 connected to the system bus 1005, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 1011.
The memory also includes a computer program stored in the memory and configured to be executed by the one or more processors to implement the similar object determination method described above or to implement the training method of the medical record data processing model described above.
In an exemplary embodiment, a computer readable storage medium is further provided, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, which when executed by a processor, implements the above similar object determination method, or implements the above training method of medical record data processing model.
Optionally, the computer-readable storage medium may include: ROM (Read-Only Memory), RAM (Random Access Memory), SSD (Solid State Drive), or optical disc. The random access memory may include ReRAM (Resistive Random Access Memory) and DRAM (Dynamic Random Access Memory).
In an exemplary embodiment, a computer program product or computer program is also provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the similar object determination method or the medical record data processing model training method.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. In addition, the step numbers described herein only show one possible execution order of the steps by way of example. In some other embodiments, the steps may be executed out of the numbered order; for example, two steps with different numbers may be executed simultaneously, or in the reverse of the order shown in the figure, which is not limited by the embodiments of the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (17)
1. A method for similar object determination, the method comprising:
acquiring medical record data of a first object, wherein the medical record data comprises medical record information at a plurality of recording moments, and the medical record information comprises disease description data for describing diseases and examination test data for describing examination results;
extracting the features of the medical record information at the plurality of recording moments contained in the medical record data to obtain the feature data at the plurality of recording moments;
sorting the feature data in order of recording time, from earliest to latest, to obtain a feature data sequence;
grouping the feature data sequence according to a grouping duration to obtain a plurality of feature data groups;
encoding each feature data group separately to obtain a plurality of local feature vectors of the first object; wherein each feature data group corresponds to one local feature vector, the local feature vector characterizes the disease features and health status of the first object within a time period, the disease features are derived from the disease description data, and the health status is derived from the examination test data;
determining a trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object; the trajectory similarity is used for representing the similarity of the two groups of local feature vectors in time sequence;
determining that the first object and the second object have similar symptoms if the trajectory similarity satisfies a condition.
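The sorting and grouping steps of claim 1 can be sketched roughly as follows. This is a minimal illustration only: the function name `group_feature_data`, the `(timestamp, features)` record shape, and the choice to start each time window at the first record that falls outside the previous window are all assumptions, since the claim does not say how window boundaries are chosen.

```python
from datetime import datetime, timedelta

def group_feature_data(records, grouping_duration):
    """Sort (timestamp, features) pairs and split them into consecutive time windows."""
    ordered = sorted(records, key=lambda r: r[0])  # earliest to latest
    groups = []
    window_start = None
    for timestamp, features in ordered:
        # Assumption: a new group opens when a record falls outside the current window.
        if window_start is None or timestamp - window_start >= grouping_duration:
            groups.append([])
            window_start = timestamp
        groups[-1].append((timestamp, features))
    return groups

# Hypothetical feature data at three recording moments, given out of order.
records = [
    (datetime(2021, 3, 9), {"fever": 1}),
    (datetime(2021, 1, 5), {"fever": 0}),
    (datetime(2021, 1, 20), {"fever": 1}),
]
groups = group_feature_data(records, timedelta(days=30))
print([len(g) for g in groups])  # [2, 1]
```

Each resulting group would then be passed to an encoder to produce one local feature vector; the encoder itself is not sketched here.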
2. The method of claim 1, wherein said encoding each of said feature data sets to obtain a plurality of local feature vectors of said first object comprises:
encoding each feature data group separately by using a local feature coding network to obtain the plurality of local feature vectors of the first object.
3. The method according to claim 1, wherein the extracting features of the medical record information at the plurality of recording moments included in the medical record data to obtain the feature data at the plurality of recording moments comprises:
for medical record information at a target recording time, extracting at least one disease characteristic corresponding to the target recording time from the disease description data contained in the medical record information at the target recording time;
extracting at least one state characteristic corresponding to the target recording time from the examination test data contained in the medical record information at the target recording time, wherein the state characteristic is used for describing the health status of the first object;
wherein, the characteristic data of the target recording time comprises: at least one disease characteristic corresponding to the target recording time and at least one state characteristic corresponding to the target recording time.
4. The method of claim 1, wherein before encoding each of the feature data sets to obtain a plurality of local feature vectors of the first object, the method further comprises:
in a case where the feature data comprise numerical data, normalizing the numerical data to obtain processed feature data;
in a case where the feature data comprise text data, performing numerical quantization on the text data to obtain processed feature data;
wherein the processed feature data is used for encoding.
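As an illustration of the preprocessing in claim 4, the sketch below uses min-max scaling for numerical data and a first-appearance integer index for text data. Both choices, and the names `min_max_normalize` and `quantize_text`, are assumptions; the claim does not name a specific normalization or quantization scheme.

```python
def min_max_normalize(values):
    """Scale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def quantize_text(labels):
    """Map each distinct text label to an integer index in order of first appearance."""
    vocab = {}
    codes = []
    for label in labels:
        if label not in vocab:
            vocab[label] = len(vocab)
        codes.append(vocab[label])
    return codes, vocab

# Hypothetical body temperatures and symptom labels.
temps = min_max_normalize([36.5, 38.5, 37.5])
codes, vocab = quantize_text(["cough", "fever", "cough"])
print(temps)  # [0.0, 1.0, 0.5]
print(codes)  # [0, 1, 0]
```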
5. The method of claim 1, wherein the number of local feature vectors of the first object is n, the number of local feature vectors of the second object is m, and n and m are both integers greater than 1;
the determining of the trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object comprises:
under the condition that n and m are not equal, acquiring the distance between each local feature vector of the first object and each local feature vector of the second object to obtain n × m distances;
and determining the trajectory similarity according to the n × m distances.
6. The method of claim 5, wherein determining the trajectory similarity from the n × m distances comprises:
acquiring the recording times of the feature data corresponding to the n local feature vectors of the first object respectively, and the recording times of the feature data corresponding to the m local feature vectors of the second object respectively;
determining a time sequence corresponding to the local feature vectors of the first object and a time sequence corresponding to the local feature vectors of the second object, based on the order of the recording times from earliest to latest;
traversing the n local feature vectors of the first object and the m local feature vectors of the second object, taking the two time sequences as a reference, to obtain a shortest traversal feature path between the n local feature vectors of the first object and the m local feature vectors of the second object; wherein the shortest traversal feature path is the traversal path that has the smallest sum of distances among the paths traversing the n local feature vectors of the first object and the m local feature vectors of the second object once;
and determining the trajectory similarity according to the sum of the distances corresponding to the shortest traversal feature path.
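One plausible reading of the shortest traversal feature path in claim 6 is a dynamic-time-warping style alignment over the n × m distance table. The sketch below follows that reading as an assumption; the claims do not name the algorithm, and the Euclidean distance and the function names are likewise illustrative.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shortest_traversal_distance(first, second):
    """Smallest summed distance over monotonic alignments of two time-ordered vector lists."""
    n, m = len(first), len(second)
    # dist[i][j] holds the n x m pairwise distances.
    dist = [[euclidean(first[i], second[j]) for j in range(m)] for i in range(n)]
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three admissible predecessor alignments.
            best_prev = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist[i - 1][j - 1] + best_prev
    return cost[n][m]

# Hypothetical local feature vectors, already in time order (n = 3, m = 2).
first = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
second = [[0.0, 0.0], [2.0, 0.0]]
print(shortest_traversal_distance(first, second))  # 1.0
```

A smaller sum indicates more similar trajectories; how the sum is mapped to a similarity score, and what threshold counts as satisfying the condition, is left open here.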
7. The method of claim 5, wherein determining a trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object comprises:
under the condition that n and m are equal, acquiring the recording times of the feature data corresponding to the n local feature vectors of the first object respectively, and the recording times of the feature data corresponding to the m local feature vectors of the second object respectively;
determining a time sequence corresponding to the local feature vectors of the first object and a time sequence corresponding to the local feature vectors of the second object, based on the order of the recording times from earliest to latest;
determining the correspondence between the n local feature vectors of the first object and the m local feature vectors of the second object according to the time sequence corresponding to the local feature vectors of the first object and the time sequence corresponding to the local feature vectors of the second object;
obtaining the distance between the local feature vectors that correspond to each other;
and determining the trajectory similarity according to the sum of the distances between the local feature vectors that correspond to each other.
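For the equal-length case of claim 7, a minimal sketch is to pair vectors by their position in time order and sum the pairwise distances. Euclidean distance (via `math.dist`, Python 3.8+) and the name `aligned_distance_sum` are assumptions made for illustration.

```python
import math

def aligned_distance_sum(first, second):
    """Sum the distances between vectors that share the same position in time order."""
    if len(first) != len(second):
        raise ValueError("both objects need the same number of local feature vectors")
    return sum(math.dist(a, b) for a, b in zip(first, second))

# Hypothetical local feature vectors, already sorted from earliest to latest.
first = [[0.0, 0.0], [3.0, 4.0]]
second = [[0.0, 0.0], [0.0, 0.0]]
total = aligned_distance_sum(first, second)
print(total)  # 5.0
```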
8. The method of claim 1, further comprising:
acquiring medical record data of a plurality of candidate objects;
selecting a candidate object which meets a matching condition with the medical record data of the first object from the plurality of candidate objects as the second object according to the medical record data corresponding to each candidate object;
wherein the matching condition comprises at least one of the following: the recording times of the medical record information are close; the medical record information comprises similar disease description data; the medical record information comprises disease description data for a target disease; and the medical record data contains the same number of recording moments.
9. The method of any of claims 1 to 8, wherein after obtaining the plurality of local feature vectors of the first object based on the medical record data, further comprising:
obtaining a global feature vector of the first object based on an association relationship among the plurality of local feature vectors of the first object, wherein the global feature vector is used for representing the global disease features and the overall health status of the first object;
and executing a downstream task according to the global feature vector to obtain a task output result of the first object.
10. A method for training a medical record data processing model, the method comprising:
acquiring a training sample of a medical record data processing model, wherein the training sample comprises medical record data of a sample object and a task result label, and the medical record data comprises medical record information at a plurality of recording moments; wherein the medical record data processing model comprises a local feature coding network, a global feature coding network and a task result output network; and the medical record information comprises disease description data for describing diseases and examination test data for describing examination results;
performing feature extraction on the medical record information at the plurality of recording moments contained in the medical record data of the sample object by using the local feature coding network to obtain sample feature data at the plurality of recording moments; sorting the sample feature data in order of recording time, from earliest to latest, to obtain a sample feature data sequence; grouping the sample feature data sequence according to a grouping duration to obtain a plurality of sample feature data groups; encoding each sample feature data group separately to obtain a plurality of local feature vectors of the sample object; wherein each sample feature data group corresponds to one local feature vector, the local feature vector characterizes the disease features and health status of the sample object within a time period, the disease features are derived from the disease description data, and the health status is derived from the examination test data;
acquiring a global feature vector of the sample object by using the global feature coding network based on the association relationship among the plurality of local feature vectors of the sample object, wherein the global feature vector is used for representing the global disease features and the overall health status of the sample object;
executing a downstream task by adopting the task result output network according to the global feature vector to obtain a task output result of the sample object;
determining the training loss of the medical record data processing model based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network;
and training the medical record data processing model according to the training loss.
11. The method of claim 10, wherein determining a training loss of the medical record data processing model based on the task output results and task result labels of the sample objects and the parameters corresponding to the local feature coding network comprises:
determining a first sub-loss based on the task output result of the sample object and the task result label, wherein the first sub-loss is used for measuring the accuracy of the output result of the medical record data processing model;
determining a second sub-loss based on the parameters corresponding to the local feature coding network, wherein the second sub-loss is used for measuring the difference degree between local feature vectors from adjacent time periods of the same sample object;
and determining the training loss of the medical record data processing model according to the first sub-loss and the second sub-loss.
12. The method of claim 11, wherein determining the second sub-loss based on the corresponding parameter of the local feature encoding network comprises:
acquiring a weight matrix of each layer of network in the local feature coding network;
determining maximum singular values respectively corresponding to the weight matrixes of the networks in each layer;
and determining the second sub-loss according to the maximum singular values respectively corresponding to the networks of each layer.
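The loss construction in claims 11 and 12 can be sketched with NumPy as below. The largest singular value of each weight matrix is its spectral norm, which `np.linalg.norm(w, ord=2)` returns for a 2-D array. Summing the per-layer values and combining the two sub-losses as a weighted sum are assumptions: the claims only state that the second sub-loss is determined from the maximum singular values and that the training loss is determined from both sub-losses. The coefficient `reg_weight` is hypothetical.

```python
import numpy as np

def second_sub_loss(weight_matrices):
    """Sum of the largest singular value (spectral norm) of each layer's weight matrix."""
    return float(sum(np.linalg.norm(w, ord=2) for w in weight_matrices))

def training_loss(first_sub_loss, weight_matrices, reg_weight=0.01):
    # reg_weight is a hypothetical balancing coefficient, not taken from the claims.
    return first_sub_loss + reg_weight * second_sub_loss(weight_matrices)

# Hypothetical weight matrices of a two-layer local feature coding network.
weights = [np.diag([3.0, 1.0]), np.diag([0.5, 2.0])]
print(second_sub_loss(weights))
print(training_loss(1.0, weights, reg_weight=0.1))
```

Penalizing the largest singular values limits how strongly each layer can amplify changes in its input, which is one way to keep local feature vectors from adjacent time periods of the same object from drifting far apart.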
13. The method of claim 11, wherein the medical record information includes sample condition description data for describing a condition, and sample examination assay data for describing an examination result;
the extracting features of the medical record information at the plurality of recording moments included in the medical record data of the sample object to obtain the sample feature data at the plurality of recording moments includes:
for medical record information at a target recording time, extracting at least one sample disease characteristic corresponding to the target recording time from sample disease description data contained in the medical record information at the target recording time;
extracting at least one sample state characteristic corresponding to the target recording time from sample examination and assay data contained in medical record information of the target recording time, wherein the sample state characteristic is used for describing the health state of the sample object;
wherein the sample characteristic data of the target recording time comprises: the at least one sample condition characteristic corresponding to the target recording time and the at least one sample state characteristic corresponding to the target recording time.
14. A similar object determination apparatus, characterized in that the apparatus comprises:
the medical record data acquisition module is used for acquiring medical record data of the first object, wherein the medical record data comprises medical record information at a plurality of recording moments, and the medical record information comprises disease description data for describing diseases and examination and test data for describing examination results;
the local vector acquisition module is used for performing feature extraction on the medical record information at the plurality of recording moments contained in the medical record data to obtain feature data at the plurality of recording moments; sorting the feature data in order of recording time, from earliest to latest, to obtain a feature data sequence; grouping the feature data sequence according to a grouping duration to obtain a plurality of feature data groups; encoding each feature data group separately to obtain a plurality of local feature vectors of the first object; wherein each feature data group corresponds to one local feature vector, the local feature vector characterizes the disease features and health status of the first object within a time period, the disease features are derived from the disease description data, and the health status is derived from the examination test data;
a trajectory similarity determination module, configured to determine a trajectory similarity between the plurality of local feature vectors of the first object and the plurality of local feature vectors of the second object; the trajectory similarity is used for representing the similarity of the two groups of local feature vectors in time sequence;
a similar object determination module, configured to determine that the first object and the second object have similar symptoms if the trajectory similarity satisfies a condition.
15. An apparatus for training a medical record data processing model, the apparatus comprising:
a training sample acquisition module, configured to acquire a training sample of a medical record data processing model, wherein the training sample comprises medical record data of a sample object and a task result label, and the medical record data comprises medical record information at a plurality of recording moments; wherein the medical record data processing model comprises a local feature coding network, a global feature coding network and a task result output network; and the medical record information comprises disease description data for describing diseases and examination test data for describing examination results;
the local feature acquisition module is used for performing feature extraction on the medical record information at the plurality of recording moments contained in the medical record data of the sample object by using the local feature coding network to obtain sample feature data at the plurality of recording moments; sorting the sample feature data in order of recording time, from earliest to latest, to obtain a sample feature data sequence; grouping the sample feature data sequence according to a grouping duration to obtain a plurality of sample feature data groups; encoding each sample feature data group separately to obtain a plurality of local feature vectors of the sample object; wherein each sample feature data group corresponds to one local feature vector, the local feature vector characterizes the disease features and health status of the sample object within a time period, the disease features are derived from the disease description data, and the health status is derived from the examination test data;
a global feature obtaining module, configured to obtain a global feature vector of the sample object based on an association relationship among the plurality of local feature vectors of the sample object by using the global feature coding network, where the global feature vector is used to characterize the global disease features and the overall health status of the sample object;
the downstream task execution module is used for executing a downstream task according to the global feature vector by adopting the task result output network to obtain a task output result of the sample object;
the training loss determining module is used for determining the training loss of the medical record data processing model based on the task output result and the task result label of the sample object and the parameters corresponding to the local feature coding network;
and the target model training module is used for training the medical record data processing model according to the training loss.
16. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the similar object determination method as claimed in any one of claims 1 to 9 or to implement the training method of medical record data processing model as claimed in any one of claims 10 to 13.
17. A computer-readable storage medium, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the storage medium, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the similar object determination method according to any one of claims 1 to 9, or to implement the training method of the medical record data processing model according to any one of claims 10 to 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111177589.9A CN113628709B (en) | 2021-10-09 | 2021-10-09 | Similar object determination method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111177589.9A CN113628709B (en) | 2021-10-09 | 2021-10-09 | Similar object determination method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113628709A (en) | 2021-11-09 |
CN113628709B (en) | 2022-02-11 |
Family
ID=78390905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111177589.9A Active CN113628709B (en) | 2021-10-09 | 2021-10-09 | Similar object determination method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113628709B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114334161B (en) * | 2021-12-30 | 2023-04-07 | 医渡云(北京)技术有限公司 | Model training method, data processing method, device, medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978022A (en) * | 2019-03-08 | 2019-07-05 | 腾讯科技(深圳)有限公司 | A kind of medical treatment text message processing method and device, storage medium |
CN110413981A (en) * | 2018-04-27 | 2019-11-05 | 阿里巴巴集团控股有限公司 | The based reminding method and device of the quality detecting method of electronic health record, similar case history |
CN113380360A (en) * | 2021-06-07 | 2021-09-10 | 厦门大学 | Similar medical record retrieval method and system based on multi-mode medical record map |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105989598A (en) * | 2015-02-13 | 2016-10-05 | 中国科学院沈阳自动化研究所 | Eye fundus image vessel segmentation method based on local enhancement active contour module |
CN108648827B (en) * | 2018-05-11 | 2022-04-08 | 北京邮电大学 | Cardiovascular and cerebrovascular disease risk prediction method and device |
Also Published As
Publication number | Publication date |
---|---|
CN113628709A (en) | 2021-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Hassantabar et al. | CovidDeep: SARS-CoV-2/COVID-19 test based on wearable medical sensors and efficient neural networks | |
US11144825B2 (en) | Interpretable deep learning framework for mining and predictive modeling of health care data | |
US20220254493A1 (en) | Chronic disease prediction system based on multi-task learning model | |
CN111696661B (en) | Patient grouping model construction method, patient grouping method and related equipment | |
JP7191443B2 (en) | Target object attribute prediction method based on machine learning, related equipment and computer program | |
Castellani et al. | Place and health as complex systems: A case study and empirical test | |
CN111738001A (en) | Training method of synonym recognition model, synonym determination method and equipment | |
CN113722474A (en) | Text classification method, device, equipment and storage medium | |
US8972406B2 (en) | Generating epigenetic cohorts through clustering of epigenetic surprisal data based on parameters | |
CN113822439A (en) | Task prediction method, device, equipment and storage medium | |
CN113707264A (en) | Medicine recommendation method, device, equipment and medium based on machine learning | |
CN112069329A (en) | Text corpus processing method, device, equipment and storage medium | |
Ding et al. | Multiple lesions detection of fundus images based on convolution neural network algorithm with improved SFLA | |
CN113628709B (en) | Similar object determination method, device, equipment and storage medium | |
CN112382355A (en) | Intelligent medical data management method, storage medium and system | |
CN115424691A (en) | Case matching method, system, device and medium | |
Seitanidis et al. | Identification of heart arrhythmias by utilizing a deep learning approach of the ECG signals on edge devices | |
CN113693611A (en) | Machine learning-based electrocardiogram data classification method and device | |
Zawadzka et al. | Graph representation integrating signals for emotion recognition and analysis | |
Mühling | Concept Landscapes: Aggregating Concept Maps for Analysis. | |
CN116543917A (en) | Information mining method for heterogeneous time sequence data | |
Dineva et al. | ICT-based beekeeping using IoT and machine learning | |
CN115758153A (en) | Target object track data processing method, device and equipment | |
Iapăscurtă | A less traditional approach to biomedical signal processing for sepsis prediction | |
Vijayakumar et al. | An Intelligent stacking Ensemble-Based Machine Learning Model for Heart abnormality |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40055337; Country of ref document: HK |