CN117520473B - Method and system for constructing perinatal medical research database

Info

Publication number: CN117520473B
Application number: CN202311582239.XA
Authority: CN
Inventors: 陈敦金; 代科伟; 何昊斐; 萧子如; 陈文锌
Original assignee: Individual
Current assignee: Dai Kewei
Priority date: 2023-11-23
Filing date: 2023-11-23
Publication date: 2024-04-26
Anticipated expiration: 2043-11-23
Also published as: CN117520473A

Abstract

The application relates to the technical field of perinatal medicine and discloses a method and a system for constructing a perinatal medical research database. The method comprises the following steps: extracting data from a perinatal medical record to obtain medical text words; inputting the medical text words into a perinatal medical RoBERTa model to obtain target text vector representations and their vector representation node levels; connecting the target text vector representations as nodes according to the vector representation node levels to construct a single-chain perinatal medical research data tree; calculating the vector similarity of the leaf-level target text vector representations of any two pregnant women; connecting nodes according to that vector similarity to construct a multi-chain perinatal medical research data tree; and constructing the perinatal medical research database based on the single-chain perinatal medical research data tree and the multi-chain perinatal medical research data tree. The application improves the accuracy of the perinatal medical research database and the query efficiency of the perinatal medical research data.

Description

Method and system for constructing perinatal medical research database

Technical Field

The application relates to the technical field of perinatal medicine, in particular to a construction method and a construction system of a perinatal medicine research database.

Background

Perinatal medical data is important for perinatal medical research, so a scientific and accurate perinatal medical research database needs to be established. In the prior art, a perinatal medical research database is mainly constructed by observing and recording patient information and indexes, such as medical record data, in clinical practice, and then either entering that information manually into the perinatal medical research database or storing the medical record data in the database in the form of pictures. However, manual entry may lead to missing data, inconsistent quality and similar problems, so the constructed perinatal medical research database has low accuracy, which hinders subsequent perinatal medical research. When the medical record data is stored in the form of pictures, a query can only be carried out by looking through the pictures one by one, so the query efficiency of the perinatal medical research data is low.

Disclosure of Invention

The present application is directed to solving at least one of the technical problems existing in the related art. Therefore, the embodiment of the application provides a method and a system for constructing a perinatal medical research database, which can improve the accuracy of the perinatal medical research database and the query efficiency of the perinatal medical research data.

In a first aspect, an embodiment of the present application provides a method for constructing a perinatal medical study database, including:

Carrying out data extraction on the perinatal medical record of each pregnant woman to obtain medical text words of each pregnant woman; the medical text words comprise pregnant woman information words, pregnant woman physiological index words, fetal physiological index words, delivery mode information words and maternal and infant health result words;

inputting the medical text words of each pregnant woman into a perinatal medical RoBERTa model to obtain a target text vector representation and a vector representation node level thereof output by the perinatal medical RoBERTa model; the vector representation node hierarchy includes a root node hierarchy and a leaf hierarchy;

Node connection is carried out on the target text vector representation of each pregnant woman according to the vector representation node level, and a single-chain perinatal medical research data tree of each pregnant woman is constructed;

Calculating the vector similarity of the target text vector representation of the leaf level between any two pregnant women;

Node connection is carried out according to the vector similarity of any two pregnant women, and a multi-chain perinatal medical research data tree between any two pregnant women is constructed;

constructing a perinatal medical research database based on the single-chain perinatal medical research data tree of each pregnant woman and the multi-chain perinatal medical research data tree between any two pregnant women;

The perinatal medical RoBERTa model is obtained by training on perinatal medical text samples together with the text vector representations and vector representation node levels corresponding to those samples.

In a second aspect, an embodiment of the present application provides a system for constructing a database of perinatal medical studies, including:

The data extraction module is used for extracting data from the perinatal medical record of each pregnant woman to obtain medical text words of each pregnant woman; the medical text words comprise pregnant woman information words, pregnant woman physiological index words, fetal physiological index words, delivery mode information words and maternal and infant health result words;

The vector representation output module is used for inputting the medical text words of each pregnant woman into the perinatal medical RoBERTa model to obtain the target text vector representation and the vector representation node level of the target text vector representation output by the perinatal medical RoBERTa model; the vector representation node hierarchy includes a root node hierarchy and a leaf hierarchy;

The first data tree construction module is used for carrying out node connection on the target text vector representation of each pregnant woman according to the vector representation node level, and constructing a single-chain perinatal medical study data tree of each pregnant woman;

The similarity calculation module is used for calculating the vector similarity of the target text vector representation of the leaf level between any two pregnant women;

The second data tree construction module is used for carrying out node connection according to the vector similarity of any two pregnant women and constructing a multi-chain perinatal medical research data tree between any two pregnant women;

The perinatal medical database construction module is used for constructing a perinatal medical research database based on the single-chain perinatal medical research data tree of each pregnant woman and the multi-chain perinatal medical research data tree between any two pregnant women;

In a third aspect, an embodiment of the present application further provides an electronic device, including a processor and a memory storing a plurality of instructions; the processor loads the instructions from the memory to execute any of the perinatal medical research database construction methods provided by the embodiments of the present application.

In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where the computer readable storage medium stores a plurality of instructions, where the instructions are adapted to be loaded by a processor to perform any one of the methods for building a perinatal medical study database provided by the embodiments of the present application.

In a fifth aspect, embodiments of the present application further provide a computer program product, including a computer program or instructions, which when executed by a processor implement any of the methods for building a database of perinatal medical studies provided by the embodiments of the present application.

According to the embodiment of the application, data is extracted automatically from the perinatal medical records and associated, and the perinatal medical research database is constructed from the single-chain perinatal medical research data trees and the multi-chain perinatal medical research data trees, which improves the accuracy of the perinatal medical research database. Meanwhile, when a query is needed, the data to be queried can be obtained quickly and accurately by following the connectivity of the single-chain perinatal medical research data tree and the multiple associations of the multi-chain perinatal medical research data tree, which improves the query efficiency of the perinatal medical research data.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow chart of a method for constructing a database of perinatal medical study provided in an embodiment of the application;

FIG. 2 is a schematic diagram of a perinatal medical research database provided in an embodiment of the application;

FIG. 3 is a schematic structural diagram of a perinatal medical research database construction system provided in an embodiment of the application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application. Meanwhile, in the description of the embodiments of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance. Thus, features defining "first", "second" may explicitly or implicitly include one or more features. In the description of the embodiments of the present application, the meaning of "plurality" is two or more, unless explicitly defined otherwise.

The embodiment of the application provides a construction method and a construction system of a perinatal medical research database. Specifically, the embodiment of the application will be described from the perspective of a perinatal medical study database construction system, which can be integrated in an electronic device in particular, that is, the method of constructing the perinatal medical study database of the embodiment of the application can be executed by the electronic device. Optionally, the electronic device includes a terminal device. The terminal device may be a mobile phone, a tablet computer, a smart bluetooth device, a notebook computer, a game console, or a personal computer (Personal Computer, PC), etc. Optionally, the electronic device includes a server, which may be a stand alone server, or may be a server network or a server cluster including, but not limited to, a computer, a network host, a single network server, a network server set, or a cloud server formed by servers. Wherein the Cloud server is composed of a large number of computers or web servers based on Cloud Computing (Cloud Computing).

The following description of the embodiments is not intended to limit the preferred embodiments. Although a logical order is depicted in the flowchart, in some cases the steps shown or described may be performed in an order different than depicted in the figures.

Optionally, the embodiment of the present application is illustrated by taking the database construction system as the execution subject. Referring to FIG. 1, FIG. 1 is a schematic flow chart of a method for constructing a perinatal medical research database according to an embodiment of the application. The specific flow of the method for constructing the perinatal medical research database provided by the embodiment of the application may run from step 10 to step 60, as follows:

Step 10, extracting data from the perinatal medical record of each pregnant woman to obtain medical text words of each pregnant woman.

Optionally, the perinatal medical research database needs to have scientificity and authenticity, so when the perinatal medical research database needs to be constructed, medical staff needs to acquire real perinatal medical record of each pregnant woman, and input the perinatal medical record of each pregnant woman into the database construction system.

Further, after receiving the perinatal medical record of each pregnant woman, the database construction system parses the record to obtain the medical text words of that pregnant woman. In one embodiment, the medical text words include at least: pregnant woman information words, such as the name, age and delivery time of the pregnant woman; pregnant woman physiological index words and fetal physiological index words, such as blood pressure, heart rate, hemoglobin, blood sugar and ultrasonic cardiography; delivery mode information words, such as vaginal delivery and caesarean section; and maternal and infant health result words.
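As an illustration of the output of step 10, the following minimal sketch groups extracted medical text words under the five word categories named above. The class name `MedicalTextWords`, the category identifiers and the sample values are hypothetical and are not taken from the application; the extraction itself is assumed to have happened upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical identifiers mirroring the five word categories in the text.
CATEGORIES = (
    "pregnant_woman_info",
    "maternal_physiological_index",
    "fetal_physiological_index",
    "delivery_mode",
    "maternal_infant_outcome",
)


@dataclass
class MedicalTextWords:
    """Medical text words extracted from one perinatal medical record."""

    words: Dict[str, List[str]] = field(
        default_factory=lambda: {c: [] for c in CATEGORIES}
    )

    def add(self, category: str, word: str) -> None:
        """Store one extracted word under a known category."""
        if category not in self.words:
            raise ValueError(f"unknown category: {category}")
        self.words[category].append(word)

    def missing_categories(self) -> List[str]:
        """Return categories with no extracted word (possible data gaps)."""
        return [c for c in CATEGORIES if not self.words[c]]


# Illustrative values only.
record = MedicalTextWords()
record.add("pregnant_woman_info", "age 31")
record.add("maternal_physiological_index", "blood pressure 118/76")
record.add("delivery_mode", "caesarean section")
print(record.missing_categories())
```

Listing the empty categories makes it possible to flag incomplete records before they enter the database, which is the kind of data gap the background section attributes to manual entry.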

Step 20, inputting the medical text words of each pregnant woman into the perinatal medical RoBERTa model to obtain a target text vector representation and a vector representation node level of the target text vector representation output by the perinatal medical RoBERTa model.

Optionally, the database construction system inputs the medical text words of each pregnant woman into the perinatal medical RoBERTa model and obtains the target text vector representations output by the model together with the vector representation node level of each target text vector representation, wherein the vector representation node level comprises a root node level and a leaf level. In other words, the perinatal medical RoBERTa model outputs both the target text vector representations and whether each of them belongs to the root node level or to a leaf level. In one embodiment, the pregnant woman information word is at the root node level, and the pregnant woman physiological index word, the fetal physiological index word, the delivery mode information word and the maternal and infant health result word are at leaf levels.

It should be noted that the perinatal medical RoBERTa model is obtained by training the pre-trained language model BERT on the perinatal medical text samples and the corresponding text vector representations and vector representation node levels. In the embodiment of the application, the model parameters are optimized during training of the perinatal medical RoBERTa model by a negative sampling contrastive method, specifically as follows: a perinatal medical text training set is constructed as D = {(q, y⁺)}, wherein q is a perinatal medical text sample, y⁺ is the text vector representation, and A⁻ is the vector representation node level; the matching degree g(q, y) between the perinatal medical text sample and the corresponding text vector representation and vector representation node level is then as follows:

the negative sampling loss to be optimized during training is defined as:

When the model parameters are optimized, the gradient-based optimizer Adam is used to optimize the network parameters.
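To make the idea of a negative-sampling contrastive objective concrete, the sketch below computes a softmax-style contrastive loss for one sample. It is only an illustration under stated assumptions: the matching degree g(q, y) is taken to be a dot product and the loss is taken to be the common softmax form over one positive and several sampled negatives. Neither choice is confirmed by the text, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def matching_degree(q: Sequence[float], y: Sequence[float]) -> float:
    """Assumed matching degree g(q, y): a plain dot product."""
    return sum(a * b for a, b in zip(q, y))


def negative_sampling_loss(
    q: Sequence[float],
    y_pos: Sequence[float],
    y_negs: List[Sequence[float]],
) -> float:
    """Assumed loss: -log(exp(g(q,y+)) / sum over {y+} and negatives of exp(g(q,y)))."""
    scores = [matching_degree(q, y_pos)]
    scores += [matching_degree(q, y) for y in y_negs]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


q = [1.0, 0.0]
loss_easy = negative_sampling_loss(q, [3.0, 0.0], [[0.0, 1.0]])
loss_hard = negative_sampling_loss(q, [1.0, 0.0], [[1.0, 0.0]])
print(loss_easy < loss_hard)
```

The loss shrinks as the positive pair scores higher than the sampled negatives, which is the behaviour an optimizer such as Adam would then drive the parameters toward.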

In one embodiment, the perinatal medical RoBERTa model includes a vector encoding layer, a feature fusion layer and an overfitting prevention layer. The vector encoding layer converts input data into a vectorized representation and can convert data of various types into vectors of the same dimension. The main goal of the feature fusion layer is to combine features (or feature extractors) from multiple sources into a better feature representation. The overfitting prevention layer prevents the model from overfitting the training data. Therefore, the medical text words of each pregnant woman are input into the perinatal medical RoBERTa model to obtain a target text vector representation, specifically:

The database construction system inputs the medical text words of each pregnant woman to the vector coding layer for vector coding and obtains the initial coding vectors of the medical text words of each pregnant woman output by the vector coding layer. In the initial coding vector, I is the vector coding value of the medical text words of each pregnant woman, A is the text feature value of each medical text word, u is the word feature value of each medical text word, v is the semantic feature value of each medical text word, n is the length of each medical text word, and T and B are coefficients of the vector coding learning representation. Thus, in an embodiment, the initial coding vector of the medical text words of each pregnant woman may be expressed as:

Further, the database construction system inputs the initial coding vector of the medical text word of each pregnant woman to the feature fusion layer for vector fusion, so as to obtain an initial text vector representation of the medical text word of each pregnant woman output by the feature fusion layer, wherein the initial text vector representation can be expressed as

Further, the database construction system inputs the initial text vector representation of the medical text words of each pregnant woman to the overfitting prevention layer, which is a dropout layer with a dropout rate of 0.5, and obtains the target text vector representation of the medical text words of each pregnant woman output by the overfitting prevention layer. The target text vector representation can therefore be expressed as dropout(0.5) applied to the initial text vector representation.
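The effect of a dropout layer with a rate of 0.5 can be sketched as follows. This is a minimal illustration that assumes the widely used inverted-dropout convention, in which surviving elements are rescaled by 1 / (1 - rate) during training and the layer is an identity at inference time; the text does not state which convention is used, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def dropout(
    vector: Sequence[float],
    rate: float = 0.5,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Zero each element with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At inference time the layer passes its input through unchanged.
        return list(vector)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


initial_representation = [1.0] * 8
target_representation = dropout(initial_representation, rate=0.5, rng=random.Random(0))
print(target_representation)
```

With a rate of 0.5, every element of the output is either 0.0 or twice the corresponding input value, so the expected value of each element is unchanged.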

Further, the vector encoding layer includes an embedding layer, a position encoding layer, an attention mechanism, a first residual normalization layer, a feed-forward network and a second residual normalization layer. The embedding layer converts discrete input data, such as words, into a continuous vector representation. The position encoding layer handles the order information of the elements in the sequence data. The attention mechanism calculates and focuses on the important parts of the input sequence so that the model can process the sequence data more efficiently; its key component is the multi-head attention sub-layer, which applies linear transformation layers to Q, K and V. The first residual normalization layer and the second residual normalization layer are added outside each sub-layer and are connected to the output of the multi-head self-attention mechanism and the output of the fully connected feed-forward network, respectively. The feed-forward network is a fully connected network architecture for processing and transforming the features of the input data. Therefore, the medical text words of each pregnant woman are input into the vector encoding layer, and the initial coding vector of the medical text words of each pregnant woman output by the vector encoding layer is obtained, specifically:

The database construction system inputs the medical text words of each pregnant woman to the embedding layer, and the embedding layer converts each word unit in the medical text words of each pregnant woman into a first coding vector with three dimensions, namely the number of description texts, the number of word units contained in the description texts and the embedding dimension of each word unit.

Further, the database construction system inputs the first coding vector to the position coding layer, the position coding layer marks the position of each text unit in the description text to obtain a first coding array, and the first coding array and the first coding vector are overlapped to obtain a second coding vector.
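The superposition of a position array onto the embedded word units can be sketched as follows. The text does not say how the first coding array is computed, so this example assumes fixed sinusoidal position values purely for illustration (a RoBERTa-style model would typically learn its position embeddings instead); the function names are hypothetical.

```python
import math
from typing import List


def sinusoidal_position_array(num_units: int, dim: int) -> List[List[float]]:
    """Assumed first coding array: one sinusoidal row per word-unit position."""
    array = []
    for pos in range(num_units):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** ((2 * (i // 2)) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        array.append(row)
    return array


def superimpose(
    first_coding_vector: List[List[float]],
    first_coding_array: List[List[float]],
) -> List[List[float]]:
    """Element-wise addition giving the second coding vector."""
    return [
        [e + p for e, p in zip(e_row, p_row)]
        for e_row, p_row in zip(first_coding_vector, first_coding_array)
    ]


# Two word units with embedding dimension 4 (illustrative values).
embedded = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
positions = sinusoidal_position_array(num_units=2, dim=4)
second_coding_vector = superimpose(embedded, positions)
print(second_coding_vector[0])
```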

Further, the database construction system inputs the second coding vector to the attention mechanism, the attention mechanism performs linear mapping on the second coding vector to obtain a Query vector, a Key vector and a Value vector, and an attention matrix is obtained according to the Query vector and the Key vector, wherein the attention matrix can be expressed as:

$$\mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right)$$

where Q is the Query vector, K is the Key vector, and d_k is the dimension of the Key vector.

Further, the database construction system carries out weighting processing on the Value vector according to the attention matrix to obtain a first hidden layer vector corresponding to the second coding vector.
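The attention matrix and the weighting of the Value vector described above correspond to standard scaled dot-product attention, which can be sketched in plain Python as follows. The example assumes that Q, K and V have already been produced by the linear mappings; the function names and the toy values are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q: Matrix, K: Matrix, V: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (attention matrix, first hidden layer vectors)."""
    d_k = len(K[0])
    scores = [
        [sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
        for q_row in Q
    ]
    attention = [softmax(row) for row in scores]
    # Each output row is the attention-weighted sum of the Value rows.
    hidden = [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in attention
    ]
    return attention, hidden


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [1.0, 0.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
attention, hidden = scaled_dot_product_attention(Q, K, V)
print(attention, hidden)
```

Because the two keys in this toy input are identical, the query attends to them equally and the hidden vector is the average of the two Value rows.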

Further, the database construction system inputs the second coding vector and the first hidden layer vector to the first residual normalization layer. The first residual normalization layer performs a residual connection on the second coding vector and the first hidden layer vector to obtain a first spliced vector, and then normalizes the first spliced vector to obtain a second hidden layer vector.

Further, the database construction system inputs the second hidden layer vector to the feed-forward network. The feed-forward network performs a two-layer linear mapping on the second hidden layer vector and processes it with a first activation function to obtain a third hidden layer vector.

Further, the database construction system inputs the second hidden layer vector and the third hidden layer vector to the second residual normalization layer. The second residual normalization layer performs a residual connection on the second hidden layer vector and the third hidden layer vector to obtain a second spliced vector, and then normalizes the second spliced vector to obtain the initial coding vector of the medical text words of each pregnant woman.
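The residual normalization and feed-forward steps can be sketched for a single vector as follows. Several details are assumptions made for illustration: the residual connection is modelled as element-wise addition, the normalization as layer normalization without learned scale and shift, and the first activation function as ReLU placed between the two linear mappings. None of these choices is specified in the text, and all names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_norm(layer_input: Vector, sublayer_output: Vector) -> Vector:
    """Residual connection (assumed to be addition) followed by normalization."""
    return layer_norm([a + b for a, b in zip(layer_input, sublayer_output)])


def linear(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """One linear mapping; each row of `weights` produces one output unit."""
    return [sum(xi * w for xi, w in zip(x, row)) + b for row, b in zip(weights, bias)]


def feed_forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Two linear mappings with an assumed ReLU activation in between."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    return linear(hidden, w2, b2)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
zeros = [0.0, 0.0, 0.0]

second_coding = [1.0, 2.0, 3.0]
first_hidden = [0.5, 0.5, 0.5]
second_hidden = residual_norm(second_coding, first_hidden)
third_hidden = feed_forward(second_hidden, identity, zeros, identity, zeros)
initial_coding_vector = residual_norm(second_hidden, third_hidden)
print(initial_coding_vector)
```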

Step 30, node connection is carried out on the target text vector representation of each pregnant woman according to the vector representation node level, and a single-chain perinatal medical research data tree of each pregnant woman is constructed.

Optionally, the database construction system determines a root node level in the node levels of the vector representation, and a leaf level for each level, wherein a primary leaf level is connected to the root node level, a secondary leaf level is connected to the primary leaf level, and so on, to obtain an N-level leaf level.

Further, the database construction system performs node connection on the target text vector representation of each pregnant woman according to the hierarchical sequence of the root node hierarchy, the primary leaf hierarchy, the secondary leaf hierarchy and the N-level leaf hierarchy, and constructs a single-chain perinatal medical study data tree of each pregnant woman.

In an embodiment, the pregnant woman information word is at the root node level, the pregnant woman physiological index word and the fetal physiological index word are at the primary leaf level, the delivery mode information word is at the secondary leaf level under the pregnant woman physiological index word, and the maternal and infant health result word is at the secondary leaf level under both the pregnant woman physiological index word and the fetal physiological index word. Accordingly, the pregnant woman physiological index word and the fetal physiological index word form two node branches of the pregnant woman information word; the maternal and infant health result word and the delivery mode information word form node branches of the pregnant woman physiological index word; and the maternal and infant health result word also forms a node branch of the fetal physiological index word. The single-chain perinatal medical research data tree constructed in this way is shown in FIG. 2, which is a schematic diagram of the perinatal medical research database provided in the embodiment of the application.
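The node layout of this embodiment can be sketched as a small linked structure. The `Node` class, the helper names and the labels (suffix 1 for the information word through suffix 5 for the health result word) are hypothetical conventions chosen for the illustration. Because the maternal and infant health result word hangs under both primary leaf nodes, the same node object is shared by two parents.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    label: str
    level: int  # 0 = root node level, 1 = primary leaf level, 2 = secondary leaf level
    children: List["Node"] = field(default_factory=list)


def build_single_chain_tree(patient: str) -> Node:
    """Connect one pregnant woman's five word nodes by hierarchy level."""
    info = Node(f"{patient}1", 0)      # pregnant woman information word
    maternal = Node(f"{patient}2", 1)  # pregnant woman physiological index word
    fetal = Node(f"{patient}3", 1)     # fetal physiological index word
    delivery = Node(f"{patient}4", 2)  # delivery mode information word
    outcome = Node(f"{patient}5", 2)   # maternal and infant health result word
    info.children = [maternal, fetal]
    maternal.children = [delivery, outcome]
    fetal.children = [outcome]         # shared with the maternal branch
    return info


def edges(root: Node) -> Set[Tuple[str, str]]:
    """Collect every parent-child connection reachable from the root."""
    found: Set[Tuple[str, str]] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            found.add((node.label, child.label))
            stack.append(child)
    return found


tree = build_single_chain_tree("a")
print(sorted(edges(tree)))
```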

Step 40, calculating the vector similarity of the target text vector representation of the leaf level between any two pregnant women.

Optionally, the database construction system calculates the vector similarity of the target text vector representation of the leaf level between any two pregnant women, specifically:

The database construction system acquires, from the leaf-level target text vector representations of any two pregnant women, two text vector representations to be matched that are located at the same leaf level and at the same branch node. The two are a first text vector representation to be matched, s_i = {w₁, w₂, …, w_i}, and a second text vector representation to be matched, r_j = {e₁, e₂, …, e_j}. In an embodiment, referring to FIG. 2, the text vector representations to be matched that are located at the same leaf level and at the same branch node are a2, b2 and c2; a3, b3 and c3; a4, b4 and c4; and a5, b5 and c5.

Further, the database construction system obtains the maximum common substring Common(w_i, e_j) between the first text vector representation to be matched s_i and the second text vector representation to be matched r_i. Further, the database construction system calculates the vector similarity between the two text vector representations to be matched according to the first text vector representation to be matched s_i, the second text vector representation to be matched r_i, and the maximum common substring Common(w_i, e_j).
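Finding the maximum common substring of two sequences is a standard dynamic-programming task, sketched below. The example assumes that each representation to be matched can be treated as a sequence of comparable elements (here, word tokens); the function name and the sample tokens are hypothetical. It covers only the common-substring step, not the similarity calculation built on it.

```python
from typing import List, Sequence


def longest_common_substring(s: Sequence[str], r: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of elements shared by s and r."""
    best_len, best_end = 0, 0
    prev = [0] * (len(r) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(r) + 1)
        for j in range(1, len(r) + 1):
            if s[i - 1] == r[j - 1]:
                # Extend the common run that ended at the previous positions.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(s[best_end - best_len:best_end])


first = ["blood", "pressure", "high"]
second = ["high", "blood", "pressure"]
common = longest_common_substring(first, second)
print(common)
```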

Further, calculating a vector similarity between two text vector representations to be matched, comprising:

the database construction system calculates the word similarity between two text vector representations to be matched, and the specific formula of the word similarity is as follows:

the database construction system calculates the matching similarity between two text vector representations to be matched, and the specific formula of the matching similarity is as follows:

The database construction system calculates the vector similarity between two text vector representations to be matched according to the word similarity and the matching similarity, and the specific formula of the vector similarity is as follows:

wherein WordSim(s_i, r_i) is the word similarity, twoGramSim(s_i, r_i) is the matching similarity, and RelSim(s_i, r_i) is the vector similarity.

Further, since the number of connections of any leaf node in the leaf level is limited to a maximum, the database construction system obtains the maximum number of connections of any leaf node in the leaf level and all candidate node links in the single-chain perinatal medical research data tree of each pregnant woman. The database construction system then calculates the relevance score of each candidate node link according to the vector similarity of that candidate node link, and sorts the candidate node links by relevance score to obtain the sorted node links.

Further, the database construction system takes the top-ranked node links in the sorted node links, up to the maximum number of connections, as the target node links of the leaf node in the leaf level of the single-chain perinatal medical research data tree of each pregnant woman. The calculation formula of the relevance score of each candidate node link is as follows:

where f(s_i, r_i) is the relevance score and n is the length of the medical text word of the leaf node corresponding to the candidate node link.
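The sort-and-truncate selection described above can be sketched as follows. The relevance scores are passed in as precomputed numbers, because the scoring formula is not derived here; the function name, the link labels and the score values are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def select_target_links(candidate_scores: Dict[Link, float], max_connections: int) -> List[Link]:
    """Keep the highest-scoring candidate links, up to the connection limit."""
    ranked = sorted(candidate_scores.items(), key=lambda item: item[1], reverse=True)
    return [link for link, _ in ranked[:max_connections]]


# Illustrative precomputed relevance scores for one leaf node.
candidates = {
    ("a2", "b2"): 0.80,
    ("a2", "c2"): 0.40,
    ("a2", "d2"): 0.93,
}
targets = select_target_links(candidates, max_connections=2)
print(targets)
```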

Step 50, connecting nodes according to the vector similarity of any two pregnant women, and constructing a multi-chain perinatal medical research data tree between any two pregnant women.

Optionally, the database construction system compares the vector similarity of any two pregnant women with a preset similarity threshold value to obtain a comparison result, wherein the comparison result may be that the vector similarity of any two pregnant women is greater than or equal to the preset similarity threshold value, and the comparison result may also be that the vector similarity of any two pregnant women is less than the preset similarity threshold value.

Therefore, in the embodiment of the present application, if the comparison result is that the vector similarity of any two pregnant women is greater than or equal to the preset similarity threshold, the database construction system performs node connection on two text vector representations to be matched, which are located at the same leaf level and are located at the same branch node, in the target text vector representation of the leaf level, so as to construct a multi-chain perinatal medical research data tree between any two pregnant women, as shown in fig. 2.
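The threshold comparison and node connection of step 50 can be sketched as follows. The nodes are assumed to sit at the same leaf level and the same branch node for different pregnant women, and their pairwise vector similarities are assumed to be precomputed; the function name, the labels and all numeric values are hypothetical illustrations.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def build_multi_chain_links(
    nodes: List[str],
    similarity: Dict[FrozenSet[str], float],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Connect every pair of nodes whose similarity reaches the threshold."""
    links = []
    for first, second in combinations(nodes, 2):
        if similarity[frozenset((first, second))] >= threshold:
            links.append((first, second))
    return links


# Same-branch leaf nodes of three pregnant women with illustrative similarities.
same_branch_nodes = ["a2", "b2", "c2"]
pairwise_similarity = {
    frozenset(("a2", "b2")): 0.80,
    frozenset(("a2", "c2")): 0.50,
    frozenset(("b2", "c2")): 0.93,
}
links = build_multi_chain_links(same_branch_nodes, pairwise_similarity, threshold=0.8)
print(links)
```

A pair whose similarity equals the threshold is connected, matching the "greater than or equal to" condition in the text.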

Step 60, constructing a perinatal medical study database based on the single-chain perinatal medical study data tree of each pregnant woman and the multi-chain perinatal medical study data tree between any two pregnant women.

Optionally, the database construction system correlates the single-chain perinatal medical study data tree of each pregnant woman with the multi-chain perinatal medical study data tree between any two pregnant women to construct a perinatal medical study database.

In an embodiment, the pregnant woman information word is at the root node level, the pregnant woman physiological index word and the fetal physiological index word are at the primary leaf level, the delivery mode information word is at the secondary leaf level under the pregnant woman physiological index word, and the maternal and infant health result word is at the secondary leaf level under both the pregnant woman physiological index word and the fetal physiological index word. Accordingly, the pregnant woman physiological index word and the fetal physiological index word form two node branches of the pregnant woman information word; the maternal and infant health result word and the delivery mode information word form node branches of the pregnant woman physiological index word; and the maternal and infant health result word also forms a node branch of the fetal physiological index word.

For example, the preset similarity threshold is 0.8. For pregnant woman 1, the pregnant woman information word is a1, the pregnant woman physiological index word is a2, the fetal physiological index word is a3, the delivery mode information word is a4, and the maternal and infant health result word is a5. For pregnant woman 2, the corresponding words are b1, b2, b3, b4 and b5, and for pregnant woman 3 they are c1, c2, c3, c4 and c5. The vector similarities between the pregnant woman physiological index words a2, b2 and c2 are calculated pairwise, with values of 0.8 and 0.93 given in this example; the remaining words are handled in the same way and are not described again. The resulting structure is shown in FIG. 2, which is a schematic diagram of the perinatal medical research database provided in the embodiment of the application.

The system for constructing the perinatal medical research database provided by the embodiment of the application is described below, and the system for constructing the perinatal medical research database described below and the method for constructing the perinatal medical research database described above can be correspondingly referred to each other.

Referring to fig. 3, fig. 3 is a schematic structural diagram of a perinatal medical study database construction system provided in an embodiment of the present application, which may include:

The data extraction module 301 is configured to perform data extraction on the perinatal medical record of each pregnant woman, so as to obtain a medical text word of each pregnant woman; the medical text words comprise pregnant woman information words, pregnant woman physiological index words, fetal physiological index words, delivery mode information words and maternal and infant health result words;

The vector representation output module 302 is configured to input a medical text word of each pregnant woman into a perinatal medical RoBERTa model, and obtain a target text vector representation and a vector representation node level thereof output by the perinatal medical RoBERTa model; the vector representation node hierarchy includes a root node hierarchy and a leaf hierarchy;

The first data tree construction module 303 is configured to perform node connection on the target text vector representation of each pregnant woman according to the vector representation node level, and construct a single-chain perinatal medical study data tree of each pregnant woman;

The similarity calculation module 304 is configured to calculate a vector similarity of the target text vector representation of the leaf level between any two pregnant women;

The second data tree construction module 305 is configured to perform node connection according to the vector similarity of any two pregnant women, and construct a multi-chain perinatal medical study data tree between any two pregnant women;

The perinatal medical database construction module 306 is configured to construct a perinatal medical study database based on the single-chain perinatal medical study data tree of each pregnant woman and the multi-chain perinatal medical study data tree between any two pregnant women.

In an alternative example, vector representation output module 302 is further configured to:

Inputting the medical text words of each pregnant woman to a vector coding layer to obtain an initial coding vector of the medical text words of each pregnant woman output by the vector coding layer; wherein, in the expression for the initial coding vector of the medical text words of each pregnant woman, I is the vector code value of the medical text word of each pregnant woman, A is the text characteristic value of each medical text word, u is the word characteristic value of each medical text word, v is the semantic characteristic value of each medical text word, n is the length of each medical text word, and T and B are coefficients of the vector coding learning representation;

inputting an initial coding vector of the medical text word of each pregnant woman to the feature fusion layer to obtain an initial text vector representation of the medical text word of each pregnant woman output by the feature fusion layer;

and inputting the initial text vector representation of the medical text words of each pregnant woman to a dropout overfitting prevention layer with a dropout rate of 0.5, and obtaining the target text vector representation of the medical text words of each pregnant woman output by the overfitting prevention layer.
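The overfitting prevention layer described above can be illustrated with a minimal dropout sketch. It assumes the common "inverted dropout" convention, in which surviving values are rescaled during training and the layer is an identity at inference; the application only states the rate of 0.5, so this convention and the function name `dropout` are assumptions.

```python
import random


def dropout(vector, rate=0.5, training=True, rng=None):
    """Inverted dropout over a list of floats (illustrative sketch).

    During training each element is zeroed with probability `rate` and the
    survivors are divided by (1 - rate); at inference the input is returned
    unchanged.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(vector)
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in vector]


initial_representation = [1.0, 2.0, 3.0, 4.0]  # hypothetical values
print(dropout(initial_representation, rng=random.Random(0)))
print(dropout(initial_representation, training=False))
```

With a rate of 0.5, each surviving element is doubled, which keeps the expected value of every element unchanged between training and inference.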

Inputting the medical text words of each pregnant woman to an embedding layer, the embedding layer converting each word unit in the medical text words of each pregnant woman into a one-dimensional vector to obtain a first coding vector; taken as a whole, the first coding vector has three dimensions, namely the number of descriptive texts, the number of word units contained in each descriptive text, and the embedding dimension of each word unit;

inputting the first coding vector into a position coding layer, marking the position of each word unit in the descriptive text by the position coding layer to obtain a first coding array, and superposing the first coding array and the first coding vector to obtain a second coding vector;

Inputting the second coding vector into an attention mechanism, the attention mechanism performing linear mapping on the second coding vector to obtain a Query vector, a Key vector and a Value vector, solving an attention matrix according to the Query vector and the Key vector, and weighting the Value vector according to the attention matrix to obtain a first hidden layer vector corresponding to the second coding vector;

an initial encoding vector of the medical text word of each pregnant woman is obtained based on the first hidden layer vector.
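The position coding and attention steps above can be sketched in plain Python. The application does not state how positions are marked or how the attention matrix is normalized, so the sinusoidal position code and the scaled dot-product softmax used here are assumptions borrowed from the standard Transformer encoder; all function names and the toy inputs are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def positional_encoding(seq_len, dim):
    """Sinusoidal position codes (assumed form of the 'first coding array')."""
    return [[math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
             else math.cos(pos / 10000 ** ((i - 1) / dim))
             for i in range(dim)] for pos in range(seq_len)]


def self_attention(x, w_q, w_k, w_v):
    """Map x to Query/Key/Value, build the attention matrix, weight the Values."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, [list(col) for col in zip(*k)])  # Q times K transposed
    weights = [softmax([s / scale for s in row]) for row in scores]
    return weights, matmul(weights, v)


# Hypothetical first coding vector: 3 word units, embedding dimension 2.
first_coding = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
positions = positional_encoding(3, 2)
second_coding = [[a + b for a, b in zip(r, p)] for r, p in zip(first_coding, positions)]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projection weights
attention_matrix, first_hidden = self_attention(second_coding, identity, identity, identity)
print(first_hidden)
```

The sketch handles a single descriptive text; the batch dimension (the number of descriptive texts) is omitted for brevity.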

Inputting the second coding vector and the first hidden layer vector into a first residual normalization layer, the first residual normalization layer performing residual connection on the second coding vector and the first hidden layer vector to obtain a first spliced vector, and normalizing the first spliced vector to obtain a second hidden layer vector;

Inputting the second hidden layer vector into a feedforward network, the feedforward network performing two-layer linear mapping on the second hidden layer vector and processing it through a first activation function to obtain a third hidden layer vector corresponding to the second hidden layer vector;

Inputting the second hidden layer vector and the third hidden layer vector into a second residual normalization layer, the second residual normalization layer performing residual connection on the second hidden layer vector and the third hidden layer vector to obtain a second spliced vector, and normalizing the second spliced vector to obtain an initial coding vector of the medical text word of each pregnant woman.
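The three steps above (residual connection with normalization, feedforward network, second residual connection with normalization) can be sketched as follows. The application does not name the activation function or the normalization, so ReLU between the two linear maps and per-row layer normalization without learned scale or shift are assumptions of this sketch, as are all function names.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one row to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def add_and_norm(x, sublayer_out):
    """Residual connection (element-wise sum) followed by normalization."""
    return [layer_norm([a + b for a, b in zip(rx, rs)])
            for rx, rs in zip(x, sublayer_out)]


def linear(vec, weight, bias):
    """Affine map; `weight` is a list of rows with shape in_dim x out_dim."""
    return [sum(v * w for v, w in zip(vec, col)) + b
            for col, b in zip(zip(*weight), bias)]


def feed_forward(x, w1, b1, w2, b2):
    """Two linear maps with ReLU in between (the activation is assumed)."""
    out = []
    for row in x:
        hidden = [max(0.0, h) for h in linear(row, w1, b1)]
        out.append(linear(hidden, w2, b2))
    return out


def encoder_tail(second_coding, first_hidden, w1, b1, w2, b2):
    """Return the initial coding vector from the attention output."""
    second_hidden = add_and_norm(second_coding, first_hidden)
    third_hidden = feed_forward(second_hidden, w1, b1, w2, b2)
    return add_and_norm(second_hidden, third_hidden)


eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [0.0] * 4
print(encoder_tail([[1.0, 2.0, 3.0, 4.0]], [[0.5, 0.5, 0.5, 0.5]], eye, zeros, eye, zeros))
```

Identity weights and zero biases are used only to keep the usage example small; in a trained model these would be learned parameters.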

In an alternative example, the similarity calculation module 304 is further configured to:

Obtaining two text vector representations to be matched, which are located at the same leaf level and at the same branch node, from the target text vector representations of the leaf levels of any two pregnant women; the two text vector representations to be matched are a first text vector representation to be matched s_i = {w₁, w₂, …, w_i} and a second text vector representation to be matched r_j = {e₁, e₂, …, e_j}, respectively;

Obtaining a maximum common substring Common(w_i, e_j) between the first text vector representation to be matched s_i and the second text vector representation to be matched r_j;

Calculating the vector similarity between the two text vector representations to be matched based on the first text vector representation to be matched s_i, the second text vector representation to be matched r_j and the maximum common substring Common(w_i, e_j).
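The maximum common substring step can be illustrated with a standard dynamic-programming routine. This sketch treats each representation as a sequence of comparable units (characters in the usage example); how the application actually compares units is not stated, so that choice and the function name are assumptions. The similarity formulas themselves are not reproduced in this text, so the sketch stops at the common substring.

```python
def longest_common_substring(s, r):
    """Return the longest run of consecutive units shared by sequences s and r."""
    best_len, best_end = 0, 0
    prev = [0] * (len(r) + 1)  # prev[j]: match length ending at s[i-2], r[j-1]
    for i in range(1, len(s) + 1):
        curr = [0] * (len(r) + 1)
        for j in range(1, len(r) + 1):
            if s[i - 1] == r[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return list(s[best_end - best_len:best_end])


common = longest_common_substring("gestational diabetes", "diabetes mellitus")
print("".join(common), len(common))
```

The routine runs in time proportional to the product of the two lengths and keeps only two rows of the table in memory, which is adequate for short medical text words.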

And calculating the word similarity between two text vector representations to be matched, wherein the specific formula is as follows:

calculating the matching similarity between two text vector representations to be matched, wherein the specific formula is as follows:

Based on the word similarity and the matching similarity, calculating the vector similarity between two text vector representations to be matched, wherein the specific formula is as follows:

In an alternative example, the perinatal medical study database construction system further comprises:

Obtaining the maximum connection number of any leaf node in the leaf level in the single-chain perinatal medical research data tree of each pregnant woman and all candidate node links;

Calculating a relevance score of each candidate node link based on the vector similarity of each candidate node link, and sorting each candidate node link according to the relevance score of each candidate node link to obtain sorted node links;

Determining the first node links in the sorted node links, up to the maximum connection number, as the target node links of any leaf node in the leaf level in the single-chain perinatal medical research data tree of each pregnant woman;
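The ranking and truncation steps above can be sketched as a simple top-k selection. Because the relevance score formula is not reproduced in this text, the sketch takes the scoring function as a parameter and, as a placeholder only, defaults to using the vector similarity itself as the score; the function name, the tuple layout and the sample values are hypothetical.

```python
def select_target_links(candidates, max_connections, score=lambda sim: sim):
    """Keep the highest-scoring candidate links for one leaf node.

    `candidates` is a list of (target_node, vector_similarity) tuples.
    `score` maps a vector similarity to a relevance score; the identity
    default is a stand-in for the formula given in the application.
    """
    ranked = sorted(candidates, key=lambda link: score(link[1]), reverse=True)
    return ranked[:max_connections]


# Hypothetical candidate links for one leaf node, limited to two connections.
links = [("b2", 0.80), ("c2", 0.93), ("d2", 0.85)]
print(select_target_links(links, max_connections=2))
```

Capping the number of links per leaf node bounds the fan-out of the multi-chain tree, so a query that follows cross-connections visits at most a fixed number of neighbours per node.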

The calculation formula of the relevance score of each candidate node link is as follows:

The specific embodiment of the perinatal medical research database construction system provided by the application is basically the same as each embodiment of the perinatal medical research database construction method, and is not described in detail herein.

Optionally, as shown in fig. 4, fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include: processor 410, communication interface (Communication Interface) 420, memory 430, and communication bus 440, wherein processor 410, communication interface 420, and memory 430 communicate with each other via communication bus 440. Processor 410 may invoke a computer program in memory 430 to perform the steps of the method of construction of a perinatal medical study database, for example including:

In an alternative example, inputting the medical text word of each pregnant woman into the perinatal medicine RoBERTa model, obtaining the target text vector representation output by the perinatal medicine RoBERTa model, including:

In an alternative example, inputting the medical text word of each pregnant woman to a vector encoding layer, obtaining an initial encoding vector of the medical text word of each pregnant woman output by the vector encoding layer, including:

In an alternative example, obtaining an initial encoding vector of the medical text word for each pregnant woman based on the first hidden layer vector includes:

In an alternative example, calculating the vector similarity of the leaf-level target text vector representation between any two pregnant women includes:

In an alternative example, calculating the vector similarity between two text vector representations to be matched based on the first text vector representation to be matched s_i, the second text vector representation to be matched r_j, and the maximum common substring Common(w_i, e_j), includes:

In an alternative example, the perinatal medical study database construction method further comprises:

Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

In another aspect, embodiments of the present application further provide a non-transitory computer readable storage medium, where the non-transitory computer readable storage medium includes a computer program stored thereon, and when the computer program is executed by a processor, the computer program is capable of executing the steps of the method for constructing the perinatal medical study database provided in the foregoing embodiments, for example, including:

In still another aspect, an embodiment of the present application further provides a computer product, where the computer product includes a computer program stored thereon, and when the computer program is executed by a processor, the computer program is capable of executing the steps of the method for constructing the perinatal medical study database provided in the foregoing embodiments, for example, including:

The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.

From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

1. A method for constructing a perinatal medical research database, comprising:

The perinatal medical RoBERTa model is obtained by training on perinatal medical text samples together with the text vector representations and vector representation node levels corresponding to the perinatal medical text samples;

Wherein the calculating the vector similarity of the target text vector representation of the leaf level between any two pregnant women comprises:

obtaining a maximum common substring Common(w_i, e_j) between the first text vector representation to be matched s_i and the second text vector representation to be matched r_j;

calculating a vector similarity between two text vector representations to be matched based on the first text vector representation to be matched s_i, the second text vector representation to be matched r_j, and the maximum common substring Common(w_i, e_j);

The calculating the vector similarity between two text vector representations to be matched based on the first text vector representation to be matched s_i, the second text vector representation to be matched r_j, and the maximum common substring Common(w_i, e_j) includes:

wherein WordSim(s_i, r_j) is the word similarity, twoGramSim(s_i, r_j) is the matching similarity, and RelSim(s_i, r_j) is the vector similarity;

The construction method of the perinatal medical research database further comprises the following steps:

where f(s_i, r_j) is the relevance score and n is the length of the medical text word of the leaf node corresponding to the candidate node link.

2. The perinatal medical research database construction method according to claim 1, wherein the perinatal medical RoBERTa model comprises a vector coding layer, a feature fusion layer and an overfitting prevention layer;

Inputting the medical text words of each pregnant woman into a perinatal medicine RoBERTa model to obtain a target text vector representation output by the perinatal medicine RoBERTa model, wherein the method comprises the following steps:

Inputting the medical text words of each pregnant woman to a vector coding layer to obtain an initial coding vector of the medical text words of each pregnant woman output by the vector coding layer; wherein, in the expression for the initial coding vector of the medical text words of each pregnant woman, I is the vector code value of the medical text word of each pregnant woman, A is the text characteristic value of each medical text word, u is the word characteristic value of each medical text word, v is the semantic characteristic value of each medical text word, m is the length of each medical text word, and T and B are coefficients of the vector coding learning representation;

3. The perinatal medical research database construction method as recited in claim 2, wherein the vector coding layer comprises an embedding layer, a position coding layer and an attention mechanism;

The step of inputting the medical text words of each pregnant woman to a vector coding layer to obtain initial coding vectors of the medical text words of each pregnant woman output by the vector coding layer comprises the following steps:

inputting the medical text words of each pregnant woman to an embedding layer, the embedding layer converting each word unit in the medical text words of each pregnant woman into a one-dimensional vector to obtain a first coding vector; taken as a whole, the first coding vector has three dimensions, namely the number of descriptive texts, the number of word units contained in each descriptive text, and the embedding dimension of each word unit;

4. The perinatal medical research database construction method as recited in claim 3, wherein the vector encoding layer further comprises a first residual normalization layer, a feed-forward network, and a second residual normalization layer;

The obtaining the initial coding vector of the medical text word of each pregnant woman based on the first hidden layer vector comprises the following steps:

5. A perinatal medical research database construction system, comprising:

6. An electronic device comprising a processor and a memory, the memory storing a plurality of instructions; the processor loads instructions from the memory to perform the perinatal medical research database construction method as claimed in any one of claims 1 to 4.

7. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the perinatal medical research database construction method as defined in any one of claims 1 to 4.