CN116186262A - Menstrual disorder typing system, menstrual disorder typing method, electronic device, and recording medium - Google Patents

Info

Publication number
CN116186262A
Authority
CN
China
Prior art keywords
menstrual disorder
text
description text
case
case description
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310168401.7A
Other languages
Chinese (zh)
Inventor
杜登斌
陈昊
张永卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuzheng Intelligent Technology Beijing Co ltd
Original Assignee
Wuzheng Intelligent Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuzheng Intelligent Technology Beijing Co ltd
Priority to CN202310168401.7A
Publication of CN116186262A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Computational Linguistics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The embodiment of the invention discloses a menstrual disorder typing system, a menstrual disorder typing method, an electronic device, and a storage medium. The menstrual disorder typing method comprises the following steps: constructing a text description corpus based on menstrual disorder case description texts; extracting main characteristic information of each menstrual disorder case description text in the text description corpus, and constructing a standard database; extracting the feature vectors corresponding to the main characteristic information in the standard database to obtain a feature vector set corresponding to the menstrual disorder case description texts; extracting the feature vector of the case description text to be typed, and using the cosine measure to calculate the similarity between that feature vector and each element in the feature vector set; and performing case matching and typing on the case description text to be typed according to the similarity, to obtain the disorder type and prescription information corresponding to it. The menstrual disorder typing method solves the prior-art problem that traditional Chinese medicine menstrual disorder types cannot be identified intelligently.

Description

Menstrual disorder typing system, menstrual disorder typing method, electronic device, and recording medium
Technical Field
The present invention relates to the field of computer technology, and in particular, to a menstrual disorder typing system, a menstrual disorder typing method, an electronic device, and a storage medium.
Background
Menstrual disorder is a common gynecological disease that manifests as an abnormal menstrual cycle or abnormal bleeding volume; common types include menorrhagia, hypomenorrhea, and infrequent menstruation. Traditional Chinese medicine divides menstrual disorder into different types according to its cause, symptoms, and pulse condition, and selects the corresponding formula for treatment according to the type.
At present, the type of menstrual disorder is determined by in-person diagnosis by a traditional Chinese medicine practitioner, and the symptoms of patients with menstrual disorder cannot be analyzed intelligently.
Disclosure of Invention
The embodiment of the invention aims to provide a menstrual disorder typing system, a menstrual disorder typing method, electronic equipment and a storage medium, which are used for solving the problem that the traditional Chinese medicine menstrual disorder type cannot be intelligently identified in the prior art.
To achieve the above object, an embodiment of the present invention provides a menstrual disorder typing method, the method specifically including:
collecting a certain number of menstrual disorder case descriptive texts;
constructing a text description corpus based on the menstrual disorder case description text;
extracting main characteristic information of each menstrual disorder case description text in the text description corpus by using a TextRank algorithm, and constructing a standard database;
extracting feature vectors corresponding to main feature information in the standard database to obtain feature vector sets corresponding to the description text of each menstrual disorder case;
extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement;
and performing case matching and typing on the text to be typed according to the similarity to obtain the disease type and prescription information corresponding to the case description text to be typed.
Based on the technical scheme, the invention can also be improved as follows:
further, extracting main characteristic information of each menstrual disorder case description text by the TextRank algorithm, and constructing a standard database, comprises:
calculating main characteristic information of each menstrual disorder case description text by formula 1:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j) \quad \text{(formula 1)}$$
wherein $d$ represents the damping coefficient, generally set to 0.85; $V_i$ represents any node in the graph; $In(V_i)$ represents the set of vertices pointing to vertex $V_i$; $Out(V_j)$ represents the set of all vertices that vertex $V_j$ points to; $w_{ji}$ represents the connection weight between vertices $V_j$ and $V_i$; and $WS(V_i)$ represents the final ranking weight of vertex $V_i$.
Further, the extracting the feature vector corresponding to the main feature information in the standard database to obtain a feature vector set corresponding to each menstrual disorder case description text includes:
calculating the word frequency in the menstrual disorder case description text by formula 2:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad \text{(formula 2)}$$
wherein $n_{i,j}$ is the number of times the word $t_i$ appears in the menstrual disorder case description text $d_j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the menstrual disorder case description text $d_j$;
calculating the inverse document frequency by formula 3:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \quad \text{(formula 3)}$$
wherein $|D|$ is the total number of menstrual disorder case description texts in the text description corpus, and $|\{j : t_i \in d_j\}|$ is the number of menstrual disorder case description texts containing the word $t_i$; if the word does not appear in the text description corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used as the denominator;
calculating the TF-IDF value by formula 4:
$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \quad \text{(formula 4)}$$
after the TF-IDF value of each word in the menstrual disorder case description text is calculated, descending order is carried out, a plurality of words with TF-IDF values higher than a set threshold value are selected as keywords, and feature vectors are constructed according to the keywords and the corresponding TF-IDF values, so that a feature vector set corresponding to each menstrual disorder case description text is obtained.
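As an illustration of formulas 2 to 4, the arithmetic for a single word can be worked through with made-up numbers. The counts below are assumptions chosen for readability, as is the base-10 logarithm (the source does not state the logarithm base): a word appears 3 times in a case description text of 100 words, the corpus holds 1000 texts, 9 of them contain the word, and the smoothed denominator $1 + |\{j : t_i \in d_j\}|$ is used.

```latex
tf_{i,j} = \frac{3}{100} = 0.03, \qquad
idf_i = \log_{10}\frac{1000}{1 + 9} = \log_{10} 100 = 2, \qquad
\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i = 0.03 \times 2 = 0.06
```

A word that appears in almost every text would instead have an inverse document frequency near zero, so it would fall below any positive threshold and would not be selected as a keyword.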
Further, extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement, wherein the method comprises the following steps:
respectively calculating the similarity between the feature vector of the case description text to be typed and the elements in the feature vector set by formula 5:
$$sim(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \quad \text{(formula 5)}$$
wherein $x$ and $y$ are the two vectors being compared, and $\|x\|$ is the Euclidean norm of the vector $x = (x_1, x_2, x_3, \ldots, x_p)$, defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$$
Conceptually, it is the length of the vector $x$.
A menstrual disorder typing system comprising:
the acquisition module is used for acquiring a certain number of menstrual disorder case description texts;
a first construction module for constructing a text description corpus based on the menstrual disorder case description text;
the first extraction module is used for extracting main characteristic information of each menstrual disorder case description text in the text description corpus through a TextRank algorithm;
the second construction module is used for constructing a standard database;
the second extraction module is used for extracting the feature vector of the case description text to be typed;
the similarity calculation module is used for calculating the similarity between the feature vector of the case description text to be typed and the elements in the feature vector set by using cosine measurement;
and the typing module is used for performing case matching and typing on the case description text to be typed according to the similarity, to obtain the disorder type and prescription information corresponding to the case description text to be typed.
Further, the first extraction module is further configured to:
calculating main characteristic information of each menstrual disorder case description text by formula 1:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j) \quad \text{(formula 1)}$$
wherein $d$ represents the damping coefficient, generally set to 0.85; $V_i$ represents any node in the graph; $In(V_i)$ represents the set of vertices pointing to vertex $V_i$; $Out(V_j)$ represents the set of all vertices that vertex $V_j$ points to; $w_{ji}$ represents the connection weight between vertices $V_j$ and $V_i$; and $WS(V_i)$ represents the final ranking weight of vertex $V_i$.
Further, the second extraction module is further configured to:
calculating the word frequency in the menstrual disorder case description text by formula 2:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad \text{(formula 2)}$$
wherein $n_{i,j}$ is the number of times the word $t_i$ appears in the menstrual disorder case description text $d_j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the menstrual disorder case description text $d_j$;
calculating the inverse document frequency by formula 3:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \quad \text{(formula 3)}$$
wherein $|D|$ is the total number of menstrual disorder case description texts in the text description corpus, and $|\{j : t_i \in d_j\}|$ is the number of menstrual disorder case description texts containing the word $t_i$; if the word does not appear in the text description corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used as the denominator;
calculating the TF-IDF value by formula 4:
$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \quad \text{(formula 4)}$$
after the TF-IDF value of each word in the menstrual disorder case description text is calculated, descending order is carried out, a plurality of words with TF-IDF values higher than a set threshold value are selected as keywords, and feature vectors are constructed according to the keywords and the corresponding TF-IDF values, so that a feature vector set corresponding to each menstrual disorder case description text is obtained.
Further, the similarity calculation module is further configured to:
respectively calculate the similarity between the feature vector of the case description text to be typed and the elements in the feature vector set by formula 5:
$$sim(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \quad \text{(formula 5)}$$
wherein $x$ and $y$ are the two vectors being compared, and $\|x\|$ is the Euclidean norm of the vector $x = (x_1, x_2, x_3, \ldots, x_p)$, defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$$
Conceptually, it is the length of the vector $x$.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
the menstrual disorder typing method of the invention collects a certain amount of menstrual disorder case descriptive text; constructing a text description corpus based on the menstrual disorder case description text; extracting main characteristic information of each menstrual disorder case description text in the text description corpus by using a TextRank algorithm, and constructing a standard database; extracting feature vectors corresponding to main feature information in the standard database to obtain feature vector sets corresponding to the description text of each menstrual disorder case; extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement; performing case matching and typing on the text to be typed according to the similarity to obtain the disease type and prescription information corresponding to the case description text to be typed; solves the problem that the traditional Chinese medicine menstrual disorder type cannot be intelligently identified in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those of ordinary skill in the art that the drawings in the following description are exemplary only and that other implementations can be obtained from the extensions of the drawings provided without inventive effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the ambit of the technical disclosure.
FIG. 1 is a flow chart of a method of typing menstrual disorder according to the present invention;
FIG. 2 is a block diagram of a menstrual disorder typing system of the present invention;
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
an acquisition module 10, a first construction module 20, a first extraction module 30, a second construction module 40, a second extraction module 50, a similarity calculation module 60, a typing module 70, an electronic device 80, a processor 801, a memory 802, and a bus 803.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description of specific embodiments. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a menstrual disorder typing method according to the present invention, as shown in fig. 1, wherein the menstrual disorder typing method according to the embodiment of the present invention includes the steps of:
S101, acquiring a certain number of menstrual disorder case description texts;
specifically, traditional Chinese medicine holds that menstrual blood originates from the kidneys, and that irregular menstruation in women is related to kidney function and to the spleen, the liver, qi and blood, the thoroughfare and conception vessels, and the uterus. The disorder is mainly caused by the seven emotions, by the six exogenous pathogenic factors, or by congenital kidney-qi deficiency and overstrain (including sexual overstrain), which impair the qi of the viscera, disturb kidney, liver, and spleen function, and unbalance qi and blood; the thoroughfare and conception vessels are damaged as a result, which leads to irregular menstruation. Menorrhagia is mostly caused by internal heat, blood deficiency, or blood stasis, which makes the blood overflow or leaves it uncontrolled; it is usually differentiated and treated by observing the patient's condition and the color of the menstrual blood. Hypomenorrhea is mostly caused by blood deficiency, or by blood stasis, phlegm-dampness, and qi stagnation obstructing the passage of qi and blood so that the blood does not flow smoothly.
A menstrual disorder case description text includes the etiology, menstrual cycle, menstrual period, menstrual blood color, menstrual blood volume, duration, complications, and the like. For example, normal menstrual blood is dark red, is mixed with small fragments of shed endometrium, cervical mucus, and vaginal epithelial cells, and contains no blood clots; menstrual blood that is thin like water, only faintly pink, or blackish-purple is abnormal. Menstrual blood that consists entirely of clots is also abnormal and may indicate bleeding at some site, so the patient should seek medical care early. As another example, a typical menstrual cycle is 28 to 30 days, although some women have a 40-day cycle; either is normal as long as the cycle is regular. In addition, menstruation is easily affected by various factors, so a period that arrives 3 to 5 days early or late is normal. If one cycle is 20 days and the next is 40 days and this happens often, or if bleeding lasts only 1 to 2 days, stops, and returns for another 1 to 2 days more than 10 days later, the menstruation is irregular. At menarche, ovarian function is not yet fully developed, so irregular cycles can occur; this is not a pathological phenomenon.
S102, constructing a text description corpus based on menstrual disorder case description text;
S103, extracting main characteristic information of each menstrual disorder case description text in the text description corpus by using the TextRank algorithm, and constructing a standard database;
specifically, main characteristic information of each menstrual disorder case description text is calculated by formula 1:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j) \quad \text{(formula 1)}$$
wherein $d$ represents the damping coefficient, generally set to 0.85; $V_i$ represents any node in the graph; $In(V_i)$ represents the set of vertices pointing to vertex $V_i$; $Out(V_j)$ represents the set of all vertices that vertex $V_j$ points to; $w_{ji}$ represents the connection weight between vertices $V_j$ and $V_i$; and $WS(V_i)$ represents the final ranking weight of vertex $V_i$.
1) Blood deficiency type irregular menstruation: symptoms include delayed menstruation with scanty, light red flow without clots, possibly with lower abdominal pain; or dizziness, blurred vision, palpitations, insomnia, a pale or sallow complexion, a pale red tongue, and a weak pulse;
2) Blood cold type irregular menstruation: symptoms include delayed menstruation with scanty, dark red flow or blood clots, cold pain that is relieved by warmth, aversion to cold with cold limbs, a white tongue coating, and a deep, tight pulse;
3) Blood heat type irregular menstruation: symptoms include heavy flow that is bright red or deep red, thick, and viscous, possibly with small blood clots, accompanied by vexation, thirst, yellow urine, constipation, a red tongue with a yellow coating, and a slippery, rapid pulse; and so on.
TextRank is a graph-based ranking algorithm whose idea derives from Google's PageRank algorithm. A text is divided into constituent units (words or sentences), a graph model is built over them, and the important components of the text are ranked by a voting mechanism; keyword extraction can therefore be performed using the information in a single document alone.
TextRank uses the principle of voting: each word votes for its neighbors, and the weight of a vote depends on the number of votes the word itself has received. If each word is treated as a vertex, all the words form a network in which each vertex has edges pointing to other vertices and edges pointing to it from other vertices. The weight of a vertex is obtained by summing the weighted contributions of the vertices that point to it.
The main issue in TextRank is choosing the initial values; to simplify the subsequent calculation, each vertex is assigned a non-zero initial value. A damping coefficient is also introduced, which models the probability of jumping from a given vertex to any other vertex in the graph.
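The voting scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the invention: the function name `textrank_keywords`, the co-occurrence window of 2, the use of co-occurrence counts as edge weights, the initial score of 1.0, and the convergence tolerance are all hypothetical choices; only the update rule and the damping coefficient of 0.85 come from formula 1. The input is assumed to be already segmented into words.

```python
from collections import defaultdict


def textrank_keywords(words, window=2, d=0.85, iterations=50, tol=1e-6):
    """Rank words by weighted TextRank over a co-occurrence graph.

    words: list of tokens from one case description text (already segmented).
    Returns a list of (word, score) pairs sorted by descending score.
    """
    # Build an undirected weighted graph: the weight of an edge is the number
    # of times two distinct words co-occur within `window` positions.
    weights = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            u = words[j]
            if u != w:
                weights[w][u] += 1.0
                weights[u][w] += 1.0
    if not weights:
        return []

    # Every vertex starts from the same non-zero value.
    scores = {w: 1.0 for w in weights}
    # Denominator of formula 1: total weight of the edges leaving each vertex.
    out_sum = {w: sum(nbrs.values()) for w, nbrs in weights.items()}

    for _ in range(iterations):
        new_scores = {}
        for vi in weights:
            # Each neighbour vj passes on a share of its score proportional
            # to the weight of the edge between vj and vi.
            rank = sum(weights[vj][vi] / out_sum[vj] * scores[vj]
                       for vj in weights[vi])
            new_scores[vi] = (1 - d) + d * rank
        delta = max(abs(new_scores[w] - scores[w]) for w in scores)
        scores = new_scores
        if delta < tol:  # stop once the scores have stabilised
            break
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tokens = ["menstruation", "delayed", "scanty", "menstruation",
              "pale", "menstruation", "weak"]
    for word, score in textrank_keywords(tokens):
        print(f"{word}: {score:.3f}")
```

In this sketch the word that co-occurs with the most neighbours ends up with the highest score, which is the behaviour the voting description calls for.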
And S104, extracting feature vectors corresponding to the main feature information in the standard database to obtain feature vector sets corresponding to the description text of each menstrual disorder case.
Specifically, the word frequency in the menstrual disorder case description text is calculated by formula 2:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad \text{(formula 2)}$$
wherein $n_{i,j}$ is the number of times the word $t_i$ appears in the menstrual disorder case description text $d_j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the menstrual disorder case description text $d_j$;
calculating the inverse document frequency by formula 3:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \quad \text{(formula 3)}$$
wherein $|D|$ is the total number of menstrual disorder case description texts in the text description corpus, and $|\{j : t_i \in d_j\}|$ is the number of menstrual disorder case description texts containing the word $t_i$; if the word does not appear in the text description corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used as the denominator;
calculating the TF-IDF value by formula 4:
$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \quad \text{(formula 4)}$$
after the TF-IDF value of each word in the menstrual disorder case description text is calculated, descending order is carried out, a plurality of words with TF-IDF values higher than a set threshold value are selected as keywords, and feature vectors are constructed according to the keywords and the corresponding TF-IDF values, so that a feature vector set corresponding to each menstrual disorder case description text is obtained.
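The feature-vector construction in step S104 could look roughly like the sketch below. It is an illustration only: the function name `tf_idf_vectors`, the natural logarithm, and the default threshold of 0.0 are assumptions, since the source fixes neither the logarithm base nor the threshold value. The smoothed denominator (1 plus the document count) follows the text above, and each text is assumed to be already segmented into words.

```python
import math
from collections import Counter


def tf_idf_vectors(corpus, threshold=0.0):
    """Build one sparse TF-IDF feature vector per case description text.

    corpus: list of token lists, one per text.
    Returns a list of {word: tf-idf} dicts, each ordered by descending value
    and containing only words whose TF-IDF exceeds `threshold`.
    """
    n_docs = len(corpus)
    # Number of texts that contain each word.
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in corpus:
        counts = Counter(tokens)
        total = sum(counts.values())
        vec = {}
        for word, n in counts.items():
            tf = n / total                                   # formula 2
            idf = math.log(n_docs / (1 + doc_freq[word]))    # formula 3, smoothed
            score = tf * idf                                 # formula 4
            if score > threshold:
                vec[word] = score
        # Descending order, as in the keyword selection step.
        ordered = sorted(vec.items(), key=lambda kv: kv[1], reverse=True)
        vectors.append(dict(ordered))
    return vectors


if __name__ == "__main__":
    corpus = [
        ["delayed", "scanty", "pale", "menstruation"],
        ["heavy", "red", "thick", "menstruation"],
        ["clots", "cold", "menstruation"],
        ["distention", "sighing", "menstruation"],
    ]
    for vec in tf_idf_vectors(corpus):
        print(vec)
```

Note that with the smoothed denominator a word that appears in every text (here "menstruation") receives a negative inverse document frequency, so it is dropped by any non-negative threshold.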
S105, extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement.
Specifically, similarity between the feature vector of the case description text to be typed and the elements in the feature vector set is calculated through a formula 5 respectively:
let $x$ and $y$ be the two vectors to be compared, using the cosine measure as the similarity function:
$$sim(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \quad \text{(formula 5)}$$
wherein $\|x\|$ is the Euclidean norm of the vector $x = (x_1, x_2, x_3, \ldots, x_p)$, defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$$
Conceptually, it is the length of the vector $x$;
the cosine of a 0-degree angle is 1, the cosine of any other angle is less than 1, and the minimum value is -1. The cosine of the angle between two vectors therefore indicates whether the two vectors point in approximately the same direction. When the two vectors have the same direction, the cosine similarity is 1; when the angle between them is 90 degrees, the cosine similarity is 0; and when they point in diametrically opposite directions, the cosine similarity is -1. The result is independent of the lengths of the vectors and depends only on their directions. Cosine similarity is usually applied in the positive space, where the values lie between 0 and 1.
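A small sketch of formula 5 for sparse keyword vectors is given below. The function name `cosine_similarity` and the dictionary representation (keyword mapped to TF-IDF weight) are assumptions made for illustration; returning 0.0 for an all-zero vector is likewise a convention chosen here, since formula 5 is undefined in that case.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two sparse vectors.

    x, y: dicts mapping keyword -> weight; missing keywords count as 0.
    """
    # Only keywords present in both vectors contribute to the dot product.
    dot = sum(w * y[k] for k, w in x.items() if k in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention for empty vectors (assumption)
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    a = {"pale": 3.0, "scanty": 4.0}
    b = {"pale": 6.0, "scanty": 8.0}   # same direction, twice the length
    c = {"red": 1.0}                   # no keyword in common with a
    print(cosine_similarity(a, b))
    print(cosine_similarity(a, c))
```

Because TF-IDF weights above a positive threshold are never negative, the similarities computed for such vectors fall between 0 and 1.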
And S106, performing case matching and typing on the text to be typed according to the similarity to obtain the disease type and prescription information corresponding to the description text of the case to be typed.
Specifically, the disorder types include the blood heat type, the liver depression transforming into heat type, the qi deficiency type, and the blood deficiency type:
1) Blood heat type. Symptoms: irregular menstruation; menstrual blood that is red, purple, or deep red and sticky and thick; vexation in the heart and chest; a dry mouth and throat; a flushed face; yellow urine and stool; a red tongue; and a yellow tongue coating. Treatment: clear heat and cool the blood; the medicine can be taken as a pill, capsule, etc.;
2) Liver depression transforming into heat type. Symptoms: irregular menstruation; obstructed menstrual flow; distending pain in the chest, hypochondrium, breasts, and lower abdomen; chest tightness; irritability or frequent sighing; belching; poor appetite; red or purple menstrual blood; red tongue edges; a bitter taste in the mouth; a dry throat; and a thin, yellow tongue coating. Treatment: soothe the liver, relieve depression, and clear heat; this can be used for treating menoxenia, leukorrhagia, and other diseases;
3) Qi deficiency type. Symptoms: early or prolonged menstruation with heavy flow of thin, clear quality; listlessness; weakness; palpitations; shortness of breath; loose stool; an empty sensation in the lower abdomen; and a pale tongue with a thin coating. Treatment: invigorate qi and blood; a pill for invigorating the middle-jiao and qi or a pill for invigorating the spleen can be taken;
4) Blood deficiency type. Symptoms: delayed menstruation with scanty flow of thin quality; dizziness; palpitations; insomnia; dreaminess; a sallow complexion; and a pale tongue with little coating. Treatment: replenish blood and qi; medicines that can be administered include tablets, FUNING pill, BAZHENYIMU pill, Radix Angelicae Sinensis blood-replenishing paste, BABAKUNSHU pill, SHIZHENXIANGFU pill, Ning Kun ZHIBAODAN, JIAWEIYIMU paste, FUYANGSHIWEI tablet, ANKUNZANYU pill, SHENRONGBAIFENG pill, etc.
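Step S106 can be pictured as a nearest-neighbour lookup over the standard database, as in the sketch below. Everything named here is hypothetical: the `StandardCase` record, the `type_case` function, the `min_similarity` cut-off, and the toy vectors and prescription strings are illustrative placeholders, not data or interfaces from the invention. The sketch assumes the best match is simply the standard case with the highest cosine similarity.

```python
import math
from dataclasses import dataclass


@dataclass
class StandardCase:
    """One entry of the standard database (hypothetical layout)."""
    disorder_type: str
    prescription: str
    vector: dict  # keyword -> TF-IDF weight


def _cosine(x, y):
    """Cosine similarity between two sparse keyword vectors (formula 5)."""
    dot = sum(w * y[k] for k, w in x.items() if k in y)
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def type_case(query_vector, standard_cases, min_similarity=0.0):
    """Return (disorder_type, prescription, similarity) of the best match.

    Returns None when no standard case is more similar than `min_similarity`.
    """
    best, best_sim = None, min_similarity
    for case in standard_cases:
        sim = _cosine(query_vector, case.vector)
        if sim > best_sim:
            best, best_sim = case, sim
    if best is None:
        return None
    return best.disorder_type, best.prescription, best_sim


if __name__ == "__main__":
    # Toy standard database; weights and texts are made up for illustration.
    database = [
        StandardCase("blood heat type", "clear heat and cool the blood",
                     {"red": 0.5, "thick": 0.4}),
        StandardCase("blood deficiency type", "replenish blood and qi",
                     {"pale": 0.6, "scanty": 0.3}),
    ]
    print(type_case({"pale": 0.2, "scanty": 0.1}, database))
```

A non-zero `min_similarity` would let the system decline to type a description that resembles none of the standard cases; whether the invention applies such a cut-off is not stated in the source.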
FIG. 2 is a block diagram of an embodiment of a menstrual disorder typing system according to the present invention; as shown in FIG. 2, the menstrual disorder typing system according to an embodiment of the present invention includes the following modules:
an acquisition module 10 for acquiring a number of menstrual disorder case descriptive texts;
a first construction module 20 for constructing a text description corpus based on the menstrual disorder case description text;
a first extracting module 30, configured to extract main feature information of each of the menstrual disorder case description texts in the text description corpus through a TextRank algorithm;
the first extraction module 30 is further configured to:
calculating main characteristic information of each menstrual disorder case description text by formula 1:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j) \quad \text{(formula 1)}$$
wherein $d$ represents the damping coefficient, generally set to 0.85; $V_i$ represents any node in the graph; $In(V_i)$ represents the set of vertices pointing to vertex $V_i$; $Out(V_j)$ represents the set of all vertices that vertex $V_j$ points to; $w_{ji}$ represents the connection weight between vertices $V_j$ and $V_i$; and $WS(V_i)$ represents the final ranking weight of vertex $V_i$.
A second construction module 40 for constructing a standard database;
a second extracting module 50, configured to extract feature vectors of the case description text to be typed;
the second extraction module 50 is further configured to:
calculating the word frequency in the menstrual disorder case description text by formula 2:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}} \quad \text{(formula 2)}$$
wherein $n_{i,j}$ is the number of times the word $t_i$ appears in the menstrual disorder case description text $d_j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all words in the menstrual disorder case description text $d_j$;
calculating the inverse document frequency by formula 3:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \quad \text{(formula 3)}$$
wherein $|D|$ is the total number of menstrual disorder case description texts in the text description corpus, and $|\{j : t_i \in d_j\}|$ is the number of menstrual disorder case description texts containing the word $t_i$; if the word does not appear in the text description corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used as the denominator;
calculating the TF-IDF value by formula 4:
$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \quad \text{(formula 4)}$$
after the TF-IDF value of each word in the menstrual disorder case description text is calculated, the words are sorted in descending order of TF-IDF value, the words whose TF-IDF values are higher than a set threshold are selected as keywords, and a feature vector is constructed from the keywords and their corresponding TF-IDF values, so that a feature vector set corresponding to each menstrual disorder case description text is obtained.
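The TF-IDF steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patent's code: the function name `tf_idf_vectors`, the representation of each text as a list of already-segmented tokens, and the default threshold of 0 are hypothetical. The smoothed denominator 1 + |{j : t_i ∈ d_j}| mentioned in the text is used.

```python
import math
from collections import Counter


def tf_idf_vectors(docs, threshold=0.0):
    """docs: list of token lists. Returns one {word: tf-idf} dict per document,
    keeping only words whose TF-IDF value exceeds `threshold`."""
    n_docs = len(docs)
    # Number of documents containing each word.
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = sum(counts.values())
        weights = {}
        for word, n in counts.items():
            tf = n / total                                  # Formula 2
            idf = math.log(n_docs / (1 + doc_freq[word]))   # Formula 3, smoothed
            weights[word] = tf * idf                        # Formula 4
        # Sort in descending order and keep the words above the threshold.
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        vectors.append({w: s for w, s in ranked if s > threshold})
    return vectors
```

Note that with the smoothed denominator, a word occurring in nearly every text gets an IDF of zero or below, so it is dropped by any non-negative threshold.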
A similarity calculating module 60, configured to calculate similarities between feature vectors of the case description text to be typed and elements in the feature vector set using cosine metrics, respectively;
the similarity calculation module 60 is further configured to:
calculate the similarity between the feature vector of the case description text to be typed and each element in the feature vector set by Formula 5:
$$sim(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \qquad \text{(Formula 5)}$$
wherein $\|x\|$ is the Euclidean norm of the vector $x = (x_1, x_2, x_3, \ldots, x_p)$, defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$$
Conceptually, it is the length of vector $x$.
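Formula 5 can be sketched in Python for feature vectors stored as sparse dictionaries mapping keywords to TF-IDF values. The function name `cosine_similarity` and the dictionary representation are assumptions for illustration; returning 0 for an all-zero vector is also a choice made here, since the source does not address that case.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two sparse vectors given as {keyword: weight}."""
    # Dot product over the keywords the two vectors share.
    dot = sum(v * y.get(k, 0.0) for k, v in x.items())
    # Euclidean norms, i.e. the lengths of the vectors.
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an empty vector is similar to nothing
    return dot / (norm_x * norm_y)
```

For example, the vectors `{"a": 3, "b": 4}` and `{"a": 3}` have dot product 9 and norms 5 and 3, giving a similarity of 0.6.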
a typing module 70 for performing case matching and typing on the case description text to be typed according to the similarity, to obtain the disease type and prescription information corresponding to the case description text to be typed.
In the menstrual disorder typing system, the acquisition module 10 collects a certain number of menstrual disorder case description texts; the first construction module 20 constructs a text description corpus based on the menstrual disorder case description texts; the first extraction module 30 extracts the main characteristic information of each menstrual disorder case description text in the text description corpus; the second construction module 40 constructs a standard database; the second extraction module 50 extracts the feature vector of the case description text to be typed; the similarity calculation module 60 uses the cosine measure to calculate the similarity between the feature vector of the case description text to be typed and each element in the feature vector set; and the typing module 70 performs case matching and typing on the case description text to be typed according to the similarity, to obtain the corresponding disease type and prescription information. The standard database, which records the association between the different typing standards and their corresponding main characteristic information, is built on the current traditional Chinese medicine syndrome differentiation and typing standards for female menstrual disorder. A vector space model is also applied to feature extraction. Together, these address the problem of effective typing and feature extraction, greatly reduce the loss of important characteristic information, and greatly improve the accuracy of intelligent diagnosis and of characteristic information identification.
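The final matching step can be illustrated with a short Python sketch: the case to be typed is assigned the disease type and prescription of the most similar entry in the standard database. The record layout (`"type"`, `"prescription"`, `"vector"`), the function names, the keyword weights, and the placeholder prescription strings are all hypothetical; the source only states that typing follows the similarity, so picking the single best match is an assumption.

```python
import math


def cosine(x, y):
    """Cosine similarity between sparse vectors stored as {keyword: weight}."""
    dot = sum(v * y.get(k, 0.0) for k, v in x.items())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def type_case(query_vec, standard_db):
    """Return the type and prescription of the most similar database entry,
    or None if the database is empty."""
    best_entry, best_score = None, -1.0
    for entry in standard_db:
        score = cosine(query_vec, entry["vector"])
        if score > best_score:
            best_entry, best_score = entry, score
    if best_entry is None:
        return None
    return {
        "type": best_entry["type"],
        "prescription": best_entry["prescription"],
        "similarity": best_score,
    }


# Hypothetical standard database with made-up keyword weights.
standard_db = [
    {"type": "blood heat", "prescription": "prescription A",
     "vector": {"red": 0.8, "thick": 0.6}},
    {"type": "qi deficiency", "prescription": "prescription B",
     "vector": {"pale": 0.7, "thin": 0.7}},
]
result = type_case({"pale": 0.5, "thin": 0.2}, standard_db)
print(result["type"], result["prescription"])
```

A practical system might also require the best similarity to exceed a minimum value before reporting a type; the source does not state such a cutoff, so none is applied here.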
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 3, the electronic device 80 includes: a processor 801, a memory 802, and a bus 803;
the processor 801 and the memory 802 complete communication with each other through the bus 803;
the processor 801 is configured to invoke program instructions in the memory 802 to perform the methods provided by the above-described method embodiments, including, for example: collecting a certain number of menstrual disorder case descriptive texts; constructing a text description corpus based on the menstrual disorder case description text; extracting main characteristic information of each menstrual disorder case description text in the text description corpus by using a TextRank algorithm, and constructing a standard database; extracting feature vectors corresponding to main feature information in the standard database to obtain feature vector sets corresponding to the description text of each menstrual disorder case; extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement; and performing case matching and typing on the text to be typed according to the similarity to obtain the disease type and prescription information corresponding to the case description text to be typed.
The present embodiment provides a non-transitory computer readable medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: collecting a certain number of menstrual disorder case descriptive texts; constructing a text description corpus based on the menstrual disorder case description text; extracting main characteristic information of each menstrual disorder case description text in the text description corpus by using a TextRank algorithm, and constructing a standard database; extracting feature vectors corresponding to main feature information in the standard database to obtain feature vector sets corresponding to the description text of each menstrual disorder case; extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement; and performing case matching and typing on the text to be typed according to the similarity to obtain the disease type and prescription information corresponding to the case description text to be typed.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
The apparatus embodiments described above are merely illustrative, wherein elements illustrated as separate elements may or may not be physically separate, and elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable medium such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method of the respective embodiments or parts of the embodiments.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (10)

1. A method for typing menstrual disorder, the method comprising:
collecting a certain number of menstrual disorder case descriptive texts;
constructing a text description corpus based on the menstrual disorder case description text;
extracting main characteristic information of each menstrual disorder case description text in the text description corpus by using a TextRank algorithm, and constructing a standard database;
extracting feature vectors corresponding to main feature information in the standard database to obtain feature vector sets corresponding to the description text of each menstrual disorder case;
extracting feature vectors of the case description text to be typed, and respectively calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set by using cosine measurement;
and performing case matching and typing on the text to be typed according to the similarity to obtain the disease type and prescription information corresponding to the case description text to be typed.
2. The menstrual disorder typing method according to claim 1, wherein the extracting of main feature information of each of the menstrual disorder case description texts by the TextRank algorithm and constructing a standard database comprises:
calculating main characteristic information of each menstrual disorder case description text by Formula 1:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{\omega_{ji}}{\sum_{V_k \in Out(V_j)} \omega_{jk}} \, WS(V_j) \qquad \text{(Formula 1)}$$
wherein $d$ represents the damping coefficient, generally set to 0.85; $V_i$ represents any node in the graph; $In(V_i)$ represents the set of vertices pointing to vertex $V_i$; $Out(V_j)$ represents the set of all vertices that vertex $V_j$ points to; $\omega_{ji}$ represents the weight of the connection between vertices $V_j$ and $V_i$; and $WS(V_i)$ represents the final ranking weight of vertex $V_i$.
3. The menstrual disorder typing method according to claim 1, wherein the extracting feature vectors corresponding to the main feature information in the standard database to obtain a feature vector set corresponding to each menstrual disorder case description text comprises:
calculating the term frequency of each word in the menstrual disorder case description text by Formula 2:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \qquad \text{(Formula 2)}$$
wherein $n_{i,j}$ is the number of times the word $t_i$ appears in the menstrual disorder case description text $d_j$, and $\sum_{k} n_{k,j}$ is the total number of occurrences of all words in the menstrual disorder case description text $d_j$;
calculating the inverse document frequency by Formula 3:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \qquad \text{(Formula 3)}$$
where $|D|$ is the total number of menstrual disorder case description texts in the text description corpus, and $|\{j : t_i \in d_j\}|$ is the number of menstrual disorder case description texts containing the word $t_i$; if the word does not appear in the text description corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used;
calculating the TF-IDF value by Formula 4:
$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \qquad \text{(Formula 4)}$$
after the TF-IDF value of each word in the menstrual disorder case description text is calculated, the words are sorted in descending order of TF-IDF value, the words whose TF-IDF values are higher than a set threshold are selected as keywords, and a feature vector is constructed from the keywords and their corresponding TF-IDF values, so that a feature vector set corresponding to each menstrual disorder case description text is obtained.
4. The method for typing menstrual disorder according to claim 1, wherein the extracting feature vectors of the case description text to be typed, calculating the similarity between the feature vectors of the case description text to be typed and elements in the feature vector set, respectively, using cosine measures, comprises:
calculating the similarity between the feature vector of the case description text to be typed and each element in the feature vector set by Formula 5:
$$sim(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \qquad \text{(Formula 5)}$$
wherein $\|x\|$ is the Euclidean norm of the vector $x = (x_1, x_2, x_3, \ldots, x_p)$, defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$$
Conceptually, it is the length of vector $x$.
5. A menstrual disorder typing system, comprising:
the acquisition module is used for acquiring a certain number of menstrual disorder case description texts;
a first construction module for constructing a text description corpus based on the menstrual disorder case description text;
the first extraction module is used for extracting main characteristic information of each menstrual disorder case description text in the text description corpus through a TextRank algorithm;
the second construction module is used for constructing a standard database;
the second extraction module is used for extracting the feature vector of the case description text to be typed;
the similarity calculation module is used for calculating the similarity between the feature vector of the case description text to be typed and the elements in the feature vector set by using cosine measurement;
and the typing module is used for performing case matching and typing on the case description text to be typed according to the similarity, to obtain the disease type and prescription information corresponding to the case description text to be typed.
6. The menstrual disorder typing system according to claim 5, wherein the first extraction module is further configured to:
calculating main characteristic information of each menstrual disorder case description text by Formula 1:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{\omega_{ji}}{\sum_{V_k \in Out(V_j)} \omega_{jk}} \, WS(V_j) \qquad \text{(Formula 1)}$$
wherein $d$ represents the damping coefficient, generally set to 0.85; $V_i$ represents any node in the graph; $In(V_i)$ represents the set of vertices pointing to vertex $V_i$; $Out(V_j)$ represents the set of all vertices that vertex $V_j$ points to; $\omega_{ji}$ represents the weight of the connection between vertices $V_j$ and $V_i$; and $WS(V_i)$ represents the final ranking weight of vertex $V_i$.
7. The menstrual disorder typing system according to claim 5, wherein the second extraction module is further configured to:
calculating the term frequency of each word in the menstrual disorder case description text by Formula 2:
$$tf_{i,j} = \frac{n_{i,j}}{\sum_{k} n_{k,j}} \qquad \text{(Formula 2)}$$
wherein $n_{i,j}$ is the number of times the word $t_i$ appears in the menstrual disorder case description text $d_j$, and $\sum_{k} n_{k,j}$ is the total number of occurrences of all words in the menstrual disorder case description text $d_j$;
calculating the inverse document frequency by Formula 3:
$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|} \qquad \text{(Formula 3)}$$
where $|D|$ is the total number of menstrual disorder case description texts in the text description corpus, and $|\{j : t_i \in d_j\}|$ is the number of menstrual disorder case description texts containing the word $t_i$; if the word does not appear in the text description corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is typically used;
calculating the TF-IDF value by Formula 4:
$$\text{TF-IDF}_{i,j} = tf_{i,j} \times idf_i \qquad \text{(Formula 4)}$$
after the TF-IDF value of each word in the menstrual disorder case description text is calculated, the words are sorted in descending order of TF-IDF value, the words whose TF-IDF values are higher than a set threshold are selected as keywords, and a feature vector is constructed from the keywords and their corresponding TF-IDF values, so that a feature vector set corresponding to each menstrual disorder case description text is obtained.
8. The menstrual disorder typing system according to claim 5, wherein the similarity calculation module further comprises:
calculating the similarity between the feature vector of the case description text to be typed and each element in the feature vector set by Formula 5:
$$sim(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} \qquad \text{(Formula 5)}$$
wherein $\|x\|$ is the Euclidean norm of the vector $x = (x_1, x_2, x_3, \ldots, x_p)$, defined as
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_p^2}$$
Conceptually, it is the length of vector $x$.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 4 when the computer program is executed.
10. A non-transitory computer readable medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 4.
CN202310168401.7A 2023-02-27 2023-02-27 Menstrual disorder typing system, menstrual disorder typing method, electronic device, and recording medium Pending CN116186262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310168401.7A CN116186262A (en) 2023-02-27 2023-02-27 Menstrual disorder typing system, menstrual disorder typing method, electronic device, and recording medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310168401.7A CN116186262A (en) 2023-02-27 2023-02-27 Menstrual disorder typing system, menstrual disorder typing method, electronic device, and recording medium

Publications (1)

Publication Number Publication Date
CN116186262A true CN116186262A (en) 2023-05-30

Family

ID=86452006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310168401.7A Pending CN116186262A (en) 2023-02-27 2023-02-27 Menstrual disorder typing system, menstrual disorder typing method, electronic device, and recording medium

Country Status (1)

Country Link
CN (1) CN116186262A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination