CN114898895A - Xinjiang local adverse drug reaction identification method and related device - Google Patents

Info

Publication number
CN114898895A
Authority
CN
China
Prior art keywords
xinjiang
local
corpus
adverse drug
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210467433.2A
Other languages
Chinese (zh)
Inventor
王晓卓
杨柳
陈天宇
王涛
郭江涛
王楷
曹澍
黎红
聂旭贝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Xinjiang Electric Power Corporation Information & Telecommunication Co ltd
State Grid Corp of China SGCC
Original Assignee
State Grid Xinjiang Electric Power Corporation Information & Telecommunication Co ltd
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Xinjiang Electric Power Corporation Information & Telecommunication Co ltd, State Grid Corp of China SGCC
Priority to CN202210467433.2A
Publication of CN114898895A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Toxicology (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of machine learning, and in particular to a Xinjiang local adverse drug reaction identification method and a related device. The method comprises: establishing a Xinjiang local adverse drug reaction corpus and dividing the corpus into a training sample set and a test sample set; training a model on the training sample set to obtain an adverse reaction identification model; and testing the adverse reaction identification model on the test sample set, selecting the optimal parameters, and outputting the adverse reaction identification model. The method collects a large amount of online text related to adverse reactions of Xinjiang local drugs to establish the corpus, captures the bidirectional semantic dependencies and local features of the corpus text with an attention-based hybrid neural network that combines a bidirectional long short-term memory (BILSTM) network with a convolutional neural network (CNN), and uses the attention mechanism to highlight important features, thereby ensuring effective identification of adverse reactions.

Description

Xinjiang local adverse drug reaction identification method and related device
Technical Field
The invention relates to the technical field of machine learning, in particular to a Xinjiang local adverse drug reaction identification method and a related device.
Background
An adverse drug reaction is a harmful reaction to a single drug, or to the interaction of several drugs, that occurs irrespective of the disease being treated. Adverse drug reactions cause great harm, and the problem is especially acute in Xinjiang: because few people take part in clinical trials of Xinjiang local drugs, many potential adverse reactions are not found at the clinical trial stage, and the traditional way of discovering adverse reactions severely limits pharmacovigilance. The package inserts of most Xinjiang local drugs give no detailed adverse reaction warnings and state only that the adverse reactions are not clear, so they cannot guide patients in taking the drugs. This creates a serious safety hazard for members of the public who self-medicate.
No corpus has yet been built for Xinjiang local drugs, and because of the particular nature of these drugs, many potential adverse reactions do not surface in clinical practice in time. In the field of Xinjiang local drugs in particular, work on medical language processing is still at a preliminary stage. A large volume of user comments on adverse drug reactions can be collected from the internet, and the adverse reaction feedback in these comments holds a great deal of undiscovered 'knowledge'. Compared with traditional medical reports, this information is more plentiful, more timely and more widely disseminated. However, web text is irregular, colloquial and often contains errors, which poses a major challenge for mining text from social networks. Online texts are also highly subjective, and in many cases an accurate judgment can be made only by taking the context into account. Meanwhile, computer techniques have not yet been applied to research on adverse reactions of Xinjiang local drugs, and studies of adverse reactions based on social network comments are relatively few.
Therefore, the analysis and identification of adverse reactions of Xinjiang local drugs is a problem in urgent need of research, and it plays an important role in raising the overall level of medicine in China and promoting the rapid development of the Xinjiang region.
Disclosure of Invention
The invention provides a method and a related device for identifying Xinjiang local adverse drug reactions that overcome the defects of the prior art. They effectively address two problems: no corpus of Xinjiang local drugs currently exists, and computer technology has not been combined with Xinjiang local adverse drug reaction data to identify such reactions.
The first technical scheme of the invention is realized by the following measures: a method for constructing a Xinjiang local adverse drug reaction identification model, comprising the following steps:
establishing a Xinjiang local adverse drug reaction corpus and dividing it into a training sample set and a test sample set, wherein the corpus comprises online texts related to Xinjiang local adverse drug reactions;
training a model on the training sample set to obtain an adverse reaction identification model, wherein the model is built with an attention-based hybrid network algorithm combining a bidirectional long short-term memory (BILSTM) network and a convolutional neural network (CNN);
and testing the adverse reaction identification model on the test sample set, selecting the optimal parameters, and outputting the adverse reaction identification model.
The following are further optimizations and/or improvements of the above technical scheme:
Establishing the Xinjiang local adverse drug reaction corpus and dividing it into a training sample set and a test sample set comprises:
obtaining corpus text related to Xinjiang local adverse drug reactions;
preprocessing the corpus text to obtain the entities in it;
and obtaining and classifying the candidate relations among the entities to obtain the correspondences among the entities and identify the adverse drug reactions.
Preprocessing the corpus text to obtain the entities in it comprises:
denoising the corpus to normalize its structure, wherein denoising comprises removing the non-Chinese parts of the data, removing stop words and deleting punctuation marks;
and setting corpus labeling rules and labeling the corpus with them to obtain the entities in the corpus.
Training the model on the training sample set to obtain the adverse reaction identification model comprises:
segmenting the text into a set of minimum semantic units and vectorizing them with the word2vec technique to generate distributed vectors and obtain the word vector features of the text;
inputting the word vector features into the BILSTM network and the CNN respectively to obtain bidirectional contextual global features and local convolutional features, and splicing the two sets of features for average fusion;
highlighting the symptom features of adverse reactions with an attention mechanism;
and classifying with a classifier to identify adverse reactions.
The second technical scheme of the invention is realized by the following measures: a method for identifying Xinjiang local adverse drug reactions, comprising the following steps:
acquiring information to be identified;
inputting the information to be identified into the Xinjiang local adverse drug reaction identification model to obtain an adverse reaction identification result, wherein the Xinjiang local adverse drug reaction identification model is the model constructed by the above method for constructing a Xinjiang local adverse drug reaction identification model.
The third technical scheme of the invention is realized by the following measures: a device for constructing a Xinjiang local adverse drug reaction identification model, comprising:
a corpus construction unit, which establishes a Xinjiang local adverse drug reaction corpus and divides it into a training sample set and a test sample set, wherein the corpus comprises online texts related to Xinjiang local adverse drug reactions;
a modeling unit, which trains a model on the training sample set to obtain an adverse reaction identification model, wherein the model is built with an attention-based hybrid network algorithm combining a bidirectional long short-term memory (BILSTM) network and a convolutional neural network (CNN);
and a testing unit, which tests the adverse reaction identification model on the test sample set, selects the optimal parameters and outputs the adverse reaction identification model.
The fourth technical scheme of the invention is realized by the following measures: a device for identifying Xinjiang local adverse drug reactions, comprising:
an acquisition unit, which acquires information to be identified;
a communication unit, which inputs the information to be identified into the Xinjiang local adverse drug reaction identification model;
and an identification unit, which inputs the information to be identified into the Xinjiang local adverse drug reaction identification model to obtain an adverse reaction identification result, wherein the Xinjiang local adverse drug reaction identification model is the model constructed by the above method for constructing a Xinjiang local adverse drug reaction identification model.
The method collects a large amount of online text related to adverse reactions of Xinjiang local drugs to establish a Xinjiang local adverse drug reaction corpus, captures the bidirectional semantic dependencies and local features of the corpus text with an attention-based hybrid neural network combining a bidirectional long short-term memory (BILSTM) network and a convolutional neural network (CNN), and uses the attention mechanism to highlight the more important features (that is, the adverse reaction features). This ensures that adverse reaction identification is effective, so that the potential adverse reactions of Xinjiang local drugs can be identified.
Drawings
FIG. 1 is a flow chart of a model construction method of the present invention.
FIG. 2 is a flow chart of a corpus establishing method of the present invention.
FIG. 3 is a flow chart of the method for obtaining the adverse reaction recognition model of the present invention.
FIG. 4 is a flow chart of the method for identifying adverse reactions of the present invention.
FIG. 5 is a schematic structural diagram of a model building apparatus according to the present invention.
FIG. 6 is a schematic structural diagram of the identification device of the present invention.
Detailed Description
The present invention is not limited by the following examples, and specific embodiments may be determined according to the technical solutions and practical situations of the present invention.
Before explaining the embodiments of the present invention in detail, an application scenario of the embodiments is described. No corpus has yet been built for Xinjiang local drugs, and because of the particular nature of these drugs, many potential adverse reactions do not surface in clinical practice in time. In this scenario, the model construction method, identification method and related devices provided by the embodiments of the invention can be used to mine the contextual global information and the local information of Xinjiang local adverse drug reaction texts and so improve identification accuracy. They use a text identification algorithm based on an attention mechanism and a hybrid of a bidirectional long short-term memory network and a convolutional neural network (Attention Mechanism and BILSTM-CNN Hybrid Network, ATT-BILSTM-CNN), which provides more complete context information and highlights local features, so that the potential adverse reactions of Xinjiang local drugs can be identified from information on the internet.
The invention is further described with reference to the following examples and figures:
Example 1: As shown in FIG. 1, this embodiment discloses a method for constructing a Xinjiang local adverse drug reaction identification model, comprising the following steps:
Step S101, establishing a Xinjiang local adverse drug reaction corpus and dividing it into a training sample set and a test sample set, wherein the corpus comprises online texts related to Xinjiang local adverse drug reactions;
Step S102, training a model on the training sample set to obtain an adverse reaction identification model, wherein the model is built with an attention-based hybrid network algorithm combining a bidirectional long short-term memory (BILSTM) network and a convolutional neural network (CNN);
Step S103, testing the adverse reaction identification model on the test sample set, selecting the optimal parameters, and outputting the adverse reaction identification model.
This embodiment collects a large amount of online text related to adverse reactions of Xinjiang local drugs to establish the corpus, captures the bidirectional semantic dependencies and local features of the corpus text with the attention-based BILSTM and CNN hybrid network, and uses the attention mechanism to highlight the more important features (that is, the adverse reaction features). This ensures that adverse reaction identification is effective, so that the potential adverse reactions of Xinjiang local drugs can be identified.
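The division of the corpus in step S101 can be illustrated with a minimal sketch. The patent does not state a split ratio or a shuffling procedure, so the `test_ratio`, the `seed` and the function name `split_corpus` below are assumptions chosen only for illustration, and the sample comments are placeholders.

```python
import random


def split_corpus(samples, test_ratio=0.2, seed=42):
    """Shuffle labeled samples and split them into training and test sets.

    test_ratio and seed are illustrative assumptions; the source does not
    specify how the corpus is divided.
    """
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder (text, label) pairs standing in for labeled comments.
corpus = [(f"comment {i}", i % 2) for i in range(10)]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))  # 8 2
```

A fixed seed keeps the split reproducible, which matters when several models are compared on the same test set.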
Example 2: As shown in FIG. 2, this embodiment discloses a method for constructing a Xinjiang local adverse drug reaction identification model, in which establishing the Xinjiang local adverse drug reaction corpus and dividing it into a training sample set and a test sample set further comprises:
Step S201, obtaining online corpus text related to Xinjiang local adverse drug reactions;
This embodiment uses web crawler technology to crawl text about Xinjiang local adverse drug reactions from websites such as medical forums, mainstream medical blogs and Tieba forums.
Step S202, preprocessing the corpus to obtain the entities in the corpus;
Because expression habits differ between individuals, users of different ages, education levels and even regions may describe the same thing in different words. This reflects not only differences in education or literacy between groups, but also the long-standing and evolving gap between written and spoken language. In addition, online comments are free text with a high degree of randomness, so a corpus collected from the internet is heterogeneous and highly ambiguous. Moreover, adverse drug reactions and drug indications can look alike: diarrhea, for example, may be a symptom of the patient's illness or an adverse reaction caused by incorrect medication. In this embodiment the corpus is therefore preprocessed so that drug indications can be distinguished from adverse reactions.
In this embodiment, preprocessing the corpus specifically comprises:
(1) denoising the corpus to normalize its structure, wherein denoising comprises removing the non-Chinese parts of the data, removing stop words and deleting punctuation marks;
Online expressions differ from formal medical reports. Some users describe adverse drug reactions in highly colloquial terms, reporting their experience of taking a drug with phrases such as 'feeling awful', 'feeling unwell after taking it', 'heartburn and palpitations', 'a buzzing in the head' or 'about to vomit bile'. In this step the corpus is therefore denoised and its structure normalized so that the model can correctly identify the adverse drug reactions in such sentences.
(2) setting corpus labeling rules and labeling the corpus with them to obtain the entities in the corpus.
The corpus labeling rules can be derived from existing methods for labeling Chinese medicine texts, by analogy with Xinjiang local drugs, which are likewise prescription medicines, on the basis of similarity and relevance. Labeling covers four aspects: part of speech, semantic category, syntactic structure and named entities. The labeling rules additionally include labels for drug, indication, symptom, disease site, disease name, cause of disease and others.
For example, consider a comment saying that the user caught a chill and came down with a cold and a headache, was prescribed Zukamu granules by a doctor last week, took them at home at the prescribed dose and slept well that night, but had a stomach ache after taking the medicine. Labeling this sentence with the corpus labeling rules marks 'catching a chill' as the cause of disease, 'cold' as the disease name, 'headache' as the indication, 'Zukamu granules' as the drug and 'stomach ache' as the symptom. The sentence contains three main entities: 'headache', 'Zukamu granules' and 'stomach ache'.
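The denoising in step (1) above can be sketched as follows. This is only an illustration under stated assumptions: the text is assumed to be already segmented into tokens, the stop-word list `STOP_WORDS` is a tiny hypothetical one, and "Chinese" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF, which also excludes punctuation, so punctuation is dropped in the same pass.

```python
import re

# Anything outside the basic CJK Unified Ideographs block is treated as noise
# (Latin letters, digits, ASCII and full-width punctuation). This range is an
# assumption; the source only says "removing non-Chinese parts".
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def denoise(tokens, stop_words):
    """Strip non-Chinese characters from each token and drop stop words."""
    cleaned = []
    for token in tokens:
        token = NON_CHINESE.sub("", token)
        if token and token not in stop_words:
            cleaned.append(token)
    return cleaned


STOP_WORDS = {"的", "了"}  # hypothetical stop-word list
tokens = ["吃", "了", "药", "abc", "，", "肚子", "疼", "!!"]
print(denoise(tokens, STOP_WORDS))  # ['吃', '药', '肚子', '疼']
```

Filtering per token rather than on the raw string keeps stop-word removal from deleting characters that merely occur inside longer words.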
Step S203, obtaining and classifying the candidate relations among the entities to obtain the correspondences among the entities and identify the adverse drug reactions.
Based on the above example, the candidate relations that may exist among the three entities are (Zukamu granules, headache), (Zukamu granules, stomach ache) and (headache, stomach ache). A network is trained on grammatical, syntactic and semantic information, and a classifier classifies the candidate relations with reference to the indications listed for each drug in the Xinjiang local drug standards issued by the Ministry of Health. This yields the correspondences that Zukamu granules treat headache and that Zukamu granules cause stomach ache, which achieves the aim of filtering out indications and identifying adverse drug reactions. The classifier then classifies the adverse reactions and their types.
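The candidate-relation step can be illustrated with a small sketch. The patent classifies candidate relations with a trained network; the rule-based lookup below is a deliberately simplified stand-in, not the patent's classifier. The function names, the relation labels and the `indications` table are hypothetical, and the single indication entry simply mirrors the example sentence rather than any published drug standard.

```python
from itertools import combinations


def candidate_pairs(entities):
    """Enumerate every unordered pair of entities as a candidate relation."""
    return list(combinations(entities, 2))


def classify_pair(pair, drugs, indications):
    """Toy stand-in for the trained relation classifier.

    A (drug, term) pair is 'treats' if the term is a listed indication of the
    drug, otherwise a possible adverse reaction. Pairs without a drug are
    set aside.
    """
    first, second = pair
    if first in drugs:
        drug, other = first, second
    elif second in drugs:
        drug, other = second, first
    else:
        return "no_drug"
    if other in indications.get(drug, set()):
        return "treats"
    return "possible_adverse_reaction"


entities = ["Zukamu granules", "headache", "stomach ache"]
drugs = {"Zukamu granules"}
indications = {"Zukamu granules": {"headache"}}  # illustrative entry only

for pair in candidate_pairs(entities):
    print(pair, classify_pair(pair, drugs, indications))
```

The sketch shows why the indication table matters: without it, 'headache' and 'stomach ache' would be indistinguishable candidates for an adverse reaction.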
Example 3: As shown in FIG. 3, this embodiment discloses a method for constructing a Xinjiang local adverse drug reaction identification model, in which training the model on the training sample set to obtain the adverse reaction identification model further comprises:
Step S301, segmenting the text into a set of minimum semantic units and vectorizing them with the word2vec technique to generate distributed vectors and obtain the word vector features of the text;
word2vec mainly comprises two models: the continuous bag-of-words (CBOW) model and the Skip-gram model. The CBOW model predicts the current word from its context, which is like removing a word from a sentence and guessing what it is. The Skip-gram model predicts the context from the current word, which is like guessing which words may appear before and after a given word. This embodiment uses the Skip-gram model to extract high-quality word vector features from the text.
The Skip-gram model predicts the surrounding context words in a sentence from the current word; if two words produce the same outputs, it can be inferred that the two words are similar. The Skip-gram model is a well-known technique and is not described in detail here. In the Skip-gram model a word is shaped by the words near it: one word is used to predict several words, and each word's vector is adjusted every time that word serves as the center word, so it suffers less interference from other high-frequency words. Because of the particular nature of Xinjiang local drugs, their user base is not very broad, which limits how much adverse reaction text can be collected; when the data set is small, this repeated iterative adjustment makes the word vectors more accurate.
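To make the Skip-gram idea concrete, the sketch below generates the (center word, context word) training pairs that a Skip-gram model learns from. It covers only pair generation, not the training of the vectors themselves; the window size and the English tokens are assumptions used purely for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within the window.

    Each token takes a turn as the center word and is paired with each
    neighbor up to `window` positions away on either side.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = ["took", "medicine", "then", "stomach", "hurt"]
pairs = skipgram_pairs(sentence, window=1)
print(len(pairs))  # 8
print(pairs[:3])   # [('took', 'medicine'), ('medicine', 'took'), ('medicine', 'then')]
```

Every token yields several pairs each time it is the center word, which is the repeated adjustment the text credits for better vectors on a small corpus.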
Step S302, inputting the word vector features into the BILSTM network and the CNN respectively to obtain bidirectional contextual global features and local convolutional features, and splicing the two sets of features for average fusion;
The BILSTM network consists of a forward long short-term memory (LSTM) network and a backward LSTM network, so it captures bidirectional semantic dependencies better: it captures the dynamic information of the sequence and also uses the information both before and after the current word. For colloquial, irregular online comments, it can fully capture the global features of the implicit information and thus improve training accuracy.
The convolutional neural network is a classic deep learning network. With only a small amount of hyperparameter tuning and static vectors, it achieves good classification results while keeping training easy. It consists of an input layer, convolutional layers and a pooling layer. In this embodiment a convolutional neural network is used to extract local convolutional features. Pooling is the sampling step that follows convolution: the pooling layer subsamples the feature vectors obtained and thereby mitigates overfitting. Max pooling and average pooling are the most commonly used strategies, but these conventional strategies lose the position information and frequency information of feature items. To extract richer semantic information from online comments, this embodiment uses a convolutional neural network with an attention-based pooling strategy. This reduces the information loss of the pooling layer and implicitly separates sentences of different categories. An attention-based pooling strategy achieves a better classification effect, and unlike max pooling and average pooling, it highlights more local feature information.
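The loss of position information under conventional pooling can be seen in a small sketch. For brevity it convolves sequences of scalars rather than word vectors, and the kernel and input values are arbitrary assumptions; it is not the network configuration used in the embodiment.

```python
def conv1d(sequence, kernel):
    """Valid 1-D convolution (cross-correlation) over a scalar sequence."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def max_pool(features):
    """Keep only the strongest activation."""
    return max(features)


def avg_pool(features):
    """Keep only the mean activation."""
    return sum(features) / len(features)


kernel = [1, 1]
early = conv1d([1, 3, 0, 0], kernel)  # strong response at the start: [4, 3, 0]
late = conv1d([0, 0, 3, 1], kernel)   # strong response at the end:   [0, 3, 4]

# Both pooling strategies give identical results for the two feature maps,
# so where the feature occurred can no longer be recovered.
print(max_pool(early) == max_pool(late))  # True
print(avg_pool(early) == avg_pool(late))  # True
```

An attention-based pooling layer instead assigns a separate weight to each position, so the two feature maps above need not collapse to the same summary.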
Step S303, highlighting the symptom features of adverse reactions with an attention mechanism;
This embodiment introduces an attention mechanism: the local features learned by the CNN and the global features learned by the BILSTM are combined and fed into an attention layer. The attention mechanism assigns a weight to each element, so that the more salient information receives more weight and texts describing adverse drug reactions are identified more effectively. The focus in this embodiment is the symptom information of adverse reactions, which requires the model to be more sensitive when handling long sentences related to adverse drug reactions. The attention mechanism therefore maps the result into a bounded interval with the tanh function to obtain the weighted features, which facilitates subsequent calculation and analysis.
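A common way to realize such a layer is sketched below: each feature vector is scored with tanh, the scores are normalized with a softmax, and the vectors are summed with those weights. The source mentions only the tanh mapping; the softmax normalization, the single weight vector `weights` and the toy feature values are assumptions of this sketch rather than details taken from the patent.

```python
import math


def attention(features, weights, bias=0.0):
    """Weight each feature vector h_t by softmax(tanh(w . h_t + b)) and sum.

    Returns the attention weights and the weighted sum (context vector).
    """
    scores = [
        math.tanh(sum(w * x for w, x in zip(weights, h)) + bias)
        for h in features
    ]
    exp_scores = [math.exp(s) for s in scores]
    total = sum(exp_scores)
    alphas = [e / total for e in exp_scores]
    dim = len(features[0])
    context = [
        sum(a * h[d] for a, h in zip(alphas, features)) for d in range(dim)
    ]
    return alphas, context


# Three time steps with two-dimensional fused features (toy values).
features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
alphas, context = attention(features, weights=[1.0, 1.0])
print(alphas.index(max(alphas)))  # 1: the second time step gets the most weight
```

Because tanh bounds every score to the interval (-1, 1), no single time step can dominate the softmax without limit, which keeps the weights stable on long sentences.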
Step S304, classifying with a classifier to identify adverse reactions.
This embodiment may use a softmax classifier here.
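A softmax classifier turns the scores of the final layer into class probabilities and picks the most probable class. The sketch below assumes two classes and arbitrary example scores; the label names are hypothetical and the patent does not list its output classes.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


LABELS = ["no adverse reaction", "adverse reaction"]  # hypothetical label set
probs = softmax([0.3, 1.7])  # example scores, not model output
predicted = LABELS[probs.index(max(probs))]
print(predicted)  # adverse reaction
```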
Example 4: As shown in FIG. 4, this embodiment discloses a method for identifying Xinjiang local adverse drug reactions, comprising the following steps:
Step S401, acquiring information to be identified;
Step S402, inputting the information to be identified into the Xinjiang local adverse drug reaction identification model to obtain an adverse reaction identification result, wherein the Xinjiang local adverse drug reaction identification model is the model constructed by the construction method of Examples 1 to 3.
Example 5: As shown in FIG. 5, this embodiment discloses a device for constructing a Xinjiang local adverse drug reaction identification model, comprising:
a corpus construction unit, which establishes a Xinjiang local adverse drug reaction corpus and divides it into a training sample set and a test sample set, wherein the corpus comprises online texts related to Xinjiang local adverse drug reactions;
a modeling unit, which trains a model on the training sample set to obtain an adverse reaction identification model, wherein the model is built with an attention-based hybrid network algorithm combining a bidirectional long short-term memory (BILSTM) network and a convolutional neural network (CNN);
and a testing unit, which tests the adverse reaction identification model on the test sample set, selects the optimal parameters and outputs the adverse reaction identification model.
Example 6: As shown in FIG. 6, this embodiment discloses a device for identifying Xinjiang local adverse drug reactions, comprising:
an acquisition unit, which acquires information to be identified;
a communication unit, which inputs the information to be identified into the Xinjiang local adverse drug reaction identification model;
and an identification unit, which inputs the information to be identified into the Xinjiang local adverse drug reaction identification model to obtain an adverse reaction identification result, wherein the Xinjiang local adverse drug reaction identification model is the model constructed by the construction method of Examples 1 to 3.
Example 7: to verify the effectiveness, for text classification of Xinjiang local adverse drug reactions, of the Xinjiang local adverse drug reaction identification model established by the attention-based hybrid network model algorithm combining a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN), comparative experiments were performed in which the same data were input to the single networks, the single networks with an added attention mechanism, and the model of the present invention. Each model uses its own optimal parameters and receives the same word vectors as input. The specific experimental results are shown in Table 1.
As can be seen from the model comparison tests in Table 1, the F values of the CNN and BiLSTM networks with the attention mechanism added are improved by 1.65% and 0.9%, respectively, compared with the same networks without it. The combined network with the attention mechanism has an F value 2.33% higher than the combined network without the attention mechanism. This shows that the attention mechanism can effectively focus on the more critical information and thereby make classification more accurate. Compared with the single CNN and BiLSTM networks, the BiLSTM-CNN combined network improves accuracy by 3.52% and 0.91%, respectively, and the F value by 3.45% and 1.09%, respectively. The combined network can effectively fuse the context global features and the local features, so that text recognition of Xinjiang local adverse drug reactions achieves a better effect, which demonstrates the effectiveness of the combined network. The Xinjiang local adverse drug reaction identification model of the present invention is established by the attention-based BiLSTM and CNN hybrid network model algorithm; in experiments on the Xinjiang local adverse drug reaction data set, its P, R and F values were the best among the compared models. This verifies the flexibility of the combined network and the effectiveness of the attention mechanism, and demonstrates the superiority of the model of the present invention on the text classification task for Xinjiang local adverse drug reactions.
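For readers unfamiliar with the P, R and F values used in the comparison, the following sketch shows how they are commonly computed from classification counts. It assumes that F denotes the usual F1 score (the harmonic mean of precision and recall), which the text does not state explicitly; the counts in the example are hypothetical and are not taken from Table 1.

```python
def precision_recall_f(tp, fp, fn):
    """Return (P, R, F) from true positives, false positives and false negatives.

    F is computed as the F1 score, which is an assumption about the text's "F value".
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical counts for one model on a test sample set.
p, r, f = precision_recall_f(tp=80, fp=20, fn=10)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
```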
Example 8: the embodiment of the invention discloses a storage medium on which a computer-readable computer program is stored, wherein the computer program is configured to execute the Xinjiang local adverse drug reaction identification model construction method when run.
The storage medium may include, but is not limited to, a USB flash drive, a read-only memory, a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing computer programs.
Example 9: an embodiment of the invention discloses a terminal comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method for identifying a local adverse drug reaction in Xinjiang.
Embodiment 10, an embodiment of the invention discloses a server comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps of the method for identifying adverse drug reactions in Xinjiang.
The processor may be a central processing unit CPU, general purpose processor, digital signal processor DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. Or a combination that performs a computing function, e.g., comprising one or more microprocessors, DSPs, and microprocessors, etc.
The communication module may be a transceiver, an RF circuit or a communication interface, etc. The storage module may be a memory, and may include but is not limited to: u disk, read-only memory, removable hard disk, magnetic or optical disk, etc. for storing computer program.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
The above technical features constitute the best embodiment of the present invention, which has strong adaptability and the best implementation effect; non-essential technical features may be added or removed according to actual needs to meet the requirements of different situations.
[Image DEST_PATH_IMAGE002 not reproduced in this text]

Claims (10)

1. A method for constructing a Xinjiang local adverse drug reaction identification model, characterized by comprising the following steps:
establishing a Xinjiang local adverse drug reaction corpus, and dividing the Xinjiang local adverse drug reaction corpus into a training sample set and a testing sample set, wherein the Xinjiang local adverse drug reaction corpus comprises texts on the network related to Xinjiang local adverse drug reactions;
learning and training the model by utilizing the training sample set to obtain an adverse reaction identification model, wherein the model is established by a hybrid network model algorithm that combines a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN) based on an attention mechanism;
and testing the adverse reaction identification model by using the test sample set, selecting the optimal parameters, and outputting the adverse reaction identification model.
2. The method for constructing a Xinjiang local adverse drug reaction recognition model according to claim 1, wherein establishing the Xinjiang local adverse drug reaction corpus and dividing it into a training sample set and a test sample set comprises the following steps:
obtaining a corpus related to the adverse drug reactions in Xinjiang;
preprocessing the corpus to obtain entities in the corpus;
and obtaining and classifying candidate incidence relations among the entities, obtaining corresponding relations among the entities, and identifying the adverse drug reactions.
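One way to read the step of obtaining candidate association relations among entities is to pair every drug entity with every reaction entity found in the same text and leave the decision about each pair to the classifier. The sketch below illustrates only this pairing; the entity labels `DRUG` and `REACTION`, the function name and the sample entities are hypothetical and do not come from the claims.

```python
from itertools import product


def candidate_relations(entities):
    """Pair each drug entity with each reaction entity from one text.

    entities is a list of (text, label) tuples; the labels are hypothetical.
    """
    drugs = [text for text, label in entities if label == "DRUG"]
    reactions = [text for text, label in entities if label == "REACTION"]
    # Every (drug, reaction) pair is a candidate to be classified later.
    return list(product(drugs, reactions))


# Hypothetical entities labeled in one corpus sentence.
entities = [("aspirin", "DRUG"), ("rash", "REACTION"), ("dizziness", "REACTION")]
pairs = candidate_relations(entities)
print(pairs)
```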
3. The method for constructing a Xinjiang local adverse drug reaction recognition model according to claim 2, wherein the preprocessing of the corpus to obtain entities in the corpus comprises:
denoising the corpus to realize corpus structure normalization, wherein denoising comprises removing non-Chinese parts of data, removing stop words and deleting punctuation marks;
and setting a corpus labeling rule, and labeling the corpus by using the corpus labeling rule to obtain an entity in the corpus.
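The denoising step described above (removing non-Chinese parts, stop words and punctuation) could look roughly like the following sketch. It assumes the text has already been segmented into tokens, keeps only characters in the CJK Unified Ideographs range U+4E00 to U+9FFF, and uses a tiny hypothetical stop-word set; none of these choices are specified in the claims.

```python
import re

# Hypothetical stop-word set; a real system would load a full stop-word list.
STOP_WORDS = {"的", "了", "和"}

# Matches every character outside the basic CJK Unified Ideographs block.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def denoise(tokens):
    """Drop non-Chinese characters, punctuation and stop words from tokens."""
    cleaned = []
    for token in tokens:
        # Removing non-Chinese characters also removes digits and punctuation.
        chinese_only = NON_CHINESE.sub("", token)
        if chinese_only and chinese_only not in STOP_WORDS:
            cleaned.append(chinese_only)
    return cleaned


# Hypothetical segmented sentence: "after taking aspirin 100mg, rash and dizziness appeared".
tokens = ["服用", "了", "阿司匹林", "100mg", "后", "出现", "皮疹", "，", "和", "头晕", "!"]
print(denoise(tokens))
```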
4. The method for constructing a Xinjiang local adverse drug reaction recognition model according to any one of claims 1 to 3, wherein the step of learning and training the model by using a training sample set to obtain the adverse reaction recognition model comprises the following steps:
segmenting the text into a set of minimum semantic units, and vectorizing the set of minimum semantic units by using word2vec to generate distributed vectors and obtain word vector features of the text;
respectively inputting the word vector features into a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN) to obtain bidirectional context global features and local convolutional features, and splicing the bidirectional context global features and the local convolutional features for average fusion;
highlighting the symptom features of adverse reactions by using an attention mechanism;
and classifying by using a classifier to identify adverse reactions.
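The feature splicing and attention steps above can be illustrated with a small pure-Python sketch. It is not the patented network: the per-token feature values are made up, the splicing is shown as plain concatenation (the averaging part of the fusion is omitted), and the attention scores use a dot product with a fixed `query` vector that would be a learned parameter in a real model. All names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(global_feats, local_feats):
    """Concatenate per-token global (BiLSTM-style) and local (CNN-style) vectors."""
    return [g + l for g, l in zip(global_feats, local_feats)]


def attention_pool(features, query):
    """Weight each token vector by softmax(dot(feature, query)) and sum them."""
    scores = [sum(x * q for x, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Made-up features for three tokens; the second token stands in for a symptom word.
global_feats = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
local_feats = [[0.0], [1.0], [0.1]]
fused = fuse(global_feats, local_feats)
pooled, weights = attention_pool(fused, query=[1.0, 1.0, 1.0])
print([round(w, 3) for w in weights])
```

The pooled vector would then be passed to a classifier; the token with the largest attention weight contributes most to it, which is the sense in which attention highlights symptom features.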
5. A method for identifying local adverse drug reactions in Xinjiang is characterized by comprising the following steps:
acquiring information to be identified;
inputting the information to be identified into the Xinjiang local adverse drug reaction identification model to obtain an adverse reaction identification result; wherein the Xinjiang local adverse drug reaction recognition model is constructed by the Xinjiang local adverse drug reaction recognition model construction method according to any one of claims 1 to 4.
6. A Xinjiang local adverse drug reaction recognition model construction device using the Xinjiang local adverse drug reaction recognition model construction method according to any one of claims 1 to 4, comprising:
a corpus construction unit, which is used for establishing a Xinjiang local adverse drug reaction corpus and dividing the Xinjiang local adverse drug reaction corpus into a training sample set and a testing sample set, wherein the Xinjiang local adverse drug reaction corpus comprises texts related to the Xinjiang local adverse drug reactions in a network;
the modeling unit is used for learning and training the model by utilizing the training sample set to obtain an adverse reaction identification model, wherein the model is established by a hybrid network model algorithm that combines a bidirectional long short-term memory (BiLSTM) network and a convolutional neural network (CNN) based on an attention mechanism;
and the testing unit is used for testing the adverse reaction identification model by using the test sample set, selecting the optimal parameters and outputting the adverse reaction identification model.
7. A local Xinjiang adverse drug reaction recognition device using the local Xinjiang adverse drug reaction recognition method according to claim 5, comprising:
an acquisition unit that acquires information to be identified;
the communication unit inputs the information to be identified into the Xinjiang local adverse drug reaction identification model;
the identification unit inputs the information to be identified into the Xinjiang local adverse drug reaction identification model to obtain an adverse reaction identification result; wherein the Xinjiang local adverse drug reaction recognition model is constructed by the Xinjiang local adverse drug reaction recognition model construction method according to any one of claims 1 to 4.
8. A storage medium, having stored thereon a computer program readable by a computer, the computer program being arranged to, when executed, perform the method of any one of claims 1 to 4.
9. A terminal comprising a processor, memory, a communications interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of the method of claim 5.
10. A server, comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs including instructions for performing the steps of the method of claim 5.
CN202210467433.2A 2022-04-29 2022-04-29 Xinjiang local adverse drug reaction identification method and related device Pending CN114898895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210467433.2A CN114898895A (en) 2022-04-29 2022-04-29 Xinjiang local adverse drug reaction identification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210467433.2A CN114898895A (en) 2022-04-29 2022-04-29 Xinjiang local adverse drug reaction identification method and related device

Publications (1)

Publication Number Publication Date
CN114898895A true CN114898895A (en) 2022-08-12

Family

ID=82719451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210467433.2A Pending CN114898895A (en) 2022-04-29 2022-04-29 Xinjiang local adverse drug reaction identification method and related device

Country Status (1)

Country Link
CN (1) CN114898895A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115775635A (en) * 2022-11-22 2023-03-10 长沙砝码柯数据科技有限责任公司 Medicine risk identification method and device based on deep learning model and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination