CN110442861B - Chinese professional term and new word discovery method based on real world statistics - Google Patents
Abstract
The invention relates to a method for discovering Chinese professional terms and new words based on real-world statistics. Pointwise mutual information (PMI) and adjacency entropy (BE) are used to identify "seeds" (words with a high degree of cohesion); these two measures are adopted mainly because both are unsupervised and complement each other. After the seeds are found, new words are filtered out using refined statistics over a real-world corpus of 1.6 billion characters.
Description
Technical Field
The invention relates to a method for discovering Chinese professional terms and new words based on real-world statistics, used to detect new words and professional terms in Chinese texts from specialized fields.
Background
Word segmentation is the first task in Chinese natural language processing (NLP) and plays an important role in it. In current NLP practice, constructing a dictionary of professional words is an effective way to improve segmentation quality for texts with strong domain specificity. Building such a dictionary efficiently is difficult: most current methods are deep-learning algorithms based on manual labeling, a process called named entity recognition, but these cannot handle professional medical text such as drug names or operation names without expert assistance or an existing dictionary.
Taking drug names in the medical field as an example, see Table 1.
TABLE 1 Chinese–English reference table for drug names (partial)
It can be seen that most non-professionals are simply unable to label such professional text accurately. Take "Durogesic (fentanyl transdermal patch)": the term divides into two parts, the transliterated brand name and "fentanyl transdermal patch". The second part is easily mis-segmented because transliterated words and technical terms are mixed; its correct division is "fentanyl", "transdermal", and "patch", yet human annotators easily attach the character for "trans-" to a neighboring word. The same problem appears frequently in similar texts: labeling professional text is very difficult, and traditional NLP methods handle it poorly, falling short of practical application requirements.
Disclosure of Invention
The purpose of the invention is to determine the degree of cohesion of a word from information entropy and adjacency entropy, and thereby discover new words and professional terms in text.
In order to achieve the above object, the technical solution of the present invention is to provide a method for discovering Chinese professional terms and new words based on real world statistics, which is characterized by comprising the following steps:
step 1, collecting news corpora from various news media, defining the news corpora as news texts, taking clinical medicine names of medical institutions as contrast medical professional test texts, and defining the contrast medical professional test texts as professional texts;
step 2, applying binary (two-character) segmentation to the news text and the professional text respectively; discarding non-Chinese characters in the segmentation result of the news text to obtain candidate words; counting the number of occurrences and the frequency of each candidate word; after removing low-frequency candidate words, calculating a PMI value for each remaining candidate word, the PMI value being a criterion for the degree of cohesion between the two characters of a candidate word, a higher PMI value indicating a closer connection between the two characters; after the PMI value of every candidate word has been calculated, discarding the candidate words whose PMI value falls in the bottom quartile, thereby obtaining the target words;
step 3, calculating the external adjacency entropy and the internal adjacency entropy of each target word x obtained in step 2, wherein:
in the formulas, H_r(x) represents the right adjacency entropy of the target word x, H_l(x) the left adjacency entropy of x, H_r(x_l) the right adjacency entropy of the left character x_l of x, and H_l(x_r) the left adjacency entropy of the right character x_r of x;
step 4, calculating the BE value of each target word from its external and internal adjacency entropies, and normalizing each BE value; with BE(x) the BE value of the target word x, the normalized BE value is given by:
in the formula, mean(BE) represents the mean of the BE values of all target words, and Std(BE(x)) represents the standard deviation of BE(x);
step 5, acquiring a Score value for each target word; with Score(x) the score of the target word x:
in the formula, λ represents a weight and pmi′ represents the PMI value of the target word x;
step 6, taking the target word with the score value larger than a set threshold value as a seed word;
step 7, after the seed words have been generated, obtaining a table of two-character word groups whose degree of cohesion is high; since the professional terms to be extracted appear in the news text in the form of bigrams, the bigrams in the word group table are then integrated.
Preferably, in step 2, the PMI value pmi′ of any candidate word obtained by binary segmentation of the news text satisfies:
in the formula, x represents one character of a candidate word obtained by binary segmentation of the professional text and y the other character; p(x), p(y), and p(x, y) represent the frequencies of x, y, and the string xy in the professional text; x′ represents one character of a candidate word obtained by binary segmentation of the news text, with x′ = x, and y′ the other character, with y′ = y; p(x′), p(y′), and p(x′, y′) represent the frequencies of x′, y′, and the string x′y′ in the news text.
Preferably, in step 2, the obtained pmi′ is normalized; the normalized value of pmi′ is then given by:
Preferably, in step 7, when the bigrams in the word group table are integrated, they are recombined or extended using conditional probabilities.
The invention uses pointwise mutual information (PMI) and adjacency entropy (BE) to identify "seeds" (words with a high degree of cohesion); these two measures are adopted mainly because both are unsupervised and complement each other. After the seeds are found, new words are filtered out using refined statistics over a real-world corpus of 1.6 billion characters.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The invention provides a method for discovering Chinese professional terms and new words based on real world statistics, which comprises the following steps:
in the first step, news corpora (hereinafter referred to as news texts) from the news of New wave, the news of China, the news of Tencent, the news of Baidu and the network media of people's daily news are collected. The name of the clinical drug of the medical institution was used as a reference medical professional test text (hereinafter referred to as a professional text).
In the second step, binary segmentation is applied to the news text and the professional text respectively, and non-Chinese characters in the segmentation results are discarded. The result contains the candidate words, their occurrence counts, and their frequencies; Table 2 shows partial results of these operations on 1 GB of news text. Candidate words whose frequency is below α are eliminated. After the low-frequency words are removed, the PMI and BE values of each remaining candidate word are calculated to obtain the cohesion of each bigram.
| Candidate word | Number of occurrences | Frequency of occurrence |
| --- | --- | --- |
| Represent | 544435 | 0.00133 |
| Product(s) | 422727 | 0.00103 |
| Editing | 372018 | 0.00091 |
| Report on | 259518 | 0.00063 |
| Beijing | 249406 | 0.00060 |
| Appear by | 245255 | 0.00059 |
| In part | 240593 | 0.00058 |
| Become into | 229208 | 0.00056 |
| To | 226781 | 0.00055 |
| First of all | 224486 | 0.00054 |

Table 2
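The candidate generation and counting of the second step can be sketched as follows; the function name and the raw-count cutoff `min_count` are illustrative (the embodiment filters on a frequency threshold α instead):

```python
import re
from collections import Counter

def bigram_candidates(text, min_count=2):
    """Collect overlapping two-character candidates from Chinese text.

    Non-Chinese characters break the window, mirroring the step that
    discards non-Chinese characters from the segmentation result.
    """
    runs = re.findall(r"[\u4e00-\u9fff]+", text)  # keep only CJK runs
    counts = Counter()
    for run in runs:
        for i in range(len(run) - 1):
            counts[run[i:i + 2]] += 1
    total = sum(counts.values())
    # Return (occurrences, frequency), dropping low-frequency candidates.
    return {w: (n, n / total) for w, n in counts.items() if n >= min_count}

cands = bigram_candidates("注射液和注射泵, abc 注射液", min_count=2)
```

The frequency denominator counts all extracted bigrams, including those later removed as low-frequency.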
Thirdly, the PMI value of each candidate word obtained in the second step is calculated.
The PMI value is a criterion for the degree of cohesion between two characters; the higher the value, the closer the relationship between them. Mathematically, the PMI value can be expressed as:
PMI(x, y) = log2( p(x, y) / ( p(x) · p(y) ) ) (1)
In equation (1), x represents one character, y represents the other character, p(x) represents the frequency of the character x in the text, p(y) represents the frequency of the character y in the text, and p(x, y) represents the frequency of the string xy in the text.
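Equation (1) can be computed directly from the three frequencies; a minimal sketch:

```python
import math

def pmi(p_xy, p_x, p_y):
    """PMI(x, y) = log2(p(x, y) / (p(x) * p(y))), as in equation (1)."""
    return math.log2(p_xy / (p_x * p_y))

# A pair that co-occurs far more often than chance gets a large PMI;
# an independent pair, where p(x, y) = p(x) * p(y), gets PMI = 0.
cohesive = pmi(0.001, 0.002, 0.002)
independent = pmi(0.002 * 0.002, 0.002, 0.002)
```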
After equation (1) is applied, the PMI value is corrected: the corrected PMI value pmi′ of each candidate word obtained by binary segmentation of the news text in the second step is calculated, so that:
in equation (2), x represents one character of a candidate word obtained by binary segmentation of the professional text and y the other character; x′ represents one character of a candidate word obtained by binary segmentation of the news text, with x′ = x, and y′ the other character, with y′ = y.
Candidate words with pmi′ ≤ 0 are first discarded; the normalized value of the corrected PMI value pmi′ is then calculated and taken as the PMI value of the current candidate word:
the invention hopes to correct the professional text through the candidate words in the news text, the professional text is the test object of the invention, and the invention aims to mine the words in the professional text. But words dug by the traditional method have various problems. Therefore, the present invention makes corrections through news text, i.e., real world data.
The PMI values of all candidate words are sorted in descending order, and the candidate words whose PMI value falls below the first quartile (the value below which 25% of the sample lies when sorted from small to large) are discarded, yielding the target words.
In the following steps, the BE value of each target word is calculated; BE (adjacency entropy) is another criterion of word cohesion. For a target word x, let x_i denote a character adjacent to x. The one-sided adjacency entropy of x can be written as H(x) = -Σ_i p(x_i) log2 p(x_i), where p(x_i) is the frequency of x_i in the text. It measures the diversity of the characters on the left or right side of a target word: a higher value means the word appears in many different contexts, while a lower value means it appears in few contexts and is therefore more likely to merge with adjacent characters into a new word.
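The one-sided adjacency entropy H(x) = -Σ_i p(x_i) log2 p(x_i) can be estimated from the observed neighbor characters; a sketch (the function name is illustrative):

```python
import math
from collections import Counter

def adjacency_entropy(neighbors):
    """One-sided adjacency entropy H(x) = -sum_i p(x_i) log2 p(x_i),
    where x_i ranges over the characters observed adjacent to x."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A word seen in four different right contexts has higher right entropy
# than one always followed by the same character.
diverse = adjacency_entropy(list("甲乙丙丁"))
fixed = adjacency_entropy(list("甲甲甲甲"))
```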
Fourthly, the external adjacency entropy and the internal adjacency entropy of each target word x obtained in the third step are calculated, wherein:
in equations (4) and (5), H_r(x) represents the right adjacency entropy of the target word x, H_l(x) the left adjacency entropy of x, H_r(x_l) the right adjacency entropy of the left character x_l of x, and H_l(x_r) the left adjacency entropy of the right character x_r of x.
The external adjacency entropy reflects the diversity of a word's contexts: a large value indicates the word occurs in many different contexts. Calculating the internal adjacency entropy as well yields a further improvement in results.
Fifthly, the BE value of each target word is calculated from its external and internal adjacency entropies, and each BE value is normalized; BE(x) denotes the BE value of the target word x, and the normalized BE value is given by:
In equations (6) and (7), mean(BE) represents the mean of the BE values of all target words, and Std(BE(x)) represents the standard deviation of BE(x). Equation (6) combines the internal and external adjacency entropies into a new quantity whose magnitude expresses the cohesion of the candidate word. Ideally, a seed word occurs in a variety of contexts while its internal cohesion is high, which equation (6) expresses mathematically.
Since the invention combines BE and PMI to compute the cohesion of every seed candidate, the final result must be normalized. As the overall sample distribution and its parameters are unknown, the normalization uses the t distribution, as shown in equation (7).
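The standardization of equation (7) subtracts the mean and divides by the standard deviation; a minimal sketch, using the sample standard deviation (whether the patent uses the sample or population estimate is not specified):

```python
import statistics

def normalize(values):
    """Standardize BE values as in equation (7):
    (BE(x) - mean(BE)) / Std(BE), so that PMI and BE are on a
    comparable scale before being combined into a score."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]

norm = normalize([1.0, 2.0, 3.0])
```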
Sixthly, a Score value is obtained for each target word; Score(x) denotes the score of the target word x:
In the formula, λ represents a weight and pmi′ the PMI value of the target word x.
To obtain higher-quality seed words, PMI and BE must be weighted when they are combined; λ is introduced into the calculation as this weight parameter.
Seventhly, target words whose score exceeds the set threshold are taken as seed words. The larger the score, the more the candidate behaves like a fixed collocation; conversely, a low score indicates the word is not cohesive enough to be a new word or candidate.
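The seed-selection step can be sketched as follows. The convex combination score = λ·pmi′ + (1 − λ)·BE is an assumption for illustration; the patent's exact combination is given by its Score(x) formula, and the words used here are illustrative:

```python
def score(pmi_norm, be_norm, lam=0.3):
    """Combine normalized PMI and BE into one cohesion score.
    The weighted sum used here is an illustrative assumption;
    the patent defines the exact combination with weight lambda."""
    return lam * pmi_norm + (1 - lam) * be_norm

def seeds(candidates, threshold):
    """Keep candidates whose score exceeds the set threshold."""
    return {w for w, (p, b) in candidates.items() if score(p, b) > threshold}

# (normalized PMI, normalized BE) per candidate; values are made up.
picked = seeds({"医辩": (2.0, 3.0), "的了": (-1.0, -2.0)}, threshold=1.0)
```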
Eighthly, after the seed words have been generated, a table of two-character word groups with high cohesion is obtained. Because real-world statistics are used as the screening condition, the professional terms to be extracted can be assumed to surface in the form of bigrams. These bigrams then need to be integrated: the invention recombines or extends these scattered strings using conditional probabilities.
Extension always starts from the last character of the current string: for a given bigram, its second character is taken as the starting point, and the seed table is searched for all bigrams beginning with that character. The occurrence probability of the extended word takes the form of a Bayesian conditional probability.
The invention then sets a threshold to determine which words can be extended. For example, a candidate extension of "inject" may be "injection liquid"; recombining the two bigrams yields a new three-character word. The iteration continues until no next word exists or no candidate reaches the set threshold, at which point all professional word discovery is complete.
The invention is further illustrated by the following specific examples:
step 1, collecting news feeds from New wave news, china daily news, tencent news, hundred-degree news and people's daily news network media, wherein the time span is 2014 to 2018, the fields cover the fields of sports, entertainment, politics, science, art, culture and the like, the word number of each news is about 1000 words, the total number of words is 8GB news data, and the total word number reaches 16 hundred million words of news corpora (hereinafter referred to as news texts). The name of the clinical drug of the medical institution was used as a reference medical professional test text (hereinafter referred to as a professional text).
Step 2: a bigram list is generated, and candidate words occurring fewer than 5 times are discarded. The PMI of each word is then computed, and character pairs with PMI < 0 are discarded, since this means they are not cohesive enough to form a word.
Step 3: with the weight set to λ = 0.3, the BE of each candidate is calculated and combined with PMI into a new quantity denoted score. Finally, a threshold on score is chosen, and candidate words below it are discarded.
When all the above processing is complete, a statistically significant table of words with high cohesion is obtained. Table 3 shows the top of the seed table sorted by cohesion in descending order. A common feature of these "seeds" is that they occur in the real world but are rarely used there, while appearing in large numbers, and often together, in the test text. They can therefore serve as high-quality seeds, ready for word-length extension.
| Rank | Candidate word | Score |
| --- | --- | --- |
| 1 | Medical debate | 14.919 |
| 2 | Fork assembly | 13.762 |
| 3 | Point matching | 12.535 |
| 4 | Measuring pump | 12.414 |
| 5 | Chamber or | 12.385 |
| 6 | Coriolus versicolor | 11.798 |
| 7 | Check and | 11.794 |
| 8 | Backup instrument | 11.537 |
| 9 | Two sides of the bag | 10.590 |
| 10 | Study and examination | 10.178 |

TABLE 3 The 10 words with the highest cohesion in the test text and their scores
Step 4: two lists named Continue and stop are generated. Words whose length can still be extended are stored in Continue; words whose length cannot be extended further are stored in stop. Word-length extension then begins, with the probability threshold set to 0.3, meaning a word is considered an extensible candidate only when P_next > 0.3. If the current word can still find an extension word, it is put into the Continue list; otherwise it is put into the stop list.
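The Continue/stop extension loop can be sketched as follows; all names and counts are illustrative, and P(next | current) is approximated here as count(extended string) / count(current string):

```python
def extend_words(seeds, counts, threshold=0.3):
    """Extend seed bigrams by conditional probability: take the last
    character of the current string, find seed bigrams that begin with
    it, and extend only when
    P(next | current) = count(extended) / count(current) > threshold.
    Words that cannot be extended move from Continue to the stop list."""
    cont, stop = list(seeds), []  # the Continue and stop lists
    while cont:
        word = cont.pop()
        extended = False
        for s in seeds:
            if s[0] == word[-1]:
                cand = word + s[1:]
                if counts.get(cand, 0) / counts[word] > threshold:
                    cont.append(cand)
                    extended = True
        if not extended:
            stop.append(word)
    return stop

# "注射" (inject) extends to "注射液" (injection liquid), since the
# three-character string is frequent enough relative to "注射".
counts = {"注射": 10, "射液": 6, "注射液": 6}
result = extend_words({"注射", "射液"}, counts, threshold=0.3)
```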
Claims (1)
1. A method for discovering Chinese professional terms and new words based on real world statistics is characterized by comprising the following steps:
step 1, collecting news corpora from various news media, defining the news corpora as news texts, taking clinical medicine names of medical institutions as contrast medical professional test texts, and defining the contrast medical professional test texts as professional texts;
step 2, applying binary segmentation to the news text and the professional text respectively; discarding non-Chinese characters in the segmentation result of the news text to obtain candidate words; counting the number of occurrences and the frequency of each candidate word; after removing candidate words occurring fewer than 5 times, calculating the PMI value of each remaining candidate word, the PMI value being a criterion for the degree of cohesion between the two characters of a candidate word, a higher PMI value indicating a closer connection; after the PMI value of every candidate word has been calculated, discarding the candidate words whose PMI value falls in the bottom quartile, thereby obtaining target words; the PMI value pmi′ of any candidate word obtained by binary segmentation of the news text satisfying:
in the formula, x represents one character of a candidate word obtained by binary segmentation of the professional text and y the other character; p(x), p(y), and p(x, y) represent the frequencies of x, y, and the string xy in the professional text; x′ represents one character of a candidate word obtained by binary segmentation of the news text, with x′ = x, and y′ the other character, with y′ = y; p(x′), p(y′), and p(x′, y′) represent the frequencies of x′, y′, and the string x′y′ in the news text;
step 3, calculating the external adjacency entropy and the internal adjacency entropy of each target word x obtained in step 2, wherein:
in the formulas, H_r(x) represents the right adjacency entropy of the target word x, H_l(x) the left adjacency entropy of x, H_r(x_l) the right adjacency entropy of the left character x_l of x, and H_l(x_r) the left adjacency entropy of the right character x_r of x;
step 4, calculating the BE value of each target word from its external and internal adjacency entropies, and normalizing each BE value; with BE(x) the BE value of the target word x, the normalized BE value is given by:
in the formula, mean(BE) represents the mean of the BE values of all target words, and Std(BE(x)) represents the standard deviation of BE(x);
step 5, obtaining the Score value of each target word; with Score(x) the score of the target word x:
in the formula, λ represents a weight and pmi′ represents the PMI value of the target word x;
step 6, taking the target word with the score value larger than a set threshold value as a seed word;
step 7, after the seed words have been generated, obtaining a table of two-character word groups whose degree of cohesion is high; since the professional terms to be extracted appear in the news text in the form of bigrams, the bigrams in the word group table are integrated; when the bigrams in the word group table are integrated, they are recombined or extended using conditional probabilities.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201910608625.9A | 2019-07-08 | 2019-07-08 | Chinese professional term and new word discovery method based on real world statistics |

Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN110442861A | 2019-11-12 |
| CN110442861B | 2023-04-07 |
Families Citing this family (1)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN112988953B | 2021-04-26 | 2021-09-03 | 成都索贝数码科技股份有限公司 | Adaptive broadcast television news keyword standardization method |
Citations (7)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN105224682A | 2015-10-27 | 2016-01-06 | 上海智臻智能网络科技股份有限公司 | New word discovery method and device |
| CN105786991A | 2016-02-18 | 2016-07-20 | 中国科学院自动化研究所 | Chinese emotion new word recognition method and system combining user emotion expression ways |
| CN106126606A | 2016-06-21 | 2016-11-16 | 国家计算机网络与信息安全管理中心 | Short text new word discovery method |
| CN108509425A | 2018-04-10 | 2018-09-07 | 中国人民解放军陆军工程大学 | Chinese new word discovery method based on novelty |
| CN108845982A | 2017-12-08 | 2018-11-20 | 昆明理工大学 | Chinese word segmentation method based on word-linked characters |
| CN108874921A | 2018-05-30 | 2018-11-23 | 广州杰赛科技股份有限公司 | Method, apparatus, terminal device, and storage medium for extracting text feature words |
| CN108959259A | 2018-07-05 | 2018-12-07 | 第四范式(北京)技术有限公司 | New word discovery method and system |
Non-Patent Citations (2)

- Liang Yang et al. Extraction of New Sentiment Words in Weibo Based on Relative Branch Entropy. China Conference on Information Retrieval, 2018.
- Liu Weitong, Liu Peiyu, Liu Wenfeng, Li Nana. A new word discovery algorithm based on mutual information and adjacency entropy. Application Research of Computers, 2018, No. 5.
Legal Events

| Code | Title |
| --- | --- |
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |