CN110489526A - A term expansion method, device and storage medium for medical retrieval - Google Patents

A term expansion method, device and storage medium for medical retrieval

Info

Publication number
CN110489526A
Authority
CN
China
Prior art keywords
word
new
vector
words
initial retrieval
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910742880.2A
Other languages
Chinese (zh)
Inventor
肖婷婷
陈凯
周异
侯翠兰
谢利剑
徐萌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai City Children Hospital
Original Assignee
Shanghai City Children Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai City Children Hospital
Priority to CN201910742880.2A
Publication of CN110489526A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution

Abstract

The present invention provides a term expansion method, device and storage medium for medical retrieval. The method comprises: obtaining several initial retrieval words; for each initial retrieval word, calculating the new words whose vectors are most similar to it; constructing a vector space that includes the initial retrieval words and all the new words, and connecting with a line each initial retrieval word and all the new words closest to its vector; calculating a score for every new word, the score being proportional to the number of lines connecting that new word to initial retrieval words; and obtaining an extension dictionary, which consists of the new words whose score is higher than a preset threshold. With the method, device and storage medium of the invention, a user can input any medical name and have it expanded into numerous identical or associated medical names, so that medical retrieval becomes more convenient and comprehensive.

Description

A term expansion method, device and storage medium for medical retrieval
Technical field
The present invention relates to the field of medical retrieval, and in particular to a term expansion method, device and storage medium for medical retrieval.
Background
When medical literature or medical information is searched on the Internet, ordinary people find the vocabulary hard to master because it is highly specialized. Even professional doctors may know only one name for a given technique and not necessarily its other names. Moreover, to obtain more comprehensive search results, one usually has to enumerate the other identical or associated names of a medical name, which takes considerable time.
A method is therefore needed that lets a user input any medical name and expands it into numerous identical or associated medical names, so that medical retrieval can be carried out more conveniently and comprehensively.
A search found Chinese invention application No. 201610383323.2, which discloses a data processing method and device. The method includes: obtaining documents containing medical statistical methods from a target database as a target data source; obtaining target words and their expansion words from the target data source, an expansion word being a word with the same meaning as, or an association with, the target word; building a semantic dictionary from the words obtained from the target data source; classifying the words in the semantic dictionary by semantic analysis and storing the classified words in a storage unit, the words in each class having the same meaning or an association; and, when a search term entered by a user through a search interface is received, obtaining from the target data source the target documents corresponding to the search term, based on the words stored in the storage unit, and outputting them.
That application states that "an expansion word of a target word is a word with the same meaning as, or an association with, the target word", but it does not describe how to obtain the expansion words conveniently, comprehensively and accurately.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a term expansion method and device for medical retrieval that can expand medical search terms conveniently, comprehensively and accurately.
According to one aspect of the present invention, a term expansion method for medical retrieval is provided, comprising the following steps:
Obtaining several initial retrieval words;
For each of the initial retrieval words, calculating the new words whose vectors are most similar to it;
Constructing a vector space with the initial retrieval words and all the new words, the vector space including the initial retrieval words and all the new words, and connecting with a line each initial retrieval word in the vector space and all the new words closest to its vector;
Calculating the score of every new word, the score of a new word being proportional to the number of lines connecting that new word to the initial retrieval words;
Obtaining an extension dictionary, the extension dictionary being the new words whose score is higher than a preset threshold.
Preferably, before the new words most similar to the vectors of the initial retrieval words are calculated, the vector representation of each word is obtained in advance.
Preferably, the vector representation of each word is obtained with a word embedding algorithm.
Preferably, the method of obtaining the word vector of each word comprises the following steps:
Selecting a related field, and selecting several relevant documents and several search terms of that field;
Choosing a document library;
Applying a word embedding algorithm to the document library to obtain the vector representations of the target words in the document library.
Preferably, the score of the new words is calculated as follows: for each line connecting a new word to any initial retrieval word, one point is added to that new word's score.
Preferably, after the extension dictionary is obtained, it is judged whether a next iteration is needed,
If so, the next iteration is carried out with the extension dictionary as the initial retrieval words;
If not, the method ends.
According to another aspect of the present invention, a term expansion device for medical retrieval is provided, comprising:
An acquiring unit, for obtaining several initial retrieval words;
A first computing unit, connected to the acquiring unit, for calculating, for each of the initial retrieval words, the new words whose vectors are most similar to it;
A vector space construction unit, connected to the first computing unit, for constructing a vector space with the initial retrieval words and all the new words, the vector space including the initial retrieval words and all the new words, and for connecting with a line each initial retrieval word in the vector space and all the new words closest to its vector;
A second computing unit, connected to the vector space construction unit, for calculating the score of every new word, the score of a new word being proportional to the number of lines connecting that new word to the initial retrieval words;
A screening unit, connected to the second computing unit, for screening the new words to obtain an extension dictionary, the extension dictionary being the new words whose score is higher than a preset threshold.
Preferably, the device further includes a pretreatment unit, connected to the first computing unit, for obtaining the vector representation of each word in advance.
Preferably, the device further includes an iteration unit, connected to both the screening unit and the acquiring unit, for judging whether a next iteration is needed; if so, the next iteration is carried out with the extension dictionary as the initial retrieval words; if not, the process ends.
According to yet another aspect of the present invention, a computer readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method of any one of claims 1-6 are realized.
Compared with the prior art, the present invention has the following beneficial effects:
The method, device and storage medium of the invention expand the input initial retrieval words (medical names) to obtain the identical or associated medical names, so that more comprehensive and accurate search results are obtained and omissions are avoided. They also save the user the time of entering every search term, which is convenient and practical.
Further, because medical vocabulary is constantly developing, the method, device and storage medium of the invention can adapt to a medical dictionary that keeps evolving.
Brief description of the drawings
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 shows a flowchart of the term expansion method for medical retrieval according to one embodiment of the present invention;
Fig. 2 shows a flowchart of the term expansion method for medical retrieval according to another embodiment of the present invention;
Fig. 3 shows a flowchart of the method of obtaining the vector representation of each word in advance according to one embodiment of the present invention;
Fig. 4 shows a flowchart of the term expansion method for medical retrieval according to yet another embodiment of the present invention;
Fig. 5 shows a schematic diagram of the term expansion device for medical retrieval according to one embodiment of the present invention;
Fig. 6 shows a schematic diagram of the term expansion device for medical retrieval according to another embodiment of the present invention;
Fig. 7 shows a schematic diagram of the term expansion device for medical retrieval according to yet another embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit the invention in any way. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept. These all fall within the protection scope of the present invention.
Fig. 1 shows a flowchart of the term expansion method for medical retrieval according to one embodiment of the present invention. As shown in Fig. 1, the method includes at least steps S01 to S05, described in detail as follows:
Step S01 is executed: several initial retrieval words are obtained;
In one embodiment of the invention, the initial retrieval words are medical names and may include multiple words, for example: flavone, lycopene, nutrient. The number of initial retrieval words is greater than or equal to 2.
In one embodiment of the invention, the initial retrieval words can be entered manually. A manually entered query can be input through devices such as a keyboard or touch screen, or by other means such as voice. The initial retrieval words can also be obtained in ways other than manual entry, for example from another algorithm.
Step S02 is executed: for each of the initial retrieval words, the new words whose vectors are most similar to it are calculated.
In one embodiment of the invention, before the new words most similar to the vectors of the initial retrieval words are calculated, the vector representation of each word is obtained in advance.
Fig. 2 shows a flowchart of the term expansion method for medical retrieval according to another embodiment of the present invention. As shown in Fig. 2, step S00 is executed before step S01 to obtain the vector representation of each word in advance.
The vector representation of each word can be calculated with a word embedding algorithm.
Word embedding is an important concept in NLP (natural language processing). It converts a word into a fixed-length vector representation, which makes mathematical processing convenient.
With the vector representation of each word obtained in step S00 of this embodiment, the new words most similar to the vectors of the initial retrieval words can be calculated in step S02.
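Step S02 can be sketched as a nearest-neighbour lookup over word vectors. The patent does not name a similarity measure; the sketch below assumes cosine similarity, and the function names, the parameter `k` and the toy three-dimensional vectors are all invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_new_words(word, vectors, k=2, exclude=()):
    """Return the k words most similar to `word`, skipping excluded words."""
    candidates = [w for w in vectors if w != word and w not in exclude]
    candidates.sort(key=lambda w: cosine(vectors[word], vectors[w]), reverse=True)
    return candidates[:k]

# Toy vectors, made up for this example (not taken from the patent).
vectors = {
    "flavone":  [0.9, 0.1, 0.0],
    "lycopene": [0.8, 0.2, 0.1],
    "carotene": [0.85, 0.15, 0.05],
    "vitamin":  [0.7, 0.3, 0.1],
    "keyboard": [0.0, 0.1, 0.9],
}
initial = ["flavone", "lycopene"]

# For every initial retrieval word, look up its closest new words.
neighbours = {w: nearest_new_words(w, vectors, k=2, exclude=initial) for w in initial}
```

Excluding the initial retrieval words from the candidates is a design choice of this sketch, so that only genuinely new words are returned.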
Fig. 3 shows a flowchart of the method of obtaining the vector representation of each word in advance according to one embodiment of the present invention. As shown in Fig. 3, in this embodiment step S00 includes at least the following steps:
Step S001 is executed: a related field is selected, and several relevant documents and several search terms of that field are selected.
In one embodiment of the invention, the related field can be cardiomyopathy, deep learning algorithms, text localization algorithms, and so on.
In one embodiment of the invention, the relevant documents and search terms of the related field can be selected manually or by other means.
Step S002 is executed: a document library is chosen.
In one embodiment of the invention, as large a document library as possible is chosen in order to obtain better results. The document library is not limited to the related field and may also cover other fields. It can be obtained from existing open source datasets or by other means.
Step S003 is executed: a word embedding algorithm is applied to the document library to obtain the vector representations of the target words in the document library.
Since word embedding belongs to the prior art, it is not described in further detail here.
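The patent leaves the choice of embedding algorithm open. As a deliberately simple stand-in, the sketch below builds count-based co-occurrence vectors from a tokenized document library; it is not word2vec or any algorithm the patent names. The function name, the `window` parameter and the two toy documents are assumptions made for illustration.

```python
def cooccurrence_vectors(documents, window=2):
    """Map each word to a fixed-length vector of co-occurrence counts.

    Each vector has one entry per vocabulary word; entry i counts how often
    vocabulary word i appears within `window` positions of the word.
    """
    vocab = sorted({w for doc in documents for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for doc in documents:
        for i, word in enumerate(doc):
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][index[doc[j]]] += 1
    return vocab, vectors

# A toy "document library" of two already-tokenized sentences.
documents = [
    ["tomato", "contains", "lycopene"],
    ["carrot", "contains", "carotene"],
]
vocab, vectors = cooccurrence_vectors(documents, window=2)
```

In practice a trained embedding model over a large corpus would replace this function; only the output shape, one fixed-length vector per word, matters for the later steps.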
With steps S001 to S003 of one embodiment of the invention, the vector representation of each word is obtained, so that step S02 can be executed to calculate the new words most similar to the vectors of the initial retrieval words.
It should be noted that the present invention does not restrict the execution order of step S00; in other embodiments of the invention, step S00 can also be executed after step S01.
Step S03 is executed: a vector space is constructed with the initial retrieval words and all the new words; the vector space includes the initial retrieval words and all the new words, and each initial retrieval word in the vector space is connected with a line to all the new words closest to its vector.
Step S04 is executed: the score of every new word is calculated, the score of a new word being proportional to the number of lines connecting that new word to the initial retrieval words.
In one embodiment of the invention, one point is added to a new word's score for each line connecting it to any initial retrieval word, which finally yields the scores of all the new words.
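Steps S03 and S04 amount to building a bipartite graph and counting, for each new word, the lines that reach it. The following is a minimal sketch under that reading; the function name and the toy neighbour table are hypothetical, and the table is assumed to come from step S02.

```python
from collections import Counter

def score_new_words(neighbours):
    """Build the connecting lines and count how many reach each new word.

    `neighbours` maps each initial retrieval word to its closest new words.
    """
    # One line (edge) per distinct (initial word, new word) pair.
    edges = {(initial, new) for initial, news in neighbours.items() for new in news}
    # One point per line that ends at a new word.
    scores = Counter(new for _, new in edges)
    return edges, scores

# Toy neighbour table, invented for illustration.
neighbours = {
    "flavone":  ["isoflavone", "carotene"],
    "lycopene": ["carotene", "lutein"],
    "lecithin": ["phospholipid", "carotene"],
}
edges, scores = score_new_words(neighbours)
```

Here "carotene" is linked to three initial retrieval words and therefore scores three points, while the other new words score one point each.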
Step S05 is executed: an extension dictionary is obtained, the extension dictionary being the new words whose score is higher than a preset threshold.
In one embodiment of the invention, when the search keywords are constructed for the first time, a human expert judges whether each new word belongs to the extension dictionary (that is, whether it is related to the initial retrieval words). The optimal cut can be derived from the expert's judgments, and the preset threshold follows from it.
In other embodiments of the invention, the preset threshold can also be obtained by other means, for example by letting the user choose it at the time of use: a user who needs a larger, more comprehensive extension dictionary can lower the preset threshold, and a user who needs a smaller but more accurate extension dictionary can raise it.
The method of one embodiment of the invention thus screens the new words and keeps the part more relevant to the initial retrieval words. New words scoring above the preset threshold form the desired extension dictionary related to the initial retrieval words; new words scoring below the preset threshold are less relevant to the initial retrieval words.
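The screening in step S05, together with one possible way of deriving the threshold from expert judgments, can be sketched as follows. The patent does not say how the "optimal cut" is computed; this sketch assumes it is the integer threshold whose kept set agrees with the expert on the most words. All names and the toy scores are hypothetical.

```python
def filter_by_threshold(scores, threshold):
    """Keep the new words whose score is strictly above the threshold."""
    return {w for w, s in scores.items() if s > threshold}

def choose_threshold(scores, expert_relevant):
    """Pick the cut that agrees best with the expert (ties go to the lowest cut)."""
    best_threshold, best_agreement = 0, -1
    for threshold in range(0, max(scores.values()) + 1):
        kept = filter_by_threshold(scores, threshold)
        # A word counts as agreement when kept/dropped matches the expert's label.
        agreement = sum((w in kept) == (w in expert_relevant) for w in scores)
        if agreement > best_agreement:
            best_threshold, best_agreement = threshold, agreement
    return best_threshold

# Toy scores and expert labels, invented for illustration.
scores = {"carotene": 3, "lutein": 2, "isoflavone": 2, "intake": 1, "12g": 1}
expert_relevant = {"carotene", "lutein", "isoflavone"}

threshold = choose_threshold(scores, expert_relevant)
extension_dictionary = filter_by_threshold(scores, threshold)
```

Lowering `threshold` by hand admits more words into the extension dictionary, and raising it keeps fewer, which mirrors the user-tunable behaviour described above.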
Fig. 4 shows a flowchart of the term expansion method for medical retrieval according to yet another embodiment of the present invention. As shown in Fig. 4, the method of this embodiment includes at least the following steps:
Steps S01 to S05 are the same as described above. After step S05, step S06 is executed to judge whether a next iteration is needed.
If so, execution returns to step S01 with the extension dictionary as the initial retrieval words, and the next iteration is carried out.
If not, the method ends.
In one embodiment of the invention, whether a next iteration is needed can be judged by the user after the extension dictionary is obtained; alternatively, a number of iterations or another termination condition can be preset.
Iteration yields a larger extension dictionary. A human expert can also take part in this process and keep optimizing the preset threshold.
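The iteration of step S06 can be sketched as a loop that feeds each extension dictionary back in as the next set of initial retrieval words. The sketch assumes a precomputed nearest-neighbour table and stops either after a preset number of iterations or when an iteration adds nothing new; the early stop, the function names and the toy table are assumptions for illustration.

```python
from collections import Counter

def expand_once(initial_words, neighbour_table, threshold):
    """One pass of steps S01 to S05 over a precomputed neighbour table."""
    scores = Counter()
    for word in initial_words:
        for new in neighbour_table.get(word, []):
            if new not in initial_words:
                scores[new] += 1  # one point per connecting line
    return {w for w, s in scores.items() if s > threshold}

def expand_iteratively(initial_words, neighbour_table, threshold, max_iterations):
    """Reuse each extension dictionary as the next initial retrieval words."""
    current = set(initial_words)
    collected = set()
    for _ in range(max_iterations):
        extension = expand_once(current, neighbour_table, threshold)
        if not extension - collected:  # nothing new was found: stop early
            break
        collected |= extension
        current = extension
    return collected

# Toy neighbour table, invented for illustration.
neighbour_table = {
    "flavone":     ["isoflavone", "flavonoid"],
    "lycopene":    ["carotene", "flavonoid"],
    "resveratrol": ["flavonoid", "carotene"],
    "flavonoid":   ["anthocyanidin", "polyphenol"],
    "carotene":    ["lutein", "polyphenol"],
}
initial = ["flavone", "lycopene", "resveratrol"]
one_round = expand_iteratively(initial, neighbour_table, threshold=1, max_iterations=1)
several_rounds = expand_iteratively(initial, neighbour_table, threshold=1, max_iterations=5)
```

With this toy table the first round yields "flavonoid" and "carotene", and a second round adds "polyphenol", which both of those words link to.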
Fig. 5 shows a schematic diagram of the term expansion device 100 for medical retrieval according to one embodiment of the present invention. As shown in Fig. 5, the term expansion device 100 for medical retrieval includes at least:
An acquiring unit 01, for obtaining several initial retrieval words.
In one embodiment of the invention, the acquiring unit can include, but is not limited to, computer input devices such as a keyboard, mouse, stylus or touch screen.
A first computing unit 02, connected to the acquiring unit 01, for calculating, for each of the initial retrieval words, the new words whose vectors are most similar to it.
A vector space construction unit 03, connected to the first computing unit 02, for constructing a vector space with the initial retrieval words and all the new words; the vector space includes the initial retrieval words and all the new words, and each initial retrieval word in the vector space is connected with a line to all the new words closest to its vector.
A second computing unit 04, connected to the vector space construction unit 03, for calculating the score of every new word, the score of a new word being proportional to the number of lines connecting that new word to the initial retrieval words.
In one embodiment of the invention, the second computing unit 04 calculates the scores as follows: one point is added to a new word's score for each line connecting it to any initial retrieval word, which finally yields the scores of all the new words.
A screening unit 05, connected to the second computing unit 04, for screening the new words to obtain an extension dictionary, the extension dictionary being the new words whose score is higher than a preset threshold.
In one embodiment of the invention, when the search keywords are constructed for the first time, a human expert judges whether each new word belongs to the extension dictionary (that is, whether it is related to the initial retrieval words). The optimal cut can be derived from the expert's judgments, and the preset threshold follows from it. The preset threshold can also be obtained by other means, for example by letting the user choose it at the time of use: a user who needs a larger, more comprehensive extension dictionary can lower the preset threshold, and a user who needs a smaller but more accurate extension dictionary can raise it.
Fig. 6 shows a schematic diagram of the term expansion device for medical retrieval according to another embodiment of the present invention. As shown in Fig. 6, the device further includes a pretreatment unit 00, which is connected to the first computing unit 02 and obtains the vector representation of each word in advance.
In one embodiment of the invention, the pretreatment unit 00 obtains the vector representation of each word with a word embedding algorithm.
Fig. 7 shows a schematic diagram of the term expansion device for medical retrieval according to yet another embodiment of the present invention. As shown in Fig. 7, the device further includes an iteration unit 06, which is connected to both the screening unit 05 and the acquiring unit 01 and judges whether a next iteration is needed; if so, the next iteration is carried out with the extension dictionary as the initial retrieval words; if not, the process ends.
In one embodiment of the invention, whether a next iteration is needed can be judged by the user after the extension dictionary is obtained; alternatively, a number of iterations or another termination condition can be preset.
It should be noted that although the detailed description above mentions several units of the device that perform the actions, this division is not mandatory. In fact, according to embodiments of the present application, the features and functions of two or more of the units described above can be embodied in one unit. Conversely, the features and functions of one unit described above can be further divided among multiple units.
An embodiment of the present invention also provides a computer readable storage medium. The computer readable storage medium can be included in the device described in the above embodiments, or it can exist separately without being assembled into the device. The computer readable medium carries one or more programs; when the one or more programs are executed by a processor, the method described in the above embodiments is realized.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here can be realized in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present application can therefore be embodied as a software product, which can be stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive or portable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, server, touch terminal or network device) to execute the method according to the embodiments of the present application.
Three cases below illustrate the extension dictionaries obtained when the term expansion method and device for medical retrieval of the above embodiments are applied to initial retrieval words:
Case 1
The initial retrieval words are: flavone, lycopene, nutrient, phospholipid, nutrient, low content, trace element, cellulose, amino acid, tea polyphenols, lysine, resveratrol, active enzyme, organic acid, lecithin, water solubility, protein, polysaccharide, supplement.
Expanding the initial retrieval words with the method of one embodiment of the invention yields the following extension dictionary:
Lycopene, carotene, thiamine, intake, polysaccharide, malic acid, flavone, phospholipid, vitamin, isoflavone, active material, chlorophyll, coenzyme, rich in, oryzanol, lutein, histidine, omega, choline, linolenic acid, rich in iron, rutin, hesperidin, active enzyme, sterol, amino acid, oleic acid, b6, b1, b2, b3, caffeine, mutase, inorganic salts, flavonoids, calcium, ascorbic acid, b6, pantothenic acid, tartaric acid, fiber, resveratrol, taurine, pectin, vitamins, caffeine, solubility, flavonoids, carbohydrate, linoleic acid, anthocyanidin, lactic acid, amylase, tea polyphenols, carotenoid, lipid, fructose, tryptophan, lysine, mustard oil, minerals, essential, plant-based, folic acid, macro amount, natto, ferment, supplements, tannic acid, soybean isoflavone, riboflavin, cellulose, alkaloid, 400iu, carbohydrate, polyphenol, cystine, rhodanate, antioxidant, nutrition, organic acid, drops, glucose, niacin, supplement, b1:, dha, b12, egg lecithin, protein, trace element, niacin, contains, unsaturated, nutrient, water solubility, nutrient, vitamin, low content, anti-oxidant, fatty acid.
Case 2
The initial retrieval words are: Sanofi, azn, Squibb, amgen, Bristol-Myers, regeneron, Eli Lilly, 002198.
Expanding the initial retrieval words with the method of one embodiment of the invention yields the following extension dictionary:
Sanofi, Novartis, Boehringer, Novo Nordisk, Roche, A Silili, Pfizer, Merck Sharp & Dohme, Bayer, regeneron, Eli Lilly, shire, Pfizer, Medtronic.
Case 3
The initial retrieval words are: radix polygonati officinalis, the root of Dahurain angelica, Radix Glycyrrhizae, feverfew, the fruit of glossy privet, schizonepeta, rhizoma zingiberis, Radix Angelicae Sinensis, cimicifugae foetidae, Radix Codonopsis, fennel, cape jasmine, radix paeoniae rubra, rhizoma alismatis, campanulaceae, Rhizoma Atractylodis Macrocephalae, Rhizoma Chuanxiong, Radix Ophiopogonis, fructus amomi.
Expanding the initial retrieval words with the method of one embodiment of the invention yields the following extension dictionary:
Chinese gall, olibanum, 12g, rhizoma atractylodis, folium artemisiae argyi, Semen Cuscutae, garden burnet, monkshood, Rhizoma Atractylodis Macrocephalae, lycopodium calvatum, viola mandshurica, teasel root, Mountain cornel, ramulus mori, radix rehmanniae preparata, rhizoma zingiberis, turtle shell, herba taxilli, celestial spirit, radix bupleuri, smoked jujube, gypsum, the root bark of white mulberry, 18 grams, radix scrophulariae, gentianae macrophyllae, Radix achyranthis bidentatae, the tuber of pinellia, Cassia, the seed of cowherb, tortoiseshell, rhizoma corydalis, Radix Angelicae Sinensis, turpentine, radix rehmanniae recen, Cortex Phellodendri, great burdock achene, feverfew, official Osmanthus, the root of three-nerved spicebush, the coptis, radices trichosanthis, reed root, rhizoma acori graminei, Rhizoma Et Radix Notopterygii, monkshood, lopseed, radix polygonati officinalis, cortex moutan, Fructus Aurantii, Radix Salviae Miltiorrhizae, 61 Scattered, Sculellaria barbata, lophatherum gracile, the root of gansui, desert cistanche, radix achyranthis bidentatae, the root of Chinese clematis, Morinda officinalis, myrrh, agkistrodon, Rhizoma Chuanxiong, the tuber of dwarf lilyturf, thatch are real, extra large Wind rattan, evodia rutaecarpa, Semen Juglandis, fructus cannabis, in one's early teens, tussilago, kuh-seng, the seed of Oriental arborvitae, arbor-vitae, Radix Angelicae Pubescentis, Rehmannia glutinosa, the fruit of glossy privet, Tea before bombyx batryticatus, Chinese herbaceous peony, semen allii tuberosi, Caulis Spatholobi, oriental wormwood, semen momordicae, madder, rain, rhizoma imperatae, summer cypress, elscholtiza, fructus amomi, smilax, Asarum, mulberry fruit, rhizoma alismatis, cape jasmine, curcuma zedoary, caulis akebiae, campanulaceae, Chinese violet, scorpio, Cortex Magnoliae Officinalis, blackberry lily, Schisandra chinensis, the root of Dahurain angelica, Herba Cistanches, Radish seed, osmanthus heart, caulis bambusae in taenian, 
Longstamen Onion Bulb, psoralea corylifolia, asparagus fern, excrementum pteropi, the bletilla striata, rhizoma anemarrhenae, Yun Ling, radix paeoniae rubra, mulberry skin, shaggy-fruited dittany, Gao Liang Ginger, cimicifugae foetidae, nutmeg, field thistle, dried human placenta, Chinese ephedra, schizonepeta, Fructus Forsythiae, endothelium corneum gigeriae galli, talcum, radix pseudostellariae, the dried immature fruit of citron orange, the root bark of tree peony, money Grass, Radix Curcumae, frutus cnidii, ramulus cinnamomi, waxgourd seed, Radix Ophiopogonis, Fructus Corni, dutchmanspipe root, radix scutellariae, Eclipta prostrata, semen plantaginis, rhizoma polygonati, myotonin, Cortex acanthopanacis, luffa.
The above cases show that the method and device of the embodiments of the present invention have the following advantages over the prior art:
Users often do not know the relevant medical names well: they know only one name for a given thing and are unaware whether other names exist. Even medical experts do not know every name, so results are easily missed during retrieval. The method of the embodiments of the present invention expands the input initial retrieval words (medical names) to obtain the medical names identical to or associated with them, which yields more comprehensive search results and avoids omissions.
In the prior art, a user who wants more accurate and comprehensive search results has to enumerate the other identical or associated names of a medical name as search terms, which takes a long time and makes for a poor experience. With the method of the embodiments of the present invention, the user only needs to enter some of the search terms and the expansion supplies more, which saves the time spent entering search terms and is convenient to use.
Specific embodiments of the present invention have been described above. It should be understood that the invention is not limited to these particular embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims (10)

1. A term expansion method for medical retrieval, characterized by comprising:
obtaining several initial retrieval words;
calculating, for each of the initial retrieval words, the new words whose vectors are most similar to the vector of that initial retrieval word;
constructing a vector space from the initial retrieval words and all of the new words, the vector space containing the initial retrieval words and all of the new words, and connecting, with a line, each initial retrieval word in the vector space to all of the new words whose vectors are closest to it;
calculating the score of every new word, the score of a new word being proportional to the number of lines connecting that new word to the initial retrieval words;
obtaining an extension dictionary, the extension dictionary being the part of the new words whose score is higher than a preset threshold.
2. The method according to claim 1, characterized in that, before calculating the new words whose vectors are most similar to those of the initial retrieval words, a vector representation of each word is obtained in advance.
3. The method according to claim 2, characterized in that the vector representation of each word is obtained using a word embedding vector algorithm.
4. The method according to claim 3, characterized in that obtaining the word vector of each word comprises:
selecting a relevant field, and selecting several relevant documents and several search terms of that field;
choosing a document library;
applying the word embedding vector algorithm to the document library to obtain the vector representations of the target words in the document library.
5. The method according to claim 1, characterized in that the score of every new word is calculated as follows: if the new word has a line connecting it to any initial retrieval word, one point is added.
6. The method according to claim 1, characterized in that, after the extension dictionary is obtained, it is judged whether a next iteration is needed;
if so, the next iteration is carried out using the extension dictionary as the initial retrieval words;
if not, the method ends.
7. A term expansion device for medical retrieval, characterized by comprising:
an acquiring unit, configured to obtain several initial retrieval words;
a first computing unit, connected to the acquiring unit and configured to calculate, for each of the initial retrieval words, the new words whose vectors are most similar to the vector of that initial retrieval word;
a vector space construction unit, connected to the first computing unit and configured to construct a vector space from the initial retrieval words and all of the new words, the vector space containing the initial retrieval words and all of the new words, and to connect, with a line, each initial retrieval word in the vector space to all of the new words whose vectors are closest to it;
a second computing unit, connected to the vector space construction unit and configured to calculate the score of every new word, the score of a new word being proportional to the number of lines connecting that new word to the initial retrieval words;
a screening unit, connected to the second computing unit and configured to screen the new words to obtain an extension dictionary, the extension dictionary being the part of the new words whose score is higher than a preset threshold.
8. The device according to claim 7, characterized by further comprising a preprocessing unit, connected to the first computing unit and configured to obtain a vector representation of each word in advance.
9. The device according to claim 7, characterized by further comprising an iteration unit, connected to both the screening unit and the acquiring unit and configured to judge whether a next iteration is needed; if so, the next iteration is carried out using the extension dictionary as the initial retrieval words; if not, the process ends.
10. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-6.
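
For illustration only, one round of the expansion described in claims 1 and 5 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names (`cosine`, `expand_terms`), the `top_k` parameter, the use of cosine similarity as the vector similarity measure, and the two-dimensional toy vectors in `TOY_VECTORS` are all hypothetical. In practice the vectors would come from a word embedding algorithm trained on a document library, as in claims 2 to 4.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy word vectors; real ones would come from a word embedding model.
TOY_VECTORS = {
    "fever": (1.0, 0.0),
    "pyrexia": (0.8, 0.6),
    "hyperthermia": (0.95, 0.3),
    "chills": (0.95, -0.2),
    "febrile": (0.6, 0.8),
    "fracture": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity of two vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(seeds, vectors, top_k=2, threshold=1):
    """Run one expansion round and return the extension dictionary.

    Each initial retrieval word (seed) is connected by a line to its
    top_k most similar non-seed words. A new word scores one point per
    line, and words scoring strictly above `threshold` are kept.
    """
    seed_set = set(seeds)
    scores = Counter()
    for seed in seeds:
        # Rank every non-seed word by similarity to this seed.
        candidates = [w for w in vectors if w not in seed_set]
        candidates.sort(key=lambda w: cosine(vectors[seed], vectors[w]),
                        reverse=True)
        # Draw a line from the seed to each of its nearest new words.
        for word in candidates[:top_k]:
            scores[word] += 1
    return sorted(w for w, s in scores.items() if s > threshold)


if __name__ == "__main__":
    print(expand_terms(["fever", "pyrexia"], TOY_VECTORS))
```

With these toy vectors, "hyperthermia" is connected to both initial retrieval words and so is the only new word whose score exceeds the threshold of 1. For the iteration of claim 6, the returned list would be passed back in as the `seeds` argument of the next round; the stopping criterion is not specified in the claims and would have to be chosen separately.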
CN201910742880.2A 2019-08-13 2019-08-13 A kind of term extended method, device and storage medium for medical retrieval Pending CN110489526A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910742880.2A CN110489526A (en) 2019-08-13 2019-08-13 A kind of term extended method, device and storage medium for medical retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910742880.2A CN110489526A (en) 2019-08-13 2019-08-13 A kind of term extended method, device and storage medium for medical retrieval

Publications (1)

Publication Number Publication Date
CN110489526A true CN110489526A (en) 2019-11-22

Family

ID=68550679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910742880.2A Pending CN110489526A (en) 2019-08-13 2019-08-13 A kind of term extended method, device and storage medium for medical retrieval

Country Status (1)

Country Link
CN (1) CN110489526A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11203289A (en) * 1998-01-16 1999-07-30 Fuji Xerox Co Ltd Associated retrieval expression retrieving device and computer readable recording medium storing associated retrieval expression retrieving program
CN103761263A (en) * 2013-12-31 2014-04-30 武汉传神信息技术有限公司 Method for recommending information for users
CN104516903A (en) * 2013-09-29 2015-04-15 北大方正集团有限公司 Keyword extension method and system and classification corpus labeling method and system
CN105653660A (en) * 2015-12-29 2016-06-08 云南电网有限责任公司电力科学研究院 Association method and device of retrieval keyword
CN108491462A (en) * 2018-03-05 2018-09-04 昆明理工大学 A kind of semantic query expansion method and device based on word2vec
CN109214004A (en) * 2018-09-06 2019-01-15 广州知弘科技有限公司 Big data processing method based on machine learning
CN109344400A (en) * 2018-09-18 2019-02-15 江苏润桐数据服务有限公司 A kind of judgment method and device of document storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20191122