CN110489526A - Search term expansion method, device and storage medium for medical retrieval - Google Patents
Search term expansion method, device and storage medium for medical retrieval
- Publication number
- CN110489526A (application number CN201910742880.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- new
- vector
- words
- initial retrieval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3338—Query expansion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Abstract
The present invention provides a search term expansion method, device and storage medium for medical retrieval. The method comprises: obtaining several initial retrieval words; calculating, for each of the initial retrieval words, the new words whose vectors are most similar to its vector; constructing a vector space that contains the initial retrieval words and all the new words, and connecting each initial retrieval word in the vector space by a line to all the new words closest to its vector; calculating a score for every new word, the score being proportional to the number of lines that connect the new word to initial retrieval words; and obtaining an extension dictionary, which consists of the new words whose score is higher than a preset threshold. With the method, device and storage medium of the invention, a user can enter any medical name and have it expanded into numerous identical or associated medical names, so that medical retrieval becomes more convenient and more comprehensive.
Description
Technical field
The present invention relates to the field of medical retrieval, and in particular to a search term expansion method, device and storage medium for medical retrieval.
Background art
When medical literature or medical information is retrieved on the Internet, the specialized nature of medical vocabulary makes it hard for ordinary people to master. Even a professional doctor may know only one name for a given technique and may not be aware of its other names. In addition, obtaining a more comprehensive search result usually requires enumerating the other identical or associated names of a medical name, which takes considerable time.
A method is therefore needed that lets a user enter any medical name and expands it into numerous identical or associated medical names, so that medical retrieval can be carried out more conveniently and comprehensively.
A search found Chinese invention application No. 201610383323.2, which discloses a data processing method and device. The method includes: obtaining documents that contain medical statistics methods from a target database as a target data source; obtaining target words and expansion words of the target words from the target data source, an expansion word of a target word being a word that has the same meaning as, or an association with, the target word; building a semantic dictionary from the words obtained from the target data source; classifying the words in the semantic dictionary by semantic analysis and storing the classified words in a storage unit, the words in each category having the same meaning or an association; and, when a search term entered by a user through a search interface is received, obtaining from the target data source, based on the words stored in the storage unit, the target documents corresponding to the search term and outputting them.
The above application states that "an expansion word of a target word is a word that has the same meaning as, or an association with, the target word", but it does not describe how to obtain the expansion words conveniently, comprehensively and accurately.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a search term expansion method and device for medical retrieval that can expand the search terms of a medical retrieval conveniently, comprehensively and accurately.
According to one aspect of the present invention, a search term expansion method for medical retrieval is provided, including the following steps:
Obtaining several initial retrieval words;
Calculating, for each of the initial retrieval words, the new words whose vectors are most similar to its vector;
Constructing a vector space from the initial retrieval words and all the new words, the vector space containing the initial retrieval words and all the new words, and connecting each initial retrieval word in the vector space by a line to all the new words closest to its vector;
Calculating the score of every new word, the score of a new word being proportional to the number of lines that connect the new word to initial retrieval words;
Obtaining an extension dictionary, the extension dictionary being the new words whose score is higher than a preset threshold.
Preferably, before the new words most similar to the vectors of the initial retrieval words are calculated, a vector representation of each word is obtained in advance.
Preferably, the vector representation of each word is obtained with a word embedding algorithm.
Preferably, the method for obtaining the word vector of each word includes the following steps:
Selecting a related field, and selecting several related documents and several search terms of the related field;
Choosing a document database;
Applying a word embedding algorithm to the document database to obtain the vector representation of the target words in the document database.
Preferably, the score of every new word is calculated as follows: for each line that connects the new word to any initial retrieval word, one point is added.
Preferably, after the extension dictionary is obtained, it is judged whether a next iteration is needed;
If so, the next iteration is carried out with the extension dictionary as the initial retrieval words;
If not, the process ends.
According to another aspect of the present invention, a search term expansion device for medical retrieval is provided, comprising:
An acquiring unit, for obtaining several initial retrieval words;
A first computing unit, connected to the acquiring unit, for calculating, for each of the initial retrieval words, the new words whose vectors are most similar to its vector;
A vector space construction unit, connected to the first computing unit, for constructing a vector space from the initial retrieval words and all the new words, the vector space containing the initial retrieval words and all the new words, and for connecting each initial retrieval word in the vector space by a line to all the new words closest to its vector;
A second computing unit, connected to the vector space construction unit, for calculating the score of every new word, the score of a new word being proportional to the number of lines that connect the new word to initial retrieval words;
A screening unit, connected to the second computing unit, for screening the new words and obtaining an extension dictionary, the extension dictionary being the new words whose score is higher than a preset threshold.
Preferably, the device further includes a pretreatment unit, connected to the first computing unit, for obtaining the vector representation of each word in advance.
Preferably, the device further includes an iteration unit, connected to both the screening unit and the acquiring unit, for judging whether a next iteration is needed; if so, the next iteration is carried out with the extension dictionary as the initial retrieval words;
If not, the process ends.
According to a further aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the steps of the method of any one of claims 1 to 6 are realized.
Compared with the prior art, the present invention has the following beneficial effects:
With the method, device and storage medium of the invention, the entered initial retrieval words (medical names) can be expanded to obtain the identical or associated medical names, which yields a more comprehensive and accurate search result and avoids omissions. It also saves the user the time of entering every search term, and is convenient and practical.
Further, because medical vocabulary is constantly developing, the method, device and storage medium of the invention can adapt to a medical lexicon that keeps evolving.
Brief description of the drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 shows the flow chart of the search term expansion method for medical retrieval of one embodiment of the invention;
Fig. 2 shows the flow chart of the search term expansion method for medical retrieval of another embodiment of the invention;
Fig. 3 shows the flow chart of the method for obtaining the vector representation of each word in advance in one embodiment of the invention;
Fig. 4 shows the flow chart of the search term expansion method for medical retrieval of a further embodiment of the invention;
Fig. 5 shows the schematic diagram of the search term expansion device for medical retrieval of one embodiment of the invention;
Fig. 6 shows the schematic diagram of the search term expansion device for medical retrieval of another embodiment of the invention;
Fig. 7 shows the schematic diagram of the search term expansion device for medical retrieval of a further embodiment of the invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention.
Fig. 1 shows the flow chart of the search term expansion method for medical retrieval of one embodiment of the invention. As shown in Fig. 1, the method includes at least steps S01 to S05, described in detail as follows:
Step S01 is executed: several initial retrieval words are obtained.
In one embodiment of the invention, the initial retrieval words are medical names and may include multiple words, for example: flavones, lycopene, nutrient. The number of initial retrieval words is greater than or equal to 2.
In one embodiment of the invention, the initial retrieval words can be entered manually. The manually entered query can be entered through input devices such as a keyboard or a touch screen, or in other ways such as by voice. The initial retrieval words can also be obtained by means other than manual entry, for example from another algorithm.
Step S02 is executed: for each of the initial retrieval words, the new words whose vectors are most similar to its vector are calculated.
In one embodiment of the invention, before these most similar new words are calculated, the vector representation of each word is obtained in advance.
Fig. 2 shows the flow chart of the search term expansion method for medical retrieval of another embodiment of the invention. As shown in Fig. 2, step S00 is executed before step S01: the vector representation of each word is obtained in advance.
The vector representation of each word can be calculated with a word embedding algorithm.
Word embedding is an important concept in NLP (natural language processing). It converts a word into a fixed-length vector representation, which makes the word convenient to process mathematically.
Once step S00 of this embodiment has provided the vector representation of each word, the new words most similar to the vectors of the initial retrieval words can be calculated in step S02.
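As a minimal sketch of step S02, the following Python code ranks candidate words by cosine similarity to an initial retrieval word. The patent does not name a similarity measure, so cosine similarity is an assumption here, and the function names and the three-dimensional toy vectors are invented for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_words(query, vectors, k=2):
    """Return the k words whose vectors are most similar to the query word."""
    q = vectors[query]
    ranked = [(cosine(q, v), w) for w, v in vectors.items() if w != query]
    # Highest similarity first; ties are broken alphabetically.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in ranked[:k]]

# Hypothetical word vectors, not taken from the patent.
toy_vectors = {
    "flavone": [0.9, 0.1, 0.0],
    "isoflavone": [0.8, 0.2, 0.1],
    "lycopene": [0.7, 0.3, 0.0],
    "keyboard": [0.0, 0.1, 0.9],
}
print(nearest_words("flavone", toy_vectors))
```

In practice the vectors would come from the word embedding of step S00, and the number k of neighbours kept per initial retrieval word is a free parameter that the patent leaves open.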
Fig. 3 shows the flow chart of the method for obtaining the vector representation of each word in advance in one embodiment of the invention. As shown in Fig. 3, in this embodiment step S00 includes at least the following steps:
Step S001 is executed: a related field is selected, and several related documents and several search terms of the related field are selected.
In one embodiment of the invention, the related field may be cardiomyopathy, deep learning algorithms, text localization algorithms, and so on.
In one embodiment of the invention, the related documents and search terms of the related field can be selected manually or by other means.
Step S002 is executed: a document database is chosen.
In one embodiment of the invention, as many document databases as possible are chosen in order to obtain a better result. The document database covers not only the related field but may also cover other fields. It can be obtained from currently available open-source data sets or by other means.
Step S003 is executed: a word embedding algorithm is applied to the document database to obtain the vector representation of the target words in the document database.
Since word embedding belongs to the prior art, it is not described in further detail here.
With steps S001 to S003 of this embodiment, the vector representation of each word is available, so step S02 can be executed to calculate the new words most similar to the vectors of the initial retrieval words.
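The patent leaves the choice of word embedding algorithm open. Purely to illustrate the idea of turning a document database into fixed-length word vectors, the sketch below builds count-based co-occurrence vectors in plain Python. This is a deliberately simple stand-in, not a learned embedding, and the corpus, window size and function name are assumptions.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a vector of co-occurrence counts within a sliding window."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    vocab = sorted(counts)
    # Every vector has one dimension per vocabulary word, so all lengths match.
    vectors = {w: [counts[w][c] for c in vocab] for w in vocab}
    return vectors, vocab

# Hypothetical two-sentence "document database".
corpus = [
    ["flavone", "is", "an", "antioxidant"],
    ["lycopene", "is", "an", "antioxidant"],
]
vectors, vocab = cooccurrence_vectors(corpus)
print(vocab)
print(vectors["flavone"], vectors["lycopene"])
```

Words that occur in the same contexts, such as "flavone" and "lycopene" here, receive identical or similar vectors, which is the property step S02 relies on. A trained embedding model would normally replace this counting scheme.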
It should be noted that the invention does not restrict the execution order of step S00; in other embodiments of the invention, step S00 can also be executed after step S01.
Step S03 is executed: a vector space is constructed from the initial retrieval words and all the new words. The vector space contains the initial retrieval words and all the new words, and each initial retrieval word in the vector space is connected by a line to all the new words closest to its vector.
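One way to picture step S03 is as a set of (initial retrieval word, new word) pairs, one pair per line. The sketch below connects each initial retrieval word to its k closest words by cosine similarity. The value of k, the similarity measure, the function names and the toy vectors are all assumptions. The patent also does not say whether another initial retrieval word may itself be a candidate; here only the word being expanded is excluded.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_lines(initial_words, vectors, k=1):
    """Return the set of lines (seed, new_word) linking each seed to its k closest words."""
    lines = set()
    for seed in initial_words:
        candidates = [w for w in vectors if w != seed]
        candidates.sort(key=lambda w: (-cosine(vectors[seed], vectors[w]), w))
        lines.update((seed, w) for w in candidates[:k])
    return lines

# Hypothetical two-dimensional vectors, for illustration only.
toy_vectors = {
    "flavone": [1.0, 0.0],
    "lycopene": [0.8, 0.6],
    "isoflavone": [0.9, 0.1],
    "carotene": [0.7, 0.7],
    "keyboard": [0.0, 1.0],
}
lines = build_lines(["flavone", "lycopene"], toy_vectors, k=1)
print(sorted(lines))
```

With a larger k the same new word is typically reached from several initial retrieval words, and it is this overlap that the scoring of step S04 measures.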
Step S04 is executed: the score of every new word is calculated, the score of a new word being proportional to the number of lines that connect the new word to initial retrieval words.
In one embodiment of the invention, one point is added for each line that connects the new word to any initial retrieval word, which finally gives the score of every new word.
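The one-point-per-line rule of this embodiment can be sketched as a simple count. The representation of lines as (initial retrieval word, new word) pairs and the example data are assumptions made for illustration.

```python
from collections import Counter

def score_new_words(lines):
    """Add one point to a new word for every line that connects it to an initial word."""
    return Counter(new_word for _seed, new_word in lines)

# Hypothetical lines produced by step S03.
lines = {
    ("flavone", "isoflavone"),
    ("lycopene", "isoflavone"),
    ("lycopene", "carotene"),
}
scores = score_new_words(lines)
print(scores["isoflavone"], scores["carotene"])
```

A new word reached from two initial retrieval words scores 2, while a word reached from only one scores 1.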
Step S05 is executed: the extension dictionary is obtained, the extension dictionary being the new words whose score is higher than a preset threshold.
In one embodiment of the invention, when the search keywords are constructed for the first time, a human expert takes part and judges whether each new word belongs to the extension dictionary (that is, whether it is related to the initial retrieval words). From the expert's judgments an optimal cut-off can be derived, which gives the preset threshold.
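The patent does not specify how the optimal cut-off is derived from the expert's judgments. One plausible reading, sketched below, is to try each candidate threshold and keep the one whose "score higher than threshold" decision agrees with the expert most often. The function name, the agreement criterion and the example labels are all assumptions.

```python
def best_threshold(scores, expert_labels):
    """Pick the threshold whose decisions agree most often with the expert's labels."""
    values = sorted(set(scores.values()))
    best, best_correct = None, -1
    # Also try a threshold below the lowest score, which accepts every new word.
    for t in [values[0] - 1] + values:
        correct = sum((scores[w] > t) == label for w, label in expert_labels.items())
        if correct > best_correct:
            best, best_correct = t, correct
    return best

# Hypothetical scores from step S04 and expert judgments (True = related).
scores = {"isoflavone": 3, "carotene": 2, "intake": 1, "keyboard": 1}
expert_labels = {"isoflavone": True, "carotene": True, "intake": False, "keyboard": False}
print(best_threshold(scores, expert_labels))
```

On ties the lowest threshold wins, which favours recall; a different tie-breaking rule would be equally consistent with the text.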
In other embodiments of the invention, the preset threshold can also be obtained by other means, for example by letting the user select it at the time of use. When the user needs a larger, more comprehensive extension dictionary, the preset threshold can be lowered; when the user needs a smaller but more accurate extension dictionary, the preset threshold can be raised.
With the method of this embodiment of the invention, the new words can be screened to pick out those that are more relevant to the initial retrieval words. The new words whose score is higher than the preset threshold form the required extension dictionary related to the initial retrieval words; the new words whose score is lower than the preset threshold are less relevant to the initial retrieval words.
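The screening of step S05 and the effect of moving the threshold can be sketched as follows. The scores are invented for illustration and the function name is an assumption.

```python
def extension_dictionary(scores, threshold):
    """Keep the new words whose score is strictly higher than the threshold."""
    return sorted(word for word, score in scores.items() if score > threshold)

# Hypothetical scores from step S04.
scores = {"isoflavone": 3, "carotene": 2, "intake": 1, "keyboard": 1}

print(extension_dictionary(scores, threshold=0))  # broader, less precise
print(extension_dictionary(scores, threshold=1))  # narrower, more precise
```

Lowering the threshold admits every word reached by at least one line, while raising it keeps only words that several initial retrieval words agree on.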
Fig. 4 shows the flow chart of the search term expansion method for medical retrieval of another embodiment of the invention. As shown in Fig. 4, the method of this embodiment includes at least the following steps:
Steps S01 to S05 are the same as described above. After step S05 has been executed, step S06 is executed: it is judged whether a next iteration is needed.
If so, execution continues from step S01 with the extension dictionary as the initial retrieval words, and the next iteration is carried out.
If not, the process ends.
In one embodiment of the invention, whether a next iteration is needed can be judged by the user after the extension dictionary has been obtained; alternatively, a number of iterations or another termination condition can be preset.
Through iteration, more extension dictionaries can be obtained. A human expert can also take part in this process to keep optimizing the preset threshold.
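The iteration of step S06 can be sketched as a loop that feeds each extension dictionary back in as the next set of initial retrieval words. The sketch assumes a precomputed table of nearest new words per word, a preset maximum number of rounds, and that each round replaces rather than accumulates the dictionary; none of these details are fixed by the patent.

```python
from collections import Counter

def expand_once(seeds, neighbours, threshold):
    """One pass of steps S02 to S05 over a precomputed nearest-neighbour table."""
    scores = Counter()
    for seed in seeds:
        for new_word in neighbours.get(seed, []):
            scores[new_word] += 1  # one point per connecting line
    return {word for word, score in scores.items() if score > threshold}

def expand_iteratively(seeds, neighbours, threshold, max_rounds=3):
    """Repeat the expansion, using each extension dictionary as the next seeds."""
    current = set(seeds)
    for _ in range(max_rounds):
        expanded = expand_once(current, neighbours, threshold)
        if not expanded or expanded == current:
            break  # nothing new to iterate on
        current = expanded
    return current

# Hypothetical nearest-neighbour table, for illustration only.
neighbours = {
    "flavone": ["isoflavone", "carotene"],
    "lycopene": ["carotene", "isoflavone"],
    "isoflavone": ["flavonoid", "carotene"],
    "carotene": ["flavonoid", "lutein"],
}
print(expand_iteratively(["flavone", "lycopene"], neighbours, threshold=1, max_rounds=1))
print(expand_iteratively(["flavone", "lycopene"], neighbours, threshold=1, max_rounds=3))
```

Keeping the union of all rounds instead of only the latest dictionary would be an equally reasonable design, and an expert could adjust the threshold between rounds.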
Fig. 5 shows the schematic diagram of the search term expansion device 100 for medical retrieval of one embodiment of the invention. As shown in Fig. 5, the search term expansion device 100 includes at least:
An acquiring unit 01, for obtaining several initial retrieval words.
In one embodiment of the invention, the acquiring unit may include, but is not limited to, computer input devices such as a keyboard, a mouse, a stylus or a touch screen.
A first computing unit 02, connected to the acquiring unit 01, for calculating, for each of the initial retrieval words, the new words whose vectors are most similar to its vector.
A vector space construction unit 03, connected to the first computing unit 02, for constructing a vector space from the initial retrieval words and all the new words, the vector space containing the initial retrieval words and all the new words, and for connecting each initial retrieval word in the vector space by a line to all the new words closest to its vector.
A second computing unit 04, connected to the vector space construction unit 03, for calculating the score of every new word, the score of a new word being proportional to the number of lines that connect the new word to initial retrieval words.
In one embodiment of the invention, the second computing unit 04 calculates the score of every new word by adding one point for each line that connects the new word to any initial retrieval word, which finally gives the score of every new word.
A screening unit 05, connected to the second computing unit 04, for screening the new words and obtaining an extension dictionary, the extension dictionary being the new words whose score is higher than a preset threshold.
In one embodiment of the invention, when the search keywords are constructed for the first time, a human expert takes part and judges whether each new word belongs to the extension dictionary (that is, whether it is related to the initial retrieval words). From the expert's judgments an optimal cut-off can be derived, which gives the preset threshold. The preset threshold can also be obtained by other means, for example by letting the user select it at the time of use: when the user needs a larger, more comprehensive extension dictionary, the preset threshold can be lowered; when the user needs a smaller but more accurate extension dictionary, the preset threshold can be raised.
Fig. 6 shows the schematic diagram of the search term expansion device for medical retrieval of a further embodiment of the invention. As shown in Fig. 6, the device further includes a pretreatment unit 00, which is connected to the first computing unit 02 and is used to obtain the vector representation of each word in advance.
In one embodiment of the invention, the pretreatment unit 00 obtains the vector representation of each word with a word embedding algorithm.
Fig. 7 shows the schematic diagram of the search term expansion device for medical retrieval of another embodiment of the invention. As shown in Fig. 7, the device further includes an iteration unit 06, which is connected to both the screening unit 05 and the acquiring unit 01 and is used to judge whether a next iteration is needed; if so, the next iteration is carried out with the extension dictionary as the initial retrieval words; if not, the process ends.
In one embodiment of the invention, whether a next iteration is needed can be judged by the user after the extension dictionary has been obtained; alternatively, a number of iterations or another termination condition can be preset.
It should be noted that although the detailed description above mentions several units of the device that performs the actions, this division is not mandatory. In fact, according to embodiments of the present application, the features and functions of two or more of the units described above can be embodied in a single unit. Conversely, the features and functions of one unit described above can be further divided among multiple units.
An embodiment of the invention also provides a computer-readable storage medium. The computer-readable storage medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a processor, realize the method described in the above embodiments.
From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here can be realized in software, or in software combined with the necessary hardware. The technical solution according to the embodiments of the present application can therefore be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, a touch terminal or a network device) to execute the method according to the embodiments of the present application.
Three cases below illustrate the extension dictionaries obtained when the search term expansion method and device of the above embodiments are applied to initial retrieval words:
Case 1
The initial search words are: flavones, lycopene, nutrient, phosphatide, nutrient, content be low, microelement, cellulose, amino acid, tea polyphenols, lysine, resveratrol, organized enzyme, organic acid, lecithin, water solubility, protein, polysaccharide, supplement.
Expanding the initial search words with the method of an embodiment of the invention yields the following extension dictionary:
Lycopene, carrotene, thiamine, intake, polysaccharide, malic acid, flavones, phosphatide, vitamin, isoflavones,
Active material chlorophyll, coenzyme, is rich in, oryzanol, lutein, histidine, omega, choline, linolenic acid, is rich in iron, reed
Fourth, hesperidine, organized enzyme, sterol, amino acid, oleic acid, b6, b1, b2, b3, caffeine, mutase, inorganic salts, flavonoids, calcium
Matter, ascorbic acid, b6, pantothenic acid, tartaric acid, fiber, resveratrol, taurine, pectin, vitamins, caffeine, solubility, Huang
Letones, carbohydrate, linoleic acid, anthocyanidin, lactic acid, amylase, tea polyphenols, carotenoid lipid, fructose, color
Propylhomoserin, lysine, mustard oil, minerals, in required, vegetalitas, folic acid, magnanimity, natto, ferment, replenishers, tannic acid, soybean
Isoflavones, riboflavin, cellulose, alkaloid, 400iu, carbohydrate, polyphenol, cystine, rhodanate, antioxidant, has nutrition
Machine acid, drops, glucose, niacin, supplement, b1:, dha, b12, ovum lecithin, protein, micro secondary element, niacin, contain
Have, unsaturation, nutrient, water solubility, nutrient, vitamin, content are low, anti-oxidant, fatty acid.
Case 2
The initial search words are: Sanofi, azn, Squibb, amgen, Bristol-Myers, regeneron, Eli Lilly, 002198.
Expanding the initial search words with the method of an embodiment of the invention yields the following extension dictionary:
Sanofi, Novartis, Boehringer, Novo Nordisk, Roche, A Silili, Pfizer, MSD, Bayer, regeneron, Eli Lilly, shire, Pfizer, Medtronic.
Case 3
The initial search words are: radix polygonati officinalis, the root of Dahurain angelica, Radix Glycyrrhizae, feverfew, the fruit of glossy privet, schizonepeta, rhizoma zingiberis, Radix Angelicae Sinensis, cimicifugae foetidae, Radix Codonopsis, fennel, cape jasmine, radix paeoniae rubra, rhizoma alismatis, campanulaceae, Rhizoma Atractylodis Macrocephalae, Rhizoma Chuanxiong, Radix Ophiopogonis, fructus amomi.
Expanding the initial search words with the method of an embodiment of the invention yields the following extension dictionary:
Chinese gall, olibanum, 12g, rhizoma atractylodis, folium artemisiae argyi, Semen Cuscutae, garden burnet, monkshood, Rhizoma Atractylodis Macrocephalae, lycopodium calvatum, viola mandshurica, teasel root,
Mountain cornel, ramulus mori, radix rehmanniae preparata, rhizoma zingiberis, turtle shell, herba taxilli, celestial spirit, radix bupleuri, smoked jujube, gypsum, the root bark of white mulberry, 18 grams, radix scrophulariae, gentianae macrophyllae,
Radix achyranthis bidentatae, the tuber of pinellia, Cassia, the seed of cowherb, tortoiseshell, rhizoma corydalis, Radix Angelicae Sinensis, turpentine, radix rehmanniae recen, Cortex Phellodendri, great burdock achene, feverfew, official
Osmanthus, the root of three-nerved spicebush, the coptis, radices trichosanthis, reed root, rhizoma acori graminei, Rhizoma Et Radix Notopterygii, monkshood, lopseed, radix polygonati officinalis, cortex moutan, Fructus Aurantii, Radix Salviae Miltiorrhizae, 61
Scattered, Sculellaria barbata, lophatherum gracile, the root of gansui, desert cistanche, radix achyranthis bidentatae, the root of Chinese clematis, Morinda officinalis, myrrh, agkistrodon, Rhizoma Chuanxiong, the tuber of dwarf lilyturf, thatch are real, extra large
Wind rattan, evodia rutaecarpa, Semen Juglandis, fructus cannabis, in one's early teens, tussilago, kuh-seng, the seed of Oriental arborvitae, arbor-vitae, Radix Angelicae Pubescentis, Rehmannia glutinosa, the fruit of glossy privet,
Tea before bombyx batryticatus, Chinese herbaceous peony, semen allii tuberosi, Caulis Spatholobi, oriental wormwood, semen momordicae, madder, rain, rhizoma imperatae, summer cypress, elscholtiza, fructus amomi, smilax,
Asarum, mulberry fruit, rhizoma alismatis, cape jasmine, curcuma zedoary, caulis akebiae, campanulaceae, Chinese violet, scorpio, Cortex Magnoliae Officinalis, blackberry lily, Schisandra chinensis, the root of Dahurain angelica, Herba Cistanches,
Radish seed, osmanthus heart, caulis bambusae in taenian, Longstamen Onion Bulb, psoralea corylifolia, asparagus fern, excrementum pteropi, the bletilla striata, rhizoma anemarrhenae, Yun Ling, radix paeoniae rubra, mulberry skin, shaggy-fruited dittany, Gao Liang
Ginger, cimicifugae foetidae, nutmeg, field thistle, dried human placenta, Chinese ephedra, schizonepeta, Fructus Forsythiae, endothelium corneum gigeriae galli, talcum, radix pseudostellariae, the dried immature fruit of citron orange, the root bark of tree peony, money
Grass, Radix Curcumae, frutus cnidii, ramulus cinnamomi, waxgourd seed, Radix Ophiopogonis, Fructus Corni, dutchmanspipe root, radix scutellariae, Eclipta prostrata, semen plantaginis, rhizoma polygonati, myotonin,
Cortex acanthopanacis, luffa.
The above cases show that the method and device of the embodiments of the invention have the following advantages over the prior art:
Users usually do not know the relevant medical names well enough. They know only one name for a given thing and are unaware of whether other names exist, and even medical experts do not know every name, so searches easily miss results. With the method of the embodiments of the invention, the entered initial retrieval words (medical names) can be expanded to obtain the identical or associated medical names, which yields a more comprehensive search result and avoids omissions.
In the prior art, to obtain a more accurate and comprehensive search result, the user has to enumerate the other identical or associated names of a medical name as search terms, which takes considerable time and makes for a poor user experience. With the method of the embodiments of the invention, the user only needs to enter some of the search terms and the expansion provides more of them, which saves the user the time of entering search terms and is easy to use.
Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to the particular embodiments above; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.
Claims (10)
1. a kind of term extended method for medical retrieval characterized by comprising
Obtain several initial retrieval words;
It calculates separately and the most similar new word of several described initial retrieval term vectors;
A vector space is constructed with several described initial retrieval words and all new words, the vector space includes institute
State several initial retrieval words and all new words, with line connect in the vector space any initial retrieval word and and its
The immediate all new words of vector;
The score of all new words is calculated, the score of the new word is connected with the new word with the initial retrieval word
Line item number it is directly proportional;
Extension dictionary is obtained, the extension dictionary is the part that score is higher than preset threshold in the new word.
2. The method according to claim 1, characterized in that the vector representation of each word is obtained
in advance, before the new words most similar to the vectors of the initial retrieval words are computed.
3. The method according to claim 2, characterized in that the vector representation of each word is obtained
by a word embedding vector algorithm.
4. The method according to claim 3, characterized in that obtaining the word vector of each word comprises:
selecting a related field, and selecting several related documents and several search terms of that field;
selecting a document library;
applying the word embedding vector algorithm to the document library to obtain the vector representations of
the target words in the document library.
5. The method according to claim 1, characterized in that the scores of all the new words are computed as
follows: if a new word has a line connecting it to any one of the initial retrieval words, one point is added
to its score.
6. The method according to claim 1, characterized in that, after the extension dictionary is obtained, it is
judged whether a next iteration is needed;
if so, the next iteration is carried out using the extension dictionary as the initial retrieval words;
if not, the method ends.
7. A term expansion device for medical retrieval, characterized by comprising:
an acquiring unit, for obtaining several initial retrieval words;
a first computing unit, connected to the acquiring unit, for computing, for each of the initial retrieval
words, the new words whose vectors are most similar to its vector;
a vector space construction unit, connected to the first computing unit, for constructing a vector space from
the initial retrieval words and all of the new words, the vector space containing the initial retrieval words
and all of the new words, and for connecting with a line, in the vector space, each initial retrieval word to
all of the new words closest to its vector;
a second computing unit, connected to the vector space construction unit, for computing the score of each new
word, the score of a new word being proportional to the number of lines connecting that new word to the
initial retrieval words;
a screening unit, connected to the second computing unit, for screening the new words to obtain an extension
dictionary, the extension dictionary being the subset of the new words whose score is higher than a preset
threshold.
8. The device according to claim 7, characterized by further comprising a preprocessing unit, connected to the
first computing unit, for obtaining the vector representation of each word in advance.
9. The device according to claim 7, characterized by further comprising an iteration unit, connected to both
the screening unit and the acquiring unit, for judging whether a next iteration is needed; if so, the next
iteration is carried out using the extension dictionary as the initial retrieval words; if not, the process ends.
10. A computer-readable storage medium storing a computer program, characterized in that the program, when
executed by a processor, implements the method of any one of claims 1-6.
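The iteration recited in claim 6 can be sketched as follows. This is only an illustration under stated assumptions: the precomputed `NEIGHBORS` map (standing in for the vector-similarity step), the stopping rule (stop when a round yields no unseen words or after `max_rounds`), and all names are hypothetical, since the claims do not specify how the need for a further iteration is judged.

```python
from collections import Counter

# Hypothetical nearest-neighbor map, invented for illustration; in practice
# it would come from comparing word vectors.
NEIGHBORS = {
    "heart attack": ["myocardial infarction", "MI"],
    "myocardial infarction": ["heart attack", "MI", "cardiac infarction"],
    "MI": ["myocardial infarction", "AMI"],
    "cardiac infarction": ["myocardial infarction"],
    "AMI": ["MI"],
}

def expand_once(terms, neighbors, threshold):
    """One round: score new words by link count, keep those above threshold."""
    terms = set(terms)
    scores = Counter()
    for term in terms:
        for word in neighbors.get(term, ()):
            if word not in terms:
                scores[word] += 1  # one point per connecting line
    return {w for w, s in scores.items() if s > threshold}

def expand_iteratively(initial_terms, neighbors, threshold=0, max_rounds=3):
    """Feed each round's extension dictionary back in as the next round's
    initial retrieval words. The stopping rule is an assumption."""
    seen = set(initial_terms)
    current = set(initial_terms)
    for _ in range(max_rounds):
        expansion = expand_once(current, neighbors, threshold) - seen
        if not expansion:
            break  # no unseen words: judged that no further iteration is needed
        seen |= expansion
        current = expansion
    return seen - set(initial_terms)
```

Starting from "heart attack" with `max_rounds=1`, the sketch returns the two directly linked names; allowing more rounds also picks up "cardiac infarction" and "AMI", which are reachable only through the first round's results.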
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910742880.2A CN110489526A (en) | 2019-08-13 | 2019-08-13 | A kind of term extended method, device and storage medium for medical retrieval |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110489526A true CN110489526A (en) | 2019-11-22 |
Family
ID=68550679
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910742880.2A Pending CN110489526A (en) | 2019-08-13 | 2019-08-13 | A kind of term extended method, device and storage medium for medical retrieval |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110489526A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11203289A (en) * | 1998-01-16 | 1999-07-30 | Fuji Xerox Co Ltd | Associated retrieval expression retrieving device and computer readable recording medium storing associated retrieval expression retrieving program |
CN103761263A (en) * | 2013-12-31 | 2014-04-30 | 武汉传神信息技术有限公司 | Method for recommending information for users |
CN104516903A (en) * | 2013-09-29 | 2015-04-15 | 北大方正集团有限公司 | Keyword extension method and system and classification corpus labeling method and system |
CN105653660A (en) * | 2015-12-29 | 2016-06-08 | 云南电网有限责任公司电力科学研究院 | Association method and device of retrieval keyword |
CN108491462A (en) * | 2018-03-05 | 2018-09-04 | 昆明理工大学 | A kind of semantic query expansion method and device based on word2vec |
CN109214004A (en) * | 2018-09-06 | 2019-01-15 | 广州知弘科技有限公司 | Big data processing method based on machine learning |
CN109344400A (en) * | 2018-09-18 | 2019-02-15 | 江苏润桐数据服务有限公司 | A kind of judgment method and device of document storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191122 |