CN104239500A - Method and device for establishing health-care food associated knowledge base - Google Patents

Method and device for establishing health-care food associated knowledge base

Info

Publication number
CN104239500A
Authority
CN
China
Prior art keywords
health food
disease
description
similarity
corpus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410459501.6A
Other languages
Chinese (zh)
Other versions
CN104239500B (en
Inventor
曾刚
陆彬
李岱峰
伊凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410459501.6A
Publication of CN104239500A
Application granted
Publication of CN104239500B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • G06F16/212Schema design and management with details for data modelling support
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/211Schema design and management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2291User-Defined Types; Storage management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/2448Query languages for particular applications; for extensibility, e.g. user defined types
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/2454Optimisation of common expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24553Query execution of query operations
    • G06F16/24554Unary operations; Data partitioning operations
    • G06F16/24556Aggregation; Duplicate elimination
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Embodiments of the invention disclose a method and a device for establishing a health food association knowledge base. The method comprises the following steps: retrieving a reference information source using the name keyword of a health food in a preset health food knowledge base; establishing a description corpus about the health food from the retrieval result of the reference information source; performing word segmentation on the description text of the health food in the description corpus using a pre-configured disease dictionary; and establishing, according to the word segmentation result, an association between the health food and each disease that appears in the description text. The method and device provided by the embodiments can supply a user with detailed information about health food.

Description

Method and device for establishing a health food association knowledge base
Technical field
Embodiments of the present invention relate to the field of database technology, and in particular to a method and device for establishing a health food association knowledge base.
Background
According to the definition of health food in the "Measures for the Administration of Health Food Registration (Trial)" promulgated by the State Food and Drug Administration in 2005, health food is food that claims specific health functions or is intended to supplement vitamins and minerals; that is, food that is suitable for consumption by specific groups, regulates body functions, is not intended to treat disease, and causes no acute, subacute, or chronic harm to the human body. Under this definition, health food in China falls into two classes. The first class is health food that regulates body functions. The health foods announced by the State Food and Drug Administration cover 27 functions, including enhancing immunity, relieving physical fatigue, assisting in lowering blood lipids, assisting in lowering blood sugar, assisting in lowering blood pressure, and weight loss. The second class is nutritional supplements: products intended to supplement one or more vitamins or minerals rather than to provide energy. Their role is to make up for deficiencies in dietary supply, prevent nutritional deficiency, and reduce the risk of certain chronic diseases; this class is limited to vitamin and mineral supplements. Health food is therefore first of all food, but it also differs from the ordinary food people eat every day: it regulates body functions and often plays an auxiliary role in treating disease.
Ordinary consumers often want to find out over the Internet which diseases a health food can help treat. A search, however, generally returns a large amount of information of low accuracy, and the consumer has to do a great deal of screening, which makes choosing a product harder.
Summary of the invention
In view of this, embodiments of the present invention propose a method and device for establishing a health food association knowledge base, so as to improve the accuracy of the health food information that consumers obtain and the efficiency with which they screen it.
In a first aspect, an embodiment of the present invention provides a method for establishing a health food association knowledge base, the method comprising:
retrieving a reference information source using the name keyword of a health food in a preset health food knowledge base;
establishing a description corpus about the health food from the retrieval result of the reference information source;
performing word segmentation on the description text of the health food in the description corpus using a pre-configured disease dictionary;
establishing, according to the word segmentation result, an association between the health food and each disease that appears in the description text.
In a second aspect, an embodiment of the present invention provides a device for establishing a health food association knowledge base, the device comprising:
a reference information retrieval module, configured to retrieve a reference information source using the name keyword of a health food in a preset health food knowledge base;
a corpus establishment module, configured to establish a description corpus about the health food from the retrieval result of the reference information source;
a corpus word segmentation module, configured to perform word segmentation on the description text of the health food in the description corpus using a pre-configured disease dictionary;
an association establishment module, configured to establish, according to the word segmentation result, an association between the health food and each disease that appears in the description text.
By establishing associations between health foods and the diseases they help treat, the method and device provided by the embodiments of the present invention can process large volumes of health food data in a uniform way, improve the accuracy of health food information, reduce the screening work left to the user, and make obtaining health food information more efficient and convenient.
Brief description of the drawings
Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of the method for establishing a health food association knowledge base provided by the first embodiment of the invention;
Fig. 2 is a view of an Internet web page carrying reference information about a health food, provided by the first embodiment of the invention;
Fig. 3 is a flowchart of corpus establishment in the method for establishing a health food association knowledge base provided by the first embodiment of the invention;
Fig. 4 is a flowchart of the method for establishing a health food association knowledge base provided by the second embodiment of the invention;
Fig. 5 is a schematic flowchart, provided by the second embodiment of the invention, of associating diseases with a health food that is not included in the description corpus;
Fig. 6 is a flowchart of the method for establishing a health food association knowledge base provided by the third embodiment of the invention;
Fig. 7 is a structural diagram of the device for establishing a health food association knowledge base provided by the fourth embodiment of the invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention rather than the full content.
Fig. 1 to Fig. 3 show the first embodiment of the present invention.
Fig. 1 is a flowchart of the method for establishing a health food association knowledge base provided by the first embodiment of the invention. Referring to Fig. 1, the method comprises:
S110: retrieve a reference information source using the name keyword of a health food in a preset health food knowledge base.
The health food knowledge base is a preset database of key information about health foods. It stores the key information of each health food under that health food's name. The key information includes at least the name of the health food and may also include information such as its manufacturer, brand name, and product type.
The reference information source is an information source that stores reference information about health foods. It can be an Internet web page carrying reference information about a health food, for example a Baidu Baike page. Fig. 2 shows such a web page. Referring to Fig. 2, the page provides not only the name and manufacturer of the health food but also more detailed information such as its functions.
In this embodiment, the name of a health food is first obtained from the preset health food knowledge base, and the name is then used to retrieve the reference information source in order to obtain more information about that health food. Preferably, when Internet web pages carrying reference information about health foods are chosen as the reference information source, the retrieval can be carried out by searching Internet web pages with a search engine.
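The lookup in S110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout, the product names, and the in-memory `reference_source` dictionary (standing in for web pages that a real system would fetch through a search engine) are all assumptions made for the example.

```python
# Hypothetical preset knowledge base: one record per health food.
knowledge_base = [
    {"name": "Product A", "manufacturer": "Maker X"},
    {"name": "Product B", "manufacturer": "Maker Y"},
]

# Hypothetical stand-in for the reference information source, keyed by page
# title. A real system would query a search engine or encyclopedia instead.
reference_source = {
    "Product A": "Efficacy: described as an aid for fatigue.",
}


def retrieve_reference_info(knowledge_base, reference_source):
    """Look up every health food name in the reference source."""
    results = {}
    for record in knowledge_base:
        name = record["name"]
        page = reference_source.get(name)
        if page is not None:  # not every food is covered by the source
            results[name] = page
    return results


retrieved = retrieve_reference_info(knowledge_base, reference_source)
```

Foods without a matching page (here "Product B") are simply absent from the result, which is the data sparsity that the second embodiment addresses.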
S120: establish a description corpus about the health food from the retrieval result of the reference information source.
Retrieving the reference information source yields more information about the health food. With this information as the underlying data, a description corpus about the health food is established. The description corpus is a database that describes health foods in detail and contains the description text of each health food. The description text of a health food can describe in detail information such as its ingredients and its effects.
S130: perform word segmentation on the description text of the health food in the description corpus using a pre-configured disease dictionary.
The disease dictionary is a pre-configured data dictionary that records the names of diseases. It can record the names of common diseases as well as those of less common ones. If a disease has several names in Chinese, the disease dictionary records each of them.
Once the disease dictionary has been configured, it can be used to segment the description text in the description corpus. Segmenting the description text means cutting it into a sequence of words. Comparing the resulting words with the disease names recorded in the disease dictionary then yields the disease names contained in the description text.
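The patent does not name a segmentation algorithm. As a hedged sketch, the example below uses forward maximum matching, one common dictionary-based technique for Chinese text, and then keeps the tokens that are disease names. The dictionary entries and the sample sentence are illustrative only.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest
    dictionary entry that matches, otherwise emit a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def diseases_in(text, dictionary):
    """Return the dictionary disease names found in the text, in order."""
    return [token for token in segment(text, dictionary) if token in dictionary]


# Illustrative dictionary: hypertension, diabetes, hyperlipidemia.
disease_dictionary = {"高血压", "糖尿病", "高血脂"}
# Illustrative description text: "helps improve hypertension and hyperlipidemia".
found = diseases_in("辅助改善高血压和高血脂", disease_dictionary)
```

Characters that start no dictionary entry fall through as single-character tokens and are discarded by the final filter.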
S140: according to the word segmentation result, establish an association between the health food and each disease that appears in the description text.
If the name of a disease appears in the description text of a health food, the health food is taken to play an auxiliary role in treating that disease, so an association between the health food and the disease should be established.
For example, the association can be established by storing the health food and its related diseases, according to the correspondence between them, in the same data table.
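One way to store health foods and their related diseases in a single table is sketched below with Python's built-in `sqlite3` module. The table name `food_disease`, its columns, and the sample rows are assumptions for illustration; the patent only states that the pairs are stored in the same data table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE food_disease ("
    " food TEXT NOT NULL,"
    " disease TEXT NOT NULL,"
    " PRIMARY KEY (food, disease))"
)


def add_associations(conn, food, diseases):
    """Insert one row per (food, disease) pair, skipping duplicates."""
    conn.executemany(
        "INSERT OR IGNORE INTO food_disease (food, disease) VALUES (?, ?)",
        [(food, disease) for disease in diseases],
    )


# Illustrative data: the repeated disease name is stored only once.
add_associations(conn, "Product A", ["hypertension", "hyperlipidemia", "hypertension"])
rows = conn.execute(
    "SELECT disease FROM food_disease WHERE food = ? ORDER BY disease",
    ("Product A",),
).fetchall()
```

The composite primary key keeps each pair unique, so segmenting the same description text twice does not duplicate associations.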
Fig. 3 is a flowchart of corpus establishment in the method for establishing a health food association knowledge base provided by the first embodiment of the invention. Referring to Fig. 3, establishing the description corpus about the health food from the retrieval result of the reference information source preferably comprises:
S121: filter the retrieval result according to the paragraph titles in the reference information source to obtain the effective description corpus data.
In the reference information source, the description of a health food is usually divided into paragraphs, and each paragraph usually has its own title. The title indicates the general content of the paragraph.
Paragraphs under some titles are usually highly relevant to the description corpus data to be obtained, while paragraphs under other titles are much less relevant. The highly relevant paragraphs are referred to as effective description corpus data. For example, a paragraph whose title contains "efficacy" or "effect" is generally highly relevant and can serve as effective description corpus data.
Keywords such as "efficacy" and "effect" can therefore be used as title keywords to filter the data obtained from the reference information source, which yields the effective description corpus data.
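The title-based filtering in S121 can be sketched as below. The keyword tuple, the `(title, body)` representation of a page, and the sample page content are assumptions for the example rather than details given in the patent.

```python
# Assumed title keywords; a real deployment would use the Chinese equivalents.
TITLE_KEYWORDS = ("efficacy", "effect")


def filter_paragraphs(paragraphs, keywords=TITLE_KEYWORDS):
    """Keep the body of every paragraph whose title contains a keyword."""
    return [
        body
        for title, body in paragraphs
        if any(keyword in title.lower() for keyword in keywords)
    ]


# Illustrative page: a list of (paragraph title, paragraph body) pairs.
page = [
    ("Manufacturer", "Made by a hypothetical company."),
    ("Efficacy", "Described as an aid for hypertension."),
    ("Main effects", "Described as an aid for fatigue."),
]
effective = filter_paragraphs(page)
```

Substring matching on the title is deliberately loose here, so "Main effects" passes the filter along with "Efficacy".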
S122: merge the effective description corpus data about the same health food to establish the description corpus about that health food.
After the data obtained from the reference information source has been filtered, the effective description corpus data may consist of several data segments. Merging these segments establishes the description corpus about the health food.
Specifically, the merging can be done by concatenating the character strings that represent the different data segments into one new character string containing the content of all of them.
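The merging step amounts to grouping segments by health food and concatenating them. In the sketch below, the `(food_name, text)` input shape and the single-space separator are assumptions chosen for the example.

```python
from collections import defaultdict


def build_corpus(segments, separator=" "):
    """Group (food_name, text) segments by food and join each group
    into one description string."""
    grouped = defaultdict(list)
    for food, text in segments:
        grouped[food].append(text)
    return {food: separator.join(texts) for food, texts in grouped.items()}


# Illustrative effective description data for two foods.
corpus = build_corpus([
    ("Product A", "Aid for hypertension."),
    ("Product B", "Aid for fatigue."),
    ("Product A", "Aid for hyperlipidemia."),
])
```

Segments keep their original order within each food, so the merged string reads in the same order as the source paragraphs.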
This embodiment retrieves a reference information source using the name keyword of a health food in a preset health food knowledge base, establishes a description corpus about the health food from the retrieval result, segments the description text of the health food in the description corpus using a pre-configured disease dictionary, and, according to the segmentation result, establishes an association between the health food and each disease that appears in the description text. It thereby establishes associations between health foods and the diseases they help treat, and makes obtaining health food information more efficient and convenient for the user.
Fig. 4 and Fig. 5 show the second embodiment of the present invention.
Fig. 4 is a flowchart of the method for establishing a health food association knowledge base provided by the second embodiment of the invention. The method builds on the first embodiment. After the association between the health food and the diseases appearing in the description text has been established, the method further comprises: calculating, according to the description fields of the health foods, the similarity between a health food that is included in the description corpus and a health food that is not; and, if the similarity between the unincluded health food and the included health food is higher than a preset similarity threshold, establishing an association between the unincluded health food and the diseases associated with the included health food.
Referring to Fig. 4, the method comprises:
S410: retrieve a reference information source using the name keyword of a health food in a preset health food knowledge base.
S420: establish a description corpus about the health food from the retrieval result of the reference information source.
S430: perform word segmentation on the description text of the health food in the description corpus using a pre-configured disease dictionary.
S440: according to the word segmentation result, establish an association between the health food and each disease that appears in the description text.
S450: calculate, according to the description fields of the health foods, the similarity between a health food that is included in the description corpus and a health food that is not.
The description fields of a health food are the fields that describe it in the preset health food knowledge base. They comprise the name field, the efficacy field, and the ingredient field of the health food.
The data recorded in the reference information source is itself incomplete. In other words, the reference information source suffers from data sparsity, and not every health food has a corresponding record in it. Consequently, after the description corpus has been established, some of the health foods in the preset health food knowledge base are included in the description corpus and others are not. A health food that is not included in the description corpus cannot be associated with disease data.
Some of the health foods not included in the description corpus are identical to included health foods or have very similar effects. To establish associations between these unincluded health foods and diseases, the similarity between each unincluded health food and the included health foods is calculated.
Preferably, the description fields of the unincluded health food are formed into its description vector, and the description fields of the included health food are formed into its description vector. The similarity between the unincluded health food and the included health food is then calculated from the two description vectors. More preferably, this similarity is the cosine similarity.
The cosine similarity between the unincluded health food and the included health food is calculated with the following formula:
$$\mathrm{similarity}(a, b) = \sum_{i=1}^{3} w_i \cdot \mathrm{sim}(a_i, b_i) = \sum_{i=1}^{3} w_i \cdot \frac{a_i \cdot b_i}{|a_i|\,|b_i|},$$
where similarity(a, b) is the similarity between health food a and health food b, a_i is the i-th element of the description vector of health food a, b_i is the i-th element of the description vector of health food b, w_i is the weight of the i-th element of the two description vectors, and sim(a_i, b_i) is the similarity between the i-th elements of the description vectors of health food a and health food b.
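A minimal sketch of the weighted per-field cosine similarity follows. Several details are assumptions not stated in the patent: each description field is turned into a bag-of-words count vector by whitespace tokenization, the field names are `name`, `efficacy`, and `ingredients`, and the weights (0.2, 0.5, 0.3) are arbitrary example values.

```python
import math
from collections import Counter

FIELDS = ("name", "efficacy", "ingredients")  # the three description fields


def field_vector(text):
    """Assumed representation: bag-of-words counts of one field."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(a, b, weights=(0.2, 0.5, 0.3)):
    """Weighted sum over the three fields of the per-field cosine similarity."""
    return sum(
        w * cosine(field_vector(a[field]), field_vector(b[field]))
        for w, field in zip(weights, FIELDS)
    )


# Illustrative records for an included and an unincluded health food.
included = {"name": "fish oil capsule", "efficacy": "lower blood lipids",
            "ingredients": "fish oil gelatin"}
unincluded = {"name": "fish oil softgel", "efficacy": "lower blood lipids",
              "ingredients": "fish oil glycerin"}
score = similarity(included, unincluded)
```

With weights that sum to 1, the score stays in [0, 1] and equals 1 when all three fields match exactly.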
S460: if the similarity between the unincluded health food and the included health food is higher than a preset similarity threshold, establish an association between the unincluded health food and the diseases associated with the included health food.
A similarity threshold between included and unincluded health foods can be set in advance. When the calculated similarity between an unincluded health food and an included health food is higher than this threshold, the two are considered very similar. The disease associations of the included health food can then be copied to the unincluded health food; that is, associations are added between the unincluded health food and the diseases associated with the included health food.
Fig. 5 shows the flow of associating diseases with a health food that is not included in the description corpus. Referring to Fig. 5, the included health food 510 has an associated disease 512 in the established database. For the unincluded health food 520, the description corpus contains no related data, so no related disease data can be mined for it directly. However, the included health food 510 and the unincluded health food 520 each have a description vector, 511 and 521 respectively. The similarity between the two is calculated from the description vectors 511 and 521, and when that similarity is higher than the preset similarity threshold, the associated disease data 512 of the included health food 510 is copied to the unincluded health food 520. This completes the mining 522 of associated disease data for the unincluded health food 520.
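The threshold-based copying of associations can be sketched as follows. The data shapes, the precomputed similarity scores, and the threshold value of 0.8 are assumptions for the example; the patent leaves the threshold to be preset.

```python
def propagate(associations, similarities, threshold=0.8):
    """Copy disease associations from included foods to unincluded foods.

    associations: {included_food: set of disease names}
    similarities: {(unincluded_food, included_food): similarity score}
    """
    result = {}
    for (new_food, known_food), score in similarities.items():
        if score > threshold and known_food in associations:
            result.setdefault(new_food, set()).update(associations[known_food])
    return result


# Illustrative inputs: only Product C is similar enough to Product A.
associations = {"Product A": {"hypertension"}}
similarities = {
    ("Product C", "Product A"): 0.9,
    ("Product D", "Product A"): 0.4,
}
mined = propagate(associations, similarities)
```

An unincluded food that clears the threshold against several included foods receives the union of their disease sets.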
In this embodiment, after the associations between health foods and the diseases appearing in their description text have been established, the similarity between each included health food and each unincluded health food is calculated from their description fields, and when that similarity is higher than the preset similarity threshold, an association is established between the unincluded health food and the diseases associated with the included health food. This associates diseases with health foods that are not included in the description corpus and solves the data sparsity problem of the reference information source.
Fig. 6 shows the third embodiment of the present invention.
Fig. 6 is a flowchart of the method for establishing a health food association knowledge base provided by the third embodiment of the invention. The method builds on the first embodiment. After the association between the health food and the diseases appearing in the description text has been established, the method further comprises: calculating, from the symptoms corresponding to the health food and the symptoms of each disease associated with it, the similarity between the health food and that disease, and confirming the association between the health food and the disease according to this similarity.
Referring to Fig. 6, the method comprises:
S610: retrieve a reference information source using the name keyword of a health food in a preset health food knowledge base.
S620: establish a description corpus about the health food from the retrieval result of the reference information source.
S630: perform word segmentation on the description text of the health food in the description corpus using a pre-configured disease dictionary.
S640: according to the word segmentation result, establish an association between the health food and each disease that appears in the description text.
S650: calculate, from the symptoms corresponding to the health food and the symptoms of each disease associated with it, the similarity between the health food and that disease, and confirm the association between the health food and the disease according to this similarity.
Because the description text of a health food may contain inaccurate data, the associations established between health foods and diseases may also be inaccurate. In this embodiment, the associations are therefore confirmed after they have been established.
Confirming the association between a health food and a disease relies on the symptom data related to each of them. Suppose disease d is associated with health food h. The set of symptoms corresponding to disease d is S_1 = {sym_1, sym_2, ..., sym_n}, and the set of symptoms corresponding to the effects of health food h is S_2 = {sym_1, sym_2, ..., sym_m}. The similarity between disease d and the effects of health food h is obtained by calculating the correlation between S_1 and S_2.
Let S = S_1 ∪ S_2 be the set of all symptoms corresponding to the disease and to the effects of the health food. The similarity between the disease and the health food is calculated as follows:
$$D_{KL}(d \| h) = \frac{\sum_{i} d(i) \log \frac{d(i)}{h(i)} + \sum_{i} h(i) \log \frac{h(i)}{d(i)}}{2}, \quad i \in S.$$
Here, d(i) = num(i)/num(S), where num(i) is the total number of times symptom i occurs in the description documents of disease d, and num(S) is the sum of the numbers of times all symptoms in S occur in the description documents of disease d. Similarly, h(i) = num(i)/num(S), where num(i) is the total number of times symptom i occurs in the description documents of health food h, and num(S) is the sum of the numbers of times all symptoms in S occur in the description documents of health food h.
The larger the value of $D_{KL}(d \| h)$ given by the above formula, the lower the correlation between the disease and the health food. A threshold can be adjusted according to actual conditions, and associations whose similarity is lower than the threshold are deleted from the constructed associations to ensure accuracy.
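The calculation above can be sketched in a few lines. Two points are assumptions rather than part of the described method: a small smoothing constant `eps` is added so that a symptom occurring on only one side does not produce a division by zero or log of zero, and "similarity lower than the threshold" is read as "divergence greater than `max_divergence`". All function and parameter names are hypothetical.

```python
import math


def distribution(counts, symptoms, eps=1e-9):
    """Turn raw symptom counts into d(i) or h(i) over the symptom set S.

    eps is an assumed smoothing term; the source formula has no smoothing.
    """
    total = sum(counts.get(s, 0) for s in symptoms) + eps * len(symptoms)
    return {s: (counts.get(s, 0) + eps) / total for s in symptoms}


def symmetric_kl(disease_counts, food_counts, eps=1e-9):
    """Average of KL(d||h) and KL(h||d) over S = S1 union S2."""
    symptoms = set(disease_counts) | set(food_counts)
    d = distribution(disease_counts, symptoms, eps)
    h = distribution(food_counts, symptoms, eps)
    forward = sum(d[i] * math.log(d[i] / h[i]) for i in symptoms)
    backward = sum(h[i] * math.log(h[i] / d[i]) for i in symptoms)
    return (forward + backward) / 2


def confirm_associations(candidates, max_divergence):
    """Keep (food, disease) pairs whose divergence does not exceed the threshold.

    candidates: iterable of (food, disease, food_counts, disease_counts).
    """
    return [
        (food, disease)
        for food, disease, food_counts, disease_counts in candidates
        if symmetric_kl(disease_counts, food_counts) <= max_divergence
    ]
```

With identical symptom distributions the divergence is zero, and it grows as the two symptom profiles diverge, so a smaller `max_divergence` yields a stricter confirmation.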
This embodiment collects the symptom data corresponding to health foods and diseases, calculates the similarity between a health food and a disease from the symptoms corresponding to each, and confirms the association between the health food and the disease according to the calculated similarity. This ensures the accuracy of the associations in the health food association knowledge base and improves the reliability of the data.
Fig. 7 shows a fourth embodiment of the present invention.
Fig. 7 is a structural diagram of the apparatus for constructing a health food association knowledge base provided by the fourth embodiment of the present invention. Referring to Fig. 7, the apparatus comprises: a reference information retrieval module 710, a corpus building module 720, a corpus word segmentation module 730, and an association establishing module 740.
The reference information retrieval module 710 is configured to retrieve a reference information source using the names of health foods in a preset health food knowledge base as keywords.
The corpus building module 720 is configured to build a description corpus for the health foods from the retrieval results obtained from the reference information source.
The corpus word segmentation module 730 is configured to segment the description text of each health food in the description corpus into words using a pre-configured disease dictionary.
The association establishing module 740 is configured to establish, according to the word segmentation result, associations between the health food and the diseases that appear in its description text.
Preferably, the apparatus further comprises: a health food similarity calculation module 750 and an association copy module 760.
The health food similarity calculation module 750 is configured to calculate, after the associations between the health food and the diseases appearing in the description text have been established according to the word segmentation result, the similarity between a health food already included in the description corpus and a health food not yet included, according to the description fields of the health foods.
The association copy module 760 is configured to establish, when the similarity between the health food not yet included and the included health food is higher than a preset similarity threshold, associations between the health food not yet included and the diseases associated with the included health food.
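The behaviour of the association copy module can be sketched as follows, assuming the pairwise similarity scores have already been computed. The data layout (a dictionary keyed by pairs of food names) and the name `copy_associations` are hypothetical choices made for illustration.

```python
def copy_associations(associations, similarities, threshold):
    """Give a not-yet-included food the diseases of each sufficiently similar included food.

    associations: dict mapping an included food to its set of associated diseases.
    similarities: dict mapping (new_food, included_food) to a similarity score.
    threshold:    preset similarity threshold; only scores above it trigger a copy.
    """
    copied = {}
    for (new_food, included_food), score in similarities.items():
        if score > threshold and included_food in associations:
            copied.setdefault(new_food, set()).update(associations[included_food])
    return copied


# Hypothetical data, for illustration only.
associations = {"food A": {"insomnia", "fatigue"}}
similarities = {("food B", "food A"): 0.9, ("food C", "food A"): 0.2}
copied = copy_associations(associations, similarities, threshold=0.8)
```

Here only "food B" clears the threshold, so it alone inherits the diseases associated with "food A".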
Preferably, the apparatus further comprises: an association confirmation module 770.
The association confirmation module 770 is configured to calculate, after the associations between the health food and the diseases appearing in the description text have been established according to the word segmentation result, the similarity between the health food and each disease associated with it, using the symptoms corresponding to the health food and the symptoms of the associated disease, and to confirm the association between the health food and the disease according to that similarity.
Preferably, the corpus building module 720 comprises: a retrieval result filtering unit 721 and a data merging unit 722.
The retrieval result filtering unit 721 is configured to filter the retrieval results according to the paragraph headings in the reference information source, so as to obtain valid description corpus data.
The data merging unit 722 is configured to merge the valid description corpus data relating to the same health food, thereby building the description corpus for the health food.
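A minimal sketch of the filtering and merging performed by units 721 and 722 is given below. The record layout (dictionaries with `food`, `heading`, and `text` keys) and the list of relevant headings are assumptions; the text does not state which paragraph headings count as valid.

```python
from collections import defaultdict

# Hypothetical set of headings treated as valid description sections.
RELEVANT_HEADINGS = {"efficacy", "functions", "suitable population"}


def filter_paragraphs(results, relevant_headings=RELEVANT_HEADINGS):
    """Keep only retrieved paragraphs whose heading is in the relevant set."""
    return [
        r for r in results
        if r["heading"].strip().lower() in relevant_headings
    ]


def merge_by_food(paragraphs):
    """Group the surviving paragraph texts under the health food they describe."""
    corpus = defaultdict(list)
    for p in paragraphs:
        corpus[p["food"]].append(p["text"])
    return dict(corpus)


# Hypothetical retrieval results, for illustration only.
results = [
    {"food": "food A", "heading": "Efficacy", "text": "supports sleep"},
    {"food": "food A", "heading": "History", "text": "first sold in 1990"},
    {"food": "food A", "heading": "Functions", "text": "relieves fatigue"},
]
corpus = merge_by_food(filter_paragraphs(results))
```

Filtering by heading before merging keeps unrelated sections, such as a product's history, out of the text that is later segmented with the disease dictionary.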
Preferably, the description fields of the health food comprise a name field, an effect field, and an ingredient field of the health food.
Preferably, the similarity between the included health food and the health food not yet included comprises the cosine similarity between the included health food and the health food not yet included.
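As an illustration of cosine similarity over the description fields, the following sketch builds a term-count vector from the name, effect, and ingredient fields of each health food. Whitespace tokenization and equal weighting of the three fields are assumptions (Chinese text would need a proper word segmenter), and the field keys and function names are hypothetical.

```python
import math
from collections import Counter


def field_vector(food):
    """Build a bag-of-words vector from the three description fields."""
    tokens = []
    for field in ("name", "effect", "ingredients"):
        tokens.extend(food.get(field, "").lower().split())
    return Counter(tokens)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical records, for illustration only.
included = {"name": "ginseng capsule", "effect": "relieves fatigue", "ingredients": "ginseng"}
new_food = {"name": "ginseng tablet", "effect": "relieves fatigue", "ingredients": "ginseng"}
score = cosine_similarity(field_vector(included), field_vector(new_food))
```

The resulting score lies between 0 and 1 for count vectors and can be compared directly against the preset similarity threshold.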
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The embodiments in this specification are described progressively. Each embodiment emphasizes its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A method for constructing a health food association knowledge base, characterized by comprising:
retrieving a reference information source using the names of health foods in a preset health food knowledge base as keywords;
building a description corpus for the health foods from the retrieval results obtained from the reference information source;
segmenting the description text of the health food in the description corpus into words using a pre-configured disease dictionary;
according to the word segmentation result, establishing associations between the health food and the diseases that appear in the description text.
2. The method according to claim 1, characterized in that, after establishing, according to the word segmentation result, the associations between the health food and the diseases that appear in the description text, the method further comprises:
calculating, according to the description fields of the health foods, the similarity between a health food already included in the description corpus and a health food not yet included;
if the similarity between the health food not yet included and the included health food is higher than a preset similarity threshold, establishing associations between the health food not yet included and the diseases associated with the included health food.
3. The method according to claim 1, characterized in that, after establishing, according to the word segmentation result, the associations between the health food and the diseases that appear in the description text, the method further comprises:
using the symptoms corresponding to the health food and the symptoms of each disease associated with the health food, calculating the similarity between the health food and each disease associated with it, and confirming the association between the health food and the disease according to the similarity between the health food and the disease.
4. The method according to any one of claims 1-3, characterized in that building the description corpus for the health foods from the retrieval results obtained from the reference information source comprises:
filtering the retrieval results according to the paragraph headings in the reference information source, so as to obtain valid description corpus data;
merging the valid description corpus data relating to the same health food, thereby building the description corpus for the health food.
5. The method according to claim 4, characterized in that the description fields of the health food comprise a name field, an effect field, and an ingredient field of the health food.
6. The method according to claim 4, characterized in that the similarity between the included health food and the health food not yet included comprises the cosine similarity between the included health food and the health food not yet included.
7. An apparatus for constructing a health food association knowledge base, characterized by comprising:
a reference information retrieval module, configured to retrieve a reference information source using the names of health foods in a preset health food knowledge base as keywords;
a corpus building module, configured to build a description corpus for the health foods from the retrieval results obtained from the reference information source;
a corpus word segmentation module, configured to segment the description text of the health food in the description corpus into words using a pre-configured disease dictionary;
an association establishing module, configured to establish, according to the word segmentation result, associations between the health food and the diseases that appear in the description text.
8. The apparatus according to claim 7, characterized by further comprising:
a health food similarity calculation module, configured to calculate, after the associations between the health food and the diseases that appear in the description text have been established according to the word segmentation result, the similarity between a health food already included in the description corpus and a health food not yet included, according to the description fields of the health foods;
an association copy module, configured to establish, when the similarity between the health food not yet included and the included health food is higher than a preset similarity threshold, associations between the health food not yet included and the diseases associated with the included health food.
9. The apparatus according to claim 7, characterized by further comprising:
an association confirmation module, configured to calculate, after the associations between the health food and the diseases that appear in the description text have been established according to the word segmentation result, the similarity between the health food and each disease associated with it, using the symptoms corresponding to the health food and the symptoms of the associated disease, and to confirm the association between the health food and the disease according to the similarity between the health food and the disease.
10. The apparatus according to any one of claims 7-9, characterized in that the corpus building module comprises:
a retrieval result filtering unit, configured to filter the retrieval results according to the paragraph headings in the reference information source, so as to obtain valid description corpus data;
a data merging unit, configured to merge the valid description corpus data relating to the same health food, thereby building the description corpus for the health food.
11. The apparatus according to claim 10, characterized in that the description fields of the health food comprise a name field, an effect field, and an ingredient field of the health food.
12. The apparatus according to claim 10, characterized in that the similarity between the included health food and the health food not yet included comprises the cosine similarity between the included health food and the health food not yet included.
CN201410459501.6A 2014-09-10 2014-09-10 Health food association knowledge base construction method and device Active CN104239500B (en)

Publications (2)

Publication Number Publication Date
CN104239500A true CN104239500A (en) 2014-12-24
CN104239500B CN104239500B (en) 2017-10-27

Family

ID=52227559

Country Status (1)

Country Link
CN (1) CN104239500B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant