CN113656600B

CN113656600B - Knowledge graph-based DHI report interpretation method, system and storage medium

Info

Publication number: CN113656600B
Application number: CN202110969609.XA
Authority: CN
Inventors: 高萌; 沈维政; 付强; 寇胜利; 张翼; 张永根; 熊本海
Original assignee: Northeast Agricultural University
Current assignee: Northeast Agricultural University
Priority date: 2021-08-23
Filing date: 2021-08-23
Publication date: 2022-04-29
Anticipated expiration: 2041-08-23
Also published as: CN113656600A

Abstract

A knowledge-graph-based DHI report interpretation method, system and storage medium, belonging to the technical field of livestock breeding. The invention addresses the problem that, because no automatic interpretation method for DHI reports currently exists, DHI report interpretation is inefficient and the information in DHI reports cannot be used objectively, accurately and effectively. Based on the acquired DHI data of a pasture, the method analyzes the DHI index data and diagnoses problems in the dynamic analysis result using a DHI domain knowledge graph. The problem diagnosis process comprises problem localization: based on the DHI domain knowledge graph, the fact description produced by dynamic analysis is taken as a 'performance index/symptom' entity, and the probability that the fact description is caused by a given influencing factor is calculated. The DHI domain knowledge graph comprises three types of entities, 'performance index/symptom', 'influencing factor' and 'solution', together with the relations between them. The invention is mainly used for DHI report interpretation.

Description

Knowledge graph-based DHI report interpretation method, system and storage medium

Technical Field

The invention relates to a DHI report interpretation method, system and storage medium, and belongs to the technical field of livestock breeding.

Background

DHI (Dairy Herd Improvement) measurement, i.e. the measurement of dairy cow production performance, is of great significance in guiding herd improvement on pastures. At present, DHI reports face the following problems in practical application:

(1) the interpretation of DHI reports has not been organized into a standardized knowledge system; the relevant knowledge exists only as expert experience or is recorded in a few professional books, is not systematic, and cannot be fully used to guide DHI report analysis;

(2) because of problem (1), there is a large shortage of personnel able to interpret DHI reports professionally; most pastures rely entirely on the preliminary interpretation report issued by the DHI testing center, which makes only a rough judgment from the measurement results and cannot provide in-depth analysis that takes the pasture's actual production into account, so the report is of limited practical use;

(3) common cattle farm management software on the market, such as CNDHI and FreeDMS, is limited to data statistics functions such as trend analysis and early warning for key DHI indexes; pasture managers are still at a loss when faced with the numerical curves of the indexes, and the software cannot combine expert knowledge to give instructive conclusions, so the value of the DHI report cannot be effectively realized.

These problems have led to low enthusiasm among pastures for participating in DHI measurement and have greatly hindered the further popularization of DHI measurement work in China.

Disclosure of Invention

The invention aims to solve the problem that, because no automatic interpretation method for DHI reports currently exists, DHI report interpretation is inefficient and the information in DHI reports cannot be used objectively, accurately and effectively.

A knowledge-graph-based DHI report interpretation method comprises the following steps:

S1, acquiring DHI data of the pasture, wherein the DHI index data comprises current-month data and historical data;

the current-month data directly uses the DHI report file produced by the DHI testing center with the China dairy cow production performance measurement analysis system;

the historical data is obtained by traversing the historical DHI report files and extracting the relevant index data according to preset fields;

S2, analyzing the DHI index data, wherein the analysis comprises static analysis and dynamic analysis:

the static analysis finds abnormal indexes by comparing the current-month data of each index with the normal range standard for that index, and forms a corresponding fact description;

the dynamic analysis combines the current-month data and the historical data of each index, analyzes the recent trend of each index, and forms a corresponding fact description;

S3, performing problem diagnosis on the result of the dynamic analysis in combination with the DHI domain knowledge graph, wherein the problem diagnosis process comprises problem localization; problem localization is based on the DHI domain knowledge graph, takes the fact description produced by dynamic analysis as a 'performance index/symptom' entity, and calculates the probability that the fact description is caused by a given influencing factor, denoted P(fac):

P(fac) = P(fac|sym) · P_prior(sym)

where P(fac|sym) is the conditional probability between the performance index/symptom and the influencing factor, i.e. the weight of the edge between the entities; P_prior(sym) is the prior probability of the performance index/symptom;

the DHI domain knowledge graph comprises three types of entities, 'performance index/symptom', 'influencing factor' and 'solution', and the relations between them; the 'influencing factor' and 'performance index/symptom' entities form triples, the edge between an 'influencing factor' entity and a 'performance index/symptom' entity carries a weight, the weight is the conditional probability P(fac|sym) between the two entities, and the conditional probability is determined as follows:

acquiring, for 'performance index/symptom i', each 'influencing factor j' and its score Ij fed back by each participant Qn;

for a given performance index/symptom i, collecting the influencing factors j given by all participants to obtain the influencing factor set J = {jm}, where m = 1, 2, ..., M and M is the total number of influencing factors corresponding to that performance index/symptom i;

setting M influence ranks, one per influencing factor;

for each participant Qn, sorting the influencing factors j given by that participant according to their scores; then counting, across all participants Qn, how many times each influencing factor is placed first, and taking the influencing factor with the largest count as the first influence rank; counting how many times each influencing factor is placed second, and taking the influencing factor with the largest count as the second influence rank; and continuing in this way until all M influence ranks are obtained;

from the first influence rank to the M-th influence rank, calculating for each influence rank the score of the influencing factor at that rank, as follows:

for the influencing factor Pj corresponding to influence rank m, identifying the participants Qn' whose ranking of Pj is consistent with influence rank m, collecting the scores those participants Qn' gave to Pj, and dividing the sum of these scores by the number of participants Qn' to obtain the average score; the average score is the conditional probability between the 'influencing factor Pj' entity and the 'performance index/symptom' entity;

a participant's ranking of the influencing factor Pj is consistent with influence rank m when, in the ranking given by participant Qn, the influencing factor Pj is also placed at the m-th position.
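The weight determination described above can be written compactly as follows. This is a restatement under assumed notation, not text from the original: N (the number of participants), r_n(j) (the position at which participant Qn places factor j), I_{n,j} (the score participant Qn gives to factor j) and S_m (the participants who agree with the majority at position m) are symbols introduced here for illustration. The source does not state how ties, or a factor that wins more than one position, are to be handled.

```latex
j_m = \arg\max_{j \in J} \bigl|\{\, n : r_n(j) = m \,\}\bigr|, \qquad
S_m = \{\, n : r_n(j_m) = m \,\}, \qquad
P(\mathrm{fac} = j_m \mid \mathrm{sym} = i) = \frac{1}{|S_m|} \sum_{n \in S_m} I_{n,\,j_m},
\qquad m = 1, 2, \ldots, M
```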

Further, the problem diagnosis process also comprises giving guidance measure suggestions: based on the influencing factors obtained by localization, the solutions corresponding to those influencing factors are determined from the DHI domain knowledge graph according to the relation between 'influencing factor' and 'solution' entities.

Further, the initial value of P_prior(sym) is obtained by statistical calculation from historical DHI reports and pasture record data, and is updated according to the monthly data.

Further, the DHI domain knowledge graph is constructed in advance, and the construction process comprises the following steps:

(1) constructing a DHI domain ontology, which comprises three types of entities, 'performance index/symptom', 'influencing factor' and 'solution', and the relations between them;

(2) taking the electronic text obtained by digitizing materials on DHI measurement and application guidance as the annotation object, and semantically annotating the electronic text with the ontology as the annotation basis to form annotated data;

(3) using the annotated data as training data, extracting entities and entity relations from Internet text according to the ontology structure of the DHI domain knowledge graph to obtain extended data, thereby forming the complete DHI domain knowledge graph.

A knowledge-graph-based DHI report interpretation system, comprising:

the DHI data acquisition unit is used for acquiring DHI data of a pasture;

the DHI index data comprises current-month data and historical data; the current-month data directly uses the DHI report file produced by the DHI testing center with the China dairy cow production performance measurement analysis system; the historical data is obtained by traversing the historical DHI report files and extracting the relevant index data according to preset fields;

the DHI index data analysis unit is used for analyzing the DHI index data;

the analysis of the index data comprises static analysis and dynamic analysis;

the problem diagnosis unit is used for calling the DHI domain knowledge graph to perform problem diagnosis on the result of the dynamic analysis; the problem diagnosis unit comprises a problem localization module which, based on the DHI domain knowledge graph, takes the fact description produced by dynamic analysis as a 'performance index/symptom' entity and calculates the probability that the fact description is caused by a given influencing factor;

a database for storing a DHI domain knowledge graph;

the DHI domain knowledge graph comprises three types of entities, 'performance index/symptom', 'influencing factor' and 'solution', and the relations between them; the 'influencing factor' and 'performance index/symptom' entities form triples, and the edge between an 'influencing factor' entity and a 'performance index/symptom' entity carries a weight.

Further, the system further comprises a crowdsourcing data acquisition unit;

the crowdsourcing data acquisition unit is used for presenting 'performance index/symptom' options and the corresponding 'influencing factor' options to different users, together with an add option through which users can add 'performance index/symptom' and 'influencing factor' content;

the crowdsourcing data acquisition unit is further used for acquiring the performance indexes/symptoms fed back by different participants, the influencing factors corresponding to each performance index/symptom, and the scores of those influencing factors.

Further, the system also comprises an edge weight calculation unit for calculating the weight of the edge between an 'influencing factor' entity and a 'performance index/symptom' entity, as follows:

setting M influence ranks, one per influencing factor;

for the influencing factor Pj corresponding to influence rank m, identifying the participants Qn' whose ranking of Pj is consistent with influence rank m, collecting the scores those participants Qn' gave to Pj, and dividing the sum of these scores by the number of participants Qn' to obtain the average score; the average score is the conditional probability between the 'influencing factor Pj' entity and the 'performance index/symptom' entity, and this conditional probability is the weight of the edge between the 'influencing factor Pj' entity and the corresponding 'performance index/symptom' entity.

Further, the performance index/symptom prior probability determination unit is used for calculating the prior probability P_prior(sym) of the performance index/symptom from historical DHI reports and pasture record data, and for providing it to the problem diagnosis unit, which uses it to calculate the probability that the fact description is caused by a given influencing factor.

Furthermore, the problem diagnosis unit also comprises a guidance measure suggestion module which, based on the influencing factors obtained by localization, determines the corresponding solutions from the DHI domain knowledge graph according to the relation between 'influencing factor' and 'solution' entities.

A storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the knowledge-graph-based DHI report interpretation method.

Advantageous effects:

1. Through the constructed DHI domain knowledge graph, the invention expresses explicitly and quantitatively the knowledge involved in DHI report interpretation, namely the influencing factors and the complex co-occurrence relations and degrees of influence between abnormal measured parameters and actual symptoms. It thereby realizes automatic interpretation of DHI reports and makes effective use of DHI report information to guide livestock breeding.

2. The invention can replace part of the work of report analysts at existing DHI testing centers, who no longer need to collate and comparatively analyze all DHI reports and write a textual interpretation report in order to give a pasture preliminary conclusions. It is efficient and does not depend on personal experience, so DHI report information is used objectively and effectively and interpretation accuracy is maintained while working efficiency improves. It also helps pasture managers compare the interpretation results with actual production and management conditions in order to rule out and locate the causes of abnormal DHI measurement indexes.

Drawings

FIG. 1 is a schematic diagram of the DHI domain ontology;

FIG. 2 is a flow chart of the knowledge-graph-based DHI report interpretation method.

Detailed Description

The first embodiment is as follows:

The knowledge-graph-based DHI report interpretation method of this embodiment comprises the following steps:

1. Constructing the DHI domain knowledge graph:

(1) constructing a DHI domain ontology, which comprises three types of entities, 'performance index/symptom', 'influencing factor' and 'solution', and the relations between them, as shown in FIG. 1; a performance index/symptom is an index reflecting the health state of the cow, or a symptom exhibited by the cow;

for example: for the performance index 'somatic cell abnormality', one of the influencing factors is a feed raw material problem, and this problem may also manifest as symptoms such as 'low milk fat percentage' and 'rising incidence of hoof disease';

(2) taking electronic texts obtained by digitizing professional books and literature on DHI measurement and application guidance as the annotation objects, and semantically annotating the electronic text with the ontology as the annotation basis to form annotated data;

(3) using the annotated data as training data, extracting entities and entity relations from Internet texts such as Baidu Baike by supervised, semi-supervised and unsupervised methods according to the ontology structure of the DHI domain knowledge graph, to obtain extended data and form the complete DHI domain knowledge graph;

(4) for the triples of the DHI knowledge graph that contain 'influencing factor' and 'performance index/symptom' entities, calculating the conditional probability between the two entities, denoted P(fac|sym), as the weight of the edge between them.
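A minimal Python sketch of how the three entity types and the weighted edges of step (4) might be held in memory. The class name KnowledgeGraph, the relation names caused_by and solved_by, and the method names are assumptions made for illustration; the source does not specify a storage format or relation names. The two weights used in the usage lines are the values derived for I1 and I2 in the worked example of this embodiment.

```python
from dataclasses import dataclass, field

# The three entity types named by the ontology.
ENTITY_TYPES = {"performance index/symptom", "influencing factor", "solution"}


@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)   # entity name -> entity type
    caused_by: dict = field(default_factory=dict)  # (symptom, factor) -> P(fac|sym)
    solved_by: dict = field(default_factory=dict)  # factor -> list of solutions

    def add_entity(self, name, entity_type):
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {entity_type}")
        self.entities[name] = entity_type

    def add_cause(self, symptom, factor, weight):
        """Store a symptom/factor triple whose edge weight is P(fac|sym)."""
        self.add_entity(symptom, "performance index/symptom")
        self.add_entity(factor, "influencing factor")
        self.caused_by[(symptom, factor)] = weight

    def add_solution(self, factor, solution):
        """Store a factor/solution relation (unweighted)."""
        self.add_entity(factor, "influencing factor")
        self.add_entity(solution, "solution")
        self.solved_by.setdefault(factor, []).append(solution)

    def factors_for(self, symptom):
        """Return {factor: P(fac|sym)} for one performance index/symptom."""
        return {f: w for (s, f), w in self.caused_by.items() if s == symptom}


kg = KnowledgeGraph()
kg.add_cause("high fat-to-protein ratio", "added rumen bypass fat", 0.475)
kg.add_cause("high fat-to-protein ratio", "heat stress", 0.1)
```

A production system would more likely keep these triples in a graph database; the dictionaries here only make the structure of the triples and edge weights concrete.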

P(fac|sym) is obtained by crowdsourcing: crowdsourcing software presents to the participants (pasture producers, pasture managers, domain experts, etc.) all the influencing factors that may lead to a given performance index/symptom, and each participant ranks and scores the degree of influence of these factors. For example, the influencing factors behind the performance index 'high fat-to-protein ratio' include four factors, namely added rumen bypass fat, heat stress, insufficient dry matter intake and insufficient rumen microbial protein synthesis, denoted I1, I2, I3 and I4 respectively. Participant A's ranking and scores are I1(0.6), I3(0.2), I2(0.1), I4(0.1); participant B's are I1(0.4), I2(0.2), I3(0.2), I4(0.2); participant C's are I1(0.4), I4(0.2), I2(0.1), I3(0.1); participant D's are I1(0.5), I3(0.2), I4(0.2), I2. In this way, multiple pieces of experience data are obtained from the participants' feedback;

from these data, the weight between each influencing factor and a given performance index/symptom is calculated on a rank-then-score principle: following majority rule, the influencing factor that occurs most often at each rank position is selected as the influencing factor of that position, and the scores given to that factor by the participants who placed it at that position are averaged to give the weight between the influencing factor and the performance index/symptom;

for example, based on the data of participants A, B, C and D, the influencing factors are first ranked. To give ranking and weight calculation a uniform standard, this embodiment specifies that the scores a participant gives to all influencing factors total 1, so that each score is in effect a probability and the probabilities sum to 1. Participant A scores the four factors I1, I2, I3 and I4 as 0.6, 0.1, 0.2 and 0.1, so A's ranking is I1, I3, I2, I4. Across the data of the four participants, I1 occurs most often in first position, so I1 takes the first position; I3 occurs most often in second position, so I3 takes the second position; and so on, giving the overall ranking I1, I3, I2, I4. Next, following the determined ranking from the first position to the last, the weight between the influencing factor at each position and the performance index/symptom is calculated from the scores of the participants who placed that factor at that position. For I1, participants A, B, C and D all placed I1 first, so the average of all four I1 scores is taken, giving a weight of (0.6 + 0.4 + 0.4 + 0.5)/4 = 0.475. For I3, participants A and D placed I3 second, so the average of their I3 scores is taken, giving a weight of (0.2 + 0.2)/2 = 0.2. For I2, the average of the scores of participants A and C is taken, giving a weight of (0.1 + 0.1)/2 = 0.1. For I4, the average of the scores of participants A and B is taken, giving a weight of (0.1 + 0.2)/2 = 0.15.

By analogy, the strength of association between every performance index/symptom and every influencing factor can be obtained, forming a correlation coefficient matrix that represents the weights of the edges between 'performance index/symptom' and 'influencing factor' entities.
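The rank-then-score calculation can be sketched in Python as below and checked against the four-participant example. The function name edge_weights and the input layout are illustrative assumptions. Two behaviours are not specified in the text and are assumed here: a factor that has already won an earlier position is skipped at later positions, and ties are broken in favour of the factor encountered first. Participant D's score for I2 is not given in the text; the value 0.1 is a placeholder and does not enter any average, because I2 does not win the fourth position.

```python
from collections import Counter


def edge_weights(rankings):
    """Return {factor: weight} in consensus order.

    rankings maps each participant to a list of (factor, score) pairs,
    already sorted in that participant's order of influence.
    """
    n_positions = max(len(r) for r in rankings.values())
    weights = {}
    for pos in range(n_positions):
        # Majority vote for this position; factors that already won an
        # earlier position are skipped (assumption, see lead-in).
        votes = Counter(
            r[pos][0]
            for r in rankings.values()
            if pos < len(r) and r[pos][0] not in weights
        )
        if not votes:
            continue
        winner = votes.most_common(1)[0][0]
        # Average only the scores of participants who placed the winner here.
        scores = [
            r[pos][1]
            for r in rankings.values()
            if pos < len(r) and r[pos][0] == winner
        ]
        weights[winner] = sum(scores) / len(scores)
    return weights


example = {
    "A": [("I1", 0.6), ("I3", 0.2), ("I2", 0.1), ("I4", 0.1)],
    "B": [("I1", 0.4), ("I2", 0.2), ("I3", 0.2), ("I4", 0.2)],
    "C": [("I1", 0.4), ("I4", 0.2), ("I2", 0.1), ("I3", 0.1)],
    # The 0.1 for I2 below is an assumed placeholder (missing in the text).
    "D": [("I1", 0.5), ("I3", 0.2), ("I4", 0.2), ("I2", 0.1)],
}
weights = edge_weights(example)
```

With this input the consensus order is I1, I3, I2, I4, and the weights agree with the values 0.475, 0.2, 0.1 and 0.15 derived above, up to floating-point rounding.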

In particular, besides providing a fixed set of influencing factors for participants to rank and score, the crowdsourcing software also provides a supplement function: when a participant considers the fixed influencing factors offered for a question to be insufficient, the participant can submit a new influencing factor, rank and score it together with the fixed ones, and feed the result back to the crowdsourcing system. After statistics and confirmation by domain experts, the new influencing factor is added to the knowledge graph as an entity node, with the corresponding weight as the weight of the edge between the influencing factor and the performance index/symptom. For example, the influencing factors offered for the performance index 'high fat-to-protein ratio' are I1, I2, I3 and I4; if participant E, on receiving the question, considers that unsatisfactory protein quality can also cause a high fat-to-protein ratio, participant E can add this influencing factor, denoted I5, then rank and score I1, I2, I3, I4 and I5 and feed the result back to the crowdsourcing system.
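The supplement step might be sketched as follows: factors proposed by participants are collected, and only those confirmed by domain experts join the factor list. The function name merge_supplements, its parameters, and participant E's scores are hypothetical; the text gives neither E's scores nor the form that expert confirmation takes.

```python
def merge_supplements(fixed_factors, feedback, confirmed_by_experts):
    """Return the fixed factors plus participant-proposed factors that experts confirmed."""
    proposed = []
    for ranking in feedback.values():
        for factor, _score in ranking:
            # Anything outside the fixed list counts as a participant proposal.
            if factor not in fixed_factors and factor not in proposed:
                proposed.append(factor)
    accepted = [f for f in proposed if f in confirmed_by_experts]
    return list(fixed_factors) + accepted


fixed = ["I1", "I2", "I3", "I4"]
# Participant E's scores are invented for illustration only.
feedback = {"E": [("I1", 0.4), ("I5", 0.3), ("I3", 0.1), ("I2", 0.1), ("I4", 0.1)]}
factors_after_review = merge_supplements(fixed, feedback, confirmed_by_experts={"I5"})
```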

2. DHI index data acquisition:

directly using the DHI report file for the current-month data is simple and efficient, and provides basic measurement data and related statistical indexes, such as average calving interval, days in milk, milk fat percentage, milk protein percentage, fat-to-protein ratio, peak milk yield, days to peak, persistency and urea nitrogen; the historical DHI report files are traversed by software, and the relevant index data is extracted according to preset fields and stored in the database.
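A sketch of the historical traversal, under the loud assumption that historical reports are available as one CSV file per month in a single directory; the actual report file format is not stated in the source. The function name load_history and all column names are hypothetical, and the selected fields are assumed to be numeric.

```python
import csv
from pathlib import Path


def load_history(report_dir, fields):
    """Collect the chosen fields from every CSV report in report_dir, sorted by file name."""
    rows = []
    for path in sorted(Path(report_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                row = {"report": path.stem}
                for name in fields:
                    value = record.get(name)
                    # Missing or empty cells are kept as None rather than guessed.
                    row[name] = float(value) if value not in (None, "") else None
                rows.append(row)
    return rows
```

The returned rows could then be written to the database mentioned above; that step is omitted because the source does not describe the database schema.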

3. Statistical analysis of index data:

the analysis of the index data comprises static analysis and dynamic analysis;

the static analysis finds abnormal indexes by comparing the current-month data of each index with the normal range standard for that index, and forms a corresponding fact description; for example, the normal range of the fat-to-protein ratio is 1.12-1.30, so if the fat-to-protein ratio for the current month is 1.54, the fact description is 'high fat-to-protein ratio';

the dynamic analysis combines the current-month data and the historical data of each index, analyzes the recent trend of each index, and forms a corresponding fact description; for example, if the milk yield in the previous two months was 41.6 kg and 30.0 kg and the milk yield for the current month is 24.4 kg, the fact description is 'milk yield is continuously decreasing'.
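The two kinds of analysis can be sketched as small Python functions that turn numbers into fact descriptions. The function names and the exact wording of the returned strings are illustrative. The rule used for the dynamic case (a strictly monotonic change over at least three consecutive months) is an assumption; the text gives only the three-month example and does not define the trend rule.

```python
def static_fact(name, value, low, high):
    """Return a fact description if value lies outside [low, high], else None."""
    if value > high:
        return f"high {name}"
    if value < low:
        return f"low {name}"
    return None


def dynamic_fact(name, history, current):
    """Return a trend description from recent history plus the current month, else None."""
    series = list(history) + [current]
    if len(series) < 3:
        return None
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    if all(d < 0 for d in diffs):
        return f"{name} is continuously decreasing"
    if all(d > 0 for d in diffs):
        return f"{name} is continuously increasing"
    return None


ratio_fact = static_fact("fat-to-protein ratio", 1.54, 1.12, 1.30)
yield_fact = dynamic_fact("milk yield", [41.6, 30.0], 24.4)
```

Here "fat-to-protein ratio" means the ratio of milk fat percentage to milk protein percentage, and the inputs are the figures from the two examples above.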

4. Problem diagnosis:

problem diagnosis is performed on the result of the dynamic analysis in combination with the DHI domain knowledge graph, and comprises two aspects: problem localization and guidance measure suggestion;

problem localization is based on the DHI domain knowledge graph: the fact description produced by dynamic analysis is taken as a 'performance index/symptom' entity, and the probability that the fact description is caused by a given influencing factor is calculated and denoted P(fac):

P(fac) = P(fac|sym) · P_prior(sym)

where P(fac|sym) is the conditional probability between the performance index/symptom and the influencing factor, i.e. the weight of the edge between the entities, and P_prior(sym) is the prior probability of the performance index/symptom. The initial value of P_prior(sym) is obtained statistically from historical DHI reports and pasture record data: the prior probability of a performance index is the proportion of historical DHI reports in which that index is abnormal out of the total number of reports, and the prior probability of a symptom is the proportion of cows showing that symptom in the pasture record data out of the total number of cows. Furthermore, P_prior(sym) can be updated monthly; for example, in May, P_prior(sym) is calculated from the DHI reports and pasture record data up to April, and in June, from the DHI reports and pasture record data up to May;
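A short Python sketch of the prior and of P(fac), assuming the history is available as one abnormal/normal flag per past report. The function names and the flag values are hypothetical; the edge weights are the ones derived for I1 to I4 in the crowdsourcing example of this embodiment.

```python
def prior_from_reports(abnormal_flags):
    """Share of historical reports in which the index was abnormal."""
    flags = list(abnormal_flags)
    return sum(flags) / len(flags) if flags else 0.0


def rank_factors(edge_weights, prior):
    """Return (factor, P(fac)) pairs, largest first, with P(fac) = P(fac|sym) * P_prior(sym)."""
    scored = [(factor, weight * prior) for factor, weight in edge_weights.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical flags for the reports up to April: abnormal in two of four.
flags_through_april = [True, False, True, False]
prior_in_may = prior_from_reports(flags_through_april)

# In June the May report is appended and the prior is recomputed.
prior_in_june = prior_from_reports(flags_through_april + [True])

ranked = rank_factors({"I1": 0.475, "I3": 0.2, "I2": 0.1, "I4": 0.15}, prior_in_may)
```

For a single performance index/symptom the prior multiplies every factor by the same number, so the ordering of factors follows the edge weights; the prior changes the ordering only when factors attached to different symptoms are compared.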

the guidance measure suggestion takes the influencing factors obtained by localization, finds the corresponding solutions in the DHI domain knowledge graph according to the relation between 'influencing factor' and 'solution' entities, and feeds the solutions back to the user.
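The lookup from located factors to solutions can be sketched as a dictionary traversal. The function name suggest_solutions and the mapping layout are assumptions, and the factor and measure names below are placeholders, since the text does not state which solution belongs to which factor.

```python
def suggest_solutions(located_factors, solutions_by_factor):
    """Return the solutions for the located factors, without duplicates, in first-seen order."""
    suggestions = []
    for factor in located_factors:
        for solution in solutions_by_factor.get(factor, []):
            if solution not in suggestions:
                suggestions.append(solution)
    return suggestions


# Placeholder relation data standing in for the factor-to-solution edges of the graph.
solutions_by_factor = {
    "factor_x": ["measure_1", "measure_2"],
    "factor_y": ["measure_2", "measure_3"],
}
suggested = suggest_solutions(["factor_x", "factor_y", "factor_z"], solutions_by_factor)
```

A factor with no recorded solution, such as factor_z here, simply contributes nothing to the suggestions.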

The system flow and module configuration are shown in FIG. 2.

Examples

(1) from the current-month DHI report of a pasture, it is extracted that the herd-average milk fat percentage is 4.24%, the milk protein percentage is 2.76% and the fat-to-protein ratio is 1.54; the average urea nitrogen level of the cow group with a high fat-to-protein ratio is 22.33 mg/100 ml;

(2) the standard range of the fat-to-protein ratio is 1.12-1.30; the herd's fat-to-protein ratio is 1.54 > 1.30, so the herd is judged to have the problem 'high fat-to-protein ratio' and the corresponding fact description 'high fat-to-protein ratio' is generated; the standard range of the urea nitrogen level is 10-18 mg/100 ml; the average urea nitrogen level of the cow group with a high fat-to-protein ratio is 22.33 mg/100 ml > 18 mg/100 ml, so this group is judged to have the problem 'excessive urea nitrogen level' and the corresponding fact description 'excessive urea nitrogen level in the cow group with a high fat-to-protein ratio' is generated;

(3) according to the DHI domain knowledge graph, from the index 'high fat-to-protein ratio' the influencing factor is preliminarily located as 'rumen bypass fat added to the feed'; combined with the symptom 'excessive urea nitrogen level in the cow group with a high fat-to-protein ratio', the influencing factors are further located as 'excess feed protein with insufficient energy', 'unsatisfactory protein quality' and 'crude protein in the feed not effectively utilized';

(4) according to the DHI domain knowledge graph, the solutions corresponding to the influencing factors in (3) are found, namely 'increasing the supply of rumen bypass protein in the protein source', 'adding beneficial bacteria to the feed' and 'placing lick blocks in the feeding trough'.
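As an arithmetic check of the figures in steps (1) and (2), the ratio and the two range comparisons can be reproduced in a few lines of Python. This assumes the ratio is the milk fat percentage divided by the milk protein percentage, rounded to two decimals; the variable names are illustrative.

```python
milk_fat_pct = 4.24
milk_protein_pct = 2.76
ratio = round(milk_fat_pct / milk_protein_pct, 2)

# Standard ranges quoted in step (2).
NORMAL_RANGES = {
    "fat-to-protein ratio": (1.12, 1.30),
    "urea nitrogen (mg/100 ml)": (10.0, 18.0),
}
observed = {
    "fat-to-protein ratio": ratio,
    "urea nitrogen (mg/100 ml)": 22.33,
}
# Indexes whose observed value exceeds the upper bound of the standard range.
above_range = [name for name, value in observed.items() if value > NORMAL_RANGES[name][1]]
```

Both indexes end up in above_range, matching the two fact descriptions generated in step (2).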

Through the above process, the pasture production and management problems reflected by abnormal DHI indexes can be automatically analyzed and located and corresponding solutions given, which greatly reduces the burden on report analysts at the DHI testing center, improves the practicality of DHI measurement, and facilitates the broad popularization of DHI measurement work in China.

The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims

1. A knowledge-graph-based DHI report interpretation method, characterized by comprising the following steps:

P(fac) = P(fac|sym) · P_prior(sym)

setting M influence ranks, one per influencing factor;

2. The knowledge-graph-based DHI report interpretation method of claim 1, wherein the problem diagnosis process further comprises giving guidance measure suggestions: based on the influencing factors obtained by localization, the solutions corresponding to those influencing factors are determined from the DHI domain knowledge graph according to the relation between 'influencing factor' and 'solution' entities.

3. The knowledge-graph-based DHI report interpretation method of claim 1, wherein the initial value of P_prior(sym) is obtained by statistical calculation from historical DHI reports and pasture record data, and is updated according to the monthly data.

4. The knowledge-graph-based DHI report interpretation method of claim 1, 2 or 3, wherein the DHI domain knowledge graph is constructed in advance, the construction process comprising the following steps:

5. A knowledge-graph-based DHI report interpretation system, comprising:

the DHI data acquisition unit is used for acquiring DHI data of a pasture;

the DHI index data analysis unit is used for analyzing the DHI index data;

a database for storing a DHI domain knowledge graph;

the DHI domain knowledge graph comprises three types of entities, 'performance index/symptom', 'influencing factor' and 'solution', and the relations between them; the 'influencing factor' and 'performance index/symptom' entities form triples, and the edge between an 'influencing factor' entity and a 'performance index/symptom' entity carries a weight;

the system also includes a crowdsourcing data acquisition unit;

the crowdsourcing data acquisition unit is also used for acquiring the performance indexes/symptoms fed back by different participants, the influencing factors corresponding to each performance index/symptom, and the scores of those influencing factors;

the system also comprises an edge weight calculation unit for calculating the weight of the edge between an 'influencing factor' entity and a 'performance index/symptom' entity, as follows:

setting M influence ranks, one per influencing factor;

6. The system of claim 5, wherein the performance index/symptom prior probability determination unit is used for calculating the prior probability P_prior(sym) of the performance index/symptom from historical DHI reports and pasture record data, and for providing it to the problem diagnosis unit, which uses it to calculate the probability that the fact description is caused by a given influencing factor.

7. The system of claim 5 or 6, wherein the problem diagnosis unit further comprises a guidance measure suggestion module which, based on the influencing factors obtained by localization, determines the corresponding solutions from the DHI domain knowledge graph according to the relation between 'influencing factor' and 'solution' entities.

8. A storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the knowledge-graph-based DHI report interpretation method of any one of claims 1 to 4.