CN113869066A - Semantic understanding method and system based on agricultural field text - Google Patents

Publication number
CN113869066A
Authority
CN
China
Prior art keywords
text data
data
semantic
text
entity
Prior art date
Legal status
Pending
Application number
CN202111203860.1A
Other languages
Chinese (zh)
Inventor
方佩
冯仁伟
全威
谢昭俊
侯敏
杨森
李国民
姜涛
Current Assignee
China Comservice Enrising Information Technology Co Ltd
Original Assignee
China Comservice Enrising Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Comservice Enrising Information Technology Co Ltd
Priority to CN202111203860.1A
Publication of CN113869066A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Mining

Abstract

The invention discloses a semantic understanding method and system based on agricultural field text. It relates to natural language processing and addresses the inaccurate results of supply and demand information matching, intelligent question answering and document retrieval in the agricultural field. The technical scheme is as follows: step one, acquiring text data in the agricultural field; step two, performing word segmentation and part-of-speech tagging on the text data, and performing entity processing on the segmented and tagged text data according to its context information; step three, constructing a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data; step four, combining the text data processed in step two to form a word segmentation labeling model, an entity recognition model and a semantic recognition model; and step five, iteratively updating the text data and updating the knowledge graph in real time. The invention provides basic natural language processing methods, combines them for output in the form of composite models, and applies them to text data.

Description

Semantic understanding method and system based on agricultural field text
Technical Field
The invention relates to the field of natural language processing, in particular to a semantic understanding method and a semantic understanding system based on texts in the agricultural field.
Background
As natural language processing technology develops vigorously across many fields, the agricultural field also urgently needs natural language processing capabilities of its own.
Prior-art results for supply and demand information matching, intelligent question answering and literature retrieval in the agricultural field are inaccurate, and data in the agricultural field is updated extremely slowly, so that much of it still reflects the state of several years ago.
The invention provides a semantic understanding method and system for agricultural field text, designed according to the natural language characteristics and lower-layer applications of the agricultural field, and taking into account the field's many knowledge types, strong heterogeneous relevance and relatively fixed sentence patterns.
Disclosure of Invention
The invention aims to provide a semantic understanding method and a semantic understanding system based on texts in the agricultural field, and solves the problems of inaccurate results of supply and demand information matching, intelligent question answering and document retrieval in the agricultural field.
The technical purpose of the invention is realized by the following technical scheme:
in a first aspect, the invention provides a semantic understanding method based on agricultural field text, which comprises the following steps:
step one, acquiring text data in the agricultural field;
step two, performing word segmentation and part-of-speech tagging on the text data, and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the context information of the text data;
step three, constructing a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data;
step four, combining the text data processing methods in step two to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;
and step five, performing iterative update on the text data and updating the knowledge graph in real time.
The invention performs word segmentation and part-of-speech tagging on text data from the agricultural field, and then constructs a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data from the segmented and tagged text data.
Further, in the second step, natural language processing is performed on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;
carrying out entity processing on the text data subjected to natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
Further, in the third step, entity alignment is performed on the text data, and the aligned text data is imported into the basic map to generate a basic knowledge map; the method comprises the steps of comparing similarity of attributes owned by a plurality of entities of text data, setting a similarity threshold, and considering that the text data belong to the same entity when the similarity reaches a certain similarity threshold;
all atoms in the basic knowledge graph are quantized, and the quantized data are processed by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in a natural language to generate the semantic knowledge graph.
Furthermore, in the fourth step, all parts of speech in the text data are labeled, including word segmentation, part of speech labeling and role labeling for the text data;
identifying all entities in the text data, including word segmentation, keyword extraction, entity identification and reference resolution of the text data;
and judging the semantics expressed by the text data, including abstracting the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.
Furthermore, in the fifth step, the error existing in the existing data is modified, the behavior data of the modified action is recorded, the behavior data is processed and converted into trainable data, and iterative training of the text data is realized;
downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field are realized;
testing and marking the generated text data;
and adding, modifying and deleting the knowledge graph to complete the real-time updating of the knowledge graph.
In a second aspect, the invention further provides a semantic understanding system based on the text in the agricultural field, which is used for realizing the semantic understanding method in the first aspect, and the system comprises a data processing layer and a data application layer, wherein the data processing layer comprises a data acquisition unit, a data processing unit and a map construction unit, and the data application layer comprises a data combination unit and a data application unit; the functions are specifically realized as follows:
the data acquisition unit is used for acquiring text data of the agricultural field;
the data processing unit is used for performing word segmentation and part-of-speech tagging on the text data and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the text data context information;
the map construction unit is used for constructing a basic knowledge map of homologous text data and a semantic knowledge map of heterologous text data;
the data combination unit is used for combining the text data processing methods of the data processing unit to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;
and the data application unit is used for performing iterative update on the text data and updating the knowledge graph in real time.
Further, the data processing unit includes a first processing unit and a second processing unit, and is specifically implemented as follows:
the first processing unit is used for carrying out natural language processing on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;
the second processing unit is used for performing entity processing on the text data subjected to the natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
Further, the map construction unit comprises a basic knowledge map construction unit and a semantic knowledge map construction unit, and is specifically realized as follows:
the basic knowledge graph construction unit is used for aligning the entities of the text data, importing the aligned text data into a basic knowledge graph and generating the basic knowledge graph; the method comprises the steps of comparing similarity of attributes owned by a plurality of entities of text data, setting a similarity threshold, and considering that the text data belong to the same entity when the similarity reaches a certain similarity threshold;
the semantic knowledge map construction unit is used for quantizing all atoms in the basic knowledge map, and processing the quantized data by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in a natural language to generate the semantic knowledge map.
Furthermore, the word segmentation tagging model is used for tagging all parts of speech in the text data, including word segmentation, part of speech tagging and role tagging of the text data;
the entity identification model is used for identifying all entities in the text data, and comprises the steps of performing word segmentation, keyword extraction, entity identification and reference resolution on the text data;
the semantic recognition model is used for judging the semantics expressed by the text data, and comprises the steps of abstracting the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.
Further, the data combination unit also forms a click model, and the data application unit comprises an AI interface, a basic function interface, a map query interface and an error correction and reasoning interface;
the error correction and reasoning interface is used for modifying errors in the existing data, recording behavior data of modified actions, processing and converting the behavior data into trainable data of the click model, and realizing iterative training of the click model;
the AI interface comprises a word segmentation labeling model, an entity recognition model and a semantic recognition model and is used for realizing the downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field;
the basic function interface is used for testing and marking the text data generated by the data processing unit;
and the map query interface is used for adding, modifying and deleting the knowledge map in the map construction unit to finish the real-time updating of the knowledge map.
Compared with the prior art, the invention has the following beneficial effects:
the invention serves specific downstream tasks in agriculture, such as supply and demand information matching, intelligent question answering and patent literature retrieval. All algorithms for processing agricultural field data are exposed in interface form through the basic function interface. The advantage of opening up these functions is that the interface can be used to test processed data and to mark new data, after which data correctness is checked manually; this improves production efficiency and reduces the consumption of human resources. The knowledge graph can be added to, modified and deleted from through the graph query interface, which is open in interface form, convenient to operate, and allows modification in real time. The error correction and reasoning interface is the form in which the click model is presented: errors in existing data can be modified through the interface while the behavior data of the modification actions is recorded, and that data is converted through data processing into data on which the model can be trained, so that the data can be iteratively optimized.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of a semantic understanding method according to an embodiment of the present invention;
FIG. 2 is an architecture diagram of a semantic understanding system provided by an embodiment of the present invention;
FIG. 3 is a flowchart of acquiring text data according to an embodiment of the present invention;
FIG. 4 is a flow chart of a construction of a segmentation tagging model according to an embodiment of the present invention;
FIG. 5 is a flow chart of entity recognition model construction according to an embodiment of the present invention;
FIG. 6 is a flow chart of semantic identification model construction according to an embodiment of the present invention;
FIG. 7 is a flow diagram of a basic knowledge graph construction provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly or indirectly connected to the other element.
It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and is therefore not to be construed as limiting the invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Examples
As shown in FIG. 1, the embodiment provides a semantic understanding method based on agricultural field text, and the method includes the following steps:
step one, acquiring text data in the agricultural field;
step two, performing word segmentation and part-of-speech tagging on the text data, and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the context information of the text data;
step three, constructing a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data;
step four, combining the text data processing methods in step two to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;
and step five, performing iterative update on the text data and updating the knowledge graph in real time.
Specifically, as shown in FIG. 3, in step one the data can be divided into four types: text data, behavior data, encyclopedia data and core word data. The text data comes mainly from information that is open on the Internet, such as forum comments, news and supply and demand orders; it can be acquired directly with a data capture tool, after which the acquired data set is cleaned and sorted. The behavior data is derived from the operator behavior logs generated by the click model in the core engine layer. The encyclopedia data can use official open-source data such as encyclopedias and Wikipedia, which has the advantages of detailed content and high accuracy. The core word data refers to proper nouns in the agricultural field or words with special meanings in the field, and is provided by agricultural experts.
In the second step, text data processing can be divided into natural language processing (abbreviated as NLP processing) and NLP entity processing: word segmentation and part-of-speech tagging belong to NLP processing, and entity processing belongs to NLP entity processing. NLP processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction; NLP entity processing comprises part-of-speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
In the third step, agriculture can be broadly divided into five categories: crop farming, fishery, animal husbandry, forestry and sideline industries. Because these categories are interrelated, agricultural knowledge is rich and its relations are complex, so semantic understanding and reasoning need the support of knowledge graphs. The basic knowledge graph is constructed mainly by agricultural experts, which ensures its accuracy and authority. A notable characteristic of agriculture is that animals and plants are correlated to some degree, while the animal and plant knowledge graphs are heterogeneous knowledge graphs in the natural language field. The basic knowledge graph alone cannot support reasoning across heterogeneous knowledge graphs; these graphs therefore need to be processed by translation, path learning, type constraint and heterogeneous information algorithms in natural language, so that possible hidden relations are discovered and inferred, and the heterogeneous information is combined to form the semantic knowledge graph.
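One plausible reading of the "translation" processing named for step three is a translation-based embedding model in the TransE family, in which every entity and relation is mapped to a vector and a triple (head, relation, tail) counts as plausible when head + relation lies close to tail. The patent does not name a specific model, so this reading is an assumption. The sketch below uses hand-set toy vectors rather than trained ones, and all entity and relation names are hypothetical.

```python
import math

# Hypothetical 2-d embeddings, hand-picked for illustration (not trained).
ENTITY = {
    "apple": (1.0, 0.0),
    "pear": (1.1, 0.1),
    "aphid": (1.0, 1.0),
    "carp": (-2.0, 0.5),
}
RELATION = {"attacked_by": (0.0, 1.0)}


def transe_distance(head, relation, tail):
    """Return ||h + r - t||; a smaller value means a more plausible triple."""
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation):
    """Rank every other entity as a candidate tail for (head, relation, ?)."""
    candidates = [e for e in ENTITY if e != head]
    return sorted(candidates, key=lambda e: transe_distance(head, relation, e))


print(rank_tails("pear", "attacked_by"))
```

Because the pear vector sits next to the apple vector, a relation stored only for apples also ranks "aphid" first for pears, which is the kind of hidden relation the step aims to surface.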
In the fourth step, the text data processed in the second step is output through three models to complete information matching, intelligent question answering and document retrieval.
In the fifth step, because agriculture keeps developing, the data needs to be updated and corrected, and part-of-speech tagging needs to be performed on new text data so that subsequent information matching and searching can be completed.
In a further embodiment of this embodiment, the text is processed by natural language processing and natural language entity processing, and in the second step, the text data is processed by natural language processing; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;
carrying out entity processing on the text data subjected to natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
In a further embodiment of this embodiment, a knowledge graph is constructed for the text data, as shown in fig. 7, in the third step, entity alignment is performed on the text data, and the aligned text data is imported into a basic graph to generate a basic knowledge graph; the method comprises the steps of comparing similarity of attributes owned by a plurality of entities of text data, setting a similarity threshold, and considering that the text data belong to the same entity when the similarity reaches a certain similarity threshold;
all atoms in the basic knowledge graph are quantized, and the quantized data are processed by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in a natural language to generate the semantic knowledge graph.
Specifically, entity alignment is performed before the data is imported into the basic graph. Agriculture distinguishes a large number of plants and animals, so the entity alignment algorithm compares the similarity of the attributes owned by a plurality of entities and treats them as the same entity when the similarity reaches a certain threshold.
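The threshold rule can be illustrated with a minimal sketch. The patent does not say which similarity measure is used, so Jaccard similarity over (attribute, value) pairs, the 0.6 threshold, the greedy grouping strategy and the sample records are all assumptions made for illustration.

```python
def attribute_similarity(a, b):
    """Jaccard similarity over the (attribute, value) pairs of two records."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def align_entities(records, threshold=0.6):
    """Greedily group records whose similarity to a group's first member
    reaches the threshold; each group is treated as one entity."""
    groups = []
    for name, attrs in records.items():
        for group in groups:
            if attribute_similarity(attrs, records[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical records; "fanqie" is another name for the tomato.
RECORDS = {
    "tomato": {"kingdom": "plant", "family": "Solanaceae",
               "edible_part": "fruit", "color": "red"},
    "fanqie": {"kingdom": "plant", "family": "Solanaceae",
               "edible_part": "fruit"},
    "potato": {"kingdom": "plant", "family": "Solanaceae",
               "edible_part": "tuber"},
    "carp": {"kingdom": "animal", "family": "Cyprinidae"},
}

print(align_entities(RECORDS))
```

With these records, "tomato" and "fanqie" share three of four attribute pairs (0.75) and are merged, while "potato" shares only two of five (0.4) and stays separate. The threshold controls how aggressively aliases are merged.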
The basic knowledge graph can be constructed by experts in the agricultural field. The advantage is that such experts have worked in the field for a long time and keep their knowledge of it up to date, so relations that do not exist or are hardly ever used can be effectively avoided. Expert construction ensures the accuracy and authority of the basic knowledge graph and facilitates construction of the semantic graph. The knowledge graph is formed by combining a plurality of triples; all atoms in the basic graph are quantized, and the quantized data are processed by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in natural language to generate the semantic knowledge graph. The essence of the semantic knowledge graph is the fusion of heterogeneous knowledge graphs. The animal and plant graphs are heterogeneous knowledge graphs: although the leaves of one knowledge tree may mention knowledge from other fields, such mentions are scattered. The semantic knowledge graph integrates the heterogeneous knowledge graphs through algorithms to form a closed loop over the field. Its advantage is that, if a lower-layer application does not find exact data, highly related information can be provided by reasoning from the existing information. For example, if an application search finds no information on "pest control in apple plantation", fruits with similar attributes are inferred from the attributes of apples relevant to pest control, and the pests found for those fruits are then used to deduce the plants they act on, which may reveal that the pests also act on apples.
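The apple example can be sketched as a fallback lookup over triples: when a crop has no pest relation of its own, pests are borrowed from crops that share enough attributes. The triples, the relation names and the `min_shared` cutoff below are hypothetical, and the shared-attribute count is only a stand-in for whatever inference the semantic knowledge graph actually performs.

```python
from collections import defaultdict

# Toy triples (head, relation, tail); illustrative data only.
TRIPLES = [
    ("apple", "family", "Rosaceae"),
    ("apple", "type", "pome fruit"),
    ("pear", "family", "Rosaceae"),
    ("pear", "type", "pome fruit"),
    ("pear", "attacked_by", "codling moth"),
    ("rice", "family", "Poaceae"),
    ("rice", "type", "cereal"),
    ("rice", "attacked_by", "rice planthopper"),
]


def candidate_pests(crop, triples, min_shared=2):
    """Suggest pests for `crop` from crops sharing at least `min_shared`
    attribute triples; returns {pest: number of shared attributes}."""
    attrs = defaultdict(set)
    pests = defaultdict(set)
    for head, relation, tail in triples:
        if relation == "attacked_by":
            pests[head].add(tail)
        else:
            attrs[head].add((relation, tail))

    crop_attrs = attrs.get(crop, set())
    suggestions = {}
    for other, other_attrs in attrs.items():
        if other == crop:
            continue
        shared = len(crop_attrs & other_attrs)
        if shared >= min_shared:
            for pest in pests.get(other, ()):
                suggestions[pest] = max(suggestions.get(pest, 0), shared)
    return suggestions


print(candidate_pests("apple", TRIPLES))
```

The result is a set of candidates for an expert or a downstream model to confirm, not an established fact in the graph.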
In a further embodiment of this embodiment, the method for processing text data is combined to construct three models to achieve data output, as shown in fig. 4, 5 and 6, in the fourth step, all parts of speech in the text data are labeled, including word segmentation, part of speech labeling and role labeling of the text data;
identifying all entities in the text data, including word segmentation, keyword extraction, entity identification and reference resolution of the text data;
and judging the semantics expressed by the text data, including abstracting the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.
Specifically, FIG. 4 shows the flow of constructing the word segmentation tagging model. The text is first segmented into words of different lengths by a word segmentation algorithm; the part of speech of every word is then determined by a part-of-speech tagging algorithm; finally, role labeling tags words with additional information such as location, time and other attributes. In the agricultural field such attributes are key information, and adding the role labeling algorithm facilitates relationship reasoning and information retrieval in applications. For example, "the diseases and pests to be controlled on oranges in summer" is segmented into "summer", "oranges", "control" and "pests". During part-of-speech tagging, "summer" is tagged as a noun, and role labeling is needed to attach the additional information that it denotes time, because time plays an indispensable role in the agricultural field: both plant cultivation and animal breeding are sensitive to time.
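The three stages of FIG. 4 can be sketched with a toy lexicon. This is only an illustration under stated assumptions: splitting English text on non-letters stands in for a real word segmentation algorithm, and the `LEXICON` entries, tag names and role labels are hypothetical rather than taken from the patent.

```python
import re

# Hypothetical mini-lexicon: word -> (part of speech, optional role label).
LEXICON = {
    "summer": ("noun", "time"),
    "oranges": ("noun", None),
    "control": ("verb", None),
    "pests": ("noun", None),
}


def segment(text):
    """Stand-in segmenter: lowercase the text and split on non-letters."""
    return [word for word in re.split(r"[^a-z]+", text.lower()) if word]


def tag(text):
    """Segment, keep lexicon words, and attach part of speech and role."""
    tagged = []
    for word in segment(text):
        if word in LEXICON:
            pos, role = LEXICON[word]
            tagged.append({"word": word, "pos": pos, "role": role})
    return tagged


for item in tag("Pests to control on oranges in summer"):
    print(item)
```

Here "summer" keeps its noun tag and additionally carries the role "time", which is the extra information the role labeling stage contributes.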
FIG. 5 shows the flow of entity recognition model construction. First, a sentence is segmented into words of different lengths by a word segmentation algorithm, and a keyword extraction algorithm filters out verbs, adjectives and the like; this filtering shortens processing time. Entity recognition then confirms all entity nouns in the text. This is needed because words such as "planting" and "breeding" can be either verbs or nouns in the agricultural field, and because relation nouns such as "parent class" and "subclass" must be told apart from words such as "west canna" and "teddy dog", which explicitly point to entities in the agricultural field. Next, entity alignment is carried out by a reference resolution algorithm, which aligns an entity with the references to it. For example, in "Apple appealed to Qualcomm Inc., but Qualcomm did not respond", "Qualcomm Inc." and "Qualcomm" stand in the relationship of entity and reference, and the resolution algorithm merges the two into one equivalent set.
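Merging an entity and its references into one equivalent set maps naturally onto a union-find structure. The sketch below is an assumption about how such sets could be maintained, not the patent's resolution algorithm. The linking rule (a mention is a prefix of the full entity name) is a deliberately naive placeholder, and the mention pair "Qualcomm Inc." and "Qualcomm" is illustrative.

```python
class MentionSets:
    """Union-find over mention strings; each set holds one entity's mentions."""

    def __init__(self):
        self.parent = {}

    def find(self, mention):
        self.parent.setdefault(mention, mention)
        while self.parent[mention] != mention:
            # Path halving keeps the trees shallow.
            self.parent[mention] = self.parent[self.parent[mention]]
            mention = self.parent[mention]
        return mention

    def union(self, entity, reference):
        self.parent[self.find(reference)] = self.find(entity)

    def groups(self):
        out = {}
        for mention in list(self.parent):
            out.setdefault(self.find(mention), set()).add(mention)
        return list(out.values())


def resolve(entities, mentions):
    """Toy rule: link a mention to an entity whose name starts with it."""
    sets = MentionSets()
    for entity in entities:
        sets.find(entity)  # register every entity, even if never referenced
    for mention in mentions:
        for entity in entities:
            if entity != mention and entity.startswith(mention):
                sets.union(entity, mention)
    return sets


sets = resolve(["Qualcomm Inc.", "Apple"], ["Qualcomm"])
print(sets.groups())
```

A production system would replace the prefix rule with a trained coreference model; the equivalence-set bookkeeping stays the same.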
FIG. 6 shows the semantic recognition model construction flow. The semantic recognition algorithm mainly targets large text data. First, summarization and event extraction algorithms slim the text down and extract the key sentences; a semantic extraction algorithm is then applied to the key sentences to obtain all semantic sentences that may exist. Theme extraction and keyword extraction are also performed on the extracted text, so that keywords, themes and similar information provide auxiliary analysis information for the text semantics. Finally, a semantic disambiguation algorithm is applied to the extracted keywords, the extracted theme information and all possible semantic sentences: it scores and ranks the sentences, and the top-ranked sentence is taken as the most likely meaning.
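The final score-and-rank step can be sketched as follows. The patent does not specify the scoring function, so the word-overlap score, the `topic_weight` parameter and the sample sentences are assumptions chosen only to show the ranking mechanics.

```python
def score(candidate, keywords, topics, topic_weight=2.0):
    """Score a candidate sentence by its overlap with keywords and topics."""
    words = set(candidate.lower().split())
    return len(words & keywords) + topic_weight * len(words & topics)


def most_likely_meaning(candidates, keywords, topics):
    """Rank candidates by score, best first, and return (best, ranking)."""
    ranked = sorted(
        candidates,
        key=lambda c: score(c, keywords, topics),
        reverse=True,
    )
    return ranked[0], ranked


candidates = [
    "apple the company released a phone",
    "apple trees need pest control in spring",
]
keywords = {"pest", "trees", "spring"}  # hypothetical extracted keywords
topics = {"control"}                     # hypothetical extracted theme words

best, ranking = most_likely_meaning(candidates, keywords, topics)
print(best)
```

The keyword and theme information extracted earlier acts purely as evidence for choosing among the candidate readings, which matches the auxiliary role the description gives it.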
In a further embodiment of this embodiment, iterative training and error correction of the text data are implemented, in the fifth step, the error existing in the existing data is modified, the behavior data of the modified action is recorded, the behavior data is processed and converted into trainable data, and iterative training of the text data is implemented;
downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field are realized;
testing and marking the generated text data;
and adding, modifying and deleting the knowledge graph to complete the real-time updating of the knowledge graph.
Specifically, the method serves specific downstream tasks in the agricultural field, such as supply and demand information matching, intelligent question answering and document retrieval. The basic functions can be used to test processed data and to mark new data, after which data correctness is checked manually; this improves production efficiency and reduces the consumption of human resources. The graph can be added to, modified and deleted from in real time. Errors in existing data are modified while the behavior data of the modification actions is recorded; that behavior data is converted through data processing into data on which the model can be trained, and the click model is iteratively optimized.
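The conversion of recorded correction behavior into trainable data might look like the sketch below. The `CorrectionEvent` fields and the "latest correction wins" rule are assumptions; the patent only states that modification actions are logged and converted into training data for the click model.

```python
from dataclasses import dataclass


@dataclass
class CorrectionEvent:
    """One logged modification action (hypothetical schema)."""
    text: str       # the sentence the operator was looking at
    field: str      # which label was corrected, e.g. "pos" or "role"
    target: str     # the word whose label was corrected
    old_value: str
    new_value: str


def to_training_examples(events):
    """Turn chronologically ordered corrections into labeled examples,
    keeping only the latest correction per (text, field, target)."""
    latest = {}
    for event in events:
        if event.old_value != event.new_value:  # skip no-op edits
            key = (event.text, event.field, event.target)
            latest[key] = event.new_value
    return [
        {"text": text, "field": field, "target": target, "label": label}
        for (text, field, target), label in latest.items()
    ]


log = [
    CorrectionEvent("oranges in summer", "role", "summer", "", "place"),
    CorrectionEvent("oranges in summer", "role", "summer", "place", "time"),
    CorrectionEvent("oranges in summer", "pos", "oranges", "noun", "noun"),
]
print(to_training_examples(log))
```

Deduplicating by the latest correction avoids training on labels that an operator later reversed.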
The embodiment also provides a semantic understanding system based on agricultural field text, as shown in FIG. 2, for implementing the semantic understanding method. The system includes a data processing layer and a data application layer; the data processing layer includes a data acquisition unit 110, a data processing unit 120 and a map construction unit 130, and the data application layer includes a data combination unit 140 and a data application unit 150. The functions are specifically realized as follows:
a data acquisition unit 110 for acquiring text data of an agricultural field;
the data processing unit 120 is configured to perform word segmentation and part-of-speech tagging on the text data, and perform entity processing on the text data subjected to word segmentation and part-of-speech tagging according to context information of the text data;
a map construction unit 130 for constructing a basic knowledge map of homologous text data and a semantic knowledge map of heterologous text data;
the data combination unit 140 is used for combining the text data processing methods of the data processing unit to form a word segmentation tagging model, an entity recognition model and a semantic recognition model;
and the data application unit 150 is used for performing iterative updating on the text data and performing real-time updating on the knowledge graph.
In another embodiment of this embodiment, the data processing unit 120 includes a first processing unit and a second processing unit, which are specifically implemented as follows:
the first processing unit is used for carrying out natural language processing on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;
the second processing unit is used for performing entity processing on the text data subjected to the natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
Specifically, the NLP processing and the NLP entity processing of the text data have been described in the above-mentioned semantic understanding method, and are not described here.
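As one illustration of word segmentation combined with part-of-speech tagging, the sketch below uses dictionary-based forward maximum matching. This technique is an assumption chosen for brevity; the text does not say which segmentation algorithm is used, and the lexicon, the tag set and the name `segment_and_tag` are hypothetical.

```python
from typing import Dict, List, Tuple


def segment_and_tag(text: str, lexicon: Dict[str, str],
                    max_len: int = 4) -> List[Tuple[str, str]]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    result: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if word in lexicon:
                result.append((word, lexicon[word]))
                i += size
                break
        else:
            # No lexicon word starts here: emit one character tagged as unknown.
            result.append((text[i], "x"))
            i += 1
    return result


# Hypothetical agricultural lexicon: word -> part-of-speech tag (n = noun, v = verb).
lexicon = {"小麦": "n", "锈病": "n", "防治": "v", "小麦锈病": "n"}
tagged = segment_and_tag("小麦锈病的防治", lexicon)
```

Here the longer domain term 小麦锈病 (wheat rust) wins over its parts 小麦 and 锈病, which shows why a domain lexicon matters for agricultural text.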
In another implementation of this embodiment, the map construction unit 130 includes a basic knowledge graph construction unit and a semantic knowledge graph construction unit, which are specifically realized as follows:
the basic knowledge graph construction unit is used for aligning the entities of the text data, importing the aligned text data into a basic knowledge graph and generating the basic knowledge graph; the method comprises the steps of comparing similarity of attributes owned by a plurality of entities of text data, setting a similarity threshold, and considering that the text data belong to the same entity when the similarity reaches a certain similarity threshold;
the semantic knowledge map construction unit is used for quantizing all atoms in the basic knowledge map, and processing the quantized data by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in a natural language to generate the semantic knowledge map.
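The threshold-based entity alignment performed by the basic knowledge graph construction unit can be sketched as follows. The similarity measure is not specified in the text; the Jaccard overlap of attribute pairs, the greedy grouping strategy, the default threshold of 0.5 and the names `attribute_similarity` and `align_entities` are assumptions for illustration.

```python
from typing import Dict, List

Entity = Dict[str, str]  # attribute name -> attribute value


def attribute_similarity(a: Entity, b: Entity) -> float:
    """Jaccard overlap of the (attribute, value) pairs owned by two entities."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    return len(pairs_a & pairs_b) / len(union) if union else 0.0


def align_entities(entities: List[Entity], threshold: float = 0.5) -> List[List[int]]:
    """Greedy grouping: an entity joins the first group whose first member is similar enough."""
    groups: List[List[int]] = []
    for idx, entity in enumerate(entities):
        for group in groups:
            if attribute_similarity(entities[group[0]], entity) >= threshold:
                group.append(idx)  # treated as the same real-world entity
                break
        else:
            groups.append([idx])  # no match: start a new entity
    return groups


entities = [
    {"name": "wheat stripe rust", "host": "wheat", "pathogen": "Puccinia striiformis"},
    {"name": "stripe rust", "host": "wheat", "pathogen": "Puccinia striiformis"},
    {"name": "rice blast", "host": "rice", "pathogen": "Magnaporthe oryzae"},
]
groups = align_entities(entities)
```

The first two records share two of four distinct attribute pairs (similarity 0.5), so they reach the threshold and are merged; the third shares none and stays separate.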
Specifically, the construction of two kinds of knowledge maps is described in the semantic understanding method, and is not described here.
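The "translation algorithm" named for the semantic knowledge graph is not defined in the text. One plausible reading, offered here purely as an assumption, is a translation-based embedding model of the TransE family, in which entities and relations are represented as vectors and a triple (head, relation, tail) is considered plausible when head + relation lies close to tail. The hand-picked vectors below are illustrative; a real model would learn them from the graph.

```python
import math
from typing import Dict, List

Vector = List[float]


def transe_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE-style score: Euclidean norm of (head + relation - tail); lower is more plausible."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Hypothetical 2-dimensional embeddings, chosen by hand for the example.
entity_vec: Dict[str, Vector] = {
    "wheat": [1.0, 0.0],
    "rust": [1.0, 1.0],
    "rice": [0.0, 3.0],
}
relation_vec: Dict[str, Vector] = {"affected_by": [0.0, 1.0]}

plausible = transe_distance(
    entity_vec["wheat"], relation_vec["affected_by"], entity_vec["rust"])
implausible = transe_distance(
    entity_vec["wheat"], relation_vec["affected_by"], entity_vec["rice"])
```

Under this reading, the triple (wheat, affected_by, rust) has distance zero and is preferred over (wheat, affected_by, rice), whose distance is larger.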
In another embodiment of this embodiment, the word segmentation tagging model is configured to tag all parts of speech in the text data, including performing word segmentation, part of speech tagging and role tagging on the text data;
the entity identification model is used for identifying all entities in the text data, and comprises the steps of performing word segmentation, keyword extraction, entity identification and reference resolution on the text data;
the semantic recognition model is used for judging the semantics expressed by the text data, and comprises the steps of abstracting the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.
Specifically, the data combination unit 140 includes a rule model and a matching model. The rule model comprises the word segmentation tagging model; the matching model comprises the entity recognition model and the semantic recognition model. The functions and effects of the word segmentation tagging model, the entity recognition model and the semantic recognition model have been described in the embodiments of the semantic understanding method and are not repeated here.
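The way the three models are formed by combining processing steps can be sketched as ordered step lists run over a shared state. The step lists follow the enumeration in the text; the registry structure, the placeholder step implementations and the names `MODEL_STEPS`, `make_placeholder` and `run_model` are assumptions, since the text does not describe how the combination is implemented.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Step = Callable[[State], State]

# Each model is an ordered combination of processing steps, as listed in the text.
MODEL_STEPS: Dict[str, List[str]] = {
    "segmentation_tagging": ["word_segmentation", "pos_tagging", "role_tagging"],
    "entity_recognition": ["word_segmentation", "keyword_extraction",
                           "entity_recognition", "reference_resolution"],
    "semantic_recognition": ["summarization", "event_extraction", "semantic_extraction",
                             "theme_extraction", "keyword_extraction",
                             "semantic_disambiguation"],
}


def make_placeholder(step_name: str) -> Step:
    """Stand-in for a real algorithm: it only records that the step ran."""
    def step(state: State) -> State:
        return {**state, "trace": list(state["trace"]) + [step_name]}
    return step


# One shared implementation per step name, reused by every model that needs it.
STEP_IMPL: Dict[str, Step] = {
    name: make_placeholder(name) for steps in MODEL_STEPS.values() for name in steps
}


def run_model(model_name: str, text: str) -> State:
    """Run the steps of one model in order over the same text."""
    state: State = {"text": text, "trace": []}
    for step_name in MODEL_STEPS[model_name]:
        state = STEP_IMPL[step_name](state)
    return state


result = run_model("entity_recognition", "小麦锈病的防治")
```

Steps such as word segmentation and keyword extraction appear in more than one model, so sharing a single implementation per step keeps the models consistent with one another.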
In another embodiment of this embodiment, the data combining unit 140 further forms a click model, and the data applying unit includes an AI interface, a basic function interface, a map query interface, and an error correction and inference interface;
the error correction and reasoning interface is used for modifying errors in the existing data, recording behavior data of modified actions, processing and converting the behavior data into trainable data of the click model, and realizing iterative training of the click model;
the AI interface comprises a word segmentation labeling model, an entity recognition model and a semantic recognition model and is used for realizing the downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field;
the basic function interface is used for testing and marking the text data generated by the data processing unit;
and the map query interface is used for adding, modifying and deleting the knowledge map in the map construction unit to finish the real-time updating of the knowledge map.
Specifically, the click model in the data combination unit 140 is the source of behavior data. The click model records operation behavior data in the form of a log, collected along several dimensions such as id, user name, search statement and click content. Part of the search statement data is used as data for optimizing the model; the rest is periodically stored in a database and can serve as basic data for intelligent recommendation and for analyzing user behavior. The data application unit 150 mainly exposes functions in the form of web interfaces. The AI interface covers the rule model and the matching model of the data combination unit 140 and serves specific downstream tasks in agriculture such as supply and demand information matching, intelligent question answering and patent document retrieval. The basic function interface exposes all algorithms in the data processing unit 120 as interfaces; these basic functions can be used to test data and to label new data, after which the correctness of the data is checked manually, which improves production efficiency and reduces the consumption of human resources. The map query interface provides functions such as adding, modifying and deleting entries of the knowledge graph; because it is exposed as an interface, it is convenient to operate and supports real-time modification. The error correction and reasoning interface is the form in which the click model is presented: errors in the existing data can be corrected through the interface while the behavior data of each correction is recorded, the behavior data is converted through data processing into data on which the model can be trained, and the click model is iteratively optimized.
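The path from logged behavior data to trainable data can be sketched as follows. The four fields (id, user name, search statement, click content) come from the text; the JSON-lines log format, the deduplication rule and the function names `log_click` and `to_training_pairs` are assumptions for illustration.

```python
import json
from typing import List, Tuple


def log_click(log: List[str], record_id: int, user_name: str,
              search_statement: str, click_content: str) -> None:
    """Append one behavior record to the log as a JSON line (assumed format)."""
    log.append(json.dumps({
        "id": record_id,
        "user_name": user_name,
        "search_statement": search_statement,
        "click_content": click_content,
    }, ensure_ascii=False))


def to_training_pairs(log: List[str]) -> List[Tuple[str, str]]:
    """Convert log lines into (query, clicked item) pairs, skipping empty queries and repeats."""
    seen = set()
    pairs: List[Tuple[str, str]] = []
    for line in log:
        record = json.loads(line)
        pair = (record["search_statement"].strip(), record["click_content"])
        if pair[0] and pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


log: List[str] = []
log_click(log, 1, "user_a", "wheat rust control", "doc_17")
log_click(log, 2, "user_b", "wheat rust control", "doc_17")  # repeated behavior
log_click(log, 3, "user_a", "  ", "doc_03")                  # empty query
log_click(log, 4, "user_c", "rice blast fungicide", "doc_42")
pairs = to_training_pairs(log)
```

Each resulting pair can be treated as a positive example that the clicked content is relevant to the search statement, which is one way the log could feed iterative optimization of the click model.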
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A semantic understanding method based on an agricultural field text is characterized by comprising the following steps:
step one, acquiring text data in the agricultural field;
step two, performing word segmentation and part-of-speech tagging on the text data, and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the context information of the text data;
step three, constructing a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data;
step four, combining the text data processing methods in the step two to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;
and step five, performing iterative update on the text data and updating the knowledge graph in real time.
2. The semantic understanding method based on the agricultural field text according to claim 1, wherein in the second step, natural language processing is performed on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;
carrying out entity processing on the text data subjected to natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
3. The semantic understanding method based on the agricultural field text according to claim 1, characterized in that in the third step, entity alignment is performed on the text data, and the aligned text data is imported into a basic map to generate a basic knowledge map; the method comprises the steps of comparing similarity of attributes owned by a plurality of entities of text data, setting a similarity threshold, and considering that the text data belong to the same entity when the similarity reaches a certain similarity threshold;
all atoms in the basic knowledge graph are quantized, and the quantized data are processed by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in a natural language to generate the semantic knowledge graph.
4. The semantic understanding method based on the texts in the agricultural field according to claim 1, wherein in the fourth step, all parts of speech in the text data are labeled, including word segmentation, part of speech labeling and role labeling of the text data;
identifying all entities in the text data, including word segmentation, keyword extraction, entity identification and reference resolution of the text data;
and judging the semantics expressed by the text data, including abstracting the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.
5. The semantic understanding method based on the agricultural field text according to claim 1, wherein in the fifth step, errors existing in existing data are modified, behavior data of modified actions are recorded, the behavior data are processed and converted into trainable data, and iterative training of text data is achieved;
downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field are realized;
testing and marking the generated text data;
and adding, modifying and deleting the knowledge graph to complete the real-time updating of the knowledge graph.
6. A semantic understanding system based on agricultural field texts is used for realizing the semantic understanding method according to any one of claims 1 to 5, and comprises a data processing layer and a data application layer, wherein the data processing layer comprises a data acquisition unit, a data processing unit and a map construction unit, the data application layer comprises a data combination unit and a data application unit, and the functions of the semantic understanding system are specifically realized as follows:
the data acquisition unit is used for acquiring text data of the agricultural field;
the data processing unit is used for performing word segmentation and part-of-speech tagging on the text data and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the text data context information;
the map construction unit is used for constructing a basic knowledge map of homologous text data and a semantic knowledge map of heterologous text data;
the data combination unit is used for combining the text data processing methods of the data processing unit to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;
and the data application unit is used for performing iterative update on the text data and updating the knowledge graph in real time.
7. The agricultural field text-based semantic understanding system according to claim 6, wherein the data processing unit comprises a first processing unit and a second processing unit, and is implemented as follows:
the first processing unit is used for carrying out natural language processing on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;
the second processing unit is used for performing entity processing on the text data subjected to the natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.
8. The agricultural field text-based semantic understanding system according to claim 6, wherein the graph construction unit comprises a basic knowledge graph construction unit and a semantic knowledge graph construction unit, which are specifically realized as follows:
the basic knowledge graph construction unit is used for aligning the entities of the text data, importing the aligned text data into a basic knowledge graph and generating the basic knowledge graph; the method comprises the steps of comparing similarity of attributes owned by a plurality of entities of text data, setting a similarity threshold, and considering that the text data belong to the same entity when the similarity reaches a certain similarity threshold;
the semantic knowledge map construction unit is used for quantizing all atoms in the basic knowledge map, and processing the quantized data by a translation algorithm, a path learning algorithm, a type constraint algorithm and a heterogeneous information algorithm in a natural language to generate the semantic knowledge map.
9. The agricultural field text-based semantic understanding system according to claim 6, wherein the segmentation labeling model is used for labeling all parts of speech in the text data, including segmentation, part of speech labeling and role labeling of the text data;
the entity identification model is used for identifying all entities in the text data, and comprises the steps of performing word segmentation, keyword extraction, entity identification and reference resolution on the text data;
the semantic recognition model is used for judging the semantics expressed by the text data, and comprises the steps of abstracting the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.
10. The agricultural field text-based semantic understanding system according to claim 6, wherein the data combining unit further forms a click model, and the data application unit comprises an AI interface, a basic function interface, a map query interface, an error correction and reasoning interface;
the error correction and reasoning interface is used for modifying errors in the existing data, recording behavior data of modified actions, processing and converting the behavior data into trainable data of the click model, and realizing iterative training of the click model;
the AI interface comprises a word segmentation labeling model, an entity recognition model and a semantic recognition model and is used for realizing the downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field;
the basic function interface is used for testing and marking the text data generated by the data processing unit;
and the map query interface is used for adding, modifying and deleting the knowledge map in the map construction unit to finish the real-time updating of the knowledge map.
CN202111203860.1A 2021-10-15 2021-10-15 Semantic understanding method and system based on agricultural field text Pending CN113869066A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111203860.1A CN113869066A (en) 2021-10-15 2021-10-15 Semantic understanding method and system based on agricultural field text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111203860.1A CN113869066A (en) 2021-10-15 2021-10-15 Semantic understanding method and system based on agricultural field text

Publications (1)

Publication Number Publication Date
CN113869066A true CN113869066A (en) 2021-12-31

Family

ID=78999825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111203860.1A Pending CN113869066A (en) 2021-10-15 2021-10-15 Semantic understanding method and system based on agricultural field text

Country Status (1)

Country Link
CN (1) CN113869066A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115345152A (en) * 2022-10-19 2022-11-15 北方健康医疗大数据科技有限公司 Template library updating method, report analyzing method, device, equipment and medium
CN115345152B (en) * 2022-10-19 2023-03-14 北方健康医疗大数据科技有限公司 Template library updating method, report analyzing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Velardi et al. Ontolearn reloaded: A graph-based algorithm for taxonomy induction
CN108804521B (en) Knowledge graph-based question-answering method and agricultural encyclopedia question-answering system
CN108121829B (en) Software defect-oriented domain knowledge graph automatic construction method
CN108897857B (en) Chinese text subject sentence generating method facing field
CN103823824B (en) A kind of method and system that text classification corpus is built automatically by the Internet
JP5798624B2 (en) System and method for analyzing and synthesizing complex knowledge representations
CN111401066B (en) Artificial intelligence-based word classification model training method, word processing method and device
CN109937417A (en) The system and method for context searchig for electronical record
CN111897968A (en) Industrial information security knowledge graph construction method and system
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
WO2020074787A1 (en) Method of searching patent documents
CN113196277A (en) System for retrieving natural language documents
CN110175334A (en) Text knowledge's extraction system and method based on customized knowledge slot structure
CN112000802A (en) Software defect positioning method based on similarity integration
CN114912435A (en) Power text knowledge discovery method and device based on frequent itemset algorithm
Popchev et al. Text Mining in the Domain of Plant Genetic Resources
CN114610846A (en) Knowledge graph expanding and complementing method for heuristic bionic knowledge grafting strategy
CN116010564A (en) Construction method of rice pest question-answering system based on multi-mode knowledge graph
Maitre et al. A meaningful information extraction system for interactive analysis of documents
Goel et al. Towards a virtual librarian for biologically inspired design
CN114579695A (en) Event extraction method, device, equipment and storage medium
CN113869066A (en) Semantic understanding method and system based on agricultural field text
CN116049376B (en) Method, device and system for retrieving and replying information and creating knowledge
CN114238735B (en) Intelligent internet data acquisition method
CN112487154B (en) Intelligent search method based on natural language

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination