CN113869066A

CN113869066A - Semantic understanding method and system based on agricultural field text

Info

Publication number: CN113869066A
Application number: CN202111203860.1A
Authority: CN
Inventors: 方佩; 冯仁伟; 全威; 谢昭俊; 侯敏; 杨森; 李国民; 姜涛
Original assignee: China Comservice Enrising Information Technology Co Ltd
Current assignee: China Comservice Enrising Information Technology Co Ltd
Priority date: 2021-10-15
Filing date: 2021-10-15
Publication date: 2021-12-31

Abstract

The invention discloses a semantic understanding method and a semantic understanding system based on agricultural field text, which relate to natural language processing and address the inaccurate results of supply and demand information matching, intelligent question answering and document retrieval in the agricultural field. The technical scheme is as follows: step one, acquiring text data in the agricultural field; step two, performing word segmentation and part-of-speech tagging on the text data, and performing entity processing on the segmented and tagged text data according to the context information of the text data; step three, constructing a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data; step four, combining the text data processed in step two to form a word segmentation labeling model, an entity recognition model and a semantic recognition model; and step five, iteratively updating the text data and updating the knowledge graph in real time. The invention provides basic natural language processing methods that are combined and output in the form of composite models, and provides applications for the text data.

Description

Semantic understanding method and system based on agricultural field text

Technical Field

The invention relates to the field of natural language processing, in particular to a semantic understanding method and a semantic understanding system based on texts in the agricultural field.

Background

As natural language processing technology develops vigorously in many fields, there is also an urgent need to build up natural language processing in the agricultural field.

In the prior art, the results of supply and demand information matching, intelligent question answering and literature retrieval in the agricultural field are inaccurate, and data in the agricultural field is updated extremely slowly, so that it remains years out of date.

The invention provides a semantic understanding method and a semantic understanding system for agricultural field text. They are designed around the natural language characteristics and lower-layer applications of the agricultural field, and take into account that the field has many knowledge types, strong heterogeneous relevance and relatively fixed sentence patterns.

Disclosure of Invention

The invention aims to provide a semantic understanding method and a semantic understanding system based on texts in the agricultural field, and solves the problems of inaccurate results of supply and demand information matching, intelligent question answering and document retrieval in the agricultural field.

The technical purpose of the invention is realized by the following technical scheme:

In a first aspect, the invention provides a semantic understanding method based on agricultural field text, which comprises the following steps:

step one, acquiring text data in the agricultural field;

step two, performing word segmentation and part-of-speech tagging on the text data, and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the context information of the text data;

step three, constructing a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data;

step four, combining the text data processing methods in step two to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;

and step five, performing iterative update on the text data and updating the knowledge graph in real time.

The invention performs word segmentation and part-of-speech tagging on text data from the agricultural field, and then constructs a basic knowledge graph of homologous text data and a semantic knowledge graph of heterologous text data from the segmented and tagged text data.

Further, in the second step, natural language processing is performed on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, summarization, event extraction and semantic extraction;

carrying out entity processing on the text data subjected to natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.

Further, in the third step, entity alignment is performed on the text data, and the aligned text data is imported into the basic graph to generate the basic knowledge graph; the entity alignment comprises comparing the similarity of the attributes owned by a plurality of entities of the text data and setting a similarity threshold, the entities being considered the same entity when the similarity reaches the similarity threshold;

all atoms in the basic knowledge graph are quantized, and the quantized data are processed by the translation, path learning, type constraint and heterogeneous information algorithms of natural language processing to generate the semantic knowledge graph.

Furthermore, in the fourth step, all parts of speech in the text data are labeled, including word segmentation, part of speech labeling and role labeling for the text data;

identifying all entities in the text data, including word segmentation, keyword extraction, entity identification and reference resolution of the text data;

and judging the semantics expressed by the text data, including summarizing the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.

Furthermore, in the fifth step, the error existing in the existing data is modified, the behavior data of the modified action is recorded, the behavior data is processed and converted into trainable data, and iterative training of the text data is realized;

downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field are realized;

testing and marking the generated text data;

and adding, modifying and deleting the knowledge graph to complete the real-time updating of the knowledge graph.

In a second aspect, the invention further provides a semantic understanding system based on the text in the agricultural field, which is used for realizing the semantic understanding method in the first aspect, and the system comprises a data processing layer and a data application layer, wherein the data processing layer comprises a data acquisition unit, a data processing unit and a map construction unit, and the data application layer comprises a data combination unit and a data application unit; the functions are specifically realized as follows:

the data acquisition unit is used for acquiring text data of the agricultural field;

the data processing unit is used for performing word segmentation and part-of-speech tagging on the text data and performing entity processing on the text data subjected to word segmentation and part-of-speech tagging according to the text data context information;

the map construction unit is used for constructing a basic knowledge map of homologous text data and a semantic knowledge map of heterologous text data;

the data combination unit is used for combining the text data processing methods of the data processing unit to form a word segmentation labeling model, an entity recognition model and a semantic recognition model;

and the data application unit is used for performing iterative update on the text data and updating the knowledge graph in real time.

Further, the data processing unit includes a first processing unit and a second processing unit, and is specifically implemented as follows:

the first processing unit is used for carrying out natural language processing on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, summarization, event extraction and semantic extraction;

the second processing unit is used for performing entity processing on the text data subjected to the natural language processing; the entity processing comprises part of speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.

Further, the map construction unit comprises a basic knowledge map construction unit and a semantic knowledge map construction unit, and is specifically realized as follows:

the basic knowledge graph construction unit is used for aligning the entities of the text data and importing the aligned text data into a basic graph to generate the basic knowledge graph; the entity alignment comprises comparing the similarity of the attributes owned by a plurality of entities of the text data and setting a similarity threshold, the entities being considered the same entity when the similarity reaches the similarity threshold;

the semantic knowledge map construction unit is used for quantizing all atoms in the basic knowledge graph, and processing the quantized data by the translation, path learning, type constraint and heterogeneous information algorithms of natural language processing to generate the semantic knowledge graph.

Furthermore, the word segmentation tagging model is used for tagging all parts of speech in the text data, including word segmentation, part of speech tagging and role tagging of the text data;

the entity identification model is used for identifying all entities in the text data, and comprises the steps of performing word segmentation, keyword extraction, entity identification and reference resolution on the text data;

the semantic recognition model is used for judging the semantics expressed by the text data, and comprises the steps of summarizing the text data, extracting events, extracting semantics, extracting themes, extracting keywords and disambiguating the semantics.

Further, the data combination unit also forms a click model, and the data application unit comprises an AI interface, a basic function interface, a map query interface and an error correction and reasoning interface;

the error correction and reasoning interface is used for modifying errors in the existing data, recording behavior data of modified actions, processing and converting the behavior data into trainable data of the click model, and realizing iterative training of the click model;

the AI interface comprises a word segmentation labeling model, an entity recognition model and a semantic recognition model and is used for realizing the downstream tasks of supply and demand information matching, intelligent question answering and patent document retrieval in the agricultural field;

the basic function interface is used for testing and marking the text data generated by the data processing unit;

and the map query interface is used for adding, modifying and deleting the knowledge map in the map construction unit to finish the real-time updating of the knowledge map.

Compared with the prior art, the invention has the following beneficial effects:

The invention serves specific downstream tasks in agriculture such as supply and demand information matching, intelligent question answering and patent literature retrieval. All algorithms for processing data in the agricultural field are exposed through the basic function interface. The advantage of exposing them in this way is that the basic functions can be used to test processed data and to mark new data, after which the correctness of the data is checked manually, so that production efficiency is improved and the consumption of human resources is reduced. The knowledge graph can be added to, modified and deleted from through the graph query interface; because the interface is open, operation is convenient and modifications can be made in real time. The error correction and reasoning interface is the form in which the click model is presented: errors in existing data can be corrected through the interface while the behavior data of each modification is recorded, and the data are converted by data processing into data on which the model can be trained, so that the data can be iteratively optimized.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

FIG. 1 is a flow chart of a semantic understanding method according to an embodiment of the present invention;

FIG. 2 is an architecture diagram of a semantic understanding system provided by an embodiment of the present invention;

FIG. 3 is a flowchart of acquiring text data according to an embodiment of the present invention;

FIG. 4 is a flow chart of a construction of a segmentation tagging model according to an embodiment of the present invention;

FIG. 5 is a flow chart of entity recognition model construction according to an embodiment of the present invention;

FIG. 6 is a flow chart of semantic identification model construction according to an embodiment of the present invention;

FIG. 7 is a flow diagram of a basic knowledge graph construction provided by an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly or indirectly connected to the other element.

It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and is therefore not to be construed as limiting the invention.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

Examples

As shown in fig. 1, the embodiment provides a semantic understanding method based on an agricultural field text, and the method includes the following steps:

step one, acquiring text data in the agricultural field;

Specifically, as shown in FIG. 3, in step one the text data may be divided by type into text data, behavior data, encyclopedia data and core word data. The text data comes mainly from data that is openly available on the Internet, such as forum comments, news and supply and demand orders; it can be collected directly with a data capture tool, after which the collected data set is cleaned and sorted. The behavior data comes from the operator behavior logs generated by the click model in the core engine layer. The encyclopedia data can use official open source data such as encyclopedias and Wikipedia, which has the advantages of detailed content and high accuracy. The core word data refers to proper nouns in the agricultural field or words that have a special meaning in the field, and is provided by agricultural experts.

In the second step, text data processing can be divided into NLP processing and NLP entity processing, where NLP is short for natural language processing. Word segmentation and part-of-speech tagging belong to NLP processing, and entity processing belongs to NLP entity processing. NLP processing comprises word segmentation, keyword extraction, theme extraction, summarization, event extraction and semantic extraction; NLP entity processing comprises part-of-speech tagging, entity recognition, semantic disambiguation, role tagging, reference resolution and entity alignment.

In the third step, agriculture can be broadly divided into five categories: planting, fishery, animal husbandry, forestry and sideline industries. Because these categories are interrelated, knowledge in agriculture is rich and its relationships are complex, so semantic understanding and reasoning need the support of knowledge graphs. The basic knowledge graph is constructed mainly by agricultural experts, which ensures the accuracy and authority of the graph. A notable characteristic of agriculture is that animals and plants are correlated to some degree, while the animal and plant knowledge graphs are heterogeneous knowledge graphs in the natural language field. The basic knowledge graph cannot support reasoning across heterogeneous knowledge graphs, so the heterogeneous knowledge graphs are processed with the translation, path learning, type constraint and heterogeneous information algorithms of natural language processing to discover and infer relations that may be hidden, and the heterogeneous information is fused to form the semantic knowledge graph.

In the fourth step, the text data processed in the second step is output through three models to complete information matching, intelligent question answering and document retrieval.
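
The combination in step four can be pictured as three pipelines that reuse the same step-two operations. The following is a minimal sketch under stated assumptions: the step names are taken from the description, while `make_step`, `compose`, `MODELS` and the placeholder behavior of each step (it only records that it ran) are hypothetical and stand in for the real algorithms, which the text does not specify.

```python
from typing import Callable, Dict, List

# A processing step takes a shared state dict and returns it, possibly extended.
Step = Callable[[Dict], Dict]


def make_step(name: str) -> Step:
    """Return a placeholder step that only records that it ran (hypothetical)."""
    def step(state: Dict) -> Dict:
        state.setdefault("trace", []).append(name)
        return state
    return step


def compose(step_names: List[str]) -> Step:
    """Chain named steps so that they run in the listed order."""
    steps = [make_step(name) for name in step_names]

    def model(state: Dict) -> Dict:
        for step in steps:
            state = step(state)
        return state
    return model


# Step lists follow the groupings given for the three models in the description.
MODELS: Dict[str, Step] = {
    "segmentation_tagging": compose(
        ["word_segmentation", "pos_tagging", "role_labeling"]),
    "entity_recognition": compose(
        ["word_segmentation", "keyword_extraction",
         "entity_recognition", "reference_resolution"]),
    "semantic_recognition": compose(
        ["summarization", "event_extraction", "semantic_extraction",
         "theme_extraction", "keyword_extraction", "semantic_disambiguation"]),
}

result = MODELS["entity_recognition"]({"text": "pest control for oranges in summer"})
print(result["trace"])
```

Keeping each model as an ordered list of shared steps is one way to make the reuse of word segmentation and keyword extraction across models explicit.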

In the fifth step, because agriculture keeps developing, the data needs to be updated and corrected, and part-of-speech tagging needs to be performed on new text data so that subsequent information matching and searching can be completed.

In a further implementation of this embodiment, the text is processed by natural language processing and natural language entity processing: in the second step, natural language processing is performed on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, summarization, event extraction and semantic extraction;

In a further implementation of this embodiment, a knowledge graph is constructed for the text data. As shown in FIG. 7, in the third step, entity alignment is performed on the text data, and the aligned text data is imported into a basic graph to generate a basic knowledge graph; the entity alignment comprises comparing the similarity of the attributes owned by a plurality of entities of the text data and setting a similarity threshold, the entities being considered the same entity when the similarity reaches the similarity threshold;

Specifically, entity alignment is performed on the data before it is imported into the basic graph. Agriculture mostly distinguishes between plants and animals; the entity alignment algorithm compares the similarity of the attributes owned by a plurality of entities, and entities whose similarity reaches a certain threshold are considered the same entity.
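
The threshold-based alignment described above can be sketched as follows. The text does not name a similarity measure or a threshold value, so the Jaccard similarity over attribute-value pairs, the threshold of 0.6, the greedy grouping strategy and the sample records are all assumptions made for illustration.

```python
from typing import Dict, List

Entity = Dict[str, str]  # attribute name -> attribute value


def attribute_similarity(a: Entity, b: Entity) -> float:
    """Jaccard similarity over (attribute, value) pairs; an assumed measure."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    if not pairs_a and not pairs_b:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a | pairs_b)


def align(entities: Dict[str, Entity], threshold: float = 0.6) -> List[List[str]]:
    """Greedily group names whose similarity to a group's first member reaches the threshold."""
    groups: List[List[str]] = []
    for name, attrs in entities.items():
        for group in groups:
            if attribute_similarity(attrs, entities[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([name])
    return groups


# Hypothetical records: two names for the same crop and one unrelated animal.
records = {
    "maize": {"kind": "plant", "family": "Poaceae", "edible_part": "grain"},
    "corn": {"kind": "plant", "family": "Poaceae", "edible_part": "grain",
             "season": "summer"},
    "carp": {"kind": "animal", "habitat": "freshwater"},
}
groups = align(records)
print(groups)
```

Here "maize" and "corn" share three of four distinct attribute pairs, giving a similarity of 0.75, so they fall into one group, while "carp" shares none and stays separate.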

The basic knowledge graph can be constructed by experts in the agricultural field. The advantage is that such experts have worked deeply in the field for a long time, so their knowledge of the field keeps pace with the times, and relations that do not exist or are hardly ever used can be effectively avoided. Construction by experts ensures the accuracy and authority of the graph and facilitates construction of the semantic graph. A knowledge graph is a combination of many triples. All atoms in the basic graph are quantized, and the quantized data are processed by the translation, path learning, type constraint and heterogeneous information algorithms of natural language processing to generate the semantic knowledge graph. The essence of the semantic knowledge graph is the fusion of heterogeneous knowledge graphs. Animals and plants belong to heterogeneous knowledge graphs: although the leaves of one knowledge tree may mention knowledge from other fields, such mentions are scattered. The semantic knowledge graph fuses the heterogeneous knowledge graphs through algorithms to form a closed loop within the field. The advantage of this closed loop is that, when a lower-layer application finds no exact data, highly related information can still be provided by reasoning from the information that exists. For example, if a search in the application finds no information on "pest control in apple plantations", fruits with similar attributes are inferred from the attributes of apples, their pest control information is looked up, and the pests found are used to deduce which plants they act on, so that it may be discovered that those pests also act on apples.
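
The apple example can be illustrated with a small triple store. This sketch does not implement the translation, path learning, type constraint or heterogeneous information algorithms named in the text; plain attribute overlap stands in for them. The relation names, the `min_shared` parameter and the sample triples are assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical graph in which no pest is recorded for "apple".
TRIPLES: List[Triple] = [
    ("apple", "family", "Rosaceae"),
    ("apple", "fruit_type", "pome"),
    ("pear", "family", "Rosaceae"),
    ("pear", "fruit_type", "pome"),
    ("orange", "family", "Rutaceae"),
    ("codling moth", "attacks", "pear"),
    ("citrus leafminer", "attacks", "orange"),
]


def attributes(entity: str, triples: List[Triple]) -> Set[Tuple[str, str]]:
    """Collect the (relation, value) attribute pairs recorded for an entity."""
    return {(r, t) for h, r, t in triples if h == entity and r != "attacks"}


def candidate_pests(target: str, triples: List[Triple], min_shared: int = 2) -> List[str]:
    """Suggest pests of crops that share at least `min_shared` attributes with the target."""
    target_attrs = attributes(target, triples)
    crops = {h for h, r, _ in triples if r != "attacks"}
    similar = {c for c in crops
               if c != target
               and len(attributes(c, triples) & target_attrs) >= min_shared}
    return sorted(h for h, r, t in triples if r == "attacks" and t in similar)


pests = candidate_pests("apple", TRIPLES)
print(pests)
```

The result is only a candidate list: a pest that attacks a similar fruit may or may not act on the target, which matches the hedged wording of the description.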

In a further embodiment of this embodiment, the method for processing text data is combined to construct three models to achieve data output, as shown in fig. 4, 5 and 6, in the fourth step, all parts of speech in the text data are labeled, including word segmentation, part of speech labeling and role labeling of the text data;

Specifically, FIG. 4 shows the construction flow of the word segmentation tagging model. The text is first divided into words of different lengths by a word segmentation algorithm, the parts of speech of all the words are then determined by a part-of-speech tagging algorithm, and role labeling then attaches additional information to the words, such as location, time and other attributes. In the agricultural field such words carry key information, and adding the role labeling algorithm aids relationship reasoning and information retrieval in applications. For example, "the diseases and pests of oranges to be controlled in summer" is segmented into "summer", "oranges", "control" and "pests". During part-of-speech tagging, "summer" is tagged as a noun; the additional information that it denotes a time has to be supplied by role labeling, because time plays an indispensable role in the agricultural field: both plant cultivation and animal breeding are sensitive to time.
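
The three stages of this flow can be sketched with a toy lexicon. The text does not say which segmentation or tagging algorithms are used, so forward maximum matching and dictionary lookup are assumed here purely for illustration; the lexicon, the role labels and the unspaced input string are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Toy lexicon (hypothetical): word -> (part of speech, optional role label).
LEXICON: Dict[str, Tuple[str, Optional[str]]] = {
    "summer": ("noun", "time"),
    "orange": ("noun", None),
    "pest": ("noun", None),
    "control": ("verb", None),
}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text: str) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in LEXICON:
                words.append(piece)
                i += size
                break
        else:
            # Unknown character: emit it as a single-character token.
            words.append(text[i])
            i += 1
    return words


def tag(words: List[str]) -> List[Tuple[str, str, Optional[str]]]:
    """Attach part of speech and role label; unknown words get ("unknown", None)."""
    return [(w, *LEXICON.get(w, ("unknown", None))) for w in words]


tagged = tag(segment("summerorangepestcontrol"))
print(tagged)
```

Only "summer" carries a role label in this sketch, mirroring the point that part-of-speech tagging alone marks it as a noun and the time information comes from role labeling.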

FIG. 5 shows the construction flow of the entity recognition model. A sentence is first divided into words of different lengths by a word segmentation algorithm, and a keyword extraction algorithm then filters out verbs, adjectives and the like; this filtering shortens processing time. Entity recognition then confirms all entity nouns in the text. This is needed because words such as "planting" and "breeding" can be either verbs or nouns in the agricultural field, and because relation nouns such as "parent class" and "subclass" have to be told apart from words such as "west canna" and "teddy dog", which alone point unambiguously to entities in the agricultural field. Entity alignment is then carried out by a reference resolution algorithm, which aligns an entity with the expressions that refer to it. For example, in "Apple sued the Qualcomm company, but Qualcomm did not respond", "the Qualcomm company" and "Qualcomm" stand in the relationship of entity and reference, and the resolution algorithm merges the two into one equivalence set.
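
Merging entities and their references into equivalence sets can be sketched with a union-find structure. How entity and reference pairs are detected is outside the scope of this sketch; the pairs below are supplied by hand and are hypothetical, as is the `EquivalenceSets` class name.

```python
from typing import Dict, List


class EquivalenceSets:
    """Union-find over mention strings; each set holds an entity and its references."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, mention: str) -> str:
        """Return the representative of the set containing `mention`."""
        self.parent.setdefault(mention, mention)
        while self.parent[mention] != mention:
            # Path halving keeps the trees shallow.
            self.parent[mention] = self.parent[self.parent[mention]]
            mention = self.parent[mention]
        return mention

    def union(self, entity: str, reference: str) -> None:
        """Merge the sets that contain the entity and the reference."""
        root_a, root_b = self.find(entity), self.find(reference)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self) -> List[List[str]]:
        """Return all equivalence sets in sorted order."""
        out: Dict[str, List[str]] = {}
        for mention in list(self.parent):
            out.setdefault(self.find(mention), []).append(mention)
        return sorted(sorted(group) for group in out.values())


sets = EquivalenceSets()
# Hypothetical (entity, reference) pairs produced by an upstream resolver.
for entity, reference in [("sweet orange", "the orange"),
                          ("sweet orange", "it"),
                          ("codling moth", "the moth")]:
    sets.union(entity, reference)
print(sets.groups())
```

A union-find structure is a natural fit because references resolved at different points in a text end up in the same set without any re-scanning.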

FIG. 6 shows the construction flow of the semantic recognition model. The semantic recognition algorithm is aimed mainly at large texts. The text is first slimmed down by summarization and event extraction to extract the key sentences, and a semantic extraction algorithm is applied to the key sentences to obtain all the semantic sentences that may exist. Theme extraction and keyword extraction are then performed on the extracted text, so that information such as keywords and themes provides auxiliary analysis information for the semantics of the text. Finally, a semantic disambiguation algorithm is applied to the extracted keywords, the extracted theme information and all the candidate semantic sentences: it scores and ranks all the sentences, and the top-ranked sentence is taken as the most likely semantics.
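
The final scoring and ranking stage can be sketched as follows. The text does not describe the scoring function, so a weighted count of keyword and theme overlaps is assumed here; the `topic_weight` value, the candidate sentences and the keyword and theme sets are hypothetical.

```python
from typing import List, Set, Tuple


def score(candidate: str, keywords: Set[str], themes: Set[str],
          topic_weight: float = 2.0) -> float:
    """Count keyword hits plus weighted theme hits; an assumed scoring rule."""
    tokens = set(candidate.lower().split())
    return len(tokens & keywords) + topic_weight * len(tokens & themes)


def rank(candidates: List[str], keywords: Set[str],
         themes: Set[str]) -> List[Tuple[float, str]]:
    """Score every candidate sentence and sort from highest to lowest score."""
    scored = [(score(c, keywords, themes), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical outputs of the earlier extraction stages.
candidates = [
    "orange prices rise in summer",
    "control orange pests in summer",
    "summer weather forecast",
]
keywords = {"orange", "summer", "pests"}
themes = {"control"}

ranked = rank(candidates, keywords, themes)
best_score, best_sentence = ranked[0]
print(best_sentence, best_score)
```

The top-ranked sentence plays the role of the most likely semantics; a production system would presumably replace the overlap count with a learned scorer.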

In a further embodiment of this embodiment, iterative training and error correction of the text data are implemented, in the fifth step, the error existing in the existing data is modified, the behavior data of the modified action is recorded, the behavior data is processed and converted into trainable data, and iterative training of the text data is implemented;

testing and marking the generated text data;

Specifically, this serves specific downstream tasks in the agricultural field such as supply and demand information matching, intelligent question answering and document retrieval. The basic functions can be used to test processed data and to mark new data, after which the correctness of the data is checked manually; this improves production efficiency and reduces the consumption of human resources. The graph can be added to, modified and deleted from in real time. Errors in existing data are corrected while the behavior data of each modification is recorded; the behavior data is converted by data processing into data on which the model can be trained, and the click model is iteratively optimized.
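
The add, modify and delete operations on the graph can be sketched as a small in-memory triple store. The text does not describe the storage layer or the interface signatures, so the `KnowledgeGraph` class, its method names and the sample triples are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


class KnowledgeGraph:
    """In-memory triple store with add, modify and delete operations."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)

    def delete(self, triple: Triple) -> bool:
        """Remove a triple; return False if it was not present."""
        if triple in self.triples:
            self.triples.remove(triple)
            return True
        return False

    def modify(self, old: Triple, new: Triple) -> bool:
        """Replace one triple with another; return False if `old` is absent."""
        if not self.delete(old):
            return False
        self.add(new)
        return True

    def about(self, head: str) -> List[Triple]:
        """Return all triples whose head is the given entity, sorted."""
        return sorted(t for t in self.triples if t[0] == head)


kg = KnowledgeGraph()
kg.add(("apple", "category", "vegetable"))  # an erroneous entry
kg.add(("apple", "grown_in", "orchard"))
kg.modify(("apple", "category", "vegetable"), ("apple", "category", "fruit"))
kg.delete(("apple", "grown_in", "orchard"))
print(kg.about("apple"))
```

Because every change is applied to the live set, a query issued right after a correction already sees the updated triple, which is the sense of real-time updating used here.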

The embodiment also provides a semantic understanding system based on an agricultural field text, as shown in fig. 2, for implementing the semantic understanding method, the system includes a data processing layer and a data application layer, the data processing layer includes a data obtaining unit 110, a data processing unit 120 and a map building unit 130, the data application layer includes a data combining unit 140 and a data application unit 150; the functions are specifically realized as follows:

a data acquisition unit 110 for acquiring text data of an agricultural field;

the data processing unit 120 is configured to perform word segmentation and part-of-speech tagging on the text data, and perform entity processing on the text data subjected to word segmentation and part-of-speech tagging according to context information of the text data;

a map construction unit 130 for constructing a basic knowledge map of homologous text data and a semantic knowledge map of heterologous text data;

the data combination unit 140 is used for combining the text data processing methods of the data processing unit to form a word segmentation tagging model, an entity recognition model and a semantic recognition model;

and the data application unit 150 is used for performing iterative updating on the text data and performing real-time updating on the knowledge graph.

In another embodiment of this embodiment, the data processing unit 120 includes a first processing unit and a second processing unit, which are specifically implemented as follows:

Specifically, the NLP processing and the NLP entity processing of the text data have been described in the above-mentioned semantic understanding method, and are not described here.

In another embodiment of this embodiment, the graph constructing unit 130 includes a basic knowledge graph constructing unit and a semantic knowledge graph constructing unit, which are specifically implemented as follows:

Specifically, the construction of two kinds of knowledge maps is described in the semantic understanding method, and is not described here.

In another embodiment of this embodiment, the word segmentation tagging model is configured to tag all parts of speech in the text data, including performing word segmentation, part of speech tagging and role tagging on the text data;

Specifically, the data combining unit 140 includes a rule model and a matching model. The rule model comprises the word segmentation labeling model; the matching model comprises the entity recognition model and the semantic recognition model. The functions and effects of the word segmentation labeling model, the entity recognition model and the semantic recognition model are described in the embodiments of the semantic understanding method, and are not described here.

In another embodiment of this embodiment, the data combining unit 140 further forms a click model, and the data applying unit includes an AI interface, a basic function interface, a map query interface, and an error correction and inference interface;

Specifically, the click model in the data combination unit 140 is the source of the behavior data. The click model records operation behavior data in the form of logs, collected along several dimensions such as id, user name, search statement and click content. Part of the search statement data is used as data for optimizing the model, and the rest is stored in a database at regular intervals, where it can serve as basic data for intelligent recommendation and for analyzing user behavior. The data application unit 150 mainly exposes functions in the form of web interfaces. The AI interface covers the rule model and the matching model of the data combination unit 140 and serves specific downstream tasks in agriculture such as supply and demand information matching, intelligent question answering and patent document retrieval. The basic function interface exposes all algorithms in the data processing unit 120 as interfaces; the benefit is that the basic functions can be used to test processed data and to mark new data, after which the correctness of the data is checked manually, which improves production efficiency and reduces the consumption of human resources. The map query interface provides functions such as adding to, modifying and deleting from the graph; because the interface is open, operation is convenient and modifications can be made in real time. The error correction and reasoning interface is the form in which the click model is presented: errors in existing data can be corrected through the interface while the behavior data of each modification is recorded, the behavior data is converted by data processing into data on which the model can be trained, and the click model is iteratively optimized.
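
Converting logged behavior data into trainable data can be sketched as follows. The four fields come from the dimensions listed above; the JSON-lines log format, the field spellings, the sample records and the choice of a click count as the training weight are assumptions, since the text does not specify them.

```python
import json
from collections import Counter
from typing import Dict, List

# Hypothetical log lines, one JSON record per operator action.
LOG_LINES = [
    '{"id": 1, "user_name": "u01", "search_statement": "orange pest control", "click_content": "doc-17"}',
    '{"id": 2, "user_name": "u02", "search_statement": "orange pest control", "click_content": "doc-17"}',
    '{"id": 3, "user_name": "u01", "search_statement": "orange pest control", "click_content": "doc-03"}',
    '{"id": 4, "user_name": "u03", "search_statement": "apple prices", "click_content": null}',
]


def to_training_examples(lines: List[str]) -> List[Dict[str, object]]:
    """Aggregate (search statement, clicked content) pairs into weighted examples."""
    counts: Counter = Counter()
    for line in lines:
        record = json.loads(line)
        if not record.get("click_content"):
            continue  # a search without a click carries no label in this sketch
        counts[(record["search_statement"], record["click_content"])] += 1
    return [{"query": query, "clicked": clicked, "weight": n}
            for (query, clicked), n in sorted(counts.items())]


examples = to_training_examples(LOG_LINES)
for example in examples:
    print(example)
```

Records without a click are dropped here for simplicity; they could instead be kept as the regularly stored data that the description reserves for recommendation and user behavior analysis.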

The above embodiments further illustrate the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A semantic understanding method based on an agricultural field text is characterized by comprising the following steps:

acquiring text data in the agricultural field;

2. The semantic understanding method based on the agricultural field text according to claim 1, wherein in the second step, natural language processing is performed on the text data; the natural language processing comprises word segmentation, keyword extraction, theme extraction, abstract, event extraction and semantic extraction;

3. The semantic understanding method based on the agricultural field text according to claim 1, characterized in that in the third step, entity alignment is performed on the text data, and the aligned text data is imported into a basic map to generate a basic knowledge map; the entity alignment comprises comparing the similarity of the attributes owned by a plurality of entities of the text data, setting a similarity threshold, and considering the entities to be the same entity when the similarity reaches the similarity threshold;

4. The semantic understanding method based on the texts in the agricultural field according to claim 1, wherein in the fourth step, all parts of speech in the text data are labeled, including word segmentation, part of speech labeling and role labeling of the text data;

5. The semantic understanding method based on the agricultural field text according to claim 1, wherein in the fifth step, errors existing in existing data are modified, behavior data of modified actions are recorded, the behavior data are processed and converted into trainable data, and iterative training of text data is achieved;

testing and marking the generated text data;

6. A semantic understanding system based on agricultural field texts is used for realizing the semantic understanding method according to any one of claims 1 to 5, and comprises a data processing layer and a data application layer, wherein the data processing layer comprises a data acquisition unit, a data processing unit and a map construction unit, the data application layer comprises a data combination unit and a data application unit, and the functions of the semantic understanding system are specifically realized as follows:

7. The agricultural field text-based semantic understanding system according to claim 6, wherein the data processing unit comprises a first processing unit and a second processing unit, and is implemented as follows:

8. The agricultural field text-based semantic understanding system according to claim 6, wherein the map construction unit comprises a basic knowledge graph construction unit and a semantic knowledge graph construction unit, and the semantic knowledge graph construction unit is specifically realized as follows:

9. The agricultural field text-based semantic understanding system according to claim 6, wherein the word segmentation labeling model is used for labeling all parts of speech in the text data, including word segmentation, part of speech labeling and role labeling of the text data;

10. The agricultural field text-based semantic understanding system according to claim 6, wherein the data combination unit further forms a click model, and the data application unit comprises an AI interface, a basic function interface, a map query interface, and an error correction and reasoning interface;