CN114385819B - Environment judicial domain ontology construction method and device and related equipment - Google Patents

Environment judicial domain ontology construction method and device and related equipment

Info

Publication number
CN114385819B
Authority
CN
China
Prior art keywords
unstructured
structured
text data
extracted
term
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210286382.3A
Other languages
Chinese (zh)
Other versions
CN114385819A (en
Inventor
陈晓红
柏天翼
曹文治
胡东滨
刘利枚
徐雪松
梁伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology
Priority to CN202210286382.3A
Publication of CN114385819A
Application granted
Publication of CN114385819B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Technology Law (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for constructing an ontology for the environmental judicial domain, and a related medium. The method comprises the following steps: acquiring structured text data and unstructured text data, and performing term extraction on each to construct a structured term set and an unstructured term set; taking the structured term set as a structured concept set and performing relation extraction to obtain a structured relation set; performing concept extraction on the unstructured text data, based on a TF-IDF algorithm and a clustering algorithm in combination with the unstructured term set, to obtain an unstructured concept set; sequentially selecting sentences that share the same term from the unstructured text data as sentences to be extracted; performing relation extraction on the sentences to be extracted according to the unstructured concept set, and adding the obtained relations to an unstructured relation set until all sentences to be extracted have been processed; and obtaining an environmental judicial domain ontology based on the structured relation set and the unstructured relation set. The method improves the processing efficiency of environmental judicial cases.

Description

Environment judicial domain ontology construction method and device and related equipment
Technical Field
The invention relates to the field of ontology construction, and in particular to a method and device for constructing an environmental judicial domain ontology, a computer device, and a storage medium.
Background
Since 2014, one thousand specialized environmental judicial institutions have been established across the country to conduct specialized trials of cases involving the ecological environment, natural resources and the like. Texts in the environmental judicial domain draw on knowledge from several disciplines, such as law, biology and chemistry. Typically, the field to which a case relates is identified first, and a professional judgment is then made within that field. At present, however, the trial of environmental judicial cases faces many problems. For example, because environmental judicial texts involve knowledge from multiple fields, cross-field identification is difficult: a single case may involve several fields at once, or details of some fields may be overlooked, which often makes the handling of such cases inefficient.
Therefore, existing approaches to handling environmental judicial cases suffer from low processing efficiency.
Disclosure of Invention
The embodiment of the invention provides a method and a device for constructing an environment judicial domain ontology, computer equipment and a storage medium, which are used for improving the processing efficiency of environment judicial cases.
In order to solve the above technical problem, an embodiment of the present application provides a method for constructing an environmental judicial domain ontology, including the following steps.
Obtaining at least one piece of structured text data and at least one piece of unstructured text data, wherein the structured text data and the unstructured text data are both environmental judicial domain text data.
Performing term extraction on the structured text data and the unstructured text data respectively, constructing a structured term set from the extracted structured terms, and constructing an unstructured term set from the extracted unstructured terms.
Taking the structured term set as a structured concept set, and performing relation extraction on the structured text data based on the structured concept set to obtain a structured relation set.
Performing concept extraction on each piece of unstructured text data, based on a TF-IDF algorithm and a clustering algorithm in combination with the unstructured term set, to obtain an unstructured concept set.
Sequentially selecting sentences having the same term from the unstructured text data as sentences to be extracted, based on a preset sentence selection mode.
Performing relation extraction on the sentences to be extracted according to the unstructured concept set, and adding the extracted relations to an unstructured relation set until all sentences to be extracted have been processed.
Obtaining an environmental judicial domain ontology based on the structured relation set and the unstructured relation set.
In order to solve the above technical problem, an embodiment of the present application further provides an apparatus for constructing an environmental judicial domain ontology, including the following modules.
A data acquisition module, used for obtaining at least one piece of structured text data and at least one piece of unstructured text data, wherein the structured text data and the unstructured text data are both environmental judicial domain text data.
A term set acquisition module, used for performing term extraction on the structured text data and the unstructured text data respectively, constructing a structured term set from the extracted structured terms, and constructing an unstructured term set from the extracted unstructured terms.
A structured relation set acquisition module, used for taking the structured term set as a structured concept set and performing relation extraction on the structured text data based on the structured concept set to obtain a structured relation set.
An unstructured concept set acquisition module, used for performing concept extraction on each piece of unstructured text data, based on a TF-IDF algorithm and a clustering algorithm in combination with the unstructured term set, to obtain an unstructured concept set.
A to-be-extracted sentence acquisition module, used for sequentially selecting sentences having the same term from the unstructured text data as sentences to be extracted, based on a preset sentence selection mode.
An unstructured relation set acquisition module, used for performing relation extraction on the sentences to be extracted according to the unstructured concept set and adding the extracted relations to an unstructured relation set until all sentences to be extracted have been processed.
An environmental judicial domain ontology acquisition module, used for obtaining an environmental judicial domain ontology based on the structured relation set and the unstructured relation set.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method for constructing an ontology in an environmental judicial domain when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the above method for constructing an ontology in an environmental judicial domain.
According to the method, apparatus, computer device and storage medium for constructing an environmental judicial domain ontology provided by the embodiments of the invention, at least one piece of structured text data and at least one piece of unstructured text data are obtained, both being environmental judicial domain text data; term extraction is performed on the structured text data and the unstructured text data respectively, a structured term set is constructed from the extracted structured terms, and an unstructured term set is constructed from the extracted unstructured terms; the structured term set is taken as a structured concept set, and relation extraction is performed on the structured text data based on the structured concept set to obtain a structured relation set; concept extraction is performed on each piece of unstructured text data, based on a TF-IDF algorithm and a clustering algorithm in combination with the unstructured term set, to obtain an unstructured concept set; sentences having the same term are sequentially selected from the unstructured text data as sentences to be extracted, based on a preset sentence selection mode; relation extraction is performed on the sentences to be extracted according to the unstructured concept set, and the extracted relations are added to an unstructured relation set until all sentences to be extracted have been processed; and an environmental judicial domain ontology is obtained based on the structured relation set and the unstructured relation set. By sequentially performing term extraction, concept extraction and concept relation extraction on the structured data and unstructured data in environmental judicial domain text data, an environmental judicial domain ontology is constructed, and the processing efficiency of environmental judicial cases is effectively improved based on that ontology.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied.
FIG. 2 is a flow diagram of one embodiment of a method for environment judicial domain ontology construction of the present application.
Fig. 3 is a schematic structural diagram of one embodiment of an environmental judicial domain ontology building apparatus according to the present application.
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof in the description and claims of this application and the description of the figures above, are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, E-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the method for constructing the environment judicial domain ontology provided by the embodiment of the present application is executed by a server, and accordingly, the apparatus for constructing the environment judicial domain ontology is disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs, and the terminal devices 101, 102 and 103 in this embodiment may specifically correspond to an application system in actual production.
Referring to fig. 2, fig. 2 shows a method for constructing an ontology in an environmental judicial domain according to an embodiment of the present invention, which is described by taking the method applied to the server in fig. 1 as an example, and is described in detail as follows.
S201, obtaining at least one structured text data and at least one unstructured text data, wherein the structured text data and the unstructured text data are both environment judicial domain text data.
In step S201, the structured text data refers to environmental judicial domain text data that is structured.
The unstructured text data refers to environmental judicial domain text data that is unstructured.
It should be understood that the environmental judicial field refers to the general term of one or more fields appearing in judicial cases related to the ecological environment, natural resources, etc., wherein the environmental judicial field relates to multi-domain knowledge of biology, medicine, environment, law, etc. It is noted herein that the field to which the environmental judicial domain relates may be one or more domains. The structured text data and the unstructured text data are corresponding text data in the related fields of the environmental judicial field. For example, when the environmental judicial field relates to the biological and medical fields, structured text data and unstructured text data related to the biological field and structured text data and unstructured text data related to the medical field are acquired.
The structured text data includes, but is not limited to, legal texts, industry standard documents, and industry standard tables. Ways of acquiring the structured text data include, but are not limited to, directly obtaining legal texts and the national and local industry standards used in the trial of environmental resource cases. For example, structured text data is shown in Table 1 (partial national hazardous waste list) and Table 2 (partial list of national key protected wild animals).
TABLE 1 Partial national hazardous waste list
(Table 1 appears as an image in the original publication.)
TABLE 2 Partial list of national key protected wild animals
(Table 2 appears as an image in the original publication.)
The unstructured text data includes, but is not limited to, laws and regulations, official documents, and indictments in the environmental judicial domain.
Acquiring the structured text data and the unstructured text data of the environmental judicial domain makes it possible to subsequently construct an environmental judicial domain ontology from them, which improves the processing efficiency of environmental judicial cases.
S202, respectively extracting terms from the structured text data and the unstructured text data, constructing a structured term set according to the extracted structured terms, and constructing an unstructured term set according to the extracted unstructured terms.
In step S202, term extraction refers to extracting, from the text data, the domain terms of each of the fields that the environmental judicial domain involves.
It should be understood that when multiple fields are involved in the environmental judicial field, structured text data and unstructured text data related to each field are acquired, the involved fields are sequentially used as target fields, term extraction is performed on the structured text data and the unstructured text data corresponding to the target fields, and the extracted terms are field terms corresponding to the target fields.
For example, when the environmental judicial field relates to a biological field and a medical field, structured text data and unstructured text data related to the biological field and structured text data and unstructured text data related to the medical field are acquired, and the biological field or the medical field is sequentially used as a target field, a field term of the biological field is extracted from the structured text data and the unstructured text data of the biological field, and a field term of the medical field is extracted from the structured text data and the unstructured text data of the medical field.
It should be noted here that either the form "(term, field)" or the form "field: { term 1, term 2, … }" may be used to construct the structured term set and the unstructured term set. With the "(term, field)" form, for example, when the term extracted from structured text data in the biological field is honey monkey and the related field is biology, (honey monkey, biology) is taken as a structured term and added to the structured term set. With the "field: { term 1, term 2, … }" form, for example, when the terms extracted from structured text data in the medical field are HW01 medical waste, infectious waste, and so on, the structured term set is "medical: { HW01 medical waste, infectious waste, … }".
It should be understood that the "(term, field)" and "field: { term 1, term 2, … }" forms are merely exemplary, and the form can be designed according to the actual application scenario.
Preferably, the embodiments of the present application use the "field: { term 1, term 2, … }" form to construct the structured term set and the unstructured term set.
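The two representations can be related with a short Python sketch. It groups "(term, field)" pairs into the preferred per-field form; the function name `build_term_set` and the field labels are illustrative assumptions, not part of the patent.

```python
from collections import defaultdict


def build_term_set(pairs):
    """Group (term, field) pairs into the form {field: {term 1, term 2, ...}}."""
    term_set = defaultdict(set)
    for term, field in pairs:
        term_set[field].add(term)
    return dict(term_set)


# Illustrative pairs taken from the examples in the text.
structured_pairs = [
    ("HW01 medical waste", "medical"),
    ("infectious waste", "medical"),
    ("honey monkey", "biology"),
]
structured_term_set = build_term_set(structured_pairs)
```

Using a set per field removes duplicate terms automatically when the same term is extracted from several documents.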
Term extraction is performed on the structured text data and the unstructured text data to obtain the structured term set and the unstructured term set, so that concept extraction and concept relation extraction can subsequently be performed on these sets to construct the environmental judicial domain ontology, which effectively improves the processing efficiency of environmental judicial cases.
And S203, taking the structured term set as a structured concept set, and performing relation extraction on the structured text data based on the structured concept set to obtain a structured relation set.
In step S203, the structured term set is used directly as the structured concept set because the concepts in structured text data have an obvious hierarchical relationship.
A specific embodiment is described below. For example, when the environmental judicial domain involves the biological field and the structured text data of the biological field is "phylum Chordata - class Mammalia - order Primates - lazy monkey family - honey monkey - pygmy honey monkey", the structured term set extracted from the structured text data is "biology: { phylum Chordata, class Mammalia, order Primates, lazy monkey family, honey monkey, pygmy honey monkey }". Since the concepts in structured text data have an obvious hierarchical relationship, this structured term set is used as the structured concept set "biology: { phylum Chordata, class Mammalia, order Primates, lazy monkey family, honey monkey, pygmy honey monkey }".
The method for extracting the relationship of the structured text data comprises the steps of inputting the data in the structured concept set into an ontology construction tool, and performing relationship connection on the data in the structured concept set based on the ontology construction tool.
It should be noted here that the ontology construction tool is the Protégé ontology construction tool. Protégé is ontology editing and knowledge acquisition software developed in Java by the biomedical informatics research center at the Stanford University School of Medicine. It is an ontology development tool and also a knowledge-based editor, is open-source software, and is mainly used to construct ontologies for the Semantic Web.
The relation connection in the structured text data is the "kind-of" relation. In structured text data, the concepts of a domain have obvious subordination relationships, and the connection is made by operating in the Protégé tool; for example, the concept "honey monkey" in the structured concept set is configured as a subclass of "lazy monkey family".
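The patent performs this connection interactively in the ontology construction tool. As a minimal sketch of the same idea outside the tool, the following Python turns one root-to-leaf concept path into "kind-of" triples; the function name, the triple layout (child, relation, parent) and the concept labels are assumptions made for illustration.

```python
def kind_of_relations(path):
    """Turn a root-to-leaf concept path into (child, "kind-of", parent) triples."""
    return [(child, "kind-of", parent) for parent, child in zip(path, path[1:])]


# Illustrative hierarchy following the biological example in the text.
path = [
    "phylum Chordata",
    "class Mammalia",
    "order Primates",
    "lazy monkey family",
    "honey monkey",
]
structured_relations = kind_of_relations(path)
```

Each triple corresponds to one subclass link that would otherwise be configured by hand in the editor.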
Concept extraction and concept relation extraction are performed on the structured term set to construct the structured relation set. Based on the structured relation set, the part of the environmental judicial domain ontology that covers structured data is constructed, which effectively improves the processing efficiency of environmental judicial cases.
And S204, performing concept extraction on each unstructured text data by combining an unstructured term set based on a TF-IDF algorithm and a clustering algorithm to obtain an unstructured concept set.
In step S204, the TF-IDF algorithm is used to calculate, for each piece of unstructured text data, the word frequency of the terms from the unstructured term set that appear in it.
The clustering algorithm is used to extract the concepts of each piece of unstructured text data from the unstructured term set and to add the extracted concepts to the unstructured concept set.
Performing concept extraction on the unstructured text data with the TF-IDF algorithm and the clustering algorithm yields the unstructured concept set quickly and accurately, so that concept relation extraction can subsequently be performed on the unstructured concept set, which improves the processing efficiency of unstructured text data in environmental judicial cases.
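The text names TF-IDF and a clustering algorithm but does not fix the weighting formula or the clustering method at this point. The sketch below is therefore only one possible reading: it uses the common tf × log(N/df) weighting restricted to the term set, and a greedy single-pass cosine-similarity grouping as a stand-in for the unspecified clustering algorithm. All names, the threshold and the sample documents are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs, terms):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each candidate term.
    df = Counter(t for doc in docs for t in set(doc) if t in terms)
    weights = []
    for doc in docs:
        counts = Counter(t for t in doc if t in terms)
        total = len(doc) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_terms(weights, terms, threshold=0.8):
    """Greedy grouping of terms whose per-document weight vectors are similar."""
    vectors = {t: [w.get(t, 0.0) for w in weights] for t in sorted(terms)}
    clusters = []  # the first term of each cluster acts as its representative
    for term, vec in vectors.items():
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(term)
                break
        else:
            clusters.append([term])
    return clusters


# Hypothetical pre-segmented documents and term set.
docs = [
    ["hazardous", "waste", "dumped", "river"],
    ["hazardous", "waste", "storage"],
    ["wild", "animal", "hunting"],
]
terms = {"hazardous", "waste", "animal"}
weights = tf_idf(docs, terms)
clusters = cluster_terms(weights, terms)
```

In this reading, each cluster is a candidate concept; how a concept label is chosen from a cluster is left open here because the text does not state it.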
S205, based on a preset sentence selection mode, sentences with the same terms are sequentially selected from the unstructured text data to serve as sentences to be extracted.
In step S205, the preset sentence selection mode is the way in which sentences are selected from the unstructured text data for relation extraction.
It should be understood that the preset sentence selection mode can be adjusted according to specific requirements. For example, when a six-tuple is required, the preset sentence selection mode is to select three sentences having the same term from the unstructured text data.
Preferably, the preset sentence selection manner adopted by the embodiment of the present application is to select two sentences having the same term from the unstructured text data.
The above-mentioned sentence to be extracted is a sentence for extracting a conceptual relationship.
Selecting the sentences to be extracted by the preset sentence selection mode allows concept relation extraction to be performed on them subsequently based on the unstructured concept set, which improves the processing efficiency of unstructured text data in environmental judicial cases.
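One way to picture the selection step is the Python sketch below. It indexes sentences by the terms they contain and yields every group of `group_size` sentences that share a term, with `group_size=2` matching the preferred mode. Enumerating all combinations, the case-insensitive substring match and every name here are assumptions; the text does not say how groups are enumerated.

```python
from collections import defaultdict
from itertools import combinations


def select_sentence_groups(sentences, terms, group_size=2):
    """Yield (term, group) where every sentence in the group contains the term."""
    by_term = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for term in terms:  # terms are assumed to be lowercase
            if term in lowered:
                by_term[term].append(sentence)
    for term in sorted(by_term):
        for group in combinations(by_term[term], group_size):
            yield term, group


# Hypothetical sentences for illustration.
sentences = [
    "The defendant dumped hazardous waste into the river.",
    "Hazardous waste must be stored in sealed containers.",
    "The honey monkey is a protected species.",
]
terms = {"hazardous waste", "honey monkey"}
groups = list(select_sentence_groups(sentences, terms))
```

Setting `group_size=3` would correspond to the three-sentence mode mentioned for six-tuples.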
And S206, extracting the relation of the sentences to be extracted according to the unstructured concept set, and adding the extracted relation into the unstructured relation set until all the sentences to be extracted are executed.
In step S206, the method for extracting the relationship of the sentence to be extracted includes inputting the data in the unstructured concept set into an ontology construction tool, and performing relationship connection on the data in the unstructured concept set based on the ontology construction tool.
The relational links in the sentence to be extracted include, but are not limited to, "part-of", "kind-of", "instance-of", "attribute-of".
Relation extraction is performed on the sentences to be extracted based on the unstructured concept set, and the extracted relations are added to the unstructured relation set. Based on the unstructured relation set, the part of the environmental judicial domain ontology that covers unstructured text data is constructed, which effectively improves the processing efficiency of unstructured text data in environmental judicial cases.
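A relation set of this kind can be sketched as a set of typed triples. The Python below is illustrative only: the `Relation` tuple, the `add_relation` helper and the decision to reject relation names outside the four listed ones are assumptions, since the text says the relation types are not limited to these.

```python
from typing import NamedTuple

# The four relation types named in the text; restricting to them is an assumption.
RELATION_TYPES = {"part-of", "kind-of", "instance-of", "attribute-of"}


class Relation(NamedTuple):
    head: str
    relation: str
    tail: str


def add_relation(relation_set, head, relation, tail):
    """Validate the relation type and add the triple to the relation set."""
    if relation not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation!r}")
    relation_set.add(Relation(head, relation, tail))


unstructured_relations = set()
# Hypothetical extracted relation; adding it twice keeps a single entry.
add_relation(unstructured_relations, "infectious waste", "kind-of", "HW01 medical waste")
add_relation(unstructured_relations, "infectious waste", "kind-of", "HW01 medical waste")
```

Storing relations in a set means that the same relation extracted from several sentence groups is recorded once.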
And S207, obtaining an environment judicial domain ontology based on the structured relation set and the unstructured relation set.
Specifically, in step S207, the structured relation set and the unstructured relation set are merged to obtain the environmental judicial domain ontology.
Merging the structured relation set and the unstructured relation set yields an environmental judicial domain ontology that covers both the structured text data and the unstructured text data, which effectively improves the processing efficiency of environmental judicial cases.
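As a minimal sketch of the merge, assuming both relation sets hold (head, relation, tail) triples, the union can be indexed by head concept as follows. The dictionary layout and all names are illustrative assumptions; the text states only that the two sets are merged.

```python
def merge_relation_sets(structured, unstructured):
    """Union the two relation sets and index the result by head concept."""
    merged = set(structured) | set(unstructured)
    ontology = {}
    for head, relation, tail in sorted(merged):
        ontology.setdefault(head, []).append((relation, tail))
    return ontology


# Hypothetical relation sets; one triple appears in both.
structured = {("honey monkey", "kind-of", "lazy monkey family")}
unstructured = {
    ("honey monkey", "instance-of", "protected wild animal"),
    ("honey monkey", "kind-of", "lazy monkey family"),
}
ontology = merge_relation_sets(structured, unstructured)
```

Because the merge is a set union, a relation found in both sources is kept once.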
In this embodiment, an environmental judicial domain ontology is constructed by sequentially performing term extraction, concept extraction and concept relation extraction on the structured data and unstructured data in the text data of the environmental judicial domain, and the processing efficiency of environmental judicial cases is effectively improved based on that ontology.
In some optional implementations of this embodiment, step S202 further includes the following steps A to G.
A. Acquiring the text format of the structured text data.
B. Based on the text format, performing term extraction on the structured text data using a structured term extraction mode adapted to the text format, and constructing the structured term set from the extracted structured terms.
C. Segmenting the unstructured text data based on a preset segmentation mode to obtain a segmented document text.
D. Performing part-of-speech tagging on the segmented document text to obtain the word-part-of-speech pairs of all sentences in the segmented document text.
E. Selecting, from the word-part-of-speech pairs of all sentences, the words whose part of speech is noun as unstructured terms, and constructing the unstructured term set from the unstructured terms.
F. Selecting the part-of-speech combinations of adjacent words from the segmented document text.
G. Taking two adjacent words whose part-of-speech combination matches a preset combination-word part-of-speech pattern as a combination word, and adding the combination word to the unstructured term set.
For step A, the text format refers to the data distribution format of the structured text data.
The text format of the structured text data is obtained through manual analysis.
For step B, once the text format of the structured text data has been obtained through manual analysis, an automatic extraction program is written for that format, terms are extracted level by level, and a structured term set is constructed from the extracted structured terms. For example, "bee monkey" is a term that, according to the indented format, sits under the term "lazy monkey family" within the order Primates. The automatic extraction program extracts "bee monkey" and places it under the lazy monkey family.
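The level-by-level extraction described for step B can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the actual data format, so the two-space indentation, the function name `extract_structured_terms`, and the sample outline are all hypothetical.

```python
from typing import List, Optional, Tuple


def extract_structured_terms(text: str, indent: str = "  ") -> List[Tuple[str, Optional[str]]]:
    """Return (term, parent) pairs from an indented outline.

    The parent is None for top-level terms. The indentation unit is an
    assumption; a real source format would need its own parser.
    """
    pairs: List[Tuple[str, Optional[str]]] = []
    stack: List[str] = []  # stack[i] is the most recent term seen at depth i
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        body = raw.rstrip()
        depth = 0
        while body.startswith(indent):
            body = body[len(indent):]
            depth += 1
        term = body.strip()
        del stack[depth:]  # discard terms at the same or deeper levels
        parent = stack[-1] if stack else None
        pairs.append((term, parent))
        stack.append(term)
    return pairs


# Hypothetical sample mirroring the example in the text.
sample = "primates\n  lazy monkey family\n    bee monkey\n"
pairs = extract_structured_terms(sample)
for term, parent in pairs:
    print(term, "<-", parent)
```

Each pair records a term together with the term one level above it, which is the information later needed to connect a concept to its parent class.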
For step C, the preset word segmentation modes include, but are not limited to, the precise mode, the full mode, and the search engine mode. All three are word segmentation modes of the jieba word segmentation tool.
Word segmentation here means that, after Chinese word segmentation is performed on the unstructured text data with the jieba word segmentation tool, the results produced by the different segmentation modes are all retained so as to cover as many potential terms as possible.
For step D, part-of-speech tagging refers to recognizing and tagging the part of speech of every word segmentation result in the segmented document text. The parts of speech include, but are not limited to, verbs, nouns, and adjectives.
For step G, the preset combination-word parts of speech include, but are not limited to, "verb + noun" and "noun + noun".
It should be understood that, owing to the particularity of the environment judicial field, combination words such as "forest of forest abuse" and "large number" are often used as terms of art. Immediately adjacent "verb + noun" and "noun + noun" pairs are therefore constructed as combination words and added to the unstructured term set.
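Steps E to G can be illustrated with a short sketch that works on already-tagged sentences. The (word, flag) pairs could be produced, for example, by `jieba.posseg.cut`, whose results expose `.word` and `.flag`; to stay self-contained the sketch takes hand-tagged input instead. Treating any flag that starts with "n" as a noun and any flag that starts with "v" as a verb is an assumption, as are the function names and the sample sentence.

```python
from typing import Iterable, List, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, part-of-speech flag)

# Preset combination-word patterns named in the text: verb + noun, noun + noun.
COMBINATION_PATTERNS: Set[Tuple[str, str]] = {("v", "n"), ("n", "n")}


def coarse_pos(flag: str) -> str:
    """Collapse a fine-grained flag to its first letter (an assumption)."""
    return flag[:1]


def extract_unstructured_terms(sentences: Iterable[TaggedSentence], joiner: str = "") -> Set[str]:
    """Collect nouns plus adjacent word pairs that match a preset pattern."""
    terms: Set[str] = set()
    for sentence in sentences:
        # Step E: every noun is a candidate term.
        for word, flag in sentence:
            if coarse_pos(flag) == "n":
                terms.add(word)
        # Steps F and G: adjacent pairs whose tags match a preset combination.
        for (w1, f1), (w2, f2) in zip(sentence, sentence[1:]):
            if (coarse_pos(f1), coarse_pos(f2)) in COMBINATION_PATTERNS:
                terms.add(w1 + joiner + w2)
    return terms


# Hypothetical hand-tagged sentence: "fell excessively", "trees", "quantity", "relatively large".
tagged = [[("滥伐", "v"), ("林木", "n"), ("数量", "n"), ("较大", "a")]]
terms = extract_unstructured_terms(tagged)
print(sorted(terms))
```

The `joiner` parameter defaults to an empty string because Chinese text is written without spaces; a space could be passed for space-delimited languages.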
In this embodiment, the structured term set and the unstructured term set are obtained through the above steps, so that concept extraction and concept relationship extraction are performed on the structured term set and the unstructured term set in the following steps, an environment judicial domain ontology is constructed, and based on the environment judicial domain ontology, the processing efficiency of the environment judicial cases is effectively improved.
In some optional implementations of this embodiment, step S203 further includes steps A1 to A3.
A1. Take the structured term set as the structured concept set.
A2. Input the structured concept set into the domain ontology construction model.
A3. According to the concept hierarchy of the structured text data, use the domain ontology construction model to perform relation extraction on the structured text data and the concepts in the structured concept set, to obtain a structured relation set.
For step A1, the structured term set is used directly as the structured concept set because the concepts in the structured text data have a distinct hierarchical relationship.
For step A2, the domain ontology construction model refers to an ontology construction tool. The ontology construction tool is Protégé, an ontology editing and knowledge acquisition program developed in Java by the Stanford Center for Biomedical Informatics Research at the Stanford University School of Medicine. It serves as both an ontology development tool and a knowledge-base editor, is open source, and is mainly used for constructing ontologies in the Semantic Web.
For step A3, relation extraction on the structured text data is performed by inputting the data in the structured concept set into the ontology construction tool and connecting the data in the structured concept set by relations within that tool.
The relational connection in the structured text data is a "kind-of" connection. In the structured text data, the related concepts of the domain have obvious subordinate relations, and the concepts are connected in the Protégé tool. For example, when the structured concept set contains "bee monkey", it is configured as a subclass of "lazy monkey".
In this embodiment, the structured term set is extracted and concept relationship extraction is performed on it to construct the structured relation set, and the part of the environment judicial domain ontology that handles structured data is constructed from the structured relation set, which improves the processing efficiency of environment judicial cases.
In some optional implementations of this embodiment, step S204 further includes steps B1 through B5.
B1. For each piece of unstructured text data, perform word frequency statistics, based on the TF-IDF algorithm, on the terms of that unstructured text data that appear in the unstructured term set, to obtain the word frequency corresponding to each term in the unstructured text data.
B2. Calculate the domain importance according to the word frequency corresponding to each term in the unstructured text data, to obtain the domain importance corresponding to the unstructured text data.
B3. Determine the domain to which the unstructured text data belongs based on the domain importance.
B4. Based on a clustering algorithm, cluster the unstructured term sets corresponding to the unstructured text data of the same domain, to obtain the concept subsets corresponding to the unstructured text data of the same domain.
B5. Merge all concept subsets to obtain the unstructured concept set.
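The word frequency statistics of step B1 can be sketched with a textbook TF-IDF computation. The text names the algorithm but not its exact weighting, so the variant below (relative term frequency multiplied by the logarithm of the inverse document frequency) is an assumption, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tf_idf(docs: List[List[str]], term_set: Set[str]) -> List[Dict[str, float]]:
    """Score the terms of each tokenized document, restricted to term_set."""
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for doc in docs:
        # Count each term once per document in which it occurs.
        doc_freq.update(set(doc) & term_set)
    scores: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(w for w in doc if w in term_set)
        total = len(doc) or 1  # avoid division by zero on empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized documents.
docs = [["forest", "felling", "forest"], ["felling", "permit"]]
scores = tf_idf(docs, {"forest", "felling", "permit"})
print(scores)
```

A term that occurs in every document receives a score of zero under this variant, which reflects the idea that ubiquitous words carry little domain information.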
Step B2 is specifically as follows.
Calculate the domain relevance according to the word frequency corresponding to each term in the unstructured text data, to obtain the domain relevance.
Calculate the domain consistency according to the word frequency corresponding to each term in the unstructured text data, to obtain the domain consistency.
Obtain the domain importance based on the domain relevance and the domain consistency.
In this embodiment, the domain relevance is calculated according to formula (1).
[Formula (1): image not reproduced]
where t is a term; D_m is the m-th domain, out of n domains in total; DR(t, D_m) is the domain relevance of the term t in the domain D_m; the remaining symbol, shown only as an image, is the word frequency of the term t in the domain D_m; and f denotes word frequency.
The domain consistency is calculated according to formula (2).
[Formula (2): image not reproduced]
where t is a term; D_m is the m-th domain, out of n domains in total; d is one piece of unstructured text data randomly selected from the domain D_m; DC(t, D_m) is the domain consistency of the term t in the domain D_m; the symbol shown only as an image is the probability that the term t is present in the domain D_m; H(P(t, d)) describes the distribution of the term t in the unstructured text data d; and f denotes word frequency. The larger the distribution value, the more uniform and ubiquitous the term is; the smaller the value, the more likely the term t is a domain-specific term.
The domain importance is calculated according to formula (3).
[Formula (3): image not reproduced]
where W_{t,m} is the domain importance of the term t in the domain m, and the remaining symbols, shown only as images, denote in order the domain relevance, the domain consistency, and two parameters that can be adjusted by experiment.
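Because formulas (1) to (3) survive only as image references in this text, their exact form cannot be read from it. The sketch below therefore uses assumed forms that match the verbal definitions: relevance as the share of a term's frequency that falls in one domain, consistency as the entropy of the term's distribution over the documents of that domain, and importance as a weighted sum with two tunable parameters. The names `alpha` and `beta`, the data layout, and the sample corpus are hypothetical.

```python
import math
from typing import Dict, List

# domain name -> list of documents, each mapping a term to its raw frequency
Corpus = Dict[str, List[Dict[str, int]]]


def domain_relevance(term: str, domain: str, corpus: Corpus) -> float:
    """Assumed form: frequency in this domain / frequency over all n domains."""
    freq = {d: sum(doc.get(term, 0) for doc in docs) for d, docs in corpus.items()}
    total = sum(freq.values())
    return freq[domain] / total if total else 0.0


def domain_consistency(term: str, domain: str, corpus: Corpus) -> float:
    """Assumed form: entropy of the term's distribution over the domain's documents."""
    counts = [doc.get(term, 0) for doc in corpus[domain]]
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs)


def domain_importance(term: str, domain: str, corpus: Corpus,
                      alpha: float = 0.5, beta: float = 0.5) -> float:
    """Assumed form: alpha * relevance + beta * consistency."""
    return (alpha * domain_relevance(term, domain, corpus)
            + beta * domain_consistency(term, domain, corpus))


# Hypothetical corpus with two domains.
corpus: Corpus = {
    "law": [{"felling": 2}, {"felling": 2}],
    "biology": [{"felling": 1}],
}
print(domain_importance("felling", "law", corpus))
```

Under these assumptions a term scores highly in a domain when most of its occurrences fall in that domain and are spread evenly over the domain's documents.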
For step B3, it should be noted that environment judicial terms come from both the legal field and fields related to environmental resources, so the concepts are divided into two parts: general terms of the legal field and terms of the environmental resource field.
The domain of a term in multi-domain text is determined according to its domain importance. If the domain importance of a term in the environment judicial field and other legal fields is greater than its domain importance in fields such as biology and chemistry, the term is considered a general term of the legal field; if its domain importance in the environment judicial field, biology, and chemistry is greater than its domain importance in other legal fields, the term is considered a term of the environmental resource field.
The general terms of the legal field and the terms of the environmental resource field are combined to construct the unstructured term set.
Specifically, step B4 includes the following steps.
Randomly select P terms, in the form of word vectors, from the unstructured term set corresponding to the unstructured text data of the same domain as cluster centers.
For each cluster center, iteratively compute over the term word vectors in the unstructured term set corresponding to the unstructured text data of the same domain and the cluster center, to obtain the cosine similarity corresponding to each cluster center.
Sort all cluster centers according to the cosine similarities, and determine the concept subsets corresponding to the unstructured text data of the same domain from the cluster centers that meet a preset condition.
That is, for a term subset extracted from the unstructured text data, P term word vectors are randomly selected as cluster centers, the terms are ranked by the cosine similarity between their word vectors and the cluster centers, and the final concept clusters are obtained through iterative calculation.
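The clustering of step B4 resembles k-means with cosine similarity. The sketch below is one such variant, not necessarily the procedure intended here: the preset condition and the stopping rule are not specified in the text, so a fixed iteration count is assumed, and the function names, toy vectors, and seed are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_terms(vectors: Dict[str, Vector], p: int,
                  iterations: int = 10, seed: int = 0) -> List[List[str]]:
    """Group terms around p randomly chosen centers, refined iteratively."""
    rng = random.Random(seed)
    terms = sorted(vectors)
    # Randomly select p term word vectors as the initial cluster centers.
    centers = [list(vectors[t]) for t in rng.sample(terms, p)]
    clusters: List[List[str]] = []
    for _ in range(iterations):
        clusters = [[] for _ in range(p)]
        # Assign each term to the center with the highest cosine similarity.
        for t in terms:
            best = max(range(p), key=lambda i: cosine(vectors[t], centers[i]))
            clusters[best].append(t)
        # Move each non-empty center to the mean of its member vectors.
        for i, members in enumerate(clusters):
            if members:
                dim = len(centers[i])
                centers[i] = [sum(vectors[t][k] for t in members) / len(members)
                              for k in range(dim)]
    return [c for c in clusters if c]


# Hypothetical two-dimensional word vectors forming two obvious groups.
toy = {"a": (1.0, 0.0), "b": (0.9, 0.1), "c": (0.0, 1.0), "d": (0.1, 0.9)}
clusters = cluster_terms(toy, p=2)
print(clusters)
```

Each returned list plays the role of one concept subset; the union of the subsets over all domains would then form the unstructured concept set.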
For step B5 above, the union of all concept subsets is treated as an unstructured concept set.
In this embodiment, the concept extraction is performed on the unstructured term set through the word frequency statistics and the clustering algorithm to obtain an unstructured concept set, so that the concept relationship extraction is performed subsequently based on the unstructured concept set to construct an unstructured relationship set, and based on the unstructured relationship set, the processing efficiency of the environmental judicial case is effectively improved.
In some optional implementations of this embodiment, step S206 further includes steps C1 through C6.
C1. According to the parts of speech in the sentence to be extracted, perform verb extraction on the sentence to obtain the core verb corresponding to the sentence to be extracted.
C2. Acquire the term word vectors corresponding to the sentence to be extracted, and perform distance calculation on the term word vectors to obtain the distance vector corresponding to the sentence to be extracted.
C3. Acquire the concepts corresponding to the sentence to be extracted according to the unstructured concept set.
C4. Construct a multi-tuple based on the concepts, the core verb, and the distance vector corresponding to the sentence to be extracted.
C5. Analyze the multi-tuple to obtain the relation corresponding to the unstructured text data.
C6. Add the relation corresponding to the unstructured text data to the unstructured relation set, until all sentences to be extracted have been processed.
For step C1, the core verb refers to a word in the sentence to be extracted that links a pair of terms and whose part of speech is verb. For example, the sentence "the content of the constituent requirement is forest deforestation or other forest" contains the two concepts "constituent requirement" and "forest deforestation"; "is" is the core verb, and an "instance-of" relationship exists between the two concepts.
For step C4, the multi-tuples include, but are not limited to, quadruples and sextuples.
Preferably, the embodiments of the present application employ quadruples.
The term word vectors constructed by a pre-trained model are used to calculate the distance vectors between the terms in the sentence to be extracted, and these are combined with the corresponding concepts and core verb to construct a <concept, core verb, concept, distance vector> quadruple.
For step C5, the core verbs and the corresponding distance vectors in the quadruples are analyzed, and four main relationships between environment judicial concepts are summarized: "part-of", "kind-of", "instance-of", and "attribute-of".
In this embodiment, the core verb, the distance vector, and the concepts are extracted from each sentence to be extracted to construct a quadruple, the quadruple is analyzed to obtain the concept relationships between different terms, and all concept relationships are added to the unstructured relation set, which improves the processing efficiency of environment judicial cases.
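The quadruple construction of steps C1 to C5 can be sketched as below. Several points are assumptions rather than details given in the text: the distance vector is taken to be the element-wise difference of the two concept vectors, the core verb is taken to be the first verb between the two concepts, and the verb-to-relation table is a hypothetical stand-in for the analysis step, which in this sketch ignores the distance vector.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Quadruple:
    head: str
    core_verb: str
    tail: str
    distance: Tuple[float, ...]


# Hypothetical lookup; only the "is" -> "instance-of" pairing comes from the text.
VERB_RELATIONS: Dict[str, str] = {"is": "instance-of", "includes": "part-of"}


def build_quadruple(tagged: List[Tuple[str, str]], concepts: Set[str],
                    vectors: Dict[str, Sequence[float]]) -> Optional[Quadruple]:
    """Build <concept, core verb, concept, distance vector> from one tagged sentence."""
    positions = [i for i, (word, _) in enumerate(tagged) if word in concepts]
    if len(positions) < 2:
        return None  # fewer than two known concepts in the sentence
    i, j = positions[0], positions[1]
    verbs = [word for word, flag in tagged[i + 1:j] if flag.startswith("v")]
    if not verbs:
        return None  # no verb links the two concepts
    head, tail = tagged[i][0], tagged[j][0]
    # Assumed distance: element-wise difference of the two word vectors.
    distance = tuple(a - b for a, b in zip(vectors[head], vectors[tail]))
    return Quadruple(head, verbs[0], tail, distance)


def classify(quad: Quadruple) -> Optional[str]:
    """Map the core verb to a relation name, or None if the verb is unknown."""
    return VERB_RELATIONS.get(quad.core_verb)


# Hypothetical tagged sentence and toy vectors based on the example in the text.
tagged = [("constituent requirement", "n"), ("is", "v"), ("forest deforestation", "n")]
concepts = {"constituent requirement", "forest deforestation"}
vectors = {"constituent requirement": (1.0, 0.0), "forest deforestation": (0.5, 0.5)}
quad = build_quadruple(tagged, concepts, vectors)
print(quad, classify(quad) if quad else None)
```

A fuller implementation would presumably also use the distance vector when deciding between the four relation types; the table lookup only illustrates where that decision would sit.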
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic block diagram of an environment judicial domain ontology construction device that corresponds one-to-one to the environment judicial domain ontology construction method of the foregoing embodiment. As shown in Fig. 3, the environment judicial domain ontology construction device includes a data acquisition module 31, a term set acquisition module 32, a structured relationship set acquisition module 33, an unstructured concept set acquisition module 34, a to-be-extracted sentence acquisition module 35, an unstructured relationship set acquisition module 36, and an environment judicial domain ontology acquisition module 37. Each functional module is described in detail below.
The data obtaining module 31 is configured to obtain at least one structured text data and at least one unstructured text data, where the structured text data and the unstructured text data are both environment judicial domain text data.
The term set obtaining module 32 is configured to perform term extraction on the structured text data and the unstructured text data respectively, construct a structured term set according to the extracted structured terms, and construct an unstructured term set according to the extracted unstructured terms.
The structured relationship set obtaining module 33 is configured to use the structured term set as a structured concept set, and perform relationship extraction on the structured text data based on the structured concept set to obtain a structured relationship set.
The unstructured concept set acquisition module 34 is configured to perform concept extraction on each unstructured text data, in combination with the unstructured term set and based on a TF-IDF algorithm and a clustering algorithm, to obtain an unstructured concept set.
The to-be-extracted sentence acquisition module 35 is configured to sequentially select sentences with the same terms from the unstructured text data as sentences to be extracted, based on a preset sentence selection manner.
The unstructured relationship set obtaining module 36 is configured to perform relationship extraction on the sentences to be extracted according to the unstructured concept set, and add the extracted relationships into the unstructured relationship set until all the sentences to be extracted have been processed.
The environment judicial domain ontology obtaining module 37 is configured to obtain an environment judicial domain ontology based on the structured relationship set and the unstructured relationship set.
Optionally, the term set acquisition module 32 further includes the following units.
A text format acquisition unit, configured to acquire the text format of the structured text data.
A structured term set acquisition unit, configured to perform term extraction on the structured text data, based on the text format, using a structured term extraction mode adapted to the text format, and to construct a structured term set from the extracted structured terms.
A word segmentation unit, configured to segment the unstructured text data based on a preset word segmentation mode to obtain a segmented document text.
A part-of-speech tagging unit, configured to perform part-of-speech tagging on the segmented document text to obtain the word-part-of-speech pairs corresponding to all sentences in the segmented document text.
An unstructured term acquisition unit, configured to select, from the word-part-of-speech pairs corresponding to all sentences, the words whose part of speech is noun as unstructured terms, and to construct an unstructured term set based on the unstructured terms.
A part-of-speech combination acquisition unit, configured to select the part-of-speech combinations corresponding to adjacent sentences from the segmented document text.
An unstructured term set acquisition unit, configured to take two adjacent words whose part-of-speech combination conforms to a preset combination-word part of speech as a combination word, and to add the combination word to the unstructured term set.
Optionally, the structured relation set obtaining module 33 further includes the following units.
A structured concept set acquisition unit, configured to take the structured term set as the structured concept set.
A data input unit, configured to input the structured concept set into the domain ontology construction model.
A structured relation set acquisition unit, configured to perform relation extraction on the structured text data and the concepts in the structured concept set, using the domain ontology construction model and according to the concept hierarchy of the structured text data, to obtain a structured relation set.
Optionally, the unstructured concept set acquisition module 34 further includes the following units.
A word frequency acquisition unit, configured to perform, for each piece of unstructured text data and based on the TF-IDF algorithm, word frequency statistics on the terms of that unstructured text data that appear in the unstructured term set, to obtain the word frequency corresponding to each term in the unstructured text data.
A domain importance calculation unit, configured to calculate the domain importance according to the word frequency corresponding to each term in the unstructured text data, to obtain the domain importance corresponding to the unstructured text data.
A domain determination unit, configured to determine the domain to which the unstructured text data belongs based on the domain importance.
A clustering unit, configured to cluster, based on a clustering algorithm, the unstructured term sets corresponding to the unstructured text data of the same domain, to obtain the concept subsets corresponding to the unstructured text data of the same domain.
A merging unit, configured to merge all concept subsets to obtain the unstructured concept set.
Optionally, the clustering unit further includes the following units.
A cluster center acquisition unit, configured to randomly select P terms, in the form of word vectors, from the unstructured term set corresponding to the unstructured text data of the same domain as cluster centers.
A cosine similarity calculation unit, configured to iteratively compute, for each cluster center, over the term word vectors in the unstructured term set corresponding to the unstructured text data of the same domain and the cluster center, to obtain the cosine similarity corresponding to each cluster center.
A concept subset acquisition unit, configured to sort all cluster centers according to the cosine similarities, and to determine the concept subsets corresponding to the unstructured text data of the same domain from the cluster centers that meet a preset condition.
Optionally, the unstructured relationship set obtaining module 36 further includes the following units.
A core verb extraction unit, configured to perform verb extraction on the sentence to be extracted according to its parts of speech, to obtain the core verb corresponding to the sentence to be extracted.
A distance vector acquisition unit, configured to acquire the term word vectors corresponding to the sentence to be extracted, and to perform distance calculation on the term word vectors to obtain the distance vector corresponding to the sentence to be extracted.
A concept extraction unit, configured to acquire the concepts corresponding to the sentence to be extracted according to the unstructured concept set.
A multi-tuple construction unit, configured to construct a multi-tuple based on the concepts, the core verb, and the distance vector corresponding to the sentence to be extracted.
A multi-tuple analysis unit, configured to analyze the multi-tuple to obtain the relation corresponding to the unstructured text data.
An unstructured relation set acquisition unit, configured to add the relation corresponding to the unstructured text data to the unstructured relation set, until all sentences to be extracted have been processed.
For specific limitations of the environment judicial domain ontology construction device, reference may be made to the above limitations on the environment judicial domain ontology construction method, which are not repeated here. All or part of each module in the environment judicial domain ontology construction device can be realized by software, hardware, or a combination thereof. The modules can be embedded in, or be independent of, a processor in the computer device in hardware form, or can be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4 in particular, fig. 4 is a block diagram of a basic structure of a computer device according to the embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only the computer device 4 having the components memory 41, processor 42, and network interface 43 is shown, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device thereof. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as program codes for controlling electronic files. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or process data, such as program code for executing control of an electronic file.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium, wherein the computer-readable storage medium stores an interface display program, and the interface display program is executable by at least one processor, so that the at least one processor executes the steps of the method for constructing the environment judicial domain ontology as described above.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments without limiting the scope of the application. This application can be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or that equivalents may be substituted for some of their features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.

Claims (9)

1. A method for constructing an environment judicial domain ontology, characterized by comprising the following steps:
acquiring at least one structured text data and at least one unstructured text data, wherein the structured text data and the unstructured text data are both environment judicial domain text data;
respectively extracting terms from the structured text data and the unstructured text data, constructing a structured term set according to the extracted structured terms, and constructing an unstructured term set according to the extracted unstructured terms;
taking the structured term set as a structured concept set, and performing relation extraction on the structured text data based on the structured concept set to obtain a structured relation set;
based on TF-IDF algorithm and clustering algorithm, carrying out concept extraction on each unstructured text data by combining the unstructured term set to obtain an unstructured concept set;
based on a preset sentence selection mode, sequentially selecting sentences with the same terms from the unstructured text data as sentences to be extracted;
extracting the relation of the sentences to be extracted according to the unstructured concept set, and adding the extracted relation into an unstructured relation set until all the sentences to be extracted are executed;
obtaining an environment judicial domain ontology based on the structured relation set and the unstructured relation set;
wherein the step of extracting the relation of the sentences to be extracted according to the unstructured concept set, and adding the extracted relation into an unstructured relation set until all the sentences to be extracted are executed, comprises:
according to the part of speech of the sentence to be extracted, verb extraction is carried out on the sentence to be extracted, and a core verb corresponding to the sentence to be extracted is obtained;
obtaining term word vectors corresponding to the sentences to be extracted, and performing distance calculation on the term word vectors to obtain distance vectors corresponding to the sentences to be extracted;
acquiring concepts corresponding to the sentences to be extracted according to the unstructured concept set;
constructing a multi-element group based on the concept, the core verb and the distance vector corresponding to the sentence to be extracted;
analyzing the multi-element group to obtain a corresponding relation of the unstructured text data;
and adding the corresponding relation of the unstructured text data into an unstructured relation set until all the sentences to be extracted are executed.
2. The method as claimed in claim 1, wherein the step of extracting terms from the structured text data and the unstructured text data respectively, and constructing a structured term set according to the extracted structured terms, and the step of constructing an unstructured term set according to the extracted unstructured terms comprises:
acquiring a text format of the structured text data;
based on the text format, adopting a structured term extraction mode which is adaptive to the text format to extract terms of the structured text data, and constructing a structured term set according to the extracted structured terms;
based on a preset word segmentation mode, carrying out word segmentation on the unstructured text data to obtain a word segmentation document text;
performing part-of-speech tagging on the word segmentation document text to obtain word-part-of-speech pairs corresponding to all sentences in the word segmentation document text;
selecting words with parts of speech as nouns from the word-part-of-speech pairs corresponding to all the sentences as unstructured terms, and constructing an unstructured term set based on the unstructured terms;
selecting part-of-speech combinations corresponding to adjacent sentences from the word segmentation document text;
and taking two adjacent words of which the part-of-speech combinations conform to the part-of-speech of a preset combination word as combination words, and adding the combination words into the unstructured term set.
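By way of non-limiting illustration, the unstructured term extraction of claim 2 can be sketched as follows, starting from text that has already been segmented and part-of-speech tagged. The tag names (`n` for noun, `a` for adjective), the preset combination patterns in `COMPOUND_PATTERNS`, and the function name `extract_unstructured_terms` are assumptions made for the example; the claim only requires that some preset combination-word part of speech exists.

```python
NOUN_TAG = "n"
# Hypothetical preset part-of-speech combinations for combination words.
COMPOUND_PATTERNS = {("n", "n"), ("a", "n")}


def extract_unstructured_terms(tagged_sentences, sep=" "):
    """Collect nouns and matching adjacent-word combinations as terms.

    tagged_sentences: list of sentences, each a list of (word, pos) pairs.
    sep: string used to join the two words of a combination word
         (an empty string would suit unspaced scripts such as Chinese).
    """
    terms = set()
    for sentence in tagged_sentences:
        # Step 1: every word tagged as a noun is an unstructured term.
        for word, pos in sentence:
            if pos == NOUN_TAG:
                terms.add(word)
        # Step 2: two adjacent words whose tags match a preset pattern
        # are added as a combination word.
        for (w1, p1), (w2, p2) in zip(sentence, sentence[1:]):
            if (p1, p2) in COMPOUND_PATTERNS:
                terms.add(w1 + sep + w2)
    return terms
```

For example, a sentence tagged as `[("water", "n"), ("pollution", "n"), ("is", "v"), ("serious", "a"), ("harm", "n")]` yields the nouns `water`, `pollution` and `harm` together with the combination words `water pollution` and `serious harm`.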
3. The method according to claim 1, wherein the step of taking the structured term set as a structured concept set and performing relationship extraction on the structured text data based on the structured concept set to obtain a structured relationship set comprises:
taking the set of structured terms as a set of structured concepts;
inputting the structured concept set into a domain ontology construction model;
and according to the concept hierarchy relation of the structured text data, performing relation extraction on the structured text data and concepts in the structured concept set by adopting the domain ontology construction model to obtain a structured relation set.
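By way of non-limiting illustration, the structured relation extraction of claim 3 can be sketched as a walk over the hierarchy of a structured document. The claim does not disclose the internals of the domain ontology construction model, so a plain recursive traversal stands in for it here. The nested-dictionary input, the relation label `sub_concept_of`, and the function name `extract_structured_relations` are all assumptions.

```python
def extract_structured_relations(hierarchy, concepts, parent=None):
    """Emit (child, "sub_concept_of", parent) triples from a nested dict.

    hierarchy: dict mapping a node name to a dict of its child nodes.
    concepts:  the structured concept set; a triple is emitted only
               when both child and parent belong to it.
    """
    relations = set()
    for name, children in hierarchy.items():
        if parent is not None and name in concepts and parent in concepts:
            relations.add((name, "sub_concept_of", parent))
        # Recurse with the current node as the parent of its children.
        relations |= extract_structured_relations(children, concepts, name)
    return relations
```

A node that is not in the concept set produces no triple with its parent or its children in this sketch, so the hierarchy is not bridged across such nodes.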
4. The method according to claim 1, wherein the step of extracting concepts from each unstructured text data by combining the unstructured term set based on a TF-IDF algorithm and a clustering algorithm to obtain an unstructured concept set comprises:
for each unstructured text data, performing word frequency statistics on terms of the unstructured text data in the unstructured term set based on a TF-IDF algorithm to obtain word frequency corresponding to each term in the unstructured text data;
according to the word frequency corresponding to each term in the unstructured text data, calculating the domain importance of the unstructured text data to obtain the domain importance corresponding to the unstructured text data;
determining the domain to which the unstructured text data belongs based on the domain importance;
clustering the unstructured term sets corresponding to the unstructured text data in the same domain based on a clustering algorithm to obtain concept subsets corresponding to the unstructured text data in the same domain;
and combining all the concept subsets to obtain an unstructured concept set.
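By way of non-limiting illustration, the TF-IDF weighting and domain determination of claim 4 can be sketched as follows. The weighting is the common form tf × log(N / df). The claim does not state how domain importance is derived from the weights, so the sketch assumes it is the sum of the weights of the terms that appear in a hypothetical per-domain seed vocabulary (`domain_seeds`), and that a document belongs to the domain with the highest score. The function names are also hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs, term_set):
    """Return one {term: weight} dict per document.

    docs: list of token lists; only tokens in term_set are weighted.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens) & term_set)
    weights = []
    for tokens in docs:
        counts = Counter(t for t in tokens if t in term_set)
        total = len(tokens) or 1
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def assign_domains(weights, domain_seeds):
    """Assumed rule: importance of a domain is the summed weight of its seed terms."""
    domains = []
    for doc_weights in weights:
        importance = {
            name: sum(doc_weights.get(t, 0.0) for t in seeds)
            for name, seeds in domain_seeds.items()
        }
        # Ties resolve to the first domain in dictionary order.
        domains.append(max(importance, key=importance.get))
    return domains
```

Documents assigned to the same domain would then have their term sets clustered into concept subsets, and the subsets merged into the unstructured concept set.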
5. The method according to claim 4, wherein the step of clustering, based on a clustering algorithm, the unstructured term sets corresponding to the unstructured text data in the same domain to obtain the concept subsets corresponding to the unstructured text data in the same domain comprises:
randomly selecting P terms, from the unstructured term set corresponding to the unstructured text data in the same domain, as clustering centers, wherein the terms are in the form of word vectors;
for each clustering center, performing iterative computation on the clustering center and the term word vectors in the unstructured term set corresponding to the unstructured text data in the same domain, to obtain the cosine similarity corresponding to each clustering center;
and sorting all the clustering centers according to the cosine similarities, and determining the concept subsets corresponding to the unstructured text data in the same domain according to the clustering centers that meet preset conditions.
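By way of non-limiting illustration, the clustering of claim 5 can be sketched as a k-means style loop that uses cosine similarity. The claim fixes only the random choice of P centers, the use of cosine similarity, and a ranking of the centers against preset conditions. The center update rule (mean of member vectors), the fixed iteration count, the per-center score (mean similarity of its members), and the `threshold` used as the preset condition are assumptions, as are the function names.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign(vectors, terms, centers):
    """Put each term in the group of its most similar center."""
    members = [[] for _ in centers]
    for t in terms:
        sims = [cosine(vectors[t], c) for c in centers]
        members[sims.index(max(sims))].append(t)
    return members


def cluster_terms(vectors, p, threshold=0.8, iterations=10, seed=0):
    """Return concept subsets, best-scoring center first."""
    rng = random.Random(seed)
    terms = sorted(vectors)
    # Randomly select P terms as the initial clustering centers.
    centers = [list(vectors[t]) for t in rng.sample(terms, p)]
    for _ in range(iterations):
        members = assign(vectors, terms, centers)
        # Assumed update rule: move each center to the mean of its members.
        centers = [
            [sum(col) / len(group) for col in zip(*(vectors[t] for t in group))]
            if group else center
            for group, center in zip(members, centers)
        ]
    members = assign(vectors, terms, centers)
    # Score each center by the mean cosine similarity of its members.
    scored = [
        (sum(cosine(vectors[t], c) for t in group) / len(group), group)
        for group, c in zip(members, centers) if group
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [group for score, group in scored if score >= threshold]
```

Raising `threshold` discards loosely grouped centers, which is one possible reading of the preset conditions; other selection rules, such as keeping a fixed number of top-ranked centers, would fit the claim language equally well.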
6. An environment judicial domain ontology construction device, characterized in that the environment judicial domain ontology construction device comprises:
the data acquisition module is used for acquiring at least one structured text data and at least one unstructured text data, wherein the structured text data and the unstructured text data are both environment judicial domain text data;
the term set acquisition module is used for respectively extracting terms from the structured text data and the unstructured text data, constructing a structured term set according to the extracted structured terms and constructing an unstructured term set according to the extracted unstructured terms;
the structured relation set acquisition module is used for taking the structured term set as a structured concept set and extracting the relation of the structured text data based on the structured concept set to obtain a structured relation set;
the unstructured concept set acquisition module is used for performing concept extraction on each unstructured text data by combining the unstructured term set based on a TF-IDF algorithm and a clustering algorithm to obtain an unstructured concept set;
the to-be-extracted sentence acquisition module is used for sequentially selecting sentences with the same terms from the unstructured text data as to-be-extracted sentences based on a preset sentence selection mode;
the unstructured relation set acquisition module is used for extracting the relation of the sentences to be extracted according to the unstructured concept set and adding the extracted relation into an unstructured relation set until all the sentences to be extracted are executed;
the environment judicial domain ontology acquisition module is used for acquiring an environment judicial domain ontology based on the structured relation set and the unstructured relation set;
wherein the unstructured relation set acquisition module comprises:
the core verb extraction unit is used for extracting verbs of the sentences to be extracted according to the parts of speech of the sentences to be extracted to obtain core verbs corresponding to the sentences to be extracted;
the distance vector acquisition unit is used for acquiring term word vectors corresponding to the sentences to be extracted and carrying out distance calculation on the term word vectors to obtain distance vectors corresponding to the sentences to be extracted;
a concept extraction unit, configured to obtain a concept corresponding to the sentence to be extracted according to the unstructured concept set;
the multi-element group building unit is used for building a multi-element group based on the concept, the core verb and the distance vector corresponding to the sentence to be extracted;
the multi-element group analysis unit is used for analyzing the multi-element group to obtain the relation corresponding to the unstructured text data;
and the unstructured relation set acquisition unit is used for adding the relation corresponding to the unstructured text data into an unstructured relation set until all the sentences to be extracted are executed.
7. The environment judicial domain ontology construction device of claim 6, wherein the term set acquisition module comprises:
a text format acquiring unit for acquiring a text format of the structured text data;
a structured term set acquisition unit, configured to perform term extraction on the structured text data in a structured term extraction manner adapted to the text format based on the text format, and construct a structured term set according to the extracted structured terms;
the word segmentation unit is used for segmenting the unstructured text data based on a preset word segmentation mode to obtain a word segmentation document text;
the part-of-speech tagging unit is used for carrying out part-of-speech tagging on the word segmentation document text to obtain word-part-of-speech pairs corresponding to all sentences in the word segmentation document text;
an unstructured term obtaining unit, configured to select words with parts of speech as nouns from word-part-of-speech pairs corresponding to all the sentences as unstructured terms, and construct an unstructured term set based on the unstructured terms;
a part-of-speech combination obtaining unit, configured to select a part-of-speech combination corresponding to an adjacent sentence from the word segmentation document text;
and the unstructured term set acquisition unit is used for taking two adjacent words of which the part-of-speech combinations accord with the part-of-speech of a preset combination word as combination words and adding the combination words into the unstructured term set.
8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the environment judicial domain ontology construction method according to any one of claims 1 to 5 when executing the computer program.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the environment judicial domain ontology construction method according to any one of claims 1 to 5.
CN202210286382.3A 2022-03-23 2022-03-23 Environment judicial domain ontology construction method and device and related equipment Active CN114385819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210286382.3A CN114385819B (en) 2022-03-23 2022-03-23 Environment judicial domain ontology construction method and device and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210286382.3A CN114385819B (en) 2022-03-23 2022-03-23 Environment judicial domain ontology construction method and device and related equipment

Publications (2)

Publication Number Publication Date
CN114385819A CN114385819A (en) 2022-04-22
CN114385819B CN114385819B (en) 2022-06-21

Family

ID=81204778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210286382.3A Active CN114385819B (en) 2022-03-23 2022-03-23 Environment judicial domain ontology construction method and device and related equipment

Country Status (1)

Country Link
CN (1) CN114385819B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116975595B (en) * 2023-07-03 2024-03-26 华南师范大学 Unsupervised concept extraction method and device, electronic equipment and storage medium

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN104182454B (en) * 2014-07-04 2018-03-27 重庆科技学院 The integrated model of multi-source heterogeneous data semantic based on domain body structure and method
US11194849B2 (en) * 2018-09-11 2021-12-07 International Business Machines Corporation Logic-based relationship graph expansion and extraction
CN109635272A (en) * 2018-10-24 2019-04-16 中国电子科技集团公司第二十八研究所 A kind of ontology interaction models construction method in air traffic control field
CN111163086B (en) * 2019-12-27 2022-06-07 北京工业大学 Multi-source heterogeneous network security knowledge graph construction and application method

Also Published As

Publication number Publication date
CN114385819A (en) 2022-04-22

Similar Documents

Publication Publication Date Title
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
CN107992585B (en) Universal label mining method, device, server and medium
CN106960030B (en) Information pushing method and device based on artificial intelligence
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
CN112100326B (en) Anti-interference question and answer method and system integrating retrieval and machine reading understanding
CN110427453B (en) Data similarity calculation method, device, computer equipment and storage medium
CN110276009B (en) Association word recommendation method and device, electronic equipment and storage medium
CN111813905A (en) Corpus generation method and device, computer equipment and storage medium
CN112287069A (en) Information retrieval method and device based on voice semantics and computer equipment
CN113722438A (en) Sentence vector generation method and device based on sentence vector model and computer equipment
CN111966792B (en) Text processing method and device, electronic equipment and readable storage medium
CN114547315A (en) Case classification prediction method and device, computer equipment and storage medium
Yang et al. Improving word representations with document labels
CN114385819B (en) Environment judicial domain ontology construction method and device and related equipment
Li et al. Wikipedia based short text classification method
Song et al. Semi-automatic construction of a named entity dictionary for entity-based sentiment analysis in social media
CN114817478A (en) Text-based question and answer method and device, computer equipment and storage medium
CN112686053A (en) Data enhancement method and device, computer equipment and storage medium
CN114742058B (en) Named entity extraction method, named entity extraction device, computer equipment and storage medium
CN110888940A (en) Text information extraction method and device, computer equipment and storage medium
CN114780724A (en) Case classification method and device, computer equipment and storage medium
CN113505196A (en) Part-of-speech-based text retrieval method and device, electronic equipment and storage medium
CN112989003A (en) Intention recognition method, device, processing equipment and medium
CN115495541B (en) Corpus database, corpus database maintenance method, apparatus, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant