CN113435164B - Automatic labeling and extracting method and device for Mongolian arbitration document information - Google Patents
- Publication number
- CN113435164B (application CN202110532905.3A)
- Authority
- CN
- China
- Prior art keywords
- document
- mongolian
- judgment
- information
- chinese
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
Abstract
The invention provides a method and a device for automatically labeling and extracting key information from Mongolian judgment documents, and relates to the technical field of text processing. The method obtains the original data of Mongolian judgment documents; preprocesses the original data; labels key elements of the preprocessed data according to a preset attribute tag system to obtain a labeled document, wherein the preset attribute tag system is constructed based on Chinese judgment documents; and extracts information from the labeled document using regular expressions to obtain key information. Because a comprehensive set of attribute tags is difficult to obtain directly from Mongolian judgment documents, the invention derives a more comprehensive set of attribute tags from a large collection of Chinese judgment documents and constructs the tag system from them. The constructed system is then applied to Mongolian judgment documents, which realizes automatic labeling and extraction for Mongolian judgment documents and improves labeling efficiency and accuracy.
Description
Technical Field
The invention relates to the technical field of text processing, in particular to a method and a device for automatically labeling and extracting Mongolian arbitration document information.
Background
With the development of society, the legal system is continuously being improved and public legal awareness keeps rising. As the number of cases grows, so does the number of judgments and rulings. Legal practitioners must therefore review a large number of related cases, laws, and regulations to understand the facts of a case before they can proceed with further work. This places higher demands on practitioners and makes their work increasingly heavy, which both hinders efficiency and increases the risk of error. Labeling judgment documents makes it easier for legal practitioners to understand a case.
Mongolian judgment documents have traditionally been labeled manually: key information is extracted from the legal text and then marked with the corresponding attribute tags. Documents labeled in this way are accurate and readable. However, this approach places high demands on annotators, who need a certain amount of legal knowledge to complete the task. Moreover, as the volume of legal documents grows, manual labeling consumes time and effort, consistency is poor, and the error rate rises. Existing methods for labeling Mongolian judgment documents are therefore inefficient.
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a method and a device for automatically labeling and extracting Mongolian arbitration document information, which solve the technical problem that existing methods for labeling Mongolian judgment documents are inefficient.
(II) Technical solution
In order to achieve the above purpose, the invention is realized by the following technical scheme:
In a first aspect, the present invention provides a method for automatically labeling and extracting Mongolian arbitration document information, the method comprising:
S1, acquiring original data of a Mongolian judgment document;
S2, preprocessing the original data of the Mongolian judgment document;
S3, labeling key elements of the preprocessed original data of the Mongolian judgment document according to a preset Chinese attribute tag system to obtain a labeled document, wherein the preset Chinese attribute tag system is constructed based on Chinese judgment documents;
S4, extracting information from the labeled document using regular expressions to obtain key information.
Preferably, the method further comprises:
S5, storing the key information in a text with a preset structural rule.
Preferably, the preprocessing of the original data of the Mongolian judgment document includes:
S201, converting the Mongolian judgment document from Menksoft (Meng Keli) encoding to the international standard encoding;
S202, uniformly converting the variation selector control characters and the additional components (suffixes);
S203, converting full-width characters into half-width characters, and deleting page numbers and redundant paragraph marks.
Preferably, the construction process of the preset Chinese attribute tag system comprises the following steps:
setting fixed category labels for the external attribute labels of the Chinese judgment documents, labeling the Chinese judgment documents with the fixed category labels, splitting the labeled Chinese judgment documents according to the external labels, and extracting an attribute tag system from a terminology and legal-provision knowledge base; labeling follows these principles:
automatic labeling by machine;
manual checking on top of the automatic machine labeling;
converting the unstructured parts of Chinese judgment documents into structured text, with the following conversion steps:
a. analyzing the structural features of the head and tail, studying a structure-based representation of the head and tail attributes of the judgment document, and constructing matching rules for structural attribute tags;
b. analyzing the content features of the basic information and the judgment result in the body text, studying a rule-based attribute representation of the judgment document, formulating rules by selecting keyword-related information with the help of a professional terminology library, and constructing attribute tag matching rules for unstructured text.
Preferably, the extracting information from the labeling document by using a regular expression to obtain key information includes:
the labeled document from step S3 is automatically extracted by regular-expression string matching to obtain the key information, and an XML template file is formed.
Preferably, the storing of the key information in a text with a preset structural rule includes:
S501, writing a Python program that extracts key information from the XML template file using a regular-expression matching algorithm;
S502, writing the extracted key information into a txt text file.
In a second aspect, the present invention provides an automatic labeling and extracting device for Mongolian arbitration document information, the device comprising:
the data acquisition module is used for acquiring original data of the Mongolian judgment document;
the preprocessing module is used for preprocessing the original data of the Mongolian judgment document;
the marking module is used for marking key elements of the preprocessed Mongolian judgment document original data according to a preset Chinese attribute tag system to obtain a marked document, and the preset Chinese attribute tag system is constructed based on the Chinese judgment document;
and the extraction module is used for extracting information from the marked document by adopting the regular expression to obtain key information.
Preferably, the apparatus further comprises:
and the rule text module is used for storing the key information into a text of a rule with a preset structure.
In a third aspect, the present invention provides a computer readable storage medium storing a computer program for automatic labeling and extraction of Mongolian arbitration document information, wherein the computer program causes a computer to execute the method for automatic labeling and extraction of Mongolian arbitration document information as described above.
In a fourth aspect, the present invention provides an electronic device comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the method for automatic labeling and extraction of Mongolian arbitration document information described above.
(III) Beneficial effects
The invention provides a method and a device for automatically labeling and extracting Mongolian arbitration document information. Compared with the prior art, the method has the following beneficial effects:
Because a comprehensive set of attribute tags is difficult to obtain directly from Mongolian judgment documents, the invention derives a more comprehensive set of attribute tags from a large collection of Chinese judgment documents and constructs the tag system from them. The constructed system is then applied to Mongolian judgment documents, which realizes automatic labeling and extraction for Mongolian judgment documents and improves labeling efficiency and accuracy.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a block diagram of a method for automatically labeling and extracting Mongolian arbitration document information according to an embodiment of the present invention;
FIG. 2 shows a partially labeled Mongolian judgment document;
FIG. 3 is a schematic diagram of an XML template file.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The method and device for automatically labeling and extracting Mongolian arbitration document information provided by the embodiments solve the technical problem that existing methods for labeling Mongolian judgment documents are inefficient, realize automatic labeling and extraction for Mongolian judgment documents, and improve labeling quality and accuracy.
To solve the above technical problem, the overall approach of the technical solution in the embodiments of the application is as follows:
Because manual labeling of judgment documents is time-consuming and labor-intensive, the embodiment of the invention implements a rule-based automatic labeling and extraction method on top of a designed attribute tag system. It labels and extracts the key information in Chinese and Mongolian judgment documents to form rule text, thereby building a corpus for auxiliary judgment prediction tasks. The attribute tag system obtained from Chinese judgment documents is applied to the Mongolian judgment document tag system. Because a comprehensive set of attribute tags is difficult to obtain directly from Mongolian judgment documents, a more comprehensive set of attribute tags is derived from a large collection of Chinese judgment documents and the tag system is constructed from them. The constructed system is then applied to Mongolian judgment documents, which realizes automatic labeling and extraction for Mongolian judgment documents and improves labeling efficiency and accuracy.
In order to better understand the above technical solutions, the following detailed description will refer to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a method for automatically labeling and extracting Mongolian arbitration document information, which is executed by a computer, as shown in fig. 1, and comprises the following steps:
S1, acquiring original data of a Mongolian judgment document;
S2, preprocessing the original data of the Mongolian judgment document;
S3, labeling key elements of the preprocessed original data of the Mongolian judgment document according to a preset Chinese attribute tag system to obtain a labeled document, wherein the preset Chinese attribute tag system is constructed based on Chinese judgment documents;
S4, extracting information from the labeled document using regular expressions to obtain key information.
The embodiment of the invention applies the Chinese judgment document attribute tag system to the Mongolian judgment document tag system. Because a comprehensive set of attribute tags is difficult to obtain directly from Mongolian judgment documents, a more comprehensive set of attribute tags is derived from a large collection of Chinese judgment documents and the tag system is constructed from them. The constructed system is then applied to Mongolian judgment documents, which realizes automatic labeling and extraction for Mongolian judgment documents and improves labeling efficiency and accuracy.
The following describes the steps in detail:
In step S1, the original data of the Mongolian judgment document is acquired. The specific implementation process is as follows:
The original data of the Mongolian judgment document is obtained through web crawler technology or other methods. In the embodiment of the invention, the Mongolian judgment documents are obtained from the ethnic-language documents section of China Judgments Online (https://wenshu.court.gov.cn/).
In step S2, the original data of the Mongolian judgment document is preprocessed. The specific implementation process is as follows:
The preprocessing addresses features specific to the Mongolian language.
S201, encoding conversion: the Mongolian judgment documents use Menksoft (Meng Keli) encoding rather than the international standard encoding, so the text must be converted from Menksoft encoding to the international standard encoding.
S202, correction: Mongolian text contains variation selector control characters (U+180B, U+180C, U+180D) and additional components (suffixes), which must be converted uniformly so that the text conforms to the international Mongolian encoding standard; some Mongolian words are also corrected using a dictionary and rules.
S203, cleanup: full-width characters are uniformly converted to half-width characters, and page numbers and redundant paragraph marks are deleted.
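The cleanup in S203 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the rule that a page number is a line containing only digits (optionally wrapped in dashes) is an assumption, and the function names are hypothetical. The encoding conversion in S201 and the correction in S202 depend on mapping tables and dictionaries that the text does not give, so they are not covered here.

```python
import re

# Full-width forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0

# Assumption: a page number is a line holding only digits, optionally wrapped in dashes.
PAGE_NUMBER_LINE = re.compile(r"^\s*-?\s*\d+\s*-?\s*$")


def to_halfwidth(text: str) -> str:
    """Convert full-width characters to their half-width equivalents."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - FULLWIDTH_OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def clean_text(text: str) -> str:
    """Apply half-width conversion, drop page-number lines, collapse blank lines."""
    lines = [ln.rstrip() for ln in to_halfwidth(text).splitlines()]
    lines = [ln for ln in lines if not PAGE_NUMBER_LINE.match(ln)]
    cleaned = []
    for ln in lines:
        # Keep at most one consecutive blank line (redundant paragraph marks).
        if ln == "" and (not cleaned or cleaned[-1] == ""):
            continue
        cleaned.append(ln)
    return "\n".join(cleaned).strip()
```

A digits-only line can also be legitimate content (for example an amount on its own line), so a real rule set would likely need a stricter page-number pattern.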
In step S3, key element labeling is carried out on the preprocessed original data of the Mongolian judgment document according to a preset Chinese attribute label system, and a labeling document is obtained. The specific implementation process is as follows:
In the embodiment of the invention, the preset Chinese attribute tag system is constructed in advance from Chinese judgment documents. The construction process is as follows:
First, to make it easier for users to query the documents and to support functions such as statistics, fixed category labels are set for the external attribute labels of the judgment document: first-level labels are set for the head, the body text, and the tail, with the body text as the core, and the body text is broken down into labels such as basic information and judgment result. Then the Chinese judgment documents are labeled, the external labels are split according to the drafting guidelines for conventional judgment documents of Chinese courts, and a reasonably complete attribute tag system is extracted from a terminology and legal-provision knowledge base. Labeling mainly follows these principles:
1. machine labeling is used so as to preserve the authenticity of the judgment document as far as possible;
2. manual correction is applied on top of the automatic machine labeling to further improve accuracy;
3. criminal judgment documents are generally structured text, but unstructured instances exist; the inherent attributes of the document (writing conventions, text structure, wording, and so on) are used to convert such documents from unstructured to structured form. For the structured part, the criminal judgment document attribute tag system is used for labeling, and the internal structure is divided by regular-expression matching. The unstructured part must be converted into structured text, with the following conversion steps:
a. analyzing the structural features of the head and tail, studying a structure-based representation of the head and tail attributes of the judgment document, and constructing matching rules for structural attribute tags;
b. analyzing the content features of the basic information and the judgment result in the body text, studying a rule-based attribute representation of the judgment document, formulating rules by selecting keyword-related information with the help of a professional terminology library, and constructing attribute tag matching rules for unstructured text.
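As a rough illustration of the head, body, and tail division that conversion steps a and b rely on, the sketch below splits a document into three sections using boundary rules. The boundary keywords (`Defendant`, `Presiding Judge`, and so on) and the function name are hypothetical placeholders in English; the actual structural matching rules are not given in the text and would be written for Chinese or Mongolian documents.

```python
import re

# Hypothetical boundary rules: the body is assumed to start at the first line
# naming a party, and the tail at the first signature line.
BODY_START = re.compile(r"^(Prosecutor|Defendant)\b")
TAIL_START = re.compile(r"^(Presiding Judge|Judge|Clerk)\b")


def split_sections(lines):
    """Assign each line to 'head', 'body', or 'tail' in document order."""
    sections = {"head": [], "body": [], "tail": []}
    current = "head"
    for line in lines:
        if current == "head" and BODY_START.match(line):
            current = "body"
        elif current == "body" and TAIL_START.match(line):
            current = "tail"
        sections[current].append(line)
    return sections
```

Once the sections are separated, different tag matching rules can be applied to each one, which keeps structural rules (head and tail) apart from content rules (body).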
Then, based on the constructed judgment document tag system and a series of rules manually revised by legal experts, each entity label in the judgment document is automatically labeled by pattern matching with constructed regular expressions. An example with some labels applied is shown in FIG. 2.
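The regular-expression labeling described here can be sketched as a list of rules, each pairing a tag name with a pattern whose capturing group is the span to label. The tag names, patterns, and sample sentence below are hypothetical English stand-ins; the expert-revised rules and the real tag set are not reproduced in the text.

```python
import re

# Hypothetical rules: (tag name, pattern with one capturing group for the value).
RULES = [
    ("case_number", re.compile(r"Case No\. ([\w-]+)")),
    ("defendant", re.compile(r"Defendant ([A-Z][a-z]+)")),
    ("fine", re.compile(r"fined (\d+) yuan")),
]


def wrap(match, tag):
    """Wrap only the captured group of a match in <tag>...</tag>."""
    start = match.start(1) - match.start(0)
    end = match.end(1) - match.start(0)
    whole = match.group(0)
    return f"{whole[:start]}<{tag}>{whole[start:end]}</{tag}>{whole[end:]}"


def annotate(text):
    """Apply every rule in order and return the labeled text."""
    for tag, pattern in RULES:
        text = pattern.sub(lambda m, t=tag: wrap(m, t), text)
    return text
```

Because the rules are applied one after another, a pattern must not match text inside tags inserted by an earlier rule; keeping the context words outside the capturing group, as above, is one way to reduce that risk.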
In step S4, information extraction is carried out on the marked document by adopting a regular expression, and key information is obtained. The specific implementation process is as follows:
After the automatic labeling result illustrated in FIG. 2 is obtained, the labels of the labeled document from step S3 are automatically extracted by regular-expression string matching to obtain the key information, and the XML template file shown in FIG. 3 is finally formed.
In an embodiment of the present invention, in order to expand the corpus, the method further includes: S5, storing the key information in a text with a preset structural rule. The specific implementation process is as follows:
To use the key information extracted in step S4 for judgment prediction tasks, part of the key information is extracted and stored in a text file. The specific steps are as follows:
S501, writing a Python program that uses a regular-expression matching algorithm to extract the fields listed in Table 1 from the XML template file produced in step S4;
S502, writing the extracted Table 1 fields into a txt text file, in which each line holds the information of one judgment document.
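A minimal sketch of this extraction step, assuming the labeled document marks each entity as `<tag>value</tag>`: a regular expression collects the tagged spans and the standard-library `xml.etree.ElementTree` module assembles them into one XML record. The root element name `document`, the `id` attribute, and the function name are assumptions; the layout of the template in FIG. 3 is not reproduced in the text.

```python
import re
import xml.etree.ElementTree as ET

# Matches <tag>value</tag>; the backreference \1 requires a matching closing tag.
TAGGED_SPAN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def to_xml(annotated: str, doc_id: str) -> str:
    """Collect every tagged span of a labeled document into one XML record."""
    root = ET.Element("document", {"id": doc_id})
    for tag, value in TAGGED_SPAN.findall(annotated):
        child = ET.SubElement(root, tag)
        child.text = value.strip()
    return ET.tostring(root, encoding="unicode")
```

Building the record with `ElementTree` rather than string concatenation means characters such as `&` or `<` in extracted values are escaped automatically.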
Table 1. Meaning of each field in the rule text
Field | Meaning |
Fact | Description of the case facts |
Meta | Case attributes |
punish_of_money | Fine (unit: yuan) |
Accusation | Charge (name of the crime) |
relevant_articles | Relevant law articles |
Criminals | Defendant |
term_of_imprisonment | Sentence-related attributes |
death_penalty | Whether the death penalty was imposed |
Imprisonment | Term of imprisonment for sentences other than death (unit: month) |
life_imprisonment | Whether life imprisonment was imposed |
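Steps S501 and S502 can be sketched as follows. The sketch assumes that the XML template holds one element per Table 1 field, named exactly after the field, and that each output line is a JSON object; both are assumptions, since the text only states that one line represents one judgment document. Only a flat subset of the fields is handled, and the function names are hypothetical.

```python
import json
import re

# Flat subset of the Table 1 fields; element names in the XML are assumed to match.
FIELDS = [
    "Fact", "Accusation", "relevant_articles", "Criminals",
    "punish_of_money", "death_penalty", "Imprisonment", "life_imprisonment",
]


def extract_record(xml_text: str) -> dict:
    """S501: pull each field out of the XML text by regular-expression matching."""
    record = {}
    for field in FIELDS:
        m = re.search(rf"<{field}>(.*?)</{field}>", xml_text, re.DOTALL)
        record[field] = m.group(1).strip() if m else None
    return record


def to_line(record: dict) -> str:
    """Serialize one record as a single line (JSON is an assumed format)."""
    return json.dumps(record, ensure_ascii=False)


def write_records(path: str, xml_texts) -> None:
    """S502: write one line per judgment document to a txt file."""
    with open(path, "w", encoding="utf-8") as fh:
        for xml_text in xml_texts:
            fh.write(to_line(extract_record(xml_text)) + "\n")
```

Setting `ensure_ascii=False` keeps Mongolian and Chinese text readable in the output file instead of escaping it as `\uXXXX` sequences.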
The embodiment of the invention provides an automatic labeling and extracting device for Mongolian arbitration document information, which comprises the following components:
the data acquisition module is used for acquiring original data of the Mongolian judgment document;
the preprocessing module is used for preprocessing the original data of the Mongolian judgment document;
the marking module is used for marking key elements of the preprocessed Mongolian judgment document original data according to a preset Chinese attribute tag system to obtain a marked document, and the preset Chinese attribute tag system is constructed based on the Chinese judgment document;
and the extraction module is used for extracting information from the marked document by adopting the regular expression to obtain key information.
It can be understood that the automatic labeling and extracting device for Mongolian arbitration document information provided by the embodiment of the invention corresponds to the method described above. For explanations, examples, beneficial effects, and other related content, refer to the corresponding parts of the method description, which are not repeated here.
The embodiment of the invention also provides a computer readable storage medium which stores a computer program for automatic labeling and extracting of Mongolian arbitration document information, wherein the computer program enables a computer to execute the method for automatic labeling and extracting of Mongolian arbitration document information.
The embodiment of the invention also provides electronic equipment, which comprises:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the automatic Mongolian arbitration document information labeling and extraction method as described above.
In summary, compared with the prior art, the method has the following beneficial effects:
1. The embodiment of the invention applies the Chinese judgment document attribute tag system to the Mongolian judgment document tag system. Because a comprehensive set of attribute tags is difficult to obtain directly from Mongolian judgment documents, a more comprehensive set of attribute tags is derived from a large collection of Chinese judgment documents and the tag system is constructed from them. The constructed system is then applied to Mongolian judgment documents, which realizes automatic labeling and extraction for Mongolian judgment documents and improves labeling efficiency and accuracy.
2. The embodiment of the invention implements a rule-based automatic labeling and extraction method that labels and extracts the key information in Mongolian judgment documents to form rule text, thereby building a corpus for auxiliary judgment prediction tasks.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (9)
1. An automatic labeling and extracting method for Mongolian arbitration document information is characterized by comprising the following steps:
S1, acquiring original data of a Mongolian judgment document;
S2, preprocessing the original data of the Mongolian judgment document;
S3, labeling key elements of the preprocessed original data of the Mongolian judgment document according to a preset Chinese attribute tag system to obtain a labeled document, wherein the preset Chinese attribute tag system is constructed based on Chinese judgment documents;
S4, extracting information from the labeled document using regular expressions to obtain key information;
the construction process of the preset Chinese attribute label system comprises the following steps:
setting a fixed category label for the external attribute label of the Chinese judgment document, marking the Chinese judgment document by the fixed category label, splitting the marked Chinese judgment document according to the external label, and extracting an attribute label system from a terminology and legal-provision knowledge base; for the label labeling principle, the following rules are followed:
automatic labeling by a machine;
based on automatic labeling of the machine, a manual checking mode is adopted;
for unstructured parts of Chinese decision documents, the unstructured parts are converted into structured texts, and the conversion steps are as follows:
a. analyzing the head-tail structural characteristics, researching a head-tail attribution representation method of a judgment document based on structural relation, and constructing a structural attribute tag matching rule;
b. analyzing the basic information in the text and the content characteristics of the judgment result, researching an attribute representation method of the judgment book based on rules, selecting related information of keywords by combining a professional term library to formulate rules, and constructing attribute tag matching rules of unstructured texts.
2. The method for automatically labeling and extracting Mongolian arbitration document information as recited in claim 1, further comprising:
S5, storing the key information in a text with a preset structural rule.
3. The automatic labeling and extracting method for Mongolian arbitration document information according to any one of claims 1 to 2, wherein the preprocessing of the original data of the Mongolian judgment document comprises:
S201, converting the Mongolian judgment document from Menksoft (Meng Keli) encoding to the international standard encoding;
S202, uniformly converting the variation selector control characters and the additional components (suffixes);
S203, converting full-width characters into half-width characters, and deleting page numbers and redundant paragraph marks.
4. The automatic labeling and extracting method for Mongolian arbitration document information according to any one of claims 1-2, wherein the extracting of information from the labeling document by using regular expressions to obtain key information comprises:
and (3) automatically extracting the labeling document in the step (S3) by adopting a regular expression character string matching mode to obtain key information, and forming an XML template file.
5. The automatic labeling and extracting method for Mongolian arbitration document information according to claim 2, wherein the storing of the key information in a text file with a preset structure rule comprises:
S501, writing a Python program that extracts the key information from the XML template file by using a regular-expression matching algorithm;
S502, writing the extracted key information into a txt text file.
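Steps S501 and S502 might look like the sketch below. The preset structure rule is not defined in the claims, so a one-line-per-field `tag: value` layout is assumed here; the function name `xml_to_txt` and the element names used in the usage example are likewise hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches leaf elements only: the value may not contain another tag.
ELEMENT_PATTERN = re.compile(r"<(?P<tag>[^<>/\s]+)>(?P<value>[^<]*)</(?P=tag)>")

def xml_to_txt(xml_path: str, txt_path: str) -> List[Tuple[str, str]]:
    """S501: regex-extract (tag, value) pairs; S502: write them to a txt file."""
    xml_text = Path(xml_path).read_text(encoding="utf-8")
    pairs = [
        (m.group("tag"), m.group("value").strip())
        for m in ELEMENT_PATTERN.finditer(xml_text)
    ]
    # Assumed structure rule: one "tag: value" line per key element.
    lines = [f"{tag}: {value}" for tag, value in pairs]
    Path(txt_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return pairs

if __name__ == "__main__":
    Path("template.xml").write_text(
        "<document><case_number>No. 1</case_number></document>", encoding="utf-8"
    )
    print(xml_to_txt("template.xml", "key_info.txt"))
```

Because the pattern works on raw text, XML entity escapes such as `&amp;` are written out undecoded; an XML parser would be the safer choice if the values may contain such characters.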
6. An automatic labeling and extracting device for Mongolian arbitration document information, characterized by comprising:
a data acquisition module, used for acquiring raw data of Mongolian judgment documents;
a preprocessing module, used for preprocessing the raw data of the Mongolian judgment documents;
a labeling module, used for labeling key elements of the preprocessed raw data of the Mongolian judgment documents according to a preset Chinese attribute tag system to obtain a labeled document, the preset Chinese attribute tag system being constructed based on Chinese judgment documents;
an extraction module, used for extracting information from the labeled document by using regular expressions to obtain key information;
wherein the construction process of the preset Chinese attribute tag system comprises the following steps:
setting fixed category tags as the external attribute tags of the Chinese judgment document, labeling the Chinese judgment document with the fixed category tags, splitting the labeled Chinese judgment document according to the external tags, and extracting the attribute tag system from a terminology and legal-provision knowledge base; the labeling follows these principles:
labeling is first performed automatically by machine;
the machine-labeled results are then proofread manually;
the unstructured parts of the Chinese judgment document are converted into structured text, and the conversion steps are as follows:
a. analyzing the structural characteristics of the head and tail of the document, studying a structure-based method for representing the head and tail attributes of the judgment document, and constructing matching rules for structured attribute tags;
b. analyzing the content characteristics of the basic information and the judgment result in the body text, studying a rule-based attribute representation method for the judgment document, selecting keyword-related information in combination with a professional terminology library to formulate rules, and constructing attribute tag matching rules for unstructured text.
7. The automatic labeling and extracting device for Mongolian arbitration document information according to claim 6, further comprising:
a rule text module, used for storing the key information in a text file with a preset structure rule.
8. A computer-readable storage medium storing a computer program for automatic labeling and extraction of Mongolian arbitration document information, wherein the computer program causes a computer to execute the automatic labeling and extracting method for Mongolian arbitration document information according to any one of claims 1 to 5.
9. An electronic device, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the automatic labeling and extracting method for Mongolian arbitration document information according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110532905.3A CN113435164B (en) | 2021-05-17 | 2021-05-17 | Automatic labeling and extracting method and device for Mongolian arbitration document information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110532905.3A CN113435164B (en) | 2021-05-17 | 2021-05-17 | Automatic labeling and extracting method and device for Mongolian arbitration document information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113435164A (en) | 2021-09-24 |
CN113435164B (en) | 2024-02-13 |
Family
ID=77802523
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110532905.3A Active CN113435164B (en) | 2021-05-17 | 2021-05-17 | Automatic labeling and extracting method and device for Mongolian arbitration document information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113435164B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110837564A (en) * | 2019-09-25 | 2020-02-25 | 中央民族大学 | Construction method of knowledge graph of multilingual criminal judgment books |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9348902B2 (en) * | 2013-01-30 | 2016-05-24 | Wal-Mart Stores, Inc. | Automated attribute disambiguation with human input |
Also Published As
Publication number | Publication date |
---|---|
CN113435164A (en) | 2021-09-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||