CN113378542B - Method and device for evaluating quality of referee document

Info

Publication number: CN113378542B
Application number: CN202110163510.0A
Authority: CN
Inventors: 杨哲; 艾中良; 李�灿; 贾高峰
Original assignee: China Judicial Big Data Research Institute Co ltd
Current assignee: China Judicial Big Data Research Institute Co ltd
Priority date: 2021-02-05
Filing date: 2021-02-05
Publication date: 2022-04-01
Anticipated expiration: 2041-02-05
Also published as: CN113378542A

Abstract

The invention provides a method and a device for evaluating the quality of a referee document. The method comprises the following steps: entering a target referee document to be evaluated; parsing the target referee document; performing index analysis on the parsed content, wherein the index analysis comprises wrongly written character screening, format integrity analysis, content normalization analysis, law article citation accuracy analysis and content logic analysis; calculating the index item scores according to the results of the index analysis; and calculating a comprehensive evaluation score of the quality of the target referee document from those scores. The device comprises a referee document input module, a referee document analysis module, a referee document index scoring module and a referee document quality comprehensive evaluation module. Because the evaluation is executed automatically by a computer system, quality problems in a referee document can be found promptly while saving labor cost, which helps judges resolve those problems quickly and improves the quality of referee documents.

Description

Method and device for evaluating quality of referee document

Technical Field

The invention relates to the technical field of natural language processing, in particular to a method and a device for evaluating the quality of a referee document.

Background

A referee document records the trial and outcome of a case heard by a people's court. It is the carrier of the result of litigation and the instrument by which the people's court determines and allocates the substantive rights and obligations of the parties. A referee document with a complete structure, complete elements and rigorous logic both evidences the parties' rights and obligations and serves as an important basis for supervising the courts' trial activities. Since referee documents began to be published on the internet, they have drawn close attention from all sectors of society, which in turn places higher demands on their quality. At present, referee documents still suffer from non-standard formatting, imprecise citation of law articles, wrongly written characters, faulty content logic and similar problems. These problems seriously undermine the effectiveness of referee documents, lead the public to question judges' adjudication and drafting abilities, and may even lead the public to question the fairness of the judiciary.

Disclosure of Invention

In view of the above, the present invention provides a method and a device for evaluating the quality of a referee document, so that the evaluation is executed automatically by a computer system and does not depend on subjective human judgment.

The invention provides a method for evaluating the quality of a referee document, which comprises the following steps:

Step 1: entering the referee document. A target referee document to be evaluated is entered.

Step 2: parsing the referee document. The target referee document entered in step 1 is parsed to obtain the basic information, party information, trial process, pleadings content (the parties' claims and defenses), court opinion content, referee result content and other information.

Step 3: analyzing the referee document indexes. The content parsed in step 2 is analyzed against five indexes: wrongly written characters, format integrity, content normalization, law article citation accuracy and content logic.

Step 4: calculating the index item scores. Based on the analysis results of step 3, the calculation logic of each index item is invoked to compute its score, yielding five index scores for the target referee document.

Step 5: comprehensively evaluating the quality of the referee document. The five index scores of the target referee document are combined according to a comprehensive score calculation model for referee document quality evaluation to obtain the comprehensive evaluation score of the target referee document.

Further, the referee document entry in step 1 supports both upload and retrieval by case number. Upload mainly supports the doc, docx and txt formats; referee documents in other formats must first be converted into one of these. Retrieval by case number lets a judge enter a case number and automatically obtain the corresponding referee document.
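The two entry paths can be illustrated with a minimal sketch. The function names `check_upload` and `fetch_by_case_number`, and the in-memory `store` dictionary standing in for a court system, are hypothetical and are not taken from the patent.

```python
from pathlib import Path

# Formats named in the text; anything else must be converted first.
SUPPORTED_SUFFIXES = {".doc", ".docx", ".txt"}


def check_upload(filename: str) -> str:
    """Return the lowercase suffix if the upload is accepted, else raise ValueError."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(f"{filename}: convert to doc, docx or txt before upload")
    return suffix


def fetch_by_case_number(case_number: str, store: dict) -> str:
    """Look up a document by case number in a hypothetical in-memory store."""
    if case_number not in store:
        raise LookupError(f"no referee document found for case number {case_number!r}")
    return store[case_number]
```

In a real deployment the lookup would query the court systems named in the embodiment rather than a dictionary.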

Further, the referee document parsing in step 2 combines rule-based recognition with machine learning model recognition to parse the basic information, party information, trial process, pleadings content, court opinion content, referee result content and other parts separately.
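A minimal sketch of the rule-based half of this parsing follows. It assumes that a section can be located by a fixed marker phrase; the two markers used here ("本院认为" for the court opinion and "判决如下" for the referee result) are illustrative assumptions, since the patent does not list its rules, and the machine learning half is not shown.

```python
# Hypothetical marker phrases; real documents vary and need more rules.
SECTION_MARKERS = [
    ("court_opinion", "本院认为"),
    ("referee_result", "判决如下"),
]


def split_sections(text: str) -> dict:
    """Split text into labelled spans at the first occurrence of each marker."""
    hits = []
    for name, marker in SECTION_MARKERS:
        pos = text.find(marker)
        if pos != -1:
            hits.append((pos, name))
    hits.sort()

    # Everything before the first marker is kept as an unlabelled preamble.
    first = hits[0][0] if hits else len(text)
    sections = {"preamble": text[:first]}
    for idx, (start, name) in enumerate(hits):
        end = hits[idx + 1][0] if idx + 1 < len(hits) else len(text)
        sections[name] = text[start:end]
    return sections
```

Each labelled span runs from its marker to the next marker found, so a missing marker simply leaves that section out of the result.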

Further, the index analysis in step 3 combines natural language processing with named entity recognition and rule-based recognition to analyze each part parsed in step 2 against the five indexes: wrongly written characters, format integrity, content normalization, law article citation accuracy and content logic.

Wrongly written character screening uses an automatic detection method based on word vectors: an N-gram language model is used to calculate probabilities, and word vectors are used to calculate the degree of association between a word and its preceding and following context words, which reduces the false alarm rate. The method comprises the following steps:

1) The target document S is segmented with the jieba word segmenter, and the document elements obtained after segmentation are denoted S_i.

2) The collocation relevance T between a document element S_i and its context is calculated. T is computed from T^L and T^R, which denote the left relevance of S_i with its preceding context and the right relevance of S_i with its following context, respectively. The calculation uses the probability that S_i occurs in combination with its preceding context (S_1, S_2, S_3, ..., S_{i-1}), the probability that S_i occurs in combination with its following context (S_{i+1}, S_{i+2}, S_{i+3}, ..., S_n), and K_max, the maximum occurrence probability over word combinations.

3) For an element whose collocation relevance T is below the threshold, word vectors are used to find the word S' semantically closest to S_i; S' is substituted into the original context and the collocation relevance T' is calculated again.

4) The probability of S_i given its context is calculated according to the conditional probability formula, where S_{i-M} and S_{i+M} denote the words at distance M before and after the current word S_i.

5) A wrongly written character detection function that combines the collocation relevance and the probability is used to determine whether the text contains wrongly written characters. In this function, λ₁ and λ₂ are the weights given to the collocation relevance and the probability, with 0<λ₁<1 and 0<λ₂<1; null indicates absence.
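The screening steps can be sketched on pre-segmented tokens, under loudly stated assumptions. The patent's exact formulas are not reproduced in this text, so the sketch substitutes a bigram model: T is assumed to be the mean of the left and right bigram probabilities divided by K_max, the context probability is assumed to be P(S_i | S_{i-1}) (a window of M = 1), and the detection score is assumed to be the weighted sum λ₁·T + λ₂·P. Step 3, the word-vector substitution, is omitted. All function names, the threshold and the toy corpus are hypothetical.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"  # padding so edge words also have two neighbours


def train(corpus):
    """Count unigrams and adjacent bigrams over tokenized, padded sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        padded = [BOS, *sent, EOS]
        uni.update(padded)
        bi.update(zip(padded, padded[1:]))
    return uni, bi


def score_tokens(tokens, uni, bi, lam1=0.5, lam2=0.5):
    """Return (word, score) pairs; score = lam1 * T + lam2 * P (assumed form)."""
    total = sum(bi.values())
    k_max = max(bi.values()) / total  # largest bigram probability
    padded = [BOS, *tokens, EOS]
    scores = []
    for i in range(1, len(padded) - 1):
        prev, word, nxt = padded[i - 1], padded[i], padded[i + 1]
        t_left = (bi[(prev, word)] / total) / k_max
        t_right = (bi[(word, nxt)] / total) / k_max
        t = (t_left + t_right) / 2  # assumed way of combining T^L and T^R
        p = bi[(prev, word)] / uni[prev] if uni[prev] else 0.0
        scores.append((word, lam1 * t + lam2 * p))
    return scores


def suspects(tokens, uni, bi, threshold=0.1):
    """Words whose combined score falls below the (hypothetical) threshold."""
    return [w for w, s in score_tokens(tokens, uni, bi) if s < threshold]


corpus = [
    ["被告", "偿还", "借款"],
    ["原告", "请求", "偿还", "借款"],
    ["被告", "拒绝", "偿还", "借款"],
]
uni, bi = train(corpus)
print(suspects(["被告", "拒绝", "尝还", "借款"], uni, bi))
```

In practice the tokens would come from a segmenter such as jieba (for example `jieba.lcut`), and the counts from a large corpus of referee documents rather than three toy sentences.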

The format integrity and content normalization checks combine named entity recognition with preset rules. They examine the format of the target referee document together with its six parts: basic information, party information, trial process, pleadings content, court opinion content and referee result content.
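One preset rule of this kind, that every one of the six parts must be present and non-empty, can be sketched as follows. The part keys, the function name `format_integrity` and the proportional score are assumptions made for illustration; the patent does not specify how the check is scored.

```python
# The six parts named in the text, as hypothetical dictionary keys.
REQUIRED_PARTS = [
    "basic_information",
    "party_information",
    "trial_process",
    "pleadings",
    "court_opinion",
    "referee_result",
]


def format_integrity(sections: dict) -> tuple:
    """Return (score in [0, 1], list of parts that are missing or empty)."""
    missing = [p for p in REQUIRED_PARTS if not sections.get(p, "").strip()]
    score = 1 - len(missing) / len(REQUIRED_PARTS)
    return score, missing
```

The list of missing parts is returned alongside the score so that a judge can be told which part to fix, not only that the score dropped.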

For law article citation accuracy, a law article detection model is built on a Text-CNN model. The target referee document is first segmented into words, word vectors are then trained with a skip-gram model, and the Text-CNN model is finally used to check the accuracy of the law article citations.
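As a small illustration of the skip-gram stage only, the sketch below builds the (center word, context word) training pairs that a skip-gram model learns from. The window size and the function name `skipgram_pairs` are assumptions; training the embeddings and the Text-CNN classifier itself are not shown.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within `window` positions of each word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

A skip-gram model is trained to predict the context word from the center word for each such pair, and the resulting word vectors are what the Text-CNN model takes as input.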

The content logic index uses the collocation relevance T of the text with its context: if T is smaller than a set threshold, the text is marked as having a logic problem.

Further, the index scoring in step 4 obtains scoring data for the five indexes of the referee document: wrongly written character screening, format integrity, content normalization, law article citation accuracy and content logic. Each index item is computed according to its own calculation logic, yielding the five index scores of the target referee document.

Further, the comprehensive evaluation in step 5 evaluates the overall quality of the target referee document. A comprehensive evaluation score calculation model for referee document quality is established through big data mining analysis combined with expert experience, and the comprehensive evaluation score of the target referee document is calculated with that model.

A second aspect of the present invention provides a device for evaluating the quality of a referee document, comprising:

a referee document input module, used for entering an electronic referee document and passing it to the referee document analysis module. Upload mainly supports the doc, docx and txt formats; referee documents in other formats must first be converted into one of these. The module also supports automatically obtaining the corresponding referee document by retrieval on an entered case number;

a referee document analysis module, mainly used for parsing the referee document into segments and passing the parsing result to the referee document index analysis module;

a referee document index analysis module, used for performing index item analysis on the parsed, segmented referee document content and passing the analysis result to the referee document index scoring module;

a referee document index scoring module, used for computing each index item of each case according to the calculation logic of that index item, and for aggregating the index item results of the target referee document to obtain its case monitoring index data;

and a referee document quality comprehensive evaluation module, used for calculating the comprehensive score of the target referee document according to the comprehensive scoring model for referee document quality evaluation.
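The hand-off between modules can be sketched as a simple chain of callables, one per module after the input module. The class name `EvaluationPipeline`, its field names and the toy stand-in functions are hypothetical; the sketch shows only the data flow, not the modules' internals.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class EvaluationPipeline:
    parse: Callable[[str], Dict[str, str]]                  # analysis module
    analyze: Callable[[Dict[str, str]], Dict[str, list]]    # index analysis module
    score: Callable[[Dict[str, list]], Dict[str, float]]    # index scoring module
    combine: Callable[[Dict[str, float]], float]            # comprehensive evaluation

    def run(self, document: str) -> float:
        """Pass the entered document through each module in order."""
        sections = self.parse(document)
        issues = self.analyze(sections)
        index_scores = self.score(issues)
        return self.combine(index_scores)


# Toy stand-ins for the four stages, for illustration only.
pipeline = EvaluationPipeline(
    parse=lambda doc: {"body": doc},
    analyze=lambda s: {"typos": [w for w in s["body"].split() if w == "teh"]},
    score=lambda issues: {"typos": max(0.0, 100.0 - 10 * len(issues["typos"]))},
    combine=lambda scores: sum(scores.values()) / len(scores),
)
print(pipeline.run("teh court holds teh"))
```

Keeping each module behind a plain callable means a stage can be swapped, for example a rule-based parser for a model-based one, without touching the others.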

Compared with the prior art, the embodiments of the invention have the following beneficial effects: a referee document quality evaluation index system is established; referee document quality evaluation index data can be obtained; the comprehensive quality score of each referee document is calculated according to the comprehensive scoring model for referee document quality evaluation; and documents with incomplete formats, wrong citations of law articles or wrongly written characters are flagged early so that errors can be prevented, with the evaluation run automatically by a computer system. The method can raise timely warnings to prevent errors in documents and greatly improves the timeliness of referee document quality inspection while saving labor cost.

Drawings

FIG. 1 is a schematic flow chart of a method for evaluating the quality of a referee document according to an embodiment of the present invention;

FIG. 2 is a block diagram of a device for evaluating the quality of a referee document according to an embodiment of the present invention.

Detailed Description

To help those skilled in the art better understand the technical solution of the present invention, the method and device for evaluating the quality of a referee document provided in the present application are described in further detail below with reference to the accompanying drawings and specific embodiments. The following description of specific embodiments is illustrative only and is not intended to limit the scope of the invention.

Fig. 1 is a schematic flow chart of an implementation of a method for evaluating the quality of a referee document according to an embodiment of the present invention, which is detailed as follows:

S101, entering the referee document.

In this embodiment, both upload and retrieval by case number are supported. Upload mainly supports the doc, docx and txt formats; referee documents in other formats must first be converted into one of these. Retrieval by case number lets a judge enter a case number and automatically obtain the corresponding referee document. The data sources from which the referee document is obtained include, but are not limited to, the people's court trial process management system and the people's court electronic file management system.

S102, parsing the target referee document.

Referee document parsing combines named entity recognition with preset rules to parse the basic information, party information, trial process, pleadings content, court opinion content, referee result content and so on. The basic information covers the court name, document name and case number in the title; the head, facts, reasons, basis of judgment, operative part and tail in the body; and the signature and date in the closing.

S103, analyzing the indexes of the target referee document.

The index analysis applies natural language processing and machine learning algorithms to the basic information, party information, trial process, pleadings content, court opinion content, referee result content and other content parsed in S102, covering five indexes: wrongly written character screening, format integrity, content normalization, law article citation accuracy and content logic. For wrongly written character screening, word vectors are trained with an N-gram language model, which performs better on semantic tests and describes individual words more accurately. The format integrity and content normalization checks use named entity recognition combined with preset rules to examine the format of the referee document together with its six parts: basic information, party information, trial process, pleadings content, court opinion content and referee result content. For law article citation accuracy, a law article detection model is built on a Text-CNN model: the document is segmented into words, word vectors are trained with a skip-gram model, and the Text-CNN model is used to check the law article citations. The content logic index uses the collocation relevance T of the text with its context: if T is smaller than a set threshold, the text is marked as having a logic problem.

S104, scoring the referee document indexes. On the basis of the analysis of the five index items, each index item of each case is computed with a weight-based mechanism, in which the weight of each part can be adjusted according to the actual emphasis of the document evaluation. The index item results are then aggregated to obtain the case monitoring index data of the target referee document.

S105, comprehensively evaluating the quality of the referee document.

The comprehensive evaluation assesses the overall quality of the target referee document. A comprehensive scoring model for referee document quality evaluation is established through big data mining analysis combined with expert experience, and is used to calculate the comprehensive score of the target referee document.

Establishing the comprehensive scoring model through big data mining analysis combined with expert experience specifically comprises: performing mathematical modeling on the five index data items obtained for the target referee document, and then fine-tuning the parameters and weights in light of expert experience to construct the comprehensive scoring model for referee document quality evaluation.
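The patent does not disclose the form of the comprehensive scoring model. As one plausible and deliberately simple stand-in, the sketch below uses a normalized weighted average of the five index scores, with the weights playing the role of the expert-tuned parameters. The index keys, the function name `composite_score` and every number in the usage example are illustrative assumptions.

```python
# The five indexes named in the text, as hypothetical dictionary keys.
INDEXES = (
    "wrong_characters",
    "format_integrity",
    "content_normalization",
    "law_citation_accuracy",
    "content_logic",
)


def composite_score(scores: dict, weights: dict) -> float:
    """Weighted average of the five index scores (assumed model form)."""
    missing = [k for k in INDEXES if k not in scores or k not in weights]
    if missing:
        raise KeyError(f"missing index entries: {missing}")
    total_weight = sum(weights[k] for k in INDEXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[k] * weights[k] for k in INDEXES) / total_weight


example_scores = dict(zip(INDEXES, (90.0, 80.0, 70.0, 100.0, 60.0)))
equal_weights = {k: 1.0 for k in INDEXES}
print(composite_score(example_scores, equal_weights))
```

Dividing by the total weight keeps the result on the same scale as the index scores, so experts can adjust individual weights without renormalizing the rest.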

Fig. 2 is a block diagram of an apparatus for evaluating the quality of a referee document according to an embodiment of the present invention, including:

S201, a referee document input module, which supports uploading the target document to be evaluated or retrieving it by entering a case number. Upload mainly supports the doc, docx and txt formats; referee documents in other formats must first be converted into one of these. Retrieval by case number lets a judge enter a case number and automatically obtain the corresponding referee document.

S202, a referee document analysis module, which uses rule-based methods and named entity recognition to parse the basic information, party information, trial process, pleadings content, court opinion content, referee result content and other parts.

S203, a referee document index analysis module, which uses natural language processing and machine learning to analyze the referee document against the five indexes: wrongly written character screening, format integrity, content normalization, law article citation accuracy and content logic.

S204, a referee document index scoring module, which computes each index item of each case according to the calculation logic of that index item, and aggregates the index item results to produce the scoring data for each index of the referee document.

S205, a referee document quality comprehensive evaluation module, which calculates the comprehensive score of the target referee document according to the comprehensive scoring model for referee document quality evaluation.

Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory storing a computer program configured to be executed by the processor, and a processor, the computer program comprising instructions for performing the steps of the inventive method.

Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.

Parts of the invention not described in detail are well known to the person skilled in the art.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that modifications to the technical solutions described in the above embodiments, or equivalent replacements of some of their technical features, do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are included in the scope of protection of the present invention.

Claims

1. A method for evaluating the quality of a referee document is characterized by comprising the following steps:

inputting a target referee document to be evaluated;

analyzing the target referee document;

performing index analysis on the content obtained by analysis, wherein the index analysis comprises wrongly written character screening, format integrity analysis, content normalization analysis, law article citation accuracy analysis and content logic analysis;

calculating the score of the index item according to the result of the index analysis;

calculating the comprehensive evaluation score of the quality of the target referee document according to the scores of the index items;

the wrongly written character screening method comprises the following steps:

1) segmenting the target document S with a word segmenter, and denoting the document elements obtained after segmentation as S_i;

2) calculating the collocation relevance T between a document element S_i and its context, wherein in the calculation of T, K_max is the maximum occurrence probability over word combinations;

4) calculating the probability of S_i given its context, wherein in the calculation S_{i-M} and S_{i+M} denote the words at distance M before and after the current word S_i;

determining, by means of a detection function, whether the text contains wrongly written characters, wherein in the calculation formula of the detection function λ₁ and λ₂ are the weights given to the collocation relevance and the probability, with 0<λ₁<1 and 0<λ₂<1.

2. The method according to claim 1, wherein said entering the target referee document to be evaluated comprises: uploading a referee document; or automatically obtaining the corresponding referee document according to an entered case number.

3. The method of claim 1, wherein said analyzing the target referee document comprises: parsing the basic information, party information, trial process, pleadings content, court opinion content and referee result content separately by combining rule-based recognition with machine learning model recognition.

4. The method of claim 1, wherein the format integrity analysis and the content normalization analysis combine named entity recognition with preset rules to check the format integrity and content normalization of the format, basic information, party information, trial process, pleadings content, court opinion content and referee result content of the target referee document; the law article citation accuracy analysis builds a law article detection model on a Text-CNN model, wherein the target referee document is first segmented into words, word vectors are then trained with a skip-gram model, and the Text-CNN model is then used to check the accuracy of the law article citations; and the content logic analysis uses the collocation relevance T of the text with its context, wherein if T is smaller than a set threshold, the text is marked as having a logic problem.

5. The method according to claim 1, wherein said calculating the index item scores according to the results of the index analysis comprises: obtaining the scoring data of each index item, and computing each index item according to its own calculation logic to obtain the scores of the five index items of the target referee document.

6. The method according to claim 1, wherein said calculating the comprehensive evaluation score of the quality of the target referee document according to the index item scores comprises: establishing a comprehensive evaluation score calculation model for referee document quality evaluation through big data mining analysis combined with expert experience, and calculating the comprehensive evaluation score of the target referee document according to that model.

7. A device for evaluating the quality of a referee document by using the method according to any one of claims 1 to 6, comprising:

a referee document input module, used for entering a target referee document to be evaluated and passing it to the referee document analysis module;

a referee document analysis module, used for parsing the target referee document and passing the parsing result to the referee document index analysis module;

a referee document index analysis module, used for performing index analysis on the parsed content and passing the index analysis result to the referee document index scoring module, wherein the index analysis comprises wrongly written character screening, format integrity analysis, content normalization analysis, law article citation accuracy analysis and content logic analysis;

a referee document index scoring module, used for calculating the index item scores according to the index analysis result and passing the index item scores to the referee document quality comprehensive evaluation module;

and a referee document quality comprehensive evaluation module, used for calculating the comprehensive evaluation score of the quality of the target referee document according to the index item scores.

8. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 6.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 6.