CN111950253B - Evidence information extraction method and device for referee document - Google Patents

Info

Publication number
CN111950253B
Authority
CN
China
Prior art keywords
evidence
text
candidate
information
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010886867.7A
Other languages
Chinese (zh)
Other versions
CN111950253A (en
Inventor
晋耀红
李德彦
刘大双
张志一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dingfu Intelligent Technology Co ltd
Original Assignee
Dingfu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dingfu Intelligent Technology Co ltd
Priority to CN202010886867.7A
Publication of CN111950253A
Application granted
Publication of CN111950253B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides an evidence information extraction method and device for referee documents. The method comprises the following steps: the referee document text is preprocessed; crime fact texts and evidence expression texts are then extracted separately from the preprocessed referee document text; and, for any crime fact text, target evidence information is extracted from the target evidence expression text corresponding to that crime fact text by using a target evidence extraction rule. On the one hand, because the embodiment of the application extracts the target evidence information with a target evidence extraction rule, it is less susceptible to human factors than the prior-art approach of extracting evidence according to human experience, and the accuracy of the acquired evidence information can be improved. On the other hand, the referee document does not need to be analyzed manually, which saves considerable labor and improves the efficiency of evidence extraction.

Description

Evidence information extraction method and device for referee document
Technical Field
The application relates to the technical field of data processing, and in particular to an evidence information extraction method and device for referee documents.
Background
As a legal document, a referee document is long and its wording is obscure and hard to understand, so it is difficult to quickly locate the content that needs careful reading within the whole document. For example, a user who needs to find evidence-related content in a referee document has to read it from the first character, understand what each part sets out, judge which part is likely to contain evidence, and then extract the evidence-related content from that part.
However, extracting evidence information by manually analyzing the referee document in this way is not only time-consuming but is also affected by uncertain factors such as the reader's knowledge and reasoning, which easily leads to evidence information that has low accuracy and little reference value.
Based on this, there is a need for an evidence information extraction method for referee documents that solves the problems that manually extracting evidence information in the prior art is time-consuming and labor-intensive and that the accuracy of the acquired evidence information is low.
Disclosure of Invention
The application provides an evidence information extraction method and device for referee documents, which can be used to solve the technical problems that manually extracting evidence information in the prior art is time-consuming and labor-intensive and that the acquired evidence information tends to have low accuracy.
In a first aspect, an embodiment of the present application provides a method for extracting evidence information for referee documents, where the method includes:
acquiring a referee document text;
preprocessing the referee document text to obtain a preprocessed referee document text;
extracting a crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword;
extracting an evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword;
for any crime fact text, determining a target evidence expression text corresponding to the crime fact text according to a position index corresponding to the crime fact text and a position index corresponding to each evidence expression text;
determining a target evidence extraction rule according to a target evidence type and a preset correspondence between evidence types and evidence extraction rules, wherein each evidence extraction rule is determined according to at least one of a keyword corresponding to the evidence type, a context corresponding to the evidence type and a symbol corresponding to the evidence type;
and extracting target evidence information from the target evidence expression text by using the target evidence extraction rule.
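By way of a non-limiting illustration, the keyword-bounded extraction used for the crime fact text and the evidence expression text can be sketched as follows. This is a minimal sketch, not the implementation of the application: the function name `extract_spans`, the `Span` type, and the choice of a character offset as the "position index" are assumptions made for the example, and the keywords passed in are hypothetical.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # assumed position index: offset of the first character
    end: int    # offset one past the last character
    text: str


def extract_spans(text: str, start_keywords: List[str], end_keywords: List[str]) -> List[Span]:
    """Collect every span that begins at a start keyword and stops
    just before the nearest end keyword that follows it."""
    spans = []
    for start_kw in start_keywords:
        pos = text.find(start_kw)
        while pos != -1:
            body_start = pos + len(start_kw)
            # Offsets of all end keywords found after the start keyword.
            ends = [i for i in (text.find(kw, body_start) for kw in end_keywords) if i != -1]
            if ends:
                end = min(ends)
                spans.append(Span(pos, end, text[pos:end]))
            pos = text.find(start_kw, pos + 1)
    return sorted(spans)
```

The same helper can be called twice, once with the crime fact keywords and once with the evidence expression keywords; each returned span carries the position index that later steps compare.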
With reference to the first aspect, in an implementation manner of the first aspect, extracting target evidence information from the target evidence expression text by using the target evidence extraction rule includes:
extracting a plurality of candidate target evidence information from the target evidence expression text by adopting the target evidence extraction rule;
judging whether the position index of first candidate target evidence information is the same as the position index of second candidate target evidence information, and if the two position indexes are the same, determining either the first candidate target evidence information or the second candidate target evidence information as the target evidence information;
if the position index of the first candidate target evidence information is different from the position index of the second candidate target evidence information, judging whether the first candidate target evidence information and the second candidate target evidence information have a content intersection;
if the first candidate target evidence information and the second candidate target evidence information have a content intersection, merging the first candidate target evidence information and the second candidate target evidence information and taking the merged result as the target evidence information;
and if the first candidate target evidence information and the second candidate target evidence information have no content intersection, determining both the first candidate target evidence information and the second candidate target evidence information as the target evidence information.
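The candidate resolution described above can be illustrated with a short sketch. It assumes, purely for the example, that a position index is a pair of character offsets into the evidence expression text and that a "content intersection" means the two offset ranges overlap; the names `Candidate` and `resolve_candidates` are hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    start: int
    end: int
    text: str


def resolve_candidates(candidates: List[Candidate], source: str) -> List[Candidate]:
    """Keep one of any candidates with the same position index, merge
    candidates whose ranges overlap, and keep the rest unchanged."""
    resolved: List[Candidate] = []
    for cand in sorted(set(candidates)):
        if resolved:
            last = resolved[-1]
            if (cand.start, cand.end) == (last.start, last.end):
                continue  # same position index: keep only one
            if cand.start < last.end:
                # Ranges overlap (assumed meaning of content intersection): merge.
                end = max(last.end, cand.end)
                resolved[-1] = Candidate(last.start, end, source[last.start:end])
                continue
        resolved.append(cand)
    return resolved
```

Sorting by start offset means each candidate only has to be compared with the most recently kept one, so the pairwise comparison of the text reduces to a single pass.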
With reference to the first aspect, in an implementation manner of the first aspect, extracting the crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword includes:
extracting at least one candidate crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword;
determining a first candidate crime fact text and a second candidate crime fact text as the crime fact text if a position index of the first candidate crime fact text is different from a position index of the second candidate crime fact text;
if the position index of the first candidate crime fact text is the same as the position index of the second candidate crime fact text, determining any one of the first candidate crime fact text or the second candidate crime fact text as the crime fact text;
wherein the first candidate crime fact text is any one of the at least one candidate crime fact text, and the second candidate crime fact text is any one of the at least one candidate crime fact text other than the first candidate crime fact text.
With reference to the first aspect, in an implementation manner of the first aspect, extracting the evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword includes:
extracting at least one candidate evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword;
if the position index of a first candidate evidence expression text is different from the position index of a second candidate evidence expression text, determining the first candidate evidence expression text and the second candidate evidence expression text as the evidence expression text;
if the position index of the first candidate evidence expression text is the same as the position index of the second candidate evidence expression text, determining either the first candidate evidence expression text or the second candidate evidence expression text as the evidence expression text;
wherein the first candidate evidence expression text is any one of the at least one candidate evidence expression text, and the second candidate evidence expression text is any one of the at least one candidate evidence expression text other than the first candidate evidence expression text.
With reference to the first aspect, in an implementation manner of the first aspect, the target evidence type is physical evidence, where the physical evidence includes at least one of a document, a record, material evidence, and electronic data;
when the target evidence type is a document, extracting target evidence information by adopting the following steps:
extracting document information from the target evidence expression text according to the keywords corresponding to the document and the symbols corresponding to the document;
when the target evidence type is a record, extracting target evidence information by adopting the following steps:
extracting record information from the target evidence expression text according to the keywords corresponding to the record and the symbols corresponding to the record;
when the target evidence type is material evidence, extracting target evidence information by adopting the following steps:
extracting material evidence information from the target evidence expression text according to the keywords corresponding to the material evidence;
when the target evidence type is electronic data, extracting target evidence information by adopting the following steps:
and extracting electronic data information from the target evidence expression text according to the keywords corresponding to the electronic data and the symbols corresponding to the electronic data.
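As a hedged illustration of a rule built from keywords and symbols, the sketch below splits an evidence expression text at list-separating punctuation and keeps the items that contain a keyword of the requested type. The keyword lists, the delimiter set and the names `KEYWORDS`, `DELIMITERS` and `extract_physical_evidence` are all assumptions for the example; this section does not enumerate the actual keywords or symbols, and for simplicity the sketch applies the symbol split to every type.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists, one per physical evidence type.
KEYWORDS: Dict[str, List[str]] = {
    "document": ["证明", "清单"],
    "record": ["笔录"],
    "material_evidence": ["作案工具", "赃物"],
    "electronic_data": ["聊天记录", "监控录像"],
}

# Hypothetical symbols that separate individual evidence items.
DELIMITERS = "、；;。"


def extract_physical_evidence(evidence_text: str, evidence_type: str) -> List[str]:
    """Split the text at the delimiter symbols and keep the items that
    contain at least one keyword of the given evidence type."""
    items = re.split("[" + re.escape(DELIMITERS) + "]", evidence_text)
    keywords = KEYWORDS[evidence_type]
    return [s.strip() for s in items if s.strip() and any(k in s for k in keywords)]
```

Keeping the keywords in a table keyed by evidence type makes it straightforward to add or tune a type without touching the extraction function.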
With reference to the first aspect, in an implementation manner of the first aspect, the target evidence type is linguistic evidence, and the linguistic evidence includes at least one of witness testimony, a victim's statement, a defendant's confession, and a defendant's defense;
when the target evidence type is witness testimony, extracting target evidence information by adopting the following steps:
extracting witness testimony information from the target evidence expression text according to the name of the witness and the context corresponding to the witness testimony;
when the target evidence type is a victim's statement, extracting target evidence information by adopting the following steps:
extracting the victim's statement information from the target evidence expression text according to the name of the victim and the context corresponding to the victim's statement;
when the target evidence type is a defendant's confession, extracting target evidence information by adopting the following steps:
extracting the defendant's confession information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's confession;
when the target evidence type is a defendant's defense, extracting target evidence information by adopting the following steps:
and extracting the defendant's defense information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's defense.
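A rule that combines a person's name with a context can be sketched as a pattern that anchors on the name, requires a context word after it, and captures text up to the end of the sentence. The context words in `CONTEXT`, the sentence-end assumption and the function name `extract_statement` are hypothetical choices for this example only.

```python
import re
from typing import Dict, List

# Hypothetical context word expected after the person's name for each type.
CONTEXT: Dict[str, str] = {
    "witness_testimony": "证言",
    "victim_statement": "陈述",
    "defendant_confession": "供述",
    "defendant_defense": "辩解",
}


def extract_statement(evidence_text: str, person_name: str, evidence_type: str) -> List[str]:
    """Return every sentence fragment that starts with the person's name,
    is followed by the context word, and runs to the next full stop."""
    pattern = re.escape(person_name) + "的?" + CONTEXT[evidence_type] + "[^。]*。"
    return [m.group(0) for m in re.finditer(pattern, evidence_text)]
```

In this sketch the name would come from an earlier step (for example, the party information of the document); how the names are obtained is outside the scope of the example.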
With reference to the first aspect, in an implementation manner of the first aspect, the target evidence type is authoritative evidence, and the authoritative evidence includes at least one of an appraisal opinion, an appraisal institution name, and an appraisal certificate name;
when the target evidence type is the appraisal opinion, extracting target evidence information by adopting the following steps:
extracting appraisal opinion information from the target evidence expression text according to the keywords corresponding to the appraisal opinion;
when the target evidence type is the appraisal institution name, extracting target evidence information by adopting the following steps:
extracting the appraisal institution name from the appraisal opinion information according to the keywords corresponding to the appraisal institution name;
when the target evidence type is the appraisal certificate name, extracting target evidence information by adopting the following steps:
and extracting the appraisal certificate name from the appraisal opinion information according to the keywords corresponding to the appraisal certificate name and the symbols corresponding to the appraisal certificate name.
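The two-stage extraction above, in which the opinion is located first and the institution name and certificate name are then read from it, can be sketched as follows. The keyword 鉴定, the institution suffixes, the use of book-title marks 《》 as the certificate symbol, and the name `extract_appraisals` are assumptions introduced for illustration.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule: an institution name is a run of Chinese characters
# ending in one of these suffix keywords.
INSTITUTION_RE = re.compile("[\u4e00-\u9fff]+?(?:鉴定中心|鉴定所)")
# Hypothetical rule: a certificate name sits inside book-title marks
# and contains the keyword.
CERTIFICATE_RE = re.compile("《([^》]*鉴定[^》]*)》")


def extract_appraisals(evidence_text: str) -> List[Dict[str, Optional[str]]]:
    """Find sentences containing the opinion keyword, then extract the
    institution name and certificate name from each such sentence."""
    results = []
    for sentence in evidence_text.split("。"):
        if "鉴定" not in sentence:
            continue
        institution = INSTITUTION_RE.search(sentence)
        certificate = CERTIFICATE_RE.search(sentence)
        results.append({
            "opinion": sentence,
            "institution": institution.group(0) if institution else None,
            "certificate": certificate.group(1) if certificate else None,
        })
    return results
```

Searching for the institution and certificate only inside the opinion sentence mirrors the text, which extracts both names from the opinion information rather than from the whole evidence expression text.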
With reference to the first aspect, in an implementation manner of the first aspect, preprocessing the referee document text to obtain a preprocessed referee document text includes:
performing symbol processing on the referee document text;
extracting a trial-finding text from the symbol-processed referee document text according to a preset trial-finding start keyword and a preset trial-finding end keyword;
and taking the trial-finding text as the preprocessed referee document text.
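A minimal sketch of cutting the trial-finding section out of the document is given below. The function name `extract_trial_finding` and the fallback behavior (an empty string when no start keyword is found, and the rest of the document when no end keyword is found) are assumptions of the example; the keywords passed in the usage are hypothetical.

```python
from typing import List


def extract_trial_finding(text: str, start_keywords: List[str], end_keywords: List[str]) -> str:
    """Return the text from the earliest start keyword up to the first
    end keyword that follows it."""
    starts = [i for i in (text.find(kw) for kw in start_keywords) if i != -1]
    if not starts:
        return ""  # assumed fallback: no trial-finding section located
    start = min(starts)
    ends = [i for i in (text.find(kw, start) for kw in end_keywords) if i != -1]
    end = min(ends) if ends else len(text)  # assumed fallback: run to the end
    return text[start:end]
```

Restricting later steps to this section narrows the search for crime facts and evidence to the part of the document where they are set out.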
With reference to the first aspect, in an implementation manner of the first aspect, the performing symbol processing on the referee document text includes:
converting the English punctuation marks in the referee document text into Chinese punctuation marks;
converting a plurality of consecutive newline characters in the referee document text into a single newline character;
and removing the space characters in the referee document text.
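The three symbol-processing operations can be sketched with the Python standard library as follows. The particular punctuation marks in `PUNCT_MAP` and the decision to leave the ASCII full stop unconverted (so that decimal numbers are not altered) are assumptions of this example; the text does not list which symbols are converted.

```python
import re

# Hypothetical mapping from English punctuation to Chinese punctuation.
PUNCT_MAP = str.maketrans({
    ",": "，",
    ";": "；",
    ":": "：",
    "?": "？",
    "!": "！",
    "(": "（",
    ")": "）",
})


def normalize_symbols(text: str) -> str:
    """Apply the three symbol-processing operations in order."""
    text = text.translate(PUNCT_MAP)           # (1) English to Chinese punctuation
    text = re.sub(r"\n{2,}", "\n", text)       # (2) collapse consecutive newlines
    text = re.sub("[ \t\u3000]+", "", text)    # (3) remove spaces, tabs, full-width spaces
    return text
```

Normalizing the symbols first means the keyword and symbol rules applied later only need to handle one punctuation style.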
In a second aspect, an embodiment of the present application provides an evidence information extraction apparatus for referee documents, the apparatus including:
the acquiring unit is used for acquiring a referee document text;
the processing unit is used for preprocessing the referee document text to obtain a preprocessed referee document text; extracting a crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword; extracting an evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword; for any crime fact text, determining a target evidence expression text corresponding to the crime fact text according to the position index corresponding to the crime fact text and the position index corresponding to each evidence expression text; determining a target evidence extraction rule according to a target evidence type and a preset correspondence between evidence types and evidence extraction rules, wherein each evidence extraction rule is determined according to at least one of a keyword corresponding to the evidence type, a context corresponding to the evidence type and a symbol corresponding to the evidence type; and extracting target evidence information from the target evidence expression text by using the target evidence extraction rule.
With reference to the second aspect, in an implementation manner of the second aspect, the processing unit is specifically configured to:
extracting a plurality of candidate target evidence information from the target evidence expression text by adopting the target evidence extraction rule; judging whether the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, and if the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, determining any candidate target evidence information in the first candidate target evidence information or the second candidate target evidence information as target evidence information; and if the position index of the first candidate target evidence information is different from the position index of the second candidate target evidence information, judging whether the first candidate target evidence information and the second candidate target evidence information have content intersection or not; if the first candidate target evidence information and the second candidate target evidence information have content intersection, merging the first candidate target evidence information and the second candidate target evidence information to be used as the target evidence information; and if the first candidate target evidence information and the second candidate target evidence information have no content intersection, determining the first candidate target evidence information and the second candidate target evidence information as the target evidence information.
With reference to the second aspect, in an implementation manner of the second aspect, the processing unit is specifically configured to:
extracting at least one candidate crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword; and determining a first candidate crime fact text and a second candidate crime fact text as the crime fact text if a position index of the first candidate crime fact text is different from a position index of the second candidate crime fact text; and if the position index of the first candidate crime fact text is the same as the position index of the second candidate crime fact text, determining any one of the first candidate crime fact text or the second candidate crime fact text as the crime fact text;
wherein the first candidate crime fact text is any one of the at least one candidate crime fact text, and the second candidate crime fact text is any one of the at least one candidate crime fact text other than the first candidate crime fact text.
With reference to the second aspect, in an implementation manner of the second aspect, the processing unit is specifically configured to:
extracting at least one candidate evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword; and if the position index of a first candidate evidence expression text is different from the position index of a second candidate evidence expression text, determining the first candidate evidence expression text and the second candidate evidence expression text as the evidence expression text; and if the position index of the first candidate evidence expression text is the same as the position index of the second candidate evidence expression text, determining either the first candidate evidence expression text or the second candidate evidence expression text as the evidence expression text;
the first candidate evidence expression text is any one candidate evidence expression text in the at least one candidate evidence expression text, and the second candidate evidence expression text is any one candidate evidence expression text except the first candidate evidence expression text in the at least one candidate evidence expression text.
With reference to the second aspect, in an implementation manner of the second aspect, the target evidence type is physical evidence, and the physical evidence includes at least one of a document, a record, material evidence and electronic data;
when the target evidence type is a document, extracting target evidence information by adopting the following steps:
extracting document information from the target evidence expression text according to the keywords corresponding to the document and the symbols corresponding to the document;
when the target evidence type is a record, extracting target evidence information by adopting the following steps:
extracting record information from the target evidence expression text according to the keywords corresponding to the record and the symbols corresponding to the record;
when the target evidence type is material evidence, extracting target evidence information by adopting the following steps:
extracting material evidence information from the target evidence expression text according to the keywords corresponding to the material evidence;
when the target evidence type is electronic data, extracting target evidence information by adopting the following steps:
and extracting electronic data information from the target evidence expression text according to the keywords corresponding to the electronic data and the symbols corresponding to the electronic data.
With reference to the second aspect, in an implementation manner of the second aspect, the target evidence type is linguistic evidence, and the linguistic evidence includes at least one of witness testimony, a victim's statement, a defendant's confession, and a defendant's defense;
when the target evidence type is witness testimony, extracting target evidence information by adopting the following steps:
extracting witness testimony information from the target evidence expression text according to the name of the witness and the context corresponding to the witness testimony;
when the target evidence type is a victim's statement, extracting target evidence information by adopting the following steps:
extracting the victim's statement information from the target evidence expression text according to the name of the victim and the context corresponding to the victim's statement;
when the target evidence type is a defendant's confession, extracting target evidence information by adopting the following steps:
extracting the defendant's confession information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's confession;
when the target evidence type is a defendant's defense, extracting target evidence information by adopting the following steps:
and extracting the defendant's defense information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's defense.
With reference to the second aspect, in an implementation manner of the second aspect, the target evidence type is authoritative evidence, and the authoritative evidence includes at least one of an appraisal opinion, an appraisal institution name, and an appraisal certificate name;
when the target evidence type is the appraisal opinion, extracting target evidence information by adopting the following steps:
extracting appraisal opinion information from the target evidence expression text according to the keywords corresponding to the appraisal opinion;
when the target evidence type is the appraisal institution name, extracting target evidence information by adopting the following steps:
extracting the appraisal institution name from the appraisal opinion information according to the keywords corresponding to the appraisal institution name;
when the target evidence type is the appraisal certificate name, extracting target evidence information by adopting the following steps:
and extracting the appraisal certificate name from the appraisal opinion information according to the keywords corresponding to the appraisal certificate name and the symbols corresponding to the appraisal certificate name.
With reference to the second aspect, in an implementation manner of the second aspect, the processing unit is specifically configured to:
performing symbol processing on the referee document text; extracting a trial-finding text from the symbol-processed referee document text according to a preset trial-finding start keyword and a preset trial-finding end keyword; and taking the trial-finding text as the preprocessed referee document text.
With reference to the second aspect, in an implementation manner of the second aspect, the processing unit is specifically configured to:
converting the English punctuation marks in the referee document text into Chinese punctuation marks; converting a plurality of consecutive newline characters in the referee document text into a single newline character; and removing the space characters in the referee document text.
In the embodiment of the application, after the referee document text is preprocessed, crime fact texts and evidence expression texts are extracted separately from the preprocessed referee document text, and, for any crime fact text, target evidence information is extracted from the target evidence expression text corresponding to that crime fact text by using a target evidence extraction rule. On the one hand, because the embodiment of the application extracts the target evidence information with a target evidence extraction rule, it is less susceptible to human factors than the prior-art approach of extracting evidence according to human experience, and the accuracy of the acquired evidence information can be improved. On the other hand, the referee document does not need to be analyzed manually, which saves considerable labor and improves the efficiency of evidence extraction.
Drawings
FIG. 1 is a schematic flow diagram corresponding to an evidence information extraction method for referee documents according to an embodiment of the present application;
FIG. 2 is an overall schematic flow chart corresponding to the evidence information extraction method for referee documents according to the embodiment of the present application;
FIG. 3 is a schematic diagram of an evidence information extraction device for referee documents according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
Referring to FIG. 1, a schematic flow diagram corresponding to an evidence information extraction method for referee documents according to an embodiment of the present application is shown. As shown in FIG. 1, the method specifically comprises the following steps:
Step 101, acquiring a referee document text.
Step 102, preprocessing the referee document text to obtain the preprocessed referee document text.
Step 103, extracting the crime fact text from the preprocessed referee document text according to the preset crime fact start keyword and the preset crime fact end keyword.
Step 104, extracting the evidence expression text from the preprocessed referee document text according to the preset evidence expression start keyword and the preset evidence expression end keyword.
Step 105, for any crime fact text, determining a target evidence expression text corresponding to the crime fact text according to the position index corresponding to the crime fact text and the position index corresponding to each evidence expression text.
Step 106, determining a target evidence extraction rule according to the target evidence type and the preset correspondence between evidence types and evidence extraction rules.
Step 107, extracting target evidence information from the target evidence expression text by using the target evidence extraction rule.
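Steps 105 to 107 can be illustrated with the following sketch. How the position indexes are compared is not spelled out in this part of the description, so the example assumes that a crime fact text is paired with the nearest evidence expression text that begins at or after the end of the crime fact text. The `RULES` table, its single toy rule, and the function names are hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, text)


def match_evidence(fact: Span, evidence_spans: List[Span]) -> Optional[Span]:
    """Step 105 (assumed pairing): pick the nearest evidence expression
    text that starts at or after the end of the crime fact text."""
    following = [e for e in evidence_spans if e[0] >= fact[1]]
    return min(following, key=lambda e: e[0]) if following else None


# Step 106: preset correspondence between evidence types and rules.
# Only one toy rule is registered here, for illustration.
RULES: Dict[str, Callable[[str], List[str]]] = {
    "record": lambda text: re.findall("[^、。，]*笔录", text),
}


def extract_for_fact(fact: Span, evidence_spans: List[Span], evidence_type: str) -> List[str]:
    """Step 107: apply the rule for the evidence type to the matched text."""
    target = match_evidence(fact, evidence_spans)
    if target is None:
        return []
    return RULES[evidence_type](target[2])
```

Holding the rules in a lookup table keyed by evidence type keeps step 106 a simple dictionary access, so new evidence types only require a new table entry.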
In the embodiment of the application, after the referee document text is preprocessed, crime fact texts and evidence expression texts are extracted separately from the preprocessed referee document text, and, for any crime fact text, target evidence information is extracted from the target evidence expression text corresponding to that crime fact text by using a target evidence extraction rule. On the one hand, because the embodiment of the application extracts the target evidence information with a target evidence extraction rule, it is less susceptible to human factors than the prior-art approach of extracting evidence according to human experience, and the accuracy of the acquired evidence information can be improved. On the other hand, the referee document does not need to be analyzed manually, which saves considerable labor and improves the efficiency of evidence extraction.
In step 101, the referee document records the process and result of a case trial and is the carrier of the outcome of litigation activity.
Referee documents include judgments, rulings, mediation statements, decisions and the like, and judgments can be further divided into criminal judgments, civil judgments, administrative judgments and the like.
The following describes the directory structure of a referee document, taking a criminal judgment as an example.
The directory structure of a criminal judgment mainly comprises a document head, party information, the trial process, the trial findings, the focus of dispute, the court's opinion, the judgment result, a document tail and an appendix.
The document head includes content such as the court name and the document number.
The party information includes content such as defendant information, defender information, and prosecution information.
The trial process generally describes how the court tried the case.
The trial findings include the facts and reasons on which the judgment is based and the applicable legal basis.
The focus of dispute includes the matters disputed between the parties; it should be noted that a criminal judgment may have no focus of dispute.
The court's opinion includes the court's views.
The judgment result includes the judgment announced by the court.
The document tail includes content such as presiding judge information, judge information, and court clerk information.
The appendix includes the names and article numbers of the laws cited.
In step 102, the referee document text may be preprocessed in various ways; for example, the preprocessing may include symbol processing, text cleaning, text filtering, and the like.
The manner of preprocessing the referee document text is described in detail below with specific examples.
In one example, after symbol processing is performed on the referee document text, the trial findings text is extracted from the processed referee document text according to a preset trial findings start keyword and a preset trial findings end keyword, and the trial findings text is used as the preprocessed referee document text.
Symbol processing may include at least one of the following three operations: (1) converting English symbols in the referee document text into Chinese symbols; (2) converting multiple consecutive line feed symbols in the referee document text into a single line feed symbol; (3) removing space characters from the referee document text.
Further, those skilled in the art may preset the trial findings start keyword and the trial findings end keyword. The trial findings start keywords include "this court finds", "found through trial", "found through trial as follows", etc. The trial findings end keywords include "this court holds", "this court holds as follows", etc. That is, the extracted trial findings text may be the text from "this court finds" (or "found through trial", or "found through trial as follows") to "this court holds" (or "this court holds as follows").
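The symbol processing and keyword-based extraction described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the punctuation table is a small assumed subset, the Chinese keyword strings in the usage note are assumed renderings of the start and end keywords, and the function names `symbol_process` and `extract_between` are hypothetical.

```python
import re

# Illustrative subset of English-to-Chinese punctuation pairs (an assumption).
EN_TO_ZH = str.maketrans({
    ",": "，", ":": "：", ";": "；", "(": "（", ")": "）", "?": "？", "!": "！",
})


def symbol_process(text: str) -> str:
    """Apply the three symbol-processing operations in order."""
    text = text.translate(EN_TO_ZH)            # (1) English symbols -> Chinese symbols
    text = re.sub(r"\n{2,}", "\n", text)       # (2) collapse consecutive line feeds
    text = re.sub(r"[ \t\u3000]+", "", text)   # (3) remove space characters
    return text


def extract_between(text, start_keywords, end_keywords):
    """Return the text from the earliest start keyword up to (not including)
    the first end keyword after it, or None if no start keyword is found."""
    starts = [i for i in (text.find(k) for k in start_keywords) if i != -1]
    if not starts:
        return None
    start = min(starts)
    ends = [i for i in (text.find(k, start) for k in end_keywords) if i != -1]
    end = min(ends) if ends else len(text)
    return text[start:end]
```

For instance, `extract_between(symbol_process(raw), ["本院查明", "经审理查明"], ["本院认为"])` would return the span that opens with the start keyword and stops just before the end keyword. Whether the start keyword itself belongs to the extracted text is a design choice; it is kept here.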
In another example, text cleaning may be performed on the referee document text to remove the appendix, the trial findings text is then extracted from the referee document text with the appendix removed according to the preset trial findings start keyword and the preset trial findings end keyword, and the trial findings text is used as the preprocessed referee document text.
Specifically, the appendix contains the cited laws and legal provisions. Its presence easily affects the structuring of the referee document directory and the accuracy of evidence information extraction, so the appendix is removed from the referee document text by text cleaning.
In the embodiment of the application, the appendix can be removed with regular expressions: an extraction expression is designed according to the regular pattern of appendix information and used to locate the text at the start position of the appendix. The referee document text is then truncated at that start position with a conventional string slicing method, so that only the content from the beginning of the judgment up to the matched start position of the appendix is retained. The retained content is the referee document text with the appendix removed.
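A minimal sketch of this regular-expression-based appendix removal is given below. The pattern `APPENDIX_START` and the function name `remove_appendix` are hypothetical; the pattern only assumes that the appendix opens with a line beginning with a marker such as "附：" or "附录", and a real deployment would need a pattern tuned to the documents at hand.

```python
import re

# Hypothetical pattern for the line that opens the appendix (an assumption).
APPENDIX_START = re.compile(r"^附(?:录|[:：]|相关法律)", re.MULTILINE)


def remove_appendix(text: str) -> str:
    """Keep only the text from the beginning up to the matched appendix start."""
    match = APPENDIX_START.search(text)
    if match is None:
        return text  # no appendix located, leave the text unchanged
    # Conventional string slicing at the located start position.
    return text[:match.start()].rstrip()
```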
In other possible examples, those skilled in the art may also preprocess the referee document text in other ways according to experience and actual conditions. For example, symbol processing may first be performed on the referee document text, text cleaning may then be performed on the processed text to remove the appendix, the trial findings text is extracted from the text with the appendix removed according to the preset trial findings start keyword and the preset trial findings end keyword, and the trial findings text is used as the preprocessed referee document text. This is not specifically limited.
In step 103, since a criminal judgment may contain multiple crime facts, the extracted candidate crime fact texts need further processing: duplicate crime facts are removed, and the remaining texts are used as the crime fact texts.
In a specific implementation, at least one candidate crime fact text can be extracted from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword; it is then judged whether the position index of a first candidate crime fact text is the same as that of a second candidate crime fact text. If the two position indexes are different, both the first candidate crime fact text and the second candidate crime fact text are determined to be crime fact texts; if the two position indexes are the same, either one of the first candidate crime fact text and the second candidate crime fact text is determined to be the crime fact text.
The first candidate crime fact text is any one candidate crime fact text in the at least one candidate crime fact text, and the second candidate crime fact text is any one candidate crime fact text except the first candidate crime fact text in the at least one candidate crime fact text.
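The position-index comparison above can be sketched as follows. The sketch assumes that the position index is the start offset of a candidate in the trial findings text; the names `Candidate` and `dedupe_by_position` are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    position: int  # position index (assumed: start offset in the trial findings text)
    text: str


def dedupe_by_position(candidates):
    """Keep one candidate per position index; candidates at different
    positions are all kept."""
    kept = {}
    for candidate in candidates:
        # Same position index: keep any one of them (here, the first seen).
        kept.setdefault(candidate.position, candidate)
    return [kept[position] for position in sorted(kept)]
```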
Further, those skilled in the art may preset the crime fact start keyword and the crime fact end keyword. The crime fact start keywords include "XX year XX month XX day" and "First, XX year XX month XX day", etc. Here "First", "Second" and "Third" are ordinal numbers used to distinguish different crime facts; they are used in order according to the number of crime facts, and the corresponding crime fact start keywords are, in order, "First, XX year XX month XX day", "Second, XX year XX month XX day", "Third, XX year XX month XX day", and so on.
The crime fact end keywords include "the above facts …… are confirmed by the following evidence accepted by this court:", "the evidence accepted by this court for the above facts …… is as follows:", "the evidence for the above facts …… is as follows:", "the evidence for this crime fact …… is as follows:", "the evidence for the foregoing facts …… is as follows:", etc.
That is, the extracted crime fact text is the text from "XX year XX month XX day" (or "First, XX year XX month XX day", or "Second, XX year XX month XX day") to any one of the crime fact end keywords.
In step 104, in a criminal judgment containing multiple crime facts, each crime fact has a corresponding evidence expression. The extracted candidate evidence expression texts therefore also need further processing, and only after duplicate evidence expressions are removed are they used as the evidence expression texts.
In a specific implementation, at least one candidate evidence expression text can be extracted from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword; it is then judged whether the position index of a first candidate evidence expression text is the same as that of a second candidate evidence expression text. If the position indexes are different, both the first candidate evidence expression text and the second candidate evidence expression text are determined to be evidence expression texts. If the position indexes are the same, it is further judged whether the text contents of the two candidates are the same: if the contents are the same, either one of the two candidates is determined to be the evidence expression text; if the contents are different, the candidate with the greater character length is determined to be the evidence expression text.
The first candidate evidence expression text is any one of at least one candidate evidence expression text, and the second candidate evidence expression text is any one of the at least one candidate evidence expression text except the first candidate evidence expression text.
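A minimal sketch of this two-level comparison (position index first, then content and length) might look as follows. The function name `dedupe_evidence_expressions` is hypothetical, and the position index is again assumed to be a start offset. The source does not say what to do when two candidates at the same position differ in content but have equal length; the sketch simply keeps the first one.

```python
def dedupe_evidence_expressions(candidates):
    """candidates: iterable of (position_index, text) pairs.

    Different positions: keep both. Same position and same content: keep one.
    Same position and different content: keep the longer text.
    """
    best = {}
    for position, text in candidates:
        current = best.get(position)
        if current is None or len(text) > len(current):
            best[position] = text
    return sorted(best.items())
```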
Further, those skilled in the art may preset the evidence expression start keyword and the evidence expression end keyword. The evidence expression start keywords include "the above facts …… are confirmed by the following evidence accepted by this court:", "the evidence accepted by this court for the above facts …… is as follows:", "the evidence for the above facts …… is as follows:", "the evidence for this crime fact …… is as follows:", "the evidence for the foregoing facts …… is as follows:", etc.
The evidence expression end keywords include "verified by this court", "verified by the court", etc.
That is, the extracted evidence expression text is the text from any one of the evidence expression start keywords to "verified by this court" (or "verified by the court").
It should be noted that, as can be seen from the descriptions of the crime fact end keyword in step 103 and the evidence expression start keyword in step 104, the crime fact end keyword corresponds to the evidence expression start keyword, and the two may be identical.
In step 105, if the defendant has only one crime fact, one crime fact text and one evidence expression text can be extracted from the trial findings text, and it is readily understood that this crime fact text corresponds to this evidence expression text.
If the defendant has multiple crime facts, multiple crime fact texts and multiple evidence expression texts can be extracted from the trial findings text, and for any crime fact text, the target evidence expression text corresponding to the crime fact text can be determined according to the position index corresponding to the crime fact text and the position index corresponding to each evidence expression text.
The position index corresponding to a crime fact text refers to the position of the crime fact text in the trial findings text, and the position index corresponding to an evidence expression text refers to the position of the evidence expression text in the trial findings text.
Considering that in a criminal judgment a crime fact is generally followed by its evidence expression, the evidence expression text nearest to a crime fact text, as determined from the position index of the crime fact text and the position index of each evidence expression text, can be taken as the target evidence expression text corresponding to that crime fact text.
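The nearest-position matching can be sketched as below. Because the evidence expression generally follows its crime fact, the sketch interprets "nearest" as the closest evidence expression located at or after the crime fact; this interpretation, the use of start offsets as position indexes, and the function name `match_evidence` are assumptions.

```python
def match_evidence(crime_positions, evidence_positions):
    """Map each crime fact position to the position of the nearest evidence
    expression at or after it, or None if no such expression exists."""
    mapping = {}
    for crime_position in crime_positions:
        following = [p for p in evidence_positions if p >= crime_position]
        mapping[crime_position] = min(following) if following else None
    return mapping
```

Restricting the search to following positions avoids pairing a second crime fact with the evidence expression of the first one, which may lie closer to its start offset than its own evidence expression does.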
In the embodiment of the application, an evidence extraction rule can be determined according to at least one of a keyword corresponding to the evidence type, a context corresponding to the evidence type, and a symbol corresponding to the evidence type. The context corresponding to an evidence type refers to the other language units that appear before or after evidence of that type when it appears in the evidence expression text.
In the embodiment of the application, the evidence types include physical evidence, verbal evidence and authoritative evidence.
The physical evidence includes at least one of documentary evidence, transcripts, material evidence and electronic data; the verbal evidence includes at least one of witness testimony, victim statements, defendant confessions and defendant defenses; the authoritative evidence includes at least one of an appraisal opinion, an appraisal institution name and an appraisal certificate name.
For different types of evidence, the evidence extraction rules are also different, and the evidence extraction rules corresponding to the different evidence types are respectively described below.
(1) Documentary evidence
The evidence extraction rule corresponding to documentary evidence may include keywords corresponding to documentary evidence and symbols corresponding to documentary evidence. The keywords corresponding to documentary evidence include "document ……", "form ……", "description ……", etc. The symbols corresponding to documentary evidence include book-title marks (i.e., 《》).
(2) Transcripts
The evidence extraction rule corresponding to transcripts may include keywords corresponding to transcripts and symbols corresponding to transcripts. The keywords corresponding to transcripts include "transcript", "transcript made", etc. The symbols corresponding to transcripts include a colon (i.e., ":").
(3) Material evidence
The evidence extraction rule corresponding to material evidence may include keywords corresponding to material evidence. These keywords include a start keyword corresponding to material evidence (such as "stolen property seized"), an end keyword corresponding to material evidence (such as "seized according to law"), and item keywords corresponding to material evidence (such as quantity expressions of the form "one ……").
(4) Electronic data
The evidence extraction rule corresponding to electronic data may include keywords corresponding to electronic data and symbols corresponding to electronic data. The keywords corresponding to electronic data include "optical disc", "audio recording", "video", "USB flash drive", etc. The symbols corresponding to electronic data include a colon.
(5) Witness testimony
The evidence extraction rule corresponding to witness testimony may include the witness's name and the context corresponding to witness testimony. The context corresponding to witness testimony is the language units that appear before and after the testimony; for example, "XX (witness) says", "XX (witness) testifies" or "XX (witness) states" generally appears before the testimony, and "XXX (another party) ……" generally appears after it.
In actual implementation, there may be multiple witnesses, so the embodiment of the application can also establish a mapping between each witness and that witness's testimony. For example, witness A states "on X month X day of XX year, I saw the defendant and the victim quarrel in XX square", and witness B states "witness B was with the victim when the victim was harmed". Based on these examples, the following mappings may be established in the embodiment of the application: a mapping between "witness A" and "on X month X day of XX year, I saw the defendant and the victim quarrel in XX square", and a mapping between "witness B" and "witness B was with the victim when the victim was harmed".
In the embodiment of the application, the mapping between witnesses and testimony may be expressed in various forms, for example as a table or as key-value pairs, which is not specifically limited.
(6) Victim statement
The evidence extraction rule corresponding to a victim statement may include the victim's name and the context corresponding to the victim statement. The context corresponding to the victim statement is the language units that appear before and after the statement; for example, "XX (victim) says" generally appears before the statement, and "XXX (another party) ……" generally appears after it.
(7) Defendant's confession
The evidence extraction rule corresponding to a defendant's confession may include the defendant's name and the context corresponding to the confession. The context corresponding to the confession is the language units that appear before and after the confession; for example, "XX (defendant) says" or "XX (defendant) confesses" generally appears before the confession, and "XXX (another party) ……" generally appears after it.
(8) Defendant's defense
The evidence extraction rule corresponding to a defendant's defense may include the defendant's name and the context corresponding to the defense. The context corresponding to the defense is the language units that appear before and after the defense; for example, "XX (defendant) argues in defense" generally appears before the defense, and "XXX (another party) ……" generally appears after it.
(9) Appraisal opinion
The evidence extraction rule corresponding to an appraisal opinion may include keywords corresponding to the appraisal opinion. These keywords include "upon appraisal", "the appraisal is as follows", "the appraisal opinion is as follows", etc.
(10) Appraisal institution name
The evidence extraction rule corresponding to an appraisal institution name may include keywords corresponding to the appraisal institution name.
(11) Appraisal certificate name
The evidence extraction rule corresponding to an appraisal certificate name may include keywords corresponding to the appraisal certificate name and symbols corresponding to the appraisal certificate name. The symbols corresponding to the appraisal certificate name include book-title marks.
It should be noted that the above 11 types of evidence are only exemplary, and those skilled in the art may add other types of evidence according to experience and actual situations, which are not limited in particular.
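As an illustration of the context-based rules and of the witness-to-testimony mapping described under item (5), the following sketch stores the mapping as key-value pairs. It is a simplification under stated assumptions: the preceding context is a witness name followed by a cue word, the following context is taken to be the next "name + cue word" of a listed witness or the end of the text, and the function name `map_witness_testimony` and the English cue words in the usage note are hypothetical.

```python
import re


def map_witness_testimony(text, witnesses, cue_words):
    """Return {witness name: testimony text} as key-value pairs."""
    names = "|".join(re.escape(w) for w in witnesses)
    cues = "|".join(re.escape(c) for c in cue_words)
    # Following context: the next "<name><cue word>" or the end of the text.
    boundary = rf"(?=(?:{names})(?:{cues})|$)"
    mapping = {}
    for witness in witnesses:
        # Preceding context: "<name><cue word>", optionally followed by punctuation.
        pattern = re.escape(witness) + rf"(?:{cues})[:：,，]?\s*(.+?)" + boundary
        match = re.search(pattern, text, re.DOTALL)
        if match:
            mapping[witness] = match.group(1).strip()
    return mapping
```

Called with the witnesses `["Witness A", "Witness B"]` and cue words such as `[" says", " testifies", " states"]`, the function returns a dictionary keyed by witness name; the same data could equally be laid out as a table.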
In step 106, the manner of extracting the target evidence information differs for different types of evidence; the manners corresponding to the different evidence types are described below.
(1) Documentary evidence
When the target evidence type is documentary evidence, the target evidence information can be extracted as follows: documentary evidence information is extracted from the target evidence expression text according to the keywords corresponding to documentary evidence and the symbols corresponding to documentary evidence.
(2) Transcripts
When the target evidence type is a transcript, the target evidence information can be extracted as follows: transcript information is extracted from the target evidence expression text according to the keywords corresponding to transcripts and the symbols corresponding to transcripts.
(3) Material evidence
When the target evidence type is material evidence, the target evidence information can be extracted as follows: material evidence information is extracted from the target evidence expression text according to the keywords corresponding to material evidence.
(4) Electronic data
When the target evidence type is electronic data, the target evidence information can be extracted as follows: electronic data information is extracted from the target evidence expression text according to the keywords corresponding to electronic data and the symbols corresponding to electronic data.
(5) Witness testimony
When the target evidence type is witness testimony, the target evidence information can be extracted as follows: witness testimony information is extracted from the target evidence expression text according to the witness's name and the context corresponding to witness testimony.
(6) Victim statement
When the target evidence type is a victim statement, the target evidence information can be extracted as follows: victim statement information is extracted from the target evidence expression text according to the victim's name and the context corresponding to the victim statement.
(7) Defendant's confession
When the target evidence type is a defendant's confession, the target evidence information can be extracted as follows: the defendant's confession information is extracted from the target evidence expression text according to the defendant's name and the context corresponding to the confession.
(8) Defendant's defense
When the target evidence type is a defendant's defense, the target evidence information can be extracted as follows: the defendant's defense information is extracted from the target evidence expression text according to the defendant's name and the context corresponding to the defense.
(9) Appraisal opinion
When the target evidence type is an appraisal opinion, the target evidence information can be extracted as follows: appraisal opinion information is extracted from the target evidence expression text according to the keywords corresponding to the appraisal opinion.
(10) Appraisal institution name
When the target evidence type is an appraisal institution name, the target evidence information can be extracted as follows: the appraisal institution name is extracted from the appraisal opinion information according to the keywords corresponding to the appraisal institution name.
(11) Appraisal certificate name
When the target evidence type is an appraisal certificate name, the target evidence information can be extracted as follows: the appraisal certificate name is extracted from the appraisal opinion information according to the keywords corresponding to the appraisal certificate name and the symbols corresponding to the appraisal certificate name.
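The lookup of a rule by evidence type followed by its application can be sketched for one type, documentary evidence, as follows. Everything here is illustrative: the `Rule` structure, the registries `RULES` and `EXTRACTORS`, the assumption that keywords act as title suffixes, and the Chinese keyword strings (assumed renderings of "document", "form" and "description") are not taken from the source. Other evidence types would register their own rule and extractor in the same way.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    keywords: tuple = ()
    symbols: tuple = ()


# Hypothetical correspondence between evidence types and extraction rules.
RULES = {
    "documentary evidence": Rule(keywords=("书", "表", "说明"), symbols=("《", "》")),
}


def extract_documentary(text, rule):
    """Return titles enclosed in book-title marks that end with a rule keyword."""
    open_mark, close_mark = rule.symbols
    titles = re.findall(re.escape(open_mark) + r"(.+?)" + re.escape(close_mark), text)
    return [title for title in titles if title.endswith(rule.keywords)]


EXTRACTORS = {"documentary evidence": extract_documentary}


def extract(evidence_type, text):
    rule = RULES[evidence_type]                    # determine the rule by evidence type
    return EXTRACTORS[evidence_type](text, rule)   # apply the rule to the text
```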
Further, because there are multiple evidence extraction rules, the evidence extracted by different evidence extraction rules may be identical. To address this, in the embodiment of the application a deduplication method may be used to screen out duplicate evidence.
Specifically, multiple pieces of candidate target evidence information are extracted from the target evidence expression text by applying the target evidence extraction rule; it is then judged whether the position index of first candidate target evidence information is the same as that of second candidate target evidence information. If the position indexes are the same, either one of the first candidate target evidence information and the second candidate target evidence information is determined to be the target evidence information. If the position indexes are different, it is further judged whether the first candidate target evidence information and the second candidate target evidence information have a content intersection: if they do, the two are merged and used as the target evidence information; if they do not, both are determined to be target evidence information.
It should be noted that there are various ways of judging whether the first candidate target evidence information and the second candidate target evidence information have a content intersection. For example, a similarity judgment may be used: if the similarity between the first candidate target evidence information and the second candidate target evidence information is greater than a preset threshold, the two are determined to have a content intersection; otherwise, they are determined to have no content intersection.
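The deduplication and merging logic can be sketched as follows. The source does not name a similarity measure or a merging procedure, so the sketch makes its own choices, all of them assumptions: similarity is computed with the standard-library `difflib.SequenceMatcher`, the threshold defaults to 0.6, and merging keeps the containing string or joins the two strings on their overlapping suffix and prefix. The names `merge_texts` and `dedupe_evidence` are hypothetical.

```python
from difflib import SequenceMatcher


def merge_texts(a, b):
    """Merge two intersecting strings into one (assumed merge strategy)."""
    if b in a:
        return a
    if a in b:
        return b
    # Join on the longest suffix of a that is also a prefix of b.
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return a + b[size:]
    return a + b


def dedupe_evidence(candidates, threshold=0.6):
    """candidates: list of (position_index, text) pairs."""
    result = []
    for position, text in sorted(candidates):
        for i, (kept_position, kept_text) in enumerate(result):
            if kept_position == position:
                break  # same position index: keep either one (here, the first)
            if SequenceMatcher(None, kept_text, text).ratio() > threshold:
                # Different positions with content intersection: merge.
                result[i] = (kept_position, merge_texts(kept_text, text))
                break
        else:
            # No duplicate and no intersection: keep as separate evidence.
            result.append((position, text))
    return result
```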
To describe the evidence information extraction method for referee documents provided by the embodiment of the application more clearly, the complete flow of the method is described below with reference to Fig. 2, which specifically includes the following steps:
Step 201, acquiring referee document text.
Step 202, preprocessing the referee document text to obtain the preprocessed referee document text.
Step 203, extracting at least one candidate crime fact text from the preprocessed referee document text according to the preset crime fact start keyword and the preset crime fact end keyword.
Step 204, judging whether the position index of the first candidate crime fact text is the same as the position index of the second candidate crime fact text; if the position indexes are different, executing step 205; otherwise, executing step 206.
Step 205, determining the first candidate crime fact text and the second candidate crime fact text as crime fact texts.
Step 206, determining either one of the first candidate crime fact text and the second candidate crime fact text as the crime fact text.
Step 207, extracting at least one candidate evidence expression text from the preprocessed referee document text according to the preset evidence expression start keyword and the preset evidence expression end keyword.
Step 208, judging whether the position index of the first candidate evidence expression text is the same as the position index of the second candidate evidence expression text; if the position indexes are different, executing step 209; otherwise, executing step 210.
Step 209, determining the first candidate evidence expression text and the second candidate evidence expression text as evidence expression texts.
Step 210, determining either one of the first candidate evidence expression text and the second candidate evidence expression text as the evidence expression text.
Step 211, for any crime fact text, determining the target evidence expression text corresponding to the crime fact text according to the position index corresponding to the crime fact text and the position index corresponding to each evidence expression text.
Step 212, determining a target evidence extraction rule according to the target evidence type and the preset correspondence between evidence types and evidence extraction rules.
Step 213, extracting multiple pieces of candidate target evidence information from the target evidence expression text by applying the target evidence extraction rule.
Step 214, judging whether the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information; if the position indexes are the same, executing step 215; otherwise, executing step 216.
Step 215, determining either one of the first candidate target evidence information and the second candidate target evidence information as the target evidence information.
Step 216, judging whether the first candidate target evidence information and the second candidate target evidence information have a content intersection; if they have a content intersection, executing step 217; otherwise, executing step 218.
Step 217, merging the first candidate target evidence information and the second candidate target evidence information, and using the merged result as the target evidence information.
Step 218, determining the first candidate target evidence information and the second candidate target evidence information as target evidence information.
On one hand, because the embodiment of the application extracts the target evidence information with a target evidence extraction rule, it is less susceptible to human factors than the prior-art approach of extracting evidence according to human experience, so the accuracy of the acquired evidence information can be improved; furthermore, the embodiment of the application performs deduplication on the crime fact texts, the evidence expression texts and the evidence information, which further improves the accuracy of the acquired evidence information. On the other hand, the referee document does not need to be analyzed manually, which greatly saves manpower and improves the efficiency of evidence extraction.
In order to more visually describe the evidence information extraction method for referee documents provided by the embodiment of the present application, an exemplary description is made below in connection with a specific example.
One example of an aesthetically ascertained text is as follows:
"via trial find: 1. after the person to be told is somehow old and Shu Mou (arbitrated), peng Mou and Zhang Mou A (processed in a different mode) go to the three-water area of the city of Buddha, the person is told Wang Mou (minor) to rob a XXX brand XXXXX two-wheeled motorcycle after the person to be told is taken at 23.50 minutes on 9.2015, 18.18.9. Through identification, the motorcycle value Renminbi 2445 yuan.
The above facts are confirmed by the following evidence confirmed by court trial evidence and a court:
1. records of records and records of records. The source condition of the scheme is confirmed.
2. Statement of victim Wang Mou (minor) and vehicle sales information.
3. The counselor can know the statement and the dialect of the ancient certain of the counselor and recognize the strokes.
2. At 20:00 on October 4, 2015, the defendant, in collusion with Shu Mou (already sentenced) and Yang Mou C (handled in a separate case), went to the section of road in front of the gate of XXXXX Co., Ltd. on XX Road, XX Town, Sanshui District, Foshan City, and robbed a certain person (a minor) of an XX-brand XXXXX two-wheeled motorcycle. Afterwards, the defendant and Shu Mou hid the motorcycle in a shed at the XXXXX factory in XX Town, Sanshui District, Foshan City. Upon appraisal, the motorcycle was valued at RMB 2,580. After the case was solved, the police recovered the motorcycle and returned it to the victim.
The defendant raised no objection to the above facts during the trial, and they are supported by the following evidence provided by the public prosecution authority: the case acceptance registration form and case-filing decision; the statement and household registration materials of the victim Gong Mou (a minor); the confession and defense of the defendant; the testimony of witnesses Shu Mou (a minor), Yang Mou C (a minor), Gui Mou B and Zhang Mou B; identification records and photos; the on-site inspection record; the seizure decision, seizure list and photos; the property price appraisal conclusion; the receipt for collected items; prior criminal record materials; and the account of the arrest and the household registration certificate. This evidence is sufficient to confirm the facts."
From the above trial finding text, two crime fact texts can be extracted. The first crime fact text is: "1. At 23:50 on September 18, 2015, the defendant Gu Mou, together with Shu Mou (already sentenced), Peng Mou and Zhang Mou A (both handled in a separate case), went to the x department store in x town, Sanshui District, Foshan City, and robbed Wang Mou (a minor) of an XXX-brand XXXX two-wheeled motorcycle. Upon appraisal, the motorcycle was valued at RMB 2,445."
The second crime fact text is: "2. At 20:00 on October 4, 2015, the defendant, in collusion with Shu Mou (already sentenced) and Yang Mou C (handled in a separate case), went to the section of road in front of XXXXX Co., Ltd., XX Town, Sanshui District, Foshan City, and robbed a certain person (a minor) of an XX-brand XXXXX motorcycle. Afterwards, the defendant and Shu Mou hid the motorcycle in a shed at the XXXXX factory in XX Town, Sanshui District, Foshan City. Upon appraisal, the motorcycle was valued at RMB 2,580. After the case was solved, the police recovered the motorcycle and returned it to the victim."
From the above trial finding text, two evidence expression texts can also be extracted. The first evidence expression text is: "The above facts are confirmed by the following evidence, which was presented and cross-examined in court:
1. Case acceptance registration form and case-filing decision, confirming the source of the case.
2. Statement of the victim Wang Mou (a minor) and vehicle sales information.
3. Confession and defense of the defendant Gu Mou, and identification record."
The second evidence expression text is: "The defendant raised no objection to the above facts during the trial, and they are supported by the following evidence provided by the public prosecution authority: the case acceptance registration form and case-filing decision; the statement and household registration materials of the victim Gong Mou (a minor); the confession and defense of the defendant; the testimony of witnesses Shu Mou (a minor), Yang Mou C (a minor), Gui Mou B and Zhang Mou B; identification records and photos; the on-site inspection record; the seizure decision, seizure list and photos; the property price appraisal conclusion; the receipt for collected items; prior criminal record materials; and the account of the arrest and the household registration certificate. This evidence is sufficient to confirm the facts."
Further, for the first crime fact text, applying the evidence extraction rule corresponding to each evidence type to the corresponding first evidence expression text yields the following evidence:
Documentary evidence: case acceptance registration form and case-filing decision.
Record: identification record of the defendant Gu Mou.
Material evidence: vehicle sales information.
Victim's statement: statement of the victim Wang Mou (a minor).
Defendant's confession: confession of the defendant Gu Mou.
Defendant's defense: defense of the defendant Gu Mou.
Similarly, the same method can be used to extract the corresponding evidence from the second evidence expression text, which corresponds to the second crime fact text; the details are not repeated here.
The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.
Fig. 3 is a schematic structural diagram of an evidence information extraction apparatus for referee documents according to an embodiment of the present application. As shown in fig. 3, the apparatus has the function of implementing the evidence information extraction method for referee documents, and this function may be implemented by hardware or by hardware executing corresponding software. The apparatus may include: an acquisition unit 301 and a processing unit 302.
The acquisition unit 301 is configured to acquire a referee document text;
the processing unit 302 is configured to preprocess the referee document text to obtain a preprocessed referee document text; extract a crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword; extract an evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword; for any crime fact text, determine the target evidence expression text corresponding to that crime fact text according to the position index corresponding to the crime fact text and the position index corresponding to each evidence expression text; determine a target evidence extraction rule according to the target evidence type and the preset correspondence between evidence types and evidence extraction rules, where each evidence extraction rule is determined according to at least one of a keyword corresponding to the evidence type, a context corresponding to the evidence type and a symbol corresponding to the evidence type; and extract target evidence information from the target evidence expression text by adopting the target evidence extraction rule.
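The pairing of a crime fact text with its evidence expression text by position index can be sketched as follows. This is only a minimal illustration: it assumes that a position index is a (start, end) character span in the preprocessed text, and that the target evidence expression text is the first one that begins at or after the end of the crime fact text. Both the span representation and this nearest-following rule, as well as the name `match_evidence_span`, are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # hypothetical position index: (start, end) character offsets


def match_evidence_span(fact: Span, evidence_spans: List[Span]) -> Optional[Span]:
    """Return the evidence span that starts closest after the fact span ends."""
    following = [ev for ev in evidence_spans if ev[0] >= fact[1]]
    return min(following, key=lambda ev: ev[0]) if following else None


# Two crime fact spans, each followed by its own evidence expression span.
facts = [(10, 120), (300, 480)]
evidence = [(121, 290), (481, 900)]
pairs = {fact: match_evidence_span(fact, evidence) for fact in facts}
```

With the sample spans, each crime fact is paired with the evidence expression that immediately follows it, which mirrors the layout of the example trial finding text.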
Optionally, the processing unit 302 is specifically configured to:
extracting a plurality of candidate target evidence information from the target evidence expression text by adopting the target evidence extraction rule; judging whether the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, and if the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, determining any candidate target evidence information in the first candidate target evidence information or the second candidate target evidence information as target evidence information; and if the position index of the first candidate target evidence information is different from the position index of the second candidate target evidence information, judging whether the first candidate target evidence information and the second candidate target evidence information have content intersection or not; if the first candidate target evidence information and the second candidate target evidence information have content intersection, merging the first candidate target evidence information and the second candidate target evidence information to be used as the target evidence information; and if the first candidate target evidence information and the second candidate target evidence information have no content intersection, determining the first candidate target evidence information and the second candidate target evidence information as the target evidence information.
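The candidate handling described above (keep one of two candidates with the same position index, merge candidates whose content intersects, keep both otherwise) might be sketched as below. The sketch assumes that a position index is a (start, end) character span and that "content intersection" means overlapping spans; the function name `resolve_candidates` and the sample sentence are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical position index: (start, end) character offsets


def resolve_candidates(text: str, spans: List[Span]) -> List[str]:
    """Deduplicate identical spans, merge overlapping ones, keep disjoint ones."""
    merged: List[Span] = []
    for start, end in sorted(set(spans)):      # set() drops identical position indexes
        if merged and start < merged[-1][1]:   # overlap: treated as content intersection
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:                                  # no intersection: keep as a separate item
            merged.append((start, end))
    return [text[s:e] for s, e in merged]


source = "Statement of victim Wang Mou and vehicle sales information."
candidates = [(0, 28), (0, 28), (13, 28), (33, 58)]
result = resolve_candidates(source, candidates)
```

Here the duplicate span is dropped, the span covering only "victim Wang Mou" is merged into the longer overlapping candidate, and the disjoint candidate is kept as a second item.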
Optionally, the processing unit 302 is specifically configured to:
extracting at least one candidate crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword; and determining a first candidate crime fact text and a second candidate crime fact text as the crime fact text if a position index of the first candidate crime fact text is different from a position index of the second candidate crime fact text; and if the position index of the first candidate crime fact text is the same as the position index of the second candidate crime fact text, determining any one of the first candidate crime fact text or the second candidate crime fact text as the crime fact text;
wherein the first candidate crime fact text is any one of the at least one candidate crime fact text, and the second candidate crime fact text is any one of the at least one candidate crime fact text other than the first candidate crime fact text.
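Deduplicating candidate texts by position index can be illustrated with a short sketch. It assumes each candidate carries a (start, end) span as its position index; the `Candidate` type, the function name and the sample values are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[Tuple[int, int], str]  # (hypothetical position index, extracted text)


def dedupe_by_position(candidates: List[Candidate]) -> List[Candidate]:
    """Keep one candidate per position index, preserving first-seen order."""
    seen = set()
    kept: List[Candidate] = []
    for span, text in candidates:
        if span not in seen:      # different position index: keep this candidate
            seen.add(span)
            kept.append((span, text))
        # same position index as an earlier candidate: only one of them is kept
    return kept


# Two keyword pairs matched the same passage, so it appears twice.
candidates = [((0, 40), "first fact"), ((0, 40), "first fact"), ((95, 180), "second fact")]
unique = dedupe_by_position(candidates)
```

The same helper would apply unchanged to candidate evidence expression texts, since the rule only looks at the position index.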
Optionally, the processing unit 302 is specifically configured to:
extracting at least one candidate evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword; and if the position index of the first candidate evidence expression text is different from the position index of the second candidate evidence expression text, determining the first candidate evidence expression text and the second candidate evidence expression text as the evidence expression text; and if the position index of the first candidate evidence expression text is the same as the position index of the second candidate evidence expression text, determining any one of the first candidate evidence expression text or the second candidate evidence expression text as the evidence expression text;
the first candidate evidence expression text is any one candidate evidence expression text in the at least one candidate evidence expression text, and the second candidate evidence expression text is any one candidate evidence expression text except the first candidate evidence expression text in the at least one candidate evidence expression text.
Optionally, the target evidence type is physical evidence, and the physical evidence comprises at least one of documentary evidence, a record, material evidence and electronic data;
when the target evidence type is documentary evidence, extracting target evidence information by adopting the following steps:
extracting documentary evidence information from the target evidence expression text according to the keywords corresponding to documentary evidence and the symbols corresponding to documentary evidence;
when the target evidence type is a record, extracting target evidence information by adopting the following steps:
extracting record information from the target evidence expression text according to the keywords corresponding to the record and the symbols corresponding to the record;
when the target evidence type is material evidence, extracting target evidence information by adopting the following steps:
extracting material evidence information from the target evidence expression text according to the keywords corresponding to material evidence;
when the target evidence type is electronic data, extracting target evidence information by adopting the following steps:
and extracting electronic data information from the target evidence expression text according to the keywords corresponding to the electronic data and the symbols corresponding to the electronic data.
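A keyword-and-symbol rule of this kind could look like the following sketch. The keyword lists, the delimiter set and the English sample sentence are all hypothetical; a real rule set would be written for Chinese referee documents and their punctuation. For simplicity the sketch also splits on delimiters for material evidence, although the text above names only keywords for that type.

```python
import re
from typing import Dict, List

# Hypothetical keyword lists; they are illustrative only.
KEYWORDS: Dict[str, List[str]] = {
    "documentary evidence": ["registration form", "decision"],
    "record": ["record"],
    "material evidence": ["sales information"],
}
DELIMITERS = r"[;,.]| and "  # hypothetical symbols that separate evidence items


def extract_physical_evidence(evidence_text: str) -> Dict[str, List[str]]:
    """Split the text on delimiter symbols, then classify each item by keyword."""
    items = [part.strip() for part in re.split(DELIMITERS, evidence_text) if part.strip()]
    found: Dict[str, List[str]] = {etype: [] for etype in KEYWORDS}
    for item in items:
        for etype, words in KEYWORDS.items():
            if any(word in item for word in words):
                found[etype].append(item)
                break  # assign each item to the first matching type only
    return found


sample = ("Case acceptance registration form, case-filing decision; "
          "vehicle sales information; identification record.")
extracted = extract_physical_evidence(sample)
```

The symbols do the segmentation and the keywords do the classification, so adding a new evidence type only requires a new keyword list.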
Optionally, the target evidence type is verbal evidence, the verbal evidence including at least one of witness testimony, a victim's statement, a defendant's confession and a defendant's defense;
when the target evidence type is witness testimony, extracting target evidence information by adopting the following steps:
extracting witness testimony information from the target evidence expression text according to the name of the witness and the context corresponding to witness testimony;
when the target evidence type is a victim's statement, the following steps are adopted to extract target evidence information:
extracting victim statement information from the target evidence expression text according to the name of the victim and the context corresponding to the victim's statement;
when the target evidence type is a defendant's confession, the following steps are adopted to extract target evidence information:
extracting defendant confession information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's confession;
when the target evidence type is a defendant's defense, the following steps are adopted to extract target evidence information:
and extracting defendant defense information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's defense.
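Extraction by a person's name plus a context word can be sketched with a regular expression. The pattern shape ("context word of role name", with an optional parenthetical), the function name and the English sample are assumptions for illustration; the actual contexts used for Chinese text are not given in this passage.

```python
import re
from typing import Optional


def extract_verbal_evidence(text: str, role: str, name: str, context_word: str) -> Optional[str]:
    """Find '<context_word> of [the] <role> <name>' with an optional parenthetical."""
    pattern = rf"{context_word} of (?:the )?{role} {re.escape(name)}(?: \([^)]*\))?"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return match.group(0) if match else None


sample = "2. Statement of victim Wang Mou (minor) and vehicle sales information."
statement = extract_verbal_evidence(sample, "victim", "Wang Mou", "statement")
missing = extract_verbal_evidence(sample, "witness", "Li Mou", "testimony")
```

Anchoring on the name keeps the match from spilling into the neighbouring item ("vehicle sales information"), which a keyword-only rule could not guarantee.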
Optionally, the target evidence type is authoritative evidence, the authoritative evidence including at least one of an appraisal opinion, an appraisal institution name and an appraisal certificate name;
when the target evidence type is the appraisal opinion, extracting target evidence information by adopting the following steps:
extracting appraisal opinion information from the target evidence expression text according to the keywords corresponding to the appraisal opinion;
when the target evidence type is the appraisal institution name, extracting target evidence information by adopting the following steps:
extracting the appraisal institution name from the appraisal opinion information according to the keywords corresponding to the appraisal institution name;
when the target evidence type is the appraisal certificate name, extracting target evidence information by adopting the following steps:
and extracting the appraisal certificate name from the appraisal opinion information according to the keyword corresponding to the appraisal certificate name and the symbol corresponding to the appraisal certificate name.
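The two-stage extraction described here (first the opinion, then the institution and certificate names from within the opinion) might be sketched as follows. The keyword "appraisal", the institution suffixes, the use of Chinese title marks 《》 as the certificate symbol, and the sample text are all assumptions made for the example.

```python
import re
from typing import Dict, Optional


def extract_appraisal(evidence_text: str) -> Optional[Dict[str, Optional[str]]]:
    """Locate the opinion sentence by keyword, then pull names out of that sentence."""
    sentences = re.split(r"(?<=[.。])\s+", evidence_text)
    opinion = next((s for s in sentences if "appraisal" in s.lower()), None)
    if opinion is None:
        return None
    # Certificate name: assumed to be enclosed in title marks (the "symbol").
    certificate = re.search(r"《([^》]+)》", opinion)
    # Institution name: assumed to end with a keyword such as Center or Institute.
    institution = re.search(r"by the ([A-Z][\w ]*?(?:Center|Institute))", opinion)
    return {
        "opinion": opinion,
        "certificate": certificate.group(1) if certificate else None,
        "institution": institution.group(1) if institution else None,
    }


sample = ("The case was filed on time. According to the 《Price Appraisal Conclusion》 "
          "issued by the XX Price Certification Center, the motorcycle is valued at RMB 2445.")
appraisal = extract_appraisal(sample)
```

Searching for the names only inside the opinion sentence, rather than the whole evidence expression text, follows the order given above and avoids matching unrelated institutions elsewhere in the text.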
Optionally, the processing unit 302 is specifically configured to:
performing symbol processing on the referee document text; extracting a trial finding text from the symbol-processed referee document text according to a preset trial finding start keyword and a preset trial finding end keyword; and taking the trial finding text as the preprocessed referee document text.
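Start/end keyword extraction of the trial finding text can be sketched as below. The keywords and the sample document are hypothetical, and the choice to include the start keyword while excluding the end keyword is an assumption.

```python
from typing import List, Optional


def extract_between(text: str, start_keywords: List[str], end_keywords: List[str]) -> Optional[str]:
    """Return the text from the earliest start keyword up to the next end keyword."""
    starts = [text.find(k) for k in start_keywords if k in text]
    if not starts:
        return None
    start = min(starts)
    # Look for end keywords only after the start position.
    ends = [pos for pos in (text.find(k, start) for k in end_keywords) if pos != -1]
    end = min(ends) if ends else len(text)  # no end keyword: run to the end of the text
    return text[start:end].strip()


document = ("Header of the judgment. It was found at trial: facts and evidence. "
            "The court holds that the charge is established.")
finding = extract_between(document, ["It was found at trial"], ["The court holds"])
```

The same helper shape would serve for the crime fact and evidence expression keywords, with different keyword lists.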
Optionally, the processing unit 302 is specifically configured to:
converting English symbols in the referee document text into Chinese symbols; converting a plurality of consecutive line feed symbols in the referee document text into a single line feed symbol; and removing space symbols from the referee document text.
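The three symbol-processing operations can be sketched in a few lines. The punctuation mapping is an illustrative subset chosen for this example; the period is deliberately left unmapped here because it also appears in figures such as amounts, which is a design choice of this sketch and not something stated in the embodiment. The sample string is hypothetical.

```python
import re

# Hypothetical mapping from English (half-width) to Chinese (full-width) punctuation.
PUNCT_MAP = str.maketrans({
    ",": "，", ";": "；", ":": "：", "(": "（", ")": "）", "?": "？", "!": "！",
})


def normalize_symbols(text: str) -> str:
    """Unify punctuation, collapse repeated line feeds, and strip space characters."""
    text = text.translate(PUNCT_MAP)           # English symbols -> Chinese symbols
    text = re.sub(r"\n{2,}", "\n", text)       # several line feeds -> one line feed
    text = re.sub(r"[ \t\u3000]+", "", text)   # remove half- and full-width spaces
    return text


raw = "经审理查明: 被告人某某(男),\n\n\n 供述属实;"
clean = normalize_symbols(raw)
```

Normalizing symbols first means the later keyword and symbol rules only need to handle one punctuation style.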
Fig. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application. As shown in fig. 4, an electronic device provided in an embodiment of the present application includes: a memory 401 for storing program instructions; and a processor 402, configured to invoke and execute the program instructions in the memory, so as to implement the evidence information extraction method for referee documents according to the above embodiment.
In this embodiment, the processor 402 and the memory 401 may be connected by a bus or other means. The processor may be a general-purpose processor, such as a central processing unit, a digital signal processor, an application specific integrated circuit, or one or more integrated circuits configured to implement embodiments of the present application. The memory may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk.
The embodiment of the application also provides a storage medium, wherein the storage medium stores a computer program, and when at least one processor of the evidence information extraction device for the referee document executes the computer program, the evidence information extraction device for the referee document executes the evidence information extraction method for the referee document.
The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
It will be apparent to those skilled in the art that the techniques of the embodiments of the present application may be implemented in software plus a necessary general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or in some parts of the embodiments of the present application.
The same or similar parts of the various embodiments in this specification may be referred to one another. In particular, since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the description of the method embodiments for the relevant details.
The embodiments of the present application described above do not limit the scope of the present application.

Claims (9)

1. A method for evidence information extraction for referee documents, the method comprising:
acquiring a referee document text;
preprocessing the referee document text to obtain a preprocessed referee document text;
extracting a crime fact text from the preprocessed referee document text according to a preset crime fact starting keyword and a preset crime fact ending keyword;
extracting an evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword;
for any crime fact text, determining a target evidence expression text corresponding to the crime fact text according to a position index corresponding to the crime fact text and a position index corresponding to each evidence expression text;
determining a target evidence extraction rule according to the target evidence type and the corresponding relation between the preset evidence type and the evidence extraction rule; the evidence extraction rule is determined according to at least one of a keyword corresponding to the evidence type, a context corresponding to the evidence type and a symbol corresponding to the evidence type; and extracting target evidence information from the target evidence expression text by adopting the target evidence extraction rule, wherein the method comprises the following steps:
Extracting a plurality of candidate target evidence information from the target evidence expression text by adopting the target evidence extraction rule;
judging whether the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, and if the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, determining any candidate target evidence information in the first candidate target evidence information or the second candidate target evidence information as target evidence information;
if the position index of the first candidate target evidence information is different from the position index of the second candidate target evidence information, judging whether the first candidate target evidence information and the second candidate target evidence information have content intersection or not;
if the first candidate target evidence information and the second candidate target evidence information have content intersection, merging the first candidate target evidence information and the second candidate target evidence information and then taking the merged first candidate target evidence information and the second candidate target evidence information as the target evidence information;
and if the first candidate target evidence information and the second candidate target evidence information do not have content intersection, determining the first candidate target evidence information and the second candidate target evidence information as the target evidence information.
2. The method according to claim 1, wherein extracting the crime fact text from the preprocessed referee document text based on a preset crime fact start keyword and a preset crime fact end keyword, comprises:
extracting at least one candidate crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword;
determining a first candidate crime fact text and a second candidate crime fact text as the crime fact text if a position index of the first candidate crime fact text is different from a position index of the second candidate crime fact text;
if the position index of the first candidate crime fact text is the same as the position index of the second candidate crime fact text, determining any one of the first candidate crime fact text or the second candidate crime fact text as the crime fact text;
wherein the first candidate crime fact text is any one of the at least one candidate crime fact text, and the second candidate crime fact text is any one of the at least one candidate crime fact text other than the first candidate crime fact text.
3. The method of claim 1, wherein extracting the evidence expression text from the preprocessed referee document text based on a preset evidence expression start keyword and a preset evidence expression end keyword comprises:
extracting at least one candidate evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword;
if the position index of the first candidate evidence expression text is different from that of the second candidate evidence expression text, determining the first candidate evidence expression text and the second candidate evidence expression text as the evidence expression text;
if the position index of the first candidate evidence expression text is the same as the position index of the second candidate evidence expression text, determining any one of the first candidate evidence expression text or the second candidate evidence expression text as the evidence expression text;
the first candidate evidence expression text is any one candidate evidence expression text in the at least one candidate evidence expression text, and the second candidate evidence expression text is any one candidate evidence expression text except the first candidate evidence expression text in the at least one candidate evidence expression text.
4. The method of claim 1, wherein the target evidence type is physical evidence comprising at least one of documentary evidence, a record, material evidence and electronic data;
when the target evidence type is documentary evidence, extracting target evidence information by adopting the following steps:
extracting documentary evidence information from the target evidence expression text according to the keywords corresponding to documentary evidence and the symbols corresponding to documentary evidence;
when the target evidence type is a record, extracting target evidence information by adopting the following steps:
extracting record information from the target evidence expression text according to the keywords corresponding to the record and the symbols corresponding to the record;
when the target evidence type is material evidence, extracting target evidence information by adopting the following steps:
extracting material evidence information from the target evidence expression text according to the keywords corresponding to material evidence;
when the target evidence type is electronic data, extracting target evidence information by adopting the following steps:
and extracting electronic data information from the target evidence expression text according to the keywords corresponding to the electronic data and the symbols corresponding to the electronic data.
5. The method of claim 1, wherein the target evidence type is verbal evidence, the verbal evidence comprising at least one of witness testimony, a victim's statement, a defendant's confession and a defendant's defense;
when the target evidence type is witness testimony, extracting target evidence information by adopting the following steps:
extracting witness testimony information from the target evidence expression text according to the name of the witness and the context corresponding to witness testimony;
when the target evidence type is a victim's statement, the following steps are adopted to extract target evidence information:
extracting victim statement information from the target evidence expression text according to the name of the victim and the context corresponding to the victim's statement;
when the target evidence type is a defendant's confession, the following steps are adopted to extract target evidence information:
extracting defendant confession information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's confession;
when the target evidence type is a defendant's defense, the following steps are adopted to extract target evidence information:
and extracting defendant defense information from the target evidence expression text according to the name of the defendant and the context corresponding to the defendant's defense.
6. The method of claim 1, wherein the target evidence type is authoritative evidence comprising at least one of an appraisal opinion, an appraisal institution name and an appraisal certificate name;
when the target evidence type is the appraisal opinion, extracting target evidence information by adopting the following steps:
extracting appraisal opinion information from the target evidence expression text according to the keywords corresponding to the appraisal opinion;
when the target evidence type is the appraisal institution name, extracting target evidence information by adopting the following steps:
extracting the appraisal institution name from the appraisal opinion information according to the keywords corresponding to the appraisal institution name;
when the target evidence type is the appraisal certificate name, extracting target evidence information by adopting the following steps:
and extracting the appraisal certificate name from the appraisal opinion information according to the keyword corresponding to the appraisal certificate name and the symbol corresponding to the appraisal certificate name.
7. The method of claim 1, wherein preprocessing the referee document text to obtain preprocessed referee document text comprises:
performing symbol processing on the referee document text;
extracting a trial finding text from the symbol-processed referee document text according to a preset trial finding start keyword and a preset trial finding end keyword;
and taking the trial finding text as the preprocessed referee document text.
8. The method of claim 7, wherein performing symbol processing on the referee document text comprises:
converting English symbols in the referee document text into Chinese symbols;
converting a plurality of consecutive line feed symbols in the referee document text into a single line feed symbol;
and removing space symbols from the referee document text.
9. An evidence information extraction apparatus for referee documents, the apparatus comprising:
the acquisition unit is used for acquiring the referee document text;
the processing unit is used for preprocessing the referee document text to obtain a preprocessed referee document text; extracting a crime fact text from the preprocessed referee document text according to a preset crime fact start keyword and a preset crime fact end keyword; extracting an evidence expression text from the preprocessed referee document text according to a preset evidence expression start keyword and a preset evidence expression end keyword; for any crime fact text, determining a target evidence expression text corresponding to the crime fact text according to the position index corresponding to the crime fact text and the position index corresponding to each evidence expression text; determining a target evidence extraction rule according to the target evidence type and the preset correspondence between evidence types and evidence extraction rules; the evidence extraction rule is determined according to at least one of a keyword corresponding to the evidence type, a context corresponding to the evidence type and a symbol corresponding to the evidence type; and extracting target evidence information from the target evidence expression text by adopting the target evidence extraction rule, wherein extracting the target evidence information comprises:
Extracting a plurality of candidate target evidence information from the target evidence expression text by adopting the target evidence extraction rule;
judging whether the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, if the position index of the first candidate target evidence information is the same as the position index of the second candidate target evidence information, determining any candidate target evidence information in the first candidate target evidence information or the second candidate target evidence information as target evidence information;
if the position index of the first candidate target evidence information is different from the position index of the second candidate target evidence information, judging whether the first candidate target evidence information and the second candidate target evidence information have content intersection or not;
if the first candidate target evidence information and the second candidate target evidence information have content intersection, merging the first candidate target evidence information and the second candidate target evidence information and then taking the merged first candidate target evidence information and the second candidate target evidence information as the target evidence information;
and if the first candidate target evidence information and the second candidate target evidence information do not have content intersection, determining the first candidate target evidence information and the second candidate target evidence information as the target evidence information.
CN202010886867.7A 2020-08-28 2020-08-28 Evidence information extraction method and device for referee document Active CN111950253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010886867.7A CN111950253B (en) 2020-08-28 2020-08-28 Evidence information extraction method and device for referee document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010886867.7A CN111950253B (en) 2020-08-28 2020-08-28 Evidence information extraction method and device for referee document

Publications (2)

Publication Number Publication Date
CN111950253A CN111950253A (en) 2020-11-17
CN111950253B CN111950253B (en) 2023-12-08

Family

ID=73367557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010886867.7A Active CN111950253B (en) 2020-08-28 2020-08-28 Evidence information extraction method and device for referee document

Country Status (1)

Country Link
CN (1) CN111950253B (en)

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013182338A (en) * 2012-02-29 2013-09-12 Ubic:Kk Document classification system and document classification method and document classification program
CN106650799A (en) * 2016-12-08 2017-05-10 重庆邮电大学 Electronic evidence classification extraction method and system
CN106815207A (en) * 2015-12-01 2017-06-09 北京国双科技有限公司 For the information processing method and device of law judgement document
CN107632968A (en) * 2017-05-22 2018-01-26 南京大学 A kind of construction method of chain of evidence relational model towards judgement document
CN108763485A (en) * 2018-05-25 2018-11-06 南京大学 A kind of chain of evidence relational model construction method of the judgement document based on text similarity
CN109359175A (en) * 2018-09-07 2019-02-19 平安科技(深圳)有限公司 Electronic device, the method for lawsuit data processing and storage medium
CN109472722A (en) * 2017-09-08 2019-03-15 北京国双科技有限公司 Obtain the method and device that judgement document to be generated finds out section relevant information through trying
CN110246063A (en) * 2018-03-09 2019-09-17 北京国双科技有限公司 A kind of method and device for guiding case trial
CN110457479A (en) * 2019-08-12 2019-11-15 贵州大学 A kind of judgement document's analysis method based on criminal offence chain
CN110516257A (en) * 2019-08-30 2019-11-29 贵州大学 It is a kind of based on Boundary Recognition and combined judgement document's evidence abstracting method
CN111078839A (en) * 2019-12-19 2020-04-28 广州佳都数据服务有限公司 Structured processing method and processing device for referee document
CN111104798A (en) * 2018-10-27 2020-05-05 北京智慧正安科技有限公司 Analysis method, system and computer readable storage medium for criminal plot in legal document
CN111222326A (en) * 2020-01-15 2020-06-02 中科鼎富(北京)科技发展有限公司 Information extraction method and device for referee document
KR20200065348A (en) * 2018-11-30 2020-06-09 한국과학기술원 Method and system for accelerating judgments of documents by clustering arguments and supporting evidence based on credibility distribution
CN111259645A (en) * 2020-01-15 2020-06-09 中科鼎富(北京)科技发展有限公司 Referee document structuring method and device
CN111259631A (en) * 2020-01-15 2020-06-09 中科鼎富(北京)科技发展有限公司 Referee document structuring method and device
CN111291548A (en) * 2020-02-12 2020-06-16 中科鼎富(北京)科技发展有限公司 Method and device for acquiring information from court documents
CN111310446A (en) * 2020-01-15 2020-06-19 中科鼎富(北京)科技发展有限公司 Information extraction method and device for referee document

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107391682B (en) * 2017-07-24 2020-06-09 京东方科技集团股份有限公司 Knowledge verification method, knowledge verification apparatus, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
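Build Evidence Chain Relational Model Based on Chinese Judgment Documents; Kong Siyuan, et al.; Data Science; full text *
Research on an Evidence Extraction Method for Judgment Documents Based on Boundary Recognition and Combination; Yang Jian; Huang Ruizhang; Ding Zhiyuan; Chen Yanping; Qin Yongbin; Journal of Chinese Information Processing (No. 03); full text *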

Also Published As

Publication number Publication date
CN111950253A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
US8468167B2 (en) Automatic data validation and correction
CN109767787B (en) Emotion recognition method, device and readable storage medium
US7739133B1 (en) System and method for processing insurance claims
CN111310446B (en) Information extraction method and device for judge document
CN111104798B (en) Resolution method, system and computer readable storage medium for sentencing episodes in legal documents
CN108153732B (en) Examination method and device for interrogation notes
US20030172030A1 (en) Payee match positive pay banking
CN110110325B (en) Repeated case searching method and device and computer readable storage medium
CN111222326A (en) Information extraction method and device for referee document
CN108897770A (en) A kind of law article name authority and case towards judgement document is by being associated with statistical method with law article
CN112328936A (en) Website identification method, device and equipment and computer readable storage medium
CN110941702A (en) Retrieval method and device for laws and regulations and laws and readable storage medium
CN111078839A (en) Structured processing method and processing device for referee document
CN111950253B (en) Evidence information extraction method and device for referee document
CN111259645A (en) Referee document structuring method and device
CN110955796B (en) Case feature information extraction method and device based on stroke information
CN109388648B (en) Method for extracting personnel information and relation person from electronic record
KR102101456B1 (en) Method for reducing false positives for diagnosis of personal information exposure of text files and irregular image files
CN110991352A (en) File data examination method and device
Hashimoto et al. Solving the Cherry-Picking Problem in Legislative History Use-A Corpus-Based Approach for Empirical Intentionalist Legal Interpretation Analysis
CN110728582A (en) Information processing method, device, storage medium and processor
CN112597763A (en) Method and device for extracting and displaying judicial literature information in association manner and storage medium
CN112507709A (en) Document matching method, electronic device and storage device
CN111753538A (en) Method and device for extracting document elements of divorce dispute referee
CN110765263B (en) Display method and device for search cases

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant