CN110929497B - Method and device for determining document - Google Patents

Method and device for determining document

Info

Publication number
CN110929497B
Authority
CN
China
Prior art keywords
referee document
document
referee
determining
matching rule
Legal status
Active
Application number
CN201811092823.6A
Other languages
Chinese (zh)
Other versions
CN110929497A (en)
Inventor
石鹏
赵健
陈春磊
赵耀
Current Assignee
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201811092823.6A
Publication of CN110929497A
Application granted
Publication of CN110929497B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Technology Law (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a document determination method and device. After the element features corresponding to a first referee document are determined, at least one second referee document having the same element features is acquired, so that second referee documents whose element features differ from those of the first referee document are filtered out, and a second referee document similar to the first referee document is then determined from the second referee documents that share its element features. Because the element features are content capable of indicating the case facts of the first referee document, a similar second referee document determined in this way shares the case facts of the first referee document. Compared with the existing approach of determining similar second referee documents by similarity alone, this can improve the accuracy of the determined documents.

Description

Method and device for determining document
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to a method and an apparatus for determining a document.
Background
With the building and improvement of a rule-of-law society, China's requirements on the judiciary keep rising, and so does the required accuracy of the referee documents it produces. Therefore, after the referee document of a case is generated, referee documents similar to it are pushed, so that judges can check, against the pushed documents, whether the judgment result and the legal provisions relied on in the generated referee document are consistent with those in the pushed documents, and thereby verify the accuracy of the generated referee document.
The current document determination method is as follows: according to the cause of action of a first referee document (which can be regarded as the newly generated referee document), second referee documents with the same cause of action are obtained (these can be regarded as referee documents generated earlier than the first referee document); full-text similarity matching is performed between the first referee document and each second referee document to obtain their similarity; and N second referee documents similar to the first referee document are determined according to these similarities, where N is a natural number greater than 1.
However, to compute the similarity, this method segments the first and second referee documents into words and gives every resulting word the same weight. This amplifies the influence on the similarity of words such as "court", "judgment", "the court holds" and "after trial it was found", which are common to referee documents and appear in nearly every document but are unrelated to the specific case. As a result, the accuracy of the computed similarity drops, and so does the accuracy of the second referee documents obtained from it.
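The effect described above can be illustrated with a minimal sketch. It assumes English stand-in text, whitespace tokenization and raw word counts in which every word has the same weight; the two sample sentences and the BOILERPLATE word list are invented for illustration. The two documents describe different facts (a stolen bicycle versus a forged contract) yet score highly, only because they share routine wording.

```python
import math
from collections import Counter

def cosine_counts(a: str, b: str, ignore: frozenset = frozenset()) -> float:
    """Cosine similarity of raw word-count vectors; every word gets the same weight."""
    ca = Counter(w for w in a.lower().split() if w not in ignore)
    cb = Counter(w for w in b.lower().split() if w not in ignore)
    dot = sum(count * cb[w] for w, count in ca.items())
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical English stand-ins for words found in nearly every referee document.
BOILERPLATE = frozenset({"after", "trial", "it", "was", "found", "that",
                         "the", "court", "holds", "judgment"})

doc1 = "after trial it was found that the defendant stole a bicycle the court holds that judgment"
doc2 = "after trial it was found that the defendant forged a contract the court holds that judgment"

print(round(cosine_counts(doc1, doc2), 3))               # 0.9: routine wording dominates
print(round(cosine_counts(doc1, doc2, BOILERPLATE), 3))  # 0.5: only case-specific words remain
```

Dropping the routine words here only exposes the problem; the invention addresses it differently, by filtering candidates on element features before any similarity is computed.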
Disclosure of Invention
The present invention has been made in view of the above problems. Its object is to provide a document determination method and device that overcome, or at least partially solve, the above problems and improve the accuracy of the determined documents. The technical solution is as follows:
The invention provides a document determination method, comprising the following steps:
determining element features corresponding to a first referee document, where the element features are content capable of indicating the case facts of the first referee document;
acquiring at least one second referee document having the same element features as the first referee document;
determining, from the at least one second referee document, a second referee document similar to the first referee document.
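The three steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the RefereeDocument type, the RULES table, the English sample rule, and the use of word-level Jaccard similarity in the last step are all hypothetical stand-ins.

```python
import re
from dataclasses import dataclass

@dataclass
class RefereeDocument:
    doc_id: str
    cause_of_action: str
    text: str

# Hypothetical rule table: cause of action -> [(compiled pattern, element feature)].
RULES = {
    "theft": [(re.compile(r"(?:three|[3-9]) times.{0,30}household.{0,30}theft"),
               "theft - three or more household thefts")],
}

def element_features(doc):
    """Step 101: the element features whose rule matches the document text."""
    return frozenset(feature for pattern, feature in RULES.get(doc.cause_of_action, [])
                     if pattern.search(doc.text))

def jaccard(a, b):
    """Stand-in similarity: overlap of the two documents' word sets."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def determine_similar(first, library, n=2):
    target = element_features(first)
    # Step 102: keep only documents with the same cause of action and element features.
    candidates = [d for d in library
                  if d.cause_of_action == first.cause_of_action
                  and element_features(d) == target]
    # Step 103: rank the remaining candidates by similarity and keep the top n.
    candidates.sort(key=lambda d: jaccard(first.text, d.text), reverse=True)
    return [d.doc_id for d in candidates[:n]]

first = RefereeDocument("Q", "theft", "in one year the defendant entered three times a household to commit theft")
library = [
    RefereeDocument("B", "theft", "in one year the defendant entered three times a household to commit theft at night"),
    RefereeDocument("C", "theft", "in one year the defendant entered 4 times a household to commit theft"),
    RefereeDocument("D", "theft", "the defendant committed theft once in a shop"),
    RefereeDocument("E", "intentional injury", "the defendant entered a household and caused a minor injury"),
]
print(determine_similar(first, library))  # ['B', 'C']
```

Document D is dropped because it lacks the element feature, and E because its cause of action differs, before any similarity is computed.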
Preferably, determining the element features corresponding to the first referee document comprises: determining, according to the cause of action of the first referee document, the preset feature matching rules corresponding to that cause of action, where the preset feature matching rules are obtained by summarizing the content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, and each preset feature matching rule corresponds to one element feature;
judging whether the first referee document contains content conforming to a preset feature matching rule, and if so, determining the element feature corresponding to that preset feature matching rule as an element feature corresponding to the first referee document.
Preferably, the method further comprises: determining each component of the first referee document, where each component corresponds to at least one preset feature matching rule, and the preset feature matching rules corresponding to any component are obtained by summarizing the content related to case facts in that component of a plurality of second referee documents having the same cause of action as the first referee document;
determining the element features corresponding to the first referee document comprises: determining, according to the cause of action of the first referee document, the preset feature matching rules corresponding to each component of the first referee document;
for any component: judging whether the component contains content conforming to a preset feature matching rule corresponding to that component, and if so, determining the element feature corresponding to that rule as an element feature corresponding to the component;
and determining the element features corresponding to the first referee document according to the element features corresponding to each component.
Preferably, determining the element features corresponding to the first referee document according to the element features corresponding to each component comprises: processing the element features corresponding to each component according to a preset element processing rule to obtain the element features corresponding to the first referee document.
Preferably, determining, from the at least one second referee document, a second referee document similar to the first referee document comprises: calculating the similarity between each second referee document and the first referee document;
and determining a second referee document similar to the first referee document according to those similarities.
The invention also provides a document determination device, comprising:
a first determining unit, configured to determine the element features corresponding to a first referee document, where the element features are content capable of indicating the case facts of the first referee document;
an acquisition unit, configured to acquire at least one second referee document having the same element features as the first referee document;
and a second determining unit, configured to determine, from the at least one second referee document, a second referee document similar to the first referee document.
Preferably, the first determining unit is configured to: determine, according to the cause of action of the first referee document, the preset feature matching rules corresponding to that cause of action; judge whether the first referee document contains content conforming to a preset feature matching rule; and if so, determine the element feature corresponding to that rule as an element feature corresponding to the first referee document. The preset feature matching rules are obtained by summarizing the content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, and each preset feature matching rule corresponds to one element feature.
Preferably, the device further comprises a third determining unit, configured to determine each component of the first referee document, where each component corresponds to at least one preset feature matching rule, and the preset feature matching rules corresponding to any component are obtained by summarizing the content related to case facts in that component of a plurality of second referee documents having the same cause of action as the first referee document;
the first determining unit is configured to: determine, according to the cause of action of the first referee document, the preset feature matching rules corresponding to each component of the first referee document; for any component, judge whether the component contains content conforming to a preset feature matching rule corresponding to that component, and if so, determine the element feature corresponding to that rule as an element feature corresponding to the component; and determine the element features corresponding to the first referee document according to the element features corresponding to each component.
The present invention also provides a storage medium storing a program which, when executed by a processor, implements the above document determination method.
The invention also provides a processor for running a program, where the program, when run, executes the above document determination method.
With the above technical solution, the document determination method and device provided by the invention acquire, once the element features corresponding to a first referee document are determined, at least one second referee document having the same element features. Second referee documents whose element features differ from those of the first referee document are thereby filtered out, and a second referee document similar to the first referee document is determined from those sharing its element features. Because the element features are content indicating the case facts of the first referee document, the similar second referee documents determined in this way share those case facts, which can improve the accuracy of the determined documents compared with the existing approach of determining similar second referee documents by similarity alone (such as text similarity, word frequency, or keyword matching).
The foregoing is only an overview of the technical solution of the present invention. Specific embodiments are described below so that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features and advantages of the invention become more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 illustrates a flowchart of a document determination method provided by an exemplary embodiment of the present disclosure;
FIG. 2 illustrates another flow chart of a document determination method provided by an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram showing a structure of a document determining apparatus provided in an exemplary embodiment of the present disclosure;
FIG. 4 illustrates another structural diagram of a document determining apparatus provided in an exemplary embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to FIG. 1, which shows a flowchart of a document determination method provided by an exemplary embodiment of the present disclosure. The method filters out some of the second referee documents according to the element features corresponding to the first referee document, so as to improve the accuracy of the determined documents, and may include the following steps:
101: Determine the element features corresponding to the first referee document. It will be appreciated that the first referee document is any generated referee document. The document determination method of this embodiment matches the first referee document with similar second referee documents so that at least the accuracy of the first referee document can be verified against them, for example the accuracy of the judgment result given in the first referee document and of the legal provisions on which it relies.
The element features corresponding to the first referee document are content capable of indicating its case facts. To obtain such features, all cases in the judicial field can be classified by cause of action, each case belonging to one cause of action; for example, a trademark infringement case belongs to the trademark infringement cause of action, a patent infringement case to the patent infringement cause of action, and an intentional injury case to the intentional injury cause of action. For each cause of action, the content describing case facts in a plurality of existing referee documents under that cause of action is analyzed, and the important content that shows the uniqueness and representativeness of cases under that cause of action is summarized. Because this content is tied to the case facts (cases under the same cause of action differ in their facts), it can be regarded as the element features of that cause of action. Accordingly, the element features corresponding to the first referee document can be determined according to its cause of action, based on the analysis of the content describing case facts in existing referee documents (such as the second referee documents) under that cause of action.
In this embodiment, one possible way to determine the element features corresponding to the first referee document is: determine, according to the cause of action of the first referee document, the preset feature matching rules corresponding to that cause of action; judge whether the first referee document contains content conforming to a preset feature matching rule; and if so, determine the element feature corresponding to that rule as an element feature corresponding to the first referee document.
The preset feature matching rules are obtained in advance by summarizing the content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, and each preset feature matching rule corresponds to one element feature. That is, each cause of action is preset to correspond to at least one preset feature matching rule; once the cause of action of the first referee document is known, the corresponding preset feature matching rules can be looked up from the correspondence between causes of action and preset feature matching rules.
In other words, besides summarizing the element features of each cause of action, it is also necessary to summarize the preset feature matching rules used to determine those element features. A preset feature matching rule indicates what content in a referee document under that cause of action expresses the case facts, for example at least one of the keywords, keyword combinations and keyword word order used to indicate the case facts, as well as the combination relationships between words indicating the case facts. One form of preset feature matching rule is a regular expression: the keywords indicating the case facts and their combination relationships are predefined in a regular expression to form a rule string; the rule string is then used to judge whether the first referee document contains content conforming to it, and if so, the element feature corresponding to the rule string (that is, to the preset feature matching rule) is determined as an element feature corresponding to the first referee document.
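A rule string of this kind can be assembled from ordered keyword groups. The sketch below is an assumption about one way to do it, not the patent's rule format: build_rule, RULE_TABLE, the English keywords and the 20-character gap are hypothetical. Alternatives within a group express synonyms, the order of the groups expresses word order, and the bounded gap requires the keywords to co-occur close together.

```python
import re

def build_rule(keyword_groups, max_gap=20):
    """Join keyword groups in order; alternatives within a group are OR-ed, and
    consecutive groups may be separated by at most max_gap characters."""
    parts = ["(?:" + "|".join(re.escape(k) for k in group) + ")" for group in keyword_groups]
    return re.compile((".{0,%d}" % max_gap).join(parts), re.DOTALL)

# Hypothetical rule table: cause of action -> [(rule string, element feature)].
RULE_TABLE = {
    "intentional injury": [
        (build_rule([["caused", "inflicted"], ["minor injury"]]), "intentional injury - minor injury"),
        (build_rule([["caused", "inflicted"], ["serious injury"]]), "intentional injury - serious injury"),
    ],
}

def match_element_features(cause_of_action, text):
    """Return the element features whose rule string matches somewhere in text."""
    return {feature for rule, feature in RULE_TABLE.get(cause_of_action, []) if rule.search(text)}

print(match_element_features("intentional injury",
                             "The defendant struck the victim and caused a minor injury."))
# {'intentional injury - minor injury'}
```

Looking rules up by cause of action first means a document is only tested against the rules summarized for its own cause of action.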
The following takes regular expressions used as preset feature matching rules as examples. If the cause of action of the first referee document is intentional injury, one regular expression for this cause of action is: (cause|cause|victim|victim|stab|constitute).{0,20}minor injury (the repeated alternatives are distinct Chinese keywords that translate to the same English word). This regular expression is used to judge whether the first referee document contains conforming content; if so, the element feature corresponding to the regular expression is determined as an element feature corresponding to the first referee document.
If the cause of action of the first referee document is theft, one regular expression for this cause of action is: (1|one) year.{0,30}(three|many|several|[3-9]|four|five|six|seven|eight|nine|ten) times.{0,30}(house|household|residence).{0,30}(theft|theft|theft). This regular expression can likewise be used to judge whether the first referee document contains conforming content; if so, the element feature corresponding to it, "theft - three household thefts within one year", is determined as an element feature corresponding to the first referee document.
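To show how the alternatives in this rule behave, here is a sketch that applies an English rendering of it. The rendering (including the last keyword group), the sample sentences and the function name theft_element_features are assumptions for illustration, since the patent's rule is written over Chinese text. The digit class [3-9] and the spelled-out numbers accept counts such as "three", "5" or "several", while a count such as "twice" does not match.

```python
import re

# English rendering of the theft rule (an assumption: the patent's rule is written
# over Chinese text, and the keyword alternatives here are translations).
THEFT_RULE = re.compile(
    r"(?:1|one) year.{0,30}"
    r"(?:three|many|several|[3-9]|four|five|six|seven|eight|nine|ten) times.{0,30}"
    r"(?:house|household|residence).{0,30}"
    r"(?:theft|stole|steal)"
)
THEFT_FEATURE = "theft - three household thefts within one year"

def theft_element_features(text):
    """Return the element feature if the text satisfies the theft rule, else nothing."""
    return [THEFT_FEATURE] if THEFT_RULE.search(text) else []

print(theft_element_features("Within one year the defendant on 5 times entered a household and stole cash."))
# ['theft - three household thefts within one year']
print(theft_element_features("Within one year the defendant twice entered a household and stole cash."))
# []
```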
The prosecution section is the part of the referee document that records the case as stated by the plaintiff; the court-holds section and the fact-finding section are the parts of the referee document that identify and summarize the case facts.
Note that if the cause of action corresponds to a single preset feature matching rule, one element feature is determined by judging whether the first referee document contains content conforming to that rule, and that element feature is taken as the element feature of the first referee document. If the cause of action corresponds to multiple preset feature matching rules, multiple element features may be determined in this way, and they can all be taken as the element features corresponding to the first referee document. Alternatively, to improve the accuracy of pushing, these element features may be treated as initial element features of the first referee document and processed to obtain its element features.
One possible way to process the initial element features is according to a preset element processing rule, which at least performs de-duplication and conflict processing on them. De-duplication keeps a single copy when identical initial element features occur; conflict processing eliminates logical mutual exclusion relations and inclusion relations among the initial element features.
A logical mutual exclusion relation means that the case facts indicated by several initial element features exclude one another. For example, for a referee document in a motor vehicle traffic accident liability dispute, the initial element features may include "the defendant bears primary responsibility", "the defendant bears secondary responsibility" and "the defendant bears equal responsibility". Since one party can bear only one level of responsibility in such a dispute, initial element features assigning several levels to the same party are mutually exclusive, and one of them must be selected, for example "the defendant bears primary responsibility". An inclusion relation means that one of the case facts indicated by the initial element features includes another. For example, with the initial element features "injured 1 person" and "injured 5 persons", the latter includes the former. Initial element features in an inclusion relation may be merged, or the feature that includes the other may be selected; how to handle them depends on the actual application.
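The preset element processing rule can be sketched as below. The sketch assumes two hypothetical rule tables: groups of mutually exclusive features listed in priority order (the first one present is kept, in line with the example of keeping primary responsibility), and inclusion pairs resolved by keeping the including feature, which is only one of the two options mentioned above. The feature strings are invented examples.

```python
# Hypothetical conflict tables for the preset element processing rule.
MUTUALLY_EXCLUSIVE = [
    # At most one of these may survive; earlier entries take priority.
    ["defendant bears primary responsibility",
     "defendant bears secondary responsibility",
     "defendant bears equal responsibility"],
]
INCLUDES = [
    # (included feature, including feature)
    ("injured 1 person", "injured 5 persons"),
]

def process_initial_features(initial):
    # De-duplication: keep one copy of each feature, in first-seen order.
    features = list(dict.fromkeys(initial))
    # Mutual exclusion: keep only the highest-priority feature present in each group.
    for group in MUTUALLY_EXCLUSIVE:
        present = [f for f in group if f in features]
        for loser in present[1:]:
            features.remove(loser)
    # Inclusion: drop the included feature when the including one is also present.
    for included, including in INCLUDES:
        if included in features and including in features:
            features.remove(included)
    return features

print(process_initial_features([
    "injured 1 person", "defendant bears secondary responsibility",
    "injured 5 persons", "defendant bears primary responsibility", "injured 1 person",
]))
# ['injured 5 persons', 'defendant bears primary responsibility']
```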
102: Acquire at least one second referee document having the same element features as the first referee document, so that second referee documents whose element features differ from those of the first referee document are filtered out. As the regular-expression examples above show, an element feature contains cause-of-action information (for example "theft - three household thefts within one year"), so second referee documents with a different cause of action from the first referee document are also filtered out by the element features. If the element features do not contain cause-of-action information, acquiring at least one second referee document having the same element features as the first referee document means acquiring at least one second referee document having both the same element features and the same cause of action as the first referee document.
For any second referee document, its element features are determined in the same way as those of the first referee document, which is not repeated in this embodiment. After the element features of the second referee documents are determined, the element features of each second referee document are compared with those of the first referee document to determine at least one second referee document having the same element features as the first referee document, for example all such second referee documents.
In this embodiment, any second referee document may be a referee document in a preset document library, which consists of preselected second referee documents. The library may contain second referee documents of one cause of action or of several causes of action, which this embodiment does not limit. When it contains second referee documents of several causes of action, each second referee document is stored in the preset document library together with its corresponding cause of action.
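Step 102 can be sketched as a filter over the preset document library. The sketch assumes that element features have already been extracted for each library document and stored alongside its cause of action, and it interprets "the same element features" as an exact set match; LibraryEntry, same_feature_candidates and the sample entries are hypothetical. The feature used here deliberately carries no cause-of-action information, so the cause of action must be compared as well.

```python
from typing import FrozenSet, List, NamedTuple

class LibraryEntry(NamedTuple):
    doc_id: str
    cause_of_action: str          # stored with each document when the library mixes causes
    features: FrozenSet[str]      # element features extracted in advance

def same_feature_candidates(cause_of_action: str, features: FrozenSet[str],
                            library: List[LibraryEntry]) -> List[str]:
    """Keep library documents whose cause of action and element features both match."""
    return [entry.doc_id for entry in library
            if entry.cause_of_action == cause_of_action and entry.features == features]

REPEAT = frozenset({"repeat offence within one year"})
library = [
    LibraryEntry("A", "theft", REPEAT),
    LibraryEntry("B", "theft", frozenset()),
    LibraryEntry("C", "fraud", REPEAT),
    LibraryEntry("D", "theft", REPEAT),
]
print(same_feature_candidates("theft", REPEAT, library))  # ['A', 'D']
```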
103: Determine, from the at least one second referee document, a second referee document similar to the first referee document. In this embodiment, a feasible way is: calculate the similarity between each second referee document and the first referee document, and determine the second referee documents similar to the first referee document according to these similarities.
The similarity between a second referee document and the first referee document is obtained by full-text matching of the two documents, and may be calculated with a cosine similarity algorithm, the TF-IDF (term frequency-inverse document frequency) algorithm, a distance-based algorithm (e.g. Euclidean distance or Manhattan distance), a similarity coefficient algorithm, and so on.
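As an illustration of the listed options, the sketch below combines TF-IDF weighting with cosine similarity over pre-tokenized documents. The smoothed IDF formula log((1 + n) / (1 + df)) + 1 is one common variant chosen here as an assumption; the patent does not specify a formula, and the sample token lists are invented. Words occurring in every document, such as "court", receive the lowest IDF.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf-idf weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant): rarer terms get larger weights.
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: (count / len(doc)) * idf[term] for term, count in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

first = "defendant stole bicycle court holds".split()
candidates = ["defendant stole bicycle court holds".split(),
              "defendant forged contract court holds".split()]
vecs = tfidf_vectors([first] + candidates)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print([round(s, 3) for s in scores])
```

In this scheme the IDF is computed over the first referee document together with its candidates; computing it over the whole preset document library would be an equally plausible choice.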
Feasible ways of determining the second referee documents similar to the first referee document according to these similarities are: determining the second referee documents whose similarity falls within a preset similarity range as similar documents, where the range may be set according to the practical application and is not limited by this embodiment; or sorting the second referee documents by similarity in descending order and selecting the top N, where N is a natural number greater than 1.
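The two selection strategies can be sketched as follows; the document IDs, similarity scores, range bounds and N are arbitrary example values, since the patent leaves them to the application.

```python
def select_by_range(scores, low, high=1.0):
    """Keep documents whose similarity lies in the preset range [low, high]."""
    return [doc_id for doc_id, s in scores.items() if low <= s <= high]

def select_top_n(scores, n):
    """Sort by similarity in descending order and keep the first n documents."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]

scores = {"A": 0.91, "B": 0.42, "C": 0.77, "D": 0.65}
print(select_by_range(scores, 0.7))  # ['A', 'C']
print(select_top_n(scores, 3))       # ['A', 'C', 'D']
```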
That is, in this embodiment, the second referee documents determined to be similar to the first referee document not only have the same element features as the first referee document; a similarity calculation method such as the TF-IDF algorithm is also introduced to sort those second referee documents, for example in descending order of similarity, and the top N are selected, thereby improving the accuracy of pushing.
After the second referee documents similar to the first referee document are obtained, this embodiment may further push and display them so that judicial personnel can view them.
As can be seen from the above technical solution, once the element features corresponding to the first referee document are determined, at least one second referee document having the same element features is acquired, so that second referee documents with different element features are filtered out, and a second referee document similar to the first referee document is determined from the remaining ones. Because the element features are content indicating the case facts of the first referee document, the similar second referee documents determined in this way share those case facts, which can improve the accuracy of the determined documents compared with the existing approach of determining similar second referee documents by similarity alone (such as text similarity, word frequency, or keyword matching).
It should be noted that each of the regular expressions given above as examples of preset feature matching rules corresponds to particular components of a referee document. The regular expression (cause|cause|victim|victim|thorn|genus).{0,20}light injury corresponds to the "court holds" section and the fact-finding section of the referee document. The regular expression (1|one) year.{0,30}(three|many|number|[3-9]|four|five|six|seven|eight|nine|ten).{0,30}(house|user|residence).{0,30}(theft|theft|theft) corresponds to the claims section, the fact-finding section, and the "court holds" section. In other words, each preset feature matching rule corresponds to one or more components of the referee document. On this basis, the present embodiment further provides a text pushing method, shown in fig. 2, which may include the following steps:
201: Determine each component of the first referee document. Each component is an independent part of the first referee document relating to the trial of the case, and the components may be determined by a pre-established document parsing system. The components include, but are not limited to: the title section, which is the part of the first referee document that records its title information; the court section, which records information about the court trying the case; the defense section, which records the statements of the defendant; the fact-finding section; and the judgment section, which records the court's decision.
Each of the above components corresponds to at least one preset feature matching rule. The preset feature matching rule corresponding to a given component is summarized from the content related to case facts in the same component of multiple second referee documents having the same cause of action as the first referee document. See the method embodiment above for details, which are not repeated here.
202: Determine the preset feature matching rule corresponding to each component of the first referee document according to the cause of action of the first referee document. That is, once the cause of action of the first referee document is known, the rule corresponding to each component is determined from the correspondence between causes of action and preset feature matching rules.
203: For each component, judge whether the component contains content conforming to the preset feature matching rule corresponding to it. If so, determine the element feature corresponding to that rule as the element feature corresponding to the component.
The process of judging whether a component contains content conforming to its preset feature matching rule is similar to the process of judging whether the first referee document contains content conforming to a preset feature matching rule, and is not repeated in this embodiment. Note that some components of the first referee document, such as the title section and the court section, may record content unrelated to the case facts. No preset feature matching rules are set for such components, so the preset feature matching rules determined for them are empty, and their element features are accordingly empty as well. If a component contains no content conforming to its preset feature matching rule, the element feature of that component is likewise empty, or prompt information may be output to prompt manual determination of the element feature.
204: Determine the element features corresponding to the first referee document according to the element features corresponding to each component. One possible way is to process the element features corresponding to each component according to a preset element processing rule to obtain the element features corresponding to the first referee document. The preset element processing rule is at least used to perform deduplication and conflict processing on the element features corresponding to each component. For details, see the description of deduplication and conflict processing of multiple initial element features in the method embodiment above, which is not repeated here.
205: Acquire at least one second referee document having the same element features as the first referee document.
206: Determine, from the at least one second referee document, a second referee document similar to the first referee document.
In this embodiment, steps 205 and 206 are the same as steps 102 and 103 described above and are not repeated here. Once a second referee document similar to the first referee document is determined, it may also be pushed.
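Step 201 leaves the pre-established document parsing system unspecified. Purely as an illustration, the following sketch assumes that each component opens with a recognizable cue phrase and splits the document at those cues. The cue phrases and component names are hypothetical English stand-ins. Everything before the first cue (title and court information) is treated as the title section.

```python
import re

# Hypothetical cue phrases that open each component; a real parsing system would
# use patterns (or models) tuned to actual judgment documents.
SECTION_CUES = [
    ("claims", r"The plaintiff claims"),
    ("defense", r"The defendant argues"),
    ("fact_finding", r"It was found upon trial"),
    ("court_holds", r"The court holds"),
    ("judgment", r"It is hereby adjudged"),
]

# One alternation of named groups; match.lastgroup tells which cue matched.
CUE_PATTERN = re.compile("|".join(f"(?P<{name}>{cue})" for name, cue in SECTION_CUES))


def split_components(text: str) -> dict[str, str]:
    """Split a referee document into components keyed by component name."""
    matches = list(CUE_PATTERN.finditer(text))
    names = ["title"] + [m.lastgroup for m in matches]
    starts = [0] + [m.start() for m in matches]
    ends = starts[1:] + [len(text)]
    components: dict[str, str] = {}
    for name, start, end in zip(names, starts, ends):
        chunk = text[start:end].strip()
        if not chunk:
            continue  # e.g. a document that begins directly with a cue has no title text
        # A component that appears more than once is concatenated.
        components[name] = f"{components[name]} {chunk}" if name in components else chunk
    return components
```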
As the above technical solution shows, once each component of the first referee document is determined, the element features corresponding to the first referee document are determined from the element features corresponding to each of its components. The element features of the first referee document are thus based on the case-fact content recorded in each component. This improves the accuracy of the element features and, in turn, the accuracy of the second referee documents matched on those element features.
Corresponding to the above method embodiment, an exemplary embodiment of the present disclosure further provides a document determining apparatus, whose structure is shown in fig. 3 and which may include a first determining unit 11, an acquisition unit 12, and a second determining unit 13.
The first determining unit 11 is configured to determine the element features corresponding to the first referee document, where the element features are content capable of indicating the case facts of the first referee document. Cases in the judicial field can be classified by cause of action, with each cause of action corresponding to one type of case. For each cause of action, the content describing the case facts in multiple existing referee documents under that cause of action is analyzed. From this analysis, the important content that relates to the case facts and reflects the uniqueness and representativeness of the cause of action (which differs from one cause of action to another) can be summarized and regarded as the element features of that cause of action. The element features of a particular referee document under that cause of action, such as the first referee document, are then determined by analyzing that referee document.
In the present embodiment, one possible way for the first determining unit 11 to determine the element features is as follows. Determine, according to the cause of action of the first referee document, the preset feature matching rule corresponding to that cause of action. Then judge whether the first referee document contains content conforming to the preset feature matching rule, and if so, determine the element feature corresponding to the preset feature matching rule as the element feature corresponding to the first referee document.
The preset feature matching rules are summarized in advance from the content related to case facts in multiple second referee documents having the same cause of action as the first referee document, and each preset feature matching rule corresponds to one element feature. For details and examples, see the related description in the method embodiment, which is not repeated in this embodiment.
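The rule lookup and matching performed by the first determining unit 11 can be sketched as follows, assuming the preset feature matching rules are stored as regular expressions grouped by cause of action. The table contents are hypothetical English renderings modeled on the example expressions given earlier, and `RULES_BY_CAUSE` and `element_features` are illustrative names.

```python
import re

# Hypothetical preset feature matching rules grouped by cause of action.
# Each rule pairs an element feature with a regular expression; the wording
# is illustrative only.
RULES_BY_CAUSE: dict[str, list[tuple[str, str]]] = {
    "intentional injury": [
        ("light injury", r"(cause|victim|stab).{0,20}light injury"),
    ],
    "theft": [
        ("repeated burglary within one year",
         r"(1|one) year.{0,30}(three|several|[3-9]).{0,30}(house|household|residence).{0,30}theft"),
    ],
}


def element_features(document: str, cause_of_action: str) -> list[str]:
    """Return the element features whose rules match somewhere in the document."""
    rules = RULES_BY_CAUSE.get(cause_of_action, [])  # unknown cause of action -> no rules
    return [feature for feature, pattern in rules if re.search(pattern, document)]
```

Keying the table by cause of action means that only the rules summarized for that cause of action are applied. A theft rule is never tried against an injury case.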
The acquisition unit 12 is configured to acquire at least one second referee document having the same element features as the first referee document, so that second referee documents whose element features differ from those of the first referee document are filtered out by means of the element features. For details, see the related description in the method embodiment, which is not repeated in this embodiment.
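The filtering done by the acquisition unit 12 can be sketched with an index keyed by element-feature set. This sketch assumes that "the same element features" means an identical set of features. A looser reading, such as requiring only a superset, would need a different lookup. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


class FeatureIndex:
    """Hypothetical index of second referee documents keyed by their element-feature set."""

    def __init__(self) -> None:
        self._by_features: defaultdict[frozenset[str], list[str]] = defaultdict(list)

    def add(self, doc_id: str, features: Iterable[str]) -> None:
        # frozenset makes the feature set hashable and order-independent.
        self._by_features[frozenset(features)].append(doc_id)

    def same_features(self, features: Iterable[str]) -> list[str]:
        # .get() avoids inserting an empty entry for unseen feature sets.
        return list(self._by_features.get(frozenset(features), []))
```

Precomputing the index turns the filtering into a single dictionary lookup per first referee document instead of a scan over the whole corpus.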
The second determining unit 13 is configured to determine, from the at least one second referee document, a second referee document similar to the first referee document. In this embodiment, a feasible way to do so is to calculate the similarity between each second referee document and the first referee document, and then determine the second referee documents similar to the first referee document according to these similarities. For details, see the related description in the method embodiment, which is not repeated in this embodiment.
Once second referee documents similar to the first referee document are obtained, this embodiment may further push them and display them so that legal professionals can review them.
As the above technical solution shows, once the element features corresponding to the first referee document are determined, at least one second referee document having the same element features is acquired. Second referee documents whose element features differ from those of the first referee document are thereby filtered out, and the second referee documents similar to the first referee document are determined from the remainder. The similar second referee documents share the element features of the first referee document, and element features are content indicating the case facts of the first referee document. The similar second referee documents therefore describe the same case facts as the first referee document. Compared with existing methods that determine similar documents by similarity alone (such as text similarity, word frequency, or keyword matching), this improves the accuracy of the determined documents.
Referring to fig. 4, another structure of the document determining apparatus provided in the exemplary embodiment of the present disclosure may, on the basis of fig. 3, further include a third determining unit 10, configured to determine each component of the first referee document. Each component corresponds to at least one preset feature matching rule. The preset feature matching rule corresponding to any component is summarized from content related to case facts in the same component of second referee documents having the same cause of action as the first referee document. For details, see the method embodiment above, which is not repeated here.
Correspondingly, the first determining unit 11 is configured to perform the following operations. First, determine, according to the cause of action of the first referee document, the preset feature matching rule corresponding to each component of the first referee document. Next, for each component, judge whether the component contains content conforming to its preset feature matching rule, and if so, determine the element feature corresponding to that rule as the element feature corresponding to the component. Finally, determine the element features corresponding to the first referee document according to the element features corresponding to each component. For details, see the method embodiment, which is not repeated here.
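The per-component processing performed by the first determining unit 11 (steps 202 and 203 of the method) might look like the following sketch. The rule table, component names, and English regex wording are hypothetical. Components without rules, such as the title section, yield an empty element feature. Components whose rules match nothing are also flagged for manual determination, which is one of the two options the method embodiment allows.

```python
import re

# Hypothetical per-component rules for one cause of action. Components that
# record no case facts (title, court) deliberately have no entry, i.e. no rules.
COMPONENT_RULES: dict[str, dict[str, list[tuple[str, str]]]] = {
    "intentional injury": {
        "fact_finding": [("light injury", r"(cause|victim|stab).{0,20}light injury")],
        "court_holds": [("light injury", r"(cause|victim|stab).{0,20}light injury")],
    },
}


def component_features(
    components: dict[str, str], cause_of_action: str
) -> tuple[dict[str, list[str]], list[str]]:
    """Return (element features per component, components needing manual review)."""
    table = COMPONENT_RULES.get(cause_of_action, {})  # step 202: rules by cause of action
    features: dict[str, list[str]] = {}
    needs_review: list[str] = []
    for name, text in components.items():
        rules = table.get(name, [])  # empty rule list -> empty element feature
        # Step 203: keep the feature of every rule that matches this component.
        matched = [feature for feature, pattern in rules if re.search(pattern, text)]
        features[name] = matched
        if rules and not matched:
            # Rules exist but nothing matched: flag for manual determination.
            needs_review.append(name)
    return features, needs_review
```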
A feasible way for the first determining unit 11 to determine the element features corresponding to the first referee document from the element features corresponding to each component is to process those element features according to a preset element processing rule. The preset element processing rule is at least used to perform deduplication and conflict processing on the element features corresponding to each component. For details, see the description of deduplication and conflict processing of multiple initial element features in the method embodiment above, which is not repeated in this embodiment.
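The preset element processing rule is not spelled out here beyond deduplication and conflict processing. The sketch below assumes a hypothetical override table for conflicts, in which a feature such as "serious injury" suppresses a conflicting feature such as "light injury". The actual conflict rules would come from the method embodiment, and `OVERRIDES` and `merge_features` are illustrative names.

```python
# Hypothetical conflict table: if the key feature is present, the value feature is dropped.
OVERRIDES: dict[str, str] = {"serious injury": "light injury"}


def merge_features(per_component: dict[str, list[str]]) -> list[str]:
    """Combine per-component element features into the document's element features."""
    merged: list[str] = []
    for features in per_component.values():
        for feature in features:
            if feature not in merged:  # deduplication, keeping first-seen order
                merged.append(feature)
    # Conflict processing: remove every feature overridden by another one that is present.
    suppressed = {OVERRIDES[f] for f in merged if f in OVERRIDES}
    return [f for f in merged if f not in suppressed]
```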
As the above technical solution shows, once each component of the first referee document is determined, the element features corresponding to the first referee document are determined from the element features corresponding to each of its components. The element features of the first referee document are thus based on the case-fact content recorded in each component. This improves the accuracy of the element features and, in turn, the accuracy of the second referee documents matched on those element features.
The document determining apparatus comprises a processor and a memory. The first determining unit, the acquisition unit, the second determining unit, the third determining unit, and so on are stored in the memory as program units, and the processor executes these program units to implement the corresponding functions.
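To illustrate how the program units relate, here is a hypothetical sketch that models each unit as a callable held by an apparatus object and runs them in the order of the method. The class and field names are assumptions. The third determining unit 10 of fig. 4 is not modeled separately, because it could be folded into the callable supplied as unit 11.

```python
from dataclasses import dataclass
from typing import Callable

Features = frozenset[str]


@dataclass
class DocumentDeterminingApparatus:
    """Hypothetical wiring of the program units; each field stands in for one unit."""

    first_determining_unit: Callable[[str], Features]               # unit 11: document -> element features
    acquisition_unit: Callable[[Features], list[str]]               # unit 12: features -> candidate ids
    second_determining_unit: Callable[[str, list[str]], list[str]]  # unit 13: pick similar ids

    def determine(self, document: str) -> list[str]:
        # Run the units in method order: features, candidates, then similar documents.
        features = self.first_determining_unit(document)
        candidates = self.acquisition_unit(features)
        return self.second_determining_unit(document, candidates)
```

Passing the units in as callables keeps each one replaceable. For example, a TF-IDF ranker and a different similarity measure could be swapped in as unit 13 without touching the others.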
The processor includes a kernel, which fetches the corresponding program unit from the memory. One or more kernels may be provided, and the accuracy of the determined document is improved by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory, for example read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.
An embodiment of the present invention provides a storage medium storing a program which, when executed by a processor, implements the document determination method.
An embodiment of the present invention provides a processor for running a program, wherein the program, when running, performs the document determination method.
An embodiment of the present invention provides a device comprising a processor, a memory, and a program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the program:
determining element features corresponding to a first referee document, wherein the element features are content capable of indicating the case facts of the first referee document;
acquiring at least one second referee document having the same element features as the first referee document;
determining, from the at least one second referee document, a second referee document similar to the first referee document.
Preferably, the determining the element features corresponding to the first referee document includes: determining, according to the cause of action of the first referee document, a preset feature matching rule corresponding to that cause of action, wherein the preset feature matching rule is obtained by summarizing content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, and each preset feature matching rule corresponds to an element feature;
judging whether the first referee document contains content conforming to the preset feature matching rule, and if so, determining the element feature corresponding to the preset feature matching rule as the element feature corresponding to the first referee document.
Preferably, when executing the program, the processor further implements the following step: determining each component of the first referee document, wherein each component corresponds to at least one preset feature matching rule, and the preset feature matching rule corresponding to any component is obtained by summarizing content related to case facts in that component of a plurality of second referee documents having the same cause of action as the first referee document;
The determining the element features corresponding to the first referee document includes: determining, according to the cause of action of the first referee document, a preset feature matching rule corresponding to each component of the first referee document;
for each component: judging whether the component contains content conforming to the preset feature matching rule corresponding to the component, and if so, determining the element feature corresponding to that preset feature matching rule as the element feature corresponding to the component;
and determining the element features corresponding to the first referee document according to the element features corresponding to each component.
Preferably, the determining the element features corresponding to the first referee document according to the element features corresponding to each component includes: processing the element features corresponding to each component according to a preset element processing rule to obtain the element features corresponding to the first referee document.
Preferably, the determining, from the at least one second referee document, a second referee document similar to the first referee document includes: calculating the similarity between each second referee document and the first referee document;
and determining a second referee document similar to the first referee document according to the similarity between each second referee document and the first referee document.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, or the like.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
determining element features corresponding to a first referee document, wherein the element features are content capable of indicating the case facts of the first referee document;
acquiring at least one second referee document having the same element features as the first referee document;
determining, from the at least one second referee document, a second referee document similar to the first referee document.
Preferably, the determining the element features corresponding to the first referee document includes: determining, according to the cause of action of the first referee document, a preset feature matching rule corresponding to that cause of action, wherein the preset feature matching rule is obtained by summarizing content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, and each preset feature matching rule corresponds to an element feature;
judging whether the first referee document contains content conforming to the preset feature matching rule, and if so, determining the element feature corresponding to the preset feature matching rule as the element feature corresponding to the first referee document.
Preferably, when executed on a data processing device, the program is further adapted to execute a program initialized with the following method step: determining each component of the first referee document, wherein each component corresponds to at least one preset feature matching rule, and the preset feature matching rule corresponding to any component is obtained by summarizing content related to case facts in that component of a plurality of second referee documents having the same cause of action as the first referee document;
the determining the element features corresponding to the first referee document includes: determining, according to the cause of action of the first referee document, a preset feature matching rule corresponding to each component of the first referee document;
for each component: judging whether the component contains content conforming to the preset feature matching rule corresponding to the component, and if so, determining the element feature corresponding to that preset feature matching rule as the element feature corresponding to the component;
and determining the element features corresponding to the first referee document according to the element features corresponding to each component.
Preferably, the determining the element features corresponding to the first referee document according to the element features corresponding to each component includes: processing the element features corresponding to each component according to a preset element processing rule to obtain the element features corresponding to the first referee document.
Preferably, the determining, from the at least one second referee document, a second referee document similar to the first referee document includes: calculating the similarity between each second referee document and the first referee document;
and determining a second referee document similar to the first referee document according to the similarity between each second referee document and the first referee document.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (8)

1. A method of document determination, the method comprising:
determining, according to the cause of action of a first referee document, a preset feature matching rule corresponding to the cause of action of the first referee document, wherein the preset feature matching rule is obtained by summarizing content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, each preset feature matching rule corresponds to an element feature, and the preset feature matching rule indicates at least one of keywords, keyword combinations, and keyword sequences for indicating the case facts, and a combination relationship among words indicating the case facts;
judging whether the first referee document contains content conforming to the preset feature matching rule, and if so, determining the element feature corresponding to the preset feature matching rule as the element feature corresponding to the first referee document, wherein the element feature is content capable of indicating the case facts of the first referee document;
acquiring at least one second referee document having the same element characteristics as the first referee document;
determining, from the at least one second referee document, a second referee document similar to the first referee document.
2. The method according to claim 1, further comprising: determining each component of the first referee document, wherein each component corresponds to at least one preset feature matching rule, and the preset feature matching rule corresponding to any component is obtained by summarizing content related to case facts in that component of a plurality of second referee documents having the same cause of action as the first referee document;
wherein the determining the element feature corresponding to the first referee document comprises: determining, according to the cause of action of the first referee document, a preset feature matching rule corresponding to each component of the first referee document;
for each component: judging whether the component contains content conforming to the preset feature matching rule corresponding to the component, and if so, determining the element feature corresponding to that preset feature matching rule as the element feature corresponding to the component;
and determining the element feature corresponding to the first referee document according to the element feature corresponding to each component.
3. The method according to claim 2, wherein determining the element feature corresponding to the first referee document according to the element feature corresponding to each component comprises: processing the element feature corresponding to each component according to a preset element processing rule to obtain the element feature corresponding to the first referee document.
4. The method according to claim 1, wherein determining, from the at least one second referee document, a second referee document similar to the first referee document comprises: calculating the similarity between each second referee document and the first referee document;
and determining a second referee document similar to the first referee document according to the similarity between each second referee document and the first referee document.
5. A document determining apparatus, the apparatus comprising:
a first determining unit, configured to determine an element feature corresponding to a first referee document, where the element feature is a content capable of indicating a case fact of the first referee document;
an acquisition unit configured to acquire at least one second referee document having the same element characteristics as the first referee document;
a second determining unit configured to determine a second referee document similar to the first referee document from the at least one second referee document;
the first determining unit is configured to determine, according to the cause of action of the first referee document, the preset feature matching rule corresponding to that cause of action, to judge whether the first referee document contains content that conforms to the preset feature matching rule and, if so, to determine the element feature corresponding to the preset feature matching rule as the element feature of the first referee document; the preset feature matching rule is obtained by summarizing the content related to case facts in a plurality of second referee documents having the same cause of action as the first referee document, each preset feature matching rule corresponds to one element feature, and the preset feature matching rule indicates a keyword, a keyword combination, and at least one keyword sequence that indicate the case facts, together with a combination relationship between words that indicate the case facts.
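Claim 5 says a preset feature matching rule indicates keywords, keyword combinations, keyword sequences, and a combination relationship between words. A hypothetical data structure for such a rule might look like this; the field names (`all_of`, `any_of`, `ordered`, `none_of`) and the AND/OR/NOT reading of "combination relationship" are assumptions made for illustration. Plain substring search is used because Chinese text has no word delimiters; on English text it would also match inside longer words.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordRule:
    """Hypothetical encoding of a preset feature matching rule.

    all_of:  keyword combination; every keyword must occur (AND)
    any_of:  at least one keyword must occur (OR); empty means no constraint
    ordered: keyword sequence; keywords must occur in this order
    none_of: none of these keywords may occur (NOT)
    """
    element_feature: str
    all_of: tuple[str, ...] = ()
    any_of: tuple[str, ...] = ()
    ordered: tuple[str, ...] = ()
    none_of: tuple[str, ...] = ()

    def matches(self, text: str) -> bool:
        if not all(k in text for k in self.all_of):
            return False
        if self.any_of and not any(k in text for k in self.any_of):
            return False
        if any(k in text for k in self.none_of):
            return False
        return _in_order(text, self.ordered)


def _in_order(text: str, keywords: tuple[str, ...]) -> bool:
    """True if each keyword occurs after the end of the previous one."""
    pos = 0
    for k in keywords:
        i = text.find(k, pos)
        if i < 0:
            return False
        pos = i + len(k)
    return True


def features_for(text: str, rules: list[KeywordRule]) -> set[str]:
    """Element features of every rule that the text satisfies."""
    return {r.element_feature for r in rules if r.matches(text)}


if __name__ == "__main__":
    rule = KeywordRule("hit and run", all_of=("collision",),
                       ordered=("collision", "fled"), none_of=("returned",))
    print(rule.matches("after the collision the driver fled"))   # True
    print(rule.matches("the driver fled before the collision"))  # False
```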
6. The apparatus of claim 5, wherein the apparatus further comprises: a third determining unit configured to determine each component part of the first referee document, where each component part corresponds to at least one preset feature matching rule, and the preset feature matching rule corresponding to any component part is obtained by summarizing the content related to case facts in the same component part of a plurality of second referee documents having the same cause of action as the first referee document;
the first determining unit is configured to determine, according to the cause of action of the first referee document, the preset feature matching rule corresponding to each component part of the first referee document; for each component part, to judge whether the component part contains content that conforms to its preset feature matching rule and, if so, to determine the element feature corresponding to that rule as the element feature of the component part; and to determine the element features of the first referee document according to the element features of the component parts.
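Claim 6 adds a unit that first determines the component parts of the document but does not say how. A minimal sketch, assuming each component part begins with a fixed marker phrase, is shown below; the `MARKERS` list and its English phrases are hypothetical stand-ins for whatever template cues real judgment documents use.

```python
from __future__ import annotations

# Hypothetical marker phrases that open each component part; real documents
# would need markers tuned to their actual template.
MARKERS: list[tuple[str, str]] = [
    ("facts", "Upon trial, it is found that"),
    ("reasoning", "This court holds that"),
    ("ruling", "The judgment is as follows"),
]


def split_components(text: str) -> dict[str, str]:
    """Split a judgment into component parts at the first occurrence of each marker.

    Text before the first marker found is returned as 'header';
    markers that do not occur are skipped.
    """
    found = sorted(
        (text.find(marker), name, marker)
        for name, marker in MARKERS
        if marker in text
    )
    start = found[0][0] if found else len(text)
    parts: dict[str, str] = {"header": text[:start].strip()}
    for i, (pos, name, marker) in enumerate(found):
        end = found[i + 1][0] if i + 1 < len(found) else len(text)
        # Drop the marker itself and any punctuation that directly follows it.
        parts[name] = text[pos + len(marker):end].lstrip(" :,").strip()
    return parts


if __name__ == "__main__":
    doc = ("Plaintiff A v. Defendant B. Upon trial, it is found that the defendant fled. "
           "This court holds that the defendant is liable. "
           "The judgment is as follows: pay 100 yuan.")
    for name, body in split_components(doc).items():
        print(f"{name}: {body}")
```

The returned mapping from component name to text is the kind of input a per-component rule matcher would consume.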
7. A storage medium having stored thereon a program which, when executed by a processor, implements the document determination method according to any one of claims 1 to 4.
8. A processor, characterized in that the processor is adapted to run a program, wherein the program, when run, performs the document determination method according to any one of claims 1 to 4.
CN201811092823.6A 2018-09-19 2018-09-19 Method and device for determining document Active CN110929497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811092823.6A CN110929497B (en) 2018-09-19 2018-09-19 Method and device for determining document

Publications (2)

Publication Number Publication Date
CN110929497A CN110929497A (en) 2020-03-27
CN110929497B true CN110929497B (en) 2023-07-07

Family

ID=69855157

Country Status (1)

Country Link
CN (1) CN110929497B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507079B (en) * 2020-12-15 2023-01-17 科大讯飞股份有限公司 Document case situation matching method, device, equipment and storage medium
CN114510593A (en) * 2021-12-28 2022-05-17 上海联数物联网有限公司 Case similarity reminding method and system, storage medium and terminal
CN114996400A (en) * 2022-05-26 2022-09-02 平安银行股份有限公司 Referee document processing method and device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN107330071A (en) * 2017-06-30 2017-11-07 北京神州泰岳软件股份有限公司 Intelligent reply method and platform for legal consultation information
CN107590131A (en) * 2017-10-16 2018-01-16 北京神州泰岳软件股份有限公司 Specification document processing method, apparatus and system
CN108009299A (en) * 2017-12-28 2018-05-08 北京市律典通科技有限公司 Method and device for legal trial business processing
CN108038091A (en) * 2017-10-30 2018-05-15 上海思贤信息技术股份有限公司 Graph-based method and system for similar-case calculation and retrieval of judgment documents

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2015084404A1 (en) * 2013-12-06 2015-06-11 Hewlett-Packard Development Company, L.P. Matching of an input document to documents in a document collection

Similar Documents

Publication Publication Date Title
WO2018113498A1 (en) Method and apparatus for retrieving legal knowledge
CN106649346B (en) Data repeatability checking method and device
CN110929497B (en) Method and device for determining document
CN108694178B (en) Method and device for recommending judicial knowledge
CN110019669B (en) Text retrieval method and device
US20070106405A1 (en) Method and system to provide reference data for identification of digital content
CN105975459B (en) Method and device for labeling lexical item weights
CN110019785B (en) Text classification method and device
US20110320442A1 (en) Systems and Methods for Semantics Based Domain Independent Faceted Navigation Over Documents
US20200272674A1 (en) Method and apparatus for recommending entity, electronic device and computer readable medium
CN112434167B (en) Information identification method and device
Amato et al. Deep permutations: deep convolutional neural networks and permutation-based indexing
CN107368489B (en) Information data processing method and device
CN111125086A (en) Method, device, storage medium and processor for acquiring data resources
CN103714118A (en) Book cross-reading method
CN108228612B (en) Method and device for extracting network event keywords and emotional tendency
CN107784110A (en) Index establishing method and device
CN110969022A (en) Semantic determination method and related equipment
WO2020063524A1 (en) Method and system for determining legal instrument
US10147095B2 (en) Chain understanding in search
CN110032721A (en) Judgment document pushing method and device
CN115017267A (en) Unsupervised semantic retrieval method and device and computer readable storage medium
CN110019670A (en) Text searching method and device
CN105354182A (en) Method for obtaining related digital resources and method and apparatus for generating special topic by using method
CN110895703B (en) Legal document case recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant