CN110990523A - Legal document determining method and system - Google Patents

Legal document determining method and system

Info

Publication number
CN110990523A
Authority
CN
China
Prior art keywords
legal document
vector
legal
network model
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811161204.8A
Other languages
Chinese (zh)
Inventor
戴威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201811161204.8A (CN110990523A)
Priority to PCT/CN2019/107255 (WO2020063524A1)
Publication of CN110990523A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Databases & Information Systems (AREA)
  • Technology Law (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a legal document determining method and system. Word segmentation is performed on the acquired text information of a first legal document to obtain word segmentation data of the first legal document; the word segmentation data is input into a pre-trained network model to obtain a first characterization vector of the first legal document; a second characterization vector is acquired for each second legal document in a legal document set; and the second legal document corresponding to the first legal document is determined based on the first characterization vector and the second characterization vector of each second legal document. This helps legal personnel quickly find, in a legal library, case documents similar to the present case.

Description

Legal document determining method and system
Technical Field
The invention relates to the technical field of deep learning, in particular to a method and a system for determining legal documents.
Background
Law is one of the products of the development of civilized society. Law generally refers to specific rules of conduct that are enacted or recognized by the state, are generally binding on all members of society, are backed by the coercive power of the state, and take the rights and obligations of the parties as their content. When disputes arise among members of society, the judicial authorities handle and adjudicate them according to law.
When someone violates the law, the case must go through court judgment to ensure fairness. To reach a fairer judgment, judges sometimes need to consult the judgments of similar cases. In the prior art, searching for similar cases requires manually extracting the keywords, categories, and other factors of a case and entering them into a database, and then comparing the cases in the database one by one to judge whether they are similar to the present case before the similar cases are selected.
However, because the case library contains a large number of cases, this method wastes a great deal of time when legal personnel search the library for cases similar to the present case, and it is difficult to find cases with high similarity.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method and a system for determining a legal document, so that cases corresponding to the present case can be queried and obtained quickly.
In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:
the invention discloses a method for determining a legal document in a first aspect, which comprises the following steps:
performing word segmentation processing on the acquired text information of the first legal document to obtain first text word segmentation data;
obtaining a first characterization vector of the first legal document based on the first text word segmentation data by using a pre-trained network model, wherein the network model is a neural network model that fuses a recurrent attention network (RAM Net) with a capsule network model;
acquiring a second characterization vector of each second legal document in the legal document set;
determining a second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector of each second legal document;
the network model is used for obtaining, according to the case elements in the first text word segmentation data, a characterization vector representing the case to which the corresponding legal document relates.
Preferably, the obtaining a first characterization vector of the first legal document based on the first text segmentation data by using a network model obtained by pre-training includes:
mapping the first text word segmentation data to a word vector model for word vector processing to obtain word vectors, wherein the word vector model is the input layer of the network model;
and sequentially inputting the word vectors into the capsule network model and the RAM Net network model for processing, and taking the vector output by the RAM Net network model as the first characterization vector of the first legal document.
Preferably, the obtaining the second characterization vector of each second legal document in the legal document set includes:
acquiring text information of all second legal documents in the legal document set, and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data;
and inputting the second text word segmentation data into a network model obtained by pre-training for processing to obtain a second characterization vector corresponding to each second legal document.
Preferably, the determining the second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector of each second legal document includes:
calculating a distance value between the first characterization vector and a second characterization vector of each second legal document to obtain a distance value between the first characterization vector and each second characterization vector;
and determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
The second aspect of the present invention discloses a system for determining a legal document, comprising:
the first processing unit is used for performing word segmentation processing on the acquired text information of the first legal document to obtain first text word segmentation data, and for obtaining a first characterization vector of the first legal document based on the first text word segmentation data by using a pre-trained network model, wherein the network model is a neural network model that fuses a recurrent attention network (RAM Net) with a capsule network model, and the network model is used for obtaining, according to the case elements in the first text word segmentation data, a characterization vector representing the case to which the corresponding legal document relates;
the second processing unit is used for acquiring a second characterization vector of each second legal document in the legal document set;
a determining unit, configured to determine the second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector.
Preferably, the first processing unit includes:
the text processing module is used for mapping the first text word segmentation data to a word vector model for word vector processing to obtain word vectors, and the word vector model is the input layer of the network model;
and the characterization vector processing module is used for sequentially inputting the word vectors into the capsule network and the RAM Net network model for processing, and taking the vector output by the RAM Net network model as the first characterization vector of the first legal document.
Preferably, the second processing unit includes:
the word segmentation module is used for acquiring text information of all second legal documents in the legal document set and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data;
and the characterization vector processing module is used for inputting the second text word segmentation data into the pre-trained network model for processing to obtain a second characterization vector corresponding to each second legal document.
Preferably, the determining unit includes:
the calculation module is used for calculating a distance value between the first characterization vector and the second characterization vector of each second legal document to obtain a distance value between the first characterization vector and each second characterization vector;
and the determining module is used for determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
A third aspect of the present invention discloses a storage medium including a stored program, wherein a device on which the storage medium is located is controlled to execute the method for determining a legal document according to any one of claims 1 to 5 when the program is executed.
A fourth aspect of the invention discloses a processor for running a program, wherein the program when running performs the method of determining a legal document according to any one of claims 1 to 5.
Through the above technical solution, the invention discloses a legal document determining method, a legal document determining system, a storage medium, and a processor. The first legal document is segmented to obtain first text word segmentation data; a pre-trained network model then produces a first characterization vector of the first legal document from that data; a second characterization vector is obtained for each second legal document in a legal document set; and the second legal document corresponding to the first legal document is determined based on the first characterization vector and the second characterization vectors. Because both kinds of characterization vector are produced by the network model and the matching is performed on those vectors, legal personnel can find legal documents similar to the present case in a legal library.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a schematic flow chart of a legal document determination method disclosed in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another legal document determination method disclosed in the embodiment of the present invention;
FIG. 3 is a schematic flow chart of another legal document determination method disclosed in the embodiment of the present invention;
FIG. 4 is a schematic flow chart of another legal document determination method disclosed in the embodiments of the present invention;
FIG. 5 is a schematic flow chart of another legal document determination method disclosed in the embodiments of the present invention;
FIG. 6 is a schematic diagram of a legal document determination system according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another legal document determination system disclosed in an embodiment of the present invention;
FIG. 8 is a schematic diagram of another legal document determination system disclosed in an embodiment of the present invention;
FIG. 9 is a schematic diagram of another legal document determination system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As can be seen from the background art, in the prior art, similar legal documents can only be searched based on manual comparison one by one, which not only takes a lot of time, but also the accuracy cannot be effectively guaranteed, so that cases of the same type cannot be effectively returned. Therefore, the invention discloses a legal document determining method and a legal document determining system, which are used for rapidly and accurately determining similar legal documents without excessive labor consumption and time cost.
Example one
Fig. 1 is a schematic flow chart of a legal document determination method disclosed in the embodiment of the present invention. The method at least comprises the following steps:
step S101: and performing word segmentation processing on the acquired text information of the first legal document to obtain first text word segmentation data.
It should be noted that the text information here is the fact description section of the first legal document, which contains case-related content such as the main criminal facts, the description of the course of the crime, and the findings of the procuratorate.
In step S101, word segmentation is the process of recombining a continuous character sequence into a word sequence according to a given specification, and the resulting text word segmentation data consists of individual words.
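As a rough illustration of the word segmentation in step S101, the sketch below uses forward maximum matching against a tiny dictionary. The patent does not specify a segmentation algorithm, so the technique, the function name `segment`, the toy dictionary, and the sample sentence are all assumptions made here for illustration only.

```python
def segment(text, dictionary, max_word_len=4):
    """Split a continuous character sequence into words (forward maximum matching)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink until a dictionary word matches.
        # A single character is always accepted so the loop is guaranteed to advance.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy dictionary; a real system would use a large legal-domain lexicon.
toy_dictionary = {"被告人", "盗窃", "手机", "一部"}
tokens = segment("被告人盗窃手机一部", toy_dictionary)
print(tokens)
```

A production system would more likely rely on an established segmentation library, but the input and output have the same shape: one unbroken string in, a list of words out.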
Step S102: and obtaining a first characterization vector of the first legal document based on the first text word segmentation data by using a network model obtained by pre-training.
In step S102, the network model is a neural network model that fuses a recurrent attention network (RAM Net) with a capsule network model, and the network model is used for obtaining, according to the case elements in the first text word segmentation data, a characterization vector representing the case to which the corresponding legal document relates.
The specific execution process of step S102, as shown in fig. 2, mainly includes the following steps:
step S201: and mapping the text word segmentation data to a word vector model for word vector processing to obtain a word vector, wherein the word vector model is an input layer of the network model.
In step S201, the word vector model processes the text word segmentation data by mapping it into a space of a fixed dimension, in which similarity between words is reflected, to obtain word vectors. The dimension is generally between 50 and 250 and can be chosen according to the specific situation; 100 dimensions is preferred.
In addition, the word vector model comprises low-frequency long-tail words appearing in the corpus, and the low-frequency long-tail words have unique word vector expressions in the word vector model.
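The mapping from segmented words to fixed-dimension vectors can be pictured as a lookup table. The sketch below is a minimal illustration, not the patent's word vector model: random numbers stand in for trained word vectors, and the names `build_embedding_table` and `embed` and the sample vocabulary are hypothetical.

```python
import random

EMBEDDING_DIM = 100  # preferred dimension stated in the text (50-250 is the usual range)


def build_embedding_table(vocabulary, dim=EMBEDDING_DIM, seed=0):
    """Assign every vocabulary word, including rare ones, its own vector.

    Random values stand in for trained word vectors here.
    """
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocabulary}


def embed(tokens, table):
    """Map a list of segmented words to a list of word vectors."""
    return [table[token] for token in tokens]


# Hypothetical vocabulary; the last entry plays the role of a low-frequency long-tail word.
vocabulary = ["defendant", "theft", "phone", "usufruct"]
table = build_embedding_table(vocabulary)
vectors = embed(["defendant", "theft", "usufruct"], table)
print(len(vectors), len(vectors[0]))
```

Keeping rare words in the table, rather than collapsing them into a single unknown-word vector, is what gives each long-tail word its own representation.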
Step S202: and sequentially inputting the word vectors into a capsule network model and a RAM Net network model for processing to obtain a vector output by the RAM Net network model as a first representation vector of the first legal document.
In order to clearly describe the process of processing the word vectors in sequence by the capsule network and RAM Net network models in step S202, the following description is given by way of example.
For example, suppose the capsule network model contains 5 neuron units and the output dimension of a single neuron unit is 256. After a document is processed by the word vector model, 100-dimensional word vectors are obtained. The 100-dimensional vectors are processed by a bidirectional LSTM network layer to obtain two 128-dimensional vectors, which are concatenated into a 256-dimensional vector. The 256-dimensional vector is processed by the capsule network to obtain a 1280-dimensional vector, the 1280-dimensional vector is converted into a 500-dimensional vector by a fully connected layer, and the 500-dimensional vector is processed by the RAM Net network, which outputs a 250-dimensional vector F. The vector F is the feature vector corresponding to the document.
Each node of the fully connected layer is connected to all nodes of the capsule network model and of the RAM Net network; the layer converts the output vector of the capsule network model before it is input into the RAM Net network for processing.
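The dimension bookkeeping in this example can be checked with a few lines of arithmetic. The sketch below only traces vector sizes and does not implement any of the layers. It assumes that the capsule network output is the concatenation of the outputs of its 5 units (5 × 256 = 1280), which is an interpretation of the example's figures; the function name `trace_dimensions` is hypothetical.

```python
def trace_dimensions(word_dim=100, lstm_hidden=128, capsule_units=5,
                     capsule_out_dim=256, fc_out_dim=500, ram_out_dim=250):
    """Return the vector size after each stage of the example pipeline."""
    bilstm_dim = 2 * lstm_hidden                   # forward and backward outputs concatenated
    capsule_dim = capsule_units * capsule_out_dim  # assumed: outputs of all capsule units joined
    return [
        ("word vector", word_dim),
        ("bidirectional LSTM", bilstm_dim),
        ("capsule network", capsule_dim),
        ("fully connected layer", fc_out_dim),
        ("RAM Net output F", ram_out_dim),
    ]


for stage, dim in trace_dimensions():
    print(f"{stage}: {dim}")
```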
Step S103: and acquiring a second characterization vector of each second legal document in the legal document set.
In a specific implementation, as shown in fig. 3, a specific execution procedure of step S103 mainly includes:
step S301: acquiring text information of all second legal documents in the legal document set, and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data.
The legal document set may be a legal document library or a set of legal documents obtained through a search engine; the choice of document set does not affect the overall scheme.
In addition, the principle of obtaining the second text segmentation data in step S301 is the same as the principle of obtaining the first text segmentation data in step S101, and redundant description is not repeated here.
Step S302: and inputting the second text word segmentation data into a network model obtained by pre-training for processing to obtain a second characterization vector corresponding to each second legal document.
It should be noted that the execution principle of step S302 is the same as the principle of obtaining the first characterization vector in step S102, so the description is not repeated here.
It should be noted that step S103, obtaining the second characterization vector of each second legal document in the legal document set, and step S102, obtaining the first characterization vector of the first legal document, may be performed simultaneously; alternatively, step S102 may be performed before step S103, or step S103 before step S102. In this embodiment, step S102 is preferably performed first, followed by step S103.
In addition, all second characterization vectors obtained through steps S301 and S302 may be stored in a second vector set F in correspondence with their second legal documents, so that the second characterization vector of a given second legal document can later be obtained directly from the second vector set.
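Storing the second characterization vectors ahead of time amounts to building a lookup keyed by document. A minimal sketch follows, assuming an in-memory dictionary; the names `build_vector_store` and `toy_encode`, the document ids, and the sample texts are hypothetical, and `toy_encode` is only a placeholder for the pre-trained network model.

```python
def build_vector_store(documents, encode):
    """Encode every document once and keep the vectors keyed by document id."""
    return {doc_id: encode(text) for doc_id, text in documents.items()}


def toy_encode(text):
    """Placeholder encoder: NOT the patent's network model, just character statistics."""
    return [float(len(text)), float(text.count(" "))]


# Hypothetical document set keyed by made-up ids.
documents = {"doc-1": "theft of a phone", "doc-2": "contract dispute"}
store = build_vector_store(documents, toy_encode)

# Later queries read the stored vector directly instead of re-running the model.
print(store["doc-1"])
```

With this arrangement only the first legal document needs to pass through the model at query time, which is where the time saving for a large library would come from.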
Step S104: and determining a second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector of each second legal document.
In the process of executing step S104, as shown in fig. 4, the specific execution process includes the steps of:
step S401: and calculating the distance value between the first characterization vector and the second characterization vector of each second legal document to obtain the distance value between the first characterization vector and each second characterization vector.
In step S401, the distance between the first characterization vector and the characterization vector of each second legal document is calculated. The distance is the Euclidean distance, that is, the true distance between the two characterization vector points in m-dimensional space.
Further, all second legal documents in the set may be listed with their distance values sorted in descending or ascending order.
Step S402: and determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
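Steps S401 and S402 can be sketched as a distance computation followed by a sort. For two m-dimensional vectors u and v the Euclidean distance is the square root of the sum of (u_i − v_i)² over all i. The function names, document ids, and 2-dimensional sample vectors below are hypothetical and chosen only to keep the example small.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_similarity(query_vector, store):
    """Return (doc_id, distance) pairs, closest (most similar) first."""
    scored = [(doc_id, euclidean_distance(query_vector, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical 2-dimensional vectors; real characterization vectors would be much longer.
store = {"doc-a": [0.0, 0.0], "doc-b": [3.0, 4.0], "doc-c": [1.0, 1.0]}
ranking = rank_by_similarity([0.0, 0.5], store)
print(ranking[0][0])
```

Sorting in ascending order of distance puts the most similar document first, matching the rule that a smaller distance means a higher similarity.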
In the legal document determining method disclosed in this embodiment of the present invention, the first legal document is segmented to obtain first text word segmentation data; a pre-trained network model then produces a first characterization vector of the first legal document from that data; a second characterization vector is obtained for each second legal document in the legal document set; and the second legal document corresponding to the first legal document is determined based on the first characterization vector and the second characterization vectors. Because both kinds of characterization vector are produced by the network model and the matching is performed on those vectors, legal personnel can find legal documents similar to the present case in a legal library.
Based on the legal document determining method disclosed in the above embodiment of the invention, the network model used in step S102 is obtained in the manner shown in fig. 5, which includes the following steps:
step S501: the method comprises the steps of obtaining a published judicial literature as a training text, carrying out word vector training on the training text to obtain a word vector model, and using the word vector model as an input layer of the neural network model.
It should be noted that the word vector training used to obtain the word vector model may be performed with Word2vec or GloVe, but is not limited to these methods.
In addition, the principle of selecting the dimension of the word vector in step S501 is the same as that of selecting the dimension of the word vector in step S201, and thus, the description thereof is omitted.
Step S502: taking the word vector model as an input layer of the neural network model, and taking the capsule network as a second layer of the neural network model; and constructing the neural network model by taking the RAM Net network as a third layer of the neural network model.
In the process of executing the step S502, the capsule network processes the vector obtained by performing word vector training on the input layer, and then the RAM Net network processes the vector output by the capsule network; or the vector obtained by training the word vector of the input layer can be processed by the RAM Net network, and then the vector output by the RAM Net network can be processed by the capsule network.
Preferably, the capsule network processes the vector obtained from word vector training at the input layer, and the RAM Net network then processes the vector output by the capsule network.
Step S503: training the neural network model based on the training text, and taking as the network model the neural network model whose number of iterations has reached a preset number or whose number of training rounds has reached a specified number.
It should be noted that, for the training of the neural network model fusing the RAM Net network and the capsule network, the specific training process is as follows:
First, a rule-based determination system is used to extract from the training text the trial findings and fact determination sections, that is, the sections of a document that describe the circumstances of the case in detail.
Then, information such as the charges, the applicable legal provisions, the sentence terms, and whether the crime was committed by one person or several can be obtained from the judgments through the document analysis system.
Finally, the neural network model fusing the RAM Net network and the capsule network is trained with the documents and the analyzed information, and the resulting trained model is taken as the network model.
Furthermore, a bidirectional LSTM network layer can be added between the word vector model layer and the capsule network layer to connect the vectors of the converted text end to end, so as to avoid the loss of textual information caused by incorrect ordering.
Furthermore, in order to obtain a better network model, the number of training rounds and an initial learning rate are set for the training process, and the learning rate is decayed at preset step intervals during learning so as to optimize the learning capacity. An example is given here for ease of understanding.
For example, public documents to be learned are selected from a document library, and the first document is input into the network model for learning. Starting from an initial learning rate of 1e-3, the learning rate is decayed to 0.65 times its previous value every 25000 training steps. This process constitutes one round of network model learning, and after 15 input documents have been trained, no further documents are fetched from the library for training.
It should be noted that the training data of a network model is generally very large, from hundreds of thousands to millions of samples. Because of hardware GPU memory limits, each training iteration generally reads one batch of data, and reading one batch constitutes one training step. For example, if the batch size is 256, reading 256 samples at once is one training step.
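The learning rate schedule and the notion of a training step in this example can be written down directly. The sketch below assumes a staircase decay, in which the rate drops only at exact multiples of 25000 steps; the text does not say whether the decay is stepped or continuous, and the function names are hypothetical.

```python
import math

INITIAL_LR = 1e-3      # initial learning rate from the example
DECAY_FACTOR = 0.65    # multiplier applied at each decay
DECAY_STEPS = 25000    # training steps between decays
BATCH_SIZE = 256       # samples read per training step in the example


def learning_rate(step):
    """Staircase decay: multiply the rate by DECAY_FACTOR every DECAY_STEPS steps."""
    return INITIAL_LR * DECAY_FACTOR ** (step // DECAY_STEPS)


def steps_per_pass(num_samples, batch_size=BATCH_SIZE):
    """Number of training steps needed to read every sample once."""
    return math.ceil(num_samples / batch_size)


for step in (0, 24999, 25000, 50000):
    print(step, learning_rate(step))
print(steps_per_pass(1_000_000))
```

With a batch size of 256, one pass over a million samples takes a few thousand steps, so under these assumptions the first decay at step 25000 would only be reached after several passes over the data.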
In the legal document determining method disclosed in this embodiment of the present invention, the first legal document is segmented to obtain first text word segmentation data; a pre-trained network model then produces a first characterization vector of the first legal document from that data; a second characterization vector is obtained for each second legal document in the legal document set; and the second legal document corresponding to the first legal document is determined based on the first characterization vector and the second characterization vectors. Because both kinds of characterization vector are produced by the network model and the matching is performed on those vectors, legal personnel can find legal documents similar to the present case in a legal library.
Further, the RAM Net network and the capsule network each serve as a network layer in the structure of the neural network model, so that the trained network model has both the rich vector expression capability of the capsule network and the attention mechanism of the RAM Net network. The network can thus pay attention to the first and other detailed conditions, and the document determining method can determine the second legal document more accurately.
Example two
Corresponding to the legal document determining method provided by the embodiment of the invention, the embodiment of the invention also provides a corresponding legal document determining system. As shown in fig. 6, a system for determining a legal document according to a second embodiment of the present invention includes:
The first processing unit 601 is configured to perform word segmentation processing on the acquired text information of the first legal document to obtain text word segmentation data, and to obtain the first characterization vector of the first legal document based on the text word segmentation data by using a network model obtained by pre-training.
A second processing unit 602, configured to obtain a second characterization vector of each second legal document in the legal document set.
A determining unit 603, configured to determine, based on the first characterization vector and the second characterization vector of each second legal document, a second legal document corresponding to the first legal document.
Preferably, the first processing unit 601, as shown in fig. 7, includes:
a text processing module 6011, configured to map the text word segmentation data to a word vector model for word vector processing, so as to obtain a word vector, where the word vector model is an input layer of the network model;
the network module 6012 is configured to sequentially input the word vectors into the capsule network and the RAM Net network model for processing, and obtain a vector output by the RAM Net network model as a first characterization vector of the first legal document.
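The data flow through modules 6011 and 6012 can be sketched as a lookup followed by two layers applied in sequence. This is only a structural illustration: `identity` and `mean_pool` are trivial placeholders standing in for the capsule network and the RAM Net network, whose internals are not reproduced here, and the embedding table is hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def embed(tokens: Sequence[str], table: Dict[str, Vector], unk: Vector) -> List[Vector]:
    """Input layer: look up one word vector per segmented token.

    Tokens missing from the table fall back to `unk` (an assumption).
    """
    return [table.get(token, unk) for token in tokens]


def identity(vectors: List[Vector]) -> List[Vector]:
    """Placeholder for the capsule network layer."""
    return vectors


def mean_pool(vectors: List[Vector]) -> Vector:
    """Placeholder for the RAM Net layer: average the vectors per dimension."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def characterize(
    tokens: Sequence[str],
    table: Dict[str, Vector],
    unk: Vector,
    capsule_layer: Callable[[List[Vector]], List[Vector]] = identity,
    ram_layer: Callable[[List[Vector]], Vector] = mean_pool,
) -> Vector:
    """Word vectors -> capsule layer -> RAM Net layer -> characterization vector."""
    word_vectors = embed(tokens, table, unk)
    return ram_layer(capsule_layer(word_vectors))


if __name__ == "__main__":
    table = {"合同": [1.0, 0.0], "纠纷": [0.0, 1.0]}
    print(characterize(["合同", "纠纷", "未知"], table, unk=[0.0, 0.0]))
```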
Preferably, the second processing unit 602, as shown in fig. 8, includes:
the word segmentation module 6021 is configured to obtain text information of all second legal documents in the legal document set, and perform word segmentation processing on the text information of each second legal document to obtain second text word segmentation data.
The characterization vector processing module 6022 is configured to input the second text word segmentation data into the network model obtained through pre-training for processing, so as to obtain a second characterization vector corresponding to each second legal document.
Preferably, the determining unit 603, as shown in fig. 9, includes:
a calculating module 6031, configured to calculate a distance value between the first characterization vector and the second characterization vector of each second legal document, to obtain a distance value between the first characterization vector and each second characterization vector.
A determining module 6032, configured to determine a second legal document corresponding to the first legal document according to a corresponding relationship between a distance and a similarity, where the corresponding relationship between the distance and the similarity is that the smaller the distance value, the higher the similarity is.
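The calculating and determining modules can be sketched as a distance computation followed by a sort. The text does not fix a distance metric, so Euclidean distance is an assumption here, and the function name and document ids are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank_by_distance(
    first_vector: Sequence[float],
    second_vectors: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (document id, distance) pairs, closest first.

    A smaller distance is treated as a higher similarity, so the first
    entry is the most similar second legal document.
    """
    distances = [
        (doc_id, math.dist(first_vector, vector))
        for doc_id, vector in second_vectors.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])


if __name__ == "__main__":
    candidates = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
    for doc_id, distance in rank_by_distance([0.0, 0.0], candidates):
        print(doc_id, distance)
```

Returning the full ranking rather than a single document leaves the choice of how many similar cases to show to the caller.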
The specific implementation principle and the implementation process of each unit in the system for determining a legal document disclosed in the embodiment of the present invention are the same as the method for determining a legal document disclosed in the embodiment of the present invention, and reference may be made to corresponding parts in the method for determining a legal document disclosed in the embodiment of the present invention, and redundant description is omitted here.
Based on the method for determining the legal document disclosed in the embodiment of the present invention, the modules may be implemented by a hardware device including a processor and a memory. Specifically, the modules are stored in the memory as program units, and the processor executes the program units stored in the memory to realize the recommendation of legal documents.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the legal documents are determined by adjusting the parameters of the kernel.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
Further, an embodiment of the present invention provides a processor, where the processor is configured to execute a program, where the program executes the method for determining the legal document.
Further, an embodiment of the present invention provides an apparatus, where the apparatus includes a processor, a memory, and a program stored in the memory and executable on the processor, and the processor implements the following steps when executing the program: performing word segmentation processing on the acquired text information of the first legal document to obtain first text word segmentation data; obtaining a first representation vector of the first legal document based on the first text word segmentation data by utilizing a network model obtained by pre-training; acquiring a second characterization vector of each second legal document in the legal document set; and determining a second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector of each second legal document.
Wherein the obtaining of the first characterization vector of the first legal document based on the first text segmentation data by using the network model obtained by pre-training comprises: mapping the text word segmentation data to a word vector model for word vector processing to obtain a word vector, wherein the word vector model is an input layer of the network model; and sequentially inputting the word vectors into a capsule network model and a RAM Net network model for processing to obtain a vector output by the RAM Net network model as a first representation vector of the first legal document.
Wherein the obtaining a second characterization vector for each second legal document in the set of legal documents comprises: acquiring text information of all second legal documents in the legal document set, and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data. And inputting the second text word segmentation data into a network model obtained by pre-training for processing to obtain a second characterization vector corresponding to each second legal document.
Wherein said determining a second legal document corresponding to said first legal document based on said first characterization vector and a second characterization vector of said each second legal document comprises: and calculating the distance value between the first characterization vector and the second characterization vector of each second legal document to obtain the distance value between the first characterization vector and each second characterization vector. And determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
The equipment disclosed in the embodiment of the invention can be a PC, a PAD, a mobile phone and the like.
Further, an embodiment of the present invention also provides a storage medium having a program stored thereon, where the program, when executed by a processor, implements the method for determining the legal document.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
performing word segmentation processing on the acquired text information of the first legal document to obtain first text word segmentation data; obtaining a first representation vector of the first legal document based on the first text word segmentation data by utilizing a network model obtained by pre-training; acquiring a second characterization vector of each second legal document in the legal document set; and determining a second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector of each second legal document.
Wherein the obtaining of the first characterization vector of the first legal document based on the first text segmentation data by using the network model obtained by pre-training comprises: mapping the text word segmentation data to a word vector model for word vector processing to obtain a word vector, wherein the word vector model is an input layer of the network model; and sequentially inputting the word vectors into a capsule network model and a RAM Net network model for processing to obtain a vector output by the RAM Net network model as a first representation vector of the first legal document.
Wherein the obtaining a second characterization vector for each second legal document in the set of legal documents comprises: acquiring text information of all second legal documents in the legal document set, and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data; and inputting the second text word segmentation data into a network model obtained by pre-training for processing to obtain a second characterization vector corresponding to each second legal document.
Wherein said determining a second legal document corresponding to said first legal document based on said first characterization vector and a second characterization vector of said each second legal document comprises: calculating a distance value between the first characterization vector and a second characterization vector of each second legal document to obtain a distance value between the first characterization vector and each second characterization vector;
and determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
Using a hardware device consisting of a processor and a memory, word segmentation processing is performed on the acquired text information of a first legal document to obtain text word segmentation data; the word segmentation data is input into the neural network model for processing to obtain a first characterization vector of the first legal document; a second characterization vector of each second legal document in a legal document set is acquired; and a second legal document corresponding to the first legal document is determined based on the first characterization vector and the second characterization vectors. With the legal document determining method disclosed above, cases similar to the first legal document are effectively found in the legal document set based on the distance between the characterization vector of the first legal document and that of each second legal document. The hardware device consisting of a processor and a memory disclosed by the invention can therefore help legal staff find cases similar to the present case in a legal library.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, client, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
The embodiments in the present specification are described in a progressive manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The above-described system embodiments are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of legal instrument determination, comprising:
performing word segmentation processing on the acquired text information of the first legal document to obtain first text word segmentation data;
obtaining a first characterization vector of the first legal document based on the first text word segmentation data by using a network model obtained by training in advance, wherein the network model is obtained by fusing a cyclic attention network RAM Net and a capsule network model with a neural network model;
acquiring a second characterization vector of each second legal document in the legal document set;
determining a second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector of each second legal document;
the network model is used for obtaining a representation vector representing cases corresponding to corresponding legal documents according to case elements in the first text participle data.
2. The method of claim 1, wherein said deriving a first characterization vector for the first legal document based on the first text segmentation data using a pre-trained network model comprises:
mapping the text word segmentation data to a word vector model for word vector processing to obtain a word vector, wherein the word vector model is an input layer of the network model;
and sequentially inputting the word vectors into a capsule network model and a RAM Net network model for processing to obtain a vector output by the RAM Net network model as a first representation vector of the first legal document.
3. The method of claim 1, wherein obtaining the second characterization vector for each second legal instrument in the set of legal instruments comprises:
acquiring text information of all second legal documents in the legal document set, and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data;
and inputting the second text word segmentation data into a network model obtained by pre-training for processing to obtain a second characterization vector corresponding to each second legal document.
4. The method of claim 1, wherein said determining a second legal instrument corresponding to the first legal instrument based on the first characterization vector and the second characterization vector of each of the second legal instruments comprises:
calculating a distance value between the first characterization vector and a second characterization vector of each second legal document to obtain a distance value between the first characterization vector and each second characterization vector;
and determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
5. A system for determining legal documents, comprising:
the first processing unit is used for performing word segmentation processing on the acquired text information of the first legal document to obtain text word segmentation data, and obtaining a first representation vector of the first legal document based on the first text word segmentation data by utilizing a network model obtained through pre-training, wherein the network model is obtained by fusing a circulating attention network RAM Net and a capsule network model with a neural network model, and the network model is used for obtaining the representation vector representing the case corresponding to the corresponding legal document according to the case elements in the first text word segmentation data;
the second processing unit is used for acquiring a second characterization vector of each second legal document in the legal document set;
a determining unit, configured to determine the second legal document corresponding to the first legal document based on the first characterization vector and the second characterization vector.
6. The system of claim 5, wherein the first processing unit comprises:
the text processing module is used for mapping the text word segmentation data to a word vector model for word vector processing to obtain a word vector, and the word vector model is an input layer of the network model;
and the network module is used for sequentially inputting the word vectors into the capsule network and the RAM Net network model for processing to obtain the vector output by the RAM Net network model as the first representation vector of the first legal document.
7. The system of claim 5, wherein the second processing unit comprises:
the word segmentation module is used for acquiring text information of all second legal documents in the legal document set and performing word segmentation processing on the text information of each second legal document to obtain second text word segmentation data;
and the characteristic vector processing module is used for inputting the second text word segmentation data into a pre-trained network model for processing to obtain a second characteristic vector corresponding to each second legal document.
8. The system of claim 5, wherein the determining unit comprises:
the calculation module is used for calculating a distance value between the first characterization vector and the second characterization vector of each second legal document to obtain a distance value between the first characterization vector and each second characterization vector;
and the determining module is used for determining a second legal document corresponding to the first legal document according to the corresponding relation between the distance and the similarity, wherein the corresponding relation between the distance and the similarity is that the smaller the distance value is, the higher the similarity is.
9. A storage medium characterized by comprising a stored program, wherein a device on which the storage medium is located is controlled to execute the method for determining a legal document according to any one of claims 1 to 5 when the program is executed.
10. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of determining a legal document according to any one of claims 1-5.
CN201811161204.8A 2018-09-30 2018-09-30 Legal document determining method and system Pending CN110990523A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811161204.8A CN110990523A (en) 2018-09-30 2018-09-30 Legal document determining method and system
PCT/CN2019/107255 WO2020063524A1 (en) 2018-09-30 2019-09-23 Method and system for determining legal instrument

Publications (1)

Publication Number Publication Date
CN110990523A true CN110990523A (en) 2020-04-10

Family

ID=69950345

Country Status (2)

Country Link
CN (1) CN110990523A (en)
WO (1) WO2020063524A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651604A (en) * 2020-06-04 2020-09-11 腾讯科技(深圳)有限公司 Emotion classification method based on artificial intelligence and related device
CN111694945A (en) * 2020-06-03 2020-09-22 北京北大软件工程股份有限公司 Legal association recommendation method and device based on neural network
CN112699215A (en) * 2020-12-24 2021-04-23 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230028534A1 (en) * 2021-07-23 2023-01-26 EMC IP Holding Company LLC Method and apparatus for contract analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294319A (en) * 2016-08-04 2017-01-04 武汉数为科技有限公司 One is combined related cases recognition methods
US20170132730A1 (en) * 2015-11-11 2017-05-11 International Business Machines Corporation Legal document search based on legal similarity
CN107818138A (en) * 2017-09-28 2018-03-20 银江股份有限公司 A kind of case legal regulation recommends method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489489B2 (en) * 2016-03-09 2019-11-26 Adobe Inc. Automatically classifying and presenting digital fonts
CN107562812B (en) * 2017-08-11 2021-01-15 北京大学 Cross-modal similarity learning method based on specific modal semantic space modeling
CN108170736B (en) * 2017-12-15 2020-05-05 南瑞集团有限公司 Document rapid scanning qualitative method based on cyclic attention mechanism
CN108121446B (en) * 2017-12-25 2018-11-30 邱亮南 Exchange method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MAOMAO2017: "基于动态路由的胶囊网络在文本分类上的探索", 《CSDN》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694945A (en) * 2020-06-03 2020-09-22 北京北大软件工程股份有限公司 Legal association recommendation method and device based on neural network
CN111651604A (en) * 2020-06-04 2020-09-11 腾讯科技(深圳)有限公司 Emotion classification method based on artificial intelligence and related device
CN111651604B (en) * 2020-06-04 2023-11-10 腾讯科技(深圳)有限公司 Emotion classification method and related device based on artificial intelligence
CN112699215A (en) * 2020-12-24 2021-04-23 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism
CN112699215B (en) * 2020-12-24 2022-07-05 齐鲁工业大学 Grading prediction method and system based on capsule network and interactive attention mechanism

Also Published As

Publication number Publication date
WO2020063524A1 (en) 2020-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200410
