CN114860900A - Sentencing prediction method and device

Sentencing prediction method and device

Info

Publication number
CN114860900A
Authority
CN
China
Prior art keywords
vector
prediction
word
chapters
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210365513.7A
Other languages
Chinese (zh)
Other versions
CN114860900B (en)
Inventor
张淯易
黄继超
陈维强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd
Priority to CN202210365513.7A
Publication of CN114860900A
Application granted
Publication of CN114860900B
Legal status: Active

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30: Information retrieval of unstructured textual data
              • G06F16/33: Querying
                • G06F16/332: Query formulation
              • G06F16/35: Clustering; Classification
          • G06F40/00: Handling natural language data
            • G06F40/20: Natural language analysis
              • G06F40/279: Recognition of textual entities
                • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
            • G06F40/30: Semantic analysis
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00: Computing arrangements based on biological models
            • G06N3/02: Neural networks
              • G06N3/04: Architecture, e.g. interconnection topology
                • G06N3/045: Combinations of networks
        • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR SUCH PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
            • G06Q50/10: Services
              • G06Q50/18: Legal services


Abstract

The present application discloses a sentencing prediction method and device, which are used to address the incompleteness and inaccuracy of judgment results in the prior art. The method provided by the application comprises the following steps: acquiring case-related information and a crime fact description text; vectorizing the word segments included in a plurality of chapters of the crime fact description text to obtain a first word vector corresponding to each word segment, and vectorizing the word segments included in the case-related information to obtain a second word vector corresponding to each word segment; performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters, and determining the prediction category of each chapter according to its first feature vector; performing feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and performing law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the first feature vectors corresponding to the plurality of chapters.

Description

Sentencing prediction method and device
Technical Field
The present application relates to the field of information technology, and in particular to a sentencing prediction method and device.
Background
Currently, legal judgment prediction mainly involves three subtasks: law article prediction, charge prediction, and prison term prediction. The prison term prediction process is particularly complex: it must consider not only the basic circumstances of the defendant and the course of the crime, but also factors such as whether the defendant voluntarily surrendered or confessed and whether the offense is a first offense. The prior art has two shortcomings. First, it lacks external information and relies only on the crime fact description of the case. The crime fact description does play an irreplaceable role in charge prediction and law article prediction, but relying on it alone is impractical for prison term prediction. Second, the prior art extracts the information corresponding to each of the three subtasks from the crime fact description and performs each prediction from the information of a single subtask, without considering the associations among the subtasks, so the judgment results have certain errors in both comprehensiveness and accuracy.
Disclosure of Invention
The embodiments of the present application provide a sentencing prediction method and device, which are used to address the incompleteness and inaccuracy of judgment results in the prior art.
In a first aspect, an embodiment of the present application provides a sentencing prediction method, including:
acquiring case-related information and a crime fact description text, wherein the case-related information comprises at least one of witness testimony, material evidence, defendant information, testimonial statements, a suspect's confession, and written records; the crime fact description text comprises a plurality of chapters, each of the plurality of chapters comprises a plurality of clauses, and each of the plurality of clauses comprises a plurality of word segments;
vectorizing the word segments included in the plurality of chapters to obtain a first word vector corresponding to each word segment, and vectorizing the word segments included in the case-related information to obtain a second word vector corresponding to each word segment; performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters, and determining a prediction category of each chapter according to its first feature vector, wherein the prediction category is a law article category, a charge category, or a prison term category;
performing feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and performing law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters.
Based on this scheme, case-related information is introduced on top of the crime fact description text for sentencing prediction, which improves the prediction effect. Meanwhile, because of the complexity of the prison term, a topological structure is introduced over the multiple tasks: the law article is predicted first, then the charge, and finally the prison term on the basis of the law article and the charge, which further improves the prison term prediction effect.
In a possible implementation, performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters includes: filtering the word segments in a first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, wherein the first word vectors in the plurality of clauses included in the filtered first chapter are related to sentencing prediction, and the first chapter is any one of the plurality of chapters; and combining the plurality of clauses included in the filtered first chapter to obtain a plurality of clause combinations, wherein each of the plurality of clause combinations comprises at least two clauses;
extracting features of each clause combination through a first semantic vector encoder to obtain a word-level feature vector of each clause combination; performing feature splicing on the word-level feature vectors of the plurality of clause combinations to obtain a statement vector representation of each clause combination; performing feature extraction on the statement vector of each clause combination through a second semantic vector encoder to obtain a clause-level feature vector of each clause combination; and performing feature splicing on the clause-level feature vectors of the plurality of clause combinations to obtain the first feature vector.
Based on this scheme, feature extraction captures the contextual features among the multiple word segments and among the clauses of each chapter.
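To make the two-level encoding concrete, the following is a minimal sketch, assuming PyTorch; a stock Transformer encoder stands in for each semantic vector encoder, mean pooling stands in for the single-head attention pooling described later, and all dimensions are illustrative.

```python
import torch
import torch.nn as nn

class HierarchicalChapterEncoder(nn.Module):
    """Word-level encoding per clause combination, then clause-level encoding
    over the spliced statement vectors (a sketch, not the patented model)."""
    def __init__(self, dim: int = 128, heads: int = 8, layers: int = 2):
        super().__init__()
        word_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        clause_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.word_encoder = nn.TransformerEncoder(word_layer, layers)      # stands in for the first semantic vector encoder
        self.clause_encoder = nn.TransformerEncoder(clause_layer, layers)  # stands in for the second semantic vector encoder

    def forward(self, clause_combinations):
        # clause_combinations: list of (num_word_segments, dim) tensors of fused word vectors
        statement_vectors = []
        for combo in clause_combinations:
            h = self.word_encoder(combo.unsqueeze(0))         # word-level features
            statement_vectors.append(h.mean(dim=1))           # pool into one statement vector
        s = torch.cat(statement_vectors, dim=0).unsqueeze(0)  # splice the statement vectors
        h = self.clause_encoder(s)                            # clause-level features
        return h.mean(dim=1).squeeze(0)                       # first feature vector of the chapter

# Usage: three clause combinations with 5, 7, and 4 word segments each.
encoder = HierarchicalChapterEncoder()
combos = [torch.randn(n, 128) for n in (5, 7, 4)]
chapter_vector = encoder(combos)  # shape: (128,)
```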
In a possible implementation, the method further includes: encoding position vectors corresponding to the plurality of first word vectors included in the filtered first chapter, wherein the position vector corresponding to a first word vector is used to represent the position, in the text corresponding to the first chapter, of the word segment corresponding to that first word vector; and fusing the first word vectors of the multiple word segments included in the filtered first chapter with their corresponding position vectors to obtain fused word vectors of the multiple word segments in the first chapter;
the extracting features of each clause combination through the first semantic vector encoder to obtain the word-level feature vector of each clause combination includes: performing feature extraction on a first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain a word-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
Based on this scheme, fusing the first word vectors with the position vectors makes the relative and absolute position information of each word segment in the clause available in the subsequent encoding process.
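The patent does not fix the position-encoding scheme; the sketch below uses the common sinusoidal encodings as one plausible choice and fuses them with the first word vectors by addition.

```python
import numpy as np

def sinusoidal_position_vectors(num_positions: int, dim: int) -> np.ndarray:
    """One common choice of position vector (an assumption; the patent only
    requires that position vectors encode each word segment's position)."""
    pos = np.arange(num_positions)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def fuse(word_vectors: np.ndarray) -> np.ndarray:
    """Fuse first word vectors with their position vectors by addition."""
    n, k = word_vectors.shape
    return word_vectors + sinusoidal_position_vectors(n, k)
```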
In a possible implementation, the method further includes: encoding position vectors corresponding to the plurality of first statement vectors included in the filtered first chapter, wherein the position vector corresponding to a first statement vector is used to represent the position, in the text corresponding to the first chapter, of the clause corresponding to that first statement vector; and fusing the plurality of first statement vectors included in the filtered first chapter with their corresponding position vectors to obtain fused statement vectors of the multiple clauses in the first chapter;
the extracting features of the statement vector of each clause combination through the second semantic vector encoder to obtain the clause-level feature vector of each clause combination includes: performing feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain a clause-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
Based on this scheme, fusing the first statement vectors with the position vectors makes the relative and absolute position information of each clause in the clause combination available in the subsequent encoding process.
In one possible implementation, the first semantic vector encoder includes N attention network layers, a first neural network layer, and a first single-head attention layer; each of the N attention network layers comprises a first multi-head attention layer and a first addition normalization layer; N is a positive integer;
the performing feature extraction on the first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain the word-level feature vector of the first clause combination includes the following steps:
the plurality of attention modules included in the first multi-head attention layer of the ith attention network layer respectively perform attention operations on the fused word vectors of the multiple word segments included in the first clause combination to obtain the outputs of the plurality of attention modules;
the first addition normalization layer of the ith attention network layer splices the output results of the plurality of attention modules to obtain a spliced result, and performs a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a first output result of the first multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; the first output result of the first multi-head attention layer of the ith attention network layer is normalized to obtain a second output result, and the second output result is used in the linear transformation of the (i+1)th attention network layer;
feature extraction is performed on the second output result of the Nth attention network layer through the first neural network layer to obtain a feature matrix of the first word segment of each clause combination;
and feature information in the feature matrix of the first word segment is extracted through the first single-head attention layer to obtain the word-level feature vector of the first clause combination.
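A sketch of one such encoder under stated assumptions: PyTorch's MultiheadAttention plays the role of the multi-head attention layer (its output projection covers the splicing and linear transformation of the attention modules' outputs), a residual-plus-LayerNorm plays the role of the addition normalization layer, and a mean-derived query drives the final single-head attention pooling (the pooling query is an assumption).

```python
import torch
import torch.nn as nn

class AttentionNetworkLayer(nn.Module):
    """One of the N attention network layers: a multi-head attention layer
    followed by an addition normalization layer (a sketch, assuming PyTorch)."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # MultiheadAttention internally splices the attention modules' outputs
        # and applies the output (linear) projection.
        attn_out, _ = self.mha(x, x, x)
        return self.norm(x + attn_out)  # add the previous layer's output, then normalize

class SemanticVectorEncoder(nn.Module):
    """N stacked attention network layers, a neural network layer, and a
    single-head attention layer that pools the feature matrix into one vector."""
    def __init__(self, dim: int, num_heads: int = 8, n_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList([AttentionNetworkLayer(dim, num_heads) for _ in range(n_layers)])
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())      # first/second neural network layer
        self.pool = nn.MultiheadAttention(dim, 1, batch_first=True)   # single-head attention layer

    def forward(self, x):  # x: (batch, seq_len, dim) fused word or statement vectors
        for layer in self.layers:
            x = layer(x)
        feat = self.ffn(x)                        # feature matrix
        query = feat.mean(dim=1, keepdim=True)    # assumed pooling query
        out, _ = self.pool(query, feat, feat)
        return out.squeeze(1)                     # word-level (or clause-level) feature vector
```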
In one possible implementation, the second semantic vector encoder includes N attention network layers, a second neural network layer, and a second single-head attention layer; each of the N attention network layers comprises a second multi-head attention layer and a second addition normalization layer; N is a positive integer;
the performing feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain the clause-level feature vector of the first clause combination includes the following steps:
the plurality of attention modules included in the second multi-head attention layer of the ith attention network layer respectively perform attention calculations on the fused statement vectors of the multiple clauses included in the first clause combination to obtain the outputs of the plurality of attention modules;
the second addition normalization layer of the ith attention network layer splices the output results of the plurality of attention modules to obtain a spliced result, and performs a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a third output result of the second multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; the third output result of the second multi-head attention layer of the ith attention network layer is normalized to obtain a fourth output result, and the fourth output result is used in the linear transformation of the (i+1)th attention network layer;
feature extraction is performed on the fourth output result of the Nth attention network layer through the second neural network layer to obtain a feature matrix of the first clause of each clause combination;
and feature information in the feature matrix of the first clause is extracted through the second single-head attention layer to obtain the clause-level feature vector of the first clause combination.
Based on this scheme, the N attention network layers extract features from the multiple clauses included in the first clause combination, which can capture long-distance features between clauses in the text, extract rich contextual semantic representation information, and enhance the feature extraction capability.
In a possible implementation, the performing law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters includes:
performing a nonlinear transformation on the second feature vector and the first feature vector whose prediction category is the law article category among the plurality of first feature vectors corresponding to the plurality of chapters to obtain a law article prediction vector, and performing law article prediction according to the law article prediction vector;
performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the charge category among the plurality of first feature vectors corresponding to the plurality of chapters, and the law article prediction vector to obtain a charge prediction vector, and performing charge prediction according to the charge prediction vector;
and performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the prison term category among the plurality of first feature vectors corresponding to the plurality of chapters, the law article prediction vector, and the charge prediction vector to obtain a prison term prediction vector, and performing prison term prediction according to the prison term prediction vector.
Based on this scheme, a topological structure is adopted in sentencing prediction: the law article is predicted first, then the charge, and finally the prison term on the basis of the law article and the charge, which further improves the prison term prediction effect.
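A sketch of the chained (topological) prediction heads under assumed layer sizes; the patent specifies only that each prediction vector is obtained by a nonlinear transformation of the listed inputs.

```python
import torch
import torch.nn as nn

class TopologicalPredictionHeads(nn.Module):
    """Law article first, then charge using the law article prediction vector,
    then prison term using both (Tanh heads and dimensions are assumptions)."""
    def __init__(self, dim: int, n_articles: int, n_charges: int, n_terms: int):
        super().__init__()
        self.law = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.charge = nn.Sequential(nn.Linear(3 * dim, dim), nn.Tanh())
        self.term = nn.Sequential(nn.Linear(4 * dim, dim), nn.Tanh())
        self.law_out = nn.Linear(dim, n_articles)
        self.charge_out = nn.Linear(dim, n_charges)
        self.term_out = nn.Linear(dim, n_terms)

    def forward(self, f_law, f_charge, f_term, case_vec):
        # f_*: first feature vectors whose prediction category matched each task;
        # case_vec: the second feature vector of the case-related information.
        v_law = self.law(torch.cat([f_law, case_vec], dim=-1))
        v_charge = self.charge(torch.cat([f_charge, case_vec, v_law], dim=-1))
        v_term = self.term(torch.cat([f_term, case_vec, v_law, v_charge], dim=-1))
        return self.law_out(v_law), self.charge_out(v_charge), self.term_out(v_term)
```

Each head sees the prediction vectors of the preceding tasks, which is how the topological dependency among the three subtasks is realized.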
In one possible implementation, the case-related information includes first data and second data; the first data comprises at least one of testimonial statements, a suspect's confession, and written records, and the second data comprises at least one of witness testimony, material evidence, and defendant information; the vectorizing of the word segments included in the case-related information to obtain a second word vector corresponding to each word segment includes:
vectorizing the word segments included in the first data to obtain a second word vector corresponding to each word segment in the first data;
and determining the category to which each word segment included in the second data belongs, and determining, from a data vector table, a category vector corresponding to the category to which each word segment included in the second data belongs, wherein the data vector table comprises category vectors corresponding to a plurality of categories; and determining the second word vector corresponding to each word segment according to the category vector corresponding to the category to which that word segment belongs.
Based on this scheme, case-related information is introduced on the basis of the crime fact description text and features are extracted from it, so that sentencing prediction is performed jointly from the crime fact description text and the case-related information, which improves the prediction effect.
In one possible implementation, the filtering of the word segments in the first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter includes:
filtering, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
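The claim only states that a convolutional neural network performs the filtering; the sketch below assumes a score-and-threshold design in which a 1-D convolution rates each word segment's relevance to sentencing prediction and low-scoring segments are dropped.

```python
import torch
import torch.nn as nn

class ConvFilter(nn.Module):
    """Assumed CNN filter: convolve over the first word vectors, score each
    word segment, and keep only segments whose score clears a threshold."""
    def __init__(self, dim: int, threshold: float = 0.5):
        super().__init__()
        self.conv = nn.Conv1d(dim, 1, kernel_size=3, padding=1)
        self.threshold = threshold

    def forward(self, word_vectors):  # word_vectors: (seq_len, dim)
        scores = torch.sigmoid(self.conv(word_vectors.t().unsqueeze(0))).squeeze()
        return word_vectors[scores > self.threshold]  # filtered first chapter
```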
In a second aspect, an embodiment of the present application provides a sentencing prediction apparatus, including an acquisition unit and a processing unit;
the acquisition unit is configured to acquire case-related information and a crime fact description text, wherein the case-related information comprises at least one of witness testimony, material evidence, defendant information, testimonial statements, a suspect's confession, and written records; the crime fact description text comprises a plurality of chapters, each of the plurality of chapters comprises a plurality of clauses, and each of the plurality of clauses comprises a plurality of word segments;
the processing unit is configured to vectorize the word segments included in the plurality of chapters to obtain a first word vector corresponding to each word segment, and vectorize the word segments included in the case-related information to obtain a second word vector corresponding to each word segment; perform feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each of the plurality of chapters; and determine a prediction category of each chapter according to its first feature vector, wherein the prediction category is a law article category, a charge category, or a prison term category;
the processing unit is further configured to perform feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and perform law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters.
In a possible implementation, when performing feature extraction on the first word vectors included in the plurality of chapters to obtain the first feature vector of each of the plurality of chapters, the processing unit is specifically configured to:
filter the word segments in a first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, wherein the first word vectors in the plurality of clauses included in the filtered first chapter are related to sentencing prediction, and the first chapter is any one of the plurality of chapters;
combine the plurality of clauses included in the filtered first chapter to obtain a plurality of clause combinations, wherein each of the plurality of clause combinations comprises at least two clauses;
extract features of each clause combination through a first semantic vector encoder to obtain a word-level feature vector of each clause combination;
perform feature splicing on the word-level feature vectors of the plurality of clause combinations to obtain a statement vector representation of each clause combination;
perform feature extraction on the statement vector of each clause combination through a second semantic vector encoder to obtain a clause-level feature vector of each clause combination;
and perform feature splicing on the clause-level feature vectors of the plurality of clause combinations to obtain the first feature vector.
In one possible implementation, the processing unit is further configured to: encode position vectors corresponding to the plurality of first word vectors included in the filtered first chapter, wherein the position vector corresponding to a first word vector is used to represent the position, in the text corresponding to the first chapter, of the word segment corresponding to that first word vector; and fuse the first word vectors of the multiple word segments included in the filtered first chapter with their corresponding position vectors to obtain fused word vectors of the multiple word segments in the first chapter;
when extracting features of each clause combination through the first semantic vector encoder to obtain the word-level feature vector of each clause combination, the processing unit is specifically configured to: perform feature extraction on a first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain a word-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
In one possible implementation, the processing unit is further configured to: encode position vectors corresponding to the plurality of first statement vectors included in the filtered first chapter, wherein the position vector corresponding to a first statement vector is used to represent the position, in the text corresponding to the first chapter, of the clause corresponding to that first statement vector; and fuse the plurality of first statement vectors included in the filtered first chapter with their corresponding position vectors to obtain fused statement vectors of the multiple clauses in the first chapter;
when performing feature extraction on the statement vector of each clause combination through the second semantic vector encoder to obtain the clause-level feature vector of each clause combination, the processing unit is specifically configured to: perform feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain a clause-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
In one possible implementation, the first semantic vector encoder includes N attention network layers, a first neural network layer, and a first single-head attention layer; each of the N attention network layers comprises a first multi-head attention layer and a first addition normalization layer; N is a positive integer;
when performing feature extraction on a first clause combination with the first semantic vector encoder according to the fused word vectors of the multiple word segments included in the first clause combination to obtain the word-level feature vector of the first clause combination, the processing unit is specifically configured to: have the plurality of attention modules included in the first multi-head attention layer of the ith attention network layer respectively perform attention operations on the fused word vectors of the multiple word segments included in the first clause combination to obtain the outputs of the plurality of attention modules; have the first addition normalization layer of the ith attention network layer splice the output results of the plurality of attention modules to obtain a spliced result, and perform a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a first output result of the first multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; normalize the first output result of the first multi-head attention layer of the ith attention network layer to obtain a second output result, the second output result being used in the linear transformation of the (i+1)th attention network layer; perform feature extraction on the second output result of the Nth attention network layer through the first neural network layer to obtain a feature matrix of the first word segment of each clause combination; and extract feature information in the feature matrix of the first word segment through the first single-head attention layer to obtain the word-level feature vector of the first clause combination.
In one possible implementation, the second semantic vector encoder includes N attention network layers, a second neural network layer, and a second single-head attention layer; each of the N attention network layers comprises a second multi-head attention layer and a second addition normalization layer; N is a positive integer;
when performing feature extraction on the first clause combination with the second semantic vector encoder according to the fused statement vectors of the multiple clauses included in the first clause combination to obtain the clause-level feature vector of the first clause combination, the processing unit is specifically configured to: have the plurality of attention modules included in the second multi-head attention layer of the ith attention network layer respectively perform attention calculations on the fused statement vectors of the multiple clauses included in the first clause combination to obtain the outputs of the plurality of attention modules; have the second addition normalization layer of the ith attention network layer splice the output results of the plurality of attention modules to obtain a spliced result, and perform a linear transformation on the spliced result according to the output result of the (i-1)th attention network layer to obtain a third output result of the second multi-head attention layer of the ith attention network layer, where i is a positive integer less than or equal to N; normalize the third output result of the second multi-head attention layer of the ith attention network layer to obtain a fourth output result, the fourth output result being used in the linear transformation of the (i+1)th attention network layer; perform feature extraction on the fourth output result of the Nth attention network layer through the second neural network layer to obtain a feature matrix of the first clause of each clause combination; and extract feature information in the feature matrix of the first clause through the second single-head attention layer to obtain the clause-level feature vector of the first clause combination.
In a possible implementation, when performing the law article prediction, charge prediction, and prison term prediction according to the second feature vector and the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters, the processing unit is specifically configured to: perform a nonlinear transformation on the second feature vector and the first feature vector whose prediction category is the law article category among the plurality of first feature vectors corresponding to the plurality of chapters to obtain a law article prediction vector, and perform law article prediction according to the law article prediction vector; perform a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the charge category among the plurality of first feature vectors corresponding to the plurality of chapters, and the law article prediction vector to obtain a charge prediction vector, and perform charge prediction according to the charge prediction vector; and perform a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the prison term category among the plurality of first feature vectors corresponding to the plurality of chapters, the law article prediction vector, and the charge prediction vector to obtain a prison term prediction vector, and perform prison term prediction according to the prison term prediction vector.
In one possible implementation, the case-related information includes first data and second data; the first data comprises at least one of testimonial statements, a suspect's confession, and written records, and the second data comprises at least one of witness testimony, material evidence, and defendant information; when vectorizing the word segments included in the case-related information to obtain the second word vector corresponding to each word segment, the processing unit is specifically configured to: vectorize the word segments included in the first data to obtain a second word vector corresponding to each word segment in the first data; determine the category to which each word segment included in the second data belongs, and determine, from a data vector table, a category vector corresponding to the category to which each word segment included in the second data belongs, wherein the data vector table comprises category vectors corresponding to a plurality of categories; and determine the second word vector corresponding to each word segment according to the category vector corresponding to the category to which that word segment belongs.
In a possible implementation, when filtering the word segments in the first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, the processing unit is specifically configured to: filter, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
In a third aspect, an embodiment of the present application provides a sentencing prediction apparatus, including a memory and a processor;
the memory is configured to store program instructions;
the processor is configured to invoke the program instructions stored in the memory and, according to the obtained program, perform the method of the first aspect or of any possible implementation of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when run on a computer, cause the computer to perform the method of the first aspect or of any possible implementation of the first aspect.
In addition, for the technical effects of any implementation of the second to fourth aspects, reference may be made to the technical effects of the first aspect and of its different implementations, which are not repeated here.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1A is a schematic diagram of a system architecture provided in an embodiment of the present application;
fig. 1B is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a sentencing prediction method provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a network model for case-related information feature extraction according to an embodiment of the present application;
fig. 4A is a schematic diagram of a network model for crime fact description text feature extraction according to an embodiment of the present application;
fig. 4B is a schematic flowchart of crime fact description text feature extraction according to an embodiment of the present application;
fig. 5 is a schematic flowchart of obtaining word-level feature vectors according to an embodiment of the present application;
fig. 6 is a schematic diagram of an attention network layer according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a classifier provided in an embodiment of the present application;
fig. 8 is a schematic structural diagram of a sentencing prediction model provided in an embodiment of the present application;
fig. 9 is a schematic diagram of a sentencing prediction apparatus provided in an embodiment of the present application;
fig. 10 is a schematic diagram of another sentencing prediction apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
To address the problem in the prior art that the prison term is predicted only from the crime fact description of the case, so that the judgment results have certain errors in comprehensiveness and accuracy, the present application provides a sentencing prediction method: case-related information is added on the basis of the crime fact description text, and sentencing prediction is realized by encoding and extracting features from both the crime fact description text and the case-related information, which can improve the accuracy of sentencing prediction.
Fig. 1A schematically shows a system architecture to which embodiments of the present application are applicable, which may comprise a sentencing prediction apparatus. In some embodiments, the sentencing prediction apparatus may comprise one or more servers 100, for example the three servers in fig. 1A. The server 100 may be implemented by a physical server or by a virtual server, and by a single server or by a server cluster composed of multiple servers; a single server or a server cluster implements the sentencing prediction method provided by the present application. Optionally, the server 100 may be connected to a terminal device, receive a sentencing prediction task sent by the terminal device, and send a sentencing prediction result to the terminal device. For example, the terminal device may be a mobile phone, a tablet computer, a personal computer, or the like.
By way of example, referring to FIG. 1B, a server may include a processor 110, a communication interface 120, and a memory 130. Of course, other components, not shown in FIG. 1B, may also be included in the server 100.
Taking the case where the server 100 is connected to multiple terminal devices as an example, the communication interface 120 is used to communicate with the different terminal devices: it receives the sentencing prediction tasks sent by the terminal devices and sends the sentencing prediction results back to them.
In the embodiments of the present application, the processor 110 may be a general-purpose processor, a digital signal processor, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The processor 110 is the control center of the server 100. It connects the various parts of the entire server 100 using various interfaces and lines, and performs the various functions of the server 100 and processes data by running or executing the software programs and/or modules stored in the memory 130 and calling the data stored in the memory 130. Optionally, the processor 110 may include one or more processing units. The processor 110 may be a control component such as a processor, a microprocessor, or a controller, and may be, for example, a general-purpose Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
The memory 130 may be used to store software programs and modules, and the processor 110 performs various functional applications and data processing by running the software programs and modules stored in the memory 130. The memory 130 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, application programs required by at least one function, and the like, and the data storage area may store data created in the course of business processing, and the like. As a non-volatile computer-readable storage medium, the memory 130 may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 130 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, or an optical disc. The memory 130 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 130 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
It should be noted that the structures shown in fig. 1A and 1B are only examples, and the embodiments of the present application are not limited thereto.
In some scenarios, the sentencing prediction method provided by the embodiment of the present application may be implemented by one or more local terminal devices.
Fig. 2 exemplarily shows the flow of the sentencing prediction method. The method may be performed by a sentencing prediction apparatus, which may be located in the server 100 shown in fig. 1B (for example, in the processor 110) or may be the server 100 itself. The following description takes the server 100 as the executing body and, for convenience of description, does not repeat this below. The sentencing prediction apparatus may also be located in a local terminal device. The specific flow is as follows:
201. Case-related information and a crime fact description text are acquired.
In some embodiments, case-related information refers to auxiliary information related to the case, such as witness testimony, material evidence, defendant information, testimonial statements, a suspect's confession, written records, and the like. The crime fact description text may include defendant information and crime information. The defendant information may include prior conviction information, such as the number of prior convictions, prior fines, prior deprivations of rights, prior prison terms, and prior charges. The crime information may include a crime fact description, or a crime fact description together with the amount of money involved in the case, and the like.
In some embodiments, the crime fact description text is word-segmented after it is obtained. The word segmentation may be performed on the crime fact description text by a word segmenter, which is not specifically limited in the present application. For example, the word segmenter may be a text-analysis tokenizer or the Chinese language processing package HanLP (Han Language Processing).
In some embodiments, after the crime fact description text is word-segmented, it may be divided into a plurality of chapters each containing a fixed number of word segments or clauses, where each chapter includes a plurality of clauses and each clause includes a plurality of word segments. The crime fact description may also be divided into a plurality of chapters by paragraph, which is not specifically limited in the embodiments of the present application.
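A sketch of the clause splitting and chapter division under the fixed-clause-count option; the punctuation set and chapter size are assumptions, and the actual word segmentation would be delegated to a segmenter such as HanLP.

```python
import re

def split_into_chapters(text: str, clauses_per_chapter: int = 16):
    """Split a crime fact description into chapters of a fixed number of clauses
    (a sketch; the patent also allows splitting by word-segment count or paragraph)."""
    clauses = [c for c in re.split(r"[。；！？]", text) if c]  # split on Chinese sentence punctuation
    chapters = [clauses[i:i + clauses_per_chapter]
                for i in range(0, len(clauses), clauses_per_chapter)]
    # Each chapter is a list of clauses; each clause would then be
    # word-segmented, e.g. with a segmenter such as HanLP.
    return chapters
```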
In some embodiments, the crime fact description text and the case-related information may come from a terminal device connected to the server. A user may trigger a sentencing prediction task through the terminal device; when the sentencing prediction task is sent to the server, the crime fact description text and the case-related information may be sent along with it, the task being used to instruct the server to perform sentencing prediction. After receiving the sentencing prediction task, the server executes the flow of the sentencing prediction method.
202. The word segments included in the plurality of chapters are vectorized to obtain a first word vector corresponding to each word segment, and the word segments included in the case-related information are vectorized to obtain a second word vector corresponding to each word segment.
In some embodiments, the crime fact description text is formal written language with a concise and comprehensive character, so after the plurality of chapters included in the crime fact description text are determined, the word segments in those chapters are vectorized to obtain a word vector for each word segment. For convenience of description, the word vectors corresponding to the word segments included in the chapters are called first word vectors. In some scenarios, the crime fact description text is first processed at word granularity to convert the text into numerical form. Specifically, taking the first chapter as an example, each word segment included in the first chapter corresponds to a token, and the first chapter can be expressed as T = (t_1, t_2, ..., t_n), where T denotes the token sequence, t_i denotes the token of one word segment, and n is the number of word segments included in the first chapter. Further, the token sequence of the first chapter may be mapped to an integer sequence, which may be represented as D(t→d) = (d_1, d_2, ..., d_n), where n is the number of word segments included in the first chapter. The correspondence between a token and the integer value of each word segment can be determined, for example, through a dictionary. The dictionary stores the correspondence between the tokens of all word segments encountered in the current scene and integer values: for example, if 5000 word segments are obtained based on statistical features of the training data, the dictionary stores the correspondence between the tokens of those 5000 word segments and integer values, with different word segments corresponding to different integer values.
In some embodiments, after converting the crime fact description text from text form to numerical form at word granularity, the word segments may be mapped to a vector space to obtain a plurality of word vectors. The sentencing prediction apparatus stores a word vector sequence whose length is the size of the dictionary, and the plurality of first word vectors are mapped from the numerical values of the word segments according to the word vector sequence. Specifically, the word vector sequence may be denoted E = (e_1, e_2, ..., e_s), where s is the number of word segments included in the dictionary, e_i is a word vector of configurable length, e_i ∈ R^k, and k is the length of the word vector. Each word segment included in the crime fact description text corresponds to a token; a vector can be mapped through E according to the numerical value of each token, and all mapped word vectors are combined in the original token order, i.e., the order of the word segments in the crime fact description text, to obtain the first word vectors of the word segments in each chapter. Taking the first chapter as an example, if it includes s word segments, after mapping through the word vector sequence E it may be denoted X = (x_1, x_2, ..., x_s), where x_i is the first word vector corresponding to the ith word segment, x_i ∈ R^k, and k is the length of the first word vector. In some scenarios, the word vector sequence may be generated by the sentencing prediction model: the sentencing prediction apparatus can automatically generate the word vector sequence after receiving the sentencing prediction task sent by the terminal device. After the word segments of the crime fact description text are converted into numerical values, they can be mapped to the vector space according to the word vector sequence to obtain the plurality of word vectors.
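A toy illustration of the token-to-integer-to-vector pipeline just described; the dictionary contents and vector length are made up for the example.

```python
import numpy as np

# Word segments -> integer ids via a dictionary, then ids -> first word
# vectors via the word vector sequence E.
dictionary = {"盗窃": 0, "被告人": 1, "现金": 2}  # toy dictionary (assumed contents)
k = 4                                             # word vector length (assumed)
E = np.random.randn(len(dictionary), k)           # word vector sequence, one e_i per entry

tokens = ["被告人", "盗窃", "现金"]                # token sequence T of one chapter
D = [dictionary[t] for t in tokens]               # integer sequence D
X = E[D]                                          # first word vectors X = (x_1, ..., x_n), x_i in R^k
```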
In other embodiments, the acquired case-related information includes first data and second data. It is understood that the first data and the second data may be referred to by other terms; for example, the first data may be called continuous data and the second data discrete data. The first data may include at least one of testimonial statements, a suspect's confession, and written records, and the second data may include at least one of witness testimony, material evidence, and defendant information.
In some scenarios, the word segments included in the first data may be vectorized to directly obtain the second word vector corresponding to each word segment in the first data. The method of vectorizing the word segments included in the first data is the same as the above method of vectorizing the word segments included in the crime fact description text, so the details are not repeated here.
In other scenarios, the multiple word segments included in the first data may belong to different categories; of course, some word segments in the first data may belong to the same category, i.e., each category may include multiple word segments. The vectorized value ranges of the different categories may differ, and the spread across categories can be relatively large. Therefore, after the word segments included in the first data are vectorized, a normalization operation is performed on the resulting word vectors to obtain the second word vector corresponding to each word segment. The second word vector corresponding to the ith word segment thus satisfies the condition shown in the following formula (1):
c'_i = (c_i - μ_c) / σ_c; (1)
where c'_i denotes the second word vector corresponding to the ith word segment in the first data, μ_c denotes the mean of all values included in the first data, σ_c denotes the standard deviation, and c_i is the word vector obtained by vectorizing the ith word segment.
In some scenarios, the normalized vectors of the categories to which all word segments in the first data belong may be represented by a vector sequence, such as C' = (c'_1, c'_2, c'_3, ..., c'_g), where C' ∈ R^g and g is the number of categories of the first data.
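Formula (1) in code, as a sketch over the vectorized values of the first data:

```python
import numpy as np

def normalize_first_data(c: np.ndarray) -> np.ndarray:
    """Normalization of formula (1): c'_i = (c_i - mu_c) / sigma_c,
    applied elementwise to the vectorized values of the first data."""
    mu_c = c.mean()
    sigma_c = c.std()
    return (c - mu_c) / sigma_c
```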
In other embodiments, the category to which each word segment included in the second data belongs is determined, and the category vector corresponding to that category is determined from a data vector table. The number of categories in the second data is less than the number of categories in the first data. The category vector corresponding to the category to which each word segment included in the second data belongs may be determined through a pre-constructed data vector table, which includes category vectors corresponding to a plurality of categories.
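A minimal sketch of the data vector table lookup for the second data; the table keys and vectors are invented for illustration.

```python
import numpy as np

# Pre-constructed data vector table: each category maps to a category vector.
data_vector_table = {
    "material_evidence": np.array([0.1, 0.3, -0.2]),
    "witness_testimony": np.array([0.5, -0.1, 0.4]),
}

def second_word_vector(category: str) -> np.ndarray:
    """Second word vector of a word segment in the second data, determined
    from the category vector of the category it belongs to."""
    return data_vector_table[category]
```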
203. Feature extraction is performed on the first word vectors of the plurality of chapters to obtain a first feature vector of each of the plurality of chapters, and the prediction category of each chapter is determined according to its first feature vector.
And 204, extracting the features of the second word vector included in the case related information to obtain a second feature vector of the case related information.
For example, the Concat function may be used to splice second word vectors included in case-related information to obtain a second feature vector.
And 205, performing normal prediction, criminal name prediction and criminal phase prediction according to the prediction categories of the first feature vectors and the second feature vectors corresponding to the chapters.
In some embodiments, after obtaining the result of the sentencing prediction, the server may send the sentencing prediction result to the terminal device, and the user may obtain the sentencing prediction result through the terminal device.
Illustratively, a sentencing prediction model may be deployed in the sentencing prediction device, and the sentencing prediction device performs steps 202 to 205 through the sentencing prediction model.
Through this scheme, case-related information is introduced on the basis of the crime fact description text to perform sentencing prediction, which improves the prediction effect. Meanwhile, because of the complexity of the criminal period, a topological structure is introduced across the multiple tasks: the law article is predicted first, the criminal name is then predicted on the basis of the law article, and the criminal period is finally predicted on the basis of the law article and the criminal name, further improving the criminal period prediction effect.
In one possible implementation, the feature extraction of the second word vector included in the case-related information in step 204 may be performed by using a feedforward neural network, as shown in fig. 3. For example, a second word vector in the first data is input into a first feedforward neural network to obtain an output vector of the first data. For example, the first feedforward neural network includes a fully-connected layer, and the second word vector in the first data is input into the fully-connected layer to obtain an output vector of the first data. The output vector of the first data satisfies the condition shown in the following formula (2):
T_c = Relu(C′W_c + b_c);  (2)

where T_c is the output vector of the first data, T_c ∈ R^p, p is the vector dimension of the output vector of the first data, C′ is the vector sequence of the first data, Relu is the activation function, W_c is a transformation matrix, and b_c is a bias term. Through the above formula, data of different categories that are isolated from each other can be fused together.
In some embodiments, the second word vectors corresponding to the participles included in the second data may be input into the second feedforward neural network to obtain an output vector of the second data, as shown in fig. 3. For example, the second feedforward neural network includes a fully-connected layer, and the second word vector in the second data is input into the fully-connected layer to obtain an output vector for each category included in the second data. The output vector of the i-th category included in the second data satisfies the condition shown in the following formula (3):
T_di = Relu(d′_i W_di + b_di);  (3)

where T_di represents the output vector corresponding to the i-th category in the second data, d′_i represents the category vector of the i-th category, Relu represents the activation function, W_di represents a transformation matrix, and b_di represents a bias term. Further, the output vector of the second data may be represented as T_d = (T_d1, T_d2, …, T_dq), where q is the number of categories included in the second data.
Further, after the output vector of the first data and the output vector of the second data are obtained, the output vector of the first data and the output vector of the second data are spliced to obtain a second feature vector of the case related information, as shown in fig. 3. The second feature vector satisfies a condition shown in the following formula (4):
T_m = Concat(T_c, T_d);  (4)

where T_m represents the second feature vector, Concat represents the splicing function, T_c represents the output vector of the first data, and T_d represents the output vector of the second data.
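The following PyTorch sketch illustrates formulas (2) to (4): one fully-connected layer fuses the normalized first data, another produces a per-category output for the second data, and the outputs are spliced into the second feature vector T_m. All dimensions and layer shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn

g, q, cat_dim, p = 6, 4, 16, 32                 # assumed sizes: categories and dims

first_ffn = nn.Sequential(nn.Linear(g, p), nn.ReLU())         # T_c = Relu(C'W_c + b_c)
second_ffn = nn.Sequential(nn.Linear(cat_dim, p), nn.ReLU())  # T_di = Relu(d'_i W_di + b_di)

C_prime = torch.randn(g)                        # normalized first data, C' in R^g
D_prime = torch.randn(q, cat_dim)               # one category vector d'_i per category

T_c = first_ffn(C_prime)                        # output vector of the first data
T_d = second_ffn(D_prime).flatten()             # T_d = (T_d1, ..., T_dq), flattened
T_m = torch.cat([T_c, T_d])                     # formula (4): T_m = Concat(T_c, T_d)
print(T_m.shape)                                # torch.Size([160])
```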
In one possible implementation, when the first feature vector of each chapter is obtained in step 203, word-level encoding may be performed by a first semantic vector encoder and sentence-level encoding may then be performed by a second semantic vector encoder. The first semantic vector encoder may also be referred to as a word encoder, and the second semantic vector encoder as a sentence encoder. In some embodiments, before the word-level and sentence-level encoding, the participles included in each chapter are filtered, and the participles unrelated to sentencing prediction are filtered out. For example, the filtering may be implemented with a neural network, such as a convolutional neural network.
As an example, referring to fig. 4A, a convolutional neural network is used for participle filtering. Taking the first chapter as an example, the participles in the first chapter may be filtered based on the first word vectors included in the first chapter to obtain a filtered first chapter, in which the first word vectors in the multiple clauses are all related to sentencing prediction. Specifically, the filtered first chapter may be obtained by filtering the participles included in the first chapter through a convolutional neural network. Compared with a fully-connected neural network, a convolutional neural network has fewer parameters and a higher calculation speed. When convolving text with a convolutional neural network, the convolution kernel can be interpreted as a filter, similar to a high-pass filter in the communications field: by convolving the first word vectors included in the first chapter, it passes meaningful word vectors (i.e., the first word vectors related to sentencing prediction) while ignoring meaningless words (e.g., "what", etc.). Using a convolutional neural network to filter the participles included in the first chapter preserves the features of each word without overfitting. The convolution operation finally yields a feature representation of the token sequence corresponding to the first chapter, which may be denoted C = (c_1, c_2, …, c_m), as shown in fig. 4A. After convolution, the first word vectors of the multiple participles included in the filtered first chapter carry n-gram context features, i.e., the participles are no longer isolated from each other. For example, when n = 2, each first word vector in the filtered first chapter carries context features from the two word vectors before and after it.
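A minimal sketch of this filtering step, assuming a single 1-D convolution over the chapter's word vectors; the kernel width of 3 (trigram context) and all sizes are illustrative choices, not values from this application:

```python
import torch
import torch.nn as nn

k, m = 128, 50                            # word-vector length, tokens in the chapter
conv = nn.Conv1d(in_channels=k, out_channels=k, kernel_size=3, padding=1)

X = torch.randn(1, k, m)                  # first word vectors as (batch, channels, tokens)
C = conv(X)                               # filtered representation C = (c_1, ..., c_m)
print(C.shape)                            # torch.Size([1, 128, 50]); each c_i now
                                          # mixes its two neighbours (n-gram context)
```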
In some embodiments, the crime fact description text is at the long-chapter level, so a long-distance dependency problem inevitably arises. To further capture the relations between participles, the clauses in the text may be recombined. Specifically, the multiple clauses included in the filtered first chapter may be combined to obtain multiple clause combinations, where each of the clause combinations includes at least two clauses. The filtered first chapter may be expressed as C″ = (c″_1, c″_2, …, c″_m). Illustratively, when clause combination is performed, the order of the clauses is not limited; it is understood that the multiple clauses may be arranged in different orders and then combined.
Referring to fig. 4A, feature extraction is then performed on each clause combination through the first semantic vector encoder to obtain a word-level feature vector of each clause combination, and feature splicing is performed on the word-level feature vectors to obtain the sentence vector representation of each clause combination. Feature extraction is then performed on the sentence vector of each clause combination through the second semantic vector encoder to obtain a sentence-level feature vector of each clause combination, and feature splicing is performed on the sentence-level feature vectors of the multiple clause combinations to obtain the first feature vector.
As an example, the first semantic vector encoder may employ a Transformer encoder, which performs feature extraction on each clause combination through an attention mechanism to obtain the word-level feature vector of each clause combination.
In some embodiments, the position of each participle may be incorporated when word-level encoding is performed by the first semantic vector encoder. For example, referring to fig. 4B, the first word vectors of the multiple participles included in the filtered first chapter are fused with the corresponding position vectors to obtain fused word vectors of the multiple participles in the first chapter. The fused word vectors satisfy the condition shown in the following formula (5):
C_p = C″ ⊕ PE;  (5)

where C_p represents the fused word vectors, PE represents the position-encoding vectors, C″ represents the first word vectors corresponding to the multiple participles included in the filtered first chapter, and ⊕ represents element-wise (corresponding-element) addition.
In the above formula, element-wise addition is chosen to fuse the position-encoding vectors instead of vector splicing, so the increase in parameters caused by vector splicing does not occur and overfitting is less likely.
Illustratively, the position vectors may be encoded by sine and cosine functions. Specifically, after the position vectors corresponding to the multiple first word vectors included in the combined first chapter are encoded, the position vector of each participle in the chapter satisfies the conditions shown in the following formulas (6) and (7):
PE(pos, 2i) = sin(pos / 10000^(2i/d_model));  (6)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model));  (7)
where pos represents the position of the current participle in the clause, i indexes the dimensions of the word vector, and d_model represents the dimension of the word vector. Thus, given a participle's position pos, a position vector of d_model dimensions can be generated according to formulas (6) and (7) above. The generated position encoding is an absolute position encoding, but since trigonometric functions are used, it also contains relative position information between the participles.
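A small sketch of formulas (5) to (7): sinusoidal absolute position encodings are generated and fused with the filtered word vectors by element-wise addition. Shapes are assumptions, and d_model is taken to be even.

```python
import numpy as np

def positional_encoding(num_positions, d_model):
    """Formulas (6)-(7): sine on even dimensions, cosine on odd dimensions."""
    pos = np.arange(num_positions)[:, None]        # participle position pos
    i = np.arange(d_model // 2)[None, :]           # dimension index i
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

C_dd = np.random.randn(50, 128)                    # filtered word vectors C''
C_p = C_dd + positional_encoding(50, 128)          # formula (5): element-wise addition
```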
Encoding the positions of the multiple first word vectors and fusing them with those vectors to obtain the fused word vectors avoids the situation where the position information of the participles in the clauses cannot be obtained, which would affect the final sentencing prediction result. It can be understood that, in general, the Transformer has no mechanism for capturing the relative positions of the participles in a clause, and its output is unaffected when participles exchange positions. When a Transformer encoder is adopted, the word-order information of the participles is lost, and neither the relative nor the absolute position of each participle in the clause can be obtained. By encoding positions, the position of each participle in the clause or chapter is introduced into the subsequent feature extraction.
In some embodiments, the fused word vectors of the multiple participles included in the first clause combination may be input into the first semantic vector encoder for feature extraction to obtain the word-level feature vector of the first clause combination, as shown in fig. 4B. The first semantic vector encoder comprises N attention network layers, a first neural network layer, and a first single-head attention layer; each of the N attention network layers comprises a first multi-head attention layer and a first addition normalization layer; N is a positive integer. The process of extracting features from each clause combination through the first semantic vector encoder to obtain its word-level feature vector is shown in fig. 5 and includes the following steps:
Step 501: the multiple attention modules included in the first multi-head attention layer of the i-th attention network layer respectively perform attention operations on the fused word vectors of the multiple participles included in the first clause combination to obtain the outputs of the multiple attention modules.

In some embodiments, the first semantic vector encoder includes N first multi-head attention layers, each of which applies a multi-head attention mechanism and includes multiple attention modules. Through the multi-head attention mechanism, the fused word vectors of the multiple participles included in the first clause combination are input into the multiple attention modules respectively to obtain their outputs. For example, the first multi-head attention layer of the i-th layer includes h attention modules. The fused word vectors of the multiple participles are taken as the input of this first multi-head attention layer and fed into the h attention modules, each of which performs an attention operation on the input fused word vectors to produce its output. Each attention module may perform the attention operation using the following formulas (8) to (10), and the output result of the i-th attention module may be expressed by formula (10).
Q, K, V = C_p;  (8)

Q′_i = QW_i^Q,  K′_i = KW_i^K,  V′_i = VW_i^V;  (9)

Head_i = Attention(Q′_i, K′_i, V′_i);  (10)

where C_p represents the fused word vectors; W_i^Q, W_i^K, and W_i^V represent the query, key, and value weight matrices of the i-th attention module in the first multi-head attention layer; Q′_i, K′_i, and V′_i represent the query, key, and value matrices of the i-th attention module; Head_i represents the output matrix of the i-th attention module; and Attention represents the attention operation.
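The following sketch shows one attention module in the sense of formulas (8) to (10); standard scaled dot-product attention is assumed for the Attention operation, and all dimensions are illustrative.

```python
import torch

def attention_head(C_p, W_q, W_k, W_v):
    """One attention module: Q = K = V = C_p (formula (8)), projected by the
    module's weight matrices (formula (9)), then dot-product attention (10)."""
    Q, K, V = C_p @ W_q, C_p @ W_k, C_p @ W_v
    scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5   # scaling is an assumption
    return torch.softmax(scores, dim=-1) @ V                # Head_i

m_tok, k, d = 50, 128, 32                 # tokens, model dim, per-head dim (assumed)
C_p = torch.randn(m_tok, k)               # fused word vectors
head = attention_head(C_p, *(torch.randn(k, d) for _ in range(3)))
print(head.shape)                         # torch.Size([50, 32])
```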
Step 502: the first addition normalization layer in the i-th attention network layer splices the output results of the multiple attention modules to obtain a splicing result; performs a linear transformation on the splicing result according to the output result of the (i−1)-th attention network layer to obtain the first output result of the first multi-head attention layer of the i-th attention network layer; and normalizes the first output result of the first multi-head attention layer of the i-th attention network layer to obtain a second output result.
In some embodiments, the data form of the output result of an attention module is a matrix, the data form of the splicing result is also a matrix, and the dimensionality of the splicing result equals the sum of the dimensionalities of the output results of the attention modules. The splicing may be transverse splicing, and the splicing process may be implemented by calling a Concat function. It should be understood that transverse splicing is merely illustrative. Optionally, the output results of the attention modules may be spliced in other ways; for example, longitudinal splicing may be used to obtain the splicing result, in which case the number of rows of the splicing result equals the sum of the numbers of rows of the output results of the attention modules.
In some embodiments, after obtaining the stitching result, the stitching result may be linearly transformed to obtain a first output result. The linear transformation may be performed by multiplying a weight matrix, multiplying the splicing result by the weight matrix, and taking the product as the first output result. Alternatively, the linear transformation may also adopt other manners besides the multiplication by the weight matrix, for example, the splicing result is multiplied by a certain constant to perform linear transformation on the splicing result, or the splicing result is added by a certain constant to perform linear transformation on the splicing result, and the manner adopted by the linear transformation in the embodiment of the present application is not limited specifically.
As an example, in the embodiment of the present application, when the output results of the multiple attention modules in the first multi-head attention layer of the i-th attention network layer are spliced, Concat splicing may be adopted to obtain the splicing result. Then, when the splicing result is linearly transformed, the first output result can be obtained by multiplying the splicing result by a weight matrix, where the weight matrix is the second output result of the (i−1)-th attention network layer. The first output result of the first multi-head attention layer of the i-th attention network layer satisfies the condition shown in the following formula (11):
MHA_i(Q, K, V) = Concat(Head_1, Head_2, …, Head_h)W_i^O;  (11)

where W_i^O represents the weight matrix of the i-th attention network layer (i.e., the second output result of the (i−1)-th attention network layer), Concat represents the splicing function, MHA_i(Q, K, V) represents the first output result of the i-th attention network layer, h represents the number of attention modules in the first multi-head attention layer of the i-th attention network layer, h is a positive integer greater than 1, and Head_1, …, Head_h represent the outputs of the h attention modules in the i-th attention network layer.
By utilizing the multi-head attention mechanism, long-distance features between participles in the text can be captured, rich contextual semantic representation information can be extracted, and the feature extraction capability is enhanced. Splicing the output results of the multiple attention modules introduces the original information when the first output result is calculated, which alleviates the problem of information loss. In addition, splicing the outputs of the multiple attention modules is equivalent to introducing a shortcut path, so that during backpropagation part of the gradient can flow directly to the original information without passing through the complex network, preventing gradient explosion or gradient vanishing.
In some embodiments, after the first output result is obtained, it is normalized to obtain the second output result. The normalization mean and variance satisfy the conditions shown in the following formulas (12) and (13):

μ_i = (1/h) Σ_{g=1..h} Head_g;  (12)

σ_i² = (1/h) Σ_{g=1..h} (Head_g − μ_i)²;  (13)

where h represents the number of attention modules in the i-th attention network layer, Head_g represents the output of the g-th attention module in the i-th attention network layer, μ_i represents the mean of the attention module outputs of the i-th layer, and σ_i² represents the variance of the attention module outputs of the i-th layer.
Each layer normalizes the first output result of the first multi-head attention layer of the i-th attention network layer with this shared mean and variance to obtain the second output result of the i-th attention network layer. For example, the second output result of the i-th attention network layer may be represented by M′ = (m′_1, m′_2, m′_3, …, m′_n), where m′_j satisfies the condition shown in the following formula (14):

m′_j = (m_j − μ_i) / σ_i;  (14)

where m′_j represents the j-th vector in the second output result of the i-th attention network layer, m′_j ∈ R^m, m represents the dimension of m′_j, and m_j represents the j-th vector in the first output result of the first multi-head attention layer of the i-th attention network layer.
After normalization, the data distributions become relatively consistent. During network propagation, shifts in the distribution often occur, which make backpropagation difficult. Performing a normalization operation at each layer makes the second output result of each layer follow a normal-like distribution. This normalization does not depend on the number or length of the input sequences and promotes the effectiveness of the neural network.
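A sketch of formulas (11) to (14) put together: the h head outputs are spliced, multiplied by the layer's weight matrix W_i^O, and normalized with the shared mean and variance computed over the head outputs. All sizes are assumptions.

```python
import torch

h, m_tok, d, k = 4, 50, 32, 128
heads = [torch.randn(m_tok, d) for _ in range(h)]   # Head_1 ... Head_h
W_o = torch.randn(h * d, k)                         # weight matrix W_i^O

M = torch.cat(heads, dim=-1) @ W_o                  # formula (11): first output result
stacked = torch.stack(heads)
mu_i, sigma_i = stacked.mean(), stacked.std()       # formulas (12)-(13), shared values
M_prime = (M - mu_i) / sigma_i                      # formula (14): second output result
print(M_prime.shape)                                # torch.Size([50, 128])
```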
In some embodiments, after the second output result of the i-th attention network layer is obtained, it may be used for the linear transformation of the (i+1)-th attention network layer. For example, after the fused word vectors of the multiple participles included in the first clause combination are input into the 1st attention network layer to obtain the second output result of the 1st attention network layer, that second output result may be used as the weight matrix of the linear transformation in the 2nd attention network layer. That is, the second output result of layer 1 may be represented as M′_1 and used as the weight matrix W_2^O of the linear transformation in the 2nd attention network layer. By analogy, the second output result M′_i of the i-th attention network layer may be used as the weight matrix W_{i+1}^O of the linear transformation in the (i+1)-th attention network layer, as shown in fig. 6. The weight matrices satisfy the conditions shown in the following formulas:

W_2^O = M′_1,  W_i^O = M′_{i−1},  W_{i+1}^O = M′_i;

where W_2^O represents the weight matrix of the 2nd attention network layer and M′_1 represents the second output result of the 1st attention network layer, MHA_2(Q, K, V) being the first output result of the 2nd attention network layer; W_i^O represents the weight matrix of the i-th attention network layer and M′_{i−1} represents the second output result of the (i−1)-th attention network layer, MHA_i(Q, K, V) being the first output result of the i-th attention network layer; and W_{i+1}^O represents the weight matrix of the (i+1)-th attention network layer and M′_i represents the second output result of the i-th attention network layer, MHA_{i+1}(Q, K, V) being the first output result of the (i+1)-th attention network layer.
Step 503: perform feature extraction on the second output result of the N-th attention network layer through the first neural network layer to obtain the feature matrix of the first participle of each clause combination.
In some embodiments, after the second output result of the N-th attention network layer is obtained, it may be linearly or nonlinearly transformed by the first neural network to obtain the feature matrix of the first participle of each clause combination. The linear transformation may include multiplication by a matrix and addition of an offset; the nonlinear transformation may be realized by a nonlinear function, for example a maximum operation such as the max function. The max function is only an exemplary implementation of the nonlinear transformation; other means, such as an activation function, may also be used. As an example, the first neural network in the embodiment of the present application may be implemented by a feedforward neural network; specifically, a two-layer fully-connected network may be used to perform feature extraction on the second output result of the N-th attention network layer, and the feature matrix of the first participle of each clause combination satisfies the condition shown in the following formula (15):
M_d = Relu((M′W_1 + b_1)W_2 + b_2);  (15)

where M′ represents the second output result of the N-th attention network layer, M_d represents the feature matrix of the first participle, W_1 and W_2 represent parameter matrices with W_1, W_2 ∈ R^{m×m}, b_1 and b_2 represent bias terms, and Relu represents the activation function.
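A minimal sketch of formula (15), assuming the two-layer fully-connected network mentioned above; the bias terms live inside the Linear layers:

```python
import torch
import torch.nn as nn

m = 128
fc1, fc2 = nn.Linear(m, m), nn.Linear(m, m)   # parameter matrices W_1, W_2 with biases

M_prime = torch.randn(50, m)                  # second output result M' of the N-th layer
M_d = torch.relu(fc2(fc1(M_prime)))           # M_d = Relu((M'W_1 + b_1)W_2 + b_2)
print(M_d.shape)                              # torch.Size([50, 128])
```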
Step 504: extract feature information from the feature matrix of the first participle through the first single-head attention layer to obtain the word-level feature vector of the first clause combination.
In some embodiments, a trainable attention vector may be used in place of the Q query vector of a conventional single-head attention layer to extract the feature information in the feature matrix of the first participle and obtain the word-level feature vector of the first clause combination. The word-level feature vector satisfies the conditions shown in the following formulas (16) to (18):
a_i = u_w · m_di;  (16)

a′_i = exp(a_i) / Σ_j exp(a_j);  (17)

s_j = Σ_i a′_i m_di;  (18)

where a_i is the attention weight produced by the attention mechanism, a′_i is the weight normalized by softmax, u_w is the initialized trainable attention vector with u_w ∈ R^m, m_di is the i-th column vector in the feature matrix of the first participle, and s_j represents the word-level feature vector, s_j ∈ R^m, with R^m denoting the m-dimensional vector space.
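A sketch of the pooling in formulas (16) to (18): a trainable vector u_w scores each entry of the feature matrix, softmax normalizes the scores, and the weighted sum yields the word-level feature vector. For convenience the feature matrix is stored row-wise here, whereas the text indexes columns m_di.

```python
import torch

m, n_tok = 128, 50
M_d = torch.randn(n_tok, m)                   # feature matrix, one row per m_di
u_w = torch.randn(m, requires_grad=True)      # initialized trainable attention vector

a = M_d @ u_w                                 # formula (16): a_i = u_w . m_di
a_prime = torch.softmax(a, dim=0)             # formula (17): softmax-normalized weights
s = a_prime @ M_d                             # formula (18): word-level feature vector
print(s.shape)                                # torch.Size([128])
```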
In some embodiments, feature splicing may be performed on the word-level feature vectors to obtain the sentence vector representation of each clause combination. For example, when the first clause combination includes n word-level feature vectors, the sentence vector of the first clause combination may be expressed as S = (s_1, s_2, s_3, …, s_n).
In some embodiments, feature extraction may be performed on the sentence vectors of each sentence combination by the second semantic vector encoder to obtain a sentence-level feature vector of each sentence combination. In some scenarios, the location vectors corresponding to the plurality of first sentence vectors included in the filtered first chapters may be encoded before feature extraction is performed on the sentence vectors. And the position vector corresponding to the first sentence vector is used for representing the position of the clause corresponding to the first sentence vector in the text corresponding to the first chapter.
In some embodiments, after obtaining the position vector corresponding to the first sentence vector, the position vector of the first sentence vector is fused with the first sentence vector to obtain a fused sentence vector. The specific method may refer to the above encoding method of the fused word vector, and is not described herein again. Further, the fused sentence vector corresponding to the multiple clauses may be input to the second semantic vector encoder and feature extraction may be performed, so as to obtain a clause-level feature vector of the first clause combination, as shown in fig. 4B.
In some embodiments, the second semantic vector encoder comprises N attention network layers, a second neural network layer, and a second single-head attention layer; each of the N attention network layers includes a second multi-head attention layer and a second addition normalization layer. Performing feature extraction on the first clause combination with the second semantic vector encoder, according to the fused sentence vectors of the multiple clauses included in the first clause combination, to obtain the sentence-level feature vector of the first clause combination includes the following steps:
Step 601: the multiple attention modules included in the second multi-head attention layer of the i-th attention network layer respectively perform attention calculations on the fused sentence vectors of the multiple clauses included in the first clause combination to obtain the outputs of the multiple attention modules.

Step 602: the second addition normalization layer in the i-th attention network layer splices the output results of the multiple attention modules to obtain a splicing result; performs a linear transformation on the splicing result according to the output result of the (i−1)-th attention network layer to obtain the third output result of the second multi-head attention layer of the i-th attention network layer, where i is a positive integer less than or equal to N; and normalizes the third output result of the second multi-head attention layer of the i-th attention network layer to obtain a fourth output result.
In some embodiments, the fourth output result may be used for a linear transformation of the (i + 1) th attention network layer.
Step 603: perform feature extraction on the fourth output result of the N-th attention network layer through the second neural network layer to obtain the feature matrix of the first clause of each clause combination.

Step 604: extract feature information from the feature matrix of the first clause through the second single-head attention layer to obtain the sentence-level feature vector of the first clause combination.
The specific method of steps 601-604 is the same as that of steps 501-504, and is not described herein again.
In some embodiments, after the sentence-level feature vectors of the multiple sentence combinations are obtained, the sentence-level feature vectors of the multiple sentence combinations may be subjected to feature concatenation to obtain a first feature vector corresponding to each chapter.
In some embodiments, after the first feature vector corresponding to each chapter is obtained, the prediction category of each chapter is determined according to the first feature vector. In this way, the information of the multiple chapters can be separated, determining which chapters' prediction categories correspond to the law article category, the criminal name category, and the criminal period category, respectively.
After the prediction category of each chapter is determined, criminal period prediction, law article prediction, and criminal name prediction can be performed by combining the second feature vector, the first feature vector of each chapter, and the prediction categories. For example, referring to fig. 7, the sentencing prediction apparatus may perform a nonlinear transformation on the second feature vector and the first feature vectors corresponding to the chapters whose prediction category is the law article category to obtain a law article prediction vector, and perform law article prediction according to the law article prediction vector. It may perform a nonlinear transformation on the second feature vector, the first feature vectors corresponding to the chapters whose prediction category is the criminal name category, and the law article prediction vector to obtain a criminal name prediction vector, and perform criminal name prediction according to the criminal name prediction vector. It may perform a nonlinear transformation on the second feature vector, the first feature vectors corresponding to the chapters whose prediction category is the criminal period category, the law article prediction vector, and the criminal name prediction vector to obtain a criminal period prediction vector, and perform criminal period prediction according to the criminal period prediction vector.
In one example, the law article prediction, criminal name prediction, and criminal period prediction may be performed by a classifier, which may include a law article prediction network, a criminal name prediction network, and a criminal period prediction network, as shown in fig. 8. The law article prediction network, the criminal name prediction network, and the criminal period prediction network may each adopt a forward propagation network. With reference to fig. 3 and 4B, after the prediction category corresponding to each chapter is obtained through the network that encodes the crime fact description text corresponding to fig. 4B, and the second feature vector is obtained through the network corresponding to fig. 3, the second feature vector is spliced with the first feature vectors corresponding to the chapters whose prediction category is the law article category (see formula (19)) to obtain a first spliced vector, and the first spliced vector is input into the classifier (see formulas (20) and (21)) to perform law article prediction. Generally, a law includes multiple articles, and each distinct article may be set as one article category. The calculation process of the law article prediction satisfies the conditions shown in the following formulas (19) to (21):
T̂_l = Concat(T_m, T_1);  (19)

T_l^1 = Relu(T̂_l W_l^1 + b_l^1);  (20)

T_l = Softmax(T_l^1);  (21)

where T̂_l is the first spliced vector, T_m is the second feature vector, T_1 denotes the first feature vectors corresponding to the multiple chapters whose prediction category is the law article category, T_l^1 is the law article prediction vector, T_l is the probability distribution over the article categories in the law article prediction, T_l ∈ R^x with x being the number of article categories in the law article prediction, W_l^1 is a weight matrix, b_l^1 is a bias term, and Relu is the activation function.
In some embodiments, the input vectors for the criminal name prediction include T_m, T_2, and the law article prediction vector T_l^1. T_m, T_2, and T_l^1 are spliced to obtain a second spliced vector (see formula (22)), and the second spliced vector is input into the classifier (see formulas (23) and (24)) to perform criminal name prediction. In general, there may be multiple criminal names, and each distinct criminal name may be set as one criminal name category. The calculation process of the criminal name prediction satisfies the conditions shown in the following formulas (22) to (24):
T̂_ch = Concat(T_m, T_2, T_l^1);  (22)

T_ch^1 = Relu(T̂_ch W_ch^1 + b_ch^1);  (23)

T_ch = Softmax(T_ch^1);  (24)

where T̂_ch represents the second spliced vector, T_m represents the second feature vector, T_l^1 represents the law article prediction vector, T_2 denotes the first feature vectors corresponding to the multiple chapters whose prediction category is the criminal name category, T_ch^1 represents the criminal name prediction vector, T_ch represents the probability distribution over the criminal name categories in the criminal name prediction, T_ch ∈ R^y with y being the number of criminal name categories, W_ch^1 is a weight matrix, b_ch^1 is a bias term, and Relu is the activation function.
In some embodiments, the input vectors for the criminal period prediction include T_m, T_3, T_l^1, and the criminal name prediction vector T_ch^1. T_m, T_3, T_l^1, and T_ch^1 are spliced to obtain a third spliced vector (see formula (25)), and the third spliced vector is input into the classifier (see formulas (26) and (27)) to perform criminal period prediction. In general, a criminal period may include a variety of terms, such as 5 years, 10 years, exemption from punishment, the death penalty, and life imprisonment. Each distinct criminal period may be set as one period category. The calculation process of the criminal period prediction satisfies the conditions shown in the following formulas (25) to (27):
T̂_p = Concat(T_m, T_3, T_l^1, T_ch^1);  (25)

T_p^1 = Relu(T̂_p W_p^1 + b_p^1);  (26)

T_p = Softmax(T_p^1);  (27)

where T̂_p represents the third spliced vector, T_m represents the second feature vector, T_l^1 represents the law article prediction vector, T_ch^1 represents the criminal name prediction vector, T_3 denotes the first feature vectors corresponding to the multiple chapters whose prediction category is the criminal period category, T_p^1 represents the criminal period prediction vector, T_p represents the probability distribution over the period categories in the criminal period prediction, T_p ∈ R^z with z being the number of period categories, W_p^1 is a weight matrix, b_p^1 is a bias term, and Relu is the activation function.
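A minimal sketch of the topological multi-task head of formulas (19) to (27): the law article head feeds the criminal name head, and both feed the criminal period head. Layer shapes, category counts, and the use of Softmax outputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

dm, d1, d2, d3, dh = 64, 128, 128, 128, 96     # assumed feature and hidden dims
x_art, y_name, z_term = 30, 20, 10             # assumed numbers of categories

law_fc = nn.Linear(dm + d1, dh)                # formulas (19)-(20)
name_fc = nn.Linear(dm + d2 + dh, dh)          # formulas (22)-(23)
term_fc = nn.Linear(dm + d3 + dh + dh, dh)     # formulas (25)-(26)
law_out, name_out, term_out = (nn.Linear(dh, n) for n in (x_art, y_name, z_term))

T_m, T_1, T_2, T_3 = (torch.randn(d) for d in (dm, d1, d2, d3))

T_l1 = torch.relu(law_fc(torch.cat([T_m, T_1])))            # law article prediction vector
T_l = torch.softmax(law_out(T_l1), dim=-1)                  # article-category distribution
T_ch1 = torch.relu(name_fc(torch.cat([T_m, T_2, T_l1])))    # criminal name prediction vector
T_ch = torch.softmax(name_out(T_ch1), dim=-1)               # criminal-name distribution
T_p1 = torch.relu(term_fc(torch.cat([T_m, T_3, T_l1, T_ch1])))
T_p = torch.softmax(term_out(T_p1), dim=-1)                 # criminal-period distribution
```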
In some embodiments, the sentencing prediction model may be trained with multiple samples in a training set. Each sample includes a judgment document and case-related information. The judgment document includes document information, informant information, the crime fact description, and court judgment information. The document information is the abstract and title of the judgment document, and the court judgment information includes fact-finding information and label information. The fact-finding information includes conclusion-type information, amount-type information, circumstance-type information, consequence-type information, and confession-attitude information. Three types of labels are extracted: the related law articles; the criminal names, which include multiple criminal name categories (e.g., including innocence); and the criminal periods (e.g., including exemption from punishment, the death penalty, and life imprisonment). When the sentencing prediction model is trained, the multiple samples may be input into it over multiple iterations, one sample at a time. The network parameters of the sentencing prediction model are adjusted by comparing the law article prediction result output by the model for the sample with the law article label in the sample labels, by comparing the criminal period prediction result with the criminal period label, and by comparing the criminal name prediction result with the criminal name label.
In one possible example, when adjusting the network parameters, the comparison result (i.e., the loss value) may be determined through a loss function, and the network parameters adjusted accordingly. The loss function may be, for example, a cross-entropy loss function.
The loss of each prediction task may be obtained using a cross-entropy loss function, whose expression satisfies the condition shown in the following formula (28):

loss_l = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)];  (28)

where y indicates whether the data belongs to the current category (for example, y may take the value 0 or 1), l denotes the task and takes the value 1, 2, or 3, ŷ denotes the prediction result, i.e., the probability of belonging to the current category, and loss_l is the loss value when the i-th sample is predicted under task l. The value 1, 2, or 3 identifies the loss value of the criminal name prediction, the law article prediction, or the criminal period prediction; for example, 1 identifies the criminal name prediction, 2 the law article prediction, and 3 the criminal period prediction.
In some embodiments, the loss values determined by the loss function for the criminal name prediction, the law article prediction, and the criminal period prediction may be accumulated to obtain a total loss value, and the network parameters of the sentencing prediction model may then be adjusted based on the total loss value. In some scenarios, the network parameters may be adjusted by an optimization algorithm, for example, the Adam optimization algorithm.
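A minimal training-step sketch of what is described above, assuming binary cross-entropy per task as in formula (28); `model` here is a hypothetical stand-in, not the actual sentencing prediction model:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 3)                        # hypothetical stand-in network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()                    # cross-entropy loss of formula (28)

logits = model(torch.randn(4, 16))              # fake batch; one output per task
targets = torch.randint(0, 2, (4, 3)).float()   # y in {0, 1} for each task

loss_name = bce(logits[:, 0], targets[:, 0])    # criminal name loss (l = 1)
loss_article = bce(logits[:, 1], targets[:, 1]) # law article loss (l = 2)
loss_term = bce(logits[:, 2], targets[:, 2])    # criminal period loss (l = 3)

total_loss = loss_name + loss_article + loss_term   # accumulate the task losses
optimizer.zero_grad()
total_loss.backward()
optimizer.step()                                # adjust parameters with Adam
```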
In some possible embodiments, different networks in the sentencing prediction model are trained separately; for example, the network of fig. 4B used to encode the crime fact description text is trained on its own. Each sample in the training set of this network may include a judgment document and a label corresponding to each sentence or paragraph in the judgment document, the label indicating whether the sentence or paragraph belongs to the criminal name category, the criminal period category, or the law article category. When this network is trained, multiple samples may be input over multiple iterations, one sample at a time; the prediction result output by the network for the sample is compared with the categories (criminal name category, criminal period category, and law article category) in the sample label, and each network parameter of the network is adjusted according to the comparison result.
In one possible example, when adjusting the network parameters, the comparison result may be determined through a loss function, and the network parameters adjusted accordingly. The loss function may be, for example, a cross-entropy loss function.
The loss of each prediction task may be obtained using a cross-entropy loss function, whose expression satisfies the condition shown in the following formula (29):

loss_l = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)];  (29)

where y indicates whether the data belongs to the current category (the criminal name category, the law article category, or the criminal period category; for example, y may take the value 0 or 1), l denotes the category type and takes the value 1, 2, or 3, ŷ denotes the prediction result, i.e., the probability of belonging to the current category, and loss_l is the loss value when the i-th sample is predicted as category type l. The value 1, 2, or 3 identifies the criminal name category, the law article category, or the criminal period category; for example, 1 identifies the criminal name category, 2 the law article category, and 3 the criminal period category.
In some embodiments, during training the data of the probability distribution of the prediction categories, which refers to the combination of the multiple chapters and the predicted classification results, may be represented by a set of numbers consisting of 1 and 0. For example, each chapter may correspond to a sequence number; when the probability-distribution data is [35273, label1], it represents that the prediction category label of the 35273rd chapter is the first category. If the prediction result is consistent with the manual annotation, it is marked as 1; otherwise, it is marked as 0.
Based on the same technical concept, an embodiment of the present application provides a sentencing prediction apparatus 800, shown in fig. 9. The apparatus 800 may perform the steps of the aforementioned sentencing prediction method, which are not described in detail here to avoid repetition. The apparatus 800 includes an acquiring unit 801 and a processing unit 802.
The acquiring unit 801 is configured to acquire case-related information and a crime fact description text, where the case-related information includes at least one of evidence, physical evidence, informant information, witness testimony, the suspect's statement, and written records; the crime fact description text comprises multiple chapters, each of the multiple chapters comprises multiple clauses, and each of the multiple clauses comprises multiple participles.
The processing unit 802 is configured to perform vectorization processing on the participles included in the multiple chapters to obtain a first word vector corresponding to each participle, and perform vectorization processing on the participles included in the case-related information to obtain a second word vector corresponding to each participle; and to perform feature extraction on the first word vectors included in the multiple chapters to obtain a first feature vector of each of the multiple chapters, and determine the prediction category of each chapter according to its first feature vector, where the prediction category is the law article category, the criminal name category, or the criminal period category.

The processing unit 802 is further configured to perform feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information; and to perform law article prediction, criminal name prediction, and criminal period prediction according to the prediction categories, the first feature vectors corresponding to the multiple chapters, and the second feature vector.
In some embodiments, when performing feature extraction on the first word vectors included in the multiple chapters to obtain the first feature vector of each chapter, the processing unit 802 is specifically configured to: filter the participles in a first chapter based on the first word vectors in the first chapter to obtain a filtered first chapter, where the first word vectors in the multiple clauses of the filtered first chapter are all related to sentencing prediction and the first chapter is any one of the multiple chapters; combine the multiple clauses included in the filtered first chapter to obtain multiple clause combinations, where each of the multiple clause combinations includes at least two clauses;

perform feature extraction on each clause combination through the first semantic vector encoder to obtain a word-level feature vector of each clause combination, and perform feature splicing on the word-level feature vectors to obtain the sentence vector representation of each clause combination;

perform feature extraction on the sentence vector of each clause combination through the second semantic vector encoder to obtain a sentence-level feature vector of each clause combination, and perform feature splicing on the sentence-level feature vectors of the multiple clause combinations to obtain the first feature vector.
In some embodiments, the processing unit 802 is further configured to: encode position vectors corresponding to the multiple first word vectors included in the filtered first chapter, where the position vector corresponding to a first word vector is used to represent the position of the corresponding participle in the text of the first chapter; and fuse the first word vectors of the multiple participles included in the filtered first chapter with the corresponding position vectors to obtain fused word vectors of the multiple participles in the first chapter.
the processing unit 802, when performing feature extraction on each sentence combination through the first semantic vector encoder to obtain a word-level feature vector of each sentence combination, is specifically configured to: and performing feature extraction on a first sentence combination by adopting a first semantic vector encoder according to a fused word vector of a plurality of clauses included in the first sentence combination to obtain a word-level feature vector of the first sentence combination, wherein the first sentence combination is any one of the plurality of sentence combinations.
In other embodiments, the processing unit 802 is further configured to: encode position vectors corresponding to the multiple first sentence vectors included in the filtered first chapter, where the position vector corresponding to a first sentence vector is used to represent the position of the corresponding clause in the text of the first chapter; and fuse the multiple first sentence vectors included in the filtered first chapter with the corresponding position vectors to obtain fused sentence vectors of the multiple clauses in the first chapter.
the processing unit 802, when performing feature extraction on the sentence vector of each sentence combination by using the second semantic vector encoder to obtain a sentence-level feature vector of each sentence combination, is specifically configured to: and performing feature extraction on the first sentence combination by adopting a second semantic vector encoder according to a fusion sentence vector of a plurality of sentences included in the first sentence combination to obtain a sentence-level feature vector of the first sentence combination, wherein the first sentence combination is any one of the plurality of sentence combinations.
In some embodiments, the first semantic vector encoder comprises N attention network layers, a first neural network layer, and a first single-headed attention layer; each of the N attention network layers comprises a first multi-headed attention layer and a first additive normalization layer; n is a positive integer;
the processing unit 802, when performing feature extraction on a first sentence combination by using a first semantic vector encoder according to a fused word vector of a plurality of participles included in the first sentence combination to obtain a word-level feature vector of the first sentence combination, is specifically configured to: have the plurality of attention modules included in the first multi-head attention layer in the i-th attention network layer respectively perform attention operations on the fused word vectors of the plurality of participles included in the first sentence combination to obtain outputs of the plurality of attention modules; have the first addition normalization layer in the i-th attention network layer splice the output results of the plurality of attention modules to obtain a splicing result, and perform a linear transformation on the splicing result according to the output result of the (i−1)-th attention network layer to obtain a first output result of the first multi-head attention layer of the i-th attention network layer, i being a positive integer less than or equal to N; normalize the first output result of the first multi-head attention layer of the i-th attention network layer to obtain a second output result, the second output result being used for the linear transformation of the (i+1)-th attention network layer; perform feature extraction on the second output result of the N-th attention network layer through the first neural network layer to obtain a feature matrix of the first participle of each sentence combination; and extract feature information in the feature matrix of the first participle through the first single-head attention layer to obtain the word-level feature vector of the first sentence combination.
In other embodiments, the second semantic vector encoder includes N attention network layers, a second neural network layer, and a second single-headed attention layer; each of the N attention network layers comprises a second multi-headed attention layer and a second additive normalization layer; n is a positive integer;
the processing unit 802, when performing feature extraction on a first sentence combination by using a second semantic vector encoder according to a fused sentence vector of a plurality of sentences included in the first sentence combination to obtain a sentence-level feature vector of the first sentence combination, is specifically configured to: a plurality of attention modules included in the second multi-head attention layer in the ith attention network layer respectively perform attention calculation on a fusion sentence vector of a plurality of sentences included in the first sentence combination to obtain outputs of the plurality of attention modules; the second addition normalization layer in the ith attention network layer splices the output results of the plurality of attention modules to obtain a spliced result; performing linear transformation on the splicing result according to the output result of the (i-1) th attention network layer to obtain a third output result of a second multi-head attention layer of the (i) th attention network layer; i is a positive integer less than or equal to N; normalizing the third output result of the second multi-head attention layer of the ith attention network layer to obtain a fourth output result, wherein the fourth output result is used for linear transformation of the (i + 1) th attention network layer; performing feature extraction on the fourth output result of the Nth attention network layer through the second neural network layer to obtain a feature matrix of the first clause of each clause combination; and extracting characteristic information in a characteristic matrix of the first clause through the second single-head attention layer to obtain a clause-level characteristic vector of the first clause combination.
In some embodiments, when performing the law article prediction, the criminal name prediction, and the criminal period prediction according to the prediction categories, the first feature vectors corresponding to the multiple chapters, and the second feature vector, the processing unit 802 is specifically configured to: perform a nonlinear transformation on the second feature vector and the first feature vectors, among those corresponding to the multiple chapters, whose prediction category is the law article category to obtain a law article prediction vector, and perform law article prediction according to the law article prediction vector; perform a nonlinear transformation on the second feature vector, the first feature vectors whose prediction category is the criminal name category, and the law article prediction vector to obtain a criminal name prediction vector, and perform criminal name prediction according to the criminal name prediction vector; and perform a nonlinear transformation on the second feature vector, the first feature vectors whose prediction category is the criminal period category, the law article prediction vector, and the criminal name prediction vector to obtain a criminal period prediction vector, and perform criminal period prediction according to the criminal period prediction vector.
In other embodiments, the case-related information includes first data and second data; the first data includes at least one of witness testimony, the suspect's statement, and written records, and the second data includes at least one of evidence, physical evidence, and informant information. When performing vectorization processing on the participles included in the case-related information to obtain the second word vector corresponding to each participle, the processing unit 802 is specifically configured to: perform vectorization processing on the participles included in the first data to obtain the second word vector corresponding to each participle in the first data; determine the category to which each participle included in the second data belongs, and determine, from a data vector table, the category vector corresponding to that category, the data vector table including the category vectors corresponding to multiple categories; and determine the second word vector corresponding to each participle according to the category vector corresponding to the category to which each participle included in the second data belongs.
In some embodiments, the processing unit 802 is specifically configured to, when filtering the participles in the first chapter based on the first word vectors included in the first chapter to obtain the filtered first chapter: filter, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
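The patent does not spell out the filtering criterion, so the sketch below shows one plausible reading under stated assumptions: a one-dimensional convolution scores each participle in its local context, and only the top-k highest-scoring word vectors are kept as the filtered chapter. The class name ConvTokenFilter and the score-and-keep-top-k scheme are editorial assumptions.

import torch
import torch.nn as nn

class ConvTokenFilter(nn.Module):
    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # scores each position from a window of neighboring word vectors
        self.conv = nn.Conv1d(dim, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor, k: int) -> torch.Tensor:
        # x: (batch, seq_len, dim) first word vectors of one chapter
        scores = self.conv(x.transpose(1, 2)).squeeze(1)           # (batch, seq_len)
        idx = scores.topk(k, dim=-1).indices.sort(dim=-1).values   # keep original word order
        return x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.size(-1)))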
Based on the same technical concept, an embodiment of the present application provides a sentencing prediction device 1000, as shown in fig. 10. The device 1000 can perform the steps of the aforementioned sentencing prediction method; to avoid repetition, they are not described in detail here. The device 1000 includes a memory 1001 and a processor 1002.
The memory 1001 is configured to store program instructions;
the processor 1002 is configured to call the program instructions stored in the memory and execute the above sentencing prediction method according to the obtained program.
An embodiment of the present application provides a computer-readable storage medium storing computer instructions which, when run on a computer, cause the computer to perform the method according to the first aspect and any of its possible implementations.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A sentencing prediction method, comprising:
acquiring case-related information and a criminal fact description text, wherein the case-related information comprises at least one of documentary evidence, material evidence, informant information, witness testimony, the suspect's oral statement, and written records; the criminal fact description text comprises a plurality of chapters, each chapter of the plurality of chapters comprises a plurality of clauses, and each clause of the plurality of clauses comprises a plurality of participles;
vectorizing the participles included in the plurality of chapters to obtain a first word vector corresponding to each participle, and vectorizing the participles included in the case-related information to obtain a second word vector corresponding to each participle;
performing feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each chapter of the plurality of chapters, and determining a prediction category of each chapter according to the first feature vector of each chapter, wherein the prediction category is a law article category, a criminal name category, or a criminal period category;
performing feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information;
and performing law article prediction, criminal name prediction, and criminal period prediction according to the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters and the second feature vector.
2. The method of claim 1, wherein performing feature extraction on the first word vectors included in the plurality of chapters to obtain the first feature vector of each chapter of the plurality of chapters comprises:
filtering the participles in a first chapter based on the first word vectors included in the first chapter to obtain a filtered first chapter, wherein the first word vectors in the plurality of clauses included in the filtered first chapter are related to sentencing prediction, and the first chapter is any one of the plurality of chapters;
combining the plurality of clauses included in the filtered first chapter to obtain a plurality of clause combinations, wherein each clause combination of the plurality of clause combinations comprises at least two clauses;
performing feature extraction on each clause combination through a first semantic vector encoder to obtain a word-level feature vector of each clause combination;
performing feature splicing on the word-level feature vectors of the plurality of clause combinations to obtain a sentence vector representation of each clause combination;
performing feature extraction on the sentence vector of each clause combination through a second semantic vector encoder to obtain a sentence-level feature vector of each clause combination;
and performing feature splicing on the sentence-level feature vectors of the plurality of clause combinations to obtain the first feature vector.
3. The method of claim 2, wherein the method further comprises:
encoding position vectors corresponding to a plurality of first word vectors included in the filtered first chapter, wherein the position vector corresponding to a first word vector is used for representing the position of the participle corresponding to that first word vector in the text corresponding to the first chapter;
fusing the first word vectors of the plurality of participles included in the filtered first chapter with the corresponding position vectors to obtain fused word vectors of the plurality of participles in the first chapter;
wherein performing feature extraction on each clause combination through the first semantic vector encoder to obtain the word-level feature vector of each clause combination comprises:
performing feature extraction on a first clause combination by using the first semantic vector encoder according to the fused word vectors of the plurality of participles included in the first clause combination to obtain a word-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
4. The method of claim 2, wherein the method further comprises:
encoding position vectors corresponding to a plurality of first sentence vectors included in the filtered first chapter, wherein the position vector corresponding to a first sentence vector is used for representing the position of the clause corresponding to that first sentence vector in the text corresponding to the first chapter;
fusing the plurality of first sentence vectors included in the filtered first chapter with the corresponding position vectors to obtain fused sentence vectors of the plurality of clauses in the first chapter;
wherein performing feature extraction on the sentence vector of each clause combination through the second semantic vector encoder to obtain the sentence-level feature vector of each clause combination comprises:
performing feature extraction on a first clause combination by using the second semantic vector encoder according to the fused sentence vectors of the plurality of clauses included in the first clause combination to obtain a sentence-level feature vector of the first clause combination, wherein the first clause combination is any one of the plurality of clause combinations.
5. The method of any one of claims 1-4, wherein performing the law article prediction, the criminal name prediction, and the criminal period prediction according to the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters and the second feature vector comprises:
performing a nonlinear transformation on the second feature vector and the first feature vector whose prediction category is the law article category among the plurality of first feature vectors corresponding to the plurality of chapters to obtain a law article prediction vector, and performing law article prediction according to the law article prediction vector;
performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the criminal name category among the plurality of first feature vectors corresponding to the plurality of chapters, and the law article prediction vector to obtain a criminal name prediction vector, and performing criminal name prediction according to the criminal name prediction vector;
and performing a nonlinear transformation on the second feature vector, the first feature vector whose prediction category is the criminal period category among the plurality of first feature vectors corresponding to the plurality of chapters, the law article prediction vector, and the criminal name prediction vector to obtain a criminal period prediction vector, and performing criminal period prediction according to the criminal period prediction vector.
6. The method of any one of claims 1-4, wherein the case-related information comprises first data and second data; the first data comprises at least one of witness testimony, the suspect's oral statement, and written records, and the second data comprises at least one of documentary evidence, material evidence, and informant information; and vectorizing the participles included in the case-related information to obtain the second word vector corresponding to each participle comprises:
vectorizing the participles included in the first data to obtain a second word vector corresponding to each participle in the first data;
determining the category to which each participle included in the second data belongs, and determining, from a data vector table, a category vector corresponding to the category to which each participle included in the second data belongs, wherein the data vector table comprises category vectors corresponding to a plurality of categories; and determining the second word vector corresponding to each participle according to the category vector corresponding to the category to which that participle belongs.
7. The method of any one of claims 1-4, wherein filtering the participles in the first chapter based on the first word vectors included in the first chapter to obtain the filtered first chapter comprises:
filtering, by a convolutional neural network, the plurality of first word vectors included in the first chapter to obtain the filtered first chapter.
8. A sentencing prediction apparatus, comprising an acquisition unit and a processing unit;
wherein the acquisition unit is configured to acquire case-related information and a criminal fact description text, wherein the case-related information comprises at least one of documentary evidence, material evidence, informant information, witness testimony, the suspect's oral statement, and written records; the criminal fact description text comprises a plurality of chapters, each chapter of the plurality of chapters comprises a plurality of clauses, and each clause of the plurality of clauses comprises a plurality of participles;
the processing unit is configured to vectorize the participles included in the plurality of chapters to obtain a first word vector corresponding to each participle, and vectorize the participles included in the case-related information to obtain a second word vector corresponding to each participle; and to perform feature extraction on the first word vectors included in the plurality of chapters to obtain a first feature vector of each chapter of the plurality of chapters, and determine a prediction category of each chapter according to the first feature vector of each chapter, wherein the prediction category is a law article category, a criminal name category, or a criminal period category;
and the processing unit is further configured to perform feature extraction on the second word vectors included in the case-related information to obtain a second feature vector of the case-related information, and to perform law article prediction, criminal name prediction, and criminal period prediction according to the prediction categories of the plurality of first feature vectors corresponding to the plurality of chapters and the second feature vector.
9. A sentencing prediction apparatus, comprising a memory and a processor;
wherein the memory is configured to store program instructions;
and the processor is configured to call the program instructions stored in the memory and execute the method of any one of claims 1-7 according to the obtained program.
10. A computer-readable storage medium having stored thereon computer instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-7.
CN202210365513.7A 2022-04-07 2022-04-07 Sentencing prediction method and device Active CN114860900B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210365513.7A CN114860900B (en) 2022-04-07 2022-04-07 Sentencing prediction method and device

Publications (2)

Publication Number Publication Date
CN114860900A true CN114860900A (en) 2022-08-05
CN114860900B CN114860900B (en) 2024-11-01

Family

ID=82630211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210365513.7A Active CN114860900B (en) Sentencing prediction method and device

Country Status (1)

Country Link
CN (1) CN114860900B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376227A (en) * 2018-10-29 2019-02-22 山东大学 A kind of prison term prediction technique based on multitask artificial neural network
CN109472021A (en) * 2018-10-12 2019-03-15 北京诺道认知医学科技有限公司 Critical sentence screening technique and device in medical literature based on deep learning
CN111160050A (en) * 2019-12-20 2020-05-15 沈阳雅译网络技术有限公司 Chapter-level neural machine translation method based on context memory network
CN111768024A (en) * 2020-05-20 2020-10-13 中国地质大学(武汉) Criminal period prediction method and equipment based on attention mechanism and storage equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant