CN111488503A

CN111488503A - Case classification method and device

Info

Publication number: CN111488503A
Application number: CN201910087480.2A
Authority: CN
Inventors: 周鑫; 张雅婷; 孙常龙; 刘晓钟; 司罗
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2019-01-29
Filing date: 2019-01-29
Publication date: 2020-08-04

Abstract

The invention discloses a case classification method and device. The method comprises: acquiring multiple types of features associated with a case to be classified, wherein the multiple types of features comprise text features, voice features, image features, and discrete features; and setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model. The invention solves the technical problem in the related art of low efficiency when cases are processed in the judicial mode.

Description

Case classification method and device

Technical Field

The invention relates to the technical field of information processing, in particular to a case classification method and device.

Background

Currently, as public legal awareness grows, the number of cases that judges must handle keeps increasing. The traditional judicial mode generally involves many processes, such as case filing, mediation, service of documents, court trial, judgment, enforcement, archiving, and complaint. In the related art, each node in these processes is generally operated by natural persons (the judge, the parties, and other auxiliary personnel). As the number of cases increases, continuing to rely on manual operation incurs excessive labor cost, and the available staff cannot keep up with the growing caseload, so case handling efficiency is low.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The embodiment of the invention provides a case classification method and device, which at least solve the technical problem of low efficiency when cases in a judicial mode are processed in the related art.

According to an aspect of an embodiment of the present invention, there is provided a case classification method, including: acquiring multiple types of features related to cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features; and setting the various types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

According to another aspect of the embodiments of the present invention, there is also provided a case classification apparatus, including: an acquisition unit, configured to acquire multiple types of features associated with a case to be classified, wherein the multiple types of features comprise text features, voice features, image features, and discrete features; and a classification unit, configured to set the multiple types of features as input parameters of a neural network classification model and to obtain a classification result through the neural network classification model.

According to another aspect of the embodiments of the present invention, there is also provided a storage medium, the storage medium including a stored program, wherein when the program runs, the apparatus on which the storage medium is located is controlled to execute any of the case classification methods described above.

According to another aspect of the embodiments of the present invention, there is also provided a terminal, including: a first device; a second device; and a processor that executes a program, wherein the program, when executed, performs the following processing steps on data output from the first device and the second device: a first step of acquiring multiple types of features associated with a case to be classified, wherein the multiple types of features comprise text features, voice features, image features, and discrete features; and a second step of setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

The invention can be applied to various case division tasks and aims to reduce the workload of manual case division. Because features associated with the cases to be classified are used, classification accuracy can be improved considerably and manual workload reduced. Cases are divided into complex cases and simple cases: complex cases are pushed directly to a judge, while simple cases are first pre-reviewed by an intelligent judging system and then pushed to the judge for confirmation. This relieves the judge's load and improves the working efficiency of judicial workers.

In the embodiment of the invention, multiple types of features associated with a case to be classified are acquired, wherein the multiple types of features comprise text features, voice features, image features, and discrete features; the multiple types of features are set as input parameters of a neural network classification model, and a classification result is obtained through the neural network classification model. In this embodiment, the multiple types of features associated with the case to be classified can be used as input parameters and the case classified through the neural network classification model. This reduces the workload of manual case division in the judicial mode, realizes intelligent division of cases into complex and simple, improves case classification efficiency, and thereby solves the technical problem in the related art of low efficiency when cases are processed in the judicial mode.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram showing a hardware configuration of a computer terminal for implementing a case classification method;

FIG. 2 shows a schematic diagram of a case classification method network terminal;

FIG. 3 is a flowchart of a case classification method according to a first embodiment of the present invention;

FIG. 4 is a schematic illustration of an alternative legal knowledge graph according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an alternative neural network classification model according to an embodiment of the present invention;

FIG. 6 is a system block diagram of a case classification system according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a case classification apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

First, some of the terms appearing in the description of the embodiments of the present application are explained as follows:

NLP: Natural Language Processing.

OCR: Optical Character Recognition, a technique in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer characters by a character recognition method; that is, the process of scanning text material and then analyzing and processing the image file to obtain character and layout information.

ASR: Automatic Speech Recognition, a technique whose goal is to convert the lexical content of human speech into computer-readable input.

Skip-gram: a natural language processing model derived from the feedforward neural network model; it is one of the Word2vec neural network models. In many natural language processing tasks, word representations are determined by tf-idf scores, which indicate the relative importance of a word in a text. Skip-gram, given an unlabeled corpus, instead generates for each word in the corpus a vector that can express its semantics.

CBOW: the continuous bag-of-words model, a mirror image of the Skip-gram model and likewise a Word2vec model.

Neural network: generally comprises an input layer, an output layer, and a hidden layer. The input layer comprises a plurality of neurons that receive a large amount of nonlinear input information, which may be called the input vector. In the output layer, information is transmitted, analyzed, and weighed across the neuron links to obtain an output result, which is the output vector. The hidden layer is formed by the neurons and links between the input layer and the output layer.

Bi-LSTM: Bi-directional LSTM, a bidirectional long short-term memory network.

CNN: convolutional neural network.

Example 1

According to an embodiment of the present invention, a method embodiment of case classification is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one presented here.

The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. FIG. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the case classification method. As shown in FIG. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, …, 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device 106 for communication functions. In addition, the computer terminal may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).

The memory 104 may be used for storing software programs and modules of application software, such as program instructions/data storage devices corresponding to the case classification method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, so as to implement the case classification method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.

The display may be, for example, a touch screen-type liquid crystal display (LCD) that may enable a user to interact with the user interface of the computer terminal 10 (or mobile device).

The hardware block diagram in FIG. 1 may serve as an exemplary block diagram of the computer terminal 10 (or mobile device) and also as an exemplary block diagram of a server. In an alternative embodiment, FIG. 2 shows a schematic diagram of a network terminal for the case classification method. As shown in FIG. 2, the computer terminal 10 (or mobile device) may be connected, via a data network or electronically, to one or more servers (e.g., a security server, a resource server, a game server, etc.). In an alternative embodiment, the computer terminal 10 (or mobile device) may be any mobile computing device or the like. The data network connection may be a local area network connection, a wide area network connection, an internet connection, or another type of data network connection. The computer terminal 10 (or mobile device) may connect to a network service executed by a server (e.g., a security server) or a group of servers. A network server provides network-based user services such as social networking, cloud resources, email, online payment, or other online applications.

Under the above operating environment, the present application provides a case classification method as shown in fig. 3. FIG. 3 is a flowchart of a case classification method according to a first embodiment of the present invention. As shown in fig. 3, the method includes:

Step S302, acquiring multiple types of features associated with a case to be classified, wherein the multiple types of features comprise: text features, voice features, image features, and discrete features;

Step S304, setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

Through the above steps, multiple types of features associated with a case to be classified are acquired, wherein the multiple types of features comprise text features, voice features, image features, and discrete features; the multiple types of features are set as input parameters of a neural network classification model, and a classification result is obtained through the neural network classification model. In this embodiment, the multiple types of features associated with the case to be classified can be used as input parameters and the case classified through the neural network classification model. This reduces the workload of manual case division in the judicial mode, realizes intelligent division of cases into complex and simple, improves case classification efficiency, and thereby solves the technical problem in the related art of low efficiency when cases are processed in the judicial mode.

The embodiment of the invention can be applied to various case division tasks and aims to reduce the workload of manual case division. Preferably, this application is described using case division in the judicial mode as an example, helping case division personnel classify cases as complex or simple. Because features associated with the cases to be classified are used, classification accuracy can be improved considerably and manual workload reduced, realizing intelligent division of cases into complex and simple. After the cases are divided, complex cases can be pushed directly to a judge, while simple cases are first pre-reviewed by an intelligent judging system and then pushed to the judge for confirmation. This relieves the judge's load and improves the working efficiency of judicial workers.

Optionally, the case related to the embodiment of the present invention may be a case of an entity party, or may be a network transaction case.

The present invention will be described below with reference to the respective steps.

Step S302, obtaining multiple types of characteristics related to the cases to be classified, wherein the multiple types of characteristics comprise: text features, speech features, image features, discrete features.

The cases to be classified may be cases sorted manually or by a terminal, and each may include multiple items of case information, including but not limited to: the suing party, the sued party, the time the case occurred, the case type, the course of the case, the case number, and the like. The cases to be classified can be input directly through a terminal; once they are determined, the various types of features associated with them are collected.

Optionally, the discrete features include a first discrete feature subset, and acquiring the multiple types of features associated with the case to be classified comprises: extracting the first discrete feature subset from the litigation materials of the case to be classified. Litigation materials may include, but are not limited to: the complaint, the statement of defense, evidence information, and the like. For example, the plaintiff's claim information can be extracted from the complaint; whether a refund was made and whether the right to compensation is reserved can be extracted from the transaction log submitted as evidence; and the discounted price, the actual price, and the like can be extracted from the commodity information submitted as evidence. That is, the corresponding feature parameters can be obtained through information extraction or data mining.
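As a minimal sketch of this kind of information extraction, the following Python example pulls discrete features out of evidence text with regular expressions. The `key: value` log format, the field names, and the function names are all assumptions made for illustration; the source does not specify how the litigation materials are encoded.

```python
import re
from typing import Dict, Optional


def _find_number(pattern: str, text: str) -> Optional[float]:
    """Return the first captured number, or None if the field is absent."""
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None


def _find_flag(pattern: str, text: str) -> Optional[bool]:
    """Return True/False for a captured yes/no field, or None if absent."""
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return (match.group(1).lower() == "yes") if match else None


def extract_first_discrete_subset(transaction_log: str,
                                  commodity_info: str) -> Dict[str, object]:
    """Extract discrete features from two (hypothetically formatted) evidence texts."""
    number = r"([0-9]+(?:\.[0-9]+)?)"
    return {
        "refunded": _find_flag(r"refunded:\s*(yes|no)", transaction_log),
        "compensation_right_reserved": _find_flag(
            r"compensation_right_reserved:\s*(yes|no)", transaction_log),
        "discount_price": _find_number(r"discount_price:\s*" + number, commodity_info),
        "actual_price": _find_number(r"actual_price:\s*" + number, commodity_info),
    }


discrete = extract_first_discrete_subset(
    "order 1; refunded: no; compensation_right_reserved: yes",
    "discount_price: 79.00; actual_price: 99.5",
)
```

Missing fields are returned as `None` rather than guessed, so a downstream encoder can decide how to treat absent values.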

Optionally, the discrete features further comprise a second discrete feature subset, the text features comprise a first text feature subset, and acquiring the multiple types of features associated with the case to be classified comprises: performing user portrait analysis on the parties to the case to be classified to obtain an analysis result, wherein the data source of the user portrait analysis comprises at least one of the following: historical litigation data of the parties, and transaction information and behavior information of the parties in the network transaction process; and performing feature engineering using the analysis result to obtain the second discrete feature subset and/or the first text feature subset.

That is, the analysis result can be determined by analyzing the parties to the case to be classified. The historical litigation data of a party may include data on cases in which the party was sued and data on cases the party initiated. For example, the parties to a network transaction case can be divided into buyers and sellers, and their user portraits come from two parts. The first is the litigation history of the parties, whose data sources may include data on persons subject to enforcement from the Internet Court and the Supreme People's Court; the features that can be mined include the number of lawsuits initiated by the buyer, the number of times the seller has been sued, the number of times a party has been subject to enforcement, and the like. The second is the user portrait proper: from the parties' transaction information and behavior information in the application software, one can mine the parties' preferences, the number of complaints filed by the buyer, the number of times the seller has been complained about in disputes, the credit scores and star ratings of the buyer and of the seller, the star rating of the shop, the seller's dispute records, chat records, and other text information. The party portrait therefore contains both the second discrete feature subset and the first text feature subset.
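The litigation-history part of the party portrait can be illustrated with a short Python sketch that counts how often a party appears on each side of past cases. The record layout (a pair of plaintiff and defendant identifiers) and all names here are hypothetical; the source only names the kinds of counts that can be mined.

```python
from typing import Dict, Iterable, Tuple


def litigation_history_features(
    party_id: str, records: Iterable[Tuple[str, str]]
) -> Dict[str, int]:
    """Count how often a party initiated a lawsuit or was sued.

    Each record is a hypothetical (plaintiff_id, defendant_id) pair.
    """
    initiated = 0
    sued = 0
    for plaintiff_id, defendant_id in records:
        if plaintiff_id == party_id:
            initiated += 1
        if defendant_id == party_id:
            sued += 1
    return {"times_initiated": initiated, "times_sued": sued}


history = [("buyer_1", "seller_9"), ("buyer_1", "seller_3"), ("buyer_2", "seller_9")]
buyer_features = litigation_history_features("buyer_1", history)
seller_features = litigation_history_features("seller_9", history)
```

Counts like these would belong to the second discrete feature subset, while chat records and similar material would feed the first text feature subset.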

As an alternative embodiment of the present invention, the text features include a second text feature subset, and acquiring the multiple types of features associated with the case to be classified comprises: performing natural language processing on the text data in the litigation materials of the case to be classified to obtain a processing result, wherein the natural language processing comprises word segmentation, part-of-speech tagging, and entity recognition; performing feature engineering using the processing result, in which the processing result is counted by a word-frequency-type statistical method to obtain statistical features, and a word vector model is trained on the processing result to obtain word vector features; and determining the statistical features and the word vector features as the second text feature subset.

Optionally, the materials that can be analyzed for the second text feature subset include, but are not limited to: the complaint, the answer, and the evidence submitted by the plaintiff and/or the defendant. The complaint and the answer are the most important case analysis materials, and their case descriptions and rebuttal texts are very important feature sources. Two types of features may be extracted for the second text feature subset: first, statistical features of words, such as tf-idf; second, word vector features.

Natural language processing is performed on the text data in the litigation materials of the case to be classified to obtain a processing result. That is, the materials are preprocessed, for example by NLP processing such as word segmentation, part-of-speech tagging, and entity recognition. After the natural language processing, the statistical features of words are mined: feature engineering is performed using the processing result, and the processing result is counted by a word-frequency-type statistical method to obtain statistical features such as tf-idf and tf.
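The word-frequency statistics mentioned above can be sketched in plain Python as follows. The example assumes the documents have already been word-segmented into token lists, and it uses one common tf-idf variant (tf as relative frequency, idf as log(N / df)); the source does not say which variant is intended, and the sample tokens are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def tf(tokens: List[str]) -> Dict[str, float]:
    """Term frequency: occurrences of each word divided by document length."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def idf(docs: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency: log(N / number of documents containing the word)."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Per-document tf-idf scores for every word that occurs in the document."""
    idf_scores = idf(docs)
    return [{w: f * idf_scores[w] for w, f in tf(doc).items()} for doc in docs]


# Hypothetical, already-segmented litigation texts.
docs = [["refund", "refused", "refund"], ["goods", "counterfeit"], ["refund", "goods"]]
scores = tf_idf(docs)
```

A word that appears in every document gets an idf of zero under this variant, which is why smoothed variants are often preferred in practice.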

In another alternative embodiment of the present invention, the text features include a third text feature subset, a fourth text feature subset, and a fifth text feature subset, and acquiring the multiple types of features associated with the case to be classified comprises: classifying the evidence materials of the case to be classified to obtain a classification result; and performing feature engineering using the classification result, in which the third text feature subset is extracted from text-type evidence, the fourth text feature subset and/or image features are extracted from image-type evidence, and voice features and/or the fifth text feature subset are extracted from audio-type or video-type evidence.

Text features typically include: text features recorded directly as text (e.g., txt, word), text features obtained by picture recognition, and text features obtained by voice recognition and/or video recognition. Taking a network transaction dispute as an example, the evidence generally includes screenshots of commodity detail pages, commodity pictures/photos, appraisal certificates, chat records, commodity videos, and the like. For pictures whose core evidence is the characters they contain, the specific characters need to be recognized by techniques such as OCR; for pictures whose evidence is mainly the commodity (including commodity type, quantity, price, and the like), the brand, and so on, features need to be extracted by techniques such as image segmentation and extraction. For evidence of the speech type, techniques such as ASR are needed to convert sound into text.
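The routing of evidence to feature subsets can be sketched as a small dispatch table. This Python example classifies evidence by file extension only, which is an assumption made for brevity; the extension lists, the subset labels, and the function names are hypothetical, and the OCR and ASR steps themselves are not implemented here.

```python
from pathlib import PurePath
from typing import Dict, List

# Hypothetical extension lists; a real system would also inspect file content.
TEXT_EXTENSIONS = {".txt", ".doc", ".docx"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}
AUDIO_VIDEO_EXTENSIONS = {".wav", ".mp3", ".mp4"}

# Feature subsets each evidence type can contribute, following the description.
FEATURES_BY_TYPE: Dict[str, List[str]] = {
    "text": ["third_text_subset"],
    "image": ["fourth_text_subset", "image_features"],        # text via OCR
    "audio_video": ["fifth_text_subset", "voice_features"],   # text via ASR
}


def classify_evidence(filename: str) -> str:
    """Return the evidence type implied by the file extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in AUDIO_VIDEO_EXTENSIONS:
        return "audio_video"
    return "unknown"


def plan_extraction(filenames: List[str]) -> Dict[str, List[str]]:
    """Map each evidence file to the feature subsets to extract from it."""
    return {name: FEATURES_BY_TYPE.get(classify_evidence(name), [])
            for name in filenames}


plan = plan_extraction(["chat_record.txt", "detail_page.PNG", "product_demo.mp4"])
```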

In another alternative embodiment of the present invention, the discrete features further comprise a third discrete feature subset, and acquiring the multiple types of features associated with the case to be classified comprises: filling the first discrete feature subset into a legal knowledge graph and performing feature engineering to obtain the third discrete feature subset, wherein the legal knowledge graph is constructed in advance according to the cause-of-action domain to which the case to be classified belongs.

The embodiment of the invention can realize operations such as automatic classification, judgment, and reasoning through the legal knowledge graph. Different legal knowledge graphs can be constructed for different cause-of-action domains, and a knowledge graph can be constructed manually by legal experts or automatically by a terminal and an algorithm. FIG. 4 is a schematic diagram of an alternative legal knowledge graph according to an embodiment of the present invention. As shown in FIG. 4, each node represents an element, a decision point, or a logic gate, and the node on the left of each edge is the input to the node on its right. The graph contains all the elements required to generate the document, together with the intermediate nodes of the decision logic. For example, the initial input data may be the number of lawsuits of the identified plaintiff on China Judgments Online and/or the number of lawsuits of the plaintiff in the Internet Court. A parameter (e.g., 001 in FIG. 4) is then obtained by combining the plaintiff information in the knowledge graph (including "whether the plaintiff is self-confirmed", "whether the plaintiff has been reported", "whether the plaintiff has 3 or more judgments on China Judgments Online", and "whether the plaintiff is self-confirmed in the Internet Court"), and whether the plaintiff is self-confirmed is determined by this parameter.

Alternatively, the relationships of the knowledge-graph may be stored in the form of triples.
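One possible way to hold such triples in memory is sketched below in Python. The `(head, relation, tail)` layout is the usual meaning of a triple; the relation name `input_of` and the node names other than 001 are hypothetical labels chosen to mirror the left-to-right input edges described for FIG. 4.

```python
from collections import defaultdict
from typing import DefaultDict, List, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


class TripleStore:
    """Minimal in-memory triple store indexed by head node."""

    def __init__(self) -> None:
        self._by_head: DefaultDict[str, List[Triple]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        self._by_head[head].append(Triple(head, relation, tail))

    def tails(self, head: str, relation: str) -> List[str]:
        """All nodes reachable from `head` through `relation`."""
        return [t.tail for t in self._by_head.get(head, []) if t.relation == relation]


store = TripleStore()
store.add("plaintiff_lawsuit_count", "input_of", "001")
store.add("001", "input_of", "plaintiff_status")
```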

In this embodiment, after the legal knowledge graph is obtained, the first discrete feature subset is filled in the legal knowledge graph to perform feature engineering construction, so as to obtain a third discrete feature subset, where the third discrete feature subset may refer to each legal element feature of a case, and the complexity of the case is generally closely related to the legal elements.

In an alternative embodiment, the third discrete feature subset comprises: whether the legal element exists, the weight (importance degree) of the legal element, and the shortest path length from the legal element to the referee node.
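The three quantities listed above can be computed with a short Python sketch. It assumes the graph is given as directed (left, right) edges and measures path length in edges with a breadth-first search; the node names, the weights, and the use of 0.0 as the default weight are illustrative assumptions, not values from the source.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shortest_path_length(
    edges: Iterable[Tuple[str, str]], source: str, target: str
) -> Optional[int]:
    """Breadth-first search over directed edges; None if target is unreachable."""
    adjacency: Dict[str, List[str]] = {}
    for left, right in edges:
        adjacency.setdefault(left, []).append(right)
    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def legal_element_features(
    element: str,
    present: Set[str],
    weights: Dict[str, float],
    edges: List[Tuple[str, str]],
    referee_node: str,
) -> Dict[str, object]:
    """Existence, weight, and distance to the referee node for one legal element."""
    return {
        "exists": element in present,
        "weight": weights.get(element, 0.0),
        "path_length": shortest_path_length(edges, element, referee_node),
    }


# Hypothetical graph: two legal elements feeding a referee (judgment) node.
edges = [("refund_refused", "001"), ("001", "judgment"), ("price_fraud", "judgment")]
weights = {"refund_refused": 0.4, "price_fraud": 0.9}
element_features = legal_element_features(
    "refund_refused", {"refund_refused"}, weights, edges, "judgment")
```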

After the multiple types of features are determined, they can be used as input parameters and input into a neural network classification model to classify the case to be classified. Optionally, compared with traditional linear classification models, such as support vector machines and maximum entropy models, the neural network classification model in the invention can fuse more types of features and improve classification accuracy.

In the embodiment of the invention, setting the multiple types of features as input parameters of the neural network classification model and obtaining the classification result through the neural network classification model comprises: setting the multiple types of features as input parameters, encoding each type of feature with its corresponding feature encoder, and jointly outputting dense feature vectors; and taking the dense feature vectors as input parameters of a next-layer encoder, and obtaining the classification result through encoding processing.

FIG. 5 is a schematic diagram of the network structure of an optional neural network classification model according to an embodiment of the present invention. As shown in FIG. 5, when classification is performed through the neural network classification model, different feature encoders are used for different types of features. Optionally, encoding each type of feature with its corresponding feature encoder includes: encoding text and/or voice features with a bidirectional long short-term memory network, encoding image features with a convolutional neural network, and encoding discrete features with a multilayer perceptron. That is, Bi-LSTM is used for text and/or voice features, CNN for image features, and MLP for discrete features. The different encoders produce dense feature vectors, which serve as the input of the next-layer encoder; the next-layer encoder may be an ordinary neural network layer or a Transformer based on an attention mechanism. The final layer is the output layer, which gives the classification result.
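The overall data flow (one encoder per feature type, concatenation of the dense vectors, then an output layer) can be sketched in dependency-free Python. This is only an illustration of the fusion step: mean pooling stands in for the Bi-LSTM and CNN encoders, the discrete features are passed through unchanged instead of going through an MLP, the next-layer encoder is reduced to a single linear layer with softmax, and all weights, inputs, and labels are made-up values.

```python
import math
from typing import Any, Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(rows: Sequence[Sequence[float]]) -> Vector:
    """Stand-in encoder: average a sequence of vectors into one dense vector."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def linear(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """One fully connected layer: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    features: Dict[str, Any],
    encoders: Dict[str, Callable[[Any], Vector]],
    weights: List[Vector],
    bias: Vector,
    labels: List[str],
) -> Tuple[str, Vector]:
    """Encode each feature type, concatenate the dense vectors, then classify."""
    dense: Vector = []
    for name in sorted(encoders):  # fixed order keeps the concatenation stable
        dense.extend(encoders[name](features[name]))
    probabilities = softmax(linear(dense, weights, bias))
    return labels[probabilities.index(max(probabilities))], probabilities


features = {"text": [[1.0, 0.0], [0.0, 1.0]], "image": [[0.2, 0.4]], "discrete": [1.0, 0.0]}
encoders = {"text": mean_pool, "image": mean_pool, "discrete": list}
weights = [[0.5, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]]
label, probabilities = classify(features, encoders, weights, [0.0, 0.0], ["simple", "complex"])
```

In a real model the stand-in encoders would be replaced by trained Bi-LSTM, CNN, and MLP modules in a deep learning framework, and the weights would be learned rather than fixed by hand.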

The present invention is explained below with a detailed classification block diagram. FIG. 6 is a system block diagram of a case classification system according to an embodiment of the present invention. As shown in FIG. 6, the case classification system includes five input modules: an information extraction and data mining module 61, a legal element feature engineering module 62, a party portrait feature engineering module 63, a text evidence feature engineering module 64, and a multi-modal evidence feature engineering module 65. The legal element feature engineering module 62 can perform feature matching through the constructed legal knowledge graph 66 to obtain each legal element feature. The case classification system also includes a neural network classification model 67; all the features produced by modules 61 to 65 can be input into the neural network classification model 67 so as to classify the cases to be classified. The modules are explained below.

With respect to the information extraction and data mining module 61, the role of this module is to mine meaningful discrete features and entities, providing input to modules 62 and 67. The input sources of the module include the complaint, the statement of defence, evidence information and the like. For example, original complaint information can be extracted from the complaint; whether a refund was made and whether the right to compensation is reserved can be extracted from the transaction log in evidence; and the discount price, the actual price, etc. can be extracted from the commodity information in evidence.
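As an illustration of this kind of extraction, the following sketch pulls yes/no flags and prices out of two evidence strings with regular expressions. The `key: value` log layout, the field names and the function name `extract_discrete_features` are hypothetical; real litigation material would need its own parsing rules.

```python
import re
from typing import Dict

# Hypothetical "key: value" / "key=value" layout for the evidence text.
_FLAG = re.compile(r"(refunded|compensation_right_reserved)\s*[:=]\s*(yes|no)", re.I)
_PRICE = re.compile(r"(discount_price|actual_price)\s*[:=]\s*(\d+(?:\.\d+)?)", re.I)


def extract_discrete_features(transaction_log: str, commodity_info: str) -> Dict[str, float]:
    """Return discrete features: flags encoded as 0.0/1.0, prices as floats."""
    features: Dict[str, float] = {}
    for key, value in _FLAG.findall(transaction_log):
        features[key.lower()] = 1.0 if value.lower() == "yes" else 0.0
    for key, value in _PRICE.findall(commodity_info):
        features[key.lower()] = float(value)
    return features


extracted = extract_discrete_features(
    "order 1001; refunded: no; compensation_right_reserved: yes",
    "discount_price=149.00, actual_price=199.5",
)
```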

Regarding the party portrait feature engineering module 63, the parties in a network transaction case can be divided into buyers and sellers, and the user portrait comes from two sources. The first is the litigation history of the party, together with the executed-person data from the internet court and the Supreme People's Court; the features that can be mined include the buyer appeal number, the seller appeal number, the executed-person appeal number, etc. The second is the user portrait within Ali: from the party's transaction information and behavior information in the application software, it is possible to mine the preferences of the party, the buyer's dispute complaints, the seller's dispute complaints, the credit score and star grade of the buyer, the credit score and star grade of the seller, the star grade of the shop, the dispute records of the seller, and text information such as chat records. The party portrait therefore contains both discrete features and text features.
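Count-style features from a litigation history can be sketched as follows. The record layout (pairs of party identifier and role), the role labels and the function name are assumptions for illustration only; they are not taken from any real court data format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def litigation_history_features(party_id: str,
                                records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often a party appears in each (hypothetical) role."""
    counts = Counter(role for pid, role in records if pid == party_id)
    # Counter returns 0 for roles that never occur, so every key is present.
    return {
        "times_plaintiff": counts["plaintiff"],
        "times_defendant": counts["defendant"],
        "times_enforced": counts["enforced"],
    }


history = [
    ("buyer-7", "plaintiff"),
    ("buyer-7", "plaintiff"),
    ("seller-3", "defendant"),
    ("seller-3", "enforced"),
]
buyer_features = litigation_history_features("buyer-7", history)
seller_features = litigation_history_features("seller-3", history)
```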

Regarding the text evidence feature engineering module 64, the complaint and the statement of defence are the most important case analysis materials, together with the evidence submitted by the plaintiff, so the case description and the defence text are very important feature sources. Two types of features are mainly extracted from the text: statistical features of words, such as tf-idf, and word vector features. Specifically, basic NLP processing such as preprocessing, word segmentation, part-of-speech tagging and entity recognition is performed first, and statistical features such as tf-idf and tf are then mined. For word vectors, text materials such as complaints, statements of defence and judgments in network transaction cases are used as the corpus (the larger the corpus, the better); after word segmentation, word vectors can be trained on the corpus using CBOW, Skip-gram and the like. The specific method is not limited in the embodiment of the invention.
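The statistical features mentioned above can be illustrated with a small tf-idf computation over documents that have already been segmented into words. The patent does not fix a formula, so this sketch assumes the plain, unsmoothed variant: term frequency is the count divided by document length, and inverse document frequency is log(N / df).

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute unsmoothed tf-idf scores for each term of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores: List[Dict[str, float]] = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


corpus = [
    ["refund", "seller", "refund"],
    ["seller", "delay"],
]
weights = tf_idf(corpus)
```

A term that occurs in every document (here `seller`) receives a score of zero, which is why tf-idf favours words that distinguish one case from another.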

Regarding the multi-modal evidence feature engineering module 65, the evidence in a network transaction dispute generally includes screenshots of the commodity detail page, commodity pictures, photos, appraisal reports, chat records, commodity videos and other evidence. For pictures whose core content is text, OCR technology is needed to recognise the specific characters; for picture evidence of commodities, brands and the like, techniques such as image segmentation and extraction are needed to extract features, which may specifically be the gray level of the picture. For speech-type evidence, ASR technology is needed to convert sound into text.
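The routing described here (classify each piece of evidence, then send it to the matching extractor) might look like the sketch below. The extension table, the function names and the `ocr` and `asr` callables are hypothetical placeholders; no real OCR or ASR engine is invoked, and the evidence payload is represented by a plain string.

```python
import os
from typing import Callable, Dict, List

# Hypothetical mapping from file extension to evidence type.
KIND_BY_EXTENSION = {
    ".txt": "text",
    ".png": "image",
    ".jpg": "image",
    ".wav": "audio_video",
    ".mp4": "audio_video",
}


def classify_evidence(filename: str) -> str:
    """Classify one piece of evidence by its file extension."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in KIND_BY_EXTENSION:
        raise ValueError(f"unsupported evidence type: {filename!r}")
    return KIND_BY_EXTENSION[extension]


def extract_text_subsets(evidence: Dict[str, str],
                         ocr: Callable[[str], str],
                         asr: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group recovered text by evidence type, using OCR for images and ASR for audio/video."""
    subsets: Dict[str, List[str]] = {"text": [], "image": [], "audio_video": []}
    for filename, payload in evidence.items():
        kind = classify_evidence(filename)
        if kind == "text":
            subsets[kind].append(payload)
        elif kind == "image":
            subsets[kind].append(ocr(payload))
        else:
            subsets[kind].append(asr(payload))
    return subsets


text_subsets = extract_text_subsets(
    {"chat.txt": "seller promised a refund", "detail_page.PNG": "<pixels>", "call.wav": "<samples>"},
    ocr=lambda payload: "OCR(" + payload + ")",
    asr=lambda payload: "ASR(" + payload + ")",
)
```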

Regarding the legal knowledge graph 66, which is a core data structure for subsequent automatic referee reasoning, different knowledge graphs are constructed according to different case domain needs, either manually by legal experts or automatically by algorithms.

With regard to the legal element feature engineering module 62, the features extracted by this module are all related to legal elements. Given the legal knowledge graph 66, the key information obtained in the information extraction and data mining module 61 is matched against the graph, which yields the following features: whether a legal element exists; the weight (degree of importance) of the legal element; and the shortest path length from the legal element to the referee node.
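The three features listed above can be sketched on a toy graph. The graph contents, the weights and the node name `verdict` (standing in for the referee node) are invented for illustration; the shortest path length is computed with a breadth-first search over an adjacency-list representation, which is one reasonable choice for an unweighted graph and is not prescribed by the text.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_path_length(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Breadth-first search; returns the number of edges, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def legal_element_features(graph: Dict[str, List[str]],
                           weights: Dict[str, float],
                           matched: Set[str],
                           referee_node: str) -> Dict[str, Dict[str, object]]:
    """For each legal element: presence in the case, weight, distance to the referee node."""
    return {
        element: {
            "present": element in matched,
            "weight": weight,
            "path_length": shortest_path_length(graph, element, referee_node),
        }
        for element, weight in weights.items()
    }


toy_graph = {
    "false_advertising": ["consumer_fraud"],
    "consumer_fraud": ["verdict"],
    "late_delivery": ["verdict"],
}
toy_weights = {"false_advertising": 0.8, "late_delivery": 0.3}
element_features = legal_element_features(
    toy_graph, toy_weights, matched={"false_advertising"}, referee_node="verdict"
)
```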

In the neural network classification model 67, different feature encoders are used for different types of features, namely Bi-LSTM (bidirectional LSTM), CNN (convolutional neural network) and MLP (multilayer perceptron). The different encoders produce dense feature vectors, which are used as the input of the next-layer encoder; the next-layer encoder can be an ordinary neural network layer or a transformer based on an attention mechanism. Finally, an output layer gives the classification result.

Through the above embodiment, the case triage step in the judicial process can be realized: multiple types of features associated with the cases to be classified are used as input parameters, and a classification model performs intelligent triage, dividing the cases into a complex class and a simple class. Complex cases can be pushed directly to a judge, while simple cases are first pre-judged by an intelligent judging system and then pushed to the judge for confirmation. This reduces the burden on judges, reduces the workload of classification personnel, and improves classification efficiency.
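The routing that follows classification can be written down in a few lines. The threshold of 0.5, the step names and the function name `route_case` are assumptions; the text only states that complex cases go straight to a judge and simple cases pass through pre-judgment first.

```python
from typing import List


def route_case(p_complex: float, threshold: float = 0.5) -> List[str]:
    """Return the handling steps for a case, given the model's probability that it is complex."""
    if not 0.0 <= p_complex <= 1.0:
        raise ValueError("p_complex must be a probability between 0 and 1")
    if p_complex >= threshold:
        # Complex case: pushed directly to a judge.
        return ["judge"]
    # Simple case: pre-judged automatically, then confirmed by a judge.
    return ["intelligent_prejudgment", "judge_confirmation"]


complex_route = route_case(0.9)
simple_route = route_case(0.1)
```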

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the case classification method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation manner in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to an embodiment of the present invention, there is also provided a case classification device for implementing the case classification method described above. Fig. 7 is a schematic view of a case classification device according to an embodiment of the present invention. As shown in fig. 7, the device includes: an acquisition unit 71, a classification unit 73, wherein,

an acquisition unit 71, configured to acquire multiple types of features associated with cases to be classified, where the multiple types of features include: text features, voice features, image features, discrete features;

and the classification unit 73 is used for setting the various types of features as input parameters of the neural network classification model, and obtaining a classification result through the neural network classification model.

The case classification device acquires, through the acquisition unit 71, multiple types of features associated with the cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, and discrete features; through the classification unit 73, it sets the multiple types of features as input parameters of a neural network classification model and obtains a classification result through the neural network classification model. In this embodiment, multiple types of features associated with the cases to be classified can be used as input parameters, and the cases to be classified are classified through the neural network classification model. This reduces the workload of manual case classification in the judicial mode, realizes intelligent division of cases into complex and simple ones, improves case classification efficiency, and solves the technical problem in the related art of low efficiency when processing cases in the judicial mode.

Optionally, the discrete features comprise: a first discrete feature subset, a second discrete feature subset, and a third discrete feature subset; the text features comprise: a first text feature subset, a second text feature subset, a third text feature subset, a fourth text feature subset, and a fifth text feature subset; and the acquisition unit includes:

the information extraction and data mining module, used for extracting the first discrete feature subset from litigation material of the cases to be classified;

the legal element feature engineering module, used for performing feature engineering construction by filling the first discrete feature subset into a legal knowledge graph to obtain the third discrete feature subset, wherein the legal knowledge graph is constructed in advance according to the case domain to which the case to be classified belongs;

the party portrait feature engineering module, used for performing user portrait analysis on the party of the case to be classified to obtain an analysis result, wherein the data source of the user portrait analysis comprises at least one of the following: historical litigation data of the party, and transaction information and behavior information of the party in the network transaction process; and for performing feature engineering construction by adopting the analysis result to obtain the second discrete feature subset and/or the first text feature subset;

the text evidence feature engineering module, used for performing natural language processing on text data in litigation materials of the cases to be classified to obtain a processing result, wherein the natural language processing comprises: word segmentation, part-of-speech tagging and entity recognition; for performing feature engineering construction by adopting the processing result, performing statistics on the processing result in a word-frequency statistical mode to obtain statistical features, and training on the processing result with a word vector model to obtain word vector features; and for determining the statistical features and the word vector features as the second text feature subset;

the multi-modal evidence feature engineering module, used for classifying the evidence materials of the case to be classified to obtain a classification result; and for performing feature engineering construction by adopting the classification result, extracting the third text feature subset from text type evidence, extracting the fourth text feature subset and/or image features from image type evidence, and extracting voice features and/or the fifth text feature subset from audio or video type evidence.

It should be noted here that the acquisition unit 71 and the classification unit 73 correspond to steps S302 to S304 in embodiment 1; the examples and application scenarios implemented by the two modules are the same as those of the corresponding steps, but are not limited to what is disclosed in embodiment 1. It should be noted that the modules described above, as part of the device, may run in the computer terminal 10 provided in embodiment 1.

Example 3

The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.

Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.

Optionally, in this embodiment, the terminal may include: a first device; a second device; and a processor running a program, wherein the program, when running, performs the following processing steps on data output from the first device and the second device: a first step of acquiring multiple types of features associated with cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features; and a second step of setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

In this embodiment, the computer terminal may execute the program code of the following steps in the case classification method: acquiring multiple types of features associated with cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features; and setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

The computer terminal may further include a memory, where the memory may be used to store software programs and modules, such as program instructions/modules corresponding to the case classification method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the case classification method described above. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some instances, the memories may further include a memory located remotely from the processor, which may be connected to the terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: a first subset of discrete features is extracted from litigation material of a case to be classified.

Optionally, the processor may further execute the program code of the following steps: and carrying out user portrait analysis on the party of the case to be classified to obtain an analysis result, wherein the data source of the user portrait analysis comprises at least one of the following data sources: historical litigation data of the party, and transaction information and behavior information of the party in the network transaction process; and performing feature engineering construction by adopting the analysis result to obtain a second discrete feature subset and/or a first text feature subset.

Optionally, the processor may further execute the program code of the following steps: carrying out natural language processing on text data in litigation materials of cases to be classified to obtain a processing result, wherein the natural language processing comprises the following steps: word segmentation, part of speech tagging and entity identification; performing feature engineering construction by adopting the processing result, performing statistics on the processing result by a word frequency class statistical mode to obtain statistical features, and training the processing result by adopting a word vector model to obtain word vector features; and determining the statistical features and the word vector features as a second text feature subset.

Optionally, the processor may further execute the program code of the following steps: classifying the evidence materials of the cases to be classified to obtain a classification result; and adopting the classification result to construct feature engineering, extracting a third text feature subset from the text type evidence, extracting a fourth text feature subset and/or image features from the image type evidence, and extracting a voice feature and/or a fifth text feature subset from the audio or video type evidence.

Optionally, the processor may further execute the program code of the following steps: filling the first discrete feature subset into a legal knowledge graph, and performing feature engineering construction to obtain a third discrete feature subset, wherein the legal knowledge graph is constructed in advance according to the case domain to which the case to be classified belongs.

Optionally, the processor may further execute the program code of the following steps: setting the multiple types of features as input parameters, respectively adopting a feature encoder corresponding to each type of feature in the multiple types of features to perform encoding processing, and outputting dense feature vectors together; and taking the dense feature vectors as input parameters of a next-layer encoder, and obtaining a classification result through encoding processing.

Optionally, the processor may further execute the program code of the following steps: encoding the text and/or voice features with a bidirectional long short-term memory network; encoding the image features with a convolutional neural network; and encoding the discrete features with a multilayer perceptron.

By adopting the embodiment of the invention, multiple types of features associated with the cases to be classified are acquired, wherein the multiple types of features comprise: text features, voice features, image features, and discrete features; the multiple types of features are set as input parameters of a neural network classification model, and a classification result is obtained through the neural network classification model. Because the cases to be classified are classified through the neural network classification model, the workload of manual case classification in the judicial mode is reduced, intelligent division of cases into complex and simple ones is realized, the case classification efficiency of the judicial mode is improved, and the technical problem in the related art of low efficiency when processing cases in the judicial mode is solved.

It can be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 1 does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.

Example 4

The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the case classification method provided in the first embodiment.

Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring multiple types of features associated with cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features; and setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A case classification method is characterized by comprising the following steps:

acquiring multiple types of features related to cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features;

and setting the various types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.

2. The method of claim 1, wherein the discrete features comprise: a first discrete feature subset, wherein the obtaining of the multiple types of features associated with the case to be classified comprises:

a first subset of discrete features is extracted from litigation material of the case to be classified.

3. The method of claim 1, wherein the discrete features comprise: a second discrete feature subset, the text features comprising: a first text feature subset, wherein the obtaining of the multiple types of features associated with the case to be classified comprises:

and analyzing the user portrait of the party of the case to be classified to obtain an analysis result, wherein the data source of the user portrait analysis comprises at least one of the following data sources: historical litigation data of the party, and transaction information and behavior information of the party in the network transaction process;

and performing feature engineering construction by using the analysis result to obtain the second discrete feature subset and/or the first text feature subset.

4. The method of claim 1, wherein the text feature comprises: a second text feature subset, wherein the obtaining of the multiple types of features associated with the case to be classified comprises:

carrying out natural language processing on text data in litigation materials of cases to be classified to obtain a processing result, wherein the natural language processing comprises the following steps: word segmentation, part of speech tagging and entity identification;

performing feature engineering construction by adopting the processing result, performing statistics on the processing result by a word frequency type statistical mode to obtain statistical features, and training the processing result by adopting a word vector model to obtain word vector features;

determining the statistical features and the word vector features as the second text feature subset.

5. The method of claim 1, wherein the text feature comprises: the third text feature subset, the fourth text feature subset and the fifth text feature subset, and the obtaining of the multiple types of features associated with the case to be classified includes:

classifying the evidence materials of the cases to be classified to obtain classification results;

and performing feature engineering construction by adopting the classification result, extracting the third text feature subset from text type evidence, extracting the fourth text feature subset and/or image feature from image type evidence, and extracting voice feature and/or the fifth text feature subset from audio or video type evidence.

6. The method of claim 2, wherein the discrete features comprise: a third discrete feature subset, wherein the obtaining of the multiple types of features associated with the case to be classified comprises:

and filling the first discrete feature subset into a legal knowledge graph, and performing feature engineering construction to obtain a third discrete feature subset, wherein the legal knowledge graph is constructed in advance according to the case domain to which the case to be classified belongs.

7. The method of claim 1, wherein the plurality of types of features are set as the input parameters of the neural network classification model, and obtaining the classification result through the neural network classification model comprises:

setting the multiple types of features as the input parameters, respectively adopting a feature encoder corresponding to each type of feature in the multiple types of features to perform encoding processing, and outputting dense feature vectors together;

and taking the dense feature vector as an input parameter of a next-layer encoder, and obtaining the classification result through encoding processing.

8. The method according to claim 7, wherein the performing the encoding process by using the feature encoder corresponding to each of the plurality of types of features respectively comprises:

coding the text features and/or the voice features by adopting a bidirectional long-short term memory network;

coding the image features by adopting a convolutional neural network;

and adopting a multilayer perceptron to carry out coding processing on the discrete features.

9. A case classification apparatus, comprising:

an acquisition unit, configured to acquire multiple types of features associated with cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features;

and the classification unit is used for setting the various types of characteristics as input parameters of a neural network classification model and obtaining a classification result through the neural network classification model.

10. The apparatus of claim 9, wherein the discrete features comprise: a first discrete feature subset, a second discrete feature subset, and a third discrete feature subset, the text features comprising: a first text feature subset, a second text feature subset, a third text feature subset, a fourth text feature subset, and a fifth text feature subset, the acquisition unit including:

an information extraction and data mining module for extracting a first discrete feature subset from litigation material of the case to be classified;

the legal element feature engineering module is used for carrying out feature engineering construction by filling the first discrete feature subset into a legal knowledge graph to obtain a third discrete feature subset, wherein the legal knowledge graph is constructed in advance according to the case domain to which the case to be classified belongs;

the party portrait feature engineering module is used for performing user portrait analysis on the party of the case to be classified to obtain an analysis result, wherein the data source of the user portrait analysis comprises at least one of the following data sources: historical litigation data of the party, and transaction information and behavior information of the party in the network transaction process; performing feature engineering construction by using the analysis result to obtain the second discrete feature subset and/or the first text feature subset;

the text evidence characteristic engineering module is used for performing natural language processing on text data in litigation materials of cases to be classified to obtain a processing result, wherein the natural language processing comprises the following steps: word segmentation, part of speech tagging and entity identification; performing feature engineering construction by adopting the processing result, performing statistics on the processing result by a word frequency type statistical mode to obtain statistical features, and training the processing result by adopting a word vector model to obtain word vector features; determining the statistical features and the word vector features as a second text feature subset;

the multi-modal evidence characteristic engineering module is used for classifying the evidence materials of the cases to be classified to obtain a classification result; and performing feature engineering construction by adopting the classification result, extracting the third text feature subset from text type evidence, extracting the fourth text feature subset and/or image feature from image type evidence, and extracting voice feature and/or the fifth text feature subset from audio or video type evidence.

11. A storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when running, controls a device in which the storage medium is located to execute the case classification method according to any one of claims 1 to 8.

12. A terminal, comprising:

a first device;

a second device;

a processor that executes a program, wherein the program when executed performs the following processing steps for data output from the first and second devices:

a first step of acquiring multiple types of features associated with cases to be classified, wherein the multiple types of features comprise: text features, voice features, image features, discrete features;

and a second step of setting the multiple types of features as input parameters of a neural network classification model, and obtaining a classification result through the neural network classification model.