CN112632223A - Case and event knowledge graph construction method and related equipment - Google Patents


Info

Publication number
CN112632223A
CN112632223A (application CN202011592591.8A)
Authority: CN (China)
Prior art keywords: event, case, case event, information, vector
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202011592591.8A
Other languages: Chinese (zh)
Other versions: CN112632223B (en)
Inventors: 朵思惟, 余梓飞, 于锋杰, 薛晨云
Current Assignee: Tianjin Huizhi Xingyuan Information Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Tianjin Huizhi Xingyuan Information Technology Co ltd
Application filed by Tianjin Huizhi Xingyuan Information Technology Co ltd
Priority: CN202011592591.8A
Publication of CN112632223A
Application granted; publication of CN112632223B
Legal status: Active


Classifications

    • G06F16/3344 Query execution using natural language analysis
    • G06F16/3346 Query execution using probabilistic model
    • G06F16/35 Clustering; Classification
    • G06F16/367 Ontology
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/216 Parsing using statistical methods
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods

Abstract

One or more embodiments of the present specification provide a case event knowledge graph construction method and related equipment. The construction method includes: processing the collected judicial case event data, summarizing and constructing a case event representation system, extracting case event information based on that system, and finally forming a structured case event graph with the extracted information as nodes and edges. The invention provides a method for constructing a judicial case event graph based on a deep-learning event extraction method. By extracting information, the case event knowledge graph structures judicial case event information that exists as free text, and provides a foundation for a series of downstream judicial applications such as similar-case retrieval, accurate similar-case recommendation, and automatic generation of judgment documents.

Description

Case and event knowledge graph construction method and related equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of knowledge graph technology, and in particular, to a case event knowledge graph construction method and related apparatus.
Background
A knowledge graph (Knowledge Graph), in the field of library and information science, is a series of graphs that display the development and structural relationships of knowledge; it describes knowledge resources and their carriers with visualization technology, and mines, analyzes, constructs, draws, and displays knowledge and the relations among knowledge resources and knowledge carriers.
Specifically, the knowledge graph is a modern theory that achieves multi-disciplinary fusion by combining the theories and methods of disciplines such as mathematics, graphics, information visualization, and information science with methods such as citation analysis and co-occurrence analysis, and using visual graphs to vividly display a discipline's core structure, development history, frontier fields, and overall knowledge architecture. Through data mining, information processing, knowledge measurement, and graph drawing, it displays complex fields of knowledge, reveals their dynamic development rules, and provides a practical, valuable reference for disciplinary research. Its practical applications have gradually expanded and achieved good results in developed countries, but in China it is still at the beginning stage of research.
At present, graph construction for vertical domains has been successfully applied in many fields such as medicine and economics, but research on case event information extraction and graph construction in the judicial field is relatively scarce. A deep-learning-based judicial case event graph records cases in a structured, logical graph form, so that machines can learn, understand, and reason about case facts.
Disclosure of Invention
In view of the above, an object of one or more embodiments of the present disclosure is to provide a case event knowledge graph construction method and related apparatus.
In view of the above, one or more embodiments of the present specification provide a case event knowledge graph construction method, including:
collecting relevant data of a law case event;
carrying out data processing on the relevant data of the legal case event to obtain a case event;
defining case event types based on existing laws and regulations, classifying the case events according to those types, performing role mining on the case events to establish the case event roles corresponding to each case event type, and constructing an event representation system based on the case event types and case event roles;
and extracting event information from the case event by adopting a joint extraction algorithm, classifying the event information based on the event representation system, and constructing a case event knowledge graph based on the classified event information.
Further, the data processing of the data related to the legal case event includes:
removing the non-case event content in the relevant data of the judicial case event, and reserving the text of the relevant case event;
extracting case event names in the related data of the legal case events;
carrying out normalization processing on the same case event in the related data of the law case event;
all the non-case event contents of each case event are fused to obtain case event related information, and the case event related information is associated with the case event.
Further, the classifying the case event based on the case event type includes:
the method comprises the steps of coding a case name of a case through a pre-trained bidirectional encoder representation BERT model from a converter to obtain a vector representation of the case name, calculating a score vector of the vector representation corresponding to the case type through a feedforward neural network, and calculating the case type with the highest probability through a softmax function based on the score vector to obtain the case type corresponding to the case name.
Further, the extracting event information from the case event by using a joint extraction algorithm, and classifying the event information based on the event representation system includes:
coding the case event statement through a BERT model to obtain the vector representation of each word in the case event statement;
marking the vector representation of each character by adopting a BIO sequence marking method, and identifying an entity and an event trigger word in the case event;
taking the entities and event trigger words as nodes of the case event knowledge graph, averaging the vector representations of all words in a node to obtain the node's vector representation, concatenating the vector representations of two nodes to obtain the vector representation of the corresponding edge, computing, with feed-forward neural networks, score vectors for the node and edge vector representations over the categories in the event representation system, and taking the category corresponding to the maximum component of each score vector as the category of the node or edge.
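As a rough sketch of the node and edge scoring just described, the following uses random weights and small dimensions in place of the trained encoder and feed-forward networks; all sizes and category counts are illustrative, not the patent's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8             # word-vector dimension (illustrative)
N_NODE_TYPES = 4  # node categories from the event representation system
N_EDGE_TYPES = 3  # edge categories (event roles / relations)

def ffn(x, w, b):
    """Single-layer feed-forward scorer: one score per category."""
    return w @ x + b

# Hypothetical encoder output: one vector per word of a node span.
node_a_words = rng.normal(size=(2, D))  # e.g. a two-word entity
node_b_words = rng.normal(size=(3, D))  # e.g. a three-word trigger

# Node representation = average of its word vectors.
v_a = node_a_words.mean(axis=0)
v_b = node_b_words.mean(axis=0)

# Edge representation = concatenation of the two node vectors.
v_edge = np.concatenate([v_a, v_b])

# Random weights stand in for trained parameters.
w_node, b_node = rng.normal(size=(N_NODE_TYPES, D)), np.zeros(N_NODE_TYPES)
w_edge, b_edge = rng.normal(size=(N_EDGE_TYPES, 2 * D)), np.zeros(N_EDGE_TYPES)

node_scores = ffn(v_a, w_node, b_node)
edge_scores = ffn(v_edge, w_edge, b_edge)

# Category = index of the maximum score component.
node_category = int(np.argmax(node_scores))
edge_category = int(np.argmax(edge_scores))
```

In the real system the node vectors would come from the BERT encoding of step S401 and the weights from training, but the averaging, concatenation, and argmax steps are as above.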
Further, the constructing a case and event knowledge graph based on the classified event information includes:
iterating over the nodes and edges with a beam search algorithm, forming a candidate graph set from all the nodes and edges in the beam, defining a global score function based on the score vectors of the nodes and edges, computing the global score of each candidate graph in the set with the global score function, ranking all candidate graphs by global score, and outputting the candidate graph with the highest global score as the case event knowledge graph.
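The candidate-graph search can be sketched as a toy beam search. Here the node and edge names and scores are hypothetical, and the global score function is assumed to be the plain sum of the chosen node and edge score components; the patent's actual global feature template (FIG. 5) is not reproduced:

```python
import heapq

def beam_search_graphs(nodes, edges, beam_width=2):
    """Toy beam search over candidate graphs.

    nodes: {name: score}, all kept in every candidate; edges: {(a, b): score},
    any subset may be kept. A candidate graph's global score is the sum of its
    node and edge scores; the beam retains the top `beam_width` partial
    graphs after considering each edge.
    """
    beam = [(sum(nodes.values()), frozenset())]  # (global score, chosen edges)
    for edge, score in edges.items():
        expanded = []
        for g_score, chosen in beam:
            expanded.append((g_score, chosen))                   # leave edge out
            expanded.append((g_score + score, chosen | {edge}))  # include edge
        beam = heapq.nlargest(beam_width, expanded, key=lambda t: t[0])
    return max(beam, key=lambda t: t[0])  # best candidate graph

# Hypothetical node/edge scores (the max components of their score vectors).
nodes = {"王红": 1.2, "抢": 2.0, "钱包": 0.8}
edges = {("抢", "王红"): 1.5, ("抢", "钱包"): 1.1, ("王红", "钱包"): -0.4}
best_score, best_edges = beam_search_graphs(nodes, edges)
```

Here the two positive-scoring edges are kept and the negative-scoring one is dropped, so the best candidate graph scores 1.2 + 2.0 + 0.8 + 1.5 + 1.1 = 6.6.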
Based on the same inventive concept, one or more embodiments of the present specification further provide a case event knowledge graph constructing device, including:
a data collection module configured to collect judicial case event related data;
the event library construction module is configured to perform data processing on the relevant data of the law case event to obtain a case event, and construct an event library based on the case event;
the event representation system construction module is configured to define case event types based on existing laws and regulations, classify the case events based on the case event types, establish case event roles corresponding to the case event types by performing role mining on the case events, and construct an event representation system based on the case event types and the case event roles;
the case event knowledge graph construction module is configured to adopt a joint extraction algorithm to extract event information from the case event, classify the event information based on the event representation system, and construct a case event knowledge graph based on the classified event information.
Based on the same inventive concept, one or more embodiments of the present specification provide an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable by the processor, and the processor implements the method as described in any one of the above items when executing the computer program.
Based on the same inventive concept, one or more embodiments of the present specification provide a non-transitory computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to implement the method as described in any one of the above.
As can be seen from the above description, the case event knowledge graph construction method and related device provided in one or more embodiments of the present specification offer a complete set of case event graph construction methods by combining a legal case event representation system, created by legal experts and manual extraction, with a deep-learning-based event extraction method. By extracting information, the case event knowledge graph structures judicial case event information that exists as free text, and provides a foundation for a series of downstream judicial applications such as similar-case retrieval, accurate similar-case recommendation, and automatic generation of judgment documents.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a schematic diagram of a case event knowledge graph construction method flow according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a data processing operational flow for one or more embodiments of the present description;
FIG. 3 is a schematic diagram of an event library configuration according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic diagram of the operational flow of extracting event information and constructing a case and event knowledge graph according to one or more embodiments of the present disclosure;
FIG. 5 is a diagram of a global feature template application of one or more embodiments of the present description;
FIG. 6 is a block diagram of a case event knowledge graph building apparatus according to one or more embodiments of the present disclosure;
fig. 7 is a hardware configuration diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
As described in the Background section, case information in the judicial field has grown dramatically in the era of information explosion. Although judicial offices are gradually becoming informatized, a considerable portion of case information still exists as free text in the mass of judicial judgment documents and civil and criminal adjudications. In the judicial reform driven by the wave of artificial intelligence, using machines to recognize and understand case facts through frontier AI technologies, extracting information from them with information extraction techniques, and forming structured case event graphs for subsequent case querying and data-mining research is a basic premise, and a current weak point, of applying artificial intelligence in the judicial field.
An event graph is essentially a knowledge network that takes events as its basic knowledge units and comprises events, event roles (or event attributes), event arguments (or attribute values), and the associations between events. The network consists of nodes and edges: a node can be an event trigger word, an event type, or an event argument, and an edge can be an event role (or event attribute) or an association between events. Each event has a corresponding event type, and event roles are defined per event type, so different event types have different event roles. For example, a marriage-type event may have roles such as husband, wife, and time of marriage. Unlike a knowledge graph built from entity relations, an event graph can dynamically characterize changes in the objective world. When entities and entity attributes are updated, a traditional knowledge graph records only their latest values, ignoring change over time and comparison with earlier information. An event graph records the event itself, so a changed entity or attribute is recorded as a new event; the earlier values of the entity and attribute are retained, the correlation between the two changes can be obtained through computation over the graph, and the variability of the objective world is well described.
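A minimal sketch of such an event graph as a data structure; the node kinds and role labels below are hypothetical illustrations, not the patent's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class EventNode:
    """A node: an event trigger word, an event type, or an event argument."""
    node_id: str
    kind: str   # "trigger" | "type" | "argument"
    text: str

@dataclass
class EventGraph:
    """Minimal event-graph container; each edge carries an event role or an
    event-to-event association as its label."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: EventNode):
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, label: str):
        self.edges.append((src, dst, label))

g = EventGraph()
g.add_node(EventNode("e1", "trigger", "结婚"))  # marriage-event trigger word
g.add_node(EventNode("a1", "argument", "王红"))  # an event argument (a person)
g.add_edge("e1", "a1", "wife")                   # event role as edge label
```

Because events (not entities) are the units, a later change, say a divorce, would be added as a new trigger node with its own role edges rather than overwriting the marriage event.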
In view of the above problems in the prior art, one or more embodiments of the present disclosure provide a case event knowledge graph construction method: the collected judicial case event data is processed, a case event representation system is summarized and constructed, case event information is extracted based on that system, and finally a structured case event graph is formed with the extracted information as nodes and edges.
Hereinafter, the technical means of the present disclosure will be described in further detail with reference to specific examples.
Referring to fig. 1, a case event knowledge graph construction method according to an embodiment of the present disclosure includes the following steps:
s101, collecting relevant data of the law case event. The official case event related data includes, but is not limited to, legal referee documents, civil/criminal adjudications, and user application logs.
In the step, the relevant text data of the judicial case event is firstly collected, and the collection source mainly comprises a semi-structured text legal judge document, an unstructured text civil affair and criminal judge document and an application log of a user. The user application log refers to text information searched by a user in various application scenes of the case event knowledge graph. The legal referee document belongs to a semi-structured text, and mainly comprises the following contents: case basic information, case characteristics, parties, trial passes, original declaration, notice dispute, antecedent passes, present finding, present deeming, and the like. Because the statement of case events of the original notice part and the defended dialect part in the legal referee document has personal color, the data acquisition is mainly based on the parts of finding and thinking of the home hospital, and the like, and the 'prior review process' of cases for multiple audits can also be used as a main source of information acquisition.
And S102, carrying out data processing on the related data of the law case event to obtain a case event, and constructing an event library based on the case event.
Referring to fig. 2, in this embodiment, the data processing on the judicial case event related data mainly includes the following steps:
step S201, case event segment identification, which mainly aims to remove non-case event content in the relevant data of the Selfame case and keep relevant case event texts.
Specifically, for a given legal judgment document, structured text information, such as case-related information like the "document title", "case number", and "party information", is identified from the heading information; for the other parts of the document, the texts of the corresponding sections such as "the court finds" and "the court holds" are extracted by combining regular-expression matching with manual verification. Unstructured civil and criminal adjudications are segmented and marked based on preset rules to identify the semi-structured text and the unstructured case-event-related text, and paragraphs that do not describe the case event are deleted. Because the input content of a user application log is generally short, its whole text is retained at this step. The obtained case event text is then preprocessed; the preprocessing stage mainly consists of conventional operations such as deleting punctuation and special marks, unifying traditional and simplified characters, normalizing expressions, and correcting the text.
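The regular-expression matching step might look like the following sketch. The section markers and the sample document are hypothetical, and the real pipeline combines such patterns with manual verification:

```python
import re

# Hypothetical section markers of a Chinese judgment document.
SECTION_PATTERNS = {
    "court_findings": r"本院查明[，:：]?(.*?)(?=本院认为|$)",  # "the court finds"
    "court_opinion": r"本院认为[，:：]?(.*)",                  # "the court holds"
}

def extract_sections(document: str) -> dict:
    """Pull 'the court finds' / 'the court holds' style sections from free text."""
    sections = {}
    for name, pattern in SECTION_PATTERNS.items():
        match = re.search(pattern, document, flags=re.S)
        if match:
            sections[name] = match.group(1).strip()
    return sections

# A made-up fragment of a judgment document.
doc = "……本院查明：被告于某日抢走原告钱包。本院认为：被告行为构成抢劫。"
sections = extract_sections(doc)
```

The lazy `(.*?)` plus the lookahead stops the first section at the start of the next marker, so the two sections do not overlap.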
Step S202, case event name extraction, the purpose of which is to extract the case event names from the judicial case event data. For texts such as legal judgment documents and civil and criminal adjudications, the title of the text is the case event name, so the title can be extracted directly as the case event name.
Step S203, case event normalization, which aims to perform normalization processing on the same case event in the relevant data of the legal case event.
Because the data sources differ, text about the same case event is very likely to be extracted from more than one source, so the same case event must be normalized. Although the descriptive documents of the same legal case event may differ slightly across websites, they ultimately derive from the normative legal judgment documents, so the descriptive keywords and legal terms of a case are basically consistent. Based on this, we use the relatively simple and efficient Jaccard similarity coefficient to compute the similarity of two event descriptions:

J(A, B) = |A ∩ B| / |A ∪ B|

The numerator is the number of elements in the intersection of the word sets of event texts A and B, that is, the number of words the two texts share, and the denominator is the number of elements in the union of their word sets. Given a threshold of 0.9, if J(A, B) is greater than 0.9, the two texts are judged to be descriptions of the same event and normalization is performed.
Step S204, case event related information association, which is to fuse all the non-case event contents of each case event to obtain case event related information and associate the case event related information with the case event.
Specifically, for each given case event, the content that is related to the case event but is not a case event description, which was removed in step S201, is recalled from the original data and associated with the case event. This information includes the case number, document title, and so on. Because the same event may correspond to multiple pieces of data, which may simply repeat each other or may each contain information the other lacks, information from different data sources is merged during recall. Unique features of a case such as the extracted case number can also be used to re-confirm whether the case event normalization of step S203 was accurate.
Through steps S201 to S204, the data processing of the collected judicial case event data is completed, and an event library as shown in FIG. 3 is constructed. As the figure shows, each case event corresponds to an event name, event segments, and event-associated information, and all the case events together form the event library.
Step S103, defining case event types based on the existing laws and regulations, classifying the case events based on the case event types, mining the case events to establish case event roles corresponding to the case event types, and constructing an event representation system based on the case event types and the case event roles.
In traditional knowledge graph construction, most knowledge sources are high-quality structured texts, and the graph's knowledge representation system can be built preferentially from the structured fields in those texts. In event graph construction, high-quality structured data is lacking, so data processing must come first, followed by construction of an event representation system. The goal of this construction is an event knowledge representation system with high accuracy and wide coverage, in preparation for subsequent tasks such as event information extraction.
Specifically, the steps can be divided into the following three steps:
step S301, case and event classification, wherein case and event names of case and event are classified to form category labels based on case and event types (such as civil case and event system) determined by legal experts. Firstly, inputting case event names in an event library into a BERT-Chinese pre-training model for coding to obtain vector representation v corresponding to the case event namesi. Calculating a score vector y for each case event name corresponding to different category labels using a feed-forward neural networki=FFN(vi). Wherein, the vector yiEach component of (a) represents a score for the case event name for the corresponding category label. Finally, the probability p that the case event name is divided into different case event types is obtained by applying a Softmax functioni=Softmax(yi) And taking the class label with the highest probability as the case type corresponding to the case.
Step S302, event role mining, in which related information is recalled by case event type and roles are mined from the case events by combining the suggestions of legal experts with manual extraction; the roles mined from all case events belonging to one case event type are merged and consolidated, establishing the case event roles corresponding to each case event type.
Step S303, manual verification, in which the case event types and case event roles obtained in steps S301 and S302 are manually verified, further drawing on the experience of legal-domain experts, to finally obtain a more accurate case event representation system.
Based on steps S301 to S303, the construction of the event representation hierarchy is completed.
And step S104, extracting event information from the case event by adopting a joint extraction algorithm, classifying the event information based on the event representation system, and constructing a case event knowledge graph based on the classified event information.
In this step, a deep-learning-based joint entity, relation, and event extraction algorithm extracts information from the unstructured legal text. Information extraction plays the most important role in constructing the case event knowledge graph: the accuracy of extraction directly determines how accurately the graph describes cases and how well the graph generalizes. The text is first preprocessed by completing references to the parties (plaintiff and defendant) and splitting it into sentences, yielding a list of case event description texts. From the case documents, the algorithm extracts key entities such as times, places, and relevant objects; the relations between entities; and the case events together with their trigger words and event roles. Unlike the traditional approach of extracting entities, relations, and events separately, a joint extraction algorithm handles them together: joint extraction associates the entities, relations, events, and their elements in a legal document from the perspective of the sentence's overall semantics, avoiding both the incomplete relation links caused by separate extraction and the linking errors caused by ignoring whole-sentence semantics.
Specifically, referring to fig. 4, the present embodiment includes the following steps:
and S401, coding, namely, coding the whole text content of the case event sentence by adopting a Chinese pre-training model Bert-Chinese which is very mature in the NLP field and has strong expression capability in the coding stage, and finally outputting the vector representation of each word in the sentence.
Step S402: entity and event trigger word recognition. Based on the vector representation of each word obtained in step S401, each word vector is labeled, finally returning a sentence annotated with specific entity and event trigger word categories. Label categories follow the BIO format, the most common scheme in word labeling, for example 'B-PER', 'I-PER', 'O'. The left side of the connector '-' gives the position of the word within the entity or trigger word, with only two cases: 'B' marks a starting position and 'I' a non-starting position. The right side of the connector gives the category of the entity or trigger word to which the word belongs; for example, 'PER' represents the specific entity class 'person', and 'O' marks a word that is neither an entity nor an event trigger word. For example, in the sentence "Wang Hong was robbed of her wallet", "Wang" is labeled 'B-PER' and "Hong" is labeled 'I-PER'; the passive marker "was" belongs neither to an entity nor to a case event trigger word and is therefore labeled 'O'; "robbed" is labeled 'B-ATTACK' and is an event trigger word. The labeling process is thus equivalent to classifying each word vector against the defined entity and event trigger word category labels, which are derived from the previously constructed event representation system.
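As a minimal illustration of how BIO tags map back to entity and trigger-word spans, the following sketch decodes a tag sequence; the function name and tag set are assumptions for this example:

```python
def bio_to_spans(tags):
    """Collect (category, start, end) spans from a BIO tag sequence.

    'B-X' opens a span of category X, 'I-X' extends it, 'O' closes it.
    """
    spans, start, cat = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith('B-'):
            if start is not None:
                spans.append((cat, start, i))
            start, cat = i, tag[2:]
        elif tag.startswith('I-') and cat == tag[2:]:
            continue  # extend the currently open span
        else:  # 'O' or an inconsistent 'I-' tag closes any open span
            if start is not None:
                spans.append((cat, start, i))
            start, cat = None, None
    if start is not None:
        spans.append((cat, start, len(tags)))
    return spans

# "王红被抢了钱包" -> 王/B-PER 红/I-PER 被/O 抢/B-ATTACK 了/O 钱/O 包/O
spans = bio_to_spans(['B-PER', 'I-PER', 'O', 'B-ATTACK', 'O', 'O', 'O'])
# -> [('PER', 0, 2), ('ATTACK', 3, 4)]
```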
Assume the sentence contains L words, whose word vectors obtained through step S401 are {x_1, …, x_L}. We adopt a feed-forward neural network (FFN) to generate a score vector for each word vector x_i:

ŷ_i = FFN(x_i)

Each component of the score vector ŷ_i is the score of one label for that word vector, so the length of ŷ_i equals the total number of labels. After obtaining the score vector of each word vector, if we directly take the label of the maximum component of ŷ_i as the label of x_i, we obtain a corresponding label path:

ẑ = {ẑ_1, …, ẑ_L}

where ẑ_i is the label whose component of ŷ_i is largest, output as the tag of word vector x_i. A label path obtained by directly taking maxima, however, considers neither the connection between adjacent labels in the path nor the validity of their order. For example, a label path such as {'O', 'I-PER', …} is not reasonable, because 'I-PER' may only be preceded by 'B-PER' or another 'I-PER'. To take the correlation between labels in the label path into account, we do not directly choose the maximum value after obtaining the score vectors but add a CRF layer (conditional random field layer) afterwards. This layer introduces a label transition matrix A whose element A_{ẑ_{i−1}, ẑ_i} represents the relation between adjacent path labels ẑ_{i−1} and ẑ_i, and fuses these values into the scoring function of the whole sentence X for a label path ẑ:

s(X, ẑ) = Σ_{i=1..L} ( A_{ẑ_{i−1}, ẑ_i} + ŷ_{i, ẑ_i} )

where ŷ_{i, ẑ_i} is the ẑ_i-th component of the label score vector ŷ_i. The matrix A is learned during training. In the training process, the aim is to find the label path with the highest score, z = {z_1, …, z_L}, which is equivalent to minimizing the following loss function (the standard CRF negative log-likelihood, in which the sum runs over all possible label paths ẑ):

L_1 = −s(X, z) + log Σ_ẑ exp( s(X, ẑ) )
After training, an optimal label path z is finally obtained; the label of every word follows from this path, and recognition of entities and event trigger words is completed according to the assigned labels. The identified entities and event trigger words serve as the nodes of the case event knowledge graph.
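A toy sketch of the CRF decoding idea in step S402 (not the patent's trained model): the emission dictionaries play the role of the FFN score vectors ŷ_i, the transition dictionary plays the role of matrix A, and Viterbi search returns the highest-scoring label path. All names, scores, and the allowed-transition set here are illustrative assumptions:

```python
def viterbi(emissions, transitions, tags):
    """Find the highest-scoring tag path.

    emissions:   list of {tag: score} dicts, one per word.
    transitions: {(prev_tag, tag): score}; missing pairs get a large
                 negative score, forbidding e.g. O -> I-PER.
    """
    FORBIDDEN = -1e9
    # best[t] = (score of best path ending in tag t, that path)
    best = {t: (emissions[0].get(t, FORBIDDEN), [t]) for t in tags}
    for emit in emissions[1:]:
        new_best = {}
        for t in tags:
            score, path = max(
                (best[p][0] + transitions.get((p, t), FORBIDDEN)
                 + emit.get(t, FORBIDDEN),
                 best[p][1] + [t])
                for p in tags
            )
            new_best[t] = (score, path)
        best = new_best
    return max(best.values())[1]

tags = ['B-PER', 'I-PER', 'O']
allowed = [('B-PER', 'I-PER'), ('I-PER', 'I-PER'), ('B-PER', 'O'),
           ('I-PER', 'O'), ('O', 'O'), ('O', 'B-PER'),
           ('B-PER', 'B-PER'), ('I-PER', 'B-PER')]
transitions = {pair: 0.0 for pair in allowed}  # 'O' -> 'I-PER' stays forbidden
emissions = [
    {'B-PER': 2.0, 'I-PER': 0.1, 'O': 0.5},   # greedy choice: B-PER
    {'B-PER': 0.2, 'I-PER': 0.4, 'O': 0.5},   # greedy choice: O
    {'B-PER': 0.2, 'I-PER': 1.5, 'O': 0.1},   # greedy: I-PER, invalid after O
]
path = viterbi(emissions, transitions, tags)
# -> ['B-PER', 'I-PER', 'I-PER'], a valid path, unlike the greedy one
```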
Step S403: classification of graph nodes and edges. Because the entity or event trigger word corresponding to a node may consist of multiple words (it is typically a word or a phrase), the vector representations of all words in it are averaged, and the resulting vector is used as the node's unique vector representation. On this basis, for the two node-level tasks (entity recognition and event trigger word recognition), a feed-forward neural network (FFN) generates a corresponding score vector from the vector representation v_i of node i:

ŷ_i^t = FFN^t(v_i)

where the superscript t denotes one of the two node-level tasks, and ŷ_i^t is the score vector over all categories of task t: each component is the score of the corresponding category, and the higher the score, the more likely the node belongs to that category. The length of ŷ_i^t equals the total number of categories of task t. For example, if the entity recognition task predefines the three entity classes "person", "time", and "place", then the score vector corresponding to the entity node "Wang Hong" has length 3, equal to the total number of categories in the entity recognition task.
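The node-vector averaging and FFN scoring just described can be sketched with a toy linear scorer; the weight values and category names are assumptions standing in for a trained network:

```python
def node_vector(word_vectors):
    """Average the word vectors of a multi-word node, e.g. a two-word name."""
    dim = len(word_vectors[0])
    return [sum(v[d] for v in word_vectors) / len(word_vectors)
            for d in range(dim)]

def classify_node(vec, weights):
    """Toy stand-in for FFN^t: one dot-product score per category;
    the category of the maximum component is the predicted class."""
    scores = {c: sum(w * x for w, x in zip(wv, vec))
              for c, wv in weights.items()}
    return max(scores, key=scores.get)

v = node_vector([[1.0, 0.0], [0.0, 1.0]])   # two word vectors -> one node vector
category = classify_node(v, {'person': [1.0, 1.0],
                             'time':   [-1.0, 0.0],
                             'place':  [0.0, -1.0]})
# -> 'person'
```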
For the vector representation of an "edge", the vectors v_i and v_j of its two endpoint nodes are concatenated directly, and the resulting vector (v_i, v_j) serves as the edge's representation. Edges likewise have two tasks, relation extraction and event role extraction, which can be distinguished by the node types. Given the two node types obtained in the previous step, node pairs can combine in three ways: "entity-entity", "trigger word-trigger word", and "entity-trigger word". Since "trigger word-trigger word" is unreasonable and cannot occur in practice, only two node pair types remain, corresponding exactly to the two edge tasks. Accordingly, for a given task t, a feed-forward neural network (FFN) is again used to generate a corresponding score vector from the vector representation of edge k:

ŷ_k^t = FFN^t((v_i, v_j))

During training of the feed-forward network parameters, the goal for a given task t is to minimize the following cross-entropy loss function:

L_t = −(1/N) Σ_{i=1..N} y_i^t · log ŷ_i^t

where N is the number of nodes (or edges) corresponding to task t and y_i^t is the ground-truth category vector. Taking, for every node and edge, the category of the maximum component of its score vector yields a locally optimal graph Ĝ, whose score function is computed as:

s′(Ĝ) = Σ_{t∈T} Σ_i max( ŷ_i^t )

where T denotes the set of the four task types.
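The node-pair-to-task mapping and the local graph score s′(Ĝ) can be sketched as follows; function names are assumptions, and in the patent the score vectors come from the trained FFNs:

```python
def edge_task(type_i, type_j):
    """Select the edge task from the two endpoint node types;
    a pair of trigger words gets no edge at all."""
    kinds = {type_i, type_j}
    if kinds == {'trigger'}:
        return None            # trigger-trigger edges cannot occur
    if kinds == {'entity'}:
        return 'relation'      # entity-entity -> relation extraction
    return 'role'              # entity-trigger -> event role extraction

def local_graph_score(score_vectors):
    """s'(G): sum, over all nodes and edges, of the maximum score
    component (the score of the locally chosen category)."""
    return sum(max(v) for v in score_vectors)

task = edge_task('entity', 'trigger')          # -> 'role'
score = local_graph_score([[1, 3], [2, 0]])    # -> 5
```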
Step S404: global feature fusion. Steps S401 to S403 yield a locally optimal graph Ĝ. However, because the model generating this graph does not consider the global features of the whole sentence, that is, the combined relations among all nodes and edges in the graph, misjudgments based on purely local information easily occur. As shown in fig. 5, consider the example: "The explosion caused the deaths of the terrorist and three shoppers." Here "terrorist" is easily identified as the argument filling the event role "attacker" of the "explosion" event, but is not easily identified as a "victim" of the same event, because it is relatively distant in the sentence from the word "deaths".
In this step, global features summarized from the case events of legal adjudication documents are added to the model to improve its understanding and fusion of sentence-level global information. The global features are organized into a template system summarized according to the recommendations of legal experts, to which new templates are added as case material accumulates. Here we illustrate the form of one template in the system using only the sentence above: "the number of entities that are arguments of both <role 1> and <role 2> in <event A>". Applying this template, the number of entities in the sentence above that are arguments of both <attacker> and <victim> in the <explosion> event is 1. Without this global feature template system, the model would not easily learn the information that "an entity exists that is an argument of both <attacker> and <victim> in the <explosion> event". During training, given a graph G, the corresponding global feature vector is:

f_G = {f_1(G), …, f_M(G)}
where M is the number of global features of graph G, and the function f_i(·) returns the number of nodes or edges satisfying global feature i. Taking the template above as an example, f_i(·) is defined as:

f_i(G) = n · χ(C_i)
where C_i is the fact "an entity exists that is an argument of both <attacker> and <victim> in the <explosion> event", χ(·) is an indicator function with χ(C_i) = 1 if the fact C_i occurs and 0 otherwise, and n is the number of entities satisfying the condition. The global feature vector f_G of graph G is weighted, summed, and added to the local score function s′(G) of graph G, finally giving the global score function of graph G:

s(G) = s′(G) + u · f_G
The vector u is a weight vector learned during model training. We want the optimal graph G obtained by fusing global features and the locally optimal graph Ĝ to coincide as far as possible, so the following loss function must also be minimized during training:

L_2 = s(Ĝ) − s(G)
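The template feature f_i(G) = n · χ(C_i) and the global score s(G) = s′(G) + u · f_G can be sketched on the "explosion" example; the triple representation of event arguments and the weight values below are assumptions for this illustration:

```python
def dual_role_feature(arguments, event, role_a, role_b):
    """f_i(G) = n * chi(C_i): number of entities that fill both role_a
    and role_b of the same event (0 when the fact C_i does not occur).

    arguments: list of (event_type, role, entity) triples from the graph.
    """
    a = {e for ev, r, e in arguments if ev == event and r == role_a}
    b = {e for ev, r, e in arguments if ev == event and r == role_b}
    return len(a & b)

def global_score(local_score, f_G, u):
    """s(G) = s'(G) + u . f_G: local score plus weighted global features."""
    return local_score + sum(ui * fi for ui, fi in zip(u, f_G))

arguments = [('explosion', 'attacker', 'terrorist'),
             ('explosion', 'victim', 'terrorist'),
             ('explosion', 'victim', 'shopper')]
f = dual_role_feature(arguments, 'explosion', 'attacker', 'victim')  # -> 1
score = global_score(4.0, [f], [0.5])                                # -> 4.5
```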
Integrating the loss functions L_1, L_t, and L_2 from steps S402 to S404, the loss function of the whole model is defined as:

L = L_1 + Σ_{t∈T} L_t + L_2

Finally, the model is obtained by minimizing the above loss function L during training. The model outputs a set of nodes and edges with label scores.
Step S405: decoding. In the decoding stage, the set of nodes and edges with label scores output by the model is decoded. The information of all nodes and of the edges connecting them is considered comprehensively, and the candidate graph with the highest global score is finally output as the case event knowledge graph.
Specifically, because an exhaustive search over all candidate graphs would incur great computational cost, a beam search algorithm is adopted here. We first initialize the beam B = {K_0} with an order-zero graph K_0. At each iteration step i, the candidate graphs in B are expanded by nodes and edges, as follows.

Node expansion: select a node v_i ∈ V and define its candidate set as

V_i = {(a_i, b_i, ẑ_i^(k)) | 1 ≤ k ≤ β_v}

where ẑ_i^(k) is the label corresponding to the k-th largest component of the score vector of node v_i, a_i and b_i are the head and tail positions of the node's words in the sentence, and β_v is a hyperparameter controlling the maximum number of candidate labels.

After node expansion the beam is updated as follows:

B ← {G + v | (G, v) ∈ B × V_i}
Edge expansion: we iterate over nodes v_j ∈ V with index j < i and add all possible edges between v_i and v_j. When v_i and v_j are both event trigger words, however, no edge is added between them and node v_j is skipped. In each iteration a candidate edge set is built:

E_ij = {ẑ_ij^(k) | 1 ≤ k ≤ β_e}

where ẑ_ij^(k) is the label corresponding to the k-th largest component of the score vector of edge e_ij, and β_e is a hyperparameter controlling the maximum number of candidate labels.

After edge expansion the beam is updated as follows:

B ← {G + e | (G, e) ∈ B × E_ij}
The beam B has width θ: after each iteration, all nodes and edges in B form a set of candidate graphs; the global score of each candidate graph in the set is computed with the previously defined global score function, all candidate graphs are ranked by global score, and the top θ are kept. After all iteration steps are completed, the candidate graph with the highest global score is returned as the final case event knowledge graph of the case event.
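A minimal beam search in the spirit of step S405, keeping the θ best-scoring candidates after each expansion; the step structure and scoring function are simplified assumptions (the patent scores whole candidate graphs with the global score function defined above):

```python
def beam_search(steps, score, theta):
    """Expand every graph in the beam by each candidate of the current
    step, then keep only the theta highest-scoring candidates."""
    beam = [[]]                       # start from an empty, order-zero graph
    for candidates in steps:
        expanded = [g + [c] for g in beam for c in candidates]
        expanded.sort(key=score, reverse=True)
        beam = expanded[:theta]       # beam width theta
    return beam[0]                    # highest-scoring candidate graph

# Two expansion steps, two candidates each; the toy score favors
# non-'O' labels, so the best graph labels both positions.
best = beam_search([['B-PER', 'O'], ['I-PER', 'O']],
                   lambda g: sum(1 for t in g if t != 'O'),
                   theta=2)
# -> ['B-PER', 'I-PER']
```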
Event information extraction for the case event and construction of the case event knowledge graph are completed based on steps S401 to S405.
It is understood that the above case knowledge graph construction method can be performed by any device, equipment, platform, or cluster of equipment having computing and processing capabilities.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above description describes certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, one or more embodiments of the present disclosure corresponding to any of the above-described embodiment methods also provide a case and event knowledge graph construction apparatus.
Referring to fig. 6, the case event knowledge map constructing apparatus includes:
a data collection module 601 configured to collect judicial case event related data;
an event library construction module 602, configured to perform data processing on the relevant data of the law case event to obtain a case event, and construct an event library based on the case event;
an event representation system construction module 603 configured to define case event types based on existing laws and regulations, classify the case events based on the case event types, establish case event roles corresponding to each case event type by performing role mining on the case events, and construct an event representation system based on the case event types and the case event roles;
a case event knowledge graph construction module 604 configured to extract event information from the case event by using a joint extraction algorithm, classify the event information based on the event representation system, and construct a case event knowledge graph based on the classified event information.
As an alternative embodiment, the data collection module 601 is specifically configured such that the relevant data of the legal case event includes, but is not limited to, legal adjudication documents, civil/criminal judgments, and user application logs.
As an alternative embodiment, the event library construction module 602 is specifically configured to:
identifying case event segments, removing non-case event contents in the related data of the law case event, and reserving related case event texts;
extracting case event names, namely extracting case event names in the related data of the law case events;
case event normalization, wherein the same case event in the related data of the law case event is subjected to normalization processing;
case and event related information is associated, all the non-case event contents of each case and event are fused to obtain case and event related information, and the case and event related information is associated with the case and event.
As an alternative embodiment, the event representation system construction module 603 is specifically configured to encode the case name through a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model to obtain a vector representation of the case name, compute from it a score vector over case event types through a feed-forward neural network, and, based on the score vector, obtain via a softmax function the case event type with the highest probability as the type corresponding to the case name.
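At decision time, the BERT-then-FFN-then-softmax classification performed by module 603 reduces to a softmax over the score vector; the score values and case event type names below are illustrative assumptions:

```python
import math

def softmax(scores):
    """Convert a score vector into probabilities (max-shifted for
    numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_case_type(score_vector, case_types):
    """Return the case event type with the highest softmax probability."""
    probs = softmax(score_vector)
    return case_types[probs.index(max(probs))]

case_type = predict_case_type([0.3, 2.1, -0.4],
                              ['theft', 'robbery', 'fraud'])
# -> 'robbery'
```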
As an alternative embodiment, the case event knowledge graph constructing module 604 is specifically configured to:
coding the case event statement through a BERT model to obtain the vector representation of each word in the case event statement;
marking the vector representation of each character by adopting a BIO sequence marking method, and identifying an entity and an event trigger word in the case event;
taking the entity and the event trigger word as nodes of the case and event knowledge graph, averaging vector representations of all words in the nodes to obtain vector representations of the nodes, splicing the two vector representations of the nodes to obtain vector representations of corresponding edges, respectively generating score vectors corresponding to the vector representations of the nodes and the edges through feedforward neural network calculation based on categories in an event representation system, and taking the categories corresponding to the maximum components of the score vectors as the categories of the nodes and the edges.
As an alternative embodiment, the case event knowledge graph constructing module 604 is specifically configured to:
iterating the nodes and the edges by adopting a clustering search algorithm, forming a candidate graph set by all the points and the edges in the cluster, defining a global score function based on score vectors of the points and the edges, respectively calculating the global score of each candidate graph in the candidate graph set based on the global score function, sequencing all the candidate graphs according to the global score, and outputting the candidate graph with the highest global score as the case event knowledge graph.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, the functionality of the modules may be implemented in the same one or more software and/or hardware implementations in implementing one or more embodiments of the present description.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and running on the processor, and when the processor executes the computer program, the processor implements the method according to any one of the above embodiments.
Fig. 7 is a schematic diagram of a more specific hardware structure of an electronic device provided in this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, one or more embodiments of the present specification also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method according to any one of the above embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the understanding of one or more embodiments of the present description, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic ram (dram)) may use the discussed embodiments.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (9)

1. A case and event knowledge graph construction method is characterized by comprising the following steps:
collecting relevant data of a law case event;
carrying out data processing on the relevant data of the legal case event to obtain a case event;
defining case event types based on the existing laws and regulations, classifying the case events based on the case event types, mining the case events to establish case event roles corresponding to the case event types, and constructing an event representation system based on the case event types and the case event roles;
and extracting event information from the case event by adopting a joint extraction algorithm, classifying the event information based on the event representation system, and constructing a case event knowledge graph based on the classified event information.
2. The construction method according to claim 1, wherein the forensic event related data comprises: legal referee documents, civil/criminal adjudications and user application logs.
3. The construction method according to claim 1 or 2, wherein the data processing of the forensic event related data comprises:
removing the non-case event content in the relevant data of the judicial case event, and reserving the text of the relevant case event;
extracting case event names in the related data of the legal case events;
carrying out normalization processing on the same case event in the related data of the law case event;
all the non-case event contents of each case event are fused to obtain case event related information, and the case event related information is associated with the case event.
4. The method of claim 1 or 2, wherein the classifying the case based on the case type comprises:
the method comprises the steps of coding a case name of a case through a pre-trained bidirectional encoder representation BERT model from a converter to obtain a vector representation of the case name, calculating a score vector of the vector representation corresponding to the case type through a feedforward neural network, and calculating the case type with the highest probability through a softmax function based on the score vector to obtain the case type corresponding to the case name.
5. The construction method according to claim 1 or 2, wherein the extracting event information from the case event by using a joint extraction algorithm, and classifying the event information based on the event representation system comprises:
coding the case event statement through a BERT model to obtain the vector representation of each word in the case event statement;
marking the vector representation of each character by adopting a BIO sequence marking method, and identifying an entity and an event trigger word in the case event;
taking the entity and the event trigger word as nodes of the case and event knowledge graph, averaging vector representations of all words in the nodes to obtain vector representations of the nodes, splicing the two vector representations of the nodes to obtain vector representations of corresponding edges, respectively generating score vectors corresponding to the vector representations of the nodes and the edges through feedforward neural network calculation based on categories in an event representation system, and taking the categories corresponding to the maximum components of the score vectors as the categories of the nodes and the edges.
6. The method of constructing according to claim 5, wherein the constructing a case and event knowledge graph based on the classified event information comprises:
iterating the nodes and the edges by adopting a clustering search algorithm, forming a candidate graph set by all the points and the edges in the cluster, defining a global score function based on score vectors of the points and the edges, respectively calculating the global score of each candidate graph in the candidate graph set based on the global score function, sequencing all the candidate graphs according to the global score, and outputting the candidate graph with the highest global score as the case event knowledge graph.
7. A case and event knowledge graph construction device is characterized by comprising:
a data collection module configured to collect judicial case event related data;
the event library construction module is configured to perform data processing on the relevant data of the law case event to obtain a case event, and construct an event library based on the case event;
the event representation system construction module is configured to define case event types based on existing laws and regulations, classify the case events based on the case event types, establish case event roles corresponding to the case event types by performing role mining on the case events, and construct an event representation system based on the case event types and the case event roles;
the case event knowledge graph construction module is configured to adopt a joint extraction algorithm to extract event information from the case event, classify the event information based on the event representation system, and construct a case event knowledge graph based on the classified event information.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 6 when executing the program.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
CN202011592591.8A 2020-12-29 2020-12-29 Case and event knowledge graph construction method and related equipment Active CN112632223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011592591.8A CN112632223B (en) 2020-12-29 2020-12-29 Case and event knowledge graph construction method and related equipment

Publications (2)

Publication Number Publication Date
CN112632223A true CN112632223A (en) 2021-04-09
CN112632223B CN112632223B (en) 2023-01-20

Family

ID=75286053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011592591.8A Active CN112632223B (en) 2020-12-29 2020-12-29 Case and event knowledge graph construction method and related equipment

Country Status (1)

Country Link
CN (1) CN112632223B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908671A * 2017-10-25 2018-04-13 南京擎盾信息科技有限公司 Knowledge graph construction method and system based on legal data
CN109344187A * 2018-08-28 2019-02-15 合肥工业大学 A structured processing system for case information in judicial decision documents
CN109885698A * 2019-02-13 2019-06-14 北京航空航天大学 A knowledge graph construction method and device, and electronic equipment
CN110188346A * 2019-04-29 2019-08-30 浙江工业大学 An intelligent analysis method for network security cases based on information extraction
CN110781254A (en) * 2020-01-02 2020-02-11 四川大学 Automatic case knowledge graph construction method, system, equipment and medium
CN110910283A (en) * 2019-10-18 2020-03-24 平安科技(深圳)有限公司 Method, device, equipment and storage medium for generating legal document
CN111026880A (en) * 2019-12-08 2020-04-17 大连理工大学 Joint learning-based judicial knowledge graph construction method
CN111259160A (en) * 2018-11-30 2020-06-09 百度在线网络技术(北京)有限公司 Knowledge graph construction method, device, equipment and storage medium
CN111324728A (en) * 2020-01-22 2020-06-23 腾讯科技(深圳)有限公司 Text event abstract generation method and device, electronic equipment and storage medium
CN111382575A (en) * 2020-03-19 2020-07-07 电子科技大学 Event extraction method based on joint labeling and entity semantic information
CN111581396A (en) * 2020-05-06 2020-08-25 西安交通大学 Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax
CN111967268A (en) * 2020-06-30 2020-11-20 北京百度网讯科技有限公司 Method and device for extracting events in text, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108169A * 2022-12-12 2023-05-12 长三角信息智能创新研究院 Intelligent hotline work order dispatching method based on knowledge graph
CN116108169B * 2022-12-12 2024-02-20 长三角信息智能创新研究院 Intelligent hotline work order dispatching method based on knowledge graph
CN117493583A (en) * 2024-01-03 2024-02-02 安徽思高智能科技有限公司 Method and system for generating flow operation sequence by combining event log and knowledge graph

Also Published As

Publication number Publication date
CN112632223B (en) 2023-01-20

Similar Documents

Publication Publication Date Title
CN112507715B (en) Method, device, equipment and storage medium for determining association relation between entities
US20220050967A1 (en) Extracting definitions from documents utilizing definition-labeling-dependent machine learning background
US20160098645A1 (en) High-precision limited supervision relationship extractor
WO2021051518A1 (en) Text data classification method and apparatus based on neural network model, and storage medium
Jotheeswaran et al. Opinion mining using decision tree based feature selection through Manhattan hierarchical cluster measure
Wang et al. Keyword extraction from online product reviews based on bi-directional LSTM recurrent neural network
Mehmood et al. A precisely xtreme-multi channel hybrid approach for roman urdu sentiment analysis
CN112632226B (en) Semantic search method and device based on legal knowledge graph and electronic equipment
CN112632223B (en) Case and event knowledge graph construction method and related equipment
Wang et al. Understanding geological reports based on knowledge graphs using a deep learning approach
CN115688920A (en) Knowledge extraction method, model training method, device, equipment and medium
CN114840685A (en) Emergency plan knowledge graph construction method
Brochier et al. Impact of the query set on the evaluation of expert finding systems
CN115438195A (en) Construction method and device of knowledge graph in financial standardization field
CN116108191A (en) Deep learning model recommendation method based on knowledge graph
McClosky et al. Learning constraints for consistent timeline extraction
CN112632948B (en) Case document ordering method and related equipment
Qian et al. Boosted multi-modal supervised latent Dirichlet allocation for social event classification
CN113515589A (en) Data recommendation method, device, equipment and medium
CN109582958B (en) Disaster story line construction method and device
US10387472B2 (en) Expert stance classification using computerized text analytics
CN112883229B (en) Video-text cross-modal retrieval method and device based on multi-feature-map attention network model
US11341188B2 (en) Expert stance classification using computerized text analytics
Yin et al. Extracting actors and use cases from requirements text with BiLSTM-CRF
CN115809334B (en) Training method of event relevance classification model, text processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant