CN111881288B - Method and device for judging true and false of stroke information, storage medium and electronic equipment - Google Patents

Info

Publication number
CN111881288B
Authority
CN
China
Prior art keywords
entity
information
relationship
event information
consistency detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010426275.7A
Other languages
Chinese (zh)
Other versions
CN111881288A (en)
Inventor
陆韵
李冰
倪骏
黄刚
盛丽兰
陆克贤
俞山青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Chinaoly Technology Co ltd
Original Assignee
Hangzhou Chinaoly Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Chinaoly Technology Co ltd filed Critical Hangzhou Chinaoly Technology Co ltd
Priority to CN202010426275.7A priority Critical patent/CN111881288B/en
Publication of CN111881288A publication Critical patent/CN111881288A/en
Application granted granted Critical
Publication of CN111881288B publication Critical patent/CN111881288B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Abstract

The application provides a method and a device for judging the authenticity of transcript information, a storage medium and electronic equipment. First, entities, entity relationships and event information are extracted from the transcript information; a knowledge graph is constructed from the entities, the entity relationships and the event information; relationship consistency detection and behavior consistency detection are performed on the knowledge graph based on a historical database; a confidence score corresponding to the transcript information is adjusted based on the results of the relationship consistency detection and the behavior consistency detection; and when the confidence score is smaller than a preset score threshold, the transcript information is determined to be in doubt. The relationship consistency detection yields the degree to which the transcript information agrees with the content recorded in the historical database, and the behavior consistency detection yields the degree of contradiction in the transcript information; the confidence score is adjusted accordingly, so that whether the transcript information is true and reliable is judged comprehensively. The judgment result is more accurate, which facilitates case handling by the relevant personnel.

Description

Method and device for judging true and false of stroke information, storage medium and electronic equipment
Technical Field
The present invention relates to the field of computers, and in particular to a method and a device for judging the authenticity of transcript information, a storage medium, and an electronic device.
Background
In handling some cases, transcript information often needs to be recorded from the statements of the persons involved. The transcript information is an important clue and basis for handling the case. Erroneous transcript information may mislead the relevant personnel in their judgment of the case and thereby affect the whole case-handling process.
The authenticity of the transcript information therefore often needs to be confirmed. The existing approach relies on the experience of police officers to judge the authenticity of the transcript information, which demands highly skilled judges and consumes a great deal of their time.
Disclosure of Invention
The invention aims to provide a method and a device for judging the authenticity of transcript information, a storage medium and electronic equipment, so as to solve the above problems.
In order to achieve the above purpose, the technical solution adopted in the embodiment of the present application is as follows:
In a first aspect, an embodiment of the present application provides a method for judging the authenticity of transcript information, where the method includes:
extracting an entity, an entity relationship and event information in the transcript information, wherein the entity comprises one or more of a person, a time, a place, an article and a case type, the entity relationship represents relationship information among the entities, and the event information represents an event corresponding to the entity;
constructing a knowledge graph according to the entity, the entity relationship and the event information, wherein the knowledge graph comprises the corresponding relationship between the entity and the entity relationship and the corresponding relationship between the entity and the event information respectively;
performing relationship consistency detection and behavior consistency detection on the knowledge graph based on a historical database;
adjusting a confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection;
when the confidence score is smaller than a preset score threshold, determining that the transcript information is in doubt;
wherein the historical database contains transcript information that has been verified as true, the behavior consistency detection is used for detecting whether the event information corresponding to an entity is contradictory, and the relationship consistency detection is used for detecting whether the entity relationship corresponding to an entity is expressed consistently in the historical database and in the knowledge graph.
In a second aspect, an embodiment of the present application provides a device for judging the authenticity of transcript information, where the device includes:
a processing unit, used for extracting entities, entity relationships and event information from the transcript information, wherein the entities comprise one or more of persons, times, places, articles and case types, the entity relationships represent relationship information among the entities, and the event information represents events corresponding to the entities; the processing unit is also used for constructing a knowledge graph according to the entities, the entity relationships and the event information, wherein the knowledge graph comprises the correspondence between the entities and the entity relationships and the correspondence between the entities and the event information; the processing unit is also used for performing relationship consistency detection and behavior consistency detection on the knowledge graph based on a historical database; and the processing unit is also used for adjusting the confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection;
a judging unit, used for determining that the transcript information is in doubt when the confidence score is smaller than a preset score threshold;
wherein the historical database contains transcript information that has been verified as true, the behavior consistency detection is used for detecting whether the event information corresponding to an entity is contradictory, and the relationship consistency detection is used for detecting whether the entity relationship corresponding to an entity is expressed consistently in the historical database and in the knowledge graph.
In a third aspect, embodiments of the present application provide a storage medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory for storing one or more programs; the above-described method is implemented when the one or more programs are executed by the processor.
Compared with the prior art, the method and the device for judging the authenticity of transcript information, the storage medium and the electronic equipment provided by the embodiments of the present application have the following beneficial effects: first, entities, entity relationships and event information are extracted from the transcript information; a knowledge graph is constructed from the entities, the entity relationships and the event information; relationship consistency detection and behavior consistency detection are performed on the knowledge graph based on a historical database; a confidence score corresponding to the transcript information is adjusted based on the results of the relationship consistency detection and the behavior consistency detection; and when the confidence score is smaller than a preset score threshold, the transcript information is determined to be in doubt. The relationship consistency detection yields the degree to which the transcript information agrees with the content recorded in the historical database, and the behavior consistency detection yields the degree of contradiction in the transcript information; the confidence score is adjusted accordingly, so that whether the transcript information is true and reliable is judged comprehensively. The judgment result is more accurate, which facilitates case handling by the relevant personnel.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting in scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 2 is a flow chart of a method for judging the authenticity of transcript information according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the sub-steps of S102 according to an embodiment of the present application;
Fig. 4 is a schematic diagram of the sub-steps of S102-2 according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the sub-steps of S101 according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the sub-steps of S101-6 according to an embodiment of the present application;
Fig. 7 is a schematic diagram of the sub-steps of S104 according to an embodiment of the present application;
Fig. 8 is a schematic unit diagram of a device for judging the authenticity of transcript information according to an embodiment of the present application.
In the figure: 10-a processor; 11-memory; 12-bus; 13-a communication interface; 201-a processing unit; 202-a judging unit.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the description of the present application, it should be noted that, the terms "upper," "lower," "inner," "outer," and the like indicate an orientation or a positional relationship based on the orientation or the positional relationship shown in the drawings, or an orientation or a positional relationship conventionally put in use of the product of the application, merely for convenience of description and simplification of the description, and do not indicate or imply that the apparatus or element to be referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present application.
In the description of the present application, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed" and "connected" are to be construed broadly: a connection may be, for example, a fixed connection, a detachable connection, or an integral connection; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium, or a communication between two elements. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art according to the specific context.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
The embodiment of the application provides electronic equipment which can be computer equipment or other intelligent equipment. Referring to fig. 1, a schematic structure of an electronic device is shown. The electronic device comprises a processor 10, a memory 11, a bus 12. The processor 10 and the memory 11 are connected by a bus 12, the processor 10 being adapted to execute executable modules, such as computer programs, stored in the memory 11.
The processor 10 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the method for judging whether the transcript information is true or false can be completed by an integrated logic circuit of hardware in the processor 10 or an instruction in a software form. The processor 10 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
The memory 11 may comprise a high-speed random access memory (RAM) and may also comprise a non-volatile memory, such as at least one disk memory.
The bus 12 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Only one double-headed arrow is shown in Fig. 1, but this does not mean that there is only one bus 12 or only one type of bus 12.
The memory 11 is used for storing programs, such as the program corresponding to the device for judging the authenticity of transcript information. The device for judging the authenticity of transcript information comprises at least one software function module, which may be stored in the memory 11 in the form of software or firmware or solidified in the operating system (OS) of the electronic device. After receiving an execution instruction, the processor 10 executes the program to implement the method for judging the authenticity of transcript information.
Optionally, the electronic device provided in the embodiment of the present application further includes a communication interface 13. The communication interface 13 is connected to the processor 10 via a bus. The electronic device may receive information transmitted by an external device, such as transcript information, through the communication interface 13.
It should be understood that the structure shown in fig. 1 is a schematic structural diagram of only a portion of an electronic device, which may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
The method for judging the authenticity of transcript information provided by the embodiment of the present application can be applied to the electronic device shown in Fig. 1; the specific flow is shown in Fig. 2:
S101, extracting entities, entity relationships and event information from the transcript information.
Transcript information usually consists of persons, places, times, events and relationships, and these components are related to one another. To verify the authenticity of the transcript information, it is necessary to determine whether the components and the associations among them are correct, so the content of the transcript information must first be extracted.
The entities comprise one or more of persons, times, places, articles and case types; an entity relationship represents relationship information among entities; and event information represents an event corresponding to an entity. For example, in "Zhang Sanzhan Liqu", Zhang Sanzhan and Liqu are entities, and the event information is "Zhang Sanzhan Liqu"; in "Xiaoming is a student of Teacher Wang", Xiaoming and Teacher Wang are entities, and "student" is the entity relationship between the two; in "Xiaojun plays basketball on the playground in the morning", Xiaojun, the morning and the playground are entities.
S102, constructing a knowledge graph according to the entity, the entity relationship and the event information.
The knowledge graph comprises the correspondence between entities and entity relationships and the correspondence between entities and event information. For example, for "Xiaoming and Xiaohong are a couple", the entity relationship "couple" corresponds to the entities "Xiaoming" and "Xiaohong" respectively; for "Xiao Deng and Xiao Li are fighting", the event information "fighting" corresponds to the entities "Xiao Deng" and "Xiao Li" respectively.
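The correspondence described above can be pictured as a mapping from each entity to the relationships and events it takes part in. The following is a minimal sketch under that assumption; the function name `build_knowledge_graph`, the tuple layouts and the sample values are illustrative and are not taken from the application.

```python
from collections import defaultdict


def build_knowledge_graph(relations, events):
    """Map each entity to the relations and events it participates in.

    relations: iterable of (entity_a, relation, entity_b)  (assumed layout)
    events:    iterable of (event, [participating entities])  (assumed layout)
    """
    graph = defaultdict(lambda: {"relations": set(), "events": set()})
    for a, relation, b in relations:
        # The relation corresponds to both entities it links.
        graph[a]["relations"].add((relation, b))
        graph[b]["relations"].add((relation, a))
    for event, participants in events:
        # The event corresponds to every entity that takes part in it.
        for entity in participants:
            graph[entity]["events"].add(event)
    return dict(graph)


kg = build_knowledge_graph(
    relations=[("Xiaoming", "couple", "Xiaohong")],
    events=[("fighting", ["Xiao Deng", "Xiao Li"])],
)
```

Keeping relations and events in separate sets per entity makes the later consistency checks a matter of looking up one entity at a time.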
And S103, carrying out relationship consistency detection and behavior consistency detection on the knowledge graph based on the historical database.
The historical database contains transcript information that has been verified as true. The behavior consistency detection is used for detecting whether the event information corresponding to an entity is contradictory, and the relationship consistency detection is used for detecting whether the entity relationship corresponding to an entity is expressed consistently in the historical database and in the knowledge graph. For example, if Zhang Xiaogong and Wang Xiaoming are a couple in the historical database while Zhang Xiaogong and Li Xiaojun are a couple in the knowledge graph, the entity relationship corresponding to Zhang Xiaogong is not expressed consistently in the historical database and the knowledge graph. As another example, suppose the event information in the knowledge graph includes: Xiaoming was in Rome at 8 a.m. on day XX of month XX of year XX, and Xiaoming was in Hangzhou at 9 a.m. on the same day. Obviously, nobody can be in Rome at 8 a.m. and in Hangzhou at 9 a.m. on the same day, so the two pieces of event information contradict each other.
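The two checks can be sketched as follows. The application does not specify how contradictions are detected, so everything here is an assumption made for illustration: the function names, the rule that a "couple" relation admits only one partner, the fixed minimum travel time between two different places, and the concrete date used in place of the "XX" values.

```python
from datetime import datetime, timedelta


def relation_conflicts(graph_relations, history_relations, exclusive=("couple",)):
    """Return graph relations that disagree with the historical database.

    Relations listed in `exclusive` are assumed to allow one partner only.
    """
    partner = {}
    for a, rel, b in history_relations:
        if rel in exclusive:
            partner[(a, rel)] = b
            partner[(b, rel)] = a
    return [
        (a, rel, b)
        for a, rel, b in graph_relations
        if rel in exclusive
        and (partner.get((a, rel), b) != b or partner.get((b, rel), a) != a)
    ]


def behavior_conflicts(events, min_travel=timedelta(hours=6)):
    """Flag a person placed in two different places less than min_travel apart."""
    by_person = {}
    for person, place, when in events:
        by_person.setdefault(person, []).append((when, place))
    conflicts = []
    for person, visits in by_person.items():
        visits.sort()  # chronological order
        for (t1, p1), (t2, p2) in zip(visits, visits[1:]):
            if p1 != p2 and t2 - t1 < min_travel:
                conflicts.append((person, p1, p2))
    return conflicts


rel_hits = relation_conflicts(
    [("Zhang Xiaogong", "couple", "Li Xiaojun")],
    [("Zhang Xiaogong", "couple", "Wang Xiaoming")],
)
act_hits = behavior_conflicts([
    ("Xiaoming", "Rome", datetime(2020, 1, 1, 8)),      # date is a placeholder
    ("Xiaoming", "Hangzhou", datetime(2020, 1, 1, 9)),
])
```

A production system would likely replace the fixed travel-time threshold with a distance-aware rule; the fixed value only keeps the sketch short.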
S104, adjusting the confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection.
Specifically, relationship consistency detection and behavior consistency detection are respectively performed on different entities, and confidence scores corresponding to the transcript information are adjusted based on detection results of the different entities.
S105, judging whether the confidence score is smaller than a preset score threshold value. If yes, executing S106; if not, S107 is performed.
Specifically, when the confidence score is smaller than the preset score threshold, the content of the transcript information does not match the content recorded in the historical database or the transcript information contradicts itself, and S106 is executed. Otherwise, the transcript information contains few contradictions and is substantially consistent with the content recorded in the historical database, and S107 is executed.
S106, determining that the transcript information is in doubt.
S107, determining that the transcript information is true.
In summary, in the method for judging the authenticity of transcript information, entities, entity relationships and event information are first extracted from the transcript information; a knowledge graph is constructed from the entities, the entity relationships and the event information; relationship consistency detection and behavior consistency detection are performed on the knowledge graph based on a historical database; a confidence score corresponding to the transcript information is adjusted based on the results of the relationship consistency detection and the behavior consistency detection; and when the confidence score is smaller than a preset score threshold, the transcript information is determined to be in doubt. The relationship consistency detection yields the degree to which the transcript information agrees with the content recorded in the historical database, and the behavior consistency detection yields the degree of contradiction in the transcript information; the confidence score is adjusted accordingly, so that whether the transcript information is true and reliable is judged comprehensively. The judgment result is more accurate, which facilitates case handling by the relevant personnel.
On the basis of Fig. 2, for the content in S104, the embodiment of the present application provides a possible implementation, in which the confidence score is obtained by the following expression:
[expression not reproduced in this text]
where S_i characterizes the confidence score of the i-th piece of transcript information; R_F characterizes the number of entity relationships that are contradictory; R_all characterizes the total number of entity relationships; A_F characterizes the number of pieces of event information that are contradictory; A_all characterizes the total number of pieces of event information; ω_1 characterizes the relationship consistency score weight coefficient; and ω_2 characterizes the behavior consistency score weight coefficient.
It should be noted that the contradictory entity relationships include both entity relationships in which the transcript information contradicts the historical database and entity relationships in which the transcript information contradicts itself. Likewise, the contradictory event information includes both self-contradictory event information in the transcript information and event information in which the transcript information contradicts the historical database.
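The expression for the confidence score does not survive in this text, so its exact form is unknown. One form that is consistent with the symbol definitions, offered purely as an assumption, is S_i = ω_1 (1 − R_F / R_all) + ω_2 (1 − A_F / A_all): the score falls as the share of contradictory relationships and events rises. The sketch below implements that assumed form together with the threshold comparison of S105; the default weights and threshold are hypothetical.

```python
def confidence_score(r_conflict, r_all, a_conflict, a_all, w1=0.5, w2=0.5):
    """Assumed form: weighted share of non-contradictory relations and events."""
    # With nothing to check, a term is treated as fully consistent (assumption).
    rel_term = 1.0 - r_conflict / r_all if r_all else 1.0
    act_term = 1.0 - a_conflict / a_all if a_all else 1.0
    return w1 * rel_term + w2 * act_term


def is_in_doubt(score, threshold=0.7):
    """S105: the transcript is in doubt when the score is below the threshold."""
    return score < threshold


# 1 of 4 relations and 1 of 2 events are contradictory.
score = confidence_score(r_conflict=1, r_all=4, a_conflict=1, a_all=2)
```

With equal weights the example evaluates to 0.5 × 0.75 + 0.5 × 0.5 = 0.625, which falls below the hypothetical threshold of 0.7.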
On the basis of fig. 2, for the content in S102, a possible implementation manner is further provided in the embodiment of the present application, please refer to fig. 3, S102 includes:
S102-1, constructing a knowledge graph by taking the entities, the entity relationships and the event information as different nodes.
Specifically, the nodes corresponding to different entities are different, the nodes corresponding to different entity relationships are different, and the nodes corresponding to different event information are different.
S102-2, eliminating repeated nodes in the knowledge graph.
The repeated nodes are any two nodes that correspond to the same entity, the same entity relationship, or the same event information. Specifically, because people remember and describe things differently, items such as the place of occurrence or the tool used may be expressed differently, so the same entity, entity relationship or event information may appear under different expressions. Different expressions yield different nodes in the knowledge graph, so the repeated nodes need to be eliminated, i.e., entity disambiguation is required.
On the basis of fig. 3, for the content in S102-2, the embodiment of the present application further provides a possible implementation manner, please refer to fig. 4, S102-2 includes:
S102-2-1, calculating the similarity of any two nodes.
Optionally, the similarity is calculated between any two nodes of the same type. The types include, for example, nodes corresponding to entities, nodes corresponding to entity relationships, and nodes corresponding to event information.
S102-2-2, judging whether the similarity is greater than a preset threshold. If yes, executing S102-2-4; if not, executing S102-2-3.
When the similarity between two nodes is greater than the preset threshold, the two nodes may in essence be the same. For example, "Deng Chao" and "Brother Chao" may refer to the same person although the expressions differ. In this case, S102-2-4 is performed; otherwise, S102-2-3 is performed.
S102-2-3, no elimination processing is performed.
S102-2-4, determining that the two nodes are repeated nodes and eliminating one of them.
Optionally, either of the repeated nodes may be retained.
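Steps S102-2-1 to S102-2-4 can be sketched as a single pass that keeps a node only if no already-kept node of the same type is too similar to it. The similarity function is passed in as a parameter; the token-overlap measure used below is a stand-in chosen only to make the example runnable, and the names and threshold are illustrative.

```python
def eliminate_duplicates(nodes, similarity, threshold):
    """Keep one node from each group of same-type nodes judged to be duplicates.

    nodes: list of (node_type, name); similarity: callable on two nodes.
    """
    kept = []
    for node in nodes:
        is_duplicate = any(
            node[0] == other[0] and similarity(node, other) > threshold
            for other in kept
        )
        if not is_duplicate:  # S102-2-3: no elimination
            kept.append(node)
        # else S102-2-4: the later node is dropped, the earlier one retained
    return kept


def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of the words in the two names."""
    ta, tb = set(a[1].split()), set(b[1].split())
    return len(ta & tb) / len(ta | tb)


nodes = [("entity", "Deng Chao"), ("entity", "Brother Chao"), ("entity", "Xiao Li")]
deduplicated = eliminate_duplicates(nodes, token_overlap, threshold=0.3)
```

Retaining the first node seen is one arbitrary choice among the repeated nodes; any other choice fits the description equally well.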
The embodiment of the present application provides a possible implementation in which the similarity of any two nodes is calculated by the following formula:
[formula not reproduced in this text]
where sim(A, B) characterizes the similarity of node A and node B; A_s characterizes the similarity feature of node A; A_0 characterizes the quantization feature of node A; m characterizes the number of first-order neighbor nodes of node A in the knowledge graph; P_i characterizes the quantization feature of the i-th neighbor node of node A, with i ≤ m; B_s characterizes the similarity feature of node B; B_0 characterizes the quantization feature of node B; n characterizes the number of first-order neighbor nodes of node B in the knowledge graph; and P_k characterizes the quantization feature of the k-th neighbor node of node B, with k ≤ n.
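The similarity formula itself does not survive in this text. The symbol definitions suggest that each node's similarity feature combines its own quantization feature with those of its first-order neighbors, and that the two similarity features are then compared. The sketch below assumes one such reading, namely own feature plus the mean of the neighbor features, compared by cosine similarity; both choices are assumptions, as are the function names.

```python
import math


def similarity_feature(own, neighbours):
    """Assumed aggregation: own feature plus the mean of neighbour features."""
    if not neighbours:
        return list(own)
    m = len(neighbours)
    return [x + sum(p[d] for p in neighbours) / m for d, x in enumerate(own)]


def node_similarity(a0, a_neighbours, b0, b_neighbours):
    """Assumed comparison: cosine similarity of the two similarity features."""
    a_s = similarity_feature(a0, a_neighbours)
    b_s = similarity_feature(b0, b_neighbours)
    dot = sum(x * y for x, y in zip(a_s, b_s))
    norm = math.sqrt(sum(x * x for x in a_s)) * math.sqrt(sum(y * y for y in b_s))
    return dot / norm if norm else 0.0


# Two nodes with different own features but the same neighbourhood.
sim_ab = node_similarity([1.0, 0.0], [[0.0, 2.0]], [0.0, 1.0], [[0.0, 2.0]])
```

Including neighbor features lets two differently worded nodes score as similar when they are linked to the same surrounding entities, which matches the motivation given for eliminating repeated nodes.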
On the basis of fig. 2, for the content in S101, a possible implementation manner is further provided in the embodiment of the present application, please refer to fig. 5, S101 includes:
S101-1, performing quantization coding on the characters in the transcript information.
Specifically, the quantization code may be a code consisting of 0s and 1s, i.e., a one-hot encoding of the Chinese characters.
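A one-hot code assigns each distinct character its own position in a vector that is 1 at that position and 0 elsewhere. The sketch below builds the vocabulary from the input text itself, which is an assumption made to keep the example self-contained; a real system would more likely use a fixed character vocabulary. The sample sentence 张小明吃早饭 ("Zhang Xiaoming eats breakfast") is an assumed rendering of the example used in the description.

```python
def one_hot_encode(text):
    """Return (vocabulary, one vector of 0s and 1s per character of text)."""
    vocabulary = sorted(set(text))
    index = {ch: i for i, ch in enumerate(vocabulary)}
    vectors = []
    for ch in text:
        vec = [0] * len(vocabulary)
        vec[index[ch]] = 1  # exactly one position is set per character
        vectors.append(vec)
    return vocabulary, vectors


vocab, codes = one_hot_encode("张小明吃早饭")
```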
S101-2, taking the quantized codes as the input of the entity extraction model to obtain the labeling sequence scores of the corresponding characters.
Specifically, the entity extraction model may be a BiLSTM model. The labeling sequence score of a character scores each position the character may occupy within an entity. For example, for the character "Zhang" in "Zhang Xiaoming", the possible labeling sequence scores are:
Person, first position: 7 points; person, second position: 0.5 points; person, third position: 0.5 points …
Time, first position: 0 points; time, second position: 0 points; time, third position: 0 points …
Place, first position: 2 points; place, second position: 0.5 points; place, third position: 0.5 points …
These labeling sequence scores are given merely for ease of understanding; the expression and form of the labeling sequence score are not limited here.
S101-3, outputting the entity type of each character according to the labeling sequence score and the type transition probability between the character and the characters before and after it.
For example, in "Zhang Xiaoming eats breakfast", "eat" is obviously a verb and is not of the same type as the characters before and after it. When the type does not change between a character and its neighboring characters, the character is usually combined with them. From the type transition probability and the labeling sequence score, it can be obtained that "Zhang" is the first position of a person's name, "Xiao" is the second position, and "Ming" is the third position.
Optionally, the entity type of each Chinese character can be output by the BiLSTM model followed by a softmax layer.
S101-4, combining the characters associated with the entity types together to obtain the entity.
For example, the three characters "Zhang", "Xiao" and "Ming" mentioned above are combined to obtain "Zhang Xiaoming".
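Once each character carries an entity type and a position, combining characters into entities is a simple scan. The sketch below assumes that each label is either `None` (not part of an entity) or a pair of entity type and position, with position 1 starting a new entity; this label layout and the function name are illustrative and are not specified in the application.

```python
def merge_characters(chars, labels):
    """Join consecutive characters of the same entity type into entities.

    labels[i] is None or (entity_type, position); position 1 opens an entity.
    """
    entities, current, current_type = [], "", None
    for ch, label in zip(chars, labels):
        continues = (
            label is not None and label[1] > 1 and label[0] == current_type
        )
        if continues:
            current += ch
            continue
        if current:  # close the entity collected so far
            entities.append((current_type, current))
        current, current_type = (ch, label[0]) if label is not None else ("", None)
    if current:
        entities.append((current_type, current))
    return entities


labels = [("person", 1), ("person", 2), ("person", 3), None, None, None]
merged = merge_characters("张小明吃早饭", labels)
```

Here the three person-labelled characters are joined into one person entity and the remaining characters are skipped.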
S101-5, acquiring entity relations based on the entities.
For example, when the obtained entities include Xiaohong and Xiao Wang, the entity relationship between Xiaohong and Xiao Wang is obtained from the transcript information by means of semantic analysis.
S101-6, acquiring event information based on the syntactic relation.
The syntactic relationship is, for example, a subject-predicate relationship, a verb-object relationship, or the like. Because an event usually contains a predicate or a verb, event information can be obtained by using predicates or verbs as trigger words in combination with the syntactic relations.
On the basis of Fig. 5, the embodiment of the present application further provides a possible implementation of S106-6. Referring to Fig. 6, S106-6 includes:
S106-6-1, using the transcript information as the input of the word segmentation model to obtain a word segmentation result.
The word segmentation result comprises the words in the transcript information, such as verbs, nouns, adjectives and the like.
S106-6-2, constructing a word segmentation result into a syntactic network diagram based on the syntactic relation.
The syntactic network diagram comprises words in the word segmentation result, syntactic relations among the words and an adjacency matrix, and the adjacency matrix characterizes whether syntactic relations exist between any two words.
Possibly, the syntactic network diagram is G = (V, E, A). V denotes the set of nodes in the syntactic network diagram G (each vocabulary is a node). E represents the node-to-node relationships in G. A represents the adjacency matrix of G. The entries of A take values in {0, 1}: 1 indicates that there is a syntactic relationship between two nodes, and 0 indicates that there is none.
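As an illustration of the structure G = (V, E, A), the following sketch builds the adjacency matrix from a list of syntactic relations. The function name `build_syntax_graph`, the relation labels, and the example sentence are assumptions; the sketch also assumes each word appears only once, and treats syntactic relations as undirected.

```python
import numpy as np

def build_syntax_graph(words, relations):
    """Build (V, E, A) for a segmented sentence.

    words:     list of words (nodes); assumed unique for simplicity
    relations: list of (head, dependent, label) syntactic relations
    """
    index = {w: i for i, w in enumerate(words)}           # V: node -> row index
    A = np.zeros((len(words), len(words)), dtype=int)     # adjacency matrix
    for head, dependent, _label in relations:
        i, j = index[head], index[dependent]
        A[i, j] = A[j, i] = 1                             # 1: a syntactic relation exists
    return index, relations, A

words = ["Zhang Xiaoming", "eats", "breakfast"]
relations = [
    ("eats", "Zhang Xiaoming", "subject-predicate"),
    ("eats", "breakfast", "verb-object"),
]
V, E, A = build_syntax_graph(words, relations)
print(A)
```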
S106-6-3, performing an embedded representation of the syntactic network diagram based on the quantized representation of each vocabulary and the adjacency matrix to obtain a vocabulary feature table.
Specifically, the vocabulary feature table can be obtained according to the following expression.
Wherein $A$ is the adjacency matrix; $\tilde{A}$ is the corrected matrix; $I$ is the identity matrix; $\tilde{D}$ is the degree matrix of $\tilde{A}$; $X$ is the quantization matrix; $W_1$, $W_2$, $b_1$, $b_2$ are trainable parameters; and $Z$ is the vocabulary feature table. The quantization matrix is composed of the quantized codes of the respective words.
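The expression itself is not reproduced in this text, so the following is only a sketch of one common construction that fits the listed symbols: a two-layer graph convolution in which self-loops are added to the adjacency matrix (A + I), the result is symmetrically normalized by its degree matrix, and two trainable layers (W1, b1 and W2, b2) are applied to the quantization matrix X. The ReLU activation, the layer sizes, the random parameter values, and the function names are all assumptions.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, with D the degree matrix of A + I."""
    A_tilde = A + np.eye(A.shape[0])           # add self-loops
    degrees = A_tilde.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(degrees))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_embed(A, X, W1, b1, W2, b2):
    """Two-layer graph convolution producing one feature row per word."""
    A_hat = normalize_adjacency(A)
    H = np.maximum(A_hat @ X @ W1 + b1, 0.0)   # first layer with ReLU (assumed)
    return A_hat @ H @ W2 + b2                 # second layer: feature table

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)         # 3 words, path-shaped syntax graph
X = rng.normal(size=(3, 4))                    # quantized codes, 4 dims per word
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)  # untrained, for shape illustration
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

Z = gcn_embed(A, X, W1, b1, W2, b2)
print(Z.shape)
```

Each row of `Z` mixes a word's own quantized code with those of its syntactic neighbors, which is what makes distances between rows meaningful for the later association step.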
S106-6-4, calculating Euclidean distances between each verb and other vocabularies based on the vocabulary characteristic table.
Wherein, verbs and other words are words contained in the syntactic network diagram.
S106-6-5, judging whether the Euclidean distance between other vocabularies and the verbs is smaller than a preset distance threshold. If yes, executing S106-6-7; if not, S106-6-6 is performed.
Specifically, if the Euclidean distance between another vocabulary and the verb is smaller than the preset distance threshold, the vocabulary is considered to be associated with the verb, and S106-6-7 is executed. Otherwise, S106-6-6 is executed.
S106-6-6, determining that the other vocabulary is not associated with the verb.
S106-6-7, taking other vocabularies as components of the event corresponding to the verb to obtain candidate events.
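Steps S106-6-4 to S106-6-7 can be sketched as follows. The feature values, the part-of-speech labels, the threshold of 0.5, and the function name `candidate_events` are illustrative assumptions; in practice the rows of `Z` would come from the vocabulary feature table.

```python
import numpy as np

def candidate_events(words, pos_tags, Z, threshold):
    """For each verb, collect the words whose features lie within `threshold`."""
    events = {}
    for i, (word, pos) in enumerate(zip(words, pos_tags)):
        if pos != "verb":
            continue
        distances = np.linalg.norm(Z - Z[i], axis=1)   # Euclidean distance to the verb
        events[word] = [
            words[j] for j in range(len(words))
            if j != i and distances[j] < threshold      # associated with the verb
        ]
    return events

words = ["Zhang Xiaoming", "eats", "breakfast", "yesterday"]
pos_tags = ["noun", "verb", "noun", "noun"]
Z = np.array([
    [0.1, 0.2],
    [0.0, 0.0],
    [0.3, -0.1],
    [2.0, 2.0],
])
events = candidate_events(words, pos_tags, Z, threshold=0.5)
print(events)
```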
S106-6-8, taking the candidate event as the input of the classifier model to output the final event information.
Specifically, the classifier model, which outputs the event type of each candidate event as the final event information, may be trained according to the following expression.
Wherein $C_i$ represents a candidate event; $N$ represents the number of sentences in the sample; $y_i$ represents the true event category; $\hat{y}_i$ represents the predicted event category; and $n_p$ represents the number of event categories.
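The training expression is not reproduced in this text. As a hedged illustration only, the sketch below assumes a standard multi-class cross-entropy objective averaged over N samples and n_p event categories, which matches the symbols listed above; the logits, the one-hot labels, and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy; y_true is (N, n_p) one-hot, y_pred is (N, n_p) probabilities."""
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))

# Two candidate events scored against three hypothetical event categories.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 1.5, 0.3]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0]])

y_pred = softmax(logits)
loss = cross_entropy(y_true, y_pred)
predicted = y_pred.argmax(axis=1)   # predicted event category per candidate event
print(predicted, loss)
```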
On the basis of Fig. 2, the embodiment of the present application further provides a possible implementation of S104. Referring to Fig. 7, S104 includes:
S104-1, when the result of the behavior consistency detection is that the event information corresponding to the entity is contradictory, the confidence score is reduced.
S104-2, when the result of the relationship consistency detection is that the expression of the entity relationship corresponding to the entity is inconsistent between the historical database and the knowledge graph, the confidence score is reduced.
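The scoring equation is not reproduced in this text. The sketch below therefore assumes one simple form that is consistent with the symbols defined in claim 1: the score starts at 1 and is reduced by the weighted fraction of contradictory entity relationships (R_F out of R_all) and of contradictory event information (A_F out of A_all). The weights, the threshold, and the function names are hypothetical.

```python
def confidence_score(r_f, r_all, a_f, a_all, w1=0.5, w2=0.5):
    """Assumed form: S = 1 - w1 * R_F / R_all - w2 * A_F / A_all."""
    relation_ratio = r_f / r_all if r_all else 0.0   # contradictory entity relationships
    behavior_ratio = a_f / a_all if a_all else 0.0   # contradictory event information
    return 1.0 - w1 * relation_ratio - w2 * behavior_ratio

def is_in_doubt(score, threshold=0.8):
    """The transcript is flagged when the score falls below the preset threshold."""
    return score < threshold

# 2 of 10 entity relationships and 1 of 4 events are contradictory.
score = confidence_score(r_f=2, r_all=10, a_f=1, a_all=4)
print(score, is_in_doubt(score))
```

Under this assumed form, every additional contradiction lowers the score, which matches the behavior described in S104-1 and S104-2.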
Referring to Fig. 8, Fig. 8 is a schematic diagram of a device for judging whether transcript information is true or false according to an embodiment of the present application. Optionally, the device is applied to the electronic device described above.
The device for judging whether transcript information is true or false comprises a processing unit 201 and a judging unit 202.
The processing unit 201 is configured to extract entities, entity relationships, and event information from the transcript information, where the entities include one or more of a person, a time, a place, an article, and a case type, the entity relationship represents relationship information between the entities, and the event information represents an event corresponding to an entity. The processing unit 201 is further configured to construct a knowledge graph according to the entities, the entity relationships, and the event information, where the knowledge graph includes the correspondence between entities and entity relationships and the correspondence between entities and event information; to perform relationship consistency detection and behavior consistency detection on the knowledge graph based on the historical database; and to adjust the confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection. Specifically, the processing unit 201 may execute S101 to S104 described above.
The judging unit 202 is configured to determine that the transcript information is in doubt when the confidence score is less than a preset score threshold. Specifically, the judgment unit 202 may execute S105 to S107 described above.
The historical database contains transcript information that has been verified to be true. The behavior consistency detection is used for detecting whether the event information corresponding to an entity is contradictory, and the relationship consistency detection is used for detecting whether the expression of the entity relationship corresponding to an entity is consistent between the historical database and the knowledge graph.
It should be noted that the device for judging whether transcript information is true or false provided in this embodiment can execute the method flow shown in the above method embodiments to achieve the corresponding technical effects. For brevity, for anything not mentioned in this embodiment, refer to the corresponding parts of the above embodiments.
The embodiment of the invention also provides a storage medium which stores computer instructions and programs; when read and run, the computer instructions and programs execute the method for judging whether transcript information is true or false of the above embodiments. The storage medium may include memory, flash memory, registers, combinations thereof, or the like.
An electronic device, which may be a computing device or another intelligent terminal device, is provided below. As shown in Fig. 1, the electronic device can implement the above method for judging whether transcript information is true or false. Specifically, the electronic device includes a processor 10, a memory 11, and a bus 12. The processor 10 may be a CPU. The memory 11 is used to store one or more programs which, when executed by the processor 10, perform the method for judging whether transcript information is true or false of the above embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (9)

1. A method for judging whether transcript information is true or false, characterized by comprising the following steps:
extracting an entity, an entity relationship and event information in the transcript information, wherein the entity comprises one or more of a person, a time, a place, an article and a case type, the entity relationship represents relationship information among the entities, and the event information represents an event corresponding to the entity;
constructing a knowledge graph according to the entity, the entity relationship and the event information, wherein the knowledge graph comprises the corresponding relationship between the entity and the entity relationship and the corresponding relationship between the entity and the event information respectively;
performing relationship consistency detection and behavior consistency detection on the knowledge graph based on a historical database;
adjusting a confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection;
when the confidence score is smaller than a preset score threshold, determining that the transcript information is in doubt;
the historical database comprises transcript information which has been verified to be true, the behavior consistency detection is used for detecting whether the event information corresponding to the entity has contradiction, and the relationship consistency detection is used for detecting whether the expression of the entity relationship corresponding to the entity in the historical database and the knowledge graph is consistent;
the step of adjusting the confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection comprises the following steps:
when the result of the relation consistency detection is that the event information corresponding to the entity is contradictory, the confidence score is reduced;
when the behavior consistency detection result is that the expression of the entity relation corresponding to the entity in the historical database and the knowledge graph is inconsistent, the confidence score is reduced;
the confidence score is obtained by the following equation:
wherein $S_i$ characterizes the confidence score of the $i$-th transcript information; $R_F$ characterizes the number of contradictory entity relationships; $R_{all}$ characterizes the total number of entity relationships; $A_F$ characterizes the number of contradictory pieces of event information; $A_{all}$ characterizes the total number of pieces of event information; $\omega_1$ represents the relationship consistency score weight coefficient; $\omega_2$ represents the behavior consistency score weight coefficient; the contradictory entity relationships include entity relationships that contradict between the transcript information and the historical database and entity relationships that are self-contradictory within the transcript information; and the contradictory event information includes event information that is self-contradictory within the transcript information and event information that contradicts between the transcript information and the historical database.
2. The method for determining whether the transcript information is true or false according to claim 1, wherein said step of constructing a knowledge graph according to said entity, said entity relationship, and said event information comprises:
constructing the knowledge graph by taking the entity, the entity relationship and the event information as different nodes;
and eliminating repeated nodes in the knowledge graph, wherein the repeated nodes are any two nodes with the same corresponding entity, or any two nodes with the same corresponding entity relationship, or any two nodes with the same corresponding event information.
3. The method for determining whether the transcript information is true or false according to claim 2, wherein the step of eliminating the repeated nodes in the knowledge graph comprises:
calculating the similarity of any two nodes;
judging whether the similarity is larger than a preset threshold value or not;
if yes, the two nodes are repeated nodes, and one node is eliminated.
4. The method for determining whether the transcript information is true or false according to claim 3, wherein the similarity between any two nodes is calculated by the following formula:
wherein $\mathrm{sim}(A, B)$ characterizes the similarity of node A and node B; $A_s$ characterizes the similarity feature of node A; $A_0$ characterizes the quantized feature of node A; $m$ characterizes the number of first-order neighbor nodes of node A in the knowledge graph; $P_i$ characterizes the quantized feature of the $i$-th neighbor node of node A, with $i \le m$; $B_s$ characterizes the similarity feature of node B; $B_0$ characterizes the quantized feature of node B; $n$ characterizes the number of first-order neighbor nodes of node B in the knowledge graph; and $P_k$ characterizes the quantized feature of the $k$-th neighbor node of node B, with $k \le n$.
5. The method for determining whether the transcript information is true or false according to claim 1, wherein the step of extracting the entity, the entity relationship and the event information in the transcript information comprises:
carrying out quantization coding on characters in the stroke information;
taking the quantized codes as the input of an entity extraction model to obtain the labeling sequence scores of the corresponding characters;
outputting the entity type of each character according to the labeling sequence score and the type transition probability between the character and its preceding and following characters;
combining the characters associated with the entity type to obtain the entity;
acquiring the entity relationship based on the entity;
and acquiring the event information based on the syntactic relation.
6. The method for determining whether the transcript information is true or false according to claim 5, wherein said step of obtaining said event information based on a syntactic relationship comprises:
taking the transcript information as input of a word segmentation model to obtain a word segmentation result, wherein the word segmentation result comprises the words in the transcript information;
constructing the word segmentation result into a syntactic network diagram based on the syntactic relation, wherein the syntactic network diagram comprises words in the word segmentation result, syntactic relations among the words and an adjacency matrix, and the adjacency matrix represents whether the syntactic relations exist between any two words;
based on the quantized representation of each vocabulary and the adjacency matrix, carrying out embedded representation on the syntactic network diagram so as to obtain a vocabulary characteristic table;
calculating Euclidean distances between each verb and other vocabularies based on the vocabulary feature table, wherein the verbs and the other vocabularies are vocabularies contained in the syntactic network diagram;
if the Euclidean distance between other vocabularies and the verbs is smaller than a preset distance threshold, the other vocabularies are used as component parts of the corresponding events of the verbs so as to obtain candidate events;
and taking the candidate event as input of a classifier model to output final event information.
7. A device for judging whether transcript information is true or false, the device comprising:
the processing unit is used for extracting entities, entity relationships and event information in the transcript information, wherein the entities comprise one or more of persons, times, places, articles and case types, the entity relationships represent the relationship information among the entities, and the event information represents events corresponding to the entities; the processing unit is further used for constructing a knowledge graph according to the entity, the entity relationship and the event information, wherein the knowledge graph comprises the corresponding relationship between the entity and the entity relationship and the corresponding relationship between the entity and the event information respectively; the processing unit is further used for carrying out relationship consistency detection and behavior consistency detection on the knowledge graph based on a historical database; and the processing unit is further used for adjusting the confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection;
the judging unit is used for determining that the transcript information is in doubt when the confidence score is smaller than a preset score threshold value;
the historical database comprises transcript information which has been verified to be true, the behavior consistency detection is used for detecting whether the event information corresponding to the entity has contradiction, and the relationship consistency detection is used for detecting whether the expression of the entity relationship corresponding to the entity in the historical database and the knowledge graph is consistent;
the adjusting the confidence score corresponding to the transcript information based on the results of the relationship consistency detection and the behavior consistency detection includes:
when the result of the relation consistency detection is that the event information corresponding to the entity is contradictory, the confidence score is reduced;
when the behavior consistency detection result is that the expression of the entity relation corresponding to the entity in the historical database and the knowledge graph is inconsistent, the confidence score is reduced;
the confidence score is obtained by the following equation:
wherein $S_i$ characterizes the confidence score of the $i$-th transcript information; $R_F$ characterizes the number of contradictory entity relationships; $R_{all}$ characterizes the total number of entity relationships; $A_F$ characterizes the number of contradictory pieces of event information; $A_{all}$ characterizes the total number of pieces of event information; $\omega_1$ represents the relationship consistency score weight coefficient; $\omega_2$ represents the behavior consistency score weight coefficient; the contradictory entity relationships include entity relationships that contradict between the transcript information and the historical database and entity relationships that are self-contradictory within the transcript information; and the contradictory event information includes event information that is self-contradictory within the transcript information and event information that contradicts between the transcript information and the historical database.
8. A storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-6.
9. An electronic device, comprising: a processor and a memory for storing one or more programs; the method of any of claims 1-6 is implemented when the one or more programs are executed by the processor.
CN202010426275.7A 2020-05-19 2020-05-19 Method and device for judging true and false of stroke information, storage medium and electronic equipment Active CN111881288B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010426275.7A CN111881288B (en) 2020-05-19 2020-05-19 Method and device for judging true and false of stroke information, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111881288A CN111881288A (en) 2020-11-03
CN111881288B true CN111881288B (en) 2024-04-09

Family

ID=73154345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010426275.7A Active CN111881288B (en) 2020-05-19 2020-05-19 Method and device for judging true and false of stroke information, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111881288B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637829A (en) * 2022-02-21 2022-06-17 阿里巴巴(中国)有限公司 Recording text processing method, recording text processing device and computer readable storage medium
CN117591660B (en) * 2024-01-18 2024-04-16 杭州威灿科技有限公司 Material generation method, equipment and medium based on digital person

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520740A (en) * 2009-04-03 2009-09-02 北京航空航天大学 Method for realizing event consistency based on time mapping
WO2012130489A1 (en) * 2011-04-01 2012-10-04 Siemens Aktiengesellschaft Method, system, and computer program product for maintaining data consistency between two databases
CN108241727A (en) * 2017-09-01 2018-07-03 新华智云科技有限公司 News reliability evaluation method and equipment
CN109388648A (en) * 2018-08-15 2019-02-26 王小易 A method of extracting personal information and party in electronic record
CN109766445A (en) * 2018-12-13 2019-05-17 平安科技(深圳)有限公司 A kind of knowledge mapping construction method and data processing equipment
CN109785968A (en) * 2018-12-27 2019-05-21 东软集团股份有限公司 A kind of event prediction method, apparatus, equipment and program product
CN109902151A (en) * 2019-03-08 2019-06-18 南阳市烟草公司城区分公司 Recording method, device and the electronic equipment of interrogation record
CN110489569A (en) * 2019-08-26 2019-11-22 上海秒针网络科技有限公司 A kind of event-handling method and device of knowledge based map
CN110634088A (en) * 2018-06-25 2019-12-31 阿里巴巴集团控股有限公司 Case refereeing method, device and system
WO2020001373A1 (en) * 2018-06-26 2020-01-02 杭州海康威视数字技术股份有限公司 Method and apparatus for ontology construction
CN110825880A (en) * 2019-09-18 2020-02-21 平安科技(深圳)有限公司 Case winning rate determining method, device, equipment and computer readable storage medium
CN110825879A (en) * 2019-09-18 2020-02-21 平安科技(深圳)有限公司 Case decision result determination method, device and equipment and computer readable storage medium
CN110895568A (en) * 2018-09-13 2020-03-20 阿里巴巴集团控股有限公司 Method and system for processing court trial records
CN111159428A (en) * 2019-12-30 2020-05-15 智慧神州(北京)科技有限公司 Method and device for automatically extracting event relation of knowledge graph in economic field

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222052B2 (en) * 2011-02-22 2022-01-11 Refinitiv Us Organization Llc Machine learning-based relationship association and related discovery and
US20200117732A1 (en) * 2018-10-11 2020-04-16 International Business Machines Corporation Analysis and determination of relative consistency of identified relationships

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
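Reflections on the practice of corroboration-based admission of verbal transcripts in criminal justice; Guo Wenli; Evidence Science; Vol. 23 (No. 06); 686-693 *
Discussion on the authentication rules for the probative force of inspection records; Li Ming et al.; Evidence Science; Vol. 26 (No. 02); 151-160 *
Design and implementation of a key information extraction system for court judgments; Liu Wen et al.; Journal of Hubei University of Technology; Vol. 33 (No. 01); 63-67 *
On the use of administrative records in criminal proceedings; Dong Kun et al.; Journal of Soochow University (Philosophy and Social Science Edition); Vol. 36 (No. 04); 107-114 *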

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant