CN116521892A

CN116521892A - Knowledge graph application method, knowledge graph application device, electronic equipment, medium and program product

Info

Publication number: CN116521892A
Application number: CN202310461551.7A
Authority: CN
Inventors: 惠子津; 杨彬; 朱建强; 袁宝
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-04-26
Filing date: 2023-04-26
Publication date: 2023-08-01

Abstract

The present disclosure provides a method, apparatus, electronic device, medium and computer program product for applying a knowledge graph based on fraud information. The method and apparatus can be used in the technical field of artificial intelligence. The method comprises: constructing a knowledge graph in real time according to fraud information of an in-bank system obtained in real time; determining a query result in response to a query request; determining a relevant answer from the knowledge graph by using a pre-constructed question-answer model in response to the query request; and matching a fraud prevention prompt according to the query result or the relevant answer. Constructing the knowledge graph in real time according to the fraud information of the in-bank system obtained in real time includes: performing entity and relation extraction on the fraud information of the in-bank system obtained in real time by using a pre-constructed entity relation extraction model to obtain a triplet pre-selection set; aligning the entities in the triplet pre-selection set to obtain a triplet set; and constructing the knowledge graph according to the triplet set.

Description

Knowledge graph application method, knowledge graph application device, electronic equipment, medium and program product

Technical Field

The present disclosure relates to the field of artificial intelligence technology, and more particularly, to a method, apparatus, electronic device, medium and computer program product for applying knowledge-graph based on fraud information.

Background

With the popularization and rapid development of networks, banks, as the front line of transactions, hold a large amount of fraud information and specific transaction information. Analysis of this information can reveal traces of fraud methods, account information and the like as they circulate. If scattered fraud information can be linked together, fraud can be effectively prevented and the property safety of users can be guaranteed.

Disclosure of Invention

In view of this, the present disclosure provides an application method, an apparatus, an electronic device, a computer readable storage medium and a computer program product for obtaining comprehensive fraud information, and obtaining corresponding fraud prevention prompts, so as to effectively prevent fraud from occurring, and ensure property security of users.

One aspect of the present disclosure provides a method for applying a knowledge graph based on fraud information, comprising: according to the fraud information of the system in the bank obtained in real time, a knowledge graph is constructed in real time; determining a query result in response to a query request, wherein the query request comprises keywords related to nodes and/or edges, and the query result comprises the nodes and/or edges corresponding to the query request and associated information of the nodes and/or edges in the knowledge graph; determining relevant answers from the knowledge graph by utilizing a pre-constructed question-answer model in response to a query request; and matching the fraud prevention prompt according to the query result or the related answer.

Wherein, the real-time knowledge graph construction according to the fraud information of the system in the bank obtained in real time comprises the following steps: performing entity and relation extraction on fraud information of a system in a bank obtained in real time by using a pre-constructed entity relation extraction model to obtain a triplet pre-selection set; aligning the entities in the triplet preselection set to obtain a triplet set; and constructing a knowledge graph according to the triplet set.

According to the application method of the knowledge graph based on the fraud information, the knowledge graph can be constructed in real time according to the fraud information of the system in the bank obtained in real time. In response to the query request, a search may be conducted in the knowledge-graph, and thus the query result may be determined in the knowledge-graph. Using a pre-built question-answer model, in response to a query request, a relevant answer may be determined from the knowledge graph, which may be a node and/or edge associated with a question in the query request, and an edge and/or node associated with the node and/or edge. The anti-fraud cues may be matched in a pre-built anti-fraud cue library according to the query results or the relevant answers. The application method of the invention can link scattered fraud information, can acquire comprehensive fraud information in response to inquiry or inquiry, and can also acquire corresponding fraud prevention prompts, thereby effectively preventing fraud generation and guaranteeing property safety of users.

In some embodiments, aligning the entities in the triplet pre-selection set to obtain a triplet set includes: calculating the similarity between every two entities in the triplet pre-selection set; when the similarity between two entities meets a set threshold, judging whether the relationships in the triples containing the two entities are consistent; when the relationships are consistent, deleting one of the triples containing the two entities; and when the relationships are inconsistent, replacing one of the two entities with the other.

In some embodiments, calculating the similarity between every two entities in the triplet pre-selection set includes: calculating a first similarity between every two entities in the triplet pre-selection set by using a Dice distance method; calculating a second similarity between the two entities by using an edit distance method; and weighting and summing the first similarity and the second similarity to obtain the similarity between every two entities in the triplet pre-selection set.

In some embodiments, the pre-building entity relationship extraction model includes: operation S41, training extraction rules of three elements of a triplet in the entity relation extraction model according to labels of each word in training text data to obtain pre-extraction rules, wherein the triplet comprises a first entity, a relation between the first entity and a second entity and the second entity; operation S42, verifying the pre-extraction rule of the entity relation extraction model by using the verification text data; operation S43, if the verification is passed, the pre-extraction rule is used as the extraction rule of the entity relation extraction model to be applied; and operation S44, if the verification is not passed, repeating operation S41 and operation S42 until the verification is passed.

In some embodiments, determining a relevant answer from the knowledge graph in response to a query request by using a pre-constructed question-answer model includes: splicing the question vector of the query request with m pre-selected relevant answer vectors in the knowledge graph by using a pre-constructed vector splicing model to obtain m spliced vectors, wherein m is an integer greater than or equal to 1; predicting the probability value of each of the m spliced vectors by using a pre-constructed probability prediction model; and determining one of the m pre-selected relevant answers as the relevant answer according to the ranking of the m probability values.

In some embodiments, the pre-building a vector stitching model includes: operation S61, training the splicing parameters in the vector splicing model according to a training sample to obtain training splicing parameters, wherein the training sample comprises a question vector and a preselected related answer vector corresponding to the question vector; operation S62, verifying training splicing parameters of the vector splicing model by using a verification sample; operation S63, if the verification is passed, applying the training splicing parameter as a model parameter of the vector splicing model; and operation S64, if the verification is not passed, repeating operations S61 and S62 until the verification is passed.

In some embodiments, the pre-building the probabilistic predictive model includes: operation S71, training the probability prediction parameters in the probability prediction model according to the spliced vector training samples to obtain training probability prediction parameters; operation S72, verifying training probability prediction parameters of the probability prediction model by using a spliced vector verification sample; operation S73, if the verification is passed, applying the training probability prediction parameter as a model parameter of the probability prediction model; and operation S74, if the verification is not passed, repeating operations S71 and S72 until the verification is passed.

Another aspect of the present disclosure provides an application apparatus of a knowledge-graph based on fraud information, including: the first construction module is used for executing the real-time construction of the knowledge graph according to the fraud information of the system in the bank obtained in real time; the first determining module is used for executing a response to a query request and determining a query result, wherein the query request comprises keywords related to nodes and/or edges, and the query result comprises the nodes and/or edges corresponding to the query request and associated information of the nodes and/or edges in the knowledge graph; a second determining module for executing a determination of a relevant answer from the knowledge graph in response to a query request using a pre-built question-answer model; and a matching module for executing a matching of the fraud prevention hint according to the query result or the related answer.

Another aspect of the present disclosure provides an electronic device comprising one or more processors and one or more memories, wherein the memories are configured to store executable instructions that, when executed by the processors, implement the method as described above.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.

Another aspect of the present disclosure provides a computer program product comprising a computer program comprising computer executable instructions which, when executed, are for implementing a method as described above.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:

FIG. 1 schematically illustrates an exemplary system architecture to which methods, apparatuses may be applied according to embodiments of the present disclosure;

FIG. 2 schematically illustrates a flowchart of an application method of a knowledge-graph based on fraud information, according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flowchart of constructing a knowledge-graph in real time based on fraud information of an in-bank system obtained in real time, according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart for aligning entities in a pre-selected set of triples to obtain a set of triples, in accordance with an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow chart for calculating the similarity between every two entities in a triplet pre-selection set, in accordance with an embodiment of the present disclosure;

FIG. 6 schematically illustrates a flow diagram of a pre-built entity relationship extraction model according to an embodiment of the disclosure;

FIG. 7 schematically illustrates a flow chart for determining relevant answers from a knowledge graph in response to a query request using a pre-built question-answer model, in accordance with an embodiment of the present disclosure;

FIG. 8 schematically illustrates a flow diagram of pre-building a vector stitching model according to an embodiment of the present disclosure;

FIG. 9 schematically illustrates a flow diagram of pre-building a probabilistic predictive model in accordance with an embodiment of the disclosure;

FIG. 10 schematically illustrates a flow chart of a method of entity alignment based on Dice and edit distance, according to an embodiment of the disclosure;

FIG. 11 schematically illustrates a flow diagram of question-relation semantic matching according to an embodiment of the present disclosure;

FIG. 12 schematically illustrates an overall architecture diagram of a banking fraud field knowledge graph retrieval platform, according to an embodiment of the present disclosure;

FIG. 13 schematically illustrates a back-end web architecture diagram according to an embodiment of the present disclosure;

FIG. 14 schematically illustrates a flow diagram of the operation of the bottom layer when this operation is extended by double clicking in accordance with an embodiment of the present disclosure;

FIG. 15 schematically illustrates a block diagram of an application apparatus of a knowledge-graph based on fraud information, according to an embodiment of the present disclosure;

FIG. 16 schematically illustrates a block diagram of a first build module in accordance with an embodiment of the disclosure;

FIG. 17 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.

In the technical solution of the present disclosure, the acquisition, collection, storage, use, processing, transmission, provision, disclosure and application of data, including users' personal information, all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

Where a formulation similar to at least one of "A, B or C, etc." is used, in general such a formulation should be interpreted in accordance with the ordinary understanding of one skilled in the art (e.g. "a system with at least one of A, B or C" would include but not be limited to systems with A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, C together, etc.). The terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "a first" or "a second" may explicitly or implicitly include one or more of the described features.

Embodiments of the present disclosure provide a method, apparatus, electronic device, computer-readable storage medium, and computer program product for applying knowledge-graph based on fraud information. The application method of the knowledge graph based on the fraud information comprises the following steps: according to the fraud information of the system in the bank obtained in real time, a knowledge graph is constructed in real time; determining a query result in response to a query request, wherein the query request comprises keywords related to nodes and/or edges, and the query result comprises the nodes and/or edges corresponding to the query request in a knowledge graph and associated information of the nodes and/or edges; determining relevant answers from the knowledge graph by utilizing a pre-constructed question-answer model in response to the query request; and matching the fraud prevention prompts according to the query results or the related answers.

Constructing the knowledge graph in real time according to the fraud information of the in-bank system obtained in real time includes: performing entity and relation extraction on the fraud information of the in-bank system obtained in real time by using a pre-constructed entity relation extraction model to obtain a triplet pre-selection set; aligning the entities in the triplet pre-selection set to obtain a triplet set; and constructing the knowledge graph according to the triplet set.

It should be noted that, the application method, apparatus, electronic device, computer readable storage medium and computer program product of the knowledge graph based on fraud information of the present disclosure may be used in the field of artificial intelligence technology, and may also be used in any field other than the field of artificial intelligence technology, such as financial field, and the field of the present disclosure is not limited herein.

Fig. 1 schematically illustrates an exemplary system architecture 100 to which the application method, apparatus, electronic device, computer-readable storage medium and computer program product of the knowledge graph based on fraud information may be applied, according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.

As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that, the application method of the knowledge graph based on the fraud information provided by the embodiments of the present disclosure may be generally executed by the server 105. Accordingly, the application device of the knowledge-graph based on fraud information provided by the embodiments of the present disclosure may be generally disposed in the server 105. The application method of the knowledge graph based on fraud information provided by the embodiments of the present disclosure may also be performed by a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the application apparatus of the knowledge-graph based on fraud information provided by the embodiments of the present disclosure may also be provided in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The application method of the knowledge-graph based on the fraud information according to the embodiment of the present disclosure will be described in detail with reference to fig. 2 to 9 based on the scenario described in fig. 1.

Fig. 2 schematically illustrates a flowchart of an application method of a knowledge-graph based on fraud information according to an embodiment of the present disclosure.

As shown in FIG. 2, the method for applying the knowledge-graph based on the fraud information in this embodiment includes operations S210 to S240.

In operation S210, a knowledge graph is constructed in real time according to fraud information of the in-bank system obtained in real time.

In operation S220, in response to the query request, a query result is determined, wherein the query request includes keywords related to the node and/or the edge, and the query result includes the node and/or the edge corresponding to the query request and associated information of the node and/or the edge in the knowledge graph.

In operation S230, a related answer is determined from the knowledge graph in response to the query request using a previously constructed question-answer model.

In operation S240, the anti-fraud hint is matched according to the query result or the related answer.
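As a rough illustration of operations S220 and S240, the following minimal sketch performs a keyword query over a small set of triples and then matches a prompt from a prompt library. The triple contents, the `PROMPT_LIBRARY` dictionary, and the function names `query` and `match_prompts` are hypothetical and are not taken from the disclosure; the disclosure does not specify how the graph or the prompt library is stored.

```python
# Hypothetical triples (head entity, relation, tail entity); illustrative data only.
TRIPLES = [
    ("account_A", "transferred_to", "account_B"),
    ("account_B", "used_in", "phishing_scam"),
    ("account_C", "transferred_to", "account_B"),
]

# Hypothetical anti-fraud prompt library keyed by a node or edge label.
PROMPT_LIBRARY = {
    "phishing_scam": "Do not enter banking credentials on links received by message.",
}


def query(triples, keyword):
    """Return the triples whose node or edge label contains the keyword,
    plus the triples that share a node with those direct hits."""
    hits = [t for t in triples if any(keyword in part for part in t)]
    nodes = {t[0] for t in hits} | {t[2] for t in hits}
    associated = [
        t for t in triples
        if t not in hits and (t[0] in nodes or t[2] in nodes)
    ]
    return hits, associated


def match_prompts(result_triples, library):
    """Collect the prompts whose key appears as a node or edge in the result."""
    terms = {part for t in result_triples for part in t}
    return [library[key] for key in library if key in terms]


hits, associated = query(TRIPLES, "account_A")
prompts = match_prompts(hits + associated, PROMPT_LIBRARY)
```

In this sketch the "associated information" is taken to be the triples one hop away from the matched nodes, which is only one possible reading of the query result described in operation S220.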

As shown in fig. 3, operation S210 constructs a knowledge graph in real time according to fraud information of the in-bank system obtained in real time, including operations S211 to S213.

In operation S211, entity and relationship extraction is performed on fraud information of the in-bank system obtained in real time by using the pre-constructed entity relationship extraction model, so as to obtain a triplet pre-selection set.

In operation S212, entities in the pre-selected set of triples are aligned to obtain a set of triples.

In operation S213, a knowledge graph is constructed from the triplet set. Operations S211 to S213 make it possible to construct the knowledge graph in real time from the fraud information of the in-bank system obtained in real time. Compared with the triplet pre-selection set, on which no alignment has been performed, the aligned triplet set is cleaner and free of noise interference, so the knowledge graph constructed from it is more compact and can be used more efficiently in operations S220 and S230.
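Operation S213 can be pictured with a minimal sketch that turns a triplet set into an adjacency map. The function name `build_graph`, the in-memory dictionary representation, and the sample triples are assumptions made for illustration; the disclosure does not state which graph store is used.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an adjacency map: head entity -> list of (relation, tail entity)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        if (relation, tail) not in graph[head]:
            graph[head].append((relation, tail))
        # Entities that only ever appear as a tail are still nodes of the graph.
        graph.setdefault(tail, [])
    return dict(graph)


# Hypothetical triplet set; the third triple repeats the first one.
triple_set = [
    ("card_1", "belongs_to", "customer_X"),
    ("customer_X", "reported", "scam_call"),
    ("card_1", "belongs_to", "customer_X"),
]
graph = build_graph(triple_set)
```

Each head entity maps to its outgoing edges, so a query for a node can return both the node and its associated edges in a single lookup.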

Fig. 4 schematically illustrates a flow chart for aligning entities in a pre-selected set of triples to obtain a set of triples, in accordance with an embodiment of the present disclosure.

Operation S212 aligns the entities in the pre-selected set of triples to obtain a set of triples, including operations S2121-S2124.

In operation S2121, the similarity between every two entities in the triplet pre-selection set is calculated.

As an implementation, as shown in fig. 5, operation S2121 calculates the similarity between each two entities in the triplet pre-selection set, including operations S21211-S21213.

In operation S21211, a first similarity between every two entities in the triplet pre-selection set is calculated using the Dice distance method. For example, the first similarity may be denoted $\mathrm{Dice}(e_1, e_2)$ and obtained by formula (1).

$$\mathrm{Dice}(e_1, e_2) = \frac{2 \cdot \mathrm{common}(e_1, e_2)}{\mathrm{Len}(e_1) + \mathrm{Len}(e_2)} \tag{1}$$

where $e_1$ represents one of the two entities, $e_2$ represents the other, $\mathrm{Len}(e_1)$ represents the string length of $e_1$, $\mathrm{Len}(e_2)$ represents the string length of $e_2$, and $\mathrm{common}(e_1, e_2)$ represents the number of characters that $e_1$ and $e_2$ have in common.

In operation S21212, a second similarity between the two entities is calculated using the edit distance method. The principle of the edit distance method is shown in formula (2).

$$\mathrm{Distance}(i, j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min \begin{cases} \mathrm{Distance}(i-1, j) + 1 \\ \mathrm{Distance}(i, j-1) + 1 \\ \mathrm{Distance}(i-1, j-1) + 1_{(a_i \neq b_j)} \end{cases}, & \min(i, j) \neq 0 \end{cases} \tag{2}$$

Here $\mathrm{Distance}(i, j)$ is the edit distance between the first $i$ characters of one string and the first $j$ characters of the other. When $\min(i, j) = 0$, one of the two is empty, and $\max(i, j)$ insertion operations are required to convert the empty string into the non-empty one, so the edit distance is $\max(i, j)$.

When $\min(i, j) \neq 0$, there are three cases, one for each operation. $\mathrm{Distance}(i-1, j) + 1$ represents a delete operation, i.e., deleting the $i$-th character of one string; $\mathrm{Distance}(i, j-1) + 1$ represents an insert operation, i.e., inserting the $j$-th character of the other string; and $\mathrm{Distance}(i-1, j-1) + 1_{(a_i \neq b_j)}$ represents a replace operation, in which, when the $i$-th character $a_i$ of one string differs from the $j$-th character $b_j$ of the other, $a_i$ is replaced by $b_j$ at a cost of 1, and no cost is incurred when the two characters are the same. The edit distance is the minimum over the three cases. Based on this principle, the edit distance between entities $e_1$ and $e_2$ can be found according to formula (3).

$$D(e_1, e_2) = \mathrm{Distance}(\mathrm{Len}(e_1), \mathrm{Len}(e_2)) \tag{3}$$

where $\mathrm{Len}(e_1)$ represents the string length of $e_1$ and $\mathrm{Len}(e_2)$ represents the string length of $e_2$.

It will be appreciated that the Dice distance and the edit distance have opposite meanings: the greater the Dice distance, the higher the similarity, whereas the greater the edit distance, the lower the similarity. In addition, the Dice distance lies in the range 0 to 1. The edit distance $D(e_1, e_2)$ is therefore scaled, and the scaled value is taken as the second similarity, which may be denoted $\mathrm{Score}(e_1, e_2)$ and obtained by formula (4).

In operation S21213, the first similarity and the second similarity are weighted and summed to obtain the similarity between every two entities in the triplet pre-selection set. For example, the similarity between every two entities may be denoted $\mathrm{Com}(e_1, e_2)$ and obtained by formula (5).

$$\mathrm{Com}(e_1, e_2) = \alpha \cdot \mathrm{Dice}(e_1, e_2) + \beta \cdot \mathrm{Score}(e_1, e_2) \tag{5}$$

where $\alpha$ is the weight applied to the first similarity $\mathrm{Dice}(e_1, e_2)$ and $\beta$ is the weight applied to the second similarity $\mathrm{Score}(e_1, e_2)$.

Calculation of the similarity between each two entities in the pre-selected set of triples may be facilitated by operations S21211-S21213.
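Operations S21211 to S21213 can be sketched as follows. Several points are assumptions rather than statements of the disclosure: the "number of identical characters" is counted here as a multiset intersection of characters, the scaling of the edit distance (formula (4) is not reproduced in this text) is taken to be one minus the distance divided by the longer string length, and the default weights of 0.5 are arbitrary.

```python
from collections import Counter


def dice_similarity(e1: str, e2: str) -> float:
    """First similarity: 2 * common / (len(e1) + len(e2))."""
    if not e1 and not e2:
        return 1.0
    # Assumption: shared characters are counted as a multiset intersection.
    common = sum((Counter(e1) & Counter(e2)).values())
    return 2 * common / (len(e1) + len(e2))


def edit_distance(e1: str, e2: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(e2) + 1))  # distance from the empty prefix of e1
    for i, c1 in enumerate(e1, start=1):
        curr = [i]
        for j, c2 in enumerate(e2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete the i-th character of e1
                curr[j - 1] + 1,           # insert the j-th character of e2
                prev[j - 1] + (c1 != c2),  # replace only if the characters differ
            ))
        prev = curr
    return prev[-1]


def scaled_score(e1: str, e2: str) -> float:
    """Second similarity. Assumed scaling: 1 - D / max(len), so it lies in [0, 1]."""
    longest = max(len(e1), len(e2))
    return 1.0 if longest == 0 else 1 - edit_distance(e1, e2) / longest


def combined_similarity(e1: str, e2: str, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weighted sum of the two similarities; alpha and beta are illustrative."""
    return alpha * dice_similarity(e1, e2) + beta * scaled_score(e1, e2)
```

With this scaling both components grow with similarity and share the range 0 to 1, which is what makes the weighted sum meaningful.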

In operation S2122, when the similarity between the two entities satisfies the set threshold, it is determined whether the relationships in the triples containing the two entities are consistent. It can be understood that the set threshold may be a critical value: when the similarity is greater than the critical value, the two entities are judged to be the same entity. Once the two entities are judged to be the same, it is further determined whether the relationships in the triples containing them are consistent.

In operation S2123, when the relationships among the triples where the two entities are located are identical, one of the triples where the two entities are located is deleted.

In operation S2124, when the relationship in the triples where the two entities are located is inconsistent, one of the two entities is replaced with the other. Thus, the alignment of entities in the triad preselection can be facilitated through operations S2121-S2124, resulting in a triad set.
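The alignment in operations S2121 to S2124 could be sketched as below. This is one possible reading, not the disclosed implementation: `difflib.SequenceMatcher` stands in for the combined similarity, the threshold of 0.8 is arbitrary, and "consistent relationships" is interpreted as two triples becoming identical once similar entities are unified, in which case the duplicate is dropped. Triples with different relations are kept with the entity replaced.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in for the combined similarity Com(e1, e2); an assumption of this sketch."""
    return SequenceMatcher(None, a, b).ratio()


def align(triples, threshold=0.8):
    """Unify near-identical entities and drop triples that become duplicates."""
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    canonical = {e: e for e in entities}
    for a, b in combinations(entities, 2):
        if similarity(a, b) > threshold:
            canonical[b] = canonical[a]  # treat b as the same entity as a

    aligned = []
    for head, relation, tail in triples:
        unified = (canonical[head], relation, canonical[tail])
        if unified not in aligned:  # identical after unification: keep only one
            aligned.append(unified)
    return aligned


# Hypothetical pre-selection set with a near-duplicate entity name.
preselection = [
    ("ICBC Bank", "issues", "card_1"),
    ("ICBC Bank.", "issues", "card_1"),
    ("ICBC Bank.", "flags", "account_9"),
]
triple_set = align(preselection)
```

Here the second triple is removed as a duplicate of the first, while the third is kept with its head entity replaced by the canonical name.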

FIG. 6 schematically illustrates a flow diagram of a pre-built entity relationship extraction model according to an embodiment of the disclosure.

Pre-constructing the entity relationship extraction model includes operations S41 to S44.

In operation S41, according to the label of each word in the training text data, the extraction rule of three elements of the triplet in the entity relation extraction model is trained to obtain a pre-extraction rule, wherein the triplet includes a first entity, a relation between the first entity and a second entity, and the second entity.

In operation S42, the pre-extraction rule of the entity relationship extraction model is verified using the verification text data.

In operation S43, if the verification is passed, the pre-extraction rule is applied as an extraction rule of the entity relationship extraction model.

In operation S44, if the verification is not passed, operations S41 and S42 are repeatedly performed until the verification is passed.

The pre-construction of the entity relationship extraction model can be facilitated through operations S41 to S44.
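The train-then-verify loop of operations S41 to S44 follows a generic pattern that can be sketched as below. The callables `train_step` and `validate`, the toy threshold "rule", and the `max_rounds` guard are all hypothetical; the disclosure repeats training until verification passes and does not mention an upper bound on the number of rounds.

```python
def build_model(train_step, validate, max_rounds=100):
    """Repeat training and verification until verification passes."""
    for round_no in range(1, max_rounds + 1):
        candidate = train_step()        # S41: train to obtain a candidate rule
        if validate(candidate):         # S42: verify the candidate
            return candidate, round_no  # S43: adopt it as the model's rule
        # S44: verification failed, so train and verify again
    raise RuntimeError("verification did not pass within max_rounds")


# Toy stand-ins: each round nudges a threshold, and verification
# passes once the threshold reaches 0.75.
state = {"threshold": 0.0}


def toy_train_step():
    state["threshold"] += 0.25
    return state["threshold"]


def toy_validate(threshold):
    return threshold >= 0.75


rule, rounds = build_model(toy_train_step, toy_validate)
```

The same loop shape applies to any model whose parameters are trained and then checked against held-out verification data.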

Fig. 7 schematically illustrates a flow chart for determining relevant answers from a knowledge graph in response to a query request using a pre-built question-answer model, in accordance with an embodiment of the present disclosure.

Operation S230 determines a relevant answer from the knowledge graph in response to the query request using the previously constructed question-answer model, including operations S231 to S233.

In operation S231, the question vector of the query request and m pre-selected relevant answer vectors in the knowledge graph are spliced by using a pre-constructed vector splicing model, so as to obtain m spliced vectors, where m is an integer greater than or equal to 1.

In operation S232, a probability value of each of the m splice vectors is predicted using a previously constructed probability prediction model. For example, the probability value of each spliced vector may be represented by P, and the probability value may be found by formula (6).

$$P = \mathrm{sigmoid}(W_f \times f + b_f) \tag{6}$$

where $f$ represents a spliced vector, and $W_f$ and $b_f$ are the probability prediction parameters.

In operation S233, one of the m pre-selected relevant answers is determined as the relevant answer according to the ranking of the m probability values. For example, the m probability values may be ranked from largest to smallest, and the pre-selected relevant answer corresponding to the first-ranked probability value is taken as the relevant answer; alternatively, the m probability values may be ranked from smallest to largest, and the pre-selected relevant answer corresponding to the last-ranked probability value is taken as the relevant answer.

The determination of the relevant answers from the knowledge graph in response to the query request using the previously constructed question-answer model can be facilitated through operations S231 to S233.
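Operations S231 to S233 can be illustrated with the following minimal Python sketch. It assumes that splicing is plain concatenation of the question vector with each pre-selected answer vector (the disclosure uses a trained vector splicing model, which is not reproduced here), and the vectors, `w_f` and `b_f` are made-up illustrative values rather than trained parameters.

```python
import math


def sigmoid(z):
    """Logistic function used in formula (6)."""
    return 1.0 / (1.0 + math.exp(-z))


def select_answer(question_vec, answer_vecs, w_f, b_f):
    """Return (index of best answer, list of probabilities)."""
    probs = []
    for answer_vec in answer_vecs:
        # S231: splice the question vector with one pre-selected answer vector
        # (assumption: simple concatenation).
        f = list(question_vec) + list(answer_vec)
        # S232: P = sigmoid(W_f x f + b_f)
        z = sum(w * x for w, x in zip(w_f, f)) + b_f
        probs.append(sigmoid(z))
    # S233: rank from largest to smallest and keep the first-ranked answer.
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Hypothetical toy inputs: one question vector and m = 3 candidate answers.
question = [0.2, -0.1]
answers = [[0.5, 0.1], [1.0, 0.9], [-0.3, 0.4]]
w_f = [0.5, 0.5, 1.0, 1.0]
best, probs = select_answer(question, answers, w_f, b_f=0.0)
```

With these toy values the second candidate receives the highest probability and is returned as the relevant answer.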

Fig. 8 schematically illustrates a flow chart of pre-building a vector stitching model according to an embodiment of the present disclosure.

The vector stitching model is constructed in advance, including operations S61-S64.

In operation S61, training is performed on the splicing parameters in the vector splicing model according to the training sample, so as to obtain training splicing parameters, where the training sample includes a question vector and a preselected relevant answer vector corresponding to the question vector.

In operation S62, training splice parameters of the vector splice model are verified using the verification samples.

In operation S63, if the verification is passed, the training splice parameters are applied as model parameters of the vector splice model.

In operation S64, if the verification is not passed, operations S61 and S62 are repeatedly performed until the verification is passed. The pre-construction of the vector stitching model can be facilitated through operations S61-S64.

Fig. 9 schematically illustrates a flow chart of pre-building a probabilistic predictive model in accordance with an embodiment of the disclosure.

The probability prediction model is constructed in advance, including operations S71 to S74.

In operation S71, the probability prediction parameters in the probability prediction model are trained according to the spliced vector training samples to obtain training probability prediction parameters.

In operation S72, training probabilistic predictive parameters of the probabilistic predictive model are validated using the stitching vector validation samples.

In operation S73, if the verification is passed, the training probability prediction parameters are applied as model parameters of the probability prediction model.

In operation S74, if the verification is not passed, operations S71 and S72 are repeatedly performed until the verification is passed. The pre-construction of the probability prediction model can be facilitated through operations S71 to S74.
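The train, verify and repeat pattern shared by operations S61 to S64 and S71 to S74 can be sketched as a small loop. This is only an illustration: `train_step`, `validate` and the toy trainer are hypothetical stand-ins for the real training and verification procedures, and the `max_rounds` cap is an added safety assumption that the disclosure does not mention.

```python
def build_model(train_step, validate, max_rounds=100):
    """Repeat training (S61/S71) and verification (S62/S72) until it passes."""
    for round_no in range(1, max_rounds + 1):
        params = train_step()          # S61 / S71: train the parameters
        if validate(params):           # S62 / S72: verify on validation samples
            return params, round_no    # S63 / S73: adopt as model parameters
        # S64 / S74: verification failed, so train and verify again
    raise RuntimeError("verification did not pass within max_rounds")


def make_toy_trainer(step=0.1):
    """Hypothetical trainer: each call nudges a single weight upward."""
    state = {"w": 0.0}

    def train_step():
        state["w"] += step
        return dict(state)

    return train_step


# Toy verification rule: accept once the weight reaches 0.25.
params, rounds = build_model(make_toy_trainer(), lambda p: p["w"] >= 0.25)
```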

An application method of the knowledge-graph based on fraud information according to an embodiment of the present disclosure is described in detail below with reference to fig. 10 to 14. It is to be understood that the following description is exemplary only and is not intended to limit the disclosure in any way.

The application method of the knowledge graph based on the fraud information according to the embodiment of the disclosure may include the following steps.

1. Constructing a knowledge graph of bank data information.

The present disclosure studies and implements the key technologies required for knowledge graph construction, namely knowledge extraction, entity alignment and knowledge storage. The present disclosure designs an entity-relationship joint extraction model based on the BERT language model. To handle sentences that contain multiple triples, the knowledge extraction model combines pointers with labels. For entities with semantic ambiguity, the present disclosure provides a joint Dice and edit distance algorithm for entity alignment, which improves the quality of the graph and avoids entity ambiguity. Finally, the obtained triplet information is stored in Neo4j.

2. Constructing a question-answering model over the bank data information knowledge graph.

The present disclosure achieves question answering based on the banking fraud knowledge graph through fraud information entity recognition, candidate answer generation, and a banking fraud question-relationship semantic matching model. Because fraud entity recognition is an important task within the question-answering model, the BERT-BiLSTM-CRF model is adopted to extract banking fraud information entities from questions; comparison experiments show that this model can effectively improve entity recognition. To address the deficiency of using only the vector representation at the [CLS] position in BERT downstream tasks, the present disclosure combines one-dimensional convolution and max pooling to design a banking fraud question-relationship semantic matching model based on the BERT language model, which fuses the encoding information of all positions of the BERT encoded sequence and improves the model's ability to identify relationships.

3. A Web-based search and early-warning platform for fraud prevention in the banking field.

Based on the constructed knowledge graph, a Web page framework is built with the Flask library of Python, and a search engine is built with Elasticsearch. Through the combined use of the Flask back-end development framework, the ECharts graph visualization framework, the Bootstrap front-end framework and similar technologies, a knowledge graph retrieval and recommendation platform for the field of the present disclosure is built. The platform integrates modules such as graph visualization, graph retrieval, intelligent recommendation, graph expansion and dynamic time-axis adjustment, and can meet the current need to obtain news in the fraud field.

The specific technical scheme of the three steps is described in detail below.

1. The entity relationship joint extraction model.

The present disclosure uses a knowledge extraction algorithm built on BERT and selects the banking fraud field as the knowledge extraction target. The BERT layer extracts feature vectors from the text data, from which the "Subject" is predicted; the corresponding relationship "Link" and the "Object" are then predicted and extracted according to the extracted "Subject". The main flow of the algorithm is data preprocessing, model construction and model training.

The specific model construction process is as follows.

(1) The text is first converted into an ID sequence and passed into the BERT layer as the input of the overall model. The BERT layer performs feature extraction and converts the text into a sequence of encoded vectors.

(2) The vector sequence output by BERT is then passed through LN (Layer Normalization) into two classifiers to predict the "Subject".

(3) After the "Subject" is predicted, the feature vectors corresponding to its start and end positions are extracted from the sequence output by BERT.

(4) The output sequence of BERT is then layer-normalized with the feature vector of the "Subject" as a condition.

(5) Finally, after the conditional normalization, the corresponding "Object" is predicted in the same way for each relation "Link" through multiple classifiers. In this way, the information extraction task is converted into a classification task.

The existing approach to triplet extraction first finds all entities in the text data through named entity recognition and then classifies the relationships between them. However, this approach has difficulty handling the case in which one Subject corresponds to multiple Objects, and it splits information extraction into two tasks that require two independently trained models, so the errors of the two parts accumulate. The present disclosure uses a single model to complete the extraction, converting the extraction of entity keywords from text into a combination of multiple binary classification tasks. Specifically, extracting the Subject is converted into finding its start and end positions in the sentence. Two classifiers are constructed, and each token is classified twice to determine whether it is the start of the Subject and whether it is the end of the Subject, outputting 1 if so and 0 otherwise. After this processing, the Subject is extracted according to the labels of the output sequence. Then, with the encoding of the Subject as a condition, the Object is extracted for each type of relation Link in the same manner as the Subject.

Therefore, we use sigmoid activation function in the model, and adopt "0/1 labeling" strategy, and judge the position of the extracted entity or relation in the text vector by the positions of 1 and 0 in pointers Start and End.
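The "0/1 labeling" strategy can be illustrated by decoding Start and End pointer outputs into text spans. The sketch below is not the model itself: the per-token probabilities are invented, the 0.5 threshold is an assumption, and pairing each start with the nearest end at or after it is one common heuristic that the disclosure does not spell out.

```python
def decode_spans(tokens, start_probs, end_probs, threshold=0.5):
    """Turn sigmoid outputs of the Start/End pointers into entity spans."""
    # A position is labeled 1 when its probability exceeds the threshold.
    starts = [i for i, p in enumerate(start_probs) if p > threshold]
    ends = [i for i, p in enumerate(end_probs) if p > threshold]
    spans = []
    for s in starts:
        # Pair each start with the nearest end at or after it (assumption).
        e = next((j for j in ends if j >= s), None)
        if e is not None:
            spans.append(" ".join(tokens[s:e + 1]))
    return spans


# Hypothetical pointer outputs for a seven-token sentence.
tokens = ["Zhang", "San", "was", "defrauded", "by", "Li", "Si"]
start_probs = [0.9, 0.1, 0.0, 0.0, 0.0, 0.8, 0.1]
end_probs = [0.1, 0.95, 0.0, 0.0, 0.0, 0.2, 0.9]
spans = decode_spans(tokens, start_probs, end_probs)
```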

Specifically, before the sentence text is input into the BERT language model, it is preprocessed so that the input is represented by word vectors, position vectors and segment vectors. The BERT language model then encodes the input and extracts contextual features to obtain a sequence in which each token carries information about the text before and after it. This sequence is passed to the next stage for normalization and then through two pointer classifiers, both implemented with a sigmoid activation function. In the pointer labeling scheme, an S-Start value of 1 marks a start pointer and an S-End value of 1 marks an end pointer, so the Subject entity in the input sentence can be determined from the S-Start and S-End outputs.

Taking the entity "Zhang San" as an example, the Subject "Zhang San" is obtained through the S-Start and S-End pointers. For each "Link" connected to the Subject, the Object pointers over the whole sentence then give the Object entity corresponding to that Link. Finally, a triplet in the form "[Subject, Link, Object]" is obtained. As shown in fig. 1.1, taking the relation "fraudulently transferred" as an example and assuming that the Subject is "Zhang San", the O-Start and O-End positions obtained for the relation "fraudulently transferred" indicate the position of the Object entity corresponding to the Subject "Zhang San" and that relation, namely "Li Si". The successfully predicted triplet "[Zhang San, fraudulently transferred, Li Si]" is therefore output.

For the loss function, cross entropy is selected because the model deals with a classification problem. It is shown in equation (7), where Loss is the loss value, x is a sample, n is the number of samples, y is the actual value of the sample, and ŷ is the value predicted by the model. The loss measures the difference between y and ŷ, that is, between the predicted and the actual values. A smaller value indicates a more accurate result.
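As a rough illustration of such a loss, the sketch below computes the standard binary cross entropy averaged over n samples. The exact form of equation (7) is not reproduced in this text, so treating it as the standard binary cross entropy is an assumption; the `eps` clipping and the sample values are likewise illustrative.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy between labels y and predictions y_hat."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip predictions so that log(0) is never evaluated (assumption).
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Hypothetical 0/1 pointer labels with a good and a poor set of predictions.
labels = [1, 0, 0, 1]
good = binary_cross_entropy(labels, [0.9, 0.1, 0.2, 0.8])
bad = binary_cross_entropy(labels, [0.4, 0.6, 0.7, 0.3])
```

Predictions closer to the labels give the smaller loss, matching the statement that a smaller value indicates a more accurate result.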

2. Entity alignment in the banking fraud field.

Because of grammar mismatches, logic mismatches, semantic mismatches and similar causes, a large number of heterogeneity problems exist in knowledge graphs, and information cannot be exchanged between them. For example, "Zhang Xiaosan" and "Zhang San" refer to the same user but are identified as two entities. The triplet "[Zhang Xiaosan, fraudulently transferred, Li Si]" and the triplet "[Zhang San, fraudulently transferred, Li Si]" express the same relationship and should be linked to the same node in the knowledge graph. Owing to the complexity of news texts, this happens frequently and degrades the quality of the graph. Entity alignment can disambiguate such cases, so after the knowledge extraction task is completed, entity alignment must be performed on the extracted triplet entities. The present disclosure provides an entity alignment method based on the Dice coefficient and the edit distance; the specific flow is shown in fig. 10.

The present disclosure provides an entity alignment method based on joint similarity, which obtains a score by weighting the Dice coefficient and the edit distance. A threshold is set; when the joint similarity reaches the threshold, the two entities are considered semantically consistent. However, because the knowledge graph is a network formed by many triples, the same entity may be connected by different relations, so it is also necessary to judge whether the relations connected to the two entities are the same. If they are the same, the node is a duplicate and the entity is deleted; if they differ, the triples carry different meanings and entity replacement is performed.

The present disclosure fuses the Dice coefficient and the edit distance to compute a weighted similarity between entities. The Dice coefficient measures the similarity of two sets, and a character string can be treated as a set of characters. Its value ranges from 0 to 1 and is proportional to the similarity: the larger the value, the more similar the two sets. The definition is shown in formula (8).

$$\mathrm{Dice}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{8}$$

where |A ∩ B| is the size of the intersection of A and B, i.e. the number of elements common to the two sets, and |A| and |B| are the numbers of elements in A and B, respectively. The numerator is multiplied by 2 because the denominator counts the common elements twice. For character strings, the Dice coefficient is defined as shown in equation (9).

$$\mathrm{Dice}(e_1, e_2) = \frac{2 \cdot \mathrm{common}(e_1, e_2)}{\mathrm{Len}(e_1) + \mathrm{Len}(e_2)} \tag{9}$$

Len(e₁) and Len(e₂) represent the lengths of the character strings e₁ and e₂, respectively, and common(e₁, e₂) represents the number of identical characters in the entities e₁ and e₂.
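A minimal Python sketch of the string form of the Dice coefficient follows. How repeated characters are counted in common(e₁, e₂) is not specified in the text; the sketch assumes a multiset overlap, and the handling of two empty strings is also an assumption.

```python
from collections import Counter


def common_chars(e1, e2):
    """Number of characters shared by e1 and e2 (multiset overlap, assumed)."""
    return sum((Counter(e1) & Counter(e2)).values())


def dice(e1, e2):
    """Dice coefficient of two strings: 2 * common / (Len(e1) + Len(e2))."""
    if not e1 and not e2:
        return 1.0  # assumption: two empty strings count as identical
    return 2.0 * common_chars(e1, e2) / (len(e1) + len(e2))


# The two spellings of the same user from the example in the text.
similarity = dice("Zhang Xiaosan", "Zhang San")
```

The two names share 8 of their 13 and 9 characters, so `similarity` equals 16/22.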

The edit distance is the minimum number of deletion, insertion and substitution steps needed to change one character string into another. The edit distance is inversely related to similarity: the fewer steps required, the more similar the two strings.

For any two strings s₁ and s₂, let Distance(i, j) denote the edit distance between the first i characters of s₁ and the first j characters of s₂. The edit distance D(s₁, s₂) then satisfies equation (10).

$$D(s_1, s_2) = \mathrm{Distance}(\mathrm{Len}(s_1), \mathrm{Len}(s_2)) \tag{10}$$

where Len(s_i) represents the length of string s_i, and Distance(i, j) satisfies equation (11).

$$\mathrm{Distance}(i, j) = \begin{cases} \max(i, j), & \min(i, j) = 0 \\ \min \begin{cases} \mathrm{Distance}(i-1, j) + 1 \\ \mathrm{Distance}(i, j-1) + 1 \\ \mathrm{Distance}(i-1, j-1) + 1_{(s_1[i] \neq s_2[j])} \end{cases}, & \min(i, j) \neq 0 \end{cases} \tag{11}$$

When min(i, j) = 0, one of the two prefixes is empty, and max(i, j) insertion operations are needed to turn the empty string into the non-empty one, so the edit distance is max(i, j).

When min(i, j) ≠ 0, there are three cases, one for each operation. Distance(i-1, j) + 1 represents a deletion, i.e. deleting the i-th character of s₁; Distance(i, j-1) + 1 represents an insertion, i.e. inserting the j-th character of s₂; Distance(i-1, j-1) + 1_(s₁[i] ≠ s₂[j]) represents a substitution, in which the i-th character of s₁ is replaced by the j-th character of s₂ at a cost of 1 when the two differ, and at no cost when they are the same. On this basis, the edit distance between entities e₁ and e₂ is given by formula (12).

$$D(e_1, e_2) = \mathrm{Distance}(\mathrm{Len}(e_1), \mathrm{Len}(e_2)) \tag{12}$$
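The recurrence for Distance(i, j) can be evaluated with a small dynamic-programming table. The sketch below is one straightforward way to do so (the classic Levenshtein distance); the function name and the table layout are illustrative choices.

```python
def edit_distance(s1, s2):
    """Edit distance D(s1, s2) = Distance(Len(s1), Len(s2))."""
    rows, cols = len(s1) + 1, len(s2) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if min(i, j) == 0:
                # One prefix is empty: max(i, j) insertions are needed.
                dist[i][j] = max(i, j)
            else:
                # Substitution costs 1 only when the characters differ.
                cost = 0 if s1[i - 1] == s2[j - 1] else 1
                dist[i][j] = min(
                    dist[i - 1][j] + 1,         # delete the i-th char of s1
                    dist[i][j - 1] + 1,         # insert the j-th char of s2
                    dist[i - 1][j - 1] + cost,  # substitute (or keep)
                )
    return dist[-1][-1]


distance = edit_distance("kitten", "sitting")
```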

It will be appreciated that the Dice coefficient and the edit distance have opposite meanings: the larger the Dice coefficient, the higher the similarity, whereas the larger the edit distance, the lower the similarity. In addition, the Dice coefficient lies in the range 0 to 1. The edit distance D(e₁, e₂) is therefore rescaled to a value in the range 0 to 1, denoted Score(e₁, e₂), which can be obtained by expression (13).

Score(e₁, e₂) is inversely related to D(e₁, e₂): the higher the similarity between e₁ and e₂, the smaller the edit distance D(e₁, e₂) and the larger Score(e₁, e₂), indicating that e₁ and e₂ are synonymous. Otherwise, they are different terms.

After the Dice coefficient and the edit distance of e₁ and e₂ are obtained, the present disclosure combines the two into a joint similarity, as shown in equation (14).

$$\mathrm{Com}(e_1, e_2) = \alpha \cdot \mathrm{Dice}(e_1, e_2) + \beta \cdot \mathrm{Score}(e_1, e_2) \tag{14}$$

where α and β represent the weights of the Dice coefficient and the edit distance score, respectively, in the algorithm.

After the joint similarity of the two entities e₁ and e₂ is obtained, it is used as the measure for judging whether the entities are similar. A critical value is therefore set: when the similarity is greater than 0.7, the entities are judged to be the same. Next, it is determined, triplet by triplet, whether the other relationships linked to the two entities coincide. If they coincide, the two are duplicate entities and the duplicate triplet is deleted; if they do not coincide, the entity is a multi-link node that links other entity relationships and the triples contain the same entity, so entity replacement is performed. The entity alignment operation is thus completed.
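The alignment decision can be sketched end to end as follows. Several parts are assumptions made for illustration only: Score is taken as 1 − D / max(Len(e₁), Len(e₂)) because expression (13) is not reproduced in this text, the weights α = β = 0.5 are arbitrary, and the three return labels are hypothetical names. Only the 0.7 threshold comes from the description.

```python
from collections import Counter


def dice(e1, e2):
    """String Dice coefficient, counting shared characters as a multiset."""
    shared = sum((Counter(e1) & Counter(e2)).values())
    return 2.0 * shared / (len(e1) + len(e2)) if (e1 or e2) else 1.0


def edit_distance(s1, s2):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (c1 != c2)))
        prev = curr
    return prev[-1]


def score(e1, e2):
    """Edit distance rescaled to [0, 1]; this normalization is an assumption."""
    longest = max(len(e1), len(e2))
    return 1.0 if longest == 0 else 1.0 - edit_distance(e1, e2) / longest


def com(e1, e2, alpha=0.5, beta=0.5):
    """Joint similarity Com = alpha * Dice + beta * Score (weights assumed)."""
    return alpha * dice(e1, e2) + beta * score(e1, e2)


def align(triple_a, triple_b, threshold=0.7):
    """Return 'keep', 'delete' or 'replace' for a pair of triples."""
    (s1, r1, o1), (s2, r2, o2) = triple_a, triple_b
    if com(s1, s2) <= threshold:
        return "keep"      # subjects are treated as different entities
    if r1 == r2 and o1 == o2:
        return "delete"    # same entity, same relation: duplicate triple
    return "replace"       # same entity, other relation: unify the name


a = ("Zhang Xiao San", "fraudulently transferred", "Li Si")
b = ("Zhang Xiaosan", "fraudulently transferred", "Li Si")
c = ("Zhang Xiaosan", "opened an account at", "Branch A")
d = ("Wang Wu", "fraudulently transferred", "Li Si")
decisions = [align(a, b), align(a, c), align(a, d)]
```

Under these assumed weights the shorter alias "Zhang San" would fall just below the 0.7 threshold against "Zhang Xiaosan", which shows how sensitive the outcome is to the choice of α, β and the Score normalization.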

3. Banking fraud field question-relation semantic matching model based on BERT language model.

Based on the candidate triplet data generated by entity relation joint extraction, in order to calculate the semantic matching score between the user question and the relation information in the candidate triplet, the task of calculating the semantic matching score is converted into 0-1 classification problem to be solved. Wherein, tag 1 indicates that the question and the relation are semantically matched, and tag 0 indicates that the question and the relation are semantically mismatched.

For the 0-1 classification problem, the last classified activation function generally adopts a sigmoid activation function, the output result is the probability of an event, the range is 0-1, and the class of the input sample is judged by setting a threshold value. In the task of calculating the semantic matching score, the step of finally setting a threshold value in the 0-1 classification problem is omitted, and the event probability output by the sigmoid activation function is used as the semantic matching score between the input question and the relation. The closer the probability of output is to 1, the closer the semantics between the input question and the relationship.

The present disclosure combines the BERT language model, one-dimensional convolution, max pooling, and sigmoid activation functions to design a banking fraud field question-relationship semantic matching model.

Specifically, as shown in fig. 11, the input question q and the relation p are joined using the special tokens [CLS] and [SEP], and the resulting vector representation is input to the BERT encoding layer to obtain the BERT encoded sequence T. The vector representation t at the position of the [CLS] token contains both the features of the token at that position and the features of the entire context. In BERT downstream tasks, text classification, text matching and similar tasks are typically implemented on the basis of the vector representation t at the [CLS] position. Because using only the [CLS] vector ignores the feature information at other positions in the text, the banking fraud question-relation matching model combines one-dimensional convolution and max pooling on top of the BERT language model. The encoded sequence T output by the BERT encoding layer is divided into two parts, T1 and T2. One-dimensional convolution and max pooling are applied directly to the sequence T2 to extract deep text features, and the result is then concatenated with T1, fusing the encoding information of all positions of the BERT encoded sequence. This overcomes the drawback of using only the vector representation at the [CLS] position in BERT downstream tasks.

In the one-dimensional convolution, the present disclosure uses convolution kernels of different sizes to extract feature vectors of the text. In convolution over natural language text, the kernel size, that is, the size of the moving window, is preferably 3, 4 and 5, so that local features of the text are extracted under different windows. For each window of size s, a kernel matrix W_s and the nonlinear function relu are used to convolve the BERT encoded sequence T2; the convolution can be expressed as in equation (15).

$$c_i = \mathrm{relu}(W_s \times t[i : i+s] + b_s) \tag{15}$$

where W_s and b_s are obtained through training, and t[i : i+s] is the vector representation from position i to position i+s selected from the BERT encoded sequence T2, i.e. the vector representation within the moving window. For each window, the convolution is applied over the encoded sequence T2 of length l to obtain the local features c. The maximum feature in c is then extracted by max pooling, the pooled features of the several windows are concatenated to obtain the pooling-layer feature vector f_{max pooling}, and this vector is concatenated with T1 to obtain the final feature vector f.
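The convolution, max pooling and concatenation steps can be sketched in plain Python. This toy version uses a single filter per window size, tiny two-dimensional token vectors and hand-picked kernels in place of trained W_s and b_s; all of these, and the choice to place T1 before the pooled features, are illustrative assumptions.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def conv_max_pool(t2, kernel, bias, window):
    """Apply one filter over every window of t2, then max-pool the results."""
    local = []
    for i in range(len(t2) - window + 1):
        # Flatten the token vectors inside the moving window t[i : i + s].
        flat = [v for vec in t2[i:i + window] for v in vec]
        local.append(relu(sum(w * x for w, x in zip(kernel, flat)) + bias))
    return max(local)


def fuse(t1, t2, kernels, biases):
    """Concatenate T1 with the pooled features of every window size."""
    pooled = [conv_max_pool(t2, kernels[s], biases[s], s)
              for s in sorted(kernels)]
    return list(t1) + pooled


dim = 2
t1 = [0.3, -0.2]                                     # hypothetical T1 part
t2 = [[0.1 * i, -0.05 * i] for i in range(1, 7)]     # six token vectors
kernels = {s: [1.0] * (s * dim) for s in (3, 4, 5)}  # untrained toy kernels
biases = {s: 0.0 for s in (3, 4, 5)}
f = fuse(t1, t2, kernels, biases)
```

The resulting `f` has the two T1 components followed by one pooled feature for each of the window sizes 3, 4 and 5.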

Finally, the probability distribution of the tag, i.e., the probability that the input question and relationship are identified as positive samples "1" and negative samples "0", is calculated by the sigmoid function. The probability of being identified as a positive sample '1' is taken as the matching score of the question-relation, the higher the score is, the higher the matching degree between the question and the relation is, and the probability is calculated as shown in a formula (16).

$$P = \mathrm{sigmoid}(W_f \times f + b_f) \tag{16}$$

P represents the probability distribution of the labels predicted by the model, and W_f and b_f are obtained through training.

4. Application of Web-based bank anti-fraud knowledge graph networks.

The Flask framework employed in the present disclosure is a lightweight framework implemented in Python, designed to provide the minimum subset of functionality required for Web development [51]. It is built mainly on the Werkzeug toolkit and the Jinja2 template engine. The Jinja2 engine provides a template inheritance mechanism, so that existing HTML templates can be extended and modified, and it has an HTML auto-escaping mechanism that helps prevent script injection attacks. Werkzeug integrates URL routing of requests, can process and respond to page access requests from multiple users at the same time, and can respond to the different tasks initiated by clients.

Flask does not bind the developer to particular components and offers many built-in extension points, so that developers can design the application architecture with whatever means and tools they choose. Compared with other frameworks (e.g., Django), it therefore gives the developer more flexibility. When a client enters a URL and initiates an HTTP request to the server, the Flask framework processes the request. With Flask, Web sites and services can be built quickly without designing the handling of HTTP requests and responses from scratch. The underlying logic of Flask is simple and quick to learn. The platform is therefore built on the Flask framework and, by combining the constructed banking fraud knowledge graph with the retrieval model, integrates functions such as graph visualization, graph retrieval, graph expansion, entity attribute display, most frequently accessed items, related recommendation and a time axis.

The system uses the Bootstrap [52] framework as the front-end development framework. Its most notable feature is responsive interface design, which supports smooth switching between browsers on different devices, so that each section of the page adapts well to screens with different aspect ratios. In addition, Bootstrap's rich components and the jQuery plug-ins accessible through the Data API help in developing a clearly structured operation interface.

The system uses ECharts [53] as the data visualization framework. ECharts provides many types of visual charts, supports linked interaction between different charts, and allows data to be presented in diverse styles.

The system uses Elasticsearch to perform semantic queries on the keywords input by the user and returns all matching entity names and the attribute values of those entities.

The system uses a Neo4j graph database to build the knowledge graph of the banking fraud field. A query traverses the database and, after the required node is found, returns the subgraph connected to that node, which is passed to the front end for display.

The system builds the application with the Flask framework and implements dynamic data loading. Ajax is used to connect ECharts with the data retrieved from the database.

The overall architecture of the bank fraud field knowledge graph retrieval platform designed by the disclosure is divided into a bank fraud field data layer, a bank fraud field technical layer and a knowledge graph application layer, wherein the technical layer corresponds to three important technical modules in bank fraud field extraction respectively. The overall structure of the platform is shown in fig. 12.

The data layer mainly comprises a data acquisition module and a data cleaning module. Taking into account the anti-crawler mechanisms encountered during the experiments, the data acquisition module implements a knowledge crawler for the banking fraud field, designed according to the rules and data formats of that field. The data cleaning module preprocesses the data according to its intended use, mainly through standardization operations such as regular expressions. Together these modules support the construction of the banking fraud knowledge graph and the application of the system.

The technical layer is divided into a technical layer for constructing a knowledge graph in the bank fraud field and a technical layer for searching.

For knowledge graph construction in the banking fraud field, the present disclosure labels its own data set from unstructured data, defines entities and relationships, and extracts triplet information with the BERT-based entity-relationship joint extraction model. The extracted information is normalized into a standard data format for subsequent knowledge storage. For entities with semantic ambiguity, the present disclosure provides a joint Dice and edit distance algorithm for entity alignment to improve the quality of the graph. The knowledge storage part mainly stores the triplet data and attributes in Neo4j.

For knowledge graph retrieval in the banking fraud field, retrieval is implemented on the constructed knowledge graph. The information submitted by the user is sent to Elasticsearch, where the natural language is converted into the corresponding logic and query statements to accelerate exact queries. The knowledge graph information required by the user is then obtained by exact or fuzzy matching in the constructed Neo4j knowledge graph database, all nodes related to it are found, and the data is returned to the front end for rendering. On this basis, functions such as entity recognition and query in the banking fraud field, query of relationships between news items, graph expansion, entity attribute display, most frequently accessed items, related recommendation and a time axis are provided, giving users an easy-to-use application and well-designed pages.

The application layer applies the knowledge graph obtained from the storage of the data layer and the algorithm training of the technical layer. A banking fraud knowledge graph retrieval and recommendation application is built, with a back end based on the Flask framework and a front end based on the ECharts and Bootstrap frameworks.

The application relates to a plurality of technologies, which are mainly divided into three-direction technologies, namely a database technology, an algorithm technology and a front-end and back-end interaction and design technology.

The database layer stores the triplet data and attributes mainly in Neo4j. For large-scale data storage, the py2neo package in Python is used to store the extracted triples. Python is connected to Neo4j using the account and password of the Neo4j database. The entity name and attributes of each node are created first, and the subject and object entities are then connected according to the entity links in the triplet data, which stores the relationships. The index and type of the triples are stored in Elasticsearch. Information submitted by the user is sent to Elasticsearch, the natural language is converted into the corresponding logic and query statements to accelerate exact queries, and the knowledge graph information required by the user is obtained by exact or fuzzy matching in the constructed Neo4j knowledge graph database of the banking fraud field.
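One way to turn extracted triples into Neo4j write statements is sketched below. The sketch only builds parameterized Cypher text and does not connect to a database; the `Entity` label, the `RELATION` relationship type with a `type` property, and the helper names are hypothetical modeling choices, not the schema of the disclosure. Each returned pair could then be handed to a driver call such as py2neo's `Graph.run(query, params)`.

```python
# Cypher MERGE creates a node or relationship only if it does not exist yet.
MERGE_TRIPLE = (
    "MERGE (s:Entity {name: $subject}) "
    "MERGE (o:Entity {name: $object}) "
    "MERGE (s)-[r:RELATION {type: $link}]->(o)"
)


def triple_to_statement(triple):
    """Return a (query, parameters) pair for one [Subject, Link, Object]."""
    subject, link, obj = triple
    return MERGE_TRIPLE, {"subject": subject, "link": link, "object": obj}


def triples_to_statements(triples):
    """Build statements for all triples, skipping exact duplicates."""
    seen = set()
    statements = []
    for triple in triples:
        if triple in seen:
            continue
        seen.add(triple)
        statements.append(triple_to_statement(triple))
    return statements


triples = [
    ("Zhang San", "fraudulently transferred", "Li Si"),
    ("Zhang San", "fraudulently transferred", "Li Si"),  # duplicate
    ("Li Si", "holds account at", "Branch A"),
]
statements = triples_to_statements(triples)
```

Passing values as parameters rather than formatting them into the query string keeps entity names containing quotes from breaking the statement.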

At the algorithm layer, the present disclosure builds the deep learning model on BERT, using the TensorFlow and Keras deep learning frameworks. Several natural language processing toolkits are also used, such as the English tokenization package nltk and the data analysis packages numpy and pandas.

At the front-end and back-end interaction layer, the Bootstrap framework provides the grid layout reference for the chart display framework ECharts. ECharts dynamically acquires data from the knowledge graph and displays the knowledge graph network visually. The Flask development framework completes the front-end and back-end development of the banking fraud knowledge graph retrieval platform and, combined with the trained algorithm model and the stored data, forms a complete Web-based retrieval application system for the banking fraud field.

In this system, the interface displayed to the user is called the front end, and the part that cooperates with the front end to keep the data up to date is called the back end. The back end exchanges data with the front end through interfaces written with Flask. The page body uses the Bootstrap framework to position and lay out each functional module; a DOM container, i.e. the display area of the knowledge graph, is first reserved for the knowledge graph.

ECharts is a data visualization library that presents data mainly in chart form, and a knowledge graph is itself a graph, so the visualization of the banking fraud knowledge graph relies on the ECharts front-end framework. ECharts is static, whereas querying and displaying knowledge graph information is a dynamic process, so the application needs to acquire data dynamically and continuously. The dynamic data loading therefore uses Ajax to pass the data retrieved from the database to ECharts. The main implementation logic is as follows.

(1) The JSP page (JavaServer Pages) requests data from the back end by means of AJAX asynchronous data loading, in order to search for the bank fraud news information entered by the user.

(2) After the page jump, the Servlet receives the bank fraud information passed by the JSP page, performs the query, and stores the query result as a list.

(3) The list is processed into JSON-form data that the JSP page can accept, and the queried information is returned to the JSP page.

(4) The JSP page receives the JSON data returned by the back end and adjusts the data format as required, obtaining Node and Link data that satisfy the assignment conditions of Echarts. The resulting data are assigned through myChart.setOption of Echarts, which converts the transmitted JSON data into the data format accepted by Echarts, so that the graph can be displayed dynamically. The back-end web page architecture is shown in FIG. 13.
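Steps (3) and (4) above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: it assumes the query result is a list of (head, relation, tail) rows and that the front end expects a JSON object with "nodes" and "links" arrays; the function name to_echarts_graph and the sample rows are hypothetical.

```python
import json

def to_echarts_graph(triples):
    """Convert (head, relation, tail) rows into node and link lists for a graph chart."""
    nodes, links, seen = [], [], set()
    for head, relation, tail in triples:
        for name in (head, tail):
            if name not in seen:           # each entity becomes exactly one node
                seen.add(name)
                nodes.append({"name": name})
        # each triple becomes one link labelled with its relation
        links.append({"source": head, "target": tail, "value": relation})
    return {"nodes": nodes, "links": links}

# Hypothetical query result returned by the back end
rows = [("Account A", "transfers_to", "Account B"),
        ("Account B", "owned_by", "Person C")]
payload = json.dumps(to_echarts_graph(rows), ensure_ascii=False)
```

The front end would then pass the parsed payload to the chart through setOption; how the arrays are wired into the series configuration depends on the page and is not shown here.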

For the keywords entered by a user, word segmentation is first performed with the nltk library of Python, the index and type in the Elasticsearch store are matched, and the query result is parsed; nodes and relations are then matched by traversing the Neo4j graph database. Neo4j supports both exact matching and fuzzy matching. An exact match is performed first. Exact matching matches only the keyword itself, so its coverage is lower but its accuracy is higher.

Fuzzy matching in Neo4j can be expressed in two ways. One uses the regular-expression operator =~ 'fuzzy matching object'. The other is based on position relations and uses predicates such as STARTS WITH, ENDS WITH, and CONTAINS. Unlike exact matching, fuzzy matching converts the keyword into a fuzzy matching object, so duplicate query results may occur, and a deduplication operation must be performed after the matching information is obtained. The matching node information is finally obtained.
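The matching order described above can be sketched in Python. The three query strings show how the exact, regular-expression, and CONTAINS forms might be written in Cypher; the node property name "name" and the parameter names are assumptions. The function match_nodes is a hypothetical in-memory stand-in that mirrors the same logic on a plain list of node names, and falling back to fuzzy matching only when the exact match returns nothing is likewise an assumption.

```python
import re

# Illustrative Cypher forms (property and parameter names are assumptions)
EXACT_QUERY = "MATCH (n) WHERE n.name = $kw RETURN n"
REGEX_QUERY = "MATCH (n) WHERE n.name =~ $pattern RETURN n"
CONTAINS_QUERY = "MATCH (n) WHERE n.name CONTAINS $kw RETURN n"

def match_nodes(names, keywords):
    """Exact match first; otherwise fuzzy match, followed by deduplication."""
    hits = [n for n in names if n in keywords]          # mirrors EXACT_QUERY
    if not hits:
        for kw in keywords:
            # the keyword becomes a "fuzzy matching object", as in REGEX_QUERY
            pattern = re.compile(".*" + re.escape(kw) + ".*")
            hits.extend(n for n in names if pattern.fullmatch(n))
    return list(dict.fromkeys(hits))                    # order-preserving dedup

names = ["phishing", "phishing SMS", "SMS fraud", "card theft"]
exact = match_nodes(names, ["phishing"])
fuzzy = match_nodes(names, ["SMS", "fraud"])
```

In the second call, "SMS fraud" is found by both keywords, which is the kind of duplicate the deduplication step removes.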

Because fraud information is complex, a large number of nodes are linked in complicated ways. To make better use of these links and realize a knowledge network in the true sense, the knowledge graph of the present disclosure has an expansion function module. When a user clicks a node of interest, expansion is performed with that node as the center, and the expanded nodes are displayed together with the original graph. Double-clicking a node invokes this function.

The result of a double click must not overwrite the graph that is already displayed, so three steps are needed. When the user double-clicks a node, the currently displayed graph information is first saved; a secondary query is then issued for the clicked node to retrieve the other nodes related to it; finally, all of the information is merged and displayed, which produces the graph expansion effect. Because saving the graph information of the current page in the first step can store part of the information repeatedly, duplicate nodes may appear on the page, so a deduplication operation is also added to this step: duplicate original nodes are overwritten when the current graph is saved. Visually, the expanded graph appears to spread outward from the clicked node. The display is smooth and the user experience is good. Fig. 14 shows the underlying operational flow of the double-click expansion.
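The three-step expansion with deduplication can be sketched as follows. This is an illustrative sketch only: it assumes nodes are represented by their names and links by (source, relation, target) tuples, and query_neighbors is a hypothetical callback standing in for the secondary query against the graph database.

```python
def expand_graph(current, clicked, query_neighbors):
    """Merge the displayed graph with the neighbours of the double-clicked node."""
    # Step 1: keep the graph that is currently displayed (passed in as `current`).
    # Step 2: secondary query for the nodes and links around the clicked node.
    extra = query_neighbors(clicked)
    # Step 3: merge both parts; dict.fromkeys drops duplicates and keeps order,
    # so nodes and links that were already on the page are not shown twice.
    nodes = list(dict.fromkeys(current["nodes"] + extra["nodes"]))
    links = list(dict.fromkeys(current["links"] + extra["links"]))
    return {"nodes": nodes, "links": links}

def fake_neighbors(node):
    # Hypothetical stand-in for the database query
    return {"nodes": ["B", "C"], "links": [("B", "s", "C"), ("A", "r", "B")]}

shown = {"nodes": ["A", "B"], "links": [("A", "r", "B")]}
expanded = expand_graph(shown, "B", fake_neighbors)
```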

The knowledge graph search website designed in the present disclosure mainly comprises two pages: a knowledge graph search page and a knowledge graph display and interaction page. The search page obtains the keywords entered by the user and passes them to the back end, which queries the Neo4j database to find the other nodes and relations connected to the keyword nodes, and returns these nodes and relations to the knowledge graph display page for rendering.

The query is submitted by clicking the "search" button on the right side of the search box, which opens the search sub-page of the bank fraud information search platform. This page integrates functions such as graph search, graph expansion, recommended searches, most-viewed items, and attribute display. The knowledge graph is visualized by displaying the network structure of the bank fraud graph in a point-line-point form.

Based on the knowledge graph application method based on the fraud information, the disclosure also provides an application device of the knowledge graph based on the fraud information. The application apparatus 10 based on the knowledge-graph of the fraud information will be described in detail with reference to fig. 15 and 16.

Fig. 15 schematically illustrates a block diagram of the application apparatus 10 based on knowledge-graph of fraud information according to an embodiment of the present disclosure.

The application apparatus 10 of the knowledge graph based on fraud information comprises a first construction module 1, a first determination module 2, a second determination module 3, and a matching module 4.

The first construction module 1 is configured to perform operation S210: constructing a knowledge graph in real time according to fraud information of an in-bank system obtained in real time.

The first determination module 2 is configured to perform operation S220: determining a query result in response to a query request, wherein the query request comprises keywords related to nodes and/or edges, and the query result comprises the nodes and/or edges in the knowledge graph that correspond to the query request, together with associated information of the nodes and/or edges.

The second determination module 3 is configured to perform operation S230: determining a related answer from the knowledge graph in response to the query request by using a pre-constructed question-answer model.

The matching module 4 is configured to perform operation S240: matching a fraud prevention prompt according to the query result or the related answer.

Fig. 16 schematically shows a block diagram of the first construction module 1 according to an embodiment of the present disclosure.

The first construction module 1 is configured to construct the knowledge graph in real time according to the fraud information of the in-bank system obtained in real time, and may include an extraction unit 11, an alignment unit 12, and a first construction unit 13.

The extraction unit 11 is configured to extract entities and relations from the fraud information of the in-bank system obtained in real time by using a pre-constructed entity relation extraction model, so as to obtain a triplet pre-selection set.

The alignment unit 12 is configured to align the entities in the triplet pre-selection set to obtain a triplet set.

The first construction unit 13 is configured to construct the knowledge graph according to the triplet set.

According to some embodiments of the present disclosure, the alignment unit may include a calculation element, a judgment element, a deletion element, and a replacement element.

The calculation element is configured to calculate the similarity between every two entities in the triplet pre-selection set.

The judgment element is configured to judge, when the similarity between two entities meets a set threshold, whether the relations in the triplets in which the two entities are located are consistent.

The deletion element is configured to delete one of the triplets in which the two entities are located when the relations in those triplets are consistent.

The replacement element is configured to replace one of the two entities with the other when the relations in the triplets in which the two entities are located are inconsistent.
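The judge, delete, and replace logic of the alignment unit can be sketched as below. The sketch rests on several assumptions that the text does not fix: triplets are (head, relation, tail) tuples, only head entities are compared, the first spelling encountered is kept as the canonical one, and the similarity function and the threshold value 0.8 are supplied by the caller. The name align_triples is hypothetical.

```python
def align_triples(triples, similarity, threshold=0.8):
    """Align near-duplicate head entities in a triplet pre-selection set."""
    aligned = []
    for head, rel, tail in triples:
        keep = True
        for kept_head, kept_rel, _ in aligned:
            if head != kept_head and similarity(head, kept_head) >= threshold:
                if rel == kept_rel:
                    keep = False        # consistent relation: delete one triplet
                else:
                    head = kept_head    # inconsistent relation: replace the entity
                break
        if keep:
            aligned.append((head, rel, tail))
    return aligned

def same_without_dots(a, b):
    # Toy similarity used only for this example
    return 1.0 if a.replace(".", "") == b.replace(".", "") else 0.0

pre_selection = [("ICBC", "issues", "card"),
                 ("I.C.B.C", "issues", "card"),
                 ("I.C.B.C", "located_in", "Beijing")]
triplet_set = align_triples(pre_selection, same_without_dots)
```

In this example the second triplet is dropped as a duplicate of the first, and the third is kept with its head entity rewritten to the canonical spelling.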

According to some embodiments of the present disclosure, the calculation element may include a first calculation part, a second calculation part, and a third calculation part.

The first calculation part is configured to calculate a first similarity between every two entities in the triplet pre-selection set by using a Dice distance method.

The second calculation part is configured to calculate a second similarity between the two entities by using an edit distance method.

The third calculation part is configured to perform a weighted summation of the first similarity and the second similarity to obtain the similarity between every two entities in the triplet pre-selection set.
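One possible reading of the three calculation parts is sketched below. It assumes that the first similarity is a Dice coefficient over character bigrams, that the second is one minus the Levenshtein edit distance normalized by the longer string, and that the two weights are equal; none of these specifics is stated in this section, and all function names are hypothetical.

```python
def dice_similarity(a, b):
    """Dice coefficient over character bigrams: 2|A & B| / (|A| + |B|)."""
    ba = {a[i:i + 2] for i in range(len(a) - 1)}
    bb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not ba and not bb:                 # strings too short to have bigrams
        return 1.0 if a == b else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def edit_similarity(a, b):
    """One minus the Levenshtein distance divided by the longer length."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - prev[-1] / longest

def entity_similarity(a, b, w_dice=0.5, w_edit=0.5):
    """Weighted sum of the first (Dice) and second (edit) similarities."""
    return w_dice * dice_similarity(a, b) + w_edit * edit_similarity(a, b)
```

The combined score lies between 0 and 1 when the weights sum to 1, which makes it straightforward to compare against a set threshold.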

According to some embodiments of the present disclosure, the application apparatus of the knowledge graph based on fraud information further includes a second construction module for constructing the entity relation extraction model in advance, and the second construction module may include a first training unit, a first verification unit, a first determination unit, and a first repetition unit.

The first training unit is configured to perform operation S41: training, according to the label of each word in the training text data, the extraction rules for the three elements of a triplet in the entity relation extraction model to obtain pre-extraction rules, where the triplet includes a first entity, a relation between the first entity and a second entity, and the second entity.

The first verification unit is configured to perform operation S42: verifying the pre-extraction rules of the entity relation extraction model by using verification text data.

The first determination unit is configured to perform operation S43: if the verification is passed, applying the pre-extraction rules as the extraction rules of the entity relation extraction model.

The first repetition unit is configured to perform operation S44: if the verification is not passed, repeating operations S41 and S42 until the verification is passed.
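The train, verify, adopt, or repeat cycle of operations S41 to S44 can be sketched generically. The callbacks train_step and validate are hypothetical placeholders for the actual training and verification procedures, and the max_rounds cap is an added safeguard that the text does not mention.

```python
def build_model(train_step, validate, max_rounds=10):
    """Repeat training (S41) and verification (S42) until verification passes."""
    for round_no in range(1, max_rounds + 1):
        candidate = train_step(round_no)   # S41: train to obtain candidate rules
        if validate(candidate):            # S42: verify on held-out data
            return candidate               # S43: adopt the candidate
        # S44: verification failed, so train and verify again
    raise RuntimeError("verification did not pass within max_rounds")

# Toy callbacks: the candidate passes once three rounds have been run
rules = build_model(lambda r: {"round": r}, lambda c: c["round"] >= 3)
```

The same loop shape applies to any model that is trained and then checked against a separate verification set before being put into use.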

According to some embodiments of the present disclosure, the second determination module may include a splicing unit, a prediction unit, and a ranking unit.

The splicing unit is configured to splice the question vector of the query request with each of m pre-selected related answer vectors in the knowledge graph by using a pre-constructed vector splicing model to obtain m spliced vectors, where m is an integer greater than or equal to 1.

The prediction unit is configured to predict a probability value for each of the m spliced vectors by using a pre-constructed probability prediction model.

The ranking unit is configured to determine one of the m pre-selected related answers as the related answer according to the ranking of the m probability values.
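The splice, predict, and rank flow can be illustrated with a small sketch. It substitutes plain concatenation for the trained vector splicing model and a logistic function over a linear score for the trained probability prediction model; the weights, the bias, and the function name pick_answer are hypothetical.

```python
import math

def pick_answer(question_vec, answer_vecs, weights, bias=0.0):
    """Return (index, probability) of the best of the m pre-selected answers."""
    scored = []
    for idx, answer_vec in enumerate(answer_vecs):
        spliced = list(question_vec) + list(answer_vec)   # splice question and answer
        logit = sum(w * x for w, x in zip(weights, spliced)) + bias
        prob = 1.0 / (1.0 + math.exp(-logit))             # predicted probability
        scored.append((prob, idx))
    best_prob, best_idx = max(scored, key=lambda item: item[0])  # rank, take top
    return best_idx, best_prob

question = [1.0, 0.0]
answers = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]   # m = 3 candidate answer vectors
best_idx, best_prob = pick_answer(question, answers, weights=[1.0, 0.0, 1.0, 0.0])
```

Here the second candidate receives the highest score, so it would be returned as the related answer.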

According to some embodiments of the present disclosure, the application apparatus of the knowledge graph based on fraud information further includes a third construction module for constructing the vector splicing model in advance, and the third construction module may include a second training unit, a second verification unit, a second determination unit, and a second repetition unit.

The second training unit is configured to perform operation S61: training the splicing parameters in the vector splicing model according to training samples to obtain trained splicing parameters, where each training sample comprises a question vector and a pre-selected related answer vector corresponding to the question vector.

The second verification unit is configured to perform operation S62: verifying the trained splicing parameters of the vector splicing model by using verification samples.

The second determination unit is configured to perform operation S63: if the verification is passed, applying the trained splicing parameters as the model parameters of the vector splicing model.

The second repetition unit is configured to perform operation S64: if the verification is not passed, repeating operations S61 and S62 until the verification is passed.

According to some embodiments of the present disclosure, the application apparatus of the knowledge graph based on fraud information further includes a fourth construction module for constructing the probability prediction model in advance, and the fourth construction module may include a third training unit, a third verification unit, a third determination unit, and a third repetition unit.

The third training unit is configured to perform operation S71: training the probability prediction parameters in the probability prediction model according to spliced-vector training samples to obtain trained probability prediction parameters.

The third verification unit is configured to perform operation S72: verifying the trained probability prediction parameters of the probability prediction model by using spliced-vector verification samples.

The third determination unit is configured to perform operation S73: if the verification is passed, applying the trained probability prediction parameters as the model parameters of the probability prediction model.

The third repetition unit is configured to perform operation S74: if the verification is not passed, repeating operations S71 and S72 until the verification is passed.

In addition, according to an embodiment of the present disclosure, any number of the first construction module 1, the first determination module 2, the second determination module 3, and the matching module 4 may be combined and implemented in one module, or any one of them may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module.

According to embodiments of the present disclosure, at least one of the first construction module 1, the first determination module 2, the second determination module 3, and the matching module 4 may be implemented at least partly as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC); or in hardware or firmware by any other reasonable way of integrating or packaging circuits; or in any one of, or a suitable combination of, the three implementation forms of software, hardware, and firmware.

Alternatively, at least one of the first construction module 1, the first determination module 2, the second determination module 3, and the matching module 4 may be implemented at least partly as a computer program module which, when run, performs the corresponding function.

Fig. 17 schematically illustrates a block diagram of an electronic device adapted to implement the above-described method according to an embodiment of the present disclosure.

As shown in fig. 17, an electronic device 900 according to an embodiment of the present disclosure includes a processor 901 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. The processor 901 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 901 may also include on-board memory for caching purposes. Processor 901 may include a single processing unit or multiple processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. The processor 901 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the program may be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.

According to an embodiment of the disclosure, the electronic device 900 may also include an input/output (I/O) interface 905, the input/output (I/O) interface 905 also being connected to the bus 904. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to an input/output (I/O) interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 902 and/or RAM 903 and/or one or more memories other than ROM 902 and RAM 903 described above.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to perform the methods of embodiments of the present disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

In one embodiment, the computer program may be based on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed, and downloaded and installed in the form of a signal on a network medium, via communication portion 909, and/or installed from removable medium 911. The computer program may include program code that may be transmitted using any appropriate network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 901. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, the "C" language, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computing device (for example, via the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be combined and/or integrated in a variety of ways without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.

The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. A method for applying a knowledge graph based on fraud information, characterized by comprising:

constructing a knowledge graph in real time according to fraud information of an in-bank system obtained in real time;

determining a query result in response to a query request, wherein the query request comprises keywords related to nodes and/or edges, and the query result comprises the nodes and/or edges in the knowledge graph that correspond to the query request and associated information of the nodes and/or edges;

determining a related answer from the knowledge graph in response to the query request by using a pre-constructed question-answer model; and

matching a fraud prevention prompt according to the query result or the related answer,

wherein constructing the knowledge graph in real time according to the fraud information of the in-bank system obtained in real time comprises:

performing entity and relation extraction on the fraud information of the in-bank system obtained in real time by using a pre-constructed entity relation extraction model to obtain a triplet pre-selection set;

aligning the entities in the triplet pre-selection set to obtain a triplet set; and

constructing the knowledge graph according to the triplet set.

2. The method of claim 1, wherein aligning the entities in the triplet pre-selection set to obtain the triplet set comprises:

calculating the similarity between every two entities in the triplet pre-selection set;

when the similarity between two entities meets a set threshold, judging whether the relations in the triplets in which the two entities are located are consistent;

when the relations in the triplets in which the two entities are located are consistent, deleting one of the triplets in which the two entities are located; and

when the relations in the triplets in which the two entities are located are inconsistent, replacing one of the two entities with the other.

3. The method of claim 2, wherein calculating the similarity between every two entities in the triplet pre-selection set comprises:

calculating a first similarity between every two entities in the triplet pre-selection set by using a Dice distance method;

calculating a second similarity between the two entities by using an edit distance method; and

performing a weighted summation of the first similarity and the second similarity to obtain the similarity between every two entities in the triplet pre-selection set.

4. The method of claim 1, wherein pre-constructing the entity relation extraction model comprises:

operation S41, training the extraction rules for the three elements of a triplet in the entity relation extraction model according to the label of each word in training text data to obtain pre-extraction rules, wherein the triplet comprises a first entity, a relation between the first entity and a second entity, and the second entity;

operation S42, verifying the pre-extraction rules of the entity relation extraction model by using verification text data;

operation S43, if the verification is passed, applying the pre-extraction rules as the extraction rules of the entity relation extraction model; and

operation S44, if the verification is not passed, repeating operations S41 and S42 until the verification is passed.

5. The method of claim 1, wherein determining the related answer from the knowledge graph in response to the query request by using the pre-constructed question-answer model comprises:

splicing the question vector of the query request with each of m pre-selected related answer vectors in the knowledge graph by using a pre-constructed vector splicing model to obtain m spliced vectors, wherein m is an integer greater than or equal to 1;

predicting a probability value for each of the m spliced vectors by using a pre-constructed probability prediction model; and

determining one of the m pre-selected related answers as the related answer according to the ranking of the m probability values.

6. The method of claim 5, wherein pre-constructing the vector splicing model comprises:

operation S61, training the splicing parameters in the vector splicing model according to training samples to obtain trained splicing parameters, wherein each training sample comprises a question vector and a pre-selected related answer vector corresponding to the question vector;

operation S62, verifying the trained splicing parameters of the vector splicing model by using verification samples;

operation S63, if the verification is passed, applying the trained splicing parameters as the model parameters of the vector splicing model; and

operation S64, if the verification is not passed, repeating operations S61 and S62 until the verification is passed.

7. The method of claim 5, wherein pre-constructing the probability prediction model comprises:

operation S71, training the probability prediction parameters in the probability prediction model according to spliced-vector training samples to obtain trained probability prediction parameters;

operation S72, verifying the trained probability prediction parameters of the probability prediction model by using spliced-vector verification samples;

operation S73, if the verification is passed, applying the trained probability prediction parameters as the model parameters of the probability prediction model; and

operation S74, if the verification is not passed, repeating operations S71 and S72 until the verification is passed.

8. An application apparatus of a knowledge graph based on fraud information, characterized by comprising:

a first construction module, configured to construct a knowledge graph in real time according to fraud information of an in-bank system obtained in real time;

a first determination module, configured to determine a query result in response to a query request, wherein the query request comprises keywords related to nodes and/or edges, and the query result comprises the nodes and/or edges in the knowledge graph that correspond to the query request and associated information of the nodes and/or edges;

a second determination module, configured to determine a related answer from the knowledge graph in response to the query request by using a pre-constructed question-answer model; and

a matching module, configured to match a fraud prevention prompt according to the query result or the related answer,

and constructing a knowledge graph according to the triplet set.

9. An electronic device, comprising:

one or more processors;

one or more memories for storing executable instructions which, when executed by the one or more processors, implement the method of any of claims 1-7.

10. A computer readable storage medium, characterized in that the storage medium has stored thereon executable instructions which, when executed by a processor, implement the method according to any of claims 1-7.

11. A computer program product comprising a computer program comprising one or more executable instructions which when executed by a processor implement the method according to any one of claims 1 to 7.