CN111597333B - Event and event element extraction method and device for the blockchain field - Google Patents

Publication number
CN111597333B
Authority
CN
China
Prior art keywords
graph
event
block chain
text
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010343965.6A
Other languages
Chinese (zh)
Other versions
CN111597333A (en
Inventor
陈志鹏
刘春阳
张丽
姜文华
张旭
孙旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN202010343965.6A
Publication of CN111597333A
Application granted
Publication of CN111597333B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and device for extracting events and event elements in the blockchain field. The method comprises two steps. Step one: cluster web texts based on a blockchain keyword graph to obtain a blockchain text aggregation word graph. Step two: based on the blockchain text aggregation word graph, construct an event and event element extraction method that uses graph representation learning with a graph attention mechanism. The blockchain text aggregation word graph is first taken as input; a deep learning model based on the graph attention network (GAT) learns representations of the words; and the extraction model is trained on events and their elements until it converges. A TensorFlow background interface is then implemented on the converged model; a new text to be extracted is passed to this interface for prediction, and the extraction output is returned. The invention can accurately extract events and their event elements.

Description

Event and event element extraction method and device for the blockchain field
Technical Field
The invention discloses an event and event element extraction method and device for the blockchain field. It relates to the field of web text analysis, and in particular to a method for extracting events and their event elements from internet text in the blockchain field.
Background
In recent years, with the development of information technology, industry has been moving toward the industrial internet. Technologies such as blockchain, big data and artificial intelligence make digital assets and services more efficient, faster and more secure. The core characteristics of blockchain are openness, transparency and tamper resistance, and it is expected to spread into industries such as digital currency and finance. For example, in 2016 Ant Financial and the China Social Assistance Foundation cooperated to launch a blockchain public-welfare fundraising project, "Let hearing-impaired children regain new sound", on the Alipay charity donation platform, raising funds for 10 hearing-impaired children. Blockchain platforms are thus gradually being applied to problems in public-welfare charity, government cooperation, material management, enterprise financing, citizen identity and other fields. For web text, events associated with a blockchain platform become open and transparent, which makes blockchain-oriented text analysis, such as event extraction and event element (for example, blockchain entity) analysis, particularly important.
The main problems of existing text extraction algorithms are that their text representations are high-dimensional and sparse and have weak feature expression capability; in addition, feature engineering must be done manually, which is costly.
Also in recent years, a great deal of research has focused on representation learning for web text analysis over graphs, a data structure that can be used in many ways. The Graph Attention Network (GAT) provides a very efficient method for analyzing graph-structured data. It is a model that enhances representations by using neighborhood information, and graph-structure representation learning is widely applied in research.
Disclosure of Invention
The invention aims to provide a method and a device for extracting events and event elements in the blockchain field. Web texts are aggregated using a keyword graph associated with blockchain, which implements the clustering of the texts. Event extraction and element extraction are then realized through network representation learning based on the graph attention mechanism. For the extraction of web events and event elements in the blockchain field, the invention constructs the keyword graph after the web texts are aggregated and models representation learning with a graph attention network. This encodes not only the semantic information of the texts but also the structural information of the word graph, which increases the accuracy of event extraction and event element extraction.
The invention adopts the following technical scheme:
An event and event element extraction method for the blockchain field comprises the following steps:
Step one, clustering the web texts based on the blockchain keyword graph to obtain a blockchain text aggregation word graph, as shown in Fig. 1, specifically as follows:
S11, taking "blockchain" as the seed word, screening for the texts that contain the seed word;
S12, performing word segmentation and stop-word removal on the texts, and pre-training on the blockchain texts with the Gensim tool to obtain vector representations of the words;
S13, obtaining clusters of text word graphs with similar semantics by using a word graph clustering algorithm;
S14, calculating the TF-IDF value of each word in each text, and extracting, among the words related to the seed word, the 30 words with the largest TF-IDF values;
S15, taking the average of the word vectors of the 30 words obtained in step S14 as the vector representation of the blockchain-related text;
S16, specifying a candidate set for the number of clusters k, clustering with a Gaussian mixture model for each candidate k, and selecting the clustering result of the k with the largest silhouette coefficient as the final result. The silhouette coefficient is given by:
$$sim_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where a_i is the mean Euclidean distance from node i to the other points in its own cluster, and b_i is the minimum, over the other clusters, of the mean Euclidean distance from node i to the nodes of that cluster;
S17, applying a set threshold to the silhouette coefficient sim_i to obtain the blockchain text aggregation word graph.
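Steps S14 and S15 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `top_tfidf_words` and `mean_vector` are hypothetical, TF-IDF is computed with the common `tf * log(N / df)` form, the filter for "words related to the seed word" is omitted, and the word vectors are assumed to come from the pre-training in S12.

```python
import math
from collections import Counter


def top_tfidf_words(docs, doc_index, n=30):
    """Return the n words of docs[doc_index] with the largest TF-IDF values.

    docs is a list of tokenised texts (lists of words). Hypothetical helper.
    """
    num_docs = len(docs)
    # Document frequency: in how many texts each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    doc = docs[doc_index]
    tf = Counter(doc)
    scores = {
        word: (count / len(doc)) * math.log(num_docs / df[word])
        for word, count in tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n]


def mean_vector(words, vectors):
    """Average the vectors of the given words; words without a vector are skipped."""
    found = [vectors[w] for w in words if w in vectors]
    if not found:
        raise ValueError("no word vectors available for the given words")
    dim = len(found[0])
    return [sum(v[d] for v in found) / len(found) for d in range(dim)]
```

With n=30, the result of `mean_vector(top_tfidf_words(docs, i), vectors)` would play the role of the text vector described in S15.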
Step two, based on the blockchain text aggregation word graph obtained in step one, constructing the event and event element extraction method using graph representation learning with a graph attention mechanism, as shown in Fig. 2. First, the blockchain text aggregation word graph is taken as input; a deep learning model based on the graph attention network (GAT) learns representations of the words; and the extraction model is trained on events and their elements until it converges. A TensorFlow background interface is then implemented on the converged model; a new text to be extracted is passed to this interface for prediction, and the extraction output is returned. Specifically:
S21: keyword graph representation learning with the graph attention mechanism.
Graph attention network representation learning is defined as learning a hidden feature vector representation of the word graph:
[formula images not reproduced in the text]
In this formula, softmax is a classification function used to determine whether the word, i.e., the extracted content, belongs to an event; W is a parameter matrix that is learned during model training; and [symbol image not reproduced in the text] denotes the neighbor node j of the word in the word graph. a_ij is the attention weight, defined as:
[formula image not reproduced in the text]
where e_ij is defined as the correlation between the hidden vectors of the two events:
[formula image not reproduced in the text]
where w is the parameter matrix in the e_ij function and is learned during model training; [symbol image not reproduced in the text] is a unit vector, which makes it convenient to adjust the parameter dimension; and [symbol image not reproduced in the text] is the transpose of the unit vector.
S22, by model design, taking the labeled (ground-truth) samples of event trigger words and event elements and training the model with a cross-entropy loss until the model converges.
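For orientation, the widely used single-head formulation of graph attention can be written as below. This is the standard form from the GAT literature, given as an assumption about the kind of expression the formula images in S21 contain, not as a transcription of them. The symbols $h_i$ (feature vector of node i), $\mathcal{N}_i$ (neighbors of node i), $\mathbf{a}$ (attention vector), $\Vert$ (concatenation) and the LeakyReLU nonlinearity belong to that standard form; the text above instead mentions a parameter matrix w and a unit vector in the e_ij function, and names softmax as the outer function in place of the generic $\sigma$.

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\,W h_i \,\Vert\, W h_j\,\right]\right),
\qquad
a_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},
\qquad
h_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}_i} a_{ij}\, W h_j\Big)
```

The normalization in the middle expression makes the attention weights of each node sum to 1 over its neighbors, so the new representation is a weighted average of the projected neighbor features.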
The invention builds an event and element extraction device using a neural network based on the graph attention network (GAT) model, comprising:
an information input module, which performs standardized text processing on the source text acquired from an external database and then imports the processed source text;
a text aggregation module based on the blockchain keyword graph, which applies the word graph aggregation method to build word graphs from the input source text and aggregate them into the blockchain text word graph;
an event and element extraction module using the deep learning model based on the graph attention network GAT, which implements a TensorFlow background service for the model, parses incoming web HTTP requests, and then calls the model to perform the extraction;
and an information output module, which outputs the extraction results of the event and element extraction module in database form.
The event and event element extraction method and device for the blockchain field achieve the following technical effect: the text aggregation module based on the blockchain keyword graph applies the word graph aggregation method to build word graphs from the input source text and aggregate them into the blockchain text word graph, so that events and their event elements can be accurately extracted.
Drawings
Fig. 1 is a flow chart of web text aggregation and keyword graph construction in the blockchain field;
Fig. 2 is a flow chart of event and element extraction with the deep learning model based on the graph attention network GAT.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
An event and event element extraction method for the blockchain field comprises the following steps:
Step one, clustering web texts based on a blockchain keyword graph, as shown in Fig. 1:
S11, taking "blockchain" as the seed word, screening for the texts that contain the seed word;
S12, performing word segmentation and stop-word removal on the texts, and pre-training on the blockchain texts with the LDA interface (topic model analysis interface) of the Gensim toolkit to obtain vector representations of the words;
S13, obtaining clusters of text word graphs with similar semantics by using the K-means word graph clustering algorithm;
S14, calculating the TF-IDF value of each word in each text, and extracting, among the words related to the seed word, the 30 words with the largest TF-IDF values;
S15, taking the average of the word vectors of the 30 words obtained in step S14 as the vector representation of the blockchain-related text;
S16, specifying a candidate set for the number of clusters k, clustering with a Gaussian mixture model for each candidate k, and selecting the clustering result of the k with the largest silhouette coefficient as the final result. The silhouette coefficient is given by:
$$sim_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where a_i is the mean Euclidean distance from node i to the other points in its own cluster, and b_i is the minimum, over the other clusters, of the mean Euclidean distance from node i to the nodes of that cluster;
S17, applying a set threshold to the silhouette coefficient sim_i to obtain the blockchain text aggregation word graph.
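The selection rule in S16 can be illustrated with a small plain-Python sketch. It assumes the standard silhouette definition built from the a_i and b_i described above, and it assumes the cluster labels for each candidate k have already been produced elsewhere (for example by a Gaussian mixture model from a library); the clustering itself is not implemented here. The names `silhouette_scores` and `pick_best_k`, and the convention of scoring singleton clusters as 0, are choices made for this illustration.

```python
import math


def silhouette_scores(points, labels):
    """Per-point silhouette coefficient (b_i - a_i) / max(a_i, b_i)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # convention: singleton or single-cluster case
            continue
        # a_i: mean distance to the other points of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b_i: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def pick_best_k(points, labelings):
    """labelings maps each candidate k to its cluster labels; return the best k."""
    def mean_score(k):
        s = silhouette_scores(points, labelings[k])
        return sum(s) / len(s)
    return max(labelings, key=mean_score)
```

For four points forming two well-separated pairs, a labeling with k=2 scores higher than one that splits a pair into singletons, so `pick_best_k` returns 2 in that case.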
Step two, based on the blockchain text aggregation word graph obtained in step one, constructing the event and event element extraction method using graph representation learning with a graph attention mechanism, as shown in Fig. 2. First, the blockchain text aggregation word graph is taken as input; a deep learning model based on the graph attention network (GAT) learns representations of the words; and the extraction model is trained on events and their elements until it converges. A TensorFlow background interface is then implemented on the converged model; a new text to be extracted is passed to this interface for prediction, and the extraction output is returned. Specifically:
S21: keyword graph representation learning with the graph attention mechanism.
Graph attention network representation learning is defined as learning a hidden feature vector representation of the word graph:
[formula images not reproduced in the text]
In this formula, softmax is a classification function used to determine whether the word, i.e., the extracted content, belongs to an event; W is a parameter matrix that is learned during model training; and [symbol image not reproduced in the text] denotes the neighbor node j of the word in the word graph. a_ij is the attention weight, defined as:
[formula image not reproduced in the text]
where e_ij is defined as the correlation between the hidden vectors of the two events:
[formula image not reproduced in the text]
where w is the parameter matrix in the e_ij function and is learned during model training; [symbol image not reproduced in the text] is a unit vector, which makes it convenient to adjust the parameter dimension; and [symbol image not reproduced in the text] is the transpose of the unit vector.
S22, by model design, taking the labeled (ground-truth) samples of event trigger words and event elements and training the model with a cross-entropy loss until the model converges.
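A forward pass of one attention head and the cross-entropy loss of S22 can be sketched in plain Python as follows. This is an illustration of the standard graph attention layer, not the model of the invention: the score function `LeakyReLU(a . [W h_i || W h_j])`, the attention vector `a`, the helper names, and the requirement that every node lists at least one neighbor (typically itself) are all assumptions, and training itself (gradient updates in TensorFlow) is not shown.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(values):
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(features, neighbors, W, a):
    """One attention head; returns (new_features, attention_weights).

    neighbors[i] lists the neighbor indices of node i and must be non-empty.
    """
    projected = [matvec(W, h) for h in features]
    new_features, weights = [], []
    for i, nbrs in enumerate(neighbors):
        # Unnormalised score for each neighbor j of node i.
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, projected[i] + projected[j])))
            for j in nbrs
        ]
        alpha = softmax(scores)  # attention weights sum to 1 over the neighbors
        dim = len(projected[i])
        new_features.append(
            [sum(al * projected[j][d] for al, j in zip(alpha, nbrs)) for d in range(dim)]
        )
        weights.append(alpha)
    return new_features, weights


def cross_entropy(logits, label):
    """Cross-entropy loss of one sample given class logits and the true class index."""
    return -math.log(softmax(logits)[label])
```

With an all-zero attention vector every neighbor receives the same weight, so the layer reduces to averaging the projected neighbor features; a learned `a` lets the model weight informative neighbor words more heavily.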
The invention builds an event and element extraction device using a neural network based on the graph attention network (GAT) model, comprising:
an information input module, which performs standardized text processing on the source text acquired from an external database and then imports the processed source text;
a text aggregation module based on the blockchain keyword graph, which applies the word graph aggregation method to build word graphs from the input source text and aggregate them into the blockchain text word graph;
an event and element extraction module using the deep learning model based on the graph attention network GAT, which implements a TensorFlow background service for the model, parses incoming web HTTP requests, and then calls the model to perform the extraction;
and an information output module, which outputs the extraction results of the event and element extraction module in database form.
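The way the four modules hand data to one another can be sketched as a small pipeline. Everything here is a stand-in chosen for illustration: the function names, the whitespace normalization, the seed-word filter used in place of word graph aggregation, the `model` callable in place of the TensorFlow service, and the SQLite table layout are assumptions, and the HTTP request handling is left out.

```python
import sqlite3


def normalize_text(raw):
    """Information input module (stand-in): collapse runs of whitespace."""
    return " ".join(raw.split())


def aggregate_texts(texts, seed_word="blockchain"):
    """Text aggregation module (stand-in): keep texts containing the seed word."""
    return [t for t in texts if seed_word in t.lower()]


def extract_events(text, model):
    """Extraction module: delegate to a callable returning (trigger, elements)."""
    return model(text)


def store_results(conn, rows):
    """Information output module: write extraction results to a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(source_text TEXT, trigger_word TEXT, elements TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(raw_texts, model, conn):
    """Run input, aggregation, extraction and output in sequence."""
    texts = aggregate_texts([normalize_text(t) for t in raw_texts])
    rows = []
    for text in texts:
        trigger, elements = extract_events(text, model)
        rows.append((text, trigger, ", ".join(elements)))
    store_results(conn, rows)
    return rows
```

Keeping each module behind a plain function makes it possible to swap the stand-ins for the real components (word graph aggregation, the served GAT model) without changing the surrounding flow.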
Example:
The following is a preferred embodiment of the present invention and further describes its technical solution, but the invention is not limited to this embodiment. Take as an example the text "In 2016, Ant Financial and the China Social Assistance Foundation cooperated to launch the blockchain public-welfare fundraising project 'Let hearing-impaired children regain new sound' on the Alipay charity donation platform, raising funds for 10 hearing-impaired children". The extraction results for the event and its elements are as follows:
[table images of the extraction results not reproduced in the text]
The event trigger word is "online" (launch), i.e., the launch event of a blockchain-related public-welfare fundraising project. The event elements include the time of the event; the executing subject, i.e., the party that performs the action denoted by the trigger word, which in this text is "Ant Financial, China Social Assistance Foundation"; and the object of that action, i.e., the launched "blockchain public-welfare fundraising project". The remaining elements include the influence of the event, the keywords of the event, and others.
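The extraction result described for this example can be held in a small record type. The sketch below is only one possible representation: the class name `ExtractedEvent` and its field names are hypothetical, and the field values are English renderings of the elements named in the paragraph above (trigger word, time, executing subjects, object), with the influence and keyword elements left empty because the text does not list their values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractedEvent:
    """One extracted event with its trigger word and event elements."""
    trigger: str
    time: str
    executing_subjects: List[str]
    target_object: str
    # Further elements such as influence or keywords; values not given in the text.
    other_elements: Dict[str, str] = field(default_factory=dict)


example = ExtractedEvent(
    trigger="online (launch)",
    time="2016",
    executing_subjects=["Ant Financial", "China Social Assistance Foundation"],
    target_object="blockchain public-welfare fundraising project",
)
```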

Claims (2)

1. An event and event element extraction method for the blockchain field, characterized in that the method comprises the following steps:
step one, clustering web texts based on a blockchain keyword graph to obtain a blockchain text aggregation word graph;
step two, based on the blockchain text aggregation word graph, constructing the event and event element extraction method using graph representation learning with a graph attention mechanism; first, taking the blockchain text aggregation word graph as input, learning representations of the word graph with a deep learning model based on the graph attention network GAT, and training the extraction model on events and their elements until the model converges; implementing a TensorFlow background interface on the converged model, predicting on a new text to be extracted through the background interface, and returning the extraction output;
the specific process of step one is as follows:
S11, taking "blockchain" as the seed word, screening for the texts that contain the seed word;
S12, performing word segmentation and stop-word removal on the texts, and pre-training on the blockchain texts with the Gensim tool to obtain vector representations of the words;
S13, obtaining clusters of text word graphs with similar semantics by using a word graph clustering algorithm;
S14, calculating the TF-IDF value of each word in each text, and extracting, among the words related to the seed word, the 30 words with the largest TF-IDF values;
S15, taking the average of the word vectors of the 30 words obtained in step S14 as the vector representation of the blockchain-related text;
S16, specifying a candidate set for the number of clusters k, clustering with a Gaussian mixture model for each candidate k, and selecting the clustering result of the k with the largest silhouette coefficient as the final result; the silhouette coefficient is given by:
$$sim_i = \frac{b_i - a_i}{\max(a_i, b_i)}$$
where a_i is the mean Euclidean distance from node i to the other points in its own cluster, and b_i is the minimum, over the other clusters, of the mean Euclidean distance from node i to the nodes of that cluster;
S17, applying a set threshold to the silhouette coefficient sim_i to obtain the blockchain text aggregation word graph.
2. The event and event element extraction method for the blockchain field according to claim 1, characterized in that the specific process of step two is as follows:
S21: keyword graph representation learning with the graph attention mechanism;
graph attention network representation learning is defined as learning a hidden feature vector representation of the word graph:
[formula images not reproduced in the text]
in this formula, softmax is a classification function used to determine whether the keyword graph, i.e., the extracted content, belongs to an event; W is a parameter matrix that is learned during model training; and [symbol image not reproduced in the text] denotes the neighbor node j of the keyword graph in the word graph; a_ij is the attention weight, defined as:
[formula image not reproduced in the text]
where e_ij is defined as the correlation between the hidden vectors of the two events:
[formula image not reproduced in the text]
where w is the parameter matrix in the e_ij function and is learned during model training; [symbol image not reproduced in the text] is a unit vector, which makes it convenient to adjust the parameter dimension; and [symbol image not reproduced in the text] is the transpose of the unit vector;
S22, by model design, taking the labeled (ground-truth) samples of event trigger words and event elements and training the model with a cross-entropy loss until the model converges.
CN202010343965.6A 2020-04-27 2020-04-27 Event and event element extraction method and device for block chain field Active CN111597333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010343965.6A CN111597333B (en) 2020-04-27 2020-04-27 Event and event element extraction method and device for block chain field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010343965.6A CN111597333B (en) 2020-04-27 2020-04-27 Event and event element extraction method and device for block chain field

Publications (2)

Publication Number Publication Date
CN111597333A CN111597333A (en) 2020-08-28
CN111597333B true CN111597333B (en) 2022-08-02

Family

ID=72185081

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010343965.6A Active CN111597333B (en) 2020-04-27 2020-04-27 Event and event element extraction method and device for block chain field

Country Status (1)

Country Link
CN (1) CN111597333B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347249B (en) * 2020-10-30 2024-02-27 中科曙光南京研究院有限公司 Alert condition element extraction system and extraction method thereof
CN112989031B (en) * 2021-04-28 2021-08-03 成都索贝视频云计算有限公司 Broadcast television news event element extraction method based on deep learning
CN113536077B (en) * 2021-05-31 2022-06-17 烟台中科网络技术研究所 Mobile APP specific event content detection method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763333A (en) * 2018-05-11 2018-11-06 北京航空航天大学 A kind of event collection of illustrative plates construction method based on Social Media
CN109582949A (en) * 2018-09-14 2019-04-05 阿里巴巴集团控股有限公司 Event element abstracting method, calculates equipment and storage medium at device
CN109871532A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Text subject extracting method, device and storage medium
CN110489541A (en) * 2019-07-26 2019-11-22 昆明理工大学 Case-involving public sentiment newsletter archive method of abstracting based on case element and BiGRU

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11574122B2 (en) * 2018-08-23 2023-02-07 Shenzhen Keya Medical Technology Corporation Method and system for joint named entity recognition and relation extraction using convolutional neural network

Also Published As

Publication number Publication date
CN111597333A (en) 2020-08-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant