CN112256840A - Device for carrying out industrial internet discovery and extracting information by improving transfer learning model - Google Patents
- Publication number
- CN112256840A (application number CN202011256306.5A)
- Authority
- CN
- China
- Prior art keywords
- model
- industrial internet
- classification
- sentence
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
A device for discovering industrial internet platforms and extracting information by means of an improved transfer learning model, relating to the field of information technology. The device consists of a web crawler, a text cleaning module, a content classification execution module, an improved transfer learning model and an entity identification module. First, the invention does not need massive labeled text for training, which saves a great deal of labor cost; second, it is not affected by word segmentation, so more relevant text features can be obtained for website classification and for extracting key service information from industrial internet platform websites.
Description
Technical Field
The invention relates to the field of information technology, and in particular to the field of information security.
Background
As the manufacturing industry accelerates from the digitalization stage to the networking stage, industrial internet platforms in China are emerging rapidly, and the timely discovery and management of platform information has become an urgent problem. The internet contains many types of websites. The first problem is how to automatically find industrial internet platform websites among a large number of websites; the second is how to extract key platform information from the content of those websites.
At present, industrial internet platform information is mainly collected manually, which costs labor and time, so a method for automatically discovering and extracting platform information is urgently needed.
In recent years, the rapid development of artificial intelligence technology has brought considerable progress in the field of natural language processing, in which text classification is used to distinguish texts with different characteristics, and named entity recognition is mainly used for information extraction and for structuring text data.
Current website classification methods are mainly based either on traditional machine learning algorithms or on deep learning. Traditional machine learning methods, such as invention patent CN106168968A, determine the website category by calculating the weights of data matched against a dictionary. Because the dictionary is difficult to construct and website types are numerous, such methods struggle to classify websites accurately. Deep learning methods, such as invention patent CN110442823A, require a large number of training samples to train the parameters of the neural network; collecting these samples takes a long time and consumes a large amount of human resources.
In the prior art, named entity recognition methods are mainly based either on traditional machine learning or on deep learning. Methods based on traditional machine learning, such as invention patent CN111274804A, learn a statistical model from labeled data; the data to be predicted is fed to the model, which uses the Viterbi algorithm to compute the most likely entities. The biggest defect of such methods is that they cannot understand semantics and are not suited to complex entity recognition tasks. Methods based on deep learning, such as invention patent CN111126068A, build a neural network model to learn semantic features and can learn more complex semantics, but they need a large amount of labeled data, and data labeling is very time-consuming and labor-intensive.
In view of the high complexity, high implementation cost and heavy labor consumption of the prior art, the device of the invention improves the transfer learning model: sharing the per-layer parameters of the model improves its computational efficiency, so that classified industrial internet sample data can be modeled quickly to obtain an industrial internet classification model. Real-time data is then obtained through web crawling and data cleaning and is input into the industrial internet classification model to obtain its industrial internet classification. Key information is then extracted from the real-time data to obtain updated industrial internet sample data, which is merged into the classified industrial internet sample data. The invention thus completes industrial internet classification and information extraction automatically throughout the whole process, and gradually corrects and enriches the classified industrial internet sample data, so that the industrial internet classification model continuously evolves and improves. The invention is efficient and operates in real time.
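The effect of sharing layer parameters on model size can be illustrated with simple arithmetic. The following is a minimal sketch, not the applicant's implementation: the function names are hypothetical, the layer count and dimensions (12 layers, hidden size 768, feed-forward size 3072) are assumed BERT-base-like values that do not appear in this document, and biases and layer normalization weights are ignored.

```python
def encoder_layer_params(hidden: int, ffn: int) -> int:
    """Approximate weight count of one Transformer encoder layer.

    Biases and LayerNorm weights are deliberately omitted (assumption).
    """
    attention = 4 * hidden * hidden   # query, key, value and output projections
    feed_forward = 2 * hidden * ffn   # the two dense layers of the feed-forward block
    return attention + feed_forward


def encoder_params(layers: int, hidden: int, ffn: int, shared: bool) -> int:
    """Total encoder weights with independent or cross-layer shared parameters."""
    per_layer = encoder_layer_params(hidden, ffn)
    # With sharing, every layer reuses the same weights, so only one layer is stored.
    return per_layer if shared else layers * per_layer


# Assumed, illustrative dimensions (not taken from the patent text).
independent = encoder_params(layers=12, hidden=768, ffn=3072, shared=False)
shared = encoder_params(layers=12, hidden=768, ffn=3072, shared=True)
print(independent, shared, independent // shared)
```

Under these assumptions the stored encoder weights shrink by a factor equal to the number of layers; the amount of computation per forward pass is unchanged, since every layer is still evaluated.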
Description of General Techniques Used
Transfer learning model: the transfer learning model used in this patent application is StructBERT, an NLP pre-training model proposed by Alibaba DAMO Academy that improves on the original BERT. Its authors argue that BERT's pre-training tasks ignore language structure information, so StructBERT adds two language-structure-based training objectives to BERT's original masked language model (MaskLM) objective: a word order task and a sentence order task.
Named entity recognition: named entity recognition refers to the recognition of specific objects in text, the semantic categories of which are usually predefined before recognition, such as people, addresses, organizations, etc. Named entity recognition is not just an independent information extraction task, it also plays a key role in many large NLP applications such as information retrieval, automatic text summarization, question and answer systems, machine translation, and knowledge base building.
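The two structure-based objectives named above can be pictured as data-sampling routines. The sketch below is illustrative only: the function names are hypothetical, the 5% ratio, length-3 spans and 1/3 probabilities follow the figures given in the disclosure of this document, and the handling of edge cases (first or last sentence, forcing at least one shuffled span) is an assumption.

```python
import random


def shuffle_trigrams(tokens, masked_positions, rng, ratio=0.05):
    """Word order task: shuffle a fraction of length-3 spans of unmasked tokens."""
    out = list(tokens)
    spans, i = [], 0
    while i + 3 <= len(tokens):
        if any(p in masked_positions for p in range(i, i + 3)):
            i += 1                      # span touches a masked token, slide forward
        else:
            spans.append(i)             # non-overlapping candidate span
            i += 3
    # Assumption: shuffle at least one span whenever a candidate exists.
    k = min(len(spans), max(1, round(len(spans) * ratio))) if spans else 0
    chosen = rng.sample(spans, k)
    for start in chosen:
        piece = out[start:start + 3]
        rng.shuffle(piece)              # may leave the order unchanged by chance
        out[start:start + 3] = piece
    return out, sorted(chosen)


def sample_sentence_pair(docs, d, s, rng):
    """Sentence structure task: return (S1, S2, label) for sentence s of document d.

    Requires at least two documents. At document boundaries the missing
    neighbour falls through to the next option (assumption).
    """
    doc, roll = docs[d], rng.random()
    if roll < 1 / 3 and s + 1 < len(doc):
        return doc[s], doc[s + 1], "next"
    if roll < 2 / 3 and s > 0:
        return doc[s], doc[s - 1], "previous"
    other = rng.choice([i for i in range(len(docs)) if i != d])
    return doc[s], rng.choice(docs[other]), "other_document"
```

During pre-training, the model would be asked to restore the original order of each shuffled span and to predict the three-way label of each sampled pair.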
Disclosure of Invention
In view of the defects of the prior art, the device for discovering the industrial internet and extracting the information by the improved transfer learning model provided by the invention consists of a web crawler, a text cleaning module, a content classification execution module, the improved transfer learning model and an entity identification module;
the web crawler is responsible for crawling web page content and sending the web page content and the web page address to the text cleaning module;
the text cleaning module is responsible for removing noise characters in a text formed by the webpage content and the webpage address to generate clean webpage information, and the text cleaning module sends the clean webpage information to the content classification execution module; the noise characters include: html tags, stop words, forwarding symbols, urls and marking information;
the content classification execution module comprises an industrial internet classification model, and the industrial internet classification model is obtained by performing language training on classified internet sample data through an improved transfer learning model; the industrial internet classification model consists of classification labels of classified internet sample data and the probability that the content of the classified internet sample data belongs to each classification label;
the algorithm of the improved transfer learning model is as follows: 1) StructBERT is used to represent each word of each sentence in the text, and a bidirectional Transformer then learns from the represented text; the Transformer is a standard component of StructBERT; in a conventional Transformer the parameters of each layer are independent, so the number of parameters grows markedly as the number of layers increases, whereas this model shares parameters across all layers and therefore learns only one layer's worth of parameters; 2) each word in the improved StructBERT is represented jointly by a word vector, a segment vector and a position vector; the word vector of the first token is used for the subsequent classification task, the segment vector distinguishes the two sentences, and the position vector represents word position information; 3) semantic features are learned through four training tasks: i) a masked language model task, ii) a next sentence prediction task, iii) a word order task, iv) a sentence structure task; in the masked language model task, 15% of the words are randomly masked during training and the model predicts them; of these 15%, 80% are replaced by the mask symbol, 10% are left unchanged and 10% are replaced by other words; through this task the model learns the semantic information of the text; the next sentence prediction task lets the model learn the relationship between sentences: the training input is a pair of sentences S1 and S2, where S2 is the actual next sentence of S1 with probability one half, and the model predicts whether S2 is the next sentence of S1; the word order task selects 5% of the length-3 subsequences from the unmasked words, shuffles the word order within each selected subsequence, and has the model reconstruct the original word order, so that the model learns word order relations within sentences; in the sentence structure task, given a sentence pair (S1, S2), the model judges whether S2 is the following context of S1, the preceding context of S1, or unrelated to S1; when sampling, for a sentence S, with probability 1/3 the next sentence of S is sampled to form the pair, with probability 1/3 the previous sentence of S is sampled, and with probability 1/3 a sentence is randomly sampled from another document;
the content classification execution module compares the clean webpage information with the industrial internet classification model, discards clean webpage information that is not classified as industrial internet, and sends the clean webpage information belonging to the industrial internet classification to the entity identification module;
the entity identification module comprises an entity category model, the entity category model is obtained by performing language training on classified industrial internet sample data with entity category labels through an improved transfer learning model, and the entity category model is composed of the classification labels of the classified industrial internet sample data with the entity category labels and the probability that the content of the classified industrial internet sample data with the entity category labels belongs to each classification label;
the entity identification module compares the clean webpage information with the entity category model, outputs the content in the clean webpage information together with its corresponding entity category label, and generates updated classified industrial internet data with entity category labels;
the entity identification module incorporates the updated classified industrial internet data with entity category labels into the classified industrial internet sample data with entity category labels.
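The masked language model task in the algorithm above (15% of words selected, of which 80% are replaced by the mask symbol, 10% left unchanged and 10% replaced by other words) can be sketched as follows. This is a minimal illustration, not the applicant's code: the function name, the `[MASK]` string and the per-token random selection (rather than selecting exactly 15% of positions) are assumptions.

```python
import random

MASK = "[MASK]"  # assumed mask symbol


def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """Return (corrupted tokens, {position: original token to predict})."""
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue                      # token is not selected for prediction
        targets[i] = tok                  # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK           # 80%: replace with the mask symbol
        elif roll < 0.9:
            pass                          # 10%: keep the original word
        else:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with another word
    return corrupted, targets


rng = random.Random(0)
sentence = "the platform connects industrial equipment to cloud services".split()
corrupted, targets = mask_tokens(sentence, vocab=["gateway", "sensor", "data"], rng=rng)
print(corrupted, targets)
```

Keeping some selected words unchanged, or swapping them for other words, means the model cannot rely on the mask symbol alone to know which positions it has to predict.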
Advantageous effects
Compared with the traditional text classification and information extraction technology, the method does not need massive texts with labels for training, and saves a large amount of labor cost; and secondly, the method is not influenced by word segmentation, and more relevant text features can be obtained for website classification and key service information extraction of the industrial internet platform website.
Drawings
FIG. 1 is a system block diagram of the present invention.
Detailed Description
Referring to FIG. 1, the device for industrial internet discovery and information extraction using the improved transfer learning model provided by the invention is composed of a web crawler 1, a text cleaning module 2, a content classification execution module 3, an improved transfer learning model 4 and an entity identification module 5;
the web crawler 1 is responsible for crawling web page contents and sending the web page contents and the web page addresses 10 to the text cleaning module 2;
the text cleaning module 2 is responsible for removing noise characters in the text formed by the webpage content and the webpage address 10 to generate clean webpage information, and the text cleaning module 2 sends the clean webpage information to the content classification execution module 3; the noise characters include: html tags, stop words, forwarding symbols, urls and marking information;
the content classification execution module 3 comprises an industrial internet classification model 41, and the industrial internet classification model 41 is obtained by performing language training on classified internet sample data 40 through an improved transfer learning model 4; the industrial internet classification model 41 is composed of classification labels of the classified internet sample data 40 and probabilities that the contents of the classified internet sample data 40 belong to each classification label;
the algorithm of the improved transfer learning model 4 is as follows: 1) StructBERT is used to represent each word of each sentence in the text, and a bidirectional Transformer then learns from the represented text; the Transformer is a standard component of StructBERT; in a conventional Transformer the parameters of each layer are independent, so the number of parameters grows markedly as the number of layers increases, whereas this model shares parameters across all layers and therefore learns only one layer's worth of parameters; 2) each word in the improved StructBERT is represented jointly by a word vector, a segment vector and a position vector; the word vector of the first token is used for the subsequent classification task, the segment vector distinguishes the two sentences, and the position vector represents word position information; 3) semantic features are learned through four training tasks: i) a masked language model task, ii) a next sentence prediction task, iii) a word order task, iv) a sentence structure task; in the masked language model task, 15% of the words are randomly masked during training and the model predicts them; of these 15%, 80% are replaced by the mask symbol, 10% are left unchanged and 10% are replaced by other words; through this task the model learns the semantic information of the text; the next sentence prediction task lets the model learn the relationship between sentences: the training input is a pair of sentences S1 and S2, where S2 is the actual next sentence of S1 with probability one half, and the model predicts whether S2 is the next sentence of S1; the word order task selects 5% of the length-3 subsequences from the unmasked words, shuffles the word order within each selected subsequence, and has the model reconstruct the original word order, so that the model learns word order relations within sentences; in the sentence structure task, given a sentence pair (S1, S2), the model judges whether S2 is the following context of S1, the preceding context of S1, or unrelated to S1; when sampling, for a sentence S, with probability 1/3 the next sentence of S is sampled to form the pair, with probability 1/3 the previous sentence of S is sampled, and with probability 1/3 a sentence is randomly sampled from another document;
the content classification execution module 3 compares the clean webpage information with the industrial internet classification model 41, discards clean webpage information that is not classified as industrial internet, and sends the clean webpage information belonging to the industrial internet classification to the entity identification module 5;
the entity identification module 5 comprises an entity category model 51, the entity category model 51 is obtained by language training of the classified industrial internet sample data 50 with the entity category label through the improved transfer learning model 4, and the entity category model 51 is composed of the classification label of the classified industrial internet sample data 50 with the entity category label and the probability that the content of the classified industrial internet sample data 50 with the entity category label belongs to each classification label;
the entity identification module 5 compares the clean webpage information with the entity category model 51, outputs the content in the clean webpage information together with its corresponding entity category label, and generates updated classified industrial internet data 52 with entity category labels;
the entity identification module 5 incorporates the updated classified industrial internet data 52 with entity category labels into the classified industrial internet sample data 50 with entity category labels.
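The data flow between modules 1 to 5 described above can be summarized in a short sketch. It is a simplified illustration under stated assumptions, not the patented implementation: all function names, the stop-word list, the `industrial_internet` label and the 0.5 threshold are hypothetical; the two models are passed in as plain callables; and only HTML tags, URLs and stop words are removed (forwarding symbols and marking information are left out).

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

TAG_RE = re.compile(r"<[^>]+>")              # HTML tags
URL_RE = re.compile(r"https?://\S+")          # URLs
STOP_WORDS = {"the", "a", "an", "of", "and"}  # hypothetical stop-word list

Entity = Tuple[str, str]                      # (text span, entity category label)


def clean_text(raw: str) -> str:
    """Text cleaning module: strip tags first, then URLs, then stop words."""
    text = TAG_RE.sub(" ", raw)
    text = URL_RE.sub(" ", text)
    return " ".join(t for t in text.split() if t.lower() not in STOP_WORDS)


def is_industrial(probs: Dict[str, float],
                  label: str = "industrial_internet",
                  threshold: float = 0.5) -> bool:
    """Keep a page only if the target label is the most probable one."""
    best = max(probs, key=probs.get)
    return best == label and probs[best] >= threshold


def run_pipeline(pages: Iterable[Tuple[str, str]],
                 classify: Callable[[str], Dict[str, float]],
                 tag_entities: Callable[[str], List[Entity]],
                 sample_data: List[Tuple[str, List[Entity]]]):
    """Crawled (url, html) pairs -> cleaned text -> classification -> entities."""
    for url, html in pages:
        text = clean_text(html + " " + url)   # text formed by content and address
        if not is_industrial(classify(text)):
            continue                          # discard non-industrial pages
        # Newly labeled data is merged back into the sample data.
        sample_data.append((text, tag_entities(text)))
    return sample_data
```

In a real deployment, `classify` and `tag_entities` would wrap the industrial internet classification model 41 and the entity category model 51; here they can be any callables with the stated signatures, and the enriched `sample_data` would be used for later retraining.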
Claims (1)
1. The device for carrying out industrial internet discovery and extracting information by improving the transfer learning model is characterized by consisting of a web crawler, a text cleaning module, a content classification execution module, an improved transfer learning model and an entity identification module;
the web crawler is responsible for crawling web page content and sending the web page content and the web page address to the text cleaning module;
the text cleaning module is responsible for removing noise characters in a text formed by the webpage content and the webpage address to generate clean webpage information, and the text cleaning module sends the clean webpage information to the content classification execution module; the noise characters include: html tags, stop words, forwarding symbols, urls and marking information;
the content classification execution module comprises an industrial internet classification model, and the industrial internet classification model is obtained by performing language training on classified internet sample data through an improved transfer learning model; the industrial internet classification model consists of classification labels of classified internet sample data and the probability that the content of the classified internet sample data belongs to each classification label;
the algorithm of the improved transfer learning model is as follows: 1) StructBERT is used to represent each word of each sentence in the text, and a bidirectional Transformer then learns from the represented text; the Transformer is a standard component of StructBERT; in a conventional Transformer the parameters of each layer are independent, so the number of parameters grows markedly as the number of layers increases, whereas this model shares parameters across all layers and therefore learns only one layer's worth of parameters; 2) each word in the improved StructBERT is represented jointly by a word vector, a segment vector and a position vector; the word vector of the first token is used for the subsequent classification task, the segment vector distinguishes the two sentences, and the position vector represents word position information; 3) semantic features are learned through four training tasks: i) a masked language model task, ii) a next sentence prediction task, iii) a word order task, iv) a sentence structure task; in the masked language model task, 15% of the words are randomly masked during training and the model predicts them; of these 15%, 80% are replaced by the mask symbol, 10% are left unchanged and 10% are replaced by other words; through this task the model learns the semantic information of the text; the next sentence prediction task lets the model learn the relationship between sentences: the training input is a pair of sentences S1 and S2, where S2 is the actual next sentence of S1 with probability one half, and the model predicts whether S2 is the next sentence of S1; the word order task selects 5% of the length-3 subsequences from the unmasked words, shuffles the word order within each selected subsequence, and has the model reconstruct the original word order, so that the model learns word order relations within sentences; in the sentence structure task, given a sentence pair (S1, S2), the model judges whether S2 is the following context of S1, the preceding context of S1, or unrelated to S1; when sampling, for a sentence S, with probability 1/3 the next sentence of S is sampled to form the pair, with probability 1/3 the previous sentence of S is sampled, and with probability 1/3 a sentence is randomly sampled from another document;
the content classification execution module compares the clean webpage information with the industrial internet classification model, discards clean webpage information that is not classified as industrial internet, and sends the clean webpage information belonging to the industrial internet classification to the entity identification module;
the entity identification module comprises an entity category model, the entity category model is obtained by performing language training on classified industrial internet sample data with entity category labels through an improved transfer learning model, and the entity category model is composed of the classification labels of the classified industrial internet sample data with the entity category labels and the probability that the content of the classified industrial internet sample data with the entity category labels belongs to each classification label;
the entity identification module compares the clean webpage information with the entity category model, outputs the content in the clean webpage information together with its corresponding entity category label, and generates updated classified industrial internet data with entity category labels;
the entity identification module incorporates the updated classified industrial internet data with entity category labels into the classified industrial internet sample data with entity category labels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011256306.5A CN112256840A (en) | 2020-11-12 | 2020-11-12 | Device for carrying out industrial internet discovery and extracting information by improving transfer learning model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011256306.5A CN112256840A (en) | 2020-11-12 | 2020-11-12 | Device for carrying out industrial internet discovery and extracting information by improving transfer learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112256840A | 2021-01-22 |
Family
ID=74265439
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011256306.5A Pending CN112256840A (en) | 2020-11-12 | 2020-11-12 | Device for carrying out industrial internet discovery and extracting information by improving transfer learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112256840A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107451433A (en) * | 2017-06-27 | 2017-12-08 | 中国科学院信息工程研究所 | A kind of information source identification method and apparatus based on content of text |
CN111078978A (en) * | 2019-11-29 | 2020-04-28 | 上海观安信息技术股份有限公司 | Web credit website entity identification method and system based on website text content |
CN111428981A (en) * | 2020-03-18 | 2020-07-17 | 国电南瑞科技股份有限公司 | Deep learning-based power grid fault plan information extraction method and system |
CN111739520A (en) * | 2020-08-10 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Speech recognition model training method, speech recognition method and device |
CN111767732A (en) * | 2020-06-09 | 2020-10-13 | 上海交通大学 | Document content understanding method and system based on graph attention model |
Non-Patent Citations (1)
Title |
---|
DRUGAI: "ICLR2020|StructBERT:融合语言结构的BERT模型", pages 2, Retrieved from the Internet <URL:https://blog.csdn.net/u012325865/article/details/106464621?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522170659530716800213024812%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=170659530716800213024812&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~all~first_rank_ecpm_v1~rank_v31_ecpm-11-106464621-null-null.142^v99^pc_search_result_base6&utm_term=structbert&spm=1018.2226.3001.4187> * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110110054B (en) | Method for acquiring question-answer pairs from unstructured text based on deep learning | |
CN108875051B (en) | Automatic knowledge graph construction method and system for massive unstructured texts | |
CN111159395B (en) | Chart neural network-based rumor standpoint detection method and device and electronic equipment | |
CN112989841B (en) | Semi-supervised learning method for emergency news identification and classification | |
CN106886580B (en) | Image emotion polarity analysis method based on deep learning | |
CN110009430B (en) | Cheating user detection method, electronic device and computer readable storage medium | |
CN108984775B (en) | Public opinion monitoring method and system based on commodity comments | |
CN111767725A (en) | Data processing method and device based on emotion polarity analysis model | |
CN110633366A (en) | Short text classification method, device and storage medium | |
CN112199606B (en) | Social media-oriented rumor detection system based on hierarchical user representation | |
CN113434688B (en) | Data processing method and device for public opinion classification model training | |
CN113806547B (en) | Deep learning multi-label text classification method based on graph model | |
CN115292568B (en) | Civil news event extraction method based on joint model | |
CN116150509B (en) | Threat information identification method, system, equipment and medium for social media network | |
CN111651566A (en) | Multi-task small sample learning-based referee document dispute focus extraction method | |
CN112257444A (en) | Financial information negative entity discovery method and device, electronic equipment and storage medium | |
CN113378024B (en) | Deep learning-oriented public inspection method field-based related event identification method | |
CN112579730A (en) | High-expansibility multi-label text classification method and device | |
CN111400617B (en) | Social robot detection data set extension method and system based on active learning | |
CN117520561A (en) | Entity relation extraction method and system for knowledge graph construction in helicopter assembly field | |
CN116775880A (en) | Multi-label text classification method and system based on label semantics and transfer learning | |
CN112256840A (en) | Device for carrying out industrial internet discovery and extracting information by improving transfer learning model | |
CN115878800A (en) | Double-graph neural network fusing co-occurrence graph and dependency graph and construction method thereof | |
CN113806538B (en) | Label extraction model training method, device, equipment and storage medium | |
CN115129875A (en) | Building accident report classification system and method based on graph neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||