CN111930909A - Geological intelligent question and answer oriented data automatic sequence labeling identification method - Google Patents


Info

Publication number
CN111930909A
Authority
CN
China
Prior art keywords
data
result
gold
user
labeling
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010804098.1A
Other languages
Chinese (zh)
Other versions
CN111930909B (en)
Inventor
贺金龙
付立军
黄徐胜
唐珂珂
朱月琴
刘晓娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Application filed by Individual
Priority to CN202010804098.1A
Publication of CN111930909A
Application granted
Publication of CN111930909B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/374 Thesaurus
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of information technology and provides a geological intelligent question and answer oriented data automatic sequence labeling identification method. The invention aims to improve the accuracy of user question-answer interaction in intelligent question answering over gold mine data. The scheme comprises: sorting and cleaning gold mine literature and knowledge-graph data to obtain batch literature data; automatically labeling the character data of the literature with BIOES tags to obtain a gold mine data labeling result, and training a deep learning model on that result to obtain a trained model of the gold mine literature data; applying the trained model to recognize user query sentences and obtain their labeling results, and then performing attribute classification to obtain the classification of each user query sentence; and combining and packaging the labeling result and the classification in a Map collection to obtain the gold mine data labels and the semantic attribute of the query sentence, which are then mapped onto the gold mine knowledge graph to obtain the knowledge the user queried.

Description

Geological intelligent question and answer oriented data automatic sequence labeling identification method
Technical Field
The invention relates to the application of knowledge graphs in deep-learning-based knowledge mining, and provides an automatic sequence labeling method for gold mine data that enables an intelligent question-answering platform.
Background
Intelligent question answering is currently an important application of artificial intelligence and offers greater cognitive ability than traditional rule matching and co-occurrence retrieval matching. In its implementation, a knowledge graph is introduced to associate knowledge concepts and relationships, and deep-learning-based automatic sequence labeling is then used during the user's question-answering session to perform domain recognition and intent recognition, thereby realizing an intelligent question-answering platform.
At present, most question-answering systems rely on regular-expression template matching and Elasticsearch retrieval matching. General-domain question-answer data are plentiful, but because such systems lack deep semantic knowledge analysis, implementing intelligent question answering in a specific domain remains challenging. When processing Chinese text, existing question-answering systems generally convert sentences into word representations through word segmentation and then match them against a knowledge base by semantic similarity (edit distance, or cosine similarity of TF-IDF vectors) to answer the user's query. Word segmentation has gone through three stages: rule-and-dictionary matching, statistical machine learning, and deep learning. Dictionary-based matching includes forward maximum matching, backward maximum matching, and bidirectional maximum matching; statistical machine learning methods include n-gram language models, maximum entropy models, conditional random fields, and the like. With the massive data generated as the web moved from Web 2.0 toward Web 3.0, deep-learning-based segmentation methods have emerged, including convolutional neural networks, recurrent neural networks, long short-term memory networks, and their combinations with conditional random fields; the tagging scheme used during recognition is BIO or BIOES.
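For reference, forward maximum matching, the simplest of the dictionary-based segmentation methods listed above, can be sketched in a few lines of Python. The toy dictionary and the maximum word length of 4 are illustrative assumptions; this is the prior-art technique the invention aims to avoid, not part of the claimed method.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Segment text greedily from left to right, always taking the longest
    dictionary word that starts at the current position; characters that
    begin no dictionary word become single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary for illustration only.
vocab = {"青藏高原", "金矿", "成矿", "作用"}
print(forward_max_match("青藏高原金矿成矿作用", vocab))
# ['青藏高原', '金矿', '成矿', '作用']
```

Any domain term missing from the dictionary falls apart into single characters, which is the dictionary dependence that existing segmentation tools suffer from.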
Existing labeling methods have the following shortcomings:
(1) In gold mine knowledge mining and discovery, manually processing large amounts of data is time-consuming and labor-intensive, and processing efficiency is low.
(2) Word segmentation tools depend heavily on dictionary construction; they work well in the general domain but fail to achieve comparable results when processing gold mine information.
(3) Sequence labeling of massive gold mine data requires, on top of existing methods, structuring the information according to domain-specific knowledge categories.
Disclosure of Invention
The invention aims to improve the accuracy of user question-answer interaction in intelligent question answering over gold mine data by constructing a deep-learning recognition method based on automatic sequence labeling, built from gold mine domain literature combined with the knowledge graph.
In order to solve the technical problems, the invention adopts the following technical scheme:
A geological intelligent question and answer oriented data automatic sequence labeling identification method comprises the following steps:
Step 1: Sort the gold mine literature knowledge-graph data to obtain domain entity classification description labels (including entities), which serve as the annotation labels for identifying domain knowledge entities;
Step 2: Automatically clean the literature content by machine, filtering out English letters, punctuation, and meaningless symbols to obtain valid Chinese text;
Step 3: Store each cleaned text in a separate txt file, yielding a storage root path for the batch of literature data;
Step 4: Automatically label the character data of the literature obtained in step 3 with BIOES tags, combining them with the sorted knowledge-graph entity classification description labels, to obtain a gold mine data labeling result in which each tag begins with B, I, O, E, or S;
Step 5: Train a bidirectional LSTM combined with a conditional random field (CRF) on the character sequences of the labeling result from step 4, incorporating the sorted gold mine knowledge-graph entity data and adjusting the memory-cell structure and overall parameters of the LSTM, to obtain a trained model of the gold mine literature data;
Step 6: Apply the trained model to recognize query sentences entered by platform users, obtaining the labeling result of each user query sentence;
Step 7: Remove the labeled gold mine data content from the user query sentence and input the remaining text into a convolutional neural network for attribute classification, obtaining the classification of the user query sentence;
Step 8: Combine and package the gold mine data recognition result and the classification of the user query sentence in a Map collection to obtain the gold mine data labels and the semantic attribute of the query sentence. For example, for the query "What is the profile of the Qinghai-Tibet Plateau?", the Qinghai-Tibet Plateau is labeled as a geological entity (GENT) and the query attribute is "profile";
Step 9: Map the gold mine data labels and query-sentence semantic attributes from step 8 onto the gold mine knowledge graph to obtain the knowledge the user asked for, thereby realizing intelligent question answering.
In this technical scheme, sorting the gold mine literature knowledge-graph data comprises the following step:
gold mine literature data are collected by manually compiling geological encyclopedia and Sogou corpora, and classification description labels are constructed from gold mine domain knowledge; the labels comprise geological entities (GENT), geological processes (GEFF), geochemistry (GEHE), and geological methods (GMET).
In the above technical solution, the label combination in step 4 comprises the following steps:
first, split the BIOES scheme into its single-character tags B, I, O, E, and S;
then, use these single-character tags to automatically label the content of the txt files from step 3, obtaining a gold mine data labeling result in which each tag begins with B, I, O, E, or S.
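The label combination can be illustrated with a short sketch that joins each BIOES letter with one of the four classification description labels (e.g. B-GENT). It assumes that entity spans are located by longest-match lookup against a small dictionary built from the sorted knowledge-graph entities; the patent does not say how spans are located, so the lookup strategy, the function name `bioes_label`, and the sample entities and categories are all hypothetical.

```python
CATEGORIES = ("GENT", "GEFF", "GEHE", "GMET")  # classification description labels


def bioes_label(text: str, entities: dict[str, str]) -> list[tuple[str, str]]:
    """Return (character, tag) pairs whose tags join a BIOES letter with a
    domain category, e.g. B-GENT. Longer entity names are tried first;
    characters outside every entity keep the tag O."""
    unknown = set(entities.values()) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    names = sorted((n for n in entities if n), key=len, reverse=True)
    tags = ["O"] * len(text)
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                cat = entities[name]
                end = i + len(name) - 1
                if end == i:                     # single-character entity
                    tags[i] = f"S-{cat}"
                else:                            # multi-character entity
                    tags[i] = f"B-{cat}"
                    for j in range(i + 1, end):
                        tags[j] = f"I-{cat}"
                    tags[end] = f"E-{cat}"
                i = end + 1
                break
        else:
            i += 1  # no entity starts here; keep the O tag
    return list(zip(text, tags))


# Hypothetical entity dictionary for illustration only.
entities = {"青藏高原": "GENT", "成矿作用": "GEFF", "金": "GEHE"}
for ch, tag in bioes_label("青藏高原金的成矿作用", entities):
    print(f"{ch}\t{tag}")
```

Writing one character and its tag per line, as above, is a common input format for sequence-labeling trainers; it is a convention assumed here, not one stated in the patent.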
In this technical scheme, automatic labeling builds on the gold mine data annotation: first, character vectors are trained on the gold mine data with Word2vec; then, a bidirectional LSTM combined with a conditional random field (CRF) is trained on the gold mine data labeling result, and the model parameters are adjusted to obtain the trained gold mine data model.
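In a BiLSTM-CRF tagger, the BiLSTM produces a score for every (character, tag) pair and the CRF layer adds tag-to-tag transition scores; at prediction time the best tag sequence is found with Viterbi decoding. The NumPy sketch below shows only that decoding step, with hand-set emission and transition scores standing in for trained parameters. Word2vec embedding and BiLSTM training are not shown, and the function name, tag set, and scores are hypothetical.

```python
import numpy as np


def viterbi_decode(emissions: np.ndarray, transitions: np.ndarray) -> list[int]:
    """Return the highest-scoring tag index sequence.

    emissions:   (seq_len, num_tags) per-character tag scores (e.g. BiLSTM output)
    transitions: (num_tags, num_tags) score of moving from tag i to tag j
    """
    seq_len, num_tags = emissions.shape
    score = emissions[0].copy()            # best path score ending in each tag at t=0
    backpointers = np.zeros((seq_len, num_tags), dtype=int)
    for t in range(1, seq_len):
        # candidate[i, j]: best path ending in tag i at t-1, then tag j at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):    # follow backpointers to t=0
        best.append(int(backpointers[t, best[-1]]))
    return best[::-1]


TAGS = ["B-GENT", "E-GENT", "O"]
NEG = -1e4  # large penalty that effectively forbids a transition
transitions = np.full((3, 3), NEG)
for prev, nxt in [(0, 1), (1, 0), (1, 2), (2, 0), (2, 2)]:  # B->E, E->B, E->O, O->B, O->O
    transitions[prev, nxt] = 0.0
emissions = np.array([[2.0, 0.0, 1.0],
                      [0.5, 0.4, 1.0],
                      [0.0, 0.0, 1.0]])
print([TAGS[i] for i in emissions.argmax(axis=1)])                # ['B-GENT', 'O', 'O']
print([TAGS[i] for i in viterbi_decode(emissions, transitions)])  # ['B-GENT', 'E-GENT', 'O']
```

Greedy per-character argmax yields B-GENT followed by O, an ill-formed entity; the transition scores steer Viterbi to the valid sequence B-GENT, E-GENT, O. A trained CRF layer learns these transition scores (and usually start and end scores, omitted here) instead of having them set by hand.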
In this technical scheme, to identify a user query sentence, the sentence is input into the model, and the trained model automatically performs sequence recognition on it to obtain the labeling result of the user query sentence.
In the above technical solution, user query sentence recognition comprises the following steps:
(1) Input the user query sentence into the platform through an HTTP interface and first obtain the character indices of the sentence (e.g., for 青藏高原, "Qinghai-Tibet Plateau": 青: 15, 藏: 23, 高: 54, 原: 113);
(2) Pass these character indices through the trained LSTM-CRF model from step 5 to output words composed of characters, i.e., the labeling result of the user query sentence.
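Steps (1) and (2) amount to mapping characters to integer indices before the model call and merging the predicted BIOES tags back into words afterwards. A minimal sketch, assuming a character vocabulary built from the training corpus with a reserved index 0 for unknown characters: the indices for 青藏高原 are the ones given in the example above, while the unknown-index convention, the function names, and the predicted tags are hypothetical.

```python
def encode(sentence: str, char_to_id: dict[str, int], unk_id: int = 0) -> list[int]:
    """Map each character to its vocabulary index (unknown characters -> unk_id)."""
    return [char_to_id.get(ch, unk_id) for ch in sentence]


def decode_entities(sentence: str, tags: list[str]) -> list[tuple[str, str]]:
    """Merge per-character BIOES tags into (entity text, category) pairs."""
    entities = []
    start = None
    for i, tag in enumerate(tags):
        prefix, _, cat = tag.partition("-")
        if prefix == "S":
            entities.append((sentence[i], cat))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            if tags[start].partition("-")[2] == cat:   # B and E must agree
                entities.append((sentence[start:i + 1], cat))
            start = None
        elif prefix == "O":
            start = None
        # I tags simply extend the current span
    return entities


vocab = {"青": 15, "藏": 23, "高": 54, "原": 113}
sentence = "青藏高原简介"
print(encode(sentence, vocab))          # [15, 23, 54, 113, 0, 0]
tags = ["B-GENT", "I-GENT", "I-GENT", "E-GENT", "O", "O"]  # assumed model output
print(decode_entities(sentence, tags))  # [('青藏高原', 'GENT')]
```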
In this technical scheme, for user sentence classification, the parts of the sentence not recognized by the sequence labeling model are input into a convolutional neural network for attribute classification; trained on labeled data, the network classifies user query sentences automatically.
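The residual-sentence idea and a CNN text classifier of the kind commonly used for sentence classification (embedding, 1-D convolution, max-over-time pooling, softmax) can be sketched in NumPy. The weights here are random, so the predicted class is meaningless until the network is trained. The attribute classes follow the examples the patent gives (brief introduction, kind, size, relationship, area scope); the layer sizes, filter width, class and function names are assumptions.

```python
import numpy as np

ATTRIBUTES = ["brief introduction", "kind", "size", "relationship", "area scope"]


def residual_sentence(sentence: str, entities: list[str]) -> str:
    """Remove recognized entity strings, leaving the attribute-bearing words."""
    for ent in entities:
        sentence = sentence.replace(ent, "")
    return sentence


class TinyTextCNN:
    """Forward pass of a one-layer character CNN classifier (untrained)."""

    def __init__(self, vocab_size: int, emb_dim: int = 16, num_filters: int = 8,
                 width: int = 2, num_classes: int = len(ATTRIBUTES), seed: int = 0):
        rng = np.random.default_rng(seed)
        self.width = width
        self.emb = rng.normal(0, 0.1, (vocab_size, emb_dim))
        self.filters = rng.normal(0, 0.1, (num_filters, width, emb_dim))
        self.bias = np.zeros(num_filters)
        self.out = rng.normal(0, 0.1, (num_filters, num_classes))

    def predict_proba(self, ids: list[int]) -> np.ndarray:
        x = self.emb[ids]                                   # (seq_len, emb_dim)
        if len(ids) < self.width:                           # pad very short input
            x = np.vstack([x, np.zeros((self.width - len(ids), x.shape[1]))])
        windows = np.stack([x[i:i + self.width]
                            for i in range(len(x) - self.width + 1)])
        # Convolution as a dot product of each window with each filter, then ReLU
        conv = np.maximum(0, np.einsum("nwd,fwd->nf", windows, self.filters) + self.bias)
        pooled = conv.max(axis=0)                           # max-over-time pooling
        logits = pooled @ self.out
        exp = np.exp(logits - logits.max())                 # numerically stable softmax
        return exp / exp.sum()


sentence = "青藏高原的简介"
rest = residual_sentence(sentence, ["青藏高原"])            # '的简介'
vocab = {ch: i for i, ch in enumerate(sorted(set(sentence)))}
model = TinyTextCNN(vocab_size=len(vocab))
probs = model.predict_proba([vocab[ch] for ch in rest])
print(rest, ATTRIBUTES[int(probs.argmax())])  # class is arbitrary until trained
```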
Due to the adoption of the technical scheme, the invention has the following beneficial effects:
1. Processing and applying gold mine literature data requires specialized expertise; automatic machine sequence labeling and recognition reduces the complexity of manual processing. In addition, the domain knowledge is encapsulated inside the system, so users can get started quickly without needing to understand the underlying internals.
2. The automatic sequence labeling and identification method based on knowledge-graph gold mine data gives users a convenient way to interact during intelligent question answering: they only need to enter a query sentence, which makes gold mine domain knowledge much easier to apply.
3. Automatic sequence labeling and recognition does not depend on a word segmentation tool, only on automatic model training, which greatly reduces human effort; moreover, the model is trained only once and is then simply invoked during use, without retraining.
4. Migrating the model depends only on the supplied literature data, so the model can be customized and retrained quickly for different data, reducing the risk of model migration.
5. Compared with intelligent question answering based on regular-expression template matching and retrieval matching, the automatic sequence labeling and identification method based on knowledge-graph gold mine data gives intelligent question answering better generalization.
Drawings
FIG. 1 is a flow diagram of an intelligent question and answer service;
FIG. 2 is a sequence annotation diagram based on combining BIOES tags with gold mine data classification description tags;
FIG. 3 illustrates the annotation process flow based on a word segmentation tool;
FIG. 4 is a flow diagram of automated sequence annotation recognition.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The intelligent gold mine question-answering platform adopts an automatic sequence labeling and identification method based on knowledge-graph gold mine data, combining it with domain-specific knowledge to respond to user queries promptly and accurately. First, gold mine literature is collected, and invalid symbols and meaningless markup are removed to obtain Chinese text content. Next, knowledge description classification information is established from the classification structure of the domain knowledge. The text content is then automatically sequence-labeled with character tags, combined with the domain knowledge classification description labels. A bidirectional neural network in a deep learning model is then trained on the gold mine text data, with parameters tuned until the model reaches the threshold required for automatic sequence labeling recognition. Finally, the trained model performs sequence recognition on user sentences, the parts not covered by sequence recognition undergo intent classification, and the classification and sequence recognition results are mapped onto the gold mine knowledge graph to query it and return feedback to the user. The question-answering service is shown in FIG. 1.
The steps are as follows:
(1) Data sorting. Sort and collect the gold mine literature data, and construct classification description labels from gold mine domain knowledge, such as geological entity GENT, geological process GEFF, geochemistry GEHE, and geological method GMET.
(2) Data cleaning. Batch-process the sorted literature to obtain text content, and clean its format with regular-expression matching to obtain valid Chinese text.
(3) Batch data storage. Using Python, store the batch text contents uniformly under a fixed root directory, named by article serial number, as UTF-8 encoded txt files.
(4) Automated labeling with composite labels. Read the gold mine text contents file by file and character by character, and label them by combining the sorted gold mine domain knowledge classification description labels with the traditional BIOES tags, obtaining a character labeling result for the gold mine data in which each tag begins with B, I, O, E, or S, as shown in FIG. 2.
(5) Automated sequence recognition with deep learning. On the basis of the data labeling, first train character vectors on the gold mine data with Word2vec; then train a bidirectional LSTM combined with a conditional random field (CRF) on the labeled data, and tune the model parameters to obtain the trained gold mine data model (saved as checkpoint files). The recognition approach of word segmentation tools, which scores word feature weights (shown in FIG. 3), is not used here.
(6) User query sentence sequence recognition. Input the user query sentence into the model, and use the trained model to automatically perform sequence recognition on the sentence, obtaining the labeling result of the user data, as shown in FIG. 4.
(7) User sentence classification. Input the parts not recognized by the sequence labeling model into a convolutional neural network for attribute classification; trained on labeled data, the network classifies user sentences automatically.
(8) Combination of the user sequence labeling result and sentence attribute classification. Combine the results of steps (6) and (7) to understand the information in the user's sentence, obtaining their combined result.
(9) Knowledge graph mapping and query. Map the combined result of step (8) onto the gold mine knowledge graph and obtain feedback information through an automated knowledge-graph query.
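Data cleaning and batch storage, steps (2) and (3), can be sketched with Python's standard library: a regular expression keeps only CJK Unified Ideographs (dropping English letters, digits, punctuation, and other symbols), and each cleaned document is written as a UTF-8 txt file under one root directory, named by its serial number. The Unicode range, the zero-padded file names, and the function names are assumptions; the patent states only that regular-expression matching is used and that the files are stored as UTF-8 txt.

```python
import re
from pathlib import Path

# Anything outside the CJK Unified Ideographs block (U+4E00 to U+9FFF)
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def clean_text(raw: str) -> str:
    """Remove English letters, digits, punctuation and other symbols."""
    return NON_CHINESE.sub("", raw)


def store_batch(documents: list[str], root: str) -> list[Path]:
    """Write each cleaned document to <root>/<serial number>.txt in UTF-8."""
    root_dir = Path(root)
    root_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, raw in enumerate(documents, start=1):
        path = root_dir / f"{number:05d}.txt"
        path.write_text(clean_text(raw), encoding="utf-8")
        paths.append(path)
    return paths


print(clean_text("金矿（gold deposit）位于青藏高原。"))  # 金矿位于青藏高原
```

Rare characters in the CJK extension blocks fall outside this range and would be dropped; widening the character class is a straightforward adjustment if the corpus needs it.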
Examples
The invention provides a geological intelligent question and answer oriented data automatic sequence labeling and identifying method, which comprises the following steps:
Step 1: Sort the gold mine literature knowledge-graph data to obtain domain entity classification description labels (including entities), which serve as the annotation labels for identifying domain knowledge entities;
Step 2: Automatically clean the literature content by machine, filtering out English letters, punctuation, and meaningless symbols to obtain valid Chinese text;
Step 3: Store each cleaned text in a separate txt file, yielding a storage root path for the batch of literature data;
Step 4: Automatically label the character data of the literature obtained in step 3 with BIOES tags, combining them with the sorted knowledge-graph entity classification description labels, to obtain a gold mine data labeling result in which each tag begins with B, I, O, E, or S;
Step 5: Train a bidirectional LSTM combined with a conditional random field (CRF) on the character sequences of the labeling result from step 4, incorporating the sorted gold mine knowledge-graph entity data and adjusting the memory-cell structure and overall parameters of the LSTM, to obtain a trained model of the gold mine literature data (saved as checkpoint files);
Step 6: Apply the trained model to recognize query sentences entered by platform users, obtaining the labeling result of each user query sentence;
Step 7: Remove the labeled gold mine data content from the user query sentence and input the remaining text into a convolutional neural network for attribute classification, obtaining the classification of the user query sentence;
Step 8: Combine the gold mine data labeling result with the classification of the user query sentence to obtain the gold mine data labels and the semantic attribute of the query sentence. The gold mine data labeling result refers to the entity parts of the gold mine literature, such as geological entities (e.g., Qinghai-Tibet Plateau, volcanic edifice), geological processes, geochemistry, and geological methods; the classification of the user query sentence refers to the attribute category the user asks about for the entity, such as brief introduction, kind, size, relationship, or area scope;
Step 9: Map the gold mine data labels and query-sentence semantic attributes from step 8 onto the gold mine knowledge graph to obtain the knowledge the user asked for, thereby realizing intelligent question answering.
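Steps 8 and 9 can be pictured as packaging the recognized entities and the predicted attribute into a key-value map and using that map to look up the knowledge graph. In the sketch below an in-memory dictionary keyed by (entity, attribute) stands in for a real graph database; the map keys, the placeholder graph entry, and the function names are hypothetical, and a production system would instead issue a query to its graph store.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the gold mine knowledge graph:
# (entity, attribute) -> stored knowledge.
KNOWLEDGE_GRAPH = {
    ("青藏高原", "brief introduction"): "<brief introduction text stored in the graph>",
}


def package(entities: list[tuple[str, str]], attribute: str) -> dict:
    """Step 8: combine the labeling result and the query classification in one map."""
    return {"entities": [{"text": t, "category": c} for t, c in entities],
            "attribute": attribute}


def answer(query_map: dict, graph: dict = KNOWLEDGE_GRAPH) -> Optional[str]:
    """Step 9: map the packaged result onto the graph; return the first hit."""
    for ent in query_map["entities"]:
        hit = graph.get((ent["text"], query_map["attribute"]))
        if hit is not None:
            return hit
    return None


q = package([("青藏高原", "GENT")], "brief introduction")
print(answer(q))
```

Returning `None` when no (entity, attribute) pair matches leaves it to the platform to decide how to reply to questions the graph cannot answer.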
In this scheme, sorting the gold mine literature knowledge-graph data comprises the following step:
gold mine literature data are collected by manually compiling geological encyclopedia and Sogou corpora, and classification description labels are constructed from gold mine domain knowledge; the labels comprise geological entities (GENT), geological processes (GEFF), geochemistry (GEHE), and geological methods (GMET).
In the above scheme, the label combination in step 4 comprises the following steps:
first, split the BIOES scheme into its single-character tags B, I, O, E, and S;
then, use these single-character tags to automatically label the content of the txt files from step 3, obtaining a gold mine data labeling result in which each tag begins with B, I, O, E, or S.
In this scheme, automatic labeling builds on the gold mine data annotation: first, character vectors are trained on the gold mine data with Word2vec; then, a bidirectional LSTM combined with a conditional random field (CRF) is trained on the gold mine data labeling result, and the model parameters are adjusted to obtain the trained gold mine data model.
In the above scheme, the identification of the user query sentence comprises the following steps:
input the user query sentence into the platform through an HTTP interface and first obtain the character indices of the sentence (e.g., for 青藏高原, "Qinghai-Tibet Plateau": 青: 15, 藏: 23, 高: 54, 原: 113);
pass these character indices through the trained LSTM-CRF model from step 5 to output words composed of characters, i.e., the labeling result of the user query sentence.
In this scheme, for user sentence classification, the parts of the sentence not recognized by the sequence labeling model are input into a convolutional neural network for attribute classification; trained on labeled data, the network classifies user query sentences automatically.

Claims (6)

1. A geological intelligent question and answer oriented data automatic sequence labeling identification method is characterized by comprising the following steps: the method comprises the following steps:
step 1: sorting the map data of the gold mine literature to obtain a domain entity classification description label (including an entity) as a labeling label for identifying a domain knowledge entity;
step 2: automatically cleaning the document data content by a machine, wherein English letters, punctuations and meaningless symbols are filtered to obtain effective Chinese text content;
and step 3: storing the cleaned text contents in an independent txt file to obtain a storage root path of batch document data;
and 4, step 4: performing machine automation labeling on character data by using BIOES labels aiming at the document data obtained in the step 3, wherein label combination is performed by combining the sorted map entity classification description data to obtain a gold ore data labeling result beginning from B, I, O, E, S;
and 5: inputting and training the character sequence data of the gold mine data labeling result in the step 4 by adopting a mode of combining a bidirectional LSTM model and a conditional random field CRF in deep learning, and adding the sorted gold mine map entity data by adjusting the structure and the overall parameters of memory cells in the LSTM model to obtain a training result of the gold mine literature data;
step 6: applying the training result of the document data to platform user query sentence recognition to obtain a labeling result of the user query sentence;
and 7: subtracting the identification content of the model for the gold and mineral data in the user statement from the content of the user inquiry statement, and inputting the obtained residual statement into a convolutional neural network for attribute classification to obtain the classification of the user inquiry statement;
and 8: combining and packaging the gold data identification result and the classification of the user query statement through a Map set to obtain the result of the gold data label and query statement semantic attribute in the user query statement;
and step 9: and (4) mapping the results of the semantic attributes of the label and inquiry statement of the gold mine data in the step (8) to a gold mine knowledge map to obtain the result of the inquiry knowledge of the user, thereby realizing intelligent question answering.
2. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 1, wherein sorting the gold-mine literature and knowledge-graph data comprises:
collecting gold-mine literature data by manually curating geological encyclopedia entries and Sogou corpora, and constructing classification description labels from gold-mine domain knowledge, the labels comprising geological entities (GENT), geological effects (GEFF), geochemistry (GEHE) and geological methods (GMET).
3. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 1, wherein the tag combination in step 4 comprises:
first, splitting the BIOES tag scheme into its single-character letters B, I, O, E and S;
then automatically labeling the txt file content from step 3 with these letters, to obtain a gold-mine data labeling result in which each tag begins with B, I, O, E or S.
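The tag combination of claim 3 (a BIOES letter joined to one of the class labels of claim 2, e.g. `B-GENT`) can be sketched as dictionary-based forward maximum matching. The matching strategy, the `bioes_tag` function and the lexicon entries in the usage note are illustrative assumptions; the patent does not state how the machine labeling locates terms in the text.

```python
from __future__ import annotations


def bioes_tag(text: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Label every character of `text` with a BIOES tag plus entity class.

    `lexicon` maps a term to its class (e.g. "GENT"). Terms are found by
    forward maximum matching; characters outside any term are tagged "O".
    """
    max_len = max((len(term) for term in lexicon), default=0)
    tags: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible term starting at position i first.
        for n in range(min(max_len, len(text) - i), 0, -1):
            cls = lexicon.get(text[i:i + n])
            if cls is None:
                continue
            if n == 1:
                tags.append(f"S-{cls}")  # single-character entity
            else:
                tags.extend([f"B-{cls}"] + [f"I-{cls}"] * (n - 2) + [f"E-{cls}"])
            i += n
            break
        else:
            tags.append("O")  # no lexicon term starts here
            i += 1
    return list(zip(text, tags))
```

With the hypothetical lexicon `{"石英脉": "GENT", "硅化": "GEFF"}`, `bioes_tag("石英脉硅化", ...)` tags 石/英/脉 as `B-GENT`, `I-GENT`, `E-GENT` and 硅/化 as `B-GEFF`, `E-GEFF`. Longest-first matching keeps a long term from being split by a shorter one that shares its prefix.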
4. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 3, wherein the automatic labeling builds on the gold-mine data labels: character vectors are first trained on the gold-mine data with Word2vec; the gold-mine data labeling result is then learned by a deep-learning model combining a bidirectional LSTM neural network with a conditional random field (CRF); and the training result for the gold-mine data is obtained by tuning the model parameters.
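Training a BiLSTM-CRF requires a deep-learning framework and is not reproduced here. The sketch below shows only the CRF decoding step in plain Python: given per-character emission scores (for example, the BiLSTM output layer) and a tag-to-tag transition matrix, Viterbi decoding returns the single highest-scoring tag sequence. The function name and the score layout are assumptions, not taken from the patent.

```python
from __future__ import annotations


def viterbi_decode(emissions: list[list[float]],
                   transitions: list[list[float]],
                   tags: list[str]) -> list[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at character t (e.g. BiLSTM output).
    transitions[i][j]: score of tag i being followed by tag j.
    """
    if not emissions:
        return []
    n_tags = len(tags)
    score = list(emissions[0])  # best path score ending in each tag so far
    backpointers: list[list[int]] = []
    for t in range(1, len(emissions)):
        new_score: list[float] = []
        pointer: list[int] = []
        for j in range(n_tags):
            # Best previous tag i for current tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            pointer.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        score = new_score
        backpointers.append(pointer)
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointer in reversed(backpointers):
        best = pointer[best]
        path.append(best)
    path.reverse()
    return [tags[k] for k in path]
```

With all-zero transitions the decoder reproduces the per-character argmax. With a transition matrix that heavily penalises illegal moves such as `B` followed by `O`, it can switch to a well-formed sequence instead, which is a common motivation for placing a CRF layer on top of an LSTM.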
5. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 1, wherein recognizing the user query sentence comprises:
inputting the user query sentence into the platform through an HTTP interface and first obtaining the word index of the user sentence;
then passing the word index through the trained LSTM-CRF model from step 5 and outputting words composed of characters, which form the labeling result of the user query sentence.
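Claim 5 turns the query into an index sequence and then combines tagged characters into words. A minimal sketch of both ends, assuming a character-to-integer vocabulary with index 0 reserved for unknown characters and tags of the form `B-GENT`; the helper names `to_char_indices` and `decode_entities` are hypothetical.

```python
from __future__ import annotations


def to_char_indices(sentence: str, vocab: dict[str, int], unk: int = 0) -> list[int]:
    """Map each character of the query to its vocabulary index."""
    return [vocab.get(ch, unk) for ch in sentence]


def decode_entities(sentence: str, tags: list[str]) -> list[tuple[str, str]]:
    """Combine characters tagged B-/I-/E-/S- into (word, class) pairs."""
    entities: list[tuple[str, str]] = []
    start: int | None = None
    current = ""
    for i, tag in enumerate(tags):
        prefix, _, cls = tag.partition("-")
        if prefix == "S":
            entities.append((sentence[i], cls))
            start = None
        elif prefix == "B":
            start, current = i, cls
        elif prefix == "I":
            if start is None or cls != current:
                start = None  # malformed span: discard it
        elif prefix == "E":
            if start is not None and cls == current:
                entities.append((sentence[start:i + 1], cls))
            start = None
        else:  # "O" (or anything unexpected) ends the current span
            start = None
    return entities
```

Discarding spans with inconsistent classes or a missing `B` is one conservative design choice; a production system might instead repair such spans.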
6. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 1, wherein the user sentence is classified by inputting the parts of it not recognized by the sequence recognition model into a convolutional neural network for attribute classification, so that the class of the user query sentence is determined automatically by a model trained on labeled data.
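Steps 7 to 9 of claim 1, together with claim 6, can be sketched as follows. The CNN is represented by any callable `classify` that maps the residual sentence to an attribute; the keyword rule in the usage note is a stand-in, not a convolutional network. The knowledge graph is modelled as a Python dict keyed by (entity, attribute), and the returned `dict` plays the role of the Map collection of step 8. All names here are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def residual_sentence(query: str, entities: list[tuple[str, str]]) -> str:
    """Step 7: remove each recognised entity word (first occurrence) from the query."""
    for word, _cls in entities:
        query = query.replace(word, "", 1)
    return query


def answer(query: str,
           entities: list[tuple[str, str]],
           classify: Callable[[str], str],
           graph: dict[tuple[str, str], str]) -> dict:
    """Steps 7-9: classify the residual, package the results, query the graph."""
    residual = residual_sentence(query, entities)
    attribute = classify(residual)  # stand-in for the CNN attribute classifier
    # Step 8: package entity labels and the semantic attribute in one map.
    result = {"entities": entities, "residual": residual, "attribute": attribute}
    # Step 9: map each (entity, attribute) pair onto the knowledge graph.
    result["answers"] = [graph[(word, attribute)]
                         for word, _cls in entities
                         if (word, attribute) in graph]
    return result
```

For instance, with entities `[("金矿", "GENT")]`, the query `"金矿的成因是什么"` leaves the residual `"的成因是什么"`; a toy rule such as `lambda s: "genesis" if "成因" in s else "other"` then yields the attribute `"genesis"`, and the answer is whatever the graph stores under `("金矿", "genesis")`.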
CN202010804098.1A 2020-08-11 2020-08-11 Geological intelligent question-answering oriented data automation sequence labeling identification method Active CN111930909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010804098.1A CN111930909B (en) 2020-08-11 2020-08-11 Geological intelligent question-answering oriented data automation sequence labeling identification method

Publications (2)

Publication Number Publication Date
CN111930909A (en) 2020-11-13
CN111930909B CN111930909B (en) 2023-09-12

Family

ID=73312110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010804098.1A Active CN111930909B (en) 2020-08-11 2020-08-11 Geological intelligent question-answering oriented data automation sequence labeling identification method

Country Status (1)

Country Link
CN (1) CN111930909B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863351A (en) * 2022-07-07 2022-08-05 河北工业大学 Picture and sound fusion roadbed filling collaborative identification management system based on Web3.0

Citations (4)

Publication number Priority date Publication date Assignee Title
CN109002436A (en) * 2018-07-12 2018-12-14 上海金仕达卫宁软件科技有限公司 Medical text terms automatic identifying method and system based on shot and long term memory network
CN109614457A (en) * 2018-11-28 2019-04-12 武汉大学 A kind of recognition methods of the geography information based on deep learning and device
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN111274373A (en) * 2020-01-16 2020-06-12 山东大学 Electronic medical record question-answering method and system based on knowledge graph

Also Published As

Publication number Publication date
CN111930909B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
CN110399457B (en) Intelligent question answering method and system
Marie-Sainte et al. Arabic natural language processing and machine learning-based systems
CN111783394B (en) Training method of event extraction model, event extraction method, system and equipment
CN110727779A (en) Question-answering method and system based on multi-model fusion
CN107766483A (en) The interactive answering method and system of a kind of knowledge based collection of illustrative plates
CN105930452A (en) Smart answering method capable of identifying natural language
CN112541337B (en) Document template automatic generation method and system based on recurrent neural network language model
Weber et al. Towards a digital infrastructure for illustrated handwritten archives
CN115080694A (en) Power industry information analysis method and equipment based on knowledge graph
CN111143531A (en) Question-answer pair construction method, system, device and computer readable storage medium
CN104573030A (en) Textual emotion prediction method and device
CN110675962A (en) Traditional Chinese medicine pharmacological action identification method and system based on machine learning and text rules
CN111291168A (en) Book retrieval method and device and readable storage medium
CN112395392A (en) Intention identification method and device and readable storage medium
CN112328773A (en) Knowledge graph-based question and answer implementation method and system
CN111930909B (en) Geological intelligent question-answering oriented data automation sequence labeling identification method
CN109766442A (en) A kind of couple of user takes down notes the method and system classified
CN116542676A (en) Intelligent customer service system based on big data analysis and method thereof
CN110705285A (en) Government affair text subject word bank construction method, device, server and readable storage medium
Nagy Green information extraction from family books
CN114840657A (en) API knowledge graph self-adaptive construction and intelligent question-answering method based on mixed mode
CN111949781B (en) Intelligent interaction method and device based on natural sentence syntactic analysis
CN106844329A (en) A kind of open source software question and answer information extraction method based on mail tabulation
CN112579666A (en) Intelligent question-answering system and method and related equipment
Martoglia et al. Knowledge extraction, management and long-term preservation of non-Latin cultural heritages-Digital Maktaba project presentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant