CN111930909B - Geological intelligent question-answering oriented data automation sequence labeling identification method


Info

Publication number
CN111930909B
CN111930909B CN202010804098.1A CN202010804098A CN111930909B CN 111930909 B CN111930909 B CN 111930909B CN 202010804098 A CN202010804098 A CN 202010804098A CN 111930909 B CN111930909 B CN 111930909B
Authority
CN
China
Prior art keywords
data
gold
labeling
result
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010804098.1A
Other languages
Chinese (zh)
Other versions
CN111930909A (en)
Inventor
贺金龙
付立军
黄徐胜
唐珂珂
朱月琴
刘晓娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202010804098.1A
Publication of CN111930909A
Application granted
Publication of CN111930909B
Legal status: Active
Anticipated expiration

Classifications

    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/355 Class or cluster creation or modification
    • G06F16/367 Ontology
    • G06F16/374 Thesaurus
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G06F40/169 Annotation, e.g. comment data or footnotes
    • G06F40/30 Semantic analysis
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of information technology and provides an automated data sequence labeling and recognition method for geological intelligent question answering. Its aim is to improve the accuracy of user question-answer interaction in intelligent question answering over gold mine data. In the main scheme, gold mine literature and knowledge graph data are organized and cleaned to obtain batch literature data. The literature data are automatically labeled character by character with BIOES tags to obtain gold mine data labeling results, which are used to train a deep learning model and yield a training result for the gold mine literature data. The training result is applied to recognize user query sentences, giving a labeling result for each query sentence, and attribute classification then gives the classification of the query sentence. The labeling result and the classification are combined and packaged in a set, giving the labeled gold mine data in the user query sentence together with the semantic attribute of the query; this result is mapped to a gold mine knowledge graph to obtain the knowledge answering the user's query.

Description

Geological intelligent question-answering oriented data automation sequence labeling identification method
Technical Field
The application relates to the technical field of knowledge graph application in the deep learning knowledge mining process, and provides a gold mine data automatic sequence labeling method for realizing an intelligent question-answering platform.
Background
Currently, intelligent question-answering services are an important application in the development stage of artificial intelligence, and have a larger cognitive ability compared with traditional rule matching and co-occurrence search matching. In the implementation process, the concept and relation association of knowledge is realized by introducing a knowledge graph, and then the field recognition and the intention recognition are carried out by using an automatic sequence labeling method of deep learning in the user question-answering process, so that an intelligent question-answering platform is realized.
At present, most question-answering systems rely on rule-template matching and Elasticsearch retrieval matching, and most of them target the general domain. Because deep semantic analysis of knowledge is lacking, building an intelligent question-answering service for a specific domain remains challenging. When a traditional question-answering system processes Chinese text, sentences are generally converted into word representations by word segmentation, and the sentences are then matched against a knowledge base by computing semantic similarity (edit distance, or cosine similarity of TF-IDF vectors) to answer the user's query. Word segmentation has gone through three stages of development: rule-dictionary matching, statistical machine learning, and deep learning. Rule-dictionary matching includes forward maximum matching, reverse maximum matching, and bidirectional maximum matching. Statistical machine learning includes n-gram language models, maximum entropy models, conditional random fields, and so on. With the massive data generated in the transition from Web 2.0 to Web 3.0, deep-learning-based segmentation methods have emerged, including convolutional neural networks, recurrent neural networks, long short-term memory networks, and their combination with conditional random fields; the tagging scheme used in recognition is BIO or BIOES.
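For orientation, forward maximum matching, one of the rule-dictionary methods named above, can be sketched as follows. This is a minimal illustration and not part of the claimed method; the function name, the toy dictionary and the `max_len` parameter are assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the dictionary matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan can advance.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
toy_dictionary = {"青藏高原", "地质", "构造"}
segments = forward_max_match("青藏高原地质构造", toy_dictionary)
unknown = forward_max_match("金矿", toy_dictionary)
```

With this toy dictionary, a term that is missing from it (here 金矿) falls apart into single characters, which illustrates the dictionary dependence criticized in the background.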
The existing labeling methods have the following defects:
(1) In gold mine knowledge mining and discovery, manually processing large amounts of data is time-consuming and labor-intensive, and processing efficiency is low.
(2) Word segmentation tools depend heavily on dictionary construction; they work well in the general domain but do not achieve the desired effect when applied to gold mine information.
(3) Sequence labeling of massive gold mine data also requires structured information about domain-specific knowledge categories on top of existing methods.
Disclosure of Invention
The application aims to improve the accuracy of user question-answer interaction in intelligent question answering over gold mine data. It constructs a deep learning recognition method based on automatic sequence labeling, built by combining gold mine domain literature with a knowledge graph.
In order to solve the technical problems, the application adopts the following technical scheme:
A data automation sequence labeling and recognition method for geological intelligent question answering comprises the following steps:
step 1: organizing the gold mine literature and knowledge graph data to obtain domain entity classification description tags (including entities), which serve as labeling tags for domain knowledge entity recognition;
step 2: automatically cleaning the literature content by machine, filtering out English letters, punctuation marks and meaningless markup, to obtain valid Chinese text content;
step 3: storing the cleaned text content in separate txt files to obtain a storage root path for the batch literature data;
step 4: for the literature data obtained in step 3, automatically labeling the character data by machine using BIOES tags, and combining these tags with the organized knowledge graph entity classification description data to obtain gold mine data labeling results whose labels begin with B, I, O, E or S;
step 5: feeding the character sequence data of the gold mine data labeling results from step 4 into a model combining a bidirectional LSTM with a conditional random field (CRF) for training, adjusting the memory cell structure and overall parameters of the LSTM model and adding the organized gold mine knowledge graph entity data, to obtain a training result for the gold mine literature data;
step 6: applying the training result of the literature data to recognize the query sentences of platform users, obtaining a labeling result for each user query sentence;
step 7: subtracting the content of the gold mine data labeling result from the user query sentence and inputting the remaining sentence into a convolutional neural network for attribute classification, obtaining the classification of the user query sentence;
step 8: combining and packaging the gold mine data recognition result and the classification of the user query sentence in a Map collection, obtaining the labeled gold mine data in the user query sentence together with the semantic attribute of the query, for example { Qinghai-Tibet plateau=what of geological entity GENT };
step 9: mapping the labeled gold mine data and the query semantic attribute from step 8 to the gold mine knowledge graph to obtain the knowledge answering the user's query, thereby realizing intelligent question answering.
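Steps 2 and 3 above can be sketched as follows. This is a minimal illustration under stated assumptions: the patent says only that letters, punctuation and meaningless marks are filtered, so the exact pattern (keep CJK Unified Ideographs, which also drops digits), the function names and the numbered file naming are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed pattern: keep only characters in the CJK Unified Ideographs block.
# This removes English letters, punctuation and markup, and also digits.
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]+")


def clean_document(raw):
    """Step 2: strip everything that is not a Chinese character."""
    return NON_CHINESE.sub("", raw)


def store_documents(raw_documents, root_dir):
    """Step 3: write each cleaned document to its own UTF-8 txt file under one root."""
    root = Path(root_dir)
    root.mkdir(parents=True, exist_ok=True)
    for number, raw in enumerate(raw_documents, start=1):
        (root / f"{number}.txt").write_text(clean_document(raw), encoding="utf-8")
    return root


# Made-up sample documents, for illustration only.
raw_documents = ["<p>青藏高原 (Qinghai-Tibet Plateau) 是地质实体。</p>", "金矿, Au!"]
corpus_root = store_documents(raw_documents, tempfile.mkdtemp())
```

The returned `corpus_root` plays the role of the storage root path that step 4 reads from.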
In the above technical scheme, organizing the gold mine literature and knowledge graph data comprises:
collecting gold mine literature data by manually curating geological encyclopedia entries and the Sogou corpus, and constructing classification description tags from gold mine domain knowledge, the tags comprising geological entities GENT, geological effects GEFF, geochemistry GEHE and geological methods GMET.
In the above technical scheme, the label combination in step 4 comprises the steps of:
first, splitting the BIOES tag into its single-character letters B, I, O, E and S;
then automatically labeling the content of the txt files from step 3 with these single-character letters to obtain gold mine data labeling results whose labels begin with B, I, O, E or S.
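The label combination can be illustrated with a short sketch. It assumes that entities are located by longest-match lookup in a dictionary derived from the knowledge graph, and that the BIOES letter and the category tag are joined with a hyphen (for example `B-GENT`); the patent specifies neither detail, and all names below are hypothetical.

```python
def combine_labels(entity, category):
    """Build the combined BIOES + category labels for one entity string."""
    if len(entity) == 1:
        return [f"S-{category}"]
    return [f"B-{category}"] + [f"I-{category}"] * (len(entity) - 2) + [f"E-{category}"]


def label_text(text, entity_categories):
    """Label text character by character; characters outside any entity get 'O'."""
    labels = ["O"] * len(text)
    # Prefer longer entities so a short entity never splits a longer one.
    entities = sorted((e for e in entity_categories if e), key=len, reverse=True)
    i = 0
    while i < len(text):
        match = next((e for e in entities if text.startswith(e, i)), None)
        if match is None:
            i += 1
            continue
        labels[i:i + len(match)] = combine_labels(match, entity_categories[match])
        i += len(match)
    return list(zip(text, labels))


# Hypothetical entity dictionary: entity string -> classification description tag.
labeled = label_text("青藏高原是什么", {"青藏高原": "GENT"})
```

Each element of `labeled` is a (character, label) pair, which is the form a character-level sequence model is trained on.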
In the above technical scheme, automatic labeling builds on the gold mine data labels: character vectors are first trained on the gold mine data using Word2vec; the gold mine data labeling results are then learned by a model combining a bidirectional LSTM neural network with a conditional random field (CRF), and the training result for the gold mine data is obtained by adjusting the model parameters.
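Training a bidirectional LSTM with a CRF layer requires a deep learning framework and is not reproduced here. The sketch below covers only the decoding side of a CRF, namely Viterbi search for the best label sequence given per-character scores. It is an illustration under assumptions: the scores are made up, the transitions are hard BIOES validity constraints rather than learned weights, and only the GENT category is included.

```python
TAGS = ["O", "B-GENT", "I-GENT", "E-GENT", "S-GENT"]


def allowed(prev, curr):
    """BIOES constraint: B and I must be followed by I or E; O, E and S by O, B or S."""
    inside = prev[0] in "BI"
    return (curr[0] in "IE") if inside else (curr[0] in "OBS")


def viterbi(emissions):
    """emissions: one {tag: score} dict per character. Returns the best valid tag path."""
    neg_inf = float("-inf")
    # A sequence may only start with O, B or S.
    best = {t: (emissions[0][t] if t[0] in "OBS" else neg_inf) for t in TAGS}
    back = []
    for scores in emissions[1:]:
        step, pointers = {}, {}
        for curr in TAGS:
            prev_score, prev_tag = max((best[p], p) for p in TAGS if allowed(p, curr))
            step[curr] = prev_score + scores[curr]
            pointers[curr] = prev_tag
        best = step
        back.append(pointers)
    # A sequence may only end with O, E or S.
    last = max((t for t in TAGS if t[0] in "OES"), key=lambda t: best[t])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up scores for five characters; in a real model they come from the BiLSTM.
emissions = [
    {"O": 0.1, "B-GENT": 2.0, "I-GENT": 0.2, "E-GENT": 0.1, "S-GENT": 0.3},
    {"O": 0.2, "B-GENT": 0.1, "I-GENT": 1.5, "E-GENT": 0.4, "S-GENT": 0.1},
    {"O": 1.0, "B-GENT": 0.1, "I-GENT": 0.9, "E-GENT": 0.3, "S-GENT": 0.1},
    {"O": 0.2, "B-GENT": 0.1, "I-GENT": 0.3, "E-GENT": 1.8, "S-GENT": 0.1},
    {"O": 2.0, "B-GENT": 0.1, "I-GENT": 0.1, "E-GENT": 0.2, "S-GENT": 0.1},
]
best_path = viterbi(emissions)
greedy_path = [max(scores, key=scores.get) for scores in emissions]
```

In this example the third character scores highest for `O` in isolation, so `greedy_path` contains an `O` inside an entity, whereas `best_path` stays a valid BIOES sequence. Enforcing consistency across the whole sequence is the usual reason for placing a CRF on top of per-character scores.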
In the above technical scheme, user query sentence recognition inputs the user query sentence into the model and uses the trained model to automatically recognize the sequence information in the sentence, obtaining the labeling result of the user query sentence.
In the above technical scheme, user query sentence recognition comprises the following steps:
(1) the user query sentence is input to the platform through an HTTP interface, and the character index of the sentence is obtained first (e.g. 青: 15, 藏: 23, 高: 54, 原: 113 for the characters of 青藏高原, the Qinghai-Tibet Plateau);
(2) the character index of the user sentence is passed to the combined LSTM and CRF model trained in step 5, whose output is the words formed by combining characters, i.e. the labeling result of the user query sentence.
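The two recognition steps can be sketched as follows, with the trained model replaced by a hand-written tag sequence. The vocabulary construction, the reserved id 0 for unknown characters and the hyphenated label format are assumptions made for illustration, so the ids differ from the example values in the text.

```python
def build_char_index(corpus_texts):
    """Assign each distinct character an integer id; 0 is reserved for unknown characters."""
    index = {}
    for text in corpus_texts:
        for ch in text:
            index.setdefault(ch, len(index) + 1)
    return index


def encode(sentence, char_index):
    """Step (1): turn the query sentence into a list of character ids."""
    return [char_index.get(ch, 0) for ch in sentence]


def tags_to_entities(sentence, tags):
    """Step (2), output side: join characters tagged B..E (or S) into (word, category) pairs."""
    entities, current = [], ""
    for ch, tag in zip(sentence, tags):
        prefix, _, category = tag.partition("-")
        if prefix == "S":
            entities.append((ch, category))
            current = ""
        elif prefix == "B":
            current = ch
        elif prefix == "I" and current:
            current += ch
        elif prefix == "E" and current:
            entities.append((current + ch, category))
            current = ""
        else:
            current = ""
    return entities


query = "青藏高原是什么"
char_index = build_char_index([query])
ids = encode(query, char_index)
# Stand-in for the model output; a trained BiLSTM-CRF would produce these tags.
predicted_tags = ["B-GENT", "I-GENT", "I-GENT", "E-GENT", "O", "O", "O"]
entities = tags_to_entities(query, predicted_tags)
```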
In the above technical scheme, user sentence classification inputs the parts of the sentence not recognized by the sequence recognition model into a convolutional neural network, which classifies their attributes; the classifier is obtained automatically by machine training on labeled data, and it yields the classification of the user query sentence.
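The sketch below shows only how the unrecognized remainder of a query might be extracted before classification. A keyword lookup stands in for the trained convolutional neural network, so it is explicitly not the classifier described here; the keyword table and function names are hypothetical, while the attribute names follow the categories listed in the embodiment.

```python
def residual_sentence(query, entities):
    """Remove every recognized entity string from the query, leaving the unrecognized part."""
    for word, _category in entities:
        query = query.replace(word, "")
    return query


# Hypothetical keyword table standing in for the trained CNN classifier.
ATTRIBUTE_KEYWORDS = {
    "是什么": "brief introduction",
    "多大": "size",
    "在哪": "regional scope",
}


def classify_residual(residual):
    """Placeholder classifier: return the first attribute whose keyword occurs in the residual."""
    for keyword, attribute in ATTRIBUTE_KEYWORDS.items():
        if keyword in residual:
            return attribute
    return "unknown"


residual = residual_sentence("青藏高原是什么", [("青藏高原", "GENT")])
attribute = classify_residual(residual)
```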
By adopting the above technical scheme, the application has the following beneficial effects:
1. Processing gold mine literature data requires professional knowledge and skills. Automatic sequence labeling and recognition by machine reduces the complexity of manual processing on the one hand; on the other hand, domain knowledge is encapsulated internally, so users can get started quickly without having to attend to the underlying internals.
2. The automatic sequence labeling and recognition method based on knowledge graph and gold mine data gives users a convenient way to interact during intelligent question answering: they only need to enter a query sentence, which greatly improves the convenience of applying gold mine domain knowledge.
3. The automatic sequence labeling and recognition process does not depend on word segmentation tools, only on automatic model training, which greatly reduces the manpower required. The model needs to be trained only once; afterwards it is simply called, without retraining.
4. Migrating the model depends only on the literature data provided; the model can be customized and trained quickly for different data, which reduces the risk of model migration.
5. Compared with rule-template matching and retrieval-based matching, the automatic sequence labeling and recognition method based on knowledge graph and gold mine data gives the intelligent question answering stronger generalization ability.
Drawings
FIG. 1 is a flow chart of an intelligent question-answering service;
FIG. 2 is a sequence labeling diagram of the tags that combine BIOES with the gold mine data classification descriptions;
FIG. 3 is a flowchart of a labeling process based on a word segmentation tool;
FIG. 4 is a flowchart of automated sequence annotation recognition.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the particular embodiments described herein are illustrative only and are not intended to limit the application, i.e., the embodiments described are merely some, but not all, of the embodiments of the application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
It is noted that relational terms such as "first" and "second", and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The automatic sequence labeling and recognition method based on knowledge graph and gold mine data, as used by the gold mine intelligent question-answering platform, answers user queries promptly and accurately by incorporating domain-specific knowledge. First, gold mine literature is collected, and Chinese text content is obtained by removing invalid symbols and meaningless markup. Next, knowledge classification descriptions are constructed from the classification structure of the domain knowledge. The text content is then automatically labeled character by character, with each character label combined with a domain knowledge classification description tag. The gold mine text data are learned by a bidirectional neural network in the deep learning model, and the parameters are adjusted until the model reaches the threshold required for automatic sequence labeling and recognition. The resulting model is used for sequence recognition of user sentences; intent classification is performed on what remains after the recognized sequence is removed; and the classification result and the sequence recognition result are mapped to the gold mine knowledge graph to execute the user's query and return feedback to the user. The question-answering service is shown in FIG. 1.
The steps are as follows:
(1) Data organization. Gold mine literature data are collected and organized, and classification description tags such as geological entities GENT, geological effects GEFF, geochemistry GEHE and geological methods GMET are constructed from gold mine domain knowledge.
(2) Data cleaning. The organized literature is batch-processed to extract its text content, and the text is cleaned with regular expressions to obtain valid Chinese text.
(3) Batch data storage. Using Python, the batch text content is stored under a fixed root directory, one txt file per article, encoded as UTF-8.
(4) Automated labeling with combined tags. The organized gold mine domain knowledge classification description tags are combined with the conventional BIOES tags, and the gold mine text content is labeled article by article and character by character, yielding gold mine character labeling results whose labels begin with B, I, O, E or S, as shown in FIG. 2.
(5) Automated sequence recognition with deep learning. Building on the labeled data, character vectors are first trained on the gold mine data using Word2vec; the labeled data are then learned by a model combining a bidirectional LSTM neural network with a conditional random field (CRF), and the training result for the gold mine data (saved as a checkpoint file) is obtained by adjusting the model parameters. The approach used by word segmentation tools, which scores the weights of word features as shown in FIG. 3, is not used here.
(6) User query sentence sequence recognition. The user query sentence is input into the model, and the trained model automatically recognizes the sequence information in the sentence, giving the labeling result for the user's data, as shown in FIG. 4.
(7) User sentence classification. The parts of the sentence not recognized by the sequence recognition model are input into a convolutional neural network for attribute classification; the classifier is obtained automatically by machine training on labeled data, and it yields the classification of the user sentence.
(8) Obtaining the user sequence labeling result and the sentence attribute classification. The results of steps (6) and (7) are combined to understand the information in the user sentence.
(9) Knowledge graph mapping and query. The combined result of step (8) is mapped into the gold mine knowledge graph, and feedback information is obtained by querying the knowledge graph.
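The last two steps, combining the results and querying the graph, can be pictured with a toy in-memory structure. The graph layout (a dict keyed by entity and attribute), its placeholder content and the function names are assumptions for illustration; the patent does not describe how the knowledge graph is stored or queried.

```python
# Toy knowledge graph: (entity, attribute) -> answer. Placeholder content, not from the patent.
KNOWLEDGE_GRAPH = {
    ("青藏高原", "brief introduction"): "placeholder description of the Qinghai-Tibet Plateau",
}


def package_result(entities, attribute):
    """Collect the entity labels and the query attribute in one map."""
    return {"entities": dict(entities), "attribute": attribute}


def query_graph(packaged, graph):
    """Look up each (entity, attribute) pair; pairs missing from the graph are skipped."""
    answers = {}
    for entity in packaged["entities"]:
        key = (entity, packaged["attribute"])
        if key in graph:
            answers[entity] = graph[key]
    return answers


packaged = package_result([("青藏高原", "GENT")], "brief introduction")
answers = query_graph(packaged, KNOWLEDGE_GRAPH)
```

A production system would more likely issue a query against a graph database; the dict lookup only shows how the packaged entity and attribute together select an answer.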
Examples
The application provides a data automation sequence labeling and recognition method for geological intelligent question answering, which comprises the following steps:
step 1: organizing the gold mine literature and knowledge graph data to obtain domain entity classification description tags (including entities), which serve as labeling tags for domain knowledge entity recognition;
step 2: automatically cleaning the literature content by machine, filtering out English letters, punctuation marks and meaningless markup, to obtain valid Chinese text content;
step 3: storing the cleaned text content in separate txt files to obtain a storage root path for the batch literature data;
step 4: for the literature data obtained in step 3, automatically labeling the character data by machine using BIOES tags, and combining these tags with the organized knowledge graph entity classification description data to obtain gold mine data labeling results whose labels begin with B, I, O, E or S;
step 5: feeding the character sequence data of the gold mine data labeling results from step 4 into a model combining a bidirectional LSTM with a conditional random field (CRF) for training, adjusting the memory cell structure and overall parameters of the LSTM model and adding the organized gold mine knowledge graph entity data, to obtain a training result for the gold mine literature data (saved as a checkpoint file);
step 6: applying the training result of the literature data to recognize the query sentences of platform users, obtaining a labeling result for each user query sentence;
step 7: subtracting the content of the gold mine data labeling result from the user query sentence and inputting the remaining sentence into a convolutional neural network for attribute classification, obtaining the classification of the user query sentence;
step 8: combining the gold mine data labeling result with the classification of the user query sentence to obtain the labeled gold mine data in the user query sentence together with the semantic attribute of the query; the gold mine data labeling result refers to the entity parts found in gold mine literature, such as geological entities (e.g. the Qinghai-Tibet Plateau, volcanic edifices), geological effects, geochemistry and geological methods; the classification of the user query sentence refers to the attribute category the user is asking about for the entity part, such as: brief introduction, category, size, relationship, regional scope;
step 9: mapping the labeled gold mine data and the query semantic attribute from step 8 to the gold mine knowledge graph to obtain the knowledge answering the user's query, thereby realizing intelligent question answering.
In the above scheme, organizing the gold mine literature and knowledge graph data comprises:
collecting gold mine literature data by manually curating geological encyclopedia entries and the Sogou corpus, and constructing classification description tags from gold mine domain knowledge, the tags comprising geological entities GENT, geological effects GEFF, geochemistry GEHE and geological methods GMET.
In the above scheme, the label combination in step 4 comprises the steps of:
first, splitting the BIOES tag into its single-character letters B, I, O, E and S;
then automatically labeling the content of the txt files from step 3 with these single-character letters to obtain gold mine data labeling results whose labels begin with B, I, O, E or S.
In the above scheme, automatic labeling builds on the gold mine data labels: character vectors are first trained on the gold mine data using Word2vec; the gold mine data labeling results are then learned by a model combining a bidirectional LSTM neural network with a conditional random field (CRF), and the training result for the gold mine data is obtained by adjusting the model parameters.
In the above scheme, user query sentence recognition comprises the following steps:
the user query sentence is input to the platform through an HTTP interface, and the character index of the sentence is obtained first (e.g. 青: 15, 藏: 23, 高: 54, 原: 113 for the characters of 青藏高原, the Qinghai-Tibet Plateau);
the character index of the user sentence is passed to the combined LSTM and CRF model trained in step 5, whose output is the words formed by combining characters, i.e. the labeling result of the user query sentence.
In the above scheme, user sentence classification inputs the parts of the sentence not recognized by the sequence recognition model into a convolutional neural network, which classifies their attributes; the classifier is obtained automatically by machine training on labeled data, and it yields the classification of the user query sentence.

Claims (4)

1. A data automation sequence labeling and recognition method for geological intelligent question answering, characterized in that the method comprises the following steps:
step 1: organizing the gold mine literature and knowledge graph data to obtain domain entity classification description tags, which serve as labeling tags for domain knowledge entity recognition;
step 2: automatically cleaning the literature content by machine, filtering out English letters, punctuation marks and meaningless markup, to obtain valid Chinese text content;
step 3: storing the cleaned text content in separate txt files to obtain a storage root path for the batch literature data;
step 4: for the literature data obtained in step 3, automatically labeling the character data by machine using BIOES tags, and combining these tags with the organized knowledge graph entity classification description data to obtain gold mine data labeling results whose labels begin with B, I, O, E or S;
step 5: feeding the character sequence data of the gold mine data labeling results from step 4 into a model combining a bidirectional LSTM with a conditional random field (CRF) for training, adjusting the memory cell structure and overall parameters of the LSTM model and adding the organized gold mine knowledge graph entity data, to obtain a training result for the gold mine literature data;
step 6: applying the training result of the literature data to recognize the query sentences of platform users, obtaining a labeling result for each user query sentence;
step 7: subtracting the gold mine data recognized by the model in the user sentence from the content of the user query sentence, and inputting the resulting residual sentence into a convolutional neural network for attribute classification, obtaining the classification of the user query sentence;
step 8: combining and packaging the gold mine data recognition result and the classification of the user query sentence in a Map collection, obtaining the labeled gold mine data in the user query sentence together with the semantic attribute of the query;
step 9: mapping the labeled gold mine data and the query semantic attribute from step 8 to the gold mine knowledge graph to obtain the knowledge answering the user's query, thereby realizing intelligent question answering;
the user query sentence recognition comprises the steps of:
inputting the user query sentence to the platform through an HTTP interface, and first obtaining the character index of the user sentence;
passing the character index of the user sentence to the combined LSTM and CRF model trained in step 5, whose output is the words formed by combining characters, namely the labeling result of the user query sentence;
the user sentence classification comprises inputting the parts of the sentence not recognized by the sequence recognition model into a convolutional neural network to classify their attributes, the classification being realized automatically by machine training on labeled data, to obtain the classification of the user query sentence.
2. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 1, wherein the arrangement of the gold mine literature graph data comprises the following steps:
for the gold mine literature data, the data is collected by manually collating a geological encyclopedia and the Sogou corpus, and classification description tags are constructed from gold mine domain knowledge, wherein the classification description tags comprise geological entity GENT, geological action GEFF, geochemistry GEHE and geological method GMET.
3. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 1, wherein the label combination in step 4 comprises the following steps:
first, splitting the BIOES label into characters to obtain the single-character letters B, I, O, E, S;
then automatically labeling the txt file content from step 3 with the single-character letters to obtain a gold mine data labeling result beginning with B, I, O, E, S.
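The automatic labeling in claim 3 can be sketched as follows. The claim does not state how entities are located in the text, so this example assumes a greedy longest-match lookup against a lexicon that maps entity strings to category tags. The `B-GENT` style tag format, the function name, and the sample lexicon are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def bioes_label(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag every character of `text` with a BIOES label via longest match."""
    max_len = max((len(term) for term in lexicon), default=0)
    labels: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so a short term cannot split a longer one.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                match = candidate
                break
        if match is None:
            labels.append((text[i], "O"))            # outside any entity
            i += 1
        elif len(match) == 1:
            labels.append((match, f"S-{lexicon[match]}"))  # single character
            i += 1
        else:
            category = lexicon[match]
            labels.append((match[0], f"B-{category}"))     # begin
            labels.extend((ch, f"I-{category}") for ch in match[1:-1])  # inside
            labels.append((match[-1], f"E-{category}"))    # end
            i += len(match)
    return labels


# Hypothetical lexicon and sentence, for illustration only.
sample = bioes_label("金矿的成矿作用", {"金矿": "GENT", "成矿作用": "GEFF"})
```

Each returned pair is one character with its label, which matches the one-character-per-line layout commonly used for sequence labeling training files.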
4. The geological intelligent question-answering oriented data automation sequence labeling identification method according to claim 3, wherein the automatic labeling is performed on the basis of the gold mine data labeling: character vectors are first trained on the gold mine data with Word2vec, the gold mine data labeling result is then trained on and learned by combining a bidirectional LSTM neural network with a conditional random field (CRF) in deep learning, and the training result of the gold mine data is obtained by adjusting model parameters.
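To show why a CRF is paired with the bidirectional LSTM, the sketch below implements Viterbi decoding, the standard way a linear-chain CRF picks the best tag sequence from per-character scores plus tag-to-tag transition scores. It covers inference only, not training, and the scores in the usage note are invented; in the claimed method the per-character scores would come from the trained LSTM.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag path for one sentence."""
    if not emissions:
        return []
    # best[t][tag] = score of the best path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for step in emissions[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition.
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = best[-1][prev] + transitions.get((prev, cur), 0.0) + step[cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Walk the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

For two characters with emission scores `{"B": 1.0, "E": 0.0, "O": 0.5}` and `{"B": 0.9, "E": 0.8, "O": 0.0}`, choosing the top tag per character independently would give the invalid sequence B, B. With a transition penalty of -5.0 on `("B", "B")`, `("B", "O")` and `("O", "E")`, the decoder returns B, E instead, which is the kind of label consistency the CRF layer contributes.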

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010804098.1A CN111930909B (en) 2020-08-11 2020-08-11 Geological intelligent question-answering oriented data automation sequence labeling identification method

Publications (2)

Publication Number Publication Date
CN111930909A (en) 2020-11-13
CN111930909B (en) 2023-09-12

Family

ID=73312110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010804098.1A Active CN111930909B (en) 2020-08-11 2020-08-11 Geological intelligent question-answering oriented data automation sequence labeling identification method

Country Status (1)

Country Link
CN (1) CN111930909B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863351B (en) * 2022-07-07 2022-09-20 河北工业大学 Picture and sound fusion roadbed filling collaborative identification management system based on Web3.0

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002436A (en) * 2018-07-12 2018-12-14 上海金仕达卫宁软件科技有限公司 Medical text terms automatic identifying method and system based on shot and long term memory network
CN109614457A (en) * 2018-11-28 2019-04-12 武汉大学 A kind of recognition methods of the geography information based on deep learning and device
CN110705293A (en) * 2019-08-23 2020-01-17 中国科学院苏州生物医学工程技术研究所 Electronic medical record text named entity recognition method based on pre-training language model
CN111274373A (en) * 2020-01-16 2020-06-12 山东大学 Electronic medical record question-answering method and system based on knowledge graph

Also Published As

Publication number Publication date
CN111930909A (en) 2020-11-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant