CN112270189A - Question type analysis node generation method, question type analysis node generation system and storage medium - Google Patents

Publication number
CN112270189A
Authority
CN
China
Prior art keywords
analysis
information
data
natural language
intention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011259004.3A
Other languages
Chinese (zh)
Other versions
CN112270189B (en)
Inventor
姜磊
钟颖欣
辛岩
杨钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Brilliant Data Analytics Inc
Original Assignee
Brilliant Data Analytics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Brilliant Data Analytics Inc
Priority to CN202011259004.3A
Publication of CN112270189A
Application granted
Publication of CN112270189B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to data analysis technology, and in particular to a method, a system and a storage medium for generating question-type analysis nodes. The method comprises the following steps: preprocessing and word-segmenting the input natural language question; performing feature representation and feature extraction on the text data corresponding to the preprocessed question, converting the text data into numerical form; extracting key information from the natural language question and identifying the type of the key information; constructing an intention recognition model and judging the analysis intention of the input natural language question; and combining the results of feature extraction, type recognition and intention recognition to obtain the data source, analysis dimensions, analysis indexes, analysis task and other additional data analysis information required by the natural language question, and automatically generating analysis nodes. The invention allows a user to complete data analysis and exploration without understanding complex data structures and analysis methods, and thereby to quickly explore the data and discover problems in the business.

Description

Question type analysis node generation method, question type analysis node generation system and storage medium
Technical Field
The present invention relates to data analysis technologies, and in particular to a question-type analysis node generation method, system and storage medium.
Background
In existing question-type data analysis systems, the user generally poses a simple natural language question; the system parses it, automatically queries a database, and presents the result to the user as a visual answer. This only amounts to a query for specific, relatively simple questions. For example, when the user asks "what is the power consumption of a certain area this month", the existing system aggregates that month's power consumption data in the database into a summary value and returns a visual view or a specific number to the user.
When the user's question is more complicated, such as "what are the electricity usage trends of different user types in Guangzhou in the first half of the year?", the result corresponding to the question does not exist directly in the database. Because the conventional question-type analysis system described above only has a data query function, it cannot satisfy such complex analysis requirements.
In addition, if the user's question does not relate to an analysis path in the shared library of the data analysis system, the user cannot obtain effective analysis path recommendations from the question-type data analysis system. Therefore, it is necessary to provide a question-type analysis node generation method and system that solve these problems of data analysis systems based on analysis path recommendation.
Disclosure of Invention
The invention provides a question-type analysis node generation method, a question-type analysis node generation system and a storage medium, which can analyze natural language questions posed by users, automatically extract data, select an analysis function and generate analysis nodes, so that users can complete data analysis and exploration without understanding complex data structures and analysis methods, and can therefore quickly explore the data and discover problems in the business.
The question-type analysis node generation method of the invention comprises the following steps:
S1, preprocessing the input natural language question and performing word segmentation to obtain the segmented words;
S2, performing feature representation and feature extraction on the text data corresponding to the preprocessed input natural language question, and converting the text data into numerical form;
S3, extracting key information from the input natural language question, and performing type recognition on the key information to obtain entity category information;
S4, constructing an intention recognition model, judging the analysis intention of the input natural language question, and completing intention recognition;
S5, combining the results of feature extraction, type recognition and intention recognition in steps S2-S4 to obtain the data source, analysis dimensions, analysis indexes, analysis task and other additional data analysis information required by the natural language question, and automatically generating analysis nodes.
In a preferred embodiment, step S5 includes:
S51, establishing the task data interfaces of the analysis nodes, with a standard data interface established for each analysis node task;
S52, generating the data interface information: based on the entity category information and in combination with metadata information, matching and indexing to obtain data source information, index information, dimension information and other additional data analysis information; determining the analysis node task based on the analysis intention; and processing the data source information, index information, dimension information and other additional data analysis information, passing the processed information to the corresponding analysis node task, and calling the analysis node task to generate and display the analysis result.
The question-type analysis node generation system according to the present invention includes:
a preprocessing module, used for preprocessing and word-segmenting the input natural language question to obtain the segmented words;
a feature extraction module, used for performing feature representation and feature extraction on the text data corresponding to the preprocessed input natural language question and converting the text data into numerical form;
an information extraction module, used for extracting key information from the input natural language question and identifying the type of the key information to obtain entity category information;
an intention recognition module, used for constructing an intention recognition model, judging the analysis intention of the input natural language question and completing intention recognition;
and an analysis node generation module, used for combining the processing results of the feature extraction module, the information extraction module and the intention recognition module to obtain the data source, analysis dimensions, analysis indexes, analysis task and other additional data analysis information required by the natural language question, and automatically generating an analysis node.
The storage medium of the present invention has stored thereon computer instructions which, when executed by a processor, perform the steps of the analytical node generation method of the present invention.
Compared with the prior art, the invention has the following notable effects: from the input natural language question, it can automatically identify the user's data analysis intention, automatically match and index the source data, generate the filtering conditions, determine the analysis dimensions and indexes, automatically generate the analysis nodes and form an analysis path, thereby lowering the barrier to data analysis for the user.
Drawings
FIG. 1 is a flow chart of an implementation of a method for visualizing an analytic concept of the present invention;
FIG. 2 is a schematic structural diagram of an LSTM-CRF model;
FIG. 3 is a flow block diagram of analysis node generation.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the embodiments described below are only some embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the question-type analysis node generation method of this embodiment includes the following steps:
S1, preprocessing the input natural language question and performing word segmentation to obtain the segmented words.
The natural language question input by the user undergoes uniform normalizing preprocessing: full-width/half-width conversion, case conversion, cleaning and removal of special symbols and the like are applied to the text data corresponding to the input question. In addition, because Chinese has no explicit separator between words, and mixed Chinese-English text may have no separators either, word segmentation is needed to divide the whole sentence string into independent words.
Step S1 specifically includes: loading the text data corresponding to the input natural language question into memory for processing; uniformly converting the text data into lowercase, half-width, Simplified Chinese form, and segmenting it with the jieba word segmentation tool; and then checking the segmented word list: if a stop-word list is available, the corresponding stop words are removed, and otherwise they are kept.
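The preprocessing in step S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `normalize` and `segment` are hypothetical, Traditional-to-Simplified conversion is omitted, and a plain whitespace split stands in for the jieba tokenizer so that the example has no third-party dependencies.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize a question: full-width to half-width, lowercase, strip symbols."""
    # NFKC folds full-width letters, digits and punctuation to half-width forms.
    text = unicodedata.normalize("NFKC", text).lower()
    # Replace anything that is not a word character or whitespace with a space.
    text = re.sub(r"[^\w\s]", " ", text)
    # Collapse the runs of whitespace left behind by the cleaning step.
    return re.sub(r"\s+", " ", text).strip()


def segment(text, tokenizer=str.split, stop_words=None):
    """Split normalized text into words and optionally drop stop words.

    `tokenizer` is an assumption: whitespace splitting is only a stand-in
    for a real Chinese word segmenter.
    """
    tokens = [t for t in tokenizer(text) if t.strip()]
    if stop_words:  # filter only when a stop-word list is available
        tokens = [t for t in tokens if t not in stop_words]
    return tokens


question = "How much Electricity in ＧＺ？"
words = segment(normalize(question), stop_words={"in"})
print(words)
```

For Chinese input, a segmenter such as jieba's `lcut` could be passed as the `tokenizer` argument in place of the whitespace split.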
S2, feature extraction
Feature representation and feature extraction are performed on the text data corresponding to the preprocessed input natural language question. A machine learning model cannot use natural language directly; the text must first be expressed in numerical form, which is achieved through feature representation and extraction. In this embodiment, feature representation and feature extraction are performed on the preprocessed text using TF-IDF (Term Frequency-Inverse Document Frequency), the word-embedding model Word2Vec, the text feature extraction function CountVectorizer and the like, converting the preprocessed text into numerical form.
For text feature representation in this step, the words in the text data are converted into a word-frequency matrix and the TF-IDF weight of each word is computed, giving the weight of each word in the corresponding text data. This involves a trade-off: selecting only the subset of features that best represents the semantics of the text both expresses the text better and reduces the complexity of the algorithm.
In this embodiment, TF-IDF is the combination of TF and IDF, and the calculation formula is as follows:
$$\mathrm{TFIDF}_{ij} = \mathrm{TF}_{ij} \times \mathrm{IDF}_{i}$$
where $\mathrm{TF}_{ij}$ indicates the number of times the i-th feature item in the document set appears in document j. Note that TF is the term frequency, meaning the number of times a word appears in a document; it is an important evaluation index because it considers not only whether a feature word appears but also how many times it appears.
IDF is the inverse document frequency. The reasoning is that if a word appears in every document, it is a common word with no ability to distinguish between documents, whereas if a word appears in only a few documents in the corpus, it does have the ability to distinguish between documents. The expression is as follows:
$$\mathrm{IDF}_{i} = \log \frac{N}{n_i + 0.01}$$
where $N$ represents the total number of documents in the document set and $n_i$ represents the number of documents containing feature word i; the purpose of adding 0.01 to $n_i$ is to prevent the IDF from going to infinity.
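As a worked illustration of the weighting described above, the sketch below computes TF-IDF weights in plain Python. It assumes the weight is the product of the raw term count and an IDF of the form log(N / (n + 0.01)), which is how the surrounding text describes the two factors; the function name `tf_idf` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per document.

    Assumed formula: weight = TF * log(N / (n + 0.01)), where TF is the raw
    count of the word in the document, N the number of documents and n the
    number of documents containing the word.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            word: count * math.log(n_docs / (doc_freq[word] + 0.01))
            for word, count in term_freq.items()
        })
    return weights


docs = [
    ["guangzhou", "electricity", "electricity"],
    ["shenzhen", "electricity"],
]
weights = tf_idf(docs)
print(weights[0])
```

With this form of the IDF, a word that occurs in every document ("electricity" here) receives a weight close to zero, while a word that occurs in a single document ("guangzhou") receives a clearly positive weight.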
S3, information extraction
In this step, a machine learning algorithm extracts key information such as time, area name and index name from the natural language question input by the user, and also identifies the type of each piece of key information. For example, "Guangzhou city" is region information and "the first half of the year" is time information.
In the present invention, the TF-IDF weight serves as the numerical representation of a word so that mathematical operations can be performed on it. The key elements are the time, region, index and similar terms in a sentence, and they are recognized by the constructed entity recognition model. For example, the question "how much electricity was used in Guangzhou in the first half of the year?" is first segmented into individual words, each word is represented numerically using TF-IDF, and the entity recognition model is then run to recognize that "Guangzhou" is a region, "the first half of the year" is a time, and "electricity consumption" is an index.
Further, step S3 specifically includes:
S31, performing sequence labeling on the text data in the training data to obtain, for each word element, the entity type of the segment it belongs to and its position within that segment, forming the labeled data.
Sequence labeling of the text data in the training data marks which word elements belong to entity names and which do not. In this embodiment, the BIO (Begin, Inside, Outside) labeling scheme is adopted, and each word element in the text data is labeled as "B-X", "I-X" or "O". "B-X" indicates that the segment containing the word element is of type X and the word element is at the beginning of the segment; "I-X" indicates that the segment containing the word element is of type X and the word element is in the middle of the segment; and "O" indicates that the word element does not belong to any type. "X" is the name of the entity type to be identified, such as the time entity "TIM", the regional entity "DIS", the dimension entity "DIM", etc. Taking the regional entity as an example, "B-DIS" marks the beginning of a regional entity and "I-DIS" marks the middle of a regional entity. For example, labeling the question "what is the electricity consumption of Guangzhou in June?" (6月份广州的用电量是多少) character by character gives the following result:
6 -> B-TIM
月 -> I-TIM
份 -> O
广 -> B-DIS
州 -> I-DIS
的 -> O
用 -> B-IDX
电 -> I-IDX
量 -> I-IDX
是 -> O
多 -> O
少 -> O
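To show how such a BIO-labeled sequence turns back into typed entities, the following sketch groups consecutive "B-X"/"I-X" labels into spans. The function name `decode_bio` is hypothetical, and the character sequence is an assumed reconstruction of the example question above.

```python
def decode_bio(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the entity collected so far, if any.
        if current_type is not None:
            entities.append((current_type, "".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-X" that does not continue the open entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


tokens = list("6月份广州的用电量是多少")
tags = ["B-TIM", "I-TIM", "O", "B-DIS", "I-DIS", "O",
        "B-IDX", "I-IDX", "I-IDX", "O", "O", "O"]
entities = decode_bio(tokens, tags)
print(entities)
```

In this sketch a stray "I-X" label that does not follow a matching "B-X" is simply dropped; other policies, such as treating it as the start of a new entity, are equally possible.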
S32 model training
Named entity recognition (NER) on natural language questions aims to extract the text segments of the required entities from the text data; from a modeling perspective it is a sequence labeling problem, in which a label is output for each unit of the input sequence. Among machine learning based methods, the Conditional Random Field (CRF) is the mainstream model for NER. Its objective function considers not only the state feature functions of the input but also the label transition feature functions, so a CRF can make use of rich internal and contextual feature information when labeling a position. In neural network models, by contrast, words are given a distributed representation: tokens are mapped from a sparse one-hot representation to a dense embedding in a low-dimensional space, which enriches the representation of the words. The embedding sequence of a sentence is input into a recurrent neural network (RNN), features are extracted automatically by the network without relying on complex feature engineering, and the label of each token is predicted by Softmax. The disadvantage is that each token is labeled independently, so labels already predicted for earlier tokens cannot be used directly, and the predicted label sequence may be invalid.
The invention integrates the advantages of the two models by combining the neural network model and the conditional random field model into an LSTM-CRF model, as shown in FIG. 2, which solves the named entity recognition problem well. The LSTM (Long Short-Term Memory network) is a special type of RNN that can learn long-distance dependencies. Unlike an ordinary RNN unit, which has only one tanh layer, the LSTM has three gate structures (an input gate, a forget gate and an output gate); it selectively forgets part of the history information, adds part of the current input information, and finally integrates these into the current state and generates an output state. The BiLSTM-CRF model applied to NER mainly comprises an Embedding layer, a bidirectional LSTM layer and a final CRF layer, and is the most mainstream model among current deep learning based NER methods.
The sequence-labeled data is used as training data to train the BiLSTM-CRF model; after parameter optimization, the model is used for type recognition on newly input natural language questions to obtain entity category information.
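The benefit of the CRF layer, namely that label transitions are scored jointly instead of each token being labeled independently, can be illustrated with a small Viterbi decoder. This is only a sketch: the emission and transition scores below are hand-written assumptions standing in for the outputs of a trained BiLSTM and CRF, and the function name `viterbi` is hypothetical.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label path.

    emissions:   one {label: score} dict per token
    transitions: {(previous_label, label): score}, missing pairs score 0.0
    """
    score = {label: emissions[0][label] for label in labels}
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = {}, {}
        for cur in labels:
            # Best previous label given the transition into `cur`.
            prev = max(labels,
                       key=lambda p: score[p] + transitions.get((p, cur), 0.0))
            new_score[cur] = score[prev] + transitions.get((prev, cur), 0.0) + emit[cur]
            pointers[cur] = prev
        score = new_score
        backpointers.append(pointers)
    # Walk the backpointers from the best final label to recover the path.
    path = [max(labels, key=lambda label: score[label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-DIS", "I-DIS"]
emissions = [
    {"O": 1.0, "B-DIS": 0.9, "I-DIS": 0.0},
    {"O": 0.0, "B-DIS": 0.0, "I-DIS": 1.0},
]
# Penalize the invalid transition from "O" directly to "I-DIS".
transitions = {("O", "I-DIS"): -10.0}

greedy = [max(labels, key=lambda label: e[label]) for e in emissions]
best = viterbi(emissions, transitions, labels)
print(greedy, best)
```

Per-token argmax picks the invalid sequence "O", "I-DIS", whereas decoding with the transition penalty yields "B-DIS", "I-DIS".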
S4 intention recognition
An intention recognition model is constructed to judge the data analysis purpose of the natural language question posed by the user, such as source data viewing, data filtering, multidimensional analysis, funnel analysis, comparative analysis, trend analysis, report analysis, correlation analysis and the like.
Further, step S4 specifically includes:
S41, data labeling
The purpose of intention recognition is to judge the data analysis intention of an input natural language question, that is, whether it is meant to query data, analyze a trend or carry out some other analysis. In essence this is a text classification problem, so training an intention recognition model means training a text classification model. First, the training data must be labeled with the intention type of each natural language question. For example, there are 8 intention types in total: source data viewing, data filtering, multidimensional analysis, funnel analysis, comparative analysis, trend analysis, report analysis and correlation analysis, which can simply be labeled with the numbers 0, 1, 2, 3, 4, 5, 6, 7.
S42 model training
Since intention recognition is essentially text classification, the input text is preprocessed and passed through TF-IDF to extract the numerical features of its words, and a Support Vector Machine (SVM) is then used to train a classification model, which serves as the intention recognition model. After training and optimization, the intention recognition model can perform intention recognition on the text data corresponding to a newly input natural language question: it predicts a probability for each intention type and selects the type with the highest probability as the intention of the input natural language question.
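The labeling and selection steps of S41 and S42 can be sketched as follows, using the eight intention types named in the text. The classifier itself is not shown; the probability vector is a made-up assumption standing in for the output of the trained SVM, and the names `INTENT_TYPES`, `LABEL_TO_ID` and `pick_intent` are hypothetical.

```python
INTENT_TYPES = [
    "source data viewing",
    "data filtering",
    "multidimensional analysis",
    "funnel analysis",
    "comparative analysis",
    "trend analysis",
    "report analysis",
    "correlation analysis",
]

# Numeric labels used when annotating training questions.
LABEL_TO_ID = {name: index for index, name in enumerate(INTENT_TYPES)}


def pick_intent(probabilities):
    """Return the intention type with the highest predicted probability."""
    if len(probabilities) != len(INTENT_TYPES):
        raise ValueError("expected one probability per intention type")
    best_id = max(range(len(probabilities)), key=probabilities.__getitem__)
    return INTENT_TYPES[best_id]


# Assumed classifier output for a question about usage trends.
probabilities = [0.02, 0.03, 0.05, 0.01, 0.04, 0.80, 0.03, 0.02]
intent = pick_intent(probabilities)
print(intent)
```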
S5, analysis node generation
By combining the results of feature extraction, entity category recognition and intention recognition in steps S2-S4, the data source, analysis dimensions, analysis indexes and analysis task required by the natural language question input by the user can be obtained, together with other additional data analysis information, which may include time information, region information and the like. Combining this information enables the system to generate the analysis nodes automatically.
Further, step S5 specifically includes:
S51, establishing the task data interfaces of the analysis nodes: a standard data interface is established for each analysis node task. For example, the input data of the trend analysis node task include: data source name, analysis index, time range and screening condition; the input data of the distribution analysis node task include: data source name, analysis index, analysis dimension and screening condition. By analogy, each analysis node task has its own input data according to its characteristics. Part of the input data is mandatory and part is optional: in the two tasks above, the screening condition is optional and the other input data are mandatory.
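One way to express such per-task interfaces, with mandatory and optional inputs, is sketched below using Python dataclasses. The class and field names are hypothetical and chosen only to mirror the inputs listed above; the patent does not prescribe a concrete data structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrendAnalysisInput:
    """Input interface of the trend analysis node task."""
    data_source: str
    analysis_index: str
    time_range: str
    screening_condition: Optional[str] = None  # optional input


@dataclass
class DistributionAnalysisInput:
    """Input interface of the distribution analysis node task."""
    data_source: str
    analysis_index: str
    analysis_dimension: str
    screening_condition: Optional[str] = None  # optional input


# Mandatory fields must be supplied; the screening condition may be omitted.
trend = TrendAnalysisInput(
    data_source="power_usage",
    analysis_index="electricity consumption",
    time_range="first half of the year",
)
print(trend)
```

Omitting a mandatory field raises a `TypeError` at construction time, which gives a simple check that a node task has received all of its required inputs.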
S52, data interface information generation: based on the entity category information obtained in the entity recognition process and in combination with the metadata information already in the system, matching and indexing are performed to obtain data source information, index information, dimension information, time information and region information; the analysis node task is determined based on the analysis intention obtained in the intention recognition process; and this information is processed and passed to the corresponding analysis node task, which is called to generate and display the analysis result, as shown in FIG. 3. In other words, the analysis intention information determines which task node in the system is used (the task nodes are all built into the system), and each task node has a corresponding data interface. The data source information, dimension information and index information in the sentence are obtained through the entity recognition process and matched against the data dictionary in the system to determine the data name, index name and dimension name, while the time information and region information are normalized into a standard form to serve as the screening conditions for the data. With this information as the input data of the task node, the system can generate the analysis node automatically.
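The flow of S52 can be sketched as a small function that maps recognized entities and an intention onto a task node description. Everything concrete here is an assumption made for illustration: the data dictionary contents, the task input lists, the entity type codes "DIM", "TIM", "DIS" and "IDX", and the function name `generate_node`.

```python
# Hypothetical data dictionary: index name -> data source and index column.
DATA_DICTIONARY = {
    "electricity consumption": {"data_source": "power_usage",
                                "index": "consumption_kwh"},
}

# Hypothetical mandatory inputs of each built-in task node.
TASK_INPUTS = {
    "trend analysis": ["data_source", "index", "time_range"],
    "distribution analysis": ["data_source", "index", "dimension"],
}


def generate_node(intent, entities):
    """Build an analysis node description from an intention and entities."""
    inputs, screening = {}, {}
    for entity_type, text in entities:
        if entity_type == "IDX":
            # Match the index name against the data dictionary.
            inputs.update(DATA_DICTIONARY.get(text, {}))
        elif entity_type == "DIM":
            inputs["dimension"] = text
        elif entity_type == "TIM":
            inputs["time_range"] = text
        elif entity_type == "DIS":
            screening["region"] = text
    required = TASK_INPUTS[intent]
    missing = [name for name in required if name not in inputs]
    if missing:
        raise ValueError(f"missing inputs for {intent}: {missing}")
    if "time_range" in inputs and "time_range" not in required:
        # Time acts as a screening condition when it is not a task input.
        screening["time"] = inputs["time_range"]
    return {
        "task": intent,
        "inputs": {name: inputs[name] for name in required},
        "screening": screening,
    }


node = generate_node(
    "trend analysis",
    [("DIS", "guangzhou"), ("TIM", "first half of the year"),
     ("IDX", "electricity consumption")],
)
print(node)
```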
Correspondingly, the invention also provides a question-type analysis node generation system, which comprises:
a preprocessing module, used for implementing step S1: preprocessing and word-segmenting the input natural language question to obtain the segmented words;
a feature extraction module, used for implementing step S2: performing feature representation and feature extraction on the text data corresponding to the preprocessed input natural language question, and converting the text data into numerical form;
an information extraction module, used for implementing step S3: extracting key information from the input natural language question and performing type recognition on the key information to obtain entity category information;
an intention recognition module, used for implementing step S4: constructing an intention recognition model, judging the analysis intention of the input natural language question, and completing intention recognition;
and an analysis node generation module, used for implementing step S5: combining the processing results of the feature extraction module, the information extraction module and the intention recognition module to obtain the data source, analysis dimensions, analysis indexes, analysis task and other additional data analysis information required by the natural language question, and automatically generating an analysis node.
Based on the same inventive concept, the present invention also provides a storage medium having stored thereon computer instructions which, when executed by a processor, implement steps S1-S5 of the analysis node generation method of the invention.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A question-type analysis node generation method, characterized by comprising the following steps:
S1, preprocessing the input natural language question and performing word segmentation to obtain the segmented words;
S2, performing feature representation and feature extraction on the text data corresponding to the preprocessed input natural language question, and converting the text data into numerical form;
S3, extracting key information from the input natural language question, and performing type recognition on the key information to obtain entity category information;
S4, constructing an intention recognition model, judging the analysis intention of the input natural language question, and completing intention recognition;
S5, combining the results of feature extraction, type recognition and intention recognition in steps S2-S4 to obtain the data source, analysis dimensions, analysis indexes, analysis task and other additional data analysis information required by the natural language question, and automatically generating analysis nodes.
2. The analysis node generation method according to claim 1, wherein step S5 includes:
s51, making a task data interface of the analysis node, and making a standard data interface for each analysis node task;
s52, generating data interface information, and matching and indexing to obtain data source information, index information, dimension information and other additional data analysis information based on entity category information and in combination with metadata information; determining an analysis node task based on the analysis intent; and processing the data source information, the index information, the dimension information and other additional data analysis information, transmitting the processed data source information, the index information, the dimension information and the other additional data analysis information to corresponding analysis node tasks, and calling the analysis node tasks to complete the generation and display of analysis results.
3. The analysis node generation method according to claim 2, wherein the other additional data analysis information includes time information and area information.
4. The analysis node generation method according to claim 2, wherein in step S51, the task input data of the trend analysis node includes a data source name, an analysis index, a time range and a filtering condition; and the task input data of the distribution analysis node includes a data source name, an analysis index, an analysis dimension and a filtering condition.
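The two standard data interfaces listed in claim 4 could be written down as typed records. The sketch below is one possible shape, not the patent's own definition: the class names, field names, and the representation of the time range and filter as a tuple and a dictionary are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TrendNodeInput:
    """Input of the trend analysis node task (fields per claim 4)."""
    data_source: str
    index: str
    time_range: Tuple[str, str]  # assumed (start, end) representation
    filters: Dict[str, str] = field(default_factory=dict)


@dataclass
class DistributionNodeInput:
    """Input of the second node task in claim 4: a dimension replaces the time range."""
    data_source: str
    index: str
    dimension: str
    filters: Dict[str, str] = field(default_factory=dict)


trend_input = TrendNodeInput(
    data_source="orders_table",
    index="sales_amount",
    time_range=("2020-01-01", "2020-12-31"),
    filters={"region_name": "Guangdong"},
)
dist_input = DistributionNodeInput(
    data_source="orders_table",
    index="sales_amount",
    dimension="region_name",
)
```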
5. The analysis node generation method according to claim 1, wherein step S4 includes:
S41, labeling the training data by marking the intention type of each natural language question;
and S42, training a classification model to construct the intention recognition model, performing intention recognition on the text data corresponding to the input natural language question with the intention recognition model, predicting a probability for each intention type, and selecting the intention type with the highest probability as the intention of the input natural language question.
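Steps S41 and S42 can be sketched with a very small classifier. Claim 5 does not name the classification model, so the multinomial naive Bayes with add-one smoothing used here is a stand-in chosen for brevity; the training questions and the intention labels `trend` and `distribution` are invented for illustration.

```python
import math
from collections import Counter, defaultdict

# S41: hypothetical labeled training data, (segmented question, intention type).
TRAIN = [
    (["sales", "trend", "last", "year"], "trend"),
    (["monthly", "change", "of", "revenue"], "trend"),
    (["sales", "by", "region"], "distribution"),
    (["share", "of", "orders", "by", "product"], "distribution"),
]


def train(samples):
    """Count words per intention type (naive Bayes sufficient statistics)."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    vocab = set()
    for tokens, intent in samples:
        intent_counts[intent] += 1
        word_counts[intent].update(tokens)
        vocab.update(tokens)
    return word_counts, intent_counts, vocab


def predict_proba(model, tokens):
    """S42: predict a probability for every intention type."""
    word_counts, intent_counts, vocab = model
    total = sum(intent_counts.values())
    log_scores = {}
    for intent, n in intent_counts.items():
        denom = sum(word_counts[intent].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokens:
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[intent][tok] + 1) / denom)
        log_scores[intent] = score
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp = {k: math.exp(v - top) for k, v in log_scores.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}


def recognize_intent(model, tokens):
    """Select the intention type with the highest predicted probability."""
    probs = predict_proba(model, tokens)
    return max(probs, key=probs.get)


model = train(TRAIN)
probs = predict_proba(model, ["revenue", "trend"])
intent = recognize_intent(model, ["revenue", "trend"])
```

Any classifier that outputs per-class probabilities could be substituted; only the final highest-probability selection is fixed by the claim.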
6. The analysis node generation method according to claim 1, wherein step S3 includes:
S31, performing sequence labeling on the text data in the training data to obtain, for each word element in the text data, the entity type of the segment to which it belongs and its position within that segment, thereby forming labeled data;
and S32, using the sequence-labeled data as training data, training a BiLSTM-CRF model, and using the model obtained after parameter optimization for type recognition of newly input natural language questions.
7. The analysis node generation method according to claim 6, wherein in step S31, each word element in the text data is labeled as "B-X", "I-X" or "O" in the BIO labeling scheme, where "B-X" indicates that the segment containing the word element belongs to type X and the word element is at the beginning of the segment, "I-X" indicates that the segment containing the word element belongs to type X and the word element is in the middle of the segment, and "O" indicates that the word element does not belong to any type; and "X" represents the name of the entity type to be identified.
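The BIO scheme of claim 7 can be made concrete by showing how a tagged sequence is turned back into typed segments. The sketch below covers only this decoding step, not the BiLSTM-CRF model of claim 6; the entity type names `INDEX`, `AREA` and `TIME` and the example sentence are assumptions.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word elements tagged B-X / I-X / O into (segment text, type) pairs."""
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    etype = None
    for token, tag in zip(tokens, tags):
        # "I-X" continues the open segment only if the type matches.
        if tag.startswith("I-") and etype == tag[2:]:
            words.append(token)
            continue
        # Any other tag closes the segment that is currently open.
        if words:
            entities.append((" ".join(words), etype))
            words, etype = [], None
        # "B-X" opens a new segment; a stray "I-X" is treated the same way.
        if tag != "O":
            words, etype = [token], tag[2:]
    if words:
        entities.append((" ".join(words), etype))
    return entities


tokens = ["sales", "amount", "in", "Guangdong", "last", "year"]
tags = ["B-INDEX", "I-INDEX", "O", "B-AREA", "B-TIME", "I-TIME"]
entities = decode_bio(tokens, tags)
```

The resulting pairs correspond to the entity category information that the later matching against metadata relies on.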
8. A question type analysis node generation system, comprising:
a preprocessing module, used for preprocessing the input natural language question and performing word segmentation to obtain the segmented words;
a feature extraction module, used for performing feature representation and feature extraction on the text data corresponding to the preprocessed input natural language question, and converting the text data into numerical form;
an information extraction module, used for extracting key information from the input natural language question and performing type recognition on the key information to obtain entity category information;
an intention recognition module, used for constructing an intention recognition model and judging the analysis intention of the input natural language question to complete intention recognition;
and an analysis node generation module, used for combining the processing results of the feature extraction module, the information extraction module and the intention recognition module to obtain the data source, analysis dimensions, analysis indexes, analysis tasks and other additional data analysis information to be analyzed in the natural language question, and automatically generating analysis nodes.
9. The system according to claim 8, wherein the process of generating the analysis node by the analysis node generation module comprises:
formulating the data interfaces of the analysis node tasks, with a standard data interface formulated for each analysis node task;
generating data interface information: based on the entity category information and in combination with metadata information, obtaining data source information, index information, dimension information and other additional data analysis information by matching and indexing; determining an analysis node task based on the analysis intention; and processing the data source information, the index information, the dimension information and the other additional data analysis information, passing the processed information to the corresponding analysis node task, and calling the analysis node task to generate and display the analysis result.
10. A storage medium having computer instructions stored thereon, characterized in that the computer instructions, when executed by a processor, carry out the steps of the analysis node generation method according to any one of claims 1-7.
CN202011259004.3A 2020-11-12 2020-11-12 Question type analysis node generation method, system and storage medium Active CN112270189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011259004.3A CN112270189B (en) 2020-11-12 2020-11-12 Question type analysis node generation method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011259004.3A CN112270189B (en) 2020-11-12 2020-11-12 Question type analysis node generation method, system and storage medium

Publications (2)

Publication Number Publication Date
CN112270189A CN112270189A (en) 2021-01-26
CN112270189B CN112270189B (en) 2023-07-18

Family

ID=74339857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011259004.3A Active CN112270189B (en) 2020-11-12 2020-11-12 Question type analysis node generation method, system and storage medium

Country Status (1)

Country Link
CN (1) CN112270189B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114528404A (en) * 2022-02-18 2022-05-24 浪潮卓数大数据产业发展有限公司 Method and device for identifying provincial and urban areas

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050032937A (en) * 2003-10-02 2005-04-08 한국전자통신연구원 Method for automatically creating a question and indexing the question-answer by language-analysis and the question-answering method and system
CN103226606A (en) * 2013-04-28 2013-07-31 浙江核新同花顺网络信息股份有限公司 Inquiry selection method and system
CN107545349A (en) * 2016-06-28 2018-01-05 国网天津市电力公司 A kind of Data Quality Analysis evaluation model towards electric power big data
CN108108426A (en) * 2017-12-15 2018-06-01 杭州网蛙科技有限公司 Method, device and electronic equipment for understanding natural language questions
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
CN110210036A (en) * 2019-06-05 2019-09-06 上海云绅智能科技有限公司 Intention recognition method and device
CN110309400A (en) * 2018-02-07 2019-10-08 鼎复数据科技(北京)有限公司 A kind of method and system that intelligent Understanding user query are intended to
CN110334347A (en) * 2019-06-27 2019-10-15 腾讯科技(深圳)有限公司 Information processing method, relevant device and storage medium based on natural language recognition
CN110413746A (en) * 2019-06-25 2019-11-05 阿里巴巴集团控股有限公司 The method and device of intention assessment is carried out to customer problem
CN110968663A (en) * 2018-09-30 2020-04-07 北京国双科技有限公司 Answer display method and device of question-answering system
CN111026941A (en) * 2019-10-28 2020-04-17 江苏普旭软件信息技术有限公司 Intelligent query method for demonstration and evaluation of equipment system
CN111125145A (en) * 2019-11-26 2020-05-08 复旦大学 Automatic system for acquiring database information through natural language
CN111709235A (en) * 2020-05-28 2020-09-25 上海发电设备成套设计研究院有限责任公司 Text data statistical analysis system and method based on natural language processing

Also Published As

Publication number Publication date
CN112270189B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
Shelar et al. Named entity recognition approaches and their comparison for custom ner model
Jung Semantic vector learning for natural language understanding
CN108319583B (en) Method and system for extracting knowledge from Chinese language material library
Mottaghinia et al. A review of approaches for topic detection in Twitter
Huang et al. Expert as a service: Software expert recommendation via knowledge domain embeddings in stack overflow
Lydia et al. Correlative study and analysis for hidden patterns in text analytics unstructured data using supervised and unsupervised learning techniques
CN112270188A (en) Questioning type analysis path recommendation method, system and storage medium
Kochtchi et al. Networks of Names: Visual Exploration and Semi‐Automatic Tagging of Social Networks from Newspaper Articles
Sahnoun et al. Event detection based on open information extraction and ontology
Hossari et al. TEST: A terminology extraction system for technology related terms
Ali et al. Named entity recognition using deep learning: A review
Hashemzadeh et al. Improving keyword extraction in multilingual texts.
Altuncu et al. Graph-based topic extraction from vector embeddings of text documents: Application to a corpus of news articles
CN112270189B (en) Question type analysis node generation method, system and storage medium
Atwan et al. The use of stemming in the Arabic text and its impact on the accuracy of classification
Girija et al. A comparative review on approaches of aspect level sentiment analysis
Pertsas et al. Ontology-driven information extraction from research publications
Chen et al. Multi-modal multi-layered topic classification model for social event analysis
Sheng et al. A Markov network based passage retrieval method for multimodal question answering in the cultural heritage domain
Rabby et al. Establishing a formal benchmarking process for sentiment analysis for the bangla language
CN111061939A (en) Scientific research academic news keyword matching recommendation method based on deep learning
CN117291192B (en) Government affair text semantic understanding analysis method and system
Habib et al. Iot-based pervasive sentiment analysis: A fine-grained text normalization framework for context aware hybrid applications
Rybak et al. Machine Learning-Enhanced Text Mining as a Support Tool for Research on Climate Change: Theoretical and Technical Considerations
Jo et al. Data encoding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant