CN113076127B - Method, system, electronic device and medium for extracting question and answer content in programming environment - Google Patents

Info

Publication number
CN113076127B
Authority
CN
China
Prior art keywords
question
steps
words
text
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110449778.0A
Other languages
Chinese (zh)
Other versions
CN113076127A (en)
Inventor
陈林
赵恒辉
李言辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202110449778.0A
Priority to PCT/CN2021/089820 (WO2022226714A1)
Publication of CN113076127A
Application granted
Publication of CN113076127B

Classifications

    • G06F8/70 Software maintenance or management
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/345 Summarisation for human users
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/295 Named entity recognition
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method, a system, an electronic device and a medium for extracting question-and-answer content in a programming environment. The system comprises: a data processing module for preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation; an entity identification module for performing entity identification in the field of software engineering on the text processed by the data processing module; a document reading module for inputting the text processed by the entity identification module into a neural network for document reading; and an abstract extraction module for extracting key content from the question-and-answer text using another neural network. The invention can extract the key content of technical questions and answers, reduce developers' browsing time and improve development efficiency during programming.

Description

Method, system, electronic device and medium for extracting question and answer content in programming environment
Technical Field
The invention relates to a method, a system, an electronic device and a medium for extracting question-and-answer content in a programming environment, and belongs to the technical field of the Internet.
Background
Software development is a flexible and challenging task that requires developers to have strong learning and problem-solving abilities. While programming, a developer who runs into a problem can consult a reference book, but more often searches the network for help, asks other developers who have met similar problems, and draws on their solutions, thereby avoiding repeated work and improving development efficiency. As a result, software question-and-answer communities have become increasingly active; they aim to give developers a platform on which to help one another and to record problems.
More and more developers are active on technical question-and-answer platforms. They ask and answer questions and, in doing so, offer solution ideas to other developers who encounter similar problems. However, not every problem can be solved on the platform, and the large amount of redundant and irrelevant information on it hinders developers who are looking for help. A question on a technical question-and-answer platform usually has more than one answer: some answers are unrelated to the question, some answers largely repeat one another, and some answers are only partly relevant or partly repetitive. Platforms have made considerable efforts to address this. Stack Overflow, for example, lets users score each answer to a question and makes the highest-scoring answers visible to more people. This reduces the interference of irrelevant information to some extent, but considerable limitations remain. If all answers under the same question are regarded as one document, and abstract extraction is performed on them so that the key content is marked, the result can work much like "highlighting", helping users reduce browsing time and improving development efficiency during programming.
Text summarization techniques convert a text or a collection of texts into a short summary containing the key information. According to the output type, text summaries can be divided into extractive summaries and generative (abstractive) summaries. An extractive summary is formed by directly extracting several sentences from the original text and then ordering and recombining them. Applied to a technical question-and-answer community, extractive summarization can extract the key content of the answers and help a developer quickly locate the desired answer content.
In recent years, researchers have proposed many methods for summary extraction. Julian Kupiec et al. proposed treating summary extraction as a classical classification problem: given a set of training documents and manually extracted summaries, a classifier is trained that gives the probability that a given sentence should be included in the summary. Conroy and O'Leary proposed extracting summaries with a hidden Markov model and obtained the best results among the models of that time. Erkan and Radev proposed the graph-based algorithm LexPageRank, in which an edge is added to a connectivity matrix whenever the cosine similarity of two sentences exceeds a certain threshold, and the importance of each sentence is then computed from that matrix. Woodsend et al. proposed a model of joint content selection and compression for document summarization, which uses integer linear programming to select and combine words into a summary subject to length, coverage and grammatical constraints. Kageback et al. computed the similarity between sentences through continuous vector space representations and used a recursive autoencoder to extract document summaries. Yin et al. projected sentences into a continuous vector space through a convolutional neural network (CNN) and selected suitable sentences by minimizing a cost based on "prestige" and "diversity", achieving good results on multi-document extractive summarization tasks. Cao et al. also used a CNN to address query-oriented multi-document summarization; they represented documents by weighted sum pooling over sentence representations, with the weights learned by an attention mechanism over the query and sentence representations. Cheng et al. proposed an automatic summarization framework based on a hierarchical document encoder and an attention mechanism, which achieves fairly good summary extraction without the aid of linguistic annotation.
However, existing summary extraction work targets the general domain; no technique or method has yet been proposed for summary extraction in the field of software engineering.
Disclosure of Invention
The first object of the invention is to provide a system for automatically extracting the key content of technical questions and answers encountered during programming, which can extract that key content, reduce developers' browsing time and improve development efficiency during programming. The second object of the invention is to provide a corresponding method for automatically extracting the key content of such technical questions and answers.
The invention adopts the following technical scheme: a system for extracting question-and-answer content in a programming environment, comprising:
a data processing module for preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity identification module for performing entity identification in the field of software engineering on the text processed by the data processing module;
a document reading module for inputting the text processed by the entity identification module into a neural network for document reading;
an abstract extraction module for extracting key content from the question-and-answer text using another neural network.
As a preferred embodiment, the data processing module specifically performs the following steps: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation using the nltk tool; and finishing the data processing.
As a preferred embodiment, the entity identification module specifically performs the following steps: an initial state; computing the spelling pattern of each word, including whether its initial letter is capitalized, whether it contains an underscore and whether it contains a specified character; computing the context features of the word, specifically using a window of [-2, 2] and adding the words in the window, namely the two preceding and two following words, as features; computing the bit-stream feature of the word, specifically using large-scale unlabeled text from the field of software engineering, clustering words with similar distributions into one class by a clustering method, and representing the classes by bit streams of different lengths as a feature; computing the external dictionary feature of the word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification using a CRF model trained with the CRF++ tool; and finishing entity identification.
As a preferred embodiment, the document reading module specifically performs the following steps: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading. The abstract extraction module specifically performs the following steps: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label, in sequence, whether each sentence is to be included in the summary; and finishing abstract extraction.
The invention also provides a method for extracting question-and-answer content in a programming environment, comprising the following steps:
a data processing step: preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity identification step: performing entity identification in the field of software engineering on the text processed by the data processing step;
a document reading step: inputting the text processed by the entity identification step into a neural network for document reading;
an abstract extraction step: extracting key content from the question-and-answer text using another neural network.
As a preferred embodiment, the data processing step specifically includes: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation using the nltk tool; and finishing the data processing.
As a preferred embodiment, the entity identification step specifically includes: an initial state; computing the spelling pattern of each word, including whether its initial letter is capitalized, whether it contains an underscore and whether it contains a specified character; computing the context features of the word, specifically using a window of [-2, 2] and adding the words in the window, namely the two preceding and two following words, as features; computing the bit-stream feature of the word, specifically using large-scale unlabeled text from the field of software engineering, clustering words with similar distributions into one class by a clustering method, and representing the classes by bit streams of different lengths as a feature; computing the external dictionary feature of the word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification using a CRF model trained with the CRF++ tool; and finishing entity identification.
As a preferred embodiment, the document reading step specifically includes: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading. The abstract extraction step specifically includes: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label, in sequence, whether each sentence is to be included in the summary; and finishing abstract extraction.
The invention also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method when executing the program.
The invention also proposes a medium on which a computer program is stored which, when being executed by a processor, implements the steps of the method.
The beneficial effects of the invention are as follows: (1) The system provided by the invention for automatically extracting the key content of technical questions and answers during programming can extract that key content, reduce developers' browsing time and improve development efficiency during programming. (2) The system can extract the key content of technical questions and answers automatically, without manual labeling, which greatly reduces the cost of extracting key content. (3) The method provided by the invention for automatically extracting the key content of technical questions and answers is a brand-new attempt aimed at the field of software engineering and fills the gap in key content extraction in that field.
Drawings
Fig. 1 is a flowchart of a method of extracting question-answer contents in a programming environment of the present invention.
Fig. 2 is a schematic diagram of an example of the CNN structure of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1: the invention provides a system for extracting question-and-answer content in a programming environment, comprising:
a data processing module for preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity identification module for performing entity identification in the field of software engineering on the text processed by the data processing module;
a document reading module for inputting the text processed by the entity identification module into a neural network for document reading;
an abstract extraction module for extracting key content from the question-and-answer text using another neural network.
Preferably, the data processing module specifically performs the following steps: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation using the nltk tool; and finishing the data processing.
Preferably, the entity identification module specifically performs the following steps: an initial state; computing the spelling pattern of each word, including whether its initial letter is capitalized, whether it contains an underscore and whether it contains a specified character; computing the context features of the word, specifically using a window of [-2, 2] and adding the words in the window, namely the two preceding and two following words, as features; computing the bit-stream feature of the word, specifically using large-scale unlabeled text from the field of software engineering, clustering words with similar distributions into one class by a clustering method, and representing the classes by bit streams of different lengths as a feature; computing the external dictionary feature of the word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification using a CRF model trained with the CRF++ tool; and finishing entity identification.
Preferably, the document reading module specifically performs the following steps: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading. The abstract extraction module specifically performs the following steps: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label, in sequence, whether each sentence is to be included in the summary; and finishing abstract extraction.
Example 2: the invention also provides a method for extracting question-and-answer content in a programming environment. The general framework of the invention is shown in Fig. 1, and the method comprises the following 4 steps:
Step 1: for question-and-answer text from the network, first remove the content of all <pre> tags; code segments in questions and answers appear inside <pre> tags, so removing that content removes the code segments. Then delete all HTML tags, e.g. <pre>, <p>, <div>, etc. Next, replace each URL appearing in the text with "@ u@", each emoticon (for example, one beginning with ":") with "@ e@", and each "@" mention of another user with "@ a@". Finally, segment the text into words using the nltk word segmentation tool, with the requirement that an API name be kept whole; e.g. os.path.join(path) must remain a single, indivisible word.
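The cleaning order in Step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular expressions, the small emoticon pattern, the placeholder spelling without inner spaces ("@u@", "@e@", "@a@") and the regex tokenizer used in place of nltk are all assumptions made so that the sketch runs with the Python standard library alone.

```python
import re

# Placeholder tokens; spelled without inner spaces here, which is an assumption.
URL_TOKEN, EMO_TOKEN, AT_TOKEN = "@u@", "@e@", "@a@"

PRE_RE = re.compile(r"<pre\b[^>]*>.*?</pre>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
# Deliberately small emoticon pattern (assumption): eyes, optional nose, mouth.
EMO_RE = re.compile(r"(?<!\w)[:;=]-?[)(DPp](?!\w)")
# "@name" mentions, skipping the placeholders inserted by earlier substitutions.
MENTION_RE = re.compile(r"(?<![\w@])@(?![uea]@)\w+")
# Stand-in tokenizer (not nltk): placeholders, dotted API names with an optional
# argument list, plain words, or single punctuation characters.
TOKEN_RE = re.compile(
    r"@[uea]@|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+(?:\([^()\s]*\))?|\w+|[^\w\s]"
)


def preprocess(html):
    """Clean one question-and-answer post and return its tokens."""
    text = PRE_RE.sub(" ", html)           # drop code segments inside <pre>
    text = TAG_RE.sub(" ", text)           # delete remaining HTML tags
    text = URL_RE.sub(URL_TOKEN, text)     # URLs -> placeholder
    text = EMO_RE.sub(EMO_TOKEN, text)     # emoticons -> placeholder
    text = MENTION_RE.sub(AT_TOKEN, text)  # "@user" mentions -> placeholder
    return TOKEN_RE.findall(text)


post = ("<p>@bob use os.path.join(path) :) see https://example.com/x</p>"
        "<pre><code>print(1)</code></pre>")
print(preprocess(post))
```

The tokenizer tries the dotted-name alternative before the plain-word alternative, which is what keeps an API name such as os.path.join(path) in one token.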
Step 2: performing entity identification on the text after data processing. The core of the entity identification method is a conditional random field (CRF) model implemented with the CRF++ tool. The features of the CRF model include:
• Features of word spelling, such as whether the initial letter of the word is uppercase, whether the word contains an underscore, and whether it contains a specified character;
• Context features. A window of [-2, 2] is used, and the words in the window, namely the two preceding and two following words, are added as features;
• Bit-stream features of the word. Using large-scale unlabeled text from the software field, the Brown clustering algorithm groups words that appear in similar contexts into one class; the total number of word classes is set to 1000, and words in the same class are represented by the same bit stream as a feature;
• External dictionary features. A large number of known entities are collected in advance to form an external dictionary, and whether a word exists in the external dictionary is checked.
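The four feature groups above can be illustrated with a small per-token feature extractor. This is a sketch under stated assumptions: the function names, the feature keys, the `brown_paths` mapping (word to Brown-cluster bit string, assumed to be computed beforehand) and the choice of "." as the third spelling check are all hypothetical, since the source does not name the character. Training with CRF++ is not shown.

```python
def spelling_features(word):
    """Spelling-pattern features of one word."""
    return {
        "init_upper": word[:1].isupper(),
        "has_underscore": "_" in word,
        # Assumption: the source does not say which character is tested.
        "has_dot": "." in word,
    }


def token_features(tokens, i, brown_paths, dictionary):
    """Features for tokens[i]: spelling, [-2, 2] context, cluster, dictionary."""
    word = tokens[i]
    feats = {"word": word.lower()}
    feats.update(spelling_features(word))
    # Context window [-2, 2]: two preceding and two following words.
    for offset in (-2, -1, 1, 2):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats["w[%+d]" % offset] = tokens[j].lower() if inside else "<PAD>"
    # Brown-cluster bit string, looked up in a precomputed (hypothetical) map.
    feats["brown"] = brown_paths.get(word.lower(), "<UNK>")
    # External dictionary membership.
    feats["in_dict"] = word.lower() in dictionary
    return feats


tokens = ["Use", "os.path.join", "in", "Python"]
brown_paths = {"python": "0110", "use": "10"}
dictionary = {"python", "os.path.join"}
print(token_features(tokens, 1, brown_paths, dictionary))
```

Each token's feature dictionary would then be flattened into one row of the column-style training file that CRF++ reads, with the entity label in the last column.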
Step 3: reading and encoding the text after entity identification. First, a single-layer convolutional neural network (CNN) is used to obtain sentence-level representation vectors; a recurrent neural network (RNN) is then used to construct a vector representation of the document. The CNN operates at the word level to produce sentence-level representations, which serve as the input to the RNN, so that the document-level representation is obtained hierarchically. The embedding dimensions of words, sentences and documents are set to 150, 300 and 750, respectively.
In the single-layer convolutional neural network, each convolution kernel computes a series of features using several feature maps; the number of feature maps is 300, matching the sentence dimension. Convolution kernels of different widths, from 1 to 7, are used to obtain different feature vectors for the sentence, and these vectors are summed to give the final sentence vector. The lower half of Fig. 2 shows an example of the CNN structure: the word dimension is 5, the illustrated sentence contains 6 words, and the two colors represent two convolution kernels, the blue one of width 2 and the red one of width 3, each with 6 feature maps. After pooling, each feature map corresponds to one dimension of the resulting vector, so the two convolution kernels yield two 6-dimensional vectors, which are summed to obtain the final sentence vector.
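The toy configuration of Fig. 2 (word dimension 5, a 6-word sentence, kernel widths 2 and 3, 6 feature maps per width) can be reproduced with a short pure-Python sketch. The function names are hypothetical, the weights are random rather than trained, and bias terms and nonlinearities are omitted for brevity, so this only illustrates the shapes and the convolve, max-pool and sum pattern.

```python
import random


def conv_max_pool(sentence, kernel):
    """Slide one feature map over the sentence and max-pool over positions.

    sentence: list of word vectors; kernel: `width` rows of word-sized weights.
    Assumes the sentence has at least `width` words (pad it otherwise).
    """
    width = len(kernel)
    best = float("-inf")
    for start in range(len(sentence) - width + 1):
        score = sum(
            kernel[k][d] * sentence[start + k][d]
            for k in range(width)
            for d in range(len(kernel[k]))
        )
        best = max(best, score)
    return best


def sentence_vector(sentence, kernels_by_width):
    """Sum the pooled vectors produced by each kernel width.

    kernels_by_width: {width: [feature_map, ...]}, same map count per width.
    """
    n_maps = len(next(iter(kernels_by_width.values())))
    total = [0.0] * n_maps
    for feature_maps in kernels_by_width.values():
        pooled = [conv_max_pool(sentence, fm) for fm in feature_maps]
        total = [t + p for t, p in zip(total, pooled)]
    return total


# Toy shapes from Fig. 2: 6 words of dimension 5, widths 2 and 3, 6 maps each.
rng = random.Random(0)
sentence = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(6)]
kernels = {
    w: [[[rng.uniform(-1, 1) for _ in range(5)] for _ in range(w)]
        for _ in range(6)]
    for w in (2, 3)
}
vec = sentence_vector(sentence, kernels)
print(len(vec))  # one value per feature map
```

With the dimensions stated in Step 3, the same pattern would use 150-dimensional word vectors, widths 1 to 7 and 300 feature maps per width, giving a 300-dimensional sentence vector.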
The recurrent neural network (RNN) is a single-layer long short-term memory (LSTM) network, which is used to alleviate the vanishing-gradient problem when training on long sentences.
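For readers unfamiliar with the LSTM, the following pure-Python sketch runs a single-layer LSTM over a sequence of sentence vectors and returns the hidden state after each sentence. It uses the standard LSTM gate equations; the parameter layout, the function names, the random initialization and the tiny dimensions are assumptions for illustration, and how the document-level vector is read off the hidden states is not specified in the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, x, bias):
    """Return weights @ x + bias for list-based matrices and vectors."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_params(input_dim, hidden_dim, rng):
    """Small random weights for the input, forget, output and candidate gates."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    params = {"W" + g: matrix() for g in "ifog"}
    params.update({"b" + g: [0.0] * hidden_dim for g in "ifog"})
    return params


def lstm_step(params, x, h, c):
    """One LSTM step: gates are computed from the concatenation [x; h]."""
    z = list(x) + list(h)
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]
    # The additive cell update is what eases the vanishing-gradient problem.
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def encode_document(params, sentence_vectors, hidden_dim):
    """Run the LSTM over the sentence vectors; return all hidden states."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in sentence_vectors:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states


rng = random.Random(0)
params = init_params(input_dim=3, hidden_dim=4, rng=rng)
states = encode_document(params, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], 4)
print(len(states), len(states[-1]))
```

With the dimensions given in Step 3, `input_dim` would be 300 (sentence vectors) and `hidden_dim` 750 (document representation).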
Step 4: drawing on the idea of the attention mechanism, a recurrent neural network is used to label, in sequence, whether each sentence can be regarded as key content. The labeling process can take into account whether sentences are independent of one another and whether their meanings are repeated. As shown in the upper right part of Fig. 2, the label of the next sentence depends not only on the current input but also on the label of the previous sentence.
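The dependence of each label on the previous label can be illustrated with a deliberately simplified labeler. This sketch replaces the recurrent network of Step 4 with a single logistic unit that receives the current sentence state and the previous decision; the function name, the weights `w`, `u`, `b` and the 0.5 threshold are hypothetical and untrained. It shows only the feedback of the previous label, not the full model.

```python
import math


def label_sentences(states, w, u, b, threshold=0.5):
    """Greedily label sentences; each decision is fed into the next one.

    states: one vector per sentence (e.g. encoder hidden states).
    w: weights on the current state; u: weight on the previous label; b: bias.
    """
    labels = []
    prev = 0.0  # no sentence has been selected before the first one
    for h in states:
        score = sum(wi * hi for wi, hi in zip(w, h)) + u * prev + b
        prob = 1.0 / (1.0 + math.exp(-score))
        prev = 1.0 if prob >= threshold else 0.0
        labels.append(int(prev))
    return labels


# Three identical sentences: a negative `u` discourages selecting a sentence
# right after another selected one, which mimics avoiding repeated content.
same = [[1.0], [1.0], [1.0]]
print(label_sentences(same, w=[1.0], u=-2.0, b=0.0))
print(label_sentences(same, w=[1.0], u=0.0, b=0.0))
```

In the first call the second sentence is skipped because the first was just selected; in the second call, with no feedback from the previous label, all three identical sentences are selected.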
Example 3: the invention also proposes an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method when executing the program.
Example 4: the invention also proposes a medium on which a computer program is stored which, when being executed by a processor, implements the steps of the method.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (8)

1. A system for extracting question-and-answer content in a programming environment, characterized by comprising:
a data processing module for preprocessing the input online question-and-answer text data, removing useless information and performing word segmentation;
an entity identification module for performing entity identification in the field of software engineering on the text processed by the data processing module; the entity identification module specifically performs the following steps: an initial state; computing the spelling pattern of each word, including whether its initial letter is capitalized, whether it contains an underscore and whether it contains a specified character; computing the context features of the word, specifically using a window of [-2, 2] and adding the words in the window, namely the two preceding and two following words, as features; computing the bit-stream feature of the word, specifically using large-scale unlabeled text from the field of software engineering, clustering words with similar distributions into one class by a clustering method, and representing the classes by bit streams of different lengths as a feature; computing the external dictionary feature of the word, specifically collecting a large number of known entities to form an external dictionary and checking whether the word exists in it; performing entity identification using a CRF model trained with the CRF++ tool; and finishing entity identification;
a document reading module for inputting the text processed by the entity identification module into a neural network for document reading;
an abstract extraction module for extracting key content from the question-and-answer text using another neural network.
2. The system for extracting question-and-answer content in a programming environment according to claim 1, wherein the data processing module specifically performs the following steps: an initial state; processing code segments in the question-and-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation using the nltk tool; and finishing the data processing.
3. The system for extracting question-and-answer content in a programming environment according to claim 1, wherein the document reading module specifically performs the following steps: an initial state; obtaining sentence-level vector representations through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; and finishing the document reading; and the abstract extraction module specifically performs the following steps: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to label, in sequence, whether each sentence is to be included in the summary; and finishing abstract extraction.
4. The method for extracting the question and answer content in the programming environment is characterized by comprising the following steps:
the data processing step specifically comprises the following steps: preprocessing the input network question-answering text data, removing useless information and performing word segmentation;
the entity identification step specifically comprises the following steps: performing entity identification in the field of software engineering on the text processed by the data processing step; the entity identification step specifically comprises the following steps: an initial state; calculating to obtain spelling patterns of words, including whether the initial of the word is capitalized, whether the underline is included and whether the word is included; calculating to obtain the context characteristics of the words, specifically using a window of [ -2,2], and adding the words in the window, namely the front word and the rear word, as the characteristics; calculating to obtain bit stream characteristics of words, specifically using unlabeled texts in the field of large-scale software engineering, clustering similarly distributed words into one class by using a clustering method, and representing the class by bit streams with different lengths as characteristics; calculating to obtain the external dictionary characteristics of the words, specifically collecting a large number of known entities to form an external dictionary, and checking whether the words exist in the external dictionary; performing entity identification by using a CRF model obtained by training a tool CRF++; finishing entity identification;
the document reading step specifically comprises the following steps: inputting the text identified by the entity identification step into a neural network for document reading;
the abstract extraction step specifically comprises the following steps: extracting the key content of the question-answer text by using another neural network.
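The per-word features listed in the entity identification step of claim 4 can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `word_features`, the `<PAD>` placeholder, the `cluster_bits` mapping (a word-to-bit-string table assumed to come from some prior clustering run), and the prefix lengths are all hypothetical. The third spelling pattern is omitted because its wording in the claim is unclear, and training with CRF++ is not shown.

```python
from typing import Dict, List, Set

PAD = "<PAD>"  # hypothetical placeholder for positions outside the sentence


def word_features(tokens: List[str], i: int,
                  cluster_bits: Dict[str, str],
                  dictionary: Set[str],
                  prefix_lengths=(4, 8)) -> Dict[str, str]:
    """Build an illustrative feature dict for tokens[i]."""
    word = tokens[i]
    feats = {
        "word": word,
        # Spelling patterns: capitalized initial letter, underscore present.
        "init_cap": str(word[:1].isupper()),
        "has_underscore": str("_" in word),
    }
    # Context features: neighbouring words inside the window [-2, 2].
    for off in (-2, -1, 1, 2):
        j = i + off
        feats[f"w[{off}]"] = tokens[j] if 0 <= j < len(tokens) else PAD
    # Bit-stream features: prefixes of different lengths of the word's
    # cluster bit string (assumed to be precomputed on unlabeled text).
    bits = cluster_bits.get(word, "")
    for n in prefix_lengths:
        feats[f"bits[:{n}]"] = bits[:n]
    # External dictionary feature: is the word a known entity?
    feats["in_dict"] = str(word.lower() in dictionary)
    return feats


tokens = ["Use", "array_map", "in", "PHP"]
feats = word_features(tokens, 1,
                      cluster_bits={"array_map": "0110101101"},
                      dictionary={"php", "array_map"})
```

In a CRF++ workflow, values like these would typically be written as extra columns next to each token in the training file, with the feature template deciding how the columns are combined.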
5. The method for extracting question-answer content in a programming environment according to claim 4, wherein the data processing step specifically comprises: an initial state; processing code segments in the question-answer text; processing HTML tags; processing URLs; processing emoticons; processing "@" mentions; performing word segmentation using the NLTK tool; finishing the data processing.
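The preprocessing sequence of claim 5 can be sketched in Python as follows. The claim does not say how each item is "processed", so the choices here are assumptions: code segments and URLs are replaced by the hypothetical placeholders `CODE` and `URL`, while HTML tags, emoticons and "@" mentions are dropped. A simple regular-expression tokenizer stands in for the word segmentation tool so that the sketch has no external dependencies.

```python
import re
from typing import List

# All patterns below are illustrative assumptions, not taken from the patent.
CODE_RE = re.compile(r"<pre[^>]*>.*?</pre>|<code[^>]*>.*?</code>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
EMOTICON_RE = re.compile(r"(?<!\w)[:;=][-']?[)(DPp](?!\w)")
MENTION_RE = re.compile(r"(?<!\w)@\w+")
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def preprocess(text: str) -> List[str]:
    """Clean one question-answer post and split it into tokens."""
    text = CODE_RE.sub(" CODE ", text)   # code segments -> placeholder
    text = TAG_RE.sub(" ", text)         # remaining HTML tags
    text = URL_RE.sub(" URL ", text)     # URLs -> placeholder
    text = EMOTICON_RE.sub(" ", text)    # simple ASCII emoticons
    text = MENTION_RE.sub(" ", text)     # "@user" mentions
    return TOKEN_RE.findall(text)        # stand-in for word segmentation


post = "<p>@bob see http://x.org/a :) <code>foo()</code> works</p>"
tokens = preprocess(post)
```

Code segments are handled before the generic tag pattern so that the contents of a code element are replaced as a unit rather than left behind as stray tokens.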
6. The method for extracting question-answer content in a programming environment according to claim 4, wherein the document reading step specifically comprises: an initial state; obtaining a sentence-level vector representation through a single-layer convolutional neural network with max pooling; converting the sentence-level vector representations into a document-level vector representation through a recurrent neural network; finishing the document reading; the abstract extraction step specifically comprises: an initial state; drawing on the idea of the attention mechanism, using a recurrent neural network to mark, sentence by sentence in sequence, whether each sentence can be regarded as a summary sentence; finishing the abstract extraction.
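The data flow of claim 6 can be sketched as a forward pass in plain Python. Everything beyond that flow is an assumption: the weights are random and untrained, a plain tanh recurrent cell is used, the scoring rule (a sigmoid over the concatenated extractor and reader states) is only one attention-inspired possibility, and all names and dimensions are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def conv_max_pool(words: Matrix, filters: Matrix, width: int) -> Vector:
    """Single-layer 1-D convolution over word vectors, then max pooling over time."""
    dim = len(words[0])
    # Pad short sentences with zero vectors so at least one window exists.
    padded = words + [[0.0] * dim] * max(0, width - len(words))
    windows = [sum(padded[i:i + width], []) for i in range(len(padded) - width + 1)]
    maps = [[math.tanh(a) for a in matvec(filters, w)] for w in windows]
    return [max(col) for col in zip(*maps)]  # one value per filter


def rnn_step(w_in: Matrix, w_h: Matrix, x: Vector, h: Vector) -> Vector:
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


def read_document(sent_vecs: Matrix, w_in: Matrix, w_h: Matrix) -> Matrix:
    """Run an RNN over sentence vectors; the last state is the document vector."""
    h, states = [0.0] * len(w_h), []
    for s in sent_vecs:
        h = rnn_step(w_in, w_h, s, h)
        states.append(h)
    return states


def extract(sent_vecs: Matrix, enc_states: Matrix, w_in: Matrix, w_h: Matrix,
            v: Vector, threshold: float = 0.5) -> List[bool]:
    """Label each sentence in order as summary (True) or not (False)."""
    h = enc_states[-1]                      # start from the document vector
    prev = [0.0] * len(sent_vecs[0])
    labels = []
    for s, enc in zip(sent_vecs, enc_states):
        h = rnn_step(w_in, w_h, prev, h)
        score = sum(a * b for a, b in zip(v, h + enc))
        p = 1.0 / (1.0 + math.exp(-score))  # probability of "summary"
        labels.append(p >= threshold)
        prev = [p * x for x in s]           # feed back the weighted sentence
    return labels


rng = random.Random(0)
dim, width, n_filters, hidden = 4, 2, 5, 3
doc = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)] for n in (3, 1, 4)]
filters = rand_matrix(n_filters, width * dim, rng)
sent_vecs = [conv_max_pool(s, filters, width) for s in doc]
states = read_document(sent_vecs, rand_matrix(hidden, n_filters, rng),
                       rand_matrix(hidden, hidden, rng))
v = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden)]
labels = extract(sent_vecs, states, rand_matrix(hidden, n_filters, rng),
                 rand_matrix(hidden, hidden, rng), v)
```

With untrained weights the labels carry no meaning; the sketch only shows the shapes involved: one fixed-size vector per sentence, one recurrent state per sentence, and one binary label per sentence.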
7. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 4 to 6 when executing the program.
8. A medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method according to any one of claims 4 to 6.
CN202110449778.0A 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment Active CN113076127B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110449778.0A CN113076127B (en) 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment
PCT/CN2021/089820 WO2022226714A1 (en) 2021-04-25 2021-04-26 Method and system for extracting question and answer content in programming environment, electronic device, and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110449778.0A CN113076127B (en) 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment

Publications (2)

Publication Number Publication Date
CN113076127A (en) 2021-07-06
CN113076127B (en) 2023-08-29

Family

ID=76618820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110449778.0A Active CN113076127B (en) 2021-04-25 2021-04-25 Method, system, electronic device and medium for extracting question and answer content in programming environment

Country Status (2)

Country Link
CN (1) CN113076127B (en)
WO (1) WO2022226714A1 (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106569998A (en) * 2016-10-27 2017-04-19 浙江大学 Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of keyword extracting method and extraction system
CN109902175A (en) * 2019-02-20 2019-06-18 上海方立数码科技有限公司 A kind of file classification method and categorizing system based on neural network structure model
CN110298037A (en) * 2019-06-13 2019-10-01 同济大学 The matched text recognition method of convolutional neural networks based on enhancing attention mechanism
CN110390049A (en) * 2019-07-10 2019-10-29 北京航空航天大学 A kind of answer automatic generation method of software-oriented development problem
CN111428012A (en) * 2020-03-02 2020-07-17 平安科技(深圳)有限公司 Intelligent question-answering method, device, equipment and storage medium based on attention mechanism
CN111522965A (en) * 2020-04-22 2020-08-11 重庆邮电大学 Question-answering method and system for entity relationship extraction based on transfer learning
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine
CN111666752A (en) * 2020-04-20 2020-09-15 中山大学 Circuit teaching material entity relation extraction method based on keyword attention mechanism
CN112149421A (en) * 2020-09-23 2020-12-29 云南师范大学 Software programming field entity identification method based on BERT embedding
WO2020261234A1 (en) * 2019-06-28 2020-12-30 Tata Consultancy Services Limited System and method for sequence labeling using hierarchical capsule based neural network
CN112329465A (en) * 2019-07-18 2021-02-05 株式会社理光 Named entity identification method and device and computer readable storage medium
CN112417854A (en) * 2020-12-15 2021-02-26 北京信息科技大学 Chinese document abstraction type abstract method
CN115952263A (en) * 2022-12-16 2023-04-11 桂林电子科技大学 Question-answering method fusing machine reading understanding

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6842167B2 (en) * 2017-05-08 2021-03-17 国立研究開発法人情報通信研究機構 Summary generator, summary generation method and computer program
US11487954B2 (en) * 2019-07-22 2022-11-01 Capital One Services, Llc Multi-turn dialogue response generation via mutual information maximization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
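Multi-attention RNN relation linking method for knowledge base question answering; Li Huiying et al.; Journal of Southeast University; Vol. 36 (No. 4); 385-392 *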

Also Published As

Publication number Publication date
WO2022226714A1 (en) 2022-11-03
CN113076127A (en) 2021-07-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant